linux-fsdevel.vger.kernel.org archive mirror
* a simple and scalable pNFS block layout server V2
@ 2015-01-22 11:09 Christoph Hellwig
  2015-01-22 11:09 ` [PATCH 04/20] nfsd: factor out a helper to decode nfstime4 values Christoph Hellwig
                   ` (8 more replies)
  0 siblings, 9 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs, linux-fsdevel, xfs

This series adds support for the pNFS operations in NFS v4.1, as well
as a block layout driver that can export block based filesystems that
implement a few additional export operations.  Support for XFS is
provided in this series, but other filesystems could be added easily.

The core pNFS code of course owes its heritage to the existing Linux
pNFS server prototype, but except for a few bits and pieces in the
XDR path nothing is left from it.

The design of this new pNFS server is fairly different from the old
one.  While the old one implemented very few semantics in nfsd and
left almost everything to filesystems, my implementation implements
as much as possible in common nfsd code, then dispatches to a layout
driver that is still part of nfsd, and only then calls into the
filesystem, thus keeping it free from intimate pNFS knowledge.

This version of the code has been rebased to the locks-for-3.20
tree which adds a new lock context structure to the inode.  For now
we still (ab)use the lease list like in the last version.  Adding
a pNFS-specific list would duplicate a lot of code without much
benefit.  But during this research I came up with a way to associate
a nfs4_file with a struct file at open time, which should greatly
simplify the pNFS and delegation code.  So stay tuned for some
patches in this area!

For now this doesn't take errata 3901 for RFC 5661 into account
and sticks to the verified errata.  The changes in 3901 seem
useful in the longer run once verified and will be implemented
eventually.  Note that the only existing pNFS block client (Linux)
would not benefit from the longer layout lifetimes anyway.

More details are documented in the individual patch descriptions and
code comments.

This code is also available from:

	git://git.infradead.org/users/hch/pnfs.git pnfsd-for-3.20-2

Changes since V1:
	- rebased to the locks-for-3.20 tree
	- fixed a memory leak in the nfsd4_encode_getdeviceinfo error path
	- removed the always-one lg_roc field
	- added a nopnfs export option
	- use a synchronous transaction in layoutget to simplify recovery
	- various XFS fixes pointed out by Dave
	- various documentation typo fixes



* [PATCH 01/20] nfs: add LAYOUT_TYPE_MAX enum value
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2015-01-22 11:09   ` Christoph Hellwig
  2015-01-22 11:09   ` [PATCH 02/20] fs: track fl_owner for leases Christoph Hellwig
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

This gives us a nice upper bound for later use in nfsd.
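
A rough sketch of how such a sentinel is commonly used, not code from
this series: since LAYOUT_TYPE_MAX follows the last real layout type, it
can size a per-type table and bound-check a type value before indexing.
The table demo_layout_names and the helper demo_layout_name() are
hypothetical names made up for this illustration.

```c
#include <stddef.h>

/* Values as they appear in include/linux/nfs4.h after this patch. */
enum pnfs_layouttype {
	LAYOUT_NFSV4_1_FILES = 1,
	LAYOUT_OSD2_OBJECTS = 2,
	LAYOUT_BLOCK_VOLUME = 3,
	LAYOUT_TYPE_MAX
};

/* Hypothetical per-type table; slots without an initializer stay NULL. */
static const char *const demo_layout_names[LAYOUT_TYPE_MAX] = {
	[LAYOUT_NFSV4_1_FILES] = "files",
	[LAYOUT_OSD2_OBJECTS] = "objects",
	[LAYOUT_BLOCK_VOLUME] = "block",
};

/* Returns NULL for any value outside the known range. */
static const char *demo_layout_name(unsigned int type)
{
	if (type >= LAYOUT_TYPE_MAX)
		return NULL;
	return demo_layout_names[type];
}
```

Adding a new layout type before LAYOUT_TYPE_MAX grows the table and the
range check together, which is the point of keeping the sentinel last.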

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 include/linux/nfs4.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 022b761..8a3589c 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -516,6 +516,7 @@ enum pnfs_layouttype {
 	LAYOUT_NFSV4_1_FILES  = 1,
 	LAYOUT_OSD2_OBJECTS = 2,
 	LAYOUT_BLOCK_VOLUME = 3,
+	LAYOUT_TYPE_MAX
 };
 
 /* used for both layout return and recall */
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH 02/20] fs: track fl_owner for leases
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2015-01-22 11:09   ` [PATCH 01/20] nfs: add LAYOUT_TYPE_MAX enum value Christoph Hellwig
@ 2015-01-22 11:09   ` Christoph Hellwig
  2015-01-22 11:09   ` [PATCH 03/20] fs: add FL_LAYOUT lease type Christoph Hellwig
                     ` (12 subsequent siblings)
  14 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

Just like for other lock types we should allow different owners to have
a read lease on a file.  Currently this can't happen, but with the addition
of pNFS layout leases we'll need this feature.
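
As a userspace illustration only (struct demo_lease and
demo_find_lease() are simplified stand-ins, not the kernel's struct
file_lock or lease list), matching on both the file and the owner lets
two owners hold separate leases on the same open file:

```c
#include <stddef.h>

/* Simplified stand-in for struct file_lock; only the compared fields. */
struct demo_lease {
	const void *fl_file;	/* open file the lease was set on */
	const void *fl_owner;	/* who holds the lease */
	struct demo_lease *next;
};

/* Find the lease on 'file' held by 'owner'; other owners are skipped. */
static struct demo_lease *demo_find_lease(struct demo_lease *head,
					  const void *file, const void *owner)
{
	struct demo_lease *fl;

	for (fl = head; fl; fl = fl->next) {
		if (fl->fl_file == file && fl->fl_owner == owner)
			return fl;
	}
	return NULL;
}
```

With a file-only comparison the first lease on the list would be picked
regardless of who asked, so deleting one owner's lease could remove
another's.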

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 fs/locks.c          | 12 +++++++-----
 fs/nfsd/nfs4state.c |  2 +-
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 2fc36b3..65350a23 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1661,7 +1661,8 @@ generic_add_lease(struct file *filp, long arg, struct file_lock **flp, void **pr
 	 */
 	error = -EAGAIN;
 	list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
-		if (fl->fl_file == filp) {
+		if (fl->fl_file == filp &&
+		    fl->fl_owner == lease->fl_owner) {
 			my_fl = fl;
 			continue;
 		}
@@ -1721,7 +1722,7 @@ out:
 	return error;
 }
 
-static int generic_delete_lease(struct file *filp)
+static int generic_delete_lease(struct file *filp, void *owner)
 {
 	int error = -EAGAIN;
 	struct file_lock *fl, *victim = NULL;
@@ -1737,7 +1738,8 @@ static int generic_delete_lease(struct file *filp)
 
 	spin_lock(&ctx->flc_lock);
 	list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
-		if (fl->fl_file == filp) {
+		if (fl->fl_file == filp &&
+		    fl->fl_owner == owner) {
 			victim = fl;
 			break;
 		}
@@ -1778,7 +1780,7 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp,
 
 	switch (arg) {
 	case F_UNLCK:
-		return generic_delete_lease(filp);
+		return generic_delete_lease(filp, *priv);
 	case F_RDLCK:
 	case F_WRLCK:
 		if (!(*flp)->fl_lmops->lm_break) {
@@ -1857,7 +1859,7 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
 int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
 {
 	if (arg == F_UNLCK)
-		return vfs_setlease(filp, F_UNLCK, NULL, NULL);
+		return vfs_setlease(filp, F_UNLCK, NULL, (void **)&filp);
 	return do_fcntl_add_lease(fd, filp, arg);
 }
 
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 532a60c..30ff0d4 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -693,7 +693,7 @@ static void nfs4_put_deleg_lease(struct nfs4_file *fp)
 	spin_unlock(&fp->fi_lock);
 
 	if (filp) {
-		vfs_setlease(filp, F_UNLCK, NULL, NULL);
+		vfs_setlease(filp, F_UNLCK, NULL, (void **)&fp);
 		fput(filp);
 	}
 }
-- 
1.9.1


* [PATCH 03/20] fs: add FL_LAYOUT lease type
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2015-01-22 11:09   ` [PATCH 01/20] nfs: add LAYOUT_TYPE_MAX enum value Christoph Hellwig
  2015-01-22 11:09   ` [PATCH 02/20] fs: track fl_owner for leases Christoph Hellwig
@ 2015-01-22 11:09   ` Christoph Hellwig
  2015-01-22 15:45     ` Jeff Layton
       [not found]     ` <1421925006-24231-4-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2015-01-22 11:09   ` [PATCH 06/20] nfsd: add fh_fsid_match helper Christoph Hellwig
                     ` (11 subsequent siblings)
  14 siblings, 2 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

This (ab-)uses the file locking code to allow filesystems to recall
outstanding pNFS layouts on a file.  This new lease type is similar but
not quite the same as FL_DELEG.  A FL_LAYOUT lease can always be granted,
and a per-filesystem lock (XFS iolock for the initial implementation)
ensures no FL_LAYOUT leases are granted when we would need to recall them.

Also included are changes that allow multiple outstanding read
leases of different types on the same file as long as they have a
different owner.  This wasn't a problem until now as nfsd never set
FL_LEASE leases, and no one else used FL_DELEG leases, but given that
nfsd will also issue FL_LAYOUT leases we will have to handle it now.
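
The conflict rule can be modelled in userspace roughly as below.  This
is a sketch under assumptions: the DEMO_FL_DELEG and DEMO_FL_LEASE
values are illustrative (only the FL_LAYOUT value of 2048 appears in
this patch), struct demo_lease is a made-up stand-in, and the final
check is a simplified substitute for locks_conflict().

```c
#include <stdbool.h>

#define DEMO_FL_DELEG	4u	/* illustrative value */
#define DEMO_FL_LEASE	32u	/* illustrative value */
#define DEMO_FL_LAYOUT	2048u	/* matches FL_LAYOUT in this patch */

struct demo_lease {
	unsigned int flags;
	bool write;	/* true for a write lease, false for a read lease */
};

static bool demo_leases_conflict(const struct demo_lease *lease,
				 const struct demo_lease *breaker)
{
	/* Layout leases only conflict with layout breakers and vice versa. */
	if ((breaker->flags & DEMO_FL_LAYOUT) != (lease->flags & DEMO_FL_LAYOUT))
		return false;
	/* A delegation break leaves plain leases alone. */
	if ((breaker->flags & DEMO_FL_DELEG) && (lease->flags & DEMO_FL_LEASE))
		return false;
	/* Simplified stand-in for locks_conflict(): any writer conflicts. */
	return lease->write || breaker->write;
}
```

The first test is what keeps a layout recall from breaking delegations
or ordinary leases that share the same lease list.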

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 fs/locks.c         | 14 ++++++++++----
 include/linux/fs.h | 16 ++++++++++++++++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 65350a23..6b9772d1 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -137,7 +137,7 @@
 
 #define IS_POSIX(fl)	(fl->fl_flags & FL_POSIX)
 #define IS_FLOCK(fl)	(fl->fl_flags & FL_FLOCK)
-#define IS_LEASE(fl)	(fl->fl_flags & (FL_LEASE|FL_DELEG))
+#define IS_LEASE(fl)	(fl->fl_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
 #define IS_OFDLCK(fl)	(fl->fl_flags & FL_OFDLCK)
 
 static bool lease_breaking(struct file_lock *fl)
@@ -1371,6 +1371,8 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
 
 static bool leases_conflict(struct file_lock *lease, struct file_lock *breaker)
 {
+	if ((breaker->fl_flags & FL_LAYOUT) != (lease->fl_flags & FL_LAYOUT))
+		return false;
 	if ((breaker->fl_flags & FL_DELEG) && (lease->fl_flags & FL_LEASE))
 		return false;
 	return locks_conflict(breaker, lease);
@@ -1594,11 +1596,14 @@ int fcntl_getlease(struct file *filp)
  * conflict with the lease we're trying to set.
  */
 static int
-check_conflicting_open(const struct dentry *dentry, const long arg)
+check_conflicting_open(const struct dentry *dentry, const long arg, int flags)
 {
 	int ret = 0;
 	struct inode *inode = dentry->d_inode;
 
+	if (flags & FL_LAYOUT)
+		return 0;
+
 	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
 		return -EAGAIN;
 
@@ -1647,7 +1652,7 @@ generic_add_lease(struct file *filp, long arg, struct file_lock **flp, void **pr
 
 	spin_lock(&ctx->flc_lock);
 	time_out_leases(inode, &dispose);
-	error = check_conflicting_open(dentry, arg);
+	error = check_conflicting_open(dentry, arg, lease->fl_flags);
 	if (error)
 		goto out;
 
@@ -1703,7 +1708,7 @@ generic_add_lease(struct file *filp, long arg, struct file_lock **flp, void **pr
 	 * precedes these checks.
 	 */
 	smp_mb();
-	error = check_conflicting_open(dentry, arg);
+	error = check_conflicting_open(dentry, arg, lease->fl_flags);
 	if (error) {
 		locks_unlink_lock_ctx(lease, &ctx->flc_lease_cnt);
 		goto out;
@@ -1787,6 +1792,7 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp,
 			WARN_ON_ONCE(1);
 			return -ENOLCK;
 		}
+
 		return generic_add_lease(filp, arg, flp, priv);
 	default:
 		return -EINVAL;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f87cb2f..d5658f4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -875,6 +875,7 @@ static inline struct file *get_file(struct file *f)
 #define FL_DOWNGRADE_PENDING	256 /* Lease is being downgraded */
 #define FL_UNLOCK_PENDING	512 /* Lease is being broken */
 #define FL_OFDLCK	1024	/* lock is "owned" by struct file */
+#define FL_LAYOUT	2048	/* outstanding pNFS layout */
 
 /*
  * Special return value from posix_lock_file() and vfs_lock_file() for
@@ -2036,6 +2037,16 @@ static inline int break_deleg_wait(struct inode **delegated_inode)
 	return ret;
 }
 
+static inline int break_layout(struct inode *inode, bool wait)
+{
+	smp_mb();
+	if (inode->i_flctx && !list_empty_careful(&inode->i_flctx->flc_lease))
+		return __break_lease(inode,
+				wait ? O_WRONLY : O_WRONLY | O_NONBLOCK,
+				FL_LAYOUT);
+	return 0;
+}
+
 #else /* !CONFIG_FILE_LOCKING */
 static inline int locks_mandatory_locked(struct file *file)
 {
@@ -2091,6 +2102,11 @@ static inline int break_deleg_wait(struct inode **delegated_inode)
 	return 0;
 }
 
+static inline int break_layout(struct inode *inode, bool wait)
+{
+	return 0;
+}
+
 #endif /* CONFIG_FILE_LOCKING */
 
 /* fs/open.c */
-- 
1.9.1


* [PATCH 04/20] nfsd: factor out a helper to decode nfstime4 values
  2015-01-22 11:09 a simple and scalable pNFS block layout server V2 Christoph Hellwig
@ 2015-01-22 11:09 ` Christoph Hellwig
       [not found]   ` <1421925006-24231-5-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2015-01-22 11:09 ` [PATCH 05/20] nfsd: move nfsd_fh_match to nfsfh.h Christoph Hellwig
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs, linux-fsdevel, xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/nfsd/nfs4xdr.c | 43 ++++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 15f7b73..884ffa3 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -234,6 +234,26 @@ static char *savemem(struct nfsd4_compoundargs *argp, __be32 *p, int nbytes)
 	return ret;
 }
 
+/*
+ * We require the high 32 bits of 'seconds' to be 0, and
+ * we ignore all 32 bits of 'nseconds'.
+ */
+static __be32
+nfsd4_decode_time(struct nfsd4_compoundargs *argp, struct timespec *tv)
+{
+	DECODE_HEAD;
+	u64 sec;
+
+	READ_BUF(12);
+	p = xdr_decode_hyper(p, &sec);
+	tv->tv_sec = sec;
+	tv->tv_nsec = be32_to_cpup(p++);
+	if (tv->tv_nsec >= (u32)1000000000)
+		return nfserr_inval;
+
+	DECODE_TAIL;
+}
+
 static __be32
 nfsd4_decode_bitmap(struct nfsd4_compoundargs *argp, u32 *bmval)
 {
@@ -267,7 +287,6 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval,
 {
 	int expected_len, len = 0;
 	u32 dummy32;
-	u64 sec;
 	char *buf;
 
 	DECODE_HEAD;
@@ -358,15 +377,10 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval,
 		dummy32 = be32_to_cpup(p++);
 		switch (dummy32) {
 		case NFS4_SET_TO_CLIENT_TIME:
-			/* We require the high 32 bits of 'seconds' to be 0, and we ignore
-			   all 32 bits of 'nseconds'. */
-			READ_BUF(12);
 			len += 12;
-			p = xdr_decode_hyper(p, &sec);
-			iattr->ia_atime.tv_sec = (time_t)sec;
-			iattr->ia_atime.tv_nsec = be32_to_cpup(p++);
-			if (iattr->ia_atime.tv_nsec >= (u32)1000000000)
-				return nfserr_inval;
+			status = nfsd4_decode_time(argp, &iattr->ia_atime);
+			if (status)
+				return status;
 			iattr->ia_valid |= (ATTR_ATIME | ATTR_ATIME_SET);
 			break;
 		case NFS4_SET_TO_SERVER_TIME:
@@ -382,15 +396,10 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval,
 		dummy32 = be32_to_cpup(p++);
 		switch (dummy32) {
 		case NFS4_SET_TO_CLIENT_TIME:
-			/* We require the high 32 bits of 'seconds' to be 0, and we ignore
-			   all 32 bits of 'nseconds'. */
-			READ_BUF(12);
 			len += 12;
-			p = xdr_decode_hyper(p, &sec);
-			iattr->ia_mtime.tv_sec = sec;
-			iattr->ia_mtime.tv_nsec = be32_to_cpup(p++);
-			if (iattr->ia_mtime.tv_nsec >= (u32)1000000000)
-				return nfserr_inval;
+			status = nfsd4_decode_time(argp, &iattr->ia_mtime);
+			if (status)
+				return status;
 			iattr->ia_valid |= (ATTR_MTIME | ATTR_MTIME_SET);
 			break;
 		case NFS4_SET_TO_SERVER_TIME:
-- 
1.9.1



* [PATCH 05/20] nfsd: move nfsd_fh_match to nfsfh.h
  2015-01-22 11:09 a simple and scalable pNFS block layout server V2 Christoph Hellwig
  2015-01-22 11:09 ` [PATCH 04/20] nfsd: factor out a helper to decode nfstime4 values Christoph Hellwig
@ 2015-01-22 11:09 ` Christoph Hellwig
  2015-01-22 11:09 ` [PATCH 09/20] nfsd: make find_any_file available outside nfs4state.c Christoph Hellwig
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-fsdevel, linux-nfs, Jeff Layton, xfs

The pnfs code will need it too.  Also remove the nfsd_ prefix to match the
other filehandle helpers in that file.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/nfsd/nfs4state.c | 12 ++----------
 fs/nfsd/nfsfh.h     |  9 +++++++++
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 30ff0d4..b4f86f8 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -408,14 +408,6 @@ static unsigned int file_hashval(struct knfsd_fh *fh)
 	return nfsd_fh_hashval(fh) & (FILE_HASH_SIZE - 1);
 }
 
-static bool nfsd_fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2)
-{
-	return fh1->fh_size == fh2->fh_size &&
-		!memcmp(fh1->fh_base.fh_pad,
-				fh2->fh_base.fh_pad,
-				fh1->fh_size);
-}
-
 static struct hlist_head file_hashtbl[FILE_HASH_SIZE];
 
 static void
@@ -3300,7 +3292,7 @@ find_file_locked(struct knfsd_fh *fh, unsigned int hashval)
 	struct nfs4_file *fp;
 
 	hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash) {
-		if (nfsd_fh_match(&fp->fi_fhandle, fh)) {
+		if (fh_match(&fp->fi_fhandle, fh)) {
 			if (atomic_inc_not_zero(&fp->fi_ref))
 				return fp;
 		}
@@ -4295,7 +4287,7 @@ laundromat_main(struct work_struct *laundry)
 
 static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_ol_stateid *stp)
 {
-	if (!nfsd_fh_match(&fhp->fh_handle, &stp->st_stid.sc_file->fi_fhandle))
+	if (!fh_match(&fhp->fh_handle, &stp->st_stid.sc_file->fi_fhandle))
 		return nfserr_bad_stateid;
 	return nfs_ok;
 }
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 08236d7..e24d954 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -187,6 +187,15 @@ fh_init(struct svc_fh *fhp, int maxsize)
 	return fhp;
 }
 
+static inline bool fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2)
+{
+	if (fh1->fh_size != fh2->fh_size)
+		return false;
+	if (memcmp(fh1->fh_base.fh_pad, fh2->fh_base.fh_pad, fh1->fh_size) != 0)
+		return false;
+	return true;
+}
+
 #ifdef CONFIG_NFSD_V3
 /*
  * The wcc data stored in current_fh should be cleared
-- 
1.9.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 06/20] nfsd: add fh_fsid_match helper
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-01-22 11:09   ` [PATCH 03/20] fs: add FL_LAYOUT lease type Christoph Hellwig
@ 2015-01-22 11:09   ` Christoph Hellwig
  2015-01-22 11:09   ` [PATCH 07/20] nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c Christoph Hellwig
                     ` (10 subsequent siblings)
  14 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

Add a helper to check that the fsid parts of two file handles match.
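
A userspace sketch of the comparison, under stated assumptions: struct
demo_fh is a reduced file handle, and demo_key_len() is a hypothetical
stand-in for key_len() with made-up lengths per fsid type.

```c
#include <stdbool.h>
#include <string.h>

/* Reduced file handle: an fsid type tag plus the raw fsid bytes. */
struct demo_fh {
	unsigned char fsid_type;
	unsigned char fsid[16];
};

/* Hypothetical stand-in for key_len(): fsid length in bytes per type. */
static size_t demo_key_len(unsigned char fsid_type)
{
	switch (fsid_type) {
	case 0:
		return 8;
	case 1:
		return 4;
	default:
		return 16;
	}
}

static bool demo_fsid_match(const struct demo_fh *a, const struct demo_fh *b)
{
	if (a->fsid_type != b->fsid_type)
		return false;
	/*
	 * Compare the full key length; the test against zero sits
	 * outside the memcmp() call, not inside its length argument.
	 */
	return memcmp(a->fsid, b->fsid, demo_key_len(a->fsid_type)) == 0;
}
```

Only the bytes that belong to the fsid for the given type take part in
the comparison; trailing bytes of the buffer are ignored.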

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 fs/nfsd/nfsfh.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index e24d954..84cae20 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -196,6 +196,15 @@ static inline bool fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2)
 	return true;
 }
 
+static inline bool fh_fsid_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2)
+{
+	if (fh1->fh_fsid_type != fh2->fh_fsid_type)
+		return false;
+	if (memcmp(fh1->fh_fsid, fh2->fh_fsid, key_len(fh1->fh_fsid_type)) != 0)
+		return false;
+	return true;
+}
+
 #ifdef CONFIG_NFSD_V3
 /*
  * The wcc data stored in current_fh should be cleared
-- 
1.9.1


* [PATCH 07/20] nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (3 preceding siblings ...)
  2015-01-22 11:09   ` [PATCH 06/20] nfsd: add fh_fsid_match helper Christoph Hellwig
@ 2015-01-22 11:09   ` Christoph Hellwig
  2015-01-22 11:09   ` [PATCH 08/20] nfsd: make find/get/put file " Christoph Hellwig
                     ` (9 subsequent siblings)
  14 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 fs/nfsd/nfs4state.c | 8 ++++----
 fs/nfsd/state.h     | 6 ++++++
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index b4f86f8..7515207 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -486,7 +486,7 @@ static void nfs4_file_put_access(struct nfs4_file *fp, u32 access)
 		__nfs4_file_put_access(fp, O_RDONLY);
 }
 
-static struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl,
+struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl,
 					 struct kmem_cache *slab)
 {
 	struct nfs4_stid *stid;
@@ -690,7 +690,7 @@ static void nfs4_put_deleg_lease(struct nfs4_file *fp)
 	}
 }
 
-static void unhash_stid(struct nfs4_stid *s)
+void nfs4_unhash_stid(struct nfs4_stid *s)
 {
 	s->sc_type = 0;
 }
@@ -998,7 +998,7 @@ static void unhash_lock_stateid(struct nfs4_ol_stateid *stp)
 
 	list_del_init(&stp->st_locks);
 	unhash_ol_stateid(stp);
-	unhash_stid(&stp->st_stid);
+	nfs4_unhash_stid(&stp->st_stid);
 }
 
 static void release_lock_stateid(struct nfs4_ol_stateid *stp)
@@ -4438,7 +4438,7 @@ out_unlock:
 	return status;
 }
 
-static __be32
+__be32
 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 		     stateid_t *stateid, unsigned char typemask,
 		     struct nfs4_stid **s, struct nfsd_net *nn)
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 9d3be37..534983a 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -545,6 +545,12 @@ struct nfsd_net;
 extern __be32 nfs4_preprocess_stateid_op(struct net *net,
 		struct nfsd4_compound_state *cstate,
 		stateid_t *stateid, int flags, struct file **filp);
+__be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
+		     stateid_t *stateid, unsigned char typemask,
+		     struct nfs4_stid **s, struct nfsd_net *nn);
+struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl,
+		struct kmem_cache *slab);
+void nfs4_unhash_stid(struct nfs4_stid *s);
 void nfs4_put_stid(struct nfs4_stid *s);
 void nfs4_remove_reclaim_record(struct nfs4_client_reclaim *, struct nfsd_net *);
 extern void nfs4_release_reclaim(struct nfsd_net *);
-- 
1.9.1


* [PATCH 08/20] nfsd: make find/get/put file available outside nfs4state.c
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (4 preceding siblings ...)
  2015-01-22 11:09   ` [PATCH 07/20] nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c Christoph Hellwig
@ 2015-01-22 11:09   ` Christoph Hellwig
  2015-01-22 11:09   ` [PATCH 10/20] nfsd: implement pNFS operations Christoph Hellwig
                     ` (8 subsequent siblings)
  14 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 fs/nfsd/nfs4state.c | 10 ++--------
 fs/nfsd/state.h     |  7 +++++++
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 7515207..c31b8f8 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -282,7 +282,7 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu)
 	kmem_cache_free(file_slab, fp);
 }
 
-static inline void
+void
 put_nfs4_file(struct nfs4_file *fi)
 {
 	might_lock(&state_lock);
@@ -295,12 +295,6 @@ put_nfs4_file(struct nfs4_file *fi)
 	}
 }
 
-static inline void
-get_nfs4_file(struct nfs4_file *fi)
-{
-	atomic_inc(&fi->fi_ref);
-}
-
 static struct file *
 __nfs4_get_fd(struct nfs4_file *f, int oflag)
 {
@@ -3300,7 +3294,7 @@ find_file_locked(struct knfsd_fh *fh, unsigned int hashval)
 	return NULL;
 }
 
-static struct nfs4_file *
+struct nfs4_file *
 find_file(struct knfsd_fh *fh)
 {
 	struct nfs4_file *fp;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 534983a..cd61e7d 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -573,6 +573,13 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
 							struct nfsd_net *nn);
 extern bool nfs4_has_reclaimed_state(const char *name, struct nfsd_net *nn);
 
+struct nfs4_file *find_file(struct knfsd_fh *fh);
+void put_nfs4_file(struct nfs4_file *fi);
+static inline void get_nfs4_file(struct nfs4_file *fi)
+{
+	atomic_inc(&fi->fi_ref);
+}
+
 /* grace period management */
 void nfsd4_end_grace(struct nfsd_net *nn);
 
-- 
1.9.1


* [PATCH 09/20] nfsd: make find_any_file available outside nfs4state.c
  2015-01-22 11:09 a simple and scalable pNFS block layout server V2 Christoph Hellwig
  2015-01-22 11:09 ` [PATCH 04/20] nfsd: factor out a helper to decode nfstime4 values Christoph Hellwig
  2015-01-22 11:09 ` [PATCH 05/20] nfsd: move nfsd_fh_match to nfsfh.h Christoph Hellwig
@ 2015-01-22 11:09 ` Christoph Hellwig
  2015-01-22 11:09 ` [PATCH 12/20] nfsd: update documentation for pNFS support Christoph Hellwig
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs, linux-fsdevel, xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/nfsd/nfs4state.c | 2 +-
 fs/nfsd/state.h     | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index c31b8f8..444d43c 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -352,7 +352,7 @@ find_readable_file(struct nfs4_file *f)
 	return ret;
 }
 
-static struct file *
+struct file *
 find_any_file(struct nfs4_file *f)
 {
 	struct file *ret;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index cd61e7d..18b5ab8 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -579,6 +579,7 @@ static inline void get_nfs4_file(struct nfs4_file *fi)
 {
 	atomic_inc(&fi->fi_ref);
 }
+struct file *find_any_file(struct nfs4_file *f);
 
 /* grace period management */
 void nfsd4_end_grace(struct nfsd_net *nn);
-- 
1.9.1



* [PATCH 10/20] nfsd: implement pNFS operations
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (5 preceding siblings ...)
  2015-01-22 11:09   ` [PATCH 08/20] nfsd: make find/get/put file " Christoph Hellwig
@ 2015-01-22 11:09   ` Christoph Hellwig
       [not found]     ` <1421925006-24231-11-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2015-01-22 11:09   ` [PATCH 11/20] nfsd: implement pNFS layout recalls Christoph Hellwig
                     ` (7 subsequent siblings)
  14 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
outstanding layouts and devices.

Layout management is very straightforward, with a nfs4_layout_stateid
structure that extends nfs4_stid to manage layout stateids as the
top-level structure.  It is linked into the nfs4_file and nfs4_client
structures like the other stateids, and contains a linked list of
layouts that hang off the stateid.  The actual layout operations are
implemented in layout drivers that are not part of this commit, but
will be added later.
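
The shape described above can be sketched in userspace as follows.  All
names here (struct demo_stid, struct demo_layout_stateid, the ls_stid
and ls_layouts fields, and the helpers) are assumptions for the sake of
illustration and should not be read as the definitions in
fs/nfsd/state.h.

```c
#include <stddef.h>

/* Generic stateid header, reduced to a type tag for this sketch. */
struct demo_stid {
	unsigned char sc_type;
};

/* One granted layout segment; hangs off a layout stateid. */
struct demo_layout {
	unsigned long long offset;
	unsigned long long length;
	struct demo_layout *next;
};

/* Layout stateid: extends the generic stateid by embedding it. */
struct demo_layout_stateid {
	struct demo_stid ls_stid;
	struct demo_layout *ls_layouts;
};

/* Recover the containing layout stateid from its embedded generic part. */
static struct demo_layout_stateid *demo_layoutstateid(struct demo_stid *s)
{
	return (struct demo_layout_stateid *)
		((char *)s - offsetof(struct demo_layout_stateid, ls_stid));
}

/* Attach a layout at the head of the stateid's list. */
static void demo_add_layout(struct demo_layout_stateid *ls,
			    struct demo_layout *lo)
{
	lo->next = ls->ls_layouts;
	ls->ls_layouts = lo;
}

/* Count the layouts currently hanging off the stateid. */
static size_t demo_count_layouts(const struct demo_layout_stateid *ls)
{
	const struct demo_layout *lo;
	size_t n = 0;

	for (lo = ls->ls_layouts; lo; lo = lo->next)
		n++;
	return n;
}
```

Embedding the generic part means common stateid code can handle a
layout stateid without knowing its full type, and layout code can get
back to the outer structure when it needs the list.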

The worst part of this commit is the management of the pNFS device IDs,
which suffers from a specification that is not sanely implementable due
to the fact that the device-IDs are global and not bound to an export,
and have a small enough size so that we can't store the fsid portion of
a file handle, and must never be reused.  As we still need to perform all
export authentication and validation checks on a device ID passed to
GETDEVICEINFO we are caught between a rock and a hard place.  To work
around this issue we add a new hash that maps from a 64-bit integer to a
fsid so that we can look up the export to authenticate against it,
a 32-bit integer as a generation that we can bump when changing the device,
and a currently unused 32-bit integer that could be used in the future
to handle more than a single device per export.  Entries in this hash
table are never deleted as we can't reuse the ids anyway, and would have
a severe lifetime problem as Linux export structures are temporary
structures that can go away under load.
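
A minimal userspace model of this scheme, assuming a fixed-size chained
hash and a monotonically increasing 64-bit index.  Every name below
(struct demo_devid_map, struct demo_deviceid, demo_map_fsid(),
demo_find_devid()) is hypothetical and not taken from nfs4layouts.c.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_HASH_SIZE	16
#define DEMO_FSID_MAX	32

/* Maps a never-reused 64-bit index back to the fsid of an export. */
struct demo_devid_map {
	uint64_t idx;
	unsigned char fsid[DEMO_FSID_MAX];
	size_t fsid_len;
	struct demo_devid_map *next;
};

/* 16-byte device ID: map index, generation, and a currently unused word. */
struct demo_deviceid {
	uint64_t fsid_idx;
	uint32_t generation;
	uint32_t pad;
};

static struct demo_devid_map *demo_hash[DEMO_HASH_SIZE];
static uint64_t demo_next_idx;

/* Return the index for 'fsid', allocating a new entry on first use. */
static uint64_t demo_map_fsid(const void *fsid, size_t len)
{
	struct demo_devid_map *m;
	unsigned int i;

	if (len > DEMO_FSID_MAX)
		return 0;
	/* A linear scan keeps the sketch short. */
	for (i = 0; i < DEMO_HASH_SIZE; i++)
		for (m = demo_hash[i]; m; m = m->next)
			if (m->fsid_len == len && !memcmp(m->fsid, fsid, len))
				return m->idx;

	m = calloc(1, sizeof(*m));
	if (!m)
		return 0;
	m->idx = ++demo_next_idx;	/* indices start at 1, never reused */
	memcpy(m->fsid, fsid, len);
	m->fsid_len = len;
	m->next = demo_hash[m->idx % DEMO_HASH_SIZE];
	demo_hash[m->idx % DEMO_HASH_SIZE] = m;
	return m->idx;
}

/* Look up the entry for an index taken from a device ID. */
static const struct demo_devid_map *demo_find_devid(uint64_t idx)
{
	const struct demo_devid_map *m;

	for (m = demo_hash[idx % DEMO_HASH_SIZE]; m; m = m->next)
		if (m->idx == idx)
			return m;
	return NULL;
}
```

Entries are deliberately never freed in this model, mirroring the
constraint that a device ID must stay resolvable and must not be handed
out again for a different export.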

Parts of the XDR data, structures and marshaling/unmarshaling code, as
well as many concepts are derived from the old pNFS server implementation
from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman,
Mike Sager, Ricardo Labiaga and many others.

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 fs/nfsd/Kconfig                  |  10 +
 fs/nfsd/Makefile                 |   1 +
 fs/nfsd/export.c                 |   8 +
 fs/nfsd/export.h                 |   2 +
 fs/nfsd/nfs4layouts.c            | 488 +++++++++++++++++++++++++++++++++++++++
 fs/nfsd/nfs4proc.c               | 266 +++++++++++++++++++++
 fs/nfsd/nfs4state.c              |  16 +-
 fs/nfsd/nfs4xdr.c                | 312 +++++++++++++++++++++++++
 fs/nfsd/nfsctl.c                 |   9 +-
 fs/nfsd/nfsd.h                   |  16 +-
 fs/nfsd/pnfs.h                   |  80 +++++++
 fs/nfsd/state.h                  |  21 ++
 fs/nfsd/xdr4.h                   |  59 +++++
 include/linux/nfs4.h             |   1 +
 include/uapi/linux/nfsd/debug.h  |   1 +
 include/uapi/linux/nfsd/export.h |   4 +-
 16 files changed, 1289 insertions(+), 5 deletions(-)
 create mode 100644 fs/nfsd/nfs4layouts.c
 create mode 100644 fs/nfsd/pnfs.h

diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index 7339515..683bf71 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -82,6 +82,16 @@ config NFSD_V4
 
 	  If unsure, say N.
 
+config NFSD_PNFS
+	bool "NFSv4.1 server support for Parallel NFS (pNFS)"
+	depends on NFSD_V4
+	help
+	  This option enables support for the parallel NFS features of the
+	  minor version 1 of the NFSv4 protocol (RFC5661) in the kernel's NFS
+	  server.
+
+	  If unsure, say N.
+
 config NFSD_V4_SECURITY_LABEL
 	bool "Provide Security Label support for NFSv4 server"
 	depends on NFSD_V4 && SECURITY
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index af32ef0..5806270 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -12,3 +12,4 @@ nfsd-$(CONFIG_NFSD_V3)	+= nfs3proc.o nfs3xdr.o
 nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o
 nfsd-$(CONFIG_NFSD_V4)	+= nfs4proc.o nfs4xdr.o nfs4state.o nfs4idmap.o \
 			   nfs4acl.o nfs4callback.o nfs4recover.o
+nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index 30a739d..c3e3b6e 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -20,6 +20,7 @@
 #include "nfsd.h"
 #include "nfsfh.h"
 #include "netns.h"
+#include "pnfs.h"
 
 #define NFSDDBG_FACILITY	NFSDDBG_EXPORT
 
@@ -545,6 +546,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
 
 	exp.ex_client = dom;
 	exp.cd = cd;
+	exp.ex_devid_map = NULL;
 
 	/* expiry */
 	err = -EINVAL;
@@ -621,6 +623,8 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
 		if (!gid_valid(exp.ex_anon_gid))
 			goto out4;
 		err = 0;
+
+		nfsd4_setup_layout_type(&exp);
 	}
 
 	expp = svc_export_lookup(&exp);
@@ -703,6 +707,7 @@ static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
 	new->ex_fslocs.locations = NULL;
 	new->ex_fslocs.locations_count = 0;
 	new->ex_fslocs.migrated = 0;
+	new->ex_layout_type = 0;
 	new->ex_uuid = NULL;
 	new->cd = item->cd;
 }
@@ -717,6 +722,8 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)
 	new->ex_anon_uid = item->ex_anon_uid;
 	new->ex_anon_gid = item->ex_anon_gid;
 	new->ex_fsid = item->ex_fsid;
+	new->ex_devid_map = item->ex_devid_map;
+	item->ex_devid_map = NULL;
 	new->ex_uuid = item->ex_uuid;
 	item->ex_uuid = NULL;
 	new->ex_fslocs.locations = item->ex_fslocs.locations;
@@ -725,6 +732,7 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)
 	item->ex_fslocs.locations_count = 0;
 	new->ex_fslocs.migrated = item->ex_fslocs.migrated;
 	item->ex_fslocs.migrated = 0;
+	new->ex_layout_type = item->ex_layout_type;
 	new->ex_nflavors = item->ex_nflavors;
 	for (i = 0; i < MAX_SECINFO_LIST; i++) {
 		new->ex_flavors[i] = item->ex_flavors[i];
diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h
index 04dc8c1..1f52bfc 100644
--- a/fs/nfsd/export.h
+++ b/fs/nfsd/export.h
@@ -56,6 +56,8 @@ struct svc_export {
 	struct nfsd4_fs_locations ex_fslocs;
 	uint32_t		ex_nflavors;
 	struct exp_flavor_info	ex_flavors[MAX_SECINFO_LIST];
+	enum pnfs_layouttype	ex_layout_type;
+	struct nfsd4_deviceid_map *ex_devid_map;
 	struct cache_detail	*cd;
 };
 
diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
new file mode 100644
index 0000000..28c8ff2
--- /dev/null
+++ b/fs/nfsd/nfs4layouts.c
@@ -0,0 +1,488 @@
+/*
+ * Copyright (c) 2014 Christoph Hellwig.
+ */
+#include <linux/jhash.h>
+#include <linux/sched.h>
+
+#include "pnfs.h"
+#include "netns.h"
+
+#define NFSDDBG_FACILITY                NFSDDBG_PNFS
+
+struct nfs4_layout {
+	struct list_head		lo_perstate;
+	struct nfs4_layout_stateid	*lo_state;
+	struct nfsd4_layout_seg		lo_seg;
+};
+
+static struct kmem_cache *nfs4_layout_cache;
+static struct kmem_cache *nfs4_layout_stateid_cache;
+
+const struct nfsd4_layout_ops *nfsd4_layout_ops[LAYOUT_TYPE_MAX] =  {
+};
+
+/* pNFS device ID to export fsid mapping */
+#define DEVID_HASH_BITS	8
+#define DEVID_HASH_SIZE	(1 << DEVID_HASH_BITS)
+#define DEVID_HASH_MASK	(DEVID_HASH_SIZE - 1)
+static u64 nfsd_devid_seq = 1;
+static struct list_head nfsd_devid_hash[DEVID_HASH_SIZE];
+static DEFINE_SPINLOCK(nfsd_devid_lock);
+
+static inline u32 devid_hashfn(u64 idx)
+{
+	return jhash_2words(idx, idx >> 32, 0) & DEVID_HASH_MASK;
+}
+
+static void
+nfsd4_alloc_devid_map(const struct svc_fh *fhp)
+{
+	const struct knfsd_fh *fh = &fhp->fh_handle;
+	size_t fsid_len = key_len(fh->fh_fsid_type);
+	struct nfsd4_deviceid_map *map, *old;
+	int i;
+
+	map = kzalloc(sizeof(*map) + fsid_len, GFP_KERNEL);
+	if (!map)
+		return;
+
+	map->fsid_type = fh->fh_fsid_type;
+	memcpy(&map->fsid, fh->fh_fsid, fsid_len);
+
+	spin_lock(&nfsd_devid_lock);
+	if (fhp->fh_export->ex_devid_map)
+		goto out_unlock;
+
+	for (i = 0; i < DEVID_HASH_SIZE; i++) {
+		list_for_each_entry(old, &nfsd_devid_hash[i], hash) {
+			if (old->fsid_type != fh->fh_fsid_type)
+				continue;
+			if (memcmp(old->fsid, fh->fh_fsid,
+					key_len(old->fsid_type)))
+				continue;
+
+			fhp->fh_export->ex_devid_map = old;
+			goto out_unlock;
+		}
+	}
+
+	map->idx = nfsd_devid_seq++;
+	list_add_tail_rcu(&map->hash, &nfsd_devid_hash[devid_hashfn(map->idx)]);
+	fhp->fh_export->ex_devid_map = map;
+	map = NULL;
+
+out_unlock:
+	spin_unlock(&nfsd_devid_lock);
+	if (map)
+		kfree(map);
+}
+
+struct nfsd4_deviceid_map *
+nfsd4_find_devid_map(int idx)
+{
+	struct nfsd4_deviceid_map *map, *ret = NULL;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(map, &nfsd_devid_hash[devid_hashfn(idx)], hash)
+		if (map->idx == idx)
+			ret = map;
+	rcu_read_unlock();
+
+	return ret;
+}
+
+int
+nfsd4_set_deviceid(struct nfsd4_deviceid *id, const struct svc_fh *fhp,
+		u32 device_generation)
+{
+	if (!fhp->fh_export->ex_devid_map) {
+		nfsd4_alloc_devid_map(fhp);
+		if (!fhp->fh_export->ex_devid_map)
+			return -ENOMEM;
+	}
+
+	id->fsid_idx = fhp->fh_export->ex_devid_map->idx;
+	id->generation = device_generation;
+	id->pad = 0;
+	return 0;
+}
+
+void nfsd4_setup_layout_type(struct svc_export *exp)
+{
+	if (exp->ex_flags & NFSEXP_NOPNFS)
+		return;
+}
+
+static void
+nfsd4_free_layout_stateid(struct nfs4_stid *stid)
+{
+	struct nfs4_layout_stateid *ls = layoutstateid(stid);
+	struct nfs4_client *clp = ls->ls_stid.sc_client;
+	struct nfs4_file *fp = ls->ls_stid.sc_file;
+
+	spin_lock(&clp->cl_lock);
+	list_del_init(&ls->ls_perclnt);
+	spin_unlock(&clp->cl_lock);
+
+	spin_lock(&fp->fi_lock);
+	list_del_init(&ls->ls_perfile);
+	spin_unlock(&fp->fi_lock);
+
+	kmem_cache_free(nfs4_layout_stateid_cache, ls);
+}
+
+static struct nfs4_layout_stateid *
+nfsd4_alloc_layout_stateid(struct nfsd4_compound_state *cstate,
+		struct nfs4_stid *parent, u32 layout_type)
+{
+	struct nfs4_client *clp = cstate->clp;
+	struct nfs4_file *fp = parent->sc_file;
+	struct nfs4_layout_stateid *ls;
+	struct nfs4_stid *stp;
+
+	stp = nfs4_alloc_stid(cstate->clp, nfs4_layout_stateid_cache);
+	if (!stp)
+		return NULL;
+	stp->sc_free = nfsd4_free_layout_stateid;
+	get_nfs4_file(fp);
+	stp->sc_file = fp;
+
+	ls = layoutstateid(stp);
+	INIT_LIST_HEAD(&ls->ls_perclnt);
+	INIT_LIST_HEAD(&ls->ls_perfile);
+	spin_lock_init(&ls->ls_lock);
+	INIT_LIST_HEAD(&ls->ls_layouts);
+	ls->ls_layout_type = layout_type;
+
+	spin_lock(&clp->cl_lock);
+	stp->sc_type = NFS4_LAYOUT_STID;
+	list_add(&ls->ls_perclnt, &clp->cl_lo_states);
+	spin_unlock(&clp->cl_lock);
+
+	spin_lock(&fp->fi_lock);
+	list_add(&ls->ls_perfile, &fp->fi_lo_states);
+	spin_unlock(&fp->fi_lock);
+
+	return ls;
+}
+
+__be32
+nfsd4_preprocess_layout_stateid(struct svc_rqst *rqstp,
+		struct nfsd4_compound_state *cstate, stateid_t *stateid,
+		bool create, u32 layout_type, struct nfs4_layout_stateid **lsp)
+{
+	struct nfs4_layout_stateid *ls;
+	struct nfs4_stid *stid;
+	unsigned char typemask = NFS4_LAYOUT_STID;
+	__be32 status;
+
+	if (create)
+		typemask |= (NFS4_OPEN_STID | NFS4_LOCK_STID | NFS4_DELEG_STID);
+
+	status = nfsd4_lookup_stateid(cstate, stateid, typemask, &stid,
+			net_generic(SVC_NET(rqstp), nfsd_net_id));
+	if (status)
+		goto out;
+
+	if (!fh_match(&cstate->current_fh.fh_handle,
+		      &stid->sc_file->fi_fhandle)) {
+		status = nfserr_bad_stateid;
+		goto out_put_stid;
+	}
+
+	if (stid->sc_type != NFS4_LAYOUT_STID) {
+		ls = nfsd4_alloc_layout_stateid(cstate, stid, layout_type);
+		nfs4_put_stid(stid);
+
+		status = nfserr_jukebox;
+		if (!ls)
+			goto out;
+	} else {
+		ls = container_of(stid, struct nfs4_layout_stateid, ls_stid);
+
+		status = nfserr_bad_stateid;
+		if (stateid->si_generation > stid->sc_stateid.si_generation)
+			goto out_put_stid;
+		if (layout_type != ls->ls_layout_type)
+			goto out_put_stid;
+	}
+
+	*lsp = ls;
+	return 0;
+
+out_put_stid:
+	nfs4_put_stid(stid);
+out:
+	return status;
+}
+
+static inline u64
+layout_end(struct nfsd4_layout_seg *seg)
+{
+	u64 end = seg->offset + seg->length;
+	return end >= seg->offset ? end : NFS4_MAX_UINT64;
+}
+
+static void
+layout_update_len(struct nfsd4_layout_seg *lo, u64 end)
+{
+	if (end == NFS4_MAX_UINT64)
+		lo->length = NFS4_MAX_UINT64;
+	else
+		lo->length = end - lo->offset;
+}
+
+static bool
+layouts_overlapping(struct nfs4_layout *lo, struct nfsd4_layout_seg *s)
+{
+	if (s->iomode != IOMODE_ANY && s->iomode != lo->lo_seg.iomode)
+		return false;
+	if (layout_end(&lo->lo_seg) <= s->offset)
+		return false;
+	if (layout_end(s) <= lo->lo_seg.offset)
+		return false;
+	return true;
+}
+
+static bool
+layouts_try_merge(struct nfsd4_layout_seg *lo, struct nfsd4_layout_seg *new)
+{
+	if (lo->iomode != new->iomode)
+		return false;
+	if (layout_end(new) < lo->offset)
+		return false;
+	if (layout_end(lo) < new->offset)
+		return false;
+
+	lo->offset = min(lo->offset, new->offset);
+	layout_update_len(lo, max(layout_end(lo), layout_end(new)));
+	return true;
+}
+
+__be32
+nfsd4_insert_layout(struct nfsd4_layoutget *lgp, struct nfs4_layout_stateid *ls)
+{
+	struct nfsd4_layout_seg *seg = &lgp->lg_seg;
+	struct nfs4_layout *lp, *new = NULL;
+
+	spin_lock(&ls->ls_lock);
+	list_for_each_entry(lp, &ls->ls_layouts, lo_perstate) {
+		if (layouts_try_merge(&lp->lo_seg, seg))
+			goto done;
+	}
+	spin_unlock(&ls->ls_lock);
+
+	new = kmem_cache_alloc(nfs4_layout_cache, GFP_KERNEL);
+	if (!new)
+		return nfserr_jukebox;
+	memcpy(&new->lo_seg, seg, sizeof(lp->lo_seg));
+	new->lo_state = ls;
+
+	spin_lock(&ls->ls_lock);
+	list_for_each_entry(lp, &ls->ls_layouts, lo_perstate) {
+		if (layouts_try_merge(&lp->lo_seg, seg))
+			goto done;
+	}
+
+	atomic_inc(&ls->ls_stid.sc_count);
+	list_add_tail(&new->lo_perstate, &ls->ls_layouts);
+	new = NULL;
+done:
+	update_stateid(&ls->ls_stid.sc_stateid);
+	memcpy(&lgp->lg_sid, &ls->ls_stid.sc_stateid, sizeof(stateid_t));
+	spin_unlock(&ls->ls_lock);
+	if (new)
+		kmem_cache_free(nfs4_layout_cache, new);
+	return nfs_ok;
+}
+
+static void
+nfsd4_free_layouts(struct list_head *reaplist)
+{
+	while (!list_empty(reaplist)) {
+		struct nfs4_layout *lp = list_first_entry(reaplist,
+				struct nfs4_layout, lo_perstate);
+
+		list_del(&lp->lo_perstate);
+		nfs4_put_stid(&lp->lo_state->ls_stid);
+		kmem_cache_free(nfs4_layout_cache, lp);
+	}
+}
+
+static void
+nfsd4_return_file_layout(struct nfs4_layout *lp, struct nfsd4_layout_seg *seg,
+		struct list_head *reaplist)
+{
+	struct nfsd4_layout_seg *lo = &lp->lo_seg;
+	u64 end = layout_end(lo);
+
+	if (seg->offset <= lo->offset) {
+		if (layout_end(seg) >= end) {
+			list_move_tail(&lp->lo_perstate, reaplist);
+			return;
+		}
+		end = seg->offset;
+	} else {
+		/* retain the whole layout segment on a split. */
+		if (layout_end(seg) < end) {
+			dprintk("%s: split not supported\n", __func__);
+			return;
+		}
+
+		lo->offset = layout_end(seg);
+	}
+
+	layout_update_len(lo, end);
+}
+
+__be32
+nfsd4_return_file_layouts(struct svc_rqst *rqstp,
+		struct nfsd4_compound_state *cstate,
+		struct nfsd4_layoutreturn *lrp)
+{
+	struct nfs4_layout_stateid *ls;
+	struct nfs4_layout *lp, *n;
+	LIST_HEAD(reaplist);
+	__be32 nfserr;
+	int found = 0;
+
+	nfserr = nfsd4_preprocess_layout_stateid(rqstp, cstate, &lrp->lr_sid,
+						false, lrp->lr_layout_type,
+						&ls);
+	if (nfserr)
+		return nfserr;
+
+	spin_lock(&ls->ls_lock);
+	list_for_each_entry_safe(lp, n, &ls->ls_layouts, lo_perstate) {
+		if (layouts_overlapping(lp, &lrp->lr_seg)) {
+			nfsd4_return_file_layout(lp, &lrp->lr_seg, &reaplist);
+			found++;
+		}
+	}
+	if (!list_empty(&ls->ls_layouts)) {
+		if (found) {
+			update_stateid(&ls->ls_stid.sc_stateid);
+			memcpy(&lrp->lr_sid, &ls->ls_stid.sc_stateid,
+				sizeof(stateid_t));
+		}
+		lrp->lrs_present = 1;
+	} else {
+		nfs4_unhash_stid(&ls->ls_stid);
+		lrp->lrs_present = 0;
+	}
+	spin_unlock(&ls->ls_lock);
+
+	nfs4_put_stid(&ls->ls_stid);
+	nfsd4_free_layouts(&reaplist);
+	return nfs_ok;
+}
+
+__be32
+nfsd4_return_client_layouts(struct svc_rqst *rqstp,
+		struct nfsd4_compound_state *cstate,
+		struct nfsd4_layoutreturn *lrp)
+{
+	struct nfs4_layout_stateid *ls, *n;
+	struct nfs4_client *clp = cstate->clp;
+	struct nfs4_layout *lp, *t;
+	LIST_HEAD(reaplist);
+
+	lrp->lrs_present = 0;
+
+	spin_lock(&clp->cl_lock);
+	list_for_each_entry_safe(ls, n, &clp->cl_lo_states, ls_perclnt) {
+		if (lrp->lr_return_type == RETURN_FSID &&
+		    !fh_fsid_match(&ls->ls_stid.sc_file->fi_fhandle,
+				   &cstate->current_fh.fh_handle))
+			continue;
+
+		spin_lock(&ls->ls_lock);
+		list_for_each_entry_safe(lp, t, &ls->ls_layouts, lo_perstate) {
+			if (lrp->lr_seg.iomode == IOMODE_ANY ||
+			    lrp->lr_seg.iomode == lp->lo_seg.iomode)
+				list_move_tail(&lp->lo_perstate, &reaplist);
+		}
+		spin_unlock(&ls->ls_lock);
+	}
+	spin_unlock(&clp->cl_lock);
+
+	nfsd4_free_layouts(&reaplist);
+	return 0;
+}
+
+static void
+nfsd4_return_all_layouts(struct nfs4_layout_stateid *ls,
+		struct list_head *reaplist)
+{
+	spin_lock(&ls->ls_lock);
+	list_splice_init(&ls->ls_layouts, reaplist);
+	spin_unlock(&ls->ls_lock);
+}
+
+void
+nfsd4_return_all_client_layouts(struct nfs4_client *clp)
+{
+	struct nfs4_layout_stateid *ls, *n;
+	LIST_HEAD(reaplist);
+
+	spin_lock(&clp->cl_lock);
+	list_for_each_entry_safe(ls, n, &clp->cl_lo_states, ls_perclnt)
+		nfsd4_return_all_layouts(ls, &reaplist);
+	spin_unlock(&clp->cl_lock);
+
+	nfsd4_free_layouts(&reaplist);
+}
+
+void
+nfsd4_return_all_file_layouts(struct nfs4_client *clp, struct nfs4_file *fp)
+{
+	struct nfs4_layout_stateid *ls, *n;
+	LIST_HEAD(reaplist);
+
+	spin_lock(&fp->fi_lock);
+	list_for_each_entry_safe(ls, n, &fp->fi_lo_states, ls_perfile) {
+		if (ls->ls_stid.sc_client == clp)
+			nfsd4_return_all_layouts(ls, &reaplist);
+	}
+	spin_unlock(&fp->fi_lock);
+
+	nfsd4_free_layouts(&reaplist);
+}
+
+int
+nfsd4_init_pnfs(void)
+{
+	int i;
+
+	for (i = 0; i < DEVID_HASH_SIZE; i++)
+		INIT_LIST_HEAD(&nfsd_devid_hash[i]);
+
+	nfs4_layout_cache = kmem_cache_create("nfs4_layout",
+			sizeof(struct nfs4_layout), 0, 0, NULL);
+	if (!nfs4_layout_cache)
+		return -ENOMEM;
+
+	nfs4_layout_stateid_cache = kmem_cache_create("nfs4_layout_stateid",
+			sizeof(struct nfs4_layout_stateid), 0, 0, NULL);
+	if (!nfs4_layout_stateid_cache) {
+		kmem_cache_destroy(nfs4_layout_cache);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+void
+nfsd4_exit_pnfs(void)
+{
+	int i;
+
+	kmem_cache_destroy(nfs4_layout_cache);
+	kmem_cache_destroy(nfs4_layout_stateid_cache);
+
+	for (i = 0; i < DEVID_HASH_SIZE; i++) {
+		struct nfsd4_deviceid_map *map, *n;
+
+		list_for_each_entry_safe(map, n, &nfsd_devid_hash[i], hash)
+			kfree(map);
+	}
+}
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index ac71d13..b813913 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -43,6 +43,7 @@
 #include "current_stateid.h"
 #include "netns.h"
 #include "acl.h"
+#include "pnfs.h"
 
 #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
 #include <linux/security.h>
@@ -1178,6 +1179,252 @@ nfsd4_verify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	return status == nfserr_same ? nfs_ok : status;
 }
 
+#ifdef CONFIG_NFSD_PNFS
+static const struct nfsd4_layout_ops *
+nfsd4_layout_verify(struct svc_export *exp, unsigned int layout_type)
+{
+	if (!exp->ex_layout_type) {
+		dprintk("%s: export does not support pNFS\n", __func__);
+		return NULL;
+	}
+
+	if (exp->ex_layout_type != layout_type) {
+		dprintk("%s: layout type %d not supported\n",
+			__func__, layout_type);
+		return NULL;
+	}
+
+	return nfsd4_layout_ops[layout_type];
+}
+
+static __be32
+nfsd4_getdeviceinfo(struct svc_rqst *rqstp,
+		struct nfsd4_compound_state *cstate,
+		struct nfsd4_getdeviceinfo *gdp)
+{
+	const struct nfsd4_layout_ops *ops;
+	struct nfsd4_deviceid_map *map;
+	struct svc_export *exp;
+	__be32 nfserr;
+
+	dprintk("%s: layout_type %u dev_id [0x%llx:0x%x] maxcnt %u\n",
+	       __func__,
+	       gdp->gd_layout_type,
+	       gdp->gd_devid.fsid_idx, gdp->gd_devid.generation,
+	       gdp->gd_maxcount);
+
+	map = nfsd4_find_devid_map(gdp->gd_devid.fsid_idx);
+	if (!map) {
+		dprintk("%s: couldn't find device ID to export mapping!\n",
+			__func__);
+		return nfserr_noent;
+	}
+
+	exp = rqst_exp_find(rqstp, map->fsid_type, map->fsid);
+	if (IS_ERR(exp)) {
+		dprintk("%s: could not find device id\n", __func__);
+		return nfserr_noent;
+	}
+
+	nfserr = nfserr_layoutunavailable;
+	ops = nfsd4_layout_verify(exp, gdp->gd_layout_type);
+	if (!ops)
+		goto out;
+
+	nfserr = nfs_ok;
+	if (gdp->gd_maxcount != 0)
+		nfserr = ops->proc_getdeviceinfo(exp->ex_path.mnt->mnt_sb, gdp);
+
+	gdp->gd_notify_types &= ops->notify_types;
+out:
+	exp_put(exp);
+	return nfserr;
+}
+
+static __be32
+nfsd4_layoutget(struct svc_rqst *rqstp,
+		struct nfsd4_compound_state *cstate,
+		struct nfsd4_layoutget *lgp)
+{
+	struct svc_fh *current_fh = &cstate->current_fh;
+	const struct nfsd4_layout_ops *ops;
+	struct nfs4_layout_stateid *ls;
+	__be32 nfserr;
+	int accmode;
+
+	switch (lgp->lg_seg.iomode) {
+	case IOMODE_READ:
+		accmode = NFSD_MAY_READ;
+		break;
+	case IOMODE_RW:
+		accmode = NFSD_MAY_READ | NFSD_MAY_WRITE;
+		break;
+	default:
+		dprintk("%s: invalid iomode %d\n",
+			__func__, lgp->lg_seg.iomode);
+		nfserr = nfserr_badiomode;
+		goto out;
+	}
+
+	nfserr = fh_verify(rqstp, current_fh, 0, accmode);
+	if (nfserr)
+		goto out;
+
+	nfserr = nfserr_layoutunavailable;
+	ops = nfsd4_layout_verify(current_fh->fh_export, lgp->lg_layout_type);
+	if (!ops)
+		goto out;
+
+	/*
+	 * Verify minlength and range as per RFC5661:
+	 *  o  If loga_length is less than loga_minlength,
+	 *     the metadata server MUST return NFS4ERR_INVAL.
+	 *  o  If the sum of loga_offset and loga_minlength exceeds
+	 *     NFS4_UINT64_MAX, and loga_minlength is not
+	 *     NFS4_UINT64_MAX, the error NFS4ERR_INVAL MUST result.
+	 *  o  If the sum of loga_offset and loga_length exceeds
+	 *     NFS4_UINT64_MAX, and loga_length is not NFS4_UINT64_MAX,
+	 *     the error NFS4ERR_INVAL MUST result.
+	 */
+	nfserr = nfserr_inval;
+	if (lgp->lg_seg.length < lgp->lg_minlength ||
+	    (lgp->lg_minlength != NFS4_MAX_UINT64 &&
+	     lgp->lg_minlength > NFS4_MAX_UINT64 - lgp->lg_seg.offset) ||
+	    (lgp->lg_seg.length != NFS4_MAX_UINT64 &&
+	     lgp->lg_seg.length > NFS4_MAX_UINT64 - lgp->lg_seg.offset))
+		goto out;
+	if (lgp->lg_seg.length == 0)
+		goto out;
+
+	nfserr = nfsd4_preprocess_layout_stateid(rqstp, cstate, &lgp->lg_sid,
+						true, lgp->lg_layout_type, &ls);
+	if (nfserr)
+		goto out;
+
+	nfserr = ops->proc_layoutget(current_fh->fh_dentry->d_inode,
+				     current_fh, lgp);
+	if (nfserr)
+		goto out_put_stid;
+
+	nfserr = nfsd4_insert_layout(lgp, ls);
+
+out_put_stid:
+	nfs4_put_stid(&ls->ls_stid);
+out:
+	return nfserr;
+}
+
+static __be32
+nfsd4_layoutcommit(struct svc_rqst *rqstp,
+		struct nfsd4_compound_state *cstate,
+		struct nfsd4_layoutcommit *lcp)
+{
+	const struct nfsd4_layout_seg *seg = &lcp->lc_seg;
+	struct svc_fh *current_fh = &cstate->current_fh;
+	const struct nfsd4_layout_ops *ops;
+	loff_t new_size = lcp->lc_last_wr + 1;
+	struct inode *inode;
+	struct nfs4_layout_stateid *ls;
+	__be32 nfserr;
+
+	nfserr = fh_verify(rqstp, current_fh, 0, NFSD_MAY_WRITE);
+	if (nfserr)
+		goto out;
+
+	nfserr = nfserr_layoutunavailable;
+	ops = nfsd4_layout_verify(current_fh->fh_export, lcp->lc_layout_type);
+	if (!ops)
+		goto out;
+	inode = current_fh->fh_dentry->d_inode;
+
+	nfserr = nfserr_inval;
+	if (new_size <= seg->offset) {
+		dprintk("pnfsd: last write before layout segment\n");
+		goto out;
+	}
+	if (new_size > seg->offset + seg->length) {
+		dprintk("pnfsd: last write beyond layout segment\n");
+		goto out;
+	}
+	if (!lcp->lc_newoffset && new_size > i_size_read(inode)) {
+		dprintk("pnfsd: layoutcommit beyond EOF\n");
+		goto out;
+	}
+
+	nfserr = nfsd4_preprocess_layout_stateid(rqstp, cstate, &lcp->lc_sid,
+						false, lcp->lc_layout_type,
+						&ls);
+	if (nfserr) {
+		/* fixup error code as per RFC5661 */
+		if (nfserr == nfserr_bad_stateid)
+			nfserr = nfserr_badlayout;
+		goto out;
+	}
+
+	nfserr = ops->proc_layoutcommit(inode, lcp);
+	if (nfserr)
+		goto out_put_stid;
+
+	if (new_size > i_size_read(inode)) {
+		lcp->lc_size_chg = 1;
+		lcp->lc_newsize = new_size;
+	} else {
+		lcp->lc_size_chg = 0;
+	}
+
+out_put_stid:
+	nfs4_put_stid(&ls->ls_stid);
+out:
+	return nfserr;
+}
+
+static __be32
+nfsd4_layoutreturn(struct svc_rqst *rqstp,
+		struct nfsd4_compound_state *cstate,
+		struct nfsd4_layoutreturn *lrp)
+{
+	struct svc_fh *current_fh = &cstate->current_fh;
+	__be32 nfserr;
+
+	nfserr = fh_verify(rqstp, current_fh, 0, NFSD_MAY_NOP);
+	if (nfserr)
+		goto out;
+
+	nfserr = nfserr_layoutunavailable;
+	if (!nfsd4_layout_verify(current_fh->fh_export, lrp->lr_layout_type))
+		goto out;
+
+	switch (lrp->lr_seg.iomode) {
+	case IOMODE_READ:
+	case IOMODE_RW:
+	case IOMODE_ANY:
+		break;
+	default:
+		dprintk("%s: invalid iomode %d\n", __func__,
+			lrp->lr_seg.iomode);
+		nfserr = nfserr_inval;
+		goto out;
+	}
+
+	switch (lrp->lr_return_type) {
+	case RETURN_FILE:
+		nfserr = nfsd4_return_file_layouts(rqstp, cstate, lrp);
+		break;
+	case RETURN_FSID:
+	case RETURN_ALL:
+		nfserr = nfsd4_return_client_layouts(rqstp, cstate, lrp);
+		break;
+	default:
+		dprintk("%s: invalid return_type %d\n", __func__,
+			lrp->lr_return_type);
+		nfserr = nfserr_inval;
+		break;
+	}
+out:
+	return nfserr;
+}
+#endif /* CONFIG_NFSD_PNFS */
+
 /*
  * NULL call.
  */
@@ -1966,6 +2213,25 @@ static struct nfsd4_operation nfsd4_ops[] = {
 		.op_get_currentstateid = (stateid_getter)nfsd4_get_freestateid,
 		.op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
 	},
+#ifdef CONFIG_NFSD_PNFS
+	[OP_GETDEVICEINFO] = {
+		.op_func = (nfsd4op_func)nfsd4_getdeviceinfo,
+		.op_flags = ALLOWED_WITHOUT_FH,
+		.op_name = "OP_GETDEVICEINFO",
+	},
+	[OP_LAYOUTGET] = {
+		.op_func = (nfsd4op_func)nfsd4_layoutget,
+		.op_name = "OP_LAYOUTGET",
+	},
+	[OP_LAYOUTCOMMIT] = {
+		.op_func = (nfsd4op_func)nfsd4_layoutcommit,
+		.op_name = "OP_LAYOUTCOMMIT",
+	},
+	[OP_LAYOUTRETURN] = {
+		.op_func = (nfsd4op_func)nfsd4_layoutreturn,
+		.op_name = "OP_LAYOUTRETURN",
+	},
+#endif /* CONFIG_NFSD_PNFS */
 
 	/* NFSv4.2 operations */
 	[OP_ALLOCATE] = {
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 444d43c..925836e 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -48,6 +48,7 @@
 #include "current_stateid.h"
 
 #include "netns.h"
+#include "pnfs.h"
 
 #define NFSDDBG_FACILITY                NFSDDBG_PROC
 
@@ -1544,6 +1545,9 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name)
 	INIT_LIST_HEAD(&clp->cl_lru);
 	INIT_LIST_HEAD(&clp->cl_callbacks);
 	INIT_LIST_HEAD(&clp->cl_revoked);
+#ifdef CONFIG_NFSD_PNFS
+	INIT_LIST_HEAD(&clp->cl_lo_states);
+#endif
 	spin_lock_init(&clp->cl_lock);
 	rpc_init_wait_queue(&clp->cl_cb_waitq, "Backchannel slot table");
 	return clp;
@@ -1648,6 +1652,7 @@ __destroy_client(struct nfs4_client *clp)
 		nfs4_get_stateowner(&oo->oo_owner);
 		release_openowner(oo);
 	}
+	nfsd4_return_all_client_layouts(clp);
 	nfsd4_shutdown_callback(clp);
 	if (clp->cl_cb_conn.cb_xprt)
 		svc_xprt_put(clp->cl_cb_conn.cb_xprt);
@@ -2131,8 +2136,11 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
 static void
 nfsd4_set_ex_flags(struct nfs4_client *new, struct nfsd4_exchange_id *clid)
 {
-	/* pNFS is not supported */
+#ifdef CONFIG_NFSD_PNFS
+	new->cl_exchange_flags |= EXCHGID4_FLAG_USE_PNFS_MDS;
+#else
 	new->cl_exchange_flags |= EXCHGID4_FLAG_USE_NON_PNFS;
+#endif
 
 	/* Referrals are supported, Migration is not. */
 	new->cl_exchange_flags |= EXCHGID4_FLAG_SUPP_MOVED_REFER;
@@ -3060,6 +3068,9 @@ static void nfsd4_init_file(struct knfsd_fh *fh, unsigned int hashval,
 	fp->fi_share_deny = 0;
 	memset(fp->fi_fds, 0, sizeof(fp->fi_fds));
 	memset(fp->fi_access, 0, sizeof(fp->fi_access));
+#ifdef CONFIG_NFSD_PNFS
+	INIT_LIST_HEAD(&fp->fi_lo_states);
+#endif
 	hlist_add_head_rcu(&fp->fi_hash, &file_hashtbl[hashval]);
 }
 
@@ -4846,6 +4857,9 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	update_stateid(&stp->st_stid.sc_stateid);
 	memcpy(&close->cl_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
 
+	nfsd4_return_all_file_layouts(stp->st_stateowner->so_client,
+				      stp->st_stid.sc_file);
+
 	nfsd4_close_open_stateid(stp);
 
 	/* put reference from nfs4_preprocess_seqid_op */
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 884ffa3..dead4ac 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -47,6 +47,7 @@
 #include "state.h"
 #include "cache.h"
 #include "netns.h"
+#include "pnfs.h"
 
 #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
 #include <linux/security.h>
@@ -1522,6 +1523,127 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
 	DECODE_TAIL;
 }
 
+#ifdef CONFIG_NFSD_PNFS
+static __be32
+nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp,
+		struct nfsd4_getdeviceinfo *gdev)
+{
+	DECODE_HEAD;
+	u32 num, i;
+
+	READ_BUF(sizeof(struct nfsd4_deviceid) + 3 * 4);
+	COPYMEM(&gdev->gd_devid, sizeof(struct nfsd4_deviceid));
+	gdev->gd_layout_type = be32_to_cpup(p++);
+	gdev->gd_maxcount = be32_to_cpup(p++);
+	num = be32_to_cpup(p++);
+	if (num) {
+		READ_BUF(4 * num);
+		gdev->gd_notify_types = be32_to_cpup(p++);
+		for (i = 1; i < num; i++) {
+			if (be32_to_cpup(p++)) {
+				status = nfserr_inval;
+				goto out;
+			}
+		}
+	}
+	DECODE_TAIL;
+}
+
+static __be32
+nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp,
+		struct nfsd4_layoutget *lgp)
+{
+	DECODE_HEAD;
+
+	READ_BUF(36);
+	lgp->lg_signal = be32_to_cpup(p++);
+	lgp->lg_layout_type = be32_to_cpup(p++);
+	lgp->lg_seg.iomode = be32_to_cpup(p++);
+	p = xdr_decode_hyper(p, &lgp->lg_seg.offset);
+	p = xdr_decode_hyper(p, &lgp->lg_seg.length);
+	p = xdr_decode_hyper(p, &lgp->lg_minlength);
+	nfsd4_decode_stateid(argp, &lgp->lg_sid);
+	READ_BUF(4);
+	lgp->lg_maxcount = be32_to_cpup(p++);
+
+	DECODE_TAIL;
+}
+
+static __be32
+nfsd4_decode_layoutcommit(struct nfsd4_compoundargs *argp,
+		struct nfsd4_layoutcommit *lcp)
+{
+	DECODE_HEAD;
+	u32 timechange;
+
+	READ_BUF(20);
+	p = xdr_decode_hyper(p, &lcp->lc_seg.offset);
+	p = xdr_decode_hyper(p, &lcp->lc_seg.length);
+	lcp->lc_reclaim = be32_to_cpup(p++);
+	nfsd4_decode_stateid(argp, &lcp->lc_sid);
+	READ_BUF(4);
+	lcp->lc_newoffset = be32_to_cpup(p++);
+	if (lcp->lc_newoffset) {
+		READ_BUF(8);
+		p = xdr_decode_hyper(p, &lcp->lc_last_wr);
+	} else
+		lcp->lc_last_wr = 0;
+	READ_BUF(4);
+	timechange = be32_to_cpup(p++);
+	if (timechange) {
+		status = nfsd4_decode_time(argp, &lcp->lc_mtime);
+		if (status)
+			return status;
+	} else {
+		lcp->lc_mtime.tv_nsec = UTIME_NOW;
+	}
+	READ_BUF(8);
+	lcp->lc_layout_type = be32_to_cpup(p++);
+
+	/*
+	 * Save the layout update in XDR format and let the layout driver deal
+	 * with it later.
+	 */
+	lcp->lc_up_len = be32_to_cpup(p++);
+	if (lcp->lc_up_len > 0) {
+		READ_BUF(lcp->lc_up_len);
+		READMEM(lcp->lc_up_layout, lcp->lc_up_len);
+	}
+
+	DECODE_TAIL;
+}
+
+static __be32
+nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp,
+		struct nfsd4_layoutreturn *lrp)
+{
+	DECODE_HEAD;
+
+	READ_BUF(16);
+	lrp->lr_reclaim = be32_to_cpup(p++);
+	lrp->lr_layout_type = be32_to_cpup(p++);
+	lrp->lr_seg.iomode = be32_to_cpup(p++);
+	lrp->lr_return_type = be32_to_cpup(p++);
+	if (lrp->lr_return_type == RETURN_FILE) {
+		READ_BUF(16);
+		p = xdr_decode_hyper(p, &lrp->lr_seg.offset);
+		p = xdr_decode_hyper(p, &lrp->lr_seg.length);
+		nfsd4_decode_stateid(argp, &lrp->lr_sid);
+		READ_BUF(4);
+		lrp->lrf_body_len = be32_to_cpup(p++);
+		if (lrp->lrf_body_len > 0) {
+			READ_BUF(lrp->lrf_body_len);
+			READMEM(lrp->lrf_body, lrp->lrf_body_len);
+		}
+	} else {
+		lrp->lr_seg.offset = 0;
+		lrp->lr_seg.length = NFS4_MAX_UINT64;
+	}
+
+	DECODE_TAIL;
+}
+#endif /* CONFIG_NFSD_PNFS */
+
 static __be32
 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp,
 		       struct nfsd4_fallocate *fallocate)
@@ -1616,11 +1738,19 @@ static nfsd4_dec nfsd4_dec_ops[] = {
 	[OP_DESTROY_SESSION]	= (nfsd4_dec)nfsd4_decode_destroy_session,
 	[OP_FREE_STATEID]	= (nfsd4_dec)nfsd4_decode_free_stateid,
 	[OP_GET_DIR_DELEGATION]	= (nfsd4_dec)nfsd4_decode_notsupp,
+#ifdef CONFIG_NFSD_PNFS
+	[OP_GETDEVICEINFO]	= (nfsd4_dec)nfsd4_decode_getdeviceinfo,
+	[OP_GETDEVICELIST]	= (nfsd4_dec)nfsd4_decode_notsupp,
+	[OP_LAYOUTCOMMIT]	= (nfsd4_dec)nfsd4_decode_layoutcommit,
+	[OP_LAYOUTGET]		= (nfsd4_dec)nfsd4_decode_layoutget,
+	[OP_LAYOUTRETURN]	= (nfsd4_dec)nfsd4_decode_layoutreturn,
+#else
 	[OP_GETDEVICEINFO]	= (nfsd4_dec)nfsd4_decode_notsupp,
 	[OP_GETDEVICELIST]	= (nfsd4_dec)nfsd4_decode_notsupp,
 	[OP_LAYOUTCOMMIT]	= (nfsd4_dec)nfsd4_decode_notsupp,
 	[OP_LAYOUTGET]		= (nfsd4_dec)nfsd4_decode_notsupp,
 	[OP_LAYOUTRETURN]	= (nfsd4_dec)nfsd4_decode_notsupp,
+#endif
 	[OP_SECINFO_NO_NAME]	= (nfsd4_dec)nfsd4_decode_secinfo_no_name,
 	[OP_SEQUENCE]		= (nfsd4_dec)nfsd4_decode_sequence,
 	[OP_SET_SSV]		= (nfsd4_dec)nfsd4_decode_notsupp,
@@ -2548,6 +2678,30 @@ out_acl:
 			get_parent_attributes(exp, &stat);
 		p = xdr_encode_hyper(p, stat.ino);
 	}
+#ifdef CONFIG_NFSD_PNFS
+	if ((bmval1 & FATTR4_WORD1_FS_LAYOUT_TYPES) ||
+	    (bmval2 & FATTR4_WORD2_LAYOUT_TYPES)) {
+		if (exp->ex_layout_type) {
+			p = xdr_reserve_space(xdr, 8);
+			if (!p)
+				goto out_resource;
+			*p++ = cpu_to_be32(1);
+			*p++ = cpu_to_be32(exp->ex_layout_type);
+		} else {
+			p = xdr_reserve_space(xdr, 4);
+			if (!p)
+				goto out_resource;
+			*p++ = cpu_to_be32(0);
+		}
+	}
+
+	if (bmval2 & FATTR4_WORD2_LAYOUT_BLKSIZE) {
+		p = xdr_reserve_space(xdr, 4);
+		if (!p)
+			goto out_resource;
+		*p++ = cpu_to_be32(stat.blksize);
+	}
+#endif /* CONFIG_NFSD_PNFS */
 	if (bmval2 & FATTR4_WORD2_SECURITY_LABEL) {
 		status = nfsd4_encode_security_label(xdr, rqstp, context,
 								contextlen);
@@ -3823,6 +3977,156 @@ nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr,
 	return nfserr;
 }
 
+#ifdef CONFIG_NFSD_PNFS
+static __be32
+nfsd4_encode_getdeviceinfo(struct nfsd4_compoundres *resp, __be32 nfserr,
+		struct nfsd4_getdeviceinfo *gdev)
+{
+	struct xdr_stream *xdr = &resp->xdr;
+	const struct nfsd4_layout_ops *ops =
+		nfsd4_layout_ops[gdev->gd_layout_type];
+	u32 starting_len = xdr->buf->len, needed_len;
+	__be32 *p;
+
+	dprintk("%s: err %d\n", __func__, be32_to_cpu(nfserr));
+	if (nfserr)
+		goto out;
+
+	nfserr = nfserr_resource;
+	p = xdr_reserve_space(xdr, 4);
+	if (!p)
+		goto out;
+
+	*p++ = cpu_to_be32(gdev->gd_layout_type);
+
+	/* If maxcount is 0 then just update notifications */
+	if (gdev->gd_maxcount != 0) {
+		nfserr = ops->encode_getdeviceinfo(xdr, gdev);
+		if (nfserr) {
+			/*
+			 * We don't bother to burden the layout drivers with
+			 * enforcing gd_maxcount, just tell the client to
+			 * come back with a bigger buffer if it's not enough.
+			 */
+			if (xdr->buf->len + 4 > gdev->gd_maxcount)
+				goto toosmall;
+			goto out;
+		}
+	}
+
+	nfserr = nfserr_resource;
+	if (gdev->gd_notify_types) {
+		p = xdr_reserve_space(xdr, 4 + 4);
+		if (!p)
+			goto out;
+		*p++ = cpu_to_be32(1);			/* bitmap length */
+		*p++ = cpu_to_be32(gdev->gd_notify_types);
+	} else {
+		p = xdr_reserve_space(xdr, 4);
+		if (!p)
+			goto out;
+		*p++ = 0;
+	}
+
+	nfserr = 0;
+out:
+	kfree(gdev->gd_device);
+	dprintk("%s: done: %d\n", __func__, be32_to_cpu(nfserr));
+	return nfserr;
+
+toosmall:
+	dprintk("%s: maxcount too small\n", __func__);
+	needed_len = xdr->buf->len + 4 /* notifications */;
+	xdr_truncate_encode(xdr, starting_len);
+	p = xdr_reserve_space(xdr, 4);
+	if (!p) {
+		nfserr = nfserr_resource;
+	} else {
+		*p++ = cpu_to_be32(needed_len);
+		nfserr = nfserr_toosmall;
+	}
+	goto out;
+}
+
+static __be32
+nfsd4_encode_layoutget(struct nfsd4_compoundres *resp, __be32 nfserr,
+		struct nfsd4_layoutget *lgp)
+{
+	struct xdr_stream *xdr = &resp->xdr;
+	const struct nfsd4_layout_ops *ops =
+		nfsd4_layout_ops[lgp->lg_layout_type];
+	__be32 *p;
+
+	dprintk("%s: err %d\n", __func__, be32_to_cpu(nfserr));
+	if (nfserr)
+		goto out;
+
+	nfserr = nfserr_resource;
+	p = xdr_reserve_space(xdr, 36 + sizeof(stateid_opaque_t));
+	if (!p)
+		goto out;
+
+	*p++ = cpu_to_be32(1);	/* we always set return-on-close */
+	*p++ = cpu_to_be32(lgp->lg_sid.si_generation);
+	p = xdr_encode_opaque_fixed(p, &lgp->lg_sid.si_opaque,
+				    sizeof(stateid_opaque_t));
+
+	*p++ = cpu_to_be32(1);	/* we always return a single layout */
+	p = xdr_encode_hyper(p, lgp->lg_seg.offset);
+	p = xdr_encode_hyper(p, lgp->lg_seg.length);
+	*p++ = cpu_to_be32(lgp->lg_seg.iomode);
+	*p++ = cpu_to_be32(lgp->lg_layout_type);
+
+	nfserr = ops->encode_layoutget(xdr, lgp);
+out:
+	kfree(lgp->lg_content);
+	return nfserr;
+}
+
+static __be32
+nfsd4_encode_layoutcommit(struct nfsd4_compoundres *resp, __be32 nfserr,
+			  struct nfsd4_layoutcommit *lcp)
+{
+	struct xdr_stream *xdr = &resp->xdr;
+	__be32 *p;
+
+	if (nfserr)
+		return nfserr;
+
+	p = xdr_reserve_space(xdr, 4);
+	if (!p)
+		return nfserr_resource;
+	*p++ = cpu_to_be32(lcp->lc_size_chg);
+	if (lcp->lc_size_chg) {
+		p = xdr_reserve_space(xdr, 8);
+		if (!p)
+			return nfserr_resource;
+		p = xdr_encode_hyper(p, lcp->lc_newsize);
+	}
+
+	return nfs_ok;
+}
+
+static __be32
+nfsd4_encode_layoutreturn(struct nfsd4_compoundres *resp, __be32 nfserr,
+		struct nfsd4_layoutreturn *lrp)
+{
+	struct xdr_stream *xdr = &resp->xdr;
+	__be32 *p;
+
+	if (nfserr)
+		return nfserr;
+
+	p = xdr_reserve_space(xdr, 4);
+	if (!p)
+		return nfserr_resource;
+	*p++ = cpu_to_be32(lrp->lrs_present);
+	if (lrp->lrs_present)
+		return nfsd4_encode_stateid(xdr, &lrp->lr_sid);
+	return nfs_ok;
+}
+#endif /* CONFIG_NFSD_PNFS */
+
 static __be32
 nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr,
 		  struct nfsd4_seek *seek)
@@ -3899,11 +4203,19 @@ static nfsd4_enc nfsd4_enc_ops[] = {
 	[OP_DESTROY_SESSION]	= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_FREE_STATEID]	= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_GET_DIR_DELEGATION]	= (nfsd4_enc)nfsd4_encode_noop,
+#ifdef CONFIG_NFSD_PNFS
+	[OP_GETDEVICEINFO]	= (nfsd4_enc)nfsd4_encode_getdeviceinfo,
+	[OP_GETDEVICELIST]	= (nfsd4_enc)nfsd4_encode_noop,
+	[OP_LAYOUTCOMMIT]	= (nfsd4_enc)nfsd4_encode_layoutcommit,
+	[OP_LAYOUTGET]		= (nfsd4_enc)nfsd4_encode_layoutget,
+	[OP_LAYOUTRETURN]	= (nfsd4_enc)nfsd4_encode_layoutreturn,
+#else
 	[OP_GETDEVICEINFO]	= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_GETDEVICELIST]	= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_LAYOUTCOMMIT]	= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_LAYOUTGET]		= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_LAYOUTRETURN]	= (nfsd4_enc)nfsd4_encode_noop,
+#endif
 	[OP_SECINFO_NO_NAME]	= (nfsd4_enc)nfsd4_encode_secinfo_no_name,
 	[OP_SEQUENCE]		= (nfsd4_enc)nfsd4_encode_sequence,
 	[OP_SET_SSV]		= (nfsd4_enc)nfsd4_encode_noop,
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 19ace74..aa47d75 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -21,6 +21,7 @@
 #include "cache.h"
 #include "state.h"
 #include "netns.h"
+#include "pnfs.h"
 
 /*
  *	We have a single directory with several nodes in it.
@@ -1258,9 +1259,12 @@ static int __init init_nfsd(void)
 	retval = nfsd4_init_slabs();
 	if (retval)
 		goto out_unregister_pernet;
-	retval = nfsd_fault_inject_init(); /* nfsd fault injection controls */
+	retval = nfsd4_init_pnfs();
 	if (retval)
 		goto out_free_slabs;
+	retval = nfsd_fault_inject_init(); /* nfsd fault injection controls */
+	if (retval)
+		goto out_exit_pnfs;
 	nfsd_stat_init();	/* Statistics */
 	retval = nfsd_reply_cache_init();
 	if (retval)
@@ -1282,6 +1286,8 @@ out_free_lockd:
 out_free_stat:
 	nfsd_stat_shutdown();
 	nfsd_fault_inject_cleanup();
+out_exit_pnfs:
+	nfsd4_exit_pnfs();
 out_free_slabs:
 	nfsd4_free_slabs();
 out_unregister_pernet:
@@ -1299,6 +1305,7 @@ static void __exit exit_nfsd(void)
 	nfsd_stat_shutdown();
 	nfsd_lockd_shutdown();
 	nfsd4_free_slabs();
+	nfsd4_exit_pnfs();
 	nfsd_fault_inject_cleanup();
 	unregister_filesystem(&nfsd_fs_type);
 	unregister_pernet_subsys(&nfsd_net_ops);
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 33a46a8..565c4da 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -325,15 +325,27 @@ void		nfsd_lockd_shutdown(void);
 
 #define NFSD4_SUPPORTED_ATTRS_WORD2 0
 
+/* 4.1 */
+#ifdef CONFIG_NFSD_PNFS
+#define PNFSD_SUPPORTED_ATTRS_WORD1	FATTR4_WORD1_FS_LAYOUT_TYPES
+#define PNFSD_SUPPORTED_ATTRS_WORD2 \
+(FATTR4_WORD2_LAYOUT_BLKSIZE	| FATTR4_WORD2_LAYOUT_TYPES)
+#else
+#define PNFSD_SUPPORTED_ATTRS_WORD1	0
+#define PNFSD_SUPPORTED_ATTRS_WORD2	0
+#endif /* CONFIG_NFSD_PNFS */
+
 #define NFSD4_1_SUPPORTED_ATTRS_WORD0 \
 	NFSD4_SUPPORTED_ATTRS_WORD0
 
 #define NFSD4_1_SUPPORTED_ATTRS_WORD1 \
-	NFSD4_SUPPORTED_ATTRS_WORD1
+	(NFSD4_SUPPORTED_ATTRS_WORD1	| PNFSD_SUPPORTED_ATTRS_WORD1)
 
 #define NFSD4_1_SUPPORTED_ATTRS_WORD2 \
-	(NFSD4_SUPPORTED_ATTRS_WORD2 | FATTR4_WORD2_SUPPATTR_EXCLCREAT)
+	(NFSD4_SUPPORTED_ATTRS_WORD2	| PNFSD_SUPPORTED_ATTRS_WORD2 | \
+	 FATTR4_WORD2_SUPPATTR_EXCLCREAT)
 
+/* 4.2 */
 #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
 #define NFSD4_2_SECURITY_ATTRS		FATTR4_WORD2_SECURITY_LABEL
 #else
diff --git a/fs/nfsd/pnfs.h b/fs/nfsd/pnfs.h
new file mode 100644
index 0000000..a9616a4
--- /dev/null
+++ b/fs/nfsd/pnfs.h
@@ -0,0 +1,80 @@
+#ifndef _FS_NFSD_PNFS_H
+#define _FS_NFSD_PNFS_H 1
+
+#include <linux/exportfs.h>
+#include <linux/nfsd/export.h>
+
+#include "state.h"
+#include "xdr4.h"
+
+struct xdr_stream;
+
+struct nfsd4_deviceid_map {
+	struct list_head	hash;
+	u64			idx;
+	int			fsid_type;
+	u32			fsid[];
+};
+
+struct nfsd4_layout_ops {
+	u32		notify_types;
+
+	__be32 (*proc_getdeviceinfo)(struct super_block *sb,
+			struct nfsd4_getdeviceinfo *gdevp);
+	__be32 (*encode_getdeviceinfo)(struct xdr_stream *xdr,
+			struct nfsd4_getdeviceinfo *gdevp);
+
+	__be32 (*proc_layoutget)(struct inode *, const struct svc_fh *fhp,
+			struct nfsd4_layoutget *lgp);
+	__be32 (*encode_layoutget)(struct xdr_stream *,
+			struct nfsd4_layoutget *lgp);
+
+	__be32 (*proc_layoutcommit)(struct inode *inode,
+			struct nfsd4_layoutcommit *lcp);
+};
+
+extern const struct nfsd4_layout_ops *nfsd4_layout_ops[];
+
+__be32 nfsd4_preprocess_layout_stateid(struct svc_rqst *rqstp,
+		struct nfsd4_compound_state *cstate, stateid_t *stateid,
+		bool create, u32 layout_type, struct nfs4_layout_stateid **lsp);
+__be32 nfsd4_insert_layout(struct nfsd4_layoutget *lgp,
+		struct nfs4_layout_stateid *ls);
+__be32 nfsd4_return_file_layouts(struct svc_rqst *rqstp,
+		struct nfsd4_compound_state *cstate,
+		struct nfsd4_layoutreturn *lrp);
+__be32 nfsd4_return_client_layouts(struct svc_rqst *rqstp,
+		struct nfsd4_compound_state *cstate,
+		struct nfsd4_layoutreturn *lrp);
+int nfsd4_set_deviceid(struct nfsd4_deviceid *id, const struct svc_fh *fhp,
+		u32 device_generation);
+struct nfsd4_deviceid_map *nfsd4_find_devid_map(int idx);
+
+#ifdef CONFIG_NFSD_PNFS
+void nfsd4_setup_layout_type(struct svc_export *exp);
+void nfsd4_return_all_client_layouts(struct nfs4_client *);
+void nfsd4_return_all_file_layouts(struct nfs4_client *clp,
+		struct nfs4_file *fp);
+int nfsd4_init_pnfs(void);
+void nfsd4_exit_pnfs(void);
+#else
+static inline void nfsd4_setup_layout_type(struct svc_export *exp)
+{
+}
+
+static inline void nfsd4_return_all_client_layouts(struct nfs4_client *clp)
+{
+}
+static inline void nfsd4_return_all_file_layouts(struct nfs4_client *clp,
+		struct nfs4_file *fp)
+{
+}
+static inline void nfsd4_exit_pnfs(void)
+{
+}
+static inline int nfsd4_init_pnfs(void)
+{
+	return 0;
+}
+#endif /* CONFIG_NFSD_PNFS */
+#endif /* _FS_NFSD_PNFS_H */
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 18b5ab8..4641fbb 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -92,6 +92,7 @@ struct nfs4_stid {
 /* For a deleg stateid kept around only to process free_stateid's: */
 #define NFS4_REVOKED_DELEG_STID 16
 #define NFS4_CLOSED_DELEG_STID 32
+#define NFS4_LAYOUT_STID 64
 	unsigned char sc_type;
 	stateid_t sc_stateid;
 	struct nfs4_client *sc_client;
@@ -297,6 +298,9 @@ struct nfs4_client {
 	struct list_head	cl_delegations;
 	struct list_head	cl_revoked;	/* unacknowledged, revoked 4.1 state */
 	struct list_head        cl_lru;         /* tail queue */
+#ifdef CONFIG_NFSD_PNFS
+	struct list_head	cl_lo_states;	/* outstanding layout states */
+#endif
 	struct xdr_netobj	cl_name; 	/* id generated by client */
 	nfs4_verifier		cl_verifier; 	/* generated by client */
 	time_t                  cl_time;        /* time of last lease renewal */
@@ -496,6 +500,9 @@ struct nfs4_file {
 	atomic_t		fi_delegees;
 	struct knfsd_fh		fi_fhandle;
 	bool			fi_had_conflict;
+#ifdef CONFIG_NFSD_PNFS
+	struct list_head	fi_lo_states;
+#endif
 };
 
 /*
@@ -528,6 +535,20 @@ static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s)
 	return container_of(s, struct nfs4_ol_stateid, st_stid);
 }
 
+struct nfs4_layout_stateid {
+	struct nfs4_stid		ls_stid;
+	struct list_head		ls_perclnt;
+	struct list_head		ls_perfile;
+	spinlock_t			ls_lock;
+	struct list_head		ls_layouts;
+	u32				ls_layout_type;
+};
+
+static inline struct nfs4_layout_stateid *layoutstateid(struct nfs4_stid *s)
+{
+	return container_of(s, struct nfs4_layout_stateid, ls_stid);
+}
+
 /* flags for preprocess_seqid_op() */
 #define RD_STATE	        0x00000010
 #define WR_STATE	        0x00000020
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 90a5925..0bda93e 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -428,6 +428,61 @@ struct nfsd4_reclaim_complete {
 	u32 rca_one_fs;
 };
 
+struct nfsd4_deviceid {
+	u64			fsid_idx;
+	u32			generation;
+	u32			pad;
+};
+
+struct nfsd4_layout_seg {
+	u32			iomode;
+	u64			offset;
+	u64			length;
+};
+
+struct nfsd4_getdeviceinfo {
+	struct nfsd4_deviceid	gd_devid;	/* request */
+	u32			gd_layout_type;	/* request */
+	u32			gd_maxcount;	/* request */
+	u32			gd_notify_types;/* request - response */
+	void			*gd_device;	/* response */
+};
+
+struct nfsd4_layoutget {
+	u64			lg_minlength;	/* request */
+	u32			lg_signal;	/* request */
+	u32			lg_layout_type;	/* request */
+	u32			lg_maxcount;	/* request */
+	stateid_t		lg_sid;		/* request/response */
+	struct nfsd4_layout_seg	lg_seg;		/* request/response */
+	void			*lg_content;	/* response */
+};
+
+struct nfsd4_layoutcommit {
+	stateid_t		lc_sid;		/* request */
+	struct nfsd4_layout_seg	lc_seg;		/* request */
+	u32			lc_reclaim;	/* request */
+	u32			lc_newoffset;	/* request */
+	u64			lc_last_wr;	/* request */
+	struct timespec		lc_mtime;	/* request */
+	u32			lc_layout_type;	/* request */
+	u32			lc_up_len;	/* layout length */
+	void			*lc_up_layout;	/* decoded by callback */
+	u32			lc_size_chg;	/* boolean for response */
+	u64			lc_newsize;	/* response */
+};
+
+struct nfsd4_layoutreturn {
+	u32			lr_return_type;	/* request */
+	u32			lr_layout_type;	/* request */
+	struct nfsd4_layout_seg	lr_seg;		/* request */
+	u32			lr_reclaim;	/* request */
+	u32			lrf_body_len;	/* request */
+	void			*lrf_body;	/* request */
+	stateid_t		lr_sid;		/* request/response */
+	u32			lrs_present;	/* response */
+};
+
 struct nfsd4_fallocate {
 	/* request */
 	stateid_t	falloc_stateid;
@@ -491,6 +546,10 @@ struct nfsd4_op {
 		struct nfsd4_reclaim_complete	reclaim_complete;
 		struct nfsd4_test_stateid	test_stateid;
 		struct nfsd4_free_stateid	free_stateid;
+		struct nfsd4_getdeviceinfo	getdeviceinfo;
+		struct nfsd4_layoutget		layoutget;
+		struct nfsd4_layoutcommit	layoutcommit;
+		struct nfsd4_layoutreturn	layoutreturn;
 
 		/* NFSv4.2 */
 		struct nfsd4_fallocate		allocate;
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 8a3589c..bc10d68 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -411,6 +411,7 @@ enum lock_type4 {
 #define FATTR4_WORD1_TIME_MODIFY_SET    (1UL << 22)
 #define FATTR4_WORD1_MOUNTED_ON_FILEID  (1UL << 23)
 #define FATTR4_WORD1_FS_LAYOUT_TYPES    (1UL << 30)
+#define FATTR4_WORD2_LAYOUT_TYPES       (1UL << 0)
 #define FATTR4_WORD2_LAYOUT_BLKSIZE     (1UL << 1)
 #define FATTR4_WORD2_MDSTHRESHOLD       (1UL << 4)
 #define FATTR4_WORD2_SECURITY_LABEL     (1UL << 16)
diff --git a/include/uapi/linux/nfsd/debug.h b/include/uapi/linux/nfsd/debug.h
index 1fdc95b..0bf130a 100644
--- a/include/uapi/linux/nfsd/debug.h
+++ b/include/uapi/linux/nfsd/debug.h
@@ -32,6 +32,7 @@
 #define NFSDDBG_REPCACHE	0x0080
 #define NFSDDBG_XDR		0x0100
 #define NFSDDBG_LOCKD		0x0200
+#define NFSDDBG_PNFS		0x0400
 #define NFSDDBG_ALL		0x7FFF
 #define NFSDDBG_NOCHANGE	0xFFFF
 
diff --git a/include/uapi/linux/nfsd/export.h b/include/uapi/linux/nfsd/export.h
index 584b6ef..4742f2c 100644
--- a/include/uapi/linux/nfsd/export.h
+++ b/include/uapi/linux/nfsd/export.h
@@ -47,8 +47,10 @@
  * exported filesystem.
  */
 #define	NFSEXP_V4ROOT		0x10000
+#define NFSEXP_NOPNFS		0x20000
+
 /* All flags that we claim to support.  (Note we don't support NOACL.) */
-#define NFSEXP_ALLFLAGS		0x1FE7F
+#define NFSEXP_ALLFLAGS		0x3FE7F
 
 /* The flags that may vary depending on security flavor: */
 #define NFSEXP_SECINFO_FLAGS	(NFSEXP_READONLY | NFSEXP_ROOTSQUASH \
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH 11/20] nfsd: implement pNFS layout recalls
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (6 preceding siblings ...)
  2015-01-22 11:09   ` [PATCH 10/20] nfsd: implement pNFS operations Christoph Hellwig
@ 2015-01-22 11:09   ` Christoph Hellwig
  2015-01-22 11:10   ` [PATCH 14/20] exportfs: add methods for block layout exports Christoph Hellwig
                     ` (6 subsequent siblings)
  14 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

Add support for issuing layout recalls to clients.  For now we only support
full-file recalls to get a simple and stable implementation.  This allows us
to embed a nfsd4_callback structure in struct nfs4_layout_stateid and thus
avoid any memory allocations under spinlocks during a recall.  For normal
use cases that do not intend to share a single file between multiple
clients this implementation is fully sufficient.
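
As a rough user-space illustration of the embedding idea (not the kernel
code itself), the sketch below keeps the callback inside the layout state,
recovers the state from the callback pointer, and issues at most one recall
per state.  The names struct callback, struct layout_state, state_of() and
recall_once() are hypothetical stand-ins chosen for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfsd4_callback. */
struct callback {
	int runs;		/* how many times the callback was queued */
};

/* Hypothetical stand-in for struct nfs4_layout_stateid. */
struct layout_state {
	int nr_layouts;		/* outstanding layout segments */
	bool recalled;		/* a recall has already been started */
	struct callback recall;	/* embedded, so a recall allocates nothing */
};

/* Recover the enclosing layout_state from its embedded callback. */
static struct layout_state *state_of(struct callback *cb)
{
	return (struct layout_state *)((char *)cb -
			offsetof(struct layout_state, recall));
}

/*
 * Start at most one full-file recall per layout state.
 * Returns true if this call queued the callback.
 */
static bool recall_once(struct layout_state *ls)
{
	if (ls->recalled)
		return false;
	ls->recalled = true;
	if (ls->nr_layouts == 0)
		return false;	/* nothing outstanding to recall */
	ls->recall.runs++;
	assert(state_of(&ls->recall) == ls);
	return true;
}
```

Because the callback lives inside the state, the recall path only flips a
flag and queues a pre-existing object, which is why it can run with a
spinlock held.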

To ensure layouts are recalled on local filesystem access, each layout
state registers a new FL_LAYOUT lease with the kernel file locking code.
Filesystems that support pNFS exports requiring recalls need to break
this lease on conflicting access patterns.
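
The lease-break path can be pictured with the simplified user-space model
below.  It is only a sketch: struct lease, struct lease_ops,
layout_on_break() and break_layout_lease() are hypothetical names that
loosely mirror the roles of struct file_lock, struct
lock_manager_operations and the lm_break handler in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct lease;

/* Hypothetical analogue of struct lock_manager_operations. */
struct lease_ops {
	/* Called on conflicting access; true would mean "drop the lease now". */
	bool (*on_break)(struct lease *l);
};

/* Hypothetical analogue of an FL_LAYOUT file_lock. */
struct lease {
	const struct lease_ops *ops;
	void *owner;			/* layout state that took the lease */
	unsigned long break_time;	/* 0: lock code never times it out */
};

/* Hypothetical layout state: only counts break requests it has seen. */
struct layout_state {
	int break_requests;
};

/* Break handler: start a recall, keep the lease until layouts return. */
static bool layout_on_break(struct lease *l)
{
	struct layout_state *ls = l->owner;

	l->break_time = 0;
	ls->break_requests++;
	return false;
}

static const struct lease_ops layout_lease_ops = {
	.on_break = layout_on_break,
};

/* What a filesystem would call before a conflicting local access. */
static bool break_layout_lease(struct lease *l)
{
	assert(l->ops && l->ops->on_break);
	return l->ops->on_break(l);
}
```

The handler returns false so the lease stays in place; in this model the
owner is responsible for removing it once the layouts have been returned
or the client has been fenced.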

The XDR code is based on the old pNFS server implementation by
Andy Adamson, Benny Halevy, Boaz Harrosh, Dean Hildebrand, Fred Isaman,
Marc Eshel, Mike Sager and Ricardo Labiaga.

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 fs/nfsd/nfs4callback.c |  99 +++++++++++++++++++++++
 fs/nfsd/nfs4layouts.c  | 214 ++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/nfsd/nfs4proc.c     |   4 +
 fs/nfsd/nfs4state.c    |   1 +
 fs/nfsd/state.h        |   6 ++
 fs/nfsd/xdr4cb.h       |   7 ++
 6 files changed, 330 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 7cbdf1b..5827785 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -546,6 +546,102 @@ out:
 	return status;
 }
 
+#ifdef CONFIG_NFSD_PNFS
+/*
+ * CB_LAYOUTRECALL4args
+ *
+ *	struct layoutrecall_file4 {
+ *		nfs_fh4         lor_fh;
+ *		offset4         lor_offset;
+ *		length4         lor_length;
+ *		stateid4        lor_stateid;
+ *	};
+ *
+ *	union layoutrecall4 switch(layoutrecall_type4 lor_recalltype) {
+ *	case LAYOUTRECALL4_FILE:
+ *		layoutrecall_file4 lor_layout;
+ *	case LAYOUTRECALL4_FSID:
+ *		fsid4              lor_fsid;
+ *	case LAYOUTRECALL4_ALL:
+ *		void;
+ *	};
+ *
+ *	struct CB_LAYOUTRECALL4args {
+ *		layouttype4             clora_type;
+ *		layoutiomode4           clora_iomode;
+ *		bool                    clora_changed;
+ *		layoutrecall4           clora_recall;
+ *	};
+ */
+static void encode_cb_layout4args(struct xdr_stream *xdr,
+				  const struct nfs4_layout_stateid *ls,
+				  struct nfs4_cb_compound_hdr *hdr)
+{
+	__be32 *p;
+
+	BUG_ON(hdr->minorversion == 0);
+
+	p = xdr_reserve_space(xdr, 5 * 4);
+	*p++ = cpu_to_be32(OP_CB_LAYOUTRECALL);
+	*p++ = cpu_to_be32(ls->ls_layout_type);
+	*p++ = cpu_to_be32(IOMODE_ANY);
+	*p++ = cpu_to_be32(1);
+	*p = cpu_to_be32(RETURN_FILE);
+
+	encode_nfs_fh4(xdr, &ls->ls_stid.sc_file->fi_fhandle);
+
+	p = xdr_reserve_space(xdr, 2 * 8);
+	p = xdr_encode_hyper(p, 0);
+	xdr_encode_hyper(p, NFS4_MAX_UINT64);
+
+	encode_stateid4(xdr, &ls->ls_recall_sid);
+
+	hdr->nops++;
+}
+
+static void nfs4_xdr_enc_cb_layout(struct rpc_rqst *req,
+				   struct xdr_stream *xdr,
+				   const struct nfsd4_callback *cb)
+{
+	const struct nfs4_layout_stateid *ls =
+		container_of(cb, struct nfs4_layout_stateid, ls_recall);
+	struct nfs4_cb_compound_hdr hdr = {
+		.ident = 0,
+		.minorversion = cb->cb_minorversion,
+	};
+
+	encode_cb_compound4args(xdr, &hdr);
+	encode_cb_sequence4args(xdr, cb, &hdr);
+	encode_cb_layout4args(xdr, ls, &hdr);
+	encode_cb_nops(&hdr);
+}
+
+static int nfs4_xdr_dec_cb_layout(struct rpc_rqst *rqstp,
+				  struct xdr_stream *xdr,
+				  struct nfsd4_callback *cb)
+{
+	struct nfs4_cb_compound_hdr hdr;
+	enum nfsstat4 nfserr;
+	int status;
+
+	status = decode_cb_compound4res(xdr, &hdr);
+	if (unlikely(status))
+		goto out;
+	if (cb) {
+		status = decode_cb_sequence4res(xdr, cb);
+		if (unlikely(status))
+			goto out;
+	}
+	status = decode_cb_op_status(xdr, OP_CB_LAYOUTRECALL, &nfserr);
+	if (unlikely(status))
+		goto out;
+	if (unlikely(nfserr != NFS4_OK))
+		status = nfs_cb_stat_to_errno(nfserr);
+out:
+	return status;
+}
+#endif /* CONFIG_NFSD_PNFS */
+
 /*
  * RPC procedure tables
  */
@@ -563,6 +659,9 @@ out:
 static struct rpc_procinfo nfs4_cb_procedures[] = {
 	PROC(CB_NULL,	NULL,		cb_null,	cb_null),
 	PROC(CB_RECALL,	COMPOUND,	cb_recall,	cb_recall),
+#ifdef CONFIG_NFSD_PNFS
+	PROC(CB_LAYOUT,	COMPOUND,	cb_layout,	cb_layout),
+#endif
 };
 
 static struct rpc_version nfs_cb_version4 = {
diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 28c8ff2..04a358a 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -1,8 +1,11 @@
 /*
  * Copyright (c) 2014 Christoph Hellwig.
  */
+#include <linux/kmod.h>
+#include <linux/file.h>
 #include <linux/jhash.h>
 #include <linux/sched.h>
+#include <linux/sunrpc/addr.h>
 
 #include "pnfs.h"
 #include "netns.h"
@@ -18,6 +21,9 @@ struct nfs4_layout {
 static struct kmem_cache *nfs4_layout_cache;
 static struct kmem_cache *nfs4_layout_stateid_cache;
 
+static struct nfsd4_callback_ops nfsd4_cb_layout_ops;
+static const struct lock_manager_operations nfsd4_layouts_lm_ops;
+
 const struct nfsd4_layout_ops *nfsd4_layout_ops[LAYOUT_TYPE_MAX] =  {
 };
 
@@ -128,9 +134,42 @@ nfsd4_free_layout_stateid(struct nfs4_stid *stid)
 	list_del_init(&ls->ls_perfile);
 	spin_unlock(&fp->fi_lock);
 
+	vfs_setlease(ls->ls_file, F_UNLCK, NULL, (void **)&ls);
+	fput(ls->ls_file);
+
+	if (ls->ls_recalled)
+		atomic_dec(&ls->ls_stid.sc_file->fi_lo_recalls);
+
 	kmem_cache_free(nfs4_layout_stateid_cache, ls);
 }
 
+static int
+nfsd4_layout_setlease(struct nfs4_layout_stateid *ls)
+{
+	struct file_lock *fl;
+	int status;
+
+	fl = locks_alloc_lock();
+	if (!fl)
+		return -ENOMEM;
+	locks_init_lock(fl);
+	fl->fl_lmops = &nfsd4_layouts_lm_ops;
+	fl->fl_flags = FL_LAYOUT;
+	fl->fl_type = F_RDLCK;
+	fl->fl_end = OFFSET_MAX;
+	fl->fl_owner = ls;
+	fl->fl_pid = current->tgid;
+	fl->fl_file = ls->ls_file;
+
+	status = vfs_setlease(fl->fl_file, fl->fl_type, &fl, NULL);
+	if (status) {
+		locks_free_lock(fl);
+		return status;
+	}
+	BUG_ON(fl != NULL);
+	return 0;
+}
+
 static struct nfs4_layout_stateid *
 nfsd4_alloc_layout_stateid(struct nfsd4_compound_state *cstate,
 		struct nfs4_stid *parent, u32 layout_type)
@@ -153,6 +192,20 @@ nfsd4_alloc_layout_stateid(struct nfsd4_compound_state *cstate,
 	spin_lock_init(&ls->ls_lock);
 	INIT_LIST_HEAD(&ls->ls_layouts);
 	ls->ls_layout_type = layout_type;
+	nfsd4_init_cb(&ls->ls_recall, clp, &nfsd4_cb_layout_ops,
+			NFSPROC4_CLNT_CB_LAYOUT);
+
+	if (parent->sc_type == NFS4_DELEG_STID)
+		ls->ls_file = get_file(fp->fi_deleg_file);
+	else
+		ls->ls_file = find_any_file(fp);
+	BUG_ON(!ls->ls_file);
+
+	if (nfsd4_layout_setlease(ls)) {
+		put_nfs4_file(fp);
+		kmem_cache_free(nfs4_layout_stateid_cache, ls);
+		return NULL;
+	}
 
 	spin_lock(&clp->cl_lock);
 	stp->sc_type = NFS4_LAYOUT_STID;
@@ -216,6 +269,27 @@ out:
 	return status;
 }
 
+static void
+nfsd4_recall_file_layout(struct nfs4_layout_stateid *ls)
+{
+	spin_lock(&ls->ls_lock);
+	if (ls->ls_recalled)
+		goto out_unlock;
+
+	ls->ls_recalled = true;
+	atomic_inc(&ls->ls_stid.sc_file->fi_lo_recalls);
+	if (list_empty(&ls->ls_layouts))
+		goto out_unlock;
+
+	atomic_inc(&ls->ls_stid.sc_count);
+	update_stateid(&ls->ls_stid.sc_stateid);
+	memcpy(&ls->ls_recall_sid, &ls->ls_stid.sc_stateid, sizeof(stateid_t));
+	nfsd4_run_cb(&ls->ls_recall);
+
+out_unlock:
+	spin_unlock(&ls->ls_lock);
+}
+
 static inline u64
 layout_end(struct nfsd4_layout_seg *seg)
 {
@@ -259,18 +333,44 @@ layouts_try_merge(struct nfsd4_layout_seg *lo, struct nfsd4_layout_seg *new)
 	return true;
 }
 
+static __be32
+nfsd4_recall_conflict(struct nfs4_layout_stateid *ls)
+{
+	struct nfs4_file *fp = ls->ls_stid.sc_file;
+	struct nfs4_layout_stateid *l, *n;
+	__be32 nfserr = nfs_ok;
+
+	assert_spin_locked(&fp->fi_lock);
+
+	list_for_each_entry_safe(l, n, &fp->fi_lo_states, ls_perfile) {
+		if (l != ls) {
+			nfsd4_recall_file_layout(l);
+			nfserr = nfserr_recallconflict;
+		}
+	}
+
+	return nfserr;
+}
+
 __be32
 nfsd4_insert_layout(struct nfsd4_layoutget *lgp, struct nfs4_layout_stateid *ls)
 {
 	struct nfsd4_layout_seg *seg = &lgp->lg_seg;
+	struct nfs4_file *fp = ls->ls_stid.sc_file;
 	struct nfs4_layout *lp, *new = NULL;
+	__be32 nfserr;
 
+	spin_lock(&fp->fi_lock);
+	nfserr = nfsd4_recall_conflict(ls);
+	if (nfserr)
+		goto out;
 	spin_lock(&ls->ls_lock);
 	list_for_each_entry(lp, &ls->ls_layouts, lo_perstate) {
 		if (layouts_try_merge(&lp->lo_seg, seg))
 			goto done;
 	}
 	spin_unlock(&ls->ls_lock);
+	spin_unlock(&fp->fi_lock);
 
 	new = kmem_cache_alloc(nfs4_layout_cache, GFP_KERNEL);
 	if (!new)
@@ -278,6 +378,10 @@ nfsd4_insert_layout(struct nfsd4_layoutget *lgp, struct nfs4_layout_stateid *ls)
 	memcpy(&new->lo_seg, seg, sizeof(lp->lo_seg));
 	new->lo_state = ls;
 
+	spin_lock(&fp->fi_lock);
+	nfserr = nfsd4_recall_conflict(ls);
+	if (nfserr)
+		goto out;
 	spin_lock(&ls->ls_lock);
 	list_for_each_entry(lp, &ls->ls_layouts, lo_perstate) {
 		if (layouts_try_merge(&lp->lo_seg, seg))
@@ -291,9 +395,11 @@ done:
 	update_stateid(&ls->ls_stid.sc_stateid);
 	memcpy(&lgp->lg_sid, &ls->ls_stid.sc_stateid, sizeof(stateid_t));
 	spin_unlock(&ls->ls_lock);
+out:
+	spin_unlock(&fp->fi_lock);
 	if (new)
 		kmem_cache_free(nfs4_layout_cache, new);
-	return nfs_ok;
+	return nfserr;
 }
 
 static void
@@ -449,6 +555,112 @@ nfsd4_return_all_file_layouts(struct nfs4_client *clp, struct nfs4_file *fp)
 	nfsd4_free_layouts(&reaplist);
 }
 
+static void
+nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)
+{
+	struct nfs4_client *clp = ls->ls_stid.sc_client;
+	char addr_str[INET6_ADDRSTRLEN];
+	static char *envp[] = {
+		"HOME=/",
+		"TERM=linux",
+		"PATH=/sbin:/usr/sbin:/bin:/usr/bin",
+		NULL
+	};
+	char *argv[8];
+	int error;
+
+	rpc_ntop((struct sockaddr *)&clp->cl_addr, addr_str, sizeof(addr_str));
+
+	printk(KERN_WARNING
+		"nfsd: client %s failed to respond to layout recall. "
+		"  Fencing..\n", addr_str);
+
+	argv[0] = "/sbin/nfsd-recall-failed";
+	argv[1] = addr_str;
+	argv[2] = ls->ls_file->f_path.mnt->mnt_sb->s_id;
+	argv[3] = NULL;
+
+	error = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
+	if (error) {
+		printk(KERN_ERR "nfsd: fence failed for client %s: %d!\n",
+			addr_str, error);
+	}
+}
+
+static int
+nfsd4_cb_layout_done(struct nfsd4_callback *cb, struct rpc_task *task)
+{
+	struct nfs4_layout_stateid *ls =
+		container_of(cb, struct nfs4_layout_stateid, ls_recall);
+	LIST_HEAD(reaplist);
+
+	switch (task->tk_status) {
+	case 0:
+		return 1;
+	case -NFS4ERR_NOMATCHING_LAYOUT:
+		task->tk_status = 0;
+		return 1;
+	case -NFS4ERR_DELAY:
+		/* Poll the client until it's done with the layout */
+		/* FIXME: cap number of retries.
+		 * The pNFS standard states that we need to only expire
+		 * the client after at least "lease time", e.g. lease-time * 2,
+		 * when failing to communicate a recall
+		 */
+		rpc_delay(task, HZ/100); /* 10 milliseconds */
+		return 0;
+	default:
+		/*
+		 * Unknown error or non-responding client, we'll need to fence.
+		 */
+		nfsd4_cb_layout_fail(ls);
+		return -1;
+	}
+}
+
+static void
+nfsd4_cb_layout_release(struct nfsd4_callback *cb)
+{
+	struct nfs4_layout_stateid *ls =
+		container_of(cb, struct nfs4_layout_stateid, ls_recall);
+	LIST_HEAD(reaplist);
+
+	nfsd4_return_all_layouts(ls, &reaplist);
+	nfsd4_free_layouts(&reaplist);
+	nfs4_put_stid(&ls->ls_stid);
+}
+
+static struct nfsd4_callback_ops nfsd4_cb_layout_ops = {
+	.done		= nfsd4_cb_layout_done,
+	.release	= nfsd4_cb_layout_release,
+};
+
+static bool
+nfsd4_layout_lm_break(struct file_lock *fl)
+{
+	/*
+	 * We don't want the locks code to time out the lease for us;
+	 * we'll remove it ourselves if a layout isn't returned
+	 * in time:
+	 */
+	fl->fl_break_time = 0;
+	nfsd4_recall_file_layout(fl->fl_owner);
+	return false;
+}
+
+static int
+nfsd4_layout_lm_change(struct file_lock *onlist, int arg,
+		struct list_head *dispose)
+{
+	BUG_ON(!(arg & F_UNLCK));
+	return lease_modify(onlist, arg, dispose);
+}
+
+static const struct lock_manager_operations nfsd4_layouts_lm_ops = {
+	.lm_break	= nfsd4_layout_lm_break,
+	.lm_change	= nfsd4_layout_lm_change,
+};
+
 int
 nfsd4_init_pnfs(void)
 {
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index b813913..c051d5b 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1301,6 +1301,10 @@ nfsd4_layoutget(struct svc_rqst *rqstp,
 	if (nfserr)
 		goto out;
 
+	nfserr = nfserr_recallconflict;
+	if (atomic_read(&ls->ls_stid.sc_file->fi_lo_recalls))
+		goto out_put_stid;
+
 	nfserr = ops->proc_layoutget(current_fh->fh_dentry->d_inode,
 				     current_fh, lgp);
 	if (nfserr)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 925836e..1eba300 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3070,6 +3070,7 @@ static void nfsd4_init_file(struct knfsd_fh *fh, unsigned int hashval,
 	memset(fp->fi_access, 0, sizeof(fp->fi_access));
 #ifdef CONFIG_NFSD_PNFS
 	INIT_LIST_HEAD(&fp->fi_lo_states);
+	atomic_set(&fp->fi_lo_recalls, 0);
 #endif
 	hlist_add_head_rcu(&fp->fi_hash, &file_hashtbl[hashval]);
 }
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 4641fbb..9290534 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -502,6 +502,7 @@ struct nfs4_file {
 	bool			fi_had_conflict;
 #ifdef CONFIG_NFSD_PNFS
 	struct list_head	fi_lo_states;
+	atomic_t		fi_lo_recalls;
 #endif
 };
 
@@ -542,6 +543,10 @@ struct nfs4_layout_stateid {
 	spinlock_t			ls_lock;
 	struct list_head		ls_layouts;
 	u32				ls_layout_type;
+	struct file			*ls_file;
+	struct nfsd4_callback		ls_recall;
+	stateid_t			ls_recall_sid;
+	bool				ls_recalled;
 };
 
 static inline struct nfs4_layout_stateid *layoutstateid(struct nfs4_stid *s)
@@ -556,6 +561,7 @@ static inline struct nfs4_layout_stateid *layoutstateid(struct nfs4_stid *s)
 enum nfsd4_cb_op {
 	NFSPROC4_CLNT_CB_NULL = 0,
 	NFSPROC4_CLNT_CB_RECALL,
+	NFSPROC4_CLNT_CB_LAYOUT,
 	NFSPROC4_CLNT_CB_SEQUENCE,
 };
 
diff --git a/fs/nfsd/xdr4cb.h b/fs/nfsd/xdr4cb.h
index c5c55df..c47f6fd 100644
--- a/fs/nfsd/xdr4cb.h
+++ b/fs/nfsd/xdr4cb.h
@@ -21,3 +21,10 @@
 #define NFS4_dec_cb_recall_sz		(cb_compound_dec_hdr_sz  +      \
 					cb_sequence_dec_sz +            \
 					op_dec_sz)
+#define NFS4_enc_cb_layout_sz		(cb_compound_enc_hdr_sz +       \
+					cb_sequence_enc_sz +            \
+					1 + 3 +                         \
+					enc_nfs4_fh_sz + 4)
+#define NFS4_dec_cb_layout_sz		(cb_compound_dec_hdr_sz  +      \
+					cb_sequence_dec_sz +            \
+					op_dec_sz)
-- 
1.9.1



* [PATCH 12/20] nfsd: update documentation for pNFS support
  2015-01-22 11:09 a simple and scalable pNFS block layout server V2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2015-01-22 11:09 ` [PATCH 09/20] nfsd: make find_any_file available outside nfs4state.c Christoph Hellwig
@ 2015-01-22 11:09 ` Christoph Hellwig
  2015-01-22 11:09 ` [PATCH 13/20] nfsd: add trace events Christoph Hellwig
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs, linux-fsdevel, xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/filesystems/nfs/nfs41-server.txt | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt
index c49cd7e..682a59f 100644
--- a/Documentation/filesystems/nfs/nfs41-server.txt
+++ b/Documentation/filesystems/nfs/nfs41-server.txt
@@ -24,11 +24,6 @@ focuses on the mandatory-to-implement NFSv4.1 Sessions, providing
 "exactly once" semantics and better control and throttling of the
 resources allocated for each client.
 
-Other NFSv4.1 features, Parallel NFS operations in particular,
-are still under development out of tree.
-See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design
-for more information.
-
 The table below, taken from the NFSv4.1 document, lists
 the operations that are mandatory to implement (REQ), optional
 (OPT), and NFSv4.0 operations that are required not to implement (MNI)
@@ -43,9 +38,7 @@ The OPTIONAL features identified and their abbreviations are as follows:
 The following abbreviations indicate the linux server implementation status.
 	I	Implemented NFSv4.1 operations.
 	NS	Not Supported.
-	NS*	unimplemented optional feature.
-	P	pNFS features implemented out of tree.
-	PNS	pNFS features that are not supported yet (out of tree).
+	NS*	Unimplemented optional feature.
 
 Operations
 
@@ -70,13 +63,13 @@ I  | DESTROY_SESSION      | REQ        |              | Section 18.37  |
 I  | EXCHANGE_ID          | REQ        |              | Section 18.35  |
 I  | FREE_STATEID         | REQ        |              | Section 18.38  |
    | GETATTR              | REQ        |              | Section 18.7   |
-P  | GETDEVICEINFO        | OPT        | pNFS (REQ)   | Section 18.40  |
-P  | GETDEVICELIST        | OPT        | pNFS (OPT)   | Section 18.41  |
+I  | GETDEVICEINFO        | OPT        | pNFS (REQ)   | Section 18.40  |
+NS*| GETDEVICELIST        | OPT        | pNFS (OPT)   | Section 18.41  |
    | GETFH                | REQ        |              | Section 18.8   |
 NS*| GET_DIR_DELEGATION   | OPT        | DDELG (REQ)  | Section 18.39  |
-P  | LAYOUTCOMMIT         | OPT        | pNFS (REQ)   | Section 18.42  |
-P  | LAYOUTGET            | OPT        | pNFS (REQ)   | Section 18.43  |
-P  | LAYOUTRETURN         | OPT        | pNFS (REQ)   | Section 18.44  |
+I  | LAYOUTCOMMIT         | OPT        | pNFS (REQ)   | Section 18.42  |
+I  | LAYOUTGET            | OPT        | pNFS (REQ)   | Section 18.43  |
+I  | LAYOUTRETURN         | OPT        | pNFS (REQ)   | Section 18.44  |
    | LINK                 | OPT        |              | Section 18.9   |
    | LOCK                 | REQ        |              | Section 18.10  |
    | LOCKT                | REQ        |              | Section 18.11  |
@@ -122,9 +115,9 @@ Callback Operations
    |                         | MNI       | or OPT)     |               |
    +-------------------------+-----------+-------------+---------------+
    | CB_GETATTR              | OPT       | FDELG (REQ) | Section 20.1  |
-P  | CB_LAYOUTRECALL         | OPT       | pNFS (REQ)  | Section 20.3  |
+I  | CB_LAYOUTRECALL         | OPT       | pNFS (REQ)  | Section 20.3  |
 NS*| CB_NOTIFY               | OPT       | DDELG (REQ) | Section 20.4  |
-P  | CB_NOTIFY_DEVICEID      | OPT       | pNFS (OPT)  | Section 20.12 |
+NS*| CB_NOTIFY_DEVICEID      | OPT       | pNFS (OPT)  | Section 20.12 |
 NS*| CB_NOTIFY_LOCK          | OPT       |             | Section 20.11 |
 NS*| CB_PUSH_DELEG           | OPT       | FDELG (OPT) | Section 20.5  |
    | CB_RECALL               | OPT       | FDELG,      | Section 20.2  |
-- 
1.9.1


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 13/20] nfsd: add trace events
  2015-01-22 11:09 a simple and scalable pNFS block layout server V2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2015-01-22 11:09 ` [PATCH 12/20] nfsd: update documentation for pNFS support Christoph Hellwig
@ 2015-01-22 11:09 ` Christoph Hellwig
  2015-01-22 11:10 ` [PATCH 15/20] nfsd: pNFS block layout driver Christoph Hellwig
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs, linux-fsdevel, xfs

For now just a few simple events to trace the layout stateid lifetime, but
these already were enough to find several bugs in the Linux client layout
stateid handling.
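
As a rough user-space illustration of what each event records, the sketch
below formats a stateid with the same format string that the TP_printk in
trace.h uses.  The mock_stateid structure and the format_stateid_event()
helper are hypothetical stand-ins for illustration only, not kernel
interfaces, and the real stateid_t nests these fields rather than keeping
them flat.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, flattened stand-in for the kernel's stateid_t. */
struct mock_stateid {
	uint32_t cl_boot;	/* si_opaque.so_clid.cl_boot */
	uint32_t cl_id;		/* si_opaque.so_clid.cl_id */
	uint32_t si_id;		/* si_opaque.so_id */
	uint32_t si_generation;
};

/*
 * Format one event line.  The part after the event name matches the
 * TP_printk format of nfsd_stateid_class.  Returns snprintf()'s result.
 */
static int format_stateid_event(char *buf, size_t len, const char *event,
				const struct mock_stateid *stp)
{
	return snprintf(buf, len,
			"%s: client %08" PRIx32 ":%08" PRIx32
			" stateid %08" PRIx32 ":%08" PRIx32,
			event, stp->cl_boot, stp->cl_id,
			stp->si_id, stp->si_generation);
}
```

Because every event shares one class, following the lifetime of a single
layout stateid comes down to matching the client and stateid fields across
the alloc, recall, unhash and free events.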

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/nfsd/Makefile      |  7 ++++++-
 fs/nfsd/nfs4layouts.c | 16 ++++++++++++++-
 fs/nfsd/nfs4proc.c    |  6 +++++-
 fs/nfsd/trace.c       |  5 +++++
 fs/nfsd/trace.h       | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 85 insertions(+), 3 deletions(-)
 create mode 100644 fs/nfsd/trace.c
 create mode 100644 fs/nfsd/trace.h

diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index 5806270..6cba933 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -2,9 +2,14 @@
 # Makefile for the Linux nfs server
 #
 
+ccflags-y += -I$(src)			# needed for trace events
+
 obj-$(CONFIG_NFSD)	+= nfsd.o
 
-nfsd-y 			:= nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \
+# this one should be compiled first, as the tracing macros can easily blow up
+nfsd-y			+= trace.o
+
+nfsd-y 			+= nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \
 			   export.o auth.o lockd.o nfscache.o nfsxdr.o stats.o
 nfsd-$(CONFIG_NFSD_FAULT_INJECTION) += fault_inject.o
 nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o
diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 04a358a..1650075 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -9,6 +9,7 @@
 
 #include "pnfs.h"
 #include "netns.h"
+#include "trace.h"
 
 #define NFSDDBG_FACILITY                NFSDDBG_PNFS
 
@@ -126,6 +127,8 @@ nfsd4_free_layout_stateid(struct nfs4_stid *stid)
 	struct nfs4_client *clp = ls->ls_stid.sc_client;
 	struct nfs4_file *fp = ls->ls_stid.sc_file;
 
+	trace_layoutstate_free(&ls->ls_stid.sc_stateid);
+
 	spin_lock(&clp->cl_lock);
 	list_del_init(&ls->ls_perclnt);
 	spin_unlock(&clp->cl_lock);
@@ -216,6 +219,7 @@ nfsd4_alloc_layout_stateid(struct nfsd4_compound_state *cstate,
 	list_add(&ls->ls_perfile, &fp->fi_lo_states);
 	spin_unlock(&fp->fi_lock);
 
+	trace_layoutstate_alloc(&ls->ls_stid.sc_stateid);
 	return ls;
 }
 
@@ -281,6 +285,8 @@ nfsd4_recall_file_layout(struct nfs4_layout_stateid *ls)
 	if (list_empty(&ls->ls_layouts))
 		goto out_unlock;
 
+	trace_layout_recall(&ls->ls_stid.sc_stateid);
+
 	atomic_inc(&ls->ls_stid.sc_count);
 	update_stateid(&ls->ls_stid.sc_stateid);
 	memcpy(&ls->ls_recall_sid, &ls->ls_stid.sc_stateid, sizeof(stateid_t));
@@ -455,8 +461,10 @@ nfsd4_return_file_layouts(struct svc_rqst *rqstp,
 	nfserr = nfsd4_preprocess_layout_stateid(rqstp, cstate, &lrp->lr_sid,
 						false, lrp->lr_layout_type,
 						&ls);
-	if (nfserr)
+	if (nfserr) {
+		trace_layout_return_lookup_fail(&lrp->lr_sid);
 		return nfserr;
+	}
 
 	spin_lock(&ls->ls_lock);
 	list_for_each_entry_safe(lp, n, &ls->ls_layouts, lo_perstate) {
@@ -473,6 +481,7 @@ nfsd4_return_file_layouts(struct svc_rqst *rqstp,
 		}
 		lrp->lrs_present = 1;
 	} else {
+		trace_layoutstate_unhash(&ls->ls_stid.sc_stateid);
 		nfs4_unhash_stid(&ls->ls_stid);
 		lrp->lrs_present = 0;
 	}
@@ -571,6 +580,8 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)
 
 	rpc_ntop((struct sockaddr *)&clp->cl_addr, addr_str, sizeof(addr_str));
 
+	trace_layout_recall_fail(&ls->ls_stid.sc_stateid);
+
 	printk(KERN_WARNING
 		"nfsd: client %s failed to respond to layout recall. "
 		"  Fencing..\n", addr_str);
@@ -598,6 +609,7 @@ nfsd4_cb_layout_done(struct nfsd4_callback *cb, struct rpc_task *task)
 	case 0:
 		return 1;
 	case -NFS4ERR_NOMATCHING_LAYOUT:
+		trace_layout_recall_done(&ls->ls_stid.sc_stateid);
 		task->tk_status = 0;
 		return 1;
 	case -NFS4ERR_DELAY:
@@ -625,6 +637,8 @@ nfsd4_cb_layout_release(struct nfsd4_callback *cb)
 		container_of(cb, struct nfs4_layout_stateid, ls_recall);
 	LIST_HEAD(reaplist);
 
+	trace_layout_recall_release(&ls->ls_stid.sc_stateid);
+
 	nfsd4_return_all_layouts(ls, &reaplist);
 	nfsd4_free_layouts(&reaplist);
 	nfs4_put_stid(&ls->ls_stid);
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index c051d5b..28e3927 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -44,6 +44,7 @@
 #include "netns.h"
 #include "acl.h"
 #include "pnfs.h"
+#include "trace.h"
 
 #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
 #include <linux/security.h>
@@ -1298,8 +1299,10 @@ nfsd4_layoutget(struct svc_rqst *rqstp,
 
 	nfserr = nfsd4_preprocess_layout_stateid(rqstp, cstate, &lgp->lg_sid,
 						true, lgp->lg_layout_type, &ls);
-	if (nfserr)
+	if (nfserr) {
+		trace_layout_get_lookup_fail(&lgp->lg_sid);
 		goto out;
+	}
 
 	nfserr = nfserr_recallconflict;
 	if (atomic_read(&ls->ls_stid.sc_file->fi_lo_recalls))
@@ -1359,6 +1362,7 @@ nfsd4_layoutcommit(struct svc_rqst *rqstp,
 						false, lcp->lc_layout_type,
 						&ls);
 	if (nfserr) {
+		trace_layout_commit_lookup_fail(&lcp->lc_sid);
 		/* fixup error code as per RFC5661 */
 		if (nfserr == nfserr_bad_stateid)
 			nfserr = nfserr_badlayout;
diff --git a/fs/nfsd/trace.c b/fs/nfsd/trace.c
new file mode 100644
index 0000000..82f8907
--- /dev/null
+++ b/fs/nfsd/trace.c
@@ -0,0 +1,5 @@
+
+#include "state.h"
+
+#define CREATE_TRACE_POINTS
+#include "trace.h"
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
new file mode 100644
index 0000000..c668520
--- /dev/null
+++ b/fs/nfsd/trace.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (c) 2014 Christoph Hellwig.
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM nfsd
+
+#if !defined(_NFSD_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _NFSD_TRACE_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(nfsd_stateid_class,
+	TP_PROTO(stateid_t *stp),
+	TP_ARGS(stp),
+	TP_STRUCT__entry(
+		__field(u32, cl_boot)
+		__field(u32, cl_id)
+		__field(u32, si_id)
+		__field(u32, si_generation)
+	),
+	TP_fast_assign(
+		__entry->cl_boot = stp->si_opaque.so_clid.cl_boot;
+		__entry->cl_id = stp->si_opaque.so_clid.cl_id;
+		__entry->si_id = stp->si_opaque.so_id;
+		__entry->si_generation = stp->si_generation;
+	),
+	TP_printk("client %08x:%08x stateid %08x:%08x",
+		__entry->cl_boot,
+		__entry->cl_id,
+		__entry->si_id,
+		__entry->si_generation)
+)
+
+#define DEFINE_STATEID_EVENT(name) \
+DEFINE_EVENT(nfsd_stateid_class, name, \
+	TP_PROTO(stateid_t *stp), \
+	TP_ARGS(stp))
+DEFINE_STATEID_EVENT(layoutstate_alloc);
+DEFINE_STATEID_EVENT(layoutstate_unhash);
+DEFINE_STATEID_EVENT(layoutstate_free);
+DEFINE_STATEID_EVENT(layout_get_lookup_fail);
+DEFINE_STATEID_EVENT(layout_commit_lookup_fail);
+DEFINE_STATEID_EVENT(layout_return_lookup_fail);
+DEFINE_STATEID_EVENT(layout_recall);
+DEFINE_STATEID_EVENT(layout_recall_done);
+DEFINE_STATEID_EVENT(layout_recall_fail);
+DEFINE_STATEID_EVENT(layout_recall_release);
+
+#endif /* _NFSD_TRACE_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE trace
+#include <trace/define_trace.h>
-- 
1.9.1


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 14/20] exportfs: add methods for block layout exports
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (7 preceding siblings ...)
  2015-01-22 11:09   ` [PATCH 11/20] nfsd: implement pNFS layout recalls Christoph Hellwig
@ 2015-01-22 11:10   ` Christoph Hellwig
  2015-01-22 11:10   ` [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten Christoph Hellwig
                     ` (5 subsequent siblings)
  14 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:10 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs, linux-fsdevel, xfs

Add three methods to allow exporting pnfs block layout volumes:

 - get_uuid: get a filesystem unique signature exposed to clients
 - map_blocks: map and if necessary allocate blocks for a layout
 - commit_blocks: commit blocks in a layout once the client is done with them

For now we stick the external pnfs block layout interfaces into s_export_op to
avoid mixing them up with the internal interface between the NFS server and
the layout drivers.  Once we've fully internalized the latter interface we
can reconsider whether these methods should stay in s_export_op.
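
To give a feel for the map_blocks contract, here is a minimal user-space
sketch.  It assumes a hypothetical file that consists of one contiguous,
fully written extent; struct mock_file and mock_map_blocks() are invented
for illustration, and unlike a real implementation this mock never
allocates blocks.  The IOMAP_* values and the struct iomap fields mirror
the ones added by this patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Mapping types, same values as in the patch. */
#define IOMAP_HOLE	0x01
#define IOMAP_DELALLOC	0x02
#define IOMAP_MAPPED	0x03
#define IOMAP_UNWRITTEN	0x04

/* User-space mirror of struct iomap from the patch. */
struct iomap {
	uint64_t blkno;		/* first 512-byte sector of mapping */
	int64_t	 offset;	/* file offset of mapping, bytes */
	uint64_t length;	/* length of mapping, bytes */
	int	 type;		/* type of mapping */
};

/* Hypothetical file: a single contiguous extent starting at offset 0. */
struct mock_file {
	uint64_t start_sector;	/* where the extent starts on disk */
	uint64_t alloc_bytes;	/* bytes allocated from file offset 0 */
};

static int mock_map_blocks(const struct mock_file *f, int64_t offset,
			   uint64_t len, struct iomap *iomap, bool write)
{
	if (offset < 0 || len == 0)
		return -EINVAL;

	iomap->offset = offset;
	if ((uint64_t)offset >= f->alloc_bytes) {
		/* Past the extent: this mock cannot allocate for writes. */
		if (write)
			return -ENXIO;
		iomap->type = IOMAP_HOLE;
		iomap->blkno = 0;
		iomap->length = len;
		return 0;
	}

	/* Inside the extent: report it, clamped to the end of the extent. */
	iomap->type = IOMAP_MAPPED;
	iomap->blkno = f->start_sector + ((uint64_t)offset >> 9);
	iomap->length = f->alloc_bytes - (uint64_t)offset;
	if (iomap->length > len)
		iomap->length = len;
	return 0;
}
```

The block layout driver later in this series treats -ENXIO from map_blocks
as "layout unavailable", which is why the mock returns it for writes it
cannot satisfy.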

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/exportfs.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 41b223a..ff46bf7 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -4,6 +4,7 @@
 #include <linux/types.h>
 
 struct dentry;
+struct iattr;
 struct inode;
 struct super_block;
 struct vfsmount;
@@ -180,6 +181,19 @@ struct fid {
  *    get_name is not (which is possibly inconsistent)
  */
 
+/* types of block ranges for multipage write mappings. */
+#define IOMAP_HOLE	0x01	/* no blocks allocated, need allocation */
+#define IOMAP_DELALLOC	0x02	/* delayed allocation blocks */
+#define IOMAP_MAPPED	0x03	/* blocks allocated @blkno */
+#define IOMAP_UNWRITTEN	0x04	/* blocks allocated @blkno in unwritten state */
+
+struct iomap {
+	sector_t	blkno;	/* first sector of mapping */
+	loff_t		offset;	/* file offset of mapping, bytes */
+	u64		length;	/* length of mapping, bytes */
+	int		type;	/* type of mapping */
+};
+
 struct export_operations {
 	int (*encode_fh)(struct inode *inode, __u32 *fh, int *max_len,
 			struct inode *parent);
@@ -191,6 +205,13 @@ struct export_operations {
 			struct dentry *child);
 	struct dentry * (*get_parent)(struct dentry *child);
 	int (*commit_metadata)(struct inode *inode);
+
+	int (*get_uuid)(struct super_block *sb, u8 *buf, u32 *len, u64 *offset);
+	int (*map_blocks)(struct inode *inode, loff_t offset,
+			  u64 len, struct iomap *iomap,
+			  bool write, u32 *device_generation);
+	int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
+			     int nr_iomaps, struct iattr *iattr);
 };
 
 extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 15/20] nfsd: pNFS block layout driver
  2015-01-22 11:09 a simple and scalable pNFS block layout server V2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2015-01-22 11:09 ` [PATCH 13/20] nfsd: add trace events Christoph Hellwig
@ 2015-01-22 11:10 ` Christoph Hellwig
  2015-01-22 11:10 ` [PATCH 18/20] xfs: factor out a xfs_update_prealloc_flags() helper Christoph Hellwig
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:10 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs, linux-fsdevel, xfs

Add a small shim between core nfsd and filesystems to translate the
somewhat cumbersome pNFS data structures and semantics to something
more palatable for Linux filesystems.
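
The core of that translation is turning a filesystem mapping type plus the
requested I/O mode into a pNFS block extent state.  The user-space sketch
below restates the decision made in nfsd4_block_proc_layoutget() as a pure
function; extent_state_for() is a hypothetical helper written for this
illustration, and -1 stands in for the NFS4ERR_LAYOUTUNAVAILABLE path.

```c
#include <stdint.h>

/* layoutiomode4 values from RFC 5661. */
enum { IOMODE_READ = 1, IOMODE_RW = 2 };

/* Mapping types, same values as in the exportfs patch. */
#define IOMAP_HOLE	0x01
#define IOMAP_DELALLOC	0x02
#define IOMAP_MAPPED	0x03
#define IOMAP_UNWRITTEN	0x04

enum pnfs_block_extent_state {
	PNFS_BLOCK_READWRITE_DATA	= 0,
	PNFS_BLOCK_READ_DATA		= 1,
	PNFS_BLOCK_INVALID_DATA		= 2,
	PNFS_BLOCK_NONE_DATA		= 3,
};

/* Returns the extent state, or -1 if no layout can be handed out. */
static int extent_state_for(int iomap_type, int iomode, uint64_t minlength)
{
	switch (iomap_type) {
	case IOMAP_MAPPED:
		return iomode == IOMODE_READ ?
			PNFS_BLOCK_READ_DATA : PNFS_BLOCK_READWRITE_DATA;
	case IOMAP_UNWRITTEN:
		if (iomode & IOMODE_RW)
			return minlength == 0 ? -1 : PNFS_BLOCK_INVALID_DATA;
		/* A read of unwritten blocks is treated like a hole. */
		return iomode == IOMODE_READ ? PNFS_BLOCK_NONE_DATA : -1;
	case IOMAP_HOLE:
		/* Holes are only valid for reads; writes need allocation. */
		return iomode == IOMODE_READ ? PNFS_BLOCK_NONE_DATA : -1;
	default:
		/* Delayed allocation or unknown types are never exported. */
		return -1;
	}
}
```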

Thanks to Rick McNeal for the old prototype pNFS blocklayout server
code, which gave a lot of inspiration to this version even if no
code is left from it.
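
On the wire side, each block extent occupies NFS4_BLOCK_EXTENT_SIZE (44)
bytes: a 16-byte device ID, three 64-bit big-endian values (file offset,
length, storage offset) and a 32-bit extent state.  The sketch below
decodes one such extent in user space.  struct wire_extent, get_be32(),
get_be64() and decode_extent() are hypothetical names used only for this
illustration; the kernel code uses the sunrpc XDR helpers instead.

```c
#include <stdint.h>
#include <string.h>

#define DEVICEID_SIZE		16
#define BLOCK_EXTENT_WIRE_SIZE	44	/* 16 + 8 + 8 + 8 + 4 */

/* Hypothetical host-order view of one decoded extent. */
struct wire_extent {
	uint8_t	 vol_id[DEVICEID_SIZE];
	uint64_t foff;	/* file offset, bytes */
	uint64_t len;	/* length, bytes */
	uint64_t soff;	/* storage offset, bytes */
	uint32_t es;	/* extent state */
};

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t get_be64(const uint8_t *p)
{
	return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Decode exactly BLOCK_EXTENT_WIRE_SIZE bytes starting at p. */
static void decode_extent(const uint8_t *p, struct wire_extent *ex)
{
	memcpy(ex->vol_id, p, DEVICEID_SIZE);
	p += DEVICEID_SIZE;
	ex->foff = get_be64(p);
	p += 8;
	ex->len = get_be64(p);
	p += 8;
	ex->soff = get_be64(p);
	p += 8;
	ex->es = get_be32(p);
}
```

A decoder along these lines would still need the checks the patch applies
afterwards: offset, length and storage offset aligned to the block size,
and an extent state of PNFS_BLOCK_READWRITE_DATA for committed extents.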

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 .../filesystems/nfs/pnfs-block-server.txt          |  37 ++++
 fs/nfsd/Makefile                                   |   2 +-
 fs/nfsd/blocklayout.c                              | 190 +++++++++++++++++++++
 fs/nfsd/blocklayoutxdr.c                           | 157 +++++++++++++++++
 fs/nfsd/blocklayoutxdr.h                           |  62 +++++++
 fs/nfsd/nfs4layouts.c                              |   8 +
 fs/nfsd/pnfs.h                                     |   1 +
 7 files changed, 456 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/filesystems/nfs/pnfs-block-server.txt
 create mode 100644 fs/nfsd/blocklayout.c
 create mode 100644 fs/nfsd/blocklayoutxdr.c
 create mode 100644 fs/nfsd/blocklayoutxdr.h

diff --git a/Documentation/filesystems/nfs/pnfs-block-server.txt b/Documentation/filesystems/nfs/pnfs-block-server.txt
new file mode 100644
index 0000000..2143673
--- /dev/null
+++ b/Documentation/filesystems/nfs/pnfs-block-server.txt
@@ -0,0 +1,37 @@
+pNFS block layout server user guide
+
+The Linux NFS server now supports the pNFS block layout extension.  In this
+case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition
+to handling all the metadata access to the NFS export also hands out layouts
+to the clients to directly access the underlying block devices that are
+shared with the client.
+
+To use pNFS block layouts with the Linux NFS server the exported file
+system needs to support the pNFS block layouts (currently just XFS), and the
+file system must sit on shared storage (typically iSCSI) that is accessible
+to the clients in addition to the MDS.  As of now the file system needs to
+sit directly on the exported volume; striping or concatenation of
+volumes on the MDS and clients is not supported yet.
+
+On the server, pNFS block volume support is enabled automatically if the
+file system supports it.  On the client make sure the kernel has the
+CONFIG_PNFS_BLOCK option enabled, the blkmapd daemon from nfs-utils is
+running, and the file system is mounted using NFSv4.1 (mount -o vers=4.1).
+
+If the nfsd server needs to fence a non-responding client it calls
+/sbin/nfsd-recall-failed with the first argument set to the IP address of
+the client, and the second argument set to the device node without the /dev
+prefix for the file system to be fenced. Below is an example file that shows
+how to translate the device into a serial number from SCSI EVPD 0x80:
+
+cat > /sbin/nfsd-recall-failed << 'EOF'
+#!/bin/sh
+
+CLIENT="$1"
+DEV="/dev/$2"
+EVPD=`sg_inq --page=0x80 ${DEV} | \
+	grep "Unit serial number:" | \
+	awk -F ': ' '{print $2}'`
+
+echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log
+EOF
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index 6cba933..9a6028e 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -17,4 +17,4 @@ nfsd-$(CONFIG_NFSD_V3)	+= nfs3proc.o nfs3xdr.o
 nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o
 nfsd-$(CONFIG_NFSD_V4)	+= nfs4proc.o nfs4xdr.o nfs4state.o nfs4idmap.o \
 			   nfs4acl.o nfs4callback.o nfs4recover.o
-nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
+nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o blocklayout.o blocklayoutxdr.o
diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
new file mode 100644
index 0000000..9bced6c
--- /dev/null
+++ b/fs/nfsd/blocklayout.c
@@ -0,0 +1,190 @@
+/*
+ * Copyright (c) 2014 Christoph Hellwig.
+ */
+#include <linux/exportfs.h>
+#include <linux/genhd.h>
+#include <linux/slab.h>
+#include <linux/raid_class.h>
+
+#include <linux/nfsd/debug.h>
+
+#include "blocklayoutxdr.h"
+#include "pnfs.h"
+
+#define NFSDDBG_FACILITY	NFSDDBG_PNFS
+
+
+static int
+nfsd4_block_get_device_info_simple(struct super_block *sb,
+		struct nfsd4_getdeviceinfo *gdp)
+{
+	struct pnfs_block_deviceaddr *dev;
+	struct pnfs_block_volume *b;
+
+	dev = kzalloc(sizeof(struct pnfs_block_deviceaddr) +
+		      sizeof(struct pnfs_block_volume), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+	gdp->gd_device = dev;
+
+	dev->nr_volumes = 1;
+	b = &dev->volumes[0];
+
+	b->type = PNFS_BLOCK_VOLUME_SIMPLE;
+	b->simple.sig_len = PNFS_BLOCK_UUID_LEN;
+	return sb->s_export_op->get_uuid(sb, b->simple.sig, &b->simple.sig_len,
+			&b->simple.offset);
+}
+
+static __be32
+nfsd4_block_proc_getdeviceinfo(struct super_block *sb,
+		struct nfsd4_getdeviceinfo *gdp)
+{
+	if (sb->s_bdev != sb->s_bdev->bd_contains)
+		return nfserr_inval;
+	return nfserrno(nfsd4_block_get_device_info_simple(sb, gdp));
+}
+
+static __be32
+nfsd4_block_proc_layoutget(struct inode *inode, const struct svc_fh *fhp,
+		struct nfsd4_layoutget *args)
+{
+	struct nfsd4_layout_seg *seg = &args->lg_seg;
+	struct super_block *sb = inode->i_sb;
+	u32 block_size = (1 << inode->i_blkbits);
+	struct pnfs_block_extent *bex;
+	struct iomap iomap;
+	u32 device_generation = 0;
+	int error;
+
+	/*
+	 * We do not attempt to support I/O smaller than the fs block size,
+	 * or not aligned to it.
+	 */
+	if (args->lg_minlength < block_size) {
+		dprintk("pnfsd: I/O too small\n");
+		goto out_layoutunavailable;
+	}
+	if (seg->offset & (block_size - 1)) {
+		dprintk("pnfsd: I/O misaligned\n");
+		goto out_layoutunavailable;
+	}
+
+	/*
+	 * Some clients barf on non-zero block numbers for NONE or INVALID
+	 * layouts, so make sure to zero the whole structure.
+	 */
+	error = -ENOMEM;
+	bex = kzalloc(sizeof(*bex), GFP_KERNEL);
+	if (!bex)
+		goto out_error;
+	args->lg_content = bex;
+
+	error = sb->s_export_op->map_blocks(inode, seg->offset, seg->length,
+					    &iomap, seg->iomode != IOMODE_READ,
+					    &device_generation);
+	if (error) {
+		if (error == -ENXIO)
+			goto out_layoutunavailable;
+		goto out_error;
+	}
+
+	if (iomap.length < args->lg_minlength) {
+		dprintk("pnfsd: extent smaller than minlength\n");
+		goto out_layoutunavailable;
+	}
+
+	switch (iomap.type) {
+	case IOMAP_MAPPED:
+		if (seg->iomode == IOMODE_READ)
+			bex->es = PNFS_BLOCK_READ_DATA;
+		else
+			bex->es = PNFS_BLOCK_READWRITE_DATA;
+		bex->soff = (iomap.blkno << 9);
+		break;
+	case IOMAP_UNWRITTEN:
+		if (seg->iomode & IOMODE_RW) {
+			/*
+			 * Crack monkey special case from section 2.3.1.
+			 */
+			if (args->lg_minlength == 0) {
+				dprintk("pnfsd: no soup for you!\n");
+				goto out_layoutunavailable;
+			}
+
+			bex->es = PNFS_BLOCK_INVALID_DATA;
+			bex->soff = (iomap.blkno << 9);
+			break;
+		}
+		/*FALLTHRU*/
+	case IOMAP_HOLE:
+		if (seg->iomode == IOMODE_READ) {
+			bex->es = PNFS_BLOCK_NONE_DATA;
+			break;
+		}
+		/*FALLTHRU*/
+	case IOMAP_DELALLOC:
+	default:
+		WARN(1, "pnfsd: filesystem returned %d extent\n", iomap.type);
+		goto out_layoutunavailable;
+	}
+
+	error = nfsd4_set_deviceid(&bex->vol_id, fhp, device_generation);
+	if (error)
+		goto out_error;
+	bex->foff = iomap.offset;
+	bex->len = iomap.length;
+
+	seg->offset = iomap.offset;
+	seg->length = iomap.length;
+
+	dprintk("GET: %lld:%lld %d\n", bex->foff, bex->len, bex->es);
+	return 0;
+
+out_error:
+	seg->length = 0;
+	return nfserrno(error);
+out_layoutunavailable:
+	seg->length = 0;
+	return nfserr_layoutunavailable;
+}
+
+static __be32
+nfsd4_block_proc_layoutcommit(struct inode *inode,
+		struct nfsd4_layoutcommit *lcp)
+{
+	loff_t new_size = lcp->lc_last_wr + 1;
+	struct iattr iattr = { .ia_valid = 0 };
+	struct iomap *iomaps;
+	int nr_iomaps;
+	int error;
+
+	nr_iomaps = nfsd4_block_decode_layoutupdate(lcp->lc_up_layout,
+			lcp->lc_up_len, &iomaps, 1 << inode->i_blkbits);
+	if (nr_iomaps < 0)
+		return nfserrno(nr_iomaps);
+
+	if (lcp->lc_mtime.tv_nsec == UTIME_NOW ||
+	    timespec_compare(&lcp->lc_mtime, &inode->i_mtime) < 0)
+		lcp->lc_mtime = current_fs_time(inode->i_sb);
+	iattr.ia_valid |= ATTR_ATIME | ATTR_CTIME | ATTR_MTIME;
+	iattr.ia_atime = iattr.ia_ctime = iattr.ia_mtime = lcp->lc_mtime;
+
+	if (new_size > i_size_read(inode)) {
+		iattr.ia_valid |= ATTR_SIZE;
+		iattr.ia_size = new_size;
+	}
+
+	error = inode->i_sb->s_export_op->commit_blocks(inode, iomaps,
+			nr_iomaps, &iattr);
+	kfree(iomaps);
+	return nfserrno(error);
+}
+
+const struct nfsd4_layout_ops bl_layout_ops = {
+	.proc_getdeviceinfo	= nfsd4_block_proc_getdeviceinfo,
+	.encode_getdeviceinfo	= nfsd4_block_encode_getdeviceinfo,
+	.proc_layoutget		= nfsd4_block_proc_layoutget,
+	.encode_layoutget	= nfsd4_block_encode_layoutget,
+	.proc_layoutcommit	= nfsd4_block_proc_layoutcommit,
+};
diff --git a/fs/nfsd/blocklayoutxdr.c b/fs/nfsd/blocklayoutxdr.c
new file mode 100644
index 0000000..9da89fd
--- /dev/null
+++ b/fs/nfsd/blocklayoutxdr.c
@@ -0,0 +1,157 @@
+/*
+ * Copyright (c) 2014 Christoph Hellwig.
+ */
+#include <linux/sunrpc/svc.h>
+#include <linux/exportfs.h>
+#include <linux/nfs4.h>
+
+#include "nfsd.h"
+#include "blocklayoutxdr.h"
+
+#define NFSDDBG_FACILITY	NFSDDBG_PNFS
+
+
+__be32
+nfsd4_block_encode_layoutget(struct xdr_stream *xdr,
+		struct nfsd4_layoutget *lgp)
+{
+	struct pnfs_block_extent *b = lgp->lg_content;
+	int len = sizeof(__be32) + 5 * sizeof(__be64) + sizeof(__be32);
+	__be32 *p;
+
+	p = xdr_reserve_space(xdr, sizeof(__be32) + len);
+	if (!p)
+		return nfserr_toosmall;
+
+	*p++ = cpu_to_be32(len);
+	*p++ = cpu_to_be32(1);		/* we always return a single extent */
+
+	p = xdr_encode_opaque_fixed(p, &b->vol_id,
+			sizeof(struct nfsd4_deviceid));
+	p = xdr_encode_hyper(p, b->foff);
+	p = xdr_encode_hyper(p, b->len);
+	p = xdr_encode_hyper(p, b->soff);
+	*p++ = cpu_to_be32(b->es);
+	return 0;
+}
+
+static int
+nfsd4_block_encode_volume(struct xdr_stream *xdr, struct pnfs_block_volume *b)
+{
+	__be32 *p;
+	int len;
+
+	switch (b->type) {
+	case PNFS_BLOCK_VOLUME_SIMPLE:
+		len = 4 + 4 + 8 + 4 + b->simple.sig_len;
+		p = xdr_reserve_space(xdr, len);
+		if (!p)
+			return -ETOOSMALL;
+
+		*p++ = cpu_to_be32(b->type);
+		*p++ = cpu_to_be32(1);	/* single signature */
+		p = xdr_encode_hyper(p, b->simple.offset);
+		p = xdr_encode_opaque(p, b->simple.sig, b->simple.sig_len);
+		break;
+	default:
+		return -ENOTSUPP;
+	}
+
+	return len;
+}
+
+__be32
+nfsd4_block_encode_getdeviceinfo(struct xdr_stream *xdr,
+		struct nfsd4_getdeviceinfo *gdp)
+{
+	struct pnfs_block_deviceaddr *dev = gdp->gd_device;
+	int len = sizeof(__be32), ret, i;
+	__be32 *p;
+
+	p = xdr_reserve_space(xdr, len + sizeof(__be32));
+	if (!p)
+		return nfserr_resource;
+
+	for (i = 0; i < dev->nr_volumes; i++) {
+		ret = nfsd4_block_encode_volume(xdr, &dev->volumes[i]);
+		if (ret < 0)
+			return nfserrno(ret);
+		len += ret;
+	}
+
+	/*
+	 * Fill in the overall length and number of volumes at the beginning
+	 * of the layout.
+	 */
+	*p++ = cpu_to_be32(len);
+	*p++ = cpu_to_be32(dev->nr_volumes);
+	return 0;
+}
+
+int
+nfsd4_block_decode_layoutupdate(__be32 *p, u32 len, struct iomap **iomapp,
+		u32 block_size)
+{
+	struct iomap *iomaps;
+	u32 nr_iomaps, expected, i;
+
+	if (len < sizeof(u32)) {
+		dprintk("%s: extent array too small: %u\n", __func__, len);
+		return -EINVAL;
+	}
+
+	nr_iomaps = be32_to_cpup(p++);
+	expected = sizeof(__be32) + nr_iomaps * NFS4_BLOCK_EXTENT_SIZE;
+	if (len != expected) {
+		dprintk("%s: extent array size mismatch: %u/%u\n",
+			__func__, len, expected);
+		return -EINVAL;
+	}
+
+	iomaps = kcalloc(nr_iomaps, sizeof(*iomaps), GFP_KERNEL);
+	if (!iomaps) {
+		dprintk("%s: failed to allocate extent array\n", __func__);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < nr_iomaps; i++) {
+		struct pnfs_block_extent bex;
+
+		memcpy(&bex.vol_id, p, sizeof(struct nfsd4_deviceid));
+		p += XDR_QUADLEN(sizeof(struct nfsd4_deviceid));
+
+		p = xdr_decode_hyper(p, &bex.foff);
+		if (bex.foff & (block_size - 1)) {
+			dprintk("%s: unaligned offset %lld\n",
+				__func__, bex.foff);
+			goto fail;
+		}
+		p = xdr_decode_hyper(p, &bex.len);
+		if (bex.len & (block_size - 1)) {
+			dprintk("%s: unaligned length %lld\n",
+				__func__, bex.len);
+			goto fail;
+		}
+		p = xdr_decode_hyper(p, &bex.soff);
+		if (bex.soff & (block_size - 1)) {
+			dprintk("%s: unaligned disk offset %lld\n",
+				__func__, bex.soff);
+			goto fail;
+		}
+		bex.es = be32_to_cpup(p++);
+		if (bex.es != PNFS_BLOCK_READWRITE_DATA) {
+			dprintk("%s: incorrect extent state %d\n",
+				__func__, bex.es);
+			goto fail;
+		}
+
+		iomaps[i].offset = bex.foff;
+		iomaps[i].length = bex.len;
+	}
+
+	*iomapp = iomaps;
+	return nr_iomaps;
+fail:
+	kfree(iomaps);
+	return -EINVAL;
+}
diff --git a/fs/nfsd/blocklayoutxdr.h b/fs/nfsd/blocklayoutxdr.h
new file mode 100644
index 0000000..fdc7903
--- /dev/null
+++ b/fs/nfsd/blocklayoutxdr.h
@@ -0,0 +1,62 @@
+#ifndef _NFSD_BLOCKLAYOUTXDR_H
+#define _NFSD_BLOCKLAYOUTXDR_H 1
+
+#include <linux/blkdev.h>
+#include "xdr4.h"
+
+struct iomap;
+struct xdr_stream;
+
+enum pnfs_block_extent_state {
+	PNFS_BLOCK_READWRITE_DATA	= 0,
+	PNFS_BLOCK_READ_DATA		= 1,
+	PNFS_BLOCK_INVALID_DATA		= 2,
+	PNFS_BLOCK_NONE_DATA		= 3,
+};
+
+struct pnfs_block_extent {
+	struct nfsd4_deviceid		vol_id;
+	u64				foff;
+	u64				len;
+	u64				soff;
+	enum pnfs_block_extent_state	es;
+};
+#define NFS4_BLOCK_EXTENT_SIZE		44
+
+enum pnfs_block_volume_type {
+	PNFS_BLOCK_VOLUME_SIMPLE	= 0,
+	PNFS_BLOCK_VOLUME_SLICE		= 1,
+	PNFS_BLOCK_VOLUME_CONCAT	= 2,
+	PNFS_BLOCK_VOLUME_STRIPE	= 3,
+};
+
+/*
+ * Random upper cap for the uuid length to avoid unbounded allocation.
+ * Not actually limited by the protocol.
+ */
+#define PNFS_BLOCK_UUID_LEN	128
+
+struct pnfs_block_volume {
+	enum pnfs_block_volume_type	type;
+	union {
+		struct {
+			u64		offset;
+			u32		sig_len;
+			u8		sig[PNFS_BLOCK_UUID_LEN];
+		} simple;
+	};
+};
+
+struct pnfs_block_deviceaddr {
+	u32				nr_volumes;
+	struct pnfs_block_volume	volumes[];
+};
+
+__be32 nfsd4_block_encode_getdeviceinfo(struct xdr_stream *xdr,
+		struct nfsd4_getdeviceinfo *gdp);
+__be32 nfsd4_block_encode_layoutget(struct xdr_stream *xdr,
+		struct nfsd4_layoutget *lgp);
+int nfsd4_block_decode_layoutupdate(__be32 *p, u32 len, struct iomap **iomapp,
+		u32 block_size);
+
+#endif /* _NFSD_BLOCKLAYOUTXDR_H */
diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 1650075..7152250 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -26,6 +26,7 @@ static struct nfsd4_callback_ops nfsd4_cb_layout_ops;
 static const struct lock_manager_operations nfsd4_layouts_lm_ops;
 
 const struct nfsd4_layout_ops *nfsd4_layout_ops[LAYOUT_TYPE_MAX] =  {
+	[LAYOUT_BLOCK_VOLUME]	= &bl_layout_ops,
 };
 
 /* pNFS device ID to export fsid mapping */
@@ -116,8 +117,15 @@ nfsd4_set_deviceid(struct nfsd4_deviceid *id, const struct svc_fh *fhp,
 
 void nfsd4_setup_layout_type(struct svc_export *exp)
 {
+	struct super_block *sb = exp->ex_path.mnt->mnt_sb;
+
 	if (exp->ex_flags & NFSEXP_NOPNFS)
 		return;
+
+	if (sb->s_export_op->get_uuid &&
+	    sb->s_export_op->map_blocks &&
+	    sb->s_export_op->commit_blocks)
+		exp->ex_layout_type = LAYOUT_BLOCK_VOLUME;
 }
 
 static void
diff --git a/fs/nfsd/pnfs.h b/fs/nfsd/pnfs.h
index a9616a4..fedb4d6 100644
--- a/fs/nfsd/pnfs.h
+++ b/fs/nfsd/pnfs.h
@@ -34,6 +34,7 @@ struct nfsd4_layout_ops {
 };
 
 extern const struct nfsd4_layout_ops *nfsd4_layout_ops[];
+extern const struct nfsd4_layout_ops bl_layout_ops;
 
 __be32 nfsd4_preprocess_layout_stateid(struct svc_rqst *rqstp,
 		struct nfsd4_compound_state *cstate, stateid_t *stateid,
-- 
1.9.1


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (8 preceding siblings ...)
  2015-01-22 11:10   ` [PATCH 14/20] exportfs: add methods for block layout exports Christoph Hellwig
@ 2015-01-22 11:10   ` Christoph Hellwig
  2015-01-29 20:52     ` J. Bruce Fields
  2015-01-22 11:10   ` [PATCH 17/20] xfs: update the superblock using a synchronous transaction in growfs Christoph Hellwig
                     ` (4 subsequent siblings)
  14 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:10 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

The code is already ready for it, and the pnfs layout commit code expects
to be able to pass a larger than 32-bit argument.
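
For illustration, this is what happens to a byte count when it passes
through a 32-bit type, which is how wide size_t is on 32-bit builds.  The
helper names are hypothetical and the snippet is plain user-space C, not
XFS code.

```c
#include <stdint.h>

/* What a 32-bit size_t would keep of a 64-bit byte count. */
static uint32_t count_as_32bit(uint64_t count)
{
	return (uint32_t)count;
}

/* Returns 1 if the count survives a round trip through 32 bits. */
static int fits_in_32bit(uint64_t count)
{
	return (uint64_t)count_as_32bit(count) == count;
}
```

A commit covering 4 GiB plus one 4 KiB block would be cut down to just the
4 KiB block, so only part of the range would be converted.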

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 2 +-
 fs/xfs/xfs_iomap.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index c980e2a..ccb1dd0 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -802,7 +802,7 @@ int
 xfs_iomap_write_unwritten(
 	xfs_inode_t	*ip,
 	xfs_off_t	offset,
-	size_t		count)
+	xfs_off_t	count)
 {
 	xfs_mount_t	*mp = ip->i_mount;
 	xfs_fileoff_t	offset_fsb;
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 411fbb8..8688e66 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -27,6 +27,6 @@ int xfs_iomap_write_delay(struct xfs_inode *, xfs_off_t, size_t,
 			struct xfs_bmbt_irec *);
 int xfs_iomap_write_allocate(struct xfs_inode *, xfs_off_t,
 			struct xfs_bmbt_irec *);
-int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, size_t);
+int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t);
 
 #endif /* __XFS_IOMAP_H__*/
-- 
1.9.1


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 17/20] xfs: update the superblock using a synchronous transaction in growfs
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (9 preceding siblings ...)
  2015-01-22 11:10   ` [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten Christoph Hellwig
@ 2015-01-22 11:10   ` Christoph Hellwig
  2015-01-22 11:10   ` [PATCH 19/20] xfs: implement pNFS export operations Christoph Hellwig
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:10 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

Growfs updates the secondary superblocks using synchronous unlogged
buffer writes after committing the updates to the primary superblock.

Mark the transaction to the primary superblock as synchronous so that
we guarantee it is committed to disk before we update the secondary
superblocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_fsops.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index fdc6422..74c6211 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -488,6 +488,7 @@ xfs_growfs_data_private(
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, nfree);
 	if (dpct)
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_IMAXPCT, dpct);
+	xfs_trans_set_sync(tp);
 	error = xfs_trans_commit(tp, 0);
 	if (error)
 		return error;
-- 
1.9.1
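
The ordering argument in the commit message can be illustrated with a toy userspace model. This is not XFS code: model_fs, model_grow() and model_crash_and_recover() are made-up names, and the model assumes the worst case, namely that an asynchronous commit has not reached the on-disk log when the crash happens while the unlogged secondary superblock writes have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy state: sizes recorded in the primary and secondary superblocks. */
struct model_fs {
	uint64_t primary_dblocks;	/* primary sb as seen after recovery */
	uint64_t pending_dblocks;	/* size carried by the growfs transaction */
	bool	 pending_on_disk;	/* has that transaction hit the on-disk log? */
	uint64_t secondary_dblocks;	/* written with unlogged, synchronous I/O */
};

/* Commit the primary update, then write the secondaries directly. */
void model_grow(struct model_fs *fs, uint64_t new_dblocks, bool sync)
{
	assert(new_dblocks >= fs->primary_dblocks);	/* growfs only grows */

	fs->pending_dblocks = new_dblocks;
	/* Assumption: only a synchronous commit is on disk when we return. */
	fs->pending_on_disk = sync;

	fs->secondary_dblocks = new_dblocks;
}

/* A crash loses whatever part of the log was only in memory. */
void model_crash_and_recover(struct model_fs *fs)
{
	if (fs->pending_on_disk)
		fs->primary_dblocks = fs->pending_dblocks;
	fs->pending_on_disk = false;
}

bool model_consistent(const struct model_fs *fs)
{
	return fs->primary_dblocks == fs->secondary_dblocks;
}
```

In this model a synchronous commit leaves both superblock copies agreeing after recovery, while an asynchronous one can leave the secondaries describing a larger filesystem than the recovered primary.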


* [PATCH 18/20] xfs: factor out a xfs_update_prealloc_flags() helper
  2015-01-22 11:09 a simple and scalable pNFS block layout server V2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2015-01-22 11:10 ` [PATCH 15/20] nfsd: pNFS block layout driver Christoph Hellwig
@ 2015-01-22 11:10 ` Christoph Hellwig
       [not found]   ` <1421925006-24231-19-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2015-01-22 16:04 ` a simple and scalable pNFS block layout server V2 Chuck Lever
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  8 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:10 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs, linux-fsdevel, xfs

This logic is duplicated in xfs_file_fallocate and xfs_ioc_space, and
we'll need another copy of it for pNFS block support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c  | 64 +++++++++++++++++++++++++++++++++++-------------------
 fs/xfs/xfs_inode.h |  9 ++++++++
 fs/xfs/xfs_ioctl.c | 50 ++++++++++--------------------------------
 3 files changed, 63 insertions(+), 60 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 13e974e..712d312 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -127,6 +127,42 @@ xfs_iozero(
 	return (-status);
 }
 
+int
+xfs_update_prealloc_flags(
+	struct xfs_inode	*ip,
+	enum xfs_prealloc_flags	flags)
+{
+	struct xfs_trans	*tp;
+	int			error;
+
+	tp = xfs_trans_alloc(ip->i_mount, XFS_TRANS_WRITEID);
+	error = xfs_trans_reserve(tp, &M_RES(ip->i_mount)->tr_writeid, 0, 0);
+	if (error) {
+		xfs_trans_cancel(tp, 0);
+		return error;
+	}
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+
+	if (!(flags & XFS_PREALLOC_INVISIBLE)) {
+		ip->i_d.di_mode &= ~S_ISUID;
+		if (ip->i_d.di_mode & S_IXGRP)
+			ip->i_d.di_mode &= ~S_ISGID;
+		xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
+	}
+
+	if (flags & XFS_PREALLOC_SET)
+		ip->i_d.di_flags |= XFS_DIFLAG_PREALLOC;
+	if (flags & XFS_PREALLOC_CLEAR)
+		ip->i_d.di_flags &= ~XFS_DIFLAG_PREALLOC;
+
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	if (flags & XFS_PREALLOC_SYNC)
+		xfs_trans_set_sync(tp);
+	return xfs_trans_commit(tp, 0);
+}
+
 /*
  * Fsync operations on directories are much simpler than on regular files,
  * as there is no file data to flush, and thus also no need for explicit
@@ -784,8 +820,8 @@ xfs_file_fallocate(
 {
 	struct inode		*inode = file_inode(file);
 	struct xfs_inode	*ip = XFS_I(inode);
-	struct xfs_trans	*tp;
 	long			error;
+	enum xfs_prealloc_flags	flags = 0;
 	loff_t			new_size = 0;
 
 	if (!S_ISREG(inode->i_mode))
@@ -822,6 +858,8 @@ xfs_file_fallocate(
 		if (error)
 			goto out_unlock;
 	} else {
+		flags |= XFS_PREALLOC_SET;
+
 		if (!(mode & FALLOC_FL_KEEP_SIZE) &&
 		    offset + len > i_size_read(inode)) {
 			new_size = offset + len;
@@ -839,28 +877,10 @@ xfs_file_fallocate(
 			goto out_unlock;
 	}
 
-	tp = xfs_trans_alloc(ip->i_mount, XFS_TRANS_WRITEID);
-	error = xfs_trans_reserve(tp, &M_RES(ip->i_mount)->tr_writeid, 0, 0);
-	if (error) {
-		xfs_trans_cancel(tp, 0);
-		goto out_unlock;
-	}
-
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
-	ip->i_d.di_mode &= ~S_ISUID;
-	if (ip->i_d.di_mode & S_IXGRP)
-		ip->i_d.di_mode &= ~S_ISGID;
-
-	if (!(mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE)))
-		ip->i_d.di_flags |= XFS_DIFLAG_PREALLOC;
-
-	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
-	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-
 	if (file->f_flags & O_DSYNC)
-		xfs_trans_set_sync(tp);
-	error = xfs_trans_commit(tp, 0);
+		flags |= XFS_PREALLOC_SYNC;
+
+	error = xfs_update_prealloc_flags(ip, flags);
 	if (error)
 		goto out_unlock;
 
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 4ed2ba9..bc220bc 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -377,6 +377,15 @@ int		xfs_droplink(struct xfs_trans *, struct xfs_inode *);
 int		xfs_bumplink(struct xfs_trans *, struct xfs_inode *);
 
 /* from xfs_file.c */
+enum xfs_prealloc_flags {
+	XFS_PREALLOC_SET	= (1 << 1),
+	XFS_PREALLOC_CLEAR	= (1 << 2),
+	XFS_PREALLOC_SYNC	= (1 << 3),
+	XFS_PREALLOC_INVISIBLE	= (1 << 4),
+};
+
+int		xfs_update_prealloc_flags(struct xfs_inode *,
+			enum xfs_prealloc_flags);
 int		xfs_zero_eof(struct xfs_inode *, xfs_off_t, xfs_fsize_t);
 int		xfs_iozero(struct xfs_inode *, loff_t, size_t);
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index a183198..d58bcd2 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -606,11 +606,8 @@ xfs_ioc_space(
 	unsigned int		cmd,
 	xfs_flock64_t		*bf)
 {
-	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_trans	*tp;
 	struct iattr		iattr;
-	bool			setprealloc = false;
-	bool			clrprealloc = false;
+	enum xfs_prealloc_flags	flags = 0;
 	int			error;
 
 	/*
@@ -630,6 +627,11 @@ xfs_ioc_space(
 	if (!S_ISREG(inode->i_mode))
 		return -EINVAL;
 
+	if (filp->f_flags & O_DSYNC)
+		flags |= XFS_PREALLOC_SYNC;
+	if (ioflags & XFS_IO_INVIS)
+		flags |= XFS_PREALLOC_INVISIBLE;
+
 	error = mnt_want_write_file(filp);
 	if (error)
 		return error;
@@ -673,25 +675,23 @@ xfs_ioc_space(
 	}
 
 	if (bf->l_start < 0 ||
-	    bf->l_start > mp->m_super->s_maxbytes ||
+	    bf->l_start > inode->i_sb->s_maxbytes ||
 	    bf->l_start + bf->l_len < 0 ||
-	    bf->l_start + bf->l_len >= mp->m_super->s_maxbytes) {
+	    bf->l_start + bf->l_len >= inode->i_sb->s_maxbytes) {
 		error = -EINVAL;
 		goto out_unlock;
 	}
 
 	switch (cmd) {
 	case XFS_IOC_ZERO_RANGE:
+		flags |= XFS_PREALLOC_SET;
 		error = xfs_zero_file_space(ip, bf->l_start, bf->l_len);
-		if (!error)
-			setprealloc = true;
 		break;
 	case XFS_IOC_RESVSP:
 	case XFS_IOC_RESVSP64:
+		flags |= XFS_PREALLOC_SET;
 		error = xfs_alloc_file_space(ip, bf->l_start, bf->l_len,
 						XFS_BMAPI_PREALLOC);
-		if (!error)
-			setprealloc = true;
 		break;
 	case XFS_IOC_UNRESVSP:
 	case XFS_IOC_UNRESVSP64:
@@ -701,6 +701,7 @@ xfs_ioc_space(
 	case XFS_IOC_ALLOCSP64:
 	case XFS_IOC_FREESP:
 	case XFS_IOC_FREESP64:
+		flags |= XFS_PREALLOC_CLEAR;
 		if (bf->l_start > XFS_ISIZE(ip)) {
 			error = xfs_alloc_file_space(ip, XFS_ISIZE(ip),
 					bf->l_start - XFS_ISIZE(ip), 0);
@@ -712,8 +713,6 @@ xfs_ioc_space(
 		iattr.ia_size = bf->l_start;
 
 		error = xfs_setattr_size(ip, &iattr);
-		if (!error)
-			clrprealloc = true;
 		break;
 	default:
 		ASSERT(0);
@@ -723,32 +722,7 @@ xfs_ioc_space(
 	if (error)
 		goto out_unlock;
 
-	tp = xfs_trans_alloc(mp, XFS_TRANS_WRITEID);
-	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_writeid, 0, 0);
-	if (error) {
-		xfs_trans_cancel(tp, 0);
-		goto out_unlock;
-	}
-
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
-
-	if (!(ioflags & XFS_IO_INVIS)) {
-		ip->i_d.di_mode &= ~S_ISUID;
-		if (ip->i_d.di_mode & S_IXGRP)
-			ip->i_d.di_mode &= ~S_ISGID;
-		xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
-	}
-
-	if (setprealloc)
-		ip->i_d.di_flags |= XFS_DIFLAG_PREALLOC;
-	else if (clrprealloc)
-		ip->i_d.di_flags &= ~XFS_DIFLAG_PREALLOC;
-
-	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-	if (filp->f_flags & O_DSYNC)
-		xfs_trans_set_sync(tp);
-	error = xfs_trans_commit(tp, 0);
+	error = xfs_update_prealloc_flags(ip, flags);
 
 out_unlock:
 	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
-- 
1.9.1
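
For readers following the refactoring, the flag handling in xfs_update_prealloc_flags() can be modelled in userspace roughly as below. This is a sketch only: struct model_inode, the MODEL_* constants and the value picked for the prealloc bit are assumptions for illustration, and the transaction and locking steps are reduced to two marker fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel mode bits and the on-disk prealloc flag. */
#define MODEL_S_ISUID		04000u
#define MODEL_S_ISGID		02000u
#define MODEL_S_IXGRP		00010u
#define MODEL_DIFLAG_PREALLOC	0x10u	/* hypothetical bit value */

enum model_prealloc_flags {
	MODEL_PREALLOC_SET	 = (1 << 1),
	MODEL_PREALLOC_CLEAR	 = (1 << 2),
	MODEL_PREALLOC_SYNC	 = (1 << 3),
	MODEL_PREALLOC_INVISIBLE = (1 << 4),
};

struct model_inode {
	unsigned int	mode;
	unsigned int	di_flags;
	bool		times_updated;	/* stands in for xfs_trans_ichgtime() */
	bool		sync_commit;	/* stands in for xfs_trans_set_sync() */
};

void model_update_prealloc_flags(struct model_inode *ip, unsigned int flags)
{
	/* The callers in the patch never request SET and CLEAR together. */
	assert(!((flags & MODEL_PREALLOC_SET) && (flags & MODEL_PREALLOC_CLEAR)));

	/* "Invisible" I/O leaves the mode bits and timestamps alone. */
	if (!(flags & MODEL_PREALLOC_INVISIBLE)) {
		ip->mode &= ~MODEL_S_ISUID;
		if (ip->mode & MODEL_S_IXGRP)
			ip->mode &= ~MODEL_S_ISGID;
		ip->times_updated = true;
	}

	if (flags & MODEL_PREALLOC_SET)
		ip->di_flags |= MODEL_DIFLAG_PREALLOC;
	if (flags & MODEL_PREALLOC_CLEAR)
		ip->di_flags &= ~MODEL_DIFLAG_PREALLOC;

	ip->sync_commit = (flags & MODEL_PREALLOC_SYNC) != 0;
}
```

Passing the caller's intent as a bitmask is what lets fallocate, the space ioctls and the pNFS code share one helper instead of each open-coding the transaction.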



* [PATCH 19/20] xfs: implement pNFS export operations
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (10 preceding siblings ...)
  2015-01-22 11:10   ` [PATCH 17/20] xfs: update the superblock using a synchronous transaction in growfs Christoph Hellwig
@ 2015-01-22 11:10   ` Christoph Hellwig
       [not found]     ` <1421925006-24231-20-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2015-01-22 11:10   ` [PATCH 20/20] xfs: recall pNFS layouts on conflicting access Christoph Hellwig
                     ` (2 subsequent siblings)
  14 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:10 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

Add operations to export pNFS block layouts from an XFS filesystem.  See
the previous commit adding the operations for an explanation of them.

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 fs/xfs/Makefile     |   1 +
 fs/xfs/xfs_export.c |   6 ++
 fs/xfs/xfs_fsops.c  |   2 +
 fs/xfs/xfs_iops.c   |   2 +-
 fs/xfs/xfs_iops.h   |   1 +
 fs/xfs/xfs_mount.h  |  11 +++
 fs/xfs/xfs_pnfs.c   | 243 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_pnfs.h   |  11 +++
 8 files changed, 276 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/xfs_pnfs.c
 create mode 100644 fs/xfs/xfs_pnfs.h

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index d617999..df68285 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -121,3 +121,4 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
 xfs-$(CONFIG_PROC_FS)		+= xfs_stats.o
 xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
 xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
+xfs-$(CONFIG_NFSD_PNFS)		+= xfs_pnfs.o
diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
index 5eb4a14..b97359b 100644
--- a/fs/xfs/xfs_export.c
+++ b/fs/xfs/xfs_export.c
@@ -30,6 +30,7 @@
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 #include "xfs_log.h"
+#include "xfs_pnfs.h"
 
 /*
  * Note that we only accept fileids which are long enough rather than allow
@@ -245,4 +246,9 @@ const struct export_operations xfs_export_operations = {
 	.fh_to_parent		= xfs_fs_fh_to_parent,
 	.get_parent		= xfs_fs_get_parent,
 	.commit_metadata	= xfs_fs_nfs_commit_metadata,
+#ifdef CONFIG_NFSD_PNFS
+	.get_uuid		= xfs_fs_get_uuid,
+	.map_blocks		= xfs_fs_map_blocks,
+	.commit_blocks		= xfs_fs_commit_blocks,
+#endif
 };
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 74c6211..99465ba 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -602,6 +602,8 @@ xfs_growfs_data(
 	if (!mutex_trylock(&mp->m_growlock))
 		return -EWOULDBLOCK;
 	error = xfs_growfs_data_private(mp, in);
+	if (!error)
+		mp->m_generation++;
 	mutex_unlock(&mp->m_growlock);
 	return error;
 }
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index c50311c..6ff84e8 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -496,7 +496,7 @@ xfs_setattr_mode(
 	inode->i_mode |= mode & ~S_IFMT;
 }
 
-static void
+void
 xfs_setattr_time(
 	struct xfs_inode	*ip,
 	struct iattr		*iattr)
diff --git a/fs/xfs/xfs_iops.h b/fs/xfs/xfs_iops.h
index 1c34e43..ea7a98e 100644
--- a/fs/xfs/xfs_iops.h
+++ b/fs/xfs/xfs_iops.h
@@ -32,6 +32,7 @@ extern void xfs_setup_inode(struct xfs_inode *);
  */
 #define XFS_ATTR_NOACL		0x01	/* Don't call posix_acl_chmod */
 
+extern void xfs_setattr_time(struct xfs_inode *ip, struct iattr *iattr);
 extern int xfs_setattr_nonsize(struct xfs_inode *ip, struct iattr *vap,
 			       int flags);
 extern int xfs_setattr_size(struct xfs_inode *ip, struct iattr *vap);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 22ccf69..12925d5 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -175,6 +175,17 @@ typedef struct xfs_mount {
 	struct workqueue_struct	*m_reclaim_workqueue;
 	struct workqueue_struct	*m_log_workqueue;
 	struct workqueue_struct *m_eofblocks_workqueue;
+
+	/*
+	 * Generation of the filesystem layout.  This is incremented by each
+	 * growfs, and used by the pNFS server to ensure the client updates
+	 * its view of the block device once it gets a layout that might
+	 * reference the newly added blocks.  Does not need to be persistent
+	 * as long as we only allow file system size increments, but if we
+	 * ever support shrinks it would have to be persisted in addition
+	 * to various other kinds of pain inflicted on the pNFS server.
+	 */
+	__uint32_t		m_generation;
 } xfs_mount_t;
 
 /*
diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
new file mode 100644
index 0000000..5d25f5d
--- /dev/null
+++ b/fs/xfs/xfs_pnfs.c
@@ -0,0 +1,243 @@
+/*
+ * Copyright (c) 2014 Christoph Hellwig.
+ */
+#include "xfs.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_trans.h"
+#include "xfs_log.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
+#include "xfs_error.h"
+#include "xfs_iomap.h"
+#include "xfs_shared.h"
+#include "xfs_pnfs.h"
+
+int
+xfs_fs_get_uuid(
+	struct super_block	*sb,
+	u8			*buf,
+	u32			*len,
+	u64			*offset)
+{
+	struct xfs_mount	*mp = XFS_M(sb);
+
+	if (*len < sizeof(uuid_t))
+		return -EINVAL;
+
+	memcpy(buf, &mp->m_sb.sb_uuid, sizeof(uuid_t));
+	*len = sizeof(uuid_t);
+	*offset = offsetof(struct xfs_dsb, sb_uuid);
+	return 0;
+}
+
+static void
+xfs_bmbt_to_iomap(
+	struct xfs_inode	*ip,
+	struct iomap		*iomap,
+	struct xfs_bmbt_irec	*imap)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+
+	if (imap->br_startblock == HOLESTARTBLOCK) {
+		iomap->blkno = -1;
+		iomap->type = IOMAP_HOLE;
+	} else if (imap->br_startblock == DELAYSTARTBLOCK) {
+		iomap->blkno = -1;
+		iomap->type = IOMAP_DELALLOC;
+	} else {
+		iomap->blkno = xfs_fsb_to_db(ip, imap->br_startblock);
+		if (imap->br_state == XFS_EXT_UNWRITTEN)
+			iomap->type = IOMAP_UNWRITTEN;
+		else
+			iomap->type = IOMAP_MAPPED;
+	}
+	iomap->offset = XFS_FSB_TO_B(mp, imap->br_startoff);
+	iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
+}
+
+/*
+ * Get a layout for the pNFS client.
+ */
+int
+xfs_fs_map_blocks(
+	struct inode		*inode,
+	loff_t			offset,
+	u64			length,
+	struct iomap		*iomap,
+	bool			write,
+	u32			*device_generation)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_bmbt_irec	imap;
+	xfs_fileoff_t		offset_fsb, end_fsb;
+	loff_t			limit;
+	int			bmapi_flags = XFS_BMAPI_ENTIRE;
+	int			nimaps = 1;
+	uint			lock_flags;
+	int			error = 0;
+
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return -EIO;
+	if (XFS_IS_REALTIME_INODE(ip))
+		return -ENXIO;
+
+	/*
+	 * Lock out any other I/O before we flush and invalidate the pagecache,
+	 * and then hand out a layout to the remote system.  This is very
+	 * similar to direct I/O, except that the synchronization is much more
+	 * complicated.  See the comment near xfs_break_layouts for a detailed
+	 * explanation.
+	 */
+	xfs_ilock(ip, XFS_IOLOCK_EXCL);
+
+	error = -EINVAL;
+	limit = mp->m_super->s_maxbytes;
+	if (!write)
+		limit = max(limit, round_up(i_size_read(inode),
+				     inode->i_sb->s_blocksize));
+	if (offset > limit)
+		goto out_unlock;
+	if (offset > limit - length)
+		length = limit - offset;
+
+	error = filemap_write_and_wait(inode->i_mapping);
+	if (error)
+		goto out_unlock;
+	error = invalidate_inode_pages2(inode->i_mapping);
+	if (WARN_ON_ONCE(error))
+		goto out_unlock;
+
+	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + length);
+	offset_fsb = XFS_B_TO_FSBT(mp, offset);
+
+	lock_flags = xfs_ilock_data_map_shared(ip);
+	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
+				&imap, &nimaps, bmapi_flags);
+	xfs_iunlock(ip, lock_flags);
+
+	if (error)
+		goto out_unlock;
+
+	if (write) {
+		enum xfs_prealloc_flags	flags = 0;
+
+		ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
+
+		if (!nimaps || imap.br_startblock == HOLESTARTBLOCK) {
+			error = xfs_iomap_write_direct(ip, offset, length,
+						       &imap, nimaps);
+			if (error)
+				goto out_unlock;
+
+			/*
+			 * Ensure the next transaction is committed
+			 * synchronously so that the blocks allocated and
+			 * handed out to the client are guaranteed to be
+			 * present even after a server crash.
+			 */
+			flags |= XFS_PREALLOC_SET | XFS_PREALLOC_SYNC;
+		}
+
+		error = xfs_update_prealloc_flags(ip, flags);
+		if (error)
+			goto out_unlock;
+	}
+	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+
+	xfs_bmbt_to_iomap(ip, iomap, &imap);
+	*device_generation = mp->m_generation;
+	return error;
+out_unlock:
+	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+	return error;
+}
+
+/*
+ * Make sure the blocks described by maps are stable on disk.  This includes
+ * converting any unwritten extents, flushing the disk cache and updating the
+ * time stamps.
+ *
+ * Note that we rely on the caller to always send us a timestamp update so that
+ * we always commit a transaction here.  If that stops being true we will have
+ * to manually flush the cache here similar to what the fsync code path does
+ * for datasyncs on files that have no dirty metadata.
+ */
+int
+xfs_fs_commit_blocks(
+	struct inode		*inode,
+	struct iomap		*maps,
+	int			nr_maps,
+	struct iattr		*iattr)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;
+	int			error, i;
+	loff_t			size;
+
+	ASSERT(iattr->ia_valid & (ATTR_ATIME|ATTR_CTIME|ATTR_MTIME));
+
+	xfs_ilock(ip, XFS_IOLOCK_EXCL);
+
+	size = i_size_read(inode);
+	if ((iattr->ia_valid & ATTR_SIZE) && iattr->ia_size > size)
+		size = iattr->ia_size;
+
+	for (i = 0; i < nr_maps; i++) {
+		u64 start, length, end;
+
+		start = maps[i].offset;
+		if (start > size)
+			continue;
+
+		end = start + maps[i].length;
+		if (end > size)
+			end = size;
+
+		length = end - start;
+		if (!length)
+			continue;
+
+		/*
+		 * Make sure reads through the pagecache see the new data.
+		 */
+		error = invalidate_inode_pages2_range(inode->i_mapping,
+					start >> PAGE_CACHE_SHIFT,
+					(end - 1) >> PAGE_CACHE_SHIFT);
+		WARN_ON_ONCE(error);
+
+		error = xfs_iomap_write_unwritten(ip, start, length);
+		if (error)
+			goto out_drop_iolock;
+	}
+
+	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE);
+	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
+	if (error)
+		goto out_drop_iolock;
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+	xfs_setattr_time(ip, iattr);
+	if (iattr->ia_valid & ATTR_SIZE) {
+		if (iattr->ia_size > i_size_read(inode)) {
+			i_size_write(inode, iattr->ia_size);
+			ip->i_d.di_size = iattr->ia_size;
+		}
+	}
+
+	xfs_trans_set_sync(tp);
+	error = xfs_trans_commit(tp, 0);
+
+out_drop_iolock:
+	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+	return error;
+}
diff --git a/fs/xfs/xfs_pnfs.h b/fs/xfs/xfs_pnfs.h
new file mode 100644
index 0000000..0d91255
--- /dev/null
+++ b/fs/xfs/xfs_pnfs.h
@@ -0,0 +1,11 @@
+#ifndef _XFS_PNFS_H
+#define _XFS_PNFS_H 1
+
+#ifdef CONFIG_NFSD_PNFS
+int xfs_fs_get_uuid(struct super_block *sb, u8 *buf, u32 *len, u64 *offset);
+int xfs_fs_map_blocks(struct inode *inode, loff_t offset, u64 length,
+		struct iomap *iomap, bool write, u32 *device_generation);
+int xfs_fs_commit_blocks(struct inode *inode, struct iomap *maps, int nr_maps,
+		struct iattr *iattr);
+#endif /* CONFIG_NFSD_PNFS */
+#endif /* _XFS_PNFS_H */
-- 
1.9.1
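
The per-extent arithmetic in xfs_fs_commit_blocks() (skip extents that start beyond the new size, clip the rest to it, then work out the page cache range to invalidate) can be sketched in userspace as below. clip_extent_to_size(), struct clipped_range and the fixed 4 KiB page shift are assumptions of this sketch, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumes 4 KiB pages */

struct clipped_range {
	uint64_t start;		/* byte offset of the range to convert */
	uint64_t length;	/* byte length after clipping to the size */
	uint64_t first_page;	/* first page index to invalidate */
	uint64_t last_page;	/* last page index to invalidate (inclusive) */
};

/*
 * Clip the extent [offset, offset + length) to the file size.
 * Returns false when nothing of the extent lies below size, which
 * corresponds to the two "continue" statements in the loop.
 */
bool clip_extent_to_size(uint64_t offset, uint64_t length, uint64_t size,
			 struct clipped_range *out)
{
	uint64_t end;

	assert(out != NULL);

	if (offset > size)
		return false;

	end = offset + length;
	if (end > size)
		end = size;
	if (end == offset)
		return false;

	out->start = offset;
	out->length = end - offset;
	out->first_page = offset >> MODEL_PAGE_SHIFT;
	out->last_page = (end - 1) >> MODEL_PAGE_SHIFT;
	return true;
}
```

For example, an extent at offset 4096 with length 8192 committed against a size of 10000 is clipped to 5904 bytes and covers pages 1 through 2.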


* [PATCH 20/20] xfs: recall pNFS layouts on conflicting access
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (11 preceding siblings ...)
  2015-01-22 11:10   ` [PATCH 19/20] xfs: implement pNFS export operations Christoph Hellwig
@ 2015-01-22 11:10   ` Christoph Hellwig
       [not found]     ` <1421925006-24231-21-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2015-01-22 20:01   ` a simple and scalable pNFS block layout server V2 J. Bruce Fields
  2015-01-22 20:06   ` J. Bruce Fields
  14 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 11:10 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

Recall all outstanding pNFS layouts on truncates, writes and similar extent
list modifying operations.

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 fs/xfs/xfs_file.c  | 14 ++++++++++++--
 fs/xfs/xfs_ioctl.c |  9 +++++++--
 fs/xfs/xfs_iops.c  | 11 ++++++++---
 fs/xfs/xfs_pnfs.c  | 31 +++++++++++++++++++++++++++++++
 fs/xfs/xfs_pnfs.h  |  7 +++++++
 5 files changed, 65 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 712d312..56dcfce 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -36,6 +36,7 @@
 #include "xfs_trace.h"
 #include "xfs_log.h"
 #include "xfs_icache.h"
+#include "xfs_pnfs.h"
 
 #include <linux/aio.h>
 #include <linux/dcache.h>
@@ -554,6 +555,10 @@ restart:
 	if (error)
 		return error;
 
+	error = xfs_break_layouts(inode, iolock);
+	if (error)
+		return error;
+
 	/*
 	 * If the offset is beyond the size of the file, we need to zero any
 	 * blocks that fall between the existing EOF and the start of this
@@ -822,6 +827,7 @@ xfs_file_fallocate(
 	struct xfs_inode	*ip = XFS_I(inode);
 	long			error;
 	enum xfs_prealloc_flags	flags = 0;
+	uint			iolock = XFS_IOLOCK_EXCL;
 	loff_t			new_size = 0;
 
 	if (!S_ISREG(inode->i_mode))
@@ -830,7 +836,11 @@ xfs_file_fallocate(
 		     FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
 		return -EOPNOTSUPP;
 
-	xfs_ilock(ip, XFS_IOLOCK_EXCL);
+	xfs_ilock(ip, iolock);
+	error = xfs_break_layouts(inode, &iolock);
+	if (error)
+		goto out_unlock;
+
 	if (mode & FALLOC_FL_PUNCH_HOLE) {
 		error = xfs_free_file_space(ip, offset, len);
 		if (error)
@@ -894,7 +904,7 @@ xfs_file_fallocate(
 	}
 
 out_unlock:
-	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+	xfs_iunlock(ip, iolock);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d58bcd2..0b64310 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -39,6 +39,7 @@
 #include "xfs_icache.h"
 #include "xfs_symlink.h"
 #include "xfs_trans.h"
+#include "xfs_pnfs.h"
 
 #include <linux/capability.h>
 #include <linux/dcache.h>
@@ -608,6 +609,7 @@ xfs_ioc_space(
 {
 	struct iattr		iattr;
 	enum xfs_prealloc_flags	flags = 0;
+	uint			iolock = XFS_IOLOCK_EXCL;
 	int			error;
 
 	/*
@@ -636,7 +638,10 @@ xfs_ioc_space(
 	if (error)
 		return error;
 
-	xfs_ilock(ip, XFS_IOLOCK_EXCL);
+	xfs_ilock(ip, iolock);
+	error = xfs_break_layouts(inode, &iolock);
+	if (error)
+		goto out_unlock;
 
 	switch (bf->l_whence) {
 	case 0: /*SEEK_SET*/
@@ -725,7 +730,7 @@ xfs_ioc_space(
 	error = xfs_update_prealloc_flags(ip, flags);
 
 out_unlock:
-	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+	xfs_iunlock(ip, iolock);
 	mnt_drop_write_file(filp);
 	return error;
 }
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 6ff84e8..b1e849a 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -37,6 +37,7 @@
 #include "xfs_da_btree.h"
 #include "xfs_dir2.h"
 #include "xfs_trans_space.h"
+#include "xfs_pnfs.h"
 
 #include <linux/capability.h>
 #include <linux/xattr.h>
@@ -970,9 +971,13 @@ xfs_vn_setattr(
 	int			error;
 
 	if (iattr->ia_valid & ATTR_SIZE) {
-		xfs_ilock(ip, XFS_IOLOCK_EXCL);
-		error = xfs_setattr_size(ip, iattr);
-		xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+		uint		iolock = XFS_IOLOCK_EXCL;
+
+		xfs_ilock(ip, iolock);
+		error = xfs_break_layouts(dentry->d_inode, &iolock);
+		if (!error)
+			error = xfs_setattr_size(ip, iattr);
+		xfs_iunlock(ip, iolock);
 	} else {
 		error = xfs_setattr_nonsize(ip, iattr, 0);
 	}
diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
index 5d25f5d..f2bbc2f 100644
--- a/fs/xfs/xfs_pnfs.c
+++ b/fs/xfs/xfs_pnfs.c
@@ -17,6 +17,37 @@
 #include "xfs_shared.h"
 #include "xfs_pnfs.h"
 
+/*
+ * Ensure that we do not have any outstanding pNFS layouts that can be
+ * used by clients to directly read from or write to this inode.
+ * This must be called before every operation that can remove blocks
+ * from the extent map.  Additionally we call it during the write
+ * operation, where we aren't concerned about exposing unallocated blocks
+ * but just want to provide basic synchronization between a local
+ * writer and pNFS clients.  mmap writes would also benefit from
+ * this sort of synchronization, but due to the tricky locking
+ * rules in the page fault path we don't bother.
+ */
+int
+xfs_break_layouts(
+	struct inode		*inode,
+	uint			*iolock)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	int			error;
+
+	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL));
+
+	while ((error = break_layout(inode, false)) == -EWOULDBLOCK) {
+		xfs_iunlock(ip, *iolock);
+		error = break_layout(inode, true);
+		*iolock = XFS_IOLOCK_EXCL;
+		xfs_ilock(ip, *iolock);
+	}
+
+	return error;
+}
+
 int
 xfs_fs_get_uuid(
 	struct super_block	*sb,
diff --git a/fs/xfs/xfs_pnfs.h b/fs/xfs/xfs_pnfs.h
index 0d91255..b7fbfce 100644
--- a/fs/xfs/xfs_pnfs.h
+++ b/fs/xfs/xfs_pnfs.h
@@ -7,5 +7,12 @@ int xfs_fs_map_blocks(struct inode *inode, loff_t offset, u64 length,
 		struct iomap *iomap, bool write, u32 *device_generation);
 int xfs_fs_commit_blocks(struct inode *inode, struct iomap *maps, int nr_maps,
 		struct iattr *iattr);
+
+int xfs_break_layouts(struct inode *inode, uint *iolock);
+#else
+static inline int xfs_break_layouts(struct inode *inode, uint *iolock)
+{
+	return 0;
+}
 #endif /* CONFIG_NFSD_PNFS */
 #endif /* _XFS_PNFS_H */
-- 
1.9.1
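
In C, "==" binds tighter than "=", so in a retry loop shaped like the one in xfs_break_layouts() the position of the closing parenthesis decides whether error receives the callee's return value or only the result of the comparison. The userspace sketch below contrasts the two placements. stub_break_layout() and struct stub_state are hypothetical stand-ins for break_layout() and the lease state, and the locking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct stub_state {
	int outstanding;	/* nonzero while a client still holds a layout */
	int hard_error;		/* if nonzero, every call fails with this value */
	int blocking_calls;	/* how often the blocking variant ran */
};

/* Hypothetical stand-in for break_layout(inode, wait). */
int stub_break_layout(struct stub_state *s, bool wait)
{
	assert(s != NULL);

	if (s->hard_error)
		return s->hard_error;
	if (!s->outstanding)
		return 0;
	if (!wait)
		return -EWOULDBLOCK;

	s->outstanding = 0;	/* pretend the client returned its layout */
	s->blocking_calls++;
	return 0;
}

/* error = (call == -EWOULDBLOCK): error only ever holds 0 or 1 here. */
int loop_compare_then_assign(struct stub_state *s)
{
	int error;

	while ((error = stub_break_layout(s, false) == -EWOULDBLOCK))
		error = stub_break_layout(s, true);
	return error;
}

/* (error = call) == -EWOULDBLOCK: error holds the real return value. */
int loop_assign_then_compare(struct stub_state *s)
{
	int error;

	while ((error = stub_break_layout(s, false)) == -EWOULDBLOCK)
		error = stub_break_layout(s, true);
	return error;
}
```

Both variants retry correctly on -EWOULDBLOCK, but only the second propagates any other error from the non-blocking call to the caller; the first reports success in that case.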


* Re: [PATCH 03/20] fs: add FL_LAYOUT lease type
  2015-01-22 11:09   ` [PATCH 03/20] fs: add FL_LAYOUT lease type Christoph Hellwig
@ 2015-01-22 15:45     ` Jeff Layton
       [not found]     ` <1421925006-24231-4-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  1 sibling, 0 replies; 63+ messages in thread
From: Jeff Layton @ 2015-01-22 15:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: J. Bruce Fields, linux-nfs, linux-fsdevel, xfs

On Thu, 22 Jan 2015 12:09:49 +0100
Christoph Hellwig <hch@lst.de> wrote:

> This (ab-)uses the file locking code to allow filesystems to recall
> outstanding pNFS layouts on a file.  This new lease type is similar but
> not quite the same as FL_DELEG.  A FL_LAYOUT lease can always be granted,
> and a per-filesystem lock (XFS iolock for the initial implementation)
> ensures no FL_LAYOUT leases are granted when we would need to recall them.
> 
> Also included are changes that allow multiple outstanding read
> leases of different types on the same file as long as they have a
> different owner.  This wasn't a problem until now as nfsd never set
> FL_LEASE leases, and no one else used FL_DELEG leases, but given that
> nfsd will also issue FL_LAYOUT leases we will have to handle it now.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/locks.c         | 14 ++++++++++----
>  include/linux/fs.h | 16 ++++++++++++++++
>  2 files changed, 26 insertions(+), 4 deletions(-)
> 

Abuse indeed. :)

I'd probably prefer to move this to a separate list within the i_flctx
instead of overloading the lease stuff, but I'm not religious about it.

I can live with this for now, and if and when we get around to
representing different types of file locks with different types of
structures, we can do that conversion then.

Acked-by: Jeff Layton <jlayton@primarydata.com>


* Re: a simple and scalable pNFS block layout server V2
  2015-01-22 11:09 a simple and scalable pNFS block layout server V2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2015-01-22 11:10 ` [PATCH 18/20] xfs: factor out a xfs_update_prealloc_flags() helper Christoph Hellwig
@ 2015-01-22 16:04 ` Chuck Lever
  2015-01-22 16:21   ` Christoph Hellwig
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  8 siblings, 1 reply; 63+ messages in thread
From: Chuck Lever @ 2015-01-22 16:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: J. Bruce Fields, linux-fsdevel, Linux NFS Mailing List, Jeff Layton, xfs

Hi Christoph-

Very nice, and welcome, work.

On Jan 22, 2015, at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:

> This series adds support for the pNFS operations in NFS v4.1, as well
> as a block layout driver that can export block based filesystems that
> implement a few additional export operations.  Support for XFS is
> provided in this series, but other filesystems could be added easily.

I’m still not clear on what layout types are supported. Clearly
the block layout type is supported. But you mention XFS here. Does
that mean file layouts are also supported now? If so, can you
briefly describe the architecture and any current limitations?

Did I miss documentation for how administrators construct layouts?

> The core pNFS code of course owes its heritage to the existing Linux
> pNFS server prototype, but except for a few bits and pieces in the
> XDR path nothing is left from it.
> 
> The design of this new pNFS server is fairly different from the old
> one - while the old one implemented very little semantics in nfsd
> and left almost everything to filesystems my implementation implements
> as much as possible in common nfsd code, then dispatches to a layout
> driver that still is part of nfsd and only then calls into the
> filesystem, thus keeping it free from intimate pNFS knowledge.
> 
> This version of the code has been rebased to the locks-for-3.20
> tree which adds a new lock context structure to the inode.  For now
> we still (ab)use the lease list like in the last version.  Adding
> a pNFS-specific list would duplicate a lot of code without much
> benefit.  But during this research I came up with a way to associate
> a nfs4_file with a struct file at open time which should allow
> to greatly simplify the pNFS and delegation code.  So stay tuned
> for some patches in this area!
> 
> For now this also doesn't take errata 3901 for rfc 5661 into account
> yet and sticks to the verified errata.  The changes in 3901 seem
> useful in the longer run once verified and will be implemented
> eventually. Note that the only existing pNFS block client (Linux)
> would not benefit from the longer layout lifetimes anyway.
> 
> More details are documented in the individual patch descriptions and
> code comments.
> 
> This code is also available from:
> 
> 	git://git.infradead.org/users/hch/pnfs.git pnfsd-for-3.20-2
> 
> Changes since V1:
> 	- rebased to the locks-for-3.20 tree
> 	- fixed a memory leak in the nfsd4_encode_getdeviceinfo error path
> 	- removed the always one lg_roc field
> 	- added a nopnfs export option
> 	- use a synchronous transaction in layoutget to simplify recovery
> 	- various XFS fixes pointed out by Dave
> 	- various documentation typo fixes

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: a simple and scalable pNFS block layout server V2
  2015-01-22 16:04 ` a simple and scalable pNFS block layout server V2 Chuck Lever
@ 2015-01-22 16:21   ` Christoph Hellwig
  0 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 16:21 UTC (permalink / raw)
  To: Chuck Lever
  Cc: J. Bruce Fields, Jeff Layton, Linux NFS Mailing List, linux-fsdevel, xfs

On Thu, Jan 22, 2015 at 11:04:15AM -0500, Chuck Lever wrote:
> I’m still not clear on what layout types are supported. Clearly
> the block layout type is supported. But you mention XFS here. Does
> that mean file layouts are also supported now? If so, can you
> briefly describe the architecture and any current limitations?
> 
> Did I miss documentation for how administrators construct layouts?

Only block layouts are supported at the moment.  They are automatically
enabled when an XFS filesystem is exported.  There is no requirement
for the administrator to construct a layout; the only requirement
for working block layouts is shared storage that is accessible
to the NFS server / MDS and the clients that wish to use pNFS.

If you are interested in file layout support, porting over the existing
file layout drivers from the old pNFS server tree should only be
a moderate effort, and at least the lexp code would be greatly
simplified by the common recall infrastructure in this series.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: a simple and scalable pNFS block layout server V2
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (12 preceding siblings ...)
  2015-01-22 11:10   ` [PATCH 20/20] xfs: recall pNFS layouts on conflicting access Christoph Hellwig
@ 2015-01-22 20:01   ` J. Bruce Fields
  2015-01-22 20:06   ` J. Bruce Fields
  14 siblings, 0 replies; 63+ messages in thread
From: J. Bruce Fields @ 2015-01-22 20:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Thu, Jan 22, 2015 at 12:09:46PM +0100, Christoph Hellwig wrote:
> This series adds support for the pNFS operations in NFS v4.1, as well
> as a block layout driver that can export block based filesystems that
> implement a few additional export operations.  Support for XFS is
> provided in this series, but other filesystems could be added easily.
> 
> The core pNFS code of course owes its heritage to the existing Linux
> pNFS server prototype, but except for a few bits and pieces in the
> XDR path nothing is left from it.
> 
> The design of this new pNFS server is fairly different from the old
> one - while the old one implemented very little semantics in nfsd
> and left almost everything to filesystems my implementation implements
> as much as possible in common nfsd code, then dispatches to a layout
> driver that still is part of nfsd and only then calls into the
> filesystem, thus keeping it free from intimate pNFS knowledge.
> 
> This version of the code has been rebased to the locks-for-3.20
> tree which adds a new lock context structure to the inode.  For now
> we still (ab)use the lease list like in the last version.  Adding
> a pNFS-specific list would duplicate a lot of code without much
> benefit.  But during this research I came up with a way to associate
> a nfs4_file with a struct file at open time which should allow
> to greatly simplify the pNFS and delegation code.  So stay tuned
> for some patches in this area!

Good, that could definitely use simplification.--b.

> 
> For now this also doesn't take errata 3901 for rfc 5661 into account
> yet and sticks to the verified errata.  The changes in 3901 seem
> useful in the longer run once verified and will be implemented
> eventually. Note that the only existing pNFS block client (Linux)
> would not benefit from the longer layout lifetimes anyway.
> 
> More details are documented in the individual patch descriptions and
> code comments.
> 
> This code is also available from:
> 
> 	git://git.infradead.org/users/hch/pnfs.git pnfsd-for-3.20-2
> 
> Changes since V1:
> 	- rebased to the locks-for-3.20 tree
> 	- fixed a memory leak in the nfsd4_encode_getdeviceinfo error path
> 	- removed the always one lg_roc field
> 	- added a nopnfs export option
> 	- use a synchronous transaction in layoutget to simplify recovery
> 	- various XFS fixes pointed out by Dave
> 	- various documentation typo fixes

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: a simple and scalable pNFS block layout server V2
       [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (13 preceding siblings ...)
  2015-01-22 20:01   ` a simple and scalable pNFS block layout server V2 J. Bruce Fields
@ 2015-01-22 20:06   ` J. Bruce Fields
  2015-01-22 20:20     ` Christoph Hellwig
       [not found]     ` <20150122200618.GI898-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  14 siblings, 2 replies; 63+ messages in thread
From: J. Bruce Fields @ 2015-01-22 20:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Thu, Jan 22, 2015 at 12:09:46PM +0100, Christoph Hellwig wrote:
> This series adds support for the pNFS operations in NFS v4.1, as well
> as a block layout driver that can export block based filesystems that
> implement a few additional export operations.  Support for XFS is
> provided in this series, but other filesystems could be added easily.
> 
> The core pNFS code of course owes its heritage to the existing Linux
> pNFS server prototype, but except for a few bits and pieces in the
> XDR path nothing is left from it.
> 
> The design of this new pNFS server is fairly different from the old
> one - while the old one implemented very little semantics in nfsd
> and left almost everything to filesystems my implementation implements
> as much as possible in common nfsd code, then dispatches to a layout
> driver that still is part of nfsd and only then calls into the
> filesystem, thus keeping it free from intimate pNFS knowledge.
> 
> This version of the code has been rebased to the locks-for-3.20
> tree which adds a new lock context structure to the inode.

By the way, Jeff, do you consider the lock changes ready?  Would you
mind if I pulled or cherry-picked them into the nfsd tree just to
simplify merging this stuff for 3.20?  (Assuming it's going to be
ready.)

--b.

> For now
> we still (ab)use the lease list like in the last version.  Adding
> a pNFS-specific list would duplicate a lot of code without much
> benefit.  But during this research I came up with a way to associate
> a nfs4_file with a struct file at open time which should allow
> to greatly simplify the pNFS and delegation code.  So stay tuned
> for some patches in this area!
> 
> For now this also doesn't take errata 3901 for rfc 5661 into account
> yet and sticks to the verified errata.  The changes in 3901 seem
> useful in the longer run once verified and will be implemented
> eventually. Note that the only existing pNFS block client (Linux)
> would not benefit from the longer layout lifetimes anyway.
> 
> More details are documented in the individual patch descriptions and
> code comments.
> 
> This code is also available from:
> 
> 	git://git.infradead.org/users/hch/pnfs.git pnfsd-for-3.20-2
> 
> Changes since V1:
> 	- rebased to the locks-for-3.20 tree
> 	- fixed a memory leak in the nfsd4_encode_getdeviceinfo error path
> 	- removed the always one lg_roc field
> 	- added a nopnfs export option
> 	- use a synchronous transaction in layoutget to simplify recovery
> 	- various XFS fixes pointed out by Dave
> 	- various documentation typo fixes

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 03/20] fs: add FL_LAYOUT lease type
       [not found]     ` <1421925006-24231-4-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2015-01-22 20:14       ` J. Bruce Fields
       [not found]         ` <20150122201442.GJ898-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  0 siblings, 1 reply; 63+ messages in thread
From: J. Bruce Fields @ 2015-01-22 20:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Thu, Jan 22, 2015 at 12:09:49PM +0100, Christoph Hellwig wrote:
> This (ab-)uses the file locking code to allow filesystems to recall
> outstanding pNFS layouts on a file.  This new lease type is similar but
> not quite the same as FL_DELEG.  A FL_LAYOUT lease can always be granted,
> and a per-filesystem lock (XFS iolock for the initial implementation)
> ensures no FL_LAYOUT leases are granted when we would need to recall them.

So when there's a conflicting operation it's xfs's responsibility to
call break_layout and wait for the recall?

(And what roughly is the set of conflicting operations?)

--b.

> 
> Also included are changes that allow multiple outstanding read
> leases of different types on the same file as long as they have a
> different owner.  This wasn't a problem until now as nfsd never set
> FL_LEASE leases, and no one else used FL_DELEG leases, but given that
> nfsd will also issue FL_LAYOUT leases we will have to handle it now.
> 
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> ---
>  fs/locks.c         | 14 ++++++++++----
>  include/linux/fs.h | 16 ++++++++++++++++
>  2 files changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 65350a23..6b9772d1 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -137,7 +137,7 @@
>  
>  #define IS_POSIX(fl)	(fl->fl_flags & FL_POSIX)
>  #define IS_FLOCK(fl)	(fl->fl_flags & FL_FLOCK)
> -#define IS_LEASE(fl)	(fl->fl_flags & (FL_LEASE|FL_DELEG))
> +#define IS_LEASE(fl)	(fl->fl_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
>  #define IS_OFDLCK(fl)	(fl->fl_flags & FL_OFDLCK)
>  
>  static bool lease_breaking(struct file_lock *fl)
> @@ -1371,6 +1371,8 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
>  
>  static bool leases_conflict(struct file_lock *lease, struct file_lock *breaker)
>  {
> +	if ((breaker->fl_flags & FL_LAYOUT) != (lease->fl_flags & FL_LAYOUT))
> +		return false;
>  	if ((breaker->fl_flags & FL_DELEG) && (lease->fl_flags & FL_LEASE))
>  		return false;
>  	return locks_conflict(breaker, lease);
> @@ -1594,11 +1596,14 @@ int fcntl_getlease(struct file *filp)
>   * conflict with the lease we're trying to set.
>   */
>  static int
> -check_conflicting_open(const struct dentry *dentry, const long arg)
> +check_conflicting_open(const struct dentry *dentry, const long arg, int flags)
>  {
>  	int ret = 0;
>  	struct inode *inode = dentry->d_inode;
>  
> +	if (flags & FL_LAYOUT)
> +		return 0;
> +
>  	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
>  		return -EAGAIN;
>  
> @@ -1647,7 +1652,7 @@ generic_add_lease(struct file *filp, long arg, struct file_lock **flp, void **pr
>  
>  	spin_lock(&ctx->flc_lock);
>  	time_out_leases(inode, &dispose);
> -	error = check_conflicting_open(dentry, arg);
> +	error = check_conflicting_open(dentry, arg, lease->fl_flags);
>  	if (error)
>  		goto out;
>  
> @@ -1703,7 +1708,7 @@ generic_add_lease(struct file *filp, long arg, struct file_lock **flp, void **pr
>  	 * precedes these checks.
>  	 */
>  	smp_mb();
> -	error = check_conflicting_open(dentry, arg);
> +	error = check_conflicting_open(dentry, arg, lease->fl_flags);
>  	if (error) {
>  		locks_unlink_lock_ctx(lease, &ctx->flc_lease_cnt);
>  		goto out;
> @@ -1787,6 +1792,7 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp,
>  			WARN_ON_ONCE(1);
>  			return -ENOLCK;
>  		}
> +
>  		return generic_add_lease(filp, arg, flp, priv);
>  	default:
>  		return -EINVAL;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index f87cb2f..d5658f4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -875,6 +875,7 @@ static inline struct file *get_file(struct file *f)
>  #define FL_DOWNGRADE_PENDING	256 /* Lease is being downgraded */
>  #define FL_UNLOCK_PENDING	512 /* Lease is being broken */
>  #define FL_OFDLCK	1024	/* lock is "owned" by struct file */
> +#define FL_LAYOUT	2048	/* outstanding pNFS layout */
>  
>  /*
>   * Special return value from posix_lock_file() and vfs_lock_file() for
> @@ -2036,6 +2037,16 @@ static inline int break_deleg_wait(struct inode **delegated_inode)
>  	return ret;
>  }
>  
> +static inline int break_layout(struct inode *inode, bool wait)
> +{
> +	smp_mb();
> +	if (inode->i_flctx && !list_empty_careful(&inode->i_flctx->flc_lease))
> +		return __break_lease(inode,
> +				wait ? O_WRONLY : O_WRONLY | O_NONBLOCK,
> +				FL_LAYOUT);
> +	return 0;
> +}
> +
>  #else /* !CONFIG_FILE_LOCKING */
>  static inline int locks_mandatory_locked(struct file *file)
>  {
> @@ -2091,6 +2102,11 @@ static inline int break_deleg_wait(struct inode **delegated_inode)
>  	return 0;
>  }
>  
> +static inline int break_layout(struct inode *inode, bool wait)
> +{
> +	return 0;
> +}
> +
>  #endif /* CONFIG_FILE_LOCKING */
>  
>  /* fs/open.c */
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 04/20] nfsd: factor out a helper to decode nfstime4 values
       [not found]   ` <1421925006-24231-5-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2015-01-22 20:15     ` J. Bruce Fields
  0 siblings, 0 replies; 63+ messages in thread
From: J. Bruce Fields @ 2015-01-22 20:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

I'll go ahead and take this one.--b.

On Thu, Jan 22, 2015 at 12:09:50PM +0100, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> ---
>  fs/nfsd/nfs4xdr.c | 43 ++++++++++++++++++++++++++-----------------
>  1 file changed, 26 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 15f7b73..884ffa3 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -234,6 +234,26 @@ static char *savemem(struct nfsd4_compoundargs *argp, __be32 *p, int nbytes)
>  	return ret;
>  }
>  
> +/*
> + * We require the high 32 bits of 'seconds' to be 0, and
> + * we ignore all 32 bits of 'nseconds'.
> + */
> +static __be32
> +nfsd4_decode_time(struct nfsd4_compoundargs *argp, struct timespec *tv)
> +{
> +	DECODE_HEAD;
> +	u64 sec;
> +
> +	READ_BUF(12);
> +	p = xdr_decode_hyper(p, &sec);
> +	tv->tv_sec = sec;
> +	tv->tv_nsec = be32_to_cpup(p++);
> +	if (tv->tv_nsec >= (u32)1000000000)
> +		return nfserr_inval;
> +
> +	DECODE_TAIL;
> +}
> +
>  static __be32
>  nfsd4_decode_bitmap(struct nfsd4_compoundargs *argp, u32 *bmval)
>  {
> @@ -267,7 +287,6 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval,
>  {
>  	int expected_len, len = 0;
>  	u32 dummy32;
> -	u64 sec;
>  	char *buf;
>  
>  	DECODE_HEAD;
> @@ -358,15 +377,10 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval,
>  		dummy32 = be32_to_cpup(p++);
>  		switch (dummy32) {
>  		case NFS4_SET_TO_CLIENT_TIME:
> -			/* We require the high 32 bits of 'seconds' to be 0, and we ignore
> -			   all 32 bits of 'nseconds'. */
> -			READ_BUF(12);
>  			len += 12;
> -			p = xdr_decode_hyper(p, &sec);
> -			iattr->ia_atime.tv_sec = (time_t)sec;
> -			iattr->ia_atime.tv_nsec = be32_to_cpup(p++);
> -			if (iattr->ia_atime.tv_nsec >= (u32)1000000000)
> -				return nfserr_inval;
> +			status = nfsd4_decode_time(argp, &iattr->ia_atime);
> +			if (status)
> +				return status;
>  			iattr->ia_valid |= (ATTR_ATIME | ATTR_ATIME_SET);
>  			break;
>  		case NFS4_SET_TO_SERVER_TIME:
> @@ -382,15 +396,10 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval,
>  		dummy32 = be32_to_cpup(p++);
>  		switch (dummy32) {
>  		case NFS4_SET_TO_CLIENT_TIME:
> -			/* We require the high 32 bits of 'seconds' to be 0, and we ignore
> -			   all 32 bits of 'nseconds'. */
> -			READ_BUF(12);
>  			len += 12;
> -			p = xdr_decode_hyper(p, &sec);
> -			iattr->ia_mtime.tv_sec = sec;
> -			iattr->ia_mtime.tv_nsec = be32_to_cpup(p++);
> -			if (iattr->ia_mtime.tv_nsec >= (u32)1000000000)
> -				return nfserr_inval;
> +			status = nfsd4_decode_time(argp, &iattr->ia_mtime);
> +			if (status)
> +				return status;
>  			iattr->ia_valid |= (ATTR_MTIME | ATTR_MTIME_SET);
>  			break;
>  		case NFS4_SET_TO_SERVER_TIME:
> -- 
> 1.9.1
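
For readers without the kernel XDR macros at hand, the decode step that the
new helper factors out can be sketched in plain userspace C.  On the wire an
nfstime4 is a 64-bit seconds value followed by a 32-bit nseconds value, both
big-endian, 12 bytes in total.  The struct and function names below are
hypothetical stand-ins, not the kernel's; the range check on nseconds mirrors
the one in the quoted patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the decoded time value. */
struct time_sketch {
	int64_t  sec;
	uint32_t nsec;
};

/*
 * Decode a 12-byte big-endian nfstime4 from buf.
 * Returns 0 on success, -1 if the buffer is too short or nseconds
 * is not below one billion.
 */
static int decode_nfstime4_sketch(const unsigned char *buf, size_t len,
				  struct time_sketch *tv)
{
	uint64_t sec = 0;
	uint32_t nsec = 0;
	size_t i;

	if (len < 12)
		return -1;

	for (i = 0; i < 8; i++)		/* 64-bit seconds, big-endian */
		sec = (sec << 8) | buf[i];
	for (i = 8; i < 12; i++)	/* 32-bit nseconds, big-endian */
		nsec = (nsec << 8) | buf[i];

	if (nsec >= 1000000000u)
		return -1;

	tv->sec = (int64_t)sec;
	tv->nsec = nsec;
	return 0;
}
```

The kernel helper reports failure as nfserr_inval rather than -1; the plain
integer return here only keeps the sketch free of kernel types.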

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 03/20] fs: add FL_LAYOUT lease type
       [not found]         ` <20150122201442.GJ898-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2015-01-22 20:18           ` Christoph Hellwig
  0 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 20:18 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christoph Hellwig, Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Thu, Jan 22, 2015 at 03:14:42PM -0500, J. Bruce Fields wrote:
> On Thu, Jan 22, 2015 at 12:09:49PM +0100, Christoph Hellwig wrote:
> > This (ab-)uses the file locking code to allow filesystems to recall
> > outstanding pNFS layouts on a file.  This new lease type is similar but
> > not quite the same as FL_DELEG.  A FL_LAYOUT lease can always be granted,
> > and a per-filesystem lock (XFS iolock for the initial implementation)
> > ensures no FL_LAYOUT leases are granted when we would need to recall them.
> 
> So when there's a conflicting operation it's xfs's responsibility to
> call break_layout and wait for the recall?
> 
> (And what roughly is the set of conflicting operations?)

The last patch in the series has a comment explaining this.

There are two categories:

 a) operations that need to be protected to maintain filesystem integrity:
    truncate, hole punch, collapse range, as well as any new operation
    that can deallocate blocks.
 b) operations that write data locally, and we want to provide a best
    effort attempt at data coherency between local use and pNFS clients.
    This covers writes in all variants, and should apply to shared mmap
    writes, but sleeping in the page fault handler is too hard to bother
    with for this sort of best effort coherency.
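
The calling pattern for category (a) can be modelled in userspace C roughly
as follows.  This is a toy model, not the XFS code: model_inode,
model_break_layout and model_truncate are hypothetical names, the "recall" is
simulated by clearing a counter, and returning -EWOULDBLOCK in the
non-blocking case is an assumption about how the underlying lease-breaking
code behaves.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy inode: only tracks how many layouts clients currently hold. */
struct model_inode {
	int outstanding_layouts;
};

/*
 * Stand-in for break_layout(inode, wait).
 * Nothing outstanding: succeed immediately.
 * Non-blocking caller: report that a recall would be needed.
 * Blocking caller: pretend the clients returned their layouts.
 */
static int model_break_layout(struct model_inode *inode, bool wait)
{
	if (inode->outstanding_layouts == 0)
		return 0;
	if (!wait)
		return -EWOULDBLOCK;
	inode->outstanding_layouts = 0;	/* simulated recall completes */
	return 0;
}

/* A block-deallocating operation recalls layouts before touching extents. */
static int model_truncate(struct model_inode *inode)
{
	int error = model_break_layout(inode, true);

	if (error)
		return error;
	/* ... blocks would be deallocated here, with no layouts outstanding ... */
	return 0;
}
```

The point of the model is only the ordering: the recall has to finish before
any block can be freed, because a client holding a layout may still write to
those blocks directly.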

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: a simple and scalable pNFS block layout server V2
  2015-01-22 20:06   ` J. Bruce Fields
@ 2015-01-22 20:20     ` Christoph Hellwig
       [not found]     ` <20150122200618.GI898-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  1 sibling, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2015-01-22 20:20 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christoph Hellwig, Jeff Layton, linux-nfs, linux-fsdevel, xfs

On Thu, Jan 22, 2015 at 03:06:18PM -0500, J. Bruce Fields wrote:
> By the way, Jeff, do you consider the lock changes ready?  Would you
> mind if I pulled or cherry-picked them into the nfsd tree just to
> simplify merging this stuff for 3.20?  (Assuming it's going to be
> ready.)

I did the rebase because he mentioned he wanted to get it into 3.20.

Note that the conflicts were fairly trivial, so it's something we
could just ignore and let Linus handle the merge in the worst case.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: a simple and scalable pNFS block layout server V2
       [not found]     ` <20150122200618.GI898-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2015-01-22 20:20       ` Jeff Layton
  0 siblings, 0 replies; 63+ messages in thread
From: Jeff Layton @ 2015-01-22 20:20 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christoph Hellwig, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Thu, 22 Jan 2015 15:06:18 -0500
"J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> wrote:

> On Thu, Jan 22, 2015 at 12:09:46PM +0100, Christoph Hellwig wrote:
> > This series adds support for the pNFS operations in NFS v4.1, as well
> > as a block layout driver that can export block based filesystems that
> > implement a few additional export operations.  Support for XFS is
> > provided in this series, but other filesystems could be added easily.
> > 
> > The core pNFS code of course owes its heritage to the existing Linux
> > pNFS server prototype, but except for a few bits and pieces in the
> > XDR path nothing is left from it.
> > 
> > The design of this new pNFS server is fairly different from the old
> > one - while the old one implemented very little semantics in nfsd
> > and left almost everything to filesystems my implementation implements
> > as much as possible in common nfsd code, then dispatches to a layout
> > driver that still is part of nfsd and only then calls into the
> > filesystem, thus keeping it free from intimate pNFS knowledge.
> > 
> > This version of the code has been rebased to the locks-for-3.20
> > tree which adds a new lock context structure to the inode.
> 
> By the way, Jeff, do you consider the lock changes ready?  Would you
> mind if I pulled or cherry-picked them into the nfsd tree just to
> simplify merging this stuff for 3.20?  (Assuming it's going to be
> ready.)
> 
> --b.
> 

Sure, that'd be fine.

The latest version of that set seems to be OK (tip commit ==
8116bf4cb62d33) and has been sitting in linux-next for a while now.
I'll let you know if I get any reports of problems in that area between
now and the merge window though.

Thanks,
Jeff

> > For now
> > we still (ab)use the lease list like in the last version.  Adding
> > a pNFS-specific list would duplicate a lot of code without much
> > benefit.  But during this research I came up with a way to associate
> > a nfs4_file with a struct file at open time which should allow
> > to greatly simplify the pNFS and delegation code.  So stay tuned
> > for some patches in this area!
> > 
> > For now this also doesn't take errata 3901 for rfc 5661 into account
> > yet and sticks to the verified errata.  The changes in 3901 seem
> > useful in the longer run once verified and will be implemented
> > eventually. Note that the only existing pNFS block client (Linux)
> > would not benefit from the longer layout lifetimes anyway.
> > 
> > More details are documented in the individual patch descriptions and
> > code comments.
> > 
> > This code is also available from:
> > 
> > 	git://git.infradead.org/users/hch/pnfs.git pnfsd-for-3.20-2
> > 
> > Changes since V1:
> > 	- rebased to the locks-for-3.20 tree
> > 	- fixed a memory leak in the nfsd4_encode_getdeviceinfo error path
> > 	- removed the always one lg_roc field
> > 	- added a nopnfs export option
> > 	- use a synchronous transaction in layoutget to simplify recovery
> > 	- various XFS fixes pointed out by Dave
> > 	- various documentation typo fixes


-- 
Jeff Layton <jlayton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 10/20] nfsd: implement pNFS operations
       [not found]     ` <1421925006-24231-11-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2015-01-29 20:33       ` J. Bruce Fields
       [not found]         ` <20150129203346.GA11064-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  0 siblings, 1 reply; 63+ messages in thread
From: J. Bruce Fields @ 2015-01-29 20:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Thu, Jan 22, 2015 at 12:09:56PM +0100, Christoph Hellwig wrote:
> Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
> LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
> outstanding layouts and devices.
> 
> Layout management is very straightforward, with a nfs4_layout_stateid
> structure that extends nfs4_stid to manage layout stateids as the
> top-level structure.  It is linked into the nfs4_file and nfs4_client
> structures like the other stateids, and contains a linked list of
> layouts that hang off the stateid.  The actual layout operations are
> implemented in layout drivers that are not part of this commit, but
> will be added later.
> 
> The worst part of this commit is the management of the pNFS device IDs,
> which suffers from a specification that is not sanely implementable due
> to the fact that the device-IDs are global and not bound to an export,
> and have a small enough size so that we can't store the fsid portion of
> a file handle, and must never be reused.  As we still do need to perform all
> export authentication and validation checks on a device ID passed to
> GETDEVICEINFO we are caught between a rock and a hard place.  To work
> around this issue we add a new hash that maps from a 64-bit integer to a
> fsid so that we can look up the export to authenticate against it,
> a 32-bit integer as a generation that we can bump when changing the device,
> and a currently unused 32-bit integer that could be used in the future
> to handle more than a single device per export.  Entries in this hash
> table are never deleted as we can't reuse the ids anyway, and would have
> a severe lifetime problem anyway as Linux export structures are temporary
> structures that can go away under load.

Looks to me like that works.
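
The packing described in the quoted text (a 64-bit index into the id-to-fsid
hash, a 32-bit generation, and a spare 32-bit field) fills the 16 opaque
bytes of an NFSv4.1 device ID exactly.  A minimal userspace sketch, with
hypothetical struct and function names that are not taken from the patch:

```c
#include <stdint.h>
#include <string.h>

/* An NFSv4.1 deviceid4 is 16 opaque bytes. */
#define DEVICEID4_SIZE 16

/* Hypothetical layout of the three fields described above. */
struct deviceid_sketch {
	uint64_t fsid_idx;	/* key into the id -> fsid hash */
	uint32_t generation;	/* bumped when the device changes */
	uint32_t reserved;	/* unused; room for several devices per export */
};

_Static_assert(sizeof(struct deviceid_sketch) == DEVICEID4_SIZE,
	       "sketch must fill the 16-byte device ID exactly");

/* Copy the fields into the opaque buffer (host byte order, illustration only). */
static void deviceid_pack(unsigned char out[DEVICEID4_SIZE],
			  const struct deviceid_sketch *id)
{
	memcpy(out, id, DEVICEID4_SIZE);
}

/* Recover the fields from an opaque buffer produced by deviceid_pack(). */
static void deviceid_unpack(struct deviceid_sketch *id,
			    const unsigned char in[DEVICEID4_SIZE])
{
	memcpy(id, in, DEVICEID4_SIZE);
}
```

Since the ID is opaque to the client, the server only needs to be able to
unpack what it packed itself; the hash lookup from fsid_idx to an export is
left out of the sketch.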

...
> diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
> new file mode 100644
> index 0000000..28c8ff2
> --- /dev/null
> +++ b/fs/nfsd/nfs4layouts.c
> @@ -0,0 +1,488 @@
...
> +__be32
> +nfsd4_preprocess_layout_stateid(struct svc_rqst *rqstp,
> +		struct nfsd4_compound_state *cstate, stateid_t *stateid,
> +		bool create, u32 layout_type, struct nfs4_layout_stateid **lsp)
> +{
> +	struct nfs4_layout_stateid *ls;
> +	struct nfs4_stid *stid;
> +	unsigned char typemask = NFS4_LAYOUT_STID;
> +	__be32 status;
> +
> +	if (create)
> +		typemask |= (NFS4_OPEN_STID | NFS4_LOCK_STID | NFS4_DELEG_STID);
> +
> +	status = nfsd4_lookup_stateid(cstate, stateid, typemask, &stid,
> +			net_generic(SVC_NET(rqstp), nfsd_net_id));
> +	if (status)
> +		goto out;
> +
> +	if (!fh_match(&cstate->current_fh.fh_handle,
> +		      &stid->sc_file->fi_fhandle)) {
> +		status = nfserr_bad_stateid;
> +		goto out_put_stid;
> +	}
> +
> +	if (stid->sc_type != NFS4_LAYOUT_STID) {
> +		ls = nfsd4_alloc_layout_stateid(cstate, stid, layout_type);
> +		nfs4_put_stid(stid);
> +
> +		status = nfserr_jukebox;
> +		if (!ls)
> +			goto out;
> +	} else {
> +		ls = container_of(stid, struct nfs4_layout_stateid, ls_stid);
> +
> +		status = nfserr_bad_stateid;
> +		if (stateid->si_generation > stid->sc_stateid.si_generation)
> +			goto out_put_stid;

Is there no old_stateid case for layout stateids?  And is there any
chance of wraparound?  (I was comparing to check_stateid_generation and
expecting the only difference to be the handling of the generation-zero
case.)
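
On the wraparound question: a plain "greater than" on a 32-bit counter
misorders values once the counter wraps.  One common remedy is serial-number
style comparison through a signed difference, sketched below in userspace C.
The helper name is hypothetical, and whether the layout stateid path should
use such a comparison is the open question here, not something the patch
does.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if generation a is newer than generation b, treating the 32-bit
 * counter as a serial number: the unsigned difference is reinterpreted
 * as signed, so values less than 2^31 steps ahead count as newer even
 * across a wrap from 0xffffffff to 0.
 */
static bool generation_after_sketch(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}
```

With this helper, generation 1 compares as newer than 0xfffffffe, which a
direct "a > b" test would get wrong.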

> +		if (layout_type != ls->ls_layout_type)
> +			goto out_put_stid;
> +	}
> +
> +	*lsp = ls;
> +	return 0;
> +
> +out_put_stid:
> +	nfs4_put_stid(stid);
> +out:
> +	return status;
> +}
> +
> +static inline u64
> +layout_end(struct nfsd4_layout_seg *seg)
> +{
> +	u64 end = seg->offset + seg->length;
> +	return end >= seg->offset ? seg->length : NFS4_MAX_UINT64;

Shouldn't that be

	return end >= seg->offset ? end : NFS4_MAX_UINT64;

?
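
A standalone userspace version of the suggested fix makes the two cases easy
to check.  struct layout_seg_sketch stands in for struct nfsd4_layout_seg
with only the two fields used here, and NFS4_MAX_UINT64 is defined locally as
UINT64_MAX; both are assumptions made for the sketch.

```c
#include <stdint.h>

#define NFS4_MAX_UINT64 UINT64_MAX	/* local stand-in for the kernel constant */

/* Stand-in for struct nfsd4_layout_seg: only offset and length. */
struct layout_seg_sketch {
	uint64_t offset;
	uint64_t length;
};

/*
 * End offset of a segment.  The unsigned sum wraps modulo 2^64, so a
 * result below the starting offset signals overflow and is clamped.
 */
static uint64_t layout_end_sketch(const struct layout_seg_sketch *seg)
{
	uint64_t end = seg->offset + seg->length;

	return end >= seg->offset ? end : NFS4_MAX_UINT64;
}
```

For offset 4096 and length 8192 this yields 12288; for offset 4096 and a
length of NFS4_MAX_UINT64 (the "to end of file" convention) the sum wraps and
the result is clamped to NFS4_MAX_UINT64.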

> +}
> +
> +static void
> +layout_update_len(struct nfsd4_layout_seg *lo, u64 end)
> +{
> +	if (end == NFS4_MAX_UINT64)
> +		lo->length = NFS4_MAX_UINT64;

Is this case necessary?

> +	else
> +		lo->length = end - lo->offset;
> +}
...
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index ac71d13..b813913 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
...
> @@ -1966,6 +2213,25 @@ static struct nfsd4_operation nfsd4_ops[] = {
>  		.op_get_currentstateid = (stateid_getter)nfsd4_get_freestateid,
>  		.op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
>  	},
> +#ifdef CONFIG_NFSD_PNFS
> +	[OP_GETDEVICEINFO] = {
> +		.op_func = (nfsd4op_func)nfsd4_getdeviceinfo,
> +		.op_flags = ALLOWED_WITHOUT_FH,
> +		.op_name = "OP_GETDEVICEINFO",
> +	},
> +	[OP_LAYOUTGET] = {
> +		.op_func = (nfsd4op_func)nfsd4_layoutget,
> +		.op_name = "OP_LAYOUTGET",
> +	},
> +	[OP_LAYOUTCOMMIT] = {
> +		.op_func = (nfsd4op_func)nfsd4_layoutcommit,
> +		.op_name = "OP_LAYOUTCOMMIT",
> +	},
> +	[OP_LAYOUTRETURN] = {
> +		.op_func = (nfsd4op_func)nfsd4_layoutreturn,
> +		.op_name = "OP_LAYOUTRETURN",
> +	},

Should any of these have OP_MODIFIES_SOMETHING set?  (Basically: would
we be in trouble if we successfully completed one of these operations and
then weren't able to encode the result?)

--b.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten
  2015-01-22 11:10   ` [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten Christoph Hellwig
@ 2015-01-29 20:52     ` J. Bruce Fields
  2015-02-02  7:30       ` Christoph Hellwig
  0 siblings, 1 reply; 63+ messages in thread
From: J. Bruce Fields @ 2015-01-29 20:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-nfs, Jeff Layton, xfs

Who can give us ACKs on these last five fs/xfs patches?  (And is it
going to cause trouble if they go in through the nfsd tree?)

--b.

On Thu, Jan 22, 2015 at 12:10:02PM +0100, Christoph Hellwig wrote:
> The code is already ready for it, and the pnfs layout commit code expects
> to be able to pass a larger than 32-bit argument.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iomap.c | 2 +-
>  fs/xfs/xfs_iomap.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index c980e2a..ccb1dd0 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -802,7 +802,7 @@ int
>  xfs_iomap_write_unwritten(
>  	xfs_inode_t	*ip,
>  	xfs_off_t	offset,
> -	size_t		count)
> +	xfs_off_t	count)
>  {
>  	xfs_mount_t	*mp = ip->i_mount;
>  	xfs_fileoff_t	offset_fsb;
> diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
> index 411fbb8..8688e66 100644
> --- a/fs/xfs/xfs_iomap.h
> +++ b/fs/xfs/xfs_iomap.h
> @@ -27,6 +27,6 @@ int xfs_iomap_write_delay(struct xfs_inode *, xfs_off_t, size_t,
>  			struct xfs_bmbt_irec *);
>  int xfs_iomap_write_allocate(struct xfs_inode *, xfs_off_t,
>  			struct xfs_bmbt_irec *);
> -int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, size_t);
> +int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t);
>  
>  #endif /* __XFS_IOMAP_H__*/
> -- 
> 1.9.1


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 18/20] xfs: factor out a xfs_update_prealloc_flags() helper
       [not found]   ` <1421925006-24231-19-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2015-02-01 23:06     ` Dave Chinner
  0 siblings, 0 replies; 63+ messages in thread
From: Dave Chinner @ 2015-02-01 23:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: J. Bruce Fields, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, Jeff Layton,
	xfs-VZNHf3L845pBDgjK7y7TUQ

On Thu, Jan 22, 2015 at 12:10:04PM +0100, Christoph Hellwig wrote:
> This logic is duplicated in xfs_file_fallocate and xfs_ioc_space, and
> we'll need another copy of it for pNFS block support.
> 
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>

This stands alone, so I'm pulling it into the XFS tree without
waiting for the rest of the patch set.

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten
  2015-01-29 20:52     ` J. Bruce Fields
@ 2015-02-02  7:30       ` Christoph Hellwig
  2015-02-02 19:24         ` Dave Chinner
  0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-02-02  7:30 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-fsdevel, linux-nfs, Jeff Layton, xfs

On Thu, Jan 29, 2015 at 03:52:32PM -0500, J. Bruce Fields wrote:
> Who can give us ACKs on these last five fs/xfs patches?  (And is it
> going to cause trouble if they go in through the nfsd tree?)


We'd need ACKs from Dave.  He already has pulled in two patches so
we might run into some conflicts.  Maybe the best idea is to add the
exportfs patch to both the XFS and nfsd trees, so each of them can
pull in the rest?  Or we could commit the two XFS preparation patches
to both tree and get something that compiles and works in the nfsd
tree.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 10/20] nfsd: implement pNFS operations
       [not found]         ` <20150129203346.GA11064-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2015-02-02 12:43           ` Christoph Hellwig
  2015-02-02 14:28             ` J. Bruce Fields
  0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-02-02 12:43 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Thu, Jan 29, 2015 at 03:33:46PM -0500, J. Bruce Fields wrote:
> Is there no old_stateid case for layout stateids?  And is there any
> chance of wraparound?  (I was comparing to check_stateid_generation and
> expecting the only difference to be the handling of the generation-zero
> case.)

12.5.3. explicitly mentions that LAYOUTGET and LAYOUTRETURN might be
outstanding and processed in parallel, and says that pNFS operations
use special stateid rules.  It does not explicitly say that old stateids
are ok, but the model described in there very much requires the server
to not reject them.

> > +static inline u64
> > +layout_end(struct nfsd4_layout_seg *seg)
> > +{
> > +	u64 end = seg->offset + seg->length;
> > +	return end >= seg->offset ? seg->length : NFS4_MAX_UINT64;
> 
> Shouldn't that be
> 
> 	return end >= seg->offset ? end : NFS4_MAX_UINT64;
> 
> ?

Yes.  This is an interesting one that sneaked in, and it turns out
besides disabling layout merging it didn't have adverse effects.

> > +}
> > +
> > +static void
> > +layout_update_len(struct nfsd4_layout_seg *lo, u64 end)
> > +{
> > +	if (end == NFS4_MAX_UINT64)
> > +		lo->length = NFS4_MAX_UINT64;
> 
> Is this case necessary?
> 
> > +	else
> > +		lo->length = end - lo->offset;


We use NFS4_MAX_UINT64 as a magic value for layouts until the
end of the file, as specified in the standard.  But because we do all
kinds of calculations using the end value we need to propagate
that magic from and to it.
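
As a minimal sketch of that round trip (assuming the corrected return
value from the review, a reduced userspace stand-in for
struct nfsd4_layout_seg, and UINT64_MAX in place of NFS4_MAX_UINT64;
all sketch_ names are hypothetical), the end saturates on wraparound
and the magic value is restored when converting back to a length:

```c
#include <stdint.h>

/* Userspace stand-in for NFS4_MAX_UINT64 (assumption for this sketch). */
#define SKETCH_MAX_UINT64 UINT64_MAX

/* Hypothetical reduced version of struct nfsd4_layout_seg. */
struct sketch_layout_seg {
	uint64_t offset;
	uint64_t length;
};

/* Return the exclusive end of the segment, saturating on wraparound. */
static inline uint64_t
sketch_layout_end(const struct sketch_layout_seg *seg)
{
	uint64_t end = seg->offset + seg->length;

	return end >= seg->offset ? end : SKETCH_MAX_UINT64;
}

/* Convert an end value back into a length, keeping the magic value. */
static inline void
sketch_layout_update_len(struct sketch_layout_seg *lo, uint64_t end)
{
	if (end == SKETCH_MAX_UINT64)
		lo->length = SKETCH_MAX_UINT64;
	else
		lo->length = end - lo->offset;
}
```

Without the special case, a segment with a non-zero offset that runs
to the end of the file would come back with a length of
end - offset, which no longer reads as "until the end of the file".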

> Should any of these have OP_MODIFIES_SOMETHING set?  (Basically: would
> we be in trouble if we successfully completed one of these operations and
> then weren't able to encode the result?)

All but GETDEVICEINFO should get it.

I've implemented all your suggested changes and will send out an update
after doing a little more testing.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 10/20] nfsd: implement pNFS operations
  2015-02-02 12:43           ` Christoph Hellwig
@ 2015-02-02 14:28             ` J. Bruce Fields
       [not found]               ` <20150202142832.GC22301-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  0 siblings, 1 reply; 63+ messages in thread
From: J. Bruce Fields @ 2015-02-02 14:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-nfs, Jeff Layton, xfs

On Mon, Feb 02, 2015 at 01:43:49PM +0100, Christoph Hellwig wrote:
> On Thu, Jan 29, 2015 at 03:33:46PM -0500, J. Bruce Fields wrote:
> > Is there no old_stateid case for layout stateids?  And is there any
> > chance of wraparound?  (I was comparing to check_stateid_generation and
> > expecting the only difference to be the handling of the generation-zero
> > case.)
> 
> 12.5.3. explicitly mentions that LAYOUTGET and LAYOUTRETURN might be
> outstanding and processed in parallel, and says that pNFS operations
> use special stateid rules.  It does not explicitly say that old stateids
> are ok, but the model described in there very much requires the server
> to not reject them.
> 
> > > +static inline u64
> > > +layout_end(struct nfsd4_layout_seg *seg)
> > > +{
> > > +	u64 end = seg->offset + seg->length;
> > > +	return end >= seg->offset ? seg->length : NFS4_MAX_UINT64;
> > 
> > Shouldn't that be
> > 
> > > 	return end >= seg->offset ? end : NFS4_MAX_UINT64;
> > 
> > ?
> 
> Yes.  This is an interesting one that sneaked in, and it turns out
> besides disabling layout merging it didn't have adverse effects.
> 
> > > +}
> > > +
> > > +static void
> > > +layout_update_len(struct nfsd4_layout_seg *lo, u64 end)
> > > +{
> > > +	if (end == NFS4_MAX_UINT64)
> > > +		lo->length = NFS4_MAX_UINT64;
> > 
> > Is this case necessary?
> > 
> > > +	else
> > > +		lo->length = end - lo->offset;
> 
> 
> We use NFS4_MAX_UINT64 as a magic value for layouts until the
> end of the file, as specified in the standard.  But because we do all
> kinds of calculations using the end value we need to propagate
> that magic from and to it.
> 
> > Should any of these have OP_MODIFIES_SOMETHING set?  (Basically: would
> > we be in trouble if we successfully completed one of these operations and
> > then weren't able to encode the result?)
> 
> All but GETDEVICEINFO should get it.
> 
> I've implemented all your suggested changes and will send out an update
> after doing a little more testing.

Thanks!

I didn't notice anything that looked like a big problem to me, so absent
any objections I'll commit the revised versions for 3.20 once we figure
out how to handle the xfs stuff.

--b.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 10/20] nfsd: implement pNFS operations
       [not found]               ` <20150202142832.GC22301-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2015-02-02 14:56                 ` Christoph Hellwig
       [not found]                   ` <20150202145619.GA18387-jcswGhMUV9g@public.gmane.org>
  0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-02-02 14:56 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Mon, Feb 02, 2015 at 09:28:32AM -0500, J. Bruce Fields wrote:
> I didn't notice anything that looked like a big problem to me, so absent
> any objections I'll commit the revised versions for 3.20 once we figure
> out how to handle the xfs stuff.

Do you want the resend to be on top of Jeff's locks tree, or do you
want it to be based just on the nfsd changes, which would require
a fairly trivial merge once both of the trees hit mainline?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 10/20] nfsd: implement pNFS operations
       [not found]                   ` <20150202145619.GA18387-jcswGhMUV9g@public.gmane.org>
@ 2015-02-02 15:00                     ` J. Bruce Fields
       [not found]                       ` <20150202150032.GD22301-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  0 siblings, 1 reply; 63+ messages in thread
From: J. Bruce Fields @ 2015-02-02 15:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Mon, Feb 02, 2015 at 03:56:19PM +0100, Christoph Hellwig wrote:
> On Mon, Feb 02, 2015 at 09:28:32AM -0500, J. Bruce Fields wrote:
> > I didn't notice anything that looked like a big problem to me, so absent
> > any objections I'll commit the revised versions for 3.20 once we figure
> > out how to handle the xfs stuff.
> 
> Do you want the resend to be on top of Jeff's locks tree, or do you
> want it to be based just on the nfsd changes, which would require
> a fairly trivial merge once both of the trees hit mainline?

I'm planning to pull Jeff's tree and then apply these on top.  (Even if
the conflict's fairly trivial I'm just happier being able to test the
combination exactly as they're committed.)

I'll do that now, should be pushed out in an hour or two.

--b.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 10/20] nfsd: implement pNFS operations
       [not found]                       ` <20150202150032.GD22301-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2015-02-02 18:56                         ` Christoph Hellwig
       [not found]                           ` <20150202185638.GB23319-jcswGhMUV9g@public.gmane.org>
  0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-02-02 18:56 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Mon, Feb 02, 2015 at 10:00:32AM -0500, J. Bruce Fields wrote:
> I'm planning to pull Jeff's tree and then apply these on top.  (Even if
> the conflict's fairly trivial I'm just happier being able to test the
> combination exactly as they're committed.)
> 
> I'll do that now, should be pushed out in an hour or two.

I've pushed out a pnfsd-for-3.20-3 branch to
git://git.infradead.org/users/hch/pnfs.git

Or do you want me to resend the whole thing?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten
  2015-02-02  7:30       ` Christoph Hellwig
@ 2015-02-02 19:24         ` Dave Chinner
  2015-02-02 19:43           ` Dave Chinner
  0 siblings, 1 reply; 63+ messages in thread
From: Dave Chinner @ 2015-02-02 19:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: J. Bruce Fields, linux-fsdevel, linux-nfs, Jeff Layton, xfs

On Mon, Feb 02, 2015 at 08:30:24AM +0100, Christoph Hellwig wrote:
> On Thu, Jan 29, 2015 at 03:52:32PM -0500, J. Bruce Fields wrote:
> > Who can give us ACKs on these last five fs/xfs patches?  (And is it
> > going to cause trouble if they go in through the nfsd tree?)
> 
> 
> We'd need ACKs from Dave.  He already has pulled in two patches so
> we might run into some conflicts.  Maybe the best idea is to add the
> exportfs patch to both the XFS and nfsd trees, so each of them can
> pull in the rest?  Or we could commit the two XFS preparation patches
> to both tree and get something that compiles and works in the nfsd
> tree.

This patch has already been committed to the XFS repo.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten
  2015-02-02 19:24         ` Dave Chinner
@ 2015-02-02 19:43           ` Dave Chinner
  2015-02-02 19:48             ` J. Bruce Fields
  2015-02-04  7:57             ` Christoph Hellwig
  0 siblings, 2 replies; 63+ messages in thread
From: Dave Chinner @ 2015-02-02 19:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: J. Bruce Fields, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, Jeff Layton,
	xfs-VZNHf3L845pBDgjK7y7TUQ

On Tue, Feb 03, 2015 at 06:24:04AM +1100, Dave Chinner wrote:
> On Mon, Feb 02, 2015 at 08:30:24AM +0100, Christoph Hellwig wrote:
> > On Thu, Jan 29, 2015 at 03:52:32PM -0500, J. Bruce Fields wrote:
> > > Who can give us ACKs on these last five fs/xfs patches?  (And is it
> > > going to cause trouble if they go in through the nfsd tree?)
> > 
> > 
> > We'd need ACKs from Dave.  He already has pulled in two patches so
> > we might run into some conflicts.  Maybe the best idea is to add the
> > exportfs patch to both the XFS and nfsd trees, so each of them can
> > pull in the rest?  Or we could commit the two XFS preparation patches
> > to both tree and get something that compiles and works in the nfsd
> > tree.
> 
> This patch has already been committed to the XFS repo.

And it looks like I missed the sync transaction on growfs patch,
too, so I'll commit that one later today.

As to the pNFSD specific changes, I haven't really looked them over
in any great detail yet. My main concern is that there are no
specific regression tests for this yet, I'm not sure how we go about
verifying it actually works properly and we don't inadvertently
break it in the future. Christoph?

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten
  2015-02-02 19:43           ` Dave Chinner
@ 2015-02-02 19:48             ` J. Bruce Fields
  2015-02-03 18:35               ` Christoph Hellwig
  2015-02-04  7:57             ` Christoph Hellwig
  1 sibling, 1 reply; 63+ messages in thread
From: J. Bruce Fields @ 2015-02-02 19:48 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, Jeff Layton,
	xfs-VZNHf3L845pBDgjK7y7TUQ

On Tue, Feb 03, 2015 at 06:43:00AM +1100, Dave Chinner wrote:
> On Tue, Feb 03, 2015 at 06:24:04AM +1100, Dave Chinner wrote:
> > On Mon, Feb 02, 2015 at 08:30:24AM +0100, Christoph Hellwig wrote:
> > > On Thu, Jan 29, 2015 at 03:52:32PM -0500, J. Bruce Fields wrote:
> > > > Who can give us ACKs on these last five fs/xfs patches?  (And is it
> > > > going to cause trouble if they go in through the nfsd tree?)
> > > 
> > > 
> > > We'd need ACKs from Dave.  He already has pulled in two patches so
> > > we might run into some conflicts.  Maybe the best idea is to add the
> > > exportfs patch to both the XFS and nfsd trees, so each of them can
> > > pull in the rest?  Or we could commit the two XFS preparation patches
> > > to both tree and get something that compiles and works in the nfsd
> > > tree.
> > 
> > This patch has already been committed to the XFS repo.
> 
> And it looks like I missed the sync transaction on growfs patch,
> too, so I'll commit that one later today.
> 
> As to the pNFSD specific changes, I haven't really looked them over
> in any great detail yet. My main concern is that there are no
> specific regression tests for this yet, I'm not sure how we go about
> verifying it actually works properly and we don't inadvertently
> break it in the future. Christoph?

Previously: http://lkml.kernel.org/r/20150106175611.GA16413@lst.de

	>       - any advice on testing?  Is there some simple
	>       virtual setup that would allow any loser with no special
	>       hardware (e.g., me) to check whether they've broken the
	>       block server?

	Run two kvm VMs that share the same disk.  Create an XFS
	filesystem on the MDS, and export it.  If the client has blkmapd
	running (on Debian it needs to be started manually) it will use
	pNFS for accessing the filesystem.  Verify that using the
	per-operation counters in /proc/self/mountstats.  Repeat with
	additional clients as necessary.

	Alternatively set up a simple iSCSI target using tgt or lio and
	connect to it from multiple clients.

Which sounds reasonable to me, but I haven't tried to incorporate this
into my regression testing yet.
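
For anyone wanting to automate the "verify using the per-operation
counters" step, here is a rough sketch of a helper that pulls the
operation count out of one line of that file.  It assumes each
per-operation line has the form "<OP>: <ops> <further counters...>"
with the operation count first; sketch_op_count is a hypothetical name
and not part of any existing tool:

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one per-operation line of the assumed form
 * "\tLAYOUTGET: 12 12 0 ..." and return the first counter (the number
 * of operations), or -1 if the line belongs to a different operation
 * or cannot be parsed.
 */
static long long
sketch_op_count(const char *line, const char *op)
{
	size_t oplen = strlen(op);
	unsigned long long ops;

	/* Skip leading whitespace before the operation name. */
	while (*line == ' ' || *line == '\t')
		line++;

	/* Require an exact name match followed by a colon. */
	if (strncmp(line, op, oplen) != 0 || line[oplen] != ':')
		return -1;

	if (sscanf(line + oplen + 1, "%llu", &ops) != 1)
		return -1;

	return (long long)ops;
}
```

A test could read the file line by line before and after some I/O and
check that the LAYOUTGET count went up on the client.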

--b.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 10/20] nfsd: implement pNFS operations
       [not found]                           ` <20150202185638.GB23319-jcswGhMUV9g@public.gmane.org>
@ 2015-02-03 16:08                             ` J. Bruce Fields
  0 siblings, 0 replies; 63+ messages in thread
From: J. Bruce Fields @ 2015-02-03 16:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Mon, Feb 02, 2015 at 07:56:38PM +0100, Christoph Hellwig wrote:
> On Mon, Feb 02, 2015 at 10:00:32AM -0500, J. Bruce Fields wrote:
> > I'm planning to pull Jeff's tree and then apply these on top.  (Even if
> > the conflict's fairly trivial I'm just happier being able to test the
> > combination exactly as they're committed.)
> > 
> > I'll do that now, should be pushed out in an hour or two.
> 
> I've pushed out a pnfsd-for-3.20-3 branch to
> git://git.infradead.org/users/hch/pnfs.git
> 
> Or do you want me to resend the whole thing?

No, looks fine to me as is.  Thanks!  I'll play around with it a little
more and then push it out.

LAYOUTGET is going to be annoying if we need big layouts.  We'll have to
figure out what to do with that reply size estimation, and clients will
have to figure out how to get the maxresponsesize_cached right, I
guess?  Oh well.

--b.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten
  2015-02-02 19:48             ` J. Bruce Fields
@ 2015-02-03 18:35               ` Christoph Hellwig
       [not found]                 ` <20150203183533.GA16929-jcswGhMUV9g@public.gmane.org>
  0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-02-03 18:35 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Dave Chinner, Christoph Hellwig, linux-fsdevel, linux-nfs,
	Jeff Layton, xfs

On Mon, Feb 02, 2015 at 02:48:26PM -0500, J. Bruce Fields wrote:
> Previously: http://lkml.kernel.org/r/20150106175611.GA16413@lst.de
> 
> 	>       - any advice on testing?  Is there some simple
> 	>       virtual setup that would allow any loser with no special
> 	>       hardware (e.g., me) to check whether they've broken the
> 	>       block server?
> 
> 	Run two kvm VMs that share the same disk.  Create an XFS
> 	filesystem on the MDS, and export it.  If the client has blkmapd
> 	running (on Debian it needs to be started manually) it will use
> 	pNFS for accessing the filesystem.  Verify that using the
> 	per-operation counters in /proc/self/mountstats.  Repeat with
> 	additional clients as necessary.
> 
> 	Alternatively set up a simple iSCSI target using tgt or lio and
> 	connect to it from multiple clients.
> 
> Which sounds reasonable to me, but I haven't tried to incorporate this
> into my regression testing yet.

Additionally I can offer the following script to generate recalls,
which don't really happen during normal operation.  I don't
really know how to write a proper testcase that coordinates access
to the exported filesystem and nfs unless it runs locally on the same node,
though.  It would need some higher level, network aware test harness:

----- snip -----
#!/bin/sh

set +x

# wait for grace period
touch /mnt/nfs1/foo

dd if=/dev/zero of=/mnt/nfs1/foo bs=128M count=32 conv=fdatasync oflag=direct &

sleep 2

echo "" > /mnt/test/foo && echo "recall done"

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten
  2015-02-02 19:43           ` Dave Chinner
  2015-02-02 19:48             ` J. Bruce Fields
@ 2015-02-04  7:57             ` Christoph Hellwig
       [not found]               ` <20150204075756.GA763-jcswGhMUV9g@public.gmane.org>
  1 sibling, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-02-04  7:57 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, J. Bruce Fields,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, Jeff Layton,
	xfs-VZNHf3L845pBDgjK7y7TUQ

On Tue, Feb 03, 2015 at 06:43:00AM +1100, Dave Chinner wrote:
> As to the pNFSD specific changes, I haven't really looked them over
> in any great detail yet. My main concern is that there are no
> specific regression tests for this yet, I'm not sure how we go about
> verifying it actually works properly and we don't inadvertently
> break it in the future. Christoph?

Any chance you could review them this week so we can get them
merged in time for 3.20?  In the worst case Bruce will have to pull
the xfs tree into the nfsd tree so that we have all the dependencies
available.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten
       [not found]               ` <20150204075756.GA763-jcswGhMUV9g@public.gmane.org>
@ 2015-02-04 20:02                 ` Dave Chinner
  0 siblings, 0 replies; 63+ messages in thread
From: Dave Chinner @ 2015-02-04 20:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: J. Bruce Fields, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, Jeff Layton,
	xfs-VZNHf3L845pBDgjK7y7TUQ

On Wed, Feb 04, 2015 at 08:57:56AM +0100, Christoph Hellwig wrote:
> On Tue, Feb 03, 2015 at 06:43:00AM +1100, Dave Chinner wrote:
> > As to the pNFSD specific changes, I haven't really looked them over
> > in any great detail yet. My main concern is that there are no
> > specific regression tests for this yet, I'm not sure how we go about
> > verifying it actually works properly and we don't inadvertently
> > break it in the future. Christoph?
> 
> Any chance you could review them this week so we can get them
> merged in time for 3.20?  In the worst case Bruce will have to pull
> the xfs tree into the nfsd tree so that we have all the dependencies
> available.

I'm working my way through them. I'm just about to pull in the
growfs transaction patch (missed it last time around), and I'll try to
have a decent look over the other two patches later today.

I'm not sure I have any bandwidth to test them yet, but perhaps if I
add a one-time message "Experimental feature in use" when the code
is first executed then it will be OK to merge (i.e. process similar
to delayed logging and CRC introduction). Once we've got more
confidence that it's all working properly, then we can remove the
experimental tag from it. Does that sound like a reasonable
approach to take?
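
To illustrate what such a one-time message amounts to, a userspace
sketch follows.  The function name is hypothetical and this is not the
XFS logging interface; real kernel code would use its own warning
helpers and would have to consider concurrent callers:

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Print the warning on the first call only.  Returns true if this call
 * printed the message.  Not thread-safe; this only shows the latch.
 */
static bool
sketch_warn_experimental_once(FILE *out)
{
	static bool warned;

	if (warned)
		return false;
	warned = true;
	fprintf(out, "Experimental feature in use\n");
	return true;
}
```

Removing the experimental tag later is then a matter of dropping the
single call site.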

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 19/20] xfs: implement pNFS export operations
       [not found]     ` <1421925006-24231-20-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2015-02-05  0:47       ` Dave Chinner
  2015-02-05  7:08         ` Christoph Hellwig
  0 siblings, 1 reply; 63+ messages in thread
From: Dave Chinner @ 2015-02-05  0:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: J. Bruce Fields, Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Thu, Jan 22, 2015 at 12:10:05PM +0100, Christoph Hellwig wrote:
> Add operations to export pNFS block layouts from an XFS filesystem.  See
> the previous commit adding the operations for an explanation of them.
> 
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>

Note that I haven't applied this patch, or attempted to compile it
yet....

> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index d617999..df68285 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -121,3 +121,4 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
>  xfs-$(CONFIG_PROC_FS)		+= xfs_stats.o
>  xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
>  xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
> +xfs-$(CONFIG_NFSD_PNFS)		+= xfs_pnfs.o

... because I'll have to jump through hoops to get it to compile, I
think.

> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 74c6211..99465ba 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -602,6 +602,8 @@ xfs_growfs_data(
>  	if (!mutex_trylock(&mp->m_growlock))
>  		return -EWOULDBLOCK;
>  	error = xfs_growfs_data_private(mp, in);
> +	if (!error)
> +		mp->m_generation++;
>  	mutex_unlock(&mp->m_growlock);
>  	return error;
>  }

Even on error I think we should bump this. Errors can come from
secondary superblock updates after the filesystem has been grown,
hence an error is not a reliable indication of whether the layout
has changed or not.
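
A small userspace model of that suggestion, with hypothetical stand-in
types and names (not the real struct xfs_mount or growfs code, and with
the m_growlock handling left out): the grow step changes the size first
and may still report an error from a later step, so the generation is
bumped unconditionally.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of struct xfs_mount. */
struct sketch_mount {
	uint32_t generation;	/* device generation handed to clients */
	uint64_t dblocks;	/* current size of the data device */
};

/*
 * Hypothetical grow step: the size change takes effect first, and a
 * later step (modelled by late_error) may still fail.
 */
static int
sketch_grow_private(struct sketch_mount *mp, uint64_t new_dblocks,
		    int late_error)
{
	mp->dblocks = new_dblocks;
	return late_error;
}

/* Bump the generation whether or not the grow reported an error. */
static int
sketch_growfs_data(struct sketch_mount *mp, uint64_t new_dblocks,
		   int late_error)
{
	int error = sketch_grow_private(mp, new_dblocks, late_error);

	mp->generation++;
	return error;
}
```

The cost of a spurious bump is only that clients see a new generation
when nothing changed, whereas a missed bump leaves them with a stale
view of the device.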

> +int
> +xfs_fs_get_uuid(
> +	struct super_block	*sb,
> +	u8			*buf,
> +	u32			*len,
> +	u64			*offset)
> +{
> +	struct xfs_mount	*mp = XFS_M(sb);
> +
> +	if (*len < sizeof(uuid_t))
> +		return -EINVAL;
> +
> +	memcpy(buf, &mp->m_sb.sb_uuid, sizeof(uuid_t));
> +	*len = sizeof(uuid_t);
> +	*offset = offsetof(struct xfs_dsb, sb_uuid);

What purpose does the offset serve here? I can't tell from the usage
in the PNFS code. Can we ignore offset - as it seems entirely
arbitrary - and still have this work? Either way, comment please.


> +static void
> +xfs_bmbt_to_iomap(
> +	struct xfs_inode	*ip,
> +	struct iomap		*iomap,
> +	struct xfs_bmbt_irec	*imap)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +
> +	if (imap->br_startblock == HOLESTARTBLOCK) {
> +		iomap->blkno = -1;
> +		iomap->type = IOMAP_HOLE;
> +	} else if (imap->br_startblock == DELAYSTARTBLOCK) {
> +		iomap->blkno = -1;
> +		iomap->type = IOMAP_DELALLOC;

I'd like to see a IOMAP_NULL_BLOCK define here for the -1 value,
say:

#define IOMAP_NULL_BLOCK	-1LL

> +int
> +xfs_fs_map_blocks(
> +	struct inode		*inode,
> +	loff_t			offset,
> +	u64			length,
> +	struct iomap		*iomap,
> +	bool			write,
> +	u32			*device_generation)
> +{
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct xfs_bmbt_irec	imap;
> +	xfs_fileoff_t		offset_fsb, end_fsb;
> +	loff_t			limit;
> +	int			bmapi_flags = XFS_BMAPI_ENTIRE;
> +	int			nimaps = 1;
> +	uint			lock_flags;
> +	int			error = 0;
> +
> +	if (XFS_FORCED_SHUTDOWN(mp))
> +		return -EIO;
> +	if (XFS_IS_REALTIME_INODE(ip))
> +		return -ENXIO;

OK, so we are not mapping realtime inodes here. Any specific reason?
FWIW, that also means you can use XFS_FSB_TO_DADDR() in the iomap
mapping as xfs_fsb_to_db() is only needed if we might be mapping
realtime extents...

> +	for (i = 0; i < nr_maps; i++) {
> +		u64 start, length, end;
> +
> +		start = maps[i].offset;
> +		if (start > size)
> +			continue;
> +
> +		end = start + maps[i].length;
> +		if (end > size)
> +			end = size;
> +
> +		length = end - start;
> +		if (!length)
> +			continue;
> +	
   ^^^^^
Stray whitespace

> +	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE);
> +	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
> +	if (error)
> +		goto out_drop_iolock;
> +
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> +	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> +	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> +
> +	xfs_setattr_time(ip, iattr);
> +	if (iattr->ia_valid & ATTR_SIZE) {
> +		if (iattr->ia_size > i_size_read(inode)) {
> +			i_size_write(inode, iattr->ia_size);
> +			ip->i_d.di_size = iattr->ia_size;
> +		}
> +	}

The concern I have about this is that extending the file size can
expose uninitialised blocks beyond the old EOF. That can happen if
delayed allocation has previously been done on the file and we
haven't trimmed the excess beyond EOF back yet. I know the pnfs
server is not aimed at mixed usage, but it still makes me
uncomfortable in the case where you have normal NFS and PNFS clients
accessing the same files...

> +	xfs_trans_set_sync(tp);
> +	error = xfs_trans_commit(tp, 0);

I just had a thought about these sync transactions - could the NFS
server handle persistence of the maps via ->commit_metadata?

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 20/20] xfs: recall pNFS layouts on conflicting access
       [not found]     ` <1421925006-24231-21-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2015-02-05  0:51       ` Dave Chinner
  0 siblings, 0 replies; 63+ messages in thread
From: Dave Chinner @ 2015-02-05  0:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: J. Bruce Fields, Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Thu, Jan 22, 2015 at 12:10:06PM +0100, Christoph Hellwig wrote:
> Recall all outstanding pNFS layouts on truncates, writes and similar extent
> list modifying operations.
> 
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>

This looks fine, assuming break_layout() doesn't require any other
VFS inode locks to be held for serialisation.

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 19/20] xfs: implement pNFS export operations
  2015-02-05  0:47       ` Dave Chinner
@ 2015-02-05  7:08         ` Christoph Hellwig
       [not found]           ` <20150205070858.GA593-jcswGhMUV9g@public.gmane.org>
  0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-02-05  7:08 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, J. Bruce Fields, Jeff Layton,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Thu, Feb 05, 2015 at 11:47:58AM +1100, Dave Chinner wrote:
> > +++ b/fs/xfs/Makefile
> > @@ -121,3 +121,4 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
> >  xfs-$(CONFIG_PROC_FS)		+= xfs_stats.o
> >  xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
> >  xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
> > +xfs-$(CONFIG_NFSD_PNFS)		+= xfs_pnfs.o
> 
> ... because I'll have to jump through hoops to get it to compile, I
> think.

Just get the whole git tree from

	git://git.infradead.org/users/hch/pnfs.git pnfsd-for-3.20-3

> > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > index 74c6211..99465ba 100644
> > --- a/fs/xfs/xfs_fsops.c
> > +++ b/fs/xfs/xfs_fsops.c
> > @@ -602,6 +602,8 @@ xfs_growfs_data(
> >  	if (!mutex_trylock(&mp->m_growlock))
> >  		return -EWOULDBLOCK;
> >  	error = xfs_growfs_data_private(mp, in);
> > +	if (!error)
> > +		mp->m_generation++;
> >  	mutex_unlock(&mp->m_growlock);
> >  	return error;
> >  }
> 
> Even on error I think we should bump this. Errors can come from
> secondary superblock updates after the filesystem has been grown,
> hence an error is not a reliable indication of whether the layout
> has changed or not.

Ok.
> 
> > +int
> > +xfs_fs_get_uuid(
> > +	struct super_block	*sb,
> > +	u8			*buf,
> > +	u32			*len,
> > +	u64			*offset)
> > +{
> > +	struct xfs_mount	*mp = XFS_M(sb);
> > +
> > +	if (*len < sizeof(uuid_t))
> > +		return -EINVAL;
> > +
> > +	memcpy(buf, &mp->m_sb.sb_uuid, sizeof(uuid_t));
> > +	*len = sizeof(uuid_t);
> > +	*offset = offsetof(struct xfs_dsb, sb_uuid);
> 
> What purpose does the offset serve here? I can't tell from the usage
> in the PNFS code. Can we ignore offset - as it seems entirely
> arbitrary - and still have this work? Either way, comment please.

The get_uuid method returns both the content and the location of the
UUID so that the client can find the disks.  The offset is simply part
of the wire protocol.
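
To illustrate why the offset matters, here is a rough client-side sketch.
It is not the actual blkmapd or kernel client code, just an approximation
of the idea: given the signature bytes and the offset handed out by the
server, a client can probe each candidate disk and compare.  The helper
names read_sig and sig_matches are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Print LEN bytes found at byte OFFSET of FILE as lowercase hex.
# $1 = device or image file, $2 = offset in bytes, $3 = length in bytes
read_sig() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Succeed if FILE carries the expected signature at OFFSET.
# $1 = device or image file, $2 = offset in bytes,
# $3 = expected signature as a lowercase hex string
sig_matches() {
    len=$(( ${#3} / 2 ))
    [ "$(read_sig "$1" "$2" "$len")" = "$3" ]
}
```

A client would run sig_matches against every block device it can see,
using whatever offset and UUID the server reported, and use the first
device that matches.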

> I'd like to see a IOMAP_NULL_BLOCK define here for the -1 value,
> say:
> 
> #define IOMAP_NULL_BLOCK	-1LL

Ok.

> OK, so we are not mapping realtime inodes here. Any specific reason?

The realtime device doesn't have a way for the client to find it, as
it doesn't have its own superblock or UUID.

> FWIW, that also means you can use XFS_FSB_TO_DADDR() in the iomap
> mapping as xfs_fsb_to_db() is only needed if we might be mapping
> realtime extents...

ok.

> > +	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE);
> > +	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
> > +	if (error)
> > +		goto out_drop_iolock;
> > +
> > +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > +	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> > +	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> > +
> > +	xfs_setattr_time(ip, iattr);
> > +	if (iattr->ia_valid & ATTR_SIZE) {
> > +		if (iattr->ia_size > i_size_read(inode)) {
> > +			i_size_write(inode, iattr->ia_size);
> > +			ip->i_d.di_size = iattr->ia_size;
> > +		}
> > +	}
> 
> The concern I have about this is that extending the file size can
> expose uninitialised blocks beyond the old EOF. That can happen if
> delayed allocation has previously been done on the file and we
> haven't trimmed the excess beyond EOF back yet. I know the pnfs
> server is not aimed at mixed usage, but it still makes me
> uncomfortable in the case where you have normal NFS and PNFS clients
> accessing the same files...

The protocol only allows committing to a size that we previously
returned a layout for, which means we have already allocated space
for it at the same time.  For robustness reasons a sanity check might
make sense, though.

> > +	xfs_trans_set_sync(tp);
> > +	error = xfs_trans_commit(tp, 0);
> 
> I just had a thought about these sync transactions - could the NFS
> server handle persistence of the maps via ->commit_metadata?

It probably could, but that wouldn't buy us anything over just committing
the transaction synchronously.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 19/20] xfs: implement pNFS export operations
       [not found]           ` <20150205070858.GA593-jcswGhMUV9g@public.gmane.org>
@ 2015-02-05 13:57             ` Christoph Hellwig
       [not found]               ` <20150205135756.GA6386-jcswGhMUV9g@public.gmane.org>
  0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-02-05 13:57 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, J. Bruce Fields, Jeff Layton,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

I've updated the patch and pushed out a new pnfsd-for-3.20-4 branch.

The changes relative to the old one are below:

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 99465ba..48561a0 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -602,8 +602,12 @@ xfs_growfs_data(
 	if (!mutex_trylock(&mp->m_growlock))
 		return -EWOULDBLOCK;
 	error = xfs_growfs_data_private(mp, in);
-	if (!error)
-		mp->m_generation++;
+	/*
+	 * Increment the generation unconditionally, the error could be from
+	 * updating the secondary superblocks, in which case the new size
+	 * is live already.
+	 */
+	mp->m_generation++;
 	mutex_unlock(&mp->m_growlock);
 	return error;
 }
diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
index ab5ee78..7440b40 100644
--- a/fs/xfs/xfs_pnfs.c
+++ b/fs/xfs/xfs_pnfs.c
@@ -15,6 +15,7 @@
 #include "xfs_error.h"
 #include "xfs_iomap.h"
 #include "xfs_shared.h"
+#include "xfs_bit.h"
 #include "xfs_pnfs.h"
 
 /*
@@ -48,6 +49,10 @@ xfs_break_layouts(
 	return error;
 }
 
+/*
+ * Get a unique ID including its location so that the client can identify
+ * the exported device.
+ */
 int
 xfs_fs_get_uuid(
 	struct super_block	*sb,
@@ -57,6 +62,10 @@ xfs_fs_get_uuid(
 {
 	struct xfs_mount	*mp = XFS_M(sb);
 
+	printk_once(KERN_NOTICE
+"XFS (%s): using experimental pNFS feature, use at your own risk!\n",
+		mp->m_fsname);
+
 	if (*len < sizeof(uuid_t))
 		return -EINVAL;
 
@@ -75,13 +84,14 @@ xfs_bmbt_to_iomap(
 	struct xfs_mount	*mp = ip->i_mount;
 
 	if (imap->br_startblock == HOLESTARTBLOCK) {
-		iomap->blkno = -1;
+		iomap->blkno = IOMAP_NULL_BLOCK;
 		iomap->type = IOMAP_HOLE;
 	} else if (imap->br_startblock == DELAYSTARTBLOCK) {
-		iomap->blkno = -1;
+		iomap->blkno = IOMAP_NULL_BLOCK;
 		iomap->type = IOMAP_DELALLOC;
 	} else {
-		iomap->blkno = xfs_fsb_to_db(ip, imap->br_startblock);
+		iomap->blkno =
+			XFS_FSB_TO_DADDR(ip->i_mount, imap->br_startblock);
 		if (imap->br_state == XFS_EXT_UNWRITTEN)
 			iomap->type = IOMAP_UNWRITTEN;
 		else
@@ -115,6 +125,12 @@ xfs_fs_map_blocks(
 
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
+
+	/*
+	 * We can't export inodes residing on the realtime device.  The realtime
+	 * device doesn't have a UUID to identify it, so the client has no way
+	 * to find it.
+	 */
 	if (XFS_IS_REALTIME_INODE(ip))
 		return -ENXIO;
 
@@ -190,6 +206,32 @@ out_unlock:
 }
 
 /*
+ * Ensure the size update falls into a valid allocated block.
+ */
+static int
+xfs_pnfs_validate_isize(
+	struct xfs_inode	*ip,
+	xfs_off_t		isize)
+{
+	struct xfs_bmbt_irec	imap;
+	int			nimaps = 1;
+	int			error = 0;
+
+	xfs_ilock(ip, XFS_ILOCK_SHARED);
+	error = xfs_bmapi_read(ip, XFS_B_TO_FSBT(ip->i_mount, isize - 1), 1,
+				&imap, &nimaps, 0);
+	xfs_iunlock(ip, XFS_ILOCK_SHARED);
+	if (error)
+		return error;
+
+	if (imap.br_startblock == HOLESTARTBLOCK ||
+	    imap.br_startblock == DELAYSTARTBLOCK ||
+	    imap.br_state == XFS_EXT_UNWRITTEN)
+		return -EIO;
+	return 0;
+}
+
+/*
  * Make sure the blocks described by maps are stable on disk.  This includes
  * converting any unwritten extents, flushing the disk cache and updating the
  * time stamps.
@@ -209,6 +251,7 @@ xfs_fs_commit_blocks(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
+	bool			update_isize = false;
 	int			error, i;
 	loff_t			size;
 
@@ -217,8 +260,10 @@ xfs_fs_commit_blocks(
 	xfs_ilock(ip, XFS_IOLOCK_EXCL);
 
 	size = i_size_read(inode);
-	if ((iattr->ia_valid & ATTR_SIZE) && iattr->ia_size > size)
+	if ((iattr->ia_valid & ATTR_SIZE) && iattr->ia_size > size) {
+		update_isize = true;
 		size = iattr->ia_size;
+	}
 
 	for (i = 0; i < nr_maps; i++) {
 		u64 start, length, end;
@@ -248,6 +293,12 @@ xfs_fs_commit_blocks(
 			goto out_drop_iolock;
 	}
 
+	if (update_isize) {
+		error = xfs_pnfs_validate_isize(ip, size);
+		if (error)
+			goto out_drop_iolock;
+	}
+
 	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE);
 	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
 	if (error)
@@ -258,11 +309,9 @@ xfs_fs_commit_blocks(
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
 	xfs_setattr_time(ip, iattr);
-	if (iattr->ia_valid & ATTR_SIZE) {
-		if (iattr->ia_size > i_size_read(inode)) {
-			i_size_write(inode, iattr->ia_size);
-			ip->i_d.di_size = iattr->ia_size;
-		}
+	if (update_isize) {
+		i_size_write(inode, iattr->ia_size);
+		ip->i_d.di_size = iattr->ia_size;
 	}
 
 	xfs_trans_set_sync(tp);
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index ff46bf7..fa05e04 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -187,6 +187,8 @@ struct fid {
 #define IOMAP_MAPPED	0x03	/* blocks allocated @blkno */
 #define IOMAP_UNWRITTEN	0x04	/* blocks allocated @blkno in unwritten state */
 
+#define IOMAP_NULL_BLOCK -1LL	/* blkno is not valid */
+
 struct iomap {
 	sector_t	blkno;	/* first sector of mapping */
 	loff_t		offset;	/* file offset of mapping, bytes */

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 19/20] xfs: implement pNFS export operations
       [not found]               ` <20150205135756.GA6386-jcswGhMUV9g@public.gmane.org>
@ 2015-02-06 22:20                 ` Dave Chinner
  2015-02-06 22:42                   ` J. Bruce Fields
  0 siblings, 1 reply; 63+ messages in thread
From: Dave Chinner @ 2015-02-06 22:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: J. Bruce Fields, Jeff Layton, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Thu, Feb 05, 2015 at 02:57:56PM +0100, Christoph Hellwig wrote:
> I've updated the patch and pushed out a new pnfsd-for-3.20-4 branch.
> 
> The changes relative to the old one are below:

Hi Christoph, with these changes I think this is fine to be merged
with the experimental tag attached to it.

Acked-by: Dave Chinner <david@fromorbit.com>

I'm expecting the merge window to open on Monday so it's kinda late
to be adding new stuff to the XFS tree and co-ordinating it with the
NFS tree merge - how were you planning to get this merged?

I've already merged all but the two pNFSD support patches, so
there's some duplicate commits in your pnfsd-for-3.20-4 branch.
i.e. these commits in your tree:

b8d5187 xfs: factor out a xfs_update_prealloc_flags() helper
6d5ca2a xfs: update the superblock using a synchronous transaction in growfs
e3ea93e xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten

are already merged into the xfs for-next branch as:

8add71c xfs: factor out a xfs_update_prealloc_flags() helper
f8079b8 xfs: growfs should use synchronous transactions
d32057f xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten

A straight merge from your tree ends up with both sets of commits in
the history. So a rebase on your side, or me pulling them into the
XFS tree is probably required to keep the history clean.
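
As a rough illustration of how such duplicates can be spotted before
merging: git cherry compares commits by patch content rather than by
hash, so it flags commits whose change already exists on the other
side.  The helper name list_duplicate_patches and the branch names in
the usage note are placeholders, and this only catches commits whose
diffs are identical on both sides.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.

# List commits on TOPIC whose patch is already present in UPSTREAM.
# git cherry prefixes such commits with "-" and new ones with "+".
# $1 = upstream branch, $2 = topic branch
list_duplicate_patches() {
    git cherry -v "$1" "$2" | awk '$1 == "-"'
}
```

Running something like "list_duplicate_patches for-next pnfsd-for-3.20-4"
in a tree that has both branches would print one line per commit that
could be dropped in a rebase.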

I didn't really want to add any more to the XFS tree this close to
the merge window opening, but I've already got a regression fix that
needs to be added, so perhaps I'll delay sending Linus a pull
request for a week and just merge all of these XFS changes directly.

What do you think?

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 19/20] xfs: implement pNFS export operations
  2015-02-06 22:20                 ` Dave Chinner
@ 2015-02-06 22:42                   ` J. Bruce Fields
  2015-02-08 13:34                     ` Christoph Hellwig
  0 siblings, 1 reply; 63+ messages in thread
From: J. Bruce Fields @ 2015-02-06 22:42 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Jeff Layton, linux-nfs, linux-fsdevel, xfs

On Sat, Feb 07, 2015 at 09:20:47AM +1100, Dave Chinner wrote:
> On Thu, Feb 05, 2015 at 02:57:56PM +0100, Christoph Hellwig wrote:
> > I've updated the patch and pushed out a new pnfsd-for-3.20-4 branch.
> > 
> > The changes relative to the old one are below:
> 
> Hi Christoph, with these changes I think this is fine to be merged
> with the experimental tag attached to it
> 
> Acked-by: Dave Chinner <david@fromorbit.com>
> 
> I'm expecting the merge window to open on Monday so it's kinda late
> to be adding new stuff to the XFS tree and co-ordinating it with the
> NFS tree merge - how were you planning to get this merged?
> 
> I've already merged all but the two pNFSD support patches, so
> there's some duplicate commits in your pnfsd-for-3.20-4 branch.
> i.e. these commits in your tree:
> 
> b8d5187 xfs: factor out a xfs_update_prealloc_flags() helper
> 6d5ca2a xfs: update the superblock using a synchronous transaction in growfs
> e3ea93e xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten
> 
> are already merged into the xfs for-next branch as:
> 
> 8add71c xfs: factor out a xfs_update_prealloc_flags() helper
> f8079b8 xfs: growfs should use synchronous transactions
> d32057f xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten
> 
> A straight merge from your tree ends up with both sets of commits in
> the history. So a rebase on your side, or me pulling them into the
> XFS tree is probably required to keep the history clean.
> 
> I didn't really want to add any more to the XFS tree this close to
> the merge window opening, but I've already got a regression fix that
> needs to be added, so perhaps I'll delay sending Linus a pull
> request for a week and just merge all of these XFS changes directly.
> 
> What do you think?

You'd basically just be pulling my tree (Christoph's is just my nfsd
tree with his patches on top, and I've been testing with exactly that
locally, just putting off pushing it out till we decide this.)

So anyway, fine with me if you want to just pull that into the xfs tree.
Mine's ready whenever, so if I send my pull pretty soon after the merge
window and you send it a little later then we still keep the property
that Linus's merge still has a diffstat only in our respective areas.

(OK, it's a little more complicated because I've got the same
arrangement with jlayton, so the order is jlayton's lock pull, then my
nfsd pull, then your xfs pull.  Is this getting too complicated?
jlayton and I are both ready to go and I think it'd work.)

I'm also fine with duplicating those few patches, or whatever.

--b.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 19/20] xfs: implement pNFS export operations
  2015-02-06 22:42                   ` J. Bruce Fields
@ 2015-02-08 13:34                     ` Christoph Hellwig
       [not found]                       ` <20150208133435.GA27081-jcswGhMUV9g@public.gmane.org>
  0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2015-02-08 13:34 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-fsdevel, linux-nfs, Jeff Layton, xfs

On Fri, Feb 06, 2015 at 05:42:58PM -0500, J. Bruce Fields wrote:
> You'd basically just be pulling my tree (Christoph's is just my nfsd
> tree with his patches on top, and I've been testing with exactly that
> locally, just putting off pushing it out till we decide this.)
> 
> So anyway, fine with me if you want to just pull that into the xfs tree.
> Mine's ready whenever, so if I send my pull pretty soon after the merge
> window and you send it a little later then we still keep the property
> that Linus's merge still has a diffstat only in our respective areas.
> 
> (OK, it's a little more complicated because I've got the same
> arrangement with jlayton, so the order is jlayton's lock pull, then my
> nfsd pull, then your xfs pull.  Is this getting too complicated?
> jlayton and I are both ready to go and I think it'd work.)
> 
> I'm also fine with duplicating those few patches, or whatever.

Maybe the better idea is to pull the xfs tree in the nfsd tree, but
that would require Dave sending an early pull request so that the
nfsd pull doesn't get delayed.

Or we just defer the pnfsd merge.  While I tried to get it in in time
for 3.20 all the delays during review mean we're really late now and should
punt it to 3.21.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 19/20] xfs: implement pNFS export operations
       [not found]                       ` <20150208133435.GA27081-jcswGhMUV9g@public.gmane.org>
@ 2015-02-08 14:09                         ` Jeff Layton
       [not found]                           ` <20150208090942.51e99687-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
  0 siblings, 1 reply; 63+ messages in thread
From: Jeff Layton @ 2015-02-08 14:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: J. Bruce Fields, Dave Chinner, Jeff Layton,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Sun, 8 Feb 2015 14:34:35 +0100
Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> wrote:

> On Fri, Feb 06, 2015 at 05:42:58PM -0500, J. Bruce Fields wrote:
> > You'd basically just be pulling my tree (Christoph's is just my nfsd
> > tree with his patches on top, and I've been testing with exactly that
> > locally, just putting off pushing it out till we decide this.)
> > 
> > So anyway, fine with me if you want to just pull that into the xfs tree.
> > Mine's ready whenever, so if I send my pull pretty soon after the merge
> > window and you send it a little later then we still keep the property
> > that Linus's merge still has a diffstat only in our respective areas.
> > 
> > (OK, it's a little more complicated because I've got the same
> > arrangement with jlayton, so the order is jlayton's lock pull, then my
> > nfsd pull, then your xfs pull.  Is this getting too complicated?
> > jlayton and I are both ready to go and I think it'd work.)
> > 
> > I'm also fine with duplicating those few patches, or whatever.
> 
> Maybe the better idea is to pull the xfs tree in the nfsd tree, but
> that would require Dave sending an early pull request so that the
> nfsd pull doesn't get delayed.
> 
> Or we just defer the pnfsd merge.  While I tried to get it in in time
> for 3.20 all the delays during review mean we're really late now and should
> punt it to 3.21.

FWIW, I plan to send a pull request for the locking changes as soon as
the merge window opens. Hopefully that won't be an issue for long...

-- 
Jeff Layton <jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 19/20] xfs: implement pNFS export operations
       [not found]                           ` <20150208090942.51e99687-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
@ 2015-02-09 20:11                             ` J. Bruce Fields
       [not found]                               ` <20150209201154.GA27746-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  0 siblings, 1 reply; 63+ messages in thread
From: J. Bruce Fields @ 2015-02-09 20:11 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Christoph Hellwig, Dave Chinner, Jeff Layton,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Sun, Feb 08, 2015 at 09:09:42AM -0500, Jeff Layton wrote:
> On Sun, 8 Feb 2015 14:34:35 +0100
> Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> wrote:
> 
> > On Fri, Feb 06, 2015 at 05:42:58PM -0500, J. Bruce Fields wrote:
> > > You'd basically just be pulling my tree (Christoph's is just my nfsd
> > > tree with his patches on top, and I've been testing with exactly that
> > > locally, just putting off pushing it out till we decide this.)
> > > 
> > > So anyway, fine with me if you want to just pull that into the xfs tree.
> > > Mine's ready whenever, so if I send my pull pretty soon after the merge
> > > window and you send it a little later then we still keep the property
> > > that Linus's merge still has a diffstat only in our respective areas.
> > > 
> > > (OK, it's a little more complicated because I've got the same
> > > arrangement with jlayton, so the order is jlayton's lock pull, then my
> > > nfsd pull, then your xfs pull.  Is this getting too complicated?
> > > jlayton and I are both ready to go and I think it'd work.)
> > > 
> > > I'm also fine with duplicating those few patches, or whatever.
> > 
> > Maybe the better idea is to pull the xfs tree in the nfsd tree, but
> > that would require Dave sending an early pull request so that the
> > nfsd pull doesn't get delayed.
> > 
> > Or we just defer the pnfsd merge.  While I tried to get it in in time
> > for 3.20 all the delays during review mean we're really late now and should
> > punt it to 3.21.
> 
> FWIW, I plan to send a pull request for the locking changes as soon as
> the merge window opens. Hopefully that won't be an issue for long...

This includes Christoph's branch (all but the final xfs commits):

	git://linux-nfs.org/~bfields/linux.git for-3.20

That's what I intend to submit.  Hope that's OK.  Then it's up to Dave
whether he wants to pull that in and include the xfs patches.

Worst case we end up with not-yet-usable pnfs code in 3.20, which
wouldn't be ideal but shouldn't cause any serious problem either.

--b.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 19/20] xfs: implement pNFS export operations
       [not found]                               ` <20150209201154.GA27746-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2015-02-10  0:04                                 ` Dave Chinner
  2015-02-13  1:11                                   ` J. Bruce Fields
  0 siblings, 1 reply; 63+ messages in thread
From: Dave Chinner @ 2015-02-10  0:04 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Christoph Hellwig, Jeff Layton,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ

On Mon, Feb 09, 2015 at 03:11:54PM -0500, J. Bruce Fields wrote:
> On Sun, Feb 08, 2015 at 09:09:42AM -0500, Jeff Layton wrote:
> > On Sun, 8 Feb 2015 14:34:35 +0100
> > Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> wrote:
> > 
> > > On Fri, Feb 06, 2015 at 05:42:58PM -0500, J. Bruce Fields wrote:
> > > > You'd basically just be pulling my tree (Christoph's is just my nfsd
> > > > tree with his patches on top, and I've been testing with exactly that
> > > > locally, just putting off pushing it out till we decide this.)
> > > > 
> > > > So anyway, fine with me if you want to just pull that into the xfs tree.
> > > > Mine's ready whenever, so if I send my pull pretty soon after the merge
> > > > window and you send it a little later then we still keep the property
> > > > that Linus's merge still has a diffstat only in our respective areas.
> > > > 
> > > > (OK, it's a little more complicated because I've got the same
> > > > arrangement with jlayton, so the order is jlayton's lock pull, then my
> > > > nfsd pull, then your xfs pull.  Is this getting too complicated?
> > > > jlayton and I are both ready to go and I think it'd work.)
> > > > 
> > > > I'm also fine with duplicating those few patches, or whatever.
> > > 
> > > Maybe the better idea is to pull the xfs tree in the nfsd tree, but
> > > that would require Dave sending an early pull request so that the
> > > nfsd pull doesn't get delayed.
> > > 
> > > Or we just defer the pnfsd merge.  While I tried to get it in in time
> > > for 3.20 all the delays during review mean we're really late now and should
> > > punt it to 3.21.
> > 
> > FWIW, I plan to send a pull request for the locking changes as soon as
> > the merge window opens. Hopefully that won't be an issue for long...
> 
> This includes Christoph's branch (all but the final xfs commits):
> 
> 	git://linux-nfs.org/~bfields/linux.git for-3.20
> 
> That's what I intend to submit.  Hope that's OK.  Then it's up to Dave
> whether he wants to pull that in and include the xfs patches.

I'm about to send a pull request to Linus for the current XFS tree.
Once that is merged, I'll pull in the remaining xfs-pnfs patches
and send another pull request to Linus after the NFS tree is merged.

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten
       [not found]                 ` <20150203183533.GA16929-jcswGhMUV9g@public.gmane.org>
@ 2015-02-11 22:35                   ` J. Bruce Fields
  2015-02-11 22:54                     ` J. Bruce Fields
  0 siblings, 1 reply; 63+ messages in thread
From: J. Bruce Fields @ 2015-02-11 22:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, Jeff Layton,
	xfs-VZNHf3L845pBDgjK7y7TUQ

On Tue, Feb 03, 2015 at 07:35:33PM +0100, Christoph Hellwig wrote:
> On Mon, Feb 02, 2015 at 02:48:26PM -0500, J. Bruce Fields wrote:
> > Previously: http://lkml.kernel.org/r/20150106175611.GA16413@lst.de
> > 
> > 	>       - any advice on testing?  Is there some simple
> > 	>       virtual setup that would allow any loser with no special
> > 	>       hardware (e.g., me) to check whether they've broken the
> > 	>       block server?
> > 
> > 	Run two kvm VMs that share the same disk.  Create an XFS
> > 	filesystem on the MDS, and export it.  If the client has blkmapd
> > 	running (on Debian it needs to be started manually) it will use
> > 	pNFS for accessing the filesystem.  Verify that using the
> > 	per-operation counters in /proc/self/mountstats.  Repeat with
> > 	additional clients as necessary.
> > 
> > 	Alternatively set up a simple iSCSI target using tgt or lio and
> > 	connect to it from multiple clients.
> > 
> > Which sounds reasonable to me, but I haven't tried to incorporate this
> > into my regression testing yet.
> 
> Additionally I can offer the following script to generate recalls,
> which don't really happen during normal operation.  I don't
> really know how to write a proper testcase that coordinates access
> to the exported filesystem and nfs unless it runs locally on the same node,
> though.  It would need some higher level, network aware test harness:

Thanks.  I got as far as doing a quick manual test with vm's sharing a
"disk":

	[root@f21-2]# mount -overs=4.1 f21-1:/exports/xfs-pnfs /mnt/
	[root@f21-2]# echo "hello world" >/mnt/testfile
	[root@f21-2]# grep LAYOUTGET /proc/self/mountstats 
		   LAYOUTGET: 1 1 0 236 196 0 4 4

I haven't tried to set up automated testing with recalls, but that
shouldn't be hard.
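
A minimal sketch of how the manual check above could be scripted,
assuming the first number after an operation name in mountstats is
that operation's call count.  The helper name layoutget_ops is made up
for this example, and it simply sums over all NFS mounts listed in the
file.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.

# Print the total number of LAYOUTGET operations recorded in a
# mountstats file (defaults to /proc/self/mountstats).
# $1 = optional path to a mountstats-formatted file
layoutget_ops() {
    awk '$1 == "LAYOUTGET:" { n += $2 } END { print n + 0 }' \
        "${1:-/proc/self/mountstats}"
}
```

A test would record layoutget_ops before and after writing a file on
the pNFS mount and fail if the counter did not increase.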

--b.

> 
> ----- snip -----
> #!/bin/sh
> 
> set +x
> 
> # wait for grace period
> touch /mnt/nfs1/foo
> 
> dd if=/dev/zero of=/mnt/nfs1/foo bs=128M count=32 conv=fdatasync oflag=direct &
> 
> sleep 2
> 
> echo "" > /mnt/test/foo && echo "recall done"

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten
  2015-02-11 22:35                   ` J. Bruce Fields
@ 2015-02-11 22:54                     ` J. Bruce Fields
  0 siblings, 0 replies; 63+ messages in thread
From: J. Bruce Fields @ 2015-02-11 22:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, linux-fsdevel, linux-nfs, Jeff Layton, xfs

On Wed, Feb 11, 2015 at 05:35:22PM -0500, J. Bruce Fields wrote:
> On Tue, Feb 03, 2015 at 07:35:33PM +0100, Christoph Hellwig wrote:
> > On Mon, Feb 02, 2015 at 02:48:26PM -0500, J. Bruce Fields wrote:
> > > Previously: http://lkml.kernel.org/r/20150106175611.GA16413@lst.de
> > > 
> > > 	>       - any advice on testing?  Is there some simple
> > > 	>       virtual setup that would allow any loser with no special
> > > 	>       hardware (e.g., me) to check whether they've broken the
> > > 	>       block server?
> > > 
> > > 	Run two kvm VMs that share the same disk.  Create an XFS
> > > 	filesystem on the MDS, and export it.  If the client has blkmapd
> > > 	running (on Debian it needs to be started manually) it will use
> > > 	pNFS for accessing the filesystem.  Verify that using the
> > > 	per-operation counters in /proc/self/mountstats.  Repeat with
> > > 	additional clients as necessary.
> > > 
> > > 	Alternatively set up a simple iSCSI target using tgt or lio and
> > > 	connect to it from multiple clients.
> > > 
> > > Which sounds reasonable to me, but I haven't tried to incorporate this
> > > into my regression testing yet.
> > 
> > Additionally I can offer the following script to generate recalls,
> > which don't really happen during normal operation.  I don't
> > really know how to write a proper testcase that coordinates access
> > to the exported filesystem and nfs unless it runs locally on the same node,
> > though.  It would need some higher level, network aware test harness:
> 
> Thanks.  I got as far as doing a quick manual test with vm's sharing a
> "disk":

Oh, also I forgot, on Fedora:

	[root@f21-2]# systemctl enable nfs-blkmap.target

> 	[root@f21-2]# mount -overs=4.1 f21-1:/exports/xfs-pnfs /mnt/
> 	[root@f21-2]# echo "hello world" >/mnt/testfile
> 	[root@f21-2]# grep LAYOUTGET /proc/self/mountstats 
> 		   LAYOUTGET: 1 1 0 236 196 0 4 4
> 
> I haven't tried to set up automated testing with recalls, but that
> shouldn't be hard.
> 
> --b.
> 
> > 
> > ----- snip -----
> > #!/bin/sh
> > 
> > set +x
> > 
> > # wait for grace period
> > touch /mnt/nfs1/foo
> > 
> > dd if=/dev/zero of=/mnt/nfs1/foo bs=128M count=32 conv=fdatasync oflag=direct &
> > 
> > sleep 2
> > 
> > echo "" > /mnt/test/foo && echo "recall done"
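
The LAYOUTGET check quoted above can be scripted for regression runs.
This is only a minimal sketch: it assumes the first number after the
operation name in /proc/self/mountstats is the count of operations
issued, and the helper name layoutget_ops is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read mountstats-formatted text on stdin and print
# the total number of LAYOUTGET operations, summed over all mounts.
# Assumption: the first field after "LAYOUTGET:" is the operation count.
layoutget_ops() {
	awk '$1 == "LAYOUTGET:" { n += $2 } END { print n + 0 }'
}

# Usage on a client, after writing to the pNFS mount:
#   layoutget_ops < /proc/self/mountstats
```

A result of 0 after doing I/O on the mount would suggest the client fell
back to going through the MDS, for example because blkmapd is not
running.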


* Re: [PATCH 19/20] xfs: implement pNFS export operations
  2015-02-10  0:04                                 ` Dave Chinner
@ 2015-02-13  1:11                                   ` J. Bruce Fields
  2015-02-13  1:54                                     ` Dave Chinner
  0 siblings, 1 reply; 63+ messages in thread
From: J. Bruce Fields @ 2015-02-13  1:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jeff Layton, Christoph Hellwig, Jeff Layton, linux-nfs,
	linux-fsdevel, xfs

On Tue, Feb 10, 2015 at 11:04:23AM +1100, Dave Chinner wrote:
> On Mon, Feb 09, 2015 at 03:11:54PM -0500, J. Bruce Fields wrote:
> > On Sun, Feb 08, 2015 at 09:09:42AM -0500, Jeff Layton wrote:
> > > On Sun, 8 Feb 2015 14:34:35 +0100
> > > > Christoph Hellwig <hch@lst.de> wrote:
> > > 
> > > > On Fri, Feb 06, 2015 at 05:42:58PM -0500, J. Bruce Fields wrote:
> > > > > You'd basically just be pulling my tree (Christoph's is just my nfsd
> > > > > tree with his patches on top, and I've been testing with exactly that
> > > > > locally, just putting off pushing it out till we decide this.)
> > > > > 
> > > > > So anyway, fine with me if you want to just pull that into the xfs tree.
> > > > > Mine's ready whenever, so if I send my pull pretty soon after the merge
> > > > > window and you send it a little later then we still keep the property
> > > > > that Linus's merge still has a diffstat only in our respective areas.
> > > > > 
> > > > > (OK, it's a little more complicated because I've got the same
> > > > > arrangement with jlayton, so the order is jlayton's lock pull, then my
> > > > > nfsd pull, then your xfs pull.  Is this getting too complicated?
> > > > > > jlayton and I are both ready to do so and I think it'd work.)
> > > > > 
> > > > > I'm also fine with duplicating those few patches, or whatever.
> > > > 
> > > > Maybe the better idea is to pull the xfs tree in the nfsd tree, but
> > > > that would require Dave sending an early pull request so that the
> > > > nfsd pull doesn't get delayed.
> > > > 
> > > > Or we just defer the pnfsd merge.  While I tried to get it in in time
> > > > > for 3.20 all the delays during review mean we're really late now and should
> > > > punt it to 3.21.
> > > 
> > > FWIW, I plan to send a pull request for the locking changes as soon as
> > > the merge window opens. Hopefully that won't be an issue for long...
> > 
> > This includes Christoph's branch (all but the final xfs commits):
> > 
> > 	git://linux-nfs.org/~bfields/linux.git for-3.20
> > 
> > That's what I intend to submit.  Hope that's OK.  Then it's up to Dave
> > whether he wants to pull that in and include the xfs patches.
> 
> I'm about to send a pull request to Linus for the current XFS tree.
> Once that is merged, I'll pull in the remaining xfs-pnfs patches
> and send another pull request to Linus after the NFS tree is merged.

Sounds good, thanks.  The nfsd tree's merged now so it should be good to
go if you haven't found any show-stoppers.

--b.


* Re: [PATCH 19/20] xfs: implement pNFS export operations
  2015-02-13  1:11                                   ` J. Bruce Fields
@ 2015-02-13  1:54                                     ` Dave Chinner
  2015-02-13  2:38                                       ` Stephen Rothwell
  0 siblings, 1 reply; 63+ messages in thread
From: Dave Chinner @ 2015-02-13  1:54 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Christoph Hellwig, Jeff Layton, linux-nfs,
	linux-fsdevel, xfs

On Thu, Feb 12, 2015 at 08:11:30PM -0500, J. Bruce Fields wrote:
> On Tue, Feb 10, 2015 at 11:04:23AM +1100, Dave Chinner wrote:
> > On Mon, Feb 09, 2015 at 03:11:54PM -0500, J. Bruce Fields wrote:
> > > On Sun, Feb 08, 2015 at 09:09:42AM -0500, Jeff Layton wrote:
> > > > On Sun, 8 Feb 2015 14:34:35 +0100
> > > > Christoph Hellwig <hch@lst.de> wrote:
> > > > 
> > > > > On Fri, Feb 06, 2015 at 05:42:58PM -0500, J. Bruce Fields wrote:
> > > > > > You'd basically just be pulling my tree (Christoph's is just my nfsd
> > > > > > tree with his patches on top, and I've been testing with exactly that
> > > > > > locally, just putting off pushing it out till we decide this.)
> > > > > > 
> > > > > > So anyway, fine with me if you want to just pull that into the xfs tree.
> > > > > > Mine's ready whenever, so if I send my pull pretty soon after the merge
> > > > > > window and you send it a little later then we still keep the property
> > > > > > that Linus's merge still has a diffstat only in our respective areas.
> > > > > > 
> > > > > > (OK, it's a little more complicated because I've got the same
> > > > > > arrangement with jlayton, so the order is jlayton's lock pull, then my
> > > > > > nfsd pull, then your xfs pull.  Is this getting too complicated?
> > > > > > > jlayton and I are both ready to do so and I think it'd work.)
> > > > > > 
> > > > > > I'm also fine with duplicating those few patches, or whatever.
> > > > > 
> > > > > Maybe the better idea is to pull the xfs tree in the nfsd tree, but
> > > > > that would require Dave sending an early pull request so that the
> > > > > nfsd pull doesn't get delayed.
> > > > > 
> > > > > Or we just defer the pnfsd merge.  While I tried to get it in in time
> > > > > > for 3.20 all the delays during review mean we're really late now and should
> > > > > punt it to 3.21.
> > > > 
> > > > FWIW, I plan to send a pull request for the locking changes as soon as
> > > > the merge window opens. Hopefully that won't be an issue for long...
> > > 
> > > This includes Christoph's branch (all but the final xfs commits):
> > > 
> > > 	git://linux-nfs.org/~bfields/linux.git for-3.20
> > > 
> > > That's what I intend to submit.  Hope that's OK.  Then it's up to Dave
> > > whether he wants to pull that in and include the xfs patches.
> > 
> > I'm about to send a pull request to Linus for the current XFS tree.
> > Once that is merged, I'll pull in the remaining xfs-pnfs patches
> > and send another pull request to Linus after the NFS tree is merged.
> 
> Sounds good, thanks.  The nfsd tree's merged now so it should be good to
> go if you haven't found any show-stoppers.

Thanks Bruce. I might have to build a merged tree because one of the
changes from the review modified a header file introduced in the NFS
tree.

I'll see how it goes, and see if I can avoid doing something that
will make Linus yell at me :P

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 19/20] xfs: implement pNFS export operations
  2015-02-13  1:54                                     ` Dave Chinner
@ 2015-02-13  2:38                                       ` Stephen Rothwell
  2015-02-15 23:25                                         ` Dave Chinner
  0 siblings, 1 reply; 63+ messages in thread
From: Stephen Rothwell @ 2015-02-13  2:38 UTC (permalink / raw)
  To: Dave Chinner
  Cc: J. Bruce Fields, Jeff Layton, Christoph Hellwig, Jeff Layton,
	linux-nfs, linux-fsdevel, xfs


Hi Dave,

On Fri, 13 Feb 2015 12:54:22 +1100 Dave Chinner <david@fromorbit.com> wrote:
>
> Thanks Bruce. I might have to build a merged tree because one of the
> changes from the review modified a header file introduced in the NFS
> tree.
> 
> I'll see how it goes, and see if I can avoid doing something that
> will make Linus yell at me :P

If it's a syntactic conflict with an obvious resolution, just leave it
(or maybe mention it in the pull request).  It gives him something to
do so he feels useful. :-)

/me notes that I did not see a conflict while merging the xfs tree in
linux-next today ...

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au



* Re: [PATCH 19/20] xfs: implement pNFS export operations
  2015-02-13  2:38                                       ` Stephen Rothwell
@ 2015-02-15 23:25                                         ` Dave Chinner
  0 siblings, 0 replies; 63+ messages in thread
From: Dave Chinner @ 2015-02-15 23:25 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: J. Bruce Fields, Jeff Layton, Christoph Hellwig, Jeff Layton,
	linux-nfs, linux-fsdevel, xfs

On Fri, Feb 13, 2015 at 01:38:11PM +1100, Stephen Rothwell wrote:
> Hi Dave,
> 
> On Fri, 13 Feb 2015 12:54:22 +1100 Dave Chinner <david@fromorbit.com> wrote:
> >
> > Thanks Bruce. I might have to build a merged tree because one of the
> > changes from the review modified a header file introduced in the NFS
> > tree.
> > 
> > I'll see how it goes, and see if I can avoid doing something that
> > will make Linus yell at me :P
> 
> If it's a syntactic conflict with an obvious resolution, just leave it
> (or maybe mention it in the pull request).  It gives him something to
> do so he feels useful. :-)

Heh.

It's not actually a conflict, though. The issue is that some of the
review comments on the XFS side resulted in changes to files introduced
by the NFS tree. So the XFS side can't be merged without the NFS
changes being present....

> /me notes that I did not see a conflict while merging the xfs tree in
> linux-next today ...

Because I haven't pushed them yet ;)

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


end of thread, other threads:[~2015-02-15 23:26 UTC | newest]

Thread overview: 63+ messages
2015-01-22 11:09 a simple and scalable pNFS block layout server V2 Christoph Hellwig
2015-01-22 11:09 ` [PATCH 04/20] nfsd: factor out a helper to decode nfstime4 values Christoph Hellwig
     [not found]   ` <1421925006-24231-5-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2015-01-22 20:15     ` J. Bruce Fields
2015-01-22 11:09 ` [PATCH 05/20] nfsd: move nfsd_fh_match to nfsfh.h Christoph Hellwig
2015-01-22 11:09 ` [PATCH 09/20] nfsd: make find_any_file available outside nfs4state.c Christoph Hellwig
2015-01-22 11:09 ` [PATCH 12/20] nfsd: update documentation for pNFS support Christoph Hellwig
2015-01-22 11:09 ` [PATCH 13/20] nfsd: add trace events Christoph Hellwig
2015-01-22 11:10 ` [PATCH 15/20] nfsd: pNFS block layout driver Christoph Hellwig
2015-01-22 11:10 ` [PATCH 18/20] xfs: factor out a xfs_update_prealloc_flags() helper Christoph Hellwig
     [not found]   ` <1421925006-24231-19-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2015-02-01 23:06     ` Dave Chinner
2015-01-22 16:04 ` a simple and scalable pNFS block layout server V2 Chuck Lever
2015-01-22 16:21   ` Christoph Hellwig
     [not found] ` <1421925006-24231-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2015-01-22 11:09   ` [PATCH 01/20] nfs: add LAYOUT_TYPE_MAX enum value Christoph Hellwig
2015-01-22 11:09   ` [PATCH 02/20] fs: track fl_owner for leases Christoph Hellwig
2015-01-22 11:09   ` [PATCH 03/20] fs: add FL_LAYOUT lease type Christoph Hellwig
2015-01-22 15:45     ` Jeff Layton
     [not found]     ` <1421925006-24231-4-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2015-01-22 20:14       ` J. Bruce Fields
     [not found]         ` <20150122201442.GJ898-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2015-01-22 20:18           ` Christoph Hellwig
2015-01-22 11:09   ` [PATCH 06/20] nfsd: add fh_fsid_match helper Christoph Hellwig
2015-01-22 11:09   ` [PATCH 07/20] nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c Christoph Hellwig
2015-01-22 11:09   ` [PATCH 08/20] nfsd: make find/get/put file " Christoph Hellwig
2015-01-22 11:09   ` [PATCH 10/20] nfsd: implement pNFS operations Christoph Hellwig
     [not found]     ` <1421925006-24231-11-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2015-01-29 20:33       ` J. Bruce Fields
     [not found]         ` <20150129203346.GA11064-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2015-02-02 12:43           ` Christoph Hellwig
2015-02-02 14:28             ` J. Bruce Fields
     [not found]               ` <20150202142832.GC22301-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2015-02-02 14:56                 ` Christoph Hellwig
     [not found]                   ` <20150202145619.GA18387-jcswGhMUV9g@public.gmane.org>
2015-02-02 15:00                     ` J. Bruce Fields
     [not found]                       ` <20150202150032.GD22301-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2015-02-02 18:56                         ` Christoph Hellwig
     [not found]                           ` <20150202185638.GB23319-jcswGhMUV9g@public.gmane.org>
2015-02-03 16:08                             ` J. Bruce Fields
2015-01-22 11:09   ` [PATCH 11/20] nfsd: implement pNFS layout recalls Christoph Hellwig
2015-01-22 11:10   ` [PATCH 14/20] exportfs: add methods for block layout exports Christoph Hellwig
2015-01-22 11:10   ` [PATCH 16/20] xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten Christoph Hellwig
2015-01-29 20:52     ` J. Bruce Fields
2015-02-02  7:30       ` Christoph Hellwig
2015-02-02 19:24         ` Dave Chinner
2015-02-02 19:43           ` Dave Chinner
2015-02-02 19:48             ` J. Bruce Fields
2015-02-03 18:35               ` Christoph Hellwig
     [not found]                 ` <20150203183533.GA16929-jcswGhMUV9g@public.gmane.org>
2015-02-11 22:35                   ` J. Bruce Fields
2015-02-11 22:54                     ` J. Bruce Fields
2015-02-04  7:57             ` Christoph Hellwig
     [not found]               ` <20150204075756.GA763-jcswGhMUV9g@public.gmane.org>
2015-02-04 20:02                 ` Dave Chinner
2015-01-22 11:10   ` [PATCH 17/20] xfs: update the superblock using a synchronous transaction in growfs Christoph Hellwig
2015-01-22 11:10   ` [PATCH 19/20] xfs: implement pNFS export operations Christoph Hellwig
     [not found]     ` <1421925006-24231-20-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2015-02-05  0:47       ` Dave Chinner
2015-02-05  7:08         ` Christoph Hellwig
     [not found]           ` <20150205070858.GA593-jcswGhMUV9g@public.gmane.org>
2015-02-05 13:57             ` Christoph Hellwig
     [not found]               ` <20150205135756.GA6386-jcswGhMUV9g@public.gmane.org>
2015-02-06 22:20                 ` Dave Chinner
2015-02-06 22:42                   ` J. Bruce Fields
2015-02-08 13:34                     ` Christoph Hellwig
     [not found]                       ` <20150208133435.GA27081-jcswGhMUV9g@public.gmane.org>
2015-02-08 14:09                         ` Jeff Layton
     [not found]                           ` <20150208090942.51e99687-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2015-02-09 20:11                             ` J. Bruce Fields
     [not found]                               ` <20150209201154.GA27746-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2015-02-10  0:04                                 ` Dave Chinner
2015-02-13  1:11                                   ` J. Bruce Fields
2015-02-13  1:54                                     ` Dave Chinner
2015-02-13  2:38                                       ` Stephen Rothwell
2015-02-15 23:25                                         ` Dave Chinner
2015-01-22 11:10   ` [PATCH 20/20] xfs: recall pNFS layouts on conflicting access Christoph Hellwig
     [not found]     ` <1421925006-24231-21-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2015-02-05  0:51       ` Dave Chinner
2015-01-22 20:01   ` a simple and scalable pNFS block layout server V2 J. Bruce Fields
2015-01-22 20:06   ` J. Bruce Fields
2015-01-22 20:20     ` Christoph Hellwig
     [not found]     ` <20150122200618.GI898-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2015-01-22 20:20       ` Jeff Layton
