* [PATCH 0/50] Squashed and re-organized pnfs-submit tree
@ 2010-08-13 21:31 andros
  2010-08-13 21:31 ` [PATCH 01/50] nfs41: prevent exchange_id from sending server-only flag andros
  2010-08-19 20:50 ` [PATCH 0/50] Squashed and re-organized pnfs-submit tree Benny Halevy
  0 siblings, 2 replies; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs



Here is the pnfs-submit tree all squashed and reorganized. The resultant tree
is unchanged. The patch comments and Signed-off-by's need to be added.

We may want to split or re-order some of the patches. I left the "introduce
new file" patches #6-9 unsquashed, but we probably want to squash them into
the patch that uses them first.

Patches 1-4 are not pnfs patches, and should be submitted to Trond/Bruce.

Patch #50 "SQUASHME pnfs_submit: remove this unused code"
should not be squashed, but removed.

Patches 47-49 are for the post-submit trees.

-->Andy



* [PATCH 01/50] nfs41: prevent exchange_id from sending server-only flag
  2010-08-13 21:31 [PATCH 0/50] Squashed and re-organized pnfs-submit tree andros
@ 2010-08-13 21:31 ` andros
  2010-08-13 21:31   ` [PATCH 02/50] sunrpc: define xdr_decode_opaque_fixed andros
  2010-08-19 20:50 ` [PATCH 0/50] Squashed and re-organized pnfs-submit tree Benny Halevy
  1 sibling, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Fred Isaman

From: Fred Isaman <iisaman@citi.umich.edu>

clp->cl_exchange_flags is used both for client output and server input.
This causes problems in certain recovery situations, when the server
has sent back EXCHGID4_FLAG_CONFIRMED_R, causing the client to erroneously
use the flag in future EXCHANGE_ID requests.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
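Note for readers: the masking above can be tried in user space with the
minimal sketch below. It is not the kernel code. The helper name
exchange_id_request_flags() is hypothetical, and the two flag values are
copied from RFC 5661 rather than taken from this patch.

```c
#include <stdint.h>

/* Flag values as listed in RFC 5661 (assumed here, not part of the patch). */
#define EXCHGID4_FLAG_USE_PNFS_MDS 0x00020000u
#define EXCHGID4_FLAG_CONFIRMED_R  0x80000000u

/*
 * Hypothetical helper: derive the flags for the next EXCHANGE_ID request
 * from the cached flags, dropping the bit that only a server may set.
 */
static uint32_t exchange_id_request_flags(uint32_t cached_flags)
{
	return cached_flags & ~EXCHGID4_FLAG_CONFIRMED_R;
}
```

Client-side bits such as EXCHGID4_FLAG_USE_PNFS_MDS pass through untouched;
only the reply-only bit is cleared.
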
 fs/nfs/nfs4proc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ba79c0e..7684817 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4573,7 +4573,7 @@ int nfs4_proc_exchange_id(struct nfs_client *clp, struct rpc_cred *cred)
 	nfs4_verifier verifier;
 	struct nfs41_exchange_id_args args = {
 		.client = clp,
-		.flags = clp->cl_exchange_flags,
+		.flags = clp->cl_exchange_flags & ~EXCHGID4_FLAG_CONFIRMED_R,
 	};
 	struct nfs41_exchange_id_res res = {
 		.client = clp,
-- 
1.6.2.5



* [PATCH 02/50] sunrpc: define xdr_decode_opaque_fixed
  2010-08-13 21:31 ` [PATCH 01/50] nfs41: prevent exchange_id from sending server-only flag andros
@ 2010-08-13 21:31   ` andros
  2010-08-13 21:31     ` [PATCH 03/50] sunrpc: don't reset buflen twice in xdr_shrink_pagelen andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs

From: Benny Halevy <bhalevy@panasas.com>

A helper for decoding a fixed length opaque value.
Returns a pointer to the next item in the xdr stream.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
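Note for readers: a user-space sketch of the same idea, assuming a plain
uint32_t stands in for __be32 and that XDR_QUADLEN rounds a byte count up to
whole 4-byte XDR words. The name demo_decode_opaque_fixed() is hypothetical;
the helper in the patch is the authoritative version.

```c
#include <stdint.h>
#include <string.h>

/* Number of 4-byte XDR words needed to hold l bytes (rounded up). */
#define XDR_QUADLEN(l) (((l) + 3) >> 2)

/*
 * Hypothetical user-space mirror of the helper: copy len opaque bytes out
 * of the stream and return the position of the next XDR item, which is
 * always aligned to a 4-byte word.
 */
static inline uint32_t *
demo_decode_opaque_fixed(uint32_t *p, void *ptr, unsigned int len)
{
	memcpy(ptr, p, len);
	return p + XDR_QUADLEN(len);
}
```

A 6-byte opaque therefore advances the cursor by two words, skipping the two
padding bytes that XDR requires.
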
 include/linux/sunrpc/xdr.h |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 35cf2e8..23dc117 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -131,6 +131,13 @@ xdr_decode_hyper(__be32 *p, __u64 *valp)
 	return p + 2;
 }
 
+static inline __be32 *
+xdr_decode_opaque_fixed(__be32 *p, void *ptr, unsigned int len)
+{
+	memcpy(ptr, p, len);
+	return p + XDR_QUADLEN(len);
+}
+
 /*
  * Adjust kvec to reflect end of xdr'ed data (RPC client XDR)
  */
-- 
1.6.2.5



* [PATCH 03/50] sunrpc: don't reset buflen twice in xdr_shrink_pagelen
  2010-08-13 21:31   ` [PATCH 02/50] sunrpc: define xdr_decode_opaque_fixed andros
@ 2010-08-13 21:31     ` andros
  2010-08-13 21:31       ` [PATCH 04/50] nfsd: remove duplicate NFS4_STATEID_SIZE declaration andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs

From: Benny Halevy <bhalevy@panasas.com>

On Jan. 14, 2009, 2:50 +0200, andros@netapp.com wrote:
> From: Andy Adamson <andros@netapp.com>
>
> The buflen is reset for all cases at the end of xdr_shrink_pagelen.
> The data left in the tail after xdr_read_pages is not processed when the
> buflen is incorrectly set.

Note that in this case we also lose (len - tail->iov_len)
bytes from the buffered data in pages.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Acked-by: Andy Adamson <andros@netapp.com>
---
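Note for readers: the tail handling can be modelled in user space as below.
This is a simplified sketch, assuming the page data is one flat buffer and
that len <= pglen; shift_into_tail() is a hypothetical name and does not
exist in net/sunrpc/xdr.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical flat model of the tail shift: move the last len bytes of
 * the page data into the front of the tail buffer. Returns the number of
 * page bytes actually copied. Caller must ensure len <= pglen.
 */
static size_t shift_into_tail(char *tail, size_t tail_len,
			      const char *pages, size_t pglen, size_t len)
{
	size_t copy = len;

	if (tail_len == 0)
		return 0;
	if (tail_len > len) {
		/* Make room: slide the surviving tail bytes right by len. */
		memmove(tail + len, tail, tail_len - len);
	} else {
		/* Tail too small: only tail_len of the page bytes fit. */
		copy = tail_len;
	}
	/* Copy from the end of the page data into the start of the tail. */
	memcpy(tail, pages + pglen - len, copy);
	return copy;
}
```

When the tail is shorter than len, the remaining (len - tail_len) page bytes
have nowhere to go, which is the loss described in the note above.
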
 net/sunrpc/xdr.c |   14 ++++++--------
 1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index a1f82a8..b43258e 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -403,16 +403,14 @@ xdr_shrink_pagelen(struct xdr_buf *buf, size_t len)
 
 	/* Shift the tail first */
 	if (tail->iov_len != 0) {
-		p = (char *)tail->iov_base + len;
-		if (tail->iov_len > len) {
-			copy = tail->iov_len - len;
-			memmove(p, tail->iov_base, copy);
-		} else
-			buf->buflen -= len;
-		/* Copy from the inlined pages into the tail */
 		copy = len;
-		if (copy > tail->iov_len)
+		if (tail->iov_len > len) {
+			p = (char *)tail->iov_base + len;
+			memmove(p, tail->iov_base, tail->iov_len - len);
+		} else {
 			copy = tail->iov_len;
+		}
+		/* Copy from the inlined pages into the tail */
 		_copy_from_pages((char *)tail->iov_base,
 				buf->pages, buf->page_base + pglen - len,
 				copy);
-- 
1.6.2.5



* [PATCH 04/50] nfsd: remove duplicate NFS4_STATEID_SIZE declaration
  2010-08-13 21:31     ` [PATCH 03/50] sunrpc: don't reset buflen twice in xdr_shrink_pagelen andros
@ 2010-08-13 21:31       ` andros
  2010-08-13 21:31         ` [PATCH 05/50] pnfs_submit: pnfs and nfslayoutdriver kconfig andros
  2010-08-20 22:13         ` [PATCH 04/50] nfsd: remove duplicate NFS4_STATEID_SIZE declaration J. Bruce Fields
  0 siblings, 2 replies; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Use NFS4_STATEID_SIZE from include/linux/nfs4.h.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfsd/nfs4callback.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index f324f4b..e89476e 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -41,7 +41,6 @@
 
 #define NFSPROC4_CB_NULL 0
 #define NFSPROC4_CB_COMPOUND 1
-#define NFS4_STATEID_SIZE 16
 
 /* Index of predefined Linux callback client operations */
 
-- 
1.6.2.5



* [PATCH 05/50] pnfs_submit: pnfs and nfslayoutdriver kconfig
  2010-08-13 21:31       ` [PATCH 04/50] nfsd: remove duplicate NFS4_STATEID_SIZE declaration andros
@ 2010-08-13 21:31         ` andros
  2010-08-13 21:31           ` [PATCH 06/50] pnfs_submit: introduce include/linux/nfs4_pnfs.h andros
  2010-08-18 20:25           ` [PATCH 05/50] pnfs_submit: pnfs and nfslayoutdriver kconfig Christoph Hellwig
  2010-08-20 22:13         ` [PATCH 04/50] nfsd: remove duplicate NFS4_STATEID_SIZE declaration J. Bruce Fields
  1 sibling, 2 replies; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/Kconfig |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index a43d07e..7b914fe 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -79,7 +79,17 @@ config NFS_V4_1
 	depends on NFS_V4 && EXPERIMENTAL
 	help
 	  This option enables support for minor version 1 of the NFSv4 protocol
-	  (draft-ietf-nfsv4-minorversion1) in the kernel's NFS client.
+	  (RFC5661) including support for the parallel NFS (pNFS) features
+	  in the kernel's NFS client.
+
+	  Unless you're an NFS developer, say N.
+
+config PNFS_FILE_LAYOUT
+	tristate "NFS client support for the pNFS nfs-files layout (DEVELOPER ONLY)"
+	depends on NFS_FS && NFS_V4_1
+	default y
+	help
+	  This option enables support for the pNFS nfs-files layout.
 
 	  Unless you're an NFS developer, say N.
 
-- 
1.6.2.5



* [PATCH 06/50] pnfs_submit: introduce include/linux/nfs4_pnfs.h
  2010-08-13 21:31         ` [PATCH 05/50] pnfs_submit: pnfs and nfslayoutdriver kconfig andros
@ 2010-08-13 21:31           ` andros
  2010-08-13 21:31             ` [PATCH 07/50] pnfs_submit: introduce include/linux/pnfs_xdr.h andros
  2010-08-18 20:27             ` [PATCH 06/50] pnfs_submit: introduce include/linux/nfs4_pnfs.h Christoph Hellwig
  2010-08-18 20:25           ` [PATCH 05/50] pnfs_submit: pnfs and nfslayoutdriver kconfig Christoph Hellwig
  1 sibling, 2 replies; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Dean Hildebrand

From: The pNFS Team <linux-nfs@vger.kernel.org>

Common data structures needed by the pnfs client and pnfs layout driver.

[extracted from pnfsd: Initial pNFS server implementation.]
Signed-off-by: Dean Hildebrand <seattleplus@gmail.com>
[pnfs: nfs4_pnfs.h remove CONFIG_PNFS]
[removed CONFIG_NFS_V4_1 altogether, always define structs]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 include/linux/nfs4_pnfs.h |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/nfs4_pnfs.h

diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
new file mode 100644
index 0000000..04bdd10
--- /dev/null
+++ b/include/linux/nfs4_pnfs.h
@@ -0,0 +1,15 @@
+/*
+ *  include/linux/nfs4_pnfs.h
+ *
+ *  Common data structures needed by the pnfs client and pnfs layout driver.
+ *
+ *  Copyright (c) 2002 The Regents of the University of Michigan.
+ *  All rights reserved.
+ *
+ *  Dean Hildebrand   <dhildebz@eecs.umich.edu>
+ */
+
+#ifndef LINUX_NFS4_PNFS_H
+#define LINUX_NFS4_PNFS_H
+
+#endif /* LINUX_NFS4_PNFS_H */
-- 
1.6.2.5



* [PATCH 07/50] pnfs_submit: introduce include/linux/pnfs_xdr.h
  2010-08-13 21:31           ` [PATCH 06/50] pnfs_submit: introduce include/linux/nfs4_pnfs.h andros
@ 2010-08-13 21:31             ` andros
  2010-08-13 21:31               ` [PATCH 08/50] pnfs_submit: introduce fs/nfs/pnfs.h andros
  2010-08-18 20:27             ` [PATCH 06/50] pnfs_submit: introduce include/linux/nfs4_pnfs.h Christoph Hellwig
  1 sibling, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Dean Hildebrand, Boaz Harrosh

From: The pNFS Team <linux-nfs@vger.kernel.org>

Common xdr data structures needed by pnfs client.

[extracted from: pnfs: Initial check-in of pNFS File Layout Driver.]
Signed-off-by: Dean Hildebrand <dhildebz@eecs.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs: Even when CONFIG_PNFS not set some definitions are needed]
    exofs uses the pnfs_osd_xdr.h file, so it must be compilable
    even if CONFIG_PNFS is not set.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[define all structures unconditionally]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 include/linux/pnfs_xdr.h |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/pnfs_xdr.h

diff --git a/include/linux/pnfs_xdr.h b/include/linux/pnfs_xdr.h
new file mode 100644
index 0000000..bcbfbe0
--- /dev/null
+++ b/include/linux/pnfs_xdr.h
@@ -0,0 +1,16 @@
+/*
+ *  include/linux/pnfs_xdr.h
+ *
+ *  Common xdr data structures needed by pnfs client.
+ *
+ *  Copyright (c) 2002 The Regents of the University of Michigan.
+ *  All rights reserved.
+ *
+ * Dean Hildebrand   <dhildebz@eecs.umich.edu>
+ */
+
+#ifndef LINUX_PNFS_XDR_H
+#define LINUX_PNFS_XDR_H
+
+
+#endif /* LINUX_PNFS_XDR_H */
-- 
1.6.2.5



* [PATCH 08/50] pnfs_submit: introduce fs/nfs/pnfs.h
  2010-08-13 21:31             ` [PATCH 07/50] pnfs_submit: introduce include/linux/pnfs_xdr.h andros
@ 2010-08-13 21:31               ` andros
  2010-08-13 21:31                 ` [PATCH 09/50] pnfs_submit: introduce fs/nfs/pnfs.c andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Dean Hildebrand, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

pNFS client data structures.

[extracted from: pnfs: Initial check-in of pNFS File Layout Driver.]
Signed-off-by: Dean Hildebrand <dhildebz@eecs.umich.edu>
[pnfs: pnfs.h: remove CONFIG_PNFS]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/pnfs.h |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)
 create mode 100644 fs/nfs/pnfs.h

diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
new file mode 100644
index 0000000..7b5ebd9
--- /dev/null
+++ b/fs/nfs/pnfs.h
@@ -0,0 +1,18 @@
+/*
+ *  fs/nfs/pnfs.h
+ *
+ *  pNFS client data structures.
+ *
+ *  Copyright (c) 2002 The Regents of the University of Michigan.
+ *  All rights reserved.
+ *
+ *  Dean Hildebrand   <dhildebz@eecs.umich.edu>
+ */
+
+#ifndef FS_NFS_PNFS_H
+#define FS_NFS_PNFS_H
+
+#ifdef CONFIG_NFS_V4_1
+#endif /* CONFIG_NFS_V4_1 */
+
+#endif /* FS_NFS_PNFS_H */
-- 
1.6.2.5



* [PATCH 09/50] pnfs_submit: introduce fs/nfs/pnfs.c
  2010-08-13 21:31               ` [PATCH 08/50] pnfs_submit: introduce fs/nfs/pnfs.h andros
@ 2010-08-13 21:31                 ` andros
  2010-08-13 21:31                   ` [PATCH 10/50] pnfs_submit: register unregister pnfs module andros
  2010-08-18 20:28                   ` [PATCH 09/50] pnfs_submit: introduce fs/nfs/pnfs.c Christoph Hellwig
  0 siblings, 2 replies; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Dean Hildebrand, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

pNFS functions to call and manage layout drivers.

pnfs.o is compiled and linked conditionally on CONFIG_NFS_V4_1
in fs/nfs/Makefile.

[extracted from: pnfs: Initial check-in of pNFS File Layout Driver.]
Signed-off-by: Dean Hildebrand <dhildebz@eecs.umich.edu>
[pnfs: remove CONFIG_PNFS]
Signed-off-by: Andy Adamson <andros@netapp.com>
[pnfs.c: remove CONFIG_NFS_V4_1 altogether]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/Makefile |    1 +
 fs/nfs/pnfs.c   |   37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 0 deletions(-)
 create mode 100644 fs/nfs/pnfs.c

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index da7fda6..bb9e773 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -15,5 +15,6 @@ nfs-$(CONFIG_NFS_V4)	+= nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
 			   delegation.o idmap.o \
 			   callback.o callback_xdr.o callback_proc.o \
 			   nfs4namespace.o
+nfs-$(CONFIG_NFS_V4_1)	+= pnfs.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
 nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-index.o
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
new file mode 100644
index 0000000..4ea7301
--- /dev/null
+++ b/fs/nfs/pnfs.c
@@ -0,0 +1,37 @@
+/*
+ *  linux/fs/nfs/pnfs.c
+ *
+ *  pNFS functions to call and manage layout drivers.
+ *
+ *  Copyright (c) 2002 The Regents of the University of Michigan.
+ *  All rights reserved.
+ *
+ *  Dean Hildebrand <dhildebz@eecs.umich.edu>
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ *  1. Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *  2. Redistributions in binary form must reproduce the above copyright
+ *     notice, this list of conditions and the following disclaimer in the
+ *     documentation and/or other materials provided with the distribution.
+ *  3. Neither the name of the University nor the names of its
+ *     contributors may be used to endorse or promote products derived
+ *     from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ *  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ *  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ *  DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ *  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ *  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ *  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ *  BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ *  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ *  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ *  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define NFSDBG_FACILITY		NFSDBG_PNFS
-- 
1.6.2.5



* [PATCH 10/50] pnfs_submit: register unregister pnfs module
  2010-08-13 21:31                 ` [PATCH 09/50] pnfs_submit: introduce fs/nfs/pnfs.c andros
@ 2010-08-13 21:31                   ` andros
  2010-08-13 21:31                     ` [PATCH 11/50] pnfs_submit: set and unset pnfs layoutdriver modules andros
  2010-08-18 20:29                     ` [PATCH 10/50] pnfs_submit: register unregister pnfs module Christoph Hellwig
  2010-08-18 20:28                   ` [PATCH 09/50] pnfs_submit: introduce fs/nfs/pnfs.c Christoph Hellwig
  1 sibling, 2 replies; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
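Note for readers: the register/find/unregister flow in this patch can be
sketched in user space roughly as follows. This is an illustration only. It
assumes a singly linked list instead of the kernel's list_head, omits all
locking, and every name in it (struct layoutdriver, find_module() and so on)
is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-layout-driver registration record. */
struct layoutdriver {
	uint32_t id;
	const char *name;
};

/* One table entry per registered driver. */
struct module_entry {
	const struct layoutdriver *ld;
	struct module_entry *next;
};

static struct module_entry *modules;	/* head of the module table */

/* Search the table for a driver with the given layout type id. */
static struct module_entry *find_module(uint32_t id)
{
	struct module_entry *m;

	for (m = modules; m; m = m->next)
		if (m->ld->id == id)
			return m;
	return NULL;
}

/* Add a driver to the table; returns 0 on success, -1 on allocation failure. */
static int register_layoutdriver(const struct layoutdriver *ld)
{
	struct module_entry *m = malloc(sizeof(*m));

	if (!m)
		return -1;
	m->ld = ld;
	m->next = modules;
	modules = m;
	return 0;
}

/* Remove the entry whose id matches and free it; no-op if absent. */
static void unregister_layoutdriver(const struct layoutdriver *ld)
{
	struct module_entry **pp;

	for (pp = &modules; *pp; pp = &(*pp)->next) {
		if ((*pp)->ld->id == ld->id) {
			struct module_entry *dead = *pp;

			*pp = dead->next;
			free(dead);
			return;
		}
	}
}
```

Unlike the sketch, a kernel version has to hold the table lock around both
the lookup and the list update, and should report allocation failure to the
caller.
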
 fs/nfs/inode.c            |   10 ++++
 fs/nfs/pnfs.c             |  113 +++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/pnfs.h             |   10 ++++
 include/linux/nfs4_pnfs.h |   31 ++++++++++++
 include/linux/nfs_fs.h    |    1 +
 5 files changed, 165 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index ec7a8f9..64261ea 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -48,6 +48,7 @@
 #include "internal.h"
 #include "fscache.h"
 #include "dns_resolve.h"
+#include "pnfs.h"
 
 #define NFSDBG_FACILITY		NFSDBG_VFS
 
@@ -1550,6 +1551,12 @@ static int __init init_nfs_fs(void)
 	if (err)
 		goto out0;
 
+#ifdef CONFIG_NFS_V4_1
+	err = pnfs_initialize();
+	if (err)
+		goto out00;
+#endif /* CONFIG_NFS_V4_1 */
+
 #ifdef CONFIG_PROC_FS
 	rpc_proc_register(&nfs_rpcstat);
 #endif
@@ -1560,6 +1567,9 @@ out:
 #ifdef CONFIG_PROC_FS
 	rpc_proc_unregister("nfs");
 #endif
+#ifdef CONFIG_NFS_V4_1
+out00:
+#endif /* CONFIG_NFS_V4_1 */
 	nfs_destroy_directcache();
 out0:
 	nfs_destroy_writepagecache();
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 4ea7301..73558b7 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -34,4 +34,117 @@
  *  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/smp_lock.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_mount.h>
+#include <linux/nfs_page.h>
+#include <linux/nfs4.h>
+#include <linux/pnfs_xdr.h>
+#include <linux/nfs4_pnfs.h>
+#include <linux/rculist.h>
+
+#include "internal.h"
+#include "nfs4_fs.h"
+#include "pnfs.h"
+
 #define NFSDBG_FACILITY		NFSDBG_PNFS
+
+#define MIN_POOL_LC		(4)
+
+static int pnfs_initialized;
+
+/* Locking:
+ *
+ * pnfs_spinlock:
+ * 	protects pnfs_modules_tbl.
+ */
+static spinlock_t pnfs_spinlock = __SPIN_LOCK_UNLOCKED(pnfs_spinlock);
+
+/*
+ * pnfs_modules_tbl holds all pnfs modules
+ */
+static struct list_head	pnfs_modules_tbl;
+
+/*
+ * struct pnfs_module - One per pNFS device module.
+ */
+struct pnfs_module {
+	struct pnfs_layoutdriver_type *pnfs_ld_type;
+	struct list_head        pnfs_tblid;
+};
+
+int
+pnfs_initialize(void)
+{
+	INIT_LIST_HEAD(&pnfs_modules_tbl);
+
+	pnfs_initialized = 1;
+	return 0;
+}
+
+/* search pnfs_modules_tbl for right pnfs module */
+static int
+find_pnfs(u32 id, struct pnfs_module **module) {
+	struct  pnfs_module *local = NULL;
+
+	dprintk("PNFS: %s: Searching for %u\n", __func__, id);
+	list_for_each_entry(local, &pnfs_modules_tbl, pnfs_tblid) {
+		if (local->pnfs_ld_type->id == id) {
+			*module = local;
+			return(1);
+		}
+	}
+	return 0;
+}
+
+
+/* Allow I/O module to set its functions structure */
+struct pnfs_client_operations*
+pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
+{
+	struct pnfs_module *pnfs_mod;
+
+	if (!pnfs_initialized) {
+		printk(KERN_ERR "%s Registration failure. "
+		       "pNFS not initialized.\n", __func__);
+		return NULL;
+	}
+
+	pnfs_mod = kmalloc(sizeof(struct pnfs_module), GFP_KERNEL);
+	if (pnfs_mod != NULL) {
+		dprintk("%s Registering id:%u name:%s\n",
+			__func__,
+			ld_type->id,
+			ld_type->name);
+		pnfs_mod->pnfs_ld_type = ld_type;
+		INIT_LIST_HEAD(&pnfs_mod->pnfs_tblid);
+
+		spin_lock(&pnfs_spinlock);
+		list_add(&pnfs_mod->pnfs_tblid, &pnfs_modules_tbl);
+		spin_unlock(&pnfs_spinlock);
+	}
+
+	return &pnfs_ops;
+}
+
+/* Allow I/O module to remove itself from the pnfs modules table */
+void
+pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
+{
+	struct pnfs_module *pnfs_mod;
+
+	if (find_pnfs(ld_type->id, &pnfs_mod)) {
+		dprintk("%s Deregistering id:%u\n", __func__, ld_type->id);
+		spin_lock(&pnfs_spinlock);
+		list_del(&pnfs_mod->pnfs_tblid);
+		spin_unlock(&pnfs_spinlock);
+		kfree(pnfs_mod);
+	}
+}
+
+EXPORT_SYMBOL(pnfs_unregister_layoutdriver);
+EXPORT_SYMBOL(pnfs_register_layoutdriver);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 7b5ebd9..92538ce 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -12,7 +12,17 @@
 #ifndef FS_NFS_PNFS_H
 #define FS_NFS_PNFS_H
 
+#include <linux/nfs4_pnfs.h>
+
 #ifdef CONFIG_NFS_V4_1
+
+#include <linux/nfs_page.h>
+#include <linux/pnfs_xdr.h>
+#include <linux/nfs_iostat.h>
+#include "iostat.h"
+
+int pnfs_initialize(void);
+
 #endif /* CONFIG_NFS_V4_1 */
 
 #endif /* FS_NFS_PNFS_H */
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 04bdd10..4cc22c6 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -12,4 +12,35 @@
 #ifndef LINUX_NFS4_PNFS_H
 #define LINUX_NFS4_PNFS_H
 
+
+/* Per-layout driver specific registration structure */
+struct pnfs_layoutdriver_type {
+	const u32 id;
+	const char *name;
+	struct layoutdriver_io_operations *ld_io_ops;
+	struct layoutdriver_policy_operations *ld_policy_ops;
+};
+
+/* Layout driver I/O operations.
+ * Either the pagecache or non-pagecache read/write operations must be implemented
+ */
+struct layoutdriver_io_operations {
+};
+
+struct layoutdriver_policy_operations {
+};
+
+/* pNFS client callback functions.
+ * These operations allow the layout driver to access pNFS client
+ * specific information or call pNFS client->server operations.
+ * E.g., getdeviceinfo, I/O callbacks, etc
+ */
+struct pnfs_client_operations {
+};
+
+extern struct pnfs_client_operations pnfs_ops;
+
+extern struct pnfs_client_operations *pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
+extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
+
 #endif /* LINUX_NFS4_PNFS_H */
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 508f8cf..042c2bd 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -613,6 +613,7 @@ extern void * nfs_root_data(void);
 #define NFSDBG_CLIENT		0x0200
 #define NFSDBG_MOUNT		0x0400
 #define NFSDBG_FSCACHE		0x0800
+#define NFSDBG_PNFS		0x1000
 #define NFSDBG_ALL		0xFFFF
 
 #ifdef __KERNEL__
-- 
1.6.2.5



* [PATCH 11/50] pnfs_submit: set and unset pnfs layoutdriver modules
  2010-08-13 21:31                   ` [PATCH 10/50] pnfs_submit: register unregister pnfs module andros
@ 2010-08-13 21:31                     ` andros
  2010-08-13 21:31                       ` [PATCH 12/50] pnfs_submit: generic pnfs deviceid cache andros
  2010-08-18 20:31                       ` [PATCH 11/50] pnfs_submit: set and unset pnfs layoutdriver modules Christoph Hellwig
  2010-08-18 20:29                     ` [PATCH 10/50] pnfs_submit: register unregister pnfs module Christoph Hellwig
  1 sibling, 2 replies; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
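Note for readers: the fs_layout_types decoding added here follows the usual
XDR array shape, a 4-byte big-endian count followed by that many 4-byte
entries, of which only the first is used. A user-space sketch under those
assumptions (get_be32() and decode_layout_types() are hypothetical names,
not kernel functions):

```c
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit XDR word from a byte buffer. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical decoder: store the first advertised layout type in
 * *layouttype, or 0 when the list is empty. Returns 0 on success and
 * -1 when the buffer is too short for the advertised count.
 */
static int decode_layout_types(const unsigned char *buf, size_t buflen,
			       uint32_t *layouttype)
{
	uint32_t num;

	if (buflen < 4)
		return -1;
	num = get_be32(buf);

	/* An empty list means pNFS is not supported by this file system. */
	if (num == 0) {
		*layouttype = 0;
		return 0;
	}

	/* All num entries must be present, even though only one is used. */
	if (num > (buflen - 4) / 4)
		return -1;

	*layouttype = get_be32(buf + 4);
	return 0;
}
```

Consuming all num entries, as the patch does with xdr_inline_decode(xdr,
num * 4), keeps the stream position correct for whatever attribute follows.
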
 fs/nfs/client.c           |   26 +++++++++++++++++++
 fs/nfs/nfs4proc.c         |    4 +++
 fs/nfs/nfs4xdr.c          |   60 +++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/pnfs.c             |   45 +++++++++++++++++++++++++++++++++
 fs/nfs/pnfs.h             |    9 ++++++
 include/linux/nfs4.h      |    1 +
 include/linux/nfs4_pnfs.h |    4 +++
 include/linux/nfs_fs_sb.h |    5 +++
 include/linux/nfs_xdr.h   |    3 ++
 9 files changed, 157 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 4e7df2a..6560866 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -48,6 +48,7 @@
 #include "iostat.h"
 #include "internal.h"
 #include "fscache.h"
+#include "pnfs.h"
 
 #define NFSDBG_FACILITY		NFSDBG_CLIENT
 
@@ -866,6 +867,28 @@ error:
 }
 
 /*
+ * Initialize the pNFS layout driver and setup pNFS related parameters
+ */
+static void nfs4_init_pnfs(struct nfs_server *server, struct nfs_fsinfo *fsinfo)
+{
+#if defined(CONFIG_NFS_V4_1)
+	struct nfs_client *clp = server->nfs_client;
+
+	if (nfs4_has_session(clp) &&
+	    (clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_MDS))
+		set_pnfs_layoutdriver(server, fsinfo->layouttype);
+#endif /* CONFIG_NFS_V4_1 */
+}
+
+static void nfs4_uninit_pnfs(struct nfs_server *server)
+{
+#if defined(CONFIG_NFS_V4_1)
+	if (server->nfs_client && nfs4_has_session(server->nfs_client))
+		unmount_pnfs_layoutdriver(server);
+#endif /* CONFIG_NFS_V4_1 */
+}
+
+/*
  * Load up the server record from information gained in an fsinfo record
  */
 static void nfs_server_set_fsinfo(struct nfs_server *server, struct nfs_fsinfo *fsinfo)
@@ -898,6 +921,8 @@ static void nfs_server_set_fsinfo(struct nfs_server *server, struct nfs_fsinfo *
 	if (server->wsize > NFS_MAX_FILE_IO_SIZE)
 		server->wsize = NFS_MAX_FILE_IO_SIZE;
 	server->wpages = (server->wsize + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+	nfs4_init_pnfs(server, fsinfo);
+
 	server->wtmult = nfs_block_bits(fsinfo->wtmult, NULL);
 
 	server->dtsize = nfs_block_size(fsinfo->dtpref, NULL);
@@ -1017,6 +1042,7 @@ void nfs_free_server(struct nfs_server *server)
 {
 	dprintk("--> nfs_free_server()\n");
 
+	nfs4_uninit_pnfs(server);
 	spin_lock(&nfs_client_lock);
 	list_del(&server->client_link);
 	list_del(&server->master_link);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 7684817..45d6526 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -129,7 +129,11 @@ const u32 nfs4_fsinfo_bitmap[2] = { FATTR4_WORD0_MAXFILESIZE
 			| FATTR4_WORD0_MAXREAD
 			| FATTR4_WORD0_MAXWRITE
 			| FATTR4_WORD0_LEASE_TIME,
+#ifdef CONFIG_NFS_V4_1
+			FATTR4_WORD1_FS_LAYOUT_TYPES
+#else /* CONFIG_NFS_V4_1 */
 			0
+#endif /* CONFIG_NFS_V4_1 */
 };
 
 const u32 nfs4_fs_locations_bitmap[2] = {
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 257c181..075845d 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -3868,6 +3868,61 @@ xdr_error:
 	return status;
 }
 
+#if defined(CONFIG_NFS_V4_1)
+/*
+ * Decode potentially multiple layout types. Currently we only support
+ * one layout driver per file system.
+ */
+static int decode_pnfs_list(struct xdr_stream *xdr, uint32_t *layoutclass)
+{
+	uint32_t *p;
+	int num;
+
+	p = xdr_inline_decode(xdr, 4);
+	if (unlikely(!p))
+		goto out_overflow;
+	num = be32_to_cpup(p);
+
+	/* pNFS is not supported by the underlying file system */
+	if (num == 0) {
+		*layoutclass = 0;
+		return 0;
+	}
+
+	/* TODO: We will eventually support multiple layout drivers ? */
+	if (num > 1)
+		printk(KERN_INFO "%s: Warning: Multiple pNFS layout drivers "
+			"per filesystem not supported\n", __func__);
+
+	/* Decode and set first layout type */
+	p = xdr_inline_decode(xdr, num * 4);
+	if (unlikely(!p))
+		goto out_overflow;
+	*layoutclass = be32_to_cpup(p);
+	return 0;
+out_overflow:
+	print_overflow_msg(__func__, xdr);
+	return -EIO;
+}
+
+/*
+ * The type of file system exported
+ */
+static int decode_attr_pnfstype(struct xdr_stream *xdr, uint32_t *bitmap,
+				uint32_t *layoutclass)
+{
+	int status = 0;
+
+	dprintk("%s: bitmap is %x\n", __func__, bitmap[1]);
+	if (unlikely(bitmap[1] & (FATTR4_WORD1_FS_LAYOUT_TYPES - 1U)))
+		return -EIO;
+	if (likely(bitmap[1] & FATTR4_WORD1_FS_LAYOUT_TYPES)) {
+		status = decode_pnfs_list(xdr, layoutclass);
+		bitmap[1] &= ~FATTR4_WORD1_FS_LAYOUT_TYPES;
+	}
+	return status;
+}
+#endif /* CONFIG_NFS_V4_1 */
 
 static int decode_fsinfo(struct xdr_stream *xdr, struct nfs_fsinfo *fsinfo)
 {
@@ -3894,6 +3949,11 @@ static int decode_fsinfo(struct xdr_stream *xdr, struct nfs_fsinfo *fsinfo)
 	if ((status = decode_attr_maxwrite(xdr, bitmap, &fsinfo->wtmax)) != 0)
 		goto xdr_error;
 	fsinfo->wtpref = fsinfo->wtmax;
+#if defined(CONFIG_NFS_V4_1)
+	status = decode_attr_pnfstype(xdr, bitmap, &fsinfo->layouttype);
+	if (status)
+		goto xdr_error;
+#endif /* CONFIG_NFS_V4_1 */
 
 	status = verify_attr_len(xdr, savep, attrlen);
 xdr_error:
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 73558b7..4c78277 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -101,6 +101,51 @@ find_pnfs(u32 id, struct pnfs_module **module) {
 	return 0;
 }
 
+/* Uninitialize a mountpoint in a layout driver */
+void
+unmount_pnfs_layoutdriver(struct nfs_server *nfss)
+{
+	if (PNFS_EXISTS_LDIO_OP(nfss, uninitialize_mountpoint))
+		nfss->pnfs_curr_ld->ld_io_ops->uninitialize_mountpoint(nfss);
+}
+
+/*
+ * Set the server pnfs module to the first registered pnfs_type.
+ * Only one pNFS layout driver is supported.
+ */
+void
+set_pnfs_layoutdriver(struct nfs_server *server, u32 id)
+{
+	struct pnfs_module *mod = NULL;
+
+	if (server->pnfs_curr_ld)
+		return;
+
+	if (!find_pnfs(id, &mod)) {
+		request_module("%s-%u", LAYOUT_NFSV4_1_MODULE_PREFIX, id);
+		find_pnfs(id, &mod);
+	}
+
+	if (!mod) {
+		dprintk("%s: No pNFS module found for %u. ", __func__, id);
+		goto out_err;
+	}
+
+	server->pnfs_curr_ld = mod->pnfs_ld_type;
+	if (mod->pnfs_ld_type->ld_io_ops->initialize_mountpoint(
+							server->nfs_client)) {
+		printk(KERN_ERR "%s: Error initializing mount point "
+		       "for layout driver %u. ", __func__, id);
+		goto out_err;
+	}
+
+	dprintk("%s: pNFS module for %u set\n", __func__, id);
+	return;
+
+out_err:
+	dprintk("Using NFSv4 I/O\n");
+	server->pnfs_curr_ld = NULL;
+}
 
 /* Allow I/O module to set its functions structure */
 struct pnfs_client_operations*
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 92538ce..3561fa8 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -21,8 +21,17 @@
 #include <linux/nfs_iostat.h>
 #include "iostat.h"
 
+/* pnfs.c */
+void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
+void unmount_pnfs_layoutdriver(struct nfs_server *);
 int pnfs_initialize(void);
 
+#define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld &&	\
+				     (srv)->pnfs_curr_ld->ld_io_ops &&	\
+				     (srv)->pnfs_curr_ld->ld_io_ops->opname)
+
+#define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
+
 #endif /* CONFIG_NFS_V4_1 */
 
 #endif /* FS_NFS_PNFS_H */
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 07e40c6..1598e7b 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -471,6 +471,7 @@ enum lock_type4 {
 #define FATTR4_WORD1_TIME_MODIFY        (1UL << 21)
 #define FATTR4_WORD1_TIME_MODIFY_SET    (1UL << 22)
 #define FATTR4_WORD1_MOUNTED_ON_FILEID  (1UL << 23)
+#define FATTR4_WORD1_FS_LAYOUT_TYPES    (1UL << 30)
 
 #define NFSPROC4_NULL 0
 #define NFSPROC4_COMPOUND 1
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 4cc22c6..7240d7e 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -25,6 +25,10 @@ struct pnfs_layoutdriver_type {
  * Either the pagecache or non-pagecache read/write operations must be implemented
  */
 struct layoutdriver_io_operations {
+	/* Registration information for a new mounted file system
+	 */
+	int (*initialize_mountpoint) (struct nfs_client *);
+	int (*uninitialize_mountpoint) (struct nfs_server *server);
 };
 
 struct layoutdriver_policy_operations {
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index c82ee7c..e683128 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -145,6 +145,11 @@ struct nfs_server {
 						   that are supported on this
 						   filesystem */
 #endif
+
+#ifdef CONFIG_NFS_V4_1
+	struct pnfs_layoutdriver_type  *pnfs_curr_ld; /* Active layout driver */
+#endif /* CONFIG_NFS_V4_1 */
+
 	void (*destroy)(struct nfs_server *);
 
 	atomic_t active; /* Keep trace of any activity to this server */
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index fc46192..f1054d4 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -113,6 +113,9 @@ struct nfs_fsinfo {
 	__u32			dtpref;	/* pref. readdir transfer size */
 	__u64			maxfilesize;
 	__u32			lease_time; /* in seconds */
+#if defined(CONFIG_NFS_V4_1)
+	__u32			layouttype; /* supported pnfs layout driver */
+#endif
 };
 
 struct nfs_fsstat {
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 12/50] pnfs_submit: generic pnfs deviceid cache
  2010-08-13 21:31                     ` [PATCH 11/50] pnfs_submit: set and unset pnfs layoutdriver modules andros
@ 2010-08-13 21:31                       ` andros
  2010-08-13 21:31                         ` [PATCH 13/50] pnfs_submit: introduce nfs4layoutdriver module andros
  2010-08-18 20:31                       ` [PATCH 11/50] pnfs_submit: set and unset pnfs layoutdriver modules Christoph Hellwig
  1 sibling, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
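The bucket selection used by the cache can be tried out in userspace.
Below is a minimal sketch of the same multiply-by-37 hash that
nfs4_deviceid_hash() applies to the 16-byte opaque device ID, reduced to
one of 32 buckets. The names deviceid_hash, struct deviceid, DEVICEID_SIZE
and HASH_* are stand-ins chosen for this example, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEVICEID_SIZE	16			/* mirrors NFS4_PNFS_DEVICEID4_SIZE */
#define HASH_BITS	5			/* mirrors NFS4_DEVICE_ID_HASH_BITS */
#define HASH_SIZE	(1u << HASH_BITS)
#define HASH_MASK	(HASH_SIZE - 1)

/* Hypothetical userspace stand-in for struct pnfs_deviceid. */
struct deviceid {
	unsigned char data[DEVICEID_SIZE];
};

/* Multiply-by-37 rolling hash over the opaque ID, masked to a bucket index. */
static uint32_t deviceid_hash(const struct deviceid *id)
{
	uint32_t x = 0;
	size_t i;

	assert(id != NULL);
	for (i = 0; i < DEVICEID_SIZE; i++) {
		x *= 37;
		x += id->data[i];
	}
	return x & HASH_MASK;
}
```

Because only the low 5 bits survive the mask, the last bytes of the ID
dominate the bucket choice; an all-zero ID lands in bucket 0.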
 fs/nfs/pnfs.c             |  141 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/nfs4_pnfs.h |   44 ++++++++++++++
 include/linux/nfs_fs_sb.h |    1 +
 include/linux/pnfs_xdr.h  |    5 ++
 4 files changed, 191 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 4c78277..e17835e 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -193,3 +193,144 @@ pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
 
 EXPORT_SYMBOL(pnfs_unregister_layoutdriver);
 EXPORT_SYMBOL(pnfs_register_layoutdriver);
+
+
+/* Device ID cache. Supports one layout type per struct nfs_client */
+int
+nfs4_alloc_init_deviceid_cache(struct nfs_client *clp,
+			 void (*free_callback)(struct kref *))
+{
+	struct nfs4_deviceid_cache *c;
+
+	c = kzalloc(sizeof(struct nfs4_deviceid_cache), GFP_KERNEL);
+	if (!c)
+		return -ENOMEM;
+	spin_lock(&clp->cl_lock);
+	if (clp->cl_devid_cache != NULL) {
+		kref_get(&clp->cl_devid_cache->dc_kref);
+		spin_unlock(&clp->cl_lock);
+		dprintk("%s [kref [%d]]\n", __func__,
+			atomic_read(&clp->cl_devid_cache->dc_kref.refcount));
+		kfree(c);
+	} else {
+		int i;
+
+		spin_lock_init(&c->dc_lock);
+		for (i = 0; i < NFS4_DEVICE_ID_HASH_SIZE ; i++)
+			INIT_HLIST_HEAD(&c->dc_deviceids[i]);
+		kref_init(&c->dc_kref);
+		c->dc_free_callback = free_callback;
+		clp->cl_devid_cache = c;
+		spin_unlock(&clp->cl_lock);
+		dprintk("%s [new]\n", __func__);
+	}
+	return 0;
+}
+EXPORT_SYMBOL(nfs4_alloc_init_deviceid_cache);
+
+void
+nfs4_init_deviceid_node(struct nfs4_deviceid *d)
+{
+	INIT_HLIST_NODE(&d->de_node);
+	kref_init(&d->de_kref);
+}
+EXPORT_SYMBOL(nfs4_init_deviceid_node);
+
+struct nfs4_deviceid *
+nfs4_find_deviceid(struct nfs4_deviceid_cache *c, struct pnfs_deviceid *id)
+{
+	struct nfs4_deviceid *d;
+	struct hlist_node *n;
+	long hash = nfs4_deviceid_hash(id);
+
+	dprintk("--> %s hash %ld\n", __func__, hash);
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(d, n, &c->dc_deviceids[hash], de_node) {
+		if (!memcmp(&d->de_id, id, NFS4_PNFS_DEVICEID4_SIZE)) {
+			rcu_read_unlock();
+			return d;
+		}
+	}
+	rcu_read_unlock();
+	return NULL;
+}
+EXPORT_SYMBOL(nfs4_find_deviceid);
+
+/*
+ * Add or kref_get a deviceid. GETDEVICEINFOs for the same deviceid can race:
+ * if the deviceid is already cached, discard the new one.
+ */
+struct nfs4_deviceid *
+nfs4_add_deviceid(struct nfs4_deviceid_cache *c, struct nfs4_deviceid *new)
+{
+	struct nfs4_deviceid *d;
+	struct hlist_node *n;
+	long hash = nfs4_deviceid_hash(&new->de_id);
+
+	dprintk("--> %s hash %ld\n", __func__, hash);
+	spin_lock(&c->dc_lock);
+	hlist_for_each_entry_rcu(d, n, &c->dc_deviceids[hash], de_node) {
+		if (!memcmp(&d->de_id, &new->de_id, NFS4_PNFS_DEVICEID4_SIZE)) {
+			spin_unlock(&c->dc_lock);
+			dprintk("%s [discard]\n", __func__);
+			c->dc_free_callback(&new->de_kref);
+			return d;
+		}
+	}
+	hlist_add_head_rcu(&new->de_node, &c->dc_deviceids[hash]);
+	spin_unlock(&c->dc_lock);
+	dprintk("%s [new]\n", __func__);
+	return new;
+}
+EXPORT_SYMBOL(nfs4_add_deviceid);
+
+static int
+nfs4_remove_deviceid(struct nfs4_deviceid_cache *c, long hash)
+{
+	struct nfs4_deviceid *d;
+	struct hlist_node *n;
+
+	dprintk("--> %s hash %ld\n", __func__, hash);
+	spin_lock(&c->dc_lock);
+	hlist_for_each_entry_rcu(d, n, &c->dc_deviceids[hash], de_node) {
+		hlist_del_rcu(&d->de_node);
+		spin_unlock(&c->dc_lock);
+		synchronize_rcu();
+		dprintk("%s [%d]\n", __func__,
+			atomic_read(&d->de_kref.refcount));
+		kref_put(&d->de_kref, c->dc_free_callback);
+		return 1;
+	}
+	spin_unlock(&c->dc_lock);
+	return 0;
+}
+
+static void
+nfs4_free_deviceid_cache(struct kref *kref)
+{
+	struct nfs4_deviceid_cache *cache =
+		container_of(kref, struct nfs4_deviceid_cache, dc_kref);
+	long i;
+
+	for (i = 0; i < NFS4_DEVICE_ID_HASH_SIZE; i++)
+		while (nfs4_remove_deviceid(cache, i))
+			;
+	kfree(cache);
+}
+
+void
+nfs4_put_deviceid_cache(struct nfs_client *clp)
+{
+	struct nfs4_deviceid_cache *tmp = clp->cl_devid_cache;
+	int refcount;
+
+	dprintk("--> %s cl_devid_cache %p\n", __func__, clp->cl_devid_cache);
+	spin_lock(&clp->cl_lock);
+	refcount = atomic_read(&clp->cl_devid_cache->dc_kref.refcount);
+	if (refcount == 1)
+		clp->cl_devid_cache = NULL;
+	spin_unlock(&clp->cl_lock);
+	dprintk("%s [%d]\n", __func__, refcount);
+	kref_put(&tmp->dc_kref, nfs4_free_deviceid_cache);
+}
+EXPORT_SYMBOL(nfs4_put_deviceid_cache);
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 7240d7e..a3fa1d2 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -12,6 +12,7 @@
 #ifndef LINUX_NFS4_PNFS_H
 #define LINUX_NFS4_PNFS_H
 
+#include <linux/pnfs_xdr.h>
 
 /* Per-layout driver specific registration structure */
 struct pnfs_layoutdriver_type {
@@ -34,6 +35,49 @@ struct layoutdriver_io_operations {
 struct layoutdriver_policy_operations {
 };
 
+/*
+ * Device ID RCU cache. A device ID is unique per client ID and layout type.
+ */
+#define NFS4_DEVICE_ID_HASH_BITS	5
+#define NFS4_DEVICE_ID_HASH_SIZE	(1 << NFS4_DEVICE_ID_HASH_BITS)
+#define NFS4_DEVICE_ID_HASH_MASK	(NFS4_DEVICE_ID_HASH_SIZE - 1)
+
+static inline u32
+nfs4_deviceid_hash(struct pnfs_deviceid *id)
+{
+	unsigned char *cptr = (unsigned char *)id->data;
+	unsigned int nbytes = NFS4_PNFS_DEVICEID4_SIZE;
+	u32 x = 0;
+
+	while (nbytes--) {
+		x *= 37;
+		x += *cptr++;
+	}
+	return x & NFS4_DEVICE_ID_HASH_MASK;
+}
+
+struct nfs4_deviceid_cache {
+	spinlock_t		dc_lock;
+	struct kref		dc_kref;
+	void			(*dc_free_callback)(struct kref *);
+	struct hlist_head	dc_deviceids[NFS4_DEVICE_ID_HASH_SIZE];
+};
+
+/* Device ID cache node */
+struct nfs4_deviceid {
+	struct hlist_node	de_node;
+	struct pnfs_deviceid	de_id;
+	struct kref		de_kref;
+};
+
+extern int nfs4_alloc_init_deviceid_cache(struct nfs_client *,
+				void (*free_callback)(struct kref *));
+extern void nfs4_put_deviceid_cache(struct nfs_client *);
+extern void nfs4_init_deviceid_node(struct nfs4_deviceid *);
+extern struct nfs4_deviceid *nfs4_find_deviceid(struct nfs4_deviceid_cache *,
+				struct pnfs_deviceid *);
+extern struct nfs4_deviceid *nfs4_add_deviceid(struct nfs4_deviceid_cache *,
+				struct nfs4_deviceid *);
 /* pNFS client callback functions.
  * These operations allow the layout driver to access pNFS client
  * specific information or call pNFS client->server operations.
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index e683128..4544b52 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -82,6 +82,7 @@ struct nfs_client {
 	/* The flags used for obtaining the clientid during EXCHANGE_ID */
 	u32			cl_exchange_flags;
 	struct nfs4_session	*cl_session; 	/* sharred session */
+	struct nfs4_deviceid_cache *cl_devid_cache; /* pNFS deviceid cache */
 #endif /* CONFIG_NFS_V4_1 */
 
 #ifdef CONFIG_NFS_FSCACHE
diff --git a/include/linux/pnfs_xdr.h b/include/linux/pnfs_xdr.h
index bcbfbe0..1decc11 100644
--- a/include/linux/pnfs_xdr.h
+++ b/include/linux/pnfs_xdr.h
@@ -12,5 +12,10 @@
 #ifndef LINUX_PNFS_XDR_H
 #define LINUX_PNFS_XDR_H
 
+#define NFS4_PNFS_DEVICEID4_SIZE 16
+
+struct pnfs_deviceid {
+	char data[NFS4_PNFS_DEVICEID4_SIZE];
+};
 
 #endif /* LINUX_PNFS_XDR_H */
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 13/50] pnfs_submit: introduce nfs4layoutdriver module
  2010-08-13 21:31                       ` [PATCH 12/50] pnfs_submit: generic pnfs deviceid cache andros
@ 2010-08-13 21:31                         ` andros
  2010-08-13 21:31                           ` [PATCH 14/50] pnfs_submit: filelayout data server cache andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Marc Eshel, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Basic initialization and uninitialization of module and mount point.

Define CONFIG_PNFS_FILE_LAYOUT for conditionally building
the nfslayoutdriver.

[extracted from: pnfs: Initial check-in of pNFS File Layout Driver.]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs: device destroy]
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
[pnfs: nfs4filelayoutdev.c remove CONFIG_PNFS]
[pnfs: nfs4filelayout.c return CONFIG_PNFS]
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfslayout: remove CONFIG_NFS_V4_1 altogether]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
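The two mountpoint hooks filled in by this driver are optional from the
generic client's point of view: patch 11 tests for them with
PNFS_EXISTS_LDIO_OP() before calling. A minimal userspace sketch of that
pattern follows. The struct and function names here (struct server,
struct io_ops, demo_uninit, unmount_layoutdriver) are simplified stand-ins
for this example, and both hooks take the same argument type for brevity,
which differs from the real prototypes.

```c
#include <assert.h>
#include <stddef.h>

struct server;				/* stand-in for struct nfs_server */

struct io_ops {
	int (*initialize_mountpoint)(struct server *);
	int (*uninitialize_mountpoint)(struct server *);
};

struct layoutdriver {
	struct io_ops *ld_io_ops;
};

struct server {
	struct layoutdriver *curr_ld;	/* NULL means plain NFSv4 I/O */
	int mounted;
};

/* True only if the driver, its ops table, and the named op are all set. */
#define EXISTS_LDIO_OP(srv, opname) ((srv)->curr_ld &&			\
				     (srv)->curr_ld->ld_io_ops &&	\
				     (srv)->curr_ld->ld_io_ops->opname)

/* Example hook: mark the server as no longer mounted. */
static int demo_uninit(struct server *srv)
{
	srv->mounted = 0;
	return 0;
}

/* Call the optional hook if present; return 1 if it ran, 0 if skipped. */
static int unmount_layoutdriver(struct server *srv)
{
	assert(srv != NULL);
	if (EXISTS_LDIO_OP(srv, uninitialize_mountpoint)) {
		srv->curr_ld->ld_io_ops->uninitialize_mountpoint(srv);
		return 1;
	}
	return 0;
}
```

A server with no layout driver set simply skips the hook, which is the
fallback path to ordinary NFSv4 I/O.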
 fs/nfs/Makefile            |    3 +
 fs/nfs/nfs4filelayout.c    |  124 ++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4filelayout.h    |   19 +++++++
 fs/nfs/nfs4filelayoutdev.c |   60 +++++++++++++++++++++
 include/linux/nfs4.h       |    4 ++
 include/linux/nfs_fs.h     |    1 +
 6 files changed, 211 insertions(+), 0 deletions(-)
 create mode 100644 fs/nfs/nfs4filelayout.c
 create mode 100644 fs/nfs/nfs4filelayout.h
 create mode 100644 fs/nfs/nfs4filelayoutdev.c

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index bb9e773..4776ff9 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -18,3 +18,6 @@ nfs-$(CONFIG_NFS_V4)	+= nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
 nfs-$(CONFIG_NFS_V4_1)	+= pnfs.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
 nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-index.o
+
+obj-$(CONFIG_PNFS_FILE_LAYOUT) += nfs_layout_nfsv41_files.o
+nfs_layout_nfsv41_files-y := nfs4filelayout.o nfs4filelayoutdev.o
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
new file mode 100644
index 0000000..7039d9d
--- /dev/null
+++ b/fs/nfs/nfs4filelayout.c
@@ -0,0 +1,124 @@
+/*
+ *  linux/fs/nfs/nfs4filelayout.c
+ *
+ *  Module for the pnfs nfs4 file layout driver.
+ *  Defines all I/O and Policy interface operations, plus code
+ *  to register itself with the pNFS client.
+ *
+ *  Copyright (c) 2002 The Regents of the University of Michigan.
+ *  All rights reserved.
+ *
+ *  Dean Hildebrand <dhildebz@eecs.umich.edu>
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ *  1. Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *  2. Redistributions in binary form must reproduce the above copyright
+ *     notice, this list of conditions and the following disclaimer in the
+ *     documentation and/or other materials provided with the distribution.
+ *  3. Neither the name of the University nor the names of its
+ *     contributors may be used to endorse or promote products derived
+ *     from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ *  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ *  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ *  DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ *  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ *  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ *  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ *  BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ *  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ *  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ *  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+
+#include <linux/nfs_fs.h>
+#include <linux/nfs_page.h>
+#include <linux/nfs4_pnfs.h>
+
+#include "nfs4filelayout.h"
+#include "nfs4_fs.h"
+#include "internal.h"
+
+#define NFSDBG_FACILITY         NFSDBG_PNFS_LD
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Dean Hildebrand <dhildebz@eecs.umich.edu>");
+MODULE_DESCRIPTION("The NFSv4 file layout driver");
+
+/* Callback operations to the pNFS client */
+struct pnfs_client_operations *pnfs_callback_ops;
+
+int
+filelayout_initialize_mountpoint(struct nfs_client *clp)
+{
+	int status = nfs4_alloc_init_deviceid_cache(clp,
+						nfs4_fl_free_deviceid_callback);
+	if (status) {
+		printk(KERN_WARNING "%s: deviceid cache could not be "
+			"initialized\n", __func__);
+		return status;
+	}
+	dprintk("%s: deviceid cache has been initialized successfully\n",
+		__func__);
+	return 0;
+}
+
+/* Uninitialize a mountpoint by destroying its device list */
+int
+filelayout_uninitialize_mountpoint(struct nfs_server *nfss)
+{
+	dprintk("--> %s\n", __func__);
+
+	if (nfss->pnfs_curr_ld && nfss->nfs_client->cl_devid_cache)
+		nfs4_put_deviceid_cache(nfss->nfs_client);
+	return 0;
+}
+
+struct layoutdriver_io_operations filelayout_io_operations = {
+	.initialize_mountpoint   = filelayout_initialize_mountpoint,
+	.uninitialize_mountpoint = filelayout_uninitialize_mountpoint,
+};
+
+struct layoutdriver_policy_operations filelayout_policy_operations = {
+};
+
+struct pnfs_layoutdriver_type filelayout_type = {
+	.id = LAYOUT_NFSV4_1_FILES,
+	.name = "LAYOUT_NFSV4_1_FILES",
+	.ld_io_ops = &filelayout_io_operations,
+	.ld_policy_ops = &filelayout_policy_operations,
+};
+
+static int __init nfs4filelayout_init(void)
+{
+	printk(KERN_INFO "%s: NFSv4 File Layout Driver Registering...\n",
+	       __func__);
+
+	/*
+	 * Need to register file_operations struct with global list to indicate
+	 * that NFS4 file layout is a possible pNFS I/O module
+	 */
+	pnfs_callback_ops = pnfs_register_layoutdriver(&filelayout_type);
+
+	return 0;
+}
+
+static void __exit nfs4filelayout_exit(void)
+{
+	printk(KERN_INFO "%s: NFSv4 File Layout Driver Unregistering...\n",
+	       __func__);
+
+	/* Unregister NFS4 file layout driver with pNFS client */
+	pnfs_unregister_layoutdriver(&filelayout_type);
+}
+
+module_init(nfs4filelayout_init);
+module_exit(nfs4filelayout_exit);
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
new file mode 100644
index 0000000..4af4c54
--- /dev/null
+++ b/fs/nfs/nfs4filelayout.h
@@ -0,0 +1,19 @@
+/*
+ *  pnfs_nfs4filelayout.h
+ *
+ *  NFSv4 file layout driver data structures.
+ *
+ *  Copyright (c) 2002 The Regents of the University of Michigan.
+ *  All rights reserved.
+ *
+ *  Dean Hildebrand   <dhildebz@eecs.umich.edu>
+ */
+
+#ifndef FS_NFS_NFS4FILELAYOUT_H
+#define FS_NFS_NFS4FILELAYOUT_H
+
+extern void nfs4_fl_free_deviceid_callback(struct kref *);
+
+extern struct pnfs_client_operations *pnfs_callback_ops;
+
+#endif /* FS_NFS_NFS4FILELAYOUT_H */
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
new file mode 100644
index 0000000..0aee56b
--- /dev/null
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -0,0 +1,60 @@
+/*
+ *  linux/fs/nfs/nfs4filelayoutdev.c
+ *
+ *  Device operations for the pnfs nfs4 file layout driver.
+ *
+ *  Copyright (c) 2002 The Regents of the University of Michigan.
+ *  All rights reserved.
+ *
+ *  Dean Hildebrand <dhildebz@eecs.umich.edu>
+ *  Garth Goodson   <Garth.Goodson@netapp.com>
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ *  1. Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *  2. Redistributions in binary form must reproduce the above copyright
+ *     notice, this list of conditions and the following disclaimer in the
+ *     documentation and/or other materials provided with the distribution.
+ *  3. Neither the name of the University nor the names of its
+ *     contributors may be used to endorse or promote products derived
+ *     from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ *  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ *  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ *  DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ *  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ *  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ *  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ *  BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ *  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ *  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ *  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/hash.h>
+
+#include <linux/nfs4.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_xdr.h>
+
+#include <asm/div64.h>
+
+#include <linux/utsname.h>
+#include <linux/vmalloc.h>
+#include <linux/nfs4_pnfs.h>
+#include <linux/pnfs_xdr.h>
+#include "nfs4filelayout.h"
+#include "internal.h"
+#include "nfs4_fs.h"
+
+#define NFSDBG_FACILITY         NFSDBG_PNFS_LD
+
+void
+nfs4_fl_free_deviceid_callback(struct kref *kref)
+{
+}
+
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 1598e7b..010c3ba 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -551,6 +551,10 @@ enum state_protect_how4 {
 	SP4_SSV		= 2
 };
 
+enum pnfs_layouttype {
+	LAYOUT_NFSV4_1_FILES  = 1,
+};
+
 #endif
 #endif
 
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 042c2bd..a0f49a3 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -614,6 +614,7 @@ extern void * nfs_root_data(void);
 #define NFSDBG_MOUNT		0x0400
 #define NFSDBG_FSCACHE		0x0800
 #define NFSDBG_PNFS		0x1000
+#define NFSDBG_PNFS_LD		0x2000
 #define NFSDBG_ALL		0xFFFF
 
 #ifdef __KERNEL__
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 14/50] pnfs_submit: filelayout data server cache
  2010-08-13 21:31                         ` [PATCH 13/50] pnfs_submit: introduce nfs4layoutdriver module andros
@ 2010-08-13 21:31                           ` andros
  2010-08-13 21:31                             ` [PATCH 15/50] pnfs_submit: filelayout deviceid cache andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
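nfs4_pnfs_ds_add() allocates a candidate entry before taking the lock,
looks up ip:port under the lock, and frees the candidate if another
caller got there first. A minimal userspace sketch of that lookup-or-add
pattern is below. It is an illustration under assumptions: the list is a
plain singly linked list, the lock is a toy C11 atomic_flag spinlock, and
the count is a plain int protected by that lock rather than an atomic_t.
All names are chosen for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct ds {
	struct ds *next;
	uint32_t ip_addr;
	uint32_t port;
	int count;			/* references held on this entry */
};

static struct ds *ds_cache;				/* list head */
static atomic_flag ds_cache_lock = ATOMIC_FLAG_INIT;	/* toy spinlock */

static void lock(void)
{
	while (atomic_flag_test_and_set(&ds_cache_lock))
		;
}

static void unlock(void)
{
	atomic_flag_clear(&ds_cache_lock);
}

/* Caller holds ds_cache_lock. */
static struct ds *ds_lookup(uint32_t ip_addr, uint32_t port)
{
	struct ds *ds;

	for (ds = ds_cache; ds; ds = ds->next)
		if (ds->ip_addr == ip_addr && ds->port == port)
			return ds;
	return NULL;
}

/* Return the cached entry for ip:port, adding it on first use. */
static struct ds *ds_add(uint32_t ip_addr, uint32_t port)
{
	struct ds *found, *spare;

	spare = calloc(1, sizeof(*spare));	/* allocate before locking */
	if (!spare)
		return NULL;

	lock();
	found = ds_lookup(ip_addr, port);
	if (!found) {
		spare->ip_addr = ip_addr;
		spare->port = port;
		spare->count = 1;
		spare->next = ds_cache;
		ds_cache = spare;
		unlock();
		return spare;
	}
	assert(found->count > 0);
	found->count++;				/* share the existing entry */
	unlock();
	free(spare);				/* candidate was not needed */
	return found;
}
```

Allocating first keeps the allocation, which may sleep in the kernel, out
of the spinlocked region at the cost of an occasional wasted allocation.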
 fs/nfs/client.c            |    1 +
 fs/nfs/nfs4filelayout.h    |   29 ++++++++++++-
 fs/nfs/nfs4filelayoutdev.c |   98 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 6560866..2e440b6 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -253,6 +253,7 @@ void nfs_put_client(struct nfs_client *clp)
 		nfs_free_client(clp);
 	}
 }
+EXPORT_SYMBOL(nfs_put_client);
 
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 /*
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 4af4c54..f272d0f 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -12,8 +12,35 @@
 #ifndef FS_NFS_NFS4FILELAYOUT_H
 #define FS_NFS_NFS4FILELAYOUT_H
 
-extern void nfs4_fl_free_deviceid_callback(struct kref *);
+#include <linux/kref.h>
+#include <linux/nfs4_pnfs.h>
+#include <linux/pnfs_xdr.h>
+
+#define NFS4_PNFS_DEV_HASH_BITS 5
+#define NFS4_PNFS_DEV_HASH_SIZE (1 << NFS4_PNFS_DEV_HASH_BITS)
+#define NFS4_PNFS_DEV_HASH_MASK (NFS4_PNFS_DEV_HASH_SIZE - 1)
+
+/* Individual ip address */
+struct nfs4_pnfs_ds {
+	struct list_head	ds_node;  /* nfs4_pnfs_dev_hlist dev_dslist */
+	u32 			ds_ip_addr;
+	u32 			ds_port;
+	struct nfs_client	*ds_clp;
+	atomic_t		ds_count;
+	char r_addr[29];
+};
+
+struct nfs4_file_layout_dsaddr {
+	struct nfs4_deviceid	deviceid;
+	u32 			stripe_count;
+	u8			*stripe_indices;
+	u32			ds_num;
+	struct nfs4_pnfs_ds	*ds_list[1];
+};
+
 
 extern struct pnfs_client_operations *pnfs_callback_ops;
 
+extern void nfs4_fl_free_deviceid_callback(struct kref *);
+extern void print_ds(struct nfs4_pnfs_ds *ds);
 #endif /* FS_NFS_NFS4FILELAYOUT_H */
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 0aee56b..c8b8ca7 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -51,10 +51,106 @@
 #include "internal.h"
 #include "nfs4_fs.h"
 
-#define NFSDBG_FACILITY         NFSDBG_PNFS_LD
+#define NFSDBG_FACILITY		NFSDBG_PNFS_LD
+
+DEFINE_SPINLOCK(nfs4_ds_cache_lock);
+static LIST_HEAD(nfs4_data_server_cache);
+
+void
+print_ds(struct nfs4_pnfs_ds *ds)
+{
+	if (ds == NULL) {
+		dprintk("%s NULL device\n", __func__);
+		return;
+	}
+	dprintk("        ip_addr %x\n", ntohl(ds->ds_ip_addr));
+	dprintk("        port %hu\n", ntohs(ds->ds_port));
+	dprintk("        client %p\n", ds->ds_clp);
+	dprintk("        ref count %d\n", atomic_read(&ds->ds_count));
+	if (ds->ds_clp)
+		dprintk("        cl_exchange_flags %x\n",
+					    ds->ds_clp->cl_exchange_flags);
+	dprintk("        ip:port %s\n", ds->r_addr);
+}
+
+void
+print_ds_list(struct nfs4_file_layout_dsaddr *dsaddr)
+{
+	int i;
+
+	dprintk("%s dsaddr->ds_num %d\n", __func__,
+		dsaddr->ds_num);
+	for (i = 0; i < dsaddr->ds_num; i++)
+		print_ds(dsaddr->ds_list[i]);
+}
+
+/* nfs4_ds_cache_lock is held */
+static inline struct nfs4_pnfs_ds *
+_data_server_lookup(u32 ip_addr, u32 port)
+{
+	struct nfs4_pnfs_ds *ds;
+
+	dprintk("_data_server_lookup: ip_addr=%x port=%hu\n",
+			ntohl(ip_addr), ntohs(port));
+
+	list_for_each_entry(ds, &nfs4_data_server_cache, ds_node) {
+		if (ds->ds_ip_addr == ip_addr &&
+		    ds->ds_port == port) {
+			return ds;
+		}
+	}
+	return NULL;
+}
+
+static void
+destroy_ds(struct nfs4_pnfs_ds *ds)
+{
+	dprintk("--> %s\n", __func__);
+	print_ds(ds);
+
+	if (ds->ds_clp)
+		nfs_put_client(ds->ds_clp);
+	kfree(ds);
+}
 
 void
 nfs4_fl_free_deviceid_callback(struct kref *kref)
 {
 }
 
+static void
+nfs4_pnfs_ds_add(struct inode *inode, struct nfs4_pnfs_ds **dsp,
+		 u32 ip_addr, u32 port, char *r_addr, int len)
+{
+	struct nfs4_pnfs_ds *tmp_ds, *ds;
+
+	*dsp = NULL;
+
+	ds = kzalloc(sizeof(*tmp_ds), GFP_KERNEL);
+	if (!ds)
+		return;
+
+	spin_lock(&nfs4_ds_cache_lock);
+	tmp_ds = _data_server_lookup(ip_addr, port);
+	if (tmp_ds == NULL) {
+		ds->ds_ip_addr = ip_addr;
+		ds->ds_port = port;
+		strncpy(ds->r_addr, r_addr, len);
+		atomic_set(&ds->ds_count, 1);
+		INIT_LIST_HEAD(&ds->ds_node);
+		ds->ds_clp = NULL;
+		list_add(&ds->ds_node, &nfs4_data_server_cache);
+		*dsp = ds;
+		dprintk("%s add new data server ip 0x%x\n", __func__,
+				ds->ds_ip_addr);
+		spin_unlock(&nfs4_ds_cache_lock);
+	} else {
+		atomic_inc(&tmp_ds->ds_count);
+		*dsp = tmp_ds;
+		dprintk("%s data server found ip 0x%x, inc'ed ds_count to %d\n",
+				__func__, tmp_ds->ds_ip_addr,
+				atomic_read(&tmp_ds->ds_count));
+		spin_unlock(&nfs4_ds_cache_lock);
+		kfree(ds);
+	}
+}
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 15/50] pnfs_submit: filelayout deviceid cache
  2010-08-13 21:31                           ` [PATCH 14/50] pnfs_submit: filelayout data server cache andros
@ 2010-08-13 21:31                             ` andros
  2010-08-13 21:31                               ` [PATCH 16/50] pnfs_submit: generic getdeviceinfo andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
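The free callback in this patch receives only a pointer to the embedded
kref and recovers the file layout structure with two container_of()
steps. A minimal userspace sketch of that walk is below, assuming
simplified stand-in types (struct kref_like, struct deviceid_node,
struct dsaddr) and an offsetof-based container_of rather than the
kernel's own macro.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct kref_like {			/* stand-in for struct kref */
	int refcount;
};

struct deviceid_node {			/* stand-in for struct nfs4_deviceid */
	char id[16];
	struct kref_like de_kref;
};

struct dsaddr {				/* stand-in for nfs4_file_layout_dsaddr */
	struct deviceid_node deviceid;
	unsigned int ds_num;
};

/* Walk outward from the embedded refcount to the driver's own structure. */
static struct dsaddr *dsaddr_from_kref(struct kref_like *kref)
{
	struct deviceid_node *node;

	assert(kref != NULL);
	node = container_of(kref, struct deviceid_node, de_kref);
	return container_of(node, struct dsaddr, deviceid);
}
```

This is why the generic cache can stay layout-agnostic: it only knows
about the embedded node, and each driver maps it back to its own type.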
 fs/nfs/nfs4filelayout.h    |    3 ++
 fs/nfs/nfs4filelayoutdev.c |   57 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index f272d0f..39dcdba 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -43,4 +43,7 @@ extern struct pnfs_client_operations *pnfs_callback_ops;
 
 extern void nfs4_fl_free_deviceid_callback(struct kref *);
 extern void print_ds(struct nfs4_pnfs_ds *ds);
+char *deviceid_fmt(const struct pnfs_deviceid *dev_id);
+extern struct nfs4_file_layout_dsaddr *
+nfs4_pnfs_device_item_find(struct nfs_client *, struct pnfs_deviceid *dev_id);
 #endif /* FS_NFS_NFS4FILELAYOUT_H */
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index c8b8ca7..0da1510 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -84,6 +84,21 @@ print_ds_list(struct nfs4_file_layout_dsaddr *dsaddr)
 		print_ds(dsaddr->ds_list[i]);
 }
 
+/* Debugging function assuming a 64bit major/minor split of the deviceid */
+char *
+deviceid_fmt(const struct pnfs_deviceid *dev_id)
+{
+	static char buf[42];	/* two u64 in decimal, a space, and NUL */
+	uint32_t *p = (uint32_t *)dev_id->data;
+	uint64_t major, minor;
+
+	p = xdr_decode_hyper(p, &major);
+	p = xdr_decode_hyper(p, &minor);
+
+	sprintf(buf, "%08llu %08llu", major, minor);
+	return buf;
+}
+
 /* nfs4_ds_cache_lock is held */
 static inline struct nfs4_pnfs_ds *
 _data_server_lookup(u32 ip_addr, u32 port)
@@ -113,9 +128,39 @@ destroy_ds(struct nfs4_pnfs_ds *ds)
 	kfree(ds);
 }
 
+static void
+nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
+{
+	struct nfs4_pnfs_ds *ds;
+	int i;
+
+	dprintk("%s: device id=%s\n", __func__,
+		deviceid_fmt(&dsaddr->deviceid.de_id));
+
+	for (i = 0; i < dsaddr->ds_num; i++) {
+		ds = dsaddr->ds_list[i];
+		if (ds != NULL) {
+			if (atomic_dec_and_lock(&ds->ds_count,
+						&nfs4_ds_cache_lock)) {
+				list_del_init(&ds->ds_node);
+				spin_unlock(&nfs4_ds_cache_lock);
+				destroy_ds(ds);
+			}
+		}
+	}
+	kfree(dsaddr->stripe_indices);
+	kfree(dsaddr);
+}
+
 void
 nfs4_fl_free_deviceid_callback(struct kref *kref)
 {
+	struct nfs4_deviceid *device =
+		container_of(kref, struct nfs4_deviceid, de_kref);
+	struct nfs4_file_layout_dsaddr *dsaddr =
+		container_of(device, struct nfs4_file_layout_dsaddr, deviceid);
+
+	nfs4_fl_free_deviceid(dsaddr);
 }
 
 static void
@@ -154,3 +199,15 @@ nfs4_pnfs_ds_add(struct inode *inode, struct nfs4_pnfs_ds **dsp,
 		kfree(ds);
 	}
 }
+
+struct nfs4_file_layout_dsaddr *
+nfs4_pnfs_device_item_find(struct nfs_client *clp, struct pnfs_deviceid *id)
+{
+	struct nfs4_deviceid *d;
+
+	d = nfs4_find_deviceid(clp->cl_devid_cache, id);
+	dprintk("%s device id (%s) nfs4_deviceid %p\n", __func__,
+		deviceid_fmt(id), d);
+	return (d == NULL) ? NULL :
+		container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
+}
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 16/50] pnfs_submit: generic getdeviceinfo
  2010-08-13 21:31                             ` [PATCH 15/50] pnfs_submit: filelayout deviceid cache andros
@ 2010-08-13 21:31                               ` andros
  2010-08-13 21:31                                 ` [PATCH 17/50] pnfs_submit: filelayout getdeviceinfo andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
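The length reserved by encode_getdeviceinfo() is 16 bytes of fixed words
(opcode, layout type, maxcount, bitmap length), the 16-byte device ID,
and 4 more bytes only when a notification bitmap is sent. A minimal
userspace sketch of that wire layout follows. The helper put_be32 and the
flat parameter list are assumptions made for this example; the opcode
value 47 is the one RFC 5661 assigns to GETDEVICEINFO, and maxcount is
passed in directly rather than derived from pglen as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEVICEID_SIZE		16	/* mirrors NFS4_PNFS_DEVICEID4_SIZE */
#define OP_GETDEVICEINFO	47	/* opcode value from RFC 5661 */

/* Store one 32-bit word in XDR (big-endian) order and advance. */
static unsigned char *put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
	return p + 4;
}

/*
 * Encode the GETDEVICEINFO arguments into buf and return the byte count:
 * opcode, fixed 16-byte device ID, layout type, maxcount, bitmap length,
 * and one bitmap word only when notify_types is non-zero.
 */
static size_t encode_getdeviceinfo(unsigned char *buf, size_t buflen,
				   const unsigned char *dev_id,
				   uint32_t layout_type, uint32_t maxcount,
				   uint32_t notify_types)
{
	int has_bitmap = (notify_types != 0);
	size_t len = 16 + DEVICEID_SIZE + (size_t)(has_bitmap * 4);
	unsigned char *p = buf;

	assert(buflen >= len);
	p = put_be32(p, OP_GETDEVICEINFO);
	memcpy(p, dev_id, DEVICEID_SIZE);	/* fixed opaque: no length word */
	p += DEVICEID_SIZE;
	p = put_be32(p, layout_type);
	p = put_be32(p, maxcount);
	p = put_be32(p, (uint32_t)has_bitmap);	/* bitmap length, 0 or 1 */
	if (has_bitmap)
		p = put_be32(p, notify_types);
	return (size_t)(p - buf);
}
```

With no notification types requested the request body is 32 bytes; with
a bitmap word it grows to 36.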
 fs/nfs/nfs4proc.c         |   26 ++++++++
 fs/nfs/nfs4xdr.c          |  154 +++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/pnfs.c             |    6 ++
 fs/nfs/pnfs.h             |    3 +
 include/linux/nfs4.h      |    1 +
 include/linux/nfs4_pnfs.h |   13 ++++
 include/linux/pnfs_xdr.h  |   10 +++
 7 files changed, 213 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 45d6526..9eebe46 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -49,12 +49,15 @@
 #include <linux/mount.h>
 #include <linux/module.h>
 #include <linux/sunrpc/bc_xprt.h>
+#include <linux/pnfs_xdr.h>
+#include <linux/nfs4_pnfs.h>
 
 #include "nfs4_fs.h"
 #include "delegation.h"
 #include "internal.h"
 #include "iostat.h"
 #include "callback.h"
+#include "pnfs.h"
 
 #define NFSDBG_FACILITY		NFSDBG_PROC
 
@@ -5353,6 +5356,29 @@ out:
 	dprintk("<-- %s status=%d\n", __func__, status);
 	return status;
 }
+
+int nfs4_pnfs_getdeviceinfo(struct nfs_server *server, struct pnfs_device *pdev)
+{
+	struct nfs4_pnfs_getdeviceinfo_arg args = {
+		.pdev = pdev,
+	};
+	struct nfs4_pnfs_getdeviceinfo_res res = {
+		.pdev = pdev,
+	};
+	struct rpc_message msg = {
+		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_PNFS_GETDEVICEINFO],
+		.rpc_argp = &args,
+		.rpc_resp = &res,
+	};
+	int status;
+
+	dprintk("--> %s\n", __func__);
+	status = nfs4_call_sync(server, &msg, &args, &res, 0);
+	dprintk("<-- %s status=%d\n", __func__, status);
+
+	return status;
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 struct nfs4_state_recovery_ops nfs40_reboot_recovery_ops = {
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 075845d..58bb81e 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -50,8 +50,11 @@
 #include <linux/nfs4.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_idmap.h>
+#include <linux/nfs4_pnfs.h>
+#include <linux/pnfs_xdr.h>
 #include "nfs4_fs.h"
 #include "internal.h"
+#include "pnfs.h"
 
 #define NFSDBG_FACILITY		NFSDBG_XDR
 
@@ -310,6 +313,13 @@ static int nfs4_stat_to_errno(int);
 				XDR_QUADLEN(NFS4_MAX_SESSIONID_LEN) + 5)
 #define encode_reclaim_complete_maxsz	(op_encode_hdr_maxsz + 4)
 #define decode_reclaim_complete_maxsz	(op_decode_hdr_maxsz + 4)
+#define encode_getdeviceinfo_maxsz (op_encode_hdr_maxsz + 4 + \
+				XDR_QUADLEN(NFS4_PNFS_DEVICEID4_SIZE))
+#define decode_getdeviceinfo_maxsz (op_decode_hdr_maxsz + \
+				4 /*layout type */ + \
+				4 /* opaque devaddr4 length */ +\
+				4 /* notification bitmap length */ + \
+				4 /* notification bitmap */)
 #else /* CONFIG_NFS_V4_1 */
 #define encode_sequence_maxsz	0
 #define decode_sequence_maxsz	0
@@ -699,6 +709,12 @@ static int nfs4_stat_to_errno(int);
 #define NFS4_dec_reclaim_complete_sz	(compound_decode_hdr_maxsz + \
 					 decode_sequence_maxsz + \
 					 decode_reclaim_complete_maxsz)
+#define NFS4_enc_getdeviceinfo_sz (compound_encode_hdr_maxsz +    \
+				encode_sequence_maxsz +\
+				encode_getdeviceinfo_maxsz)
+#define NFS4_dec_getdeviceinfo_sz (compound_decode_hdr_maxsz +    \
+				decode_sequence_maxsz + \
+				decode_getdeviceinfo_maxsz)
 
 const u32 nfs41_maxwrite_overhead = ((RPC_MAX_HEADER_WITH_AUTH +
 				      compound_encode_hdr_maxsz +
@@ -1726,6 +1742,30 @@ static void encode_sequence(struct xdr_stream *xdr,
 #endif /* CONFIG_NFS_V4_1 */
 }
 
+#ifdef CONFIG_NFS_V4_1
+static void
+encode_getdeviceinfo(struct xdr_stream *xdr,
+		     const struct nfs4_pnfs_getdeviceinfo_arg *args,
+		     struct compound_hdr *hdr)
+{
+	int has_bitmap = (args->pdev->dev_notify_types != 0);
+	int len = 16 + NFS4_PNFS_DEVICEID4_SIZE + (has_bitmap * 4);
+	__be32 *p;
+
+	p = reserve_space(xdr, len);
+	*p++ = cpu_to_be32(OP_GETDEVICEINFO);
+	p = xdr_encode_opaque_fixed(p, args->pdev->dev_id.data,
+				    NFS4_PNFS_DEVICEID4_SIZE);
+	*p++ = cpu_to_be32(args->pdev->layout_type);
+	*p++ = cpu_to_be32(args->pdev->pglen + len);	/* gdia_maxcount */
+	*p++ = cpu_to_be32(has_bitmap);			/* bitmap length [01] */
+	if (has_bitmap)
+		*p = cpu_to_be32(args->pdev->dev_notify_types);
+	hdr->nops++;
+}
+
+#endif /* CONFIG_NFS_V4_1 */
+
 /*
  * END OF "GENERIC" ENCODE ROUTINES.
  */
@@ -2543,6 +2583,38 @@ static int nfs4_xdr_enc_reclaim_complete(struct rpc_rqst *req, uint32_t *p,
 	return 0;
 }
 
+/*
+ * Encode GETDEVICEINFO request
+ */
+static int nfs4_xdr_enc_getdeviceinfo(struct rpc_rqst *req, uint32_t *p,
+				      struct nfs4_pnfs_getdeviceinfo_arg *args)
+{
+	struct xdr_stream xdr;
+	struct rpc_auth *auth = req->rq_task->tk_msg.rpc_cred->cr_auth;
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+	int replen;
+
+	xdr_init_encode(&xdr, &req->rq_snd_buf, p);
+	encode_compound_hdr(&xdr, req, &hdr);
+	encode_sequence(&xdr, &args->seq_args, &hdr);
+	encode_getdeviceinfo(&xdr, args, &hdr);
+
+	/* set up reply kvec. Subtract notification bitmap max size (8)
+	 * so that notification bitmap is put in xdr_buf tail */
+	replen = (RPC_REPHDRSIZE + auth->au_rslack +
+		  NFS4_dec_getdeviceinfo_sz - 8) << 2;
+	xdr_inline_pages(&req->rq_rcv_buf, replen, args->pdev->pages,
+			 args->pdev->pgbase, args->pdev->pglen);
+	dprintk("%s: inlined page args = (%u, %p, %u, %u)\n",
+		__func__, replen, args->pdev->pages,
+		args->pdev->pgbase, args->pdev->pglen);
+
+	encode_nops(&hdr);
+	return 0;
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 static void print_overflow_msg(const char *func, const struct xdr_stream *xdr)
@@ -4791,6 +4863,64 @@ out_overflow:
 #endif /* CONFIG_NFS_V4_1 */
 }
 
+#if defined(CONFIG_NFS_V4_1)
+
+static int decode_getdeviceinfo(struct xdr_stream *xdr,
+				struct pnfs_device *pdev)
+{
+	__be32 *p;
+	uint32_t len, type;
+	int status;
+
+	status = decode_op_hdr(xdr, OP_GETDEVICEINFO);
+	if (status) {
+		if (status == -ETOOSMALL) {
+			p = xdr_inline_decode(xdr, 4);
+			if (unlikely(!p))
+				goto out_overflow;
+			pdev->mincount = be32_to_cpup(p);
+			dprintk("%s: Min count too small. mincnt = %u\n",
+				__func__, pdev->mincount);
+		}
+		return status;
+	}
+
+	p = xdr_inline_decode(xdr, 8);
+	if (unlikely(!p))
+		goto out_overflow;
+	type = be32_to_cpup(p++);
+	if (type != pdev->layout_type) {
+		dprintk("%s: layout mismatch req: %u pdev: %u\n",
+			__func__, pdev->layout_type, type);
+		return -EINVAL;
+	}
+	/*
+	 * Get the length of the opaque device_addr4. xdr_read_pages places
+	 * the opaque device_addr4 in the xdr_buf->pages (pnfs_device->pages)
+	 * and places the remaining xdr data in xdr_buf->tail
+	 */
+	pdev->mincount = be32_to_cpup(p);
+	xdr_read_pages(xdr, pdev->mincount); /* include space for the length */
+
+	/* At most one bitmap word */
+	p = xdr_inline_decode(xdr, 4);
+	if (unlikely(!p))
+		goto out_overflow;
+	len = be32_to_cpup(p);
+	if (len) {
+		p = xdr_inline_decode(xdr, 4);
+		if (unlikely(!p))
+			goto out_overflow;
+		pdev->dev_notify_types = be32_to_cpup(p);
+	} else
+		pdev->dev_notify_types = 0;
+	return 0;
+out_overflow:
+	print_overflow_msg(__func__, xdr);
+	return -EIO;
+}
+#endif /* CONFIG_NFS_V4_1 */
+
 /*
  * END OF "GENERIC" DECODE ROUTINES.
  */
@@ -5818,6 +5948,29 @@ static int nfs4_xdr_dec_reclaim_complete(struct rpc_rqst *rqstp, uint32_t *p,
 		status = decode_reclaim_complete(&xdr, (void *)NULL);
 	return status;
 }
+
+/*
+ * Decode GETDEVICEINFO response
+ */
+static int nfs4_xdr_dec_getdeviceinfo(struct rpc_rqst *rqstp, uint32_t *p,
+				      struct nfs4_pnfs_getdeviceinfo_res *res)
+{
+	struct xdr_stream xdr;
+	struct compound_hdr hdr;
+	int status;
+
+	xdr_init_decode(&xdr, &rqstp->rq_rcv_buf, p);
+	status = decode_compound_hdr(&xdr, &hdr);
+	if (status != 0)
+		goto out;
+	status = decode_sequence(&xdr, &res->seq_res, rqstp);
+	if (status != 0)
+		goto out;
+	status = decode_getdeviceinfo(&xdr, res->pdev);
+out:
+	return status;
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 __be32 *nfs4_decode_dirent(__be32 *p, struct nfs_entry *entry, int plus)
@@ -5996,6 +6149,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
   PROC(SEQUENCE,	enc_sequence,	dec_sequence),
   PROC(GET_LEASE_TIME,	enc_get_lease_time,	dec_get_lease_time),
   PROC(RECLAIM_COMPLETE, enc_reclaim_complete,  dec_reclaim_complete),
+  PROC(PNFS_GETDEVICEINFO, enc_getdeviceinfo, dec_getdeviceinfo),
 #endif /* CONFIG_NFS_V4_1 */
 };
 
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index e17835e..dcede52 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -191,6 +191,12 @@ pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
 	}
 }
 
+/* Callback operations for layout drivers.
+ */
+struct pnfs_client_operations pnfs_ops = {
+	.nfs_getdeviceinfo = nfs4_pnfs_getdeviceinfo,
+};
+
 EXPORT_SYMBOL(pnfs_unregister_layoutdriver);
 EXPORT_SYMBOL(pnfs_register_layoutdriver);
 
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 3561fa8..f9fb58b 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -21,6 +21,9 @@
 #include <linux/nfs_iostat.h>
 #include "iostat.h"
 
+/* nfs4proc.c */
+extern int nfs4_pnfs_getdeviceinfo(struct nfs_server *server,
+				   struct pnfs_device *dev);
 /* pnfs.c */
 void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
 void unmount_pnfs_layoutdriver(struct nfs_server *);
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 010c3ba..c0c8cba 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -533,6 +533,7 @@ enum {
 	NFSPROC4_CLNT_SEQUENCE,
 	NFSPROC4_CLNT_GET_LEASE_TIME,
 	NFSPROC4_CLNT_RECLAIM_COMPLETE,
+	NFSPROC4_CLNT_PNFS_GETDEVICEINFO,
 };
 
 /* nfs41 types */
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index a3fa1d2..dee53f2 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -35,6 +35,17 @@ struct layoutdriver_io_operations {
 struct layoutdriver_policy_operations {
 };
 
+struct pnfs_device {
+	struct pnfs_deviceid dev_id;
+	unsigned int  layout_type;
+	unsigned int  mincount;
+	struct page **pages;
+	void          *area;
+	unsigned int  pgbase;
+	unsigned int  pglen;
+	unsigned int  dev_notify_types;
+};
+
 /*
  * Device ID RCU cache. A device ID is unique per client ID and layout type.
  */
@@ -84,6 +95,8 @@ extern struct nfs4_deviceid *nfs4_add_deviceid(struct nfs4_deviceid_cache *,
  * E.g., getdeviceinfo, I/O callbacks, etc
  */
 struct pnfs_client_operations {
+	int (*nfs_getdeviceinfo) (struct nfs_server *,
+				  struct pnfs_device *dev);
 };
 
 extern struct pnfs_client_operations pnfs_ops;
diff --git a/include/linux/pnfs_xdr.h b/include/linux/pnfs_xdr.h
index 1decc11..458ff69 100644
--- a/include/linux/pnfs_xdr.h
+++ b/include/linux/pnfs_xdr.h
@@ -18,4 +18,14 @@ struct pnfs_deviceid {
 	char data[NFS4_PNFS_DEVICEID4_SIZE];
 };
 
+struct nfs4_pnfs_getdeviceinfo_arg {
+	struct pnfs_device *pdev;
+	struct nfs4_sequence_args seq_args;
+};
+
+struct nfs4_pnfs_getdeviceinfo_res {
+	struct pnfs_device *pdev;
+	struct nfs4_sequence_res seq_res;
+};
+
 #endif /* LINUX_PNFS_XDR_H */
-- 
1.6.2.5


* [PATCH 17/50] pnfs_submit: filelayout getdeviceinfo
  2010-08-13 21:31                               ` [PATCH 16/50] pnfs_submit: generic getdeviceinfo andros
@ 2010-08-13 21:31                                 ` andros
  2010-08-13 21:31                                   ` [PATCH 18/50] pnfs-submit: change stateid to be a union andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
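Note for reviewers: decode_and_add_ds() below turns the r_addr universal
address string ("h1.h2.h3.h4.p1.p2") into an IPv4 address and port. The
following is a minimal user-space sketch of that conversion, not the code in
this patch. The helper name parse_uaddr() is hypothetical, the result is kept
in host byte order, and, unlike the patch, the sketch checks the sscanf()
return value and the range of each field.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical user-space helper (not part of the patch): parse a
 * universal address "h1.h2.h3.h4.p1.p2" into a host-order IPv4
 * address and port. Returns 0 on success, -1 on malformed input.
 */
static int parse_uaddr(const char *r_addr, uint32_t *ip, uint16_t *port)
{
	unsigned int t[6];
	int i;

	/* Refuse the input unless all six decimal fields were parsed. */
	if (sscanf(r_addr, "%u.%u.%u.%u.%u.%u",
		   &t[0], &t[1], &t[2], &t[3], &t[4], &t[5]) != 6)
		return -1;

	/* Each field is one octet. */
	for (i = 0; i < 6; i++)
		if (t[i] > 255)
			return -1;

	*ip = (t[0] << 24) | (t[1] << 16) | (t[2] << 8) | t[3];
	*port = (uint16_t)((t[4] << 8) | t[5]);
	return 0;
}
```

With this sketch, "192.168.1.10.8.1" yields address 192.168.1.10 and port
8 * 256 + 1 = 2049, while a string with fewer than six fields is rejected.
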
 fs/nfs/nfs4filelayout.h    |    6 +
 fs/nfs/nfs4filelayoutdev.c |  243 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 249 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 39dcdba..be3c9fe 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -20,6 +20,9 @@
 #define NFS4_PNFS_DEV_HASH_SIZE (1 << NFS4_PNFS_DEV_HASH_BITS)
 #define NFS4_PNFS_DEV_HASH_MASK (NFS4_PNFS_DEV_HASH_SIZE - 1)
 
+#define NFS4_PNFS_MAX_STRIPE_CNT 4096
+#define NFS4_PNFS_MAX_MULTI_CNT  64 /* 256 fit into a u8 stripe_index */
+
 /* Individual ip address */
 struct nfs4_pnfs_ds {
 	struct list_head	ds_node;  /* nfs4_pnfs_dev_hlist dev_dslist */
@@ -46,4 +49,7 @@ extern void print_ds(struct nfs4_pnfs_ds *ds);
 char *deviceid_fmt(const struct pnfs_deviceid *dev_id);
 extern struct nfs4_file_layout_dsaddr *
 nfs4_pnfs_device_item_find(struct nfs_client *, struct pnfs_deviceid *dev_id);
+struct nfs4_file_layout_dsaddr *
+get_device_info(struct inode *inode, struct pnfs_deviceid *dev_id);
+
 #endif /* FS_NFS_NFS4FILELAYOUT_H */
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 0da1510..f7614f6 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -200,6 +200,249 @@ nfs4_pnfs_ds_add(struct inode *inode, struct nfs4_pnfs_ds **dsp,
 	}
 }
 
+static struct nfs4_pnfs_ds *
+decode_and_add_ds(uint32_t **pp, struct inode *inode)
+{
+	struct nfs4_pnfs_ds *ds = NULL;
+	char r_addr[29]; /* max size of ip/port string */
+	int len;
+	u32 ip_addr, port;
+	int tmp[6];
+	uint32_t *p = *pp;
+
+	dprintk("%s enter\n", __func__);
+	/* check and skip r_netid */
+	len = be32_to_cpup(p++);
+	/* "tcp" */
+	if (len != 3) {
+		printk("%s: ERROR: non TCP r_netid len %d\n",
+			__func__, len);
+		goto out_err;
+	}
+	/*
+	 * Read the bytes into a temporary buffer
+	 * XXX: should probably sanity check them
+	 */
+	tmp[0] = be32_to_cpup(p++);
+
+	len = be32_to_cpup(p++);
+	if (len >= sizeof(r_addr)) {
+		printk("%s: ERROR: Device ip/port too long (%d)\n",
+			__func__, len);
+		goto out_err;
+	}
+	memcpy(r_addr, p, len);
+	p += XDR_QUADLEN(len);
+	*pp = p;
+	r_addr[len] = '\0';
+	sscanf(r_addr, "%d.%d.%d.%d.%d.%d", &tmp[0], &tmp[1],
+	       &tmp[2], &tmp[3], &tmp[4], &tmp[5]);
+	ip_addr = htonl((tmp[0]<<24) | (tmp[1]<<16) | (tmp[2]<<8) | (tmp[3]));
+	port = htons((tmp[4] << 8) | (tmp[5]));
+
+	nfs4_pnfs_ds_add(inode, &ds, ip_addr, port, r_addr, len);
+
+	dprintk("%s: addr:port string = %s\n", __func__, r_addr);
+	return ds;
+out_err:
+	dprintk("%s returned NULL\n", __func__);
+	return NULL;
+}
+
+/* Decode opaque device data and return the result */
+static struct nfs4_file_layout_dsaddr*
+decode_device(struct inode *ino, struct pnfs_device *pdev)
+{
+	int i, dummy;
+	u32 cnt, num;
+	u8 *indexp;
+	uint32_t *p = (u32 *)pdev->area, *indicesp;
+	struct nfs4_file_layout_dsaddr *dsaddr;
+
+	/* Get the stripe count (number of stripe indices) */
+	cnt = be32_to_cpup(p++);
+	dprintk("%s stripe count %u\n", __func__, cnt);
+	if (cnt > NFS4_PNFS_MAX_STRIPE_CNT) {
+		printk(KERN_WARNING "%s: stripe count %d greater than "
+		       "supported maximum %d\n", __func__,
+			cnt, NFS4_PNFS_MAX_STRIPE_CNT);
+		goto out_err;
+	}
+
+	/* Check the multipath list count */
+	indicesp = p;
+	p += XDR_QUADLEN(cnt << 2);
+	num = be32_to_cpup(p++);
+	dprintk("%s ds_num %u\n", __func__, num);
+	if (num > NFS4_PNFS_MAX_MULTI_CNT) {
+		printk(KERN_WARNING "%s: multipath count %d greater than "
+			"supported maximum %d\n", __func__,
+			num, NFS4_PNFS_MAX_MULTI_CNT);
+		goto out_err;
+	}
+	dsaddr = kzalloc(sizeof(*dsaddr) +
+			(sizeof(struct nfs4_pnfs_ds *) * (num - 1)),
+			GFP_KERNEL);
+	if (!dsaddr)
+		goto out_err;
+
+	dsaddr->stripe_indices = kzalloc(sizeof(u8) * cnt, GFP_KERNEL);
+	if (!dsaddr->stripe_indices)
+		goto out_err_free;
+
+	dsaddr->stripe_count = cnt;
+	dsaddr->ds_num = num;
+
+	memcpy(&dsaddr->deviceid.de_id, &pdev->dev_id,
+	       NFS4_PNFS_DEVICEID4_SIZE);
+
+	/* Go back and read stripe indices */
+	p = indicesp;
+	indexp = &dsaddr->stripe_indices[0];
+	for (i = 0; i < dsaddr->stripe_count; i++) {
+		dummy = be32_to_cpup(p++);
+		*indexp = dummy; /* bound by NFS4_PNFS_MAX_MULTI_CNT */
+		indexp++;
+	}
+	/* Skip already read multipath list count */
+	p++;
+
+	for (i = 0; i < dsaddr->ds_num; i++) {
+		int j;
+
+		dummy = be32_to_cpup(p++); /* multipath count */
+		if (dummy > 1) {
+			printk(KERN_WARNING
+			       "%s: Multipath count %d not supported, "
+			       "skipping all greater than 1\n", __func__,
+				dummy);
+		}
+		for (j = 0; j < dummy; j++) {
+			if (j == 0) {
+				dsaddr->ds_list[i] = decode_and_add_ds(&p, ino);
+				if (dsaddr->ds_list[i] == NULL)
+					goto out_err_free;
+			} else {
+				u32 len;
+				/* skip extra multipath */
+				len = be32_to_cpup(p++);
+				p += XDR_QUADLEN(len);
+				len = be32_to_cpup(p++);
+				p += XDR_QUADLEN(len);
+				continue;
+			}
+		}
+	}
+	nfs4_init_deviceid_node(&dsaddr->deviceid);
+
+	return dsaddr;
+
+out_err_free:
+	nfs4_fl_free_deviceid(dsaddr);
+out_err:
+	dprintk("%s ERROR: returning NULL\n", __func__);
+	return NULL;
+}
+
+/*
+ * Decode the opaque device specified in 'dev'
+ * and add it to the list of available devices.
+ * If the deviceid is already cached, nfs4_add_deviceid will return
+ * a pointer to the cached struct and throw away the new.
+ */
+static struct nfs4_file_layout_dsaddr*
+decode_and_add_device(struct inode *inode, struct pnfs_device *dev)
+{
+	struct nfs4_file_layout_dsaddr *dsaddr;
+	struct nfs4_deviceid *d;
+
+	dsaddr = decode_device(inode, dev);
+	if (!dsaddr) {
+		printk(KERN_WARNING "%s: Could not decode or add device\n",
+			__func__);
+		return NULL;
+	}
+
+	d = nfs4_add_deviceid(NFS_SERVER(inode)->nfs_client->cl_devid_cache,
+			      &dsaddr->deviceid);
+
+	return container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
+}
+
+/*
+ * Retrieve the information for dev_id, add it to the list
+ * of available devices, and return it.
+ */
+struct nfs4_file_layout_dsaddr *
+get_device_info(struct inode *inode, struct pnfs_deviceid *dev_id)
+{
+	struct pnfs_device *pdev = NULL;
+	u32 max_resp_sz;
+	int max_pages;
+	struct page **pages = NULL;
+	struct nfs4_file_layout_dsaddr *dsaddr = NULL;
+	int rc, i;
+	struct nfs_server *server = NFS_SERVER(inode);
+
+	/*
+	 * Use the session max response size as the basis for setting
+	 * GETDEVICEINFO's maxcount
+	 */
+	max_resp_sz = server->nfs_client->cl_session->fc_attrs.max_resp_sz;
+	max_pages = max_resp_sz >> PAGE_SHIFT;
+	dprintk("%s inode %p max_resp_sz %u max_pages %d\n",
+		__func__, inode, max_resp_sz, max_pages);
+
+	pdev = kzalloc(sizeof(struct pnfs_device), GFP_KERNEL);
+	if (pdev == NULL)
+		return NULL;
+
+	pages = kzalloc(max_pages * sizeof(struct page *), GFP_KERNEL);
+	if (pages == NULL) {
+		kfree(pdev);
+		return NULL;
+	}
+	for (i = 0; i < max_pages; i++) {
+		pages[i] = alloc_page(GFP_KERNEL);
+		if (!pages[i])
+			goto out_free;
+	}
+
+	/* set pdev->area */
+	pdev->area = vmap(pages, max_pages, VM_MAP, PAGE_KERNEL);
+	if (!pdev->area)
+		goto out_free;
+
+	memcpy(&pdev->dev_id, dev_id, NFS4_PNFS_DEVICEID4_SIZE);
+	pdev->layout_type = LAYOUT_NFSV4_1_FILES;
+	pdev->pages = pages;
+	pdev->pgbase = 0;
+	pdev->pglen = PAGE_SIZE * max_pages;
+	pdev->mincount = 0;
+	/* TODO: Update types when CB_NOTIFY_DEVICEID is available */
+	pdev->dev_notify_types = 0;
+
+	rc = pnfs_callback_ops->nfs_getdeviceinfo(server, pdev);
+	dprintk("%s getdevice info returns %d\n", __func__, rc);
+	if (rc)
+		goto out_free;
+
+	/*
+	 * Found new device, need to decode it and then add it to the
+	 * list of known devices for this mountpoint.
+	 */
+	dsaddr = decode_and_add_device(inode, pdev);
+out_free:
+	if (pdev->area != NULL)
+		vunmap(pdev->area);
+	for (i = 0; i < max_pages && pages[i]; i++)
+		__free_page(pages[i]);
+	kfree(pages);
+	kfree(pdev);
+	dprintk("<-- %s dsaddr %p\n", __func__, dsaddr);
+	return dsaddr;
+}
+
 struct nfs4_file_layout_dsaddr *
 nfs4_pnfs_device_item_find(struct nfs_client *clp, struct pnfs_deviceid *id)
 {
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 18/50] pnfs-submit: change stateid to be a union
  2010-08-13 21:31                                 ` [PATCH 17/50] pnfs_submit: filelayout getdeviceinfo andros
@ 2010-08-13 21:31                                   ` andros
  2010-08-13 21:31                                     ` [PATCH 19/50] pnfs_submit: layout header alloc,reference, and destroy andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Alexandros Batsakis

From: The pNFS Team <linux-nfs@vger.kernel.org>

In NFSv4.1 the stateid consists of the other and seqid fields. For layout
processing we need to numerically compare the seqid value of layout stateids.
To do so, introduce a union in nfs4_stateid to switch between opaque (16 bytes)
and opaque (12 bytes) / __be32.

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
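Note for reviewers: the union described in the commit message can be sketched
in user space as below. This is an illustration only. The type and helper
names (stateid41, stateid_t, stateid_same_other, stateid_is_newer) are
hypothetical stand-ins, the layout is inferred from the u.data and
u.stateid.{seqid,other} accesses in this patch, and the comparison ignores
seqid wraparound.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* ntohl, htonl */

#define STATEID_SEQID_SIZE 4
#define STATEID_OTHER_SIZE 12
#define STATEID_SIZE (STATEID_SEQID_SIZE + STATEID_OTHER_SIZE)

/* Structured view: 4-byte seqid (big-endian, as on the wire) + 12 opaque bytes. */
struct stateid41 {
	uint32_t seqid;
	char other[STATEID_OTHER_SIZE];
} __attribute__((packed));

/* Both views share the same 16 bytes of storage. */
typedef struct {
	union {
		char data[STATEID_SIZE];
		struct stateid41 stateid;
	} u;
} stateid_t;

/* Two stateids name the same state when their "other" fields match. */
static int stateid_same_other(const stateid_t *a, const stateid_t *b)
{
	return memcmp(a->u.stateid.other, b->u.stateid.other,
		      STATEID_OTHER_SIZE) == 0;
}

/* Plain numeric comparison of seqid; wraparound is not handled here. */
static int stateid_is_newer(const stateid_t *a, const stateid_t *b)
{
	return ntohl(a->u.stateid.seqid) > ntohl(b->u.stateid.seqid);
}
```

Code that only copies or encodes the stateid keeps using the flat u.data
view, as the memcpy() and xdr_encode_opaque_fixed() call sites in this patch
do, while layout code can read u.stateid.seqid directly instead of casting
into the byte array.
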
 fs/nfs/callback_proc.c |   13 +++++++------
 fs/nfs/callback_xdr.c  |    2 +-
 fs/nfs/delegation.c    |   19 +++++++++++--------
 fs/nfs/nfs4proc.c      |   41 +++++++++++++++++++++++++----------------
 fs/nfs/nfs4state.c     |    4 ++--
 fs/nfs/nfs4xdr.c       |   28 +++++++++++++++-------------
 include/linux/nfs4.h   |   17 +++++++++++++++--
 7 files changed, 76 insertions(+), 48 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 7445dd0..c7f0021 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -105,8 +105,9 @@ out:
 
 int nfs4_validate_delegation_stateid(struct nfs_delegation *delegation, const nfs4_stateid *stateid)
 {
-	if (delegation == NULL || memcmp(delegation->stateid.data, stateid->data,
-					 sizeof(delegation->stateid.data)) != 0)
+	if (delegation == NULL || memcmp(delegation->stateid.u.data,
+					 stateid->u.data,
+					 sizeof(delegation->stateid.u.data)))
 		return 0;
 	return 1;
 }
@@ -118,11 +119,11 @@ int nfs41_validate_delegation_stateid(struct nfs_delegation *delegation, const n
 	if (delegation == NULL)
 		return 0;
 
-	/* seqid is 4-bytes long */
-	if (((u32 *) &stateid->data)[0] != 0)
+	if (stateid->u.stateid.seqid != 0)
 		return 0;
-	if (memcmp(&delegation->stateid.data[4], &stateid->data[4],
-		   sizeof(stateid->data)-4))
+	if (memcmp(&delegation->stateid.u.stateid.other,
+		   &stateid->u.stateid.other,
+		   NFS4_STATEID_OTHER_SIZE))
 		return 0;
 
 	return 1;
diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index 05af212..79b0fb7 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -136,7 +136,7 @@ static __be32 decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
 	p = read_buf(xdr, 16);
 	if (unlikely(p == NULL))
 		return htonl(NFS4ERR_RESOURCE);
-	memcpy(stateid->data, p, 16);
+	memcpy(stateid->u.data, p, 16);
 	return 0;
 }
 
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index f34f4ac..3431d10 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -104,7 +104,8 @@ again:
 			continue;
 		if (!test_bit(NFS_DELEGATED_STATE, &state->flags))
 			continue;
-		if (memcmp(state->stateid.data, stateid->data, sizeof(state->stateid.data)) != 0)
+		if (memcmp(state->stateid.u.data, stateid->u.data,
+			   sizeof(state->stateid.u.data)) != 0)
 			continue;
 		get_nfs_open_context(ctx);
 		spin_unlock(&inode->i_lock);
@@ -133,8 +134,8 @@ void nfs_inode_reclaim_delegation(struct inode *inode, struct rpc_cred *cred, st
 	if (delegation != NULL) {
 		spin_lock(&delegation->lock);
 		if (delegation->inode != NULL) {
-			memcpy(delegation->stateid.data, res->delegation.data,
-			       sizeof(delegation->stateid.data));
+			memcpy(delegation->stateid.u.data, res->delegation.u.data,
+			       sizeof(delegation->stateid.u.data));
 			delegation->type = res->delegation_type;
 			delegation->maxsize = res->maxsize;
 			oldcred = delegation->cred;
@@ -187,8 +188,9 @@ static struct nfs_delegation *nfs_detach_delegation_locked(struct nfs_inode *nfs
 	if (delegation == NULL)
 		goto nomatch;
 	spin_lock(&delegation->lock);
-	if (stateid != NULL && memcmp(delegation->stateid.data, stateid->data,
-				sizeof(delegation->stateid.data)) != 0)
+	if (stateid != NULL && memcmp(delegation->stateid.u.data,
+				      stateid->u.data,
+				      sizeof(delegation->stateid.u.data)) != 0)
 		goto nomatch_unlock;
 	list_del_rcu(&delegation->super_list);
 	delegation->inode = NULL;
@@ -216,8 +218,8 @@ int nfs_inode_set_delegation(struct inode *inode, struct rpc_cred *cred, struct
 	delegation = kmalloc(sizeof(*delegation), GFP_NOFS);
 	if (delegation == NULL)
 		return -ENOMEM;
-	memcpy(delegation->stateid.data, res->delegation.data,
-			sizeof(delegation->stateid.data));
+	memcpy(delegation->stateid.u.data, res->delegation.u.data,
+			sizeof(delegation->stateid.u.data));
 	delegation->type = res->delegation_type;
 	delegation->maxsize = res->maxsize;
 	delegation->change_attr = nfsi->change_attr;
@@ -560,7 +562,8 @@ int nfs4_copy_delegation_stateid(nfs4_stateid *dst, struct inode *inode)
 	rcu_read_lock();
 	delegation = rcu_dereference(nfsi->delegation);
 	if (delegation != NULL) {
-		memcpy(dst->data, delegation->stateid.data, sizeof(dst->data));
+		memcpy(dst->u.data, delegation->stateid.u.data,
+		       sizeof(dst->u.data));
 		ret = 1;
 	}
 	rcu_read_unlock();
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9eebe46..72e5132 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -877,8 +877,10 @@ static void update_open_stateflags(struct nfs4_state *state, fmode_t fmode)
 static void nfs_set_open_stateid_locked(struct nfs4_state *state, nfs4_stateid *stateid, fmode_t fmode)
 {
 	if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
-		memcpy(state->stateid.data, stateid->data, sizeof(state->stateid.data));
-	memcpy(state->open_stateid.data, stateid->data, sizeof(state->open_stateid.data));
+		memcpy(state->stateid.u.data, stateid->u.data,
+		       sizeof(state->stateid.u.data));
+	memcpy(state->open_stateid.u.data, stateid->u.data,
+	       sizeof(state->open_stateid.u.data));
 	switch (fmode) {
 		case FMODE_READ:
 			set_bit(NFS_O_RDONLY_STATE, &state->flags);
@@ -906,7 +908,8 @@ static void __update_open_stateid(struct nfs4_state *state, nfs4_stateid *open_s
 	 */
 	write_seqlock(&state->seqlock);
 	if (deleg_stateid != NULL) {
-		memcpy(state->stateid.data, deleg_stateid->data, sizeof(state->stateid.data));
+		memcpy(state->stateid.u.data, deleg_stateid->u.data,
+		       sizeof(state->stateid.u.data));
 		set_bit(NFS_DELEGATED_STATE, &state->flags);
 	}
 	if (open_stateid != NULL)
@@ -937,7 +940,8 @@ static int update_open_stateid(struct nfs4_state *state, nfs4_stateid *open_stat
 
 	if (delegation == NULL)
 		delegation = &deleg_cur->stateid;
-	else if (memcmp(deleg_cur->stateid.data, delegation->data, NFS4_STATEID_SIZE) != 0)
+	else if (memcmp(deleg_cur->stateid.u.data, delegation->u.data,
+			NFS4_STATEID_SIZE) != 0)
 		goto no_delegation_unlock;
 
 	nfs_mark_delegation_referenced(deleg_cur);
@@ -999,7 +1003,8 @@ static struct nfs4_state *nfs4_try_open_cached(struct nfs4_opendata *opendata)
 			break;
 		}
 		/* Save the delegation */
-		memcpy(stateid.data, delegation->stateid.data, sizeof(stateid.data));
+		memcpy(stateid.u.data, delegation->stateid.u.data,
+		       sizeof(stateid.u.data));
 		rcu_read_unlock();
 		ret = nfs_may_open(state->inode, state->owner->so_cred, open_mode);
 		if (ret != 0)
@@ -1153,10 +1158,13 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
 	 * Check if we need to update the current stateid.
 	 */
 	if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0 &&
-	    memcmp(state->stateid.data, state->open_stateid.data, sizeof(state->stateid.data)) != 0) {
+	    memcmp(state->stateid.u.data, state->open_stateid.u.data,
+		   sizeof(state->stateid.u.data)) != 0) {
 		write_seqlock(&state->seqlock);
 		if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
-			memcpy(state->stateid.data, state->open_stateid.data, sizeof(state->stateid.data));
+			memcpy(state->stateid.u.data,
+			       state->open_stateid.u.data,
+			       sizeof(state->stateid.u.data));
 		write_sequnlock(&state->seqlock);
 	}
 	return 0;
@@ -1225,8 +1233,8 @@ static int _nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs
 	if (IS_ERR(opendata))
 		return PTR_ERR(opendata);
 	opendata->o_arg.claim = NFS4_OPEN_CLAIM_DELEGATE_CUR;
-	memcpy(opendata->o_arg.u.delegation.data, stateid->data,
-			sizeof(opendata->o_arg.u.delegation.data));
+	memcpy(opendata->o_arg.u.delegation.u.data, stateid->u.data,
+			sizeof(opendata->o_arg.u.delegation.u.data));
 	ret = nfs4_open_recover(opendata, state);
 	nfs4_opendata_put(opendata);
 	return ret;
@@ -1284,8 +1292,8 @@ static void nfs4_open_confirm_done(struct rpc_task *task, void *calldata)
 	if (RPC_ASSASSINATED(task))
 		return;
 	if (data->rpc_status == 0) {
-		memcpy(data->o_res.stateid.data, data->c_res.stateid.data,
-				sizeof(data->o_res.stateid.data));
+		memcpy(data->o_res.stateid.u.data, data->c_res.stateid.u.data,
+				sizeof(data->o_res.stateid.u.data));
 		nfs_confirm_seqid(&data->owner->so_seqid, 0);
 		renew_lease(data->o_res.server, data->timestamp);
 		data->rpc_done = 1;
@@ -3923,9 +3931,10 @@ static void nfs4_locku_done(struct rpc_task *task, void *data)
 		return;
 	switch (task->tk_status) {
 		case 0:
-			memcpy(calldata->lsp->ls_stateid.data,
-					calldata->res.stateid.data,
-					sizeof(calldata->lsp->ls_stateid.data));
+			memcpy(calldata->lsp->ls_stateid.u.data,
+					calldata->res.stateid.u.data,
+					sizeof(calldata->lsp->ls_stateid.u.
+					       data));
 			renew_lease(calldata->server, calldata->timestamp);
 			break;
 		case -NFS4ERR_BAD_STATEID:
@@ -4140,8 +4149,8 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
 			goto out;
 	}
 	if (data->rpc_status == 0) {
-		memcpy(data->lsp->ls_stateid.data, data->res.stateid.data,
-					sizeof(data->lsp->ls_stateid.data));
+		memcpy(data->lsp->ls_stateid.u.data, data->res.stateid.u.data,
+					sizeof(data->lsp->ls_stateid.u.data));
 		data->lsp->ls_flags |= NFS_LOCK_INITIALIZED;
 		renew_lease(NFS_SERVER(data->ctx->path.dentry->d_inode), data->timestamp);
 	}
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 3e2f19b..cedd0cc 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1058,8 +1058,8 @@ restart:
 				 * Open state on this file cannot be recovered
 				 * All we can do is revert to using the zero stateid.
 				 */
-				memset(state->stateid.data, 0,
-					sizeof(state->stateid.data));
+				memset(state->stateid.u.data, 0,
+					sizeof(state->stateid.u.data));
 				/* Mark the file as being 'closed' */
 				state->state = 0;
 				break;
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 58bb81e..25aa191 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -945,7 +945,7 @@ static void encode_close(struct xdr_stream *xdr, const struct nfs_closeargs *arg
 	p = reserve_space(xdr, 8+NFS4_STATEID_SIZE);
 	*p++ = cpu_to_be32(OP_CLOSE);
 	*p++ = cpu_to_be32(arg->seqid->sequence->counter);
-	xdr_encode_opaque_fixed(p, arg->stateid->data, NFS4_STATEID_SIZE);
+	xdr_encode_opaque_fixed(p, arg->stateid->u.data, NFS4_STATEID_SIZE);
 	hdr->nops++;
 	hdr->replen += decode_close_maxsz;
 }
@@ -1101,13 +1101,14 @@ static void encode_lock(struct xdr_stream *xdr, const struct nfs_lock_args *args
 	if (args->new_lock_owner){
 		p = reserve_space(xdr, 4+NFS4_STATEID_SIZE+4);
 		*p++ = cpu_to_be32(args->open_seqid->sequence->counter);
-		p = xdr_encode_opaque_fixed(p, args->open_stateid->data, NFS4_STATEID_SIZE);
+		p = xdr_encode_opaque_fixed(p, args->open_stateid->u.data,
+					    NFS4_STATEID_SIZE);
 		*p++ = cpu_to_be32(args->lock_seqid->sequence->counter);
 		encode_lockowner(xdr, &args->lock_owner);
 	}
 	else {
 		p = reserve_space(xdr, NFS4_STATEID_SIZE+4);
-		p = xdr_encode_opaque_fixed(p, args->lock_stateid->data, NFS4_STATEID_SIZE);
+		p = xdr_encode_opaque_fixed(p, args->lock_stateid->u.data, NFS4_STATEID_SIZE);
 		*p = cpu_to_be32(args->lock_seqid->sequence->counter);
 	}
 	hdr->nops++;
@@ -1136,7 +1137,8 @@ static void encode_locku(struct xdr_stream *xdr, const struct nfs_locku_args *ar
 	*p++ = cpu_to_be32(OP_LOCKU);
 	*p++ = cpu_to_be32(nfs4_lock_type(args->fl, 0));
 	*p++ = cpu_to_be32(args->seqid->sequence->counter);
-	p = xdr_encode_opaque_fixed(p, args->stateid->data, NFS4_STATEID_SIZE);
+	p = xdr_encode_opaque_fixed(p, args->stateid->u.data,
+				    NFS4_STATEID_SIZE);
 	p = xdr_encode_hyper(p, args->fl->fl_start);
 	xdr_encode_hyper(p, nfs4_lock_length(args->fl));
 	hdr->nops++;
@@ -1297,7 +1299,7 @@ static inline void encode_claim_delegate_cur(struct xdr_stream *xdr, const struc
 
 	p = reserve_space(xdr, 4+NFS4_STATEID_SIZE);
 	*p++ = cpu_to_be32(NFS4_OPEN_CLAIM_DELEGATE_CUR);
-	xdr_encode_opaque_fixed(p, stateid->data, NFS4_STATEID_SIZE);
+	xdr_encode_opaque_fixed(p, stateid->u.data, NFS4_STATEID_SIZE);
 	encode_string(xdr, name->len, name->name);
 }
 
@@ -1328,7 +1330,7 @@ static void encode_open_confirm(struct xdr_stream *xdr, const struct nfs_open_co
 
 	p = reserve_space(xdr, 4+NFS4_STATEID_SIZE+4);
 	*p++ = cpu_to_be32(OP_OPEN_CONFIRM);
-	p = xdr_encode_opaque_fixed(p, arg->stateid->data, NFS4_STATEID_SIZE);
+	p = xdr_encode_opaque_fixed(p, arg->stateid->u.data, NFS4_STATEID_SIZE);
 	*p = cpu_to_be32(arg->seqid->sequence->counter);
 	hdr->nops++;
 	hdr->replen += decode_open_confirm_maxsz;
@@ -1340,7 +1342,7 @@ static void encode_open_downgrade(struct xdr_stream *xdr, const struct nfs_close
 
 	p = reserve_space(xdr, 4+NFS4_STATEID_SIZE+4);
 	*p++ = cpu_to_be32(OP_OPEN_DOWNGRADE);
-	p = xdr_encode_opaque_fixed(p, arg->stateid->data, NFS4_STATEID_SIZE);
+	p = xdr_encode_opaque_fixed(p, arg->stateid->u.data, NFS4_STATEID_SIZE);
 	*p = cpu_to_be32(arg->seqid->sequence->counter);
 	encode_share_access(xdr, arg->fmode);
 	hdr->nops++;
@@ -1378,9 +1380,9 @@ static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context
 	p = reserve_space(xdr, NFS4_STATEID_SIZE);
 	if (ctx->state != NULL) {
 		nfs4_copy_stateid(&stateid, ctx->state, l_ctx->lockowner, l_ctx->pid);
-		xdr_encode_opaque_fixed(p, stateid.data, NFS4_STATEID_SIZE);
+		xdr_encode_opaque_fixed(p, stateid.u.data, NFS4_STATEID_SIZE);
 	} else
-		xdr_encode_opaque_fixed(p, zero_stateid.data, NFS4_STATEID_SIZE);
+		xdr_encode_opaque_fixed(p, zero_stateid.u.data, NFS4_STATEID_SIZE);
 }
 
 static void encode_read(struct xdr_stream *xdr, const struct nfs_readargs *args, struct compound_hdr *hdr)
@@ -1494,7 +1496,7 @@ encode_setacl(struct xdr_stream *xdr, struct nfs_setaclargs *arg, struct compoun
 
 	p = reserve_space(xdr, 4+NFS4_STATEID_SIZE);
 	*p++ = cpu_to_be32(OP_SETATTR);
-	xdr_encode_opaque_fixed(p, zero_stateid.data, NFS4_STATEID_SIZE);
+	xdr_encode_opaque_fixed(p, zero_stateid.u.data, NFS4_STATEID_SIZE);
 	p = reserve_space(xdr, 2*4);
 	*p++ = cpu_to_be32(1);
 	*p = cpu_to_be32(FATTR4_WORD0_ACL);
@@ -1525,7 +1527,7 @@ static void encode_setattr(struct xdr_stream *xdr, const struct nfs_setattrargs
 
 	p = reserve_space(xdr, 4+NFS4_STATEID_SIZE);
 	*p++ = cpu_to_be32(OP_SETATTR);
-	xdr_encode_opaque_fixed(p, arg->stateid.data, NFS4_STATEID_SIZE);
+	xdr_encode_opaque_fixed(p, arg->stateid.u.data, NFS4_STATEID_SIZE);
 	hdr->nops++;
 	hdr->replen += decode_setattr_maxsz;
 	encode_attrs(xdr, arg->iap, server);
@@ -1588,7 +1590,7 @@ static void encode_delegreturn(struct xdr_stream *xdr, const nfs4_stateid *state
 	p = reserve_space(xdr, 4+NFS4_STATEID_SIZE);
 
 	*p++ = cpu_to_be32(OP_DELEGRETURN);
-	xdr_encode_opaque_fixed(p, stateid->data, NFS4_STATEID_SIZE);
+	xdr_encode_opaque_fixed(p, stateid->u.data, NFS4_STATEID_SIZE);
 	hdr->nops++;
 	hdr->replen += decode_delegreturn_maxsz;
 }
@@ -3681,7 +3683,7 @@ static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len)
 
 static int decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
 {
-	return decode_opaque_fixed(xdr, stateid->data, NFS4_STATEID_SIZE);
+	return decode_opaque_fixed(xdr, stateid->u.data, NFS4_STATEID_SIZE);
 }
 
 static int decode_close(struct xdr_stream *xdr, struct nfs_closeres *res)
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index c0c8cba..25665cc 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -17,7 +17,9 @@
 
 #define NFS4_BITMAP_SIZE	2
 #define NFS4_VERIFIER_SIZE	8
-#define NFS4_STATEID_SIZE	16
+#define NFS4_STATEID_SEQID_SIZE 4
+#define NFS4_STATEID_OTHER_SIZE 12
+#define NFS4_STATEID_SIZE	(NFS4_STATEID_SEQID_SIZE + NFS4_STATEID_OTHER_SIZE)
 #define NFS4_FHSIZE		128
 #define NFS4_MAXPATHLEN		PATH_MAX
 #define NFS4_MAXNAMLEN		NAME_MAX
@@ -167,7 +169,18 @@ struct nfs4_acl {
 };
 
 typedef struct { char data[NFS4_VERIFIER_SIZE]; } nfs4_verifier;
-typedef struct { char data[NFS4_STATEID_SIZE]; } nfs4_stateid;
+
+struct nfs41_stateid {
+	__be32 seqid;
+	char other[NFS4_STATEID_OTHER_SIZE];
+} __attribute__ ((packed));
+
+typedef struct {
+	union {
+		char data[NFS4_STATEID_SIZE];
+		struct nfs41_stateid stateid;
+	} u;
+} nfs4_stateid;
 
 enum nfs_opnum4 {
 	OP_ACCESS = 3,
-- 
1.6.2.5
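
Note on the stateid union above: a minimal userspace sketch of how the
two views of the same 16 bytes can be used, assuming a GCC/Clang
toolchain. The names stateid_t, stateid41, stateid_seqid() and
stateid_set_seqid() are hypothetical and not part of the patch, and
ntohl()/htonl() stand in for the kernel's be32_to_cpu()/cpu_to_be32().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* ntohl/htonl: userspace stand-ins for be32 helpers */

#define STATEID_SEQID_SIZE 4
#define STATEID_OTHER_SIZE 12
#define STATEID_SIZE (STATEID_SEQID_SIZE + STATEID_OTHER_SIZE)

/* Structured view: 4-byte big-endian seqid followed by 12 opaque bytes. */
struct stateid41 {
	uint32_t seqid;
	char other[STATEID_OTHER_SIZE];
} __attribute__((packed));

/* Both views alias the same storage, as in the patch's nfs4_stateid. */
typedef struct {
	union {
		char data[STATEID_SIZE];
		struct stateid41 stateid;
	} u;
} stateid_t;

/* The raw view and the structured view must have the same size. */
static_assert(sizeof(stateid_t) == STATEID_SIZE, "stateid views differ in size");

/* Read the seqid in host byte order. */
static uint32_t stateid_seqid(const stateid_t *s)
{
	return ntohl(s->u.stateid.seqid);
}

/* Store a host-order seqid in wire (big-endian) order. */
static void stateid_set_seqid(stateid_t *s, uint32_t seqid)
{
	s->u.stateid.seqid = htonl(seqid);
}
```

Existing callers that treat the stateid as opaque keep using u.data,
while new code can reach the seqid without manual byte arithmetic.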



* [PATCH 19/50] pnfs_submit: layout header alloc,reference, and destroy
  2010-08-13 21:31                                   ` [PATCH 18/50] pnfs-submit: change stateid to be a union andros
@ 2010-08-13 21:31                                     ` andros
  2010-08-13 21:31                                       ` [PATCH 20/50] pnfs_submit: filelayout alloc_layout and free_layout andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs

From: The pNFS Team <linux-nfs@vger.kernel.org>

---
 fs/nfs/client.c           |    4 +-
 fs/nfs/inode.c            |    8 ++-
 fs/nfs/nfs4state.c        |    2 +
 fs/nfs/pnfs.c             |  229 +++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/pnfs.h             |   22 +++++
 include/linux/nfs4.h      |    6 +
 include/linux/nfs4_pnfs.h |   51 ++++++++++
 include/linux/nfs_fs.h    |   25 +++++
 include/linux/nfs_fs_sb.h |    1 +
 include/linux/pnfs_xdr.h  |    8 ++
 10 files changed, 354 insertions(+), 2 deletions(-)
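
Note on the layout header lifetime in this patch: a minimal userspace
sketch of the reference rules, under the assumption that the caller
already holds whatever lock protects the owner (the patch uses
inode->i_lock). The names struct owner, struct layout, layout_alloc(),
layout_get() and layout_put() are hypothetical; the assert() plays the
role of the BUG_ON() in put_layout_locked().

```c
#include <assert.h>
#include <stdlib.h>

struct owner;

/* Stand-in for the layout header: a plain counter, not an atomic. */
struct layout {
	int refcount;
	struct owner *owner;
};

/* Stand-in for the inode side, which caches one layout pointer. */
struct owner {
	struct layout *layout;
};

/* Allocate with an initial reference held on behalf of the owner. */
static struct layout *layout_alloc(struct owner *o)
{
	struct layout *lo = calloc(1, sizeof(*lo));

	if (!lo)
		return NULL;
	lo->refcount = 1;
	lo->owner = o;
	o->layout = lo;
	return lo;
}

static void layout_get(struct layout *lo)
{
	lo->refcount++;
}

/* Drop one reference; on the last put, free and clear the cached pointer. */
static void layout_put(struct layout *lo)
{
	assert(lo->refcount > 0);
	if (--lo->refcount == 0) {
		struct owner *o = lo->owner;	/* read before freeing lo */

		free(lo);
		o->layout = NULL;
	}
}
```

The initial reference set at allocation is the one dropped by the
final put in the destroy path, which is why the destroy path expects
the count to be exactly 1 at that point.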

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 2e440b6..09ee926 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -156,7 +156,9 @@ static struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_
 	cred = rpc_lookup_machine_cred();
 	if (!IS_ERR(cred))
 		clp->cl_machine_cred = cred;
-
+#if defined(CONFIG_NFS_V4_1)
+	INIT_LIST_HEAD(&clp->cl_layouts);
+#endif
 	nfs_fscache_get_client_cookie(clp);
 
 	return clp;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 64261ea..15cdcb1 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1430,7 +1430,10 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
 
 void nfs_destroy_inode(struct inode *inode)
 {
-	kmem_cache_free(nfs_inode_cachep, NFS_I(inode));
+	struct nfs_inode *nfsi = NFS_I(inode);
+
+	pnfs_destroy_layout(nfsi);
+	kmem_cache_free(nfs_inode_cachep, nfsi);
 }
 
 static inline void nfs4_init_once(struct nfs_inode *nfsi)
@@ -1440,6 +1443,9 @@ static inline void nfs4_init_once(struct nfs_inode *nfsi)
 	nfsi->delegation = NULL;
 	nfsi->delegation_state = 0;
 	init_rwsem(&nfsi->rwsem);
+#ifdef CONFIG_NFS_V4_1
+	nfsi->layout = NULL;
+#endif /* CONFIG_NFS_V4_1 */
 #endif
 }
 
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index cedd0cc..506a92f 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -53,6 +53,7 @@
 #include "callback.h"
 #include "delegation.h"
 #include "internal.h"
+#include "pnfs.h"
 
 #define OPENOWNER_POOL_SIZE	8
 
@@ -1447,6 +1448,7 @@ static void nfs4_state_manager(struct nfs_client *clp)
 			}
 			clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
 			set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
+			pnfs_destroy_all_layouts(clp);
 		}
 
 		if (test_and_clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state)) {
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index dcede52..3dc3701 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -57,6 +57,10 @@
 
 static int pnfs_initialized;
 
+static void pnfs_free_layout(struct pnfs_layout_type *lo,
+			     struct nfs4_pnfs_layout_segment *range);
+static inline void get_layout(struct pnfs_layout_type *lo);
+
 /* Locking:
  *
  * pnfs_spinlock:
@@ -152,6 +156,7 @@ struct pnfs_client_operations*
 pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
 {
 	struct pnfs_module *pnfs_mod;
+	struct layoutdriver_io_operations *io_ops = ld_type->ld_io_ops;
 
 	if (!pnfs_initialized) {
 		printk(KERN_ERR "%s Registration failure. "
@@ -159,6 +164,12 @@ pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
 		return NULL;
 	}
 
+	if (!io_ops || !io_ops->alloc_layout || !io_ops->free_layout) {
+		printk(KERN_ERR "%s Layout driver must provide "
+		       "alloc_layout and free_layout.\n", __func__);
+		return NULL;
+	}
+
 	pnfs_mod = kmalloc(sizeof(struct pnfs_module), GFP_KERNEL);
 	if (pnfs_mod != NULL) {
 		dprintk("%s Registering id:%u name:%s\n",
@@ -191,6 +202,224 @@ pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
 	}
 }
 
+/*
+ * pNFS client layout cache
+ */
+#if defined(CONFIG_SMP)
+#define BUG_ON_UNLOCKED_INO(ino) \
+	BUG_ON(!spin_is_locked(&ino->i_lock))
+#define BUG_ON_UNLOCKED_LO(lo) \
+	BUG_ON_UNLOCKED_INO(PNFS_INODE(lo))
+#else /* CONFIG_SMP */
+#define BUG_ON_UNLOCKED_INO(lo) do {} while (0)
+#define BUG_ON_UNLOCKED_LO(lo) do {} while (0)
+#endif /* CONFIG_SMP */
+
+static inline void
+get_layout(struct pnfs_layout_type *lo)
+{
+	BUG_ON_UNLOCKED_LO(lo);
+	lo->refcount++;
+}
+
+static inline void
+put_layout_locked(struct pnfs_layout_type *lo)
+{
+	BUG_ON_UNLOCKED_LO(lo);
+	BUG_ON(lo->refcount <= 0);
+
+	lo->refcount--;
+	if (!lo->refcount) {
+		struct layoutdriver_io_operations *io_ops = PNFS_LD_IO_OPS(lo);
+		struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);
+
+		dprintk("%s: freeing layout cache %p\n", __func__, lo);
+		WARN_ON(!list_empty(&lo->lo_layouts));
+		io_ops->free_layout(lo);
+		nfsi->layout = NULL;
+	}
+}
+
+void
+put_layout(struct inode *inode)
+{
+	spin_lock(&inode->i_lock);
+	put_layout_locked(NFS_I(inode)->layout);
+	spin_unlock(&inode->i_lock);
+
+}
+
+void
+pnfs_destroy_layout(struct nfs_inode *nfsi)
+{
+	struct pnfs_layout_type *lo;
+	struct nfs4_pnfs_layout_segment range = {
+		.iomode = IOMODE_ANY,
+		.offset = 0,
+		.length = NFS4_MAX_UINT64,
+	};
+
+	spin_lock(&nfsi->vfs_inode.i_lock);
+	lo = nfsi->layout;
+	if (lo) {
+		pnfs_free_layout(lo, &range);
+		WARN_ON(!list_empty(&nfsi->layout->segs));
+		WARN_ON(!list_empty(&nfsi->layout->lo_layouts));
+
+		if (nfsi->layout->refcount != 1)
+			printk(KERN_WARNING "%s: layout refcount is %d, expected 1\n",
+				__func__, nfsi->layout->refcount);
+		WARN_ON(nfsi->layout->refcount != 1);
+
+		/* Matched by refcount set to 1 in alloc_init_layout */
+		put_layout_locked(lo);
+	}
+	spin_unlock(&nfsi->vfs_inode.i_lock);
+}
+
+/*
+ * Called by the state manager to remove all layouts established under an
+ * expired lease.
+ */
+void
+pnfs_destroy_all_layouts(struct nfs_client *clp)
+{
+	struct pnfs_layout_type *lo;
+
+	while (!list_empty(&clp->cl_layouts)) {
+		lo = list_entry(clp->cl_layouts.next, struct pnfs_layout_type,
+				lo_layouts);
+		dprintk("%s freeing layout for inode %lu\n", __func__,
+			lo->lo_inode->i_ino);
+		pnfs_destroy_layout(NFS_I(lo->lo_inode));
+	}
+}
+
+void
+pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
+			const nfs4_stateid *stateid)
+{
+	write_seqlock(&lo->seqlock);
+	memcpy(lo->stateid.u.data, stateid->u.data, sizeof(lo->stateid.u.data));
+	write_sequnlock(&lo->seqlock);
+}
+
+void
+pnfs_get_layout_stateid(nfs4_stateid *dst, struct pnfs_layout_type *lo)
+{
+	int seq;
+
+	dprintk("--> %s\n", __func__);
+
+	do {
+		seq = read_seqbegin(&lo->seqlock);
+		memcpy(dst->u.data, lo->stateid.u.data,
+		       sizeof(lo->stateid.u.data));
+	} while (read_seqretry(&lo->seqlock, seq));
+
+	dprintk("<-- %s\n", __func__);
+}
+
+static void
+pnfs_layout_from_open_stateid(struct pnfs_layout_type *lo,
+			      struct nfs4_state *state)
+{
+	int seq;
+
+	dprintk("--> %s\n", __func__);
+
+	write_seqlock(&lo->seqlock);
+	if (!memcmp(lo->stateid.u.data, &zero_stateid, NFS4_STATEID_SIZE))
+		do {
+			seq = read_seqbegin(&state->seqlock);
+			memcpy(lo->stateid.u.data, state->stateid.u.data,
+					sizeof(state->stateid.u.data));
+		} while (read_seqretry(&state->seqlock, seq));
+	write_sequnlock(&lo->seqlock);
+	dprintk("<-- %s\n", __func__);
+}
+
+static void
+pnfs_free_layout(struct pnfs_layout_type *lo,
+		 struct nfs4_pnfs_layout_segment *range)
+{
+	dprintk("%s:Begin lo %p offset %llu length %llu iomode %d\n",
+		__func__, lo, range->offset, range->length, range->iomode);
+
+	if (list_empty(&lo->segs)) {
+		struct nfs_client *clp;
+
+		clp = PNFS_NFS_SERVER(lo)->nfs_client;
+		spin_lock(&clp->cl_lock);
+		list_del_init(&lo->lo_layouts);
+		spin_unlock(&clp->cl_lock);
+		pnfs_set_layout_stateid(lo, &zero_stateid);
+	}
+
+	dprintk("%s:Return\n", __func__);
+}
+
+/*
+ * Each layoutdriver embeds pnfs_layout_type as the first field in its
+ * per-layout type layout cache structure and returns it zeroed
+ * from layoutdriver_io_ops->alloc_layout
+ */
+static struct pnfs_layout_type *
+alloc_init_layout(struct inode *ino)
+{
+	struct pnfs_layout_type *lo;
+	struct layoutdriver_io_operations *io_ops;
+
+	io_ops = NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops;
+	lo = io_ops->alloc_layout(ino);
+	if (!lo) {
+		printk(KERN_ERR
+			"%s: out of memory: io_ops->alloc_layout failed\n",
+			__func__);
+		return NULL;
+	}
+	lo->refcount = 1;
+	INIT_LIST_HEAD(&lo->lo_layouts);
+	INIT_LIST_HEAD(&lo->segs);
+	seqlock_init(&lo->seqlock);
+	lo->lo_inode = ino;
+	return lo;
+}
+
+/*
+ * Retrieve and possibly allocate the inode layout
+ *
+ * ino->i_lock must be taken by the caller.
+ */
+static struct pnfs_layout_type *
+pnfs_alloc_layout(struct inode *ino)
+{
+	struct nfs_inode *nfsi = NFS_I(ino);
+	struct pnfs_layout_type *new = NULL;
+
+	dprintk("%s Begin ino=%p layout=%p\n", __func__, ino, nfsi->layout);
+
+	BUG_ON_UNLOCKED_INO(ino);
+	if (likely(nfsi->layout))
+		return nfsi->layout;
+
+	spin_unlock(&ino->i_lock);
+	new = alloc_init_layout(ino);
+	spin_lock(&ino->i_lock);
+
+	if (likely(nfsi->layout == NULL)) {	/* Won the race? */
+		nfsi->layout = new;
+	} else if (new) {
+		/* Reference the layout across i_lock release and grab */
+		get_layout(nfsi->layout);
+		spin_unlock(&ino->i_lock);
+		NFS_SERVER(ino)->pnfs_curr_ld->ld_io_ops->free_layout(new);
+		spin_lock(&ino->i_lock);
+		put_layout_locked(nfsi->layout);
+	}
+	return nfsi->layout;
+}
+
 /* Callback operations for layout drivers.
  */
 struct pnfs_client_operations pnfs_ops = {
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index f9fb58b..1e40a0d 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -28,6 +28,12 @@ extern int nfs4_pnfs_getdeviceinfo(struct nfs_server *server,
 void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
 void unmount_pnfs_layoutdriver(struct nfs_server *);
 int pnfs_initialize(void);
+void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
+			     const nfs4_stateid *stateid);
+void pnfs_destroy_layout(struct nfs_inode *);
+void pnfs_destroy_all_layouts(struct nfs_client *);
+void put_layout(struct inode *inode);
+void pnfs_get_layout_stateid(nfs4_stateid *dst, struct pnfs_layout_type *lo);
 
 #define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld &&	\
 				     (srv)->pnfs_curr_ld->ld_io_ops &&	\
@@ -35,6 +41,22 @@ int pnfs_initialize(void);
 
 #define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
 
+/* Return true if a layout driver is being used for this mountpoint */
+static inline int pnfs_enabled_sb(struct nfs_server *nfss)
+{
+	return nfss->pnfs_curr_ld != NULL;
+}
+
+#else  /* CONFIG_NFS_V4_1 */
+
+static inline void pnfs_destroy_all_layouts(struct nfs_client *clp)
+{
+}
+
+static inline void pnfs_destroy_layout(struct nfs_inode *nfsi)
+{
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 #endif /* FS_NFS_PNFS_H */
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 25665cc..06912b0 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -569,6 +569,12 @@ enum pnfs_layouttype {
 	LAYOUT_NFSV4_1_FILES  = 1,
 };
 
+enum pnfs_iomode {
+	IOMODE_READ = 1,
+	IOMODE_RW = 2,
+	IOMODE_ANY = 3,
+};
+
 #endif
 #endif
 
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index dee53f2..b961f97 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -22,10 +22,61 @@ struct pnfs_layoutdriver_type {
 	struct layoutdriver_policy_operations *ld_policy_ops;
 };
 
+#if defined(CONFIG_NFS_V4_1)
+
+static inline struct nfs_inode *
+PNFS_NFS_INODE(struct pnfs_layout_type *lo)
+{
+	return NFS_I(lo->lo_inode);
+}
+
+static inline struct inode *
+PNFS_INODE(struct pnfs_layout_type *lo)
+{
+	return lo->lo_inode;
+}
+
+static inline struct nfs_server *
+PNFS_NFS_SERVER(struct pnfs_layout_type *lo)
+{
+	return NFS_SERVER(PNFS_INODE(lo));
+}
+
+static inline struct pnfs_layoutdriver_type *
+PNFS_LD(struct pnfs_layout_type *lo)
+{
+	return NFS_SERVER(PNFS_INODE(lo))->pnfs_curr_ld;
+}
+
+static inline struct layoutdriver_io_operations *
+PNFS_LD_IO_OPS(struct pnfs_layout_type *lo)
+{
+	return PNFS_LD(lo)->ld_io_ops;
+}
+
+
+#endif /* CONFIG_NFS_V4_1 */
+
+struct pnfs_layout_segment {
+	struct list_head fi_list;
+	struct nfs4_pnfs_layout_segment range;
+	struct kref kref;
+	bool valid;
+	struct pnfs_layout_type *layout;
+	struct nfs4_deviceid *deviceid;
+	u8 ld_data[];			/* layout driver private data */
+};
+
 /* Layout driver I/O operations.
  * Either the pagecache or non-pagecache read/write operations must be implemented
  */
 struct layoutdriver_io_operations {
+	/* Layout information. For each inode, alloc_layout is executed once to retrieve an
+	 * inode specific layout structure.  Each subsequent layoutget operation results in
+	 * a set_layout call to set the opaque layout in the layout driver.*/
+	struct pnfs_layout_type * (*alloc_layout) (struct inode *inode);
+	void (*free_layout) (struct pnfs_layout_type *);
+
 	/* Registration information for a new mounted file system
 	 */
 	int (*initialize_mountpoint) (struct nfs_client *);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index a0f49a3..e3b11b3 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -104,6 +104,26 @@ struct nfs_delegation;
 
 struct posix_acl;
 
+struct pnfs_layout_type {
+	int refcount;
+	struct list_head lo_layouts;	/* other client layouts */
+	struct list_head segs;		/* layout segments list */
+	int roc_iomode;			/* iomode to return on close, 0=none */
+	seqlock_t seqlock;		/* Protects the stateid */
+	nfs4_stateid stateid;
+	unsigned long pnfs_layout_state;
+	#define NFS_INO_RO_LAYOUT_FAILED 0      /* get ro layout failed stop trying */
+	#define NFS_INO_RW_LAYOUT_FAILED 1      /* get rw layout failed stop trying */
+	#define NFS_INO_LAYOUTCOMMIT     3      /* LAYOUTCOMMIT needed */
+	struct rpc_cred         *lo_cred; /* layoutcommit credential */
+	/* DH: These vars keep track of the maximum write range
+	 * so the values can be used for layoutcommit.
+	 */
+	loff_t                  pnfs_write_begin_pos;
+	loff_t                  pnfs_write_end_pos;
+	struct inode		*lo_inode;
+};
+
 /*
  * nfs fs inode data in memory
  */
@@ -188,6 +208,11 @@ struct nfs_inode {
 	struct nfs_delegation	*delegation;
 	fmode_t			 delegation_state;
 	struct rw_semaphore	rwsem;
+
+	/* pNFS layout information */
+#if defined(CONFIG_NFS_V4_1)
+	struct pnfs_layout_type *layout;
+#endif /* CONFIG_NFS_V4_1 */
 #endif /* CONFIG_NFS_V4*/
 #ifdef CONFIG_NFS_FSCACHE
 	struct fscache_cookie	*fscache;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 4544b52..8d17e67 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -82,6 +82,7 @@ struct nfs_client {
 	/* The flags used for obtaining the clientid during EXCHANGE_ID */
 	u32			cl_exchange_flags;
 	struct nfs4_session	*cl_session; 	/* sharred session */
+	struct list_head	cl_layouts;
 	struct nfs4_deviceid_cache *cl_devid_cache; /* pNFS deviceid cache */
 #endif /* CONFIG_NFS_V4_1 */
 
diff --git a/include/linux/pnfs_xdr.h b/include/linux/pnfs_xdr.h
index 458ff69..0f037a6 100644
--- a/include/linux/pnfs_xdr.h
+++ b/include/linux/pnfs_xdr.h
@@ -12,12 +12,20 @@
 #ifndef LINUX_PNFS_XDR_H
 #define LINUX_PNFS_XDR_H
 
+#define PNFS_LAYOUT_MAXSIZE 4096
 #define NFS4_PNFS_DEVICEID4_SIZE 16
 
 struct pnfs_deviceid {
 	char data[NFS4_PNFS_DEVICEID4_SIZE];
 };
 
+
+struct nfs4_pnfs_layout_segment {
+	u32 iomode;
+	u64 offset;
+	u64 length;
+};
+
 struct nfs4_pnfs_getdeviceinfo_arg {
 	struct pnfs_device *pdev;
 	struct nfs4_sequence_args seq_args;
-- 
1.6.2.5



* [PATCH 20/50] pnfs_submit: filelayout alloc_layout and free_layout
  2010-08-13 21:31                                     ` [PATCH 19/50] pnfs_submit: layout header alloc,reference, and destroy andros
@ 2010-08-13 21:31                                       ` andros
  2010-08-13 21:31                                         ` [PATCH 21/50] pnfs_submit: layout segment alloc, reference, destroy andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |   24 ++++++++++++++++++++++++
 fs/nfs/nfs4filelayout.h |   10 ++++++++++
 2 files changed, 34 insertions(+), 0 deletions(-)
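
Note on the embedding used by this patch: the generic layout header
lives inside the driver's private structure, and the driver converts
between the two with container_of(). A minimal userspace sketch,
assuming a local container_of() built on offsetof(); the names
generic_layout, file_layout, to_file_layout(), file_layout_alloc() and
file_layout_free() are hypothetical stand-ins for pnfs_layout_type,
nfs4_filelayout, FILE_LO() and the two driver callbacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace equivalent of the kernel macro, without its type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic header that the core code knows about. */
struct generic_layout {
	int refcount;
};

/* Driver-private structure embedding the generic header. */
struct file_layout {
	struct generic_layout base;
	unsigned int stripe_unit;
};

/* The patch's comment asks for the header to be the first field. */
static_assert(offsetof(struct file_layout, base) == 0, "header must come first");

/* Recover the driver structure from the generic header. */
static struct file_layout *to_file_layout(struct generic_layout *lo)
{
	return container_of(lo, struct file_layout, base);
}

/* Allocate zeroed and hand only the generic header to the core. */
static struct generic_layout *file_layout_alloc(void)
{
	struct file_layout *fl = calloc(1, sizeof(*fl));

	return fl ? &fl->base : NULL;
}

/* Free the whole driver structure, not just the embedded header. */
static void file_layout_free(struct generic_layout *lo)
{
	free(to_file_layout(lo));
}
```

Returning a zeroed structure matters because the core code only
initializes the fields it owns after the allocation callback returns.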

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 7039d9d..e1a09a8 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -82,7 +82,31 @@ filelayout_uninitialize_mountpoint(struct nfs_server *nfss)
 	return 0;
 }
 
+/*
+ * Create a filelayout layout structure and return it.  The pNFS client
+ * will use the pnfs_layout_type type to refer to the layout for this
+ * inode from now on.
+ */
+static struct pnfs_layout_type *
+filelayout_alloc_layout(struct inode *inode)
+{
+	struct nfs4_filelayout *flp;
+
+	dprintk("NFS_FILELAYOUT: allocating layout\n");
+	flp = kzalloc(sizeof(struct nfs4_filelayout), GFP_KERNEL);
+	return flp ? &flp->fl_layout : NULL;
+}
+
+/* Free a filelayout layout structure */
+static void
+filelayout_free_layout(struct pnfs_layout_type *lo)
+{
+	dprintk("NFS_FILELAYOUT: freeing layout\n");
+	kfree(FILE_LO(lo));
+}
 struct layoutdriver_io_operations filelayout_io_operations = {
+	.alloc_layout            = filelayout_alloc_layout,
+	.free_layout             = filelayout_free_layout,
 	.initialize_mountpoint   = filelayout_initialize_mountpoint,
 	.uninitialize_mountpoint = filelayout_uninitialize_mountpoint,
 };
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index be3c9fe..ad975fd 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -41,6 +41,16 @@ struct nfs4_file_layout_dsaddr {
 	struct nfs4_pnfs_ds	*ds_list[1];
 };
 
+struct nfs4_filelayout {
+	struct pnfs_layout_type fl_layout;
+	u32 stripe_unit;
+};
+
+static inline struct nfs4_filelayout *
+FILE_LO(struct pnfs_layout_type *lo)
+{
+	return container_of(lo, struct nfs4_filelayout, fl_layout);
+}
 
 extern struct pnfs_client_operations *pnfs_callback_ops;
 
-- 
1.6.2.5



* [PATCH 21/50] pnfs_submit: layout segment alloc, reference, destroy
  2010-08-13 21:31                                       ` [PATCH 20/50] pnfs_submit: filelayout alloc_layout and free_layout andros
@ 2010-08-13 21:31                                         ` andros
  2010-08-13 21:31                                           ` [PATCH 22/50] pnfs_submit: layoutget andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/inode.c            |    1 +
 fs/nfs/pnfs.c             |  105 +++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/pnfs.h             |   14 ++++++
 include/linux/nfs4_pnfs.h |    2 +
 include/linux/nfs_fs.h    |    1 +
 include/linux/pnfs_xdr.h  |    3 +
 6 files changed, 126 insertions(+), 0 deletions(-)
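
Note on the iomode matching rules documented in this patch: a minimal
standalone sketch of the same predicate, with the six rows of the
table expressed as checks. The enum values follow the patch's
enum pnfs_iomode; the names iomode_matches() and iomode_selfcheck()
are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

enum iomode {
	IOMODE_READ = 1,
	IOMODE_RW = 2,
	IOMODE_ANY = 3,
};

/*
 * A requested range matches a segment when the request is for any
 * iomode, or when both iomodes are identical.
 */
static bool iomode_matches(enum iomode range, enum iomode lseg)
{
	return range == IOMODE_ANY || lseg == range;
}

/* One check per row of the matching table. */
static void iomode_selfcheck(void)
{
	assert(iomode_matches(IOMODE_ANY, IOMODE_READ));
	assert(iomode_matches(IOMODE_ANY, IOMODE_RW));
	assert(!iomode_matches(IOMODE_RW, IOMODE_READ));
	assert(iomode_matches(IOMODE_RW, IOMODE_RW));
	assert(iomode_matches(IOMODE_READ, IOMODE_READ));
	assert(!iomode_matches(IOMODE_READ, IOMODE_RW));
}
```

Note that the predicate is not symmetric in spirit: IOMODE_ANY is only
meaningful on the request side, so a RW request does not cover a READ
segment and a READ request does not cover a RW segment.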

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 15cdcb1..ce91e8f 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1444,6 +1444,7 @@ static inline void nfs4_init_once(struct nfs_inode *nfsi)
 	nfsi->delegation_state = 0;
 	init_rwsem(&nfsi->rwsem);
 #ifdef CONFIG_NFS_V4_1
+	init_waitqueue_head(&nfsi->lo_waitq);
 	nfsi->layout = NULL;
 #endif /* CONFIG_NFS_V4_1 */
 #endif
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 3dc3701..cfee1d6 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -170,6 +170,12 @@ pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
 		return NULL;
 	}
 
+	if (!io_ops->alloc_lseg || !io_ops->free_lseg) {
+		printk(KERN_ERR "%s Layout driver must provide "
+		       "alloc_lseg and free_lseg.\n", __func__);
+		return NULL;
+	}
+
 	pnfs_mod = kmalloc(sizeof(struct pnfs_module), GFP_KERNEL);
 	if (pnfs_mod != NULL) {
 		dprintk("%s Registering id:%u name:%s\n",
@@ -295,6 +301,66 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
 	}
 }
 
+static inline void
+init_lseg(struct pnfs_layout_type *lo, struct pnfs_layout_segment *lseg)
+{
+	INIT_LIST_HEAD(&lseg->fi_list);
+	kref_init(&lseg->kref);
+	lseg->valid = true;
+	lseg->layout = lo;
+}
+
+static void
+destroy_lseg(struct kref *kref)
+{
+	struct pnfs_layout_segment *lseg =
+		container_of(kref, struct pnfs_layout_segment, kref);
+
+	dprintk("--> %s\n", __func__);
+	/* Matched by get_layout in pnfs_insert_layout */
+	put_layout_locked(lseg->layout);
+	PNFS_LD_IO_OPS(lseg->layout)->free_lseg(lseg);
+}
+
+static void
+put_lseg_locked(struct pnfs_layout_segment *lseg)
+{
+	bool do_wake_up;
+	struct nfs_inode *nfsi;
+
+	if (!lseg)
+		return;
+
+	dprintk("%s: lseg %p ref %d valid %d\n", __func__, lseg,
+		atomic_read(&lseg->kref.refcount), lseg->valid);
+	do_wake_up = !lseg->valid;
+	nfsi = PNFS_NFS_INODE(lseg->layout);
+	kref_put(&lseg->kref, destroy_lseg);
+	if (do_wake_up)
+		wake_up(&nfsi->lo_waitq);
+}
+
+void
+put_lseg(struct pnfs_layout_segment *lseg)
+{
+	bool do_wake_up;
+	struct nfs_inode *nfsi;
+
+	if (!lseg)
+		return;
+
+	dprintk("%s: lseg %p ref %d valid %d\n", __func__, lseg,
+		atomic_read(&lseg->kref.refcount), lseg->valid);
+	do_wake_up = !lseg->valid;
+	nfsi = PNFS_NFS_INODE(lseg->layout);
+	spin_lock(&nfsi->vfs_inode.i_lock);
+	kref_put(&lseg->kref, destroy_lseg);
+	spin_unlock(&nfsi->vfs_inode.i_lock);
+	if (do_wake_up)
+		wake_up(&nfsi->lo_waitq);
+}
+EXPORT_SYMBOL(put_lseg);
+
 void
 pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
 			const nfs4_stateid *stateid)
@@ -339,13 +405,52 @@ pnfs_layout_from_open_stateid(struct pnfs_layout_type *lo,
 	dprintk("<-- %s\n", __func__);
 }
 
+/*
+ * iomode matching rules:
+ * range	lseg	match
+ * -----	-----	-----
+ * ANY		READ	true
+ * ANY		RW	true
+ * RW		READ	false
+ * RW		RW	true
+ * READ		READ	true
+ * READ		RW	false
+ */
+static inline int
+should_free_lseg(struct pnfs_layout_segment *lseg,
+		   struct nfs4_pnfs_layout_segment *range)
+{
+	return (range->iomode == IOMODE_ANY ||
+		lseg->range.iomode == range->iomode);
+}
+
+static inline bool
+_pnfs_can_return_lseg(struct pnfs_layout_segment *lseg)
+{
+	return atomic_read(&lseg->kref.refcount) == 1;
+}
+
+
 static void
 pnfs_free_layout(struct pnfs_layout_type *lo,
 		 struct nfs4_pnfs_layout_segment *range)
 {
+	struct pnfs_layout_segment *lseg, *next;
 	dprintk("%s:Begin lo %p offset %llu length %llu iomode %d\n",
 		__func__, lo, range->offset, range->length, range->iomode);
 
+	BUG_ON_UNLOCKED_LO(lo);
+	list_for_each_entry_safe(lseg, next, &lo->segs, fi_list) {
+		if (!should_free_lseg(lseg, range) ||
+		    !_pnfs_can_return_lseg(lseg))
+			continue;
+		dprintk("%s: freeing lseg %p iomode %d "
+			"offset %llu length %llu\n", __func__,
+			lseg, lseg->range.iomode, lseg->range.offset,
+			lseg->range.length);
+		list_del(&lseg->fi_list);
+		put_lseg_locked(lseg);
+	}
 	if (list_empty(&lo->segs)) {
 		struct nfs_client *clp;
 
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 1e40a0d..d8de4c1 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -25,6 +25,7 @@
 extern int nfs4_pnfs_getdeviceinfo(struct nfs_server *server,
 				   struct pnfs_device *dev);
 /* pnfs.c */
+void put_lseg(struct pnfs_layout_segment *lseg);
 void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
 void unmount_pnfs_layoutdriver(struct nfs_server *);
 int pnfs_initialize(void);
@@ -41,6 +42,11 @@ void pnfs_get_layout_stateid(nfs4_stateid *dst, struct pnfs_layout_type *lo);
 
 #define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
 
+static inline void get_lseg(struct pnfs_layout_segment *lseg)
+{
+	kref_get(&lseg->kref);
+}
+
 /* Return true if a layout driver is being used for this mountpoint */
 static inline int pnfs_enabled_sb(struct nfs_server *nfss)
 {
@@ -57,6 +63,14 @@ static inline void pnfs_destroy_layout(struct nfs_inode *nfsi)
 {
 }
 
+static inline void get_lseg(struct pnfs_layout_segment *lseg)
+{
+}
+
+static inline void put_lseg(struct pnfs_layout_segment *lseg)
+{
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 #endif /* FS_NFS_PNFS_H */
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index b961f97..287a7dc 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -76,6 +76,8 @@ struct layoutdriver_io_operations {
 	 * a set_layout call to set the opaque layout in the layout driver.*/
 	struct pnfs_layout_type * (*alloc_layout) (struct inode *inode);
 	void (*free_layout) (struct pnfs_layout_type *);
+	struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_type *layoutid, struct nfs4_pnfs_layoutget_res *lgr);
+	void (*free_lseg) (struct pnfs_layout_segment *lseg);
 
 	/* Registration information for a new mounted file system
 	 */
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index e3b11b3..c8b6129 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -211,6 +211,7 @@ struct nfs_inode {
 
 	/* pNFS layout information */
 #if defined(CONFIG_NFS_V4_1)
+	wait_queue_head_t lo_waitq;
 	struct pnfs_layout_type *layout;
 #endif /* CONFIG_NFS_V4_1 */
 #endif /* CONFIG_NFS_V4*/
diff --git a/include/linux/pnfs_xdr.h b/include/linux/pnfs_xdr.h
index 0f037a6..e6743f3 100644
--- a/include/linux/pnfs_xdr.h
+++ b/include/linux/pnfs_xdr.h
@@ -26,6 +26,9 @@ struct nfs4_pnfs_layout_segment {
 	u64 length;
 };
 
+struct nfs4_pnfs_layoutget_res {
+};
+
 struct nfs4_pnfs_getdeviceinfo_arg {
 	struct pnfs_device *pdev;
 	struct nfs4_sequence_args seq_args;
-- 
1.6.2.5



* [PATCH 22/50] pnfs_submit: layoutget
  2010-08-13 21:31                                         ` [PATCH 21/50] pnfs_submit: layout segment alloc, reference, destroy andros
@ 2010-08-13 21:31                                           ` andros
  2010-08-13 21:31                                             ` [PATCH 23/50] pnfs_submit: layout helper functions andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4proc.c        |  114 +++++++++++++++++++++++++++++++++
 fs/nfs/nfs4xdr.c         |  156 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/pnfs.c            |   19 ++++++
 fs/nfs/pnfs.h            |    5 ++
 include/linux/nfs4.h     |    1 +
 include/linux/pnfs_xdr.h |   26 ++++++++
 6 files changed, 321 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 72e5132..279a37d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5366,6 +5366,120 @@ out:
 	return status;
 }
 
+static void
+nfs4_pnfs_layoutget_prepare(struct rpc_task *task, void *calldata)
+{
+	struct nfs4_pnfs_layoutget *lgp = calldata;
+	struct inode *ino = lgp->args.inode;
+	struct nfs_server *server = NFS_SERVER(ino);
+
+	dprintk("--> %s\n", __func__);
+	if (nfs4_setup_sequence(server, &lgp->args.seq_args,
+				&lgp->res.seq_res, 0, task))
+		return;
+	rpc_call_start(task);
+}
+
+static void nfs4_pnfs_layoutget_done(struct rpc_task *task, void *calldata)
+{
+	struct nfs4_pnfs_layoutget *lgp = calldata;
+	struct inode *ino = lgp->args.inode;
+	struct nfs_server *server = NFS_SERVER(ino);
+
+	dprintk("--> %s\n", __func__);
+
+	if (!nfs4_sequence_done(task, &lgp->res.seq_res))
+		return;
+
+	if (RPC_ASSASSINATED(task))
+		return;
+
+	pnfs_get_layout_done(lgp, task->tk_status);
+
+	if (nfs4_async_handle_error(task, server, NULL) == -EAGAIN)
+		nfs_restart_rpc(task, server->nfs_client);
+
+	lgp->status = task->tk_status;
+	dprintk("<-- %s\n", __func__);
+}
+
+static void nfs4_pnfs_layoutget_release(void *calldata)
+{
+	struct nfs4_pnfs_layoutget *lgp = calldata;
+
+	dprintk("--> %s\n", __func__);
+	pnfs_layout_release(NFS_I(lgp->args.inode)->layout, NULL);
+	if (lgp->res.layout.buf != NULL)
+		free_page((unsigned long) lgp->res.layout.buf);
+	kfree(calldata);
+	dprintk("<-- %s\n", __func__);
+}
+
+static const struct rpc_call_ops nfs4_pnfs_layoutget_call_ops = {
+	.rpc_call_prepare = nfs4_pnfs_layoutget_prepare,
+	.rpc_call_done = nfs4_pnfs_layoutget_done,
+	.rpc_release = nfs4_pnfs_layoutget_release,
+};
+
+/* FIXME: We need to call nfs4_handle_exception
+ * and deal with retries.
+ * Currently we can't since we release lgp and its contents.
+ */
+static int _pnfs4_proc_layoutget(struct nfs4_pnfs_layoutget *lgp)
+{
+	struct nfs_server *server = NFS_SERVER(lgp->args.inode);
+	struct rpc_task *task;
+	struct rpc_message msg = {
+		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_PNFS_LAYOUTGET],
+		.rpc_argp = &lgp->args,
+		.rpc_resp = &lgp->res,
+	};
+	struct rpc_task_setup task_setup_data = {
+		.rpc_client = server->client,
+		.rpc_message = &msg,
+		.callback_ops = &nfs4_pnfs_layoutget_call_ops,
+		.callback_data = lgp,
+		.flags = RPC_TASK_ASYNC,
+	};
+	int status = 0;
+
+	dprintk("--> %s\n", __func__);
+
+	lgp->res.layout.buf = (void *)__get_free_page(GFP_NOFS);
+	if (lgp->res.layout.buf == NULL) {
+		nfs4_pnfs_layoutget_release(lgp);
+		return -ENOMEM;
+	}
+
+	lgp->res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
+	task = rpc_run_task(&task_setup_data);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+	status = nfs4_wait_for_completion_rpc_task(task);
+	if (status != 0)
+		goto out;
+	status = lgp->status;
+	if (status != 0)
+		goto out;
+	status = pnfs_layout_process(lgp);
+out:
+	rpc_put_task(task);
+	dprintk("<-- %s status=%d\n", __func__, status);
+	return status;
+}
+
+int pnfs4_proc_layoutget(struct nfs4_pnfs_layoutget *lgp)
+{
+	struct nfs_server *server = NFS_SERVER(lgp->args.inode);
+	struct nfs4_exception exception = { };
+	int err;
+	do {
+		err = nfs4_handle_exception(server, _pnfs4_proc_layoutget(lgp),
+					    &exception);
+	} while (exception.retry);
+	return err;
+}
+
 int nfs4_pnfs_getdeviceinfo(struct nfs_server *server, struct pnfs_device *pdev)
 {
 	struct nfs4_pnfs_getdeviceinfo_arg args = {
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 25aa191..a096e5b 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -320,6 +320,11 @@ static int nfs4_stat_to_errno(int);
 				4 /* opaque devaddr4 length */ +\
 				4 /* notification bitmap length */ + \
 				4 /* notification bitmap */)
+#define encode_layoutget_sz	(op_encode_hdr_maxsz + 10 + \
+				encode_stateid_maxsz)
+#define decode_layoutget_maxsz	(op_decode_hdr_maxsz + 8 + \
+				decode_stateid_maxsz + \
+				XDR_QUADLEN(PNFS_LAYOUT_MAXSIZE))
 #else /* CONFIG_NFS_V4_1 */
 #define encode_sequence_maxsz	0
 #define decode_sequence_maxsz	0
@@ -715,6 +720,14 @@ static int nfs4_stat_to_errno(int);
 #define NFS4_dec_getdeviceinfo_sz (compound_decode_hdr_maxsz +    \
 				decode_sequence_maxsz + \
 				decode_getdeviceinfo_maxsz)
+#define NFS4_enc_layoutget_sz	(compound_encode_hdr_maxsz + \
+				encode_sequence_maxsz + \
+				encode_putfh_maxsz +        \
+				encode_layoutget_sz)
+#define NFS4_dec_layoutget_sz	(compound_decode_hdr_maxsz + \
+				decode_sequence_maxsz + \
+				decode_putfh_maxsz +        \
+				decode_layoutget_maxsz)
 
 const u32 nfs41_maxwrite_overhead = ((RPC_MAX_HEADER_WITH_AUTH +
 				      compound_encode_hdr_maxsz +
@@ -1766,6 +1779,36 @@ encode_getdeviceinfo(struct xdr_stream *xdr,
 	hdr->nops++;
 }
 
+static void
+encode_layoutget(struct xdr_stream *xdr,
+		      const struct nfs4_pnfs_layoutget_arg *args,
+		      struct compound_hdr *hdr)
+{
+	nfs4_stateid stateid;
+	__be32 *p;
+
+	p = reserve_space(xdr, 44 + NFS4_STATEID_SIZE);
+	*p++ = cpu_to_be32(OP_LAYOUTGET);
+	*p++ = cpu_to_be32(0);     /* Signal layout available */
+	*p++ = cpu_to_be32(args->type);
+	*p++ = cpu_to_be32(args->lseg.iomode);
+	p = xdr_encode_hyper(p, args->lseg.offset);
+	p = xdr_encode_hyper(p, args->lseg.length);
+	p = xdr_encode_hyper(p, args->minlength);
+	pnfs_get_layout_stateid(&stateid, NFS_I(args->inode)->layout);
+	p = xdr_encode_opaque_fixed(p, &stateid.u.data, NFS4_STATEID_SIZE);
+	*p = cpu_to_be32(args->maxcount);
+
+	dprintk("%s: 1st type:0x%x iomode:%d off:%lu len:%lu mc:%d\n",
+		__func__,
+		args->type,
+		args->lseg.iomode,
+		(unsigned long)args->lseg.offset,
+		(unsigned long)args->lseg.length,
+		args->maxcount);
+	hdr->nops++;
+	hdr->replen += decode_layoutget_maxsz;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 /*
@@ -2617,6 +2660,25 @@ static int nfs4_xdr_enc_getdeviceinfo(struct rpc_rqst *req, uint32_t *p,
 	return 0;
 }
 
+/*
+ *  Encode LAYOUTGET request
+ */
+static int nfs4_xdr_enc_layoutget(struct rpc_rqst *req, uint32_t *p,
+				  struct nfs4_pnfs_layoutget_arg *args)
+{
+	struct xdr_stream xdr;
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+
+	xdr_init_encode(&xdr, &req->rq_snd_buf, p);
+	encode_compound_hdr(&xdr, req, &hdr);
+	encode_sequence(&xdr, &args->seq_args, &hdr);
+	encode_putfh(&xdr, NFS_FH(args->inode), &hdr);
+	encode_layoutget(&xdr, args, &hdr);
+	encode_nops(&hdr);
+	return 0;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 static void print_overflow_msg(const char *func, const struct xdr_stream *xdr)
@@ -4921,6 +4983,75 @@ out_overflow:
 	print_overflow_msg(__func__, xdr);
 	return -EIO;
 }
+
+static int decode_layoutget(struct xdr_stream *xdr, struct rpc_rqst *req,
+			    struct nfs4_pnfs_layoutget_res *res)
+{
+	__be32 *p;
+	int status;
+	u32 layout_count, dummy;
+
+	status = decode_op_hdr(xdr, OP_LAYOUTGET);
+	if (status)
+		return status;
+	p = xdr_inline_decode(xdr, 8 + NFS4_STATEID_SIZE);
+	if (unlikely(!p))
+		goto out_overflow;
+	res->return_on_close = be32_to_cpup(p++);
+	p = xdr_decode_opaque_fixed(p, res->stateid.u.data, NFS4_STATEID_SIZE);
+	layout_count = be32_to_cpup(p);
+	if (!layout_count) {
+		dprintk("%s: server responded with empty layout array\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	p = xdr_inline_decode(xdr, 24);
+	if (unlikely(!p))
+		goto out_overflow;
+	p = xdr_decode_hyper(p, &res->lseg.offset);
+	p = xdr_decode_hyper(p, &res->lseg.length);
+	res->lseg.iomode = be32_to_cpup(p++);
+	res->type = be32_to_cpup(p++);
+
+	status = decode_opaque_inline(xdr, &res->layout.len, (char **)&p);
+	if (unlikely(status))
+		return status;
+
+	dprintk("%s roff:%lu rlen:%lu riomode:%d, lo_type:0x%x, lo.len:%d\n",
+		__func__,
+		(unsigned long)res->lseg.offset,
+		(unsigned long)res->lseg.length,
+		res->lseg.iomode,
+		res->type,
+		res->layout.len);
+
+	/* presumably, pnfs4_proc_layoutget allocated a single page */
+	if (res->layout.len > PAGE_SIZE)
+		return -ENOMEM;
+	memcpy(res->layout.buf, p, res->layout.len);
+
+	/* FIXME: the whole layout array should be passed up to the pnfs
+	 * client */
+	if (layout_count > 1) {
+		dprintk("%s: server responded with %d layouts, dropping tail\n",
+			__func__, layout_count);
+
+		while (--layout_count) {
+			p = xdr_inline_decode(xdr, 24);
+			if (unlikely(!p))
+				goto out_overflow;
+			status = decode_opaque_inline(xdr, &dummy, (char **)&p);
+			if (unlikely(status))
+				return status;
+		}
+	}
+
+	return 0;
+out_overflow:
+	print_overflow_msg(__func__, xdr);
+	return -EIO;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 /*
@@ -5973,6 +6104,30 @@ out:
 	return status;
 }
 
+/*
+ * Decode LAYOUTGET response
+ */
+static int nfs4_xdr_dec_layoutget(struct rpc_rqst *rqstp, uint32_t *p,
+				  struct nfs4_pnfs_layoutget_res *res)
+{
+	struct xdr_stream xdr;
+	struct compound_hdr hdr;
+	int status;
+
+	xdr_init_decode(&xdr, &rqstp->rq_rcv_buf, p);
+	status = decode_compound_hdr(&xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(&xdr, &res->seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(&xdr);
+	if (status)
+		goto out;
+	status = decode_layoutget(&xdr, rqstp, res);
+out:
+	return status;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 __be32 *nfs4_decode_dirent(__be32 *p, struct nfs_entry *entry, int plus)
@@ -6152,6 +6307,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
   PROC(GET_LEASE_TIME,	enc_get_lease_time,	dec_get_lease_time),
   PROC(RECLAIM_COMPLETE, enc_reclaim_complete,  dec_reclaim_complete),
   PROC(PNFS_GETDEVICEINFO, enc_getdeviceinfo, dec_getdeviceinfo),
+  PROC(PNFS_LAYOUTGET,  enc_layoutget,     dec_layoutget),
 #endif /* CONFIG_NFS_V4_1 */
 };
 
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index cfee1d6..36a3056 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -256,6 +256,12 @@ put_layout(struct inode *inode)
 }
 
 void
+pnfs_layout_release(struct pnfs_layout_type *lo,
+		    struct nfs4_pnfs_layout_segment *range)
+{
+}
+
+void
 pnfs_destroy_layout(struct nfs_inode *nfsi)
 {
 	struct pnfs_layout_type *lo;
@@ -525,6 +531,19 @@ pnfs_alloc_layout(struct inode *ino)
 	return nfsi->layout;
 }
 
+void
+pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)
+{
+}
+
+int
+pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp)
+{
+	int status = 0;
+
+	return status;
+}
+
 /* Callback operations for layout drivers.
  */
 struct pnfs_client_operations pnfs_ops = {
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index d8de4c1..8c1d50e 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -24,11 +24,16 @@
 /* nfs4proc.c */
 extern int nfs4_pnfs_getdeviceinfo(struct nfs_server *server,
 				   struct pnfs_device *dev);
+extern int pnfs4_proc_layoutget(struct nfs4_pnfs_layoutget *lgp);
+
 /* pnfs.c */
 void put_lseg(struct pnfs_layout_segment *lseg);
 void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
 void unmount_pnfs_layoutdriver(struct nfs_server *);
 int pnfs_initialize(void);
+void pnfs_get_layout_done(struct nfs4_pnfs_layoutget *, int rpc_status);
+int pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp);
+void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segment *range);
 void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
 			     const nfs4_stateid *stateid);
 void pnfs_destroy_layout(struct nfs_inode *);
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 06912b0..a5f5c94 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -546,6 +546,7 @@ enum {
 	NFSPROC4_CLNT_SEQUENCE,
 	NFSPROC4_CLNT_GET_LEASE_TIME,
 	NFSPROC4_CLNT_RECLAIM_COMPLETE,
+	NFSPROC4_CLNT_PNFS_LAYOUTGET,
 	NFSPROC4_CLNT_PNFS_GETDEVICEINFO,
 };
 
diff --git a/include/linux/pnfs_xdr.h b/include/linux/pnfs_xdr.h
index e6743f3..b85320d 100644
--- a/include/linux/pnfs_xdr.h
+++ b/include/linux/pnfs_xdr.h
@@ -19,6 +19,10 @@ struct pnfs_deviceid {
 	char data[NFS4_PNFS_DEVICEID4_SIZE];
 };
 
+struct nfs4_pnfs_layout {
+	__u32 len;
+	void *buf;
+};
 
 struct nfs4_pnfs_layout_segment {
 	u32 iomode;
@@ -26,7 +30,29 @@ struct nfs4_pnfs_layout_segment {
 	u64 length;
 };
 
+struct nfs4_pnfs_layoutget_arg {
+	__u32 type;
+	struct nfs4_pnfs_layout_segment lseg;
+	__u64 minlength;
+	__u32 maxcount;
+	struct inode *inode;
+	struct nfs4_sequence_args seq_args;
+};
+
 struct nfs4_pnfs_layoutget_res {
+	__u32 return_on_close;
+	struct nfs4_pnfs_layout_segment lseg;
+	__u32 type;
+	nfs4_stateid stateid;
+	struct nfs4_pnfs_layout layout;
+	struct nfs4_sequence_res seq_res;
+};
+
+struct nfs4_pnfs_layoutget {
+	struct nfs4_pnfs_layoutget_arg args;
+	struct nfs4_pnfs_layoutget_res res;
+	struct pnfs_layout_segment **lsegpp;
+	int status;
 };
 
 struct nfs4_pnfs_getdeviceinfo_arg {
-- 
1.6.2.5
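
Note on the constant in encode_layoutget(): the 44 passed to reserve_space() is the sum of the fixed-size LAYOUTGET arguments written by that function, and the stateid is added separately. The following is only a minimal sketch of that arithmetic, not code from the patch. The names layoutget_fixed_bytes(), layoutget_total_bytes() and SKETCH_STATEID_SIZE are made up for illustration, and the 16-byte stateid size is assumed to match NFS4_STATEID_SIZE.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: mirrors the kernel's NFS4_STATEID_SIZE (16 bytes). */
#define SKETCH_STATEID_SIZE 16

/* XDR encodes a 32-bit word in 4 bytes and a hyper (64-bit) in 8. */
enum { XDR_WORD = 4, XDR_HYPER = 8 };

/* Bytes of fixed-size LAYOUTGET arguments, excluding the stateid. */
static size_t layoutget_fixed_bytes(void)
{
	return XDR_WORD		/* opcode (OP_LAYOUTGET) */
	     + XDR_WORD		/* signal layout available */
	     + XDR_WORD		/* layout type */
	     + XDR_WORD		/* iomode */
	     + XDR_HYPER	/* offset */
	     + XDR_HYPER	/* length */
	     + XDR_HYPER	/* minlength */
	     + XDR_WORD;	/* maxcount */
}

/* Total bytes reserved: fixed arguments plus the opaque stateid. */
static size_t layoutget_total_bytes(void)
{
	return layoutget_fixed_bytes() + SKETCH_STATEID_SIZE;
}
```

The same count in 32-bit words is 11: one word for the opcode plus the 10 that encode_layoutget_sz adds on top of op_encode_hdr_maxsz.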



* [PATCH 23/50] pnfs_submit: layout helper functions
  2010-08-13 21:31                                           ` [PATCH 22/50] pnfs_submit: layoutget andros
@ 2010-08-13 21:31                                             ` andros
  2010-08-13 21:31                                               ` [PATCH 24/50] pnfs_submit: filelayout layout segment alloc and free andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
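
Note on the iomode matching rules used by has_matching_lseg(): a request matches a cached segment unless the request is for RW and the segment is READ only. The following is a user-space sketch of that rule, not the kernel code. The sketch_* names are hypothetical, and the enum values are assumed to mirror enum pnfs_iomode (READ = 1, RW = 2, ANY = 3).

```c
#include <assert.h>

/* Assumption: same numbering as the kernel's enum pnfs_iomode. */
enum sketch_iomode {
	SKETCH_IOMODE_READ = 1,
	SKETCH_IOMODE_RW = 2,
	SKETCH_IOMODE_ANY = 3,
};

/*
 * A requested range matches a cached segment unless the request needs
 * RW access and the cached segment does not provide it.
 */
static int sketch_matches(enum sketch_iomode range, enum sketch_iomode lseg)
{
	return range != SKETCH_IOMODE_RW || lseg == SKETCH_IOMODE_RW;
}

/* Walk the six rows of the matching table from the patch comment. */
static void sketch_check_table(void)
{
	assert(sketch_matches(SKETCH_IOMODE_ANY, SKETCH_IOMODE_READ));
	assert(sketch_matches(SKETCH_IOMODE_ANY, SKETCH_IOMODE_RW));
	assert(!sketch_matches(SKETCH_IOMODE_RW, SKETCH_IOMODE_READ));
	assert(sketch_matches(SKETCH_IOMODE_RW, SKETCH_IOMODE_RW));
	assert(sketch_matches(SKETCH_IOMODE_READ, SKETCH_IOMODE_READ));
	assert(sketch_matches(SKETCH_IOMODE_READ, SKETCH_IOMODE_RW));
}
```

Calling sketch_check_table() from a small test program exercises every row of the table; only the RW request against a READ segment is rejected.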
 fs/nfs/inode.c         |    1 +
 fs/nfs/pnfs.c          |  377 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/pnfs.h          |   34 +++++
 include/linux/nfs_fs.h |    1 +
 4 files changed, 413 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index ce91e8f..5e355de 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1445,6 +1445,7 @@ static inline void nfs4_init_once(struct nfs_inode *nfsi)
 	init_rwsem(&nfsi->rwsem);
 #ifdef CONFIG_NFS_V4_1
 	init_waitqueue_head(&nfsi->lo_waitq);
+	nfsi->pnfs_layout_suspend = 0;
 	nfsi->layout = NULL;
 #endif /* CONFIG_NFS_V4_1 */
 #endif
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 36a3056..0f98261 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -259,6 +259,18 @@ void
 pnfs_layout_release(struct pnfs_layout_type *lo,
 		    struct nfs4_pnfs_layout_segment *range)
 {
+	struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);
+
+	spin_lock(&nfsi->vfs_inode.i_lock);
+	if (range)
+		pnfs_free_layout(lo, range);
+	/*
+	 * Matched in _pnfs_update_layout for layoutget
+	 * and by get_layout in _pnfs_return_layout for layoutreturn
+	 */
+	put_layout_locked(lo);
+	spin_unlock(&nfsi->vfs_inode.i_lock);
+	wake_up_all(&nfsi->lo_waitq);
 }
 
 void
@@ -412,6 +424,61 @@ pnfs_layout_from_open_stateid(struct pnfs_layout_type *lo,
 }
 
 /*
+ * Get layout from server.
+ *    For now, assume that whole-file layouts are requested.
+ *    arg->offset: 0
+ *    arg->length: all ones
+ */
+static int
+send_layoutget(struct inode *ino,
+	   struct nfs_open_context *ctx,
+	   struct nfs4_pnfs_layout_segment *range,
+	   struct pnfs_layout_segment **lsegpp,
+	   struct pnfs_layout_type *lo)
+{
+	int status;
+	struct nfs_server *server = NFS_SERVER(ino);
+	struct nfs4_pnfs_layoutget *lgp;
+
+	dprintk("--> %s\n", __func__);
+
+	lgp = kzalloc(sizeof(*lgp), GFP_KERNEL);
+	if (lgp == NULL) {
+		pnfs_layout_release(lo, NULL);
+		return -ENOMEM;
+	}
+	lgp->args.minlength = NFS4_MAX_UINT64;
+	lgp->args.maxcount = PNFS_LAYOUT_MAXSIZE;
+	lgp->args.lseg.iomode = range->iomode;
+	lgp->args.lseg.offset = 0;
+	lgp->args.lseg.length = NFS4_MAX_UINT64;
+	lgp->args.type = server->pnfs_curr_ld->id;
+	lgp->args.inode = ino;
+	lgp->lsegpp = lsegpp;
+
+	if (!memcmp(lo->stateid.u.data, &zero_stateid, NFS4_STATEID_SIZE)) {
+		struct nfs_open_context *oldctx = ctx;
+
+		if (!oldctx) {
+			ctx = nfs_find_open_context(ino, NULL,
+					(range->iomode == IOMODE_READ) ?
+					FMODE_READ : FMODE_WRITE);
+			BUG_ON(!ctx);
+		}
+		/* Set the layout stateid from the open stateid */
+		pnfs_layout_from_open_stateid(NFS_I(ino)->layout, ctx->state);
+		if (!oldctx)
+			put_nfs_open_context(ctx);
+	}
+
+	/* Retrieve layout information from server */
+	status = pnfs4_proc_layoutget(lgp);
+
+	dprintk("<-- %s status %d\n", __func__, status);
+	return status;
+}
+
+/*
  * iomode matching rules:
  * range	lseg	match
  * -----	-----	-----
@@ -471,6 +538,62 @@ pnfs_free_layout(struct pnfs_layout_type *lo,
 }
 
 /*
+ * cmp two layout segments for sorting into layout cache
+ */
+static inline s64
+cmp_layout(struct nfs4_pnfs_layout_segment *l1,
+	   struct nfs4_pnfs_layout_segment *l2)
+{
+	/* read > read/write */
+	return (int)(l1->iomode == IOMODE_READ) -
+	       (int)(l2->iomode == IOMODE_READ);
+}
+
+static void
+pnfs_insert_layout(struct pnfs_layout_type *lo,
+		   struct pnfs_layout_segment *lseg)
+{
+	struct pnfs_layout_segment *lp;
+	int found = 0;
+
+	dprintk("%s:Begin\n", __func__);
+
+	BUG_ON_UNLOCKED_LO(lo);
+	if (list_empty(&lo->segs)) {
+		struct nfs_client *clp = PNFS_NFS_SERVER(lo)->nfs_client;
+
+		spin_lock(&clp->cl_lock);
+		BUG_ON(!list_empty(&lo->lo_layouts));
+		list_add_tail(&lo->lo_layouts, &clp->cl_layouts);
+		spin_unlock(&clp->cl_lock);
+	}
+	list_for_each_entry(lp, &lo->segs, fi_list) {
+		if (cmp_layout(&lp->range, &lseg->range) > 0)
+			continue;
+		list_add_tail(&lseg->fi_list, &lp->fi_list);
+		dprintk("%s: inserted lseg %p "
+			"iomode %d offset %llu length %llu before "
+			"lp %p iomode %d offset %llu length %llu\n",
+			__func__, lseg, lseg->range.iomode,
+			lseg->range.offset, lseg->range.length,
+			lp, lp->range.iomode, lp->range.offset,
+			lp->range.length);
+		found = 1;
+		break;
+	}
+	if (!found) {
+		list_add_tail(&lseg->fi_list, &lo->segs);
+		dprintk("%s: inserted lseg %p "
+			"iomode %d offset %llu length %llu at tail\n",
+			__func__, lseg, lseg->range.iomode,
+			lseg->range.offset, lseg->range.length);
+	}
+	get_layout(lo);
+
+	dprintk("%s:Return\n", __func__);
+}
+
+/*
  * Each layoutdriver embeds pnfs_layout_type as the first field in it's
  * per-layout type layout cache structure and returns it ZEROed
  * from layoutdriver_io_ops->alloc_layout
@@ -531,16 +654,270 @@ pnfs_alloc_layout(struct inode *ino)
 	return nfsi->layout;
 }
 
+/*
+ * iomode matching rules:
+ * range	lseg	match
+ * -----	-----	-----
+ * ANY		READ	true
+ * ANY		RW	true
+ * RW		READ	false
+ * RW		RW	true
+ * READ		READ	true
+ * READ		RW	true
+ */
+static inline int
+has_matching_lseg(struct pnfs_layout_segment *lseg,
+		  struct nfs4_pnfs_layout_segment *range)
+{
+	return (range->iomode != IOMODE_RW || lseg->range.iomode == IOMODE_RW);
+}
+
+/*
+ * lookup range in layout
+ */
+static struct pnfs_layout_segment *
+pnfs_has_layout(struct pnfs_layout_type *lo,
+		struct nfs4_pnfs_layout_segment *range)
+{
+	struct pnfs_layout_segment *lseg, *ret = NULL;
+
+	dprintk("%s:Begin\n", __func__);
+
+	BUG_ON_UNLOCKED_LO(lo);
+	list_for_each_entry(lseg, &lo->segs, fi_list) {
+		if (has_matching_lseg(lseg, range)) {
+			ret = lseg;
+			get_lseg(ret);
+			break;
+		}
+		if (cmp_layout(range, &lseg->range) > 0)
+			break;
+	}
+
+	dprintk("%s:Return lseg %p ref %d valid %d\n",
+		__func__, ret, ret ? atomic_read(&ret->kref.refcount) : 0,
+		ret ? ret->valid : 0);
+	return ret;
+}
+
+/* Update the file's layout for the given range and iomode.
+ * Layout is retrieved from the server if needed.
+ * The appropriate layout segment is referenced and returned to the caller.
+ */
+void
+_pnfs_update_layout(struct inode *ino,
+		   struct nfs_open_context *ctx,
+		   enum pnfs_iomode iomode,
+		   struct pnfs_layout_segment **lsegpp)
+{
+	struct nfs4_pnfs_layout_segment arg = {
+		.iomode = iomode,
+		.offset = 0,
+		.length = NFS4_MAX_UINT64,
+	};
+	struct nfs_inode *nfsi = NFS_I(ino);
+	struct pnfs_layout_type *lo;
+	struct pnfs_layout_segment *lseg = NULL;
+
+	*lsegpp = NULL;
+	spin_lock(&ino->i_lock);
+	lo = pnfs_alloc_layout(ino);
+	if (lo == NULL) {
+		dprintk("%s ERROR: can't get pnfs_layout_type\n", __func__);
+		goto out_unlock;
+	}
+
+	/* Check to see if the layout for the given range already exists */
+	lseg = pnfs_has_layout(lo, &arg);
+	if (lseg && !lseg->valid) {
+		put_lseg_locked(lseg);
+		/* someone is cleaning the layout */
+		lseg = NULL;
+		goto out_unlock;
+	}
+
+	if (lseg) {
+		dprintk("%s: Using cached lseg %p for %llu@%llu iomode %d\n",
+			__func__,
+			lseg,
+			arg.length,
+			arg.offset,
+			arg.iomode);
+
+		goto out_unlock;
+	}
+
+	/* if get layout already failed once goto out */
+	if (test_bit(lo_fail_bit(iomode), &nfsi->layout->pnfs_layout_state)) {
+		if (unlikely(nfsi->pnfs_layout_suspend &&
+		    get_seconds() >= nfsi->pnfs_layout_suspend)) {
+			dprintk("%s: layout_get resumed\n", __func__);
+			clear_bit(lo_fail_bit(iomode),
+				  &nfsi->layout->pnfs_layout_state);
+			nfsi->pnfs_layout_suspend = 0;
+		} else
+			goto out_unlock;
+	}
+
+	/* Reference the layout for layoutget matched in pnfs_layout_release */
+	get_layout(lo);
+	spin_unlock(&ino->i_lock);
+
+	send_layoutget(ino, ctx, &arg, lsegpp, lo);
+out:
+	dprintk("%s end, state 0x%lx lseg %p\n", __func__,
+		nfsi->layout->pnfs_layout_state, lseg);
+	return;
+out_unlock:
+	*lsegpp = lseg;
+	spin_unlock(&ino->i_lock);
+	goto out;
+}
+
 void
 pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)
 {
+	struct pnfs_layout_segment *lseg = NULL;
+	struct nfs_inode *nfsi = NFS_I(lgp->args.inode);
+	time_t suspend = 0;
+
+	dprintk("-->%s\n", __func__);
+
+	lgp->status = rpc_status;
+	if (likely(!rpc_status)) {
+		if (unlikely(lgp->res.layout.len == 0)) {
+			printk(KERN_ERR
+			       "%s: ERROR Returned layout size is ZERO\n", __func__);
+			lgp->status = -EIO;
+		}
+		goto out;
+	}
+
+	dprintk("%s: ERROR retrieving layout %d\n", __func__, rpc_status);
+	switch (rpc_status) {
+	case -NFS4ERR_BADLAYOUT:
+		lgp->status = -ENOENT;
+		/* FALLTHROUGH */
+	case -EACCES:	/* NFS4ERR_ACCESS */
+		/* transient error, don't mark with NFS_INO_LAYOUT_FAILED */
+		goto out;
+
+	case -NFS4ERR_LAYOUTTRYLATER:
+	case -NFS4ERR_RECALLCONFLICT:
+	case -NFS4ERR_OLD_STATEID:
+	case -EAGAIN:	/* NFS4ERR_LOCKED */
+		lgp->status = -NFS4ERR_DELAY;	/* for nfs4_handle_exception */
+		/* FALLTHROUGH */
+	case -NFS4ERR_GRACE:
+	case -NFS4ERR_DELAY:
+		goto out;
+
+	case -NFS4ERR_ADMIN_REVOKED:
+	case -NFS4ERR_DELEG_REVOKED:
+		/* The layout is expected to be returned at this point.
+		 * This should clear the layout stateid as well */
+		suspend = get_seconds() + 1;
+		break;
+
+	case -NFS4ERR_LAYOUTUNAVAILABLE:
+		lgp->status = -ENOTSUPP;
+		break;
+
+	case -NFS4ERR_REP_TOO_BIG:
+	case -NFS4ERR_REP_TOO_BIG_TO_CACHE:
+		lgp->status = -E2BIG;
+		break;
+
+	/* Leave the following errors untranslated */
+	case -NFS4ERR_DEADSESSION:
+	case -NFS4ERR_DQUOT:
+	case -EINVAL:		/* NFS4ERR_INVAL */
+	case -EIO:		/* NFS4ERR_IO */
+	case -NFS4ERR_FHEXPIRED:
+	case -NFS4ERR_MOVED:
+	case -NFS4ERR_NOSPC:
+	case -ESERVERFAULT:	/* NFS4ERR_SERVERFAULT */
+	case -ESTALE:		/* NFS4ERR_STALE */
+	case -ETOOSMALL:	/* NFS4ERR_TOOSMALL */
+		break;
+
+	/* The following errors are our fault and should never happen */
+	case -NFS4ERR_BADIOMODE:
+	case -NFS4ERR_BADXDR:
+	case -NFS4ERR_REQ_TOO_BIG:
+	case -NFS4ERR_UNKNOWN_LAYOUTTYPE:
+	case -NFS4ERR_WRONG_TYPE:
+		lgp->status = -EINVAL;
+		/* FALLTHROUGH */
+	case -NFS4ERR_BAD_STATEID:
+	case -NFS4ERR_NOFILEHANDLE:
+	case -ENOTSUPP:	/* NFS4ERR_NOTSUPP */
+	case -NFS4ERR_OPENMODE:
+	case -NFS4ERR_OP_NOT_IN_SESSION:
+	case -NFS4ERR_TOO_MANY_OPS:
+		dprintk("%s: error %d: should never happen\n", __func__,
+			rpc_status);
+		break;
+
+	/* The following errors are the server's fault */
+	default:
+		dprintk("%s: illegal error %d\n", __func__, rpc_status);
+		lgp->status = -EIO;
+		break;
+	}
+
+	/* remember that get layout failed and suspend trying */
+	nfsi->pnfs_layout_suspend = suspend;
+	set_bit(lo_fail_bit(lgp->args.lseg.iomode),
+		&nfsi->layout->pnfs_layout_state);
+	dprintk("%s: layout_get suspended until %ld\n",
+		__func__, suspend);
+out:
+	dprintk("%s end (err:%d) state 0x%lx lseg %p\n",
+		__func__, lgp->status, nfsi->layout->pnfs_layout_state, lseg);
+	return;
 }
 
 int
 pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp)
 {
+	struct pnfs_layout_type *lo = NFS_I(lgp->args.inode)->layout;
+	struct nfs4_pnfs_layoutget_res *res = &lgp->res;
+	struct pnfs_layout_segment *lseg;
+	struct inode *ino = PNFS_INODE(lo);
 	int status = 0;
 
+	/* Inject layout blob into I/O device driver */
+	lseg = PNFS_LD_IO_OPS(lo)->alloc_lseg(lo, res);
+	if (!lseg || IS_ERR(lseg)) {
+		if (!lseg)
+			status = -ENOMEM;
+		else
+			status = PTR_ERR(lseg);
+		dprintk("%s: Could not allocate layout: error %d\n",
+		       __func__, status);
+		goto out;
+	}
+
+	spin_lock(&ino->i_lock);
+	init_lseg(lo, lseg);
+	lseg->range = res->lseg;
+	if (lgp->lsegpp) {
+		get_lseg(lseg);
+		*lgp->lsegpp = lseg;
+	}
+	pnfs_insert_layout(lo, lseg);
+
+	if (res->return_on_close) {
+		lo->roc_iomode |= res->lseg.iomode;
+		if (!lo->roc_iomode)
+			lo->roc_iomode = IOMODE_ANY;
+	}
+
+	/* Done processing layoutget. Set the layout stateid */
+	pnfs_set_layout_stateid(lo, &res->stateid);
+	spin_unlock(&ino->i_lock);
+out:
 	return status;
 }
 
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 8c1d50e..379aa18 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -28,6 +28,10 @@ extern int pnfs4_proc_layoutget(struct nfs4_pnfs_layoutget *lgp);
 
 /* pnfs.c */
 void put_lseg(struct pnfs_layout_segment *lseg);
+void _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
+	enum pnfs_iomode access_type,
+	struct pnfs_layout_segment **lsegpp);
+
 void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
 void unmount_pnfs_layoutdriver(struct nfs_server *);
 int pnfs_initialize(void);
@@ -47,6 +51,12 @@ void pnfs_get_layout_stateid(nfs4_stateid *dst, struct pnfs_layout_type *lo);
 
 #define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
 
+static inline int lo_fail_bit(u32 iomode)
+{
+	return iomode == IOMODE_RW ?
+			 NFS_INO_RW_LAYOUT_FAILED : NFS_INO_RO_LAYOUT_FAILED;
+}
+
 static inline void get_lseg(struct pnfs_layout_segment *lseg)
 {
 	kref_get(&lseg->kref);
@@ -58,6 +68,21 @@ static inline int pnfs_enabled_sb(struct nfs_server *nfss)
 	return nfss->pnfs_curr_ld != NULL;
 }
 
+static inline void pnfs_update_layout(struct inode *ino,
+	struct nfs_open_context *ctx,
+	enum pnfs_iomode access_type,
+	struct pnfs_layout_segment **lsegpp)
+{
+	struct nfs_server *nfss = NFS_SERVER(ino);
+
+	if (pnfs_enabled_sb(nfss))
+		_pnfs_update_layout(ino, ctx, access_type, lsegpp);
+	else {
+		if (lsegpp)
+			*lsegpp = NULL;
+	}
+}
+
 #else  /* CONFIG_NFS_V4_1 */
 
 static inline void pnfs_destroy_all_layouts(struct nfs_client *clp)
@@ -76,6 +101,15 @@ static inline void put_lseg(struct pnfs_layout_segment *lseg)
 {
 }
 
+static inline void
+pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
+	enum pnfs_iomode access_type,
+	struct pnfs_layout_segment **lsegpp)
+{
+	if (lsegpp)
+		*lsegpp = NULL;
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 #endif /* FS_NFS_PNFS_H */
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index c8b6129..7202c05 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -213,6 +213,7 @@ struct nfs_inode {
 #if defined(CONFIG_NFS_V4_1)
 	wait_queue_head_t lo_waitq;
 	struct pnfs_layout_type *layout;
+	time_t pnfs_layout_suspend;
 #endif /* CONFIG_NFS_V4_1 */
 #endif /* CONFIG_NFS_V4*/
 #ifdef CONFIG_NFS_FSCACHE
-- 
1.6.2.5



* [PATCH 24/50] pnfs_submit: filelayout layout segment alloc and free
  2010-08-13 21:31                                             ` [PATCH 23/50] pnfs_submit: layout helper functions andros
@ 2010-08-13 21:31                                               ` andros
  2010-08-13 21:31                                                 ` [PATCH 25/50] pnfs_submit: layoutcommit helper functions andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
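
Note on how filelayout_set_layout() unpacks nfl_util: the low six bits carry flags and the remaining bits carry the stripe unit in bytes. The following is a user-space sketch of that split, not the kernel code. The NFL4_UFLG_* values are the ones this patch adds to include/linux/nfs4.h; struct sketch_util and sketch_decode_util() are hypothetical names.

```c
#include <stdint.h>

/* Flag layout of nfl_util, as defined by this patch. */
#define NFL4_UFLG_MASK			0x0000003F
#define NFL4_UFLG_DENSE			0x00000001
#define NFL4_UFLG_COMMIT_THRU_MDS	0x00000002

/* Hypothetical container for the decoded fields. */
struct sketch_util {
	uint32_t stripe_unit;		/* bytes; flag bits cleared */
	int dense;			/* 1 = dense packing, 0 = sparse */
	int commit_through_mds;		/* 1 = COMMIT goes to the MDS */
};

/* Split a host-order nfl_util word into flags and stripe unit. */
static struct sketch_util sketch_decode_util(uint32_t nfl_util)
{
	struct sketch_util u;

	u.dense = (nfl_util & NFL4_UFLG_DENSE) != 0;
	u.commit_through_mds = (nfl_util & NFL4_UFLG_COMMIT_THRU_MDS) != 0;
	u.stripe_unit = nfl_util & ~(uint32_t)NFL4_UFLG_MASK;
	return u;
}
```

Because the mask clears the low six bits, any stripe unit recovered this way is a multiple of 64; the page-alignment check in filelayout_check_layout() is a stricter, separate test.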
 fs/nfs/nfs4filelayout.c   |  201 +++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4filelayout.h   |   16 ++++
 fs/nfs/pnfs.c             |   22 +++++
 include/linux/nfs4.h      |    5 +
 include/linux/nfs4_pnfs.h |   12 +++
 5 files changed, 256 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index e1a09a8..50620f4 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -46,6 +46,7 @@
 #include "nfs4filelayout.h"
 #include "nfs4_fs.h"
 #include "internal.h"
+#include "pnfs.h"
 
 #define NFSDBG_FACILITY         NFSDBG_PNFS_LD
 
@@ -104,9 +105,209 @@ filelayout_free_layout(struct pnfs_layout_type *lo)
 	dprintk("NFS_FILELAYOUT: freeing layout\n");
 	kfree(FILE_LO(lo));
 }
+
+/*
+ * filelayout_check_layout()
+ *
+ * Make sure layout segment parameters are sane WRT the device.
+ *
+ * Notes:
+ * 1) current code insists that # stripe index = # data servers in ds_list
+ *    which is wrong.
+ * 2) pattern_offset is ignored and must == 0 which is wrong;
+ * 3) the pattern_offset needs to be a multiple of the stripe unit.
+ * 4) stripe unit is multiple of page size
+ */
+
+static int
+filelayout_check_layout(struct pnfs_layout_type *lo,
+			struct pnfs_layout_segment *lseg)
+{
+	struct nfs4_filelayout_segment *fl = LSEG_LD_DATA(lseg);
+	struct nfs4_file_layout_dsaddr *dsaddr;
+	int status = -EINVAL;
+	struct nfs_server *nfss = NFS_SERVER(PNFS_INODE(lo));
+
+	dprintk("--> %s\n", __func__);
+	dsaddr = nfs4_pnfs_device_item_find(nfss->nfs_client, &fl->dev_id);
+	if (dsaddr == NULL) {
+		dsaddr = get_device_info(PNFS_INODE(lo), &fl->dev_id);
+		if (dsaddr == NULL) {
+			dprintk("%s NO device for dev_id %s\n",
+				__func__, deviceid_fmt(&fl->dev_id));
+			goto out;
+		}
+	}
+	if (fl->first_stripe_index < 0 ||
+	    fl->first_stripe_index > dsaddr->stripe_count) {
+		dprintk("%s Bad first_stripe_index %d\n",
+				__func__, fl->first_stripe_index);
+		goto out;
+	}
+
+	if (fl->pattern_offset != 0) {
+		dprintk("%s Unsupported non-zero pattern_offset %Ld\n",
+				__func__, fl->pattern_offset);
+		goto out;
+	}
+
+	if (fl->stripe_unit % PAGE_SIZE) {
+		dprintk("%s Stripe unit (%u) not page aligned\n",
+			__func__, fl->stripe_unit);
+		goto out;
+	}
+
+	/* XXX only SPARSE packing supported; use of MDS open fh is not */
+	if (!(fl->num_fh == 1 || fl->num_fh == dsaddr->ds_num)) {
+		dprintk("%s num_fh %u not equal to 1 or ds_num %u\n",
+			__func__, fl->num_fh, dsaddr->ds_num);
+		goto out;
+	}
+
+	if (fl->stripe_unit % nfss->rsize || fl->stripe_unit % nfss->wsize) {
+		dprintk("%s Stripe unit (%u) not aligned with rsize %u "
+			"wsize %u\n", __func__, fl->stripe_unit, nfss->rsize,
+			nfss->wsize);
+	}
+
+	/* reference the device */
+	nfs4_set_layout_deviceid(lseg, &dsaddr->deviceid);
+
+	status = 0;
+out:
+	dprintk("<-- %s returns %d\n", __func__, status);
+	return status;
+}
+
+static void _filelayout_free_lseg(struct pnfs_layout_segment *lseg);
+static void filelayout_free_fh_array(struct nfs4_filelayout_segment *fl);
+
+/* Decode layout and store in layoutid.  Overwrite any existing layout
+ * information for this file.
+ */
+static int
+filelayout_set_layout(struct nfs4_filelayout *flo,
+		      struct nfs4_filelayout_segment *fl,
+		      struct nfs4_pnfs_layoutget_res *lgr)
+{
+	uint32_t *p = (uint32_t *)lgr->layout.buf;
+	uint32_t nfl_util;
+	int i;
+
+	dprintk("%s: set_layout_map Begin\n", __func__);
+
+	memcpy(&fl->dev_id, p, NFS4_PNFS_DEVICEID4_SIZE);
+	p += XDR_QUADLEN(NFS4_PNFS_DEVICEID4_SIZE);
+	nfl_util = be32_to_cpup(p++);
+	if (nfl_util & NFL4_UFLG_COMMIT_THRU_MDS)
+		fl->commit_through_mds = 1;
+	if (nfl_util & NFL4_UFLG_DENSE)
+		fl->stripe_type = STRIPE_DENSE;
+	else
+		fl->stripe_type = STRIPE_SPARSE;
+	fl->stripe_unit = nfl_util & ~NFL4_UFLG_MASK;
+
+	if (!flo->stripe_unit)
+		flo->stripe_unit = fl->stripe_unit;
+	else if (flo->stripe_unit != fl->stripe_unit) {
+		printk(KERN_NOTICE "%s: updating stripe_unit from %u to %u\n",
+			__func__, flo->stripe_unit, fl->stripe_unit);
+		flo->stripe_unit = fl->stripe_unit;
+	}
+
+	fl->first_stripe_index = be32_to_cpup(p++);
+	p = xdr_decode_hyper(p, &fl->pattern_offset);
+	fl->num_fh = be32_to_cpup(p++);
+
+	dprintk("%s: nfl_util 0x%X num_fh %u fsi %u po %llu dev_id %s\n",
+		__func__, nfl_util, fl->num_fh, fl->first_stripe_index,
+		fl->pattern_offset, deviceid_fmt(&fl->dev_id));
+
+	if (fl->num_fh * sizeof(struct nfs_fh) > 2*PAGE_SIZE) {
+		fl->fh_array = vmalloc(fl->num_fh * sizeof(struct nfs_fh));
+		if (fl->fh_array)
+			memset(fl->fh_array, 0,
+				fl->num_fh * sizeof(struct nfs_fh));
+	} else {
+		fl->fh_array = kzalloc(fl->num_fh * sizeof(struct nfs_fh),
+					GFP_KERNEL);
+	}
+	if (!fl->fh_array)
+		return -ENOMEM;
+
+	for (i = 0; i < fl->num_fh; i++) {
+		/* fh */
+		fl->fh_array[i].size = be32_to_cpup(p++);
+		if (sizeof(struct nfs_fh) < fl->fh_array[i].size) {
+			printk(KERN_ERR "Too big fh %d received %d\n",
+				i, fl->fh_array[i].size);
+			/* Layout is now invalid, pretend it doesn't exist */
+			filelayout_free_fh_array(fl);
+			fl->num_fh = 0;
+			break;
+		}
+		memcpy(fl->fh_array[i].data, p, fl->fh_array[i].size);
+		p += XDR_QUADLEN(fl->fh_array[i].size);
+		dprintk("DEBUG: %s: fh len %d\n", __func__,
+					fl->fh_array[i].size);
+	}
+
+	return 0;
+}
+
+static struct pnfs_layout_segment *
+filelayout_alloc_lseg(struct pnfs_layout_type *layoutid,
+		      struct nfs4_pnfs_layoutget_res *lgr)
+{
+	struct nfs4_filelayout *flo = FILE_LO(layoutid);
+	struct pnfs_layout_segment *lseg;
+	int rc;
+
+	dprintk("--> %s\n", __func__);
+	lseg = kzalloc(sizeof(struct pnfs_layout_segment) +
+		       sizeof(struct nfs4_filelayout_segment), GFP_KERNEL);
+	if (!lseg)
+		return NULL;
+
+	rc = filelayout_set_layout(flo, LSEG_LD_DATA(lseg), lgr);
+
+	if (rc != 0 || filelayout_check_layout(layoutid, lseg)) {
+		_filelayout_free_lseg(lseg);
+		lseg = NULL;
+	}
+	return lseg;
+}
+
+static void filelayout_free_fh_array(struct nfs4_filelayout_segment *fl)
+{
+	if (fl->num_fh * sizeof(struct nfs_fh) > 2*PAGE_SIZE)
+		vfree(fl->fh_array);
+	else
+		kfree(fl->fh_array);
+
+	fl->fh_array = NULL;
+}
+
+static void
+_filelayout_free_lseg(struct pnfs_layout_segment *lseg)
+{
+	filelayout_free_fh_array(LSEG_LD_DATA(lseg));
+	kfree(lseg);
+}
+
+static void
+filelayout_free_lseg(struct pnfs_layout_segment *lseg)
+{
+	dprintk("--> %s\n", __func__);
+	nfs4_unset_layout_deviceid(lseg, lseg->deviceid,
+				   nfs4_fl_free_deviceid_callback);
+	_filelayout_free_lseg(lseg);
+}
 struct layoutdriver_io_operations filelayout_io_operations = {
 	.alloc_layout            = filelayout_alloc_layout,
 	.free_layout             = filelayout_free_layout,
+	.alloc_lseg              = filelayout_alloc_lseg,
+	.free_lseg               = filelayout_free_lseg,
 	.initialize_mountpoint   = filelayout_initialize_mountpoint,
 	.uninitialize_mountpoint = filelayout_uninitialize_mountpoint,
 };
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index ad975fd..aeb2147 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -23,6 +23,11 @@
 #define NFS4_PNFS_MAX_STRIPE_CNT 4096
 #define NFS4_PNFS_MAX_MULTI_CNT  64 /* 256 fit into a u8 stripe_index */
 
+enum stripetype4 {
+	STRIPE_SPARSE = 1,
+	STRIPE_DENSE = 2
+};
+
 /* Individual ip address */
 struct nfs4_pnfs_ds {
 	struct list_head	ds_node;  /* nfs4_pnfs_dev_hlist dev_dslist */
@@ -41,6 +46,17 @@ struct nfs4_file_layout_dsaddr {
 	struct nfs4_pnfs_ds	*ds_list[1];
 };
 
+struct nfs4_filelayout_segment {
+	u32 stripe_type;
+	u32 commit_through_mds;
+	u32 stripe_unit;
+	u32 first_stripe_index;
+	u64 pattern_offset;
+	struct pnfs_deviceid dev_id;
+	unsigned int num_fh;
+	struct nfs_fh *fh_array;
+};
+
 struct nfs4_filelayout {
 	struct pnfs_layout_type fl_layout;
 	u32 stripe_unit;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0f98261..33be484 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -972,6 +972,28 @@ nfs4_init_deviceid_node(struct nfs4_deviceid *d)
 }
 EXPORT_SYMBOL(nfs4_init_deviceid_node);
 
+/* Called from layoutdriver_io_operations->alloc_lseg */
+void
+nfs4_set_layout_deviceid(struct pnfs_layout_segment *l, struct nfs4_deviceid *d)
+{
+	dprintk("%s [%d]\n", __func__, atomic_read(&d->de_kref.refcount));
+	l->deviceid = d;
+	kref_get(&d->de_kref);
+}
+EXPORT_SYMBOL(nfs4_set_layout_deviceid);
+
+/* Called from layoutdriver_io_operations->free_lseg */
+void
+nfs4_unset_layout_deviceid(struct pnfs_layout_segment *l,
+			   struct nfs4_deviceid *d,
+			   void (*free_callback)(struct kref *))
+{
+	dprintk("%s [%d]\n", __func__, atomic_read(&d->de_kref.refcount));
+	l->deviceid = NULL;
+	kref_put(&d->de_kref, free_callback);
+}
+EXPORT_SYMBOL(nfs4_unset_layout_deviceid);
+
 struct nfs4_deviceid *
 nfs4_find_deviceid(struct nfs4_deviceid_cache *c, struct pnfs_deviceid *id)
 {
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index a5f5c94..2e11a3d 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -576,6 +576,11 @@ enum pnfs_iomode {
 	IOMODE_ANY = 3,
 };
 
+#define NFL4_UFLG_MASK			0x0000003F
+#define NFL4_UFLG_DENSE			0x00000001
+#define NFL4_UFLG_COMMIT_THRU_MDS	0x00000002
+#define NFL4_UFLG_STRIPE_UNIT_SIZE_MASK	0xFFFFFFC0
+
 #endif
 #endif
 
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 287a7dc..1ed509c 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -67,6 +67,12 @@ struct pnfs_layout_segment {
 	u8 ld_data[];			/* layout driver private data */
 };
 
+static inline void *
+LSEG_LD_DATA(struct pnfs_layout_segment *lseg)
+{
+	return lseg->ld_data;
+}
+
 /* Layout driver I/O operations.
  * Either the pagecache or non-pagecache read/write operations must be implemented
  */
@@ -142,6 +148,12 @@ extern struct nfs4_deviceid *nfs4_find_deviceid(struct nfs4_deviceid_cache *,
 				struct pnfs_deviceid *);
 extern struct nfs4_deviceid *nfs4_add_deviceid(struct nfs4_deviceid_cache *,
 				struct nfs4_deviceid *);
+extern void nfs4_set_layout_deviceid(struct pnfs_layout_segment *,
+				struct nfs4_deviceid *);
+extern void nfs4_unset_layout_deviceid(struct pnfs_layout_segment *,
+				struct nfs4_deviceid *,
+				void (*free_callback)(struct kref *));
+
 /* pNFS client callback functions.
  * These operations allow the layout driver to access pNFS client
  * specific information or call pNFS client->server operations.
-- 
1.6.2.5
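
The NFL4_UFLG_* masks added to include/linux/nfs4.h split the 32-bit
nfl_util word of an NFSv4.1 files layout (RFC 5661) into flag bits and a
stripe unit size. A minimal user-space sketch of that split follows. The
struct nfl_util_fields and decode_nfl_util() names are hypothetical and only
illustrate the masks; how filelayout_set_layout() stores the result is not
shown here and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Mask values copied from the include/linux/nfs4.h hunk in this patch. */
#define NFL4_UFLG_MASK			0x0000003F
#define NFL4_UFLG_DENSE			0x00000001
#define NFL4_UFLG_COMMIT_THRU_MDS	0x00000002
#define NFL4_UFLG_STRIPE_UNIT_SIZE_MASK	0xFFFFFFC0

/* Hypothetical container for the decoded fields (not in the patch). */
struct nfl_util_fields {
	uint32_t stripe_unit;		/* bytes per stripe unit */
	int dense;			/* 1 = dense, 0 = sparse striping */
	int commit_through_mds;		/* 1 = COMMIT goes to the MDS */
};

/* Split nfl_util into its stripe unit size and flag bits. */
static struct nfl_util_fields decode_nfl_util(uint32_t nfl_util)
{
	struct nfl_util_fields f;

	f.stripe_unit = nfl_util & NFL4_UFLG_STRIPE_UNIT_SIZE_MASK;
	f.dense = (nfl_util & NFL4_UFLG_DENSE) != 0;
	f.commit_through_mds = (nfl_util & NFL4_UFLG_COMMIT_THRU_MDS) != 0;
	return f;
}
```

The low six bits are reserved for flags, so the stripe unit is always a
multiple of 64 bytes under this encoding.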



* [PATCH 25/50] pnfs_submit: layoutcommit helper functions
  2010-08-13 21:31                                               ` [PATCH 24/50] pnfs_submit: filelayout layout segment alloc and free andros
@ 2010-08-13 21:31                                                 ` andros
  2010-08-13 21:31                                                   ` [PATCH 26/50] pnfs_submit: layoutcommit andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
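The write-range bookkeeping in pnfs_update_last_write() and
pnfs_layoutcommit_setup() uses an inclusive end position, which is easy to
get off by one. A minimal user-space sketch of that arithmetic follows.
struct write_range and the helper names are hypothetical stand-ins, not part
of the patch, and the sketch assumes every write covers at least one byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-layout write bookkeeping. */
struct write_range {
	int64_t begin_pos;	/* lowest offset written so far */
	int64_t end_pos;	/* highest byte written, inclusive */
};

/* Widen the range to cover `extent` bytes written at `offset`. */
static void update_last_write(struct write_range *r, int64_t offset,
			      uint64_t extent)
{
	int64_t end_pos;

	assert(extent > 0);	/* assumption of this sketch */
	end_pos = offset + (int64_t)extent - 1;	/* inclusive */

	if (offset < r->begin_pos)
		r->begin_pos = offset;
	if (end_pos > r->end_pos)
		r->end_pos = end_pos;
}

/* Segment length for an inclusive [begin_pos, end_pos] range. */
static uint64_t commit_length(const struct write_range *r)
{
	return (uint64_t)(r->end_pos - r->begin_pos + 1);
}

/* lastbytewritten is clamped to the last byte of the file. */
static int64_t last_byte_written(const struct write_range *r, int64_t i_size)
{
	int64_t last = i_size - 1;

	return r->end_pos < last ? r->end_pos : last;
}
```

For example, writing 4096 bytes at offset 8192 into a range that so far
covers only byte 4096 moves end_pos to 12287 and leaves begin_pos at 4096.
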
 fs/nfs/inode.c            |    4 ++
 fs/nfs/pnfs.c             |  119 +++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/pnfs.h             |    4 ++
 include/linux/nfs4_pnfs.h |   13 +++++
 include/linux/pnfs_xdr.h  |   33 ++++++++++++
 5 files changed, 173 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 5e355de..97fb2d1 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1577,6 +1577,7 @@ out:
 #endif
 #ifdef CONFIG_NFS_V4_1
 out00:
+	pnfs_uninitialize();
 #endif /* CONFIG_NFS_V4_1 */
 	nfs_destroy_directcache();
 out0:
@@ -1611,6 +1612,9 @@ static void __exit exit_nfs_fs(void)
 #ifdef CONFIG_PROC_FS
 	rpc_proc_unregister("nfs");
 #endif
+#ifdef CONFIG_NFS_V4_1
+	pnfs_uninitialize();
+#endif
 	unregister_nfs_fs();
 	nfs_fs_proc_exit();
 	nfsiod_stop();
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 33be484..96d379d 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -72,6 +72,23 @@ static spinlock_t pnfs_spinlock = __SPIN_LOCK_UNLOCKED(pnfs_spinlock);
  * pnfs_modules_tbl holds all pnfs modules
  */
 static struct list_head	pnfs_modules_tbl;
+static struct kmem_cache *pnfs_cachep;
+static mempool_t *pnfs_layoutcommit_mempool;
+
+static inline struct pnfs_layoutcommit_data *pnfs_layoutcommit_alloc(void)
+{
+	struct pnfs_layoutcommit_data *p =
+			mempool_alloc(pnfs_layoutcommit_mempool, GFP_NOFS);
+	if (p)
+		memset(p, 0, sizeof(*p));
+
+	return p;
+}
+
+void pnfs_layoutcommit_free(struct pnfs_layoutcommit_data *p)
+{
+	mempool_free(p, pnfs_layoutcommit_mempool);
+}
 
 /*
  * struct pnfs_module - One per pNFS device module.
@@ -86,10 +103,31 @@ pnfs_initialize(void)
 {
 	INIT_LIST_HEAD(&pnfs_modules_tbl);
 
+	pnfs_cachep = kmem_cache_create("pnfs_layoutcommit_data",
+					sizeof(struct pnfs_layoutcommit_data),
+					0, SLAB_HWCACHE_ALIGN, NULL);
+	if (pnfs_cachep == NULL)
+		return -ENOMEM;
+
+	pnfs_layoutcommit_mempool = mempool_create(MIN_POOL_LC,
+						   mempool_alloc_slab,
+						   mempool_free_slab,
+						   pnfs_cachep);
+	if (pnfs_layoutcommit_mempool == NULL) {
+		kmem_cache_destroy(pnfs_cachep);
+		return -ENOMEM;
+	}
+
 	pnfs_initialized = 1;
 	return 0;
 }
 
+void pnfs_uninitialize(void)
+{
+	mempool_destroy(pnfs_layoutcommit_mempool);
+	kmem_cache_destroy(pnfs_cachep);
+}
+
 /* search pnfs_modules_tbl for right pnfs module */
 static int
 find_pnfs(u32 id, struct pnfs_module **module) {
@@ -105,6 +143,52 @@ find_pnfs(u32 id, struct pnfs_module **module) {
 	return 0;
 }
 
+/* Set lo_cred to indicate we require a layoutcommit
+ * If we don't even have a layout, we don't need to commit it.
+ */
+void
+pnfs_need_layoutcommit(struct nfs_inode *nfsi, struct nfs_open_context *ctx)
+{
+	dprintk("%s: has_layout=%d ctx=%p\n", __func__, has_layout(nfsi), ctx);
+	spin_lock(&nfsi->vfs_inode.i_lock);
+	if (has_layout(nfsi) &&
+	    !test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state)) {
+		nfsi->layout->lo_cred = get_rpccred(ctx->state->owner->so_cred);
+		__set_bit(NFS_INO_LAYOUTCOMMIT,
+			  &nfsi->layout->pnfs_layout_state);
+		nfsi->change_attr++;
+		spin_unlock(&nfsi->vfs_inode.i_lock);
+		dprintk("%s: Set layoutcommit\n", __func__);
+		return;
+	}
+	spin_unlock(&nfsi->vfs_inode.i_lock);
+}
+
+/* Update last_write_offset for layoutcommit.
+ * TODO: We should only use committed extents, but the current nfs
+ * implementation does not calculate the written range in nfs_commit_done.
+ * We therefore update this field in writeback_done.
+ */
+void
+pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent)
+{
+	loff_t end_pos;
+
+	spin_lock(&nfsi->vfs_inode.i_lock);
+	if (offset < nfsi->layout->pnfs_write_begin_pos)
+		nfsi->layout->pnfs_write_begin_pos = offset;
+	end_pos = offset + extent - 1; /* inclusive end position */
+	if (end_pos > nfsi->layout->pnfs_write_end_pos)
+		nfsi->layout->pnfs_write_end_pos = end_pos;
+	dprintk("%s: Wrote %lu@%lu bpos %lu, epos: %lu\n",
+		__func__,
+		(unsigned long) extent,
+		(unsigned long) offset,
+		(unsigned long) nfsi->layout->pnfs_write_begin_pos,
+		(unsigned long) nfsi->layout->pnfs_write_end_pos);
+	spin_unlock(&nfsi->vfs_inode.i_lock);
+}
+
 /* Unitialize a mountpoint in a layout driver */
 void
 unmount_pnfs_layoutdriver(struct nfs_server *nfss)
@@ -921,6 +1005,41 @@ out:
 	return status;
 }
 
+/*
+ * Set up the argument/result storage required for the RPC call.
+ */
+static int
+pnfs_layoutcommit_setup(struct inode *inode,
+			struct pnfs_layoutcommit_data *data,
+			loff_t write_begin_pos, loff_t write_end_pos)
+{
+	struct nfs_server *nfss = NFS_SERVER(inode);
+	int result = 0;
+
+	dprintk("--> %s\n", __func__);
+
+	data->args.inode = inode;
+	data->args.fh = NFS_FH(inode);
+	data->args.layout_type = nfss->pnfs_curr_ld->id;
+	data->res.fattr = &data->fattr;
+	nfs_fattr_init(&data->fattr);
+
+	/* TODO: Need to determine the correct values */
+	data->args.time_modify_changed = 0;
+
+	/* Set values from inode so it can be reset
+	 */
+	data->args.lseg.iomode = IOMODE_RW;
+	data->args.lseg.offset = write_begin_pos;
+	data->args.lseg.length = write_end_pos - write_begin_pos + 1;
+	data->args.lastbytewritten = min(write_end_pos,
+					 i_size_read(inode) - 1);
+	data->args.bitmask = nfss->attr_bitmask;
+	data->res.server = nfss;
+
+	dprintk("<-- %s Status %d\n", __func__, result);
+	return result;
+}
 /* Callback operations for layout drivers.
  */
 struct pnfs_client_operations pnfs_ops = {
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 379aa18..6410617 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -35,6 +35,10 @@ void _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
 void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
 void unmount_pnfs_layoutdriver(struct nfs_server *);
 int pnfs_initialize(void);
+void pnfs_uninitialize(void);
+void pnfs_layoutcommit_free(struct pnfs_layoutcommit_data *data);
+void pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent);
+void pnfs_need_layoutcommit(struct nfs_inode *nfsi, struct nfs_open_context *ctx);
 void pnfs_get_layout_done(struct nfs4_pnfs_layoutget *, int rpc_status);
 int pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp);
 void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segment *range);
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 1ed509c..482659f 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -54,6 +54,19 @@ PNFS_LD_IO_OPS(struct pnfs_layout_type *lo)
 	return PNFS_LD(lo)->ld_io_ops;
 }
 
+static inline bool
+has_layout(struct nfs_inode *nfsi)
+{
+	return nfsi->layout != NULL;
+}
+
+#else /* CONFIG_NFS_V4_1 */
+
+static inline bool
+has_layout(struct nfs_inode *nfsi)
+{
+	return false;
+}
 
 #endif /* CONFIG_NFS_V4_1 */
 
diff --git a/include/linux/pnfs_xdr.h b/include/linux/pnfs_xdr.h
index b85320d..4921778 100644
--- a/include/linux/pnfs_xdr.h
+++ b/include/linux/pnfs_xdr.h
@@ -55,6 +55,39 @@ struct nfs4_pnfs_layoutget {
 	int status;
 };
 
+struct pnfs_layoutcommit_arg {
+	nfs4_stateid stateid;
+	__u64 lastbytewritten;
+	__u32 time_modify_changed;
+	struct timespec time_modify;
+	const u32 *bitmask;
+	struct nfs_fh *fh;
+	struct inode *inode;
+
+	/* Values set by layout driver */
+	struct nfs4_pnfs_layout_segment lseg;
+	__u32 layout_type;
+	void *layoutdriver_data;
+	struct nfs4_sequence_args seq_args;
+};
+
+struct pnfs_layoutcommit_res {
+	__u32 sizechanged;
+	__u64 newsize;
+	struct nfs_fattr *fattr;
+	const struct nfs_server *server;
+	struct nfs4_sequence_res seq_res;
+};
+
+struct pnfs_layoutcommit_data {
+	struct rpc_task task;
+	struct rpc_cred *cred;
+	struct nfs_fattr fattr;
+	struct pnfs_layoutcommit_arg args;
+	struct pnfs_layoutcommit_res res;
+	int status;
+};
+
 struct nfs4_pnfs_getdeviceinfo_arg {
 	struct pnfs_device *pdev;
 	struct nfs4_sequence_args seq_args;
-- 
1.6.2.5



* [PATCH 26/50] pnfs_submit: layoutcommit
  2010-08-13 21:31                                                 ` [PATCH 25/50] pnfs_submit: layoutcommit helper functions andros
@ 2010-08-13 21:31                                                   ` andros
  2010-08-13 21:31                                                     ` [PATCH 27/50] pnfs_submit: layoutreturn helper functions andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
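encode_layoutcommit() reserves 40 + NFS4_STATEID_SIZE bytes for the
fixed-size part of the LAYOUTCOMMIT arguments. The sketch below adds up the
fields in the order they are encoded, as a check on that constant. It is a
user-space illustration only: the function names are hypothetical, and
STATEID_SIZE is assumed to be 16, matching NFS4_STATEID_SIZE in
include/linux/nfs4.h.

```c
#include <assert.h>
#include <stddef.h>

#define XDR_UNIT	4	/* one 32-bit XDR word, in bytes */
#define STATEID_SIZE	16	/* assumed equal to NFS4_STATEID_SIZE */

/* Bytes covered by the first reserve_space() call in encode_layoutcommit(). */
static size_t layoutcommit_fixed_bytes(void)
{
	size_t n = 0;

	n += XDR_UNIT;		/* OP_LAYOUTCOMMIT opcode */
	n += 2 * XDR_UNIT;	/* lseg.offset (hyper) */
	n += 2 * XDR_UNIT;	/* lseg.length (hyper) */
	n += XDR_UNIT;		/* reclaim flag */
	n += STATEID_SIZE;	/* stateid, fixed-size opaque */
	n += XDR_UNIT;		/* newoffset flag */
	n += 2 * XDR_UNIT;	/* lastbytewritten (hyper) */
	n += XDR_UNIT;		/* time_modify_changed flag */
	return n;
}

/* Total bytes encoded for the operation, including the optional time. */
static size_t layoutcommit_total_bytes(int time_modify_changed)
{
	size_t n = layoutcommit_fixed_bytes();

	if (time_modify_changed)
		n += 3 * XDR_UNIT;	/* 64-bit seconds + 32-bit nseconds */
	n += XDR_UNIT;			/* layout type */
	n += XDR_UNIT;			/* zero-length layoutupdate opaque */
	return n;
}
```

Under these assumptions the fixed part comes to 56 bytes, and the whole
operation to 64 bytes without a modify time or 76 bytes with one.
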
 fs/nfs/nfs4proc.c         |  107 ++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4xdr.c          |  132 +++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/pnfs.c             |   62 +++++++++++++++++++++
 fs/nfs/pnfs.h             |    8 +++
 fs/nfs/write.c            |   14 +++++-
 include/linux/nfs4.h      |    1 +
 include/linux/nfs4_pnfs.h |   13 +++++
 7 files changed, 336 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 279a37d..5299c66 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5480,6 +5480,113 @@ int pnfs4_proc_layoutget(struct nfs4_pnfs_layoutget *lgp)
 	return err;
 }
 
+static void pnfs_layoutcommit_prepare(struct rpc_task *task, void *data)
+{
+	struct pnfs_layoutcommit_data *ldata =
+		(struct pnfs_layoutcommit_data *)data;
+	struct nfs_server *server = NFS_SERVER(ldata->args.inode);
+
+	if (nfs4_setup_sequence(server, &ldata->args.seq_args,
+				&ldata->res.seq_res, 1, task))
+		return;
+	rpc_call_start(task);
+}
+
+static void
+pnfs_layoutcommit_done(struct rpc_task *task, void *calldata)
+{
+	struct pnfs_layoutcommit_data *data =
+		(struct pnfs_layoutcommit_data *)calldata;
+	struct nfs_server *server = NFS_SERVER(data->args.inode);
+
+	if (!nfs4_sequence_done(task, &data->res.seq_res))
+		return;
+
+	if (RPC_ASSASSINATED(task))
+		return;
+
+	if (nfs4_async_handle_error(task, server, NULL) == -EAGAIN)
+		nfs_restart_rpc(task, server->nfs_client);
+
+	data->status = task->tk_status;
+}
+
+static void pnfs_layoutcommit_release(void *lcdata)
+{
+	struct pnfs_layoutcommit_data *data =
+		(struct pnfs_layoutcommit_data *)lcdata;
+
+	/* Matched by get_layout in pnfs_layoutcommit_inode */
+	put_layout(data->args.inode);
+	put_rpccred(data->cred);
+	pnfs_layoutcommit_free(lcdata);
+}
+
+static const struct rpc_call_ops pnfs_layoutcommit_ops = {
+	.rpc_call_prepare = pnfs_layoutcommit_prepare,
+	.rpc_call_done = pnfs_layoutcommit_done,
+	.rpc_release = pnfs_layoutcommit_release,
+};
+
+/* Execute a layoutcommit to the server */
+static int
+_pnfs4_proc_layoutcommit(struct pnfs_layoutcommit_data *data, int issync)
+{
+	struct rpc_message msg = {
+		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_PNFS_LAYOUTCOMMIT],
+		.rpc_argp = &data->args,
+		.rpc_resp = &data->res,
+		.rpc_cred = data->cred,
+	};
+	struct rpc_task_setup task_setup_data = {
+		.task = &data->task,
+		.rpc_client = NFS_CLIENT(data->args.inode),
+		.rpc_message = &msg,
+		.callback_ops = &pnfs_layoutcommit_ops,
+		.callback_data = data,
+		.flags = RPC_TASK_ASYNC,
+	};
+	struct rpc_task *task;
+	int status = 0;
+
+	dprintk("NFS: %4d initiating layoutcommit call. %llu@%llu lbw: %llu "
+		"type: %d issync %d\n",
+		data->task.tk_pid,
+		data->args.lseg.length,
+		data->args.lseg.offset,
+		data->args.lastbytewritten,
+		data->args.layout_type, issync);
+
+	data->res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
+	task = rpc_run_task(&task_setup_data);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+	if (!issync)
+		goto out;
+	status = nfs4_wait_for_completion_rpc_task(task);
+	if (status != 0)
+		goto out;
+	status = data->status;
+out:
+	dprintk("%s: status %d\n", __func__, status);
+	rpc_put_task(task);
+	return 0;
+}
+
+int pnfs4_proc_layoutcommit(struct pnfs_layoutcommit_data *data, int issync)
+{
+	struct nfs4_exception exception = { };
+	struct nfs_server *server = NFS_SERVER(data->args.inode);
+	int err;
+
+	do {
+		err = nfs4_handle_exception(server,
+					_pnfs4_proc_layoutcommit(data, issync),
+					&exception);
+	} while (exception.retry);
+	return err;
+}
+
 int nfs4_pnfs_getdeviceinfo(struct nfs_server *server, struct pnfs_device *pdev)
 {
 	struct nfs4_pnfs_getdeviceinfo_arg args = {
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index a096e5b..c63e2fb 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -325,6 +325,11 @@ static int nfs4_stat_to_errno(int);
 #define decode_layoutget_maxsz	(op_decode_hdr_maxsz + 8 + \
 				decode_stateid_maxsz + \
 				XDR_QUADLEN(PNFS_LAYOUT_MAXSIZE))
+#define encode_layoutcommit_sz	(18 +                           \
+				XDR_QUADLEN(PNFS_LAYOUT_MAXSIZE) + \
+				op_encode_hdr_maxsz +          \
+				encode_stateid_maxsz)
+#define decode_layoutcommit_maxsz (3 + op_decode_hdr_maxsz)
 #else /* CONFIG_NFS_V4_1 */
 #define encode_sequence_maxsz	0
 #define decode_sequence_maxsz	0
@@ -728,6 +733,16 @@ static int nfs4_stat_to_errno(int);
 				decode_sequence_maxsz + \
 				decode_putfh_maxsz +        \
 				decode_layoutget_maxsz)
+#define NFS4_enc_layoutcommit_sz (compound_encode_hdr_maxsz + \
+				encode_sequence_maxsz +\
+				encode_putfh_maxsz + \
+				encode_layoutcommit_sz + \
+				encode_getattr_maxsz)
+#define NFS4_dec_layoutcommit_sz (compound_decode_hdr_maxsz + \
+				decode_sequence_maxsz + \
+				decode_putfh_maxsz + \
+				decode_layoutcommit_maxsz + \
+				decode_getattr_maxsz)
 
 const u32 nfs41_maxwrite_overhead = ((RPC_MAX_HEADER_WITH_AUTH +
 				      compound_encode_hdr_maxsz +
@@ -1809,6 +1824,44 @@ encode_layoutget(struct xdr_stream *xdr,
 	hdr->nops++;
 	hdr->replen += decode_layoutget_maxsz;
 }
+
+static int
+encode_layoutcommit(struct xdr_stream *xdr,
+		    const struct pnfs_layoutcommit_arg *args,
+		    struct compound_hdr *hdr)
+{
+	__be32 *p;
+
+	dprintk("%s: %llu@%llu lbw: %llu type: %d\n", __func__,
+		args->lseg.length, args->lseg.offset, args->lastbytewritten,
+		args->layout_type);
+
+	p = reserve_space(xdr, 40 + NFS4_STATEID_SIZE);
+	*p++ = cpu_to_be32(OP_LAYOUTCOMMIT);
+	p = xdr_encode_hyper(p, args->lseg.offset);
+	p = xdr_encode_hyper(p, args->lseg.length);
+	*p++ = cpu_to_be32(0);     /* reclaim */
+	p = xdr_encode_opaque_fixed(p, args->stateid.u.data, NFS4_STATEID_SIZE);
+	*p++ = cpu_to_be32(1);     /* newoffset = TRUE */
+	p = xdr_encode_hyper(p, args->lastbytewritten);
+	*p = cpu_to_be32(args->time_modify_changed != 0);
+	if (args->time_modify_changed) {
+		p = reserve_space(xdr, 12);
+		*p++ = cpu_to_be32(0);
+		*p++ = cpu_to_be32(args->time_modify.tv_sec);
+		*p = cpu_to_be32(args->time_modify.tv_nsec);
+	}
+
+	p = reserve_space(xdr, 4);
+	*p = cpu_to_be32(args->layout_type);
+
+	p = reserve_space(xdr, 4);
+	xdr_encode_opaque(p, NULL, 0);
+
+	hdr->nops++;
+	hdr->replen += decode_layoutcommit_maxsz;
+	return 0;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 /*
@@ -2679,6 +2732,27 @@ static int nfs4_xdr_enc_layoutget(struct rpc_rqst *req, uint32_t *p,
 	encode_nops(&hdr);
 	return 0;
 }
+
+/*
+ *  Encode LAYOUTCOMMIT request
+ */
+static int nfs4_xdr_enc_layoutcommit(struct rpc_rqst *req, uint32_t *p,
+				     struct pnfs_layoutcommit_arg *args)
+{
+	struct xdr_stream xdr;
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+
+	xdr_init_encode(&xdr, &req->rq_snd_buf, p);
+	encode_compound_hdr(&xdr, req, &hdr);
+	encode_sequence(&xdr, &args->seq_args, &hdr);
+	encode_putfh(&xdr, args->fh, &hdr);
+	encode_layoutcommit(&xdr, args, &hdr);
+	encode_getfattr(&xdr, args->bitmask, &hdr);
+	encode_nops(&hdr);
+	return 0;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 static void print_overflow_msg(const char *func, const struct xdr_stream *xdr)
@@ -5052,6 +5126,34 @@ out_overflow:
 	print_overflow_msg(__func__, xdr);
 	return -EIO;
 }
+
+static int decode_layoutcommit(struct xdr_stream *xdr,
+				    struct rpc_rqst *req,
+				    struct pnfs_layoutcommit_res *res)
+{
+	__be32 *p;
+	int status;
+
+	status = decode_op_hdr(xdr, OP_LAYOUTCOMMIT);
+	if (status)
+		return status;
+
+	p = xdr_inline_decode(xdr, 4);
+	if (unlikely(!p))
+		goto out_overflow;
+	res->sizechanged = be32_to_cpup(p);
+
+	if (res->sizechanged) {
+		p = xdr_inline_decode(xdr, 8);
+		if (unlikely(!p))
+			goto out_overflow;
+		xdr_decode_hyper(p, &res->newsize);
+	}
+	return 0;
+out_overflow:
+	print_overflow_msg(__func__, xdr);
+	return -EIO;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 /*
@@ -6128,6 +6230,35 @@ static int nfs4_xdr_dec_layoutget(struct rpc_rqst *rqstp, uint32_t *p,
 out:
 	return status;
 }
+
+/*
+ * Decode LAYOUTCOMMIT response
+ */
+static int nfs4_xdr_dec_layoutcommit(struct rpc_rqst *rqstp, uint32_t *p,
+				     struct pnfs_layoutcommit_res *res)
+{
+	struct xdr_stream xdr;
+	struct compound_hdr hdr;
+	int status;
+
+	xdr_init_decode(&xdr, &rqstp->rq_rcv_buf, p);
+	status = decode_compound_hdr(&xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(&xdr, &res->seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(&xdr);
+	if (status)
+		goto out;
+	status = decode_layoutcommit(&xdr, rqstp, res);
+	if (status)
+		goto out;
+	decode_getfattr(&xdr, res->fattr, res->server,
+			!RPC_IS_ASYNC(rqstp->rq_task));
+out:
+	return status;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 __be32 *nfs4_decode_dirent(__be32 *p, struct nfs_entry *entry, int plus)
@@ -6308,6 +6439,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
   PROC(RECLAIM_COMPLETE, enc_reclaim_complete,  dec_reclaim_complete),
   PROC(PNFS_GETDEVICEINFO, enc_getdeviceinfo, dec_getdeviceinfo),
   PROC(PNFS_LAYOUTGET,  enc_layoutget,     dec_layoutget),
+  PROC(PNFS_LAYOUTCOMMIT, enc_layoutcommit,  dec_layoutcommit),
 #endif /* CONFIG_NFS_V4_1 */
 };
 
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 96d379d..d0a6320 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1040,6 +1040,68 @@ pnfs_layoutcommit_setup(struct inode *inode,
 	dprintk("<-- %s Status %d\n", __func__, result);
 	return result;
 }
+
+/* Issue a layoutcommit for an inode, waiting for completion if sync is set.
+ */
+int
+pnfs_layoutcommit_inode(struct inode *inode, int sync)
+{
+	struct pnfs_layoutcommit_data *data;
+	struct nfs_inode *nfsi = NFS_I(inode);
+	loff_t write_begin_pos;
+	loff_t write_end_pos;
+
+	int status = 0;
+
+	dprintk("%s Begin (sync:%d)\n", __func__, sync);
+
+	BUG_ON(!has_layout(nfsi));
+
+	data = pnfs_layoutcommit_alloc();
+	if (!data)
+		return -ENOMEM;
+
+	spin_lock(&inode->i_lock);
+	if (!layoutcommit_needed(nfsi)) {
+		spin_unlock(&inode->i_lock);
+		goto out_free;
+	}
+
+	/* Clear layoutcommit properties in the inode so
+	 * new lc info can be generated
+	 */
+	write_begin_pos = nfsi->layout->pnfs_write_begin_pos;
+	write_end_pos = nfsi->layout->pnfs_write_end_pos;
+	data->cred = nfsi->layout->lo_cred;
+	nfsi->layout->pnfs_write_begin_pos = 0;
+	nfsi->layout->pnfs_write_end_pos = 0;
+	nfsi->layout->lo_cred = NULL;
+	__clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
+	pnfs_get_layout_stateid(&data->args.stateid, nfsi->layout);
+
+	/* Reference for layoutcommit matched in pnfs_layoutcommit_release */
+	get_layout(NFS_I(inode)->layout);
+
+	spin_unlock(&inode->i_lock);
+
+	/* Set up layout commit args */
+	status = pnfs_layoutcommit_setup(inode, data, write_begin_pos,
+					 write_end_pos);
+	if (status) {
+		/* The layout driver failed to set up the layoutcommit */
+		put_rpccred(data->cred);
+		put_layout(inode);
+		goto out_free;
+	}
+	status = pnfs4_proc_layoutcommit(data, sync);
+out:
+	dprintk("%s end (err:%d)\n", __func__, status);
+	return status;
+out_free:
+	pnfs_layoutcommit_free(data);
+	goto out;
+}
+
 /* Callback operations for layout drivers.
  */
 struct pnfs_client_operations pnfs_ops = {
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 6410617..a1648c0 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -25,6 +25,8 @@
 extern int nfs4_pnfs_getdeviceinfo(struct nfs_server *server,
 				   struct pnfs_device *dev);
 extern int pnfs4_proc_layoutget(struct nfs4_pnfs_layoutget *lgp);
+extern int pnfs4_proc_layoutcommit(struct pnfs_layoutcommit_data *data,
+				   int issync);
 
 /* pnfs.c */
 void put_lseg(struct pnfs_layout_segment *lseg);
@@ -37,6 +39,7 @@ void unmount_pnfs_layoutdriver(struct nfs_server *);
 int pnfs_initialize(void);
 void pnfs_uninitialize(void);
 void pnfs_layoutcommit_free(struct pnfs_layoutcommit_data *data);
+int pnfs_layoutcommit_inode(struct inode *inode, int sync);
 void pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent);
 void pnfs_need_layoutcommit(struct nfs_inode *nfsi, struct nfs_open_context *ctx);
 void pnfs_get_layout_done(struct nfs4_pnfs_layoutget *, int rpc_status);
@@ -114,6 +117,11 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
 		*lsegpp = NULL;
 }
 
+static inline int pnfs_layoutcommit_inode(struct inode *inode, int sync)
+{
+	return 0;
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 #endif /* FS_NFS_PNFS_H */
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 874972d..4d37229 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -28,6 +28,7 @@
 #include "iostat.h"
 #include "nfs4_fs.h"
 #include "fscache.h"
+#include "pnfs.h"
 
 #define NFSDBG_FACILITY		NFSDBG_PAGECACHE
 
@@ -1465,7 +1466,18 @@ static int nfs_commit_unstable_pages(struct inode *inode, struct writeback_contr
 
 int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
 {
-	return nfs_commit_unstable_pages(inode, wbc);
+	int ret;
+	ret = nfs_commit_unstable_pages(inode, wbc);
+	if (ret >= 0 && layoutcommit_needed(NFS_I(inode))) {
+		int err, sync = wbc->sync_mode;
+
+		if (wbc->nonblocking || wbc->for_background)
+			sync = 0;
+		err = pnfs_layoutcommit_inode(inode, sync);
+		if (err < 0)
+			ret = err;
+	}
+	return ret;
 }
 
 /*
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 2e11a3d..4c4c4cc 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -547,6 +547,7 @@ enum {
 	NFSPROC4_CLNT_GET_LEASE_TIME,
 	NFSPROC4_CLNT_RECLAIM_COMPLETE,
 	NFSPROC4_CLNT_PNFS_LAYOUTGET,
+	NFSPROC4_CLNT_PNFS_LAYOUTCOMMIT,
 	NFSPROC4_CLNT_PNFS_GETDEVICEINFO,
 };
 
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 482659f..52f7a21 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -60,6 +60,13 @@ has_layout(struct nfs_inode *nfsi)
 	return nfsi->layout != NULL;
 }
 
+static inline bool
+layoutcommit_needed(struct nfs_inode *nfsi)
+{
+	return has_layout(nfsi) &&
+	       test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->layout->pnfs_layout_state);
+}
+
 #else /* CONFIG_NFS_V4_1 */
 
 static inline bool
@@ -68,6 +75,12 @@ has_layout(struct nfs_inode *nfsi)
 	return false;
 }
 
+static inline bool
+layoutcommit_needed(struct nfs_inode *nfsi)
+{
+	return false;
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 struct pnfs_layout_segment {
-- 
1.6.2.5



* [PATCH 27/50] pnfs_submit: layoutreturn helper functions
  2010-08-13 21:31                                                   ` [PATCH 26/50] pnfs_submit: layoutcommit andros
@ 2010-08-13 21:31                                                     ` andros
  2010-08-13 21:31                                                       ` [PATCH 28/50] pnfs_submit: layoutreturn andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
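The RETURN_FILE path of _pnfs_return_layout() makes several decisions that
depend on whether the caller is a recall callback (stateid != NULL) or a
local return. The sketch below restates that decision logic as a pure
function so the cases can be read side by side. It is an illustration under
stated assumptions, not the patch's code: enum lr_action and
return_layout_action() are hypothetical names, and the blocking
wait_event() in the local-return case is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch. */
enum lr_action {
	LR_NOTHING,		/* no matching layout segment; status 0 */
	LR_RETRY_LATER,		/* corresponds to returning -EAGAIN */
	LR_SEND_LAYOUTRETURN,	/* local return: call return_layout() */
	LR_RELEASE_ONLY,	/* callback: call pnfs_layout_release() */
};

static enum lr_action
return_layout_action(bool has_matching_lseg, bool is_callback,
		     bool lseg_busy, bool commit_needed, bool wait)
{
	if (!has_matching_lseg)
		return LR_NOTHING;

	/* A callback must not sleep on busy segments; a local caller
	 * would block here until they become idle. */
	if (lseg_busy && is_callback)
		return LR_RETRY_LATER;

	/* A pending layoutcommit defers a non-waiting callback. Any other
	 * caller issues the layoutcommit first and carries on even if it
	 * fails. */
	if (commit_needed && is_callback && !wait)
		return LR_RETRY_LATER;

	return is_callback ? LR_RELEASE_ONLY : LR_SEND_LAYOUTRETURN;
}
```

The nfs4_clear_inode() style of call (no stateid, wait set) always ends in
LR_SEND_LAYOUTRETURN when a matching segment exists.
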
 fs/nfs/pnfs.c        |  124 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/nfs4.h |    7 +++
 2 files changed, 131 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index d0a6320..2ea3cbd 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -581,6 +581,25 @@ should_free_lseg(struct pnfs_layout_segment *lseg,
 		lseg->range.iomode == range->iomode);
 }
 
+static struct pnfs_layout_segment *
+has_layout_to_return(struct pnfs_layout_type *lo,
+		     struct nfs4_pnfs_layout_segment *range)
+{
+	struct pnfs_layout_segment *out = NULL, *lseg;
+	dprintk("%s:Begin lo %p offset %llu length %llu iomode %d\n",
+		__func__, lo, range->offset, range->length, range->iomode);
+
+	BUG_ON_UNLOCKED_LO(lo);
+	list_for_each_entry(lseg, &lo->segs, fi_list)
+		if (should_free_lseg(lseg, range)) {
+			out = lseg;
+			break;
+		}
+
+	dprintk("%s:Return lseg=%p\n", __func__, out);
+	return out;
+}
+
 static inline bool
 _pnfs_can_return_lseg(struct pnfs_layout_segment *lseg)
 {
@@ -621,6 +640,111 @@ pnfs_free_layout(struct pnfs_layout_type *lo,
 	dprintk("%s:Return\n", __func__);
 }
 
+static bool
+pnfs_return_layout_barrier(struct nfs_inode *nfsi,
+			   struct nfs4_pnfs_layout_segment *range)
+{
+	struct pnfs_layout_segment *lseg;
+	bool ret = false;
+
+	spin_lock(&nfsi->vfs_inode.i_lock);
+	list_for_each_entry(lseg, &nfsi->layout->segs, fi_list) {
+		if (!should_free_lseg(lseg, range))
+			continue;
+		lseg->valid = false;
+		if (!_pnfs_can_return_lseg(lseg)) {
+			dprintk("%s: wait on lseg %p refcount %d\n",
+				__func__, lseg,
+				atomic_read(&lseg->kref.refcount));
+			ret = true;
+		}
+	}
+	spin_unlock(&nfsi->vfs_inode.i_lock);
+	dprintk("%s:Return %d\n", __func__, ret);
+	return ret;
+}
+
+static int
+return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
+	      enum pnfs_layoutreturn_type type, struct pnfs_layout_type *lo,
+	      bool wait)
+{
+	return 0;
+}
+
+int
+_pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
+		    const nfs4_stateid *stateid, /* optional */
+		    enum pnfs_layoutreturn_type type,
+		    bool wait)
+{
+	struct pnfs_layout_type *lo = NULL;
+	struct nfs_inode *nfsi = NFS_I(ino);
+	struct nfs4_pnfs_layout_segment arg;
+	int status = 0;
+
+	dprintk("--> %s type %d\n", __func__, type);
+
+
+	arg.iomode = range ? range->iomode : IOMODE_ANY;
+	arg.offset = 0;
+	arg.length = NFS4_MAX_UINT64;
+
+	if (type == RETURN_FILE) {
+		spin_lock(&ino->i_lock);
+		lo = nfsi->layout;
+		if (lo && !has_layout_to_return(lo, &arg)) {
+			lo = NULL;
+		}
+		if (!lo) {
+			spin_unlock(&ino->i_lock);
+			dprintk("%s: no layout segments to return\n", __func__);
+			goto out;
+		}
+
+		/* Reference for layoutreturn matched in pnfs_layout_release */
+		get_layout(lo);
+
+		spin_unlock(&ino->i_lock);
+
+		if (pnfs_return_layout_barrier(nfsi, &arg)) {
+			if (stateid) { /* callback */
+				status = -EAGAIN;
+				goto out_put;
+			}
+			dprintk("%s: waiting\n", __func__);
+			wait_event(nfsi->lo_waitq,
+				   !pnfs_return_layout_barrier(nfsi, &arg));
+		}
+
+		if (layoutcommit_needed(nfsi)) {
+			if (stateid && !wait) { /* callback */
+				dprintk("%s: layoutcommit pending\n", __func__);
+				status = -EAGAIN;
+				goto out_put;
+			}
+			status = pnfs_layoutcommit_inode(ino, wait);
+			if (status) {
+				/* Return layout even if layoutcommit fails */
+				dprintk("%s: layoutcommit failed, status=%d. "
+					"Returning layout anyway\n",
+					__func__, status);
+			}
+		}
+
+		if (!stateid)
+			status = return_layout(ino, &arg, type, lo, wait);
+		else
+			pnfs_layout_release(lo, &arg);
+	}
+out:
+	dprintk("<-- %s status: %d\n", __func__, status);
+	return status;
+out_put:
+	put_layout(ino);
+	goto out;
+}
+
 /*
  * cmp two layout segments for sorting into layout cache
  */
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 4c4c4cc..f0cf013 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -571,6 +571,13 @@ enum pnfs_layouttype {
 	LAYOUT_NFSV4_1_FILES  = 1,
 };
 
+/* used for both layout return and recall */
+enum pnfs_layoutreturn_type {
+	RETURN_FILE = 1,
+	RETURN_FSID = 2,
+	RETURN_ALL  = 3
+};
+
 enum pnfs_iomode {
 	IOMODE_READ = 1,
 	IOMODE_RW = 2,
-- 
1.6.2.5



* [PATCH 28/50] pnfs_submit: layoutreturn
  2010-08-13 21:31                                                     ` [PATCH 27/50] pnfs_submit: layoutreturn helper functions andros
@ 2010-08-13 21:31                                                       ` andros
  2010-08-13 21:31                                                         ` [PATCH 29/50] pnfs_submit: add data server session to nfs4_setup_sequence andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/inode.c           |    3 +-
 fs/nfs/nfs4proc.c        |  106 ++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4state.c       |   20 ++++++++-
 fs/nfs/nfs4xdr.c         |  109 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/pnfs.c            |   25 ++++++++++-
 fs/nfs/pnfs.h            |   49 +++++++++++++++++++++
 include/linux/nfs4.h     |    1 +
 include/linux/pnfs_xdr.h |   22 +++++++++
 8 files changed, 332 insertions(+), 3 deletions(-)
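
Note, not part of the patch: the argument layout that encode_layoutreturn()
puts on the wire can be sketched in userspace as follows. This is only an
illustration under stated assumptions: every sketch_* name is hypothetical,
the buffer handling is simplified (the caller must supply at least 56 bytes),
and the operation number is passed in by the caller instead of being taken
from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_STATEID_SIZE 16	/* assumed equal to NFS4_STATEID_SIZE */

enum sketch_return_type {	/* mirrors enum pnfs_layoutreturn_type */
	SKETCH_RETURN_FILE = 1,
	SKETCH_RETURN_FSID = 2,
	SKETCH_RETURN_ALL  = 3
};

/* Store a 32-bit value in XDR (big-endian) order; return the next byte. */
static uint8_t *sketch_put_u32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
	return p + 4;
}

/* Store a 64-bit XDR hyper as two big-endian 32-bit words. */
static uint8_t *sketch_put_u64(uint8_t *p, uint64_t v)
{
	p = sketch_put_u32(p, (uint32_t)(v >> 32));
	return sketch_put_u32(p, (uint32_t)v);
}

/* Encode the LAYOUTRETURN arguments; return the number of bytes written. */
size_t sketch_encode_layoutreturn(uint8_t *buf, uint32_t opnum,
				  uint32_t reclaim, uint32_t layout_type,
				  uint32_t iomode,
				  enum sketch_return_type return_type,
				  uint64_t offset, uint64_t length,
				  const uint8_t *stateid)
{
	uint8_t *p = buf;

	assert(buf != NULL);
	/* Five fixed words, as in the first reserve_space(xdr, 20). */
	p = sketch_put_u32(p, opnum);
	p = sketch_put_u32(p, reclaim);
	p = sketch_put_u32(p, layout_type);
	p = sketch_put_u32(p, iomode);
	p = sketch_put_u32(p, (uint32_t)return_type);
	if (return_type == SKETCH_RETURN_FILE) {
		/* Range, layout stateid, then an empty opaque lrf_body. */
		p = sketch_put_u64(p, offset);
		p = sketch_put_u64(p, length);
		memcpy(p, stateid, SKETCH_STATEID_SIZE);
		p += SKETCH_STATEID_SIZE;
		p = sketch_put_u32(p, 0);
	}
	return (size_t)(p - buf);
}
```

For RETURN_FILE the sketch writes 56 bytes (14 XDR words), which appears to
agree with encode_layoutreturn_sz if op_encode_hdr_maxsz is 1 word and
encode_stateid_maxsz is 4 words. The other return types stop after the first
20 bytes, as in the patch.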

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 97fb2d1..0360336 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1403,9 +1403,10 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
  */
 void nfs4_clear_inode(struct inode *inode)
 {
+	pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, true);
+
 	/* If we are holding a delegation, return it! */
 	nfs_inode_return_delegation_noreclaim(inode);
-	/* First call standard NFS clear_inode() code */
 	nfs_clear_inode(inode);
 }
 #endif
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 5299c66..2f00a67 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5587,6 +5587,112 @@ int pnfs4_proc_layoutcommit(struct pnfs_layoutcommit_data *data, int issync)
 	return err;
 }
 
+static void
+nfs4_pnfs_layoutreturn_prepare(struct rpc_task *task, void *calldata)
+{
+	struct nfs4_pnfs_layoutreturn *lrp = calldata;
+	struct inode *ino = lrp->args.inode;
+	struct nfs_server *server = NFS_SERVER(ino);
+
+	dprintk("--> %s\n", __func__);
+	if (nfs4_setup_sequence(server, &lrp->args.seq_args,
+				&lrp->res.seq_res, 0, task))
+		return;
+	rpc_call_start(task);
+}
+
+static void nfs4_pnfs_layoutreturn_done(struct rpc_task *task, void *calldata)
+{
+	struct nfs4_pnfs_layoutreturn *lrp = calldata;
+	struct inode *ino = lrp->args.inode;
+	struct nfs_server *server = NFS_SERVER(ino);
+
+	dprintk("--> %s\n", __func__);
+
+	if (!nfs4_sequence_done(task, &lrp->res.seq_res))
+		return;
+
+	if (RPC_ASSASSINATED(task))
+		return;
+
+	if (nfs4_async_handle_error(task, server, NULL) == -EAGAIN)
+		nfs_restart_rpc(task, server->nfs_client);
+
+	dprintk("<-- %s\n", __func__);
+}
+
+static void nfs4_pnfs_layoutreturn_release(void *calldata)
+{
+	struct nfs4_pnfs_layoutreturn *lrp = calldata;
+	struct pnfs_layout_type *lo = NFS_I(lrp->args.inode)->layout;
+
+	dprintk("--> %s return_type %d lo %p\n", __func__,
+		lrp->args.return_type, lo);
+
+	if (lrp->args.return_type == RETURN_FILE) {
+		if (!lrp->res.lrs_present)
+			pnfs_set_layout_stateid(lo, &zero_stateid);
+		pnfs_layout_release(lo, &lrp->args.lseg);
+	}
+	kfree(calldata);
+	dprintk("<-- %s\n", __func__);
+}
+
+static const struct rpc_call_ops nfs4_pnfs_layoutreturn_call_ops = {
+	.rpc_call_prepare = nfs4_pnfs_layoutreturn_prepare,
+	.rpc_call_done = nfs4_pnfs_layoutreturn_done,
+	.rpc_release = nfs4_pnfs_layoutreturn_release,
+};
+
+int _pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool issync)
+{
+	struct inode *ino = lrp->args.inode;
+	struct nfs_server *server = NFS_SERVER(ino);
+	struct rpc_task *task;
+	struct rpc_message msg = {
+		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_PNFS_LAYOUTRETURN],
+		.rpc_argp = &lrp->args,
+		.rpc_resp = &lrp->res,
+	};
+	struct rpc_task_setup task_setup_data = {
+		.rpc_client = server->client,
+		.rpc_message = &msg,
+		.callback_ops = &nfs4_pnfs_layoutreturn_call_ops,
+		.callback_data = lrp,
+		.flags = RPC_TASK_ASYNC,
+	};
+	int status = 0;
+
+	dprintk("--> %s\n", __func__);
+	lrp->res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
+	task = rpc_run_task(&task_setup_data);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+	if (!issync)
+		goto out;
+	status = nfs4_wait_for_completion_rpc_task(task);
+	if (status != 0)
+		goto out;
+	status = task->tk_status;
+out:
+	dprintk("<-- %s\n", __func__);
+	rpc_put_task(task);
+	return status;
+}
+
+int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool issync)
+{
+	struct nfs_server *server = NFS_SERVER(lrp->args.inode);
+	struct nfs4_exception exception = { };
+	int err;
+	do {
+		err = nfs4_handle_exception(server,
+				_pnfs4_proc_layoutreturn(lrp, issync),
+				&exception);
+	} while (exception.retry);
+	return err;
+}
+
 int nfs4_pnfs_getdeviceinfo(struct nfs_server *server, struct pnfs_device *pdev)
 {
 	struct nfs4_pnfs_getdeviceinfo_arg args = {
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 506a92f..a674452 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -53,6 +53,8 @@
 #include "callback.h"
 #include "delegation.h"
 #include "internal.h"
+#include <linux/pnfs_xdr.h>
+#include <linux/nfs4_pnfs.h>
 #include "pnfs.h"
 
 #define OPENOWNER_POOL_SIZE	8
@@ -584,8 +586,24 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state,
 	if (!call_close) {
 		nfs4_put_open_state(state);
 		nfs4_put_state_owner(owner);
-	} else
+	} else {
+		u32 roc_iomode;
+		struct nfs_inode *nfsi = NFS_I(state->inode);
+
+		if (has_layout(nfsi) &&
+		    (roc_iomode = pnfs_layout_roc_iomode(nfsi)) != 0) {
+			struct nfs4_pnfs_layout_segment range = {
+				.iomode = roc_iomode,
+				.offset = 0,
+				.length = NFS4_MAX_UINT64,
+			};
+
+			pnfs_return_layout(state->inode, &range, NULL,
+					   RETURN_FILE, wait);
+		}
+
 		nfs4_do_close(path, state, gfp_mask, wait);
+	}
 }
 
 void nfs4_close_state(struct path *path, struct nfs4_state *state, fmode_t fmode)
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index c63e2fb..82a3412 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -330,6 +330,12 @@ static int nfs4_stat_to_errno(int);
 				op_encode_hdr_maxsz +          \
 				encode_stateid_maxsz)
 #define decode_layoutcommit_maxsz (3 + op_decode_hdr_maxsz)
+#define encode_layoutreturn_sz	(8 + op_encode_hdr_maxsz + \
+				encode_stateid_maxsz + \
+				1 /* FIXME: opaque lrf_body always empty at
+				   * the moment */)
+#define decode_layoutreturn_maxsz (op_decode_hdr_maxsz + \
+				1 + decode_stateid_maxsz)
 #else /* CONFIG_NFS_V4_1 */
 #define encode_sequence_maxsz	0
 #define decode_sequence_maxsz	0
@@ -743,6 +749,14 @@ static int nfs4_stat_to_errno(int);
 				decode_putfh_maxsz + \
 				decode_layoutcommit_maxsz + \
 				decode_getattr_maxsz)
+#define NFS4_enc_layoutreturn_sz (compound_encode_hdr_maxsz + \
+				encode_sequence_maxsz + \
+				encode_putfh_maxsz + \
+				encode_layoutreturn_sz)
+#define NFS4_dec_layoutreturn_sz (compound_decode_hdr_maxsz + \
+				decode_sequence_maxsz + \
+				decode_putfh_maxsz + \
+				decode_layoutreturn_maxsz)
 
 const u32 nfs41_maxwrite_overhead = ((RPC_MAX_HEADER_WITH_AUTH +
 				      compound_encode_hdr_maxsz +
@@ -1862,6 +1876,34 @@ encode_layoutcommit(struct xdr_stream *xdr,
 	hdr->replen += decode_layoutcommit_maxsz;
 	return 0;
 }
+
+static void
+encode_layoutreturn(struct xdr_stream *xdr,
+		    const struct nfs4_pnfs_layoutreturn_arg *args,
+		    struct compound_hdr *hdr)
+{
+	nfs4_stateid stateid;
+	__be32 *p;
+
+	p = reserve_space(xdr, 20);
+	*p++ = cpu_to_be32(OP_LAYOUTRETURN);
+	*p++ = cpu_to_be32(args->reclaim);
+	*p++ = cpu_to_be32(args->layout_type);
+	*p++ = cpu_to_be32(args->lseg.iomode);
+	*p = cpu_to_be32(args->return_type);
+	if (args->return_type == RETURN_FILE) {
+		p = reserve_space(xdr, 16 + NFS4_STATEID_SIZE);
+		p = xdr_encode_hyper(p, args->lseg.offset);
+		p = xdr_encode_hyper(p, args->lseg.length);
+		pnfs_get_layout_stateid(&stateid, NFS_I(args->inode)->layout);
+		p = xdr_encode_opaque_fixed(p, &stateid.u.data,
+					    NFS4_STATEID_SIZE);
+		p = reserve_space(xdr, 4);
+		*p = cpu_to_be32(0);
+	}
+	hdr->nops++;
+	hdr->replen += decode_layoutreturn_maxsz;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 /*
@@ -2753,6 +2795,26 @@ static int nfs4_xdr_enc_layoutcommit(struct rpc_rqst *req, uint32_t *p,
 	encode_nops(&hdr);
 	return 0;
 }
+
+/*
+ * Encode LAYOUTRETURN request
+ */
+static int nfs4_xdr_enc_layoutreturn(struct rpc_rqst *req, uint32_t *p,
+				     struct nfs4_pnfs_layoutreturn_arg *args)
+{
+	struct xdr_stream xdr;
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+
+	xdr_init_encode(&xdr, &req->rq_snd_buf, p);
+	encode_compound_hdr(&xdr, req, &hdr);
+	encode_sequence(&xdr, &args->seq_args, &hdr);
+	encode_putfh(&xdr, NFS_FH(args->inode), &hdr);
+	encode_layoutreturn(&xdr, args, &hdr);
+	encode_nops(&hdr);
+	return 0;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 static void print_overflow_msg(const char *func, const struct xdr_stream *xdr)
@@ -5127,6 +5189,27 @@ out_overflow:
 	return -EIO;
 }
 
+static int decode_layoutreturn(struct xdr_stream *xdr,
+			       struct nfs4_pnfs_layoutreturn_res *res)
+{
+	__be32 *p;
+	int status;
+
+	status = decode_op_hdr(xdr, OP_LAYOUTRETURN);
+	if (status)
+		return status;
+	p = xdr_inline_decode(xdr, 4);
+	if (unlikely(!p))
+		goto out_overflow;
+	res->lrs_present = be32_to_cpup(p);
+	if (res->lrs_present)
+		status = decode_stateid(xdr, &res->stateid);
+	return status;
+out_overflow:
+	print_overflow_msg(__func__, xdr);
+	return -EIO;
+}
+
 static int decode_layoutcommit(struct xdr_stream *xdr,
 				    struct rpc_rqst *req,
 				    struct pnfs_layoutcommit_res *res)
@@ -6232,6 +6315,31 @@ out:
 }
 
 /*
+ * Decode LAYOUTRETURN response
+ */
+static int nfs4_xdr_dec_layoutreturn(struct rpc_rqst *rqstp, uint32_t *p,
+				     struct nfs4_pnfs_layoutreturn_res *res)
+{
+	struct xdr_stream xdr;
+	struct compound_hdr hdr;
+	int status;
+
+	xdr_init_decode(&xdr, &rqstp->rq_rcv_buf, p);
+	status = decode_compound_hdr(&xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(&xdr, &res->seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(&xdr);
+	if (status)
+		goto out;
+	status = decode_layoutreturn(&xdr, res);
+out:
+	return status;
+}
+
+/*
  * Decode LAYOUTCOMMIT response
  */
 static int nfs4_xdr_dec_layoutcommit(struct rpc_rqst *rqstp, uint32_t *p,
@@ -6440,6 +6548,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
   PROC(PNFS_GETDEVICEINFO, enc_getdeviceinfo, dec_getdeviceinfo),
   PROC(PNFS_LAYOUTGET,  enc_layoutget,     dec_layoutget),
   PROC(PNFS_LAYOUTCOMMIT, enc_layoutcommit,  dec_layoutcommit),
+  PROC(PNFS_LAYOUTRETURN, enc_layoutreturn,  dec_layoutreturn),
 #endif /* CONFIG_NFS_V4_1 */
 };
 
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 2ea3cbd..9f17ef5 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -669,7 +669,30 @@ return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
 	      enum pnfs_layoutreturn_type type, struct pnfs_layout_type *lo,
 	      bool wait)
 {
-	return 0;
+	struct nfs4_pnfs_layoutreturn *lrp;
+	struct nfs_server *server = NFS_SERVER(ino);
+	int status = -ENOMEM;
+
+	dprintk("--> %s\n", __func__);
+
+	BUG_ON(type != RETURN_FILE);
+
+	lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
+	if (lrp == NULL) {
+		if (lo && (type == RETURN_FILE))
+			pnfs_layout_release(lo, NULL);
+		goto out;
+	}
+	lrp->args.reclaim = 0;
+	lrp->args.layout_type = server->pnfs_curr_ld->id;
+	lrp->args.return_type = type;
+	lrp->args.lseg = *range;
+	lrp->args.inode = ino;
+
+	status = pnfs4_proc_layoutreturn(lrp, wait);
+out:
+	dprintk("<-- %s status: %d\n", __func__, status);
+	return status;
 }
 
 int
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index a1648c0..74b9a70 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -27,6 +27,7 @@ extern int nfs4_pnfs_getdeviceinfo(struct nfs_server *server,
 extern int pnfs4_proc_layoutget(struct nfs4_pnfs_layoutget *lgp);
 extern int pnfs4_proc_layoutcommit(struct pnfs_layoutcommit_data *data,
 				   int issync);
+extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait);
 
 /* pnfs.c */
 void put_lseg(struct pnfs_layout_segment *lseg);
@@ -34,6 +35,9 @@ void _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
 	enum pnfs_iomode access_type,
 	struct pnfs_layout_segment **lsegpp);
 
+int _pnfs_return_layout(struct inode *, struct nfs4_pnfs_layout_segment *,
+			const nfs4_stateid *stateid, /* optional */
+			enum pnfs_layoutreturn_type, bool wait);
 void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
 void unmount_pnfs_layoutdriver(struct nfs_server *);
 int pnfs_initialize(void);
@@ -75,6 +79,30 @@ static inline int pnfs_enabled_sb(struct nfs_server *nfss)
 	return nfss->pnfs_curr_ld != NULL;
 }
 
+/* Should the pNFS client commit and return the layout on close
+ */
+static inline int
+pnfs_layout_roc_iomode(struct nfs_inode *nfsi)
+{
+	return nfsi->layout->roc_iomode;
+}
+
+static inline int pnfs_return_layout(struct inode *ino,
+				     struct nfs4_pnfs_layout_segment *lseg,
+				     const nfs4_stateid *stateid, /* optional */
+				     enum pnfs_layoutreturn_type type,
+				     bool wait)
+{
+	struct nfs_inode *nfsi = NFS_I(ino);
+	struct nfs_server *nfss = NFS_SERVER(ino);
+
+	if (pnfs_enabled_sb(nfss) &&
+	    (type != RETURN_FILE || has_layout(nfsi)))
+		return _pnfs_return_layout(ino, lseg, stateid, type, wait);
+
+	return 0;
+}
+
 static inline void pnfs_update_layout(struct inode *ino,
 	struct nfs_open_context *ctx,
 	enum pnfs_iomode access_type,
@@ -122,6 +150,27 @@ static inline int pnfs_layoutcommit_inode(struct inode *inode, int sync)
 	return 0;
 }
 
+static inline bool
+pnfs_ld_layoutret_on_setattr(struct inode *inode)
+{
+	return false;
+}
+
+static inline int
+pnfs_layout_roc_iomode(struct nfs_inode *nfsi)
+{
+	return 0;
+}
+
+static inline int pnfs_return_layout(struct inode *ino,
+				     struct nfs4_pnfs_layout_segment *lseg,
+				     const nfs4_stateid *stateid, /* optional */
+				     enum pnfs_layoutreturn_type type,
+				     bool wait)
+{
+	return 0;
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 #endif /* FS_NFS_PNFS_H */
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index f0cf013..004a867 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -548,6 +548,7 @@ enum {
 	NFSPROC4_CLNT_RECLAIM_COMPLETE,
 	NFSPROC4_CLNT_PNFS_LAYOUTGET,
 	NFSPROC4_CLNT_PNFS_LAYOUTCOMMIT,
+	NFSPROC4_CLNT_PNFS_LAYOUTRETURN,
 	NFSPROC4_CLNT_PNFS_GETDEVICEINFO,
 };
 
diff --git a/include/linux/pnfs_xdr.h b/include/linux/pnfs_xdr.h
index 4921778..ed16c65 100644
--- a/include/linux/pnfs_xdr.h
+++ b/include/linux/pnfs_xdr.h
@@ -88,6 +88,28 @@ struct pnfs_layoutcommit_data {
 	int status;
 };
 
+struct nfs4_pnfs_layoutreturn_arg {
+	__u32   reclaim;
+	__u32   layout_type;
+	__u32   return_type;
+	struct nfs4_pnfs_layout_segment lseg;
+	struct inode *inode;
+	struct nfs4_sequence_args seq_args;
+};
+
+struct nfs4_pnfs_layoutreturn_res {
+	struct nfs4_sequence_res seq_res;
+	u32 lrs_present;
+	nfs4_stateid stateid;
+};
+
+struct nfs4_pnfs_layoutreturn {
+	struct nfs4_pnfs_layoutreturn_arg args;
+	struct nfs4_pnfs_layoutreturn_res res;
+	struct rpc_cred *cred;
+	int rpc_status;
+};
+
 struct nfs4_pnfs_getdeviceinfo_arg {
 	struct pnfs_device *pdev;
 	struct nfs4_sequence_args seq_args;
-- 
1.6.2.5



* [PATCH 29/50] pnfs_submit: add data server session to nfs4_setup_sequence
  2010-08-13 21:31                                                       ` [PATCH 28/50] pnfs_submit: layoutreturn andros
@ 2010-08-13 21:31                                                         ` andros
  2010-08-13 21:31                                                           ` [PATCH 30/50] pnfs_submit: update nfs4_async_handle_error for data server andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Note: original patch of same name...

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/nfs4_fs.h  |    3 +++
 fs/nfs/nfs4proc.c |   21 ++++++++++++---------
 fs/nfs/read.c     |    2 +-
 fs/nfs/unlink.c   |    2 +-
 fs/nfs/write.c    |    2 +-
 5 files changed, 18 insertions(+), 12 deletions(-)
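
Note, not part of the patch: the new ds_session argument acts as an optional
override of the server's own session. A minimal userspace sketch of that
selection rule is below; struct sketch_session and sketch_pick_session() are
hypothetical stand-ins, not kernel types or functions.

```c
#include <stddef.h>

struct sketch_session {
	int id;		/* placeholder for the real session state */
};

/*
 * Prefer the data server session when the caller supplies one, otherwise
 * fall back to the session of the server the request is addressed to.
 * A NULL result corresponds to the "no session" (NFSv4.0) path.
 */
const struct sketch_session *
sketch_pick_session(const struct sketch_session *server_session,
		    const struct sketch_session *ds_session)
{
	return ds_session ? ds_session : server_session;
}
```

Every caller converted in this patch passes NULL, so behaviour should stay
the same until a data server I/O path supplies its own session.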

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 311e15c..ef70bef 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -256,10 +256,12 @@ static inline struct nfs4_session *nfs4_get_session(const struct nfs_server *ser
 }
 
 extern int nfs4_setup_sequence(const struct nfs_server *server,
+		struct nfs4_session *ds_session,
 		struct nfs4_sequence_args *args, struct nfs4_sequence_res *res,
 		int cache_reply, struct rpc_task *task);
 extern void nfs4_destroy_session(struct nfs4_session *session);
 extern struct nfs4_session *nfs4_alloc_session(struct nfs_client *clp);
+extern int nfs4_proc_exchange_id(struct nfs_client *, struct rpc_cred *);
 extern int nfs4_proc_create_session(struct nfs_client *);
 extern int nfs4_proc_destroy_session(struct nfs4_session *);
 extern int nfs4_init_session(struct nfs_server *server);
@@ -272,6 +274,7 @@ static inline struct nfs4_session *nfs4_get_session(const struct nfs_server *ser
 }
 
 static inline int nfs4_setup_sequence(const struct nfs_server *server,
+		struct nfs4_session *ds_session,
 		struct nfs4_sequence_args *args, struct nfs4_sequence_res *res,
 		int cache_reply, struct rpc_task *task)
 {
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 2f00a67..e8b1620 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -570,6 +570,7 @@ static int nfs41_setup_sequence(struct nfs4_session *session,
 }
 
 int nfs4_setup_sequence(const struct nfs_server *server,
+			struct nfs4_session *ds_session,
 			struct nfs4_sequence_args *args,
 			struct nfs4_sequence_res *res,
 			int cache_reply,
@@ -578,6 +579,8 @@ int nfs4_setup_sequence(const struct nfs_server *server,
 	struct nfs4_session *session = nfs4_get_session(server);
 	int ret = 0;
 
+	if (ds_session)
+		session = ds_session;
 	if (session == NULL) {
 		args->sa_session = NULL;
 		res->sr_session = NULL;
@@ -607,7 +610,7 @@ static void nfs41_call_sync_prepare(struct rpc_task *task, void *calldata)
 
 	dprintk("--> %s data->seq_server %p\n", __func__, data->seq_server);
 
-	if (nfs4_setup_sequence(data->seq_server, data->seq_args,
+	if (nfs4_setup_sequence(data->seq_server, NULL, data->seq_args,
 				data->seq_res, data->cache_reply, task))
 		return;
 	rpc_call_start(task);
@@ -1396,7 +1399,7 @@ static void nfs4_open_prepare(struct rpc_task *task, void *calldata)
 		nfs_copy_fh(&data->o_res.fh, data->o_arg.fh);
 	}
 	data->timestamp = jiffies;
-	if (nfs4_setup_sequence(data->o_arg.server,
+	if (nfs4_setup_sequence(data->o_arg.server, NULL,
 				&data->o_arg.seq_args,
 				&data->o_res.seq_res, 1, task))
 		return;
@@ -1938,7 +1941,7 @@ static void nfs4_close_prepare(struct rpc_task *task, void *data)
 
 	nfs_fattr_init(calldata->res.fattr);
 	calldata->timestamp = jiffies;
-	if (nfs4_setup_sequence(NFS_SERVER(calldata->inode),
+	if (nfs4_setup_sequence(NFS_SERVER(calldata->inode), NULL,
 				&calldata->arg.seq_args, &calldata->res.seq_res,
 				1, task))
 		return;
@@ -3705,7 +3708,7 @@ static void nfs4_delegreturn_prepare(struct rpc_task *task, void *data)
 
 	d_data = (struct nfs4_delegreturndata *)data;
 
-	if (nfs4_setup_sequence(d_data->res.server,
+	if (nfs4_setup_sequence(d_data->res.server, NULL,
 				&d_data->args.seq_args,
 				&d_data->res.seq_res, 1, task))
 		return;
@@ -3961,7 +3964,7 @@ static void nfs4_locku_prepare(struct rpc_task *task, void *data)
 		return;
 	}
 	calldata->timestamp = jiffies;
-	if (nfs4_setup_sequence(calldata->server,
+	if (nfs4_setup_sequence(calldata->server, NULL,
 				&calldata->arg.seq_args,
 				&calldata->res.seq_res, 1, task))
 		return;
@@ -4116,7 +4119,7 @@ static void nfs4_lock_prepare(struct rpc_task *task, void *calldata)
 	} else
 		data->arg.new_lock_owner = 0;
 	data->timestamp = jiffies;
-	if (nfs4_setup_sequence(data->server,
+	if (nfs4_setup_sequence(data->server, NULL,
 				&data->arg.seq_args,
 				&data->res.seq_res, 1, task))
 		return;
@@ -5374,7 +5377,7 @@ nfs4_pnfs_layoutget_prepare(struct rpc_task *task, void *calldata)
 	struct nfs_server *server = NFS_SERVER(ino);
 
 	dprintk("--> %s\n", __func__);
-	if (nfs4_setup_sequence(server, &lgp->args.seq_args,
+	if (nfs4_setup_sequence(server, NULL, &lgp->args.seq_args,
 				&lgp->res.seq_res, 0, task))
 		return;
 	rpc_call_start(task);
@@ -5486,7 +5489,7 @@ static void pnfs_layoutcommit_prepare(struct rpc_task *task, void *data)
 		(struct pnfs_layoutcommit_data *)data;
 	struct nfs_server *server = NFS_SERVER(ldata->args.inode);
 
-	if (nfs4_setup_sequence(server, &ldata->args.seq_args,
+	if (nfs4_setup_sequence(server, NULL, &ldata->args.seq_args,
 				&ldata->res.seq_res, 1, task))
 		return;
 	rpc_call_start(task);
@@ -5595,7 +5598,7 @@ nfs4_pnfs_layoutreturn_prepare(struct rpc_task *task, void *calldata)
 	struct nfs_server *server = NFS_SERVER(ino);
 
 	dprintk("--> %s\n", __func__);
-	if (nfs4_setup_sequence(server, &lrp->args.seq_args,
+	if (nfs4_setup_sequence(server, NULL, &lrp->args.seq_args,
 				&lrp->res.seq_res, 0, task))
 		return;
 	rpc_call_start(task);
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 87adc27..8f3a894 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -411,7 +411,7 @@ void nfs_read_prepare(struct rpc_task *task, void *calldata)
 {
 	struct nfs_read_data *data = calldata;
 
-	if (nfs4_setup_sequence(NFS_SERVER(data->inode),
+	if (nfs4_setup_sequence(NFS_SERVER(data->inode), NULL,
 				&data->args.seq_args, &data->res.seq_res,
 				0, task))
 		return;
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index 2f84ada..51ae53b 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -110,7 +110,7 @@ void nfs_unlink_prepare(struct rpc_task *task, void *calldata)
 	struct nfs_unlinkdata *data = calldata;
 	struct nfs_server *server = NFS_SERVER(data->dir);
 
-	if (nfs4_setup_sequence(server, &data->args.seq_args,
+	if (nfs4_setup_sequence(server, NULL, &data->args.seq_args,
 				&data->res.seq_res, 1, task))
 		return;
 	rpc_call_start(task);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 4d37229..5937247 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1052,7 +1052,7 @@ void nfs_write_prepare(struct rpc_task *task, void *calldata)
 {
 	struct nfs_write_data *data = calldata;
 
-	if (nfs4_setup_sequence(NFS_SERVER(data->inode),
+	if (nfs4_setup_sequence(NFS_SERVER(data->inode), NULL,
 				&data->args.seq_args,
 				&data->res.seq_res, 1, task))
 		return;
-- 
1.6.2.5



* [PATCH 30/50] pnfs_submit: update nfs4_async_handle_error for data server
  2010-08-13 21:31                                                         ` [PATCH 29/50] pnfs_submit: add data server session to nfs4_setup_sequence andros
@ 2010-08-13 21:31                                                           ` andros
  2010-08-13 21:31                                                             ` [PATCH 31/50] pnfs_submit: update state renewal for data servers andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4proc.c         |   39 ++++++++++++++++++++++-----------------
 include/linux/nfs4.h      |    7 +++++++
 include/linux/nfs_fs_sb.h |   10 ++++++++++
 3 files changed, 39 insertions(+), 17 deletions(-)
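
Note, not part of the patch: the test in is_ds_only_session() can be tried
out in userspace with the sketch below. The SKETCH_* constants are
hypothetical names; their values are the EXCHANGE_ID role flags from
RFC 5661 and are assumed to match the kernel's EXCHGID4_FLAG_USE_* macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXCHANGE_ID role flags, values as given in RFC 5661 */
#define SKETCH_USE_NON_PNFS	0x00010000u
#define SKETCH_USE_PNFS_MDS	0x00020000u
#define SKETCH_USE_PNFS_DS	0x00040000u

/*
 * True only when the DS role bit is set and the MDS role bit is clear.
 * Bits outside the two-bit mask do not affect the result.
 */
bool sketch_is_ds_only(uint32_t exchange_flags)
{
	uint32_t mask = SKETCH_USE_PNFS_DS | SKETCH_USE_PNFS_MDS;

	return (exchange_flags & mask) == SKETCH_USE_PNFS_DS;
}
```

A server that acts as both MDS and DS sets both bits and is therefore not
treated as DS-only, so state recovery is still scheduled for it at the
do_state_recovery label.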

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index e8b1620..353c2fb 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -70,7 +70,7 @@ struct nfs4_opendata;
 static int _nfs4_proc_open(struct nfs4_opendata *data);
 static int _nfs4_recover_proc_open(struct nfs4_opendata *data);
 static int nfs4_do_fsinfo(struct nfs_server *, struct nfs_fh *, struct nfs_fsinfo *);
-static int nfs4_async_handle_error(struct rpc_task *, const struct nfs_server *, struct nfs4_state *);
+static int nfs4_async_handle_error(struct rpc_task *, const struct nfs_server *, struct nfs4_state *, struct nfs_client *);
 static int _nfs4_proc_lookup(struct inode *dir, const struct qstr *name, struct nfs_fh *fhandle, struct nfs_fattr *fattr);
 static int _nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle, struct nfs_fattr *fattr);
 static int nfs4_do_setattr(struct inode *inode, struct rpc_cred *cred,
@@ -1896,7 +1896,7 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
 			if (calldata->arg.fmode == 0)
 				break;
 		default:
-			if (nfs4_async_handle_error(task, server, state) == -EAGAIN)
+			if (nfs4_async_handle_error(task, server, state, NULL) == -EAGAIN)
 				rpc_restart_call_prepare(task);
 	}
 	nfs_release_seqid(calldata->arg.seqid);
@@ -2688,7 +2688,7 @@ static int nfs4_proc_unlink_done(struct rpc_task *task, struct inode *dir)
 
 	if (!nfs4_sequence_done(task, &res->seq_res))
 		return 0;
-	if (nfs4_async_handle_error(task, res->server, NULL) == -EAGAIN)
+	if (nfs4_async_handle_error(task, res->server, NULL, NULL) == -EAGAIN)
 		return 0;
 	update_changeattr(dir, &res->cinfo);
 	nfs_post_op_update_inode(dir, res->dir_attr);
@@ -3135,7 +3135,7 @@ static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
-	if (nfs4_async_handle_error(task, server, data->args.context->state) == -EAGAIN) {
+	if (nfs4_async_handle_error(task, server, data->args.context->state, NULL) == -EAGAIN) {
 		nfs_restart_rpc(task, server->nfs_client);
 		return -EAGAIN;
 	}
@@ -3159,7 +3159,7 @@ static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
-	if (nfs4_async_handle_error(task, NFS_SERVER(inode), data->args.context->state) == -EAGAIN) {
+	if (nfs4_async_handle_error(task, NFS_SERVER(inode), data->args.context->state, NULL) == -EAGAIN) {
 		nfs_restart_rpc(task, NFS_SERVER(inode)->nfs_client);
 		return -EAGAIN;
 	}
@@ -3188,7 +3188,7 @@ static int nfs4_commit_done(struct rpc_task *task, struct nfs_write_data *data)
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
-	if (nfs4_async_handle_error(task, NFS_SERVER(inode), NULL) == -EAGAIN) {
+	if (nfs4_async_handle_error(task, NFS_SERVER(inode), NULL, NULL) == -EAGAIN) {
 		nfs_restart_rpc(task, NFS_SERVER(inode)->nfs_client);
 		return -EAGAIN;
 	}
@@ -3505,9 +3505,10 @@ static int nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t buflen
 }
 
 static int
-nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server, struct nfs4_state *state)
+nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server, struct nfs4_state *state, struct nfs_client *clp)
 {
-	struct nfs_client *clp = server->nfs_client;
+	if (!clp)
+		clp = server->nfs_client;
 
 	if (task->tk_status >= 0)
 		return 0;
@@ -3534,14 +3535,16 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
 		case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
 		case -NFS4ERR_SEQ_FALSE_RETRY:
 		case -NFS4ERR_SEQ_MISORDERED:
-			dprintk("%s ERROR %d, Reset session\n", __func__,
-				task->tk_status);
+			dprintk("%s ERROR %d, Reset session. Exchangeid "
+				"flags 0x%x\n", __func__, task->tk_status,
+				clp->cl_exchange_flags);
 			nfs4_schedule_state_recovery(clp);
 			task->tk_status = 0;
 			return -EAGAIN;
 #endif /* CONFIG_NFS_V4_1 */
 		case -NFS4ERR_DELAY:
-			nfs_inc_server_stats(server, NFSIOS_DELAY);
+			if (server)
+				nfs_inc_server_stats(server, NFSIOS_DELAY);
 		case -NFS4ERR_GRACE:
 		case -EKEYEXPIRED:
 			rpc_delay(task, NFS4_POLL_RETRY_MAX);
@@ -3554,6 +3557,8 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
 	task->tk_status = nfs4_map_errors(task->tk_status);
 	return 0;
 do_state_recovery:
+	if (is_ds_only_client(clp))
+		return 0;
 	rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL);
 	nfs4_schedule_state_recovery(clp);
 	if (test_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) == 0)
@@ -3687,8 +3692,8 @@ static void nfs4_delegreturn_done(struct rpc_task *task, void *calldata)
 		renew_lease(data->res.server, data->timestamp);
 		break;
 	default:
-		if (nfs4_async_handle_error(task, data->res.server, NULL) ==
-				-EAGAIN) {
+		if (nfs4_async_handle_error(task, data->res.server, NULL, NULL)
+				== -EAGAIN) {
 			nfs_restart_rpc(task, data->res.server->nfs_client);
 			return;
 		}
@@ -3946,7 +3951,7 @@ static void nfs4_locku_done(struct rpc_task *task, void *data)
 		case -NFS4ERR_EXPIRED:
 			break;
 		default:
-			if (nfs4_async_handle_error(task, calldata->server, NULL) == -EAGAIN)
+			if (nfs4_async_handle_error(task, calldata->server, NULL, NULL) == -EAGAIN)
 				nfs_restart_rpc(task,
 						 calldata->server->nfs_client);
 	}
@@ -5399,7 +5404,7 @@ static void nfs4_pnfs_layoutget_done(struct rpc_task *task, void *calldata)
 
 	pnfs_get_layout_done(lgp, task->tk_status);
 
-	if (nfs4_async_handle_error(task, server, NULL) == -EAGAIN)
+	if (nfs4_async_handle_error(task, server, NULL, NULL) == -EAGAIN)
 		nfs_restart_rpc(task, server->nfs_client);
 
 	lgp->status = task->tk_status;
@@ -5508,7 +5513,7 @@ pnfs_layoutcommit_done(struct rpc_task *task, void *calldata)
 	if (RPC_ASSASSINATED(task))
 		return;
 
-	if (nfs4_async_handle_error(task, server, NULL) == -EAGAIN)
+	if (nfs4_async_handle_error(task, server, NULL, NULL) == -EAGAIN)
 		nfs_restart_rpc(task, server->nfs_client);
 
 	data->status = task->tk_status;
@@ -5618,7 +5623,7 @@ static void nfs4_pnfs_layoutreturn_done(struct rpc_task *task, void *calldata)
 	if (RPC_ASSASSINATED(task))
 		return;
 
-	if (nfs4_async_handle_error(task, server, NULL) == -EAGAIN)
+	if (nfs4_async_handle_error(task, server, NULL, NULL) == -EAGAIN)
 		nfs_restart_rpc(task, server->nfs_client);
 
 	dprintk("<-- %s\n", __func__);
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 004a867..2737013 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -121,6 +121,13 @@
 #define EXCHGID4_FLAG_MASK_A			0x40070003
 #define EXCHGID4_FLAG_MASK_R			0x80070003
 
+static inline bool
+is_ds_only_session(u32 exchange_flags)
+{
+	u32 mask = EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_PNFS_MDS;
+	return (exchange_flags & mask) == EXCHGID4_FLAG_USE_PNFS_DS;
+}
+
 #define SEQ4_STATUS_CB_PATH_DOWN		0x00000001
 #define SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING	0x00000002
 #define SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED	0x00000004
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 8d17e67..c17440b 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -91,6 +91,16 @@ struct nfs_client {
 #endif
 };
 
+static inline bool
+is_ds_only_client(struct nfs_client *clp)
+{
+#ifdef CONFIG_NFS_V4_1
+	return is_ds_only_session(clp->cl_exchange_flags);
+#else
+	return false;
+#endif
+}
+
 /*
  * NFS client parameters stored in the superblock.
  */
-- 
1.6.2.5



* [PATCH 31/50] pnfs_submit: update state renewal for data servers
  2010-08-13 21:31                                                           ` [PATCH 30/50] pnfs_submit: update nfs4_async_handle_error for data server andros
@ 2010-08-13 21:31                                                             ` andros
  2010-08-13 21:31                                                               ` [PATCH 32/50] pnfs_submit-pageio-helpers.patch andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4renewd.c |    2 +-
 fs/nfs/nfs4state.c  |    5 +++++
 2 files changed, 6 insertions(+), 1 deletions(-)
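
Note, not part of the patch: the changed check in nfs4_renew_state() reduces
to the predicate sketched below. sketch_should_renew() is a hypothetical
helper written for illustration; the patch itself open-codes the test.

```c
#include <stdbool.h>

/*
 * Before the patch a client with no mounted superblocks skipped renewal.
 * A DS-only client never has superblocks of its own, so it is now renewed
 * regardless of the superblock list.
 */
bool sketch_should_renew(bool has_superblocks, bool ds_only)
{
	return has_superblocks || ds_only;
}
```

This is the negation of the patched condition
list_empty(&clp->cl_superblocks) && !is_ds_only_client(clp), which jumps to
the out label when true.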

diff --git a/fs/nfs/nfs4renewd.c b/fs/nfs/nfs4renewd.c
index 72b6c58..b57f41f 100644
--- a/fs/nfs/nfs4renewd.c
+++ b/fs/nfs/nfs4renewd.c
@@ -64,7 +64,7 @@ nfs4_renew_state(struct work_struct *work)
 	ops = clp->cl_mvops->state_renewal_ops;
 	dprintk("%s: start\n", __func__);
 	/* Are there any active superblocks? */
-	if (list_empty(&clp->cl_superblocks))
+	if (list_empty(&clp->cl_superblocks) && !is_ds_only_client(clp))
 		goto out;
 	spin_lock(&clp->cl_lock);
 	lease = clp->cl_lease_time;
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index a674452..18284bd 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -129,6 +129,11 @@ static int nfs41_setup_state_renewal(struct nfs_client *clp)
 	int status;
 	struct nfs_fsinfo fsinfo;
 
+	if (is_ds_only_client(clp)) {
+		nfs4_schedule_state_renewal(clp);
+		return 0;
+	}
+
 	status = nfs4_proc_get_lease_time(clp, &fsinfo);
 	if (status == 0) {
 		/* Update lease time and schedule renewal */
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 32/50] pnfs_submit-pageio-helpers.patch
  2010-08-13 21:31                                                             ` [PATCH 31/50] pnfs_submit: update state renewal for data servers andros
@ 2010-08-13 21:31                                                               ` andros
  2010-08-13 21:31                                                                 ` [PATCH 33/50] pnfs_submit: associate layout segment with nfs_page andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/pagelist.c         |   11 +++++-
 fs/nfs/pnfs.c             |   82 +++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/pnfs.h             |    3 ++
 fs/nfs/read.c             |    4 ++
 fs/nfs/write.c            |    4 ++
 include/linux/nfs4_pnfs.h |    6 +++
 include/linux/nfs_page.h  |    7 ++++
 7 files changed, 115 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 9194902..41b3966 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -237,7 +237,8 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
  * Return 'true' if this is the case, else return 'false'.
  */
 static int nfs_can_coalesce_requests(struct nfs_page *prev,
-				     struct nfs_page *req)
+				     struct nfs_page *req,
+				     struct nfs_pageio_descriptor *pgio)
 {
 	if (req->wb_context->cred != prev->wb_context->cred)
 		return 0;
@@ -251,6 +252,12 @@ static int nfs_can_coalesce_requests(struct nfs_page *prev,
 		return 0;
 	if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
 		return 0;
+	if (req->wb_lseg != prev->wb_lseg)
+		return 0;
+#ifdef CONFIG_NFS_V4_1
+	if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
+		return 0;
+#endif /* CONFIG_NFS_V4_1 */
 	return 1;
 }
 
@@ -283,7 +290,7 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
 		if (newlen > desc->pg_bsize)
 			return 0;
 		prev = nfs_list_entry(desc->pg_list.prev);
-		if (!nfs_can_coalesce_requests(prev, req))
+		if (!nfs_can_coalesce_requests(prev, req, desc))
 			return 0;
 	} else
 		desc->pg_base = req->wb_pgbase;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 9f17ef5..e25e5d9 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1152,6 +1152,88 @@ out:
 	return status;
 }
 
+void
+pnfs_set_pg_test(struct inode *inode, struct nfs_pageio_descriptor *pgio)
+{
+	struct pnfs_layout_type *laytype;
+	struct pnfs_layoutdriver_type *ld;
+
+	pgio->pg_test = NULL;
+
+	laytype = NFS_I(inode)->layout;
+	ld = NFS_SERVER(inode)->pnfs_curr_ld;
+	if (!pnfs_enabled_sb(NFS_SERVER(inode)) || !laytype)
+		return;
+
+	if (ld->ld_policy_ops)
+		pgio->pg_test = ld->ld_policy_ops->pg_test;
+}
+
+static u32
+pnfs_getboundary(struct inode *inode)
+{
+	u32 stripe_size = 0;
+	struct nfs_server *nfss = NFS_SERVER(inode);
+	struct layoutdriver_policy_operations *policy_ops;
+
+	if (!nfss->pnfs_curr_ld)
+		goto out;
+
+	policy_ops = nfss->pnfs_curr_ld->ld_policy_ops;
+	if (!policy_ops || !policy_ops->get_stripesize)
+		goto out;
+
+	spin_lock(&inode->i_lock);
+	if (NFS_I(inode)->layout)
+		stripe_size = policy_ops->get_stripesize(NFS_I(inode)->layout);
+	spin_unlock(&inode->i_lock);
+out:
+	return stripe_size;
+}
+
+/*
+ * rsize is already set by caller to MDS rsize.
+ */
+void
+pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
+		  struct inode *inode,
+		  struct nfs_open_context *ctx,
+		  struct list_head *pages)
+{
+	struct nfs_server *nfss = NFS_SERVER(inode);
+
+	pgio->pg_iswrite = 0;
+	pgio->pg_boundary = 0;
+	pgio->pg_test = NULL;
+	pgio->pg_lseg = NULL;
+
+	if (!pnfs_enabled_sb(nfss))
+		return;
+
+	_pnfs_update_layout(inode, ctx, IOMODE_READ, &pgio->pg_lseg);
+	if (!pgio->pg_lseg)
+		return;
+
+	pgio->pg_boundary = pnfs_getboundary(inode);
+	if (pgio->pg_boundary)
+		pnfs_set_pg_test(inode, pgio);
+}
+
+void
+pnfs_pageio_init_write(struct nfs_pageio_descriptor *pgio, struct inode *inode)
+{
+	struct nfs_server *server = NFS_SERVER(inode);
+
+	pgio->pg_iswrite = 1;
+	if (!pnfs_enabled_sb(server)) {
+		pgio->pg_boundary = 0;
+		pgio->pg_test = NULL;
+		return;
+	}
+	pgio->pg_boundary = pnfs_getboundary(inode);
+	pnfs_set_pg_test(inode, pgio);
+}
+
 /*
  * Set up the argument/result storage required for the RPC call.
  */
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 74b9a70..5cd00fd 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -46,6 +46,9 @@ void pnfs_layoutcommit_free(struct pnfs_layoutcommit_data *data);
 int pnfs_layoutcommit_inode(struct inode *inode, int sync);
 void pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent);
 void pnfs_need_layoutcommit(struct nfs_inode *nfsi, struct nfs_open_context *ctx);
+void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *,
+			   struct nfs_open_context *, struct list_head *);
+void pnfs_pageio_init_write(struct nfs_pageio_descriptor *, struct inode *);
 void pnfs_get_layout_done(struct nfs4_pnfs_layoutget *, int rpc_status);
 int pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp);
 void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segment *range);
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 8f3a894..99d95ec 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -20,6 +20,7 @@
 #include <linux/nfs_page.h>
 
 #include <asm/system.h>
+#include "pnfs.h"
 
 #include "nfs4_fs.h"
 #include "internal.h"
@@ -625,6 +626,9 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
 	if (ret == 0)
 		goto read_complete; /* all pages were read */
 
+#ifdef CONFIG_NFS_V4_1
+	pnfs_pageio_init_read(&pgio, inode, desc.ctx, pages);
+#endif /* CONFIG_NFS_V4_1 */
 	if (rsize < PAGE_CACHE_SIZE)
 		nfs_pageio_init(&pgio, inode, nfs_pagein_multi, rsize, 0);
 	else
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 5937247..0667eda 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -986,6 +986,10 @@ static void nfs_pageio_init_write(struct nfs_pageio_descriptor *pgio,
 {
 	size_t wsize = NFS_SERVER(inode)->wsize;
 
+#ifdef CONFIG_NFS_V4_1
+	pnfs_pageio_init_write(pgio, inode);
+#endif /* CONFIG_NFS_V4_1 */
+
 	if (wsize < PAGE_CACHE_SIZE)
 		nfs_pageio_init(pgio, inode, nfs_flush_multi, wsize, ioflags);
 	else
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 52f7a21..9e3fff4 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -13,6 +13,7 @@
 #define LINUX_NFS4_PNFS_H
 
 #include <linux/pnfs_xdr.h>
+#include <linux/nfs_page.h>
 
 /* Per-layout driver specific registration structure */
 struct pnfs_layoutdriver_type {
@@ -118,6 +119,11 @@ struct layoutdriver_io_operations {
 };
 
 struct layoutdriver_policy_operations {
+	/* The stripe size of the file system */
+	ssize_t (*get_stripesize) (struct pnfs_layout_type *layoutid);
+
+	/* test for nfs page cache coalescing */
+	int (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
 };
 
 struct pnfs_device {
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index f8b60e7..d7deec9 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -48,6 +48,7 @@ struct nfs_page {
 	struct kref		wb_kref;	/* reference count */
 	unsigned long		wb_flags;
 	struct nfs_writeverf	wb_verf;	/* Commit cookie */
+	struct pnfs_layout_segment *wb_lseg;	/* Pnfs layout info */
 };
 
 struct nfs_pageio_descriptor {
@@ -61,6 +62,12 @@ struct nfs_pageio_descriptor {
 	int			(*pg_doio)(struct inode *, struct list_head *, unsigned int, size_t, int);
 	int 			pg_ioflags;
 	int			pg_error;
+	struct pnfs_layout_segment *pg_lseg;
+#ifdef CONFIG_NFS_V4_1
+	int			pg_iswrite;
+	int			pg_boundary;
+	int			(*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
+#endif /* CONFIG_NFS_V4_1 */
 };
 
 #define NFS_WBACK_BUSY(req)	(test_bit(PG_BUSY,&(req)->wb_flags))
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 33/50] pnfs_submit: associate layout segment with nfs_page
  2010-08-13 21:31                                                               ` [PATCH 32/50] pnfs_submit-pageio-helpers.patch andros
@ 2010-08-13 21:31                                                                 ` andros
  2010-08-13 21:31                                                                   ` [PATCH 34/50] pnfs_submit: filelayout policy operations andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/file.c            |   32 +++++++++++++++++++++++++-------
 fs/nfs/pagelist.c        |   12 ++++++++++--
 fs/nfs/pnfs.c            |   10 ++++++++++
 fs/nfs/read.c            |    8 ++++++--
 fs/nfs/write.c           |   42 ++++++++++++++++++++++++++++++------------
 include/linux/nfs_fs.h   |    8 ++++++--
 include/linux/nfs_page.h |    3 ++-
 7 files changed, 89 insertions(+), 26 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index f036153..d0ed767 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -36,6 +36,7 @@
 #include "internal.h"
 #include "iostat.h"
 #include "fscache.h"
+#include "pnfs.h"
 
 #define NFSDBG_FACILITY		NFSDBG_FILE
 
@@ -389,12 +390,17 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
 	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
 	struct page *page;
 	int once_thru = 0;
+	struct pnfs_layout_segment *lseg;
 
 	dfprintk(PAGECACHE, "NFS: write_begin(%s/%s(%ld), %u@%lld)\n",
 		file->f_path.dentry->d_parent->d_name.name,
 		file->f_path.dentry->d_name.name,
 		mapping->host->i_ino, len, (long long) pos);
 
+	pnfs_update_layout(mapping->host,
+			   nfs_file_open_context(file),
+			   IOMODE_RW,
+			   &lseg);
 start:
 	/*
 	 * Prevent starvation issues if someone is doing a consistency
@@ -403,14 +409,16 @@ start:
 	ret = wait_on_bit(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
 			nfs_wait_bit_killable, TASK_KILLABLE);
 	if (ret)
-		return ret;
+		goto out;
 
 	page = grab_cache_page_write_begin(mapping, index, flags);
-	if (!page)
-		return -ENOMEM;
+	if (!page) {
+		ret = -ENOMEM;
+		goto out;
+	}
 	*pagep = page;
 
-	ret = nfs_flush_incompatible(file, page);
+	ret = nfs_flush_incompatible(file, page, lseg);
 	if (ret) {
 		unlock_page(page);
 		page_cache_release(page);
@@ -422,6 +430,12 @@ start:
 		if (!ret)
 			goto start;
 	}
+	*fsdata = lseg;
+ out:
+	if (ret) {
+		put_lseg(lseg);
+		*fsdata = NULL;
+	}
 	return ret;
 }
 
@@ -431,6 +445,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
 {
 	unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
 	int status;
+	struct pnfs_layout_segment *lseg = fsdata;
 
 	dfprintk(PAGECACHE, "NFS: write_end(%s/%s(%ld), %u@%lld)\n",
 		file->f_path.dentry->d_parent->d_name.name,
@@ -457,10 +472,11 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
 			zero_user_segment(page, pglen, PAGE_CACHE_SIZE);
 	}
 
-	status = nfs_updatepage(file, page, offset, copied);
+	status = nfs_updatepage(file, page, offset, copied, lseg);
 
 	unlock_page(page);
 	page_cache_release(page);
+	put_lseg(lseg);
 
 	if (status < 0)
 		return status;
@@ -571,6 +587,8 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	/* make sure the cache has finished storing the page */
 	nfs_fscache_wait_on_page_write(NFS_I(dentry->d_inode), page);
 
+	/* XXX Do we want to call pnfs_update_layout here? */
+
 	lock_page(page);
 	mapping = page->mapping;
 	if (mapping != dentry->d_inode->i_mapping)
@@ -581,11 +599,11 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	if (pagelen == 0)
 		goto out_unlock;
 
-	ret = nfs_flush_incompatible(filp, page);
+	ret = nfs_flush_incompatible(filp, page, NULL);
 	if (ret != 0)
 		goto out_unlock;
 
-	ret = nfs_updatepage(filp, page, 0, pagelen);
+	ret = nfs_updatepage(filp, page, 0, pagelen, NULL);
 out_unlock:
 	if (!ret)
 		return VM_FAULT_LOCKED;
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 41b3966..a014814 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -20,6 +20,7 @@
 #include <linux/nfs_mount.h>
 
 #include "internal.h"
+#include "pnfs.h"
 
 static struct kmem_cache *nfs_page_cachep;
 
@@ -56,7 +57,8 @@ nfs_page_free(struct nfs_page *p)
 struct nfs_page *
 nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
 		   struct page *page,
-		   unsigned int offset, unsigned int count)
+		   unsigned int offset, unsigned int count,
+		   struct pnfs_layout_segment *lseg)
 {
 	struct nfs_page		*req;
 
@@ -81,6 +83,9 @@ nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
 	req->wb_context = get_nfs_open_context(ctx);
 	req->wb_lock_context = nfs_get_lock_context(ctx);
 	kref_init(&req->wb_kref);
+	req->wb_lseg    = lseg;
+	if (lseg)
+		get_lseg(lseg);
 	return req;
 }
 
@@ -156,9 +161,12 @@ void nfs_clear_request(struct nfs_page *req)
 		put_nfs_open_context(ctx);
 		req->wb_context = NULL;
 	}
+	if (req->wb_lseg != NULL) {
+		put_lseg(req->wb_lseg);
+		req->wb_lseg = NULL;
+	}
 }
 
-
 /**
  * nfs_release_request - Release the count on an NFS read/write request
  * @req: request to release
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index e25e5d9..3ff193b 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1234,6 +1234,16 @@ pnfs_pageio_init_write(struct nfs_pageio_descriptor *pgio, struct inode *inode)
 	pnfs_set_pg_test(inode, pgio);
 }
 
+static void _pnfs_clear_lseg_from_pages(struct list_head *head)
+{
+	struct nfs_page *req;
+
+	list_for_each_entry(req, head, wb_list) {
+		put_lseg(req->wb_lseg);
+		req->wb_lseg = NULL;
+	}
+}
+
 /*
  * Set up the argument/result storage required for the RPC call.
  */
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 99d95ec..324b577 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -118,11 +118,14 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
 	LIST_HEAD(one_request);
 	struct nfs_page	*new;
 	unsigned int len;
+	struct pnfs_layout_segment *lseg;
 
 	len = nfs_page_length(page);
 	if (len == 0)
 		return nfs_return_empty_page(page);
-	new = nfs_create_request(ctx, inode, page, 0, len);
+	pnfs_update_layout(inode, ctx, IOMODE_READ, &lseg);
+	new = nfs_create_request(ctx, inode, page, 0, len, lseg);
+	put_lseg(lseg);
 	if (IS_ERR(new)) {
 		unlock_page(page);
 		return PTR_ERR(new);
@@ -570,7 +573,8 @@ readpage_async_filler(void *data, struct page *page)
 	if (len == 0)
 		return nfs_return_empty_page(page);
 
-	new = nfs_create_request(desc->ctx, inode, page, 0, len);
+	new = nfs_create_request(desc->ctx, inode, page, 0, len,
+				 desc->pgio->pg_lseg);
 	if (IS_ERR(new))
 		goto out_error;
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 0667eda..a1f28c5 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -430,6 +430,17 @@ static void nfs_inode_remove_request(struct nfs_page *req)
 	nfs_clear_request(req);
 	nfs_release_request(req);
 }
+static void
+nfs_mark_request_nopnfs(struct nfs_page *req)
+{
+	struct pnfs_layout_segment *lseg = req->wb_lseg;
+
+	if (req->wb_lseg == NULL)
+		return;
+	req->wb_lseg = NULL;
+	put_lseg(lseg);
+	dprintk(" retry through MDS\n");
+}
 
 static void
 nfs_mark_request_dirty(struct nfs_page *req)
@@ -572,7 +583,8 @@ static inline int nfs_scan_commit(struct inode *inode, struct list_head *dst, pg
 static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
 		struct page *page,
 		unsigned int offset,
-		unsigned int bytes)
+		unsigned int bytes,
+		struct pnfs_layout_segment *lseg)
 {
 	struct nfs_page *req;
 	unsigned int rqend;
@@ -597,8 +609,8 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
 		 * Note: nfs_flush_incompatible() will already
 		 * have flushed out requests having wrong owners.
 		 */
-		if (offset > rqend
-		    || end < req->wb_offset)
+		if (offset > rqend || end < req->wb_offset ||
+		    req->wb_lseg != lseg)
 			goto out_flushme;
 
 		if (nfs_set_page_tag_locked(req))
@@ -646,16 +658,17 @@ out_err:
  * already called nfs_flush_incompatible() if necessary.
  */
 static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
-		struct page *page, unsigned int offset, unsigned int bytes)
+		struct page *page, unsigned int offset, unsigned int bytes,
+		struct pnfs_layout_segment *lseg)
 {
 	struct inode *inode = page->mapping->host;
 	struct nfs_page	*req;
 	int error;
 
-	req = nfs_try_to_update_request(inode, page, offset, bytes);
+	req = nfs_try_to_update_request(inode, page, offset, bytes, lseg);
 	if (req != NULL)
 		goto out;
-	req = nfs_create_request(ctx, inode, page, offset, bytes);
+	req = nfs_create_request(ctx, inode, page, offset, bytes, lseg);
 	if (IS_ERR(req))
 		goto out;
 	error = nfs_inode_add_request(inode, req);
@@ -668,11 +681,12 @@ out:
 }
 
 static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
-		unsigned int offset, unsigned int count)
+			       unsigned int offset, unsigned int count,
+			       struct pnfs_layout_segment *lseg)
 {
 	struct nfs_page	*req;
 
-	req = nfs_setup_write_request(ctx, page, offset, count);
+	req = nfs_setup_write_request(ctx, page, offset, count, lseg);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 	nfs_mark_request_dirty(req);
@@ -684,7 +698,8 @@ static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
 	return 0;
 }
 
-int nfs_flush_incompatible(struct file *file, struct page *page)
+int nfs_flush_incompatible(struct file *file, struct page *page,
+			   struct pnfs_layout_segment *lseg)
 {
 	struct nfs_open_context *ctx = nfs_file_open_context(file);
 	struct nfs_page	*req;
@@ -703,7 +718,8 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
 			return 0;
 		do_flush = req->wb_page != page || req->wb_context != ctx ||
 			req->wb_lock_context->lockowner != current->files ||
-			req->wb_lock_context->pid != current->tgid;
+			req->wb_lock_context->pid != current->tgid ||
+			req->wb_lseg != lseg;
 		nfs_release_request(req);
 		if (!do_flush)
 			return 0;
@@ -730,7 +746,8 @@ static int nfs_write_pageuptodate(struct page *page, struct inode *inode)
  * things with a page scheduled for an RPC call (e.g. invalidate it).
  */
 int nfs_updatepage(struct file *file, struct page *page,
-		unsigned int offset, unsigned int count)
+		   unsigned int offset, unsigned int count,
+		   struct pnfs_layout_segment *lseg)
 {
 	struct nfs_open_context *ctx = nfs_file_open_context(file);
 	struct inode	*inode = page->mapping->host;
@@ -755,7 +772,7 @@ int nfs_updatepage(struct file *file, struct page *page,
 		offset = 0;
 	}
 
-	status = nfs_writepage_setup(ctx, page, offset, count);
+	status = nfs_writepage_setup(ctx, page, offset, count, lseg);
 	if (status < 0)
 		nfs_set_pageerror(page);
 
@@ -874,6 +891,7 @@ static void nfs_redirty_request(struct nfs_page *req)
 {
 	struct page *page = req->wb_page;
 
+	nfs_mark_request_nopnfs(req);
 	nfs_mark_request_dirty(req);
 	nfs_clear_page_tag_locked(req);
 	nfs_end_page_writeback(page);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 7202c05..6f67aec 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -517,8 +517,12 @@ extern void nfs_unblock_sillyrename(struct dentry *dentry);
 extern int  nfs_congestion_kb;
 extern int  nfs_writepage(struct page *page, struct writeback_control *wbc);
 extern int  nfs_writepages(struct address_space *, struct writeback_control *);
-extern int  nfs_flush_incompatible(struct file *file, struct page *page);
-extern int  nfs_updatepage(struct file *, struct page *, unsigned int, unsigned int);
+struct pnfs_layout_segment;
+extern int  nfs_flush_incompatible(struct file *file, struct page *page,
+				   struct pnfs_layout_segment *lseg);
+extern int  nfs_updatepage(struct file *, struct page *,
+			   unsigned int offset, unsigned int count,
+			   struct pnfs_layout_segment *lseg);
 extern int nfs_writeback_done(struct rpc_task *, struct nfs_write_data *);
 
 /*
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index d7deec9..ce0a1a5 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -76,7 +76,8 @@ extern	struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
 					    struct inode *inode,
 					    struct page *page,
 					    unsigned int offset,
-					    unsigned int count);
+					    unsigned int count,
+					    struct pnfs_layout_segment *lseg);
 extern	void nfs_clear_request(struct nfs_page *req);
 extern	void nfs_release_request(struct nfs_page *req);
 
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 34/50] pnfs_submit: filelayout policy operations
  2010-08-13 21:31                                                                 ` [PATCH 33/50] pnfs_submit: associate layout segment with nfs_page andros
@ 2010-08-13 21:31                                                                   ` andros
  2010-08-13 21:31                                                                     ` [PATCH 35/50] pnfs_submit: filelayout i/o helpers andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
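Note: the coalescing rule added by filelayout_pg_test() can be tried in user
space. The sketch below is a simplified illustration, not the kernel
function: same_stripe() and SKETCH_PAGE_SHIFT are hypothetical names, 4 KiB
pages are assumed, and plain 64-bit division replaces do_div().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages; the kernel code uses PAGE_CACHE_SHIFT here. */
#define SKETCH_PAGE_SHIFT 12

static_assert(SKETCH_PAGE_SHIFT < 64, "shift must fit in a 64-bit offset");

/*
 * Hypothetical stand-in for filelayout_pg_test(): two pages may share
 * one I/O only if their byte offsets fall in the same stripe unit.
 * Returns 1 to coalesce, 0 to start a new request.
 */
static int same_stripe(uint64_t prev_index, uint64_t req_index,
		       uint32_t boundary)
{
	uint64_t p_stripe, r_stripe;

	if (boundary == 0)	/* no stripe size known: always coalesce */
		return 1;
	p_stripe = (prev_index << SKETCH_PAGE_SHIFT) / boundary;
	r_stripe = (req_index << SKETCH_PAGE_SHIFT) / boundary;
	return p_stripe == r_stripe;
}
```

With an 8192-byte stripe unit, pages 0 and 1 share a stripe and may be
coalesced, while pages 1 and 2 straddle a stripe boundary and may not.
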
 fs/nfs/nfs4filelayout.c |   35 +++++++++++++++++++++++++++++++++++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 50620f4..4af089c 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -303,6 +303,39 @@ filelayout_free_lseg(struct pnfs_layout_segment *lseg)
 				   nfs4_fl_free_deviceid_callback);
 	_filelayout_free_lseg(lseg);
 }
+
+/* Return the stripesize for the specified file */
+ssize_t
+filelayout_get_stripesize(struct pnfs_layout_type *layoutid)
+{
+	struct nfs4_filelayout *flo = FILE_LO(layoutid);
+
+	return flo->stripe_unit;
+}
+
+/*
+ * filelayout_pg_test(). Called by nfs_can_coalesce_requests()
+ *
+ * return 1 :  coalesce page
+ * return 0 :  don't coalesce page
+ */
+int
+filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
+		   struct nfs_page *req)
+{
+	u64 p_stripe, r_stripe;
+
+	if (pgio->pg_boundary == 0)
+		return 1;
+	p_stripe = (u64)prev->wb_index << PAGE_CACHE_SHIFT;
+	r_stripe = (u64)req->wb_index << PAGE_CACHE_SHIFT;
+
+	do_div(p_stripe, pgio->pg_boundary);
+	do_div(r_stripe, pgio->pg_boundary);
+
+	return (p_stripe == r_stripe);
+}
+
 struct layoutdriver_io_operations filelayout_io_operations = {
 	.alloc_layout            = filelayout_alloc_layout,
 	.free_layout             = filelayout_free_layout,
@@ -313,6 +346,8 @@ struct layoutdriver_io_operations filelayout_io_operations = {
 };
 
 struct layoutdriver_policy_operations filelayout_policy_operations = {
+	.get_stripesize        = filelayout_get_stripesize,
+	.pg_test               = filelayout_pg_test,
 };
 
 struct pnfs_layoutdriver_type filelayout_type = {
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 35/50] pnfs_submit: filelayout i/o helpers
  2010-08-13 21:31                                                                   ` [PATCH 34/50] pnfs_submit: filelayout policy operations andros
@ 2010-08-13 21:31                                                                     ` andros
  2010-08-13 21:31                                                                       ` [PATCH 36/50] pnfs_submit: generic read andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
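Note: the STRIPE_DENSE arithmetic in filelayout_get_dserver_offset() is
easier to check with concrete numbers. The sketch below is a user-space
illustration under stated assumptions: dense_ds_offset() is a hypothetical
name, the layout fields are passed as plain arguments, and the stripe width
is computed in 64 bits rather than in a u32 as the patch does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the STRIPE_DENSE branch of
 * filelayout_get_dserver_offset(). A sparse layout would simply
 * return the file offset unchanged.
 */
static uint64_t dense_ds_offset(uint64_t offset, uint64_t pattern_offset,
				uint32_t stripe_unit, uint32_t stripe_count)
{
	uint64_t rel, stripe_width;

	assert(stripe_unit != 0 && stripe_count != 0);
	assert(offset >= pattern_offset);

	rel = offset - pattern_offset;
	stripe_width = (uint64_t)stripe_unit * stripe_count;
	/*
	 * Each full stripe before this one contributes one unit on this
	 * data server; add the position inside the current unit.
	 */
	return (rel / stripe_width) * stripe_unit + rel % stripe_unit;
}
```

With a 4096-byte stripe unit across 4 data servers, file offset 20000 lies
in the second full stripe, 3616 bytes into a unit, so the dense offset is
4096 + 3616 = 7712.
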
 fs/nfs/client.c            |    7 +-
 fs/nfs/internal.h          |   12 +++
 fs/nfs/nfs4filelayout.c    |   42 ++++++++++-
 fs/nfs/nfs4filelayout.h    |   10 +++
 fs/nfs/nfs4filelayoutdev.c |  180 ++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4proc.c          |    8 +-
 6 files changed, 252 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 09ee926..b53f61c 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -348,7 +348,7 @@ static int nfs_sockaddr_match_ipaddr(const struct sockaddr *sa1,
  * Test if two socket addresses represent the same actual socket,
  * by comparing (only) relevant fields, including the port number.
  */
-static int nfs_sockaddr_cmp(const struct sockaddr *sa1,
+int nfs_sockaddr_cmp(const struct sockaddr *sa1,
 			    const struct sockaddr *sa2)
 {
 	if (sa1->sa_family != sa2->sa_family)
@@ -362,6 +362,7 @@ static int nfs_sockaddr_cmp(const struct sockaddr *sa1,
 	}
 	return 0;
 }
+EXPORT_SYMBOL(nfs_sockaddr_cmp);
 
 /*
  * Find a client by IP address and protocol version
@@ -553,6 +554,7 @@ int nfs4_check_client_ready(struct nfs_client *clp)
 		return -EPROTONOSUPPORT;
 	return 0;
 }
+EXPORT_SYMBOL(nfs4_check_client_ready);
 
 /*
  * Initialise the timeout values for a connection
@@ -1250,7 +1252,7 @@ error:
 /*
  * Set up an NFS4 client
  */
-static int nfs4_set_client(struct nfs_server *server,
+int nfs4_set_client(struct nfs_server *server,
 		const char *hostname,
 		const struct sockaddr *addr,
 		const size_t addrlen,
@@ -1293,6 +1295,7 @@ error:
 	dprintk("<-- nfs4_set_client() = xerror %d\n", error);
 	return error;
 }
+EXPORT_SYMBOL(nfs4_set_client);
 
 
 /*
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index e70f44b..eba1cc0 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -139,6 +139,16 @@ extern struct nfs_server *nfs_clone_server(struct nfs_server *,
 					   struct nfs_fattr *);
 extern void nfs_mark_client_ready(struct nfs_client *clp, int state);
 extern int nfs4_check_client_ready(struct nfs_client *clp);
+extern int nfs_sockaddr_cmp(const struct sockaddr *sa1,
+		const struct sockaddr *sa2);
+extern int nfs4_set_client(struct nfs_server *server,
+		const char *hostname,
+		const struct sockaddr *addr,
+		const size_t addrlen,
+		const char *ip_addr,
+		rpc_authflavor_t authflavour,
+		int proto, const struct rpc_timeout *timeparms,
+		u32 minorversion);
 #ifdef CONFIG_PROC_FS
 extern int __init nfs_fs_proc_init(void);
 extern void nfs_fs_proc_exit(void);
@@ -201,6 +211,8 @@ extern const u32 nfs41_maxwrite_overhead;
 extern struct rpc_procinfo nfs4_procedures[];
 #endif
 
+extern int nfs4_recover_expired_lease(struct nfs_client *clp);
+
 /* proc.c */
 void nfs_close_context(struct nfs_open_context *ctx, int is_sync);
 
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 4af089c..d1c0d35 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -38,9 +38,17 @@
 
 #include <linux/module.h>
 #include <linux/init.h>
-
+#include <linux/time.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+#include <linux/vmalloc.h>
+#include <linux/stat.h>
+#include <linux/errno.h>
+#include <linux/unistd.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_page.h>
+#include <linux/pnfs_xdr.h>
 #include <linux/nfs4_pnfs.h>
 
 #include "nfs4filelayout.h"
@@ -83,6 +91,38 @@ filelayout_uninitialize_mountpoint(struct nfs_server *nfss)
 	return 0;
 }
 
+/* This function is used by the layout driver to calculate the
+ * offset of the file on the dserver based on whether the
+ * layout type is STRIPE_DENSE or STRIPE_SPARSE
+ */
+static loff_t
+filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
+{
+	struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(lseg);
+
+	switch (flseg->stripe_type) {
+	case STRIPE_SPARSE:
+		return offset;
+
+	case STRIPE_DENSE:
+	{
+		u32 stripe_width;
+		u64 tmp, off;
+		u32 unit = flseg->stripe_unit;
+
+		stripe_width = unit * FILE_DSADDR(lseg)->stripe_count;
+		tmp = off = offset - flseg->pattern_offset;
+		do_div(tmp, stripe_width);
+		return tmp * unit + do_div(off, unit);
+	}
+	default:
+		BUG();
+	}
+
+	/* We should never get here... just to stop the gcc warning */
+	return 0;
+}
+
 /*
  * Create a filelayout layout structure and return it.  The pNFS client
  * will use the pnfs_layout_type type to refer to the layout for this
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index aeb2147..f8f7c05 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -23,6 +23,10 @@
 #define NFS4_PNFS_MAX_STRIPE_CNT 4096
 #define NFS4_PNFS_MAX_MULTI_CNT  64 /* 256 fit into a u8 stripe_index */
 
+#define FILE_DSADDR(lseg) (container_of(lseg->deviceid, \
+					struct nfs4_file_layout_dsaddr, \
+					deviceid))
+
 enum stripetype4 {
 	STRIPE_SPARSE = 1,
 	STRIPE_DENSE = 2
@@ -62,6 +66,9 @@ struct nfs4_filelayout {
 	u32 stripe_unit;
 };
 
+extern struct nfs_fh *
+nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, loff_t offset);
+
 static inline struct nfs4_filelayout *
 FILE_LO(struct pnfs_layout_type *lo)
 {
@@ -73,6 +80,9 @@ extern struct pnfs_client_operations *pnfs_callback_ops;
 extern void nfs4_fl_free_deviceid_callback(struct kref *);
 extern void print_ds(struct nfs4_pnfs_ds *ds);
 char *deviceid_fmt(const struct pnfs_deviceid *dev_id);
+u32 nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, loff_t offset);
+struct nfs4_pnfs_ds *nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg,
+					u32 ds_idx);
 extern struct nfs4_file_layout_dsaddr *
 nfs4_pnfs_device_item_find(struct nfs_client *, struct pnfs_deviceid *dev_id);
 struct nfs4_file_layout_dsaddr *
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index f7614f6..1452710 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -117,6 +117,112 @@ _data_server_lookup(u32 ip_addr, u32 port)
 	return NULL;
 }
 
+/* Create an rpc to the data server defined in 'dev_list' */
+static int
+nfs4_pnfs_ds_create(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
+{
+	struct nfs_server	*tmp;
+	struct sockaddr_in	sin;
+	struct rpc_clnt 	*mds_clnt = mds_srv->client;
+	struct nfs_client	*clp = mds_srv->nfs_client;
+	struct sockaddr		*mds_addr;
+	int err = 0;
+
+	dprintk("--> %s ip:port %s au_flavor %d\n", __func__,
+		ds->r_addr, mds_clnt->cl_auth->au_flavor);
+
+	sin.sin_family = AF_INET;
+	sin.sin_addr.s_addr = ds->ds_ip_addr;
+	sin.sin_port = ds->ds_port;
+
+	/*
+	 * If this DS is also the MDS, use the MDS session only if the
+	 * MDS exchangeid flags show the EXCHGID4_FLAG_USE_PNFS_DS pNFS role.
+	 */
+	mds_addr = (struct sockaddr *)&clp->cl_addr;
+	if (nfs_sockaddr_cmp((struct sockaddr *)&sin, mds_addr)) {
+		if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_DS)) {
+			printk(KERN_INFO "ip:port %s is not a pNFS Data "
+				"Server\n", ds->r_addr);
+			err = -ENODEV;
+		} else {
+			atomic_inc(&clp->cl_count);
+			ds->ds_clp = clp;
+			dprintk("%s Using MDS Session for DS\n", __func__);
+		}
+		goto out;
+	}
+
+	/* Temporary server for nfs4_set_client */
+	tmp = kzalloc(sizeof(struct nfs_server), GFP_KERNEL);
+	if (!tmp)
+		return -ENOMEM;
+
+	/*
+	 * Set a retrans, timeout interval, and authflavor equal to the MDS
+	 * values. Use the MDS nfs_client cl_ipaddr field so as to use the
+	 * same co_ownerid as the MDS.
+	 */
+	err = nfs4_set_client(tmp,
+			      mds_srv->nfs_client->cl_hostname,
+			      (struct sockaddr *)&sin,
+			      sizeof(struct sockaddr),
+			      mds_srv->nfs_client->cl_ipaddr,
+			      mds_clnt->cl_auth->au_flavor,
+			      IPPROTO_TCP,
+			      mds_clnt->cl_xprt->timeout,
+			      1 /* minorversion */);
+	if (err < 0)
+		goto out_free;
+
+	clp = tmp->nfs_client;
+
+	/* Ask for only the EXCHGID4_FLAG_USE_PNFS_DS pNFS role */
+	dprintk("%s EXCHANGE_ID for clp %p\n", __func__, clp);
+	clp->cl_exchange_flags = EXCHGID4_FLAG_USE_PNFS_DS;
+
+	err = nfs4_recover_expired_lease(clp);
+	if (!err)
+		err = nfs4_check_client_ready(clp);
+	if (err)
+		goto out_put;
+
+	if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_DS)) {
+		printk(KERN_INFO "ip:port %s is not a pNFS Data Server\n",
+			ds->r_addr);
+		err = -ENODEV;
+		goto out_put;
+	}
+	/*
+	 * Mask the (possibly) returned EXCHGID4_FLAG_USE_PNFS_MDS pNFS role
+	 * The is_ds_only_session depends on this.
+	 */
+	clp->cl_exchange_flags &= ~EXCHGID4_FLAG_USE_PNFS_MDS;
+	/*
+	 * Set DS lease equal to the MDS lease, renewal is scheduled in
+	 * create_session
+	 */
+	spin_lock(&mds_srv->nfs_client->cl_lock);
+	clp->cl_lease_time = mds_srv->nfs_client->cl_lease_time;
+	spin_unlock(&mds_srv->nfs_client->cl_lock);
+	clp->cl_last_renewal = jiffies;
+
+	clear_bit(NFS4CLNT_SESSION_RESET, &clp->cl_state);
+	ds->ds_clp = clp;
+
+	dprintk("%s: ip=%x, port=%hu, rpcclient %p\n", __func__,
+				ntohl(ds->ds_ip_addr), ntohs(ds->ds_port),
+				clp->cl_rpcclient);
+out_free:
+	kfree(tmp);
+out:
+	dprintk("%s Returns %d\n", __func__, err);
+	return err;
+out_put:
+	nfs_put_client(clp);
+	goto out_free;
+}
+
 static void
 destroy_ds(struct nfs4_pnfs_ds *ds)
 {
@@ -454,3 +560,77 @@ nfs4_pnfs_device_item_find(struct nfs_client *clp, struct pnfs_deviceid *id)
 	return (d == NULL) ? NULL :
 		container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
 }
+
+/*
+ * Want res = (offset - layout->pattern_offset)/ layout->stripe_unit
+ * Then: ((res + fsi) % dsaddr->stripe_count)
+ */
+static inline u32
+_nfs4_fl_calc_j_index(struct pnfs_layout_segment *lseg, loff_t offset)
+{
+	struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(lseg);
+	u64 tmp;
+
+	tmp = offset - flseg->pattern_offset;
+	do_div(tmp, flseg->stripe_unit);
+	tmp += flseg->first_stripe_index;
+	return do_div(tmp, FILE_DSADDR(lseg)->stripe_count);
+}
+
+u32
+nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, loff_t offset)
+{
+	u32 j;
+
+	j = _nfs4_fl_calc_j_index(lseg, offset);
+	return FILE_DSADDR(lseg)->stripe_indices[j];
+}
+
+struct nfs_fh *
+nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, loff_t offset)
+{
+	struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(lseg);
+	u32 i;
+
+	if (flseg->stripe_type == STRIPE_SPARSE) {
+		if (flseg->num_fh == 1)
+			i = 0;
+		else if (flseg->num_fh == 0)
+			return NULL;
+		else
+			i = nfs4_fl_calc_ds_index(lseg, offset);
+	} else
+		i = _nfs4_fl_calc_j_index(lseg, offset);
+	return &flseg->fh_array[i];
+}
+
+struct nfs4_pnfs_ds *
+nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
+{
+	struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(lseg);
+	struct nfs4_file_layout_dsaddr *dsaddr;
+
+	dsaddr = FILE_DSADDR(lseg);
+	if (dsaddr->ds_list[ds_idx] == NULL) {
+		printk(KERN_ERR "%s: No data server for device id (%s)!!\n",
+			__func__, deviceid_fmt(&flseg->dev_id));
+		return NULL;
+	}
+
+	if (!dsaddr->ds_list[ds_idx]->ds_clp) {
+		int err;
+
+		err = nfs4_pnfs_ds_create(PNFS_NFS_SERVER(lseg->layout),
+					  dsaddr->ds_list[ds_idx]);
+		if (err) {
+			printk(KERN_ERR "%s nfs4_pnfs_ds_create error %d\n",
+			       __func__, err);
+			return NULL;
+		}
+	}
+	dprintk("%s: dev_id=%s, ds_idx=%u\n",
+		__func__, deviceid_fmt(&flseg->dev_id), ds_idx);
+
+	return dsaddr->ds_list[ds_idx];
+}
+
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 353c2fb..d7d193b 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1576,9 +1576,8 @@ static int _nfs4_proc_open(struct nfs4_opendata *data)
 	return 0;
 }
 
-static int nfs4_recover_expired_lease(struct nfs_server *server)
+int nfs4_recover_expired_lease(struct nfs_client *clp)
 {
-	struct nfs_client *clp = server->nfs_client;
 	unsigned int loop;
 	int ret;
 
@@ -1594,6 +1593,7 @@ static int nfs4_recover_expired_lease(struct nfs_server *server)
 	}
 	return ret;
 }
+EXPORT_SYMBOL(nfs4_recover_expired_lease);
 
 /*
  * OPEN_EXPIRED:
@@ -1683,7 +1683,7 @@ static int _nfs4_do_open(struct inode *dir, struct path *path, fmode_t fmode, in
 		dprintk("nfs4_do_open: nfs4_get_state_owner failed!\n");
 		goto out_err;
 	}
-	status = nfs4_recover_expired_lease(server);
+	status = nfs4_recover_expired_lease(server->nfs_client);
 	if (status != 0)
 		goto err_put_state_owner;
 	if (path->dentry->d_inode != NULL)
@@ -5121,7 +5121,7 @@ int nfs4_init_session(struct nfs_server *server)
 	session->fc_attrs.max_rqst_sz = wsize + nfs41_maxwrite_overhead;
 	session->fc_attrs.max_resp_sz = rsize + nfs41_maxread_overhead;
 
-	ret = nfs4_recover_expired_lease(server);
+	ret = nfs4_recover_expired_lease(server->nfs_client);
 	if (!ret)
 		ret = nfs4_check_client_ready(clp);
 	return ret;
-- 
1.6.2.5
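
For illustration only (not part of the patch): a worked example of the
stripe arithmetic in _nfs4_fl_calc_j_index() and nfs4_fl_calc_ds_index()
above. This is a userspace sketch under stated assumptions. struct
demo_layout, the demo_* helpers and the sample values are hypothetical
stand-ins for the nfs4_filelayout_segment and nfs4_file_layout_dsaddr
fields, and plain / and % replace do_div().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the segment and dsaddr fields. */
struct demo_layout {
	uint64_t pattern_offset;
	uint32_t stripe_unit;
	uint32_t first_stripe_index;
	uint32_t stripe_count;
	const uint8_t *stripe_indices;	/* stripe slot -> data server index */
};

/* Sample values, chosen only to make the arithmetic easy to follow. */
static const uint8_t demo_indices[4] = { 2, 0, 3, 1 };
static const struct demo_layout demo = {
	.pattern_offset     = 0,
	.stripe_unit        = 65536,
	.first_stripe_index = 1,
	.stripe_count       = 4,
	.stripe_indices     = demo_indices,
};

/* j = ((offset - pattern_offset) / stripe_unit + fsi) % stripe_count */
static uint32_t demo_calc_j_index(const struct demo_layout *l, uint64_t offset)
{
	uint64_t tmp = (offset - l->pattern_offset) / l->stripe_unit;

	tmp += l->first_stripe_index;
	return (uint32_t)(tmp % l->stripe_count);
}

/* The data server index is looked up through the stripe_indices table. */
static uint32_t demo_calc_ds_index(const struct demo_layout *l, uint64_t offset)
{
	return l->stripe_indices[demo_calc_j_index(l, offset)];
}
```

With these sample values, offset 200000 falls in stripe unit 3; adding the
first stripe index gives 4, which wraps to slot 0 and therefore to data
server index 2.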


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 36/50] pnfs_submit: generic read
  2010-08-13 21:31                                                                     ` [PATCH 35/50] pnfs_submit: filelayout i/o helpers andros
@ 2010-08-13 21:31                                                                       ` andros
  2010-08-13 21:31                                                                         ` [PATCH 37/50] pnfs_submit: filelayout read andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
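For illustration only (not for the changelog): the control flow this patch
sets up between pnfs_initiate_read(), pnfs_try_to_read_data() and the
layout driver's read_pagelist hook can be sketched in userspace as below.
The enum values match the ones added to include/linux/nfs4_pnfs.h;
everything named demo_* is a hypothetical stand-in, and the integer
counter only models the get_lseg()/put_lseg() pairing.

```c
#include <assert.h>

/* Mirrors the enum this patch adds to include/linux/nfs4_pnfs.h. */
enum pnfs_try_status {
	PNFS_ATTEMPTED     = 0,
	PNFS_NOT_ATTEMPTED = 1,
};

/* Hypothetical stand-in for struct nfs_read_data; not a kernel type. */
struct demo_read {
	int has_lseg;		/* models data->req->wb_lseg != NULL */
	int lseg_refs;		/* models the get_lseg()/put_lseg() count */
	int sent_via_mds;	/* set when the regular NFS path is used */
	enum pnfs_try_status (*read_pagelist)(struct demo_read *rd);
};

static enum pnfs_try_status demo_driver_accepts(struct demo_read *rd)
{
	(void)rd;
	return PNFS_ATTEMPTED;
}

static enum pnfs_try_status demo_driver_declines(struct demo_read *rd)
{
	(void)rd;
	return PNFS_NOT_ATTEMPTED;
}

/* Models pnfs_try_to_read_data(): take a reference, then ask the driver. */
static enum pnfs_try_status demo_try_to_read(struct demo_read *rd)
{
	enum pnfs_try_status st;

	rd->lseg_refs++;
	st = rd->read_pagelist(rd);
	if (st == PNFS_NOT_ATTEMPTED)
		rd->lseg_refs--;	/* driver declined: drop the reference */
	return st;
}

/* Models pnfs_initiate_read(): fall back to the MDS unless pNFS took it. */
static int demo_initiate_read(struct demo_read *rd)
{
	if (rd->has_lseg && demo_try_to_read(rd) == PNFS_ATTEMPTED)
		return 0;
	rd->sent_via_mds = 1;		/* models nfs_initiate_read() */
	return 0;
}

/* Convenience wrapper so a caller can inspect the resulting state. */
static struct demo_read demo_run(int has_lseg,
		enum pnfs_try_status (*drv)(struct demo_read *rd))
{
	struct demo_read rd = { .has_lseg = has_lseg, .read_pagelist = drv };

	demo_initiate_read(&rd);
	return rd;
}
```

In the sketch, as in the patch, an accepted read keeps its segment
reference (the driver's release callback is expected to drop it later),
while a declined read or a request with no layout segment ends up on the
regular MDS path with no reference held.
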
 fs/nfs/internal.h          |    4 ++
 fs/nfs/nfs4proc.c          |   15 ++++++-
 fs/nfs/pnfs.c              |   35 +++++++++++++++++
 fs/nfs/pnfs.h              |   16 ++++++++
 fs/nfs/read.c              |   90 ++++++++++++++++++++++++++++++-------------
 include/linux/nfs4_pnfs.h  |   17 ++++++++
 include/linux/nfs_iostat.h |    1 +
 include/linux/nfs_xdr.h    |   21 ++++++++++
 8 files changed, 169 insertions(+), 30 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index eba1cc0..37f9926 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -261,6 +261,10 @@ extern int nfs4_get_rootfh(struct nfs_server *server, struct nfs_fh *mntfh);
 #endif
 
 /* read.c */
+extern int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
+			     const struct rpc_call_ops *call_ops);
+extern int pnfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
+			     const struct rpc_call_ops *call_ops);
 extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
 
 /* write.c */
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index d7d193b..4346a82 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3129,19 +3129,28 @@ static int nfs4_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
 static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
 {
 	struct nfs_server *server = NFS_SERVER(data->inode);
+	struct nfs_client *client = server->nfs_client;
 
 	dprintk("--> %s\n", __func__);
 
+#ifdef CONFIG_NFS_V4_1
+	/* Is this a DS session */
+	if (data->fldata.ds_nfs_client) {
+		dprintk("%s DS read\n", __func__);
+		client = data->fldata.ds_nfs_client;
+	}
+#endif /* CONFIG_NFS_V4_1 */
+
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
-	if (nfs4_async_handle_error(task, server, data->args.context->state, NULL) == -EAGAIN) {
-		nfs_restart_rpc(task, server->nfs_client);
+	if (nfs4_async_handle_error(task, server, data->args.context->state, client) == -EAGAIN) {
+		nfs_restart_rpc(task, client);
 		return -EAGAIN;
 	}
 
 	nfs_invalidate_atime(data->inode);
-	if (task->tk_status > 0)
+	if (task->tk_status > 0 && client == server->nfs_client)
 		renew_lease(server, data->timestamp);
 	return 0;
 }
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 3ff193b..6725539 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1245,6 +1245,41 @@ static void _pnfs_clear_lseg_from_pages(struct list_head *head)
 }
 
 /*
+ * Call the appropriate parallel I/O subsystem read function.
+ * If no I/O device driver exists, or one does match the returned
+ * fstype, then return a positive status for regular NFS processing.
+ */
+enum pnfs_try_status
+pnfs_try_to_read_data(struct nfs_read_data *rdata,
+		       const struct rpc_call_ops *call_ops)
+{
+	struct inode *inode = rdata->inode;
+	struct nfs_server *nfss = NFS_SERVER(inode);
+	struct pnfs_layout_segment *lseg = rdata->req->wb_lseg;
+	enum pnfs_try_status trypnfs;
+
+	rdata->pdata.call_ops = call_ops;
+
+	dprintk("%s: Reading ino:%lu %u@%llu\n",
+		__func__, inode->i_ino, rdata->args.count, rdata->args.offset);
+
+	get_lseg(lseg);
+
+	rdata->pdata.lseg = lseg;
+	trypnfs = nfss->pnfs_curr_ld->ld_io_ops->read_pagelist(rdata,
+		nfs_page_array_len(rdata->args.pgbase, rdata->args.count));
+	if (trypnfs == PNFS_NOT_ATTEMPTED) {
+		rdata->pdata.lseg = NULL;
+		put_lseg(lseg);
+		_pnfs_clear_lseg_from_pages(&rdata->pages);
+	} else {
+		nfs_inc_stats(inode, NFSIOS_PNFS_READ);
+	}
+	dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
+	return trypnfs;
+}
+
+/*
  * Set up the argument/result storage required for the RPC call.
  */
 static int
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 5cd00fd..b7a3769 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -40,6 +40,8 @@ int _pnfs_return_layout(struct inode *, struct nfs4_pnfs_layout_segment *,
 			enum pnfs_layoutreturn_type, bool wait);
 void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
 void unmount_pnfs_layoutdriver(struct nfs_server *);
+enum pnfs_try_status pnfs_try_to_read_data(struct nfs_read_data *,
+					    const struct rpc_call_ops *);
 int pnfs_initialize(void);
 void pnfs_uninitialize(void);
 void pnfs_layoutcommit_free(struct pnfs_layoutcommit_data *data);
@@ -148,6 +150,20 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
 		*lsegpp = NULL;
 }
 
+static inline enum pnfs_try_status
+pnfs_try_to_read_data(struct nfs_read_data *data,
+		      const struct rpc_call_ops *call_ops)
+{
+	return PNFS_NOT_ATTEMPTED;
+}
+
+static inline enum pnfs_try_status
+pnfs_try_to_commit(struct nfs_write_data *data,
+		   const struct rpc_call_ops *call_ops, int how)
+{
+	return PNFS_NOT_ATTEMPTED;
+}
+
 static inline int pnfs_layoutcommit_inode(struct inode *inode, int sync)
 {
 	return 0;
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 324b577..ae3681b 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -18,8 +18,11 @@
 #include <linux/sunrpc/clnt.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_page.h>
+#include <linux/smp_lock.h>
+#include <linux/module.h>
 
 #include <asm/system.h>
+#include <linux/module.h>
 #include "pnfs.h"
 
 #include "nfs4_fs.h"
@@ -159,24 +162,20 @@ static void nfs_readpage_release(struct nfs_page *req)
 	nfs_release_request(req);
 }
 
-/*
- * Set up the NFS read request struct
- */
-static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
-		const struct rpc_call_ops *call_ops,
-		unsigned int count, unsigned int offset)
+int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
+		      const struct rpc_call_ops *call_ops)
 {
-	struct inode *inode = req->wb_context->path.dentry->d_inode;
+	struct inode *inode = data->inode;
 	int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
 	struct rpc_task *task;
 	struct rpc_message msg = {
 		.rpc_argp = &data->args,
 		.rpc_resp = &data->res,
-		.rpc_cred = req->wb_context->cred,
+		.rpc_cred = data->cred,
 	};
 	struct rpc_task_setup task_setup_data = {
 		.task = &data->task,
-		.rpc_client = NFS_CLIENT(inode),
+		.rpc_client = clnt,
 		.rpc_message = &msg,
 		.callback_ops = call_ops,
 		.callback_data = data,
@@ -184,9 +183,46 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
 		.flags = RPC_TASK_ASYNC | swap_flags,
 	};
 
+	/* Set up the initial task struct. */
+	NFS_PROTO(inode)->read_setup(data, &msg);
+
+	dprintk("NFS: %5u initiated read call (req %s/%Ld, %u bytes @ offset %Lu)\n",
+			data->task.tk_pid,
+			inode->i_sb->s_id,
+			(long long)NFS_FILEID(inode),
+			data->args.count,
+			(unsigned long long)data->args.offset);
+
+	task = rpc_run_task(&task_setup_data);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+	rpc_put_task(task);
+	return 0;
+}
+EXPORT_SYMBOL(nfs_initiate_read);
+
+int pnfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
+		       const struct rpc_call_ops *call_ops)
+{
+	if (data->req->wb_lseg &&
+	    (pnfs_try_to_read_data(data, call_ops) == PNFS_ATTEMPTED))
+		return 0;
+
+	return nfs_initiate_read(data, clnt, call_ops);
+}
+
+/*
+ * Set up the NFS read request struct
+ */
+static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
+		const struct rpc_call_ops *call_ops,
+		unsigned int count, unsigned int offset)
+{
+	struct inode *inode = req->wb_context->path.dentry->d_inode;
+
 	data->req	  = req;
 	data->inode	  = inode;
-	data->cred	  = msg.rpc_cred;
+	data->cred	  = req->wb_context->cred;
 
 	data->args.fh     = NFS_FH(inode);
 	data->args.offset = req_offset(req) + offset;
@@ -201,21 +237,7 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
 	data->res.eof     = 0;
 	nfs_fattr_init(&data->fattr);
 
-	/* Set up the initial task struct. */
-	NFS_PROTO(inode)->read_setup(data, &msg);
-
-	dprintk("NFS: %5u initiated read call (req %s/%Ld, %u bytes @ offset %Lu)\n",
-			data->task.tk_pid,
-			inode->i_sb->s_id,
-			(long long)NFS_FILEID(inode),
-			count,
-			(unsigned long long)data->args.offset);
-
-	task = rpc_run_task(&task_setup_data);
-	if (IS_ERR(task))
-		return PTR_ERR(task);
-	rpc_put_task(task);
-	return 0;
+	return pnfs_initiate_read(data, NFS_CLIENT(inode), call_ops);
 }
 
 static void
@@ -359,7 +381,14 @@ static void nfs_readpage_retry(struct rpc_task *task, struct nfs_read_data *data
 {
 	struct nfs_readargs *argp = &data->args;
 	struct nfs_readres *resp = &data->res;
+	struct nfs_client *clp = NFS_SERVER(data->inode)->nfs_client;
 
+#ifdef CONFIG_NFS_V4_1
+	if (data->fldata.ds_nfs_client) {
+		dprintk("%s DS read\n", __func__);
+		clp = data->fldata.ds_nfs_client;
+	}
+#endif /* CONFIG_NFS_V4_1 */
 	if (resp->eof || resp->count == argp->count)
 		return;
 
@@ -373,7 +402,7 @@ static void nfs_readpage_retry(struct rpc_task *task, struct nfs_read_data *data
 	argp->offset += resp->count;
 	argp->pgbase += resp->count;
 	argp->count -= resp->count;
-	nfs_restart_rpc(task, NFS_SERVER(data->inode)->nfs_client);
+	nfs_restart_rpc(task, clp);
 }
 
 /*
@@ -414,13 +443,19 @@ static void nfs_readpage_release_partial(void *calldata)
 void nfs_read_prepare(struct rpc_task *task, void *calldata)
 {
 	struct nfs_read_data *data = calldata;
+	struct nfs4_session *ds_session = NULL;
 
-	if (nfs4_setup_sequence(NFS_SERVER(data->inode), NULL,
+	if (data->fldata.ds_nfs_client) {
+		dprintk("%s DS read\n", __func__);
+		ds_session = data->fldata.ds_nfs_client->cl_session;
+	}
+	if (nfs4_setup_sequence(NFS_SERVER(data->inode), ds_session,
 				&data->args.seq_args, &data->res.seq_res,
 				0, task))
 		return;
 	rpc_call_start(task);
 }
+EXPORT_SYMBOL(nfs_read_prepare);
 #endif /* CONFIG_NFS_V4_1 */
 
 static const struct rpc_call_ops nfs_read_partial_ops = {
@@ -641,6 +676,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
 	ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
 
 	nfs_pageio_complete(&pgio);
+	put_lseg(pgio.pg_lseg);
 	npages = (pgio.pg_bytes_written + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	nfs_add_stats(inode, NFSIOS_READPAGES, npages);
 read_complete:
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 9e3fff4..2bd068d 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -15,6 +15,11 @@
 #include <linux/pnfs_xdr.h>
 #include <linux/nfs_page.h>
 
+enum pnfs_try_status {
+	PNFS_ATTEMPTED     = 0,
+	PNFS_NOT_ATTEMPTED = 1,
+};
+
 /* Per-layout driver specific registration structure */
 struct pnfs_layoutdriver_type {
 	const u32 id;
@@ -104,6 +109,18 @@ LSEG_LD_DATA(struct pnfs_layout_segment *lseg)
  * Either the pagecache or non-pagecache read/write operations must be implemented
  */
 struct layoutdriver_io_operations {
+	/* Functions that use the pagecache.
+	 * If use_pagecache == 1, then these functions must be implemented.
+	 */
+	/* read and write pagelist should return just 0 (to indicate that
+	 * the layout code has taken control) or 1 (to indicate that the
+	 * layout code wishes to fall back to normal nfs.)  If 0 is returned,
+	 * information can be passed back through nfs_data->res and
+	 * nfs_data->task.tk_status, and the appropriate pnfs done function
+	 * MUST be called.
+	 */
+	enum pnfs_try_status
+	(*read_pagelist) (struct nfs_read_data *nfs_data, unsigned nr_pages);
 	/* Layout information. For each inode, alloc_layout is executed once to retrieve an
 	 * inode specific layout structure.  Each subsequent layoutget operation results in
 	 * a set_layout call to set the opaque layout in the layout driver.*/
diff --git a/include/linux/nfs_iostat.h b/include/linux/nfs_iostat.h
index 68b10f5..37a1437 100644
--- a/include/linux/nfs_iostat.h
+++ b/include/linux/nfs_iostat.h
@@ -113,6 +113,7 @@ enum nfs_stat_eventcounters {
 	NFSIOS_SHORTREAD,
 	NFSIOS_SHORTWRITE,
 	NFSIOS_DELAY,
+	NFSIOS_PNFS_READ,
 	__NFSIOS_COUNTSMAX,
 };
 
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index f1054d4..2de5313 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -972,6 +972,23 @@ struct nfs_page;
 
 #define NFS_PAGEVEC_SIZE	(8U)
 
+#if defined(CONFIG_NFS_V4_1)
+
+/* pnfs-specific data needed for read, write, and commit calls */
+struct pnfs_call_data {
+	struct pnfs_layout_segment *lseg;
+	const struct rpc_call_ops *call_ops;
+	u32			orig_count;	/* for retry via MDS */
+	u8			how;		/* for FLUSH_STABLE */
+};
+
+/* files layout-type specific data for read, write, and commit */
+struct pnfs_fl_call_data {
+	struct nfs_client	*ds_nfs_client;
+	__u64			orig_offset;
+};
+#endif /* CONFIG_NFS_V4_1 */
+
 struct nfs_read_data {
 	int			flags;
 	struct rpc_task		task;
@@ -987,6 +1004,10 @@ struct nfs_read_data {
 #ifdef CONFIG_NFS_V4
 	unsigned long		timestamp;	/* For lease renewal */
 #endif
+#if defined(CONFIG_NFS_V4_1)
+	struct pnfs_call_data	pdata;
+	struct pnfs_fl_call_data fldata;
+#endif /* CONFIG_NFS_V4_1 */
 	struct page		*page_array[NFS_PAGEVEC_SIZE];
 };
 
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 37/50] pnfs_submit: filelayout read
  2010-08-13 21:31                                                                       ` [PATCH 36/50] pnfs_submit: generic read andros
@ 2010-08-13 21:31                                                                         ` andros
  2010-08-13 21:31                                                                           ` [PATCH 38/50] pnfs_submit: generic write andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
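For illustration only (not for the changelog): the offset handling in
filelayout_read_pagelist() and filelayout_read_call_done() amounts to
"save the file offset, send the data server offset, restore before
chaining to the generic callback". A userspace sketch under stated
assumptions follows. All demo_* names are hypothetical, and the data
server offset is passed in by the caller rather than computed by
filelayout_get_dserver_offset().

```c
#include <assert.h>
#include <stdint.h>

struct demo_read_data;

/* Hypothetical stand-in for the rpc_call_done hook of struct rpc_call_ops. */
struct demo_call_ops {
	void (*call_done)(struct demo_read_data *rd);
};

/* Hypothetical stand-in for the relevant nfs_read_data fields. */
struct demo_read_data {
	uint64_t offset;			/* models args.offset */
	uint64_t orig_offset;			/* models fldata.orig_offset */
	uint64_t offset_on_wire;		/* what the data server would see */
	uint64_t offset_seen_by_generic;	/* what the generic callback sees */
	const struct demo_call_ops *call_ops;	/* models pdata.call_ops */
};

/* Models the generic NFS read completion callback. */
static void demo_generic_done(struct demo_read_data *rd)
{
	rd->offset_seen_by_generic = rd->offset;
}

static const struct demo_call_ops demo_generic_ops = {
	.call_done = demo_generic_done,
};

/* Models filelayout_read_call_done(): restore, then chain. */
static void demo_fl_call_done(struct demo_read_data *rd)
{
	if (rd->orig_offset)
		rd->offset = rd->orig_offset;
	rd->call_ops->call_done(rd);
}

/* Models the offset rewrite done before the read is sent to the DS. */
static void demo_fl_read_pagelist(struct demo_read_data *rd, uint64_t ds_offset)
{
	rd->orig_offset = rd->offset;
	rd->offset = ds_offset;
	rd->offset_on_wire = rd->offset;
}

/* Runs one rewrite and completion cycle and returns the final state. */
static struct demo_read_data demo_round_trip(uint64_t file_offset,
					     uint64_t ds_offset)
{
	struct demo_read_data rd = {
		.offset   = file_offset,
		.call_ops = &demo_generic_ops,
	};

	demo_fl_read_pagelist(&rd, ds_offset);
	demo_fl_call_done(&rd);
	return rd;
}
```

The point of the wrapper ops is that the generic completion code never
sees the rewritten offset: in the sketch the data server gets ds_offset,
and the chained callback gets the original file offset back.
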
 fs/nfs/nfs4filelayout.c |   88 +++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 88 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index d1c0d35..c9251ca 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -124,6 +124,93 @@ filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
 }
 
 /*
+ * Call ops for the async read/write cases
+ * In the case of dense layouts, the offset needs to be reset to its
+ * original value.
+ */
+static void filelayout_read_call_done(struct rpc_task *task, void *data)
+{
+	struct nfs_read_data *rdata = (struct nfs_read_data *)data;
+
+	if (rdata->fldata.orig_offset) {
+		dprintk("%s new off %llu orig offset %llu\n", __func__,
+			rdata->args.offset, rdata->fldata.orig_offset);
+		rdata->args.offset = rdata->fldata.orig_offset;
+	}
+
+	/* Note this may cause RPC to be resent */
+	rdata->pdata.call_ops->rpc_call_done(task, data);
+}
+
+static void filelayout_read_release(void *data)
+{
+	struct nfs_read_data *rdata = (struct nfs_read_data *)data;
+
+	put_lseg(rdata->pdata.lseg);
+	rdata->pdata.lseg = NULL;
+	rdata->pdata.call_ops->rpc_release(data);
+}
+
+struct rpc_call_ops filelayout_read_call_ops = {
+	.rpc_call_prepare = nfs_read_prepare,
+	.rpc_call_done = filelayout_read_call_done,
+	.rpc_release = filelayout_read_release,
+};
+
+/* Perform sync or async reads.
+ *
+ * An optimization for the NFS file layout driver
+ * allows the original read/write data structs to be passed in the
+ * last argument.
+ *
+ * TODO: join with write_pagelist?
+ */
+static enum pnfs_try_status
+filelayout_read_pagelist(struct nfs_read_data *data, unsigned nr_pages)
+{
+	struct pnfs_layout_segment *lseg = data->pdata.lseg;
+	struct nfs4_pnfs_ds *ds;
+	loff_t offset = data->args.offset;
+	u32 idx;
+	struct nfs_fh *fh;
+
+	dprintk("--> %s ino %lu nr_pages %d pgbase %u req %Zu@%llu\n",
+		__func__, data->inode->i_ino, nr_pages,
+		data->args.pgbase, (size_t)data->args.count, offset);
+
+	/* Retrieve the correct rpc_client for the byte range */
+	idx = nfs4_fl_calc_ds_index(lseg, offset);
+	ds = nfs4_fl_prepare_ds(lseg, idx);
+	if (!ds) {
+		printk(KERN_ERR "%s: prepare_ds failed, use MDS\n", __func__);
+		return PNFS_NOT_ATTEMPTED;
+	}
+	dprintk("%s USE DS:ip %x %s\n", __func__,
+		htonl(ds->ds_ip_addr), ds->r_addr);
+
+	/* just try the first data server for the index..*/
+	data->fldata.ds_nfs_client = ds->ds_clp;
+	fh = nfs4_fl_select_ds_fh(lseg, offset);
+	if (fh)
+		data->args.fh = fh;
+
+	/*
+	 * Now get the file offset on the dserver
+	 * Set the read offset to this offset, and
+	 * save the original offset in orig_offset
+	 * In the case of async reads, the offset will be reset in the
+	 * call_ops->rpc_call_done() routine.
+	 */
+	data->args.offset = filelayout_get_dserver_offset(lseg, offset);
+	data->fldata.orig_offset = offset;
+
+	/* Perform an asynchronous read */
+	nfs_initiate_read(data, ds->ds_clp->cl_rpcclient,
+			  &filelayout_read_call_ops);
+	return PNFS_ATTEMPTED;
+}
+
+/*
  * Create a filelayout layout structure and return it.  The pNFS client
  * will use the pnfs_layout_type type to refer to the layout for this
  * inode from now on.
@@ -377,6 +464,7 @@ filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
 }
 
 struct layoutdriver_io_operations filelayout_io_operations = {
+	.read_pagelist           = filelayout_read_pagelist,
 	.alloc_layout            = filelayout_alloc_layout,
 	.free_layout             = filelayout_free_layout,
 	.alloc_lseg              = filelayout_alloc_lseg,
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 38/50] pnfs_submit: generic write
  2010-08-13 21:31                                                                         ` [PATCH 37/50] pnfs_submit: filelayout read andros
@ 2010-08-13 21:31                                                                           ` andros
  2010-08-13 21:31                                                                             ` [PATCH 39/50] pnfs_submit: data server write with no getattr andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
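For illustration only (not for the changelog): the completion logic that
nfs4_write_done() gains in this patch picks the DS client when one is
attached to the request, then either renews the MDS lease or marks the
inode for layoutcommit. A minimal userspace sketch of that decision is
below. The demo_* names and the opaque client pointers are hypothetical,
and error handling and restart are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the two post-write actions in nfs4_write_done(). */
enum demo_post_write {
	DEMO_NONE,		/* task failed: nothing to update */
	DEMO_RENEW_LEASE,	/* MDS write: renew lease, update attributes */
	DEMO_MARK_LAYOUTCOMMIT,	/* DS write: record last byte, need layoutcommit */
};

/* Opaque objects standing in for the MDS and DS struct nfs_client. */
static int demo_mds_obj, demo_ds_obj;

static enum demo_post_write
demo_write_done_action(int tk_status, const void *mds_client,
		       const void *ds_client)
{
	/* A non-NULL DS client overrides the MDS client for this request. */
	const void *client = ds_client ? ds_client : mds_client;

	if (tk_status < 0)
		return DEMO_NONE;
	return client == mds_client ? DEMO_RENEW_LEASE
				    : DEMO_MARK_LAYOUTCOMMIT;
}
```

For example, a successful write with no DS client attached maps to
DEMO_RENEW_LEASE, and the same write sent to a data server maps to
DEMO_MARK_LAYOUTCOMMIT.
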
 fs/nfs/internal.h          |    9 +++
 fs/nfs/nfs4proc.c          |   43 +++++++++++++--
 fs/nfs/pnfs.c              |   44 +++++++++++++++
 fs/nfs/pnfs.h              |    9 +++
 fs/nfs/write.c             |  125 ++++++++++++++++++++++++++++++--------------
 include/linux/nfs4_pnfs.h  |    4 ++
 include/linux/nfs_iostat.h |    1 +
 include/linux/nfs_xdr.h    |    4 ++
 8 files changed, 194 insertions(+), 45 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 37f9926..02f0da8 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -268,7 +268,16 @@ extern int pnfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
 extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
 
 /* write.c */
+extern int nfs_initiate_write(struct nfs_write_data *data,
+			      struct rpc_clnt *clnt,
+			      const struct rpc_call_ops *call_ops,
+			      int how);
+extern int pnfs_initiate_write(struct nfs_write_data *data,
+			      struct rpc_clnt *clnt,
+			      const struct rpc_call_ops *call_ops,
+			      int how);
 extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
+extern void nfs_mark_list_commit(struct list_head *head);
 #ifdef CONFIG_MIGRATION
 extern int nfs_migrate_page(struct address_space *,
 		struct page *, struct page *);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 4346a82..a6a0e7e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3161,20 +3161,53 @@ static void nfs4_proc_read_setup(struct nfs_read_data *data, struct rpc_message
 	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
 }
 
+static void pnfs4_update_write_done(struct nfs_inode *nfsi, struct nfs_write_data *data)
+{
+#ifdef CONFIG_NFS_V4_1
+	pnfs_update_last_write(nfsi, data->args.offset, data->res.count);
+	pnfs_need_layoutcommit(nfsi, data->args.context);
+#endif /* CONFIG_NFS_V4_1 */
+}
+
 static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
 {
 	struct inode *inode = data->inode;
-	
+	struct nfs_server *server = NFS_SERVER(inode);
+	struct nfs_client *client = server->nfs_client;
+
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
-	if (nfs4_async_handle_error(task, NFS_SERVER(inode), data->args.context->state, NULL) == -EAGAIN) {
-		nfs_restart_rpc(task, NFS_SERVER(inode)->nfs_client);
+#ifdef CONFIG_NFS_V4_1
+	/* restore original count after retry? */
+	if (data->pdata.orig_count) {
+		dprintk("%s: restoring original count %u\n", __func__,
+			data->pdata.orig_count);
+		data->args.count = data->pdata.orig_count;
+	}
+
+	/* Is this a DS session */
+	if (data->fldata.ds_nfs_client) {
+		dprintk("%s DS write\n", __func__);
+		client = data->fldata.ds_nfs_client;
+	}
+#endif /* CONFIG_NFS_V4_1 */
+
+	if (nfs4_async_handle_error(task, server, data->args.context->state, client) == -EAGAIN) {
+		nfs_restart_rpc(task, client);
 		return -EAGAIN;
 	}
+
+	/*
+	 * MDS write: renew lease
+	 * DS write: update lastbyte written, mark for layout commit
+	 */
 	if (task->tk_status >= 0) {
-		renew_lease(NFS_SERVER(inode), data->timestamp);
-		nfs_post_op_update_inode_force_wcc(inode, data->res.fattr);
+		if (client == server->nfs_client) {
+			renew_lease(server, data->timestamp);
+			nfs_post_op_update_inode_force_wcc(inode, data->res.fattr);
+		} else
+			pnfs4_update_write_done(NFS_I(inode), data);
 	}
 	return 0;
 }
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 6725539..424efce 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1245,6 +1245,50 @@ static void _pnfs_clear_lseg_from_pages(struct list_head *head)
 }
 
 /*
+ * Call the appropriate parallel I/O subsystem write function.
+ * If no I/O device driver exists, or one does not match the returned
+ * fstype, then return a positive status for regular NFS processing.
+ *
+ * TODO: Is wdata->how and wdata->args.stable always the same value?
+ * TODO: It seems in NFS, the server may not do a stable write even
+ * though it was requested (and vice-versa?).  To check, it looks
+ * in data->res.verf->committed.  Do we need this ability
+ * for non-file layout drivers?
+ */
+enum pnfs_try_status
+pnfs_try_to_write_data(struct nfs_write_data *wdata,
+			const struct rpc_call_ops *call_ops, int how)
+{
+	struct inode *inode = wdata->inode;
+	enum pnfs_try_status trypnfs;
+	struct nfs_server *nfss = NFS_SERVER(inode);
+	struct pnfs_layout_segment *lseg = wdata->req->wb_lseg;
+
+	wdata->pdata.call_ops = call_ops;
+	wdata->pdata.how = how;
+
+	dprintk("%s: Writing ino:%lu %u@%llu (how %d)\n", __func__,
+		inode->i_ino, wdata->args.count, wdata->args.offset, how);
+
+	get_lseg(lseg);
+
+	wdata->pdata.lseg = lseg;
+	trypnfs = nfss->pnfs_curr_ld->ld_io_ops->write_pagelist(wdata,
+		nfs_page_array_len(wdata->args.pgbase, wdata->args.count),
+								how);
+
+	if (trypnfs == PNFS_NOT_ATTEMPTED) {
+		wdata->pdata.lseg = NULL;
+		put_lseg(lseg);
+		_pnfs_clear_lseg_from_pages(&wdata->pages);
+	} else {
+		nfs_inc_stats(inode, NFSIOS_PNFS_WRITE);
+	}
+	dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
+	return trypnfs;
+}
+
+/*
  * Call the appropriate parallel I/O subsystem read function.
  * If no I/O device driver exists, or one does match the returned
  * fstype, then return a positive status for regular NFS processing.
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index b7a3769..b110f4e 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -40,6 +40,8 @@ int _pnfs_return_layout(struct inode *, struct nfs4_pnfs_layout_segment *,
 			enum pnfs_layoutreturn_type, bool wait);
 void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
 void unmount_pnfs_layoutdriver(struct nfs_server *);
+enum pnfs_try_status pnfs_try_to_write_data(struct nfs_write_data *,
+					     const struct rpc_call_ops *, int);
 enum pnfs_try_status pnfs_try_to_read_data(struct nfs_read_data *,
 					    const struct rpc_call_ops *);
 int pnfs_initialize(void);
@@ -158,6 +160,13 @@ pnfs_try_to_read_data(struct nfs_read_data *data,
 }
 
 static inline enum pnfs_try_status
+pnfs_try_to_write_data(struct nfs_write_data *data,
+		       const struct rpc_call_ops *call_ops, int how)
+{
+	return PNFS_NOT_ATTEMPTED;
+}
+
+static inline enum pnfs_try_status
 pnfs_try_to_commit(struct nfs_write_data *data,
 		   const struct rpc_call_ops *call_ops, int how)
 {
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index a1f28c5..fbc8657 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -802,25 +802,21 @@ static int flush_task_priority(int how)
 	return RPC_PRIORITY_NORMAL;
 }
 
-/*
- * Set up the argument/result storage required for the RPC call.
- */
-static int nfs_write_rpcsetup(struct nfs_page *req,
-		struct nfs_write_data *data,
-		const struct rpc_call_ops *call_ops,
-		unsigned int count, unsigned int offset,
-		int how)
+int nfs_initiate_write(struct nfs_write_data *data,
+		       struct rpc_clnt *clnt,
+		       const struct rpc_call_ops *call_ops,
+		       int how)
 {
-	struct inode *inode = req->wb_context->path.dentry->d_inode;
+	struct inode *inode = data->inode;
 	int priority = flush_task_priority(how);
 	struct rpc_task *task;
 	struct rpc_message msg = {
 		.rpc_argp = &data->args,
 		.rpc_resp = &data->res,
-		.rpc_cred = req->wb_context->cred,
+		.rpc_cred = data->cred,
 	};
 	struct rpc_task_setup task_setup_data = {
-		.rpc_client = NFS_CLIENT(inode),
+		.rpc_client = clnt,
 		.task = &data->task,
 		.rpc_message = &msg,
 		.callback_ops = call_ops,
@@ -831,12 +827,62 @@ static int nfs_write_rpcsetup(struct nfs_page *req,
 	};
 	int ret = 0;
 
+	/* Set up the initial task struct.  */
+	NFS_PROTO(inode)->write_setup(data, &msg);
+
+	dprintk("NFS: %5u initiated write call "
+		"(req %s/%lld, %u bytes @ offset %llu)\n",
+		data->task.tk_pid,
+		inode->i_sb->s_id,
+		(long long)NFS_FILEID(inode),
+		data->args.count,
+		(unsigned long long)data->args.offset);
+
+	task = rpc_run_task(&task_setup_data);
+	if (IS_ERR(task)) {
+		ret = PTR_ERR(task);
+		goto out;
+	}
+	if (how & FLUSH_SYNC) {
+		ret = rpc_wait_for_completion_task(task);
+		if (ret == 0)
+			ret = task->tk_status;
+	}
+	rpc_put_task(task);
+out:
+	return ret;
+}
+EXPORT_SYMBOL(nfs_initiate_write);
+
+int pnfs_initiate_write(struct nfs_write_data *data,
+			struct rpc_clnt *clnt,
+			const struct rpc_call_ops *call_ops,
+			int how)
+{
+	if (data->req->wb_lseg &&
+	    (pnfs_try_to_write_data(data, call_ops, how) == PNFS_ATTEMPTED))
+		return 0;
+
+	return nfs_initiate_write(data, clnt, call_ops, how);
+}
+
+/*
+ * Set up the argument/result storage required for the RPC call.
+ */
+static int nfs_write_rpcsetup(struct nfs_page *req,
+		struct nfs_write_data *data,
+		const struct rpc_call_ops *call_ops,
+		unsigned int count, unsigned int offset,
+		int how)
+{
+	struct inode *inode = req->wb_context->path.dentry->d_inode;
+
 	/* Set up the RPC argument and reply structs
 	 * NB: take care not to mess about with data->commit et al. */
 
 	data->req = req;
 	data->inode = inode = req->wb_context->path.dentry->d_inode;
-	data->cred = msg.rpc_cred;
+	data->cred = req->wb_context->cred;
 
 	data->args.fh     = NFS_FH(inode);
 	data->args.offset = req_offset(req) + offset;
@@ -857,30 +903,7 @@ static int nfs_write_rpcsetup(struct nfs_page *req,
 	data->res.verf    = &data->verf;
 	nfs_fattr_init(&data->fattr);
 
-	/* Set up the initial task struct.  */
-	NFS_PROTO(inode)->write_setup(data, &msg);
-
-	dprintk("NFS: %5u initiated write call "
-		"(req %s/%lld, %u bytes @ offset %llu)\n",
-		data->task.tk_pid,
-		inode->i_sb->s_id,
-		(long long)NFS_FILEID(inode),
-		count,
-		(unsigned long long)data->args.offset);
-
-	task = rpc_run_task(&task_setup_data);
-	if (IS_ERR(task)) {
-		ret = PTR_ERR(task);
-		goto out;
-	}
-	if (how & FLUSH_SYNC) {
-		ret = rpc_wait_for_completion_task(task);
-		if (ret == 0)
-			ret = task->tk_status;
-	}
-	rpc_put_task(task);
-out:
-	return ret;
+	return pnfs_initiate_write(data, NFS_CLIENT(inode), call_ops, how);
 }
 
 /* If a nfs_flush_* function fails, it should remove reqs from @head and
@@ -1073,13 +1096,27 @@ out:
 void nfs_write_prepare(struct rpc_task *task, void *calldata)
 {
 	struct nfs_write_data *data = calldata;
+	struct nfs4_session *ds_session = NULL;
+
+	if (data->fldata.ds_nfs_client) {
+		dprintk("%s DS write\n", __func__);
+		ds_session = data->fldata.ds_nfs_client->cl_session;
+	} else if (data->args.count > NFS_SERVER(data->inode)->wsize) {
+		/* retrying via MDS? */
+		data->pdata.orig_count = data->args.count;
+		data->args.count = NFS_SERVER(data->inode)->wsize;
+		dprintk("%s: trimmed count %u to wsize %u\n", __func__,
+		data->pdata.orig_count, data->args.count);
+	} else
+		data->pdata.orig_count = 0;
 
-	if (nfs4_setup_sequence(NFS_SERVER(data->inode), NULL,
+	if (nfs4_setup_sequence(NFS_SERVER(data->inode), ds_session,
 				&data->args.seq_args,
 				&data->res.seq_res, 1, task))
 		return;
 	rpc_call_start(task);
 }
+EXPORT_SYMBOL(nfs_write_prepare);
 #endif /* CONFIG_NFS_V4_1 */
 
 static const struct rpc_call_ops nfs_write_partial_ops = {
@@ -1163,10 +1200,11 @@ int nfs_writeback_done(struct rpc_task *task, struct nfs_write_data *data)
 	struct nfs_writeargs	*argp = &data->args;
 	struct nfs_writeres	*resp = &data->res;
 	struct nfs_server	*server = NFS_SERVER(data->inode);
+	struct nfs_client	*clp = server->nfs_client;
 	int status;
 
-	dprintk("NFS: %5u nfs_writeback_done (status %d)\n",
-		task->tk_pid, task->tk_status);
+	dprintk("NFS: %5u nfs_writeback_done (status %d count %u)\n",
+		task->tk_pid, task->tk_status, resp->count);
 
 	/*
 	 * ->write_done will attempt to use post-op attributes to detect
@@ -1179,6 +1217,13 @@ int nfs_writeback_done(struct rpc_task *task, struct nfs_write_data *data)
 	if (status != 0)
 		return status;
 	nfs_add_stats(data->inode, NFSIOS_SERVERWRITTENBYTES, resp->count);
+#ifdef CONFIG_NFS_V4_1
+	/* Is this a DS session? */
+	if (data->fldata.ds_nfs_client) {
+		dprintk("%s DS write\n", __func__);
+		clp = data->fldata.ds_nfs_client;
+	}
+#endif /* CONFIG_NFS_V4_1 */
 
 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
 	if (resp->verf->committed < argp->stable && task->tk_status >= 0) {
@@ -1195,7 +1240,7 @@ int nfs_writeback_done(struct rpc_task *task, struct nfs_write_data *data)
 		if (time_before(complain, jiffies)) {
 			dprintk("NFS:       faulty NFS server %s:"
 				" (committed = %d) != (stable = %d)\n",
-				server->nfs_client->cl_hostname,
+				clp->cl_hostname,
 				resp->verf->committed, argp->stable);
 			complain = jiffies + 300 * HZ;
 		}
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 2bd068d..b010ff1 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -121,6 +121,10 @@ struct layoutdriver_io_operations {
 	 */
 	enum pnfs_try_status
 	(*read_pagelist) (struct nfs_read_data *nfs_data, unsigned nr_pages);
+	enum pnfs_try_status
+	(*write_pagelist) (struct nfs_write_data *nfs_data, unsigned nr_pages, int how);
+
+
 	/* Layout information. For each inode, alloc_layout is executed once to retrieve an
 	 * inode specific layout structure.  Each subsequent layoutget operation results in
 	 * a set_layout call to set the opaque layout in the layout driver.*/
diff --git a/include/linux/nfs_iostat.h b/include/linux/nfs_iostat.h
index 37a1437..8866bb3 100644
--- a/include/linux/nfs_iostat.h
+++ b/include/linux/nfs_iostat.h
@@ -114,6 +114,7 @@ enum nfs_stat_eventcounters {
 	NFSIOS_SHORTWRITE,
 	NFSIOS_DELAY,
 	NFSIOS_PNFS_READ,
+	NFSIOS_PNFS_WRITE,
 	__NFSIOS_COUNTSMAX,
 };
 
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 2de5313..544d282 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1027,6 +1027,10 @@ struct nfs_write_data {
 #ifdef CONFIG_NFS_V4
 	unsigned long		timestamp;	/* For lease renewal */
 #endif
+#if defined(CONFIG_NFS_V4_1)
+	struct pnfs_call_data	pdata;
+	struct pnfs_fl_call_data fldata;
+#endif /* CONFIG_NFS_V4_1 */
 	struct page		*page_array[NFS_PAGEVEC_SIZE];
 };
 
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread
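
A note on the dispatch added by the generic write patch above: pnfs_initiate_write() hands the request to the layout driver only when the page has a layout segment, and otherwise (or when the driver declines) falls back to the ordinary MDS write. A minimal user-space sketch of that control flow follows. The struct and helper names (write_req, try_layout_driver, initiate_mds_write, initiate_write) are hypothetical stand-ins for illustration, not the kernel definitions.

```c
/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum pnfs_try_status { PNFS_ATTEMPTED = 0, PNFS_NOT_ATTEMPTED = 1 };

struct write_req {
	int has_lseg;      /* models data->req->wb_lseg != NULL */
	int driver_ok;     /* models the layout driver accepting the I/O */
	int sent_via_mds;  /* set when the fallback path ran */
};

/* Models pnfs_try_to_write_data(): the driver may decline the request. */
static enum pnfs_try_status try_layout_driver(struct write_req *w)
{
	return w->driver_ok ? PNFS_ATTEMPTED : PNFS_NOT_ATTEMPTED;
}

/* Models nfs_initiate_write(): the plain write through the MDS. */
static int initiate_mds_write(struct write_req *w)
{
	w->sent_via_mds = 1;
	return 0;
}

/* Models pnfs_initiate_write(): try pNFS first, then fall back. */
static int initiate_write(struct write_req *w)
{
	if (w->has_lseg && try_layout_driver(w) == PNFS_ATTEMPTED)
		return 0;
	return initiate_mds_write(w);
}
```

In this sketch the fallback runs in two cases: the request has no layout segment, or the driver returns PNFS_NOT_ATTEMPTED. That matches the condition in the patch, where the non-pNFS stub of pnfs_try_to_write_data() always returns PNFS_NOT_ATTEMPTED.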

* [PATCH 39/50] pnfs_submit: data server write with no getattr
  2010-08-13 21:31                                                                           ` [PATCH 38/50] pnfs_submit: generic write andros
@ 2010-08-13 21:31                                                                             ` andros
  2010-08-13 21:31                                                                               ` [PATCH 40/50] pnfs_submit: filelayout write andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4proc.c    |    8 ++++++-
 fs/nfs/nfs4xdr.c     |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/nfs4.h |    1 +
 3 files changed, 66 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a6a0e7e..44ffa33 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3220,13 +3220,19 @@ static void nfs4_proc_write_setup(struct nfs_write_data *data, struct rpc_messag
 	data->res.server = server;
 	data->timestamp   = jiffies;
 
+#ifdef CONFIG_NFS_V4_1
+	/* writes to DS use pnfs vector */
+	if (data->fldata.ds_nfs_client) {
+		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_PNFS_WRITE];
+		return;
+	}
+#endif /* CONFIG_NFS_V4_1 */
 	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_WRITE];
 }
 
 static int nfs4_commit_done(struct rpc_task *task, struct nfs_write_data *data)
 {
 	struct inode *inode = data->inode;
-	
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 82a3412..520b589 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -757,6 +757,14 @@ static int nfs4_stat_to_errno(int);
 				decode_sequence_maxsz + \
 				decode_putfh_maxsz + \
 				decode_layoutreturn_maxsz)
+#define NFS4_enc_dswrite_sz	(compound_encode_hdr_maxsz + \
+				encode_sequence_maxsz +\
+				encode_putfh_maxsz + \
+				encode_write_maxsz)
+#define NFS4_dec_dswrite_sz	(compound_decode_hdr_maxsz + \
+				decode_sequence_maxsz + \
+				decode_putfh_maxsz + \
+				decode_write_maxsz)
 
 const u32 nfs41_maxwrite_overhead = ((RPC_MAX_HEADER_WITH_AUTH +
 				      compound_encode_hdr_maxsz +
@@ -2815,6 +2823,27 @@ static int nfs4_xdr_enc_layoutreturn(struct rpc_rqst *req, uint32_t *p,
 	encode_nops(&hdr);
 	return 0;
 }
+
+/*
+ * Encode a pNFS File Layout Data Server WRITE request
+ */
+static int nfs4_xdr_enc_dswrite(struct rpc_rqst *req, uint32_t *p,
+				struct nfs_writeargs *args)
+{
+	struct xdr_stream xdr;
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+
+	xdr_init_encode(&xdr, &req->rq_snd_buf, p);
+	encode_compound_hdr(&xdr, req, &hdr);
+	encode_sequence(&xdr, &args->seq_args, &hdr);
+	encode_putfh(&xdr, args->fh, &hdr);
+	encode_write(&xdr, args, &hdr);
+	encode_nops(&hdr);
+	return 0;
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 static void print_overflow_msg(const char *func, const struct xdr_stream *xdr)
@@ -6367,6 +6396,34 @@ static int nfs4_xdr_dec_layoutcommit(struct rpc_rqst *rqstp, uint32_t *p,
 out:
 	return status;
 }
+
+/*
+ * Decode pNFS File Layout Data Server WRITE response
+ */
+static int nfs4_xdr_dec_dswrite(struct rpc_rqst *rqstp, uint32_t *p,
+				struct nfs_writeres *res)
+{
+	struct xdr_stream xdr;
+	struct compound_hdr hdr;
+	int status;
+
+	xdr_init_decode(&xdr, &rqstp->rq_rcv_buf, p);
+	status = decode_compound_hdr(&xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(&xdr, &res->seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(&xdr);
+	if (status)
+		goto out;
+	status = decode_write(&xdr, res);
+	if (!status)
+		return res->count;
+out:
+	return status;
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 __be32 *nfs4_decode_dirent(__be32 *p, struct nfs_entry *entry, int plus)
@@ -6549,6 +6606,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
   PROC(PNFS_LAYOUTGET,  enc_layoutget,     dec_layoutget),
   PROC(PNFS_LAYOUTCOMMIT, enc_layoutcommit,  dec_layoutcommit),
   PROC(PNFS_LAYOUTRETURN, enc_layoutreturn,  dec_layoutreturn),
+  PROC(PNFS_WRITE, enc_dswrite,  dec_dswrite),
 #endif /* CONFIG_NFS_V4_1 */
 };
 
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 2737013..d5509b7 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -557,6 +557,7 @@ enum {
 	NFSPROC4_CLNT_PNFS_LAYOUTCOMMIT,
 	NFSPROC4_CLNT_PNFS_LAYOUTRETURN,
 	NFSPROC4_CLNT_PNFS_GETDEVICEINFO,
+	NFSPROC4_CLNT_PNFS_WRITE,
 };
 
 /* nfs41 types */
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 40/50] pnfs_submit: filelayout write
  2010-08-13 21:31                                                                             ` [PATCH 39/50] pnfs_submit: data server write with no getattr andros
@ 2010-08-13 21:31                                                                               ` andros
  2010-08-13 21:31                                                                                 ` [PATCH 41/50] pnfs_submit: signal layoutdriver commit andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |   71 +++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 71 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index c9251ca..8c5702b 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -151,12 +151,41 @@ static void filelayout_read_release(void *data)
 	rdata->pdata.call_ops->rpc_release(data);
 }
 
+static void filelayout_write_call_done(struct rpc_task *task, void *data)
+{
+	struct nfs_write_data *wdata = (struct nfs_write_data *)data;
+
+	if (wdata->fldata.orig_offset) {
+		dprintk("%s new off %llu orig offset %llu\n", __func__,
+			wdata->args.offset, wdata->fldata.orig_offset);
+		wdata->args.offset = wdata->fldata.orig_offset;
+	}
+
+	/* Note this may cause RPC to be resent */
+	wdata->pdata.call_ops->rpc_call_done(task, data);
+}
+
+static void filelayout_write_release(void *data)
+{
+	struct nfs_write_data *wdata = (struct nfs_write_data *)data;
+
+	put_lseg(wdata->pdata.lseg);
+	wdata->pdata.lseg = NULL;
+	wdata->pdata.call_ops->rpc_release(data);
+}
+
 struct rpc_call_ops filelayout_read_call_ops = {
 	.rpc_call_prepare = nfs_read_prepare,
 	.rpc_call_done = filelayout_read_call_done,
 	.rpc_release = filelayout_read_release,
 };
 
+struct rpc_call_ops filelayout_write_call_ops = {
+	.rpc_call_prepare = nfs_write_prepare,
+	.rpc_call_done = filelayout_write_call_done,
+	.rpc_release = filelayout_write_release,
+};
+
 /* Perform sync or async reads.
  *
  * An optimization for the NFS file layout driver
@@ -210,6 +239,47 @@ filelayout_read_pagelist(struct nfs_read_data *data, unsigned nr_pages)
 	return PNFS_ATTEMPTED;
 }
 
+/* Perform async writes. */
+static enum pnfs_try_status
+filelayout_write_pagelist(struct nfs_write_data *data, unsigned nr_pages, int sync)
+{
+	struct pnfs_layout_segment *lseg = data->pdata.lseg;
+	struct nfs4_pnfs_ds *ds;
+	loff_t offset = data->args.offset;
+	u32 idx;
+	struct nfs_fh *fh;
+
+	/* Retrieve the correct rpc_client for the byte range */
+	idx = nfs4_fl_calc_ds_index(lseg, offset);
+	ds = nfs4_fl_prepare_ds(lseg, idx);
+	if (!ds) {
+		printk(KERN_ERR "%s: prepare_ds failed, use MDS\n", __func__);
+		return PNFS_NOT_ATTEMPTED;
+	}
+	dprintk("%s ino %lu sync %d req %Zu@%llu DS:%x:%hu %s\n", __func__,
+		data->inode->i_ino, sync, (size_t) data->args.count, offset,
+		htonl(ds->ds_ip_addr), ntohs(ds->ds_port), ds->r_addr);
+
+	data->fldata.ds_nfs_client = ds->ds_clp;
+	fh = nfs4_fl_select_ds_fh(lseg, offset);
+	if (fh)
+		data->args.fh = fh;
+	/*
+	 * Get the file offset on the dserver. Set the write offset to
+	 * this offset and save the original offset.
+	 */
+	data->args.offset = filelayout_get_dserver_offset(lseg, offset);
+	data->fldata.orig_offset = offset;
+
+	/*
+	 * Perform an asynchronous write. The offset will be reset in the
+	 * call_ops->rpc_call_done() routine.
+	 */
+	nfs_initiate_write(data, ds->ds_clp->cl_rpcclient,
+			   &filelayout_write_call_ops, sync);
+	return PNFS_ATTEMPTED;
+}
+
 /*
  * Create a filelayout layout structure and return it.  The pNFS client
  * will use the pnfs_layout_type type to refer to the layout for this
@@ -465,6 +535,7 @@ filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
 
 struct layoutdriver_io_operations filelayout_io_operations = {
 	.read_pagelist           = filelayout_read_pagelist,
+	.write_pagelist          = filelayout_write_pagelist,
 	.alloc_layout            = filelayout_alloc_layout,
 	.free_layout             = filelayout_free_layout,
 	.alloc_lseg              = filelayout_alloc_lseg,
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 41/50] pnfs_submit: signal layoutdriver commit
  2010-08-13 21:31                                                                               ` [PATCH 40/50] pnfs_submit: filelayout write andros
@ 2010-08-13 21:31                                                                                 ` andros
  2010-08-13 21:31                                                                                   ` [PATCH 42/50] pnfs_submit: generic commit andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/pagelist.c        |    5 ++++-
 fs/nfs/write.c           |   12 +++++++-----
 include/linux/nfs_page.h |    3 ++-
 include/linux/nfs_xdr.h  |    2 ++
 4 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index a014814..96e375e 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -387,6 +387,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
  * @idx_start: lower bound of page->index to scan
  * @npages: idx_start + npages sets the upper bound to scan.
  * @tag: tag to scan for
+ * @use_pnfs: will be set TRUE if commit needs to be handled by layout driver
  *
  * Moves elements from one of the inode request lists.
  * If the number of requests is set to 0, the entire address_space
@@ -396,7 +397,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
  */
 int nfs_scan_list(struct nfs_inode *nfsi,
 		struct list_head *dst, pgoff_t idx_start,
-		unsigned int npages, int tag)
+		  unsigned int npages, int tag, int *use_pnfs)
 {
 	struct nfs_page *pgvec[NFS_SCAN_MAXENTRIES];
 	struct nfs_page *req;
@@ -427,6 +428,8 @@ int nfs_scan_list(struct nfs_inode *nfsi,
 				radix_tree_tag_clear(&nfsi->nfs_page_tree,
 						req->wb_index, tag);
 				nfs_list_add_request(req, dst);
+				if (req->wb_lseg)
+					*use_pnfs = 1;
 				res++;
 				if (res == INT_MAX)
 					goto out;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index fbc8657..18926d3 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -546,7 +546,7 @@ nfs_need_commit(struct nfs_inode *nfsi)
  * The requests are *not* checked to ensure that they form a contiguous set.
  */
 static int
-nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages)
+nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages, int *use_pnfs)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
 	int ret;
@@ -554,7 +554,8 @@ nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, u
 	if (!nfs_need_commit(nfsi))
 		return 0;
 
-	ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT);
+	ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT,
+			    use_pnfs);
 	if (ret > 0)
 		nfsi->ncommit -= ret;
 	if (nfs_need_commit(NFS_I(inode)))
@@ -1466,21 +1467,22 @@ int nfs_commit_inode(struct inode *inode, int how)
 	LIST_HEAD(head);
 	int may_wait = how & FLUSH_SYNC;
 	int res = 0;
+	int use_pnfs = 0;
 
 	if (!nfs_commit_set_lock(NFS_I(inode), may_wait))
 		goto out_mark_dirty;
 	spin_lock(&inode->i_lock);
-	res = nfs_scan_commit(inode, &head, 0, 0);
+	res = nfs_scan_commit(inode, &head, 0, 0, &use_pnfs);
 	spin_unlock(&inode->i_lock);
 	if (res) {
 		int error = nfs_commit_list(inode, &head, how);
 		if (error < 0)
 			return error;
-		if (may_wait)
+		if (may_wait) {
 			wait_on_bit(&NFS_I(inode)->flags, NFS_INO_COMMIT,
 					nfs_wait_bit_killable,
 					TASK_KILLABLE);
-		else
+		} else
 			goto out_mark_dirty;
 	} else
 		nfs_commit_clear_lock(NFS_I(inode));
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index ce0a1a5..287ee81 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -83,7 +83,8 @@ extern	void nfs_release_request(struct nfs_page *req);
 
 
 extern	int nfs_scan_list(struct nfs_inode *nfsi, struct list_head *dst,
-			  pgoff_t idx_start, unsigned int npages, int tag);
+			  pgoff_t idx_start, unsigned int npages, int tag,
+			  int *use_pnfs);
 extern	void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
 			     struct inode *inode,
 			     int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 544d282..e627788 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1012,6 +1012,8 @@ struct nfs_read_data {
 };
 
 struct nfs_write_data {
+	struct kref		refcount;	/* For pnfs commit splitting */
+	struct nfs_write_data	*parent;	/* For pnfs commit splitting */
 	int			flags;
 	struct rpc_task		task;
 	struct inode		*inode;
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 42/50] pnfs_submit: generic commit
  2010-08-13 21:31                                                                                 ` [PATCH 41/50] pnfs_submit: signal layoutdriver commit andros
@ 2010-08-13 21:31                                                                                   ` andros
  2010-08-13 21:31                                                                                     ` [PATCH 43/50] pnfs_submit: data server commit with no getattr andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/internal.h          |    8 ++++
 fs/nfs/nfs4proc.c          |   14 ++++++-
 fs/nfs/pnfs.c              |   37 ++++++++++++++++
 fs/nfs/pnfs.h              |    3 +
 fs/nfs/write.c             |   99 ++++++++++++++++++++++++++++++++------------
 include/linux/nfs4_pnfs.h  |    7 +++
 include/linux/nfs_iostat.h |    1 +
 7 files changed, 141 insertions(+), 28 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 02f0da8..ae8b895 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -276,6 +276,14 @@ extern int pnfs_initiate_write(struct nfs_write_data *data,
 			      struct rpc_clnt *clnt,
 			      const struct rpc_call_ops *call_ops,
 			      int how);
+extern int nfs_initiate_commit(struct nfs_write_data *data,
+			       struct rpc_clnt *clnt,
+			       const struct rpc_call_ops *call_ops,
+			       int how);
+extern int pnfs_initiate_commit(struct nfs_write_data *data,
+			       struct rpc_clnt *clnt,
+			       const struct rpc_call_ops *call_ops,
+				int how, int pnfs);
 extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
 extern void nfs_mark_list_commit(struct list_head *head);
 #ifdef CONFIG_MIGRATION
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 44ffa33..55aba4c 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3233,6 +3233,17 @@ static void nfs4_proc_write_setup(struct nfs_write_data *data, struct rpc_messag
 static int nfs4_commit_done(struct rpc_task *task, struct nfs_write_data *data)
 {
 	struct inode *inode = data->inode;
+	struct nfs_server *server = NFS_SERVER(data->inode);
+	struct nfs_client *client = server->nfs_client;
+
+#ifdef CONFIG_NFS_V4_1
+	/* Is this a DS session? */
+	if (data->fldata.ds_nfs_client) {
+		dprintk("%s DS commit\n", __func__);
+		client = data->fldata.ds_nfs_client;
+	}
+#endif /* CONFIG_NFS_V4_1 */
+
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
@@ -3240,7 +3251,8 @@ static int nfs4_commit_done(struct rpc_task *task, struct nfs_write_data *data)
 		nfs_restart_rpc(task, NFS_SERVER(inode)->nfs_client);
 		return -EAGAIN;
 	}
-	nfs_refresh_inode(inode, data->res.fattr);
+	if (client == server->nfs_client)
+		nfs_refresh_inode(inode, data->res.fattr);
 	return 0;
 }
 
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 424efce..393855e 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -260,6 +260,14 @@ pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
 		return NULL;
 	}
 
+	if (!io_ops->read_pagelist || !io_ops->write_pagelist ||
+	    !io_ops->commit) {
+		printk(KERN_ERR "%s Layout driver must provide "
+		       "read_pagelist, write_pagelist, and commit.\n",
+		       __func__);
+		return NULL;
+	}
+
 	pnfs_mod = kmalloc(sizeof(struct pnfs_module), GFP_KERNEL);
 	if (pnfs_mod != NULL) {
 		dprintk("%s Registering id:%u name:%s\n",
@@ -1323,6 +1331,35 @@ pnfs_try_to_read_data(struct nfs_read_data *rdata,
 	return trypnfs;
 }
 
+enum pnfs_try_status
+pnfs_try_to_commit(struct nfs_write_data *data,
+		    const struct rpc_call_ops *call_ops, int sync)
+{
+	struct inode *inode = data->inode;
+	struct nfs_server *nfss = NFS_SERVER(data->inode);
+	enum pnfs_try_status trypnfs;
+
+	dprintk("%s: Begin\n", __func__);
+
+	/* We need to account for the possibility that
+	 * each nfs_page can point to a different lseg (or be NULL).
+	 * For the immediate case of whole-file-only layouts, we at
+	 * least know there can be only a single lseg.
+	 * We still have to account for the possibility of some being NULL.
+	 * This will be done by passing the buck to the layout driver.
+	 */
+	data->pdata.call_ops = call_ops;
+	data->pdata.how = sync;
+	data->pdata.lseg = NULL;
+	trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(data, sync);
+	if (trypnfs == PNFS_NOT_ATTEMPTED)
+		_pnfs_clear_lseg_from_pages(&data->pages);
+	else
+		nfs_inc_stats(inode, NFSIOS_PNFS_COMMIT);
+	dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
+	return trypnfs;
+}
+
 /*
  * Set up the argument/result storage required for the RPC call.
  */
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index b110f4e..80f67c7 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -50,6 +50,9 @@ void pnfs_layoutcommit_free(struct pnfs_layoutcommit_data *data);
 int pnfs_layoutcommit_inode(struct inode *inode, int sync);
 void pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent);
 void pnfs_need_layoutcommit(struct nfs_inode *nfsi, struct nfs_open_context *ctx);
+unsigned int pnfs_getiosize(struct nfs_server *server);
+enum pnfs_try_status pnfs_try_to_commit(struct nfs_write_data *,
+					 const struct rpc_call_ops *, int);
 void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *,
 			   struct nfs_open_context *, struct list_head *);
 void pnfs_pageio_init_write(struct nfs_pageio_descriptor *, struct inode *);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 18926d3..668c4c1 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1311,40 +1311,72 @@ static void nfs_commitdata_release(void *data)
 	nfs_commit_free(wdata);
 }
 
-/*
- * Set up the argument/result storage required for the RPC call.
- */
-static int nfs_commit_rpcsetup(struct list_head *head,
-		struct nfs_write_data *data,
-		int how)
+int nfs_initiate_commit(struct nfs_write_data *data,
+			struct rpc_clnt *clnt,
+			const struct rpc_call_ops *call_ops,
+			int how)
 {
-	struct nfs_page *first = nfs_list_entry(head->next);
-	struct inode *inode = first->wb_context->path.dentry->d_inode;
+	struct inode *inode = data->inode;
 	int priority = flush_task_priority(how);
 	struct rpc_task *task;
 	struct rpc_message msg = {
 		.rpc_argp = &data->args,
 		.rpc_resp = &data->res,
-		.rpc_cred = first->wb_context->cred,
+		.rpc_cred = data->cred,
 	};
 	struct rpc_task_setup task_setup_data = {
 		.task = &data->task,
-		.rpc_client = NFS_CLIENT(inode),
+		.rpc_client = clnt,
 		.rpc_message = &msg,
-		.callback_ops = &nfs_commit_ops,
+		.callback_ops = call_ops,
 		.callback_data = data,
 		.workqueue = nfsiod_workqueue,
 		.flags = RPC_TASK_ASYNC,
 		.priority = priority,
 	};
 
+	/* Set up the initial task struct.  */
+	NFS_PROTO(inode)->commit_setup(data, &msg);
+
+	dprintk("NFS: %5u initiated commit call\n", data->task.tk_pid);
+
+	task = rpc_run_task(&task_setup_data);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+	rpc_put_task(task);
+	return 0;
+}
+EXPORT_SYMBOL(nfs_initiate_commit);
+
+
+int pnfs_initiate_commit(struct nfs_write_data *data,
+			 struct rpc_clnt *clnt,
+			 const struct rpc_call_ops *call_ops,
+			 int how, int pnfs)
+{
+	if (pnfs &&
+	    (pnfs_try_to_commit(data, &nfs_commit_ops, how) == PNFS_ATTEMPTED))
+		return 0;
+	return nfs_initiate_commit(data, clnt, &nfs_commit_ops, how);
+}
+
+/*
+ * Set up the argument/result storage required for the RPC call.
+ */
+static int nfs_commit_rpcsetup(struct list_head *head,
+		struct nfs_write_data *data,
+		int how, int pnfs)
+{
+	struct nfs_page *first = nfs_list_entry(head->next);
+	struct inode *inode = first->wb_context->path.dentry->d_inode;
+
 	/* Set up the RPC argument and reply structs
 	 * NB: take care not to mess about with data->commit et al. */
 
 	list_splice_init(head, &data->pages);
 
 	data->inode	  = inode;
-	data->cred	  = msg.rpc_cred;
+	data->cred	  = first->wb_context->cred;
 
 	data->args.fh     = NFS_FH(data->inode);
 	/* Note: we always request a commit of the entire inode */
@@ -1355,24 +1387,19 @@ static int nfs_commit_rpcsetup(struct list_head *head,
 	data->res.fattr   = &data->fattr;
 	data->res.verf    = &data->verf;
 	nfs_fattr_init(&data->fattr);
+	kref_init(&data->refcount);
+	data->parent      = NULL;
+	data->args.context = first->wb_context;  /* used by commit done */
 
-	/* Set up the initial task struct.  */
-	NFS_PROTO(inode)->commit_setup(data, &msg);
-
-	dprintk("NFS: %5u initiated commit call\n", data->task.tk_pid);
-
-	task = rpc_run_task(&task_setup_data);
-	if (IS_ERR(task))
-		return PTR_ERR(task);
-	rpc_put_task(task);
-	return 0;
+	return pnfs_initiate_commit(data, NFS_CLIENT(inode), &nfs_commit_ops,
+				    how, pnfs);
 }
 
 /*
  * Commit dirty pages
  */
 static int
-nfs_commit_list(struct inode *inode, struct list_head *head, int how)
+nfs_commit_list(struct inode *inode, struct list_head *head, int how, int pnfs)
 {
 	struct nfs_write_data	*data;
 	struct nfs_page         *req;
@@ -1383,7 +1410,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
 		goto out_bad;
 
 	/* Set up the argument struct */
-	return nfs_commit_rpcsetup(head, data, how);
+	return nfs_commit_rpcsetup(head, data, how, pnfs);
  out_bad:
 	while (!list_empty(head)) {
 		req = nfs_list_entry(head->next);
@@ -1413,6 +1440,19 @@ static void nfs_commit_done(struct rpc_task *task, void *calldata)
 		return;
 }
 
+static inline void nfs_commit_cleanup(struct kref *kref)
+{
+	struct nfs_write_data *data;
+
+	data = container_of(kref, struct nfs_write_data, refcount);
+	/* Clear lock only when all cloned commits are finished */
+	if (data->parent)
+		kref_put(&data->parent->refcount, nfs_commit_cleanup);
+	else
+		nfs_commit_clear_lock(NFS_I(data->inode));
+	nfs_commitdata_release(data);
+}
+
 static void nfs_commit_release(void *calldata)
 {
 	struct nfs_write_data	*data = calldata;
@@ -1430,6 +1470,11 @@ static void nfs_commit_release(void *calldata)
 			req->wb_bytes,
 			(long long)req_offset(req));
 		if (status < 0) {
+			if (req->wb_lseg) {
+				nfs_mark_request_nopnfs(req);
+				nfs_mark_request_dirty(req);
+				goto next;
+			}
 			nfs_context_set_write_error(req->wb_context, status);
 			nfs_inode_remove_request(req);
 			dprintk(", error = %d\n", status);
@@ -1446,12 +1491,12 @@ static void nfs_commit_release(void *calldata)
 		}
 		/* We have a mismatch. Write the page again */
 		dprintk(" mismatch\n");
+		nfs_mark_request_nopnfs(req);
 		nfs_mark_request_dirty(req);
 	next:
 		nfs_clear_page_tag_locked(req);
 	}
-	nfs_commit_clear_lock(NFS_I(data->inode));
-	nfs_commitdata_release(calldata);
+	kref_put(&data->refcount, nfs_commit_cleanup);
 }
 
 static const struct rpc_call_ops nfs_commit_ops = {
@@ -1475,7 +1520,7 @@ int nfs_commit_inode(struct inode *inode, int how)
 	res = nfs_scan_commit(inode, &head, 0, 0, &use_pnfs);
 	spin_unlock(&inode->i_lock);
 	if (res) {
-		int error = nfs_commit_list(inode, &head, how);
+		int error = nfs_commit_list(inode, &head, how, use_pnfs);
 		if (error < 0)
 			return error;
 		if (may_wait) {
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index b010ff1..ef160e6 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -124,6 +124,13 @@ struct layoutdriver_io_operations {
 	enum pnfs_try_status
 	(*write_pagelist) (struct nfs_write_data *nfs_data, unsigned nr_pages, int how);
 
+	/* Consistency ops */
+	/* 2 problems:
+	 * 1) the page list contains nfs_pages, NOT pages
+	 * 2) currently the NFS code doesn't create a page array (as it does with read/write)
+	 */
+	enum pnfs_try_status
+	(*commit) (struct nfs_write_data *nfs_data, int how);
 
 	/* Layout information. For each inode, alloc_layout is executed once to retrieve an
 	 * inode specific layout structure.  Each subsequent layoutget operation results in
diff --git a/include/linux/nfs_iostat.h b/include/linux/nfs_iostat.h
index 8866bb3..f9b5f44 100644
--- a/include/linux/nfs_iostat.h
+++ b/include/linux/nfs_iostat.h
@@ -115,6 +115,7 @@ enum nfs_stat_eventcounters {
 	NFSIOS_DELAY,
 	NFSIOS_PNFS_READ,
 	NFSIOS_PNFS_WRITE,
+	NFSIOS_PNFS_COMMIT,
 	__NFSIOS_COUNTSMAX,
 };
 
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 43/50] pnfs_submit: data server commit with no getattr
  2010-08-13 21:31                                                                                   ` [PATCH 42/50] pnfs_submit: generic commit andros
@ 2010-08-13 21:31                                                                                     ` andros
  2010-08-13 21:31                                                                                       ` [PATCH 44/50] pnfs_submit: filelayout commit andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4proc.c    |    6 ++++++
 fs/nfs/nfs4xdr.c     |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/nfs4.h |    1 +
 3 files changed, 57 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 55aba4c..8879fab 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3262,6 +3262,12 @@ static void nfs4_proc_commit_setup(struct nfs_write_data *data, struct rpc_messa
 	
 	data->args.bitmask = server->cache_consistency_bitmask;
 	data->res.server = server;
+#if defined(CONFIG_NFS_V4_1)
+	if (data->fldata.ds_nfs_client) {
+		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_PNFS_COMMIT];
+		return;
+	}
+#endif /* CONFIG_NFS_V4_1 */
 	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_COMMIT];
 }
 
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 520b589..bb2cb86 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -765,6 +765,12 @@ static int nfs4_stat_to_errno(int);
 				decode_sequence_maxsz + \
 				decode_putfh_maxsz + \
 				decode_write_maxsz)
+#define NFS4_enc_dscommit_sz	(compound_encode_hdr_maxsz + \
+				encode_putfh_maxsz + \
+				encode_commit_maxsz)
+#define NFS4_dec_dscommit_sz	(compound_decode_hdr_maxsz + \
+				decode_putfh_maxsz + \
+				decode_commit_maxsz)
 
 const u32 nfs41_maxwrite_overhead = ((RPC_MAX_HEADER_WITH_AUTH +
 				      compound_encode_hdr_maxsz +
@@ -2844,6 +2850,25 @@ static int nfs4_xdr_enc_dswrite(struct rpc_rqst *req, uint32_t *p,
 	return 0;
 }
 
+/*
+ * Encode a pNFS File Layout Data Server COMMIT request
+ */
+static int nfs4_xdr_enc_dscommit(struct rpc_rqst *req, uint32_t *p,
+				 struct nfs_writeargs *args)
+{
+	struct xdr_stream xdr;
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+
+	xdr_init_encode(&xdr, &req->rq_snd_buf, p);
+	encode_compound_hdr(&xdr, req, &hdr);
+	encode_sequence(&xdr, &args->seq_args, &hdr);
+	encode_putfh(&xdr, args->fh, &hdr);
+	encode_commit(&xdr, args, &hdr);
+	encode_nops(&hdr);
+	return 0;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 static void print_overflow_msg(const char *func, const struct xdr_stream *xdr)
@@ -6424,6 +6449,30 @@ out:
 	return status;
 }
 
+/*
+ * Decode pNFS File Layout Data Server COMMIT response
+ */
+static int nfs4_xdr_dec_dscommit(struct rpc_rqst *rqstp, uint32_t *p,
+				 struct nfs_writeres *res)
+{
+	struct xdr_stream xdr;
+	struct compound_hdr hdr;
+	int status;
+
+	xdr_init_decode(&xdr, &rqstp->rq_rcv_buf, p);
+	status = decode_compound_hdr(&xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(&xdr, &res->seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(&xdr);
+	if (status)
+		goto out;
+	status = decode_commit(&xdr, res);
+out:
+	return status;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 __be32 *nfs4_decode_dirent(__be32 *p, struct nfs_entry *entry, int plus)
@@ -6607,6 +6656,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
   PROC(PNFS_LAYOUTCOMMIT, enc_layoutcommit,  dec_layoutcommit),
   PROC(PNFS_LAYOUTRETURN, enc_layoutreturn,  dec_layoutreturn),
   PROC(PNFS_WRITE, enc_dswrite,  dec_dswrite),
+  PROC(PNFS_COMMIT, enc_dscommit,  dec_dscommit),
 #endif /* CONFIG_NFS_V4_1 */
 };
 
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index d5509b7..6ee7357 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -558,6 +558,7 @@ enum {
 	NFSPROC4_CLNT_PNFS_LAYOUTRETURN,
 	NFSPROC4_CLNT_PNFS_GETDEVICEINFO,
 	NFSPROC4_CLNT_PNFS_WRITE,
+	NFSPROC4_CLNT_PNFS_COMMIT,
 };
 
 /* nfs41 types */
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 44/50] pnfs_submit: filelayout commit
  2010-08-13 21:31                                                                                     ` [PATCH 43/50] pnfs_submit: data server commit with no getattr andros
@ 2010-08-13 21:31                                                                                       ` andros
  2010-08-13 21:31                                                                                         ` [PATCH 45/50] pnfs_submit: cb_layoutrecall andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/inode.c          |    2 +
 fs/nfs/nfs4filelayout.c |  171 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/write.c          |   30 +++++---
 3 files changed, 192 insertions(+), 11 deletions(-)
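
Note for reviewers: the sorting loop in filelayout_commit() below keeps all
requests on one list, with the requests for each data server index forming a
contiguous run, and then cuts the runs off the front one at a time. The
following is a minimal user-space sketch of that idea, not the kernel code.
MAX_DS, MDS_SLOT, struct req, struct grouping, group_add() and group_cut()
are hypothetical names made up for the illustration, and a singly linked
list stands in for the kernel's list_head.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DS   4        /* hypothetical stand-in for NFS4_PNFS_MAX_MULTI_CNT */
#define MDS_SLOT MAX_DS   /* extra slot for requests committed through the MDS */

struct req {
	int page_index;       /* illustrative payload */
	int ds_idx;           /* 0..MAX_DS-1, or MDS_SLOT */
	struct req *next;
};

struct grouping {
	struct req *head;             /* one list; each index is a contiguous run */
	struct req *tail;             /* last element of the whole list */
	struct req *last[MAX_DS + 1]; /* last request seen so far for each index */
	int order[MAX_DS + 1];        /* indices in first-seen order */
	int nseen;
};

/* Add r to the run for its index, keeping every run contiguous. */
static void group_add(struct grouping *g, struct req *r)
{
	struct req *prev;

	assert(r->ds_idx >= 0 && r->ds_idx <= MDS_SLOT);
	prev = g->last[r->ds_idx];
	if (prev) {
		/* Index already seen: link in right after its current last entry. */
		r->next = prev->next;
		prev->next = r;
		if (g->tail == prev)
			g->tail = r;
	} else {
		/* New index: start a new run at the end of the list. */
		r->next = NULL;
		if (g->tail)
			g->tail->next = r;
		else
			g->head = r;
		g->tail = r;
		g->order[g->nseen++] = r->ds_idx;
	}
	g->last[r->ds_idx] = r;
}

/*
 * Detach and return the run for the n-th index seen. Runs must be taken
 * in order (n = 0, 1, 2, ...), since each cut starts at the list head.
 */
static struct req *group_cut(struct grouping *g, int n)
{
	struct req *first, *last;

	assert(n >= 0 && n < g->nseen && g->head != NULL);
	first = g->head;
	last = g->last[g->order[n]];
	g->head = last->next;
	last->next = NULL;
	if (!g->head)
		g->tail = NULL;
	return first;
}
```

Feeding requests with indices 1, 0, 1, MDS_SLOT, 0 through group_add() and
then calling group_cut() for n = 0, 1, 2 yields the runs for index 1, index 0
and the MDS slot, each in arrival order.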

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 0360336..30d9ac6 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -644,6 +644,7 @@ struct nfs_open_context *get_nfs_open_context(struct nfs_open_context *ctx)
 		atomic_inc(&ctx->lock_context.count);
 	return ctx;
 }
+EXPORT_SYMBOL(get_nfs_open_context);
 
 static void __put_nfs_open_context(struct nfs_open_context *ctx, int is_sync)
 {
@@ -996,6 +997,7 @@ void nfs_fattr_init(struct nfs_fattr *fattr)
 	fattr->time_start = jiffies;
 	fattr->gencount = nfs_inc_attr_generation_counter();
 }
+EXPORT_SYMBOL(nfs_fattr_init);
 
 struct nfs_fattr *nfs_alloc_fattr(void)
 {
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 8c5702b..fea1772 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -501,6 +501,176 @@ filelayout_free_lseg(struct pnfs_layout_segment *lseg)
 	_filelayout_free_lseg(lseg);
 }
 
+/* Allocate a new nfs_write_data struct and initialize */
+static struct nfs_write_data *
+filelayout_clone_write_data(struct nfs_write_data *old)
+{
+	struct nfs_write_data *new;
+
+	new = nfs_commitdata_alloc();
+	if (!new)
+		goto out;
+	kref_init(&new->refcount);
+	new->parent      = old;
+	kref_get(&old->refcount);
+	new->inode       = old->inode;
+	new->cred        = old->cred;
+	new->args.offset = 0;
+	new->args.count  = 0;
+	new->res.count   = 0;
+	new->res.fattr   = &new->fattr;
+	nfs_fattr_init(&new->fattr);
+	new->res.verf    = &new->verf;
+	new->args.context = get_nfs_open_context(old->args.context);
+	new->pdata.lseg = NULL;
+	new->pdata.call_ops = old->pdata.call_ops;
+	new->pdata.how = old->pdata.how;
+out:
+	return new;
+}
+
+static void filelayout_commit_call_done(struct rpc_task *task, void *data)
+{
+	struct nfs_write_data *wdata = (struct nfs_write_data *)data;
+
+	wdata->pdata.call_ops->rpc_call_done(task, data);
+}
+
+static struct rpc_call_ops filelayout_commit_call_ops = {
+	.rpc_call_prepare = nfs_write_prepare,
+	.rpc_call_done = filelayout_commit_call_done,
+	.rpc_release = filelayout_write_release,
+};
+
+/*
+ * Execute a COMMIT op to the MDS or to each data server on which a page
+ * in 'pages' exists.
+ * Invoke the pnfs_commit_complete callback.
+ */
+enum pnfs_try_status
+filelayout_commit(struct nfs_write_data *data, int sync)
+{
+	LIST_HEAD(head);
+	struct nfs_page *req;
+	loff_t file_offset = 0;
+	u16 idx, i;
+	struct list_head **ds_page_list = NULL;
+	u16 *indices_used;
+	int num_indices_seen = 0;
+	const struct rpc_call_ops *call_ops;
+	struct rpc_clnt *clnt;
+	struct nfs_write_data **clone_list = NULL;
+	struct nfs_write_data *dsdata;
+	struct nfs4_pnfs_ds *ds;
+
+	dprintk("%s data %p sync %d\n", __func__, data, sync);
+
+	/* Alloc room for both in one go */
+	ds_page_list = kzalloc((NFS4_PNFS_MAX_MULTI_CNT + 1) *
+			       (sizeof(u16) + sizeof(struct list_head *)),
+			       GFP_KERNEL);
+	if (!ds_page_list)
+		goto mem_error;
+	indices_used = (u16 *) (ds_page_list + NFS4_PNFS_MAX_MULTI_CNT + 1);
+	/*
+	 * Sort pages based on which ds to send to.
+	 * MDS is given index equal to NFS4_PNFS_MAX_MULTI_CNT.
+	 * Note we are assuming there is only a single lseg in play.
+	 * When that is not true, we could first sort on lseg, then
+	 * sort within each as we do here.
+	 */
+	while (!list_empty(&data->pages)) {
+		req = nfs_list_entry(data->pages.next);
+		nfs_list_remove_request(req);
+		if (!req->wb_lseg ||
+		    ((struct nfs4_filelayout_segment *)
+		     LSEG_LD_DATA(req->wb_lseg))->commit_through_mds)
+			idx = NFS4_PNFS_MAX_MULTI_CNT;
+		else {
+			file_offset = (loff_t)req->wb_index << PAGE_CACHE_SHIFT;
+			idx = nfs4_fl_calc_ds_index(req->wb_lseg, file_offset);
+		}
+		if (ds_page_list[idx]) {
+			/* Already seen this idx */
+			list_add(&req->wb_list, ds_page_list[idx]);
+		} else {
+			/* New idx not seen so far */
+			list_add_tail(&req->wb_list, &head);
+			indices_used[num_indices_seen++] = idx;
+		}
+		ds_page_list[idx] = &req->wb_list;
+	}
+	/* Once created, clone must be released via call_op */
+	clone_list = kzalloc(num_indices_seen *
+			     sizeof(struct nfs_write_data *), GFP_KERNEL);
+	if (!clone_list)
+		goto mem_error;
+	for (i = 0; i < num_indices_seen - 1; i++) {
+		clone_list[i] = filelayout_clone_write_data(data);
+		if (!clone_list[i])
+			goto mem_error;
+	}
+	clone_list[i] = data;
+	/*
+	 * Now send off the RPCs to each ds.  Note that it is important
+	 * that any RPC to the MDS be sent last (or at least after all
+	 * clones have been made.)
+	 */
+	for (i = 0; i < num_indices_seen; i++) {
+		dsdata = clone_list[i];
+		idx = indices_used[i];
+		list_cut_position(&dsdata->pages, &head, ds_page_list[idx]);
+		if (idx == NFS4_PNFS_MAX_MULTI_CNT) {
+			call_ops = data->pdata.call_ops;
+			clnt = NFS_CLIENT(dsdata->inode);
+			ds = NULL;
+		} else {
+			struct nfs_fh *fh;
+
+			call_ops = &filelayout_commit_call_ops;
+			req = nfs_list_entry(dsdata->pages.next);
+			ds = nfs4_fl_prepare_ds(req->wb_lseg, idx);
+			if (!ds) {
+				/* Trigger retry of this chunk through MDS */
+				dsdata->task.tk_status = -EIO;
+				data->pdata.call_ops->rpc_release(dsdata);
+				continue;
+			}
+			clnt = ds->ds_clp->cl_rpcclient;
+			dsdata->fldata.ds_nfs_client = ds->ds_clp;
+			file_offset = (loff_t)req->wb_index << PAGE_CACHE_SHIFT;
+			fh = nfs4_fl_select_ds_fh(req->wb_lseg, file_offset);
+			if (fh)
+				dsdata->args.fh = fh;
+		}
+		dprintk("%s: Initiating commit: %llu USE DS:\n",
+			__func__, file_offset);
+		print_ds(ds);
+
+		/* Send COMMIT to data server */
+		nfs_initiate_commit(dsdata, clnt, call_ops, sync);
+	}
+	kfree(clone_list);
+	kfree(ds_page_list);
+	return PNFS_ATTEMPTED;
+
+ mem_error:
+	if (clone_list) {
+		for (i = 0; i < num_indices_seen - 1; i++) {
+			if (!clone_list[i])
+				break;
+			data->pdata.call_ops->rpc_release(clone_list[i]);
+		}
+		kfree(clone_list);
+	}
+	kfree(ds_page_list);
+	/* One of these will be empty, but doesn't hurt to do both */
+	nfs_mark_list_commit(&head);
+	nfs_mark_list_commit(&data->pages);
+	data->pdata.call_ops->rpc_release(data);
+	return PNFS_ATTEMPTED;
+}
+
 /* Return the stripesize for the specified file */
 ssize_t
 filelayout_get_stripesize(struct pnfs_layout_type *layoutid)
@@ -534,6 +704,7 @@ filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
 }
 
 struct layoutdriver_io_operations filelayout_io_operations = {
+	.commit                  = filelayout_commit,
 	.read_pagelist           = filelayout_read_pagelist,
 	.write_pagelist          = filelayout_write_pagelist,
 	.alloc_layout            = filelayout_alloc_layout,
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 668c4c1..2251551 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -60,6 +60,7 @@ struct nfs_write_data *nfs_commitdata_alloc(void)
 	}
 	return p;
 }
+EXPORT_SYMBOL(nfs_commitdata_alloc);
 
 void nfs_commit_free(struct nfs_write_data *p)
 {
@@ -1395,6 +1396,23 @@ static int nfs_commit_rpcsetup(struct list_head *head,
 				    how, pnfs);
 }
 
+/* Handle memory error during commit */
+void nfs_mark_list_commit(struct list_head *head)
+{
+	struct nfs_page         *req;
+
+	while (!list_empty(head)) {
+		req = nfs_list_entry(head->next);
+		nfs_list_remove_request(req);
+		nfs_mark_request_commit(req);
+		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
+		dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
+				BDI_RECLAIMABLE);
+		nfs_clear_page_tag_locked(req);
+	}
+}
+EXPORT_SYMBOL(nfs_mark_list_commit);
+
 /*
  * Commit dirty pages
  */
@@ -1402,25 +1420,15 @@ static int
 nfs_commit_list(struct inode *inode, struct list_head *head, int how, int pnfs)
 {
 	struct nfs_write_data	*data;
-	struct nfs_page         *req;
 
 	data = nfs_commitdata_alloc();
-
 	if (!data)
 		goto out_bad;
 
 	/* Set up the argument struct */
 	return nfs_commit_rpcsetup(head, data, how, pnfs);
  out_bad:
-	while (!list_empty(head)) {
-		req = nfs_list_entry(head->next);
-		nfs_list_remove_request(req);
-		nfs_mark_request_commit(req);
-		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
-		dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
-				BDI_RECLAIMABLE);
-		nfs_clear_page_tag_locked(req);
-	}
+	nfs_mark_list_commit(head);
 	nfs_commit_clear_lock(NFS_I(inode));
 	return -ENOMEM;
 }
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 45/50] pnfs_submit: cb_layoutrecall
  2010-08-13 21:31                                                                                       ` [PATCH 44/50] pnfs_submit: filelayout commit andros
@ 2010-08-13 21:31                                                                                         ` andros
  2010-08-13 21:31                                                                                           ` [PATCH 46/50] pnfs_submit: increase NFS_MAX_FILE_IO_SIZE andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/callback.h      |   25 ++++
 fs/nfs/callback_proc.c |  328 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/nfs/callback_xdr.c  |   65 +++++++++-
 fs/nfs/nfs4_fs.h       |    1 +
 4 files changed, 415 insertions(+), 4 deletions(-)
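
Note for reviewers: pnfs_is_next_layout_stateid() below accepts a recalled
stateid only if it is the direct successor of the one the client holds. The
rule can be sketched in user space as follows. This is an illustration under
stated assumptions, not the kernel code: struct stateid, stateid_is_zero()
and is_next_layout_stateid() are hypothetical stand-ins, seqids are kept in
host byte order, and the seqlock retry loop is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define OTHER_SIZE 12	/* same value as NFS4_STATEID_OTHER_SIZE */

/* Hypothetical user-space stand-in for nfs4_stateid. */
struct stateid {
	uint32_t seqid;
	unsigned char other[OTHER_SIZE];
};

/* True if no layout stateid has been stored yet (all bytes zero). */
static bool stateid_is_zero(const struct stateid *s)
{
	static const unsigned char zero[OTHER_SIZE];

	return s->seqid == 0 && memcmp(s->other, zero, OTHER_SIZE) == 0;
}

static bool is_next_layout_stateid(const struct stateid *held,
				   const struct stateid *recalled)
{
	assert(held && recalled);
	if (memcmp(held->other, recalled->other, OTHER_SIZE) == 0) {
		/* Same stateid: seqid must advance by one, wrapping to 1, not 0. */
		if (held->seqid == UINT32_MAX)
			return recalled->seqid == 1;
		return recalled->seqid == held->seqid + 1;
	}
	/* Different stateid: only acceptable as the very first layout stateid. */
	return stateid_is_zero(held) && recalled->seqid == 1;
}
```

With a held seqid of 5, only a recall carrying seqid 6 on the same stateid
passes; with a held seqid of 0xffffffff the accepted successor is 1.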

diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
index 85a7cfd..ab9b421 100644
--- a/fs/nfs/callback.h
+++ b/fs/nfs/callback.h
@@ -8,6 +8,8 @@
 #ifndef __LINUX_FS_NFS_CALLBACK_H
 #define __LINUX_FS_NFS_CALLBACK_H
 
+#include <linux/pnfs_xdr.h>
+
 #define NFS4_CALLBACK 0x40000000
 #define NFS4_CALLBACK_XDRSIZE 2048
 #define NFS4_CALLBACK_BUFSIZE (1024 + NFS4_CALLBACK_XDRSIZE)
@@ -72,6 +74,8 @@ struct cb_recallargs {
 
 #if defined(CONFIG_NFS_V4_1)
 
+#include <linux/pnfs_xdr.h>
+
 struct referring_call {
 	uint32_t			rc_sequenceid;
 	uint32_t			rc_slotid;
@@ -111,6 +115,13 @@ extern int nfs41_validate_delegation_stateid(struct nfs_delegation *delegation,
 
 #define RCA4_TYPE_MASK_RDATA_DLG	0
 #define RCA4_TYPE_MASK_WDATA_DLG	1
+#define RCA4_TYPE_MASK_DIR_DLG         2
+#define RCA4_TYPE_MASK_FILE_LAYOUT     3
+#define RCA4_TYPE_MASK_BLK_LAYOUT      4
+#define RCA4_TYPE_MASK_OBJ_LAYOUT_MIN  8
+#define RCA4_TYPE_MASK_OBJ_LAYOUT_MAX  9
+#define RCA4_TYPE_MASK_OTHER_LAYOUT_MIN 12
+#define RCA4_TYPE_MASK_OTHER_LAYOUT_MAX 15
 
 struct cb_recallanyargs {
 	struct sockaddr	*craa_addr;
@@ -127,6 +138,20 @@ struct cb_recallslotargs {
 extern unsigned nfs4_callback_recallslot(struct cb_recallslotargs *args,
 					  void *dummy);
 
+struct cb_pnfs_layoutrecallargs {
+	struct sockaddr		*cbl_addr;
+	struct nfs_fh		cbl_fh;
+	struct nfs4_pnfs_layout_segment cbl_seg;
+	struct nfs_fsid		cbl_fsid;
+	uint32_t		cbl_recall_type;
+	uint32_t		cbl_layout_type;
+	uint32_t		cbl_layoutchanged;
+	nfs4_stateid		cbl_stateid;
+};
+
+extern __be32 pnfs_cb_layoutrecall(struct cb_pnfs_layoutrecallargs *args,
+				   void *dummy);
+
 #endif /* CONFIG_NFS_V4_1 */
 
 extern __be32 nfs4_callback_getattr(struct cb_getattrargs *args, struct cb_getattrres *res);
diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index c7f0021..e2ea2be 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -8,10 +8,15 @@
 #include <linux/nfs4.h>
 #include <linux/nfs_fs.h>
 #include <linux/slab.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/writeback.h>
+#include <linux/nfs4_pnfs.h>
 #include "nfs4_fs.h"
 #include "callback.h"
 #include "delegation.h"
 #include "internal.h"
+#include "pnfs.h"
 
 #ifdef NFS_DEBUG
 #define NFSDBG_FACILITY NFSDBG_CALLBACK
@@ -114,6 +119,292 @@ int nfs4_validate_delegation_stateid(struct nfs_delegation *delegation, const nf
 
 #if defined(CONFIG_NFS_V4_1)
 
+static bool
+pnfs_is_next_layout_stateid(const struct pnfs_layout_type *lo,
+			    const nfs4_stateid stateid)
+{
+	int seqlock;
+	bool res;
+	u32 oldseqid, newseqid;
+
+	do {
+		seqlock = read_seqbegin(&lo->seqlock);
+		oldseqid = be32_to_cpu(lo->stateid.u.stateid.seqid);
+		newseqid = be32_to_cpu(stateid.u.stateid.seqid);
+		res = !memcmp(lo->stateid.u.stateid.other,
+			      stateid.u.stateid.other,
+			      NFS4_STATEID_OTHER_SIZE);
+		if (res) { /* comparing layout stateids */
+			if (oldseqid == ~0)
+				res = (newseqid == 1);
+			else
+				res = (newseqid == oldseqid + 1);
+		} else { /* open stateid */
+			res = !memcmp(lo->stateid.u.data,
+				      &zero_stateid,
+				      NFS4_STATEID_SIZE);
+			if (res)
+				res = (newseqid == 1);
+		}
+	} while (read_seqretry(&lo->seqlock, seqlock));
+
+	return res;
+}
+
+/*
+ * Retrieve an inode based on layout recall parameters
+ *
+ * Note: caller must iput(inode) to dereference the inode.
+ */
+static struct inode *
+nfs_layoutrecall_find_inode(struct nfs_client *clp,
+			    const struct cb_pnfs_layoutrecallargs *args)
+{
+	struct nfs_inode *nfsi;
+	struct pnfs_layout_type *layout;
+	struct nfs_server *server;
+	struct inode *ino = NULL;
+
+	dprintk("%s: Begin recall_type=%d clp %p\n",
+		__func__, args->cbl_recall_type, clp);
+
+	spin_lock(&clp->cl_lock);
+	list_for_each_entry(layout, &clp->cl_layouts, lo_layouts) {
+		nfsi = PNFS_NFS_INODE(layout);
+		if (!nfsi)
+			continue;
+
+		dprintk("%s: Searching inode=%lu\n",
+			__func__, nfsi->vfs_inode.i_ino);
+
+		if (args->cbl_recall_type == RETURN_FILE) {
+			if (nfs_compare_fh(&args->cbl_fh, &nfsi->fh))
+				continue;
+		} else if (args->cbl_recall_type == RETURN_FSID) {
+			server = NFS_SERVER(&nfsi->vfs_inode);
+			if (server->fsid.major != args->cbl_fsid.major ||
+			    server->fsid.minor != args->cbl_fsid.minor)
+				continue;
+		}
+
+		/* Make sure client didn't clean up layout without
+		 * telling the server */
+		if (!has_layout(nfsi))
+			continue;
+
+		ino = igrab(&nfsi->vfs_inode);
+		dprintk("%s: Found inode=%p\n", __func__, ino);
+		break;
+	}
+	spin_unlock(&clp->cl_lock);
+	return ino;
+}
+
+struct recall_layout_threadargs {
+	struct inode *inode;
+	struct nfs_client *clp;
+	struct completion started;
+	struct cb_pnfs_layoutrecallargs *rl;
+	int result;
+};
+
+static int pnfs_recall_layout(void *data)
+{
+	struct inode *inode, *ino;
+	struct nfs_client *clp;
+	struct cb_pnfs_layoutrecallargs rl;
+	struct nfs4_pnfs_layoutreturn *lrp;
+	struct recall_layout_threadargs *args =
+		(struct recall_layout_threadargs *)data;
+	int status = 0;
+
+	daemonize("nfsv4-layoutreturn");
+
+	dprintk("%s: recall_type=%d fsid 0x%llx-0x%llx start\n",
+		__func__, args->rl->cbl_recall_type,
+		args->rl->cbl_fsid.major, args->rl->cbl_fsid.minor);
+
+	clp = args->clp;
+	inode = args->inode;
+	rl = *args->rl;
+
+	/* support whole file layouts only */
+	rl.cbl_seg.offset = 0;
+	rl.cbl_seg.length = NFS4_MAX_UINT64;
+
+	if (rl.cbl_recall_type == RETURN_FILE) {
+		if (pnfs_is_next_layout_stateid(NFS_I(inode)->layout,
+						rl.cbl_stateid))
+			status = pnfs_return_layout(inode, &rl.cbl_seg,
+						    &rl.cbl_stateid, RETURN_FILE,
+						    false);
+		else
+			status = cpu_to_be32(NFS4ERR_DELAY);
+		if (status)
+			dprintk("%s RETURN_FILE error: %d\n", __func__, status);
+		else
+			status = cpu_to_be32(NFS4ERR_NOMATCHING_LAYOUT);
+		args->result = status;
+		complete(&args->started);
+		goto out;
+	}
+
+	status = cpu_to_be32(NFS4_OK);
+	args->result = status;
+	complete(&args->started);
+	args = NULL;
+
+	/* IMPROVEME: This loop is inefficient, running in O(|s_inodes|^2) */
+	while ((ino = nfs_layoutrecall_find_inode(clp, &rl)) != NULL) {
+		/* FIXME: need to check status on pnfs_return_layout */
+		pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE, false);
+		iput(ino);
+	}
+
+	lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
+	if (!lrp) {
+		dprintk("%s: allocation failed. Cannot send last LAYOUTRETURN\n",
+			__func__);
+		goto out;
+	}
+
+	/* send final layoutreturn */
+	lrp->args.reclaim = 0;
+	lrp->args.layout_type = rl.cbl_layout_type;
+	lrp->args.return_type = rl.cbl_recall_type;
+	lrp->args.lseg = rl.cbl_seg;
+	lrp->args.inode = inode;
+	pnfs4_proc_layoutreturn(lrp, true);
+
+out:
+	clear_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state);
+	nfs_put_client(clp);
+	dprintk("%s: exit status %d\n", __func__, 0);
+	module_put_and_exit(0);
+	return 0;
+}
+
+/*
+ * Asynchronous layout recall!
+ */
+static int pnfs_async_return_layout(struct nfs_client *clp, struct inode *inode,
+				    struct cb_pnfs_layoutrecallargs *rl)
+{
+	struct recall_layout_threadargs data = {
+		.clp = clp,
+		.inode = inode,
+		.rl = rl,
+	};
+	struct task_struct *t;
+	int status = -EAGAIN;
+
+	dprintk("%s: -->\n", __func__);
+
+	/* FIXME: do not allow two concurrent layout recalls */
+	if (test_and_set_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state))
+		return status;
+
+	init_completion(&data.started);
+	__module_get(THIS_MODULE);
+	if (!atomic_inc_not_zero(&clp->cl_count))
+		goto out_put_no_client;
+
+	t = kthread_run(pnfs_recall_layout, &data, "%s", "pnfs_recall_layout");
+	if (IS_ERR(t)) {
+		printk(KERN_INFO "NFS: Layout recall callback thread failed "
+			"for client (clientid %08x/%08x)\n",
+			(unsigned)(clp->cl_clientid >> 32),
+			(unsigned)(clp->cl_clientid));
+		status = PTR_ERR(t);
+		goto out_module_put;
+	}
+	wait_for_completion(&data.started);
+	return data.result;
+out_module_put:
+	nfs_put_client(clp);
+out_put_no_client:
+	clear_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state);
+	module_put(THIS_MODULE);
+	return status;
+}
+
+static int pnfs_recall_all_layouts(struct nfs_client *clp)
+{
+	struct cb_pnfs_layoutrecallargs rl;
+	struct inode *inode;
+	int status = 0;
+
+	rl.cbl_recall_type = RETURN_ALL;
+	rl.cbl_seg.iomode = IOMODE_ANY;
+	rl.cbl_seg.offset = 0;
+	rl.cbl_seg.length = NFS4_MAX_UINT64;
+
+	/* we need the inode to get the nfs_server struct */
+	inode = nfs_layoutrecall_find_inode(clp, &rl);
+	if (!inode)
+		return status;
+	status = pnfs_async_return_layout(clp, inode, &rl);
+	iput(inode);
+
+	return status;
+}
+
+__be32 pnfs_cb_layoutrecall(struct cb_pnfs_layoutrecallargs *args,
+			    void *dummy)
+{
+	struct nfs_client *clp;
+	struct inode *inode = NULL;
+	__be32 res;
+	int status;
+	unsigned int num_client = 0;
+
+	dprintk("%s: -->\n", __func__);
+
+	res = cpu_to_be32(NFS4ERR_OP_NOT_IN_SESSION);
+	clp  = nfs_find_client(args->cbl_addr, 4);
+	if (clp == NULL) {
+		dprintk("%s: no client for addr %u.%u.%u.%u\n",
+			__func__, NIPQUAD(args->cbl_addr));
+		goto out;
+	}
+
+	res = cpu_to_be32(NFS4ERR_NOMATCHING_LAYOUT);
+	do {
+		struct nfs_client *prev = clp;
+		num_client++;
+		/* the callback must come from the MDS personality */
+		if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_MDS))
+			goto loop;
+		if (args->cbl_recall_type == RETURN_FILE) {
+			inode = nfs_layoutrecall_find_inode(clp, args);
+			if (inode != NULL) {
+				status = pnfs_async_return_layout(clp, inode,
+								  args);
+				if (status)
+					res = cpu_to_be32(NFS4ERR_DELAY);
+				iput(inode);
+			}
+		} else { /* _ALL or _FSID */
+			/* we need the inode to get the nfs_server struct */
+			inode = nfs_layoutrecall_find_inode(clp, args);
+			if (!inode)
+				goto loop;
+			status = pnfs_async_return_layout(clp, inode, args);
+			if (status)
+				res = cpu_to_be32(NFS4ERR_DELAY);
+			iput(inode);
+		}
+loop:
+		clp = nfs_find_client_next(prev);
+		nfs_put_client(prev);
+	} while (clp != NULL);
+
+out:
+	dprintk("%s: exit with status = %d numclient %u\n",
+		__func__, ntohl(res), num_client);
+	return res;
+}
+
 int nfs41_validate_delegation_stateid(struct nfs_delegation *delegation, const nfs4_stateid *stateid)
 {
 	if (delegation == NULL)
@@ -325,13 +616,37 @@ out:
 	return status;
 }
 
+static inline bool
+validate_bitmap_values(const unsigned long *mask)
+{
+	int i;
+
+	if (*mask == 0)
+		return true;
+	if (test_bit(RCA4_TYPE_MASK_RDATA_DLG, mask) ||
+	    test_bit(RCA4_TYPE_MASK_WDATA_DLG, mask) ||
+	    test_bit(RCA4_TYPE_MASK_DIR_DLG, mask) ||
+	    test_bit(RCA4_TYPE_MASK_FILE_LAYOUT, mask) ||
+	    test_bit(RCA4_TYPE_MASK_BLK_LAYOUT, mask))
+		return true;
+	for (i = RCA4_TYPE_MASK_OBJ_LAYOUT_MIN;
+	     i <= RCA4_TYPE_MASK_OBJ_LAYOUT_MAX; i++)
+		if (test_bit(i, mask))
+			return true;
+	for (i = RCA4_TYPE_MASK_OTHER_LAYOUT_MIN;
+	     i <= RCA4_TYPE_MASK_OTHER_LAYOUT_MAX; i++)
+		if (test_bit(i, mask))
+			return true;
+	return false;
+}
+
 __be32 nfs4_callback_recallany(struct cb_recallanyargs *args, void *dummy)
 {
 	struct nfs_client *clp;
 	__be32 status;
 	fmode_t flags = 0;
 
-	status = htonl(NFS4ERR_OP_NOT_IN_SESSION);
+	status = cpu_to_be32(NFS4ERR_OP_NOT_IN_SESSION);
 	clp = nfs_find_client(args->craa_addr, 4);
 	if (clp == NULL)
 		goto out;
@@ -339,16 +654,25 @@ __be32 nfs4_callback_recallany(struct cb_recallanyargs *args, void *dummy)
 	dprintk("NFS: RECALL_ANY callback request from %s\n",
 		rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR));
 
+	status = cpu_to_be32(NFS4ERR_INVAL);
+	if (!validate_bitmap_values((const unsigned long *)
+				    &args->craa_type_mask))
+		return status;
+
+	status = cpu_to_be32(NFS4_OK);
 	if (test_bit(RCA4_TYPE_MASK_RDATA_DLG, (const unsigned long *)
 		     &args->craa_type_mask))
 		flags = FMODE_READ;
 	if (test_bit(RCA4_TYPE_MASK_WDATA_DLG, (const unsigned long *)
 		     &args->craa_type_mask))
 		flags |= FMODE_WRITE;
+	if (test_bit(RCA4_TYPE_MASK_FILE_LAYOUT, (const unsigned long *)
+		     &args->craa_type_mask))
+		if (pnfs_recall_all_layouts(clp) == -EAGAIN)
+			status = cpu_to_be32(NFS4ERR_DELAY);
 
 	if (flags)
 		nfs_expire_all_delegation_types(clp, flags);
-	status = htonl(NFS4_OK);
 out:
 	dprintk("%s: exit with status = %d\n", __func__, ntohl(status));
 	return status;
diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index 79b0fb7..a3f5279 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -22,6 +22,7 @@
 #define CB_OP_RECALL_RES_MAXSZ	(CB_OP_HDR_RES_MAXSZ)
 
 #if defined(CONFIG_NFS_V4_1)
+#define CB_OP_LAYOUTRECALL_RES_MAXSZ	(CB_OP_HDR_RES_MAXSZ)
 #define CB_OP_SEQUENCE_RES_MAXSZ	(CB_OP_HDR_RES_MAXSZ + \
 					4 + 1 + 3)
 #define CB_OP_RECALLANY_RES_MAXSZ	(CB_OP_HDR_RES_MAXSZ)
@@ -220,6 +221,60 @@ out:
 
 #if defined(CONFIG_NFS_V4_1)
 
+static __be32 decode_pnfs_layoutrecall_args(struct svc_rqst *rqstp,
+					    struct xdr_stream *xdr,
+					    struct cb_pnfs_layoutrecallargs *args)
+{
+	__be32 *p;
+	__be32 status = 0;
+
+	args->cbl_addr = svc_addr(rqstp);
+	p = read_buf(xdr, 4 * sizeof(uint32_t));
+	if (unlikely(p == NULL)) {
+		status = htonl(NFS4ERR_BADXDR);
+		goto out;
+	}
+
+	args->cbl_layout_type = ntohl(*p++);
+	args->cbl_seg.iomode = ntohl(*p++);
+	args->cbl_layoutchanged = ntohl(*p++);
+	args->cbl_recall_type = ntohl(*p++);
+
+	if (likely(args->cbl_recall_type == RETURN_FILE)) {
+		status = decode_fh(xdr, &args->cbl_fh);
+		if (unlikely(status != 0))
+			goto out;
+
+		p = read_buf(xdr, 2 * sizeof(uint64_t));
+		if (unlikely(p == NULL)) {
+			status = htonl(NFS4ERR_BADXDR);
+			goto out;
+		}
+		p = xdr_decode_hyper(p, &args->cbl_seg.offset);
+		p = xdr_decode_hyper(p, &args->cbl_seg.length);
+		status = decode_stateid(xdr, &args->cbl_stateid);
+		if (unlikely(status != 0))
+			goto out;
+	} else if (args->cbl_recall_type == RETURN_FSID) {
+		p = read_buf(xdr, 2 * sizeof(uint64_t));
+		if (unlikely(p == NULL)) {
+			status = htonl(NFS4ERR_BADXDR);
+			goto out;
+		}
+		p = xdr_decode_hyper(p, &args->cbl_fsid.major);
+		p = xdr_decode_hyper(p, &args->cbl_fsid.minor);
+	}
+	dprintk("%s: ltype 0x%x iomode %d changed %d recall_type %d "
+		"fsid %llx-%llx fhsize %d\n", __func__,
+		args->cbl_layout_type, args->cbl_seg.iomode,
+		args->cbl_layoutchanged, args->cbl_recall_type,
+		args->cbl_fsid.major, args->cbl_fsid.minor,
+		args->cbl_fh.size);
+out:
+	dprintk("%s: exit with status = %d\n", __func__, ntohl(status));
+	return status;
+}
+
 static __be32 decode_sessionid(struct xdr_stream *xdr,
 				 struct nfs4_sessionid *sid)
 {
@@ -574,12 +629,12 @@ preprocess_nfs41_op(int nop, unsigned int op_nr, struct callback_op **op)
 	case OP_CB_SEQUENCE:
 	case OP_CB_RECALL_ANY:
 	case OP_CB_RECALL_SLOT:
+	case OP_CB_LAYOUTRECALL:
 		*op = &callback_ops[op_nr];
 		break;
 
-	case OP_CB_LAYOUTRECALL:
-	case OP_CB_NOTIFY_DEVICEID:
 	case OP_CB_NOTIFY:
+	case OP_CB_NOTIFY_DEVICEID:
 	case OP_CB_PUSH_DELEG:
 	case OP_CB_RECALLABLE_OBJ_AVAIL:
 	case OP_CB_WANTS_CANCELLED:
@@ -739,6 +794,12 @@ static struct callback_op callback_ops[] = {
 		.res_maxsize = CB_OP_RECALL_RES_MAXSZ,
 	},
 #if defined(CONFIG_NFS_V4_1)
+	[OP_CB_LAYOUTRECALL] = {
+		.process_op = (callback_process_op_t)pnfs_cb_layoutrecall,
+		.decode_args =
+			(callback_decode_arg_t)decode_pnfs_layoutrecall_args,
+		.res_maxsize = CB_OP_LAYOUTRECALL_RES_MAXSZ,
+	},
 	[OP_CB_SEQUENCE] = {
 		.process_op = (callback_process_op_t)nfs4_callback_sequence,
 		.decode_args = (callback_decode_arg_t)decode_cb_sequence_args,
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index ef70bef..d6440fc 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -46,6 +46,7 @@ enum nfs4_client_state {
 	NFS4CLNT_DELEGRETURN,
 	NFS4CLNT_SESSION_RESET,
 	NFS4CLNT_RECALL_SLOT,
+	NFS4CLNT_LAYOUT_RECALL,
 };
 
 enum nfs4_session_state {
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread
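
Note on patch 45: validate_bitmap_values() accepts a CB_RECALL_ANY type mask
when it is empty or when at least one bit from the known ranges is set; it
does not reject masks that also carry unknown bits. A minimal user-space
sketch of the same test, assuming a plain 32-bit mask, is given here. The
bit positions are copied from the RCA4_TYPE_MASK_* defines in the patch;
bit_range(), known_type_mask() and recall_any_mask_ok() are hypothetical
helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a mask with bits lo..hi (inclusive) set. */
static uint32_t bit_range(unsigned lo, unsigned hi)
{
	uint32_t m = 0;
	unsigned b;

	assert(lo <= hi && hi < 32);
	for (b = lo; b <= hi; b++)
		m |= UINT32_C(1) << b;
	return m;
}

/* All bits the patch treats as known recallable types. */
static uint32_t known_type_mask(void)
{
	return bit_range(0, 4) |	/* RDATA_DLG .. BLK_LAYOUT */
	       bit_range(8, 9) |	/* OBJ_LAYOUT_MIN .. OBJ_LAYOUT_MAX */
	       bit_range(12, 15);	/* OTHER_LAYOUT_MIN .. OTHER_LAYOUT_MAX */
}

/* Mirrors the accept/reject decision of validate_bitmap_values(). */
static bool recall_any_mask_ok(uint32_t mask)
{
	return mask == 0 || (mask & known_type_mask()) != 0;
}
```

For example, a mask with only bit 5 set is rejected, while a mask with bits
0 and 5 set is accepted because bit 0 is a known type.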

* [PATCH 46/50] pnfs_submit: increase NFS_MAX_FILE_IO_SIZE
  2010-08-13 21:31                                                                                         ` [PATCH 45/50] pnfs_submit: cb_layoutrecall andros
@ 2010-08-13 21:31                                                                                           ` andros
  2010-08-13 21:31                                                                                             ` [PATCH 47/50] SQUASHME pnfs_post_submit: direct i/o andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 include/linux/nfs_xdr.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index e627788..42c5ccf 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -10,7 +10,7 @@
  * support a megabyte or more.  The default is left at 4096 bytes, which is
  * reasonable for NFS over UDP.
  */
-#define NFS_MAX_FILE_IO_SIZE	(1048576U)
+#define NFS_MAX_FILE_IO_SIZE	(4U * 1048576U)
 #define NFS_DEF_FILE_IO_SIZE	(4096U)
 #define NFS_MIN_FILE_IO_SIZE	(1024U)
 
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 47/50] SQUASHME pnfs_post_submit: direct i/o
  2010-08-13 21:31                                                                                           ` [PATCH 46/50] pnfs_submit: increase NFS_MAX_FILE_IO_SIZE andros
@ 2010-08-13 21:31                                                                                             ` andros
  2010-08-13 21:32                                                                                               ` [PATCH 48/50] SQUASHME pnfs_post_submit: layout type enum andros
  0 siblings, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:31 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/direct.c |  160 +++++++++++++++++++++++++++++++-----------------------
 1 files changed, 92 insertions(+), 68 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index ad4cd31..3ef9b0c 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -267,6 +267,38 @@ static const struct rpc_call_ops nfs_read_direct_ops = {
 	.rpc_release = nfs_direct_read_release,
 };
 
+static long nfs_direct_read_execute(struct nfs_read_data *data,
+				    struct rpc_task_setup *task_setup_data,
+				    struct rpc_message *msg)
+{
+	struct inode *inode = data->inode;
+	struct rpc_task *task;
+
+	nfs_fattr_init(&data->fattr);
+	msg->rpc_argp = &data->args;
+	msg->rpc_resp = &data->res;
+
+	task_setup_data->task = &data->task;
+	task_setup_data->callback_data = data;
+	NFS_PROTO(inode)->read_setup(data, msg);
+
+	task = rpc_run_task(task_setup_data);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+
+	rpc_put_task(task);
+
+	dprintk("NFS: %5u initiated direct read call "
+		"(req %s/%lld, %u bytes @ offset %llu)\n",
+		data->task.tk_pid,
+		inode->i_sb->s_id,
+		(long long)NFS_FILEID(inode),
+		data->args.count,
+		(unsigned long long)data->args.offset);
+
+	return 0;
+}
+
 /*
  * For each rsize'd chunk of the user's buffer, dispatch an NFS READ
  * operation.  If nfs_readdata_alloc() or get_user_pages() fails,
@@ -283,7 +315,6 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
 	unsigned long user_addr = (unsigned long)iov->iov_base;
 	size_t count = iov->iov_len;
 	size_t rsize = NFS_SERVER(inode)->rsize;
-	struct rpc_task *task;
 	struct rpc_message msg = {
 		.rpc_cred = ctx->cred,
 	};
@@ -343,26 +374,9 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
 		data->res.fattr = &data->fattr;
 		data->res.eof = 0;
 		data->res.count = bytes;
-		nfs_fattr_init(&data->fattr);
-		msg.rpc_argp = &data->args;
-		msg.rpc_resp = &data->res;
 
-		task_setup_data.task = &data->task;
-		task_setup_data.callback_data = data;
-		NFS_PROTO(inode)->read_setup(data, &msg);
-
-		task = rpc_run_task(&task_setup_data);
-		if (IS_ERR(task))
+		if (nfs_direct_read_execute(data, &task_setup_data, &msg))
 			break;
-		rpc_put_task(task);
-
-		dprintk("NFS: %5u initiated direct read call "
-			"(req %s/%Ld, %zu bytes @ offset %Lu)\n",
-				data->task.tk_pid,
-				inode->i_sb->s_id,
-				(long long)NFS_FILEID(inode),
-				bytes,
-				(unsigned long long)data->args.offset);
 
 		started += bytes;
 		user_addr += bytes;
@@ -448,12 +462,15 @@ static void nfs_direct_free_writedata(struct nfs_direct_req *dreq)
 }
 
 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
+static long nfs_direct_write_execute(struct nfs_write_data *data,
+				     struct rpc_task_setup *task_setup_data,
+				     struct rpc_message *msg);
+
 static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
 {
 	struct inode *inode = dreq->inode;
 	struct list_head *p;
 	struct nfs_write_data *data;
-	struct rpc_task *task;
 	struct rpc_message msg = {
 		.rpc_cred = dreq->ctx->cred,
 	};
@@ -487,25 +504,7 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
 		 * Reuse data->task; data->args should not have changed
 		 * since the original request was sent.
 		 */
-		task_setup_data.task = &data->task;
-		task_setup_data.callback_data = data;
-		msg.rpc_argp = &data->args;
-		msg.rpc_resp = &data->res;
-		NFS_PROTO(inode)->write_setup(data, &msg);
-
-		/*
-		 * We're called via an RPC callback, so BKL is already held.
-		 */
-		task = rpc_run_task(&task_setup_data);
-		if (!IS_ERR(task))
-			rpc_put_task(task);
-
-		dprintk("NFS: %5u rescheduled direct write call (req %s/%Ld, %u bytes @ offset %Lu)\n",
-				data->task.tk_pid,
-				inode->i_sb->s_id,
-				(long long)NFS_FILEID(inode),
-				data->args.count,
-				(unsigned long long)data->args.offset);
+		nfs_direct_write_execute(data, &task_setup_data, &msg);
 	}
 
 	if (put_dreq(dreq))
@@ -548,10 +547,31 @@ static const struct rpc_call_ops nfs_commit_direct_ops = {
 	.rpc_release = nfs_direct_commit_release,
 };
 
+static long nfs_direct_commit_execute(struct nfs_direct_req *dreq,
+				      struct nfs_write_data *data,
+				      struct rpc_task_setup *task_setup_data,
+				      struct rpc_message *msg)
+{
+	struct rpc_task *task;
+
+	NFS_PROTO(data->inode)->commit_setup(data, msg);
+
+	/* Note: task.tk_ops->rpc_release will free dreq->commit_data */
+	dreq->commit_data = NULL;
+
+	dprintk("NFS: %5u initiated commit call\n", data->task.tk_pid);
+
+	task = rpc_run_task(task_setup_data);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+
+	rpc_put_task(task);
+	return 0;
+}
+
 static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
 {
 	struct nfs_write_data *data = dreq->commit_data;
-	struct rpc_task *task;
 	struct rpc_message msg = {
 		.rpc_argp = &data->args,
 		.rpc_resp = &data->res,
@@ -579,16 +599,7 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
 	data->res.verf = &data->verf;
 	nfs_fattr_init(&data->fattr);
 
-	NFS_PROTO(data->inode)->commit_setup(data, &msg);
-
-	/* Note: task.tk_ops->rpc_release will free dreq->commit_data */
-	dreq->commit_data = NULL;
-
-	dprintk("NFS: %5u initiated commit call\n", data->task.tk_pid);
-
-	task = rpc_run_task(&task_setup_data);
-	if (!IS_ERR(task))
-		rpc_put_task(task);
+	nfs_direct_commit_execute(dreq, data, &task_setup_data, &msg);
 }
 
 static void nfs_direct_write_complete(struct nfs_direct_req *dreq, struct inode *inode)
@@ -690,6 +701,36 @@ static const struct rpc_call_ops nfs_write_direct_ops = {
 	.rpc_release = nfs_direct_write_release,
 };
 
+static long nfs_direct_write_execute(struct nfs_write_data *data,
+				     struct rpc_task_setup *task_setup_data,
+				     struct rpc_message *msg)
+{
+	struct inode *inode = data->inode;
+	struct rpc_task *task;
+
+	task_setup_data->task = &data->task;
+	task_setup_data->callback_data = data;
+	msg->rpc_argp = &data->args;
+	msg->rpc_resp = &data->res;
+	NFS_PROTO(inode)->write_setup(data, msg);
+
+	task = rpc_run_task(task_setup_data);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+
+	rpc_put_task(task);
+
+	dprintk("NFS: %5u initiated direct write call "
+		"(req %s/%lld, %u bytes @ offset %llu)\n",
+		data->task.tk_pid,
+		inode->i_sb->s_id,
+		(long long)NFS_FILEID(inode),
+		data->args.count,
+		(unsigned long long)data->args.offset);
+
+	return 0;
+}
+
 /*
  * For each wsize'd chunk of the user's buffer, dispatch an NFS WRITE
  * operation.  If nfs_writedata_alloc() or get_user_pages() fails,
@@ -705,7 +746,6 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
 	struct inode *inode = ctx->path.dentry->d_inode;
 	unsigned long user_addr = (unsigned long)iov->iov_base;
 	size_t count = iov->iov_len;
-	struct rpc_task *task;
 	struct rpc_message msg = {
 		.rpc_cred = ctx->cred,
 	};
@@ -771,24 +811,8 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
 		data->res.verf = &data->verf;
 		nfs_fattr_init(&data->fattr);
 
-		task_setup_data.task = &data->task;
-		task_setup_data.callback_data = data;
-		msg.rpc_argp = &data->args;
-		msg.rpc_resp = &data->res;
-		NFS_PROTO(inode)->write_setup(data, &msg);
-
-		task = rpc_run_task(&task_setup_data);
-		if (IS_ERR(task))
+		if (nfs_direct_write_execute(data, &task_setup_data, &msg))
 			break;
-		rpc_put_task(task);
-
-		dprintk("NFS: %5u initiated direct write call "
-			"(req %s/%Ld, %zu bytes @ offset %Lu)\n",
-				data->task.tk_pid,
-				inode->i_sb->s_id,
-				(long long)NFS_FILEID(inode),
-				bytes,
-				(unsigned long long)data->args.offset);
 
 		started += bytes;
 		user_addr += bytes;
-- 
1.6.2.5



* [PATCH 48/50] SQUASHME pnfs_post_submit: layout type enum
  2010-08-13 21:31                                                                                             ` [PATCH 47/50] SQUASHME pnfs_post_submit: direct i/o andros
@ 2010-08-13 21:32                                                                                               ` andros
  2010-08-13 21:32                                                                                                 ` [PATCH 49/50] SQUASHME pnfs_post_submit: cb notify deviceid declarations andros
  2010-08-31 15:52                                                                                                 ` [PATCH 48/50] SQUASHME pnfs_post_submit: layout type enum Boaz Harrosh
  0 siblings, 2 replies; 69+ messages in thread
From: andros @ 2010-08-13 21:32 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 include/linux/nfs4.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 6ee7357..2d3d277 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -579,6 +579,8 @@ enum state_protect_how4 {
 
 enum pnfs_layouttype {
 	LAYOUT_NFSV4_1_FILES  = 1,
+	LAYOUT_OSD2_OBJECTS = 2,
+	LAYOUT_BLOCK_VOLUME = 3,
 };
 
 /* used for both layout return and recall */
-- 
1.6.2.5



* [PATCH 49/50] SQUASHME pnfs_post_submit: cb notify deviceid declarations
  2010-08-13 21:32                                                                                               ` [PATCH 48/50] SQUASHME pnfs_post_submit: layout type enum andros
@ 2010-08-13 21:32                                                                                                 ` andros
  2010-08-13 21:32                                                                                                   ` [PATCH 50/50] SQUASHME pnfs_submit: remove this unused code andros
  2010-08-31 15:52                                                                                                 ` [PATCH 48/50] SQUASHME pnfs_post_submit: layout type enum Boaz Harrosh
  1 sibling, 1 reply; 69+ messages in thread
From: andros @ 2010-08-13 21:32 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/callback.h    |   17 +++++++++++++++++
 include/linux/nfs4.h |    5 +++++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
index ab9b421..b39ac86 100644
--- a/fs/nfs/callback.h
+++ b/fs/nfs/callback.h
@@ -152,6 +152,23 @@ struct cb_pnfs_layoutrecallargs {
 extern unsigned pnfs_cb_layoutrecall(struct cb_pnfs_layoutrecallargs *args,
 				     void *dummy);
 
+struct cb_pnfs_devicenotifyitem {
+	uint32_t		cbd_notify_type;
+	uint32_t		cbd_layout_type;
+	struct pnfs_deviceid	cbd_dev_id;
+	uint32_t		cbd_immediate;
+};
+
+/* XXX: Should be dynamic up to max compound size */
+#define NFS4_DEV_NOTIFY_MAXENTRIES 10
+struct cb_pnfs_devicenotifyargs {
+	struct sockaddr			*addr;
+	int				 ndevs;
+	struct cb_pnfs_devicenotifyitem	 devs[NFS4_DEV_NOTIFY_MAXENTRIES];
+};
+
+extern unsigned pnfs_cb_devicenotify(struct cb_pnfs_devicenotifyargs *args,
+				     void *dummy);
 #endif /* CONFIG_NFS_V4_1 */
 
 extern __be32 nfs4_callback_getattr(struct cb_getattrargs *args, struct cb_getattrres *res);
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 2d3d277..e947a32 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -596,6 +596,11 @@ enum pnfs_iomode {
 	IOMODE_ANY = 3,
 };
 
+enum pnfs_notify_deviceid_type4 {
+	NOTIFY_DEVICEID4_CHANGE = 1 << 1,
+	NOTIFY_DEVICEID4_DELETE = 1 << 2,
+};
+
 #define NFL4_UFLG_MASK			0x0000003F
 #define NFL4_UFLG_DENSE			0x00000001
 #define NFL4_UFLG_COMMIT_THRU_MDS	0x00000002
-- 
1.6.2.5



* [PATCH 50/50] SQUASHME pnfs_submit: remove this unused code
  2010-08-13 21:32                                                                                                 ` [PATCH 49/50] SQUASHME pnfs_post_submit: cb notify deviceid declarations andros
@ 2010-08-13 21:32                                                                                                   ` andros
  2010-08-19 20:25                                                                                                     ` Benny Halevy
  2010-08-31 16:32                                                                                                     ` Boaz Harrosh
  0 siblings, 2 replies; 69+ messages in thread
From: andros @ 2010-08-13 21:32 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/client.c           |    1 +
 fs/nfs/file.c             |    1 +
 fs/nfs/inode.c            |   10 +++++++++-
 fs/nfs/nfs3proc.c         |    1 +
 fs/nfs/nfs4_fs.h          |    1 +
 fs/nfs/nfs4filelayout.c   |    3 +++
 fs/nfs/nfs4filelayout.h   |    6 ++++++
 fs/nfs/nfs4proc.c         |    2 ++
 fs/nfs/pnfs.h             |   15 +++++++++++++++
 fs/nfs/proc.c             |    1 +
 fs/nfs/super.c            |   25 +++++++++++++++++++++++++
 fs/nfs/write.c            |   11 +++++++++--
 include/linux/nfs4.h      |    9 +++++++++
 include/linux/nfs4_pnfs.h |   18 ++++++++++++++++++
 include/linux/nfs_xdr.h   |    2 ++
 include/linux/pnfs_xdr.h  |    6 +++---
 16 files changed, 106 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index b53f61c..38ef02f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -39,6 +39,7 @@
 #include <net/ipv6.h>
 #include <linux/nfs_xdr.h>
 #include <linux/sunrpc/bc_xprt.h>
+#include <linux/nfs4_pnfs.h>
 
 #include <asm/system.h>
 
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index d0ed767..bf17633 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -28,6 +28,7 @@
 #include <linux/aio.h>
 #include <linux/gfp.h>
 #include <linux/swap.h>
+#include <linux/pnfs_xdr.h>
 
 #include <asm/uaccess.h>
 #include <asm/system.h>
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 30d9ac6..6132f6b 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -279,7 +279,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr)
 		 */
 		inode->i_op = NFS_SB(sb)->nfs_client->rpc_ops->file_inode_ops;
 		if (S_ISREG(inode->i_mode)) {
-			inode->i_fop = &nfs_file_operations;
+			inode->i_fop = NFS_SB(sb)->nfs_client->rpc_ops->file_ops;
 			inode->i_data.a_ops = &nfs_file_aops;
 			inode->i_data.backing_dev_info = &NFS_SB(sb)->backing_dev_info;
 		} else if (S_ISDIR(inode->i_mode)) {
@@ -1207,6 +1207,14 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 		server->fsid = fattr->fsid;
 
 	/*
+	 * file needs layout commit, server attributes may be stale
+	 */
+	if (layoutcommit_needed(nfsi) && nfsi->change_attr >= fattr->change_attr) {
+		dprintk("NFS: %s: layoutcommit is needed for file %s/%ld\n",
+			__func__, inode->i_sb->s_id, inode->i_ino);
+		return 0;
+	}
+	/*
 	 * Update the read time so we don't revalidate too often.
 	 */
 	nfsi->read_cache_jiffies = fattr->time_start;
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index fabb4f2..304c63c 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -833,6 +833,7 @@ const struct nfs_rpc_ops nfs_v3_clientops = {
 	.dentry_ops	= &nfs_dentry_operations,
 	.dir_inode_ops	= &nfs3_dir_inode_operations,
 	.file_inode_ops	= &nfs3_file_inode_operations,
+	.file_ops	= &nfs_file_operations,
 	.getroot	= nfs3_proc_get_root,
 	.getattr	= nfs3_proc_getattr,
 	.setattr	= nfs3_proc_setattr,
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index d6440fc..dd584e5 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -334,6 +334,7 @@ extern void nfs_increment_lock_seqid(int status, struct nfs_seqid *seqid);
 extern void nfs_release_seqid(struct nfs_seqid *seqid);
 extern void nfs_free_seqid(struct nfs_seqid *seqid);
 
+/* write.c */
 extern const nfs4_stateid zero_stateid;
 
 /* nfs4xdr.c */
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index fea1772..b2ce478 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -65,6 +65,9 @@ MODULE_DESCRIPTION("The NFSv4 file layout driver");
 /* Callback operations to the pNFS client */
 struct pnfs_client_operations *pnfs_callback_ops;
 
+/* Forward declaration */
+struct layoutdriver_io_operations filelayout_io_operations;
+
 int
 filelayout_initialize_mountpoint(struct nfs_client *clp)
 {
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index f8f7c05..1de176d 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -22,6 +22,7 @@
 
 #define NFS4_PNFS_MAX_STRIPE_CNT 4096
 #define NFS4_PNFS_MAX_MULTI_CNT  64 /* 256 fit into a u8 stripe_index */
+#define NFS4_PNFS_MAX_MULTI_DS   2
 
 #define FILE_DSADDR(lseg) (container_of(lseg->deviceid, \
 					struct nfs4_file_layout_dsaddr, \
@@ -50,6 +51,11 @@ struct nfs4_file_layout_dsaddr {
 	struct nfs4_pnfs_ds	*ds_list[1];
 };
 
+struct nfs4_pnfs_dev_hlist {
+	rwlock_t		dev_lock;
+	struct hlist_head	dev_list[NFS4_PNFS_DEV_HASH_SIZE];
+};
+
 struct nfs4_filelayout_segment {
 	u32 stripe_type;
 	u32 commit_through_mds;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 8879fab..05f072c 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4711,6 +4711,7 @@ int nfs4_proc_exchange_id(struct nfs_client *clp, struct rpc_cred *cred)
 	dprintk("<-- %s status= %d\n", __func__, status);
 	return status;
 }
+EXPORT_SYMBOL(nfs4_proc_exchange_id);
 
 struct nfs4_get_lease_time_data {
 	struct nfs4_get_lease_time_args *args;
@@ -5887,6 +5888,7 @@ const struct nfs_rpc_ops nfs_v4_clientops = {
 	.dentry_ops	= &nfs4_dentry_operations,
 	.dir_inode_ops	= &nfs4_dir_inode_operations,
 	.file_inode_ops	= &nfs4_file_inode_operations,
+	.file_ops	= &nfs_file_operations,
 	.getroot	= nfs4_proc_get_root,
 	.getattr	= nfs4_proc_getattr,
 	.setattr	= nfs4_proc_setattr,
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 80f67c7..78d5c30 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -30,6 +30,8 @@ extern int pnfs4_proc_layoutcommit(struct pnfs_layoutcommit_data *data,
 extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait);
 
 /* pnfs.c */
+extern const nfs4_stateid zero_stateid;
+
 void put_lseg(struct pnfs_layout_segment *lseg);
 void _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
 	enum pnfs_iomode access_type,
@@ -69,6 +71,9 @@ void pnfs_get_layout_stateid(nfs4_stateid *dst, struct pnfs_layout_type *lo);
 #define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld &&	\
 				     (srv)->pnfs_curr_ld->ld_io_ops &&	\
 				     (srv)->pnfs_curr_ld->ld_io_ops->opname)
+#define PNFS_EXISTS_LDPOLICY_OP(srv, opname) ((srv)->pnfs_curr_ld &&	\
+				     (srv)->pnfs_curr_ld->ld_policy_ops && \
+				     (srv)->pnfs_curr_ld->ld_policy_ops->opname)
 
 #define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
 
@@ -176,6 +181,16 @@ pnfs_try_to_commit(struct nfs_write_data *data,
 	return PNFS_NOT_ATTEMPTED;
 }
 
+static inline int pnfs_get_write_status(struct nfs_write_data *data)
+{
+	return 0;
+}
+
+static inline int pnfs_get_read_status(struct nfs_read_data *data)
+{
+	return 0;
+}
+
 static inline int pnfs_layoutcommit_inode(struct inode *inode, int sync)
 {
 	return 0;
diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index 737160d..1837a05 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -694,6 +694,7 @@ const struct nfs_rpc_ops nfs_v2_clientops = {
 	.dentry_ops	= &nfs_dentry_operations,
 	.dir_inode_ops	= &nfs_dir_inode_operations,
 	.file_inode_ops	= &nfs_file_inode_operations,
+	.file_ops	= &nfs_file_operations,
 	.getroot	= nfs_proc_get_root,
 	.getattr	= nfs_proc_getattr,
 	.setattr	= nfs_proc_setattr,
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index f9df16d..cd9f8d4 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -64,6 +64,7 @@
 #include "iostat.h"
 #include "internal.h"
 #include "fscache.h"
+#include "pnfs.h"
 
 #define NFSDBG_FACILITY		NFSDBG_VFS
 
@@ -669,6 +670,28 @@ static int nfs_show_options(struct seq_file *m, struct vfsmount *mnt)
 
 	return 0;
 }
+#ifdef CONFIG_NFS_V4_1
+void show_sessions(struct seq_file *m, struct nfs_server *server)
+{
+	if (nfs4_has_session(server->nfs_client))
+		seq_printf(m, ",sessions");
+}
+#else
+void show_sessions(struct seq_file *m, struct nfs_server *server) {}
+#endif
+
+#ifdef CONFIG_NFS_V4_1
+void show_pnfs(struct seq_file *m, struct nfs_server *server)
+{
+	seq_printf(m, ",pnfs=");
+	if (server->pnfs_curr_ld)
+		seq_printf(m, "%s", server->pnfs_curr_ld->name);
+	else
+		seq_printf(m, "not configured");
+}
+#else  /* CONFIG_NFS_V4_1 */
+void show_pnfs(struct seq_file *m, struct nfs_server *server) {}
+#endif /* CONFIG_NFS_V4_1 */
 
 /*
  * Present statistical information for this VFS mountpoint
@@ -707,6 +730,8 @@ static int nfs_show_stats(struct seq_file *m, struct vfsmount *mnt)
 		seq_printf(m, "bm0=0x%x", nfss->attr_bitmask[0]);
 		seq_printf(m, ",bm1=0x%x", nfss->attr_bitmask[1]);
 		seq_printf(m, ",acl=0x%x", nfss->acl_bitmask);
+		show_sessions(m, nfss);
+		show_pnfs(m, nfss);
 	}
 #endif
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 2251551..99af688 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -20,6 +20,7 @@
 #include <linux/nfs_mount.h>
 #include <linux/nfs_page.h>
 #include <linux/backing-dev.h>
+#include <linux/module.h>
 
 #include <asm/uaccess.h>
 
@@ -68,6 +69,7 @@ void nfs_commit_free(struct nfs_write_data *p)
 		kfree(p->pagevec);
 	mempool_free(p, nfs_commit_mempool);
 }
+EXPORT_SYMBOL(nfs_commit_free);
 
 struct nfs_write_data *nfs_writedata_alloc(unsigned int pagecount)
 {
@@ -1252,6 +1254,9 @@ int nfs_writeback_done(struct rpc_task *task, struct nfs_write_data *data)
 	if (task->tk_status >= 0 && resp->count < argp->count) {
 		static unsigned long    complain;
 
+		dprintk("NFS:       short write:"
+			" (resp->count %u) < (argp->count = %u)\n",
+			resp->count, argp->count);
 		nfs_inc_stats(data->inode, NFSIOS_SHORTWRITE);
 
 		/* Has the server at least made some progress? */
@@ -1268,7 +1273,7 @@ int nfs_writeback_done(struct rpc_task *task, struct nfs_write_data *data)
 				 */
 				argp->stable = NFS_FILE_SYNC;
 			}
-			nfs_restart_rpc(task, server->nfs_client);
+			nfs_restart_rpc(task, clp);
 			return -EAGAIN;
 		}
 		if (time_before(complain, jiffies)) {
@@ -1607,6 +1612,7 @@ int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
  */
 int nfs_wb_all(struct inode *inode)
 {
+	int ret;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_ALL,
 		.nr_to_write = LONG_MAX,
@@ -1614,7 +1620,8 @@ int nfs_wb_all(struct inode *inode)
 		.range_end = LLONG_MAX,
 	};
 
-	return sync_inode(inode, &wbc);
+	ret = sync_inode(inode, &wbc);
+	return ret;
 }
 
 int nfs_wb_page_cancel(struct inode *inode, struct page *page)
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index e947a32..e6748cd 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -492,6 +492,7 @@ enum lock_type4 {
 #define FATTR4_WORD1_TIME_MODIFY_SET    (1UL << 22)
 #define FATTR4_WORD1_MOUNTED_ON_FILEID  (1UL << 23)
 #define FATTR4_WORD1_FS_LAYOUT_TYPES    (1UL << 30)
+#define FATTR4_WORD2_LAYOUT_BLKSIZE     (1UL << 1)
 
 #define NFSPROC4_NULL 0
 #define NFSPROC4_COMPOUND 1
@@ -606,6 +607,14 @@ enum pnfs_notify_deviceid_type4 {
 #define NFL4_UFLG_COMMIT_THRU_MDS	0x00000002
 #define NFL4_UFLG_STRIPE_UNIT_SIZE_MASK	0xFFFFFFC0
 
+/* Encoded in the loh_body field of type layouthint4 */
+enum filelayout_hint_care4 {
+	NFLH4_CARE_DENSE		= NFL4_UFLG_DENSE,
+	NFLH4_CARE_COMMIT_THRU_MDS	= NFL4_UFLG_COMMIT_THRU_MDS,
+	NFLH4_CARE_STRIPE_UNIT_SIZE	= 0x00000040,
+	NFLH4_CARE_STRIPE_COUNT		= 0x00000080
+};
+
 #endif
 #endif
 
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index ef160e6..6236687 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -20,6 +20,8 @@ enum pnfs_try_status {
 	PNFS_NOT_ATTEMPTED = 1,
 };
 
+#define NFS4_PNFS_GETDEVLIST_MAXNUM 16
+
 /* Per-layout driver specific registration structure */
 struct pnfs_layoutdriver_type {
 	const u32 id;
@@ -60,6 +62,12 @@ PNFS_LD_IO_OPS(struct pnfs_layout_type *lo)
 	return PNFS_LD(lo)->ld_io_ops;
 }
 
+static inline struct layoutdriver_policy_operations *
+PNFS_LD_POLICY_OPS(struct pnfs_layout_type *lo)
+{
+	return PNFS_LD(lo)->ld_policy_ops;
+}
+
 static inline bool
 has_layout(struct nfs_inode *nfsi)
 {
@@ -165,6 +173,12 @@ struct pnfs_device {
 	unsigned int  dev_notify_types;
 };
 
+struct pnfs_devicelist {
+	unsigned int		eof;
+	unsigned int		num_devs;
+	struct pnfs_deviceid	dev_id[NFS4_PNFS_GETDEVLIST_MAXNUM];
+};
+
 /*
  * Device ID RCU cache. A device ID is unique per client ID and layout type.
  */
@@ -222,6 +236,7 @@ extern void nfs4_unset_layout_deviceid(struct pnfs_layout_segment *,
 struct pnfs_client_operations {
 	int (*nfs_getdeviceinfo) (struct nfs_server *,
 				  struct pnfs_device *dev);
+	void (*nfs_return_layout) (struct inode *);
 };
 
 extern struct pnfs_client_operations pnfs_ops;
@@ -229,4 +244,7 @@ extern struct pnfs_client_operations pnfs_ops;
 extern struct pnfs_client_operations *pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
 extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
 
+#define NFS4_PNFS_MAX_LAYOUTS 4
+#define NFS4_PNFS_PRIVATE_LAYOUT 0x80000000
+
 #endif /* LINUX_NFS4_PNFS_H */
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 42c5ccf..4afeaeb 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1046,6 +1046,7 @@ struct nfs_rpc_ops {
 	const struct dentry_operations *dentry_ops;
 	const struct inode_operations *dir_inode_ops;
 	const struct inode_operations *file_inode_ops;
+	const struct file_operations *file_ops;
 
 	int	(*getroot) (struct nfs_server *, struct nfs_fh *,
 			    struct nfs_fsinfo *);
@@ -1110,6 +1111,7 @@ struct nfs_rpc_ops {
 extern const struct nfs_rpc_ops	nfs_v2_clientops;
 extern const struct nfs_rpc_ops	nfs_v3_clientops;
 extern const struct nfs_rpc_ops	nfs_v4_clientops;
+extern const struct nfs_rpc_ops	pnfs_v4_clientops;
 extern struct rpc_version	nfs_version2;
 extern struct rpc_version	nfs_version3;
 extern struct rpc_version	nfs_version4;
diff --git a/include/linux/pnfs_xdr.h b/include/linux/pnfs_xdr.h
index ed16c65..c9a01b3 100644
--- a/include/linux/pnfs_xdr.h
+++ b/include/linux/pnfs_xdr.h
@@ -89,9 +89,9 @@ struct pnfs_layoutcommit_data {
 };
 
 struct nfs4_pnfs_layoutreturn_arg {
-	__u32   reclaim;
-	__u32   layout_type;
-	__u32   return_type;
+	__u32	reclaim;
+	__u32	layout_type;
+	__u32	return_type;
 	struct nfs4_pnfs_layout_segment lseg;
 	struct inode *inode;
 	struct nfs4_sequence_args seq_args;
-- 
1.6.2.5



* Re: [PATCH 05/50] pnfs_submit: pnfs and nfslayoutdriver kconfig
  2010-08-13 21:31         ` [PATCH 05/50] pnfs_submit: pnfs and nfslayoutdriver kconfig andros
  2010-08-13 21:31           ` [PATCH 06/50] pnfs_submit: introduce include/linux/nfs4_pnfs.h andros
@ 2010-08-18 20:25           ` Christoph Hellwig
  2010-08-18 21:09             ` Benny Halevy
  1 sibling, 1 reply; 69+ messages in thread
From: Christoph Hellwig @ 2010-08-18 20:25 UTC (permalink / raw)
  To: andros; +Cc: bhalevy, linux-nfs

On Fri, Aug 13, 2010 at 05:31:17PM -0400, andros@netapp.com wrote:
> From: The pNFS Team <linux-nfs@vger.kernel.org>
> 
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>

If you split out the Kconfig from the code it guards it should come last
in the series so that the code can't be enabled until it's complete.

>  	depends on NFS_V4 && EXPERIMENTAL
>  	help
>  	  This option enables support for minor version 1 of the NFSv4 protocol
> -	  (draft-ietf-nfsv4-minorversion1) in the kernel's NFS client.
> +	  (RFC5661) including support for the parallel NFS (pNFS) features
> +	  in the kernel's NFS client.
> +
> +	  Unless you're an NFS developer, say N.

How much code does the pnfs support add to the nfs.ko module?  Just
including this unconditionally might not be all that good an idea.

Also Trond seemed to be pretty determined to split up nfs.ko into
version specific modules, and pnfs and the various layout drivers are
pretty good candidates for that.  Maybe the pnfs patches should be on top
of that?

> +config PNFS_FILE_LAYOUT
> +	tristate "NFS client support for the pNFS nfs-files layout (DEVELOPER ONLY)"
> +	depends on NFS_FS && NFS_V4_1

Is it really developers-only at this point?  And if so, why doesn't
it depend on EXPERIMENTAL?

> +	default y

defaulting to y for random fringe features is frowned upon.



* Re: [PATCH 06/50] pnfs_submit: introduce include/linux/nfs4_pnfs.h
  2010-08-13 21:31           ` [PATCH 06/50] pnfs_submit: introduce include/linux/nfs4_pnfs.h andros
  2010-08-13 21:31             ` [PATCH 07/50] pnfs_submit: introduce include/linux/pnfs_xdr.h andros
@ 2010-08-18 20:27             ` Christoph Hellwig
  2010-08-18 20:48               ` William A. (Andy) Adamson
  2010-08-18 20:50               ` Benny Halevy
  1 sibling, 2 replies; 69+ messages in thread
From: Christoph Hellwig @ 2010-08-18 20:27 UTC (permalink / raw)
  To: andros; +Cc: bhalevy, linux-nfs, Dean Hildebrand

> +++ b/include/linux/nfs4_pnfs.h
> @@ -0,0 +1,15 @@
> +/*
> + *  include/linux/nfs4_pnfs.h

Please don't include these kinds of comments; the only purpose they
serve is frequently getting out of date.  That applies to just about
every file added in this series.

> + *
> + *  Common data structures needed by the pnfs client and pnfs layout driver.
> + *
> + *  Copyright (c) 2002 The Regents of the University of Michigan.
> + *  All rights reserved.
> + *
> + *  Dean Hildebrand   <dhildebz-nNDzPDmKTdnSiEDVxGk4TQ@public.gmane.org>
> + */
> +
> +#ifndef LINUX_NFS4_PNFS_H
> +#define LINUX_NFS4_PNFS_H
> +
> +#endif /* LINUX_NFS4_PNFS_H */

Adding a file that only contains copyrights and inclusion headers is
rather odd.  I think you want your split a little more coarse-grained.



* Re: [PATCH 09/50] pnfs_submit: introduce fs/nfs/pnfs.c
  2010-08-13 21:31                 ` [PATCH 09/50] pnfs_submit: introduce fs/nfs/pnfs.c andros
  2010-08-13 21:31                   ` [PATCH 10/50] pnfs_submit: register unregister pnfs module andros
@ 2010-08-18 20:28                   ` Christoph Hellwig
  2010-08-19 17:21                     ` J. Bruce Fields
  1 sibling, 1 reply; 69+ messages in thread
From: Christoph Hellwig @ 2010-08-18 20:28 UTC (permalink / raw)
  To: andros; +Cc: bhalevy, linux-nfs, Dean Hildebrand

This file uses a different license than the rest of Linux.  I think
you at least need to dual-license it under the GPLv2.



* Re: [PATCH 10/50] pnfs_submit: register unregister pnfs module
  2010-08-13 21:31                   ` [PATCH 10/50] pnfs_submit: register unregister pnfs module andros
  2010-08-13 21:31                     ` [PATCH 11/50] pnfs_submit: set and unset pnfs layoutdriver modules andros
@ 2010-08-18 20:29                     ` Christoph Hellwig
  2010-08-18 20:49                       ` Benny Halevy
  1 sibling, 1 reply; 69+ messages in thread
From: Christoph Hellwig @ 2010-08-18 20:29 UTC (permalink / raw)
  To: andros; +Cc: bhalevy, linux-nfs

> +#ifdef CONFIG_NFS_V4_1
> +	err = pnfs_initialize();
> +	if (err)
> +		goto out00;
> +#endif /* CONFIG_NFS_V4_1 */

Unless the pnfs code goes into its own module anyway this really
screams for stubs for the !CONFIG_NFS_V4_1 case to avoid all the
ifdef mess.
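
The stub approach suggested above can be sketched roughly as follows.
This is a minimal user-space illustration, not code from the series:
init_nfs_fs_sketch() is a hypothetical caller modelled on the quoted
hunk, the body of the "enabled" pnfs_initialize() is a stand-in, and in
a real tree CONFIG_NFS_V4_1 would come from Kconfig and the stub would
sit in a header such as fs/nfs/pnfs.h.

```c
#include <stdio.h>

/*
 * Hypothetical user-space sketch. CONFIG_NFS_V4_1 normally comes from
 * Kconfig; uncomment the define to build the "enabled" variant.
 */
/* #define CONFIG_NFS_V4_1 1 */

#ifdef CONFIG_NFS_V4_1
/* Stand-in for the real initializer (assumed to live in fs/nfs/pnfs.c). */
static int pnfs_initialize(void)
{
	puts("pnfs: initialized");
	return 0;
}
#else
/* Stub for the !CONFIG_NFS_V4_1 case: always reports success. */
static inline int pnfs_initialize(void)
{
	return 0;
}
#endif /* CONFIG_NFS_V4_1 */

/* Hypothetical caller: the call site needs no #ifdef at all. */
static int init_nfs_fs_sketch(void)
{
	int err;

	err = pnfs_initialize();
	if (err)
		goto out;

	/* ... further initialization steps would follow here ... */
out:
	return err;
}
```

With the stub in place the conditional lives in one spot next to the
declaration, and the compiler can drop the no-op call entirely when the
option is off.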


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 11/50] pnfs_submit: set and unset pnfs layoutdriver modules
  2010-08-13 21:31                     ` [PATCH 11/50] pnfs_submit: set and unset pnfs layoutdriver modules andros
  2010-08-13 21:31                       ` [PATCH 12/50] pnfs_submit: generic pnfs deviceid cache andros
@ 2010-08-18 20:31                       ` Christoph Hellwig
  2010-08-18 20:46                         ` Benny Halevy
  1 sibling, 1 reply; 69+ messages in thread
From: Christoph Hellwig @ 2010-08-18 20:31 UTC (permalink / raw)
  To: andros; +Cc: bhalevy, linux-nfs

I don't think such a fine-grained splitup is reviewable at all.
Anything more than a couple of patches for core nfs client changes, 
one patch for core pnfs support and one for the file layout driver
is simply too much.


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 11/50] pnfs_submit: set and unset pnfs layoutdriver modules
  2010-08-18 20:31                       ` [PATCH 11/50] pnfs_submit: set and unset pnfs layoutdriver modules Christoph Hellwig
@ 2010-08-18 20:46                         ` Benny Halevy
  2010-08-19  9:43                           ` Christoph Hellwig
  0 siblings, 1 reply; 69+ messages in thread
From: Benny Halevy @ 2010-08-18 20:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: andros, linux-nfs

On Aug. 18, 2010, 23:31 +0300, Christoph Hellwig <hch@infradead.org> wrote:
> I don't think such a fine-grained splitup is reviewable at all.
> Anything more than a couple of patches for core nfs client changes, 
> one patch for core pnfs support and one for the file layout driver
> is simply too much.
> 

The whole enchilada is about 8500 lines of diff.
Don't you think that splitting it up into self-contained bite-sized patches
is better for reviewing it?

Benny

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 06/50] pnfs_submit: introduce include/linux/nfs4_pnfs.h
  2010-08-18 20:27             ` [PATCH 06/50] pnfs_submit: introduce include/linux/nfs4_pnfs.h Christoph Hellwig
@ 2010-08-18 20:48               ` William A. (Andy) Adamson
  2010-08-18 20:50               ` Benny Halevy
  1 sibling, 0 replies; 69+ messages in thread
From: William A. (Andy) Adamson @ 2010-08-18 20:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: bhalevy, linux-nfs, Dean Hildebrand

On Wed, Aug 18, 2010 at 4:27 PM, Christoph Hellwig <hch@infradead.org> wrote:
>> +++ b/include/linux/nfs4_pnfs.h
>> @@ -0,0 +1,15 @@
>> +/*
>> + *  include/linux/nfs4_pnfs.h
>
> Please don't include these kinds of comments, the only purpose they
> serve is frequently getting out of date.  That applies to just about
> every file added in this series.
>
>> + *
>> + *  Common data structures needed by the pnfs client and pnfs layout driver.
>> + *
>> + *  Copyright (c) 2002 The Regents of the University of Michigan.
>> + *  All rights reserved.
>> + *
>> + *  Dean Hildebrand   <dhildebz-nNDzPDmKTdnSiEDVxGk4TQ@public.gmane.org>
>> + */
>> +
>> +#ifndef LINUX_NFS4_PNFS_H
>> +#define LINUX_NFS4_PNFS_H
>> +
>> +#endif /* LINUX_NFS4_PNFS_H */
>
> Adding a file that only contains copyrights and inclusion headers is
> rather odd.  I think you want your split a little more coarse grained.

Agreed. This patch set is really a first go at squashing 281 patches
into 50. We know there is more re-org and cleanup to do; we just wanted
to do an initial re-org that keeps the tree the same as a first step.

Thanks for your comments.

-->Andy

>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 10/50] pnfs_submit: register unregister pnfs module
  2010-08-18 20:29                     ` [PATCH 10/50] pnfs_submit: register unregister pnfs module Christoph Hellwig
@ 2010-08-18 20:49                       ` Benny Halevy
  0 siblings, 0 replies; 69+ messages in thread
From: Benny Halevy @ 2010-08-18 20:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: andros, linux-nfs

On Aug. 18, 2010, 23:29 +0300, Christoph Hellwig <hch@infradead.org> wrote:
>> +#ifdef CONFIG_NFS_V4_1
>> +	err = pnfs_initialize();
>> +	if (err)
>> +		goto out00;
>> +#endif /* CONFIG_NFS_V4_1 */
> 
> Unless the pnfs code goes into its own module anyway this really
> screams for stubs for the !CONFIG_NFS_V4_1 case to avoid all the
> ifdef mess.
> 

In this case we followed the existing "style" in this function.
That said, I agree that a stub for the !CONFIG_NFS_V4_1 case will work better.

Benny

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 06/50] pnfs_submit: introduce include/linux/nfs4_pnfs.h
  2010-08-18 20:27             ` [PATCH 06/50] pnfs_submit: introduce include/linux/nfs4_pnfs.h Christoph Hellwig
  2010-08-18 20:48               ` William A. (Andy) Adamson
@ 2010-08-18 20:50               ` Benny Halevy
  1 sibling, 0 replies; 69+ messages in thread
From: Benny Halevy @ 2010-08-18 20:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: andros, linux-nfs, Dean Hildebrand

On Aug. 18, 2010, 23:27 +0300, Christoph Hellwig <hch@infradead.org> wrote:
>> +++ b/include/linux/nfs4_pnfs.h
>> @@ -0,0 +1,15 @@
>> +/*
>> + *  include/linux/nfs4_pnfs.h
> 
> Please don't include these kinds of comments, the only purpose they
> serve is frequently getting out of date.  That applies to just about
> every file added in this series.
> 

Agreed.

>> + *
>> + *  Common data structures needed by the pnfs client and pnfs layout driver.
>> + *
>> + *  Copyright (c) 2002 The Regents of the University of Michigan.
>> + *  All rights reserved.
>> + *
>> + *  Dean Hildebrand   <dhildebz-nNDzPDmKTdnSiEDVxGk4TQ@public.gmane.org>
>> + */
>> +
>> +#ifndef LINUX_NFS4_PNFS_H
>> +#define LINUX_NFS4_PNFS_H
>> +
>> +#endif /* LINUX_NFS4_PNFS_H */
> 
> Adding a file that only contains copyrights and inclusion headers is
> rather odd.  I think you want your split a little more coarse grained.
> 

Yup. This was originally done for patch mechanics reasons.
It can be just part of the first patch that uses the file.

Benny

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 05/50] pnfs_submit: pnfs and nfslayoutdriver kconfig
  2010-08-18 20:25           ` [PATCH 05/50] pnfs_submit: pnfs and nfslayoutdriver kconfig Christoph Hellwig
@ 2010-08-18 21:09             ` Benny Halevy
  2010-08-19  9:45               ` Christoph Hellwig
  0 siblings, 1 reply; 69+ messages in thread
From: Benny Halevy @ 2010-08-18 21:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: andros, linux-nfs

On Aug. 18, 2010, 23:25 +0300, Christoph Hellwig <hch@infradead.org> wrote:
> On Fri, Aug 13, 2010 at 05:31:17PM -0400, andros@netapp.com wrote:
>> From: The pNFS Team <linux-nfs@vger.kernel.org>
>>
>> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> 
> If you split out the Kconfig from the code it guards it should come last
> in the series so that the code can't be enabled until it's complete.
> 

But then it's harder to bisect it, or even make sure each patch
compiles :-(

>>  	depends on NFS_V4 && EXPERIMENTAL
>>  	help
>>  	  This option enables support for minor version 1 of the NFSv4 protocol
>> -	  (draft-ietf-nfsv4-minorversion1) in the kernel's NFS client.
>> +	  (RFC5661) including support for the parallel NFS (pNFS) features
>> +	  in the kernel's NFS client.
>> +
>> +	  Unless you're an NFS developer, say N.
> 
> How much code does the pnfs support add to the nfs.ko module?  Just
> including this unconditionally might not be an all that good idea.
> 
> Also Trond seemed to be pretty determined to split up nfs.ko into
> version specific modules, and pnfs and the various layout drivers are
> pretty good candidates for that.  Maybe the pnfs patches should be ontop
> of that?
> 

Maybe.  I guess it's up to Trond.  But as the team discussed previously,
we consider pNFS an integral part of NFSv4.1, and to reduce the number of
possible configurations we would like to keep the core pnfs functionality
under the same config option as NFS_V4_1.  To disable pNFS at compile time,
the user may still unconfigure the layout modules.

>> +config PNFS_FILE_LAYOUT
>> +	tristate "NFS client support for the pNFS nfs-files layout (DEVELOPER ONLY)"
>> +	depends on NFS_FS && NFS_V4_1
> 
> is it really developers-only at this point?

I think it should be this way, at least until we push the whole pnfs
branch (not just the files-layout only part) upstream.

> And if it is so why doesn't
> it depend on EXPERIMENTAL
> 

Good catch.  It should depend on experimental.

>> +	default y
> 
> defaulting to y for random fringe features is frowned upon.

The problem is that it will rot otherwise.
I'd rather disable it by default using a variable and provide a way to
enable it at run time.

Benny



^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 11/50] pnfs_submit: set and unset pnfs layoutdriver modules
  2010-08-18 20:46                         ` Benny Halevy
@ 2010-08-19  9:43                           ` Christoph Hellwig
  0 siblings, 0 replies; 69+ messages in thread
From: Christoph Hellwig @ 2010-08-19  9:43 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Christoph Hellwig, andros, linux-nfs

On Wed, Aug 18, 2010 at 11:46:15PM +0300, Benny Halevy wrote:
> On Aug. 18, 2010, 23:31 +0300, Christoph Hellwig <hch@infradead.org> wrote:
> > I don't think such a fine-grained splitup is reviewable at all.
> > Anything more than a couple of patches for core nfs client changes, 
> > one patch for core pnfs support and one for the file layout driver
> > is simply too much.
> > 
> 
> The whole enchilada is about 8500 lines of diff.
> Don't you think that splitting it up into self-contained bite-sized patches
> is better for reviewing it?

Depends on your definition of self contained.  If you actually have
a first patch with a functional and working subset and just keep adding
optimizations or new features that's okay, but otherwise it's pointless.
If you submit a single driver / feature it needs to be reviewed as a
whole and not as artificial pieces.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 05/50] pnfs_submit: pnfs and nfslayoutdriver kconfig
  2010-08-18 21:09             ` Benny Halevy
@ 2010-08-19  9:45               ` Christoph Hellwig
  0 siblings, 0 replies; 69+ messages in thread
From: Christoph Hellwig @ 2010-08-19  9:45 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Christoph Hellwig, andros, linux-nfs

On Thu, Aug 19, 2010 at 12:09:06AM +0300, Benny Halevy wrote:
> >> +	default y
> > 
> > defaulting to y for random fringe features is frowned upon.
> 
> The problem is that it will rot otherwise.
> I'd rather disable it by default using a variable and provide a way to
> enable it at run time.

That's unrelated.  Features should not default to y in Kconfig except
for very specific cases, e.g. when a new variable is required to be
set to keep working from a previous setup.  pNFS clearly does not fall
into that category.


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 09/50] pnfs_submit: introduce fs/nfs/pnfs.c
  2010-08-18 20:28                   ` [PATCH 09/50] pnfs_submit: introduce fs/nfs/pnfs.c Christoph Hellwig
@ 2010-08-19 17:21                     ` J. Bruce Fields
  0 siblings, 0 replies; 69+ messages in thread
From: J. Bruce Fields @ 2010-08-19 17:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: andros, bhalevy, linux-nfs, Dean Hildebrand

On Wed, Aug 18, 2010 at 04:28:24PM -0400, Christoph Hellwig wrote:
> This file uses a different license than the rest of Linux.  I think
> you at least need to dual-license it under the GPLv2.

It's a common BSD-like license, GPL-compatible, and already used
elsewhere.  There's no need to dual-license.

(Agreed with point elsewhere that this should be merged with another
patch.)

--b.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 50/50] SQUASHME pnfs_submit: remove this unused code
  2010-08-13 21:32                                                                                                   ` [PATCH 50/50] SQUASHME pnfs_submit: remove this unused code andros
@ 2010-08-19 20:25                                                                                                     ` Benny Halevy
  2010-08-31 16:32                                                                                                     ` Boaz Harrosh
  1 sibling, 0 replies; 69+ messages in thread
From: Benny Halevy @ 2010-08-19 20:25 UTC (permalink / raw)
  To: andros; +Cc: linux-nfs

Andy, I'm not sure everything in this patch can be dropped.
Please see comments below.

Thanks!

On Aug. 14, 2010, 0:32 +0300, andros@netapp.com wrote:
> From: Andy Adamson <andros@netapp.com>
> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> ---
>  fs/nfs/client.c           |    1 +
>  fs/nfs/file.c             |    1 +
>  fs/nfs/inode.c            |   10 +++++++++-
>  fs/nfs/nfs3proc.c         |    1 +
>  fs/nfs/nfs4_fs.h          |    1 +
>  fs/nfs/nfs4filelayout.c   |    3 +++
>  fs/nfs/nfs4filelayout.h   |    6 ++++++
>  fs/nfs/nfs4proc.c         |    2 ++
>  fs/nfs/pnfs.h             |   15 +++++++++++++++
>  fs/nfs/proc.c             |    1 +
>  fs/nfs/super.c            |   25 +++++++++++++++++++++++++
>  fs/nfs/write.c            |   11 +++++++++--
>  include/linux/nfs4.h      |    9 +++++++++
>  include/linux/nfs4_pnfs.h |   18 ++++++++++++++++++
>  include/linux/nfs_xdr.h   |    2 ++
>  include/linux/pnfs_xdr.h  |    6 +++---
>  16 files changed, 106 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index b53f61c..38ef02f 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -39,6 +39,7 @@
>  #include <net/ipv6.h>
>  #include <linux/nfs_xdr.h>
>  #include <linux/sunrpc/bc_xprt.h>
> +#include <linux/nfs4_pnfs.h>
>  
>  #include <asm/system.h>
>  
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index d0ed767..bf17633 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -28,6 +28,7 @@
>  #include <linux/aio.h>
>  #include <linux/gfp.h>
>  #include <linux/swap.h>
> +#include <linux/pnfs_xdr.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm/system.h>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 30d9ac6..6132f6b 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -279,7 +279,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr)
>  		 */
>  		inode->i_op = NFS_SB(sb)->nfs_client->rpc_ops->file_inode_ops;
>  		if (S_ISREG(inode->i_mode)) {
> -			inode->i_fop = &nfs_file_operations;
> +			inode->i_fop = NFS_SB(sb)->nfs_client->rpc_ops->file_ops;
>  			inode->i_data.a_ops = &nfs_file_aops;
>  			inode->i_data.backing_dev_info = &NFS_SB(sb)->backing_dev_info;
>  		} else if (S_ISDIR(inode->i_mode)) {
> @@ -1207,6 +1207,14 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
>  		server->fsid = fattr->fsid;
>  
>  	/*
> +	 * file needs layout commit, server attributes may be stale
> +	 */
> +	if (layoutcommit_needed(nfsi) && nfsi->change_attr >= fattr->change_attr) {
> +		dprintk("NFS: %s: layoutcommit is needed for file %s/%ld\n",
> +			__func__, inode->i_sb->s_id, inode->i_ino);
> +		return 0;
> +	}
> +	/*
>  	 * Update the read time so we don't revalidate too often.
>  	 */
>  	nfsi->read_cache_jiffies = fattr->time_start;

Why isn't this needed?

> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
> index fabb4f2..304c63c 100644
> --- a/fs/nfs/nfs3proc.c
> +++ b/fs/nfs/nfs3proc.c
> @@ -833,6 +833,7 @@ const struct nfs_rpc_ops nfs_v3_clientops = {
>  	.dentry_ops	= &nfs_dentry_operations,
>  	.dir_inode_ops	= &nfs3_dir_inode_operations,
>  	.file_inode_ops	= &nfs3_file_inode_operations,
> +	.file_ops	= &nfs_file_operations,
>  	.getroot	= nfs3_proc_get_root,
>  	.getattr	= nfs3_proc_getattr,
>  	.setattr	= nfs3_proc_setattr,
> diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
> index d6440fc..dd584e5 100644
> --- a/fs/nfs/nfs4_fs.h
> +++ b/fs/nfs/nfs4_fs.h
> @@ -334,6 +334,7 @@ extern void nfs_increment_lock_seqid(int status, struct nfs_seqid *seqid);
>  extern void nfs_release_seqid(struct nfs_seqid *seqid);
>  extern void nfs_free_seqid(struct nfs_seqid *seqid);
>  
> +/* write.c */
>  extern const nfs4_stateid zero_stateid;
>  
>  /* nfs4xdr.c */
> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
> index fea1772..b2ce478 100644
> --- a/fs/nfs/nfs4filelayout.c
> +++ b/fs/nfs/nfs4filelayout.c
> @@ -65,6 +65,9 @@ MODULE_DESCRIPTION("The NFSv4 file layout driver");
>  /* Callback operations to the pNFS client */
>  struct pnfs_client_operations *pnfs_callback_ops;
>  
> +/* Forward declaration */
> +struct layoutdriver_io_operations filelayout_io_operations;
> +
>  int
>  filelayout_initialize_mountpoint(struct nfs_client *clp)
>  {
> diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
> index f8f7c05..1de176d 100644
> --- a/fs/nfs/nfs4filelayout.h
> +++ b/fs/nfs/nfs4filelayout.h
> @@ -22,6 +22,7 @@
>  
>  #define NFS4_PNFS_MAX_STRIPE_CNT 4096
>  #define NFS4_PNFS_MAX_MULTI_CNT  64 /* 256 fit into a u8 stripe_index */
> +#define NFS4_PNFS_MAX_MULTI_DS   2
>  
>  #define FILE_DSADDR(lseg) (container_of(lseg->deviceid, \
>  					struct nfs4_file_layout_dsaddr, \
> @@ -50,6 +51,11 @@ struct nfs4_file_layout_dsaddr {
>  	struct nfs4_pnfs_ds	*ds_list[1];
>  };
>  
> +struct nfs4_pnfs_dev_hlist {
> +	rwlock_t		dev_lock;
> +	struct hlist_head	dev_list[NFS4_PNFS_DEV_HASH_SIZE];
> +};
> +
>  struct nfs4_filelayout_segment {
>  	u32 stripe_type;
>  	u32 commit_through_mds;
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 8879fab..05f072c 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -4711,6 +4711,7 @@ int nfs4_proc_exchange_id(struct nfs_client *clp, struct rpc_cred *cred)
>  	dprintk("<-- %s status= %d\n", __func__, status);
>  	return status;
>  }
> +EXPORT_SYMBOL(nfs4_proc_exchange_id);
>  
>  struct nfs4_get_lease_time_data {
>  	struct nfs4_get_lease_time_args *args;
> @@ -5887,6 +5888,7 @@ const struct nfs_rpc_ops nfs_v4_clientops = {
>  	.dentry_ops	= &nfs4_dentry_operations,
>  	.dir_inode_ops	= &nfs4_dir_inode_operations,
>  	.file_inode_ops	= &nfs4_file_inode_operations,
> +	.file_ops	= &nfs_file_operations,
>  	.getroot	= nfs4_proc_get_root,
>  	.getattr	= nfs4_proc_getattr,
>  	.setattr	= nfs4_proc_setattr,
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 80f67c7..78d5c30 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -30,6 +30,8 @@ extern int pnfs4_proc_layoutcommit(struct pnfs_layoutcommit_data *data,
>  extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait);
>  
>  /* pnfs.c */
> +extern const nfs4_stateid zero_stateid;
> +
>  void put_lseg(struct pnfs_layout_segment *lseg);
>  void _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>  	enum pnfs_iomode access_type,
> @@ -69,6 +71,9 @@ void pnfs_get_layout_stateid(nfs4_stateid *dst, struct pnfs_layout_type *lo);
>  #define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld &&	\
>  				     (srv)->pnfs_curr_ld->ld_io_ops &&	\
>  				     (srv)->pnfs_curr_ld->ld_io_ops->opname)
> +#define PNFS_EXISTS_LDPOLICY_OP(srv, opname) ((srv)->pnfs_curr_ld &&	\
> +				     (srv)->pnfs_curr_ld->ld_policy_ops && \
> +				     (srv)->pnfs_curr_ld->ld_policy_ops->opname)
>  
>  #define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
>  
> @@ -176,6 +181,16 @@ pnfs_try_to_commit(struct nfs_write_data *data,
>  	return PNFS_NOT_ATTEMPTED;
>  }
>  
> +static inline int pnfs_get_write_status(struct nfs_write_data *data)
> +{
> +	return 0;
> +}
> +
> +static inline int pnfs_get_read_status(struct nfs_read_data *data)
> +{
> +	return 0;
> +}
> +
>  static inline int pnfs_layoutcommit_inode(struct inode *inode, int sync)
>  {
>  	return 0;
> diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
> index 737160d..1837a05 100644
> --- a/fs/nfs/proc.c
> +++ b/fs/nfs/proc.c
> @@ -694,6 +694,7 @@ const struct nfs_rpc_ops nfs_v2_clientops = {
>  	.dentry_ops	= &nfs_dentry_operations,
>  	.dir_inode_ops	= &nfs_dir_inode_operations,
>  	.file_inode_ops	= &nfs_file_inode_operations,
> +	.file_ops	= &nfs_file_operations,
>  	.getroot	= nfs_proc_get_root,
>  	.getattr	= nfs_proc_getattr,
>  	.setattr	= nfs_proc_setattr,
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index f9df16d..cd9f8d4 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -64,6 +64,7 @@
>  #include "iostat.h"
>  #include "internal.h"
>  #include "fscache.h"
> +#include "pnfs.h"
>  
>  #define NFSDBG_FACILITY		NFSDBG_VFS
>  
> @@ -669,6 +670,28 @@ static int nfs_show_options(struct seq_file *m, struct vfsmount *mnt)
>  
>  	return 0;
>  }
> +#ifdef CONFIG_NFS_V4_1
> +void show_sessions(struct seq_file *m, struct nfs_server *server)
> +{
> +	if (nfs4_has_session(server->nfs_client))
> +		seq_printf(m, ",sessions");
> +}
> +#else
> +void show_sessions(struct seq_file *m, struct nfs_server *server) {}
> +#endif
> +
> +#ifdef CONFIG_NFS_V4_1
> +void show_pnfs(struct seq_file *m, struct nfs_server *server)
> +{
> +	seq_printf(m, ",pnfs=");
> +	if (server->pnfs_curr_ld)
> +		seq_printf(m, "%s", server->pnfs_curr_ld->name);
> +	else
> +		seq_printf(m, "not configured");
> +}
> +#else  /* CONFIG_NFS_V4_1 */
> +void show_pnfs(struct seq_file *m, struct nfs_server *server) {}
> +#endif /* CONFIG_NFS_V4_1 */
>  
>  /*
>   * Present statistical information for this VFS mountpoint
> @@ -707,6 +730,8 @@ static int nfs_show_stats(struct seq_file *m, struct vfsmount *mnt)
>  		seq_printf(m, "bm0=0x%x", nfss->attr_bitmask[0]);
>  		seq_printf(m, ",bm1=0x%x", nfss->attr_bitmask[1]);
>  		seq_printf(m, ",acl=0x%x", nfss->acl_bitmask);
> +		show_sessions(m, nfss);
> +		show_pnfs(m, nfss);

This is part of "9121554 pnfs: client stats" sent by Bruce.
why drop it?

>  	}
>  #endif
>  
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 2251551..99af688 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -20,6 +20,7 @@
>  #include <linux/nfs_mount.h>
>  #include <linux/nfs_page.h>
>  #include <linux/backing-dev.h>
> +#include <linux/module.h>
>  
>  #include <asm/uaccess.h>
>  
> @@ -68,6 +69,7 @@ void nfs_commit_free(struct nfs_write_data *p)
>  		kfree(p->pagevec);
>  	mempool_free(p, nfs_commit_mempool);
>  }
> +EXPORT_SYMBOL(nfs_commit_free);
>  
>  struct nfs_write_data *nfs_writedata_alloc(unsigned int pagecount)
>  {
> @@ -1252,6 +1254,9 @@ int nfs_writeback_done(struct rpc_task *task, struct nfs_write_data *data)
>  	if (task->tk_status >= 0 && resp->count < argp->count) {
>  		static unsigned long    complain;
>  
> +		dprintk("NFS:       short write:"
> +			" (resp->count %u) < (argp->count = %u)\n",
> +			resp->count, argp->count);
>  		nfs_inc_stats(data->inode, NFSIOS_SHORTWRITE);
>  
>  		/* Has the server at least made some progress? */
> @@ -1268,7 +1273,7 @@ int nfs_writeback_done(struct rpc_task *task, struct nfs_write_data *data)
>  				 */
>  				argp->stable = NFS_FILE_SYNC;
>  			}
> -			nfs_restart_rpc(task, server->nfs_client);
> +			nfs_restart_rpc(task, clp);
>  			return -EAGAIN;
>  		}
>  		if (time_before(complain, jiffies)) {
> @@ -1607,6 +1612,7 @@ int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
>   */
>  int nfs_wb_all(struct inode *inode)
>  {
> +	int ret;
>  	struct writeback_control wbc = {
>  		.sync_mode = WB_SYNC_ALL,
>  		.nr_to_write = LONG_MAX,
> @@ -1614,7 +1620,8 @@ int nfs_wb_all(struct inode *inode)
>  		.range_end = LLONG_MAX,
>  	};
>  
> -	return sync_inode(inode, &wbc);
> +	ret = sync_inode(inode, &wbc);
> +	return ret;
>  }
>  
>  int nfs_wb_page_cancel(struct inode *inode, struct page *page)
> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> index e947a32..e6748cd 100644
> --- a/include/linux/nfs4.h
> +++ b/include/linux/nfs4.h
> @@ -492,6 +492,7 @@ enum lock_type4 {
>  #define FATTR4_WORD1_TIME_MODIFY_SET    (1UL << 22)
>  #define FATTR4_WORD1_MOUNTED_ON_FILEID  (1UL << 23)
>  #define FATTR4_WORD1_FS_LAYOUT_TYPES    (1UL << 30)
> +#define FATTR4_WORD2_LAYOUT_BLKSIZE     (1UL << 1)
>  
>  #define NFSPROC4_NULL 0
>  #define NFSPROC4_COMPOUND 1
> @@ -606,6 +607,14 @@ enum pnfs_notify_deviceid_type4 {
>  #define NFL4_UFLG_COMMIT_THRU_MDS	0x00000002
>  #define NFL4_UFLG_STRIPE_UNIT_SIZE_MASK	0xFFFFFFC0
>  
> +/* Encoded in the loh_body field of type layouthint4 */
> +enum filelayout_hint_care4 {
> +	NFLH4_CARE_DENSE		= NFL4_UFLG_DENSE,
> +	NFLH4_CARE_COMMIT_THRU_MDS	= NFL4_UFLG_COMMIT_THRU_MDS,
> +	NFLH4_CARE_STRIPE_UNIT_SIZE	= 0x00000040,
> +	NFLH4_CARE_STRIPE_COUNT		= 0x00000080
> +};
> +

We'd better leave the definitions in the header, as they are part of the
protocol even if we don't make use of them in the patchset, for completeness
(and removing them causes me a major headache when rebasing on top of the
pnfsd branch, which has these defs :)

>  #endif
>  #endif
>  
> diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
> index ef160e6..6236687 100644
> --- a/include/linux/nfs4_pnfs.h
> +++ b/include/linux/nfs4_pnfs.h
> @@ -20,6 +20,8 @@ enum pnfs_try_status {
>  	PNFS_NOT_ATTEMPTED = 1,
>  };
>  
> +#define NFS4_PNFS_GETDEVLIST_MAXNUM 16
> +
>  /* Per-layout driver specific registration structure */
>  struct pnfs_layoutdriver_type {
>  	const u32 id;
> @@ -60,6 +62,12 @@ PNFS_LD_IO_OPS(struct pnfs_layout_type *lo)
>  	return PNFS_LD(lo)->ld_io_ops;
>  }
>  
> +static inline struct layoutdriver_policy_operations *
> +PNFS_LD_POLICY_OPS(struct pnfs_layout_type *lo)
> +{
> +	return PNFS_LD(lo)->ld_policy_ops;
> +}
> +
>  static inline bool
>  has_layout(struct nfs_inode *nfsi)
>  {
> @@ -165,6 +173,12 @@ struct pnfs_device {
>  	unsigned int  dev_notify_types;
>  };
>  
> +struct pnfs_devicelist {
> +	unsigned int		eof;
> +	unsigned int		num_devs;
> +	struct pnfs_deviceid	dev_id[NFS4_PNFS_GETDEVLIST_MAXNUM];
> +};
> +
>  /*
>   * Device ID RCU cache. A device ID is unique per client ID and layout type.
>   */
> @@ -222,6 +236,7 @@ extern void nfs4_unset_layout_deviceid(struct pnfs_layout_segment *,
>  struct pnfs_client_operations {
>  	int (*nfs_getdeviceinfo) (struct nfs_server *,
>  				  struct pnfs_device *dev);
> +	void (*nfs_return_layout) (struct inode *);
>  };
>  
>  extern struct pnfs_client_operations pnfs_ops;
> @@ -229,4 +244,7 @@ extern struct pnfs_client_operations pnfs_ops;
>  extern struct pnfs_client_operations *pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
>  extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
>  
> +#define NFS4_PNFS_MAX_LAYOUTS 4
> +#define NFS4_PNFS_PRIVATE_LAYOUT 0x80000000
> +

we use that post submit.
[the private layout range can actually be defined in nfs4.h]

Benny

>  #endif /* LINUX_NFS4_PNFS_H */
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 42c5ccf..4afeaeb 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1046,6 +1046,7 @@ struct nfs_rpc_ops {
>  	const struct dentry_operations *dentry_ops;
>  	const struct inode_operations *dir_inode_ops;
>  	const struct inode_operations *file_inode_ops;
> +	const struct file_operations *file_ops;
>  
>  	int	(*getroot) (struct nfs_server *, struct nfs_fh *,
>  			    struct nfs_fsinfo *);
> @@ -1110,6 +1111,7 @@ struct nfs_rpc_ops {
>  extern const struct nfs_rpc_ops	nfs_v2_clientops;
>  extern const struct nfs_rpc_ops	nfs_v3_clientops;
>  extern const struct nfs_rpc_ops	nfs_v4_clientops;
> +extern const struct nfs_rpc_ops	pnfs_v4_clientops;
>  extern struct rpc_version	nfs_version2;
>  extern struct rpc_version	nfs_version3;
>  extern struct rpc_version	nfs_version4;
> diff --git a/include/linux/pnfs_xdr.h b/include/linux/pnfs_xdr.h
> index ed16c65..c9a01b3 100644
> --- a/include/linux/pnfs_xdr.h
> +++ b/include/linux/pnfs_xdr.h
> @@ -89,9 +89,9 @@ struct pnfs_layoutcommit_data {
>  };
>  
>  struct nfs4_pnfs_layoutreturn_arg {
> -	__u32   reclaim;
> -	__u32   layout_type;
> -	__u32   return_type;
> +	__u32	reclaim;
> +	__u32	layout_type;
> +	__u32	return_type;
>  	struct nfs4_pnfs_layout_segment lseg;
>  	struct inode *inode;
>  	struct nfs4_sequence_args seq_args;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 0/50] Squashed and re-organized pnfs-submit tree
  2010-08-13 21:31 [PATCH 0/50] Squashed and re-organized pnfs-submit tree andros
  2010-08-13 21:31 ` [PATCH 01/50] nfs41: prevent exchange_id from sending server-only flag andros
@ 2010-08-19 20:50 ` Benny Halevy
  1 sibling, 0 replies; 69+ messages in thread
From: Benny Halevy @ 2010-08-19 20:50 UTC (permalink / raw)
  To: andros; +Cc: linux-nfs

Andy,  I merged these patches with restored sign-offs and a couple minor touch-ups
to make sure all compile.  There's still zero diff from the original pnfs-submit
branch.  The pnfs-all-latest tip is at pnfs-all-2.6.35-2010-08-19.
With regard to the patch descriptions, we've changed the patchset so much
that it just makes more sense to rewrite the commentary when we're close to
having the final version.

Let's get on with the renaming now and then we'll continue to rearrange the set
for review and clean it up further.

Benny

On Aug. 14, 2010, 0:31 +0300, andros@netapp.com wrote:
> Here is the pnfs-submit tree all squashed and reorganized. The resultant tree
> is unchanged. The patch comments and Signed-off-by's need to be added.
> 
> We may want to split or re-order some of the patches. I left the "introduce
> new file" patches #6-9 unsquashed, but we probably want to squash them into
> the patch that uses them first.
> 
> Patches 1-4 are not pnfs patches, and should be submitted to Trond/Bruce.
> 
> Patch #50 "SQUASHME pnfs_submit: remove this unused code"
> should not be squashed, but removed.
> 
> Patches 47-49 are for the post-submit trees.
> 
> -->Andy

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 04/50] nfsd: remove duplicate NFS4_STATEID_SIZE declaration
  2010-08-13 21:31       ` [PATCH 04/50] nfsd: remove duplicate NFS4_STATEID_SIZE declaration andros
  2010-08-13 21:31         ` [PATCH 05/50] pnfs_submit: pnfs and nfslayoutdriver kconfig andros
@ 2010-08-20 22:13         ` J. Bruce Fields
  1 sibling, 0 replies; 69+ messages in thread
From: J. Bruce Fields @ 2010-08-20 22:13 UTC (permalink / raw)
  To: andros; +Cc: bhalevy, linux-nfs

On Fri, Aug 13, 2010 at 05:31:16PM -0400, andros@netapp.com wrote:
> From: Andy Adamson <andros@netapp.com>
> 
> Use NFS4_STATEID_SIZE from include/linux/nfs4

Thanks, applying for 2.6.37.

--b.

> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> ---
>  fs/nfsd/nfs4callback.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index f324f4b..e89476e 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -41,7 +41,6 @@
>  
>  #define NFSPROC4_CB_NULL 0
>  #define NFSPROC4_CB_COMPOUND 1
> -#define NFS4_STATEID_SIZE 16
>  
>  /* Index of predefined Linux callback client operations */
>  
> -- 
> 1.6.2.5

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 48/50] SQUASHME pnfs_post_submit: layout type enum
  2010-08-13 21:32                                                                                               ` [PATCH 48/50] SQUASHME pnfs_post_submit: layout type enum andros
  2010-08-13 21:32                                                                                                 ` [PATCH 49/50] SQUASHME pnfs_post_submit: cb notify deviceid declarations andros
@ 2010-08-31 15:52                                                                                                 ` Boaz Harrosh
  1 sibling, 0 replies; 69+ messages in thread
From: Boaz Harrosh @ 2010-08-31 15:52 UTC (permalink / raw)
  To: andros; +Cc: bhalevy, linux-nfs

On 08/14/2010 12:32 AM, andros@netapp.com wrote:
> From: Andy Adamson <andros@netapp.com>
> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> ---
>  include/linux/nfs4.h |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> index 6ee7357..2d3d277 100644
> --- a/include/linux/nfs4.h
> +++ b/include/linux/nfs4.h
> @@ -579,6 +579,8 @@ enum state_protect_how4 {
>  
>  enum pnfs_layouttype {
>  	LAYOUT_NFSV4_1_FILES  = 1,
> +	LAYOUT_OSD2_OBJECTS = 2,
> +	LAYOUT_BLOCK_VOLUME = 3,
>  };
>  

Rrr, what's the point? It's just an enum.
All types and constants derived from the standard should be kept intact;
whether they are implemented yet is irrelevant.

Which calls for a rename, BTW:
pnfs_layouttype => nfs4_layouttype (the file name nfs4.h should be a
hint, I'd say).
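
For illustration only (this is not part of the patch): a minimal userspace sketch of what the comment above asks for, assuming the three layout type values from RFC 5661 and the suggested name nfs4_layouttype. The helper layouttype_is_implemented() and the checker function are hypothetical names introduced here, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * All layout types numbered by the standard are kept together in one enum,
 * whether or not a driver exists for them yet. The enum name follows the
 * rename suggested in the review, not the name used in the patch.
 */
enum nfs4_layouttype {
	LAYOUT_NFSV4_1_FILES = 1,
	LAYOUT_OSD2_OBJECTS  = 2,
	LAYOUT_BLOCK_VOLUME  = 3,
};

/*
 * Hypothetical helper: "implemented or not" is a separate question from
 * "defined by the standard", so it lives in code rather than in the header.
 * In this sketch only the files layout is treated as implemented.
 */
static int layouttype_is_implemented(uint32_t type)
{
	switch (type) {
	case LAYOUT_NFSV4_1_FILES:
		return 1;
	case LAYOUT_OSD2_OBJECTS:
	case LAYOUT_BLOCK_VOLUME:
		return 0;	/* defined, but no driver in this sketch */
	default:
		return 0;	/* not a value known to this header */
	}
}

/* Sanity check that the enum matches the on-the-wire numbering. */
static void check_layouttype_values(void)
{
	assert(LAYOUT_NFSV4_1_FILES == 1);
	assert(LAYOUT_OSD2_OBJECTS == 2);
	assert(LAYOUT_BLOCK_VOLUME == 3);
}
```

The point of the split is that the header stays a complete mirror of the standard, while support decisions can change without touching it.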

>  /* used for both layout return and recall */


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 50/50] SQUASHME pnfs_submit: remove this unused code
  2010-08-13 21:32                                                                                                   ` [PATCH 50/50] SQUASHME pnfs_submit: remove this unused code andros
  2010-08-19 20:25                                                                                                     ` Benny Halevy
@ 2010-08-31 16:32                                                                                                     ` Boaz Harrosh
  1 sibling, 0 replies; 69+ messages in thread
From: Boaz Harrosh @ 2010-08-31 16:32 UTC (permalink / raw)
  To: andros; +Cc: bhalevy, linux-nfs

On 08/14/2010 12:32 AM, andros@netapp.com wrote:
> From: Andy Adamson <andros@netapp.com>
> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> ---
>  fs/nfs/client.c           |    1 +
>  fs/nfs/file.c             |    1 +
>  fs/nfs/inode.c            |   10 +++++++++-
>  fs/nfs/nfs3proc.c         |    1 +
>  fs/nfs/nfs4_fs.h          |    1 +
>  fs/nfs/nfs4filelayout.c   |    3 +++
>  fs/nfs/nfs4filelayout.h   |    6 ++++++
>  fs/nfs/nfs4proc.c         |    2 ++
>  fs/nfs/pnfs.h             |   15 +++++++++++++++
>  fs/nfs/proc.c             |    1 +
>  fs/nfs/super.c            |   25 +++++++++++++++++++++++++
>  fs/nfs/write.c            |   11 +++++++++--
>  include/linux/nfs4.h      |    9 +++++++++
>  include/linux/nfs4_pnfs.h |   18 ++++++++++++++++++
>  include/linux/nfs_xdr.h   |    2 ++
>  include/linux/pnfs_xdr.h  |    6 +++---
>  16 files changed, 106 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index b53f61c..38ef02f 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -39,6 +39,7 @@
>  #include <net/ipv6.h>
>  #include <linux/nfs_xdr.h>
>  #include <linux/sunrpc/bc_xprt.h>
> +#include <linux/nfs4_pnfs.h>
>  
>  #include <asm/system.h>
>  
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index d0ed767..bf17633 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -28,6 +28,7 @@
>  #include <linux/aio.h>
>  #include <linux/gfp.h>
>  #include <linux/swap.h>
> +#include <linux/pnfs_xdr.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm/system.h>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 30d9ac6..6132f6b 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -279,7 +279,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr)
>  		 */
>  		inode->i_op = NFS_SB(sb)->nfs_client->rpc_ops->file_inode_ops;
>  		if (S_ISREG(inode->i_mode)) {
> -			inode->i_fop = &nfs_file_operations;
> +			inode->i_fop = NFS_SB(sb)->nfs_client->rpc_ops->file_ops;
>  			inode->i_data.a_ops = &nfs_file_aops;
>  			inode->i_data.backing_dev_info = &NFS_SB(sb)->backing_dev_info;
>  		} else if (S_ISDIR(inode->i_mode)) {
> @@ -1207,6 +1207,14 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
>  		server->fsid = fattr->fsid;
>  
>  	/*
> +	 * file needs layout commit, server attributes may be stale
> +	 */
> +	if (layoutcommit_needed(nfsi) && nfsi->change_attr >= fattr->change_attr) {
> +		dprintk("NFS: %s: layoutcommit is needed for file %s/%ld\n",
> +			__func__, inode->i_sb->s_id, inode->i_ino);
> +		return 0;
> +	}
> +	/*
>  	 * Update the read time so we don't revalidate too often.
>  	 */
>  	nfsi->read_cache_jiffies = fattr->time_start;
> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
> index fabb4f2..304c63c 100644
> --- a/fs/nfs/nfs3proc.c
> +++ b/fs/nfs/nfs3proc.c
> @@ -833,6 +833,7 @@ const struct nfs_rpc_ops nfs_v3_clientops = {
>  	.dentry_ops	= &nfs_dentry_operations,
>  	.dir_inode_ops	= &nfs3_dir_inode_operations,
>  	.file_inode_ops	= &nfs3_file_inode_operations,
> +	.file_ops	= &nfs_file_operations,
>  	.getroot	= nfs3_proc_get_root,
>  	.getattr	= nfs3_proc_getattr,
>  	.setattr	= nfs3_proc_setattr,
> diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
> index d6440fc..dd584e5 100644
> --- a/fs/nfs/nfs4_fs.h
> +++ b/fs/nfs/nfs4_fs.h
> @@ -334,6 +334,7 @@ extern void nfs_increment_lock_seqid(int status, struct nfs_seqid *seqid);
>  extern void nfs_release_seqid(struct nfs_seqid *seqid);
>  extern void nfs_free_seqid(struct nfs_seqid *seqid);
>  
> +/* write.c */

So a comment was removed, but now it needs to come back all by itself? What changed?

>  extern const nfs4_stateid zero_stateid;
>  
>  /* nfs4xdr.c */
> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
> index fea1772..b2ce478 100644
> --- a/fs/nfs/nfs4filelayout.c
> +++ b/fs/nfs/nfs4filelayout.c
> @@ -65,6 +65,9 @@ MODULE_DESCRIPTION("The NFSv4 file layout driver");
>  /* Callback operations to the pNFS client */
>  struct pnfs_client_operations *pnfs_callback_ops;
>  
> +/* Forward declaration */

I'm not sure we need the comment.

> +struct layoutdriver_io_operations filelayout_io_operations;
> +

A forward declaration in a .c file: are we missing a "static"?
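
For illustration only (not the patch's code): a small C sketch of a file-local forward declaration, assuming the structure is used only inside one .c file. The type io_operations and every function name below are hypothetical stand-ins for layoutdriver_io_operations and its users.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an operations table. */
struct io_operations {
	int (*read_pages)(int npages);
};

/*
 * Forward (tentative) declaration. "static" gives it internal linkage, so
 * the symbol does not leak out of this translation unit.
 */
static struct io_operations example_io_operations;

/* Code that needs the table before its full definition appears. */
static struct io_operations *get_io_ops(void)
{
	return &example_io_operations;
}

static int example_read_pages(int npages)
{
	return npages;	/* pretend every requested page was read */
}

/* The actual definition; its linkage must match the declaration above. */
static struct io_operations example_io_operations = {
	.read_pages = example_read_pages,
};

/* Usage: look the operation up through the table and call it. */
static int call_read_pages(int npages)
{
	struct io_operations *ops = get_io_ops();

	assert(ops->read_pages != NULL);
	return ops->read_pages(npages);
}
```

Without "static", the forward declaration would define a global symbol, which is usually only wanted when a matching extern declaration lives in a header.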

>  int
>  filelayout_initialize_mountpoint(struct nfs_client *clp)
>  {
> diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
> index f8f7c05..1de176d 100644
> --- a/fs/nfs/nfs4filelayout.h
> +++ b/fs/nfs/nfs4filelayout.h
> @@ -22,6 +22,7 @@
>  
>  #define NFS4_PNFS_MAX_STRIPE_CNT 4096
>  #define NFS4_PNFS_MAX_MULTI_CNT  64 /* 256 fit into a u8 stripe_index */
> +#define NFS4_PNFS_MAX_MULTI_DS   2
>  
>  #define FILE_DSADDR(lseg) (container_of(lseg->deviceid, \
>  					struct nfs4_file_layout_dsaddr, \
> @@ -50,6 +51,11 @@ struct nfs4_file_layout_dsaddr {
>  	struct nfs4_pnfs_ds	*ds_list[1];
>  };
>  
> +struct nfs4_pnfs_dev_hlist {
> +	rwlock_t		dev_lock;
> +	struct hlist_head	dev_list[NFS4_PNFS_DEV_HASH_SIZE];
> +};
> +
>  struct nfs4_filelayout_segment {
>  	u32 stripe_type;
>  	u32 commit_through_mds;
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 8879fab..05f072c 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -4711,6 +4711,7 @@ int nfs4_proc_exchange_id(struct nfs_client *clp, struct rpc_cred *cred)
>  	dprintk("<-- %s status= %d\n", __func__, status);
>  	return status;
>  }
> +EXPORT_SYMBOL(nfs4_proc_exchange_id);
>  
>  struct nfs4_get_lease_time_data {
>  	struct nfs4_get_lease_time_args *args;
> @@ -5887,6 +5888,7 @@ const struct nfs_rpc_ops nfs_v4_clientops = {
>  	.dentry_ops	= &nfs4_dentry_operations,
>  	.dir_inode_ops	= &nfs4_dir_inode_operations,
>  	.file_inode_ops	= &nfs4_file_inode_operations,
> +	.file_ops	= &nfs_file_operations,
>  	.getroot	= nfs4_proc_get_root,
>  	.getattr	= nfs4_proc_getattr,
>  	.setattr	= nfs4_proc_setattr,
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 80f67c7..78d5c30 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -30,6 +30,8 @@ extern int pnfs4_proc_layoutcommit(struct pnfs_layoutcommit_data *data,
>  extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait);
>  
>  /* pnfs.c */
> +extern const nfs4_stateid zero_stateid;
> +
>  void put_lseg(struct pnfs_layout_segment *lseg);
>  void _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>  	enum pnfs_iomode access_type,
> @@ -69,6 +71,9 @@ void pnfs_get_layout_stateid(nfs4_stateid *dst, struct pnfs_layout_type *lo);
>  #define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld &&	\
>  				     (srv)->pnfs_curr_ld->ld_io_ops &&	\
>  				     (srv)->pnfs_curr_ld->ld_io_ops->opname)
> +#define PNFS_EXISTS_LDPOLICY_OP(srv, opname) ((srv)->pnfs_curr_ld &&	\
> +				     (srv)->pnfs_curr_ld->ld_policy_ops && \
> +				     (srv)->pnfs_curr_ld->ld_policy_ops->opname)
>  
>  #define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
>  
> @@ -176,6 +181,16 @@ pnfs_try_to_commit(struct nfs_write_data *data,
>  	return PNFS_NOT_ATTEMPTED;
>  }
>  
> +static inline int pnfs_get_write_status(struct nfs_write_data *data)
> +{
> +	return 0;
> +}
> +
> +static inline int pnfs_get_read_status(struct nfs_read_data *data)
> +{
> +	return 0;
> +}
> +
>  static inline int pnfs_layoutcommit_inode(struct inode *inode, int sync)
>  {
>  	return 0;
> diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
> index 737160d..1837a05 100644
> --- a/fs/nfs/proc.c
> +++ b/fs/nfs/proc.c
> @@ -694,6 +694,7 @@ const struct nfs_rpc_ops nfs_v2_clientops = {
>  	.dentry_ops	= &nfs_dentry_operations,
>  	.dir_inode_ops	= &nfs_dir_inode_operations,
>  	.file_inode_ops	= &nfs_file_inode_operations,
> +	.file_ops	= &nfs_file_operations,
>  	.getroot	= nfs_proc_get_root,
>  	.getattr	= nfs_proc_getattr,
>  	.setattr	= nfs_proc_setattr,
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index f9df16d..cd9f8d4 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -64,6 +64,7 @@
>  #include "iostat.h"
>  #include "internal.h"
>  #include "fscache.h"
> +#include "pnfs.h"
>  
>  #define NFSDBG_FACILITY		NFSDBG_VFS
>  
> @@ -669,6 +670,28 @@ static int nfs_show_options(struct seq_file *m, struct vfsmount *mnt)
>  
>  	return 0;
>  }
> +#ifdef CONFIG_NFS_V4_1
> +void show_sessions(struct seq_file *m, struct nfs_server *server)
> +{
> +	if (nfs4_has_session(server->nfs_client))
> +		seq_printf(m, ",sessions");
> +}
> +#else
> +void show_sessions(struct seq_file *m, struct nfs_server *server) {}
> +#endif
> +
> +#ifdef CONFIG_NFS_V4_1
> +void show_pnfs(struct seq_file *m, struct nfs_server *server)
> +{
> +	seq_printf(m, ",pnfs=");
> +	if (server->pnfs_curr_ld)
> +		seq_printf(m, "%s", server->pnfs_curr_ld->name);
> +	else
> +		seq_printf(m, "not configured");
> +}
> +#else  /* CONFIG_NFS_V4_1 */
> +void show_pnfs(struct seq_file *m, struct nfs_server *server) {}
> +#endif /* CONFIG_NFS_V4_1 */
>  
>  /*
>   * Present statistical information for this VFS mountpoint
> @@ -707,6 +730,8 @@ static int nfs_show_stats(struct seq_file *m, struct vfsmount *mnt)
>  		seq_printf(m, "bm0=0x%x", nfss->attr_bitmask[0]);
>  		seq_printf(m, ",bm1=0x%x", nfss->attr_bitmask[1]);
>  		seq_printf(m, ",acl=0x%x", nfss->acl_bitmask);
> +		show_sessions(m, nfss);
> +		show_pnfs(m, nfss);
>  	}
>  #endif
>  
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 2251551..99af688 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -20,6 +20,7 @@
>  #include <linux/nfs_mount.h>
>  #include <linux/nfs_page.h>
>  #include <linux/backing-dev.h>
> +#include <linux/module.h>
>  
>  #include <asm/uaccess.h>
>  
> @@ -68,6 +69,7 @@ void nfs_commit_free(struct nfs_write_data *p)
>  		kfree(p->pagevec);
>  	mempool_free(p, nfs_commit_mempool);
>  }
> +EXPORT_SYMBOL(nfs_commit_free);
>  
>  struct nfs_write_data *nfs_writedata_alloc(unsigned int pagecount)
>  {
> @@ -1252,6 +1254,9 @@ int nfs_writeback_done(struct rpc_task *task, struct nfs_write_data *data)
>  	if (task->tk_status >= 0 && resp->count < argp->count) {
>  		static unsigned long    complain;
>  
> +		dprintk("NFS:       short write:"
> +			" (resp->count %u) < (argp->count = %u)\n",
> +			resp->count, argp->count);
>  		nfs_inc_stats(data->inode, NFSIOS_SHORTWRITE);
>  
>  		/* Has the server at least made some progress? */
> @@ -1268,7 +1273,7 @@ int nfs_writeback_done(struct rpc_task *task, struct nfs_write_data *data)
>  				 */
>  				argp->stable = NFS_FILE_SYNC;
>  			}
> -			nfs_restart_rpc(task, server->nfs_client);
> +			nfs_restart_rpc(task, clp);

OK, I admit I don't understand what happened here.

>  			return -EAGAIN;
>  		}
>  		if (time_before(complain, jiffies)) {
> @@ -1607,6 +1612,7 @@ int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
>   */
>  int nfs_wb_all(struct inode *inode)
>  {
> +	int ret;
>  	struct writeback_control wbc = {
>  		.sync_mode = WB_SYNC_ALL,
>  		.nr_to_write = LONG_MAX,
> @@ -1614,7 +1620,8 @@ int nfs_wb_all(struct inode *inode)
>  		.range_end = LLONG_MAX,
>  	};
>  
> -	return sync_inode(inode, &wbc);
> +	ret = sync_inode(inode, &wbc);
> +	return ret;

Why does this need to go back in?

>  }
>  
>  int nfs_wb_page_cancel(struct inode *inode, struct page *page)
> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> index e947a32..e6748cd 100644
> --- a/include/linux/nfs4.h
> +++ b/include/linux/nfs4.h
> @@ -492,6 +492,7 @@ enum lock_type4 {
>  #define FATTR4_WORD1_TIME_MODIFY_SET    (1UL << 22)
>  #define FATTR4_WORD1_MOUNTED_ON_FILEID  (1UL << 23)
>  #define FATTR4_WORD1_FS_LAYOUT_TYPES    (1UL << 30)
> +#define FATTR4_WORD2_LAYOUT_BLKSIZE     (1UL << 1)
>  
>  #define NFSPROC4_NULL 0
>  #define NFSPROC4_COMPOUND 1
> @@ -606,6 +607,14 @@ enum pnfs_notify_deviceid_type4 {
>  #define NFL4_UFLG_COMMIT_THRU_MDS	0x00000002
>  #define NFL4_UFLG_STRIPE_UNIT_SIZE_MASK	0xFFFFFFC0
>  
> +/* Encoded in the loh_body field of type layouthint4 */
> +enum filelayout_hint_care4 {
> +	NFLH4_CARE_DENSE		= NFL4_UFLG_DENSE,
> +	NFLH4_CARE_COMMIT_THRU_MDS	= NFL4_UFLG_COMMIT_THRU_MDS,
> +	NFLH4_CARE_STRIPE_UNIT_SIZE	= 0x00000040,
> +	NFLH4_CARE_STRIPE_COUNT		= 0x00000080
> +};
> +
>  #endif
>  #endif
>  

Please, again, don't split out the standard's definitions. Even if they
are not yet used, I don't see the point. I do see why it's bad: it forces
the developer to read the standard instead of trusting a header to be
complete and accurate.
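
For illustration only (not part of the patch): a userspace sketch of how the hint "care" bits from the quoted hunk could be combined when the header carries the full set. The enum values are copied from the hunk; NFL4_UFLG_DENSE is not visible there, so 0x1 is taken from RFC 5661. build_hint_care() is a hypothetical helper, not an existing kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* NFL4_UFLG_DENSE is assumed from RFC 5661; the other value is in the hunk. */
#define NFL4_UFLG_DENSE			0x00000001
#define NFL4_UFLG_COMMIT_THRU_MDS	0x00000002

/* Encoded in the loh_body field of type layouthint4 (from the quoted hunk). */
enum filelayout_hint_care4 {
	NFLH4_CARE_DENSE		= NFL4_UFLG_DENSE,
	NFLH4_CARE_COMMIT_THRU_MDS	= NFL4_UFLG_COMMIT_THRU_MDS,
	NFLH4_CARE_STRIPE_UNIT_SIZE	= 0x00000040,
	NFLH4_CARE_STRIPE_COUNT		= 0x00000080
};

/*
 * Hypothetical helper: build the care mask that tells the server which
 * fields of a file layout hint the client actually cares about.
 */
static uint32_t build_hint_care(int want_dense, int want_stripe_unit,
				int want_stripe_count)
{
	uint32_t care = 0;

	if (want_dense)
		care |= NFLH4_CARE_DENSE;
	if (want_stripe_unit)
		care |= NFLH4_CARE_STRIPE_UNIT_SIZE;
	if (want_stripe_count)
		care |= NFLH4_CARE_STRIPE_COUNT;

	/* Only bits defined by the enum above may ever be set. */
	assert((care & ~(uint32_t)0xC3) == 0);
	return care;
}
```

With every constant present in the header, a caller can write build_hint_care(1, 1, 0) and get 0x41 without looking up bit positions in the standard.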

> diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
> index ef160e6..6236687 100644
> --- a/include/linux/nfs4_pnfs.h
> +++ b/include/linux/nfs4_pnfs.h
> @@ -20,6 +20,8 @@ enum pnfs_try_status {
>  	PNFS_NOT_ATTEMPTED = 1,
>  };
>  
> +#define NFS4_PNFS_GETDEVLIST_MAXNUM 16
> +
>  /* Per-layout driver specific registration structure */
>  struct pnfs_layoutdriver_type {
>  	const u32 id;
> @@ -60,6 +62,12 @@ PNFS_LD_IO_OPS(struct pnfs_layout_type *lo)
>  	return PNFS_LD(lo)->ld_io_ops;
>  }
>  
> +static inline struct layoutdriver_policy_operations *
> +PNFS_LD_POLICY_OPS(struct pnfs_layout_type *lo)
> +{
> +	return PNFS_LD(lo)->ld_policy_ops;
> +}
> +
>  static inline bool
>  has_layout(struct nfs_inode *nfsi)
>  {
> @@ -165,6 +173,12 @@ struct pnfs_device {
>  	unsigned int  dev_notify_types;
>  };
>  
> +struct pnfs_devicelist {
> +	unsigned int		eof;
> +	unsigned int		num_devs;
> +	struct pnfs_deviceid	dev_id[NFS4_PNFS_GETDEVLIST_MAXNUM];
> +};
> +
>  /*
>   * Device ID RCU cache. A device ID is unique per client ID and layout type.
>   */
> @@ -222,6 +236,7 @@ extern void nfs4_unset_layout_deviceid(struct pnfs_layout_segment *,
>  struct pnfs_client_operations {
>  	int (*nfs_getdeviceinfo) (struct nfs_server *,
>  				  struct pnfs_device *dev);
> +	void (*nfs_return_layout) (struct inode *);
>  };
>  
>  extern struct pnfs_client_operations pnfs_ops;
> @@ -229,4 +244,7 @@ extern struct pnfs_client_operations pnfs_ops;
>  extern struct pnfs_client_operations *pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
>  extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
>  
> +#define NFS4_PNFS_MAX_LAYOUTS 4
> +#define NFS4_PNFS_PRIVATE_LAYOUT 0x80000000
> +
>  #endif /* LINUX_NFS4_PNFS_H */
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 42c5ccf..4afeaeb 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1046,6 +1046,7 @@ struct nfs_rpc_ops {
>  	const struct dentry_operations *dentry_ops;
>  	const struct inode_operations *dir_inode_ops;
>  	const struct inode_operations *file_inode_ops;
> +	const struct file_operations *file_ops;
>  
>  	int	(*getroot) (struct nfs_server *, struct nfs_fh *,
>  			    struct nfs_fsinfo *);
> @@ -1110,6 +1111,7 @@ struct nfs_rpc_ops {
>  extern const struct nfs_rpc_ops	nfs_v2_clientops;
>  extern const struct nfs_rpc_ops	nfs_v3_clientops;
>  extern const struct nfs_rpc_ops	nfs_v4_clientops;
> +extern const struct nfs_rpc_ops	pnfs_v4_clientops;
>  extern struct rpc_version	nfs_version2;
>  extern struct rpc_version	nfs_version3;
>  extern struct rpc_version	nfs_version4;
> diff --git a/include/linux/pnfs_xdr.h b/include/linux/pnfs_xdr.h
> index ed16c65..c9a01b3 100644
> --- a/include/linux/pnfs_xdr.h
> +++ b/include/linux/pnfs_xdr.h
> @@ -89,9 +89,9 @@ struct pnfs_layoutcommit_data {
>  };
>  
>  struct nfs4_pnfs_layoutreturn_arg {
> -	__u32   reclaim;
> -	__u32   layout_type;
> -	__u32   return_type;
> +	__u32	reclaim;
> +	__u32	layout_type;
> +	__u32	return_type;

Is that a whitespace change of a tab to 3 spaces? Could we make one
big whitespace SQUASHME, separate from the other code?

>  	struct nfs4_pnfs_layout_segment lseg;
>  	struct inode *inode;
>  	struct nfs4_sequence_args seq_args;

All of Benny's comments apply for me as well.
Thanks
Boaz


^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2010-08-31 16:32 UTC | newest]

Thread overview: 69+ messages
2010-08-13 21:31 [PATCH 0/50] Squashed and re-organized pnfs-submit tree andros
2010-08-13 21:31 ` [PATCH 01/50] nfs41: prevent exchange_id from sending server-only flag andros
2010-08-13 21:31   ` [PATCH 02/50] sunrpc: define xdr_decode_opaque_fixed andros
2010-08-13 21:31     ` [PATCH 03/50] sunrpc: don't reset buflen twice in xdr_shrink_pagelen andros
2010-08-13 21:31       ` [PATCH 04/50] nfsd: remove duplicate NFS4_STATEID_SIZE declaration andros
2010-08-13 21:31         ` [PATCH 05/50] pnfs_submit: pnfs and nfslayoutdriver kconfig andros
2010-08-13 21:31           ` [PATCH 06/50] pnfs_submit: introduce include/linux/nfs4_pnfs.h andros
2010-08-13 21:31             ` [PATCH 07/50] pnfs_submit: introduce include/linux/pnfs_xdr.h andros
2010-08-13 21:31               ` [PATCH 08/50] pnfs_submit: introduce fs/nfs/pnfs.h andros
2010-08-13 21:31                 ` [PATCH 09/50] pnfs_submit: introduce fs/nfs/pnfs.c andros
2010-08-13 21:31                   ` [PATCH 10/50] pnfs_submit: register unregister pnfs module andros
2010-08-13 21:31                     ` [PATCH 11/50] pnfs_submit: set and unset pnfs layoutdriver modules andros
2010-08-13 21:31                       ` [PATCH 12/50] pnfs_submit: generic pnfs deviceid cache andros
2010-08-13 21:31                         ` [PATCH 13/50] pnfs_submit: introduce nfs4layoutdriver module andros
2010-08-13 21:31                           ` [PATCH 14/50] pnfs_submit: filelayout data server cache andros
2010-08-13 21:31                             ` [PATCH 15/50] pnfs_submit: filelayout deviceid cache andros
2010-08-13 21:31                               ` [PATCH 16/50] pnfs_submit: generic getdeviceinfo andros
2010-08-13 21:31                                 ` [PATCH 17/50] pnfs_submit: filelayout getdeviceinfo andros
2010-08-13 21:31                                   ` [PATCH 18/50] pnfs-submit: change stateid to be a union andros
2010-08-13 21:31                                     ` [PATCH 19/50] pnfs_submit: layout header alloc,reference, and destroy andros
2010-08-13 21:31                                       ` [PATCH 20/50] pnfs_submit: filelayout alloc_layout and free_layout andros
2010-08-13 21:31                                         ` [PATCH 21/50] pnfs_submit: layout segment alloc, reference, destroy andros
2010-08-13 21:31                                           ` [PATCH 22/50] pnfs_submit: layoutget andros
2010-08-13 21:31                                             ` [PATCH 23/50] pnfs_submit: layout helper functions andros
2010-08-13 21:31                                               ` [PATCH 24/50] pnfs_submit: filelayout layout segment alloc and free andros
2010-08-13 21:31                                                 ` [PATCH 25/50] pnfs_submit: layoutcommit helper functions andros
2010-08-13 21:31                                                   ` [PATCH 26/50] pnfs_submit: layoutcommit andros
2010-08-13 21:31                                                     ` [PATCH 27/50] pnfs_submit: layoutreturn helper functions andros
2010-08-13 21:31                                                       ` [PATCH 28/50] pnfs_submit: layoutreturn andros
2010-08-13 21:31                                                         ` [PATCH 29/50] pnfs_submit: add data server session to nfs4_setup_sequence andros
2010-08-13 21:31                                                           ` [PATCH 30/50] pnfs_submit: update nfs4_async_handle_error for data server andros
2010-08-13 21:31                                                             ` [PATCH 31/50] pnfs_submit: update state renewal for data servers andros
2010-08-13 21:31                                                               ` [PATCH 32/50] pnfs_submit-pageio-helpers.patch andros
2010-08-13 21:31                                                                 ` [PATCH 33/50] pnfs_submit: associate layout segmennt with nfs_page andros
2010-08-13 21:31                                                                   ` [PATCH 34/50] pnfs_submit: filelayout policy operations andros
2010-08-13 21:31                                                                     ` [PATCH 35/50] pnfs_submit: filelayout i/o helpers andros
2010-08-13 21:31                                                                       ` [PATCH 36/50] pnfs_submit: generic read andros
2010-08-13 21:31                                                                         ` [PATCH 37/50] pnfs_submit: filelayout read andros
2010-08-13 21:31                                                                           ` [PATCH 38/50] pnfs_submit: generic write andros
2010-08-13 21:31                                                                             ` [PATCH 39/50] pnfs_submit: data server write with no getattr andros
2010-08-13 21:31                                                                               ` [PATCH 40/50] pnfs_submit: filelayout write andros
2010-08-13 21:31                                                                                 ` [PATCH 41/50] pnfs_submit: signal layoutdriver commit andros
2010-08-13 21:31                                                                                   ` [PATCH 42/50] pnfs_submit: generic commit andros
2010-08-13 21:31                                                                                     ` [PATCH 43/50] pnfs_submit: data server commit with no getattr andros
2010-08-13 21:31                                                                                       ` [PATCH 44/50] pnfs_submit: filelayout commit andros
2010-08-13 21:31                                                                                         ` [PATCH 45/50] pnfs_submit: cb_layoutrecall andros
2010-08-13 21:31                                                                                           ` [PATCH 46/50] pnfs_submit: increase NFS_MAX_FILE_IO_SIZE andros
2010-08-13 21:31                                                                                             ` [PATCH 47/50] SQUASHME pnfs_post_submit: direct i/o andros
2010-08-13 21:32                                                                                               ` [PATCH 48/50] SQUASHME pnfs_post_submit: layout type enum andros
2010-08-13 21:32                                                                                                 ` [PATCH 49/50] SQUASHME pnfs_post_submit: cb notify deviceid declarations andros
2010-08-13 21:32                                                                                                   ` [PATCH 50/50] SQUASHME pnfs_submit: remove this unused code andros
2010-08-19 20:25                                                                                                     ` Benny Halevy
2010-08-31 16:32                                                                                                     ` Boaz Harrosh
2010-08-31 15:52                                                                                                 ` [PATCH 48/50] SQUASHME pnfs_post_submit: layout type enum Boaz Harrosh
2010-08-18 20:31                       ` [PATCH 11/50] pnfs_submit: set and unset pnfs layoutdriver modules Christoph Hellwig
2010-08-18 20:46                         ` Benny Halevy
2010-08-19  9:43                           ` Christoph Hellwig
2010-08-18 20:29                     ` [PATCH 10/50] pnfs_submit: register unregister pnfs module Christoph Hellwig
2010-08-18 20:49                       ` Benny Halevy
2010-08-18 20:28                   ` [PATCH 09/50] pnfs_submit: introduce fs/nfs/pnfs.c Christoph Hellwig
2010-08-19 17:21                     ` J. Bruce Fields
2010-08-18 20:27             ` [PATCH 06/50] pnfs_submit: introduce include/linux/nfs4_pnfs.h Christoph Hellwig
2010-08-18 20:48               ` William A. (Andy) Adamson
2010-08-18 20:50               ` Benny Halevy
2010-08-18 20:25           ` [PATCH 05/50] pnfs_submit: pnfs and nfslayoutdriver kconfig Christoph Hellwig
2010-08-18 21:09             ` Benny Halevy
2010-08-19  9:45               ` Christoph Hellwig
2010-08-20 22:13         ` [PATCH 04/50] nfsd: remove duplicate NFS4_STATEID_SIZE declaration J. Bruce Fields
2010-08-19 20:50 ` [PATCH 0/50] Squashed and re-organized pnfs-submit tree Benny Halevy
