* [PATCH 0/40] Wave3: For pNFS team review, not for kernel submission
@ 2011-02-04 21:33 andros
  2011-02-04 21:33 ` [PATCH 01/40] pnfs-submit: wave3: lseg refcounting andros
  2011-02-10  5:59 ` [PATCH 0/40] Wave3: For pNFS team review, not for kernel submission Benny Halevy
  0 siblings, 2 replies; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs


The wave3 code addresses pNFS file layout data server connection, data server
READ I/O and recovery of failed data server READs through the MDS.

I did not see the pnfs-submit-wave3 branch on Benny's tree, so I created my
own in the meantime.
I cloned the nfsd41-all branch from git://linux-nfs.org/~bhalevy/linux-pnfs.git,
which is the base for the pnfs-submit branch.
I then applied the wave3 patches from Benny's pnfs-submit branch,
followed by the changes summarized below.

git://linux-nfs.org/projects/andros/benny-linux-pnfs.git
branch andros-pnfs-submit-wave3 contains the result.

========================================================================
Please review the changes - I want to submit to Trond/Christoph next week.
========================================================================

The following are the first 12 patches in the pnfs-submit tree; they are the
original "wave3" patches.

0001-pnfs-submit-wave3-lseg-refcounting.patch
0002-pnfs_submit-add-data-server-session-to-nfs4_setup_se.patch
0003-pnfs_submit-update-nfs4_async_handle_error-for-data-.patch
0004-pnfs_submit-update-state-renewal-for-data-servers.patch
0005-pnfs_submit-wave3-pageio-helpers.patch
0006-pnfs_submit-wave3-associate-layout-segment-with-nfs_.patch
0007-pnfs_submit-filelayout-policy-operations.patch
0008-pnfs_submit-filelayout-i-o-helpers.patch
0009-pnfs_submit-wave3-generic-read.patch
0010-pnfs_submit-filelayout-read.patch
0011-pnfs_submit-increase-NFS_MAX_FILE_IO_SIZE.patch
0012-pnfs_submit-enforce-requested-DS-only-pNFS-role.patch

The rest are the wave3 changes.

Summary of changes:
-------------------

1) The file layout driver now specifies its own rpc_call_prepare and
rpc_call_done callbacks for READ.

filelayout_read_prepare:
- Uses nfs41_setup_sequence so we do not need to change nfs4_setup_sequence().

filelayout_read_done:
- Adds a read_done_cb function to nfs_read_data that calls nfs_read_done_cb for
NFS READs and filelayout_read_done_cb for data server READs.
- filelayout_read_done_cb has its own async error handler, so we do not need to
change nfs4_async_handle_error().

2) DS/MDS dual role now allows sessions used as a data server to be reused
for an MDS or NFSv4.1 mount.
- We don't ask for the DS role on data server EXCHANGE_ID
- We don't strip any roles returned by the server.
- If a session is in use as a DS role, and the client subsequently mounts the
same server as either an MDS or NON_PNFS mount, the same session can be used
provided the existing exchange flags allow it.

3) We always send a zero READ/WRITE stateid seqid. This is required for
data servers, and there is no advantage to not doing it for MDS or NON_PNFS
mounts.

4) We mark the deviceid as invalid upon any data server connection failure
and print out a kernel message.
This in turn marks any layout that tries to use the deviceid as failed for
both IOMODE_READ and IOMODE_RW. Inodes without layouts will still send
a layoutget. If the resultant layout uses the marked deviceid, it will be
marked as failed for both iomodes. All I/O will go through the MDS until
a client reboot or a CB_LAYOUTRECALL ALL or FSID removes all layouts that
refer to the deviceid, which removes the deviceid.

5) Our new file layout async error handler only recovers from session
related errors, or grace/delay errors. All other errors including
NFS4ERR_EXPIRED or NFS4ERR_STALE_CLIENTID result in marking the layout as
failed for IOMODE_READ and I/O is retried through the MDS.

6) Fred's lock inversion patches, together with Trond's request not to
reference a layout segment from dirty pages held in the cache, changed the
layout segment reference counting.
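
As a reading aid for items 4 and 5, here is a minimal user-space sketch of the
recovery policy described above. It is not the code from the patches: the
names ds_action, layout_flags, ds_read_error_policy and ds_connect_failed are
hypothetical, the status codes are the positive protocol values from RFC 5661
(the kernel uses negated values), and the exact set of "session related"
errors is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* NFSv4.1 status codes (RFC 5661); only the ones used in this sketch. */
enum {
	NFS4_OK                           = 0,
	NFS4ERR_DELAY                     = 10008,
	NFS4ERR_EXPIRED                   = 10011,
	NFS4ERR_GRACE                     = 10013,
	NFS4ERR_STALE_CLIENTID            = 10022,
	NFS4ERR_BADSESSION                = 10052,
	NFS4ERR_CONN_NOT_BOUND_TO_SESSION = 10055,
	NFS4ERR_SEQ_MISORDERED            = 10063,
	NFS4ERR_SEQ_FALSE_RETRY           = 10076,
};

/* Hypothetical outcome of handling a data server READ reply. */
enum ds_action { DS_DONE, DS_RETRY_ON_DS, DS_RETRY_VIA_MDS };

/* Hypothetical stand-in for the per-layout "failed" bits. */
struct layout_flags {
	bool read_failed;	/* IOMODE_READ marked failed */
	bool rw_failed;		/* IOMODE_RW marked failed */
};

/* Item 4: a data server connection failure fails the layout for both iomodes. */
static void ds_connect_failed(struct layout_flags *lo)
{
	assert(lo != NULL);
	lo->read_failed = true;
	lo->rw_failed = true;
}

/* Item 5: recover only from session and grace/delay errors on the DS. */
static enum ds_action ds_read_error_policy(int status, struct layout_flags *lo)
{
	assert(lo != NULL);
	switch (status) {
	case NFS4_OK:
		return DS_DONE;
	case NFS4ERR_BADSESSION:
	case NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
	case NFS4ERR_SEQ_FALSE_RETRY:
	case NFS4ERR_SEQ_MISORDERED:
	case NFS4ERR_DELAY:
	case NFS4ERR_GRACE:
		return DS_RETRY_ON_DS;
	default:
		/* Includes NFS4ERR_EXPIRED and NFS4ERR_STALE_CLIENTID. */
		lo->read_failed = true;
		return DS_RETRY_VIA_MDS;
	}
}
```

In this sketch a caller would resend the READ to the data server on
DS_RETRY_ON_DS and reissue it through the MDS on DS_RETRY_VIA_MDS.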

There are a couple of small issues I'm still investigating. Trond and Fred
have done an initial review.

-->Andy



* [PATCH 01/40] pnfs-submit: wave3: lseg refcounting
  2011-02-04 21:33 [PATCH 0/40] Wave3: For pNFS team review, not for kernel submission andros
@ 2011-02-04 21:33 ` andros
  2011-02-04 21:33   ` [PATCH 02/40] pnfs_submit: add data server session to nfs4_setup_sequence andros
  2011-02-10  5:59 ` [PATCH 0/40] Wave3: For pNFS team review, not for kernel submission Benny Halevy
  1 sibling, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Fred Isaman

From: Fred Isaman <iisaman@netapp.com>

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/pnfs.c |   48 ++++++++++++++++++++++++++++++++++++++----------
 fs/nfs/pnfs.h |   10 ++++++++++
 2 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0f5b66f..32ad768 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -230,6 +230,21 @@ static void free_lseg(struct pnfs_layout_segment *lseg)
 	put_layout_hdr(NFS_I(ino)->layout);
 }
 
+static void
+_put_lseg_common(struct pnfs_layout_segment *lseg)
+{
+	struct inode *ino = lseg->pls_layout->plh_inode;
+
+	BUG_ON(test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
+	list_del(&lseg->pls_list);
+	if (list_empty(&lseg->pls_layout->plh_segs)) {
+		set_bit(NFS_LAYOUT_DESTROYED, &lseg->pls_layout->plh_flags);
+		/* Matched by initial refcount set in alloc_init_layout_hdr */
+		put_layout_hdr_locked(lseg->pls_layout);
+	}
+	rpc_wake_up(&NFS_SERVER(ino)->roc_rpcwaitq);
+}
+
 /* The use of tmp_list is necessary because pnfs_curr_ld->free_lseg
  * could sleep, so must be called outside of the lock.
  * Returns 1 if object was removed, otherwise return 0.
@@ -242,22 +257,32 @@ put_lseg_locked(struct pnfs_layout_segment *lseg,
 		atomic_read(&lseg->pls_refcount),
 		test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
 	if (atomic_dec_and_test(&lseg->pls_refcount)) {
-		struct inode *ino = lseg->pls_layout->plh_inode;
-
-		BUG_ON(test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
-		list_del(&lseg->pls_list);
-		if (list_empty(&lseg->pls_layout->plh_segs)) {
-			set_bit(NFS_LAYOUT_DESTROYED, &lseg->pls_layout->plh_flags);
-			/* Matched by initial refcount set in alloc_init_layout_hdr */
-			put_layout_hdr_locked(lseg->pls_layout);
-		}
-		rpc_wake_up(&NFS_SERVER(ino)->roc_rpcwaitq);
+		_put_lseg_common(lseg);
 		list_add(&lseg->pls_list, tmp_list);
 		return 1;
 	}
 	return 0;
 }
 
+static void
+put_lseg(struct pnfs_layout_segment *lseg)
+{
+	struct inode *ino;
+
+	if (!lseg)
+		return;
+
+	dprintk("%s: lseg %p ref %d valid %d\n", __func__, lseg,
+		atomic_read(&lseg->pls_refcount),
+		test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
+	ino = lseg->pls_layout->plh_inode;
+	if (atomic_dec_and_lock(&lseg->pls_refcount, &ino->i_lock)) {
+		_put_lseg_common(lseg);
+		spin_unlock(&ino->i_lock);
+		free_lseg(lseg);
+	}
+}
+
 static bool
 should_free_lseg(u32 lseg_iomode, u32 recall_iomode)
 {
@@ -689,6 +714,7 @@ pnfs_find_lseg(struct pnfs_layout_hdr *lo, u32 iomode)
 	list_for_each_entry(lseg, &lo->plh_segs, pls_list) {
 		if (test_bit(NFS_LSEG_VALID, &lseg->pls_flags) &&
 		    is_matching_lseg(lseg, iomode)) {
+			get_lseg(lseg);
 			ret = lseg;
 			break;
 		}
@@ -769,6 +795,7 @@ pnfs_update_layout(struct inode *ino,
 out:
 	dprintk("%s end, state 0x%lx lseg %p\n", __func__,
 		nfsi->layout->plh_flags, lseg);
+	put_lseg(lseg); /* STUB - callers currently ignore return value */
 	return lseg;
 out_unlock:
 	spin_unlock(&ino->i_lock);
@@ -821,6 +848,7 @@ pnfs_layout_process(struct nfs4_layoutget *lgp)
 	}
 	init_lseg(lo, lseg);
 	lseg->pls_range = res->range;
+	get_lseg(lseg);
 	*lgp->lsegpp = lseg;
 	pnfs_insert_layout(lo, lseg);
 
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index e2612ea..fe50faa 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -177,6 +177,12 @@ static inline int lo_fail_bit(u32 iomode)
 			 NFS_LAYOUT_RW_FAILED : NFS_LAYOUT_RO_FAILED;
 }
 
+static inline void get_lseg(struct pnfs_layout_segment *lseg)
+{
+	atomic_inc(&lseg->pls_refcount);
+	smp_mb__after_atomic_inc();
+}
+
 /* Return true if a layout driver is being used for this mountpoint */
 static inline int pnfs_enabled_sb(struct nfs_server *nfss)
 {
@@ -193,6 +199,10 @@ static inline void pnfs_destroy_layout(struct nfs_inode *nfsi)
 {
 }
 
+static inline void get_lseg(struct pnfs_layout_segment *lseg)
+{
+}
+
 static inline struct pnfs_layout_segment *
 pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
 		   enum pnfs_iomode access_type)
-- 
1.6.6
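
Reader's note on the patch above: get_lseg() takes a reference and put_lseg()
drops one, with the final put unlinking and freeing the segment. The following
is a minimal user-space sketch of that get/put pattern using C11 atomics. It
is an illustration only: struct seg, seg_alloc, seg_get and seg_put are
hypothetical names, and the sketch omits the i_lock handling that the patch
gets from atomic_dec_and_lock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a layout segment. */
struct seg {
	atomic_int refcount;
	int *freed_flag;	/* set to 1 on destruction, for demonstration */
};

/* Allocate a segment holding one initial reference. */
static struct seg *seg_alloc(int *freed_flag)
{
	struct seg *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	atomic_init(&s->refcount, 1);
	s->freed_flag = freed_flag;
	*freed_flag = 0;
	return s;
}

/* Take an additional reference. */
static void seg_get(struct seg *s)
{
	atomic_fetch_add(&s->refcount, 1);
}

/*
 * Drop a reference. Returns true if this was the last one and the
 * segment was destroyed. A NULL segment is ignored, as in put_lseg().
 */
static bool seg_put(struct seg *s)
{
	int prev;

	if (!s)
		return false;
	prev = atomic_fetch_sub(&s->refcount, 1);
	assert(prev > 0);	/* dropping below zero is a caller bug */
	if (prev != 1)
		return false;
	*s->freed_flag = 1;
	free(s);
	return true;
}
```

Every seg_get() must be paired with exactly one seg_put(); the holder whose
put returns true must not touch the segment afterwards.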



* [PATCH 02/40] pnfs_submit: add data server session to nfs4_setup_sequence
  2011-02-04 21:33 ` [PATCH 01/40] pnfs-submit: wave3: lseg refcounting andros
@ 2011-02-04 21:33   ` andros
  2011-02-04 21:33     ` [PATCH 03/40] pnfs_submit: update nfs4_async_handle_error for data server andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy
  Cc: linux-nfs, Andy Adamson, Dean Hildebrand, Marc Eshel, Mike Sager,
	Oleg Drokin, Tigran Mkrtchyan, Andy Adamson, Fred Isaman

From: The pNFS Team <linux-nfs@vger.kernel.org>

Note: original patch of same name...

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/nfs4_fs.h  |    3 +++
 fs/nfs/nfs4proc.c |   17 ++++++++++-------
 fs/nfs/read.c     |    2 +-
 fs/nfs/unlink.c   |    4 ++--
 fs/nfs/write.c    |    2 +-
 5 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 7a74740..28fda51 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -250,10 +250,12 @@ static inline struct nfs4_session *nfs4_get_session(const struct nfs_server *ser
 }
 
 extern int nfs4_setup_sequence(const struct nfs_server *server,
+		struct nfs4_session *ds_session,
 		struct nfs4_sequence_args *args, struct nfs4_sequence_res *res,
 		int cache_reply, struct rpc_task *task);
 extern void nfs4_destroy_session(struct nfs4_session *session);
 extern struct nfs4_session *nfs4_alloc_session(struct nfs_client *clp);
+extern int nfs4_proc_exchange_id(struct nfs_client *, struct rpc_cred *);
 extern int nfs4_proc_create_session(struct nfs_client *);
 extern int nfs4_proc_destroy_session(struct nfs4_session *);
 extern int nfs4_init_session(struct nfs_server *server);
@@ -266,6 +268,7 @@ static inline struct nfs4_session *nfs4_get_session(const struct nfs_server *ser
 }
 
 static inline int nfs4_setup_sequence(const struct nfs_server *server,
+		struct nfs4_session *ds_session,
 		struct nfs4_sequence_args *args, struct nfs4_sequence_res *res,
 		int cache_reply, struct rpc_task *task)
 {
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 78936a8..9d0d636 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -573,6 +573,7 @@ static int nfs41_setup_sequence(struct nfs4_session *session,
 }
 
 int nfs4_setup_sequence(const struct nfs_server *server,
+		struct nfs4_session *ds_session,
 			struct nfs4_sequence_args *args,
 			struct nfs4_sequence_res *res,
 			int cache_reply,
@@ -581,6 +582,8 @@ int nfs4_setup_sequence(const struct nfs_server *server,
 	struct nfs4_session *session = nfs4_get_session(server);
 	int ret = 0;
 
+	if (ds_session)
+		session = ds_session;
 	if (session == NULL) {
 		args->sa_session = NULL;
 		res->sr_session = NULL;
@@ -611,7 +614,7 @@ static void nfs41_call_sync_prepare(struct rpc_task *task, void *calldata)
 
 	dprintk("--> %s data->seq_server %p\n", __func__, data->seq_server);
 
-	if (nfs4_setup_sequence(data->seq_server, data->seq_args,
+	if (nfs4_setup_sequence(data->seq_server, NULL, data->seq_args,
 				data->seq_res, data->cache_reply, task))
 		return;
 	rpc_call_start(task);
@@ -1399,7 +1402,7 @@ static void nfs4_open_prepare(struct rpc_task *task, void *calldata)
 		nfs_copy_fh(&data->o_res.fh, data->o_arg.fh);
 	}
 	data->timestamp = jiffies;
-	if (nfs4_setup_sequence(data->o_arg.server,
+	if (nfs4_setup_sequence(data->o_arg.server, NULL,
 				&data->o_arg.seq_args,
 				&data->o_res.seq_res, 1, task))
 		return;
@@ -1950,7 +1953,7 @@ static void nfs4_close_prepare(struct rpc_task *task, void *data)
 
 	nfs_fattr_init(calldata->res.fattr);
 	calldata->timestamp = jiffies;
-	if (nfs4_setup_sequence(NFS_SERVER(calldata->inode),
+	if (nfs4_setup_sequence(NFS_SERVER(calldata->inode), NULL,
 				&calldata->arg.seq_args, &calldata->res.seq_res,
 				1, task))
 		return;
@@ -3652,7 +3655,7 @@ static void nfs4_delegreturn_prepare(struct rpc_task *task, void *data)
 
 	d_data = (struct nfs4_delegreturndata *)data;
 
-	if (nfs4_setup_sequence(d_data->res.server,
+	if (nfs4_setup_sequence(d_data->res.server, NULL,
 				&d_data->args.seq_args,
 				&d_data->res.seq_res, 1, task))
 		return;
@@ -3904,7 +3907,7 @@ static void nfs4_locku_prepare(struct rpc_task *task, void *data)
 		return;
 	}
 	calldata->timestamp = jiffies;
-	if (nfs4_setup_sequence(calldata->server,
+	if (nfs4_setup_sequence(calldata->server, NULL,
 				&calldata->arg.seq_args,
 				&calldata->res.seq_res, 1, task))
 		return;
@@ -4059,7 +4062,7 @@ static void nfs4_lock_prepare(struct rpc_task *task, void *calldata)
 	} else
 		data->arg.new_lock_owner = 0;
 	data->timestamp = jiffies;
-	if (nfs4_setup_sequence(data->server,
+	if (nfs4_setup_sequence(data->server, NULL,
 				&data->arg.seq_args,
 				&data->res.seq_res, 1, task))
 		return;
@@ -5328,7 +5331,7 @@ nfs4_layoutget_prepare(struct rpc_task *task, void *calldata)
 	 * However, that is not so catastrophic, and there seems
 	 * to be no way to prevent it completely.
 	 */
-	if (nfs4_setup_sequence(server, &lgp->args.seq_args,
+	if (nfs4_setup_sequence(server, NULL, &lgp->args.seq_args,
 				&lgp->res.seq_res, 0, task))
 		return;
 	if (pnfs_choose_layoutget_stateid(&lgp->args.stateid,
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index aedcaa7..6c224e8 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -411,7 +411,7 @@ void nfs_read_prepare(struct rpc_task *task, void *calldata)
 {
 	struct nfs_read_data *data = calldata;
 
-	if (nfs4_setup_sequence(NFS_SERVER(data->inode),
+	if (nfs4_setup_sequence(NFS_SERVER(data->inode), NULL,
 				&data->args.seq_args, &data->res.seq_res,
 				0, task))
 		return;
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index e313a51..82dc70b 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -113,7 +113,7 @@ void nfs_unlink_prepare(struct rpc_task *task, void *calldata)
 	struct nfs_unlinkdata *data = calldata;
 	struct nfs_server *server = NFS_SERVER(data->dir);
 
-	if (nfs4_setup_sequence(server, &data->args.seq_args,
+	if (nfs4_setup_sequence(server, NULL, &data->args.seq_args,
 				&data->res.seq_res, 1, task))
 		return;
 	rpc_call_start(task);
@@ -388,7 +388,7 @@ static void nfs_rename_prepare(struct rpc_task *task, void *calldata)
 	struct nfs_renamedata *data = calldata;
 	struct nfs_server *server = NFS_SERVER(data->old_dir);
 
-	if (nfs4_setup_sequence(server, &data->args.seq_args,
+	if (nfs4_setup_sequence(server, NULL, &data->args.seq_args,
 				&data->res.seq_res, 1, task))
 		return;
 	rpc_call_start(task);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index c8278f4..7a2905d 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1048,7 +1048,7 @@ void nfs_write_prepare(struct rpc_task *task, void *calldata)
 {
 	struct nfs_write_data *data = calldata;
 
-	if (nfs4_setup_sequence(NFS_SERVER(data->inode),
+	if (nfs4_setup_sequence(NFS_SERVER(data->inode), NULL,
 				&data->args.seq_args,
 				&data->res.seq_res, 1, task))
 		return;
-- 
1.6.6



* [PATCH 03/40] pnfs_submit: update nfs4_async_handle_error for data server
  2011-02-04 21:33   ` [PATCH 02/40] pnfs_submit: add data server session to nfs4_setup_sequence andros
@ 2011-02-04 21:33     ` andros
  2011-02-04 21:33       ` [PATCH 04/40] pnfs_submit: update state renewal for data servers andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson, Boaz Harrosh, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/nfs4proc.c         |   37 +++++++++++++++++++++----------------
 include/linux/nfs4.h      |    7 +++++++
 include/linux/nfs_fs_sb.h |   10 ++++++++++
 3 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9d0d636..519b9bd 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -70,7 +70,7 @@ struct nfs4_opendata;
 static int _nfs4_proc_open(struct nfs4_opendata *data);
 static int _nfs4_recover_proc_open(struct nfs4_opendata *data);
 static int nfs4_do_fsinfo(struct nfs_server *, struct nfs_fh *, struct nfs_fsinfo *);
-static int nfs4_async_handle_error(struct rpc_task *, const struct nfs_server *, struct nfs4_state *);
+static int nfs4_async_handle_error(struct rpc_task *, const struct nfs_server *, struct nfs4_state *, struct nfs_client *);
 static int _nfs4_proc_lookup(struct inode *dir, const struct qstr *name, struct nfs_fh *fhandle, struct nfs_fattr *fattr);
 static int _nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle, struct nfs_fattr *fattr);
 static int nfs4_do_setattr(struct inode *inode, struct rpc_cred *cred,
@@ -1901,7 +1901,7 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
 			if (calldata->arg.fmode == 0)
 				break;
 		default:
-			if (nfs4_async_handle_error(task, server, state) == -EAGAIN)
+			if (nfs4_async_handle_error(task, server, state, NULL) == -EAGAIN)
 				rpc_restart_call_prepare(task);
 	}
 	nfs_release_seqid(calldata->arg.seqid);
@@ -2600,7 +2600,7 @@ static int nfs4_proc_unlink_done(struct rpc_task *task, struct inode *dir)
 
 	if (!nfs4_sequence_done(task, &res->seq_res))
 		return 0;
-	if (nfs4_async_handle_error(task, res->server, NULL) == -EAGAIN)
+	if (nfs4_async_handle_error(task, res->server, NULL, NULL) == -EAGAIN)
 		return 0;
 	update_changeattr(dir, &res->cinfo);
 	nfs_post_op_update_inode(dir, res->dir_attr);
@@ -2625,7 +2625,7 @@ static int nfs4_proc_rename_done(struct rpc_task *task, struct inode *old_dir,
 
 	if (!nfs4_sequence_done(task, &res->seq_res))
 		return 0;
-	if (nfs4_async_handle_error(task, res->server, NULL) == -EAGAIN)
+	if (nfs4_async_handle_error(task, res->server, NULL, NULL) == -EAGAIN)
 		return 0;
 
 	update_changeattr(old_dir, &res->old_cinfo);
@@ -3082,7 +3082,7 @@ static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
-	if (nfs4_async_handle_error(task, server, data->args.context->state) == -EAGAIN) {
+	if (nfs4_async_handle_error(task, server, data->args.context->state, NULL) == -EAGAIN) {
 		nfs_restart_rpc(task, server->nfs_client);
 		return -EAGAIN;
 	}
@@ -3106,7 +3106,7 @@ static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
-	if (nfs4_async_handle_error(task, NFS_SERVER(inode), data->args.context->state) == -EAGAIN) {
+	if (nfs4_async_handle_error(task, NFS_SERVER(inode), data->args.context->state, NULL) == -EAGAIN) {
 		nfs_restart_rpc(task, NFS_SERVER(inode)->nfs_client);
 		return -EAGAIN;
 	}
@@ -3135,7 +3135,7 @@ static int nfs4_commit_done(struct rpc_task *task, struct nfs_write_data *data)
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
-	if (nfs4_async_handle_error(task, NFS_SERVER(inode), NULL) == -EAGAIN) {
+	if (nfs4_async_handle_error(task, NFS_SERVER(inode), NULL, NULL) == -EAGAIN) {
 		nfs_restart_rpc(task, NFS_SERVER(inode)->nfs_client);
 		return -EAGAIN;
 	}
@@ -3455,9 +3455,10 @@ static int nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t buflen
 }
 
 static int
-nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server, struct nfs4_state *state)
+nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server, struct nfs4_state *state, struct nfs_client *clp)
 {
-	struct nfs_client *clp = server->nfs_client;
+	if (!clp)
+		clp = server->nfs_client;
 
 	if (task->tk_status >= 0)
 		return 0;
@@ -3481,14 +3482,16 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
 		case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
 		case -NFS4ERR_SEQ_FALSE_RETRY:
 		case -NFS4ERR_SEQ_MISORDERED:
-			dprintk("%s ERROR %d, Reset session\n", __func__,
-				task->tk_status);
+			dprintk("%s ERROR %d, Reset session. Exchangeid "
+				"flags 0x%x\n", __func__, task->tk_status,
+				clp->cl_exchange_flags);
 			nfs4_schedule_state_recovery(clp);
 			task->tk_status = 0;
 			return -EAGAIN;
 #endif /* CONFIG_NFS_V4_1 */
 		case -NFS4ERR_DELAY:
-			nfs_inc_server_stats(server, NFSIOS_DELAY);
+			if (server)
+				nfs_inc_server_stats(server, NFSIOS_DELAY);
 		case -NFS4ERR_GRACE:
 		case -EKEYEXPIRED:
 			rpc_delay(task, NFS4_POLL_RETRY_MAX);
@@ -3501,6 +3504,8 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
 	task->tk_status = nfs4_map_errors(task->tk_status);
 	return 0;
 do_state_recovery:
+	if (is_ds_only_client(clp))
+		return 0;
 	rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL);
 	nfs4_schedule_state_recovery(clp);
 	if (test_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) == 0)
@@ -3634,8 +3639,8 @@ static void nfs4_delegreturn_done(struct rpc_task *task, void *calldata)
 		renew_lease(data->res.server, data->timestamp);
 		break;
 	default:
-		if (nfs4_async_handle_error(task, data->res.server, NULL) ==
-				-EAGAIN) {
+		if (nfs4_async_handle_error(task, data->res.server, NULL, NULL)
+				== -EAGAIN) {
 			nfs_restart_rpc(task, data->res.server->nfs_client);
 			return;
 		}
@@ -3889,7 +3894,7 @@ static void nfs4_locku_done(struct rpc_task *task, void *data)
 		case -NFS4ERR_EXPIRED:
 			break;
 		default:
-			if (nfs4_async_handle_error(task, calldata->server, NULL) == -EAGAIN)
+			if (nfs4_async_handle_error(task, calldata->server, NULL, NULL) == -EAGAIN)
 				nfs_restart_rpc(task,
 						 calldata->server->nfs_client);
 	}
@@ -5361,7 +5366,7 @@ static void nfs4_layoutget_done(struct rpc_task *task, void *calldata)
 		task->tk_status = -NFS4ERR_DELAY;
 		/* Fall through */
 	default:
-		if (nfs4_async_handle_error(task, server, NULL) == -EAGAIN) {
+		if (nfs4_async_handle_error(task, server, NULL, NULL) == -EAGAIN) {
 			rpc_restart_call_prepare(task);
 			return;
 		}
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 134716e..b32a792 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -131,6 +131,13 @@
 #define EXCHGID4_FLAG_MASK_A			0x40070103
 #define EXCHGID4_FLAG_MASK_R			0x80070103
 
+static inline bool
+is_ds_only_session(u32 exchange_flags)
+{
+	u32 mask = EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_PNFS_MDS;
+	return (exchange_flags & mask) == EXCHGID4_FLAG_USE_PNFS_DS;
+}
+
 #define SEQ4_STATUS_CB_PATH_DOWN		0x00000001
 #define SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING	0x00000002
 #define SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED	0x00000004
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index b197563..017f835 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -87,6 +87,16 @@ struct nfs_client {
 #endif
 };
 
+static inline bool
+is_ds_only_client(struct nfs_client *clp)
+{
+#ifdef CONFIG_NFS_V4_1
+	return is_ds_only_session(clp->cl_exchange_flags);
+#else
+	return false;
+#endif
+}
+
 /*
  * NFS client parameters stored in the superblock.
  */
-- 
1.6.6
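
Reader's note on the patch above: is_ds_only_session() treats a client as
DS-only when the DS role bit is set and the MDS role bit is clear. The sketch
here copies that predicate into a standalone program so the flag combinations
can be checked. The flag values are the EXCHANGE_ID role bits from RFC 5661;
check_role_examples() is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EXCHANGE_ID role flags (RFC 5661). */
#define EXCHGID4_FLAG_USE_NON_PNFS	0x00010000u
#define EXCHGID4_FLAG_USE_PNFS_MDS	0x00020000u
#define EXCHGID4_FLAG_USE_PNFS_DS	0x00040000u

/* Same test as in the patch: DS bit set, MDS bit clear. */
static bool is_ds_only_session(uint32_t exchange_flags)
{
	uint32_t mask = EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_PNFS_MDS;

	return (exchange_flags & mask) == EXCHGID4_FLAG_USE_PNFS_DS;
}

/* Hypothetical helper that walks through the role combinations. */
static void check_role_examples(void)
{
	assert(is_ds_only_session(EXCHGID4_FLAG_USE_PNFS_DS));
	assert(!is_ds_only_session(EXCHGID4_FLAG_USE_PNFS_MDS));
	assert(!is_ds_only_session(EXCHGID4_FLAG_USE_PNFS_DS |
				   EXCHGID4_FLAG_USE_PNFS_MDS));
	assert(!is_ds_only_session(EXCHGID4_FLAG_USE_NON_PNFS));
	/* The mask ignores the NON_PNFS bit, so DS plus NON_PNFS is DS-only. */
	assert(is_ds_only_session(EXCHGID4_FLAG_USE_PNFS_DS |
				  EXCHGID4_FLAG_USE_NON_PNFS));
}
```

A server that returns both the DS and MDS roles is therefore not treated as
DS-only, and state recovery for it follows the normal MDS path.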



* [PATCH 04/40] pnfs_submit: update state renewal for data servers
  2011-02-04 21:33     ` [PATCH 03/40] pnfs_submit: update nfs4_async_handle_error for data server andros
@ 2011-02-04 21:33       ` andros
  2011-02-04 21:33         ` [PATCH 05/40] pnfs_submit: wave3 pageio-helpers andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson, Boaz Harrosh, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/nfs4renewd.c |    2 +-
 fs/nfs/nfs4state.c  |    5 +++++
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs4renewd.c b/fs/nfs/nfs4renewd.c
index 402143d..c8dbbeb 100644
--- a/fs/nfs/nfs4renewd.c
+++ b/fs/nfs/nfs4renewd.c
@@ -65,7 +65,7 @@ nfs4_renew_state(struct work_struct *work)
 	dprintk("%s: start\n", __func__);
 
 	rcu_read_lock();
-	if (list_empty(&clp->cl_superblocks)) {
+	if (list_empty(&clp->cl_superblocks) && !is_ds_only_client(clp)) {
 		rcu_read_unlock();
 		goto out;
 	}
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index e6742b5..49433aa 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -153,6 +153,11 @@ static int nfs41_setup_state_renewal(struct nfs_client *clp)
 	int status;
 	struct nfs_fsinfo fsinfo;
 
+	if (is_ds_only_client(clp)) {
+		nfs4_schedule_state_renewal(clp);
+		return 0;
+	}
+
 	status = nfs4_proc_get_lease_time(clp, &fsinfo);
 	if (status == 0) {
 		/* Update lease time and schedule renewal */
-- 
1.6.6



* [PATCH 05/40] pnfs_submit: wave3 pageio-helpers
  2011-02-04 21:33       ` [PATCH 04/40] pnfs_submit: update state renewal for data servers andros
@ 2011-02-04 21:33         ` andros
  2011-02-04 21:33           ` [PATCH 06/40] pnfs_submit: wave3 associate layout segment with nfs_page andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy
  Cc: linux-nfs, Andy Adamson, Andy Adamson, Dean Hildebrand,
	Fred Isaman, Fred Isaman

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
[pnfs-submit: init pg_lseg to NULL when !CONFIG_NFS_V4_1]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/pagelist.c        |   11 +++++++++--
 fs/nfs/pnfs.c            |   40 ++++++++++++++++++++++++++++++++++++++++
 fs/nfs/pnfs.h            |   14 ++++++++++++++
 fs/nfs/read.c            |    4 ++--
 include/linux/nfs_page.h |    5 +++++
 5 files changed, 70 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index e1164e3..0e8dece 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -240,7 +240,8 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
  * Return 'true' if this is the case, else return 'false'.
  */
 static int nfs_can_coalesce_requests(struct nfs_page *prev,
-				     struct nfs_page *req)
+				     struct nfs_page *req,
+				     struct nfs_pageio_descriptor *pgio)
 {
 	if (req->wb_context->cred != prev->wb_context->cred)
 		return 0;
@@ -254,6 +255,12 @@ static int nfs_can_coalesce_requests(struct nfs_page *prev,
 		return 0;
 	if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
 		return 0;
+	if (req->wb_lseg != prev->wb_lseg)
+		return 0;
+#ifdef CONFIG_NFS_V4_1
+	if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
+		return 0;
+#endif /* CONFIG_NFS_V4_1 */
 	return 1;
 }
 
@@ -286,7 +293,7 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
 		if (newlen > desc->pg_bsize)
 			return 0;
 		prev = nfs_list_entry(desc->pg_list.prev);
-		if (!nfs_can_coalesce_requests(prev, req))
+		if (!nfs_can_coalesce_requests(prev, req, desc))
 			return 0;
 	} else
 		desc->pg_base = req->wb_pgbase;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 32ad768..c7199db 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -870,6 +870,46 @@ out_forget_reply:
 	goto out;
 }
 
+void
+pnfs_set_pg_test(struct inode *inode, struct nfs_pageio_descriptor *pgio)
+{
+	struct pnfs_layout_hdr *lo;
+	struct pnfs_layoutdriver_type *ld;
+
+	pgio->pg_test = NULL;
+
+	lo = NFS_I(inode)->layout;
+	ld = NFS_SERVER(inode)->pnfs_curr_ld;
+	if (!ld || !lo)
+		return;
+
+	pgio->pg_test = ld->pg_test;
+}
+
+/*
+ * rsize is already set by caller to MDS rsize.
+ */
+void
+pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
+		  struct inode *inode,
+		  struct nfs_open_context *ctx,
+		  struct list_head *pages)
+{
+	struct nfs_server *nfss = NFS_SERVER(inode);
+
+	pgio->pg_test = NULL;
+	pgio->pg_lseg = NULL;
+
+	if (!pnfs_enabled_sb(nfss))
+		return;
+
+	pgio->pg_lseg = pnfs_update_layout(inode, ctx, IOMODE_READ);
+	if (!pgio->pg_lseg)
+		return;
+
+	pnfs_set_pg_test(inode, pgio);
+}
+
 /*
  * Device ID cache. Currently supports one layout type per struct nfs_client.
  * Add layout type to the lookup key to expand to support multiple types.
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index fe50faa..7a75a0c 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -30,6 +30,8 @@
 #ifndef FS_NFS_PNFS_H
 #define FS_NFS_PNFS_H
 
+#include <linux/nfs_page.h>
+
 enum {
 	NFS_LSEG_VALID = 0,	/* cleared when lseg is recalled/returned */
 	NFS_LSEG_ROC,		/* roc bit received from server */
@@ -65,6 +67,9 @@ struct pnfs_layoutdriver_type {
 	int (*clear_layoutdriver) (struct nfs_server *);
 	struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_hdr *layoutid, struct nfs4_layoutget_res *lgr);
 	void (*free_lseg) (struct pnfs_layout_segment *lseg);
+
+	/* test for nfs page cache coalescing */
+	int (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
 };
 
 struct pnfs_layout_hdr {
@@ -151,6 +156,8 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
 		   enum pnfs_iomode access_type);
 void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
 void unset_pnfs_layoutdriver(struct nfs_server *);
+void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *,
+			   struct nfs_open_context *, struct list_head *);
 int pnfs_layout_process(struct nfs4_layoutget *lgp);
 void pnfs_free_lseg_list(struct list_head *tmp_list);
 void pnfs_destroy_layout(struct nfs_inode *);
@@ -240,6 +247,13 @@ static inline void unset_pnfs_layoutdriver(struct nfs_server *s)
 {
 }
 
+static inline void
+pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio, struct inode *ino,
+		      struct nfs_open_context *ctx, struct list_head *pages)
+{
+	pgio->pg_lseg = NULL;
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 #endif /* FS_NFS_PNFS_H */
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 6c224e8..11e7d6e 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -20,12 +20,12 @@
 #include <linux/nfs_page.h>
 
 #include <asm/system.h>
+#include "pnfs.h"
 
 #include "nfs4_fs.h"
 #include "internal.h"
 #include "iostat.h"
 #include "fscache.h"
-#include "pnfs.h"
 
 #define NFSDBG_FACILITY		NFSDBG_PAGECACHE
 
@@ -625,7 +625,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
 	if (ret == 0)
 		goto read_complete; /* all pages were read */
 
-	pnfs_update_layout(inode, desc.ctx, IOMODE_READ);
+	pnfs_pageio_init_read(&pgio, inode, desc.ctx, pages);
 	if (rsize < PAGE_CACHE_SIZE)
 		nfs_pageio_init(&pgio, inode, nfs_pagein_multi, rsize, 0);
 	else
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index d55cee7..ec9fd30 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -49,6 +49,7 @@ struct nfs_page {
 	struct kref		wb_kref;	/* reference count */
 	unsigned long		wb_flags;
 	struct nfs_writeverf	wb_verf;	/* Commit cookie */
+	struct pnfs_layout_segment *wb_lseg;	/* Pnfs layout info */
 };
 
 struct nfs_pageio_descriptor {
@@ -62,6 +63,10 @@ struct nfs_pageio_descriptor {
 	int			(*pg_doio)(struct inode *, struct list_head *, unsigned int, size_t, int);
 	int 			pg_ioflags;
 	int			pg_error;
+	struct pnfs_layout_segment *pg_lseg;
+#ifdef CONFIG_NFS_V4_1
+	int			(*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
+#endif /* CONFIG_NFS_V4_1 */
 };
 
 #define NFS_WBACK_BUSY(req)	(test_bit(PG_BUSY,&(req)->wb_flags))
-- 
1.6.6



* [PATCH 06/40] pnfs_submit: wave3 associate layout segment with nfs_page
  2011-02-04 21:33         ` [PATCH 05/40] pnfs_submit: wave3 pageio-helpers andros
@ 2011-02-04 21:33           ` andros
  2011-02-04 21:33             ` [PATCH 07/40] pnfs_submit: filelayout policy operations andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy
  Cc: linux-nfs, Andy Adamson, Andy Adamson, Boaz Harrosh,
	Dean Hildebrand, Fred Isaman

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/file.c            |    4 ----
 fs/nfs/pagelist.c        |   12 ++++++++++--
 fs/nfs/pnfs.c            |   14 ++++++++++++--
 fs/nfs/pnfs.h            |    5 +++++
 fs/nfs/read.c            |    9 ++++++---
 fs/nfs/write.c           |    2 +-
 include/linux/nfs_page.h |    3 ++-
 7 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 7bf029e..d85a534 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -387,10 +387,6 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
 		file->f_path.dentry->d_name.name,
 		mapping->host->i_ino, len, (long long) pos);
 
-	pnfs_update_layout(mapping->host,
-			   nfs_file_open_context(file),
-			   IOMODE_RW);
-
 start:
 	/*
 	 * Prevent starvation issues if someone is doing a consistency
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 0e8dece..9a27592 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -20,6 +20,7 @@
 #include <linux/nfs_mount.h>
 
 #include "internal.h"
+#include "pnfs.h"
 
 static struct kmem_cache *nfs_page_cachep;
 
@@ -53,7 +54,8 @@ nfs_page_free(struct nfs_page *p)
 struct nfs_page *
 nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
 		   struct page *page,
-		   unsigned int offset, unsigned int count)
+		   unsigned int offset, unsigned int count,
+		   struct pnfs_layout_segment *lseg)
 {
 	struct nfs_page		*req;
 
@@ -84,6 +86,9 @@ nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
 	req->wb_bytes   = count;
 	req->wb_context = get_nfs_open_context(ctx);
 	kref_init(&req->wb_kref);
+	req->wb_lseg    = lseg;
+	if (lseg)
+		get_lseg(lseg);
 	return req;
 }
 
@@ -159,9 +164,12 @@ void nfs_clear_request(struct nfs_page *req)
 		put_nfs_open_context(ctx);
 		req->wb_context = NULL;
 	}
+	if (req->wb_lseg != NULL) {
+		put_lseg(req->wb_lseg);
+		req->wb_lseg = NULL;
+	}
 }
 
-
 /**
  * nfs_release_request - Release the count on an NFS read/write request
  * @req: request to release
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index c7199db..1811204 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -264,7 +264,7 @@ put_lseg_locked(struct pnfs_layout_segment *lseg,
 	return 0;
 }
 
-static void
+void
 put_lseg(struct pnfs_layout_segment *lseg)
 {
 	struct inode *ino;
@@ -282,6 +282,7 @@ put_lseg(struct pnfs_layout_segment *lseg)
 		free_lseg(lseg);
 	}
 }
+EXPORT_SYMBOL_GPL(put_lseg);
 
 static bool
 should_free_lseg(u32 lseg_iomode, u32 recall_iomode)
@@ -795,7 +796,6 @@ pnfs_update_layout(struct inode *ino,
 out:
 	dprintk("%s end, state 0x%lx lseg %p\n", __func__,
 		nfsi->layout->plh_flags, lseg);
-	put_lseg(lseg); /* STUB - callers currently ignore return value */
 	return lseg;
 out_unlock:
 	spin_unlock(&ino->i_lock);
@@ -910,6 +910,16 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
 	pnfs_set_pg_test(inode, pgio);
 }
 
+static void _pnfs_clear_lseg_from_pages(struct list_head *head)
+{
+	struct nfs_page *req;
+
+	list_for_each_entry(req, head, wb_list) {
+		put_lseg(req->wb_lseg);
+		req->wb_lseg = NULL;
+	}
+}
+
 /*
  * Device ID cache. Currently supports one layout type per struct nfs_client.
  * Add layout type to the lookup key to expand to support multiple types.
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 7a75a0c..7614c3b 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -151,6 +151,7 @@ extern int nfs4_proc_layoutget(struct nfs4_layoutget *lgp);
 
 /* pnfs.c */
 void get_layout_hdr(struct pnfs_layout_hdr *lo);
+void put_lseg(struct pnfs_layout_segment *lseg);
 struct pnfs_layout_segment *
 pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
 		   enum pnfs_iomode access_type);
@@ -210,6 +211,10 @@ static inline void get_lseg(struct pnfs_layout_segment *lseg)
 {
 }
 
+static inline void put_lseg(struct pnfs_layout_segment *lseg)
+{
+}
+
 static inline struct pnfs_layout_segment *
 pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
 		   enum pnfs_iomode access_type)
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 11e7d6e..2eac0f1 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -117,12 +117,14 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
 	LIST_HEAD(one_request);
 	struct nfs_page	*new;
 	unsigned int len;
+	struct pnfs_layout_segment *lseg;
 
 	len = nfs_page_length(page);
 	if (len == 0)
 		return nfs_return_empty_page(page);
-	pnfs_update_layout(inode, ctx, IOMODE_READ);
-	new = nfs_create_request(ctx, inode, page, 0, len);
+	lseg = pnfs_update_layout(inode, ctx, IOMODE_READ);
+	new = nfs_create_request(ctx, inode, page, 0, len, lseg);
+	put_lseg(lseg);
 	if (IS_ERR(new)) {
 		unlock_page(page);
 		return PTR_ERR(new);
@@ -569,7 +571,8 @@ readpage_async_filler(void *data, struct page *page)
 	if (len == 0)
 		return nfs_return_empty_page(page);
 
-	new = nfs_create_request(desc->ctx, inode, page, 0, len);
+	new = nfs_create_request(desc->ctx, inode, page, 0, len,
+				 desc->pgio->pg_lseg);
 	if (IS_ERR(new))
 		goto out_error;
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 7a2905d..7f3c10a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -651,7 +651,7 @@ static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
 	req = nfs_try_to_update_request(inode, page, offset, bytes);
 	if (req != NULL)
 		goto out;
-	req = nfs_create_request(ctx, inode, page, offset, bytes);
+	req = nfs_create_request(ctx, inode, page, offset, bytes, NULL);
 	if (IS_ERR(req))
 		goto out;
 	error = nfs_inode_add_request(inode, req);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index ec9fd30..484c5b9 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -75,7 +75,8 @@ extern	struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
 					    struct inode *inode,
 					    struct page *page,
 					    unsigned int offset,
-					    unsigned int count);
+					    unsigned int count,
+					    struct pnfs_layout_segment *lseg);
 extern	void nfs_clear_request(struct nfs_page *req);
 extern	void nfs_release_request(struct nfs_page *req);
 
-- 
1.6.6
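
Review note: the reference pairing this patch introduces can be modelled in
userspace roughly as below. This is only a sketch under stated assumptions:
every sketch_* name is hypothetical, the counter is a plain int rather than
the kernel's atomic refcount, and the lookup helper merely stands in for
pnfs_update_layout() returning a referenced segment. As the read.c hunk seems
to rely on, put tolerates NULL here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for pnfs_layout_segment and nfs_page. */
struct sketch_lseg { int refcount; };
struct sketch_req  { struct sketch_lseg *lseg; };

static void sketch_get_lseg(struct sketch_lseg *l)
{
	l->refcount++;
}

/* Tolerates NULL, as the callers in the patch appear to assume. */
static void sketch_put_lseg(struct sketch_lseg *l)
{
	if (!l)
		return;
	assert(l->refcount > 0);
	l->refcount--;
}

/* Stand-in for pnfs_update_layout(): hands back a referenced segment. */
static struct sketch_lseg *sketch_lookup(struct sketch_lseg *l)
{
	if (l)
		sketch_get_lseg(l);
	return l;
}

/* Mirrors nfs_create_request(): the request takes its own reference. */
static void sketch_create_request(struct sketch_req *req,
				  struct sketch_lseg *lseg)
{
	req->lseg = lseg;
	if (lseg)
		sketch_get_lseg(lseg);
}

/* Mirrors nfs_clear_request(): drop the request's reference once. */
static void sketch_clear_request(struct sketch_req *req)
{
	if (req->lseg != NULL) {
		sketch_put_lseg(req->lseg);
		req->lseg = NULL;
	}
}
```

The point of the sequence in nfs_readpage_async() is that the caller's
reference from the lookup is dropped right after the request has taken its
own, so the segment stays pinned exactly as long as the request lives.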



* [PATCH 07/40] pnfs_submit: filelayout policy operations
  2011-02-04 21:33           ` [PATCH 06/40] pnfs_submit: wave3 associate layout segment with nfs_page andros
@ 2011-02-04 21:33             ` andros
  2011-02-04 21:33               ` [PATCH 08/40] pnfs_submit: filelayout i/o helpers andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy
  Cc: linux-nfs, Andy Adamson, Dean Hildebrand, Oleg Drokin, Tao Guo,
	Andy Adamson, Fred Isaman

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/nfs4filelayout.c |   28 ++++++++++++++++++++++++++++
 1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 23f930c..8b1c4ad 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -252,6 +252,33 @@ filelayout_free_lseg(struct pnfs_layout_segment *lseg)
 	_filelayout_free_lseg(fl);
 }
 
+/*
+ * filelayout_pg_test(). Called by nfs_can_coalesce_requests()
+ *
+ * return 1 :  coalesce page
+ * return 0 :  don't coalesce page
+ *
+ * By the time this is called, we know req->wb_lseg == prev->wb_lseg
+ */
+int
+filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
+		   struct nfs_page *req)
+{
+	u64 p_stripe, r_stripe;
+	u32 stripe_unit;
+
+	if (!req->wb_lseg)
+		return 1;
+	p_stripe = (u64)prev->wb_index << PAGE_CACHE_SHIFT;
+	r_stripe = (u64)req->wb_index << PAGE_CACHE_SHIFT;
+	stripe_unit = FILELAYOUT_LSEG(req->wb_lseg)->stripe_unit;
+
+	do_div(p_stripe, stripe_unit);
+	do_div(r_stripe, stripe_unit);
+
+	return (p_stripe == r_stripe);
+}
+
 static struct pnfs_layoutdriver_type filelayout_type = {
 	.id = LAYOUT_NFSV4_1_FILES,
 	.name = "LAYOUT_NFSV4_1_FILES",
@@ -260,6 +287,7 @@ static struct pnfs_layoutdriver_type filelayout_type = {
 	.clear_layoutdriver = filelayout_clear_layoutdriver,
 	.alloc_lseg              = filelayout_alloc_lseg,
 	.free_lseg               = filelayout_free_lseg,
+	.pg_test                 = filelayout_pg_test,
 };
 
 static int __init nfs4filelayout_init(void)
-- 
1.6.6
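
Review note: the stripe comparison in filelayout_pg_test() can be tried out
in userspace with something like the sketch below. It is not the kernel code:
the sketch_* names are hypothetical, 4096-byte pages are an assumption
standing in for PAGE_CACHE_SHIFT, and plain 64-bit division replaces
do_div().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4096-byte pages, standing in for PAGE_CACHE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12

/* Number of the stripe unit that holds the first byte of page 'index'. */
static uint64_t sketch_stripe_of_page(uint64_t index, uint32_t stripe_unit)
{
	assert(stripe_unit != 0);
	return (index << SKETCH_PAGE_SHIFT) / stripe_unit;
}

/*
 * Return 1 if the two pages fall in the same stripe unit (may coalesce),
 * 0 otherwise, mirroring the return convention of filelayout_pg_test().
 */
static int sketch_pg_test(uint64_t prev_index, uint64_t req_index,
			  uint32_t stripe_unit)
{
	return sketch_stripe_of_page(prev_index, stripe_unit) ==
	       sketch_stripe_of_page(req_index, stripe_unit);
}
```

With a 64 KiB stripe unit and 4 KiB pages, pages 0 through 15 share a stripe
unit, so a request for page 16 would start a new I/O rather than being
coalesced with page 15.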



* [PATCH 08/40] pnfs_submit: filelayout i/o helpers
  2011-02-04 21:33             ` [PATCH 07/40] pnfs_submit: filelayout policy operations andros
@ 2011-02-04 21:33               ` andros
  2011-02-04 21:33                 ` [PATCH 09/40] pnfs_submit: wave3 generic read andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy
  Cc: linux-nfs, Andy Adamson, Dean Hildebrand, Fred Isaman,
	Marc Eshel, Mike Sager, Oleg Drokin, Tao Guo, Tigran Mkrtchyan,
	Tigran Mkrtchyan, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Tigran Mkrtchyan <tigran@anahit.desy.de>
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/client.c            |    7 +-
 fs/nfs/internal.h          |   12 +++
 fs/nfs/nfs4filelayout.c    |   32 ++++++++
 fs/nfs/nfs4filelayout.h    |    6 ++
 fs/nfs/nfs4filelayoutdev.c |  177 ++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4proc.c          |    8 +-
 6 files changed, 236 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index bd3ca32..ea2d032 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -404,7 +404,7 @@ static int nfs_sockaddr_match_ipaddr(const struct sockaddr *sa1,
  * Test if two socket addresses represent the same actual socket,
  * by comparing (only) relevant fields, including the port number.
  */
-static int nfs_sockaddr_cmp(const struct sockaddr *sa1,
+int nfs_sockaddr_cmp(const struct sockaddr *sa1,
 			    const struct sockaddr *sa2)
 {
 	if (sa1->sa_family != sa2->sa_family)
@@ -418,6 +418,7 @@ static int nfs_sockaddr_cmp(const struct sockaddr *sa1,
 	}
 	return 0;
 }
+EXPORT_SYMBOL(nfs_sockaddr_cmp);
 
 /* Common match routine for v4.0 and v4.1 callback services */
 bool
@@ -567,6 +568,7 @@ int nfs4_check_client_ready(struct nfs_client *clp)
 		return -EPROTONOSUPPORT;
 	return 0;
 }
+EXPORT_SYMBOL(nfs4_check_client_ready);
 
 /*
  * Initialise the timeout values for a connection
@@ -1355,7 +1357,7 @@ error:
 /*
  * Set up an NFS4 client
  */
-static int nfs4_set_client(struct nfs_server *server,
+int nfs4_set_client(struct nfs_server *server,
 		const char *hostname,
 		const struct sockaddr *addr,
 		const size_t addrlen,
@@ -1398,6 +1400,7 @@ error:
 	dprintk("<-- nfs4_set_client() = xerror %d\n", error);
 	return error;
 }
+EXPORT_SYMBOL(nfs4_set_client);
 
 
 /*
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index cf9fdbd..869b388 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -148,6 +148,16 @@ extern struct nfs_server *nfs_clone_server(struct nfs_server *,
 					   struct nfs_fattr *);
 extern void nfs_mark_client_ready(struct nfs_client *clp, int state);
 extern int nfs4_check_client_ready(struct nfs_client *clp);
+extern int nfs_sockaddr_cmp(const struct sockaddr *sa1,
+		const struct sockaddr *sa2);
+extern int nfs4_set_client(struct nfs_server *server,
+		const char *hostname,
+		const struct sockaddr *addr,
+		const size_t addrlen,
+		const char *ip_addr,
+		rpc_authflavor_t authflavour,
+		int proto, const struct rpc_timeout *timeparms,
+		u32 minorversion);
 #ifdef CONFIG_PROC_FS
 extern int __init nfs_fs_proc_init(void);
 extern void nfs_fs_proc_exit(void);
@@ -213,6 +223,8 @@ extern const u32 nfs41_maxwrite_overhead;
 extern struct rpc_procinfo nfs4_procedures[];
 #endif
 
+extern int nfs4_recover_expired_lease(struct nfs_client *clp);
+
 /* proc.c */
 void nfs_close_context(struct nfs_open_context *ctx, int is_sync);
 
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 8b1c4ad..6ec9957 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -66,6 +66,38 @@ filelayout_clear_layoutdriver(struct nfs_server *nfss)
 	return 0;
 }
 
+/* This function is used by the layout driver to calculate the
+ * offset of the file on the dserver based on whether the
+ * layout type is STRIPE_DENSE or STRIPE_SPARSE
+ */
+static loff_t
+filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
+{
+	struct nfs4_filelayout_segment *flseg = FILELAYOUT_LSEG(lseg);
+
+	switch (flseg->stripe_type) {
+	case STRIPE_SPARSE:
+		return offset;
+
+	case STRIPE_DENSE:
+	{
+		u32 stripe_width;
+		u64 tmp, off;
+		u32 unit = flseg->stripe_unit;
+
+		stripe_width = unit * flseg->dsaddr->stripe_count;
+		tmp = off = offset - flseg->pattern_offset;
+		do_div(tmp, stripe_width);
+		return tmp * unit + do_div(off, unit);
+	}
+	default:
+		BUG();
+	}
+
+	/* We should never get here... just to stop the gcc warning */
+	return 0;
+}
+
 /*
  * filelayout_check_layout()
  *
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index bbf60dd..f884b0c 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -83,9 +83,15 @@ FILELAYOUT_LSEG(struct pnfs_layout_segment *lseg)
 			    generic_hdr);
 }
 
+extern struct nfs_fh *
+nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, loff_t offset);
+
 extern void nfs4_fl_free_deviceid_callback(struct pnfs_deviceid_node *);
 extern void print_ds(struct nfs4_pnfs_ds *ds);
 extern void print_deviceid(struct nfs4_deviceid *dev_id);
+u32 nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, loff_t offset);
+struct nfs4_pnfs_ds *nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg,
+					u32 ds_idx);
 extern struct nfs4_file_layout_dsaddr *
 nfs4_fl_find_get_deviceid(struct nfs_client *, struct nfs4_deviceid *dev_id);
 struct nfs4_file_layout_dsaddr *
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index f5c9b12..0059375 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -104,6 +104,114 @@ _data_server_lookup_locked(u32 ip_addr, u32 port)
 	return NULL;
 }
 
+/* Create an rpc to the data server defined in 'dev_list' */
+static int
+nfs4_pnfs_ds_create(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
+{
+	struct nfs_server	*tmp;
+	struct sockaddr_in	sin;
+	struct rpc_clnt		*mds_clnt = mds_srv->client;
+	struct nfs_client	*clp = mds_srv->nfs_client;
+	struct sockaddr		*mds_addr;
+	int err = 0;
+
+	dprintk("--> %s ip:port %x:%hu au_flavor %d\n", __func__,
+		ntohl(ds->ds_ip_addr), ntohs(ds->ds_port),
+		mds_clnt->cl_auth->au_flavor);
+
+	sin.sin_family = AF_INET;
+	sin.sin_addr.s_addr = ds->ds_ip_addr;
+	sin.sin_port = ds->ds_port;
+
+	/*
+	 * If this DS is also the MDS, use the MDS session only if the
+	 * MDS exchangeid flags show the EXCHGID4_FLAG_USE_PNFS_DS pNFS role.
+	 */
+	mds_addr = (struct sockaddr *)&clp->cl_addr;
+	if (nfs_sockaddr_cmp((struct sockaddr *)&sin, mds_addr)) {
+		if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_DS)) {
+			printk(KERN_INFO
+			       "ip:port %x:%hu is not a pNFS Data Server\n",
+			       ntohl(ds->ds_ip_addr), ntohs(ds->ds_port));
+			err = -ENODEV;
+		} else {
+			atomic_inc(&clp->cl_count);
+			ds->ds_clp = clp;
+			dprintk("%s Using MDS Session for DS\n", __func__);
+		}
+		goto out;
+	}
+
+	/* Temporary server for nfs4_set_client */
+	tmp = kzalloc(sizeof(struct nfs_server), GFP_KERNEL);
+	if (!tmp)
+		return -ENOMEM;
+
+	/*
+	 * Set a retrans, timeout interval, and authflavor equal to the MDS
+	 * values. Use the MDS nfs_client cl_ipaddr field so as to use the
+	 * same co_ownerid as the MDS.
+	 */
+	err = nfs4_set_client(tmp,
+			      mds_srv->nfs_client->cl_hostname,
+			      (struct sockaddr *)&sin,
+			      sizeof(struct sockaddr),
+			      mds_srv->nfs_client->cl_ipaddr,
+			      mds_clnt->cl_auth->au_flavor,
+			      IPPROTO_TCP,
+			      mds_clnt->cl_xprt->timeout,
+			      1 /* minorversion */);
+	if (err < 0)
+		goto out_free;
+
+	clp = tmp->nfs_client;
+
+	/* Ask for only the EXCHGID4_FLAG_USE_PNFS_DS pNFS role */
+	dprintk("%s EXCHANGE_ID for clp %p\n", __func__, clp);
+	clp->cl_exchange_flags = EXCHGID4_FLAG_USE_PNFS_DS;
+
+	err = nfs4_recover_expired_lease(clp);
+	if (!err)
+		err = nfs4_check_client_ready(clp);
+	if (err)
+		goto out_put;
+
+	if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_DS)) {
+		printk(KERN_INFO "ip:port %x:%hu is not a pNFS Data Server\n",
+		       ntohl(ds->ds_ip_addr), ntohs(ds->ds_port));
+		err = -ENODEV;
+		goto out_put;
+	}
+	/*
+	 * Mask the (possibly) returned EXCHGID4_FLAG_USE_PNFS_MDS pNFS role
+	 * The is_ds_only_session depends on this.
+	 */
+	clp->cl_exchange_flags &= ~EXCHGID4_FLAG_USE_PNFS_MDS;
+	/*
+	 * Set DS lease equal to the MDS lease, renewal is scheduled in
+	 * create_session
+	 */
+	spin_lock(&mds_srv->nfs_client->cl_lock);
+	clp->cl_lease_time = mds_srv->nfs_client->cl_lease_time;
+	spin_unlock(&mds_srv->nfs_client->cl_lock);
+	clp->cl_last_renewal = jiffies;
+
+	clear_bit(NFS4CLNT_SESSION_RESET, &clp->cl_state);
+	ds->ds_clp = clp;
+
+	dprintk("%s: ip=%x, port=%hu, rpcclient %p\n", __func__,
+				ntohl(ds->ds_ip_addr), ntohs(ds->ds_port),
+				clp->cl_rpcclient);
+out_free:
+	kfree(tmp);
+out:
+	dprintk("%s Returns %d\n", __func__, err);
+	return err;
+out_put:
+	nfs_put_client(clp);
+	goto out_free;
+}
+
 static void
 destroy_ds(struct nfs4_pnfs_ds *ds)
 {
@@ -451,3 +559,72 @@ nfs4_fl_find_get_deviceid(struct nfs_client *clp, struct nfs4_deviceid *id)
 	return (d == NULL) ? NULL :
 		container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
 }
+
+/*
+ * Want res = (offset - layout->pattern_offset) / layout->stripe_unit
+ * Then: ((res + fsi) % dsaddr->stripe_count)
+ */
+static u32
+_nfs4_fl_calc_j_index(struct pnfs_layout_segment *lseg, loff_t offset)
+{
+	struct nfs4_filelayout_segment *flseg = FILELAYOUT_LSEG(lseg);
+	u64 tmp;
+
+	tmp = offset - flseg->pattern_offset;
+	do_div(tmp, flseg->stripe_unit);
+	tmp += flseg->first_stripe_index;
+	return do_div(tmp, flseg->dsaddr->stripe_count);
+}
+
+u32
+nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, loff_t offset)
+{
+	u32 j;
+
+	j = _nfs4_fl_calc_j_index(lseg, offset);
+	return FILELAYOUT_LSEG(lseg)->dsaddr->stripe_indices[j];
+}
+
+struct nfs_fh *
+nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, loff_t offset)
+{
+	struct nfs4_filelayout_segment *flseg = FILELAYOUT_LSEG(lseg);
+	u32 i;
+
+	if (flseg->stripe_type == STRIPE_SPARSE) {
+		if (flseg->num_fh == 1)
+			i = 0;
+		else if (flseg->num_fh == 0)
+			return NULL;
+		else
+			i = nfs4_fl_calc_ds_index(lseg, offset);
+	} else
+		i = _nfs4_fl_calc_j_index(lseg, offset);
+	return flseg->fh_array[i];
+}
+
+struct nfs4_pnfs_ds *
+nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
+{
+	struct nfs4_file_layout_dsaddr *dsaddr;
+
+	dsaddr = FILELAYOUT_LSEG(lseg)->dsaddr;
+	if (dsaddr->ds_list[ds_idx] == NULL) {
+		printk(KERN_ERR "%s: No data server for device id!\n",
+			__func__);
+		return NULL;
+	}
+
+	if (!dsaddr->ds_list[ds_idx]->ds_clp) {
+		int err;
+
+		err = nfs4_pnfs_ds_create(NFS_SERVER(lseg->pls_layout->plh_inode),
+					  dsaddr->ds_list[ds_idx]);
+		if (err) {
+			printk(KERN_ERR "%s nfs4_pnfs_ds_create error %d\n",
+			       __func__, err);
+			return NULL;
+		}
+	}
+	return dsaddr->ds_list[ds_idx];
+}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 519b9bd..4d5bd81 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1577,9 +1577,8 @@ static int _nfs4_proc_open(struct nfs4_opendata *data)
 	return 0;
 }
 
-static int nfs4_recover_expired_lease(struct nfs_server *server)
+int nfs4_recover_expired_lease(struct nfs_client *clp)
 {
-	struct nfs_client *clp = server->nfs_client;
 	unsigned int loop;
 	int ret;
 
@@ -1595,6 +1594,7 @@ static int nfs4_recover_expired_lease(struct nfs_server *server)
 	}
 	return ret;
 }
+EXPORT_SYMBOL(nfs4_recover_expired_lease);
 
 /*
  * OPEN_EXPIRED:
@@ -1683,7 +1683,7 @@ static int _nfs4_do_open(struct inode *dir, struct path *path, fmode_t fmode, in
 		dprintk("nfs4_do_open: nfs4_get_state_owner failed!\n");
 		goto out_err;
 	}
-	status = nfs4_recover_expired_lease(server);
+	status = nfs4_recover_expired_lease(server->nfs_client);
 	if (status != 0)
 		goto err_put_state_owner;
 	if (path->dentry->d_inode != NULL)
@@ -5075,7 +5075,7 @@ int nfs4_init_session(struct nfs_server *server)
 	session->fc_attrs.max_rqst_sz = wsize + nfs41_maxwrite_overhead;
 	session->fc_attrs.max_resp_sz = rsize + nfs41_maxread_overhead;
 
-	ret = nfs4_recover_expired_lease(server);
+	ret = nfs4_recover_expired_lease(server->nfs_client);
 	if (!ret)
 		ret = nfs4_check_client_ready(clp);
 	return ret;
-- 
1.6.6
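
Review note: the arithmetic in filelayout_get_dserver_offset() (dense case)
and in the j-index / DS-index helpers can be checked in userspace with a
sketch along these lines. All sketch_* names and the struct layout are
hypothetical, plain division and modulo replace do_div(), and the stripe
width is widened to 64 bits here, unlike the u32 used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the file layout segment fields used below. */
struct sketch_layout {
	uint32_t stripe_unit;        /* bytes per stripe unit */
	uint32_t stripe_count;       /* entries in stripe_indices */
	uint64_t pattern_offset;     /* file offset where the pattern starts */
	uint32_t first_stripe_index; /* rotation applied to the pattern */
	const uint8_t *stripe_indices; /* maps j index to data server index */
};

/* Dense packing: each DS file holds only the stripe units it serves. */
static uint64_t sketch_dense_offset(const struct sketch_layout *l,
				    uint64_t offset)
{
	uint64_t rel = offset - l->pattern_offset;
	uint64_t width = (uint64_t)l->stripe_unit * l->stripe_count;

	assert(width != 0);
	/* Full stripes before this one, plus the offset inside the unit. */
	return (rel / width) * l->stripe_unit + rel % l->stripe_unit;
}

/* j = ((offset - pattern_offset) / stripe_unit + fsi) % stripe_count */
static uint32_t sketch_j_index(const struct sketch_layout *l, uint64_t offset)
{
	uint64_t su = (offset - l->pattern_offset) / l->stripe_unit;

	return (uint32_t)((su + l->first_stripe_index) % l->stripe_count);
}

/* The data server index is looked up through the stripe_indices table. */
static uint32_t sketch_ds_index(const struct sketch_layout *l, uint64_t offset)
{
	return l->stripe_indices[sketch_j_index(l, offset)];
}
```

For a 4 KiB stripe unit across 4 stripes with first_stripe_index 1, file
offset 20000 sits in stripe unit 4 of the pattern, which gives j = 1; in the
dense case it maps to offset 7712 in the data server's file.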



* [PATCH 09/40] pnfs_submit: wave3 generic read
  2011-02-04 21:33               ` [PATCH 08/40] pnfs_submit: filelayout i/o helpers andros
@ 2011-02-04 21:33                 ` andros
  2011-02-04 21:33                   ` [PATCH 10/40] pnfs_submit: filelayout read andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy
  Cc: linux-nfs, Andy Adamson, Andy Adamson, Boaz Harrosh,
	Dean Hildebrand, Fred Isaman, Fred Isaman, J. Bruce Fields,
	Mike Sager, Mingyang Guo, Ricardo Labiaga, Tao Guo

From: Andy Adamson <andros@netapp.com>

Reported-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/internal.h          |    4 ++
 fs/nfs/nfs4proc.c          |   15 ++++++-
 fs/nfs/pnfs.c              |   36 +++++++++++++++++
 fs/nfs/pnfs.h              |   24 ++++++++++++
 fs/nfs/read.c              |   90 ++++++++++++++++++++++++++++++-------------
 include/linux/nfs_iostat.h |    1 +
 include/linux/nfs_xdr.h    |   21 ++++++++++
 7 files changed, 161 insertions(+), 30 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 869b388..657b71c 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -274,6 +274,10 @@ extern int nfs4_get_rootfh(struct nfs_server *server, struct nfs_fh *mntfh);
 #endif
 
 /* read.c */
+extern int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
+			     const struct rpc_call_ops *call_ops);
+extern int pnfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
+			     const struct rpc_call_ops *call_ops);
 extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
 
 /* write.c */
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 4d5bd81..49e89d8 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3076,19 +3076,28 @@ static int nfs4_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
 static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
 {
 	struct nfs_server *server = NFS_SERVER(data->inode);
+	struct nfs_client *client = server->nfs_client;
 
 	dprintk("--> %s\n", __func__);
 
+#ifdef CONFIG_NFS_V4_1
+	/* Is this a DS session? */
+	if (data->fldata.ds_nfs_client) {
+		dprintk("%s DS read\n", __func__);
+		client = data->fldata.ds_nfs_client;
+	}
+#endif /* CONFIG_NFS_V4_1 */
+
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
-	if (nfs4_async_handle_error(task, server, data->args.context->state, NULL) == -EAGAIN) {
-		nfs_restart_rpc(task, server->nfs_client);
+	if (nfs4_async_handle_error(task, server, data->args.context->state, client) == -EAGAIN) {
+		nfs_restart_rpc(task, client);
 		return -EAGAIN;
 	}
 
 	nfs_invalidate_atime(data->inode);
-	if (task->tk_status > 0)
+	if (task->tk_status > 0 && client == server->nfs_client)
 		renew_lease(server, data->timestamp);
 	return 0;
 }
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 1811204..d06e9ea 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -30,6 +30,7 @@
 #include <linux/nfs_fs.h>
 #include "internal.h"
 #include "pnfs.h"
+#include "iostat.h"
 
 #define NFSDBG_FACILITY		NFSDBG_PNFS
 
@@ -921,6 +922,41 @@ static void _pnfs_clear_lseg_from_pages(struct list_head *head)
 }
 
 /*
+ * Call the appropriate parallel I/O subsystem read function.
+ * If no I/O device driver exists, or one does not match the returned
+ * fstype, then return a positive status for regular NFS processing.
+ */
+enum pnfs_try_status
+pnfs_try_to_read_data(struct nfs_read_data *rdata,
+		       const struct rpc_call_ops *call_ops)
+{
+	struct inode *inode = rdata->inode;
+	struct nfs_server *nfss = NFS_SERVER(inode);
+	struct pnfs_layout_segment *lseg = rdata->req->wb_lseg;
+	enum pnfs_try_status trypnfs;
+
+	rdata->pdata.call_ops = call_ops;
+
+	dprintk("%s: Reading ino:%lu %u@%llu\n",
+		__func__, inode->i_ino, rdata->args.count, rdata->args.offset);
+
+	get_lseg(lseg);
+
+	rdata->pdata.lseg = lseg;
+	trypnfs = nfss->pnfs_curr_ld->read_pagelist(rdata,
+		nfs_page_array_len(rdata->args.pgbase, rdata->args.count));
+	if (trypnfs == PNFS_NOT_ATTEMPTED) {
+		rdata->pdata.lseg = NULL;
+		put_lseg(lseg);
+		_pnfs_clear_lseg_from_pages(&rdata->pages);
+	} else {
+		nfs_inc_stats(inode, NFSIOS_PNFS_READ);
+	}
+	dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
+	return trypnfs;
+}
+
+/*
  * Device ID cache. Currently supports one layout type per struct nfs_client.
  * Add layout type to the lookup key to expand to support multiple types.
  */
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 7614c3b..2e231e3 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -45,6 +45,11 @@ struct pnfs_layout_segment {
 	struct pnfs_layout_hdr *pls_layout;
 };
 
+enum pnfs_try_status {
+	PNFS_ATTEMPTED     = 0,
+	PNFS_NOT_ATTEMPTED = 1,
+};
+
 #ifdef CONFIG_NFS_V4_1
 
 #define LAYOUT_NFSV4_1_MODULE_PREFIX "nfs-layouttype4"
@@ -70,6 +75,16 @@ struct pnfs_layoutdriver_type {
 
 	/* test for nfs page cache coalescing */
 	int (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
+
+	/* read and write pagelist should return just 0 (to indicate that
+	 * the layout code has taken control) or 1 (to indicate that the
+	 * layout code wishes to fall back to normal nfs.)  If 0 is returned,
+	 * information can be passed back through nfs_data->res and
+	 * nfs_data->task.tk_status, and the appropriate pnfs done function
+	 * MUST be called.
+	 */
+	enum pnfs_try_status
+	(*read_pagelist) (struct nfs_read_data *nfs_data, unsigned nr_pages);
 };
 
 struct pnfs_layout_hdr {
@@ -157,6 +172,8 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
 		   enum pnfs_iomode access_type);
 void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
 void unset_pnfs_layoutdriver(struct nfs_server *);
+enum pnfs_try_status pnfs_try_to_read_data(struct nfs_read_data *,
+					    const struct rpc_call_ops *);
 void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *,
 			   struct nfs_open_context *, struct list_head *);
 int pnfs_layout_process(struct nfs4_layoutget *lgp);
@@ -222,6 +239,13 @@ pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
 	return NULL;
 }
 
+static inline enum pnfs_try_status
+pnfs_try_to_read_data(struct nfs_read_data *data,
+		      const struct rpc_call_ops *call_ops)
+{
+	return PNFS_NOT_ATTEMPTED;
+}
+
 static inline bool
 pnfs_roc(struct inode *ino)
 {
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 2eac0f1..79da5cb 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -18,8 +18,11 @@
 #include <linux/sunrpc/clnt.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_page.h>
+#include <linux/smp_lock.h>
+#include <linux/module.h>
 
 #include <asm/system.h>
+#include <linux/module.h>
 #include "pnfs.h"
 
 #include "nfs4_fs.h"
@@ -157,24 +160,20 @@ static void nfs_readpage_release(struct nfs_page *req)
 	nfs_release_request(req);
 }
 
-/*
- * Set up the NFS read request struct
- */
-static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
-		const struct rpc_call_ops *call_ops,
-		unsigned int count, unsigned int offset)
+int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
+		      const struct rpc_call_ops *call_ops)
 {
-	struct inode *inode = req->wb_context->path.dentry->d_inode;
+	struct inode *inode = data->inode;
 	int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
 	struct rpc_task *task;
 	struct rpc_message msg = {
 		.rpc_argp = &data->args,
 		.rpc_resp = &data->res,
-		.rpc_cred = req->wb_context->cred,
+		.rpc_cred = data->cred,
 	};
 	struct rpc_task_setup task_setup_data = {
 		.task = &data->task,
-		.rpc_client = NFS_CLIENT(inode),
+		.rpc_client = clnt,
 		.rpc_message = &msg,
 		.callback_ops = call_ops,
 		.callback_data = data,
@@ -182,9 +181,46 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
 		.flags = RPC_TASK_ASYNC | swap_flags,
 	};
 
+	/* Set up the initial task struct. */
+	NFS_PROTO(inode)->read_setup(data, &msg);
+
+	dprintk("NFS: %5u initiated read call (req %s/%Ld, %u bytes @ offset %Lu)\n",
+			data->task.tk_pid,
+			inode->i_sb->s_id,
+			(long long)NFS_FILEID(inode),
+			data->args.count,
+			(unsigned long long)data->args.offset);
+
+	task = rpc_run_task(&task_setup_data);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+	rpc_put_task(task);
+	return 0;
+}
+EXPORT_SYMBOL(nfs_initiate_read);
+
+int pnfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
+		       const struct rpc_call_ops *call_ops)
+{
+	if (data->req->wb_lseg &&
+	    (pnfs_try_to_read_data(data, call_ops) == PNFS_ATTEMPTED))
+		return 0;
+
+	return nfs_initiate_read(data, clnt, call_ops);
+}
+
+/*
+ * Set up the NFS read request struct
+ */
+static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
+		const struct rpc_call_ops *call_ops,
+		unsigned int count, unsigned int offset)
+{
+	struct inode *inode = req->wb_context->path.dentry->d_inode;
+
 	data->req	  = req;
 	data->inode	  = inode;
-	data->cred	  = msg.rpc_cred;
+	data->cred	  = req->wb_context->cred;
 
 	data->args.fh     = NFS_FH(inode);
 	data->args.offset = req_offset(req) + offset;
@@ -199,21 +235,7 @@ static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
 	data->res.eof     = 0;
 	nfs_fattr_init(&data->fattr);
 
-	/* Set up the initial task struct. */
-	NFS_PROTO(inode)->read_setup(data, &msg);
-
-	dprintk("NFS: %5u initiated read call (req %s/%Ld, %u bytes @ offset %Lu)\n",
-			data->task.tk_pid,
-			inode->i_sb->s_id,
-			(long long)NFS_FILEID(inode),
-			count,
-			(unsigned long long)data->args.offset);
-
-	task = rpc_run_task(&task_setup_data);
-	if (IS_ERR(task))
-		return PTR_ERR(task);
-	rpc_put_task(task);
-	return 0;
+	return pnfs_initiate_read(data, NFS_CLIENT(inode), call_ops);
 }
 
 static void
@@ -357,7 +379,14 @@ static void nfs_readpage_retry(struct rpc_task *task, struct nfs_read_data *data
 {
 	struct nfs_readargs *argp = &data->args;
 	struct nfs_readres *resp = &data->res;
+	struct nfs_client *clp = NFS_SERVER(data->inode)->nfs_client;
 
+#ifdef CONFIG_NFS_V4_1
+	if (data->fldata.ds_nfs_client) {
+		dprintk("%s DS read\n", __func__);
+		clp = data->fldata.ds_nfs_client;
+	}
+#endif /* CONFIG_NFS_V4_1 */
 	if (resp->eof || resp->count == argp->count)
 		return;
 
@@ -371,7 +400,7 @@ static void nfs_readpage_retry(struct rpc_task *task, struct nfs_read_data *data
 	argp->offset += resp->count;
 	argp->pgbase += resp->count;
 	argp->count -= resp->count;
-	nfs_restart_rpc(task, NFS_SERVER(data->inode)->nfs_client);
+	nfs_restart_rpc(task, clp);
 }
 
 /*
@@ -412,13 +441,19 @@ static void nfs_readpage_release_partial(void *calldata)
 void nfs_read_prepare(struct rpc_task *task, void *calldata)
 {
 	struct nfs_read_data *data = calldata;
+	struct nfs4_session *ds_session = NULL;
 
-	if (nfs4_setup_sequence(NFS_SERVER(data->inode), NULL,
+	if (data->fldata.ds_nfs_client) {
+		dprintk("%s DS read\n", __func__);
+		ds_session = data->fldata.ds_nfs_client->cl_session;
+	}
+	if (nfs4_setup_sequence(NFS_SERVER(data->inode), ds_session,
 				&data->args.seq_args, &data->res.seq_res,
 				0, task))
 		return;
 	rpc_call_start(task);
 }
+EXPORT_SYMBOL(nfs_read_prepare);
 #endif /* CONFIG_NFS_V4_1 */
 
 static const struct rpc_call_ops nfs_read_partial_ops = {
@@ -637,6 +672,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
 	ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
 
 	nfs_pageio_complete(&pgio);
+	put_lseg(pgio.pg_lseg);
 	npages = (pgio.pg_bytes_written + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	nfs_add_stats(inode, NFSIOS_READPAGES, npages);
 read_complete:
diff --git a/include/linux/nfs_iostat.h b/include/linux/nfs_iostat.h
index 68b10f5..37a1437 100644
--- a/include/linux/nfs_iostat.h
+++ b/include/linux/nfs_iostat.h
@@ -113,6 +113,7 @@ enum nfs_stat_eventcounters {
 	NFSIOS_SHORTREAD,
 	NFSIOS_SHORTWRITE,
 	NFSIOS_DELAY,
+	NFSIOS_PNFS_READ,
 	__NFSIOS_COUNTSMAX,
 };
 
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index b006857..bd84684 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1004,6 +1004,23 @@ struct nfs_page;
 
 #define NFS_PAGEVEC_SIZE	(8U)
 
+#if defined(CONFIG_NFS_V4_1)
+
+/* pnfs-specific data needed for read, write, and commit calls */
+struct pnfs_call_data {
+	struct pnfs_layout_segment *lseg;
+	const struct rpc_call_ops *call_ops;
+	u32			orig_count;	/* for retry via MDS */
+	u8			how;		/* for FLUSH_STABLE */
+};
+
+/* files layout-type specific data for read, write, and commit */
+struct pnfs_fl_call_data {
+	struct nfs_client	*ds_nfs_client;
+	__u64			orig_offset;
+};
+#endif /* CONFIG_NFS_V4_1 */
+
 struct nfs_read_data {
 	int			flags;
 	struct rpc_task		task;
@@ -1019,6 +1036,10 @@ struct nfs_read_data {
 #ifdef CONFIG_NFS_V4
 	unsigned long		timestamp;	/* For lease renewal */
 #endif
+#if defined(CONFIG_NFS_V4_1)
+	struct pnfs_call_data	pdata;
+	struct pnfs_fl_call_data fldata;
+#endif /* CONFIG_NFS_V4_1 */
 	struct page		*page_array[NFS_PAGEVEC_SIZE];
 };
 
-- 
1.6.6
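
Note on the patch above: pnfs_initiate_read() tries the layout driver first and falls back to an ordinary read through the MDS when the driver declines. The control flow can be sketched in userspace C as below. This is only an illustration under stated assumptions: `struct read_req`, `try_ds_read()`, `mds_read()` and `initiate_read()` are hypothetical stand-ins, not kernel symbols, and the counters exist only so a caller can observe which path was taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the names of the two outcomes used by the patch. */
enum pnfs_try_status {
	PNFS_ATTEMPTED,
	PNFS_NOT_ATTEMPTED,
};

/* Hypothetical stand-in for struct nfs_read_data. */
struct read_req {
	bool has_lseg;     /* models data->req->wb_lseg != NULL */
	bool ds_available; /* models whether the layout driver accepts the I/O */
	int  sent_to_ds;   /* counts reads issued to a data server */
	int  sent_to_mds;  /* counts reads issued to the MDS */
};

/* Models pnfs_try_to_read_data(): the layout driver may decline. */
static enum pnfs_try_status try_ds_read(struct read_req *r)
{
	if (!r->ds_available)
		return PNFS_NOT_ATTEMPTED;
	r->sent_to_ds++;
	return PNFS_ATTEMPTED;
}

/* Models nfs_initiate_read() against the MDS; returns 0 on success. */
static int mds_read(struct read_req *r)
{
	r->sent_to_mds++;
	return 0;
}

/* Same shape as pnfs_initiate_read(): DS first, MDS as the fallback. */
int initiate_read(struct read_req *r)
{
	if (r->has_lseg && try_ds_read(r) == PNFS_ATTEMPTED)
		return 0;
	return mds_read(r);
}
```

A request without a layout segment never reaches the layout driver, which is why the check on the segment comes first in the condition.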



* [PATCH 10/40] pnfs_submit: filelayout read
  2011-02-04 21:33                 ` [PATCH 09/40] pnfs_submit: wave3 generic read andros
@ 2011-02-04 21:33                   ` andros
  2011-02-04 21:33                     ` [PATCH 11/40] pnfs_submit: increase NFS_MAX_FILE_IO_SIZE andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy
  Cc: linux-nfs, Andy Adamson, Dean Hildebrand, Fred Isaman,
	Fred Isaman, Mingyang Guo, Oleg Drokin, Ricardo Labiaga,
	Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Tested-by: Guo Mingyang <guomingyang@nrchpc.ac.cn>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/nfs4filelayout.c |   88 +++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 88 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 6ec9957..3daf351 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -99,6 +99,93 @@ filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
 }
 
 /*
+ * Call ops for the async read/write cases
+ * In the case of dense layouts, the offset needs to be reset to its
+ * original value.
+ */
+static void filelayout_read_call_done(struct rpc_task *task, void *data)
+{
+	struct nfs_read_data *rdata = (struct nfs_read_data *)data;
+
+	if (rdata->fldata.orig_offset) {
+		dprintk("%s new off %llu orig offset %llu\n", __func__,
+			rdata->args.offset, rdata->fldata.orig_offset);
+		rdata->args.offset = rdata->fldata.orig_offset;
+	}
+
+	/* Note this may cause RPC to be resent */
+	rdata->pdata.call_ops->rpc_call_done(task, data);
+}
+
+static void filelayout_read_release(void *data)
+{
+	struct nfs_read_data *rdata = (struct nfs_read_data *)data;
+
+	put_lseg(rdata->pdata.lseg);
+	rdata->pdata.lseg = NULL;
+	rdata->pdata.call_ops->rpc_release(data);
+}
+
+struct rpc_call_ops filelayout_read_call_ops = {
+	.rpc_call_prepare = nfs_read_prepare,
+	.rpc_call_done = filelayout_read_call_done,
+	.rpc_release = filelayout_read_release,
+};
+
+/* Perform sync or async reads.
+ *
+ * An optimization for the NFS file layout driver
+ * allows the original read/write data structs to be passed in the
+ * last argument.
+ *
+ * TODO: join with write_pagelist?
+ */
+static enum pnfs_try_status
+filelayout_read_pagelist(struct nfs_read_data *data, unsigned nr_pages)
+{
+	struct pnfs_layout_segment *lseg = data->pdata.lseg;
+	struct nfs4_pnfs_ds *ds;
+	loff_t offset = data->args.offset;
+	u32 idx;
+	struct nfs_fh *fh;
+
+	dprintk("--> %s ino %lu nr_pages %d pgbase %u req %Zu@%llu\n",
+		__func__, data->inode->i_ino, nr_pages,
+		data->args.pgbase, (size_t)data->args.count, offset);
+
+	/* Retrieve the correct rpc_client for the byte range */
+	idx = nfs4_fl_calc_ds_index(lseg, offset);
+	ds = nfs4_fl_prepare_ds(lseg, idx);
+	if (!ds) {
+		printk(KERN_ERR "%s: prepare_ds failed, use MDS\n", __func__);
+		return PNFS_NOT_ATTEMPTED;
+	}
+	dprintk("%s USE DS:ip %x %hu\n", __func__,
+		ntohl(ds->ds_ip_addr), ntohs(ds->ds_port));
+
+	/* just try the first data server for the index..*/
+	data->fldata.ds_nfs_client = ds->ds_clp;
+	fh = nfs4_fl_select_ds_fh(lseg, offset);
+	if (fh)
+		data->args.fh = fh;
+
+	/*
+	 * Now get the file offset on the dserver
+	 * Set the read offset to this offset, and
+	 * save the original offset in orig_offset
+	 * In the case of async reads, the offset will be reset in the
+	 * call_ops->rpc_call_done() routine.
+	 */
+	data->args.offset = filelayout_get_dserver_offset(lseg, offset);
+	data->fldata.orig_offset = offset;
+
+	/* Perform an asynchronous read */
+	nfs_initiate_read(data, ds->ds_clp->cl_rpcclient,
+			  &filelayout_read_call_ops);
+	return PNFS_ATTEMPTED;
+}
+
+/*
  * filelayout_check_layout()
  *
  * Make sure layout segment parameters are sane WRT the device.
@@ -320,6 +407,7 @@ static struct pnfs_layoutdriver_type filelayout_type = {
 	.alloc_lseg              = filelayout_alloc_lseg,
 	.free_lseg               = filelayout_free_lseg,
 	.pg_test                 = filelayout_pg_test,
+	.read_pagelist           = filelayout_read_pagelist,
 };
 
 static int __init nfs4filelayout_init(void)
-- 
1.6.6
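
Note on the patch above: filelayout_read_call_ops wraps the caller's rpc_call_ops. The read is issued with the data-server offset, and the wrapper restores the original file offset before delegating to the saved ops. A minimal userspace sketch of that wrapping pattern follows. All names in it (`struct call_ops`, `struct read_data`, `start_ds_read()` and the rest) are hypothetical simplifications, and the sketch assumes the caller's ops were saved before the I/O was issued, which happens outside the code shown in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced version of struct rpc_call_ops. */
struct call_ops {
	void (*call_done)(void *data);
	void (*release)(void *data);
};

/* Hypothetical, reduced version of struct nfs_read_data. */
struct read_data {
	uint64_t offset;               /* models data->args.offset */
	uint64_t orig_offset;          /* models data->fldata.orig_offset */
	const struct call_ops *inner;  /* models data->pdata.call_ops */
	uint64_t offset_seen_by_inner; /* recorded for observation only */
	int released;
};

/* The generic completion handler expects file offsets. */
static void inner_done(void *data)
{
	struct read_data *rd = data;

	rd->offset_seen_by_inner = rd->offset;
}

static void inner_release(void *data)
{
	struct read_data *rd = data;

	rd->released = 1;
}

const struct call_ops generic_ops = {
	.call_done = inner_done,
	.release   = inner_release,
};

/* Wrapper: undo the offset remapping, then delegate. */
static void wrapped_done(void *data)
{
	struct read_data *rd = data;

	if (rd->orig_offset)
		rd->offset = rd->orig_offset;
	rd->inner->call_done(data);
}

static void wrapped_release(void *data)
{
	struct read_data *rd = data;

	rd->inner->release(data);
}

const struct call_ops wrapped_ops = {
	.call_done = wrapped_done,
	.release   = wrapped_release,
};

/* Remember the file offset and the caller's ops, then remap the offset. */
void start_ds_read(struct read_data *rd, uint64_t ds_offset,
		   const struct call_ops *caller_ops)
{
	rd->orig_offset = rd->offset;
	rd->offset = ds_offset;
	rd->inner = caller_ops;
}
```

As in the patch, the restore is skipped when the saved offset is zero, so the sketch shares that property rather than correcting it.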



* [PATCH 11/40] pnfs_submit: increase NFS_MAX_FILE_IO_SIZE
  2011-02-04 21:33                   ` [PATCH 10/40] pnfs_submit: filelayout read andros
@ 2011-02-04 21:33                     ` andros
  2011-02-04 21:33                       ` [PATCH 12/40] pnfs_submit: enforce requested DS only pNFS role andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: The pNFS Team <linux-nfs@vger.kernel.org>

For striping over several DSs we need larger I/Os.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 include/linux/nfs_xdr.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index bd84684..3ea43aa 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -10,7 +10,7 @@
  * support a megabyte or more.  The default is left at 4096 bytes, which is
  * reasonable for NFS over UDP.
  */
-#define NFS_MAX_FILE_IO_SIZE	(1048576U)
+#define NFS_MAX_FILE_IO_SIZE	(4U * 1048576U)
 #define NFS_DEF_FILE_IO_SIZE	(4096U)
 #define NFS_MIN_FILE_IO_SIZE	(1024U)
 
-- 
1.6.6
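
Note on the patch above: the commit message gives striping over several DSs as the reason for the larger limit. The arithmetic can be illustrated as follows, assuming a purely hypothetical stripe unit of 1 MiB. The `_OLD` and `_NEW` macro suffixes and the helper `stripe_units_per_io()` are invented for this sketch and do not exist in the kernel.

```c
#include <assert.h>

#define NFS_MAX_FILE_IO_SIZE_OLD	(1048576U)
#define NFS_MAX_FILE_IO_SIZE_NEW	(4U * 1048576U)

/* Number of whole stripe units a single maximum-size I/O can cover. */
unsigned int stripe_units_per_io(unsigned int max_io, unsigned int stripe_unit)
{
	return max_io / stripe_unit;
}
```

Under that assumption a maximum-size I/O covers one stripe unit with the old limit and four with the new one, so only the larger limit lets a single request span several data servers.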



* [PATCH 12/40] pnfs_submit: enforce requested DS only pNFS role
  2011-02-04 21:33                     ` [PATCH 11/40] pnfs_submit: increase NFS_MAX_FILE_IO_SIZE andros
@ 2011-02-04 21:33                       ` andros
  2011-02-04 21:33                         ` [PATCH 13/40] REVERT pnfs_submit-add-data-server-session-to-nfs4_setup_s.patch andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
[pnfs-submit: fail init_clientid if a DS is not really a DS]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/nfs4filelayoutdev.c |    5 -----
 fs/nfs/nfs4state.c         |   11 +++++++++++
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 0059375..83b0ab8 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -183,11 +183,6 @@ nfs4_pnfs_ds_create(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
 		goto out_put;
 	}
 	/*
-	 * Mask the (possibly) returned EXCHGID4_FLAG_USE_PNFS_MDS pNFS role
-	 * The is_ds_only_session depends on this.
-	 */
-	clp->cl_exchange_flags &= ~EXCHGID4_FLAG_USE_PNFS_MDS;
-	/*
 	 * Set DS lease equal to the MDS lease, renewal is scheduled in
 	 * create_session
 	 */
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 49433aa..3cdbf3b 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -229,6 +229,7 @@ static int nfs4_begin_drain_session(struct nfs_client *clp)
 int nfs41_init_clientid(struct nfs_client *clp, struct rpc_cred *cred)
 {
 	int status;
+	u32 req_exchange_flags = clp->cl_exchange_flags;
 
 	nfs4_begin_drain_session(clp);
 	status = nfs4_proc_exchange_id(clp, cred);
@@ -237,6 +238,16 @@ int nfs41_init_clientid(struct nfs_client *clp, struct rpc_cred *cred)
 	status = nfs4_proc_create_session(clp);
 	if (status != 0)
 		goto out;
+	if (is_ds_only_session(req_exchange_flags)) {
+		clp->cl_exchange_flags &=
+		     ~(EXCHGID4_FLAG_USE_PNFS_MDS | EXCHGID4_FLAG_USE_NON_PNFS);
+		if (!is_ds_only_session(clp->cl_exchange_flags)) {
+			nfs4_destroy_session(clp->cl_session);
+			clp->cl_session = NULL;
+			status = -ENOTSUPP;
+			goto out;
+		}
+	}
 	nfs41_setup_state_renewal(clp);
 	nfs_mark_client_ready(clp, NFS_CS_READY);
 out:
-- 
1.6.6



* [PATCH 13/40] REVERT pnfs_submit-add-data-server-session-to-nfs4_setup_s.patch
  2011-02-04 21:33                       ` [PATCH 12/40] pnfs_submit: enforce requested DS only pNFS role andros
@ 2011-02-04 21:33                         ` andros
  2011-02-04 21:33                           ` [PATCH 14/40] REVERT: pnfs_submit: update nfs4_async_handle_error for data server andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Just use nfs41_setup_sequence for data servers.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4_fs.h  |    3 ---
 fs/nfs/nfs4proc.c |   17 +++++++----------
 fs/nfs/read.c     |    2 +-
 fs/nfs/unlink.c   |    4 ++--
 fs/nfs/write.c    |    2 +-
 5 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 28fda51..7a74740 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -250,12 +250,10 @@ static inline struct nfs4_session *nfs4_get_session(const struct nfs_server *ser
 }
 
 extern int nfs4_setup_sequence(const struct nfs_server *server,
-		struct nfs4_session *ds_session,
 		struct nfs4_sequence_args *args, struct nfs4_sequence_res *res,
 		int cache_reply, struct rpc_task *task);
 extern void nfs4_destroy_session(struct nfs4_session *session);
 extern struct nfs4_session *nfs4_alloc_session(struct nfs_client *clp);
-extern int nfs4_proc_exchange_id(struct nfs_client *, struct rpc_cred *);
 extern int nfs4_proc_create_session(struct nfs_client *);
 extern int nfs4_proc_destroy_session(struct nfs4_session *);
 extern int nfs4_init_session(struct nfs_server *server);
@@ -268,7 +266,6 @@ static inline struct nfs4_session *nfs4_get_session(const struct nfs_server *ser
 }
 
 static inline int nfs4_setup_sequence(const struct nfs_server *server,
-		struct nfs4_session *ds_session,
 		struct nfs4_sequence_args *args, struct nfs4_sequence_res *res,
 		int cache_reply, struct rpc_task *task)
 {
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 49e89d8..acaaed5 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -573,7 +573,6 @@ static int nfs41_setup_sequence(struct nfs4_session *session,
 }
 
 int nfs4_setup_sequence(const struct nfs_server *server,
-		struct nfs4_session *ds_session,
 			struct nfs4_sequence_args *args,
 			struct nfs4_sequence_res *res,
 			int cache_reply,
@@ -582,8 +581,6 @@ int nfs4_setup_sequence(const struct nfs_server *server,
 	struct nfs4_session *session = nfs4_get_session(server);
 	int ret = 0;
 
-	if (ds_session)
-		session = ds_session;
 	if (session == NULL) {
 		args->sa_session = NULL;
 		res->sr_session = NULL;
@@ -614,7 +611,7 @@ static void nfs41_call_sync_prepare(struct rpc_task *task, void *calldata)
 
 	dprintk("--> %s data->seq_server %p\n", __func__, data->seq_server);
 
-	if (nfs4_setup_sequence(data->seq_server, NULL, data->seq_args,
+	if (nfs4_setup_sequence(data->seq_server, data->seq_args,
 				data->seq_res, data->cache_reply, task))
 		return;
 	rpc_call_start(task);
@@ -1402,7 +1399,7 @@ static void nfs4_open_prepare(struct rpc_task *task, void *calldata)
 		nfs_copy_fh(&data->o_res.fh, data->o_arg.fh);
 	}
 	data->timestamp = jiffies;
-	if (nfs4_setup_sequence(data->o_arg.server, NULL,
+	if (nfs4_setup_sequence(data->o_arg.server,
 				&data->o_arg.seq_args,
 				&data->o_res.seq_res, 1, task))
 		return;
@@ -1953,7 +1950,7 @@ static void nfs4_close_prepare(struct rpc_task *task, void *data)
 
 	nfs_fattr_init(calldata->res.fattr);
 	calldata->timestamp = jiffies;
-	if (nfs4_setup_sequence(NFS_SERVER(calldata->inode), NULL,
+	if (nfs4_setup_sequence(NFS_SERVER(calldata->inode),
 				&calldata->arg.seq_args, &calldata->res.seq_res,
 				1, task))
 		return;
@@ -3669,7 +3666,7 @@ static void nfs4_delegreturn_prepare(struct rpc_task *task, void *data)
 
 	d_data = (struct nfs4_delegreturndata *)data;
 
-	if (nfs4_setup_sequence(d_data->res.server, NULL,
+	if (nfs4_setup_sequence(d_data->res.server,
 				&d_data->args.seq_args,
 				&d_data->res.seq_res, 1, task))
 		return;
@@ -3921,7 +3918,7 @@ static void nfs4_locku_prepare(struct rpc_task *task, void *data)
 		return;
 	}
 	calldata->timestamp = jiffies;
-	if (nfs4_setup_sequence(calldata->server, NULL,
+	if (nfs4_setup_sequence(calldata->server,
 				&calldata->arg.seq_args,
 				&calldata->res.seq_res, 1, task))
 		return;
@@ -4076,7 +4073,7 @@ static void nfs4_lock_prepare(struct rpc_task *task, void *calldata)
 	} else
 		data->arg.new_lock_owner = 0;
 	data->timestamp = jiffies;
-	if (nfs4_setup_sequence(data->server, NULL,
+	if (nfs4_setup_sequence(data->server,
 				&data->arg.seq_args,
 				&data->res.seq_res, 1, task))
 		return;
@@ -5345,7 +5342,7 @@ nfs4_layoutget_prepare(struct rpc_task *task, void *calldata)
 	 * However, that is not so catastrophic, and there seems
 	 * to be no way to prevent it completely.
 	 */
-	if (nfs4_setup_sequence(server, NULL, &lgp->args.seq_args,
+	if (nfs4_setup_sequence(server, &lgp->args.seq_args,
 				&lgp->res.seq_res, 0, task))
 		return;
 	if (pnfs_choose_layoutget_stateid(&lgp->args.stateid,
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 79da5cb..345e51e 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -447,7 +447,7 @@ void nfs_read_prepare(struct rpc_task *task, void *calldata)
 		dprintk("%s DS read\n", __func__);
 		ds_session = data->fldata.ds_nfs_client->cl_session;
 	}
-	if (nfs4_setup_sequence(NFS_SERVER(data->inode), ds_session,
+	if (nfs4_setup_sequence(NFS_SERVER(data->inode),
 				&data->args.seq_args, &data->res.seq_res,
 				0, task))
 		return;
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index 82dc70b..e313a51 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -113,7 +113,7 @@ void nfs_unlink_prepare(struct rpc_task *task, void *calldata)
 	struct nfs_unlinkdata *data = calldata;
 	struct nfs_server *server = NFS_SERVER(data->dir);
 
-	if (nfs4_setup_sequence(server, NULL, &data->args.seq_args,
+	if (nfs4_setup_sequence(server, &data->args.seq_args,
 				&data->res.seq_res, 1, task))
 		return;
 	rpc_call_start(task);
@@ -388,7 +388,7 @@ static void nfs_rename_prepare(struct rpc_task *task, void *calldata)
 	struct nfs_renamedata *data = calldata;
 	struct nfs_server *server = NFS_SERVER(data->old_dir);
 
-	if (nfs4_setup_sequence(server, NULL, &data->args.seq_args,
+	if (nfs4_setup_sequence(server, &data->args.seq_args,
 				&data->res.seq_res, 1, task))
 		return;
 	rpc_call_start(task);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 7f3c10a..6b87b03 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1048,7 +1048,7 @@ void nfs_write_prepare(struct rpc_task *task, void *calldata)
 {
 	struct nfs_write_data *data = calldata;
 
-	if (nfs4_setup_sequence(NFS_SERVER(data->inode), NULL,
+	if (nfs4_setup_sequence(NFS_SERVER(data->inode),
 				&data->args.seq_args,
 				&data->res.seq_res, 1, task))
 		return;
-- 
1.6.6



* [PATCH 14/40] REVERT: pnfs_submit: update nfs4_async_handle_error for data server
  2011-02-04 21:33                         ` [PATCH 13/40] REVERT pnfs_submit-add-data-server-session-to-nfs4_setup_s.patch andros
@ 2011-02-04 21:33                           ` andros
  2011-02-04 21:33                             ` [PATCH 15/40] REVERT pnfs_submit: increase NFS_MAX_FILE_IO_SIZE andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Revert everything except the is_ds_only_session declaration.
Layout drivers will declare their own handlers for async I/O.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4proc.c |   36 +++++++++++++++---------------------
 1 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index acaaed5..9c50be7 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -70,7 +70,7 @@ struct nfs4_opendata;
 static int _nfs4_proc_open(struct nfs4_opendata *data);
 static int _nfs4_recover_proc_open(struct nfs4_opendata *data);
 static int nfs4_do_fsinfo(struct nfs_server *, struct nfs_fh *, struct nfs_fsinfo *);
-static int nfs4_async_handle_error(struct rpc_task *, const struct nfs_server *, struct nfs4_state *, struct nfs_client *);
+static int nfs4_async_handle_error(struct rpc_task *, const struct nfs_server *, struct nfs4_state *);
 static int _nfs4_proc_lookup(struct inode *dir, const struct qstr *name, struct nfs_fh *fhandle, struct nfs_fattr *fattr);
 static int _nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle, struct nfs_fattr *fattr);
 static int nfs4_do_setattr(struct inode *inode, struct rpc_cred *cred,
@@ -1898,7 +1898,7 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
 			if (calldata->arg.fmode == 0)
 				break;
 		default:
-			if (nfs4_async_handle_error(task, server, state, NULL) == -EAGAIN)
+			if (nfs4_async_handle_error(task, server, state) == -EAGAIN)
 				rpc_restart_call_prepare(task);
 	}
 	nfs_release_seqid(calldata->arg.seqid);
@@ -2597,7 +2597,7 @@ static int nfs4_proc_unlink_done(struct rpc_task *task, struct inode *dir)
 
 	if (!nfs4_sequence_done(task, &res->seq_res))
 		return 0;
-	if (nfs4_async_handle_error(task, res->server, NULL, NULL) == -EAGAIN)
+	if (nfs4_async_handle_error(task, res->server, NULL) == -EAGAIN)
 		return 0;
 	update_changeattr(dir, &res->cinfo);
 	nfs_post_op_update_inode(dir, res->dir_attr);
@@ -2622,7 +2622,7 @@ static int nfs4_proc_rename_done(struct rpc_task *task, struct inode *old_dir,
 
 	if (!nfs4_sequence_done(task, &res->seq_res))
 		return 0;
-	if (nfs4_async_handle_error(task, res->server, NULL, NULL) == -EAGAIN)
+	if (nfs4_async_handle_error(task, res->server, NULL) == -EAGAIN)
 		return 0;
 
 	update_changeattr(old_dir, &res->old_cinfo);
@@ -3088,7 +3088,7 @@ static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
-	if (nfs4_async_handle_error(task, server, data->args.context->state, client) == -EAGAIN) {
+	if (nfs4_async_handle_error(task, server, data->args.context->state) == -EAGAIN) {
 		nfs_restart_rpc(task, client);
 		return -EAGAIN;
 	}
@@ -3112,7 +3112,7 @@ static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
-	if (nfs4_async_handle_error(task, NFS_SERVER(inode), data->args.context->state, NULL) == -EAGAIN) {
+	if (nfs4_async_handle_error(task, NFS_SERVER(inode), data->args.context->state) == -EAGAIN) {
 		nfs_restart_rpc(task, NFS_SERVER(inode)->nfs_client);
 		return -EAGAIN;
 	}
@@ -3141,7 +3141,7 @@ static int nfs4_commit_done(struct rpc_task *task, struct nfs_write_data *data)
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
-	if (nfs4_async_handle_error(task, NFS_SERVER(inode), NULL, NULL) == -EAGAIN) {
+	if (nfs4_async_handle_error(task, NFS_SERVER(inode), NULL) == -EAGAIN) {
 		nfs_restart_rpc(task, NFS_SERVER(inode)->nfs_client);
 		return -EAGAIN;
 	}
@@ -3461,10 +3461,9 @@ static int nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t buflen
 }
 
 static int
-nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server, struct nfs4_state *state, struct nfs_client *clp)
+nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server, struct nfs4_state *state)
 {
-	if (!clp)
-		clp = server->nfs_client;
+	struct nfs_client *clp = server->nfs_client;
 
 	if (task->tk_status >= 0)
 		return 0;
@@ -3488,16 +3487,14 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
 		case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
 		case -NFS4ERR_SEQ_FALSE_RETRY:
 		case -NFS4ERR_SEQ_MISORDERED:
-			dprintk("%s ERROR %d, Reset session. Exchangeid "
-				"flags 0x%x\n", __func__, task->tk_status,
-				clp->cl_exchange_flags);
+			dprintk("%s ERROR %d, Reset session\n", __func__,
+				task->tk_status);
 			nfs4_schedule_state_recovery(clp);
 			task->tk_status = 0;
 			return -EAGAIN;
 #endif /* CONFIG_NFS_V4_1 */
 		case -NFS4ERR_DELAY:
-			if (server)
-				nfs_inc_server_stats(server, NFSIOS_DELAY);
+			nfs_inc_server_stats(server, NFSIOS_DELAY);
 		case -NFS4ERR_GRACE:
 		case -EKEYEXPIRED:
 			rpc_delay(task, NFS4_POLL_RETRY_MAX);
@@ -3510,8 +3507,6 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
 	task->tk_status = nfs4_map_errors(task->tk_status);
 	return 0;
 do_state_recovery:
-	if (is_ds_only_client(clp))
-		return 0;
 	rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL);
 	nfs4_schedule_state_recovery(clp);
 	if (test_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) == 0)
@@ -3645,8 +3640,7 @@ static void nfs4_delegreturn_done(struct rpc_task *task, void *calldata)
 		renew_lease(data->res.server, data->timestamp);
 		break;
 	default:
-		if (nfs4_async_handle_error(task, data->res.server, NULL, NULL)
-				== -EAGAIN) {
+		if (nfs4_async_handle_error(task, data->res.server, NULL) == -EAGAIN) {
 			nfs_restart_rpc(task, data->res.server->nfs_client);
 			return;
 		}
@@ -3900,7 +3894,7 @@ static void nfs4_locku_done(struct rpc_task *task, void *data)
 		case -NFS4ERR_EXPIRED:
 			break;
 		default:
-			if (nfs4_async_handle_error(task, calldata->server, NULL, NULL) == -EAGAIN)
+			if (nfs4_async_handle_error(task, calldata->server, NULL) == -EAGAIN)
 				nfs_restart_rpc(task,
 						 calldata->server->nfs_client);
 	}
@@ -5372,7 +5366,7 @@ static void nfs4_layoutget_done(struct rpc_task *task, void *calldata)
 		task->tk_status = -NFS4ERR_DELAY;
 		/* Fall through */
 	default:
-		if (nfs4_async_handle_error(task, server, NULL, NULL) == -EAGAIN) {
+		if (nfs4_async_handle_error(task, server, NULL) == -EAGAIN) {
 			rpc_restart_call_prepare(task);
 			return;
 		}
-- 
1.6.6



* [PATCH 15/40] REVERT pnfs_submit: increase NFS_MAX_FILE_IO_SIZE
  2011-02-04 21:33                           ` [PATCH 14/40] REVERT: pnfs_submit: update nfs4_async_handle_error for data server andros
@ 2011-02-04 21:33                             ` andros
  2011-02-04 21:33                               ` [PATCH 16/40] REVERT pnfs_submit: enforce requested DS only pNFS role andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

We need to demonstrate that this is useful. Revisit as a 'B' item.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 include/linux/nfs_xdr.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 3ea43aa..bd84684 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -10,7 +10,7 @@
  * support a megabyte or more.  The default is left at 4096 bytes, which is
  * reasonable for NFS over UDP.
  */
-#define NFS_MAX_FILE_IO_SIZE	(4U * 1048576U)
+#define NFS_MAX_FILE_IO_SIZE	(1048576U)
 #define NFS_DEF_FILE_IO_SIZE	(4096U)
 #define NFS_MIN_FILE_IO_SIZE	(1024U)
 
-- 
1.6.6



* [PATCH 16/40] REVERT pnfs_submit: enforce requested DS only pNFS role
  2011-02-04 21:33                             ` [PATCH 15/40] REVERT pnfs_submit: increase NFS_MAX_FILE_IO_SIZE andros
@ 2011-02-04 21:33                               ` andros
  2011-02-04 21:33                                 ` [PATCH 17/40] SQUASHME pnfs-submit wave3 remove is_ds_only_session andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

We no longer ask for any role. We check for DS only in nfs4_set_ds_client.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4state.c |   11 -----------
 1 files changed, 0 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 3cdbf3b..49433aa 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -229,7 +229,6 @@ static int nfs4_begin_drain_session(struct nfs_client *clp)
 int nfs41_init_clientid(struct nfs_client *clp, struct rpc_cred *cred)
 {
 	int status;
-	u32 req_exchange_flags = clp->cl_exchange_flags;
 
 	nfs4_begin_drain_session(clp);
 	status = nfs4_proc_exchange_id(clp, cred);
@@ -238,16 +237,6 @@ int nfs41_init_clientid(struct nfs_client *clp, struct rpc_cred *cred)
 	status = nfs4_proc_create_session(clp);
 	if (status != 0)
 		goto out;
-	if (is_ds_only_session(req_exchange_flags)) {
-		clp->cl_exchange_flags &=
-		     ~(EXCHGID4_FLAG_USE_PNFS_MDS | EXCHGID4_FLAG_USE_NON_PNFS);
-		if (!is_ds_only_session(clp->cl_exchange_flags)) {
-			nfs4_destroy_session(clp->cl_session);
-			clp->cl_session = NULL;
-			status = -ENOTSUPP;
-			goto out;
-		}
-	}
 	nfs41_setup_state_renewal(clp);
 	nfs_mark_client_ready(clp, NFS_CS_READY);
 out:
-- 
1.6.6



* [PATCH 17/40] SQUASHME pnfs-submit wave3 remove is_ds_only_session
  2011-02-04 21:33                               ` [PATCH 16/40] REVERT pnfs_submit: enforce requested DS only pNFS role andros
@ 2011-02-04 21:33                                 ` andros
  2011-02-04 21:33                                   ` [PATCH 18/40] SQUASHME pnfs-submit: wave3 make pnfs_initiate_read static andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Remove the unused is_ds_only_session. Move the private function
is_ds_only_client into fs/nfs/nfs4_fs.h. Use EXCHGID4_FLAG_MASK_PNFS, which
includes the NON_PNFS role.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4_fs.h          |   13 +++++++++++++
 include/linux/nfs4.h      |    7 -------
 include/linux/nfs_fs_sb.h |   10 ----------
 3 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 7a74740..5d84642 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -259,6 +259,13 @@ extern int nfs4_proc_destroy_session(struct nfs4_session *);
 extern int nfs4_init_session(struct nfs_server *server);
 extern int nfs4_proc_get_lease_time(struct nfs_client *clp,
 		struct nfs_fsinfo *fsinfo);
+
+static inline bool
+is_ds_only_client(struct nfs_client *clp)
+{
+	return (clp->cl_exchange_flags & EXCHGID4_FLAG_MASK_PNFS) ==
+		EXCHGID4_FLAG_USE_PNFS_DS;
+}
 #else /* CONFIG_NFS_v4_1 */
 static inline struct nfs4_session *nfs4_get_session(const struct nfs_server *server)
 {
@@ -276,6 +283,12 @@ static inline int nfs4_init_session(struct nfs_server *server)
 {
 	return 0;
 }
+
+static inline bool
+is_ds_only_client(struct nfs_client *clp)
+{
+	return false;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 extern const struct nfs4_minor_version_ops *nfs_v4_minor_ops[];
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index b32a792..134716e 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -131,13 +131,6 @@
 #define EXCHGID4_FLAG_MASK_A			0x40070103
 #define EXCHGID4_FLAG_MASK_R			0x80070103
 
-static inline bool
-is_ds_only_session(u32 exchange_flags)
-{
-	u32 mask = EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_PNFS_MDS;
-	return (exchange_flags & mask) == EXCHGID4_FLAG_USE_PNFS_DS;
-}
-
 #define SEQ4_STATUS_CB_PATH_DOWN		0x00000001
 #define SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING	0x00000002
 #define SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED	0x00000004
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 017f835..b197563 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -87,16 +87,6 @@ struct nfs_client {
 #endif
 };
 
-static inline bool
-is_ds_only_client(struct nfs_client *clp)
-{
-#ifdef CONFIG_NFS_V4_1
-	return is_ds_only_session(clp->cl_exchange_flags);
-#else
-	return false;
-#endif
-}
-
 /*
  * NFS client parameters stored in the superblock.
  */
-- 
1.6.6


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 18/40] SQUASHME pnfs-submit: wave3 make pnfs_initiate_read static
  2011-02-04 21:33                                 ` [PATCH 17/40] SQUASHME pnfs-submit wave3 remove is_ds_only_session andros
@ 2011-02-04 21:33                                   ` andros
  2011-02-04 21:33                                     ` [PATCH 19/40] SQUASHME pnfs-submit wave3 filelayout read pagelist cleanup andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

into: pnfs_submit: generic read

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/internal.h |    2 --
 fs/nfs/read.c     |    2 +-
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 657b71c..5c156d3 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -276,8 +276,6 @@ extern int nfs4_get_rootfh(struct nfs_server *server, struct nfs_fh *mntfh);
 /* read.c */
 extern int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
 			     const struct rpc_call_ops *call_ops);
-extern int pnfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
-			     const struct rpc_call_ops *call_ops);
 extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
 
 /* write.c */
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 345e51e..43eb6a2 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -199,7 +199,7 @@ int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
 }
 EXPORT_SYMBOL(nfs_initiate_read);
 
-int pnfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
+static int pnfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
 		       const struct rpc_call_ops *call_ops)
 {
 	if (data->req->wb_lseg &&
-- 
1.6.6


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 19/40] SQUASHME pnfs-submit wave3 filelayout read pagelist cleanup
  2011-02-04 21:33                                   ` [PATCH 18/40] SQUASHME pnfs-submit: wave3 make pnfs_initiate_read static andros
@ 2011-02-04 21:33                                     ` andros
  2011-02-04 21:33                                       ` [PATCH 20/40] SQUASHME pnfs-submit wave3 remove nr_pages from read_pagelist andros
  2011-02-04 21:44                                       ` [PATCH 19/40] SQUASHME pnfs-submit wave3 filelayout read pagelist cleanup Fred Isaman
  0 siblings, 2 replies; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |   19 ++-----------------
 1 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 3daf351..dce90a0 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -132,14 +132,6 @@ struct rpc_call_ops filelayout_read_call_ops = {
 	.rpc_release = filelayout_read_release,
 };
 
-/* Perform sync or async reads.
- *
- * An optimization for the NFS file layout driver
- * allows the original read/write data structs to be passed in the
- * last argument.
- *
- * TODO: join with write_pagelist?
- */
 static enum pnfs_try_status
 filelayout_read_pagelist(struct nfs_read_data *data, unsigned nr_pages)
 {
@@ -149,8 +141,8 @@ filelayout_read_pagelist(struct nfs_read_data *data, unsigned nr_pages)
 	u32 idx;
 	struct nfs_fh *fh;
 
-	dprintk("--> %s ino %lu nr_pages %d pgbase %u req %Zu@%llu\n",
-		__func__, data->inode->i_ino, nr_pages,
+	dprintk("--> %s ino %lu pgbase %u req %Zu@%llu\n",
+		__func__, data->inode->i_ino,
 		data->args.pgbase, (size_t)data->args.count, offset);
 
 	/* Retrieve the correct rpc_client for the byte range */
@@ -169,13 +161,6 @@ filelayout_read_pagelist(struct nfs_read_data *data, unsigned nr_pages)
 	if (fh)
 		data->args.fh = fh;
 
-	/*
-	 * Now get the file offset on the dserver
-	 * Set the read offset to this offset, and
-	 * save the original offset in orig_offset
-	 * In the case of aync reads, the offset will be reset in the
-	 * call_ops->rpc_call_done() routine.
-	 */
 	data->args.offset = filelayout_get_dserver_offset(lseg, offset);
 	data->fldata.orig_offset = offset;
 
-- 
1.6.6


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 20/40] SQUASHME pnfs-submit wave3 remove nr_pages from read_pagelist
  2011-02-04 21:33                                     ` [PATCH 19/40] SQUASHME pnfs-submit wave3 filelayout read pagelist cleanup andros
@ 2011-02-04 21:33                                       ` andros
  2011-02-04 21:33                                         ` [PATCH 21/40] SQUASHME pnfs-submit wave3 add comment to nfs4_fl_prepare_ds_fh andros
  2011-02-04 21:44                                       ` [PATCH 19/40] SQUASHME pnfs-submit wave3 filelayout read pagelist cleanup Fred Isaman
  1 sibling, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
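The nr_pages argument can be dropped because the page count is derivable from
args.pgbase and args.count, which the layout driver already receives in
nfs_read_data. A minimal userspace sketch of that derivation, modelled on
nfs_page_array_len() as used in the removed call; the SKETCH_ names and the
4 KiB page size are assumptions made for illustration.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT	12	/* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)

/* Number of pages spanned by count bytes starting pgbase bytes into
 * the first page (hypothetical helper name). */
static unsigned int sketch_page_array_len(unsigned int pgbase, size_t count)
{
	return ((unsigned long)count + (unsigned long)pgbase +
		SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}
```

A driver that still needs the page count can recompute it this way instead of
having it passed in.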
 fs/nfs/nfs4filelayout.c |    2 +-
 fs/nfs/pnfs.c           |    3 +--
 fs/nfs/pnfs.h           |    2 +-
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index dce90a0..2b5a38e 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -133,7 +133,7 @@ struct rpc_call_ops filelayout_read_call_ops = {
 };
 
 static enum pnfs_try_status
-filelayout_read_pagelist(struct nfs_read_data *data, unsigned nr_pages)
+filelayout_read_pagelist(struct nfs_read_data *data)
 {
 	struct pnfs_layout_segment *lseg = data->pdata.lseg;
 	struct nfs4_pnfs_ds *ds;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index d06e9ea..4c49109 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -943,8 +943,7 @@ pnfs_try_to_read_data(struct nfs_read_data *rdata,
 	get_lseg(lseg);
 
 	rdata->pdata.lseg = lseg;
-	trypnfs = nfss->pnfs_curr_ld->read_pagelist(rdata,
-		nfs_page_array_len(rdata->args.pgbase, rdata->args.count));
+	trypnfs = nfss->pnfs_curr_ld->read_pagelist(rdata);
 	if (trypnfs == PNFS_NOT_ATTEMPTED) {
 		rdata->pdata.lseg = NULL;
 		put_lseg(lseg);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 2e231e3..cbbcdfa 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -84,7 +84,7 @@ struct pnfs_layoutdriver_type {
 	 * MUST be called.
 	 */
 	enum pnfs_try_status
-	(*read_pagelist) (struct nfs_read_data *nfs_data, unsigned nr_pages);
+	(*read_pagelist) (struct nfs_read_data *nfs_data);
 };
 
 struct pnfs_layout_hdr {
-- 
1.6.6


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 21/40] SQUASHME pnfs-submit wave3 add comment to nfs4_fl_prepare_ds_fh
  2011-02-04 21:33                                       ` [PATCH 20/40] SQUASHME pnfs-submit wave3 remove nr_pages from read_pagelist andros
@ 2011-02-04 21:33                                         ` andros
  2011-02-04 21:33                                           ` [PATCH 22/40] SQUASHME pnfs-submit wave3 move BUG outside of switch andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>

---
 fs/nfs/nfs4filelayoutdev.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 83b0ab8..b1290ca 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -590,6 +590,7 @@ nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, loff_t offset)
 		if (flseg->num_fh == 1)
 			i = 0;
 		else if (flseg->num_fh == 0)
+			/* Use the MDS OPEN fh set in nfs_read_rpcsetup */
 			return NULL;
 		else
 			i = nfs4_fl_calc_ds_index(lseg, offset);
-- 
1.6.6


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 22/40] SQUASHME pnfs-submit wave3 move BUG outside of switch
  2011-02-04 21:33                                         ` [PATCH 21/40] SQUASHME pnfs-submit wave3 add comment to nfs4_fl_prepare_ds_fh andros
@ 2011-02-04 21:33                                           ` andros
  2011-02-04 21:33                                             ` [PATCH 23/40] SQUASHME pnfs-submit wave3 new function for ds expired lease andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
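The shape of the function after this change, with every switch case returning
and a single trap after the switch, can be sketched in userspace as below.
Assumptions made for illustration: abort() stands in for BUG(), the sparse
case returns the file offset unchanged, stripe_width is taken to be
unit * stripe_count, and all sketch_/SKETCH_ names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

enum sketch_stripe_type { SKETCH_STRIPE_SPARSE, SKETCH_STRIPE_DENSE };

/* Map a file offset to the offset used on the data server. */
static uint64_t sketch_dserver_offset(enum sketch_stripe_type type,
				      uint64_t offset, uint32_t unit,
				      uint32_t stripe_count)
{
	switch (type) {
	case SKETCH_STRIPE_SPARSE:
		/* assumed: sparse packing keeps the file offset */
		return offset;
	case SKETCH_STRIPE_DENSE: {
		uint64_t stripe_width = (uint64_t)unit * stripe_count;

		/* whole stripes before this one, plus offset within the unit */
		return (offset / stripe_width) * unit + offset % unit;
	}
	}

	/* Not reached for a valid type; replaces "default: BUG()". */
	abort();
}
```

Because abort() does not return, no dummy "return 0" is needed to keep the
compiler quiet, which mirrors the cleanup in this patch.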
 fs/nfs/nfs4filelayout.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 2b5a38e..d925af6 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -90,12 +90,9 @@ filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
 		do_div(tmp, stripe_width);
 		return tmp * unit + do_div(off, unit);
 	}
-	default:
-		BUG();
 	}
 
-	/* We should never get here... just to stop the gcc warning */
-	return 0;
+	BUG();
 }
 
 /*
-- 
1.6.6


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 23/40] SQUASHME pnfs-submit wave3 new function for ds expired lease
  2011-02-04 21:33                                           ` [PATCH 22/40] SQUASHME pnfs-submit wave3 move BUG outside of switch andros
@ 2011-02-04 21:33                                             ` andros
  2011-02-04 21:33                                               ` [PATCH 24/40] NFS move nfs_client initialization into nfs_get_client andros
  2011-02-04 21:51                                               ` [PATCH 23/40] SQUASHME pnfs-submit wave3 new function for ds expired lease Fred Isaman
  0 siblings, 2 replies; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4proc.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9c50be7..fb22cbf 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1574,7 +1574,7 @@ static int _nfs4_proc_open(struct nfs4_opendata *data)
 	return 0;
 }
 
-int nfs4_recover_expired_lease(struct nfs_client *clp)
+static int nfs4_client_recover_expired_lease(struct nfs_client *clp)
 {
 	unsigned int loop;
 	int ret;
@@ -1593,6 +1593,11 @@ int nfs4_recover_expired_lease(struct nfs_client *clp)
 }
 EXPORT_SYMBOL(nfs4_recover_expired_lease);
 
+static int nfs4_recover_expired_lease(struct nfs_server *server)
+{
+	return nfs4_client_recover_expired_lease(server->nfs_client);
+}
+
 /*
  * OPEN_EXPIRED:
  * 	reclaim state on the server after a network partition.
@@ -1680,7 +1685,7 @@ static int _nfs4_do_open(struct inode *dir, struct path *path, fmode_t fmode, in
 		dprintk("nfs4_do_open: nfs4_get_state_owner failed!\n");
 		goto out_err;
 	}
-	status = nfs4_recover_expired_lease(server->nfs_client);
+	status = nfs4_recover_expired_lease(server);
 	if (status != 0)
 		goto err_put_state_owner;
 	if (path->dentry->d_inode != NULL)
@@ -5075,7 +5080,7 @@ int nfs4_init_session(struct nfs_server *server)
 	session->fc_attrs.max_rqst_sz = wsize + nfs41_maxwrite_overhead;
 	session->fc_attrs.max_resp_sz = rsize + nfs41_maxread_overhead;
 
-	ret = nfs4_recover_expired_lease(server->nfs_client);
+	ret = nfs4_recover_expired_lease(server);
 	if (!ret)
 		ret = nfs4_check_client_ready(clp);
 	return ret;
-- 
1.6.6


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 24/40] NFS move nfs_client initialization into nfs_get_client
  2011-02-04 21:33                                             ` [PATCH 23/40] SQUASHME pnfs-submit wave3 new function for ds expired lease andros
@ 2011-02-04 21:33                                               ` andros
  2011-02-04 21:33                                                 ` [PATCH 25/40] pnfs-submit: wave3 refactor dataserver client setup andros
  2011-02-04 21:51                                               ` [PATCH 23/40] SQUASHME pnfs-submit wave3 new function for ds expired lease Fred Isaman
  1 sibling, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

nfs_get_client now returns an nfs_client that is ready to use, whether it was
found on the list or newly created.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
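The find-or-create-then-initialize pattern described above can be sketched in
userspace as follows. This is a toy model only: it matches clients on version
alone, has no locking, and does not wait on cl_cons_state for a found client.
All sketch_ names are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

struct sketch_client {
	int version;
	int ready;
	int refcount;
	struct sketch_client *next;
};

static struct sketch_client *sketch_list;

/* Per-version initialization; only v3 and v4 are accepted here. */
static int sketch_init_client(struct sketch_client *clp)
{
	if (clp->version != 3 && clp->version != 4)
		return -EPROTONOSUPPORT;
	clp->ready = 1;
	return 0;
}

/* Drop a reference; unlink and free the client on the last put. */
static void sketch_put_client(struct sketch_client *clp)
{
	struct sketch_client **pp;

	if (--clp->refcount > 0)
		return;
	for (pp = &sketch_list; *pp; pp = &(*pp)->next) {
		if (*pp == clp) {
			*pp = clp->next;
			break;
		}
	}
	free(clp);
}

/* Return a ready client, found or new; NULL with *error set on failure. */
static struct sketch_client *sketch_get_client(int version, int *error)
{
	struct sketch_client *clp;

	*error = 0;
	for (clp = sketch_list; clp; clp = clp->next) {
		if (clp->version == version) {
			clp->refcount++;
			return clp;
		}
	}

	clp = calloc(1, sizeof(*clp));
	if (!clp) {
		*error = -ENOMEM;
		return NULL;
	}
	clp->version = version;
	clp->refcount = 1;
	clp->next = sketch_list;
	sketch_list = clp;

	/* New clients are initialized here, not by each caller. */
	*error = sketch_init_client(clp);
	if (*error < 0) {
		sketch_put_client(clp);
		return NULL;
	}
	return clp;
}
```

With initialization folded into the lookup, callers need only one error check
and no longer carry their own put-on-init-failure path.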
 fs/nfs/client.c |   66 ++++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index ea2d032..5cfcd40 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -81,6 +81,15 @@ retry:
 }
 #endif /* CONFIG_NFS_V4 */
 
+static int nfs4_init_client(struct nfs_client *clp,
+		const struct rpc_timeout *timeparms,
+		const char *ip_addr,
+		rpc_authflavor_t authflavour,
+		int noresvport);
+static int nfs_init_client(struct nfs_client *clp,
+			   const struct rpc_timeout *timeparms,
+			   int noresvport);
+
 /*
  * RPC cruft for NFS
  */
@@ -482,7 +491,11 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat
  * Look up a client by IP address and protocol version
  * - creates a new record if one doesn't yet exist
  */
-static struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
+static struct nfs_client *nfs_get_client(struct nfs_client_initdata *cl_init,
+					 const struct rpc_timeout *timeparms,
+					 const char *ip_addr,
+					 rpc_authflavor_t authflavour,
+					 int noresvport)
 {
 	struct nfs_client *clp, *new = NULL;
 	int error;
@@ -513,6 +526,17 @@ install_client:
 	clp = new;
 	list_add(&clp->cl_share_link, &nfs_client_list);
 	spin_unlock(&nfs_client_lock);
+
+	if (cl_init->rpc_ops->version == 4)
+		error = nfs4_init_client(clp, timeparms, ip_addr, authflavour,
+					 noresvport);
+	else
+		error = nfs_init_client(clp, timeparms, noresvport);
+
+	if (error < 0) {
+		nfs_put_client(clp);
+		return ERR_PTR(error);
+	}
 	dprintk("--> nfs_get_client() = %p [new]\n", clp);
 	return clp;
 
@@ -771,7 +795,7 @@ static int nfs_init_server_rpcclient(struct nfs_server *server,
  */
 static int nfs_init_client(struct nfs_client *clp,
 			   const struct rpc_timeout *timeparms,
-			   const struct nfs_parsed_mount_data *data)
+			   int noresvport)
 {
 	int error;
 
@@ -786,7 +810,7 @@ static int nfs_init_client(struct nfs_client *clp,
 	 * - RFC 2623, sec 2.3.2
 	 */
 	error = nfs_create_rpc_client(clp, timeparms, RPC_AUTH_UNIX,
-				      0, data->flags & NFS_MOUNT_NORESVPORT);
+				      0, noresvport);
 	if (error < 0)
 		goto error;
 	nfs_mark_client_ready(clp, NFS_CS_READY);
@@ -822,19 +846,17 @@ static int nfs_init_server(struct nfs_server *server,
 		cl_init.rpc_ops = &nfs_v3_clientops;
 #endif
 
+	nfs_init_timeout_values(&timeparms, data->nfs_server.protocol,
+			data->timeo, data->retrans);
+
 	/* Allocate or find a client reference we can use */
-	clp = nfs_get_client(&cl_init);
+	clp = nfs_get_client(&cl_init, &timeparms, NULL, RPC_AUTH_UNIX,
+			     data->flags & NFS_MOUNT_NORESVPORT);
 	if (IS_ERR(clp)) {
 		dprintk("<-- nfs_init_server() = error %ld\n", PTR_ERR(clp));
 		return PTR_ERR(clp);
 	}
 
-	nfs_init_timeout_values(&timeparms, data->nfs_server.protocol,
-			data->timeo, data->retrans);
-	error = nfs_init_client(clp, &timeparms, data);
-	if (error < 0)
-		goto error;
-
 	server->nfs_client = clp;
 
 	/* Initialise the client representation from the mount data */
@@ -1313,7 +1335,7 @@ static int nfs4_init_client(struct nfs_client *clp,
 		const struct rpc_timeout *timeparms,
 		const char *ip_addr,
 		rpc_authflavor_t authflavour,
-		int flags)
+		int noresvport)
 {
 	int error;
 
@@ -1327,7 +1349,7 @@ static int nfs4_init_client(struct nfs_client *clp,
 	clp->rpc_ops = &nfs_v4_clientops;
 
 	error = nfs_create_rpc_client(clp, timeparms, authflavour,
-				      1, flags & NFS_MOUNT_NORESVPORT);
+				      1, noresvport);
 	if (error < 0)
 		goto error;
 	strlcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
@@ -1380,22 +1402,16 @@ int nfs4_set_client(struct nfs_server *server,
 	dprintk("--> nfs4_set_client()\n");
 
 	/* Allocate or find a client reference we can use */
-	clp = nfs_get_client(&cl_init);
+	clp = nfs_get_client(&cl_init, timeparms, ip_addr, authflavour,
+			     server->flags & NFS_MOUNT_NORESVPORT);
 	if (IS_ERR(clp)) {
 		error = PTR_ERR(clp);
 		goto error;
 	}
-	error = nfs4_init_client(clp, timeparms, ip_addr, authflavour,
-					server->flags);
-	if (error < 0)
-		goto error_put;
 
 	server->nfs_client = clp;
 	dprintk("<-- nfs4_set_client() = 0 [new %p]\n", clp);
 	return 0;
-
-error_put:
-	nfs_put_client(clp);
 error:
 	dprintk("<-- nfs4_set_client() = xerror %d\n", error);
 	return error;
@@ -1614,6 +1630,16 @@ error:
 	return ERR_PTR(error);
 }
 
+#else /* CONFIG_NFS_V4 */
+static int nfs4_init_client(struct nfs_client *clp,
+			    const struct rpc_timeout *timeparms,
+			    const char *ip_addr,
+			    rpc_authflavor_t authflavour,
+			    int noresvport)
+{
+	return -EPROTONOSUPPORT;
+}
+
 #endif /* CONFIG_NFS_V4 */
 
 /*
-- 
1.6.6


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 25/40] pnfs-submit: wave3 refactor dataserver client setup
  2011-02-04 21:33                                               ` [PATCH 24/40] NFS move nfs_client initialization into nfs_get_client andros
@ 2011-02-04 21:33                                                 ` andros
  2011-02-04 21:33                                                   ` [PATCH 26/40] pnfs-submit: wave3 refactor data server session initialization andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Follow nfs4_set_client convention.

Once a new nfs_client is on the nfs_client_list, its cl_cons_state serializes
any further attempt to create an nfs_client struct with matching properties.

Use the new nfs_get_client() that initializes new clients.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/client.c            |   48 +++++++++++++++++++++++++---
 fs/nfs/internal.h          |   13 ++------
 fs/nfs/nfs4filelayoutdev.c |   76 +++++++++++++-------------------------------
 3 files changed, 68 insertions(+), 69 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 5cfcd40..9e07586 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -413,7 +413,7 @@ static int nfs_sockaddr_match_ipaddr(const struct sockaddr *sa1,
  * Test if two socket addresses represent the same actual socket,
  * by comparing (only) relevant fields, including the port number.
  */
-int nfs_sockaddr_cmp(const struct sockaddr *sa1,
+static int nfs_sockaddr_cmp(const struct sockaddr *sa1,
 			    const struct sockaddr *sa2)
 {
 	if (sa1->sa_family != sa2->sa_family)
@@ -427,7 +427,6 @@ int nfs_sockaddr_cmp(const struct sockaddr *sa1,
 	}
 	return 0;
 }
-EXPORT_SYMBOL(nfs_sockaddr_cmp);
 
 /* Common match routine for v4.0 and v4.1 callback services */
 bool
@@ -592,7 +591,6 @@ int nfs4_check_client_ready(struct nfs_client *clp)
 		return -EPROTONOSUPPORT;
 	return 0;
 }
-EXPORT_SYMBOL(nfs4_check_client_ready);
 
 /*
  * Initialise the timeout values for a connection
@@ -1379,7 +1377,7 @@ error:
 /*
  * Set up an NFS4 client
  */
-int nfs4_set_client(struct nfs_server *server,
+static int nfs4_set_client(struct nfs_server *server,
 		const char *hostname,
 		const struct sockaddr *addr,
 		const size_t addrlen,
@@ -1416,8 +1414,48 @@ error:
 	dprintk("<-- nfs4_set_client() = xerror %d\n", error);
 	return error;
 }
-EXPORT_SYMBOL(nfs4_set_client);
 
+/*
+ * Set up a pNFS Data Server client.
+ *
+ * Return any existing nfs_client that matches server address,port,version
+ * and minorversion.
+ *
+ * For a new nfs_client, use a soft mount (default), a low retrans and a
+ * low timeout interval so that if a connection is lost, we retry through
+ * the MDS.
+ */
+struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
+		const struct sockaddr *ds_addr,
+		int ds_addrlen, int ds_proto)
+{
+	struct nfs_client_initdata cl_init = {
+		.addr = ds_addr,
+		.addrlen = ds_addrlen,
+		.rpc_ops = &nfs_v4_clientops,
+		.proto = ds_proto,
+		.minorversion = mds_clp->cl_minorversion,
+	};
+	struct rpc_timeout ds_timeout = {
+		.to_initval = 15 * HZ,
+		.to_maxval = 15 * HZ,
+		.to_retries = 1,
+		.to_exponential = 1,
+	};
+	struct nfs_client *clp;
+
+	/*
+	 * Set an authflavor equal to the MDS value. Use the MDS nfs_client
+	 * cl_ipaddr so as to use the same EXCHANGE_ID co_ownerid as the MDS
+	 * (section 13.1 RFC 5661).
+	 */
+	clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
+			     mds_clp->cl_rpcclient->cl_auth->au_flavor, 0);
+
+	dprintk("<-- %s %p\n", __func__, clp);
+	return clp;
+}
+EXPORT_SYMBOL(nfs4_set_ds_client);
 
 /*
  * Session has been established, and the client marked ready.
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 5c156d3..764a235 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -148,16 +148,9 @@ extern struct nfs_server *nfs_clone_server(struct nfs_server *,
 					   struct nfs_fattr *);
 extern void nfs_mark_client_ready(struct nfs_client *clp, int state);
 extern int nfs4_check_client_ready(struct nfs_client *clp);
-extern int nfs_sockaddr_cmp(const struct sockaddr *sa1,
-		const struct sockaddr *sa2);
-extern int nfs4_set_client(struct nfs_server *server,
-		const char *hostname,
-		const struct sockaddr *addr,
-		const size_t addrlen,
-		const char *ip_addr,
-		rpc_authflavor_t authflavour,
-		int proto, const struct rpc_timeout *timeparms,
-		u32 minorversion);
+extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
+					     const struct sockaddr *ds_addr,
+					     int ds_addrlen, int ds_proto);
 #ifdef CONFIG_PROC_FS
 extern int __init nfs_fs_proc_init(void);
 extern void nfs_fs_proc_exit(void);
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index b1290ca..c46cb00 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -104,68 +104,43 @@ _data_server_lookup_locked(u32 ip_addr, u32 port)
 	return NULL;
 }
 
-/* Create an rpc to the data server defined in 'dev_list' */
+/*
+ * Create an rpc to the nfs4_pnfs_ds data server
+ * Currently only supports IPv4
+ */
 static int
 nfs4_pnfs_ds_create(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
 {
-	struct nfs_server	*tmp;
-	struct sockaddr_in	sin;
-	struct rpc_clnt		*mds_clnt = mds_srv->client;
-	struct nfs_client	*clp = mds_srv->nfs_client;
-	struct sockaddr		*mds_addr;
+	struct nfs_client *clp;
+	struct sockaddr_in sin;
 	int err = 0;
 
 	dprintk("--> %s ip:port %x:%hu au_flavor %d\n", __func__,
 		ntohl(ds->ds_ip_addr), ntohs(ds->ds_port),
-		mds_clnt->cl_auth->au_flavor);
+		mds_srv->nfs_client->cl_rpcclient->cl_auth->au_flavor);
 
 	sin.sin_family = AF_INET;
 	sin.sin_addr.s_addr = ds->ds_ip_addr;
 	sin.sin_port = ds->ds_port;
 
-	/*
-	 * If this DS is also the MDS, use the MDS session only if the
-	 * MDS exchangeid flags show the EXCHGID4_FLAG_USE_PNFS_DS pNFS role.
-	 */
-	mds_addr = (struct sockaddr *)&clp->cl_addr;
-	if (nfs_sockaddr_cmp((struct sockaddr *)&sin, mds_addr)) {
+	clp = nfs4_set_ds_client(mds_srv->nfs_client, (struct sockaddr *)&sin,
+				 sizeof(sin), IPPROTO_TCP);
+	if (IS_ERR(clp)) {
+		err = PTR_ERR(clp);
+		goto out;
+	}
+
+	if ((clp->cl_exchange_flags & EXCHGID4_FLAG_MASK_PNFS) != 0) {
+		dprintk("%s [existing] ip=%x, port=%hu\n", __func__,
+			ntohl(ds->ds_ip_addr), ntohs(ds->ds_port));
+
 		if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_DS)) {
-			printk(KERN_INFO
-			       "ip:port %x:%hu is not a pNFS Data Server\n",
-			       ntohl(ds->ds_ip_addr), ntohs(ds->ds_port));
 			err = -ENODEV;
-		} else {
-			atomic_inc(&clp->cl_count);
-			ds->ds_clp = clp;
-			dprintk("%s Using MDS Session for DS\n", __func__);
+			goto out_put;
 		}
 		goto out;
 	}
 
-	/* Temporay server for nfs4_set_client */
-	tmp = kzalloc(sizeof(struct nfs_server), GFP_KERNEL);
-	if (!tmp)
-		goto out;
-
-	/*
-	 * Set a retrans, timeout interval, and authflavor equual to the MDS
-	 * values. Use the MDS nfs_client cl_ipaddr field so as to use the
-	 * same co_ownerid as the MDS.
-	 */
-	err = nfs4_set_client(tmp,
-			      mds_srv->nfs_client->cl_hostname,
-			      (struct sockaddr *)&sin,
-			      sizeof(struct sockaddr),
-			      mds_srv->nfs_client->cl_ipaddr,
-			      mds_clnt->cl_auth->au_flavor,
-			      IPPROTO_TCP,
-			      mds_clnt->cl_xprt->timeout,
-			      1 /* minorversion */);
-	if (err < 0)
-		goto out_free;
-
-	clp = tmp->nfs_client;
-
 	/* Ask for only the EXCHGID4_FLAG_USE_PNFS_DS pNFS role */
 	dprintk("%s EXCHANGE_ID for clp %p\n", __func__, clp);
 	clp->cl_exchange_flags = EXCHGID4_FLAG_USE_PNFS_DS;
@@ -177,8 +152,6 @@ nfs4_pnfs_ds_create(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
 		goto out_put;
 
 	if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_DS)) {
-		printk(KERN_INFO "ip:port %x:%hu is not a pNFS Data Server\n",
-		       ntohl(ds->ds_ip_addr), ntohs(ds->ds_port));
 		err = -ENODEV;
 		goto out_put;
 	}
@@ -191,20 +164,15 @@ nfs4_pnfs_ds_create(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
 	spin_unlock(&mds_srv->nfs_client->cl_lock);
 	clp->cl_last_renewal = jiffies;
 
-	clear_bit(NFS4CLNT_SESSION_RESET, &clp->cl_state);
 	ds->ds_clp = clp;
 
-	dprintk("%s: ip=%x, port=%hu, rpcclient %p\n", __func__,
-				ntohl(ds->ds_ip_addr), ntohs(ds->ds_port),
-				clp->cl_rpcclient);
-out_free:
-	kfree(tmp);
+	dprintk("%s [new] ip=%x, port=%hu\n", __func__, ntohl(ds->ds_ip_addr),
+		ntohs(ds->ds_port));
 out:
-	dprintk("%s Returns %d\n", __func__, err);
 	return err;
 out_put:
 	nfs_put_client(clp);
-	goto out_free;
+	goto out;
 }
 
 static void
-- 
1.6.6


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 26/40] pnfs-submit: wave3 refactor data server session initialization
  2011-02-04 21:33                                                 ` [PATCH 25/40] pnfs-submit: wave3 refactor dataserver client setup andros
@ 2011-02-04 21:33                                                   ` andros
  2011-02-04 21:33                                                     ` [PATCH 27/40] pnfs_submit: wave3 rename nfs4_pnfs_ds_create andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Follow nfs4_init_session convention.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
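The one-shot guard that nfs4_init_ds_session borrows from nfs4_init_session
can be sketched in userspace with C11 atomics. This is an illustration under
stated assumptions: atomic_fetch_and stands in for the kernel's
test_and_clear_bit, lease recovery and the client-ready check are omitted, and
all sketch_/SKETCH_ names are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_SESSION_INITING	0	/* bit number in state */

struct sketch_session {
	atomic_ulong state;
	bool is_ds;		/* stands in for the DS role flag test */
	int init_calls;		/* counts how often the init body ran */
};

/* Atomically clear bit nr and report whether it was set before. */
static bool sketch_test_and_clear_bit(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}

static int sketch_init_ds_session(struct sketch_session *s)
{
	/* Only the caller that clears INITING runs the init body. */
	if (!sketch_test_and_clear_bit(SKETCH_SESSION_INITING, &s->state))
		return 0;

	s->init_calls++;
	if (!s->is_ds)
		return -ENODEV;
	return 0;
}
```

In this sketch the bit is cleared even when the first call fails, so a later
call returns 0 without retrying; callers are expected to drop the client on
the first error, as nfs4_pnfs_ds_create does via out_put.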
 fs/nfs/internal.h          |    2 +-
 fs/nfs/nfs4_fs.h           |   12 ++++++++++++
 fs/nfs/nfs4filelayoutdev.c |   30 +++++++++++-------------------
 fs/nfs/nfs4proc.c          |   22 +++++++++++++++++++++-
 4 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 764a235..5518d61 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -216,7 +216,7 @@ extern const u32 nfs41_maxwrite_overhead;
 extern struct rpc_procinfo nfs4_procedures[];
 #endif
 
-extern int nfs4_recover_expired_lease(struct nfs_client *clp);
+extern int nfs4_init_ds_session(struct nfs_client *clp);
 
 /* proc.c */
 void nfs_close_context(struct nfs_open_context *ctx, int is_sync);
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 5d84642..5dc378e 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -266,6 +266,12 @@ is_ds_only_client(struct nfs_client *clp)
 	return (clp->cl_exchange_flags & EXCHGID4_FLAG_MASK_PNFS) ==
 		EXCHGID4_FLAG_USE_PNFS_DS;
 }
+
+static inline bool
+is_ds_client(struct nfs_client *clp)
+{
+	return clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_DS;
+}
 #else /* CONFIG_NFS_v4_1 */
 static inline struct nfs4_session *nfs4_get_session(const struct nfs_server *server)
 {
@@ -289,6 +295,12 @@ is_ds_only_client(struct nfs_client *clp)
 {
 	return false;
 }
+
+static inline bool
+is_ds_client(struct nfs_client *clp)
+{
+	return false;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 extern const struct nfs4_minor_version_ops *nfs_v4_minor_ops[];
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index c46cb00..6557d1c 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -113,7 +113,7 @@ nfs4_pnfs_ds_create(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
 {
 	struct nfs_client *clp;
 	struct sockaddr_in sin;
-	int err = 0;
+	int status = 0;
 
 	dprintk("--> %s ip:port %x:%hu au_flavor %d\n", __func__,
 		ntohl(ds->ds_ip_addr), ntohs(ds->ds_port),
@@ -126,7 +126,7 @@ nfs4_pnfs_ds_create(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
 	clp = nfs4_set_ds_client(mds_srv->nfs_client, (struct sockaddr *)&sin,
 				 sizeof(sin), IPPROTO_TCP);
 	if (IS_ERR(clp)) {
-		err = PTR_ERR(clp);
+		status = PTR_ERR(clp);
 		goto out;
 	}
 
@@ -134,27 +134,13 @@ nfs4_pnfs_ds_create(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
 		dprintk("%s [existing] ip=%x, port=%hu\n", __func__,
 			ntohl(ds->ds_ip_addr), ntohs(ds->ds_port));
 
-		if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_DS)) {
-			err = -ENODEV;
+		if (!is_ds_client(clp)) {
+			status = -ENODEV;
 			goto out_put;
 		}
 		goto out;
 	}
 
-	/* Ask for only the EXCHGID4_FLAG_USE_PNFS_DS pNFS role */
-	dprintk("%s EXCHANGE_ID for clp %p\n", __func__, clp);
-	clp->cl_exchange_flags = EXCHGID4_FLAG_USE_PNFS_DS;
-
-	err = nfs4_recover_expired_lease(clp);
-	if (!err)
-		err = nfs4_check_client_ready(clp);
-	if (err)
-		goto out_put;
-
-	if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_DS)) {
-		err = -ENODEV;
-		goto out_put;
-	}
 	/*
 	 * Set DS lease equal to the MDS lease, renewal is scheduled in
 	 * create_session
@@ -164,12 +150,18 @@ nfs4_pnfs_ds_create(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
 	spin_unlock(&mds_srv->nfs_client->cl_lock);
 	clp->cl_last_renewal = jiffies;
 
+	/* New nfs_client */
+	status = nfs4_init_ds_session(clp);
+	if (status)
+		goto out_put;
+
+
 	ds->ds_clp = clp;
 
 	dprintk("%s [new] ip=%x, port=%hu\n", __func__, ntohl(ds->ds_ip_addr),
 		ntohs(ds->ds_port));
 out:
-	return err;
+	return status;
 out_put:
 	nfs_put_client(clp);
 	goto out;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index fb22cbf..1da0ebf 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1591,7 +1591,6 @@ static int nfs4_client_recover_expired_lease(struct nfs_client *clp)
 	}
 	return ret;
 }
-EXPORT_SYMBOL(nfs4_recover_expired_lease);
 
 static int nfs4_recover_expired_lease(struct nfs_server *server)
 {
@@ -5086,6 +5085,27 @@ int nfs4_init_session(struct nfs_server *server)
 	return ret;
 }
 
+int nfs4_init_ds_session(struct nfs_client *clp)
+{
+	struct nfs4_session *session = clp->cl_session;
+	int ret;
+
+	if (!test_and_clear_bit(NFS4_SESSION_INITING, &session->session_state))
+		return 0;
+
+	ret = nfs4_client_recover_expired_lease(clp);
+	if (!ret)
+		/* Test for the DS role */
+		if (!is_ds_client(clp))
+			ret = -ENODEV;
+	if (!ret)
+		ret = nfs4_check_client_ready(clp);
+	return ret;
+
+}
+EXPORT_SYMBOL_GPL(nfs4_init_ds_session);
+
+
 /*
  * Renew the cl_session lease.
  */
-- 
1.6.6



* [PATCH 27/40] pnfs_submit: wave3 rename nfs4_pnfs_ds_create
  2011-02-04 21:33                                                   ` [PATCH 26/40] pnfs-submit: wave3 refactor data server session initialization andros
@ 2011-02-04 21:33                                                     ` andros
  2011-02-04 21:33                                                       ` [PATCH 28/40] pnfs-submit: wave3 turn off pNFS on ds connection failure andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

The function connects to an already created data server rather than creating
one, so rename it to nfs4_ds_connect.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayoutdev.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 6557d1c..9bb13f5 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -109,7 +109,7 @@ _data_server_lookup_locked(u32 ip_addr, u32 port)
  * Currently only support IPv4
  */
 static int
-nfs4_pnfs_ds_create(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
+nfs4_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
 {
 	struct nfs_client *clp;
 	struct sockaddr_in sin;
@@ -574,10 +574,10 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
 	if (!dsaddr->ds_list[ds_idx]->ds_clp) {
 		int err;
 
-		err = nfs4_pnfs_ds_create(NFS_SERVER(lseg->pls_layout->plh_inode),
+		err = nfs4_ds_connect(NFS_SERVER(lseg->pls_layout->plh_inode),
 					  dsaddr->ds_list[ds_idx]);
 		if (err) {
-			printk(KERN_ERR "%s nfs4_pnfs_ds_create error %d\n",
+			printk(KERN_ERR "%s nfs4_ds_connect error %d\n",
 			       __func__, err);
 			return NULL;
 		}
-- 
1.6.6



* [PATCH 28/40] pnfs-submit: wave3 turn off pNFS on ds connection failure
  2011-02-04 21:33                                                     ` [PATCH 27/40] pnfs_submit: wave3 rename nfs4_pnfs_ds_create andros
@ 2011-02-04 21:33                                                       ` andros
  2011-02-04 21:33                                                         ` [PATCH 29/40] pnfs-submit: wave3 rewrite read lseg refcounting andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

If a data server is unavailable, fall back to doing I/O through the MDS.

Mark the deviceid containing the data server as a negative cache entry.
Do not try to connect to any data server on a deviceid marked as a negative
cache entry. Mark any layout that tries to use the marked deviceid as failed.

Inodes with a layout marked as failed will not use the layout for I/O, and
will not perform any more layoutgets.
Inodes without a layout will still do layoutget, but the layout will be
marked as failed as soon as it tries to use the marked deviceid.
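
The fallback described above can be modelled in userspace roughly as follows.
This is only a sketch under stated assumptions: struct devid_model,
struct layout_model and the model_* functions are hypothetical stand-ins for
the kernel structures, the connect stub always fails, and the dc_lock
protection around the flag update is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for NFS4_DEVICE_ID_NEG_ENTRY. */
#define DEVID_NEG_ENTRY 0x1UL

/* Hypothetical model of a deviceid cache node. */
struct devid_model {
	unsigned long flags;	/* models de_flags */
	int connect_attempts;	/* how often the connect stub was called */
};

/* Hypothetical model of the per-inode layout fail bits. */
struct layout_model {
	bool read_failed;	/* models the IOMODE_READ fail bit */
	bool rw_failed;		/* models the IOMODE_RW fail bit */
};

/* Connect stub: counts the attempt and always reports an error. */
int model_ds_connect(struct devid_model *d)
{
	d->connect_attempts++;
	return -1;
}

/* Returns true if the data server may be used for I/O. */
bool model_prepare_ds(struct devid_model *d)
{
	/* Already tried to connect and failed: do not try again. */
	if (d->flags & DEVID_NEG_ENTRY)
		return false;
	if (model_ds_connect(d) != 0) {
		d->flags |= DEVID_NEG_ENTRY;	/* negative cache entry */
		return false;
	}
	return true;
}

/* Returns true if the read went to the data server, false for MDS. */
bool model_read(struct layout_model *lo, struct devid_model *d)
{
	if (lo->read_failed)
		return false;	/* layout already marked, use MDS */
	if (!model_prepare_ds(d)) {
		lo->read_failed = true;
		lo->rw_failed = true;
		return false;
	}
	return true;
}
```

In this model, two layouts sharing one deviceid both end up marked failed,
but only the first read triggers a connection attempt.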

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c    |    6 ++++--
 fs/nfs/nfs4filelayout.h    |    3 +++
 fs/nfs/nfs4filelayoutdev.c |   39 +++++++++++++++++++++++++++++----------
 fs/nfs/pnfs.c              |   18 ++++++++++++++----
 fs/nfs/pnfs.h              |    4 ++++
 5 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index d925af6..9b9a81c 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -146,7 +146,9 @@ filelayout_read_pagelist(struct nfs_read_data *data)
 	idx = nfs4_fl_calc_ds_index(lseg, offset);
 	ds = nfs4_fl_prepare_ds(lseg, idx);
 	if (!ds) {
-		printk(KERN_ERR "%s: prepare_ds failed, use MDS\n", __func__);
+		/* Either layout fh index faulty, or ds connect failed */
+		set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
+		set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
 		return PNFS_NOT_ATTEMPTED;
 	}
 	dprintk("%s USE DS:ip %x %hu\n", __func__,
@@ -161,7 +163,7 @@ filelayout_read_pagelist(struct nfs_read_data *data)
 	data->args.offset = filelayout_get_dserver_offset(lseg, offset);
 	data->fldata.orig_offset = offset;
 
-	/* Perform an asynchronous read */
+	/* Perform an asynchronous read to ds */
 	nfs_initiate_read(data, ds->ds_clp->cl_rpcclient,
 			  &filelayout_read_call_ops);
 	return PNFS_ATTEMPTED;
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index f884b0c..7e33bd8 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -96,5 +96,8 @@ extern struct nfs4_file_layout_dsaddr *
 nfs4_fl_find_get_deviceid(struct nfs_client *, struct nfs4_deviceid *dev_id);
 struct nfs4_file_layout_dsaddr *
 get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id);
+void filelayout_mark_devid_negative(struct nfs_client *clp,
+				    struct pnfs_deviceid_node *devid,
+				    int err, u32 ds_ipaddr);
 
 #endif /* FS_NFS_NFS4FILELAYOUT_H */
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 9bb13f5..8642109 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -558,27 +558,46 @@ nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, loff_t offset)
 		i = _nfs4_fl_calc_j_index(lseg, offset);
 	return flseg->fh_array[i];
 }
+void
+filelayout_mark_devid_negative(struct nfs_client *mds_clp,
+			       struct pnfs_deviceid_node *devid,
+			       int err, u32 ds_addr)
+{
+	u32 *p = (u32 *)&devid->de_id;
+
+	printk(KERN_ERR "NFS: data server %x connection error %d."
+			" Deviceid [%x%x%x%x] marked out of use.\n",
+			ds_addr, err, p[0], p[1], p[2], p[3]);
+
+	pnfs_mark_devid_negative(mds_clp, devid);
+}
 
 struct nfs4_pnfs_ds *
 nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
 {
-	struct nfs4_file_layout_dsaddr *dsaddr;
+	struct nfs4_file_layout_dsaddr *dsaddr = FILELAYOUT_LSEG(lseg)->dsaddr;
+	struct nfs4_pnfs_ds *ds = dsaddr->ds_list[ds_idx];
 
-	dsaddr = FILELAYOUT_LSEG(lseg)->dsaddr;
-	if (dsaddr->ds_list[ds_idx] == NULL) {
-		printk(KERN_ERR "%s: No data server for device id!\n",
-			__func__);
+	if (ds == NULL) {
+		printk(KERN_ERR "%s: No data server for offset index %d\n",
+			__func__, ds_idx);
 		return NULL;
 	}
 
-	if (!dsaddr->ds_list[ds_idx]->ds_clp) {
+	if (!ds->ds_clp) {
+		struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode);
 		int err;
 
-		err = nfs4_ds_connect(NFS_SERVER(lseg->pls_layout->plh_inode),
-					  dsaddr->ds_list[ds_idx]);
+		/* Already tried to connect, don't try again */
+		if (dsaddr->deviceid.de_flags & NFS4_DEVICE_ID_NEG_ENTRY) {
+			dprintk("%s Deviceid marked out of use\n", __func__);
+			return NULL;
+		}
+		err = nfs4_ds_connect(s, ds);
 		if (err) {
-			printk(KERN_ERR "%s nfs4_ds_connect error %d\n",
-			       __func__, err);
+			filelayout_mark_devid_negative(s->nfs_client,
+						       &dsaddr->deviceid, err,
+						       ntohl(ds->ds_ip_addr));
 			return NULL;
 		}
 	}
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 4c49109..72786ec 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -759,15 +759,16 @@ pnfs_update_layout(struct inode *ino,
 		dprintk("%s matches recall, use MDS\n", __func__);
 		goto out_unlock;
 	}
+
+	/* If LAYOUTGET or pNFS I/O already failed once we don't try again */
+	if (test_bit(lo_fail_bit(iomode), &nfsi->layout->plh_flags))
+		goto out_unlock;
+
 	/* Check to see if the layout for the given range already exists */
 	lseg = pnfs_find_lseg(lo, iomode);
 	if (lseg)
 		goto out_unlock;
 
-	/* if LAYOUTGET already failed once we don't try again */
-	if (test_bit(lo_fail_bit(iomode), &nfsi->layout->plh_flags))
-		goto out_unlock;
-
 	if (pnfs_layoutgets_blocked(lo, NULL, 0))
 		goto out_unlock;
 	atomic_inc(&lo->plh_outstanding);
@@ -1089,3 +1090,12 @@ pnfs_put_deviceid_cache(struct nfs_client *clp)
 	}
 }
 EXPORT_SYMBOL_GPL(pnfs_put_deviceid_cache);
+
+void
+pnfs_mark_devid_negative(struct nfs_client *clp, struct pnfs_deviceid_node *d)
+{
+	spin_lock(&clp->cl_devid_cache->dc_lock);
+	d->de_flags |= NFS4_DEVICE_ID_NEG_ENTRY;
+	spin_unlock(&clp->cl_devid_cache->dc_lock);
+}
+EXPORT_SYMBOL_GPL(pnfs_mark_devid_negative);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index cbbcdfa..25a4e25 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -135,6 +135,8 @@ struct pnfs_deviceid_node {
 	struct hlist_node	de_node;
 	struct nfs4_deviceid	de_id;
 	atomic_t		de_ref;
+	unsigned long		de_flags;
+#define NFS4_DEVICE_ID_NEG_ENTRY		1
 };
 
 struct pnfs_deviceid_cache {
@@ -155,6 +157,8 @@ extern struct pnfs_deviceid_node *pnfs_add_deviceid(
 				struct pnfs_deviceid_node *);
 extern void pnfs_put_deviceid(struct pnfs_deviceid_cache *c,
 			      struct pnfs_deviceid_node *devid);
+extern void pnfs_mark_devid_negative(struct nfs_client *clp,
+				     struct pnfs_deviceid_node *d);
 
 extern int pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
 extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
-- 
1.6.6



* [PATCH 29/40] pnfs-submit: wave3 rewrite read lseg refcounting
  2011-02-04 21:33                                                       ` [PATCH 28/40] pnfs-submit: wave3 turn off pNFS on ds connection failure andros
@ 2011-02-04 21:33                                                         ` andros
  2011-02-04 21:33                                                           ` [PATCH 30/40] pnfs-submit: wave3 let LAYOUTGET distinguish between read and write calls andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Fred Isaman

From: Fred Isaman <iisaman@netapp.com>

Shift lseg refcounting from nfs_page to nfs_read_data.

Note this will cause all writes to get a READ layout, but since
we don't actually use the layout at all for write yet, that's OK.

Also note that this and my prior patch change how the drivers need to
implement pg_test.  There are no longer two lsegs to compare, one for each
nfs_page.  Instead, there is (potentially) one attached to the pageio_desc
which, if it exists, is known to include the first page.

Remove the now-unused ctx and pages arguments from pnfs_pageio_init_read.
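
The ownership rule after this change can be sketched in userspace as follows.
This is a simplified model, not the kernel code: struct lseg_model,
struct read_data_model and the model_* helpers are hypothetical names, a
plain int replaces the atomic refcount and memory barrier, and a NULL-tolerant
put is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a layout segment with a plain refcount. */
struct lseg_model {
	int refcount;
};

/* Hypothetical model of nfs_read_data, which now owns an lseg reference. */
struct read_data_model {
	struct lseg_model *lseg;
};

/* Take a reference and return the pointer; NULL is passed through. */
struct lseg_model *model_get_lseg(struct lseg_model *lseg)
{
	if (lseg)
		lseg->refcount++;
	return lseg;
}

/* Drop a reference; NULL is accepted so callers need not check. */
void model_put_lseg(struct lseg_model *lseg)
{
	if (!lseg)
		return;
	assert(lseg->refcount > 0);
	lseg->refcount--;
}

/* Each read data takes its own reference at setup time ... */
void model_read_setup(struct read_data_model *data, struct lseg_model *lseg)
{
	data->lseg = model_get_lseg(lseg);
}

/* ... and drops it when the read data is released. */
void model_read_release(struct read_data_model *data)
{
	model_put_lseg(data->lseg);
	data->lseg = NULL;
}
```

Returning the pointer from the get helper is what lets the assignment and
the reference grab collapse into one statement, including the no-layout
(NULL) case.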

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Acked-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c  |   10 ++------
 fs/nfs/pagelist.c        |   29 +++++++++++++------------
 fs/nfs/pnfs.c            |   52 ++++++---------------------------------------
 fs/nfs/pnfs.h            |   21 ++++++++++--------
 fs/nfs/read.c            |   35 ++++++++++++++++---------------
 fs/nfs/write.c           |    6 ++--
 include/linux/nfs_page.h |    8 ++----
 include/linux/nfs_xdr.h  |    2 +-
 8 files changed, 62 insertions(+), 101 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 9b9a81c..c7ba5bc 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -118,8 +118,6 @@ static void filelayout_read_release(void *data)
 {
 	struct nfs_read_data *rdata = (struct nfs_read_data *)data;
 
-	put_lseg(rdata->pdata.lseg);
-	rdata->pdata.lseg = NULL;
 	rdata->pdata.call_ops->rpc_release(data);
 }
 
@@ -132,7 +130,7 @@ struct rpc_call_ops filelayout_read_call_ops = {
 static enum pnfs_try_status
 filelayout_read_pagelist(struct nfs_read_data *data)
 {
-	struct pnfs_layout_segment *lseg = data->pdata.lseg;
+	struct pnfs_layout_segment *lseg = data->lseg;
 	struct nfs4_pnfs_ds *ds;
 	loff_t offset = data->args.offset;
 	u32 idx;
@@ -360,8 +358,6 @@ filelayout_free_lseg(struct pnfs_layout_segment *lseg)
  *
  * return 1 :  coalesce page
  * return 0 :  don't coalesce page
- *
- * By the time this is called, we know req->wb_lseg == prev->wb_lseg
  */
 int
 filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
@@ -370,11 +366,11 @@ filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
 	u64 p_stripe, r_stripe;
 	u32 stripe_unit;
 
-	if (!req->wb_lseg)
+	if (!pgio->pg_lseg)
 		return 1;
 	p_stripe = (u64)prev->wb_index << PAGE_CACHE_SHIFT;
 	r_stripe = (u64)req->wb_index << PAGE_CACHE_SHIFT;
-	stripe_unit = FILELAYOUT_LSEG(req->wb_lseg)->stripe_unit;
+	stripe_unit = FILELAYOUT_LSEG(pgio->pg_lseg)->stripe_unit;
 
 	do_div(p_stripe, stripe_unit);
 	do_div(r_stripe, stripe_unit);
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 9a27592..ea3b7f8 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -54,8 +54,7 @@ nfs_page_free(struct nfs_page *p)
 struct nfs_page *
 nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
 		   struct page *page,
-		   unsigned int offset, unsigned int count,
-		   struct pnfs_layout_segment *lseg)
+		   unsigned int offset, unsigned int count)
 {
 	struct nfs_page		*req;
 
@@ -86,9 +85,6 @@ nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
 	req->wb_bytes   = count;
 	req->wb_context = get_nfs_open_context(ctx);
 	kref_init(&req->wb_kref);
-	req->wb_lseg    = lseg;
-	if (lseg)
-		get_lseg(lseg);
 	return req;
 }
 
@@ -164,10 +160,6 @@ void nfs_clear_request(struct nfs_page *req)
 		put_nfs_open_context(ctx);
 		req->wb_context = NULL;
 	}
-	if (req->wb_lseg != NULL) {
-		put_lseg(req->wb_lseg);
-		req->wb_lseg = NULL;
-	}
 }
 
 /**
@@ -221,7 +213,7 @@ nfs_wait_on_request(struct nfs_page *req)
  */
 void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
 		     struct inode *inode,
-		     int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
+		     int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *),
 		     size_t bsize,
 		     int io_flags)
 {
@@ -234,6 +226,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
 	desc->pg_doio = doio;
 	desc->pg_ioflags = io_flags;
 	desc->pg_error = 0;
+	desc->pg_lseg = NULL;
 }
 
 /**
@@ -263,8 +256,9 @@ static int nfs_can_coalesce_requests(struct nfs_page *prev,
 		return 0;
 	if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
 		return 0;
-	if (req->wb_lseg != prev->wb_lseg)
-		return 0;
+	/* For non-whole file layouts, need to check that req is inside of
+	 * pgio->pg_test.
+	 */
 #ifdef CONFIG_NFS_V4_1
 	if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
 		return 0;
@@ -303,8 +297,13 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
 		prev = nfs_list_entry(desc->pg_list.prev);
 		if (!nfs_can_coalesce_requests(prev, req, desc))
 			return 0;
-	} else
+	} else {
+		put_lseg(desc->pg_lseg);
 		desc->pg_base = req->wb_pgbase;
+		desc->pg_lseg = pnfs_update_layout(desc->pg_inode,
+						   req->wb_context,
+						   IOMODE_READ);
+	}
 	nfs_list_remove_request(req);
 	nfs_list_add_request(req, &desc->pg_list);
 	desc->pg_count = newlen;
@@ -322,7 +321,8 @@ static void nfs_pageio_doio(struct nfs_pageio_descriptor *desc)
 					  nfs_page_array_len(desc->pg_base,
 							     desc->pg_count),
 					  desc->pg_count,
-					  desc->pg_ioflags);
+					  desc->pg_ioflags,
+					  desc->pg_lseg);
 		if (error < 0)
 			desc->pg_error = error;
 		else
@@ -360,6 +360,7 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
 void nfs_pageio_complete(struct nfs_pageio_descriptor *desc)
 {
 	nfs_pageio_doio(desc);
+	put_lseg(desc->pg_lseg);
 }
 
 /**
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 72786ec..76a5e00 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -716,8 +716,7 @@ pnfs_find_lseg(struct pnfs_layout_hdr *lo, u32 iomode)
 	list_for_each_entry(lseg, &lo->plh_segs, pls_list) {
 		if (test_bit(NFS_LSEG_VALID, &lseg->pls_flags) &&
 		    is_matching_lseg(lseg, iomode)) {
-			get_lseg(lseg);
-			ret = lseg;
+			ret = get_lseg(lseg);
 			break;
 		}
 		if (cmp_layout(iomode, lseg->pls_range.iomode) > 0)
@@ -850,8 +849,7 @@ pnfs_layout_process(struct nfs4_layoutget *lgp)
 	}
 	init_lseg(lo, lseg);
 	lseg->pls_range = res->range;
-	get_lseg(lseg);
-	*lgp->lsegpp = lseg;
+	*lgp->lsegpp = get_lseg(lseg);
 	pnfs_insert_layout(lo, lseg);
 
 	if (res->return_on_close) {
@@ -872,20 +870,13 @@ out_forget_reply:
 	goto out;
 }
 
-void
+static void
 pnfs_set_pg_test(struct inode *inode, struct nfs_pageio_descriptor *pgio)
 {
-	struct pnfs_layout_hdr *lo;
 	struct pnfs_layoutdriver_type *ld;
 
-	pgio->pg_test = NULL;
-
-	lo = NFS_I(inode)->layout;
 	ld = NFS_SERVER(inode)->pnfs_curr_ld;
-	if (!ld || !lo)
-		return;
-
-	pgio->pg_test = ld->pg_test;
+	pgio->pg_test = (ld ? ld->pg_test : NULL);
 }
 
 /*
@@ -893,35 +884,11 @@ pnfs_set_pg_test(struct inode *inode, struct nfs_pageio_descriptor *pgio)
  */
 void
 pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
-		  struct inode *inode,
-		  struct nfs_open_context *ctx,
-		  struct list_head *pages)
+		  struct inode *inode)
 {
-	struct nfs_server *nfss = NFS_SERVER(inode);
-
-	pgio->pg_test = NULL;
-	pgio->pg_lseg = NULL;
-
-	if (!pnfs_enabled_sb(nfss))
-		return;
-
-	pgio->pg_lseg = pnfs_update_layout(inode, ctx, IOMODE_READ);
-	if (!pgio->pg_lseg)
-		return;
-
 	pnfs_set_pg_test(inode, pgio);
 }
 
-static void _pnfs_clear_lseg_from_pages(struct list_head *head)
-{
-	struct nfs_page *req;
-
-	list_for_each_entry(req, head, wb_list) {
-		put_lseg(req->wb_lseg);
-		req->wb_lseg = NULL;
-	}
-}
-
 /*
  * Call the appropriate parallel I/O subsystem read function.
  * If no I/O device driver exists, or one does match the returned
@@ -933,7 +900,6 @@ pnfs_try_to_read_data(struct nfs_read_data *rdata,
 {
 	struct inode *inode = rdata->inode;
 	struct nfs_server *nfss = NFS_SERVER(inode);
-	struct pnfs_layout_segment *lseg = rdata->req->wb_lseg;
 	enum pnfs_try_status trypnfs;
 
 	rdata->pdata.call_ops = call_ops;
@@ -941,14 +907,10 @@ pnfs_try_to_read_data(struct nfs_read_data *rdata,
 	dprintk("%s: Reading ino:%lu %u@%llu\n",
 		__func__, inode->i_ino, rdata->args.count, rdata->args.offset);
 
-	get_lseg(lseg);
-
-	rdata->pdata.lseg = lseg;
 	trypnfs = nfss->pnfs_curr_ld->read_pagelist(rdata);
 	if (trypnfs == PNFS_NOT_ATTEMPTED) {
-		rdata->pdata.lseg = NULL;
-		put_lseg(lseg);
-		_pnfs_clear_lseg_from_pages(&rdata->pages);
+		put_lseg(rdata->lseg);
+		rdata->lseg = NULL;
 	} else {
 		nfs_inc_stats(inode, NFSIOS_PNFS_READ);
 	}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 25a4e25..6a99c33 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -178,8 +178,7 @@ void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
 void unset_pnfs_layoutdriver(struct nfs_server *);
 enum pnfs_try_status pnfs_try_to_read_data(struct nfs_read_data *,
 					    const struct rpc_call_ops *);
-void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *,
-			   struct nfs_open_context *, struct list_head *);
+void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *);
 int pnfs_layout_process(struct nfs4_layoutget *lgp);
 void pnfs_free_lseg_list(struct list_head *tmp_list);
 void pnfs_destroy_layout(struct nfs_inode *);
@@ -206,10 +205,14 @@ static inline int lo_fail_bit(u32 iomode)
 			 NFS_LAYOUT_RW_FAILED : NFS_LAYOUT_RO_FAILED;
 }
 
-static inline void get_lseg(struct pnfs_layout_segment *lseg)
+static inline struct pnfs_layout_segment *
+get_lseg(struct pnfs_layout_segment *lseg)
 {
-	atomic_inc(&lseg->pls_refcount);
-	smp_mb__after_atomic_inc();
+	if (lseg) {
+		atomic_inc(&lseg->pls_refcount);
+		smp_mb__after_atomic_inc();
+	}
+	return lseg;
 }
 
 /* Return true if a layout driver is being used for this mountpoint */
@@ -228,8 +231,10 @@ static inline void pnfs_destroy_layout(struct nfs_inode *nfsi)
 {
 }
 
-static inline void get_lseg(struct pnfs_layout_segment *lseg)
+static inline struct pnfs_layout_segment *
+get_lseg(struct pnfs_layout_segment *lseg)
 {
+	return NULL;
 }
 
 static inline void put_lseg(struct pnfs_layout_segment *lseg)
@@ -281,10 +286,8 @@ static inline void unset_pnfs_layoutdriver(struct nfs_server *s)
 }
 
 static inline void
-pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio, struct inode *ino,
-		      struct nfs_open_context *ctx, struct list_head *pages)
+pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio, struct inode *ino)
 {
-	pgio->pg_lseg = NULL;
 }
 
 #endif /* CONFIG_NFS_V4_1 */
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 43eb6a2..0db6203 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -32,8 +32,8 @@
 
 #define NFSDBG_FACILITY		NFSDBG_PAGECACHE
 
-static int nfs_pagein_multi(struct inode *, struct list_head *, unsigned int, size_t, int);
-static int nfs_pagein_one(struct inode *, struct list_head *, unsigned int, size_t, int);
+static int nfs_pagein_multi(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *);
+static int nfs_pagein_one(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *);
 static const struct rpc_call_ops nfs_read_partial_ops;
 static const struct rpc_call_ops nfs_read_full_ops;
 
@@ -73,6 +73,7 @@ void nfs_readdata_free(struct nfs_read_data *p)
 static void nfs_readdata_release(struct nfs_read_data *rdata)
 {
 	put_nfs_open_context(rdata->args.context);
+	put_lseg(rdata->lseg);
 	nfs_readdata_free(rdata);
 }
 
@@ -125,9 +126,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
 	len = nfs_page_length(page);
 	if (len == 0)
 		return nfs_return_empty_page(page);
-	lseg = pnfs_update_layout(inode, ctx, IOMODE_READ);
-	new = nfs_create_request(ctx, inode, page, 0, len, lseg);
-	put_lseg(lseg);
+	new = nfs_create_request(ctx, inode, page, 0, len);
 	if (IS_ERR(new)) {
 		unlock_page(page);
 		return PTR_ERR(new);
@@ -136,10 +135,12 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
 		zero_user_segment(page, len, PAGE_CACHE_SIZE);
 
 	nfs_list_add_request(new, &one_request);
+	lseg = pnfs_update_layout(inode, ctx, IOMODE_READ);
 	if (NFS_SERVER(inode)->rsize < PAGE_CACHE_SIZE)
-		nfs_pagein_multi(inode, &one_request, 1, len, 0);
+		nfs_pagein_multi(inode, &one_request, 1, len, 0, lseg);
 	else
-		nfs_pagein_one(inode, &one_request, 1, len, 0);
+		nfs_pagein_one(inode, &one_request, 1, len, 0, lseg);
+	put_lseg(lseg);
 	return 0;
 }
 
@@ -202,7 +203,7 @@ EXPORT_SYMBOL(nfs_initiate_read);
 static int pnfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
 		       const struct rpc_call_ops *call_ops)
 {
-	if (data->req->wb_lseg &&
+	if (data->lseg &&
 	    (pnfs_try_to_read_data(data, call_ops) == PNFS_ATTEMPTED))
 		return 0;
 
@@ -214,13 +215,15 @@ static int pnfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
  */
 static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
 		const struct rpc_call_ops *call_ops,
-		unsigned int count, unsigned int offset)
+		unsigned int count, unsigned int offset,
+		struct pnfs_layout_segment *lseg)
 {
 	struct inode *inode = req->wb_context->path.dentry->d_inode;
 
 	data->req	  = req;
 	data->inode	  = inode;
 	data->cred	  = req->wb_context->cred;
+	data->lseg	  = get_lseg(lseg);
 
 	data->args.fh     = NFS_FH(inode);
 	data->args.offset = req_offset(req) + offset;
@@ -264,7 +267,7 @@ nfs_async_read_error(struct list_head *head)
  * won't see the new data until our attribute cache is updated.  This is more
  * or less conventional NFS client behavior.
  */
-static int nfs_pagein_multi(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int flags)
+static int nfs_pagein_multi(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int flags, struct pnfs_layout_segment *lseg)
 {
 	struct nfs_page *req = nfs_list_entry(head->next);
 	struct page *page = req->wb_page;
@@ -304,7 +307,7 @@ static int nfs_pagein_multi(struct inode *inode, struct list_head *head, unsigne
 		if (nbytes < rsize)
 			rsize = nbytes;
 		ret2 = nfs_read_rpcsetup(req, data, &nfs_read_partial_ops,
-				  rsize, offset);
+					 rsize, offset, lseg);
 		if (ret == 0)
 			ret = ret2;
 		offset += rsize;
@@ -324,7 +327,7 @@ out_bad:
 	return -ENOMEM;
 }
 
-static int nfs_pagein_one(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int flags)
+static int nfs_pagein_one(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int flags, struct pnfs_layout_segment *lseg)
 {
 	struct nfs_page		*req;
 	struct page		**pages;
@@ -345,7 +348,7 @@ static int nfs_pagein_one(struct inode *inode, struct list_head *head, unsigned
 	}
 	req = nfs_list_entry(data->pages.next);
 
-	return nfs_read_rpcsetup(req, data, &nfs_read_full_ops, count, 0);
+	return nfs_read_rpcsetup(req, data, &nfs_read_full_ops, count, 0, lseg);
 out_bad:
 	nfs_async_read_error(head);
 	return ret;
@@ -606,8 +609,7 @@ readpage_async_filler(void *data, struct page *page)
 	if (len == 0)
 		return nfs_return_empty_page(page);
 
-	new = nfs_create_request(desc->ctx, inode, page, 0, len,
-				 desc->pgio->pg_lseg);
+	new = nfs_create_request(desc->ctx, inode, page, 0, len);
 	if (IS_ERR(new))
 		goto out_error;
 
@@ -663,7 +665,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
 	if (ret == 0)
 		goto read_complete; /* all pages were read */
 
-	pnfs_pageio_init_read(&pgio, inode, desc.ctx, pages);
+	pnfs_pageio_init_read(&pgio, inode);
 	if (rsize < PAGE_CACHE_SIZE)
 		nfs_pageio_init(&pgio, inode, nfs_pagein_multi, rsize, 0);
 	else
@@ -672,7 +674,6 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
 	ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
 
 	nfs_pageio_complete(&pgio);
-	put_lseg(pgio.pg_lseg);
 	npages = (pgio.pg_bytes_written + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	nfs_add_stats(inode, NFSIOS_READPAGES, npages);
 read_complete:
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 6b87b03..004c28b 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -651,7 +651,7 @@ static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
 	req = nfs_try_to_update_request(inode, page, offset, bytes);
 	if (req != NULL)
 		goto out;
-	req = nfs_create_request(ctx, inode, page, offset, bytes, NULL);
+	req = nfs_create_request(ctx, inode, page, offset, bytes);
 	if (IS_ERR(req))
 		goto out;
 	error = nfs_inode_add_request(inode, req);
@@ -879,7 +879,7 @@ static void nfs_redirty_request(struct nfs_page *req)
  * Generate multiple small requests to write out a single
  * contiguous dirty area on one page.
  */
-static int nfs_flush_multi(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int how)
+static int nfs_flush_multi(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int how, struct pnfs_layout_segment *lseg)
 {
 	struct nfs_page *req = nfs_list_entry(head->next);
 	struct page *page = req->wb_page;
@@ -946,7 +946,7 @@ out_bad:
  * This is the case if nfs_updatepage detects a conflicting request
  * that has been written but not committed.
  */
-static int nfs_flush_one(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int how)
+static int nfs_flush_one(struct inode *inode, struct list_head *head, unsigned int npages, size_t count, int how, struct pnfs_layout_segment *lseg)
 {
 	struct nfs_page		*req;
 	struct page		**pages;
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 484c5b9..488f27b 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -49,7 +49,6 @@ struct nfs_page {
 	struct kref		wb_kref;	/* reference count */
 	unsigned long		wb_flags;
 	struct nfs_writeverf	wb_verf;	/* Commit cookie */
-	struct pnfs_layout_segment *wb_lseg;	/* Pnfs layout info */
 };
 
 struct nfs_pageio_descriptor {
@@ -60,7 +59,7 @@ struct nfs_pageio_descriptor {
 	unsigned int		pg_base;
 
 	struct inode		*pg_inode;
-	int			(*pg_doio)(struct inode *, struct list_head *, unsigned int, size_t, int);
+	int			(*pg_doio)(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *);
 	int 			pg_ioflags;
 	int			pg_error;
 	struct pnfs_layout_segment *pg_lseg;
@@ -75,8 +74,7 @@ extern	struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
 					    struct inode *inode,
 					    struct page *page,
 					    unsigned int offset,
-					    unsigned int count,
-					    struct pnfs_layout_segment *lseg);
+					    unsigned int count);
 extern	void nfs_clear_request(struct nfs_page *req);
 extern	void nfs_release_request(struct nfs_page *req);
 
@@ -85,7 +83,7 @@ extern	int nfs_scan_list(struct nfs_inode *nfsi, struct list_head *dst,
 			  pgoff_t idx_start, unsigned int npages, int tag);
 extern	void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
 			     struct inode *inode,
-			     int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
+			     int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int, struct pnfs_layout_segment *),
 			     size_t bsize,
 			     int how);
 extern	int nfs_pageio_add_request(struct nfs_pageio_descriptor *,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index bd84684..887aff3 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1008,7 +1008,6 @@ struct nfs_page;
 
 /* pnfs-specific data needed for read, write, and commit calls */
 struct pnfs_call_data {
-	struct pnfs_layout_segment *lseg;
 	const struct rpc_call_ops *call_ops;
 	u32			orig_count;	/* for retry via MDS */
 	u8			how;		/* for FLUSH_STABLE */
@@ -1033,6 +1032,7 @@ struct nfs_read_data {
 	unsigned int		npages;	/* Max length of pagevec */
 	struct nfs_readargs args;
 	struct nfs_readres  res;
+	struct pnfs_layout_segment *lseg;
 #ifdef CONFIG_NFS_V4
 	unsigned long		timestamp;	/* For lease renewal */
 #endif
-- 
1.6.6



* [PATCH 30/40] pnfs-submit: wave3 let LAYOUTGET distinguish between read and write calls
  2011-02-04 21:33                                                         ` [PATCH 29/40] pnfs-submit: wave3 rewrite read lseg refcounting andros
@ 2011-02-04 21:33                                                           ` andros
  2011-02-04 21:33                                                             ` [PATCH 31/40] pnfs_submit wave3 remove struct pnfs_fl_call_data andros
  2011-02-04 21:59                                                             ` [PATCH 30/40] pnfs-submit: wave3 let LAYOUTGET distinguish between read and write calls Fred Isaman
  0 siblings, 2 replies; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Fred Isaman, Andy Adamson

From: Fred Isaman <iisaman@netapp.com>

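An earlier version of this patch did this by introducing pgio->pg_iswrite.
For wave3, pg_iswrite is removed (it will be added back for wave4) and
LAYOUTGET is simply not sent on write: nfs_pageio_init_write() sets
pgio->pg_test to NULL, and pnfs_update_layout() is called only when pg_test
is set.

Also remove the CONFIG_NFS_V4_1 guards around pg_test in
struct nfs_pageio_descriptor.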
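
For reviewers, a minimal userspace sketch of the gating described above. It is
not the kernel code: every sketch_* name is hypothetical, the layout lookup is
a stand-in for pnfs_update_layout(), and lseg refcounting is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, not the real definitions. */
struct sketch_lseg {
	int id;
};

struct sketch_pgio {
	/* Set for read descriptors, NULL for write descriptors in wave3. */
	int (*pg_test)(const struct sketch_pgio *pgio);
	struct sketch_lseg *pg_lseg;
	int layoutget_calls;	/* number of simulated layout lookups */
};

static struct sketch_lseg sketch_read_lseg = { 1 };

/* Placeholder coalescing test; always allows coalescing. */
static int sketch_pg_test(const struct sketch_pgio *pgio)
{
	(void)pgio;
	return 1;
}

/* Stand-in for pnfs_update_layout(..., IOMODE_READ). */
static struct sketch_lseg *sketch_update_layout(struct sketch_pgio *pgio)
{
	pgio->layoutget_calls++;
	return &sketch_read_lseg;
}

void sketch_init_read(struct sketch_pgio *pgio)
{
	assert(pgio != NULL);
	pgio->pg_test = sketch_pg_test;
	pgio->pg_lseg = NULL;
	pgio->layoutget_calls = 0;
}

void sketch_init_write(struct sketch_pgio *pgio)
{
	assert(pgio != NULL);
	pgio->pg_test = NULL;	/* as nfs_pageio_init_write() does in this patch */
	pgio->pg_lseg = NULL;
	pgio->layoutget_calls = 0;
}

/* Mirrors the branch that starts a new request list. */
void sketch_start_list(struct sketch_pgio *pgio)
{
	assert(pgio != NULL);
	pgio->pg_lseg = NULL;	/* sketch only: refcounting is not modeled */
	if (pgio->pg_test)
		pgio->pg_lseg = sketch_update_layout(pgio);
}
```

Calling sketch_start_list() on a descriptor set up by sketch_init_read()
performs one lookup; on one set up by sketch_init_write() it performs none.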

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/pagelist.c        |    9 ++++-----
 fs/nfs/write.c           |    3 +++
 include/linux/nfs_page.h |    2 --
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index ea3b7f8..cf09cb7 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -259,10 +259,8 @@ static int nfs_can_coalesce_requests(struct nfs_page *prev,
 	/* For non-whole file layouts, need to check that req is inside of
 	 * pgio->pg_test.
 	 */
-#ifdef CONFIG_NFS_V4_1
 	if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
 		return 0;
-#endif /* CONFIG_NFS_V4_1 */
 	return 1;
 }
 
@@ -300,9 +298,10 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
 	} else {
 		put_lseg(desc->pg_lseg);
 		desc->pg_base = req->wb_pgbase;
-		desc->pg_lseg = pnfs_update_layout(desc->pg_inode,
-						   req->wb_context,
-						   IOMODE_READ);
+		if (desc->pg_test)
+			desc->pg_lseg = pnfs_update_layout(desc->pg_inode,
+							   req->wb_context,
+							   IOMODE_READ);
 	}
 	nfs_list_remove_request(req);
 	nfs_list_add_request(req, &desc->pg_list);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 004c28b..aca0268 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -28,6 +28,7 @@
 #include "iostat.h"
 #include "nfs4_fs.h"
 #include "fscache.h"
+#include "pnfs.h"
 
 #define NFSDBG_FACILITY		NFSDBG_PAGECACHE
 
@@ -982,6 +983,8 @@ static void nfs_pageio_init_write(struct nfs_pageio_descriptor *pgio,
 {
 	size_t wsize = NFS_SERVER(inode)->wsize;
 
+	pgio->pg_test = NULL;
+
 	if (wsize < PAGE_CACHE_SIZE)
 		nfs_pageio_init(pgio, inode, nfs_flush_multi, wsize, ioflags);
 	else
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 488f27b..ba88ff4 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -63,9 +63,7 @@ struct nfs_pageio_descriptor {
 	int 			pg_ioflags;
 	int			pg_error;
 	struct pnfs_layout_segment *pg_lseg;
-#ifdef CONFIG_NFS_V4_1
 	int			(*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
-#endif /* CONFIG_NFS_V4_1 */
 };
 
 #define NFS_WBACK_BUSY(req)	(test_bit(PG_BUSY,&(req)->wb_flags))
-- 
1.6.6



* [PATCH 31/40] pnfs_submit wave3 remove struct pnfs_fl_call_data
  2011-02-04 21:33                                                           ` [PATCH 30/40] pnfs-submit: wave3 let LAYOUTGET distinguish between read and write calls andros
@ 2011-02-04 21:33                                                             ` andros
  2011-02-04 21:33                                                               ` [PATCH 32/40] pnfs_submit: wave3 get rid of pnfs_call_data andros
  2011-02-04 21:59                                                             ` [PATCH 30/40] pnfs-submit: wave3 let LAYOUTGET distinguish between read and write calls Fred Isaman
  1 sibling, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |   13 +++++++------
 fs/nfs/nfs4proc.c       |    8 ++++----
 fs/nfs/read.c           |    9 +--------
 include/linux/nfs_xdr.h |   13 ++++---------
 4 files changed, 16 insertions(+), 27 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index c7ba5bc..fa718e1 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -104,10 +104,11 @@ static void filelayout_read_call_done(struct rpc_task *task, void *data)
 {
 	struct nfs_read_data *rdata = (struct nfs_read_data *)data;
 
-	if (rdata->fldata.orig_offset) {
+	dprintk("--> %s task->tk_status %d\n", __func__, task->tk_status);
+	if (rdata->orig_offset) {
 		dprintk("%s new off %llu orig offset %llu\n", __func__,
-			rdata->args.offset, rdata->fldata.orig_offset);
-		rdata->args.offset = rdata->fldata.orig_offset;
+			rdata->args.offset, rdata->orig_offset);
+		rdata->args.offset = rdata->orig_offset;
 	}
 
 	/* Note this may cause RPC to be resent */
@@ -152,14 +153,14 @@ filelayout_read_pagelist(struct nfs_read_data *data)
 	dprintk("%s USE DS:ip %x %hu\n", __func__,
 		ntohl(ds->ds_ip_addr), ntohs(ds->ds_port));
 
-	/* just try the first data server for the index..*/
-	data->fldata.ds_nfs_client = ds->ds_clp;
+	/* No multipath support. Use first DS */
+	data->ds_clp = ds->ds_clp;
 	fh = nfs4_fl_select_ds_fh(lseg, offset);
 	if (fh)
 		data->args.fh = fh;
 
 	data->args.offset = filelayout_get_dserver_offset(lseg, offset);
-	data->fldata.orig_offset = offset;
+	data->orig_offset = offset;
 
 	/* Perform an asynchronous read to ds */
 	nfs_initiate_read(data, ds->ds_clp->cl_rpcclient,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 1da0ebf..213e3f0 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3077,15 +3077,15 @@ static int nfs4_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
 static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
 {
 	struct nfs_server *server = NFS_SERVER(data->inode);
-	struct nfs_client *client = server->nfs_client;
+	struct nfs_client *clp = server->nfs_client;
 
 	dprintk("--> %s\n", __func__);
 
 #ifdef CONFIG_NFS_V4_1
 	/* Is this a DS session */
-	if (data->fldata.ds_nfs_client) {
+	if (data->ds_clp) {
 		dprintk("%s DS read\n", __func__);
-		client = data->fldata.ds_nfs_client;
+		clp = data->ds_clp;
 	}
 #endif /* CONFIG_NFS_V4_1 */
 
@@ -3098,7 +3098,7 @@ static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
 	}
 
 	nfs_invalidate_atime(data->inode);
-	if (task->tk_status > 0 && client == server->nfs_client)
+	if (task->tk_status > 0 && !data->ds_clp)
 		renew_lease(server, data->timestamp);
 	return 0;
 }
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 0db6203..9af3048 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -382,14 +382,7 @@ static void nfs_readpage_retry(struct rpc_task *task, struct nfs_read_data *data
 {
 	struct nfs_readargs *argp = &data->args;
 	struct nfs_readres *resp = &data->res;
-	struct nfs_client *clp = NFS_SERVER(data->inode)->nfs_client;
 
-#ifdef CONFIG_NFS_V4_1
-	if (data->fldata.ds_nfs_client) {
-		dprintk("%s DS read\n", __func__);
-		clp = data->fldata.ds_nfs_client;
-	}
-#endif /* CONFIG_NFS_V4_1 */
 	if (resp->eof || resp->count == argp->count)
 		return;
 
@@ -403,7 +396,7 @@ static void nfs_readpage_retry(struct rpc_task *task, struct nfs_read_data *data
 	argp->offset += resp->count;
 	argp->pgbase += resp->count;
 	argp->count -= resp->count;
-	nfs_restart_rpc(task, clp);
+	nfs_restart_rpc(task, NFS_SERVER(data->inode)->nfs_client);
 }
 
 /*
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 887aff3..4cf522e 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1009,14 +1009,8 @@ struct nfs_page;
 /* pnfs-specific data needed for read, write, and commit calls */
 struct pnfs_call_data {
 	const struct rpc_call_ops *call_ops;
-	u32			orig_count;	/* for retry via MDS */
-	u8			how;		/* for FLUSH_STABLE */
-};
-
-/* files layout-type specific data for read, write, and commit */
-struct pnfs_fl_call_data {
-	struct nfs_client	*ds_nfs_client;
-	__u64			orig_offset;
+	u32			orig_count;     /* for retry via MDS */
+	u8			how;            /* for FLUSH_STABLE */
 };
 #endif /* CONFIG_NFS_V4_1 */
 
@@ -1033,12 +1027,13 @@ struct nfs_read_data {
 	struct nfs_readargs args;
 	struct nfs_readres  res;
 	struct pnfs_layout_segment *lseg;
+	struct nfs_client	*ds_clp;   /* pNFS data server */
 #ifdef CONFIG_NFS_V4
 	unsigned long		timestamp;	/* For lease renewal */
 #endif
 #if defined(CONFIG_NFS_V4_1)
 	struct pnfs_call_data	pdata;
-	struct pnfs_fl_call_data fldata;
+	__u64			orig_offset; /* For filelayout dense stripe */
 #endif /* CONFIG_NFS_V4_1 */
 	struct page		*page_array[NFS_PAGEVEC_SIZE];
 };
-- 
1.6.6



* [PATCH 32/40] pnfs_submit: wave3 get rid of pnfs_call_data
  2011-02-04 21:33                                                             ` [PATCH 31/40] pnfs_submit wave3 remove struct pnfs_fl_call_data andros
@ 2011-02-04 21:33                                                               ` andros
  2011-02-04 21:33                                                                 ` [PATCH 33/40] pnfs-submit wave3 remove CONFIG_NFS_V4 and V4_1 from nfs_read_data andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |    4 ++--
 fs/nfs/pnfs.c           |    2 +-
 include/linux/nfs_xdr.h |   12 +-----------
 3 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index fa718e1..4c841c0 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -112,14 +112,14 @@ static void filelayout_read_call_done(struct rpc_task *task, void *data)
 	}
 
 	/* Note this may cause RPC to be resent */
-	rdata->pdata.call_ops->rpc_call_done(task, data);
+	rdata->call_ops->rpc_call_done(task, data);
 }
 
 static void filelayout_read_release(void *data)
 {
 	struct nfs_read_data *rdata = (struct nfs_read_data *)data;
 
-	rdata->pdata.call_ops->rpc_release(data);
+	rdata->call_ops->rpc_release(data);
 }
 
 struct rpc_call_ops filelayout_read_call_ops = {
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 76a5e00..e96bd82 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -902,7 +902,7 @@ pnfs_try_to_read_data(struct nfs_read_data *rdata,
 	struct nfs_server *nfss = NFS_SERVER(inode);
 	enum pnfs_try_status trypnfs;
 
-	rdata->pdata.call_ops = call_ops;
+	rdata->call_ops = call_ops;
 
 	dprintk("%s: Reading ino:%lu %u@%llu\n",
 		__func__, inode->i_ino, rdata->args.count, rdata->args.offset);
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 4cf522e..3b2e488 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1004,16 +1004,6 @@ struct nfs_page;
 
 #define NFS_PAGEVEC_SIZE	(8U)
 
-#if defined(CONFIG_NFS_V4_1)
-
-/* pnfs-specific data needed for read, write, and commit calls */
-struct pnfs_call_data {
-	const struct rpc_call_ops *call_ops;
-	u32			orig_count;     /* for retry via MDS */
-	u8			how;            /* for FLUSH_STABLE */
-};
-#endif /* CONFIG_NFS_V4_1 */
-
 struct nfs_read_data {
 	int			flags;
 	struct rpc_task		task;
@@ -1032,7 +1022,7 @@ struct nfs_read_data {
 	unsigned long		timestamp;	/* For lease renewal */
 #endif
 #if defined(CONFIG_NFS_V4_1)
-	struct pnfs_call_data	pdata;
+	const struct rpc_call_ops *call_ops;
 	__u64			orig_offset; /* For filelayout dense stripe */
 #endif /* CONFIG_NFS_V4_1 */
 	struct page		*page_array[NFS_PAGEVEC_SIZE];
-- 
1.6.6



* [PATCH 33/40] pnfs-submit wave3 remove CONFIG_NFS_V4 and V4_1 from nfs_read_data
  2011-02-04 21:33                                                               ` [PATCH 32/40] pnfs_submit: wave3 get rid of pnfs_call_data andros
@ 2011-02-04 21:33                                                                 ` andros
  2011-02-04 21:33                                                                   ` [PATCH 34/40] pnfs-submit wave3 don't use nfs_read_prepare for DS andros
  2011-02-08 22:09                                                                   ` [PATCH 33/40] pnfs-submit wave3 remove CONFIG_NFS_V4 and V4_1 from nfs_read_data Fred Isaman
  0 siblings, 2 replies; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 include/linux/nfs_xdr.h |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 3b2e488..1222aa9 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1016,15 +1016,11 @@ struct nfs_read_data {
 	unsigned int		npages;	/* Max length of pagevec */
 	struct nfs_readargs args;
 	struct nfs_readres  res;
-	struct pnfs_layout_segment *lseg;
-	struct nfs_client	*ds_clp;   /* pNFS data server */
-#ifdef CONFIG_NFS_V4
 	unsigned long		timestamp;	/* For lease renewal */
-#endif
-#if defined(CONFIG_NFS_V4_1)
-	const struct rpc_call_ops *call_ops;
-	__u64			orig_offset; /* For filelayout dense stripe */
-#endif /* CONFIG_NFS_V4_1 */
+	struct pnfs_layout_segment *lseg;
+	struct nfs_client	*ds_clp;	/* pNFS data server */
+	const struct rpc_call_ops *call_ops;	/* For pNFS recovery to MDS */
+	__u64			orig_offset;	/* Filelayout dense stripe */
 	struct page		*page_array[NFS_PAGEVEC_SIZE];
 };
 
-- 
1.6.6



* [PATCH 34/40] pnfs-submit wave3 don't use nfs_read_prepare for DS
  2011-02-04 21:33                                                                 ` [PATCH 33/40] pnfs-submit wave3 remove CONFIG_NFS_V4 and V4_1 from nfs_read_data andros
@ 2011-02-04 21:33                                                                   ` andros
  2011-02-04 21:33                                                                     ` [PATCH 35/40] pnfs_submit wave3 filelayout_read_prepare andros
  2011-02-08 22:09                                                                   ` [PATCH 33/40] pnfs-submit wave3 remove CONFIG_NFS_V4 and V4_1 from nfs_read_data Fred Isaman
  1 sibling, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/read.c |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 9af3048..cb0b239 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -437,19 +437,13 @@ static void nfs_readpage_release_partial(void *calldata)
 void nfs_read_prepare(struct rpc_task *task, void *calldata)
 {
 	struct nfs_read_data *data = calldata;
-	struct nfs4_session *ds_session = NULL;
 
-	if (data->fldata.ds_nfs_client) {
-		dprintk("%s DS read\n", __func__);
-		ds_session = data->fldata.ds_nfs_client->cl_session;
-	}
 	if (nfs4_setup_sequence(NFS_SERVER(data->inode),
 				&data->args.seq_args, &data->res.seq_res,
 				0, task))
 		return;
 	rpc_call_start(task);
 }
-EXPORT_SYMBOL(nfs_read_prepare);
 #endif /* CONFIG_NFS_V4_1 */
 
 static const struct rpc_call_ops nfs_read_partial_ops = {
-- 
1.6.6



* [PATCH 35/40] pnfs_submit wave3 filelayout_read_prepare
  2011-02-04 21:33                                                                   ` [PATCH 34/40] pnfs-submit wave3 don't use nfs_read_prepare for DS andros
@ 2011-02-04 21:33                                                                     ` andros
  2011-02-04 21:33                                                                       ` [PATCH 36/40] pnfs-submit wave3 filelayout read done andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4_fs.h        |    3 +++
 fs/nfs/nfs4filelayout.c |   14 +++++++++++++-
 fs/nfs/nfs4proc.c       |    3 ++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 5dc378e..457b1fe 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -252,6 +252,9 @@ static inline struct nfs4_session *nfs4_get_session(const struct nfs_server *ser
 extern int nfs4_setup_sequence(const struct nfs_server *server,
 		struct nfs4_sequence_args *args, struct nfs4_sequence_res *res,
 		int cache_reply, struct rpc_task *task);
+extern int nfs41_setup_sequence(struct nfs4_session *session,
+		struct nfs4_sequence_args *args, struct nfs4_sequence_res *res,
+		int cache_reply, struct rpc_task *task);
 extern void nfs4_destroy_session(struct nfs4_session *session);
 extern struct nfs4_session *nfs4_alloc_session(struct nfs_client *clp);
 extern int nfs4_proc_create_session(struct nfs_client *);
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 4c841c0..5fd8ed3 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -100,6 +100,18 @@ filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
  * In the case of dense layouts, the offset needs to be reset to its
  * original value.
  */
+static void filelayout_read_prepare(struct rpc_task *task, void *data)
+{
+	struct nfs_read_data *rdata = (struct nfs_read_data *)data;
+
+	if (nfs41_setup_sequence(rdata->ds_clp->cl_session,
+				&rdata->args.seq_args, &rdata->res.seq_res,
+				0, task))
+		return;
+
+	rpc_call_start(task);
+}
+
 static void filelayout_read_call_done(struct rpc_task *task, void *data)
 {
 	struct nfs_read_data *rdata = (struct nfs_read_data *)data;
@@ -123,7 +135,7 @@ static void filelayout_read_release(void *data)
 }
 
 struct rpc_call_ops filelayout_read_call_ops = {
-	.rpc_call_prepare = nfs_read_prepare,
+	.rpc_call_prepare = filelayout_read_prepare,
 	.rpc_call_done = filelayout_read_call_done,
 	.rpc_release = filelayout_read_release,
 };
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 213e3f0..3fcf756 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -505,7 +505,7 @@ out:
 	return ret_id;
 }
 
-static int nfs41_setup_sequence(struct nfs4_session *session,
+int nfs41_setup_sequence(struct nfs4_session *session,
 				struct nfs4_sequence_args *args,
 				struct nfs4_sequence_res *res,
 				int cache_reply,
@@ -571,6 +571,7 @@ static int nfs41_setup_sequence(struct nfs4_session *session,
 	res->sr_status = 1;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(nfs41_setup_sequence);
 
 int nfs4_setup_sequence(const struct nfs_server *server,
 			struct nfs4_sequence_args *args,
-- 
1.6.6



* [PATCH 36/40] pnfs-submit wave3 filelayout read done
  2011-02-04 21:33                                                                     ` [PATCH 35/40] pnfs_submit wave3 filelayout_read_prepare andros
@ 2011-02-04 21:33                                                                       ` andros
  2011-02-04 21:33                                                                         ` [PATCH 37/40] pnfs-submit wave3 send zero stateid seqid on v4.1 i/o andros
  2011-02-08 23:06                                                                         ` [PATCH 36/40] pnfs-submit wave3 filelayout read done Fred Isaman
  0 siblings, 2 replies; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Use our own async error handler for data server reads.
On data server errors we don't recover from, mark the layout as failed and
retry the i/o through the MDS.
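
A rough userspace sketch of the recovery decision, for illustration only. The
SK_* error names and their numeric values are placeholders (not the protocol
error codes), and the sketch_* types are hypothetical; only the three-way
split (session recovery, delayed retry, reset to the MDS) follows this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder error numbers; NOT the real NFS4ERR_* values. */
enum {
	SK_ERR_BADSESSION = 1,
	SK_ERR_SEQ_MISORDERED,
	SK_ERR_DELAY,
	SK_ERR_GRACE,
	SK_ERR_OTHER
};

enum sketch_action {
	SK_DONE,		/* no error, nothing to do */
	SK_RECOVER_SESSION,	/* schedule state recovery, retry on the DS */
	SK_DELAY_RETRY,		/* back off, retry on the DS */
	SK_RESET_TO_MDS		/* give up on the DS, resend through the MDS */
};

/* status follows the tk_status convention: negative means error. */
enum sketch_action sketch_classify(int status)
{
	if (status >= 0)
		return SK_DONE;
	switch (-status) {
	case SK_ERR_BADSESSION:
	case SK_ERR_SEQ_MISORDERED:
		return SK_RECOVER_SESSION;
	case SK_ERR_DELAY:
	case SK_ERR_GRACE:
		return SK_DELAY_RETRY;
	default:
		return SK_RESET_TO_MDS;
	}
}

struct sketch_read {
	int on_ds;		/* 1 while aimed at a DS, like a non-NULL ds_clp */
	int layout_failed;	/* like the layout fail bit */
	int restarts;		/* how many times the call was restarted */
};

/* Returns 0 when the read is complete, -1 when it was restarted. */
int sketch_ds_read_done(struct sketch_read *rd, int status)
{
	enum sketch_action act;

	assert(rd != NULL);
	act = sketch_classify(status);
	if (act == SK_DONE)
		return 0;
	if (act == SK_RESET_TO_MDS) {
		rd->on_ds = 0;		/* the restart now targets the MDS */
		rd->layout_failed = 1;
	}
	rd->restarts++;
	return -1;
}
```

The point of the split is that only the default case abandons the data server;
session and delay errors are retried against the same DS.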

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/internal.h           |    1 +
 fs/nfs/nfs4filelayout.c     |   86 +++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4proc.c           |   44 +++++++++++++--------
 fs/nfs/nfs4state.c          |    1 +
 fs/nfs/pnfs.h               |    1 -
 include/linux/nfs_xdr.h     |    1 +
 include/linux/sunrpc/clnt.h |    1 +
 net/sunrpc/clnt.c           |    8 ++++
 8 files changed, 125 insertions(+), 18 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 5518d61..f69a322 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -281,6 +281,7 @@ extern int nfs_migrate_page(struct address_space *,
 #endif
 
 /* nfs4proc.c */
+extern void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data);
 extern int _nfs4_call_sync(struct nfs_server *server,
 			   struct rpc_message *msg,
 			   struct nfs4_sequence_args *args,
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 5fd8ed3..777d78b 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -40,6 +40,8 @@ MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Dean Hildebrand <dhildebz@umich.edu>");
 MODULE_DESCRIPTION("The NFSv4 file layout driver");
 
+#define FILELAYOUT_POLL_RETRY_MAX     (15*HZ)
+
 static int
 filelayout_set_layoutdriver(struct nfs_server *nfss)
 {
@@ -95,6 +97,88 @@ filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
 	BUG();
 }
 
+/* For data server errors we don't recover from */
+static void
+filelayout_set_lo_fail(struct pnfs_layout_segment *lseg, fmode_t mode)
+{
+	if (mode & FMODE_WRITE) {
+		dprintk("%s Setting layout IOMODE_RW fail bit\n", __func__);
+		set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
+	} else if (mode & FMODE_READ) {
+		dprintk("%s Setting layout IOMODE_READ fail bit\n", __func__);
+		set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
+	}
+}
+
+/*
+ * Async I/O error handler.
+ *
+ * NFS4ERR_OLD_STATEID can not occur with a zero stateid seqid.
+ */
+static int filelayout_async_handle_error(struct rpc_task *task,
+					 struct nfs4_state *state,
+					 struct nfs_client *clp,
+					 int *reset)
+{
+	if (task->tk_status >= 0)
+		return 0;
+	switch (task->tk_status) {
+	case -NFS4ERR_BADSESSION:
+	case -NFS4ERR_BADSLOT:
+	case -NFS4ERR_BAD_HIGH_SLOT:
+	case -NFS4ERR_DEADSESSION:
+	case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
+	case -NFS4ERR_SEQ_FALSE_RETRY:
+	case -NFS4ERR_SEQ_MISORDERED:
+		dprintk("%s ERROR %d, Reset session. Exchangeid "
+			"flags 0x%x\n", __func__, task->tk_status,
+			clp->cl_exchange_flags);
+		nfs4_schedule_state_recovery(clp);
+		task->tk_status = 0;
+		return -EAGAIN;
+	case -NFS4ERR_DELAY:
+	case -NFS4ERR_GRACE:
+	case -EKEYEXPIRED:
+		rpc_delay(task, FILELAYOUT_POLL_RETRY_MAX);
+		task->tk_status = 0;
+		return -EAGAIN;
+	default:
+		dprintk("%s DS error %d\n", __func__, task->tk_status);
+		/* Layout marked as failed by pnfs_check_io_status.
+		 * Retry I/O through the MDS */
+		*reset = 1;
+		task->tk_status = 0;
+		return -EAGAIN;
+	}
+}
+
+/* NFS_PROTO call done callback routines */
+
+static int filelayout_read_done_cb(struct rpc_task *task,
+				struct nfs_read_data *data)
+{
+	struct nfs_client *clp = data->ds_clp;
+	int reset = 0;
+
+	dprintk("%s DS read\n", __func__);
+
+	if (filelayout_async_handle_error(task, data->args.context->state,
+					  data->ds_clp, &reset) == -EAGAIN) {
+		dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
+			__func__, data->ds_clp, data->ds_clp->cl_session);
+		if (reset) {
+			nfs4_reset_read(task, data);
+			filelayout_set_lo_fail(data->lseg,
+					data->args.context->state->state);
+			clp = NFS_SERVER(data->inode)->nfs_client;
+		}
+		nfs_restart_rpc(task, clp);
+		return -EAGAIN;
+	}
+
+	return 0;
+}
+
 /*
  * Call ops for the async read/write cases
  * In the case of dense layouts, the offset needs to be reset to its
@@ -104,6 +188,8 @@ static void filelayout_read_prepare(struct rpc_task *task, void *data)
 {
 	struct nfs_read_data *rdata = (struct nfs_read_data *)data;
 
+	rdata->read_done_cb = filelayout_read_done_cb;
+
 	if (nfs41_setup_sequence(rdata->ds_clp->cl_session,
 				&rdata->args.seq_args, &rdata->res.seq_res,
 				0, task))
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 3fcf756..9dee49d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3075,41 +3075,51 @@ static int nfs4_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
 	return err;
 }
 
-static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
+static int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
 {
 	struct nfs_server *server = NFS_SERVER(data->inode);
-	struct nfs_client *clp = server->nfs_client;
-
-	dprintk("--> %s\n", __func__);
-
-#ifdef CONFIG_NFS_V4_1
-	/* Is this a DS session */
-	if (data->ds_clp) {
-		dprintk("%s DS read\n", __func__);
-		clp = data->ds_clp;
-	}
-#endif /* CONFIG_NFS_V4_1 */
-
-	if (!nfs4_sequence_done(task, &data->res.seq_res))
-		return -EAGAIN;
 
 	if (nfs4_async_handle_error(task, server, data->args.context->state) == -EAGAIN) {
-		nfs_restart_rpc(task, client);
+		nfs_restart_rpc(task, server->nfs_client);
 		return -EAGAIN;
 	}
 
 	nfs_invalidate_atime(data->inode);
-	if (task->tk_status > 0 && !data->ds_clp)
+	if (task->tk_status > 0)
 		renew_lease(server, data->timestamp);
 	return 0;
 }
 
+static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
+{
+
+	dprintk("--> %s\n", __func__);
+
+	if (!nfs4_sequence_done(task, &data->res.seq_res))
+		return -EAGAIN;
+
+	return data->read_done_cb(task, data);
+}
+
 static void nfs4_proc_read_setup(struct nfs_read_data *data, struct rpc_message *msg)
 {
 	data->timestamp   = jiffies;
+	data->read_done_cb = nfs4_read_done_cb;
 	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
 }
 
+/* Reset the nfs_read_data to send the read to another server. */
+void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data)
+{
+	dprintk("%s Reset task for i/o through \n", __func__);
+	data->ds_clp = NULL;
+	data->args.fh     = NFS_FH(data->inode);
+	data->read_done_cb = nfs4_read_done_cb;
+	task->tk_ops = data->call_ops;
+	rpc_task_reset_client(task, NFS_CLIENT(data->inode));
+}
+EXPORT_SYMBOL_GPL(nfs4_reset_read);
+
 static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
 {
 	struct inode *inode = data->inode;
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 49433aa..346fb97 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1022,6 +1022,7 @@ void nfs4_schedule_state_recovery(struct nfs_client *clp)
 		set_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
 	nfs4_schedule_state_manager(clp);
 }
+EXPORT_SYMBOL_GPL(nfs4_schedule_state_recovery);
 
 int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_state *state)
 {
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 6a99c33..218cdfe 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -198,7 +198,6 @@ void pnfs_roc_release(struct inode *ino);
 void pnfs_roc_set_barrier(struct inode *ino, u32 barrier);
 bool pnfs_roc_drain(struct inode *ino, u32 *barrier);
 
-
 static inline int lo_fail_bit(u32 iomode)
 {
 	return iomode == IOMODE_RW ?
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 1222aa9..c91f468 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1020,6 +1020,7 @@ struct nfs_read_data {
 	struct pnfs_layout_segment *lseg;
 	struct nfs_client	*ds_clp;	/* pNFS data server */
 	const struct rpc_call_ops *call_ops;	/* For pNFS recovery to MDS */
+	int (*read_done_cb) (struct rpc_task *task, struct nfs_read_data *data);
 	__u64			orig_offset;	/* Filelayout dense stripe */
 	struct page		*page_array[NFS_PAGEVEC_SIZE];
 };
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index ef9476a..db7bcaf 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -129,6 +129,7 @@ struct rpc_create_args {
 struct rpc_clnt *rpc_create(struct rpc_create_args *args);
 struct rpc_clnt	*rpc_bind_new_program(struct rpc_clnt *,
 				struct rpc_program *, u32);
+void rpc_task_reset_client(struct rpc_task *task, struct rpc_clnt *clnt);
 struct rpc_clnt *rpc_clone_client(struct rpc_clnt *);
 void		rpc_shutdown_client(struct rpc_clnt *);
 void		rpc_release_client(struct rpc_clnt *);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 57d344c..5c4df70 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -597,6 +597,14 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
 	}
 }
 
+void rpc_task_reset_client(struct rpc_task *task, struct rpc_clnt *clnt)
+{
+	rpc_task_release_client(task);
+	rpc_task_set_client(task, clnt);
+}
+EXPORT_SYMBOL_GPL(rpc_task_reset_client);
+
+
 static void
 rpc_task_set_rpc_message(struct rpc_task *task, const struct rpc_message *msg)
 {
-- 
1.6.6



* [PATCH 37/40] pnfs-submit wave3 send zero stateid seqid on v4.1 i/o
  2011-02-04 21:33                                                                       ` [PATCH 36/40] pnfs-submit wave3 filelayout read done andros
@ 2011-02-04 21:33                                                                         ` andros
  2011-02-04 21:34                                                                           ` [PATCH 38/40] pnfs-submit wave3 new flag for state renewal check andros
  2011-02-07 17:42                                                                           ` [PATCH 37/40] pnfs-submit wave3 send zero stateid seqid on v4.1 i/o Benny Halevy
  2011-02-08 23:06                                                                         ` [PATCH 36/40] pnfs-submit wave3 filelayout read done Fred Isaman
  1 sibling, 2 replies; 58+ messages in thread
From: andros @ 2011-02-04 21:33 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Data servers require a zero stateid seqid, and there is no advantage to
treating the rest of NFSv4.1 I/O differently, so send a zero seqid on all
NFSv4.1 READ and WRITE operations.
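
A small userspace sketch of the encoding rule, under simplifying assumptions:
struct sketch_stateid is a hypothetical layout (a host-order seqid followed by
12 opaque bytes) and sketch_encode_stateid() is not the kernel's
encode_stateid(). It only shows that the seqid is zeroed in the copy that goes
on the wire while the rest of the stateid is left alone.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_STATEID_SIZE 16

/* Hypothetical layout: 4-byte seqid followed by 12 opaque bytes. */
struct sketch_stateid {
	uint32_t seqid;
	unsigned char other[12];
};

/*
 * Write the stateid into out[] with the seqid in big-endian order.
 * When zero_seqid is non-zero (e.g. minorversion != 0), the encoded
 * seqid is forced to 0. The caller's stateid is not modified.
 */
void sketch_encode_stateid(unsigned char out[SKETCH_STATEID_SIZE],
			   const struct sketch_stateid *sid, int zero_seqid)
{
	struct sketch_stateid copy;

	assert(out != NULL && sid != NULL);
	copy = *sid;
	if (zero_seqid)
		copy.seqid = 0;

	out[0] = (unsigned char)(copy.seqid >> 24);
	out[1] = (unsigned char)(copy.seqid >> 16);
	out[2] = (unsigned char)(copy.seqid >> 8);
	out[3] = (unsigned char)copy.seqid;
	memcpy(out + 4, copy.other, sizeof(copy.other));
}
```

Passing the minor version as the zero_seqid argument, as the patch does with
hdr->minorversion, leaves minor version 0 encoding unchanged.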

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4xdr.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 4e2c168..2380c45 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1384,7 +1384,7 @@ static void encode_putrootfh(struct xdr_stream *xdr, struct compound_hdr *hdr)
 	hdr->replen += decode_putrootfh_maxsz;
 }
 
-static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context *ctx, const struct nfs_lock_context *l_ctx)
+static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context *ctx, const struct nfs_lock_context *l_ctx, int zero_seqid)
 {
 	nfs4_stateid stateid;
 	__be32 *p;
@@ -1392,6 +1392,8 @@ static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context
 	p = reserve_space(xdr, NFS4_STATEID_SIZE);
 	if (ctx->state != NULL) {
 		nfs4_copy_stateid(&stateid, ctx->state, l_ctx->lockowner, l_ctx->pid);
+		if (zero_seqid)
+			stateid.stateid.seqid = 0;
 		xdr_encode_opaque_fixed(p, stateid.data, NFS4_STATEID_SIZE);
 	} else
 		xdr_encode_opaque_fixed(p, zero_stateid.data, NFS4_STATEID_SIZE);
@@ -1404,7 +1406,8 @@ static void encode_read(struct xdr_stream *xdr, const struct nfs_readargs *args,
 	p = reserve_space(xdr, 4);
 	*p = cpu_to_be32(OP_READ);
 
-	encode_stateid(xdr, args->context, args->lock_context);
+	encode_stateid(xdr, args->context, args->lock_context,
+		       hdr->minorversion);
 
 	p = reserve_space(xdr, 12);
 	p = xdr_encode_hyper(p, args->offset);
@@ -1592,7 +1595,8 @@ static void encode_write(struct xdr_stream *xdr, const struct nfs_writeargs *arg
 	p = reserve_space(xdr, 4);
 	*p = cpu_to_be32(OP_WRITE);
 
-	encode_stateid(xdr, args->context, args->lock_context);
+	encode_stateid(xdr, args->context, args->lock_context,
+		       hdr->minorversion);
 
 	p = reserve_space(xdr, 16);
 	p = xdr_encode_hyper(p, args->offset);
-- 
1.6.6



* [PATCH 38/40] pnfs-submit wave3 new flag for state renewal check
  2011-02-04 21:33                                                                         ` [PATCH 37/40] pnfs-submit wave3 send zero stateid seqid on v4.1 i/o andros
@ 2011-02-04 21:34                                                                           ` andros
  2011-02-04 21:34                                                                             ` [PATCH 39/40] pnfs-submit wave3 new flag for lease time check andros
  2011-02-07 17:42                                                                           ` [PATCH 37/40] pnfs-submit wave3 send zero stateid seqid on v4.1 i/o Benny Halevy
  1 sibling, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:34 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Data servers that do not share a session with the mount MDS always have an
empty cl_superblocks list, so that list cannot tell renewd when to stop.
Replace the empty-list check that decides when to shut down renewd with a
test of the NFS_CS_STOP_RENEW bit, which is never set for such a data server.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/client.c           |    5 +++++
 fs/nfs/nfs4renewd.c       |    6 +-----
 include/linux/nfs_fs_sb.h |    1 +
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 9e07586..c2ba883 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1031,14 +1031,19 @@ static void nfs_server_insert_lists(struct nfs_server *server)
 	spin_lock(&nfs_client_lock);
 	list_add_tail_rcu(&server->client_link, &clp->cl_superblocks);
 	list_add_tail(&server->master_link, &nfs_volume_list);
+	clear_bit(NFS_CS_STOP_RENEW, &clp->cl_res_state);
 	spin_unlock(&nfs_client_lock);
 
 }
 
 static void nfs_server_remove_lists(struct nfs_server *server)
 {
+	struct nfs_client *clp = server->nfs_client;
+
 	spin_lock(&nfs_client_lock);
 	list_del_rcu(&server->client_link);
+	if (clp && list_empty(&clp->cl_superblocks))
+		set_bit(NFS_CS_STOP_RENEW, &clp->cl_res_state);
 	list_del(&server->master_link);
 	spin_unlock(&nfs_client_lock);
 
diff --git a/fs/nfs/nfs4renewd.c b/fs/nfs/nfs4renewd.c
index c8dbbeb..df8e7f3 100644
--- a/fs/nfs/nfs4renewd.c
+++ b/fs/nfs/nfs4renewd.c
@@ -64,12 +64,8 @@ nfs4_renew_state(struct work_struct *work)
 	ops = clp->cl_mvops->state_renewal_ops;
 	dprintk("%s: start\n", __func__);
 
-	rcu_read_lock();
-	if (list_empty(&clp->cl_superblocks) && !is_ds_only_client(clp)) {
-		rcu_read_unlock();
+	if (test_bit(NFS_CS_STOP_RENEW, &clp->cl_res_state))
 		goto out;
-	}
-	rcu_read_unlock();
 
 	spin_lock(&clp->cl_lock);
 	lease = clp->cl_lease_time;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index b197563..2c2dc18 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -30,6 +30,7 @@ struct nfs_client {
 #define NFS_CS_CALLBACK		1		/* - callback started */
 #define NFS_CS_IDMAP		2		/* - idmap started */
 #define NFS_CS_RENEWD		3		/* - renewd started */
+#define NFS_CS_STOP_RENEW	4		/* no more state to renew */
 	struct sockaddr_storage	cl_addr;	/* server identifier */
 	size_t			cl_addrlen;
 	char *			cl_hostname;	/* hostname of server */
-- 
1.6.6
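
For readers following the patch above: a minimal userspace sketch of the flag
pattern, assuming a plain counter in place of the cl_superblocks list and
ordinary bit arithmetic in place of set_bit/clear_bit/test_bit under
nfs_client_lock. The demo_* names are hypothetical and this is not the kernel
code.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_CS_STOP_RENEW 4	/* bit number, as in the patch */

struct demo_client {
	unsigned long res_state;	/* bit flags, like cl_res_state */
	int nr_superblocks;		/* stand-in for the cl_superblocks list */
};

/* A mount attaches a server: there is state to renew again. */
static void demo_insert_server(struct demo_client *clp)
{
	clp->nr_superblocks++;
	clp->res_state &= ~(1UL << DEMO_CS_STOP_RENEW);
}

/* The last unmount leaves nothing to renew: ask renewd to stop. */
static void demo_remove_server(struct demo_client *clp)
{
	assert(clp->nr_superblocks > 0);
	if (--clp->nr_superblocks == 0)
		clp->res_state |= 1UL << DEMO_CS_STOP_RENEW;
}

/* The renewal worker only looks at the bit, never at the list. */
static bool demo_should_renew(const struct demo_client *clp)
{
	return !(clp->res_state & (1UL << DEMO_CS_STOP_RENEW));
}
```

A DS-only client never passes through the insert/remove pair, so its bit
stays clear and renewal keeps running even though it has no superblocks.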



* [PATCH 39/40] pnfs-submit wave3 new flag for lease time check
  2011-02-04 21:34                                                                           ` [PATCH 38/40] pnfs-submit wave3 new flag for state renewal check andros
@ 2011-02-04 21:34                                                                             ` andros
  2011-02-04 21:34                                                                               ` [PATCH 40/40] pnfs-submit wave3 add MDS mount DS only check andros
  0 siblings, 1 reply; 58+ messages in thread
From: andros @ 2011-02-04 21:34 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Data servers cannot send nfs4_proc_get_lease_time, but still need to set up
state renewal. Add the NFS_CS_CHECK_LEASE_TIME bit to indicate whether the
lease time can be checked.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/client.c            |    9 +++++++++
 fs/nfs/nfs4filelayoutdev.c |    4 ++--
 fs/nfs/nfs4state.c         |    2 +-
 include/linux/nfs_fs_sb.h  |    1 +
 4 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index c2ba883..842c288 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1412,6 +1412,15 @@ static int nfs4_set_client(struct nfs_server *server,
 		goto error;
 	}
 
+	/*
+	 * Query for the lease time on clientid setup or renewal
+	 *
+	 * Note that this will be set on nfs_clients that were created
+	 * only for the DS role and did not set this bit, but now will
+	 * serve a dual role.
+	 */
+	set_bit(NFS_CS_CHECK_LEASE_TIME, &clp->cl_res_state);
+
 	server->nfs_client = clp;
 	dprintk("<-- nfs4_set_client() = 0 [new %p]\n", clp);
 	return 0;
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 8642109..96e9e6a 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -142,8 +142,8 @@ nfs4_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
 	}
 
 	/*
-	 * Set DS lease equal to the MDS lease, renewal is scheduled in
-	 * create_session
+	 * Do not set NFS_CS_CHECK_LEASE_TIME; instead set the DS lease to
+	 * be equal to the MDS lease. Renewal is scheduled in create_session.
 	 */
 	spin_lock(&mds_srv->nfs_client->cl_lock);
 	clp->cl_lease_time = mds_srv->nfs_client->cl_lease_time;
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 346fb97..6da026a 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -153,7 +153,7 @@ static int nfs41_setup_state_renewal(struct nfs_client *clp)
 	int status;
 	struct nfs_fsinfo fsinfo;
 
-	if (is_ds_only_client(clp)) {
+	if (!test_bit(NFS_CS_CHECK_LEASE_TIME, &clp->cl_res_state)) {
 		nfs4_schedule_state_renewal(clp);
 		return 0;
 	}
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 2c2dc18..2669a9a 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -31,6 +31,7 @@ struct nfs_client {
 #define NFS_CS_IDMAP		2		/* - idmap started */
 #define NFS_CS_RENEWD		3		/* - renewd started */
 #define NFS_CS_STOP_RENEW	4		/* no more state to renew */
+#define NFS_CS_CHECK_LEASE_TIME	5		/* need to check lease time */
 	struct sockaddr_storage	cl_addr;	/* server identifier */
 	size_t			cl_addrlen;
 	char *			cl_hostname;	/* hostname of server */
-- 
1.6.6
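
For readers following the patch above: a minimal userspace sketch of the
lease-time flag, assuming a fixed 90-second answer from a made-up query helper
and plain bit arithmetic in place of set_bit/test_bit. All demo_* names are
hypothetical and this is not the kernel code.

```c
#include <assert.h>

#define DEMO_CS_CHECK_LEASE_TIME 5	/* bit number, as in the patch */

struct demo_lease_client {
	unsigned long res_state;	/* bit flags, like cl_res_state */
	unsigned long lease_time;	/* seconds */
	int lease_queries;		/* how often the server was asked */
};

/* Hypothetical stand-in for asking the server for its lease time. */
static unsigned long demo_query_lease_time(struct demo_lease_client *clp)
{
	clp->lease_queries++;
	return 90;
}

/* Mount path: this client talks to an MDS, so it may query the lease. */
static void demo_set_client(struct demo_lease_client *clp)
{
	clp->res_state |= 1UL << DEMO_CS_CHECK_LEASE_TIME;
}

/* DS path: leave the bit clear and inherit the MDS lease time. */
static void demo_ds_connect(struct demo_lease_client *ds,
			    const struct demo_lease_client *mds)
{
	assert(ds != mds);
	ds->lease_time = mds->lease_time;
}

/* Query the lease time only when the bit allows it. */
static void demo_setup_state_renewal(struct demo_lease_client *clp)
{
	if (!(clp->res_state & (1UL << DEMO_CS_CHECK_LEASE_TIME)))
		return;		/* keep the inherited lease time */
	clp->lease_time = demo_query_lease_time(clp);
}
```

Testing a per-client bit rather than a role predicate means a client created
for the DS role can later gain the bit if it is also used for a mount.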



* [PATCH 40/40] pnfs-submit wave3 add MDS mount DS only check
  2011-02-04 21:34                                                                             ` [PATCH 39/40] pnfs-submit wave3 new flag for lease time check andros
@ 2011-02-04 21:34                                                                               ` andros
  0 siblings, 0 replies; 58+ messages in thread
From: andros @ 2011-02-04 21:34 UTC (permalink / raw)
  To: bhalevy; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

The DS only role cannot be used to mount.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/client.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 842c288..17da633 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1412,6 +1412,12 @@ static int nfs4_set_client(struct nfs_server *server,
 		goto error;
 	}
 
+	/* Cannot mount a DS only server */
+	if (is_ds_only_client(clp)) {
+		error = -ENODEV;
+		goto error;
+	}
+
 	/*
 	 * Query for the lease time on clientid setup or renewal
 	 *
-- 
1.6.6



* Re: [PATCH 19/40] SQUASHME pnfs-submit wave3 filelayout read pagelist cleanup
  2011-02-04 21:33                                     ` [PATCH 19/40] SQUASHME pnfs-submit wave3 filelayout read pagelist cleanup andros
  2011-02-04 21:33                                       ` [PATCH 20/40] SQUASHME pnfs-submit wave3 remove nr_pages from read_pagelist andros
@ 2011-02-04 21:44                                       ` Fred Isaman
  2011-02-05 16:47                                         ` William A. (Andy) Adamson
  1 sibling, 1 reply; 58+ messages in thread
From: Fred Isaman @ 2011-02-04 21:44 UTC (permalink / raw)
  To: andros; +Cc: bhalevy, linux-nfs

On Fri, Feb 4, 2011 at 4:33 PM,  <andros@netapp.com> wrote:
> From: Andy Adamson <andros@netapp.com>
>
> Signed-off-by: Andy Adamson <andros@netapp.com>
> ---
>  fs/nfs/nfs4filelayout.c |   19 ++-----------------
>  1 files changed, 2 insertions(+), 17 deletions(-)
>
> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
> index 3daf351..dce90a0 100644
> --- a/fs/nfs/nfs4filelayout.c
> +++ b/fs/nfs/nfs4filelayout.c
> @@ -132,14 +132,6 @@ struct rpc_call_ops filelayout_read_call_ops = {
>        .rpc_release = filelayout_read_release,
>  };
>
> -/* Perform sync or async reads.
> - *
> - * An optimization for the NFS file layout driver
> - * allows the original read/write data structs to be passed in the
> - * last argument.
> - *
> - * TODO: join with write_pagelist?
> - */
>  static enum pnfs_try_status
>  filelayout_read_pagelist(struct nfs_read_data *data, unsigned nr_pages)
>  {
> @@ -149,8 +141,8 @@ filelayout_read_pagelist(struct nfs_read_data *data, unsigned nr_pages)
>        u32 idx;
>        struct nfs_fh *fh;
>
> -       dprintk("--> %s ino %lu nr_pages %d pgbase %u req %Zu@%llu\n",
> -               __func__, data->inode->i_ino, nr_pages,
> +       dprintk("--> %s ino %lu pgbase %u req %Zu@%llu\n",
> +               __func__, data->inode->i_ino,
>                data->args.pgbase, (size_t)data->args.count, offset);
>

A nit, but this belongs in the next patch.

Fred

>        /* Retrieve the correct rpc_client for the byte range */
> @@ -169,13 +161,6 @@ filelayout_read_pagelist(struct nfs_read_data *data, unsigned nr_pages)
>        if (fh)
>                data->args.fh = fh;
>
> -       /*
> -        * Now get the file offset on the dserver
> -        * Set the read offset to this offset, and
> -        * save the original offset in orig_offset
> -        * In the case of aync reads, the offset will be reset in the
> -        * call_ops->rpc_call_done() routine.
> -        */
>        data->args.offset = filelayout_get_dserver_offset(lseg, offset);
>        data->fldata.orig_offset = offset;
>
> --
> 1.6.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


* Re: [PATCH 23/40] SQUASHME pnfs-submit wave3 new function for ds expired lease
  2011-02-04 21:33                                             ` [PATCH 23/40] SQUASHME pnfs-submit wave3 new function for ds expired lease andros
  2011-02-04 21:33                                               ` [PATCH 24/40] NFS move nfs_client initialization into nfs_get_client andros
@ 2011-02-04 21:51                                               ` Fred Isaman
  2011-02-05 16:46                                                 ` William A. (Andy) Adamson
  1 sibling, 1 reply; 58+ messages in thread
From: Fred Isaman @ 2011-02-04 21:51 UTC (permalink / raw)
  To: andros; +Cc: bhalevy, linux-nfs

On Fri, Feb 4, 2011 at 4:33 PM,  <andros@netapp.com> wrote:
> From: Andy Adamson <andros@netapp.com>
>
> Signed-off-by: Andy Adamson <andros@netapp.com>
> ---
>  fs/nfs/nfs4proc.c |   11 ++++++++---
>  1 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 9c50be7..fb22cbf 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -1574,7 +1574,7 @@ static int _nfs4_proc_open(struct nfs4_opendata *data)
>        return 0;
>  }
>
> -int nfs4_recover_expired_lease(struct nfs_client *clp)
> +static int nfs4_client_recover_expired_lease(struct nfs_client *clp)
>  {
>        unsigned int loop;
>        int ret;
> @@ -1593,6 +1593,11 @@ int nfs4_recover_expired_lease(struct nfs_client *clp)
>  }
>  EXPORT_SYMBOL(nfs4_recover_expired_lease);
>
> +static int nfs4_recover_expired_lease(struct nfs_server *server)
> +{
> +       return nfs4_client_recover_expired_lease(server->nfs_client);
> +}
> +

Why are we doing this extra indirection?

Fred

>  /*
>  * OPEN_EXPIRED:
>  *     reclaim state on the server after a network partition.
> @@ -1680,7 +1685,7 @@ static int _nfs4_do_open(struct inode *dir, struct path *path, fmode_t fmode, in
>                dprintk("nfs4_do_open: nfs4_get_state_owner failed!\n");
>                goto out_err;
>        }
> -       status = nfs4_recover_expired_lease(server->nfs_client);
> +       status = nfs4_recover_expired_lease(server);
>        if (status != 0)
>                goto err_put_state_owner;
>        if (path->dentry->d_inode != NULL)
> @@ -5075,7 +5080,7 @@ int nfs4_init_session(struct nfs_server *server)
>        session->fc_attrs.max_rqst_sz = wsize + nfs41_maxwrite_overhead;
>        session->fc_attrs.max_resp_sz = rsize + nfs41_maxread_overhead;
>
> -       ret = nfs4_recover_expired_lease(server->nfs_client);
> +       ret = nfs4_recover_expired_lease(server);
>        if (!ret)
>                ret = nfs4_check_client_ready(clp);
>        return ret;
> --
> 1.6.6
>


* Re: [PATCH 30/40] pnfs-submit: wave3 let LAYOUTGET distinguish between read and write calls
  2011-02-04 21:33                                                           ` [PATCH 30/40] pnfs-submit: wave3 let LAYOUTGET distinguish between read and write calls andros
  2011-02-04 21:33                                                             ` [PATCH 31/40] pnfs_submit wave3 remove struct pnfs_fl_call_data andros
@ 2011-02-04 21:59                                                             ` Fred Isaman
  2011-02-05 16:47                                                               ` William A. (Andy) Adamson
  1 sibling, 1 reply; 58+ messages in thread
From: Fred Isaman @ 2011-02-04 21:59 UTC (permalink / raw)
  To: andros; +Cc: bhalevy, linux-nfs

On Fri, Feb 4, 2011 at 4:33 PM,  <andros@netapp.com> wrote:
> From: Fred Isaman <iisaman@netapp.com>
>
> This is done by introducing pgio->pg_iswrite.
>
> For wave3 do not send layoutget on write
> pnfs-submit wave 3 remove pg_iswrite add back for wave4
> Remove CONFIG_NFS_V4_1 from struct nfs_page
>

The pg_iswrite comments are outdated.

Fred

> Signed-off-by: Fred Isaman <iisaman@netapp.com>
> Signed-off-by: Andy Adamson <andros@netapp.com>
> ---
>  fs/nfs/pagelist.c        |    9 ++++-----
>  fs/nfs/write.c           |    3 +++
>  include/linux/nfs_page.h |    2 --
>  3 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index ea3b7f8..cf09cb7 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -259,10 +259,8 @@ static int nfs_can_coalesce_requests(struct nfs_page *prev,
>        /* For non-whole file layouts, need to check that req is inside of
>         * pgio->pg_test.
>         */
> -#ifdef CONFIG_NFS_V4_1
>        if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
>                return 0;
> -#endif /* CONFIG_NFS_V4_1 */
>        return 1;
>  }
>
> @@ -300,9 +298,10 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
>        } else {
>                put_lseg(desc->pg_lseg);
>                desc->pg_base = req->wb_pgbase;
> -               desc->pg_lseg = pnfs_update_layout(desc->pg_inode,
> -                                                  req->wb_context,
> -                                                  IOMODE_READ);
> +               if (desc->pg_test)
> +                       desc->pg_lseg = pnfs_update_layout(desc->pg_inode,
> +                                                          req->wb_context,
> +                                                          IOMODE_READ);
>        }
>        nfs_list_remove_request(req);
>        nfs_list_add_request(req, &desc->pg_list);
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 004c28b..aca0268 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -28,6 +28,7 @@
>  #include "iostat.h"
>  #include "nfs4_fs.h"
>  #include "fscache.h"
> +#include "pnfs.h"
>
>  #define NFSDBG_FACILITY                NFSDBG_PAGECACHE
>
> @@ -982,6 +983,8 @@ static void nfs_pageio_init_write(struct nfs_pageio_descriptor *pgio,
>  {
>        size_t wsize = NFS_SERVER(inode)->wsize;
>
> +       pgio->pg_test = NULL;
> +
>        if (wsize < PAGE_CACHE_SIZE)
>                nfs_pageio_init(pgio, inode, nfs_flush_multi, wsize, ioflags);
>        else
> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
> index 488f27b..ba88ff4 100644
> --- a/include/linux/nfs_page.h
> +++ b/include/linux/nfs_page.h
> @@ -63,9 +63,7 @@ struct nfs_pageio_descriptor {
>        int                     pg_ioflags;
>        int                     pg_error;
>        struct pnfs_layout_segment *pg_lseg;
> -#ifdef CONFIG_NFS_V4_1
>        int                     (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
> -#endif /* CONFIG_NFS_V4_1 */
>  };
>
>  #define NFS_WBACK_BUSY(req)    (test_bit(PG_BUSY,&(req)->wb_flags))
> --
> 1.6.6
>


* Re: [PATCH 23/40] SQUASHME pnfs-submit wave3 new function for ds expired lease
  2011-02-04 21:51                                               ` [PATCH 23/40] SQUASHME pnfs-submit wave3 new function for ds expired lease Fred Isaman
@ 2011-02-05 16:46                                                 ` William A. (Andy) Adamson
  2011-02-06 19:41                                                   ` Fred Isaman
  0 siblings, 1 reply; 58+ messages in thread
From: William A. (Andy) Adamson @ 2011-02-05 16:46 UTC (permalink / raw)
  To: Fred Isaman; +Cc: bhalevy, linux-nfs

On Fri, Feb 4, 2011 at 4:51 PM, Fred Isaman <iisaman@netapp.com> wrote:
> On Fri, Feb 4, 2011 at 4:33 PM,  <andros@netapp.com> wrote:
>> From: Andy Adamson <andros@netapp.com>
>>
>> Signed-off-by: Andy Adamson <andros@netapp.com>
>> ---
>>  fs/nfs/nfs4proc.c |   11 ++++++++---
>>  1 files changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 9c50be7..fb22cbf 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -1574,7 +1574,7 @@ static int _nfs4_proc_open(struct nfs4_opendata *data)
>>        return 0;
>>  }
>>
>> -int nfs4_recover_expired_lease(struct nfs_client *clp)
>> +static int nfs4_client_recover_expired_lease(struct nfs_client *clp)
>>  {
>>        unsigned int loop;
>>        int ret;
>> @@ -1593,6 +1593,11 @@ int nfs4_recover_expired_lease(struct nfs_client *clp)
>>  }
>>  EXPORT_SYMBOL(nfs4_recover_expired_lease);
>>
>> +static int nfs4_recover_expired_lease(struct nfs_server *server)
>> +{
>> +       return nfs4_client_recover_expired_lease(server->nfs_client);
>> +}
>> +
>
> Why are we doing this extra indirection?

As Trond pointed out, it is a lot less intrusive to the existing code.

-->Andy

>
> Fred
>
>>  /*
>>  * OPEN_EXPIRED:
>>  *     reclaim state on the server after a network partition.
>> @@ -1680,7 +1685,7 @@ static int _nfs4_do_open(struct inode *dir, struct path *path, fmode_t fmode, in
>>                dprintk("nfs4_do_open: nfs4_get_state_owner failed!\n");
>>                goto out_err;
>>        }
>> -       status = nfs4_recover_expired_lease(server->nfs_client);
>> +       status = nfs4_recover_expired_lease(server);
>>        if (status != 0)
>>                goto err_put_state_owner;
>>        if (path->dentry->d_inode != NULL)
>> @@ -5075,7 +5080,7 @@ int nfs4_init_session(struct nfs_server *server)
>>        session->fc_attrs.max_rqst_sz = wsize + nfs41_maxwrite_overhead;
>>        session->fc_attrs.max_resp_sz = rsize + nfs41_maxread_overhead;
>>
>> -       ret = nfs4_recover_expired_lease(server->nfs_client);
>> +       ret = nfs4_recover_expired_lease(server);
>>        if (!ret)
>>                ret = nfs4_check_client_ready(clp);
>>        return ret;
>> --
>> 1.6.6
>>


* Re: [PATCH 30/40] pnfs-submit: wave3 let LAYOUTGET distinguish between read and write calls
  2011-02-04 21:59                                                             ` [PATCH 30/40] pnfs-submit: wave3 let LAYOUTGET distinguish between read and write calls Fred Isaman
@ 2011-02-05 16:47                                                               ` William A. (Andy) Adamson
  0 siblings, 0 replies; 58+ messages in thread
From: William A. (Andy) Adamson @ 2011-02-05 16:47 UTC (permalink / raw)
  To: Fred Isaman; +Cc: bhalevy, linux-nfs

On Fri, Feb 4, 2011 at 4:59 PM, Fred Isaman <iisaman@netapp.com> wrote:
> On Fri, Feb 4, 2011 at 4:33 PM,  <andros@netapp.com> wrote:
>> From: Fred Isaman <iisaman@netapp.com>
>>
>> This is done by introducing pgio->pg_iswrite.
>>
>> For wave3 do not send layoutget on write
>> pnfs-submit wave 3 remove pg_iswrite add back for wave4
>> Remove CONFIG_NFS_V4_1 from struct nfs_page
>>
>
> The pg_iswrite comments are outdated.

OK

>
> Fred
>
>> Signed-off-by: Fred Isaman <iisaman@netapp.com>
>> Signed-off-by: Andy Adamson <andros@netapp.com>
>> ---
>>  fs/nfs/pagelist.c        |    9 ++++-----
>>  fs/nfs/write.c           |    3 +++
>>  include/linux/nfs_page.h |    2 --
>>  3 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>> index ea3b7f8..cf09cb7 100644
>> --- a/fs/nfs/pagelist.c
>> +++ b/fs/nfs/pagelist.c
>> @@ -259,10 +259,8 @@ static int nfs_can_coalesce_requests(struct nfs_page *prev,
>>        /* For non-whole file layouts, need to check that req is inside of
>>         * pgio->pg_test.
>>         */
>> -#ifdef CONFIG_NFS_V4_1
>>        if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
>>                return 0;
>> -#endif /* CONFIG_NFS_V4_1 */
>>        return 1;
>>  }
>>
>> @@ -300,9 +298,10 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
>>        } else {
>>                put_lseg(desc->pg_lseg);
>>                desc->pg_base = req->wb_pgbase;
>> -               desc->pg_lseg = pnfs_update_layout(desc->pg_inode,
>> -                                                  req->wb_context,
>> -                                                  IOMODE_READ);
>> +               if (desc->pg_test)
>> +                       desc->pg_lseg = pnfs_update_layout(desc->pg_inode,
>> +                                                          req->wb_context,
>> +                                                          IOMODE_READ);
>>        }
>>        nfs_list_remove_request(req);
>>        nfs_list_add_request(req, &desc->pg_list);
>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>> index 004c28b..aca0268 100644
>> --- a/fs/nfs/write.c
>> +++ b/fs/nfs/write.c
>> @@ -28,6 +28,7 @@
>>  #include "iostat.h"
>>  #include "nfs4_fs.h"
>>  #include "fscache.h"
>> +#include "pnfs.h"
>>
>>  #define NFSDBG_FACILITY                NFSDBG_PAGECACHE
>>
>> @@ -982,6 +983,8 @@ static void nfs_pageio_init_write(struct nfs_pageio_descriptor *pgio,
>>  {
>>        size_t wsize = NFS_SERVER(inode)->wsize;
>>
>> +       pgio->pg_test = NULL;
>> +
>>        if (wsize < PAGE_CACHE_SIZE)
>>                nfs_pageio_init(pgio, inode, nfs_flush_multi, wsize, ioflags);
>>        else
>> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
>> index 488f27b..ba88ff4 100644
>> --- a/include/linux/nfs_page.h
>> +++ b/include/linux/nfs_page.h
>> @@ -63,9 +63,7 @@ struct nfs_pageio_descriptor {
>>        int                     pg_ioflags;
>>        int                     pg_error;
>>        struct pnfs_layout_segment *pg_lseg;
>> -#ifdef CONFIG_NFS_V4_1
>>        int                     (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, struct nfs_page *);
>> -#endif /* CONFIG_NFS_V4_1 */
>>  };
>>
>>  #define NFS_WBACK_BUSY(req)    (test_bit(PG_BUSY,&(req)->wb_flags))
>> --
>> 1.6.6
>>


* Re: [PATCH 19/40] SQUASHME pnfs-submit wave3 filelayout read pagelist cleanup
  2011-02-04 21:44                                       ` [PATCH 19/40] SQUASHME pnfs-submit wave3 filelayout read pagelist cleanup Fred Isaman
@ 2011-02-05 16:47                                         ` William A. (Andy) Adamson
  0 siblings, 0 replies; 58+ messages in thread
From: William A. (Andy) Adamson @ 2011-02-05 16:47 UTC (permalink / raw)
  To: Fred Isaman; +Cc: bhalevy, linux-nfs

On Fri, Feb 4, 2011 at 4:44 PM, Fred Isaman <iisaman@netapp.com> wrote:
> On Fri, Feb 4, 2011 at 4:33 PM,  <andros@netapp.com> wrote:
>> From: Andy Adamson <andros@netapp.com>
>>
>> Signed-off-by: Andy Adamson <andros@netapp.com>
>> ---
>>  fs/nfs/nfs4filelayout.c |   19 ++-----------------
>>  1 files changed, 2 insertions(+), 17 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
>> index 3daf351..dce90a0 100644
>> --- a/fs/nfs/nfs4filelayout.c
>> +++ b/fs/nfs/nfs4filelayout.c
>> @@ -132,14 +132,6 @@ struct rpc_call_ops filelayout_read_call_ops = {
>>        .rpc_release = filelayout_read_release,
>>  };
>>
>> -/* Perform sync or async reads.
>> - *
>> - * An optimization for the NFS file layout driver
>> - * allows the original read/write data structs to be passed in the
>> - * last argument.
>> - *
>> - * TODO: join with write_pagelist?
>> - */
>>  static enum pnfs_try_status
>>  filelayout_read_pagelist(struct nfs_read_data *data, unsigned nr_pages)
>>  {
>> @@ -149,8 +141,8 @@ filelayout_read_pagelist(struct nfs_read_data *data, unsigned nr_pages)
>>        u32 idx;
>>        struct nfs_fh *fh;
>>
>> -       dprintk("--> %s ino %lu nr_pages %d pgbase %u req %Zu@%llu\n",
>> -               __func__, data->inode->i_ino, nr_pages,
>> +       dprintk("--> %s ino %lu pgbase %u req %Zu@%llu\n",
>> +               __func__, data->inode->i_ino,
>>                data->args.pgbase, (size_t)data->args.count, offset);
>>
>
> A nit, but this belongs in the next patch.

Yep :)

>
> Fred
>
>>        /* Retrieve the correct rpc_client for the byte range */
>> @@ -169,13 +161,6 @@ filelayout_read_pagelist(struct nfs_read_data *data, unsigned nr_pages)
>>        if (fh)
>>                data->args.fh = fh;
>>
>> -       /*
>> -        * Now get the file offset on the dserver
>> -        * Set the read offset to this offset, and
>> -        * save the original offset in orig_offset
>> -        * In the case of aync reads, the offset will be reset in the
>> -        * call_ops->rpc_call_done() routine.
>> -        */
>>        data->args.offset = filelayout_get_dserver_offset(lseg, offset);
>>        data->fldata.orig_offset = offset;
>>
>> --
>> 1.6.6
>>


* Re: [PATCH 23/40] SQUASHME pnfs-submit wave3 new function for ds expired lease
  2011-02-05 16:46                                                 ` William A. (Andy) Adamson
@ 2011-02-06 19:41                                                   ` Fred Isaman
  2011-02-07 15:05                                                     ` William A. (Andy) Adamson
  0 siblings, 1 reply; 58+ messages in thread
From: Fred Isaman @ 2011-02-06 19:41 UTC (permalink / raw)
  To: William A. (Andy) Adamson; +Cc: bhalevy, linux-nfs

On Sat, Feb 5, 2011 at 11:46 AM, William A. (Andy) Adamson
<androsadamson@gmail.com> wrote:
> On Fri, Feb 4, 2011 at 4:51 PM, Fred Isaman <iisaman@netapp.com> wrote:
>> On Fri, Feb 4, 2011 at 4:33 PM,  <andros@netapp.com> wrote:
>>> From: Andy Adamson <andros@netapp.com>
>>>
>>> Signed-off-by: Andy Adamson <andros@netapp.com>
>>> ---
>>>  fs/nfs/nfs4proc.c |   11 ++++++++---
>>>  1 files changed, 8 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>> index 9c50be7..fb22cbf 100644
>>> --- a/fs/nfs/nfs4proc.c
>>> +++ b/fs/nfs/nfs4proc.c
>>> @@ -1574,7 +1574,7 @@ static int _nfs4_proc_open(struct nfs4_opendata *data)
>>>        return 0;
>>>  }
>>>
>>> -int nfs4_recover_expired_lease(struct nfs_client *clp)
>>> +static int nfs4_client_recover_expired_lease(struct nfs_client *clp)
>>>  {
>>>        unsigned int loop;
>>>        int ret;
>>> @@ -1593,6 +1593,11 @@ int nfs4_recover_expired_lease(struct nfs_client *clp)
>>>  }
>>>  EXPORT_SYMBOL(nfs4_recover_expired_lease);
>>>
>>> +static int nfs4_recover_expired_lease(struct nfs_server *server)
>>> +{
>>> +       return nfs4_client_recover_expired_lease(server->nfs_client);
>>> +}
>>> +
>>
>> Why are we doing this extra indirection?
>
> As Trond pointed out, it is a lot less intrusive to the existing code.
>
> -->Andy
>

I must be missing something.  What I see is that you are changing the
arguments to a function that is called exactly twice, and creating a
totally unnecessary subfunction nfs4_client_recover_expired_lease.
How is this less intrusive than just directly inlining
nfs4_client_recover_expired_lease?

Fred


* Re: [PATCH 23/40] SQUASHME pnfs-submit wave3 new function for ds expired lease
  2011-02-06 19:41                                                   ` Fred Isaman
@ 2011-02-07 15:05                                                     ` William A. (Andy) Adamson
  2011-02-07 15:29                                                       ` Fred Isaman
  0 siblings, 1 reply; 58+ messages in thread
From: William A. (Andy) Adamson @ 2011-02-07 15:05 UTC (permalink / raw)
  To: Fred Isaman; +Cc: bhalevy, linux-nfs

On Sun, Feb 6, 2011 at 2:41 PM, Fred Isaman <iisaman@netapp.com> wrote:
> On Sat, Feb 5, 2011 at 11:46 AM, William A. (Andy) Adamson
> <androsadamson@gmail.com> wrote:
>> On Fri, Feb 4, 2011 at 4:51 PM, Fred Isaman <iisaman@netapp.com> wrote:
>>> On Fri, Feb 4, 2011 at 4:33 PM,  <andros@netapp.com> wrote:
>>>> From: Andy Adamson <andros@netapp.com>
>>>>
>>>> Signed-off-by: Andy Adamson <andros@netapp.com>
>>>> ---
>>>>  fs/nfs/nfs4proc.c |   11 ++++++++---
>>>>  1 files changed, 8 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>> index 9c50be7..fb22cbf 100644
>>>> --- a/fs/nfs/nfs4proc.c
>>>> +++ b/fs/nfs/nfs4proc.c
>>>> @@ -1574,7 +1574,7 @@ static int _nfs4_proc_open(struct nfs4_opendata *data)
>>>>        return 0;
>>>>  }
>>>>
>>>> -int nfs4_recover_expired_lease(struct nfs_client *clp)
>>>> +static int nfs4_client_recover_expired_lease(struct nfs_client *clp)
>>>>  {
>>>>        unsigned int loop;
>>>>        int ret;
>>>> @@ -1593,6 +1593,11 @@ int nfs4_recover_expired_lease(struct nfs_client *clp)
>>>>  }
>>>>  EXPORT_SYMBOL(nfs4_recover_expired_lease);
>>>>
>>>> +static int nfs4_recover_expired_lease(struct nfs_server *server)
>>>> +{
>>>> +       return nfs4_client_recover_expired_lease(server->nfs_client);
>>>> +}
>>>> +
>>>
>>> Why are we doing this extra indirection?
>>
>> As Trond pointed out, it is a lot less intrusive to the existing code.
>>
>> -->Andy
>>
>
> I must be missing something.  What I see is that you are changing the
> arguments to a function that is called exactly twice, and creating a
> totally unnecessary subfunction  nfs4_client_recover_expired_lease.
> How is this less intrusive than just directly inlining
> nfs4_client_recover_expired_lease?

This patch reverts the change made in "pnfs_submit: filelayout i/o
helpers", which changed the argument from nfs_server to nfs_client
and therefore changed the two calls. That is more intrusive than what
this patch does, which is to leave the two calls alone.  Adding
nfs4_client_recover_expired_lease lets us call it with a struct
nfs_client from the data server code without changing the existing
calls, and is therefore less intrusive.

-->Andy
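
To make the shape of the argument above concrete, here is a minimal
user-space sketch of the wrapper pattern being described. The struct
definitions are trimmed-down mocks, the cl_lease_expired field is
hypothetical, and the body of the client-level helper is only a
placeholder for the real recovery loop; only the two function names
and their signatures come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for the kernel types; only the fields needed here.
 * cl_lease_expired is a hypothetical field used for illustration. */
struct nfs_client { int cl_lease_expired; };
struct nfs_server { struct nfs_client *nfs_client; };

/* Client-level helper: callable from data server code, which has a
 * struct nfs_client but no struct nfs_server. The body is a
 * placeholder for the real recovery loop. */
static int nfs4_client_recover_expired_lease(struct nfs_client *clp)
{
	assert(clp != NULL);
	clp->cl_lease_expired = 0;	/* placeholder for lease recovery */
	return 0;
}

/* Server-level wrapper: keeps the original signature, so existing
 * callers that pass a struct nfs_server stay untouched. */
static int nfs4_recover_expired_lease(struct nfs_server *server)
{
	return nfs4_client_recover_expired_lease(server->nfs_client);
}
```

Existing callers keep passing a struct nfs_server, while data server
code can call the client-level helper directly.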
>
> Fred
>

* Re: [PATCH 23/40] SQUASHME pnfs-submit wave3 new function for ds expired lease
  2011-02-07 15:05                                                     ` William A. (Andy) Adamson
@ 2011-02-07 15:29                                                       ` Fred Isaman
  0 siblings, 0 replies; 58+ messages in thread
From: Fred Isaman @ 2011-02-07 15:29 UTC (permalink / raw)
  To: William A. (Andy) Adamson; +Cc: bhalevy, linux-nfs

On Mon, Feb 7, 2011 at 10:05 AM, William A. (Andy) Adamson
<androsadamson@gmail.com> wrote:
> On Sun, Feb 6, 2011 at 2:41 PM, Fred Isaman <iisaman@netapp.com> wrote:
>> On Sat, Feb 5, 2011 at 11:46 AM, William A. (Andy) Adamson
>> <androsadamson@gmail.com> wrote:
>>> On Fri, Feb 4, 2011 at 4:51 PM, Fred Isaman <iisaman@netapp.com> wrote:
>>>> On Fri, Feb 4, 2011 at 4:33 PM,  <andros@netapp.com> wrote:
>>>>> From: Andy Adamson <andros@netapp.com>
>>>>>
>>>>> Signed-off-by: Andy Adamson <andros@netapp.com>
>>>>> ---
>>>>>  fs/nfs/nfs4proc.c |   11 ++++++++---
>>>>>  1 files changed, 8 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>>> index 9c50be7..fb22cbf 100644
>>>>> --- a/fs/nfs/nfs4proc.c
>>>>> +++ b/fs/nfs/nfs4proc.c
>>>>> @@ -1574,7 +1574,7 @@ static int _nfs4_proc_open(struct nfs4_opendata *data)
>>>>>         return 0;
>>>>>  }
>>>>>
>>>>> -int nfs4_recover_expired_lease(struct nfs_client *clp)
>>>>> +static int nfs4_client_recover_expired_lease(struct nfs_client *clp)
>>>>>  {
>>>>>         unsigned int loop;
>>>>>         int ret;
>>>>> @@ -1593,6 +1593,11 @@ int nfs4_recover_expired_lease(struct nfs_client *clp)
>>>>>  }
>>>>>  EXPORT_SYMBOL(nfs4_recover_expired_lease);
>>>>>
>>>>> +static int nfs4_recover_expired_lease(struct nfs_server *server)
>>>>> +{
>>>>> +       return nfs4_client_recover_expired_lease(server->nfs_client);
>>>>> +}
>>>>> +
>>>>
>>>> Why are we doing this extra indirection?
>>>
>>> As Trond pointed out, it is a lot less intrusive to the existing code.
>>>
>>> -->Andy
>>>
>>
>> I must be missing something.  What I see is that you are changing the
>> arguments to a function that is called exactly twice, and creating a
>> totally unnecessary subfunction  nfs4_client_recover_expired_lease.
>> How is this less intrusive than just directly inlining
>> nfs4_client_recover_expired_lease?
>
> This patch reverts the change made in "pnfs_submit: filelayout i/o
> helpers", which changed the argument from nfs_server to nfs_client
> and therefore changed the two calls. That is more intrusive than what
> this patch does, which is to leave the two calls alone.  Adding
> nfs4_client_recover_expired_lease lets us call it with a struct
> nfs_client from the data server code without changing the existing
> calls, and is therefore less intrusive.
>
> -->Andy
>>

Ah, that is what I missed.  I did not understand that this was
reverting a previous change.

Fred

* Re: [PATCH 37/40] pnfs-submit wave3 send zero stateid seqid on v4.1 i/o
  2011-02-04 21:33                                                                         ` [PATCH 37/40] pnfs-submit wave3 send zero stateid seqid on v4.1 i/o andros
  2011-02-04 21:34                                                                           ` [PATCH 38/40] pnfs-submit wave3 new flag for state renewal check andros
@ 2011-02-07 17:42                                                                           ` Benny Halevy
  2011-02-09 17:11                                                                             ` William A. (Andy) Adamson
  1 sibling, 1 reply; 58+ messages in thread
From: Benny Halevy @ 2011-02-07 17:42 UTC (permalink / raw)
  To: andros; +Cc: linux-nfs

On 2011-02-04 23:33, andros@netapp.com wrote:
> From: Andy Adamson <andros@netapp.com>
> 
> Data servers require a zero stateid seqid, and there is no advantage to not
> doing the same for all NFSv4.1
> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> ---
>  fs/nfs/nfs4xdr.c |   10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index 4e2c168..2380c45 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -1384,7 +1384,7 @@ static void encode_putrootfh(struct xdr_stream *xdr, struct compound_hdr *hdr)
>  	hdr->replen += decode_putrootfh_maxsz;
>  }
>  
> -static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context *ctx, const struct nfs_lock_context *l_ctx)
> +static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context *ctx, const struct nfs_lock_context *l_ctx, int zero_seqid)

nit: how about bool rather than int for zero_seqid?

Benny
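
For illustration, a user-space sketch of what the bool variant could
look like. The struct is a simplified stand-in for nfs4_stateid
(4-byte seqid followed by 12 opaque bytes), the function name
encode_stateid_sketch is hypothetical, and XDR byte-order handling is
left out, so this is only an approximation of the encoder in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define STATEID_SIZE 16	/* 4-byte seqid + 12-byte "other" */

/* Simplified stand-in for nfs4_stateid. */
struct stateid {
	uint32_t seqid;
	unsigned char other[12];
};

/* Copy a stateid into buf, optionally forcing the seqid to zero.
 * Byte-order conversion is omitted; zero is the same in any order. */
static void encode_stateid_sketch(unsigned char *buf,
				  const struct stateid *src,
				  bool zero_seqid)
{
	struct stateid tmp;

	assert(buf != NULL && src != NULL);
	tmp = *src;
	if (zero_seqid)
		tmp.seqid = 0;
	memcpy(buf, &tmp.seqid, sizeof(tmp.seqid));
	memcpy(buf + sizeof(tmp.seqid), tmp.other, sizeof(tmp.other));
}
```

With a bool parameter, the read and write encoders would presumably
pass something like hdr->minorversion != 0 rather than the raw
minorversion value.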

>  {
>  	nfs4_stateid stateid;
>  	__be32 *p;
> @@ -1392,6 +1392,8 @@ static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context
>  	p = reserve_space(xdr, NFS4_STATEID_SIZE);
>  	if (ctx->state != NULL) {
>  		nfs4_copy_stateid(&stateid, ctx->state, l_ctx->lockowner, l_ctx->pid);
> +		if (zero_seqid)
> +			stateid.stateid.seqid = 0;
>  		xdr_encode_opaque_fixed(p, stateid.data, NFS4_STATEID_SIZE);
>  	} else
>  		xdr_encode_opaque_fixed(p, zero_stateid.data, NFS4_STATEID_SIZE);
> @@ -1404,7 +1406,8 @@ static void encode_read(struct xdr_stream *xdr, const struct nfs_readargs *args,
>  	p = reserve_space(xdr, 4);
>  	*p = cpu_to_be32(OP_READ);
>  
> -	encode_stateid(xdr, args->context, args->lock_context);
> +	encode_stateid(xdr, args->context, args->lock_context,
> +		       hdr->minorversion);
>  
>  	p = reserve_space(xdr, 12);
>  	p = xdr_encode_hyper(p, args->offset);
> @@ -1592,7 +1595,8 @@ static void encode_write(struct xdr_stream *xdr, const struct nfs_writeargs *arg
>  	p = reserve_space(xdr, 4);
>  	*p = cpu_to_be32(OP_WRITE);
>  
> -	encode_stateid(xdr, args->context, args->lock_context);
> +	encode_stateid(xdr, args->context, args->lock_context,
> +		       hdr->minorversion);
>  
>  	p = reserve_space(xdr, 16);
>  	p = xdr_encode_hyper(p, args->offset);

* Re: [PATCH 33/40] pnfs-submit wave3 remove CONFIG_NFS_V4 and V4_1 from nfs_read_data
  2011-02-04 21:33                                                                 ` [PATCH 33/40] pnfs-submit wave3 remove CONFIG_NFS_V4 and V4_1 from nfs_read_data andros
  2011-02-04 21:33                                                                   ` [PATCH 34/40] pnfs-submit wave3 don't use nfs_read_prepare for DS andros
@ 2011-02-08 22:09                                                                   ` Fred Isaman
       [not found]                                                                     ` <AANLkTin_N0rFNr2KzxZ32bpWWUzwJQ4skLnZNVA=W6FQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 58+ messages in thread
From: Fred Isaman @ 2011-02-08 22:09 UTC (permalink / raw)
  To: andros; +Cc: bhalevy, linux-nfs

On Fri, Feb 4, 2011 at 4:33 PM,  <andros@netapp.com> wrote:
> From: Andy Adamson <andros@netapp.com>
>
> Signed-off-by: Andy Adamson <andros@netapp.com>
> ---
>  include/linux/nfs_xdr.h |   12 ++++--------
>  1 files changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 3b2e488..1222aa9 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1016,15 +1016,11 @@ struct nfs_read_data {
>        unsigned int            npages; /* Max length of pagevec */
>        struct nfs_readargs args;
>        struct nfs_readres  res;
> -       struct pnfs_layout_segment *lseg;
> -       struct nfs_client       *ds_clp;   /* pNFS data server */
> -#ifdef CONFIG_NFS_V4
>        unsigned long           timestamp;      /* For lease renewal */
> -#endif
> -#if defined(CONFIG_NFS_V4_1)
> -       const struct rpc_call_ops *call_ops;
> -       __u64                   orig_offset; /* For filelayout dense stripe */
> -#endif /* CONFIG_NFS_V4_1 */
> +       struct pnfs_layout_segment *lseg;
> +       struct nfs_client       *ds_clp;        /* pNFS data server */
> +       const struct rpc_call_ops *call_ops;    /* For pNFS recovery to MDS */

The comment is misleading, as it is used for pretty much all file
layout calls, not just recovery.
Perhaps rename the field to mds_call_ops and drop the comment?

> +       __u64                   orig_offset;    /* Filelayout dense stripe */

Also here, mds_offset might be a better name.

Fred

>        struct page             *page_array[NFS_PAGEVEC_SIZE];
>  };
>
> --
> 1.6.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

* Re: [PATCH 36/40] pnfs-submit wave3 filelayout read done
  2011-02-04 21:33                                                                       ` [PATCH 36/40] pnfs-submit wave3 filelayout read done andros
  2011-02-04 21:33                                                                         ` [PATCH 37/40] pnfs-submit wave3 send zero stateid seqid on v4.1 i/o andros
@ 2011-02-08 23:06                                                                         ` Fred Isaman
  2011-02-09 16:10                                                                           ` William A. (Andy) Adamson
  1 sibling, 1 reply; 58+ messages in thread
From: Fred Isaman @ 2011-02-08 23:06 UTC (permalink / raw)
  To: andros; +Cc: bhalevy, linux-nfs

On Fri, Feb 4, 2011 at 4:33 PM,  <andros@netapp.com> wrote:
> From: Andy Adamson <andros@netapp.com>
>
> Use our own async error handler.
> Mark the layout as failed and retry i/o through the MDS on specified errors.
>
> Signed-off-by: Andy Adamson <andros@netapp.com>
> ---
>  fs/nfs/internal.h           |    1 +
>  fs/nfs/nfs4filelayout.c     |   86 +++++++++++++++++++++++++++++++++++++++++++
>  fs/nfs/nfs4proc.c           |   44 +++++++++++++--------
>  fs/nfs/nfs4state.c          |    1 +
>  fs/nfs/pnfs.h               |    1 -
>  include/linux/nfs_xdr.h     |    1 +
>  include/linux/sunrpc/clnt.h |    1 +
>  net/sunrpc/clnt.c           |    8 ++++
>  8 files changed, 125 insertions(+), 18 deletions(-)
>
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 5518d61..f69a322 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -281,6 +281,7 @@ extern int nfs_migrate_page(struct address_space *,
>  #endif
>
>  /* nfs4proc.c */
> +extern void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data);
>  extern int _nfs4_call_sync(struct nfs_server *server,
>                           struct rpc_message *msg,
>                           struct nfs4_sequence_args *args,
> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
> index 5fd8ed3..777d78b 100644
> --- a/fs/nfs/nfs4filelayout.c
> +++ b/fs/nfs/nfs4filelayout.c
> @@ -40,6 +40,8 @@ MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("Dean Hildebrand <dhildebz@umich.edu>");
>  MODULE_DESCRIPTION("The NFSv4 file layout driver");
>
> +#define FILELAYOUT_POLL_RETRY_MAX     (15*HZ)
> +
>  static int
>  filelayout_set_layoutdriver(struct nfs_server *nfss)
>  {
> @@ -95,6 +97,88 @@ filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
>        BUG();
>  }
>
> +/* For data server errors we don't recover from */
> +static void
> +filelayout_set_lo_fail(struct pnfs_layout_segment *lseg, fmode_t mode)
> +{
> +       if (mode & FMODE_WRITE) {
> +               dprintk("%s Setting layout IOMODE_RW fail bit\n", __func__);
> +               set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
> +       } else if (mode & FMODE_READ) {
> +               dprintk("%s Setting layout IOMODE_READ fail bit\n", __func__);
> +               set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
> +       }
> +}
> +
> +/*
> + * Async I/O error handler.
> + *
> + * NFS4ERR_OLD_STATEID can not occur with a zero stateid seqid.
> + */
> +static int filelayout_async_handle_error(struct rpc_task *task,
> +                                        struct nfs4_state *state,
> +                                        struct nfs_client *clp,
> +                                        int *reset)
> +{
> +       if (task->tk_status >= 0)
> +               return 0;
> +       switch (task->tk_status) {
> +       case -NFS4ERR_BADSESSION:
> +       case -NFS4ERR_BADSLOT:
> +       case -NFS4ERR_BAD_HIGH_SLOT:
> +       case -NFS4ERR_DEADSESSION:
> +       case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
> +       case -NFS4ERR_SEQ_FALSE_RETRY:
> +       case -NFS4ERR_SEQ_MISORDERED:
> +               dprintk("%s ERROR %d, Reset session. Exchangeid "
> +                       "flags 0x%x\n", __func__, task->tk_status,
> +                       clp->cl_exchange_flags);
> +               nfs4_schedule_state_recovery(clp);
> +               task->tk_status = 0;
> +               return -EAGAIN;
> +       case -NFS4ERR_DELAY:
> +       case -NFS4ERR_GRACE:
> +       case -EKEYEXPIRED:
> +               rpc_delay(task, FILELAYOUT_POLL_RETRY_MAX);
> +               task->tk_status = 0;
> +               return -EAGAIN;
> +       default:
> +               dprintk("%s DS error %d\n", __func__, task->tk_status);
> +               /* Layout marked as failed by pnfs_check_io_status.
> +                * Retry I/O through the MDS */
> +               *reset = 1;
> +               task->tk_status = 0;
> +               return -EAGAIN;
> +       }
> +}
> +
> +/* NFS_PROTO call done callback routines */
> +
> +static int filelayout_read_done_cb(struct rpc_task *task,
> +                               struct nfs_read_data *data)
> +{
> +       struct nfs_client *clp = data->ds_clp;
> +       int reset = 0;
> +
> +       dprintk("%s DS read\n", __func__);
> +
> +       if (filelayout_async_handle_error(task, data->args.context->state,
> +                                         data->ds_clp, &reset) == -EAGAIN) {
> +               dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
> +                       __func__, data->ds_clp, data->ds_clp->cl_session);
> +               if (reset) {
> +                       nfs4_reset_read(task, data);
> +                       filelayout_set_lo_fail(data->lseg,
> +                                       data->args.context->state->state);

Why use the open context, instead of just failing read layouts? Do you
really want to prevent all future write layouts too?  Even worse is
the reverse case...if a write layout op fails do you want to prevent
all future read layouts just because the file happened to be open for
read?

If your answer is no, then just send a READ/WRITE bit.

If your answer is yes, then just remove the arg entirely and always
mark both modes as failing.

Fred

> +                       clp = NFS_SERVER(data->inode)->nfs_client;
> +               }
> +               nfs_restart_rpc(task, clp);
> +               return -EAGAIN;
> +       }
> +
> +       return 0;
> +}
> +
>  /*
>  * Call ops for the async read/write cases
>  * In the case of dense layouts, the offset needs to be reset to its
> @@ -104,6 +188,8 @@ static void filelayout_read_prepare(struct rpc_task *task, void *data)
>  {
>        struct nfs_read_data *rdata = (struct nfs_read_data *)data;
>
> +       rdata->read_done_cb = filelayout_read_done_cb;
> +
>        if (nfs41_setup_sequence(rdata->ds_clp->cl_session,
>                                &rdata->args.seq_args, &rdata->res.seq_res,
>                                0, task))
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 3fcf756..9dee49d 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -3075,41 +3075,51 @@ static int nfs4_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
>        return err;
>  }
>
> -static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
> +static int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
>  {
>        struct nfs_server *server = NFS_SERVER(data->inode);
> -       struct nfs_client *clp = server->nfs_client;
> -
> -       dprintk("--> %s\n", __func__);
> -
> -#ifdef CONFIG_NFS_V4_1
> -       /* Is this a DS session */
> -       if (data->ds_clp) {
> -               dprintk("%s DS read\n", __func__);
> -               clp = data->ds_clp;
> -       }
> -#endif /* CONFIG_NFS_V4_1 */
> -
> -       if (!nfs4_sequence_done(task, &data->res.seq_res))
> -               return -EAGAIN;
>
>        if (nfs4_async_handle_error(task, server, data->args.context->state) == -EAGAIN) {
> -               nfs_restart_rpc(task, client);
> +               nfs_restart_rpc(task, server->nfs_client);
>                return -EAGAIN;
>        }
>
>        nfs_invalidate_atime(data->inode);
> -       if (task->tk_status > 0 && !data->ds_clp)
> +       if (task->tk_status > 0)
>                renew_lease(server, data->timestamp);
>        return 0;
>  }
>
> +static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
> +{
> +
> +       dprintk("--> %s\n", __func__);
> +
> +       if (!nfs4_sequence_done(task, &data->res.seq_res))
> +               return -EAGAIN;
> +
> +       return data->read_done_cb(task, data);
> +}
> +
>  static void nfs4_proc_read_setup(struct nfs_read_data *data, struct rpc_message *msg)
>  {
>        data->timestamp   = jiffies;
> +       data->read_done_cb = nfs4_read_done_cb;
>        msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
>  }
>
> +/* Reset the nfs_read_data to send the read to another server. */
> +void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data)
> +{
> +       dprintk("%s Reset task for i/o through \n", __func__);
> +       data->ds_clp = NULL;
> +       data->args.fh     = NFS_FH(data->inode);
> +       data->read_done_cb = nfs4_read_done_cb;
> +       task->tk_ops = data->call_ops;
> +       rpc_task_reset_client(task, NFS_CLIENT(data->inode));
> +}
> +EXPORT_SYMBOL_GPL(nfs4_reset_read);
> +
>  static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
>  {
>        struct inode *inode = data->inode;
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 49433aa..346fb97 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1022,6 +1022,7 @@ void nfs4_schedule_state_recovery(struct nfs_client *clp)
>                set_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
>        nfs4_schedule_state_manager(clp);
>  }
> +EXPORT_SYMBOL_GPL(nfs4_schedule_state_recovery);
>
>  int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_state *state)
>  {
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 6a99c33..218cdfe 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -198,7 +198,6 @@ void pnfs_roc_release(struct inode *ino);
>  void pnfs_roc_set_barrier(struct inode *ino, u32 barrier);
>  bool pnfs_roc_drain(struct inode *ino, u32 *barrier);
>
> -
>  static inline int lo_fail_bit(u32 iomode)
>  {
>        return iomode == IOMODE_RW ?
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 1222aa9..c91f468 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1020,6 +1020,7 @@ struct nfs_read_data {
>        struct pnfs_layout_segment *lseg;
>        struct nfs_client       *ds_clp;        /* pNFS data server */
>        const struct rpc_call_ops *call_ops;    /* For pNFS recovery to MDS */
> +       int (*read_done_cb) (struct rpc_task *task, struct nfs_read_data *data);
>        __u64                   orig_offset;    /* Filelayout dense stripe */
>        struct page             *page_array[NFS_PAGEVEC_SIZE];
>  };
> diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
> index ef9476a..db7bcaf 100644
> --- a/include/linux/sunrpc/clnt.h
> +++ b/include/linux/sunrpc/clnt.h
> @@ -129,6 +129,7 @@ struct rpc_create_args {
>  struct rpc_clnt *rpc_create(struct rpc_create_args *args);
>  struct rpc_clnt        *rpc_bind_new_program(struct rpc_clnt *,
>                                struct rpc_program *, u32);
> +void rpc_task_reset_client(struct rpc_task *task, struct rpc_clnt *clnt);
>  struct rpc_clnt *rpc_clone_client(struct rpc_clnt *);
>  void           rpc_shutdown_client(struct rpc_clnt *);
>  void           rpc_release_client(struct rpc_clnt *);
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 57d344c..5c4df70 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -597,6 +597,14 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
>        }
>  }
>
> +void rpc_task_reset_client(struct rpc_task *task, struct rpc_clnt *clnt)
> +{
> +       rpc_task_release_client(task);
> +       rpc_task_set_client(task, clnt);
> +}
> +EXPORT_SYMBOL_GPL(rpc_task_reset_client);
> +
> +
>  static void
>  rpc_task_set_rpc_message(struct rpc_task *task, const struct rpc_message *msg)
>  {
> --
> 1.6.6
>

* Re: [PATCH 36/40] pnfs-submit wave3 filelayout read done
  2011-02-08 23:06                                                                         ` [PATCH 36/40] pnfs-submit wave3 filelayout read done Fred Isaman
@ 2011-02-09 16:10                                                                           ` William A. (Andy) Adamson
  0 siblings, 0 replies; 58+ messages in thread
From: William A. (Andy) Adamson @ 2011-02-09 16:10 UTC (permalink / raw)
  To: Fred Isaman; +Cc: bhalevy, linux-nfs

Sorry if you got two responses - my mailer to netapp is having trouble
sending mail....

On Tue, Feb 8, 2011 at 6:06 PM, Fred Isaman <iisaman@netapp.com> wrote:
> On Fri, Feb 4, 2011 at 4:33 PM,  <andros@netapp.com> wrote:
>> From: Andy Adamson <andros@netapp.com>
>>
>> Use our own async error handler.
>> Mark the layout as failed and retry i/o through the MDS on specified errors.
>>
>> Signed-off-by: Andy Adamson <andros@netapp.com>
>> ---
>>  fs/nfs/internal.h           |    1 +
>>  fs/nfs/nfs4filelayout.c     |   86 +++++++++++++++++++++++++++++++++++++++++++
>>  fs/nfs/nfs4proc.c           |   44 +++++++++++++--------
>>  fs/nfs/nfs4state.c          |    1 +
>>  fs/nfs/pnfs.h               |    1 -
>>  include/linux/nfs_xdr.h     |    1 +
>>  include/linux/sunrpc/clnt.h |    1 +
>>  net/sunrpc/clnt.c           |    8 ++++
>>  8 files changed, 125 insertions(+), 18 deletions(-)
>>
>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>> index 5518d61..f69a322 100644
>> --- a/fs/nfs/internal.h
>> +++ b/fs/nfs/internal.h
>> @@ -281,6 +281,7 @@ extern int nfs_migrate_page(struct address_space *,
>>  #endif
>>
>>  /* nfs4proc.c */
>> +extern void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data);
>>  extern int _nfs4_call_sync(struct nfs_server *server,
>>                           struct rpc_message *msg,
>>                           struct nfs4_sequence_args *args,
>> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
>> index 5fd8ed3..777d78b 100644
>> --- a/fs/nfs/nfs4filelayout.c
>> +++ b/fs/nfs/nfs4filelayout.c
>> @@ -40,6 +40,8 @@ MODULE_LICENSE("GPL");
>>  MODULE_AUTHOR("Dean Hildebrand <dhildebz@umich.edu>");
>>  MODULE_DESCRIPTION("The NFSv4 file layout driver");
>>
>> +#define FILELAYOUT_POLL_RETRY_MAX     (15*HZ)
>> +
>>  static int
>>  filelayout_set_layoutdriver(struct nfs_server *nfss)
>>  {
>> @@ -95,6 +97,88 @@ filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
>>        BUG();
>>  }
>>
>> +/* For data server errors we don't recover from */
>> +static void
>> +filelayout_set_lo_fail(struct pnfs_layout_segment *lseg, fmode_t mode)
>> +{
>> +       if (mode & FMODE_WRITE) {
>> +               dprintk("%s Setting layout IOMODE_RW fail bit\n", __func__);
>> +               set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
>> +       } else if (mode & FMODE_READ) {
>> +               dprintk("%s Setting layout IOMODE_READ fail bit\n", __func__);
>> +               set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
>> +       }
>> +}
>> +
>> +/*
>> + * Async I/O error handler.
>> + *
>> + * NFS4ERR_OLD_STATEID can not occur with a zero stateid seqid.
>> + */
>> +static int filelayout_async_handle_error(struct rpc_task *task,
>> +                                        struct nfs4_state *state,
>> +                                        struct nfs_client *clp,
>> +                                        int *reset)
>> +{
>> +       if (task->tk_status >= 0)
>> +               return 0;
>> +       switch (task->tk_status) {
>> +       case -NFS4ERR_BADSESSION:
>> +       case -NFS4ERR_BADSLOT:
>> +       case -NFS4ERR_BAD_HIGH_SLOT:
>> +       case -NFS4ERR_DEADSESSION:
>> +       case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
>> +       case -NFS4ERR_SEQ_FALSE_RETRY:
>> +       case -NFS4ERR_SEQ_MISORDERED:
>> +               dprintk("%s ERROR %d, Reset session. Exchangeid "
>> +                       "flags 0x%x\n", __func__, task->tk_status,
>> +                       clp->cl_exchange_flags);
>> +               nfs4_schedule_state_recovery(clp);
>> +               task->tk_status = 0;
>> +               return -EAGAIN;
>> +       case -NFS4ERR_DELAY:
>> +       case -NFS4ERR_GRACE:
>> +       case -EKEYEXPIRED:
>> +               rpc_delay(task, FILELAYOUT_POLL_RETRY_MAX);
>> +               task->tk_status = 0;
>> +               return -EAGAIN;
>> +       default:
>> +               dprintk("%s DS error %d\n", __func__, task->tk_status);
>> +               /* Layout marked as failed by pnfs_check_io_status.
>> +                * Retry I/O through the MDS */
>> +               *reset = 1;
>> +               task->tk_status = 0;
>> +               return -EAGAIN;
>> +       }
>> +}
>> +
>> +/* NFS_PROTO call done callback routines */
>> +
>> +static int filelayout_read_done_cb(struct rpc_task *task,
>> +                               struct nfs_read_data *data)
>> +{
>> +       struct nfs_client *clp = data->ds_clp;
>> +       int reset = 0;
>> +
>> +       dprintk("%s DS read\n", __func__);
>> +
>> +       if (filelayout_async_handle_error(task, data->args.context->state,
>> +                                         data->ds_clp, &reset) == -EAGAIN) {
>> +               dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
>> +                       __func__, data->ds_clp, data->ds_clp->cl_session);
>> +               if (reset) {
>> +                       nfs4_reset_read(task, data);
>> +                       filelayout_set_lo_fail(data->lseg,
>> +                                       data->args.context->state->state);
>
> Why use the open context, instead of just failing read layouts?

I pass in the mode of the open, which is also used to determine the
iomode of the layout.

> Do you
> really want to prevent all future write layouts too?  Even worse is
> the reverse case...if a write layout op fails do you want to prevent
> all future read layouts just because the file happened to be open for
> read?

That is not what this code does. It either fails IOMODE_RW (for
FMODE_WRITE) or IOMODE_READ (for FMODE_READ) layouts, never both.

>
> If your answer is no, then just send a READ/WRITE bit.

What we really want is to fail the layout that was used. So, I can
simply use the iomode of the data->lseg to determine which mode to
fail.

-->Andy
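
A rough user-space sketch of the approach proposed above, failing the
iomode of the layout segment that was actually used. The struct
layouts, the iomode field name, the helper names and the bit positions
are all assumptions made for illustration and are not taken from the
pnfs-submit tree; the IOMODE values follow layoutiomode4 from the
NFSv4.1 specification.

```c
#include <assert.h>
#include <stddef.h>

enum { IOMODE_READ = 1, IOMODE_RW = 2 };	/* pNFS layoutiomode4 values */

/* Hypothetical, trimmed-down layout types. */
struct layout_hdr { unsigned long flags; };
struct layout_segment {
	struct layout_hdr *layout;
	unsigned int iomode;	/* iomode this segment was granted with */
};

/* Assumed bit positions, one per iomode. */
static int lo_fail_bit_sketch(unsigned int iomode)
{
	return iomode == IOMODE_RW ? 1 : 0;
}

/* Fail only the iomode of the segment that carried the I/O. */
static void set_lo_fail_by_lseg(struct layout_segment *lseg)
{
	assert(lseg != NULL && lseg->layout != NULL);
	lseg->layout->flags |= 1UL << lo_fail_bit_sketch(lseg->iomode);
}
```

With this shape the caller no longer needs the open context at all;
the segment itself says which kind of layout failed.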

>
> If your answer is yes, then just remove the arg entirely and always
> mark both modes as failing.
>
> Fred
>
>> +                       clp = NFS_SERVER(data->inode)->nfs_client;
>> +               }
>> +               nfs_restart_rpc(task, clp);
>> +               return -EAGAIN;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>>  /*
>>  * Call ops for the async read/write cases
>>  * In the case of dense layouts, the offset needs to be reset to its
>> @@ -104,6 +188,8 @@ static void filelayout_read_prepare(struct rpc_task *task, void *data)
>>  {
>>        struct nfs_read_data *rdata = (struct nfs_read_data *)data;
>>
>> +       rdata->read_done_cb = filelayout_read_done_cb;
>> +
>>        if (nfs41_setup_sequence(rdata->ds_clp->cl_session,
>>                                &rdata->args.seq_args, &rdata->res.seq_res,
>>                                0, task))
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 3fcf756..9dee49d 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -3075,41 +3075,51 @@ static int nfs4_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
>>        return err;
>>  }
>>
>> -static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
>> +static int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
>>  {
>>        struct nfs_server *server = NFS_SERVER(data->inode);
>> -       struct nfs_client *clp = server->nfs_client;
>> -
>> -       dprintk("--> %s\n", __func__);
>> -
>> -#ifdef CONFIG_NFS_V4_1
>> -       /* Is this a DS session */
>> -       if (data->ds_clp) {
>> -               dprintk("%s DS read\n", __func__);
>> -               clp = data->ds_clp;
>> -       }
>> -#endif /* CONFIG_NFS_V4_1 */
>> -
>> -       if (!nfs4_sequence_done(task, &data->res.seq_res))
>> -               return -EAGAIN;
>>
>>        if (nfs4_async_handle_error(task, server, data->args.context->state) == -EAGAIN) {
>> -               nfs_restart_rpc(task, client);
>> +               nfs_restart_rpc(task, server->nfs_client);
>>                return -EAGAIN;
>>        }
>>
>>        nfs_invalidate_atime(data->inode);
>> -       if (task->tk_status > 0 && !data->ds_clp)
>> +       if (task->tk_status > 0)
>>                renew_lease(server, data->timestamp);
>>        return 0;
>>  }
>>
>> +static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
>> +{
>> +
>> +       dprintk("--> %s\n", __func__);
>> +
>> +       if (!nfs4_sequence_done(task, &data->res.seq_res))
>> +               return -EAGAIN;
>> +
>> +       return data->read_done_cb(task, data);
>> +}
>> +
>>  static void nfs4_proc_read_setup(struct nfs_read_data *data, struct rpc_message *msg)
>>  {
>>        data->timestamp   = jiffies;
>> +       data->read_done_cb = nfs4_read_done_cb;
>>        msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
>>  }
>>
>> +/* Reset the nfs_read_data to send the read to another server. */
>> +void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data)
>> +{
>> +       dprintk("%s Reset task for i/o through \n", __func__);
>> +       data->ds_clp = NULL;
>> +       data->args.fh     = NFS_FH(data->inode);
>> +       data->read_done_cb = nfs4_read_done_cb;
>> +       task->tk_ops = data->call_ops;
>> +       rpc_task_reset_client(task, NFS_CLIENT(data->inode));
>> +}
>> +EXPORT_SYMBOL_GPL(nfs4_reset_read);
>> +
>>  static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
>>  {
>>        struct inode *inode = data->inode;
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index 49433aa..346fb97 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -1022,6 +1022,7 @@ void nfs4_schedule_state_recovery(struct nfs_client *clp)
>>                set_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
>>        nfs4_schedule_state_manager(clp);
>>  }
>> +EXPORT_SYMBOL_GPL(nfs4_schedule_state_recovery);
>>
>>  int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_state *state)
>>  {
>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>> index 6a99c33..218cdfe 100644
>> --- a/fs/nfs/pnfs.h
>> +++ b/fs/nfs/pnfs.h
>> @@ -198,7 +198,6 @@ void pnfs_roc_release(struct inode *ino);
>>  void pnfs_roc_set_barrier(struct inode *ino, u32 barrier);
>>  bool pnfs_roc_drain(struct inode *ino, u32 *barrier);
>>
>> -
>>  static inline int lo_fail_bit(u32 iomode)
>>  {
>>        return iomode == IOMODE_RW ?
>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>> index 1222aa9..c91f468 100644
>> --- a/include/linux/nfs_xdr.h
>> +++ b/include/linux/nfs_xdr.h
>> @@ -1020,6 +1020,7 @@ struct nfs_read_data {
>>        struct pnfs_layout_segment *lseg;
>>        struct nfs_client       *ds_clp;        /* pNFS data server */
>>        const struct rpc_call_ops *call_ops;    /* For pNFS recovery to MDS */
>> +       int (*read_done_cb) (struct rpc_task *task, struct nfs_read_data *data);
>>        __u64                   orig_offset;    /* Filelayout dense stripe */
>>        struct page             *page_array[NFS_PAGEVEC_SIZE];
>>  };
>> diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
>> index ef9476a..db7bcaf 100644
>> --- a/include/linux/sunrpc/clnt.h
>> +++ b/include/linux/sunrpc/clnt.h
>> @@ -129,6 +129,7 @@ struct rpc_create_args {
>>  struct rpc_clnt *rpc_create(struct rpc_create_args *args);
>>  struct rpc_clnt        *rpc_bind_new_program(struct rpc_clnt *,
>>                                struct rpc_program *, u32);
>> +void rpc_task_reset_client(struct rpc_task *task, struct rpc_clnt *clnt);
>>  struct rpc_clnt *rpc_clone_client(struct rpc_clnt *);
>>  void           rpc_shutdown_client(struct rpc_clnt *);
>>  void           rpc_release_client(struct rpc_clnt *);
>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>> index 57d344c..5c4df70 100644
>> --- a/net/sunrpc/clnt.c
>> +++ b/net/sunrpc/clnt.c
>> @@ -597,6 +597,14 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
>>        }
>>  }
>>
>> +void rpc_task_reset_client(struct rpc_task *task, struct rpc_clnt *clnt)
>> +{
>> +       rpc_task_release_client(task);
>> +       rpc_task_set_client(task, clnt);
>> +}
>> +EXPORT_SYMBOL_GPL(rpc_task_reset_client);
>> +
>> +
>>  static void
>>  rpc_task_set_rpc_message(struct rpc_task *task, const struct rpc_message *msg)
>>  {
>> --
>> 1.6.6
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
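
The read_done_cb indirection in the quoted patch can be modelled in user
space roughly as below. This is only a sketch: struct read_req and every
function name in it are hypothetical stand-ins, not the kernel definitions,
and the DS error path is reduced to "any negative status goes back to the
MDS". It shows a common completion entry point dispatching through a
per-request callback, plus a reset step that redirects a failed data server
read to the MDS callback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfs_read_data; not the kernel type. */
struct read_req;
typedef int (*read_done_fn)(struct read_req *req);

struct read_req {
	int status;                 /* models task->tk_status */
	int via_ds;                 /* models data->ds_clp != NULL */
	int lease_renewed;          /* set when the MDS path renews the lease */
	read_done_fn read_done_cb;  /* per-request completion callback */
};

/* MDS completion: renew the lease only when bytes were actually read. */
static int mds_read_done_cb(struct read_req *req)
{
	if (req->status > 0)
		req->lease_renewed = 1;
	return 0;
}

/* Models the reset step: point the request back at the MDS path. */
static void reset_read_to_mds(struct read_req *req)
{
	req->via_ds = 0;
	req->read_done_cb = mds_read_done_cb;
}

/* DS completion (simplified assumption): any error falls back to the MDS. */
static int ds_read_done_cb(struct read_req *req)
{
	if (req->status < 0) {
		reset_read_to_mds(req);
		return -EAGAIN;     /* ask the caller to resend the request */
	}
	return 0;
}

/* Common entry point: shared work would go here, then dispatch. */
static int read_done(struct read_req *req)
{
	assert(req->read_done_cb != NULL);
	return req->read_done_cb(req);
}
```

A request that starts with ds_read_done_cb and fails is switched to
mds_read_done_cb, so the next completion for the same request runs the MDS
path without the common entry point needing to know which server was used.
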

* Re: [PATCH 33/40] pnfs-submit wave3 remove CONFIG_NFS_V4 and V4_1 from nfs_read_data
       [not found]                                                                     ` <AANLkTin_N0rFNr2KzxZ32bpWWUzwJQ4skLnZNVA=W6FQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-02-09 16:11                                                                       ` William A. (Andy) Adamson
  0 siblings, 0 replies; 58+ messages in thread
From: William A. (Andy) Adamson @ 2011-02-09 16:11 UTC (permalink / raw)
  To: Fred Isaman; +Cc: bhalevy, linux-nfs

On Tue, Feb 8, 2011 at 5:09 PM, Fred Isaman <iisaman@netapp.com> wrote:
> On Fri, Feb 4, 2011 at 4:33 PM,  <andros@netapp.com> wrote:
>> From: Andy Adamson <andros@netapp.com>
>>
>> Signed-off-by: Andy Adamson <andros@netapp.com>
>> ---
>>  include/linux/nfs_xdr.h |   12 ++++--------
>>  1 files changed, 4 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>> index 3b2e488..1222aa9 100644
>> --- a/include/linux/nfs_xdr.h
>> +++ b/include/linux/nfs_xdr.h
>> @@ -1016,15 +1016,11 @@ struct nfs_read_data {
>>        unsigned int            npages; /* Max length of pagevec */
>>        struct nfs_readargs args;
>>        struct nfs_readres  res;
>> -       struct pnfs_layout_segment *lseg;
>> -       struct nfs_client       *ds_clp;   /* pNFS data server */
>> -#ifdef CONFIG_NFS_V4
>>        unsigned long           timestamp;      /* For lease renewal */
>> -#endif
>> -#if defined(CONFIG_NFS_V4_1)
>> -       const struct rpc_call_ops *call_ops;
>> -       __u64                   orig_offset; /* For filelayout dense stripe */
>> -#endif /* CONFIG_NFS_V4_1 */
>> +       struct pnfs_layout_segment *lseg;
>> +       struct nfs_client       *ds_clp;        /* pNFS data server */
>> +       const struct rpc_call_ops *call_ops;    /* For pNFS recovery to MDS */
>
> The comment is misleading, as it is used for pretty much all file
> layout calls, not just recovery.
> Perhaps rename the field to mds_call_ops and drop the comment?
>
>> +       __u64                   orig_offset;    /* Filelayout dense stripe */
>
> Also here, mds_offset might be a better name.

Good suggestions.

-->Andy

>
> Fred
>
>>        struct page             *page_array[NFS_PAGEVEC_SIZE];
>>  };
>>
>> --
>> 1.6.6
>>

* Re: [PATCH 37/40] pnfs-submit wave3 send zero stateid seqid on v4.1 i/o
  2011-02-07 17:42                                                                           ` [PATCH 37/40] pnfs-submit wave3 send zero stateid seqid on v4.1 i/o Benny Halevy
@ 2011-02-09 17:11                                                                             ` William A. (Andy) Adamson
  0 siblings, 0 replies; 58+ messages in thread
From: William A. (Andy) Adamson @ 2011-02-09 17:11 UTC (permalink / raw)
  To: Benny Halevy; +Cc: linux-nfs

On Mon, Feb 7, 2011 at 12:42 PM, Benny Halevy <bhalevy@panasas.com> wrote:
> On 2011-02-04 23:33, andros@netapp.com wrote:
>> From: Andy Adamson <andros@netapp.com>
>>
>> Data servers require a zero stateid seqid, and there is no advantage to not
>> doing the same for all NFSv4.1
>>
>> Signed-off-by: Andy Adamson <andros@netapp.com>
>> ---
>>  fs/nfs/nfs4xdr.c |   10 +++++++---
>>  1 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
>> index 4e2c168..2380c45 100644
>> --- a/fs/nfs/nfs4xdr.c
>> +++ b/fs/nfs/nfs4xdr.c
>> @@ -1384,7 +1384,7 @@ static void encode_putrootfh(struct xdr_stream *xdr, struct compound_hdr *hdr)
>>       hdr->replen += decode_putrootfh_maxsz;
>>  }
>>
>> -static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context *ctx, const struct nfs_lock_context *l_ctx)
>> +static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context *ctx, const struct nfs_lock_context *l_ctx, int zero_seqid)
>
> nit: how about bool rather than int for zero_seqid?

At first I thought OK, change it to bool. But then I looked at how it is
set. The zero_seqid parameter is simply compound_hdr->minorversion
passed straight into encode_stateid, which I think is cleaner than making
zero_seqid a bool and having each caller test compound_hdr->minorversion
and pass 'true' when it calls encode_stateid. So I think this is OK
as is.

-->Andy
>
> Benny
>
>>  {
>>       nfs4_stateid stateid;
>>       __be32 *p;
>> @@ -1392,6 +1392,8 @@ static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context
>>       p = reserve_space(xdr, NFS4_STATEID_SIZE);
>>       if (ctx->state != NULL) {
>>               nfs4_copy_stateid(&stateid, ctx->state, l_ctx->lockowner, l_ctx->pid);
>> +             if (zero_seqid)
>> +                     stateid.stateid.seqid = 0;
>>               xdr_encode_opaque_fixed(p, stateid.data, NFS4_STATEID_SIZE);
>>       } else
>>               xdr_encode_opaque_fixed(p, zero_stateid.data, NFS4_STATEID_SIZE);
>> @@ -1404,7 +1406,8 @@ static void encode_read(struct xdr_stream *xdr, const struct nfs_readargs *args,
>>       p = reserve_space(xdr, 4);
>>       *p = cpu_to_be32(OP_READ);
>>
>> -     encode_stateid(xdr, args->context, args->lock_context);
>> +     encode_stateid(xdr, args->context, args->lock_context,
>> +                    hdr->minorversion);
>>
>>       p = reserve_space(xdr, 12);
>>       p = xdr_encode_hyper(p, args->offset);
>> @@ -1592,7 +1595,8 @@ static void encode_write(struct xdr_stream *xdr, const struct nfs_writeargs *arg
>>       p = reserve_space(xdr, 4);
>>       *p = cpu_to_be32(OP_WRITE);
>>
>> -     encode_stateid(xdr, args->context, args->lock_context);
>> +     encode_stateid(xdr, args->context, args->lock_context,
>> +                    hdr->minorversion);
>>
>>       p = reserve_space(xdr, 16);
>>       p = xdr_encode_hyper(p, args->offset);
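
The zero_seqid behaviour discussed above can be sketched in user space as
follows. The struct and function names are hypothetical and the stateid is
reduced to a 4-byte seqid (held here in wire order) plus 12 opaque bytes;
this is an illustration of the idea, not the kernel encoder. As in the
patch, the caller passes the minor version directly as the flag, so any
nonzero minor version zeroes the seqid.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_STATEID_SIZE 16  /* 4-byte seqid + 12-byte "other" field */

/* Hypothetical stand-in for a stateid; seqid is kept in wire byte order. */
struct sketch_stateid {
	uint32_t seqid;
	unsigned char other[12];
};

/*
 * Copy the stateid into the output buffer. When zero_seqid is nonzero
 * (for example, a nonzero minor version), the seqid is sent as zero and
 * the remaining 12 bytes are left untouched.
 */
static void sketch_encode_stateid(unsigned char out[SKETCH_STATEID_SIZE],
				  const struct sketch_stateid *src,
				  int zero_seqid)
{
	struct sketch_stateid tmp = *src;

	if (zero_seqid)
		tmp.seqid = 0;
	memcpy(out, &tmp.seqid, sizeof(tmp.seqid));
	memcpy(out + sizeof(tmp.seqid), tmp.other, sizeof(tmp.other));
}
```

Passing an int here keeps the call sites to one argument, at the cost of the
parameter type not documenting that only zero versus nonzero matters.
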

* Re: [PATCH 0/40] Wave3: For pNFS team review, not for kernel submission
  2011-02-04 21:33 [PATCH 0/40] Wave3: For pNFS team review, not for kernel submission andros
  2011-02-04 21:33 ` [PATCH 01/40] pnfs-submit: wave3: lseg refcounting andros
@ 2011-02-10  5:59 ` Benny Halevy
  2011-02-10 14:17   ` William A. (Andy) Adamson
  1 sibling, 1 reply; 58+ messages in thread
From: Benny Halevy @ 2011-02-10  5:59 UTC (permalink / raw)
  To: andros; +Cc: linux-nfs

I merged these patches into the pnfs-submit-wave3-rev2 branch
in git://linux-nfs.org/~bhalevy/linux-pnfs.git,
and then applied your 15-patch series, which is zero diff from this one,
as pnfs-submit-wave3-rev3.

pnfs-submit-wave3 now points to pnfs-submit-wave3-rev3.

Fred is working on preparing wave4 on top of wave3.
But until we're finished with that, and then with the rest of the tree on top
of it, I forked pnfs-submit and downward from wave3, and it has not changed.

The tree at this point is structured like this:

nfsd41-all
	pnfs-submit-wave3
	pnfs-submit
		pnfs
			...

Benny

On 2011-02-04 23:33, andros@netapp.com wrote:
> The wave3 code addresses pNFS file layout data server connection, data server
> READ I/O and recovery of failed data server READs through the MDS.
> 
> I did not see the pnfs-submit-wave3 branch on benny's tree, so I created my
> own for the meantime.
> I cloned the nfsd41-all from git://linux-nfs.org/~bhalevy/linux-pnfs.git
> which is the base for the pnfs-submit branch.
> I then applied the wave3 patches from benny's pnfs-submit branch,
> and then the changes.
> 
> git://linux-nfs.org/projects/andros/benny-linux-pnfs.git
> branch andros-pnfs-submit-wave3 contains the result.
> 
> ========================================================================
> Please review the changes - I want to submit to Trond/Christoph next week.
> ========================================================================
> 
> These patches are the first 12 in the pnfs-submit tree and are the original
> "wave3" patches.
> 
> 0001-pnfs-submit-wave3-lseg-refcounting.patch
> 0002-pnfs_submit-add-data-server-session-to-nfs4_setup_se.patch
> 0003-pnfs_submit-update-nfs4_async_handle_error-for-data-.patch
> 0004-pnfs_submit-update-state-renewal-for-data-servers.patch
> 0005-pnfs_submit-wave3-pageio-helpers.patch
> 0006-pnfs_submit-wave3-associate-layout-segment-with-nfs_.patch
> 0007-pnfs_submit-filelayout-policy-operations.patch
> 0008-pnfs_submit-filelayout-i-o-helpers.patch
> 0009-pnfs_submit-wave3-generic-read.patch
> 0010-pnfs_submit-filelayout-read.patch
> 0011-pnfs_submit-increase-NFS_MAX_FILE_IO_SIZE.patch
> 0012-pnfs_submit-enforce-requested-DS-only-pNFS-role.patch
> 
> The rest are the wave3 changes.
> 
> Summary of changes:
> -------------------
> 
> 1) The file layout driver now specifies its own rpc_call_prepare and
> rpc_call_done callbacks for READ.
> 
> filelayout_read_prepare:
> - Uses nfs41_setup_sequence so we do not need to change nfs4_setup_sequence().
> 
> filelayout_read_done:
> - Add a read_done_cb function to nfs_read_data that calls nfs4_read_done_cb for
> NFS READs and filelayout_read_done_cb for data server READs.
> - filelayout_read_done_cb has its own async error handler so we do not need to
> change nfs4_async_handle_error().
> 
> 2) DS/MDS dual role now allows for sessions used as a data server to be reused
> as an MDS or NFSv4.1 mount.
> - We don't ask for the DS role on data server EXCHANGE_ID
> - We don't strip any roles returned by the server.
> - If a session is in use as a DS role, and the client subsequently mounts the
> same server as either an MDS or NON_PNFS mount, the same session can be used
> provided the existing exchange flags allow it.
> 
> 3) We always send a zero READ/WRITE stateid seqid. This is required for
> data servers, and there is no advantage to not doing it for MDS or NON_PNFS
> mounts.
> 
> 4) We mark the deviceid as invalid upon any data server connection failure
> and print out a kernel message.
> This in turn marks any layout that tries to use the deviceid as failed for
> both IOMODE_READ and IOMODE_RW. Inodes without layouts will still send
> a layoutget. If the resultant layout uses the marked deviceid, it will be
> marked as failed for both iomodes. All I/O will go through the MDS until
> a client reboot or a CB_LAYOUTRECALL ALL or FSID removes all layouts that
> refer to the deviceid, which removes the deviceid.
> 
> 5) Our new file layout async error handler only recovers from session
> related errors, or grace/delay errors. All other errors including
> NFS4ERR_EXPIRED or NFS4ERR_STALE_CLIENTID result in marking the layout as
> failed for IOMODE_READ and I/O is retried through the MDS.
> 
> 6) Fred's lock inversion patches, and the request by Trond to not reference
> a layout segment on dirty pages held in the cache changed the layout
> segment reference counting.
> 
> There are a couple of small issues I'm still investigating. Trond and Fred
> have done an initial review.
> 
> -->Andy
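
Item 5 of the summary above describes an error policy that can be sketched
in user space like this. The enum, struct and flag names are hypothetical,
and which status codes count as "session related" is an assumption made for
the sketch (the summary does not list them); the numeric values are the
NFSv4.1 status codes from RFC 5661. Session and grace/delay errors are
retried against the data server; everything else, including
NFS4ERR_EXPIRED and NFS4ERR_STALE_CLIENTID, marks the layout as failed for
reads so that I/O is retried through the MDS.

```c
#include <assert.h>
#include <stddef.h>

/* NFSv4.1 status codes (values from RFC 5661). */
#define NFS4ERR_DELAY           10008
#define NFS4ERR_EXPIRED         10011
#define NFS4ERR_GRACE           10013
#define NFS4ERR_STALE_CLIENTID  10022
#define NFS4ERR_BADSESSION      10052
#define NFS4ERR_BADSLOT         10053
#define NFS4ERR_SEQ_MISORDERED  10063

/* Hypothetical: what the sketch handler tells its caller to do next. */
enum ds_read_action {
	DS_READ_OK,
	DS_READ_RECOVER_SESSION,
	DS_READ_RETRY_AFTER_DELAY,
	DS_READ_RETRY_VIA_MDS,
};

/* Hypothetical per-layout failure flag for the read iomode. */
#define SKETCH_LO_FAIL_READ 0x1u

struct sketch_layout {
	unsigned int fail_flags;
};

/* Classify a data server READ status and record layout failure if needed. */
static enum ds_read_action
sketch_ds_read_error(int status, struct sketch_layout *lo)
{
	assert(lo != NULL);

	switch (status) {
	case 0:
		return DS_READ_OK;
	case NFS4ERR_BADSESSION:        /* assumed set of session errors */
	case NFS4ERR_BADSLOT:
	case NFS4ERR_SEQ_MISORDERED:
		return DS_READ_RECOVER_SESSION;
	case NFS4ERR_DELAY:
	case NFS4ERR_GRACE:
		return DS_READ_RETRY_AFTER_DELAY;
	default:
		/* Includes NFS4ERR_EXPIRED and NFS4ERR_STALE_CLIENTID. */
		lo->fail_flags |= SKETCH_LO_FAIL_READ;
		return DS_READ_RETRY_VIA_MDS;
	}
}

/* Later reads consult the flag before trying the data server again. */
static int sketch_use_ds_for_read(const struct sketch_layout *lo)
{
	return !(lo->fail_flags & SKETCH_LO_FAIL_READ);
}
```

Keeping the failure flag on the layout rather than on the request means one
unrecoverable error steers all later reads on that layout to the MDS.
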
> 

* Re: [PATCH 0/40] Wave3: For pNFS team review, not for kernel submission
  2011-02-10  5:59 ` [PATCH 0/40] Wave3: For pNFS team review, not for kernel submission Benny Halevy
@ 2011-02-10 14:17   ` William A. (Andy) Adamson
  0 siblings, 0 replies; 58+ messages in thread
From: William A. (Andy) Adamson @ 2011-02-10 14:17 UTC (permalink / raw)
  To: Benny Halevy; +Cc: linux-nfs

Thanks Benny, that's awesome :)

-->Andy


end of thread, other threads:[~2011-02-10 14:17 UTC | newest]

Thread overview: 58+ messages
2011-02-04 21:33 [PATCH 0/40] Wave3: For pNFS team review, not for kernel submission andros
2011-02-04 21:33 ` [PATCH 01/40] pnfs-submit: wave3: lseg refcounting andros
2011-02-04 21:33   ` [PATCH 02/40] pnfs_submit: add data server session to nfs4_setup_sequence andros
2011-02-04 21:33     ` [PATCH 03/40] pnfs_submit: update nfs4_async_handle_error for data server andros
2011-02-04 21:33       ` [PATCH 04/40] pnfs_submit: update state renewal for data servers andros
2011-02-04 21:33         ` [PATCH 05/40] pnfs_submit: wave3 pageio-helpers andros
2011-02-04 21:33           ` [PATCH 06/40] pnfs_submit: wave3 associate layout segment with nfs_page andros
2011-02-04 21:33             ` [PATCH 07/40] pnfs_submit: filelayout policy operations andros
2011-02-04 21:33               ` [PATCH 08/40] pnfs_submit: filelayout i/o helpers andros
2011-02-04 21:33                 ` [PATCH 09/40] pnfs_submit: wave3 generic read andros
2011-02-04 21:33                   ` [PATCH 10/40] pnfs_submit: filelayout read andros
2011-02-04 21:33                     ` [PATCH 11/40] pnfs_submit: increase NFS_MAX_FILE_IO_SIZE andros
2011-02-04 21:33                       ` [PATCH 12/40] pnfs_submit: enforce requested DS only pNFS role andros
2011-02-04 21:33                         ` [PATCH 13/40] REVERT pnfs_submit-add-data-server-session-to-nfs4_setup_s.patch andros
2011-02-04 21:33                           ` [PATCH 14/40] REVERT: pnfs_submit: update nfs4_async_handle_error for data server andros
2011-02-04 21:33                             ` [PATCH 15/40] REVERT pnfs_submit: increase NFS_MAX_FILE_IO_SIZE andros
2011-02-04 21:33                               ` [PATCH 16/40] REVERT pnfs_submit: enforce requested DS only pNFS role andros
2011-02-04 21:33                                 ` [PATCH 17/40] SQUASHME pnfs-submit wave3 remove is_ds_only_session andros
2011-02-04 21:33                                   ` [PATCH 18/40] SQUASHME pnfs-submit: wave3 make pnfs_initiate_read static andros
2011-02-04 21:33                                     ` [PATCH 19/40] SQUASHME pnfs-submit wave3 filelayout read pagelist cleanup andros
2011-02-04 21:33                                       ` [PATCH 20/40] SQUASHME pnfs-submit wave3 remove nr_pages from read_pagelist andros
2011-02-04 21:33                                         ` [PATCH 21/40] SQUASHME pnfs-submit wave3 add comment to nfs4_fl_prepare_ds_fh andros
2011-02-04 21:33                                           ` [PATCH 22/40] SQUASHME pnfs-submit wave3 move BUG outside of switch andros
2011-02-04 21:33                                             ` [PATCH 23/40] SQUASHME pnfs-submit wave3 new function for ds expired lease andros
2011-02-04 21:33                                               ` [PATCH 24/40] NFS move nfs_client initialization into nfs_get_client andros
2011-02-04 21:33                                                 ` [PATCH 25/40] pnfs-submit: wave3 refactor dataserver client setup andros
2011-02-04 21:33                                                   ` [PATCH 26/40] pnfs-submit: wave3 refactor data server session initialization andros
2011-02-04 21:33                                                     ` [PATCH 27/40] pnfs_submit: wave3 rename nfs4_pnfs_ds_create andros
2011-02-04 21:33                                                       ` [PATCH 28/40] pnfs-submit: wave3 turn off pNFS on ds connection failure andros
2011-02-04 21:33                                                         ` [PATCH 29/40] pnfs-submit: wave3 rewrite read lseg refcounting andros
2011-02-04 21:33                                                           ` [PATCH 30/40] pnfs-submit: wave3 let LAYOUTGET distinguish between read and write calls andros
2011-02-04 21:33                                                             ` [PATCH 31/40] pnfs_submit wave3 remove struct pnfs_fl_call_data andros
2011-02-04 21:33                                                               ` [PATCH 32/40] pnfs_submit: wave3 get rid of pnfs_call_data andros
2011-02-04 21:33                                                                 ` [PATCH 33/40] pnfs-submit wave3 remove CONFIG_NFS_V4 and V4_1 from nfs_read_data andros
2011-02-04 21:33                                                                   ` [PATCH 34/40] pnfs-submit wave3 don't use nfs_read_prepare for DS andros
2011-02-04 21:33                                                                     ` [PATCH 35/40] pnfs_submit wave3 filelayout_read_prepare andros
2011-02-04 21:33                                                                       ` [PATCH 36/40] pnfs-submit wave3 filelayout read done andros
2011-02-04 21:33                                                                         ` [PATCH 37/40] pnfs-submit wave3 send zero stateid seqid on v4.1 i/o andros
2011-02-04 21:34                                                                           ` [PATCH 38/40] pnfs-submit wave3 new flag for state renewal check andros
2011-02-04 21:34                                                                             ` [PATCH 39/40] pnfs-submit wave3 new flag for lease time check andros
2011-02-04 21:34                                                                               ` [PATCH 40/40] pnfs-submit wave3 add MDS mount DS only check andros
2011-02-07 17:42                                                                           ` [PATCH 37/40] pnfs-submit wave3 send zero stateid seqid on v4.1 i/o Benny Halevy
2011-02-09 17:11                                                                             ` William A. (Andy) Adamson
2011-02-08 23:06                                                                         ` [PATCH 36/40] pnfs-submit wave3 filelayout read done Fred Isaman
2011-02-09 16:10                                                                           ` William A. (Andy) Adamson
2011-02-08 22:09                                                                   ` [PATCH 33/40] pnfs-submit wave3 remove CONFIG_NFS_V4 and V4_1 from nfs_read_data Fred Isaman
     [not found]                                                                     ` <AANLkTin_N0rFNr2KzxZ32bpWWUzwJQ4skLnZNVA=W6FQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-02-09 16:11                                                                       ` William A. (Andy) Adamson
2011-02-04 21:59                                                             ` [PATCH 30/40] pnfs-submit: wave3 let LAYOUTGET distinguish between read and write calls Fred Isaman
2011-02-05 16:47                                                               ` William A. (Andy) Adamson
2011-02-04 21:51                                               ` [PATCH 23/40] SQUASHME pnfs-submit wave3 new function for ds expired lease Fred Isaman
2011-02-05 16:46                                                 ` William A. (Andy) Adamson
2011-02-06 19:41                                                   ` Fred Isaman
2011-02-07 15:05                                                     ` William A. (Andy) Adamson
2011-02-07 15:29                                                       ` Fred Isaman
2011-02-04 21:44                                       ` [PATCH 19/40] SQUASHME pnfs-submit wave3 filelayout read pagelist cleanup Fred Isaman
2011-02-05 16:47                                         ` William A. (Andy) Adamson
2011-02-10  5:59 ` [PATCH 0/40] Wave3: For pNFS team review, not for kernel submission Benny Halevy
2011-02-10 14:17   ` William A. (Andy) Adamson
