* [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover
@ 2012-04-27 21:53 andros
  2012-04-27 21:53 ` [PATCH Version 4 01/13] NFSv4.1 do not send LAYOUTRETURN when there are no layout segments andros
                   ` (12 more replies)
  0 siblings, 13 replies; 15+ messages in thread
From: andros @ 2012-04-27 21:53 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@rhel-62.androsad.fake>

This is an RFC. The patches will be tested thoroughly next week. They
are based upon Fred Isaman's direct I/O patches.

-->Andy

Changes from Version 3:

- added module parameters for setting the DS timeo and retrans
- rewrote filelayout_reset_read/write using pnfs_read/write_done_resend_to_mds()
- simply check the plh_segs list and, if it is empty, do not send a LAYOUTRETURN
- moved nfs_put_client outside of spin_lock in nfs4_ds_disconnect
- added NFSv4.1 resend LAYOUTGET on data server invalid layout errors

Currently, when a data server connection goes down due to a network partition,
a data server failure, or an administrative action, RPC tasks in various
stages of the RPC finite state machine (FSM) need to transmit and timeout
(or other failure) before being redirected towards an alternative server
(MDS or another DS).
This can take a very long time if the connection goes down during a heavy
I/O load where the data server fore channel session slot_tbl_waitq and the
transport sending/pending waitqs are populated with many requests.
(see Red Hat Bugzilla 756212 "Redirecting I/O through the MDS after a data
server network partition is very slow")
The current code also keeps the client structure and the session to the failed
data server until umount.

The module parameters dataserver_timeo and dataserver_retrans are the data
server equivalents of the timeo and retrans mount parameters. They determine
how long the client waits to recover from a DS disconnect error, i.e., how
long the client waits before it begins recovery to the MDS, and how long the
recovery takes, which can be up to (timeo * retrans * number of DS session slots).
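
A quick worked example of that upper bound, as a C sketch. It assumes timeo
is in tenths of a second (as the dataserver_timeo description in patch 05
states) and, for illustration only, a DS session with 16 fore channel slots;
the slot count and the model_* helper names are hypothetical, not taken from
the patches.

```c
#include <assert.h>

/*
 * Worst-case DS recovery bound from the cover letter:
 *   timeo * retrans * number of DS session slots
 * timeo is in tenths of a second, so the product is too.
 * These helpers are hypothetical, not kernel code.
 */
static unsigned long model_ds_recovery_bound_tenths(unsigned int timeo,
						    unsigned int retrans,
						    unsigned int slots)
{
	assert(slots > 0);	/* a session always has at least one slot */
	return (unsigned long)timeo * retrans * slots;
}

/* Same bound in whole seconds (truncated). */
static unsigned long model_ds_recovery_bound_secs(unsigned int timeo,
						  unsigned int retrans,
						  unsigned int slots)
{
	return model_ds_recovery_bound_tenths(timeo, retrans, slots) / 10;
}
```

With the defaults from patch 05 (timeo 60, retrans 5) and the assumed 16
slots, the bound is 60 * 5 * 16 = 4800 tenths of a second, i.e. 480 seconds.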
 
These patches address this problem by setting data server RPC tasks to
RPC_TASK_SOFTCONN and handling the resultant connection errors as follows:

On a DS disconnect error, the pNFS deviceid is marked invalid which blocks any
new pNFS io using that deviceid. The RPC done routines for READ, WRITE and
COMMIT redirect failed requests to the MDS. The read/write RPC prepare
routines redirect the tasks that are awakened from the data server session
fore channel slot_tbl_waitq.
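
The redirect of tasks woken from the slot table waitq can be modeled in a few
lines of C. This is a hypothetical userspace model, not the kernel's rpc_task
or rpc_wait_queue: on a DS disconnect the deviceid is invalidated and every
parked task is woken; each woken task re-checks the deviceid in its prepare
step and goes to the MDS, while a task already sent to the DS is left for its
done routine to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's rpc_task/rpc_wait_queue. */
enum model_target { MODEL_TARGET_NONE, MODEL_TARGET_DS, MODEL_TARGET_MDS };

struct model_task {
	bool waiting;			/* parked on the session slot_tbl_waitq */
	enum model_target target;	/* where the I/O was finally sent */
};

struct model_devid {
	bool invalid;			/* stands in for NFS_DEVICEID_INVALID */
};

/* Prepare step run when a task is woken: re-check the deviceid. */
static void model_prepare(struct model_task *t, const struct model_devid *d)
{
	t->waiting = false;
	t->target = d->invalid ? MODEL_TARGET_MDS : MODEL_TARGET_DS;
}

/* DS disconnect: invalidate the deviceid, then wake every parked task. */
static void model_ds_disconnect(struct model_devid *d,
				struct model_task *tasks, size_t n)
{
	assert(d != NULL);
	d->invalid = true;
	for (size_t i = 0; i < n; i++)
		if (tasks[i].waiting)
			model_prepare(&tasks[i], d);
}
```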

All data server I/O requests hold a reference to the data server client
structure for the duration of the I/O, and the long-lived reference to the
client is dropped upon deviceid invalidation, so that the client (and its
session) is freed when the last (failed) redirected I/O completes.
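
The reference counting described above follows the usual get/put pattern. A
minimal sketch with C11 atomics, assuming one reference held on behalf of the
deviceid plus one per in-flight I/O; struct model_ds_client and its helpers
are hypothetical stand-ins for the nfs_client handling in patches 11 and 12.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DS nfs_client; not the kernel struct. */
struct model_ds_client {
	atomic_int refcount;
	bool *freed;		/* lets the caller observe the final free */
};

static struct model_ds_client *model_client_new(bool *freed_flag)
{
	struct model_ds_client *clp = malloc(sizeof(*clp));

	if (!clp)
		return NULL;
	atomic_init(&clp->refcount, 1);	/* reference held via the deviceid */
	clp->freed = freed_flag;
	*freed_flag = false;
	return clp;
}

/* Taken when an I/O is sent to the data server. */
static void model_client_get(struct model_ds_client *clp)
{
	atomic_fetch_add(&clp->refcount, 1);
}

/* Drop one reference; the last put frees the client (and its session). */
static void model_client_put(struct model_ds_client *clp)
{
	assert(atomic_load(&clp->refcount) > 0);
	if (atomic_fetch_sub(&clp->refcount, 1) == 1) {
		*clp->freed = true;
		free(clp);
	}
}
```

Dropping the deviceid's reference on invalidation does not free the client
while I/O is still in flight; the final put from the last redirected I/O does.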

Testing:
I test with a pynfs file layout server and a DS. The pynfs server and DS
are modified to use the local host for MDS-to-DS communication. I add a
second IPv4 address to the single machine interface for DS-to-client
communication. While a "dd" or a read/write-heavy Connectathon test is
running, the DS IP address is removed from the Ethernet interface, and the
time the removal of the DS IP address during a DS COMMIT and have seen it
recover as well. :)


Andy Adamson (13):
  NFSv4.1 do not send LAYOUTRETURN when there are no layout segments
  NFSv4.1: cleanup filelayout invalid deviceid handling
  NFSv4.1 cleanup filelayout invalid layout handling
  NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls
  NFSv4.1 data server timeo and retrans module parameters
  NFSv4.1: mark deviceid invalid on filelayout DS connection errors
  NFSv4.1 remove nfs4_reset_write and nfs4_reset_read
  NFSv4.1 Check invalid deviceid upon slot table waitq wakeup
  NFSv4.1 wake up all tasks on un-connected DS slot table waitq
  NFSv4.1 send layoutreturn to fence disconnected data server
  NFSv4.1 ref count nfs_client across filelayout data server io
  NFSv4.1 dereference a disconnected data server client record
  NFSv4.1 resend LAYOUTGET on data server invalid layout errors

 fs/nfs/client.c            |   12 +--
 fs/nfs/internal.h          |   12 +-
 fs/nfs/nfs4filelayout.c    |  237 +++++++++++++++++++++++++++++++-------------
 fs/nfs/nfs4filelayout.h    |   45 ++++++++-
 fs/nfs/nfs4filelayoutdev.c |   77 +++++++++-----
 fs/nfs/nfs4proc.c          |   35 -------
 fs/nfs/pnfs.c              |   10 ++-
 fs/nfs/pnfs.h              |    5 +
 fs/nfs/read.c              |    6 +-
 fs/nfs/write.c             |   13 ++-
 10 files changed, 291 insertions(+), 161 deletions(-)



* [PATCH Version 4 01/13] NFSv4.1 do not send LAYOUTRETURN when there are no layout segments
  2012-04-27 21:53 [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover andros
@ 2012-04-27 21:53 ` andros
  2012-05-19 21:18   ` Myklebust, Trond
  2012-04-27 21:53 ` [PATCH Version 4 02/13] NFSv4.1: cleanup filelayout invalid deviceid handling andros
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 15+ messages in thread
From: andros @ 2012-04-27 21:53 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/pnfs.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 5e11557..463eb2f 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -657,7 +657,7 @@ _pnfs_return_layout(struct inode *ino)
 
 	spin_lock(&ino->i_lock);
 	lo = nfsi->layout;
-	if (!lo) {
+	if (!lo || list_empty(&lo->plh_segs)) {
 		spin_unlock(&ino->i_lock);
 		dprintk("%s: no layout to return\n", __func__);
 		return status;
-- 
1.7.6.4



* [PATCH Version 4 02/13] NFSv4.1: cleanup filelayout invalid deviceid handling
  2012-04-27 21:53 [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover andros
  2012-04-27 21:53 ` [PATCH Version 4 01/13] NFSv4.1 do not send LAYOUTRETURN when there are no layout segments andros
@ 2012-04-27 21:53 ` andros
  2012-04-27 21:53 ` [PATCH Version 4 03/13] NFSv4.1 cleanup filelayout invalid layout handling andros
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: andros @ 2012-04-27 21:53 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Move the invalid deviceid test into nfs4_fl_prepare_ds, called by the
filelayout read, write, and commit routines. NFS4_DEVICE_ID_NEG_ENTRY
is no longer needed.
Remove redundant printks; filelayout_mark_devid_invalid already prints a
KERN_WARNING.

An invalid deviceid prevents pNFS I/O.
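
The control flow of the reworked nfs4_fl_prepare_ds() (check the invalid bit
first, then funnel both a missing data server and a failed connect into one
mark_dev_invalid label) is sketched below in userspace C. The model_* names,
the flag value, and the connect_err field standing in for nfs4_ds_connect()
are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_DEVICEID_INVALID 0x1UL	/* stands in for NFS_DEVICEID_INVALID */

struct model_devid_node {
	atomic_ulong flags;
};

struct model_ds {
	bool connected;
	int connect_err;	/* result the hypothetical connect will return */
};

static bool model_test_devid_invalid(struct model_devid_node *node)
{
	return atomic_load(&node->flags) & MODEL_DEVICEID_INVALID;
}

static void model_mark_devid_invalid(struct model_devid_node *node)
{
	fprintf(stderr, "NFS: Deviceid marked out of use.\n");
	atomic_fetch_or(&node->flags, MODEL_DEVICEID_INVALID);
}

/* Hypothetical connect: 0 on success, negative error on failure. */
static int model_ds_connect(struct model_ds *ds)
{
	if (ds->connect_err == 0)
		ds->connected = true;
	return ds->connect_err;
}

/* Returns the usable DS, or NULL so the caller falls back to the MDS. */
static struct model_ds *model_prepare_ds(struct model_devid_node *devid,
					 struct model_ds *ds)
{
	assert(devid != NULL);
	if (model_test_devid_invalid(devid))
		return NULL;
	if (ds == NULL)
		goto mark_dev_invalid;	/* no data server for this index */
	if (!ds->connected && model_ds_connect(ds) != 0)
		goto mark_dev_invalid;
	return ds;

mark_dev_invalid:
	model_mark_devid_invalid(devid);
	return NULL;
}
```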

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c    |   10 ----------
 fs/nfs/nfs4filelayout.h    |   21 +++++++++++++++++----
 fs/nfs/nfs4filelayoutdev.c |   37 +++++++++++--------------------------
 3 files changed, 28 insertions(+), 40 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 80a63f6..eebec9a 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -389,9 +389,6 @@ filelayout_read_pagelist(struct nfs_read_data *data)
 		__func__, hdr->inode->i_ino,
 		data->args.pgbase, (size_t)data->args.count, offset);
 
-	if (test_bit(NFS_DEVICEID_INVALID, &FILELAYOUT_DEVID_NODE(lseg)->flags))
-		return PNFS_NOT_ATTEMPTED;
-
 	/* Retrieve the correct rpc_client for the byte range */
 	j = nfs4_fl_calc_j_index(lseg, offset);
 	idx = nfs4_fl_calc_ds_index(lseg, j);
@@ -432,16 +429,11 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)
 	struct nfs_fh *fh;
 	int status;
 
-	if (test_bit(NFS_DEVICEID_INVALID, &FILELAYOUT_DEVID_NODE(lseg)->flags))
-		return PNFS_NOT_ATTEMPTED;
-
 	/* Retrieve the correct rpc_client for the byte range */
 	j = nfs4_fl_calc_j_index(lseg, offset);
 	idx = nfs4_fl_calc_ds_index(lseg, j);
 	ds = nfs4_fl_prepare_ds(lseg, idx);
 	if (!ds) {
-		printk(KERN_ERR "NFS: %s: prepare_ds failed, use MDS\n",
-			__func__);
 		set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
 		set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
 		return PNFS_NOT_ATTEMPTED;
@@ -977,8 +969,6 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
 	idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
 	ds = nfs4_fl_prepare_ds(lseg, idx);
 	if (!ds) {
-		printk(KERN_ERR "NFS: %s: prepare_ds failed, use MDS\n",
-			__func__);
 		set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
 		set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
 		prepare_to_resend_writes(data);
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 96b89bb..2f6330c 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -62,12 +62,8 @@ struct nfs4_pnfs_ds {
 	atomic_t		ds_count;
 };
 
-/* nfs4_file_layout_dsaddr flags */
-#define NFS4_DEVICE_ID_NEG_ENTRY	0x00000001
-
 struct nfs4_file_layout_dsaddr {
 	struct nfs4_deviceid_node	id_node;
-	unsigned long			flags;
 	u32				stripe_count;
 	u8				*stripe_indices;
 	u32				ds_num;
@@ -111,6 +107,23 @@ FILELAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg)
 	return &FILELAYOUT_LSEG(lseg)->dsaddr->id_node;
 }
 
+static inline void
+filelayout_mark_devid_invalid(struct nfs4_deviceid_node *node)
+{
+	u32 *p = (u32 *)&node->deviceid;
+
+	printk(KERN_WARNING "NFS: Deviceid [%x%x%x%x] marked out of use.\n",
+		p[0], p[1], p[2], p[3]);
+
+	set_bit(NFS_DEVICEID_INVALID, &node->flags);
+}
+
+static inline bool
+filelayout_test_devid_invalid(struct nfs4_deviceid_node *node)
+{
+	return test_bit(NFS_DEVICEID_INVALID, &node->flags);
+}
+
 extern struct nfs_fh *
 nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j);
 
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index a866bbd..2b8ae96 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -791,48 +791,33 @@ nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j)
 	return flseg->fh_array[i];
 }
 
-static void
-filelayout_mark_devid_negative(struct nfs4_file_layout_dsaddr *dsaddr,
-			       int err, const char *ds_remotestr)
-{
-	u32 *p = (u32 *)&dsaddr->id_node.deviceid;
-
-	printk(KERN_ERR "NFS: data server %s connection error %d."
-		" Deviceid [%x%x%x%x] marked out of use.\n",
-		ds_remotestr, err, p[0], p[1], p[2], p[3]);
-
-	spin_lock(&nfs4_ds_cache_lock);
-	dsaddr->flags |= NFS4_DEVICE_ID_NEG_ENTRY;
-	spin_unlock(&nfs4_ds_cache_lock);
-}
-
 struct nfs4_pnfs_ds *
 nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
 {
 	struct nfs4_file_layout_dsaddr *dsaddr = FILELAYOUT_LSEG(lseg)->dsaddr;
 	struct nfs4_pnfs_ds *ds = dsaddr->ds_list[ds_idx];
+	struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
+
+	if (filelayout_test_devid_invalid(devid))
+		return NULL;
 
 	if (ds == NULL) {
 		printk(KERN_ERR "NFS: %s: No data server for offset index %d\n",
 			__func__, ds_idx);
-		return NULL;
+		goto mark_dev_invalid;
 	}
 
 	if (!ds->ds_clp) {
 		struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode);
 		int err;
 
-		if (dsaddr->flags & NFS4_DEVICE_ID_NEG_ENTRY) {
-			/* Already tried to connect, don't try again */
-			dprintk("%s Deviceid marked out of use\n", __func__);
-			return NULL;
-		}
 		err = nfs4_ds_connect(s, ds);
-		if (err) {
-			filelayout_mark_devid_negative(dsaddr, err,
-						       ds->ds_remotestr);
-			return NULL;
-		}
+		if (err)
+			goto mark_dev_invalid;
 	}
 	return ds;
+
+mark_dev_invalid:
+	filelayout_mark_devid_invalid(devid);
+	return NULL;
 }
-- 
1.7.6.4



* [PATCH Version 4 03/13] NFSv4.1 cleanup filelayout invalid layout handling
  2012-04-27 21:53 [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover andros
  2012-04-27 21:53 ` [PATCH Version 4 01/13] NFSv4.1 do not send LAYOUTRETURN when there are no layout segments andros
  2012-04-27 21:53 ` [PATCH Version 4 02/13] NFSv4.1: cleanup filelayout invalid deviceid handling andros
@ 2012-04-27 21:53 ` andros
  2012-04-27 21:53 ` [PATCH Version 4 04/13] NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls andros
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: andros @ 2012-04-27 21:53 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

The invalid layout bits should only be used to block LAYOUTGETs.

Do not invalidate a layout on deviceid invalidation.
Do not invalidate a layout on unhandled READ, WRITE, or COMMIT errors.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |   28 ++++++----------------------
 1 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index eebec9a..b9edc88 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -148,7 +148,6 @@ wait_on_recovery:
 static int filelayout_read_done_cb(struct rpc_task *task,
 				struct nfs_read_data *data)
 {
-	struct nfs_pgio_header *hdr = data->header;
 	int reset = 0;
 
 	dprintk("%s DS read\n", __func__);
@@ -157,10 +156,8 @@ static int filelayout_read_done_cb(struct rpc_task *task,
 					  data->ds_clp, &reset) == -EAGAIN) {
 		dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
 			__func__, data->ds_clp, data->ds_clp->cl_session);
-		if (reset) {
-			pnfs_set_lo_fail(hdr->lseg);
+		if (reset)
 			nfs4_reset_read(task, data);
-		}
 		rpc_restart_call_prepare(task);
 		return -EAGAIN;
 	}
@@ -233,17 +230,14 @@ static void filelayout_read_release(void *data)
 static int filelayout_write_done_cb(struct rpc_task *task,
 				struct nfs_write_data *data)
 {
-	struct nfs_pgio_header *hdr = data->header;
 	int reset = 0;
 
 	if (filelayout_async_handle_error(task, data->args.context->state,
 					  data->ds_clp, &reset) == -EAGAIN) {
 		dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
 			__func__, data->ds_clp, data->ds_clp->cl_session);
-		if (reset) {
-			pnfs_set_lo_fail(hdr->lseg);
+		if (reset)
 			nfs4_reset_write(task, data);
-		}
 		rpc_restart_call_prepare(task);
 		return -EAGAIN;
 	}
@@ -272,10 +266,9 @@ static int filelayout_commit_done_cb(struct rpc_task *task,
 					  data->ds_clp, &reset) == -EAGAIN) {
 		dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
 			__func__, data->ds_clp, data->ds_clp->cl_session);
-		if (reset) {
+		if (reset)
 			prepare_to_resend_writes(data);
-			pnfs_set_lo_fail(data->lseg);
-		} else
+		else
 			rpc_restart_call_prepare(task);
 		return -EAGAIN;
 	}
@@ -393,12 +386,8 @@ filelayout_read_pagelist(struct nfs_read_data *data)
 	j = nfs4_fl_calc_j_index(lseg, offset);
 	idx = nfs4_fl_calc_ds_index(lseg, j);
 	ds = nfs4_fl_prepare_ds(lseg, idx);
-	if (!ds) {
-		/* Either layout fh index faulty, or ds connect failed */
-		set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
-		set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
+	if (!ds)
 		return PNFS_NOT_ATTEMPTED;
-	}
 	dprintk("%s USE DS: %s\n", __func__, ds->ds_remotestr);
 
 	/* No multipath support. Use first DS */
@@ -433,11 +422,8 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)
 	j = nfs4_fl_calc_j_index(lseg, offset);
 	idx = nfs4_fl_calc_ds_index(lseg, j);
 	ds = nfs4_fl_prepare_ds(lseg, idx);
-	if (!ds) {
-		set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
-		set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
+	if (!ds)
 		return PNFS_NOT_ATTEMPTED;
-	}
 	dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s\n", __func__,
 		hdr->inode->i_ino, sync, (size_t) data->args.count, offset,
 		ds->ds_remotestr);
@@ -969,8 +955,6 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
 	idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
 	ds = nfs4_fl_prepare_ds(lseg, idx);
 	if (!ds) {
-		set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
-		set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
 		prepare_to_resend_writes(data);
 		filelayout_commit_release(data);
 		return -EAGAIN;
-- 
1.7.6.4



* [PATCH Version 4 04/13] NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls
  2012-04-27 21:53 [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover andros
                   ` (2 preceding siblings ...)
  2012-04-27 21:53 ` [PATCH Version 4 03/13] NFSv4.1 cleanup filelayout invalid layout handling andros
@ 2012-04-27 21:53 ` andros
  2012-04-27 21:53 ` [PATCH Version 4 05/13] NFSv4.1 data server timeo and retrans module parameters andros
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: andros @ 2012-04-27 21:53 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

RPC_TASK_SOFTCONN returns connection errors to the caller, which allows the
pNFS file layout driver to quickly try the MDS or perhaps another DS.
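
The shape of the API change, reduced to a hypothetical helper: callers pass an
extra flags argument that is OR'ed into the asynchronous base flag; the file
layout DS path passes the soft-connect flag and the MDS paths pass 0. The
MODEL_TASK_* values are placeholders, not the kernel's RPC_TASK_* constants.

```c
#include <assert.h>

/* Placeholder flag values; not the kernel's RPC_TASK_* constants. */
#define MODEL_TASK_ASYNC	0x1
#define MODEL_TASK_SOFTCONN	0x2

struct model_task_setup {
	int flags;
};

/* Mirrors nfs_initiate_read/write/commit gaining an "int flags" argument. */
static struct model_task_setup model_initiate_io(int flags)
{
	/* In this sketch callers only ever add the soft-connect flag. */
	assert((flags & ~MODEL_TASK_SOFTCONN) == 0);

	struct model_task_setup setup = { .flags = MODEL_TASK_ASYNC | flags };
	return setup;
}
```

Threading the flag through as a parameter keeps the generic read, write, and
commit paths unchanged for the MDS (which passes 0) while letting only the DS
calls fail fast on connection errors.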

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/internal.h       |    6 +++---
 fs/nfs/nfs4filelayout.c |   10 ++++++----
 fs/nfs/read.c           |    6 +++---
 fs/nfs/write.c          |   13 +++++++------
 4 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 12d3818..5026bcc 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -309,7 +309,7 @@ extern void nfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
 			const struct nfs_pgio_completion_ops *compl_ops);
 extern int nfs_initiate_read(struct rpc_clnt *clnt,
 			     struct nfs_read_data *data,
-			     const struct rpc_call_ops *call_ops);
+			     const struct rpc_call_ops *call_ops, int flags);
 extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
 extern int nfs_generic_pagein(struct nfs_pageio_descriptor *desc,
 			      struct nfs_pgio_header *hdr);
@@ -336,13 +336,13 @@ extern void nfs_commit_free(struct nfs_commit_data *p);
 extern int nfs_initiate_write(struct rpc_clnt *clnt,
 			      struct nfs_write_data *data,
 			      const struct rpc_call_ops *call_ops,
-			      int how);
+			      int how, int flags);
 extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
 extern void nfs_commit_prepare(struct rpc_task *task, void *calldata);
 extern int nfs_initiate_commit(struct rpc_clnt *clnt,
 			       struct nfs_commit_data *data,
 			       const struct rpc_call_ops *call_ops,
-			       int how);
+			       int how, int flags);
 extern void nfs_init_commit(struct nfs_commit_data *data,
 			    struct list_head *head,
 			    struct pnfs_layout_segment *lseg,
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index b9edc88..0db8c07 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -401,7 +401,7 @@ filelayout_read_pagelist(struct nfs_read_data *data)
 
 	/* Perform an asynchronous read to ds */
 	status = nfs_initiate_read(ds->ds_clp->cl_rpcclient, data,
-				   &filelayout_read_call_ops);
+				  &filelayout_read_call_ops, RPC_TASK_SOFTCONN);
 	BUG_ON(status != 0);
 	return PNFS_ATTEMPTED;
 }
@@ -441,7 +441,8 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)
 
 	/* Perform an asynchronous write */
 	status = nfs_initiate_write(ds->ds_clp->cl_rpcclient, data,
-				    &filelayout_write_call_ops, sync);
+				    &filelayout_write_call_ops, sync,
+				    RPC_TASK_SOFTCONN);
 	BUG_ON(status != 0);
 	return PNFS_ATTEMPTED;
 }
@@ -966,7 +967,8 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
 	if (fh)
 		data->args.fh = fh;
 	return nfs_initiate_commit(ds->ds_clp->cl_rpcclient, data,
-				   &filelayout_commit_call_ops, how);
+				   &filelayout_commit_call_ops, how,
+				   RPC_TASK_SOFTCONN);
 }
 
 static int
@@ -1120,7 +1122,7 @@ filelayout_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
 		if (!data->lseg) {
 			nfs_init_commit(data, mds_pages, NULL, cinfo);
 			nfs_initiate_commit(NFS_CLIENT(inode), data,
-					    data->mds_ops, how);
+					    data->mds_ops, how, 0);
 		} else {
 			struct pnfs_commit_bucket *buckets;
 
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 35e2dce..4d53a60 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -214,7 +214,7 @@ out:
 
 int nfs_initiate_read(struct rpc_clnt *clnt,
 		      struct nfs_read_data *data,
-		      const struct rpc_call_ops *call_ops)
+		      const struct rpc_call_ops *call_ops, int flags)
 {
 	struct inode *inode = data->header->inode;
 	int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
@@ -231,7 +231,7 @@ int nfs_initiate_read(struct rpc_clnt *clnt,
 		.callback_ops = call_ops,
 		.callback_data = data,
 		.workqueue = nfsiod_workqueue,
-		.flags = RPC_TASK_ASYNC | swap_flags,
+		.flags = RPC_TASK_ASYNC | swap_flags | flags,
 	};
 
 	/* Set up the initial task struct. */
@@ -280,7 +280,7 @@ static int nfs_do_read(struct nfs_read_data *data,
 {
 	struct inode *inode = data->header->inode;
 
-	return nfs_initiate_read(NFS_CLIENT(inode), data, call_ops);
+	return nfs_initiate_read(NFS_CLIENT(inode), data, call_ops, 0);
 }
 
 static int
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index fec214b..2407c2f 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -902,7 +902,7 @@ static int flush_task_priority(int how)
 int nfs_initiate_write(struct rpc_clnt *clnt,
 		       struct nfs_write_data *data,
 		       const struct rpc_call_ops *call_ops,
-		       int how)
+		       int how, int flags)
 {
 	struct inode *inode = data->header->inode;
 	int priority = flush_task_priority(how);
@@ -919,7 +919,7 @@ int nfs_initiate_write(struct rpc_clnt *clnt,
 		.callback_ops = call_ops,
 		.callback_data = data,
 		.workqueue = nfsiod_workqueue,
-		.flags = RPC_TASK_ASYNC,
+		.flags = RPC_TASK_ASYNC | flags,
 		.priority = priority,
 	};
 	int ret = 0;
@@ -995,7 +995,7 @@ static int nfs_do_write(struct nfs_write_data *data,
 {
 	struct inode *inode = data->header->inode;
 
-	return nfs_initiate_write(NFS_CLIENT(inode), data, call_ops, how);
+	return nfs_initiate_write(NFS_CLIENT(inode), data, call_ops, how, 0);
 }
 
 static int nfs_do_multiple_writes(struct list_head *head,
@@ -1381,7 +1381,7 @@ EXPORT_SYMBOL_GPL(nfs_commitdata_release);
 
 int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
 			const struct rpc_call_ops *call_ops,
-			int how)
+			int how, int flags)
 {
 	struct rpc_task *task;
 	int priority = flush_task_priority(how);
@@ -1397,7 +1397,7 @@ int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
 		.callback_ops = call_ops,
 		.callback_data = data,
 		.workqueue = nfsiod_workqueue,
-		.flags = RPC_TASK_ASYNC,
+		.flags = RPC_TASK_ASYNC | flags,
 		.priority = priority,
 	};
 	/* Set up the initial task struct.  */
@@ -1486,7 +1486,8 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
 	/* Set up the argument struct */
 	nfs_init_commit(data, head, NULL, cinfo);
 	atomic_inc(&cinfo->mds->rpcs_out);
-	return nfs_initiate_commit(NFS_CLIENT(inode), data, data->mds_ops, how);
+	return nfs_initiate_commit(NFS_CLIENT(inode), data, data->mds_ops,
+				   how, 0);
  out_bad:
 	nfs_retry_commit(head, NULL, cinfo);
 	cinfo->completion_ops->error_cleanup(NFS_I(inode));
-- 
1.7.6.4



* [PATCH Version 4 05/13] NFSv4.1 data server timeo and retrans module parameters
  2012-04-27 21:53 [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover andros
                   ` (3 preceding siblings ...)
  2012-04-27 21:53 ` [PATCH Version 4 04/13] NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls andros
@ 2012-04-27 21:53 ` andros
  2012-04-27 21:53 ` [PATCH Version 4 06/13] NFSv4.1: mark deviceid invalid on filelayout DS connection errors andros
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: andros @ 2012-04-27 21:53 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Set the recovery parameters for data servers.
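
The new parameters use the same unit as the timeo mount option, tenths of a
second. The sketch below converts them into a hypothetical millisecond-based
timeout record and, for comparison, builds the values that were previously
hard-coded in nfs4_set_ds_client() (15 seconds, one retry). It models only
that unit conversion, not what nfs_init_timeout_values() computes beyond it.

```c
#include <assert.h>

/* Hypothetical record; the kernel's struct rpc_timeout uses jiffies. */
struct model_timeout {
	unsigned long initval_ms;	/* first per-request timeout */
	unsigned int retries;		/* retransmissions before giving up */
};

/* Convert module-parameter values (timeo in tenths of a second). */
static struct model_timeout model_ds_timeout(unsigned int timeo,
					     unsigned int retrans)
{
	struct model_timeout to;

	assert(timeo > 0);
	to.initval_ms = timeo * 100UL;	/* 1/10 s -> ms */
	to.retries = retrans;
	return to;
}

/* The values hard-coded before this patch, for comparison. */
static struct model_timeout model_old_fixed_ds_timeout(void)
{
	struct model_timeout to = { .initval_ms = 15 * 1000UL, .retries = 1 };
	return to;
}
```

With NFS4_DEF_DS_TIMEO (60) and NFS4_DEF_DS_RETRANS (5), the first timeout is
6 seconds with 5 retries, compared with the previous fixed 15 seconds and 1
retry.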

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/client.c            |   12 ++++--------
 fs/nfs/internal.h          |    4 +++-
 fs/nfs/nfs4filelayout.h    |    7 +++++++
 fs/nfs/nfs4filelayoutdev.c |   18 ++++++++++++++++--
 4 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index da7b5e4..0f54732 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1465,8 +1465,8 @@ error:
  * the MDS.
  */
 struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
-		const struct sockaddr *ds_addr,
-		int ds_addrlen, int ds_proto)
+		const struct sockaddr *ds_addr, int ds_addrlen,
+		int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans)
 {
 	struct nfs_client_initdata cl_init = {
 		.addr = ds_addr,
@@ -1476,12 +1476,7 @@ struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
 		.minorversion = mds_clp->cl_minorversion,
 		.net = mds_clp->net,
 	};
-	struct rpc_timeout ds_timeout = {
-		.to_initval = 15 * HZ,
-		.to_maxval = 15 * HZ,
-		.to_retries = 1,
-		.to_exponential = 1,
-	};
+	struct rpc_timeout ds_timeout;
 	struct nfs_client *clp;
 
 	/*
@@ -1489,6 +1484,7 @@ struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
 	 * cl_ipaddr so as to use the same EXCHANGE_ID co_ownerid as the MDS
 	 * (section 13.1 RFC 5661).
 	 */
+	nfs_init_timeout_values(&ds_timeout, ds_proto, ds_timeo, ds_retrans);
 	clp = nfs_get_client(&cl_init, &ds_timeout, mds_clp->cl_ipaddr,
 			     mds_clp->cl_rpcclient->cl_auth->au_flavor, 0);
 
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 5026bcc..f38e099 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -170,7 +170,9 @@ extern void nfs_mark_client_ready(struct nfs_client *clp, int state);
 extern int nfs4_check_client_ready(struct nfs_client *clp);
 extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
 					     const struct sockaddr *ds_addr,
-					     int ds_addrlen, int ds_proto);
+					     int ds_addrlen, int ds_proto,
+					     unsigned int ds_timeo,
+					     unsigned int ds_retrans);
 #ifdef CONFIG_PROC_FS
 extern int __init nfs_fs_proc_init(void);
 extern void nfs_fs_proc_exit(void);
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 2f6330c..6fb1901 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -33,6 +33,13 @@
 #include "pnfs.h"
 
 /*
+ * Default data server connection timeout and retrans values.
+ * Set by module parameters dataserver_timeo and dataserver_retrans.
+ */
+#define NFS4_DEF_DS_TIMEO   60
+#define NFS4_DEF_DS_RETRANS 5
+
+/*
  * Field testing shows we need to support up to 4096 stripe indices.
  * We store each index as a u8 (u32 on the wire) to keep the memory footprint
  * reasonable. This in turn means we support a maximum of 256
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 2b8ae96..d5a92cf 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -30,12 +30,16 @@
 
 #include <linux/nfs_fs.h>
 #include <linux/vmalloc.h>
+#include <linux/module.h>
 
 #include "internal.h"
 #include "nfs4filelayout.h"
 
 #define NFSDBG_FACILITY		NFSDBG_PNFS_LD
 
+static unsigned int dataserver_timeo = NFS4_DEF_DS_TIMEO;
+static unsigned int dataserver_retrans = NFS4_DEF_DS_RETRANS;
+
 /*
  * Data server cache
  *
@@ -165,8 +169,9 @@ nfs4_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds)
 			__func__, ds->ds_remotestr, da->da_remotestr);
 
 		clp = nfs4_set_ds_client(mds_srv->nfs_client,
-				 (struct sockaddr *)&da->da_addr,
-				 da->da_addrlen, IPPROTO_TCP);
+					(struct sockaddr *)&da->da_addr,
+					da->da_addrlen, IPPROTO_TCP,
+					dataserver_timeo, dataserver_retrans);
 		if (!IS_ERR(clp))
 			break;
 	}
@@ -821,3 +826,12 @@ mark_dev_invalid:
 	filelayout_mark_devid_invalid(devid);
 	return NULL;
 }
+
+module_param(dataserver_retrans, uint, 0644);
+MODULE_PARM_DESC(dataserver_retrans, "The number of times the NFSv4.1 client "
+			"retries a request before it attempts further "
+			"recovery action.");
+module_param(dataserver_timeo, uint, 0644);
+MODULE_PARM_DESC(dataserver_timeo, "The time (in tenths of a second) the "
+			"NFSv4.1 client waits for a response from a "
+			"data server before it retries an NFS request.");
-- 
1.7.6.4



* [PATCH Version 4 06/13] NFSv4.1: mark deviceid invalid on filelayout DS connection errors
  2012-04-27 21:53 [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover andros
                   ` (4 preceding siblings ...)
  2012-04-27 21:53 ` [PATCH Version 4 05/13] NFSv4.1 data server timeo and retrans module parameters andros
@ 2012-04-27 21:53 ` andros
  2012-04-27 21:53 ` [PATCH Version 4 07/13] NFSv4.1 remove nfs4_reset_write and nfs4_reset_read andros
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: andros @ 2012-04-27 21:53 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Marking the deviceid invalid prevents the use of any layout that references
it for I/O; that I/O is redirected through the MDS.

Redirect unhandled failed I/O to the MDS without marking either the
layout or the deviceid invalid.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |  130 +++++++++++++++++++++++++++++++++++------------
 fs/nfs/nfs4filelayout.h |    3 +
 fs/nfs/pnfs.c           |    6 ++-
 fs/nfs/pnfs.h           |    4 ++
 4 files changed, 108 insertions(+), 35 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 0db8c07..f503cbe5 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -82,29 +82,77 @@ filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
 	BUG();
 }
 
+static void filelayout_reset_write(struct nfs_write_data *data)
+{
+	struct nfs_pgio_header *hdr = data->header;
+	struct inode *inode = hdr->inode;
+	struct rpc_task *task = &data->task;
+
+	if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
+		dprintk("%s Reset task %5u for i/o through MDS "
+			"(req %s/%lld, %u bytes @ offset %llu)\n", __func__,
+			data->task.tk_pid,
+			inode->i_sb->s_id,
+			(long long)NFS_FILEID(inode),
+			data->args.count,
+			(unsigned long long)data->args.offset);
+
+		task->tk_status = pnfs_write_done_resend_to_mds(hdr->inode,
+							&hdr->pages,
+							hdr->completion_ops);
+	}
+}
+
+static void filelayout_reset_read(struct nfs_read_data *data)
+{
+	struct nfs_pgio_header *hdr = data->header;
+	struct inode *inode = hdr->inode;
+	struct rpc_task *task = &data->task;
+
+	if (!test_and_set_bit(NFS_IOHDR_REDO, &hdr->flags)) {
+		dprintk("%s Reset task %5u for i/o through MDS "
+			"(req %s/%lld, %u bytes @ offset %llu)\n", __func__,
+			data->task.tk_pid,
+			inode->i_sb->s_id,
+			(long long)NFS_FILEID(inode),
+			data->args.count,
+			(unsigned long long)data->args.offset);
+
+		task->tk_status = pnfs_read_done_resend_to_mds(hdr->inode,
+							&hdr->pages,
+							hdr->completion_ops);
+	}
+}
+
 static int filelayout_async_handle_error(struct rpc_task *task,
 					 struct nfs4_state *state,
 					 struct nfs_client *clp,
-					 int *reset)
+					 struct pnfs_layout_segment *lseg)
 {
-	struct nfs_server *mds_server = NFS_SERVER(state->inode);
+	struct inode *inode = lseg->pls_layout->plh_inode;
+	struct nfs_server *mds_server = NFS_SERVER(inode);
+	struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
 	struct nfs_client *mds_client = mds_server->nfs_client;
 
 	if (task->tk_status >= 0)
 		return 0;
-	*reset = 0;
 
 	switch (task->tk_status) {
 	/* MDS state errors */
 	case -NFS4ERR_DELEG_REVOKED:
 	case -NFS4ERR_ADMIN_REVOKED:
 	case -NFS4ERR_BAD_STATEID:
+		if (state == NULL)
+			break;
 		nfs_remove_bad_delegation(state->inode);
 	case -NFS4ERR_OPENMODE:
+		if (state == NULL)
+			break;
 		nfs4_schedule_stateid_recovery(mds_server, state);
 		goto wait_on_recovery;
 	case -NFS4ERR_EXPIRED:
-		nfs4_schedule_stateid_recovery(mds_server, state);
+		if (state != NULL)
+			nfs4_schedule_stateid_recovery(mds_server, state);
 		nfs4_schedule_lease_recovery(mds_client);
 		goto wait_on_recovery;
 	/* DS session errors */
@@ -127,11 +175,22 @@ static int filelayout_async_handle_error(struct rpc_task *task,
 		break;
 	case -NFS4ERR_RETRY_UNCACHED_REP:
 		break;
+	/* RPC connection errors */
+	case -ECONNREFUSED:
+	case -EHOSTDOWN:
+	case -EHOSTUNREACH:
+	case -ENETUNREACH:
+	case -EIO:
+	case -ETIMEDOUT:
+	case -EPIPE:
+		dprintk("%s DS connection error %d\n", __func__,
+			task->tk_status);
+		filelayout_mark_devid_invalid(devid);
+		/* fall through */
 	default:
-		dprintk("%s DS error. Retry through MDS %d\n", __func__,
+		dprintk("%s Retry through MDS. Error %d\n", __func__,
 			task->tk_status);
-		*reset = 1;
-		break;
+		return -NFS4ERR_RESET_TO_MDS;
 	}
 out:
 	task->tk_status = 0;
@@ -148,16 +207,17 @@ wait_on_recovery:
 static int filelayout_read_done_cb(struct rpc_task *task,
 				struct nfs_read_data *data)
 {
-	int reset = 0;
+	struct nfs_pgio_header *hdr = data->header;
+	int err;
 
-	dprintk("%s DS read\n", __func__);
+	err = filelayout_async_handle_error(task, data->args.context->state,
+					    data->ds_clp, hdr->lseg);
 
-	if (filelayout_async_handle_error(task, data->args.context->state,
-					  data->ds_clp, &reset) == -EAGAIN) {
-		dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
-			__func__, data->ds_clp, data->ds_clp->cl_session);
-		if (reset)
-			nfs4_reset_read(task, data);
+	switch (err) {
+	case -NFS4ERR_RESET_TO_MDS:
+		filelayout_reset_read(data);
+		return task->tk_status;
+	case -EAGAIN:
 		rpc_restart_call_prepare(task);
 		return -EAGAIN;
 	}
@@ -230,14 +290,17 @@ static void filelayout_read_release(void *data)
 static int filelayout_write_done_cb(struct rpc_task *task,
 				struct nfs_write_data *data)
 {
-	int reset = 0;
-
-	if (filelayout_async_handle_error(task, data->args.context->state,
-					  data->ds_clp, &reset) == -EAGAIN) {
-		dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
-			__func__, data->ds_clp, data->ds_clp->cl_session);
-		if (reset)
-			nfs4_reset_write(task, data);
+	struct nfs_pgio_header *hdr = data->header;
+	int err;
+
+	err = filelayout_async_handle_error(task, data->args.context->state,
+					    data->ds_clp, hdr->lseg);
+
+	switch (err) {
+	case -NFS4ERR_RESET_TO_MDS:
+		filelayout_reset_write(data);
+		return task->tk_status;
+	case -EAGAIN:
 		rpc_restart_call_prepare(task);
 		return -EAGAIN;
 	}
@@ -260,16 +323,17 @@ static void prepare_to_resend_writes(struct nfs_commit_data *data)
 static int filelayout_commit_done_cb(struct rpc_task *task,
 				     struct nfs_commit_data *data)
 {
-	int reset = 0;
-
-	if (filelayout_async_handle_error(task, data->context->state,
-					  data->ds_clp, &reset) == -EAGAIN) {
-		dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
-			__func__, data->ds_clp, data->ds_clp->cl_session);
-		if (reset)
-			prepare_to_resend_writes(data);
-		else
-			rpc_restart_call_prepare(task);
+	int err;
+
+	err = filelayout_async_handle_error(task, NULL, data->ds_clp,
+					    data->lseg);
+
+	switch (err) {
+	case -NFS4ERR_RESET_TO_MDS:
+		prepare_to_resend_writes(data);
+		return -EAGAIN;
+	case -EAGAIN:
+		rpc_restart_call_prepare(task);
 		return -EAGAIN;
 	}
 
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 6fb1901..3259be6 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -48,6 +48,9 @@
 #define NFS4_PNFS_MAX_STRIPE_CNT 4096
 #define NFS4_PNFS_MAX_MULTI_CNT  256 /* 256 fit into a u8 stripe_index */
 
+/* error codes for internal use */
+#define NFS4ERR_RESET_TO_MDS   12001
+
 enum stripetype4 {
 	STRIPE_SPARSE = 1,
 	STRIPE_DENSE = 2
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 463eb2f..78fd7eb 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1175,7 +1175,7 @@ pnfs_generic_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
 }
 EXPORT_SYMBOL_GPL(pnfs_generic_pg_test);
 
-static int pnfs_write_done_resend_to_mds(struct inode *inode,
+int pnfs_write_done_resend_to_mds(struct inode *inode,
 				struct list_head *head,
 				const struct nfs_pgio_completion_ops *compl_ops)
 {
@@ -1203,6 +1203,7 @@ static int pnfs_write_done_resend_to_mds(struct inode *inode,
 	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(pnfs_write_done_resend_to_mds);
 
 static void pnfs_ld_handle_write_error(struct nfs_write_data *data)
 {
@@ -1330,7 +1331,7 @@ pnfs_generic_pg_writepages(struct nfs_pageio_descriptor *desc)
 }
 EXPORT_SYMBOL_GPL(pnfs_generic_pg_writepages);
 
-static int pnfs_read_done_resend_to_mds(struct inode *inode,
+int pnfs_read_done_resend_to_mds(struct inode *inode,
 				struct list_head *head,
 				const struct nfs_pgio_completion_ops *compl_ops)
 {
@@ -1354,6 +1355,7 @@ static int pnfs_read_done_resend_to_mds(struct inode *inode,
 	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(pnfs_read_done_resend_to_mds);
 
 static void pnfs_ld_handle_read_error(struct nfs_read_data *data)
 {
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 8efbee7..681875d 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -222,6 +222,10 @@ struct pnfs_layout_segment *pnfs_update_layout(struct inode *ino,
 					       gfp_t gfp_flags);
 
 void nfs4_deviceid_mark_client_invalid(struct nfs_client *clp);
+int pnfs_read_done_resend_to_mds(struct inode *inode, struct list_head *head,
+			const struct nfs_pgio_completion_ops *compl_ops);
+int pnfs_write_done_resend_to_mds(struct inode *inode, struct list_head *head,
+			const struct nfs_pgio_completion_ops *compl_ops);
 
 /* nfs4_deviceid_flags */
 enum {
-- 
1.7.6.4


* [PATCH Version 4 07/13] NFSv4.1 remove nfs4_reset_write and nfs4_reset_read
  2012-04-27 21:53 [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover andros
                   ` (5 preceding siblings ...)
  2012-04-27 21:53 ` [PATCH Version 4 06/13] NFSv4.1: mark deviceid invalid on filelayout DS connection errors andros
@ 2012-04-27 21:53 ` andros
  2012-04-27 21:53 ` [PATCH Version 4 08/13] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup andros
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: andros @ 2012-04-27 21:53 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Both functions are replaced by filelayout_reset_write and filelayout_reset_read.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/internal.h |    2 --
 fs/nfs/nfs4proc.c |   35 -----------------------------------
 2 files changed, 0 insertions(+), 37 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index f38e099..0397d2f 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -383,13 +383,11 @@ void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
 
 /* nfs4proc.c */
 extern void __nfs4_read_done_cb(struct nfs_read_data *);
-extern void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data);
 extern int nfs4_init_client(struct nfs_client *clp,
 			    const struct rpc_timeout *timeparms,
 			    const char *ip_addr,
 			    rpc_authflavor_t authflavour,
 			    int noresvport);
-extern void nfs4_reset_write(struct rpc_task *task, struct nfs_write_data *data);
 extern int _nfs4_call_sync(struct rpc_clnt *clnt,
 			   struct nfs_server *server,
 			   struct rpc_message *msg,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6365b02..bc9c2c9 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3329,23 +3329,6 @@ static void nfs4_proc_read_rpc_prepare(struct rpc_task *task, struct nfs_read_da
 	rpc_call_start(task);
 }
 
-/* Reset the the nfs_read_data to send the read to the MDS. */
-void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data)
-{
-	struct nfs_pgio_header *hdr = data->header;
-	struct inode *inode = hdr->inode;
-
-	dprintk("%s Reset task for i/o through\n", __func__);
-	data->ds_clp = NULL;
-	/* offsets will differ in the dense stripe case */
-	data->args.offset = data->mds_offset;
-	data->args.fh     = NFS_FH(inode);
-	data->read_done_cb = nfs4_read_done_cb;
-	task->tk_ops = hdr->mds_ops;
-	rpc_task_reset_client(task, NFS_CLIENT(inode));
-}
-EXPORT_SYMBOL_GPL(nfs4_reset_read);
-
 static int nfs4_write_done_cb(struct rpc_task *task, struct nfs_write_data *data)
 {
 	struct inode *inode = data->header->inode;
@@ -3369,24 +3352,6 @@ static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
 		nfs4_write_done_cb(task, data);
 }
 
-/* Reset the the nfs_write_data to send the write to the MDS. */
-void nfs4_reset_write(struct rpc_task *task, struct nfs_write_data *data)
-{
-	struct nfs_pgio_header *hdr = data->header;
-	struct inode *inode = hdr->inode;
-
-	dprintk("%s Reset task for i/o through\n", __func__);
-	data->ds_clp     = NULL;
-	data->write_done_cb = nfs4_write_done_cb;
-	data->args.fh       = NFS_FH(inode);
-	data->args.bitmask  = data->res.server->cache_consistency_bitmask;
-	data->args.offset   = data->mds_offset;
-	data->res.fattr     = &data->fattr;
-	task->tk_ops        = hdr->mds_ops;
-	rpc_task_reset_client(task, NFS_CLIENT(inode));
-}
-EXPORT_SYMBOL_GPL(nfs4_reset_write);
-
 static void nfs4_proc_write_setup(struct nfs_write_data *data, struct rpc_message *msg)
 {
 	struct nfs_server *server = NFS_SERVER(data->header->inode);
-- 
1.7.6.4


* [PATCH Version 4 08/13] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup
  2012-04-27 21:53 [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover andros
                   ` (6 preceding siblings ...)
  2012-04-27 21:53 ` [PATCH Version 4 07/13] NFSv4.1 remove nfs4_reset_write and nfs4_reset_read andros
@ 2012-04-27 21:53 ` andros
  2012-04-27 21:53 ` [PATCH Version 4 09/13] NFSv4.1 wake up all tasks on un-connected DS slot table waitq andros
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: andros @ 2012-04-27 21:53 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Tasks sleeping on the slot table waitq wake up in the rpc_prepare_task state.
If the deviceid is invalid, reset the task for I/O through the MDS.

The reset functions send the I/O pages back through the pageio layer, which
re-coalesces them. This allows the MDS and the DS to have different
rsize/wsize values. Exit the awakened task without executing the
rpc_call_done routine.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index f503cbe5..1b9bedb 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -252,7 +252,14 @@ filelayout_set_layoutcommit(struct nfs_write_data *wdata)
 static void filelayout_read_prepare(struct rpc_task *task, void *data)
 {
 	struct nfs_read_data *rdata = data;
+	struct pnfs_layout_segment *lseg = rdata->header->lseg;
 
+	if (filelayout_test_devid_invalid(FILELAYOUT_DEVID_NODE(lseg))) {
+		dprintk("%s task %u reset io to MDS\n", __func__, task->tk_pid);
+		filelayout_reset_read(rdata);
+		rpc_exit(task, 0);
+		return;
+	}
 	rdata->read_done_cb = filelayout_read_done_cb;
 
 	if (nfs41_setup_sequence(rdata->ds_clp->cl_session,
@@ -269,6 +276,9 @@ static void filelayout_read_call_done(struct rpc_task *task, void *data)
 
 	dprintk("--> %s task->tk_status %d\n", __func__, task->tk_status);
 
+	if (test_bit(NFS_IOHDR_REDO, &rdata->header->flags))
+		return;
+
 	/* Note this may cause RPC to be resent */
 	rdata->header->mds_ops->rpc_call_done(task, data);
 }
@@ -343,7 +353,14 @@ static int filelayout_commit_done_cb(struct rpc_task *task,
 static void filelayout_write_prepare(struct rpc_task *task, void *data)
 {
 	struct nfs_write_data *wdata = data;
+	struct pnfs_layout_segment *lseg = wdata->header->lseg;
 
+	if (filelayout_test_devid_invalid(FILELAYOUT_DEVID_NODE(lseg))) {
+		dprintk("%s task %u reset io to MDS\n", __func__, task->tk_pid);
+		filelayout_reset_write(wdata);
+		rpc_exit(task, 0);
+		return;
+	}
 	if (nfs41_setup_sequence(wdata->ds_clp->cl_session,
 				&wdata->args.seq_args, &wdata->res.seq_res,
 				task))
@@ -356,6 +373,9 @@ static void filelayout_write_call_done(struct rpc_task *task, void *data)
 {
 	struct nfs_write_data *wdata = data;
 
+	if (test_bit(NFS_IOHDR_REDO, &wdata->header->flags))
+		return;
+
 	/* Note this may cause RPC to be resent */
 	wdata->header->mds_ops->rpc_call_done(task, data);
 }
-- 
1.7.6.4


* [PATCH Version 4 09/13] NFSv4.1 wake up all tasks on un-connected DS slot table waitq
  2012-04-27 21:53 [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover andros
                   ` (7 preceding siblings ...)
  2012-04-27 21:53 ` [PATCH Version 4 08/13] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup andros
@ 2012-04-27 21:53 ` andros
  2012-04-27 21:53 ` [PATCH Version 4 10/13] NFSv4.1 send layoutreturn to fence disconnected data server andros
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: andros @ 2012-04-27 21:53 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

When the DS has a connection error, its deviceid is marked invalid. Drain
the fore channel slot table waitq so that every waiting task wakes up and
can be redirected to the MDS.
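
Waking every sleeper at once, rather than one task per freed slot, can be
pictured with a small single-threaded model. This is a sketch only:
slot_waitq, ds_state and the functions below are hypothetical, and the
loop in ds_connection_error stands in for rpc_wake_up() followed by each
task re-running its prepare step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 16

/* Hypothetical model of a fore channel slot table wait queue. */
struct slot_waitq {
	int waiting[MAX_WAITERS];   /* ids of tasks sleeping for a slot */
	size_t n;
};

struct ds_state {
	bool devid_invalid;         /* models NFS_DEVICEID_INVALID */
	int redirected;             /* tasks rerouted to the MDS */
	int sent_to_ds;             /* tasks that went on to the DS */
};

static void waitq_sleep(struct slot_waitq *q, int task_id)
{
	assert(q->n < MAX_WAITERS);
	q->waiting[q->n++] = task_id;
}

/* An awakened task re-runs its prepare step and checks the deviceid. */
static void task_prepare(struct ds_state *ds, int task_id)
{
	(void)task_id;
	if (ds->devid_invalid)
		ds->redirected++;
	else
		ds->sent_to_ds++;
}

/* Connection-error path: mark the deviceid invalid, then wake all. */
static void ds_connection_error(struct ds_state *ds, struct slot_waitq *q)
{
	ds->devid_invalid = true;
	for (size_t i = 0; i < q->n; i++)
		task_prepare(ds, q->waiting[i]);
	q->n = 0;                   /* the wait queue is now drained */
}
```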

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 1b9bedb..a63062d 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -133,6 +133,7 @@ static int filelayout_async_handle_error(struct rpc_task *task,
 	struct nfs_server *mds_server = NFS_SERVER(inode);
 	struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
 	struct nfs_client *mds_client = mds_server->nfs_client;
+	struct nfs4_slot_table *tbl = &clp->cl_session->fc_slot_table;
 
 	if (task->tk_status >= 0)
 		return 0;
@@ -186,6 +187,7 @@ static int filelayout_async_handle_error(struct rpc_task *task,
 		dprintk("%s DS connection error %d\n", __func__,
 			task->tk_status);
 		filelayout_mark_devid_invalid(devid);
+		rpc_wake_up(&tbl->slot_tbl_waitq);
 		/* fall through */
 	default:
 		dprintk("%s Retry through MDS. Error %d\n", __func__,
-- 
1.7.6.4


* [PATCH Version 4 10/13] NFSv4.1 send layoutreturn to fence disconnected data server
  2012-04-27 21:53 [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover andros
                   ` (8 preceding siblings ...)
  2012-04-27 21:53 ` [PATCH Version 4 09/13] NFSv4.1 wake up all tasks on un-connected DS slot table waitq andros
@ 2012-04-27 21:53 ` andros
  2012-04-27 21:53 ` [PATCH Version 4 11/13] NFSv4.1 ref count nfs_client across filelayout data server io andros
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: andros @ 2012-04-27 21:53 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Send a LAYOUTRETURN to let the MDS know that the client is redirecting I/O
from the data server to the MDS.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |    2 ++
 fs/nfs/pnfs.c           |    1 +
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index a63062d..c6b7c18 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -186,6 +186,8 @@ static int filelayout_async_handle_error(struct rpc_task *task,
 	case -EPIPE:
 		dprintk("%s DS connection error %d\n", __func__,
 			task->tk_status);
+		if (!filelayout_test_devid_invalid(devid))
+			_pnfs_return_layout(state->inode);
 		filelayout_mark_devid_invalid(devid);
 		rpc_wake_up(&tbl->slot_tbl_waitq);
 		/* fall through */
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 78fd7eb..98a7ec8 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -692,6 +692,7 @@ out:
 	dprintk("<-- %s status: %d\n", __func__, status);
 	return status;
 }
+EXPORT_SYMBOL_GPL(_pnfs_return_layout);
 
 bool pnfs_roc(struct inode *ino)
 {
-- 
1.7.6.4


* [PATCH Version 4 11/13] NFSv4.1 ref count nfs_client across filelayout data server io
  2012-04-27 21:53 [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover andros
                   ` (9 preceding siblings ...)
  2012-04-27 21:53 ` [PATCH Version 4 10/13] NFSv4.1 send layoutreturn to fence disconnected data server andros
@ 2012-04-27 21:53 ` andros
  2012-04-27 21:53 ` [PATCH Version 4 12/13] NFSv4.1 dereference a disconnected data server client record andros
  2012-04-27 21:53 ` [PATCH Version 4 13/13] NFSv4.1 resend LAYOUTGET on data server invalid layout errors andros
  12 siblings, 0 replies; 15+ messages in thread
From: andros @ 2012-04-27 21:53 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

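Prepare to put a disconnected DS client record: take a reference on the
DS nfs_client for each DS I/O and drop it when the I/O is released or
reset to the MDS.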
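
The invariant this patch works toward is that every reference taken on
the DS client is dropped exactly once, whether the I/O completes on the
DS or is reset to the MDS. A minimal sketch under that assumption;
ds_client, ds_io and io_drop_ds are hypothetical, and clearing the
pointer is this model's guard against a second put rather than the
NFS_IOHDR_REDO test used in the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nfs_client's reference count. */
struct ds_client {
	int cl_count;
};

/* Hypothetical stand-in for the per-RPC nfs_read/write_data. */
struct ds_io {
	struct ds_client *ds_clp;
};

static int clients_freed;           /* counts records actually freed */

/* Drop one reference; free the record when the last one is gone. */
static void client_put(struct ds_client *clp)
{
	assert(clp->cl_count > 0);
	if (--clp->cl_count == 0) {
		free(clp);
		clients_freed++;
	}
}

/* *_pagelist analogue: pin the DS client for the life of the I/O. */
static void io_start(struct ds_io *io, struct ds_client *clp)
{
	clp->cl_count++;
	io->ds_clp = clp;
}

/* Used by both the reset-to-MDS path and rpc_release. Clearing the
 * pointer makes a second call a no-op, so each get has one put. */
static void io_drop_ds(struct ds_io *io)
{
	if (io->ds_clp != NULL) {
		client_put(io->ds_clp);
		io->ds_clp = NULL;
	}
}
```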

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |   26 +++++++++++++++++++++-----
 1 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index c6b7c18..eb8eb00 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -101,6 +101,9 @@ static void filelayout_reset_write(struct nfs_write_data *data)
 							&hdr->pages,
 							hdr->completion_ops);
 	}
+	/* balance nfs_get_client in filelayout_write_pagelist */
+	nfs_put_client(data->ds_clp);
+	data->ds_clp     = NULL;
 }
 
 static void filelayout_reset_read(struct nfs_read_data *data)
@@ -122,6 +125,9 @@ static void filelayout_reset_read(struct nfs_read_data *data)
 							&hdr->pages,
 							hdr->completion_ops);
 	}
+	/* balance nfs_get_client in filelayout_read_pagelist */
+	nfs_put_client(data->ds_clp);
+	data->ds_clp = NULL;
 }
 
 static int filelayout_async_handle_error(struct rpc_task *task,
@@ -298,6 +304,8 @@ static void filelayout_read_release(void *data)
 {
 	struct nfs_read_data *rdata = data;
 
+	if (!test_bit(NFS_IOHDR_REDO, &rdata->header->flags))
+		nfs_put_client(rdata->ds_clp);
 	rdata->header->mds_ops->rpc_release(data);
 }
 
@@ -395,6 +403,8 @@ static void filelayout_write_release(void *data)
 {
 	struct nfs_write_data *wdata = data;
 
+	if (!test_bit(NFS_IOHDR_REDO, &wdata->header->flags))
+		nfs_put_client(wdata->ds_clp);
 	wdata->header->mds_ops->rpc_release(data);
 }
 
@@ -431,6 +441,7 @@ static void filelayout_commit_release(void *calldata)
 
 	data->completion_ops->completion(data);
 	put_lseg(data->lseg);
+	nfs_put_client(data->ds_clp);
 	nfs_commitdata_release(data);
 }
 
@@ -476,9 +487,11 @@ filelayout_read_pagelist(struct nfs_read_data *data)
 	ds = nfs4_fl_prepare_ds(lseg, idx);
 	if (!ds)
 		return PNFS_NOT_ATTEMPTED;
-	dprintk("%s USE DS: %s\n", __func__, ds->ds_remotestr);
+	dprintk("%s USE DS: %s cl_count %d\n", __func__,
+		ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count));
 
 	/* No multipath support. Use first DS */
+	atomic_inc(&ds->ds_clp->cl_count);
 	data->ds_clp = ds->ds_clp;
 	fh = nfs4_fl_select_ds_fh(lseg, j);
 	if (fh)
@@ -512,11 +525,12 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)
 	ds = nfs4_fl_prepare_ds(lseg, idx);
 	if (!ds)
 		return PNFS_NOT_ATTEMPTED;
-	dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s\n", __func__,
-		hdr->inode->i_ino, sync, (size_t) data->args.count, offset,
-		ds->ds_remotestr);
+	dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s cl_count %d\n",
+		__func__, hdr->inode->i_ino, sync, (size_t) data->args.count,
+		offset, ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count));
 
 	data->write_done_cb = filelayout_write_done_cb;
+	atomic_inc(&ds->ds_clp->cl_count);
 	data->ds_clp = ds->ds_clp;
 	fh = nfs4_fl_select_ds_fh(lseg, j);
 	if (fh)
@@ -1048,8 +1062,10 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
 		filelayout_commit_release(data);
 		return -EAGAIN;
 	}
-	dprintk("%s ino %lu, how %d\n", __func__, data->inode->i_ino, how);
+	dprintk("%s ino %lu, how %d cl_count %d\n", __func__,
+		data->inode->i_ino, how, atomic_read(&ds->ds_clp->cl_count));
 	data->commit_done_cb = filelayout_commit_done_cb;
+	atomic_inc(&ds->ds_clp->cl_count);
 	data->ds_clp = ds->ds_clp;
 	fh = select_ds_fh_from_commit(lseg, data->ds_commit_index);
 	if (fh)
-- 
1.7.6.4


* [PATCH Version 4 12/13] NFSv4.1 dereference a disconnected data server client record
  2012-04-27 21:53 [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover andros
                   ` (10 preceding siblings ...)
  2012-04-27 21:53 ` [PATCH Version 4 11/13] NFSv4.1 ref count nfs_client across filelayout data server io andros
@ 2012-04-27 21:53 ` andros
  2012-04-27 21:53 ` [PATCH Version 4 13/13] NFSv4.1 resend LAYOUTGET on data server invalid layout errors andros
  12 siblings, 0 replies; 15+ messages in thread
From: andros @ 2012-04-27 21:53 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

On a DS connection error, drop the data server cache's reference to the
client record. When the last DS I/O is processed, the record is freed.
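
The disconnect step detaches the client from the data server cache and
drops the cache's reference, leaving in-flight I/O to release the rest.
This is a single-threaded sketch that assumes each cache entry holds one
reference; ds_entry, ds_disconnect and stop_renew are hypothetical
stand-ins for nfs4_pnfs_ds, nfs4_ds_disconnect and NFS_CS_STOP_RENEW.
In the kernel the scan runs under nfs4_ds_cache_lock and the put happens
after unlocking; this model has no lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct ds_client {
	int cl_count;
	int stop_renew;             /* models NFS_CS_STOP_RENEW */
};

/* Hypothetical entry in the data server cache. */
struct ds_entry {
	struct ds_client *ds_clp;
};

static int clients_freed;

static void client_put(struct ds_client *clp)
{
	assert(clp->cl_count > 0);
	if (--clp->cl_count == 0) {
		free(clp);
		clients_freed++;
	}
}

/* Clear every cache pointer to clp, then drop the references the
 * cache held; in-flight I/O keeps the record alive until it is done. */
static void ds_disconnect(struct ds_entry *cache, size_t n,
			  struct ds_client *clp)
{
	size_t dropped = 0;

	for (size_t i = 0; i < n; i++) {
		if (cache[i].ds_clp == clp) {
			cache[i].ds_clp = NULL;
			dropped++;
		}
	}
	if (dropped > 0)
		clp->stop_renew = 1;    /* stop lease renewal first */
	while (dropped-- > 0)
		client_put(clp);
}
```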

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c    |    1 +
 fs/nfs/nfs4filelayout.h    |    1 +
 fs/nfs/nfs4filelayoutdev.c |   22 ++++++++++++++++++++++
 3 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index eb8eb00..eaaca89 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -196,6 +196,7 @@ static int filelayout_async_handle_error(struct rpc_task *task,
 			_pnfs_return_layout(state->inode);
 		filelayout_mark_devid_invalid(devid);
 		rpc_wake_up(&tbl->slot_tbl_waitq);
+		nfs4_ds_disconnect(clp);
 		/* fall through */
 	default:
 		dprintk("%s Retry through MDS. Error %d\n", __func__,
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 3259be6..95562ad 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -146,5 +146,6 @@ extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
 extern void nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
 struct nfs4_file_layout_dsaddr *
 get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id, gfp_t gfp_flags);
+void nfs4_ds_disconnect(struct nfs_client *clp);
 
 #endif /* FS_NFS_NFS4FILELAYOUT_H */
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index d5a92cf..0d3e35b 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -149,6 +149,28 @@ _data_server_lookup_locked(const struct list_head *dsaddrs)
 }
 
 /*
+ * Lookup DS by nfs_client pointer. Zero data server client pointer
+ */
+void nfs4_ds_disconnect(struct nfs_client *clp)
+{
+	struct nfs4_pnfs_ds *ds;
+	struct nfs_client *found = NULL;
+
+	dprintk("%s clp %p\n", __func__, clp);
+	spin_lock(&nfs4_ds_cache_lock);
+	list_for_each_entry(ds, &nfs4_data_server_cache, ds_node)
+		if (ds->ds_clp && ds->ds_clp == clp) {
+			found = ds->ds_clp;
+			ds->ds_clp = NULL;
+		}
+	spin_unlock(&nfs4_ds_cache_lock);
+	if (found) {
+		set_bit(NFS_CS_STOP_RENEW, &clp->cl_res_state);
+		nfs_put_client(clp);
+	}
+}
+
+/*
  * Create an rpc connection to the nfs4_pnfs_ds data server
  * Currently only supports IPv4 and IPv6 addresses
  */
-- 
1.7.6.4


* [PATCH Version 4 13/13] NFSv4.1 resend LAYOUTGET on data server invalid layout errors
  2012-04-27 21:53 [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover andros
                   ` (11 preceding siblings ...)
  2012-04-27 21:53 ` [PATCH Version 4 12/13] NFSv4.1 dereference a disconnected data server client record andros
@ 2012-04-27 21:53 ` andros
  12 siblings, 0 replies; 15+ messages in thread
From: andros @ 2012-04-27 21:53 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

The "invalid layout" class of errors is handled by destroying the layout and
getting a new layout from the server.  Currently, the layout must be
destroyed before a new layout can be obtained.

This means that all references (e.g. lsegs) to the "to be destroyed" layout
header must be dropped before it can be destroyed. This in turn means waiting
for all in-flight RPCs using the old layout, as well as draining the data
server session slot table wait queue.

Set the NFS_LAYOUT_INVALID flag to redirect I/O to the MDS while waiting for
the old layout to be destroyed.
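
The deferred destroy and the combined redirect test can be modelled in a
few lines. This is a simplified sketch, not kernel code: layout_hdr,
layout_seg and the helpers are hypothetical, reset_to_mds only mirrors
the shape of filelayout_reset_to_mds(), and "destroyed" here simply means
the last lseg reference of an invalid layout has been dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified layout header and segment. */
struct layout_hdr {
	bool invalid;               /* models NFS_LAYOUT_INVALID */
	int lseg_refs;              /* outstanding lseg references */
	bool destroyed;
};

struct layout_seg {
	struct layout_hdr *lo;
	bool devid_invalid;         /* models NFS_DEVICEID_INVALID */
};

/* Either a bad data server or a layout awaiting destruction sends the
 * I/O to the MDS. */
static bool reset_to_mds(const struct layout_seg *lseg)
{
	return lseg->devid_invalid || lseg->lo->invalid;
}

/* Invalid-layout error: flag the header now, destroy it later. */
static void on_invalid_layout_error(struct layout_hdr *lo)
{
	lo->invalid = true;
	if (lo->lseg_refs == 0)
		lo->destroyed = true;
}

/* Dropping the last lseg reference completes the deferred destroy. */
static void put_lseg(struct layout_seg *lseg)
{
	struct layout_hdr *lo = lseg->lo;

	assert(lo->lseg_refs > 0);
	if (--lo->lseg_refs == 0 && lo->invalid)
		lo->destroyed = true;
}
```

While lsegs are still held, I/O that would have used them is redirected
to the MDS instead of blocking until the old layout can be torn down.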

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |   28 ++++++++++++++++++++++++----
 fs/nfs/nfs4filelayout.h |   13 +++++++++++++
 fs/nfs/pnfs.c           |    1 +
 fs/nfs/pnfs.h           |    1 +
 4 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index eaaca89..474c630 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -182,6 +182,27 @@ static int filelayout_async_handle_error(struct rpc_task *task,
 		break;
 	case -NFS4ERR_RETRY_UNCACHED_REP:
 		break;
+	/* Invalidate Layout errors */
+	case -NFS4ERR_PNFS_NO_LAYOUT:
+	case -ESTALE:           /* mapped NFS4ERR_STALE */
+	case -EBADHANDLE:       /* mapped NFS4ERR_BADHANDLE */
+	case -EISDIR:           /* mapped NFS4ERR_ISDIR */
+	case -NFS4ERR_FHEXPIRED:
+	case -NFS4ERR_WRONG_TYPE:
+		dprintk("%s Invalid layout error %d\n", __func__,
+			task->tk_status);
+		/*
+		 * Destroy layout so new i/o will get a new layout.
+		 * Layout will not be destroyed until all current lseg
+		 * references are put. Mark layout as invalid to resend failed
+		 * i/o and all i/o waiting on the slot table to the MDS until
+		 * layout is destroyed and a new valid layout is obtained.
+		 */
+		set_bit(NFS_LAYOUT_INVALID,
+				&NFS_I(state->inode)->layout->plh_flags);
+		pnfs_destroy_layout(NFS_I(state->inode));
+		rpc_wake_up(&tbl->slot_tbl_waitq);
+		goto reset;
 	/* RPC connection errors */
 	case -ECONNREFUSED:
 	case -EHOSTDOWN:
@@ -199,6 +220,7 @@ static int filelayout_async_handle_error(struct rpc_task *task,
 		nfs4_ds_disconnect(clp);
 		/* fall through */
 	default:
+reset:
 		dprintk("%s Retry through MDS. Error %d\n", __func__,
 			task->tk_status);
 		return -NFS4ERR_RESET_TO_MDS;
@@ -263,9 +285,8 @@ filelayout_set_layoutcommit(struct nfs_write_data *wdata)
 static void filelayout_read_prepare(struct rpc_task *task, void *data)
 {
 	struct nfs_read_data *rdata = data;
-	struct pnfs_layout_segment *lseg = rdata->header->lseg;
 
-	if (filelayout_test_devid_invalid(FILELAYOUT_DEVID_NODE(lseg))) {
+	if (filelayout_reset_to_mds(rdata->header->lseg)) {
 		dprintk("%s task %u reset io to MDS\n", __func__, task->tk_pid);
 		filelayout_reset_read(rdata);
 		rpc_exit(task, 0);
@@ -366,9 +387,8 @@ static int filelayout_commit_done_cb(struct rpc_task *task,
 static void filelayout_write_prepare(struct rpc_task *task, void *data)
 {
 	struct nfs_write_data *wdata = data;
-	struct pnfs_layout_segment *lseg = wdata->header->lseg;
 
-	if (filelayout_test_devid_invalid(FILELAYOUT_DEVID_NODE(lseg))) {
+	if (filelayout_reset_to_mds(wdata->header->lseg)) {
 		dprintk("%s task %u reset io to MDS\n", __func__, task->tk_pid);
 		filelayout_reset_write(wdata);
 		rpc_exit(task, 0);
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 95562ad..43fe802 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -129,11 +129,24 @@ filelayout_mark_devid_invalid(struct nfs4_deviceid_node *node)
 }
 
 static inline bool
+filelayout_test_layout_invalid(struct pnfs_layout_hdr *lo)
+{
+	return test_bit(NFS_LAYOUT_INVALID, &lo->plh_flags);
+}
+
+static inline bool
 filelayout_test_devid_invalid(struct nfs4_deviceid_node *node)
 {
 	return test_bit(NFS_DEVICEID_INVALID, &node->flags);
 }
 
+static inline bool
+filelayout_reset_to_mds(struct pnfs_layout_segment *lseg)
+{
+	return filelayout_test_devid_invalid(FILELAYOUT_DEVID_NODE(lseg)) ||
+		filelayout_test_layout_invalid(lseg->pls_layout);
+}
+
 extern struct nfs_fh *
 nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j);
 
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 98a7ec8..42af565 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -455,6 +455,7 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
 	spin_unlock(&nfsi->vfs_inode.i_lock);
 	pnfs_free_lseg_list(&tmp_list);
 }
+EXPORT_SYMBOL_GPL(pnfs_destroy_layout);
 
 /*
  * Called by the state manger to remove all layouts established under an
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 681875d..549136a 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -63,6 +63,7 @@ enum {
 	NFS_LAYOUT_BULK_RECALL,		/* bulk recall affecting layout */
 	NFS_LAYOUT_ROC,			/* some lseg had roc bit set */
 	NFS_LAYOUT_DESTROYED,		/* no new use of layout allowed */
+	NFS_LAYOUT_INVALID,		/* layout is being destroyed */
 };
 
 enum layoutdriver_policy_flags {
-- 
1.7.6.4


* Re: [PATCH Version 4 01/13] NFSv4.1 do not send LAYOUTRETURN when there are no layout segments
  2012-04-27 21:53 ` [PATCH Version 4 01/13] NFSv4.1 do not send LAYOUTRETURN when there are no layout segments andros
@ 2012-05-19 21:18   ` Myklebust, Trond
  0 siblings, 0 replies; 15+ messages in thread
From: Myklebust, Trond @ 2012-05-19 21:18 UTC (permalink / raw)
  To: Adamson, Andy; +Cc: linux-nfs

On Fri, 2012-04-27 at 17:53 -0400, andros@netapp.com wrote:
> From: Andy Adamson <andros@netapp.com>
> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> ---
>  fs/nfs/pnfs.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 5e11557..463eb2f 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -657,7 +657,7 @@ _pnfs_return_layout(struct inode *ino)
>  
>  	spin_lock(&ino->i_lock);
>  	lo = nfsi->layout;
> -	if (!lo) {
> +	if (!lo || list_empty(&lo->plh_segs)) {
>  		spin_unlock(&ino->i_lock);
>  		dprintk("%s: no layout to return\n", __func__);
>  		return status;

Hmm... Aren't we better off doing this if the call to
mark_matching_lsegs_invalid() returns an empty list?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

Thread overview: 15+ messages
2012-04-27 21:53 [PATCH Version 4 00/13] NFSv4.1 file layout data server quick failover andros
2012-04-27 21:53 ` [PATCH Version 4 01/13] NFSv4.1 do not send LAYOUTRETURN when there are no layout segments andros
2012-05-19 21:18   ` Myklebust, Trond
2012-04-27 21:53 ` [PATCH Version 4 02/13] NFSv4.1: cleanup filelayout invalid deviceid handling andros
2012-04-27 21:53 ` [PATCH Version 4 03/13] NFSv4.1 cleanup filelayout invalid layout handling andros
2012-04-27 21:53 ` [PATCH Version 4 04/13] NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls andros
2012-04-27 21:53 ` [PATCH Version 4 05/13] NFSv4.1 data server timeo and retrans module parameters andros
2012-04-27 21:53 ` [PATCH Version 4 06/13] NFSv4.1: mark deviceid invalid on filelayout DS connection errors andros
2012-04-27 21:53 ` [PATCH Version 4 07/13] NFSv4.1 remove nfs4_reset_write and nfs4_reset_read andros
2012-04-27 21:53 ` [PATCH Version 4 08/13] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup andros
2012-04-27 21:53 ` [PATCH Version 4 09/13] NFSv4.1 wake up all tasks on un-connected DS slot table waitq andros
2012-04-27 21:53 ` [PATCH Version 4 10/13] NFSv4.1 send layoutreturn to fence disconnected data server andros
2012-04-27 21:53 ` [PATCH Version 4 11/13] NFSv4.1 ref count nfs_client across filelayout data server io andros
2012-04-27 21:53 ` [PATCH Version 4 12/13] NFSv4.1 dereference a disconnected data server client record andros
2012-04-27 21:53 ` [PATCH Version 4 13/13] NFSv4.1 resend LAYOUTGET on data server invalid layout errors andros
