* [PATCH Version 3 00/11] NFSv4.1 file layout data server quick failover
@ 2012-03-22 19:19 andros
  2012-03-22 19:19 ` [PATCH Version 3 01/11] NFSv4.1 move nfs4_reset_read and nfs_reset_write andros
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: andros @ 2012-03-22 19:19 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Changes from Version 2:

- refactored filelayout_async_handle errors responding to comments.
- removed NFSv4.1-have-filelayout_initiate_commit-return-void.patch
  as per comments.

Currently, when a data server connection goes down due to a network partition,
a data server failure, or an administrative action, RPC tasks in various
stages of the RPC finite state machine (FSM) need to transmit and time out
(or fail in some other way) before being redirected towards an alternative
server (MDS or another DS).
This can take a very long time if the connection goes down during a heavy
I/O load where the data server fore channel session slot_tbl_waitq and the
transport sending/pending waitqs are populated with many requests.
(see RedHat Bugzilla 756212 "Redirecting I/O through the MDS after a data
server network partition is very slow")
The current code also keeps the client structure and the session to the failed
data server until umount.

These patches address this problem by setting data server RPC tasks to
RPC_TASK_SOFTCONN and handling the resultant connection errors as follows:

 * The pNFS deviceid is marked invalid which blocks any new pNFS io using that
deviceid.
 * The RPC done routines for READ, WRITE and COMMIT redirect the requests
to the new server (MDS) and send the request back through the RPC FSM.
 * The data server session fore channel slot_tbl_waitq is drained using
the debugged rpc_wake_up
 * Code is added to the filelayout read/write prepare routines to reset
the task for io to MDS upon invalid deviceid. This is called when the
session fore channel slot table is drained.
 * All data server io requests reference the data server client structure
across io calls, and the client is dereferenced upon deviceid invalidation so
that the client (and the session) is freed upon the last (failed) redirected io.
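
The steps above can be modelled in user space as follows. This is only a
sketch: the model_* names and structures are hypothetical stand-ins, not the
kernel definitions, and the RPC FSM and slot table waitq are left out. It
shows an invalid deviceid blocking new DS io, and a failed DS io dropping its
client reference and being redirected to the MDS.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (assumptions). */
struct model_deviceid { bool invalid; };
struct model_client { int refcount; };
enum model_target { TARGET_DS, TARGET_MDS };

struct model_io {
	struct model_deviceid *devid;
	struct model_client *ds_clp;	/* reference held across the io call */
	enum model_target target;
};

/* Start an io: use the DS only while the deviceid is still valid. */
static void model_start_io(struct model_io *io, struct model_deviceid *devid,
			   struct model_client *ds_clp)
{
	io->devid = devid;
	if (!devid->invalid) {
		ds_clp->refcount++;	/* io references the DS client */
		io->ds_clp = ds_clp;
		io->target = TARGET_DS;
	} else {
		io->ds_clp = NULL;
		io->target = TARGET_MDS;
	}
}

/* Done routine: a failed DS io invalidates the deviceid, drops its
 * DS client reference and is redirected to the MDS. */
static void model_io_done(struct model_io *io, int status)
{
	if (status >= 0 || io->target != TARGET_DS)
		return;
	io->devid->invalid = true;
	io->ds_clp->refcount--;
	io->ds_clp = NULL;
	io->target = TARGET_MDS;
}
```

In this model, once one io fails with a connection error such as
-ECONNREFUSED, every later model_start_io() on the same deviceid goes
straight to the MDS.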

Testing:
I use a pynfs file layout server with a DS to test. The pynfs server and DS
are modified to use the local host for MDS to DS communication. I add a
second ipv4 address to the single machine interface for the DS to client
communication. While a "dd" or a read/write heavy Connectathon test is
running, the DS ip address is removed from the ethernet interface, and the
client recovers io to the MDS.
I have tested READ and WRITE recovery multiple times, and have managed to
time the removal of the DS ip address during a DS COMMIT and have seen it
recover as well. :)


Comments welcome

--> Andy



Andy Adamson (11):
  NFSv4.1 move nfs4_reset_read and nfs_reset_write
  NFSv4.1: cleanup filelayout invalid deviceid handling
  NFSv4.1 cleanup filelayout invalid layout handling
  NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls
  NFSv4.1: mark deviceid invalid on filelayout DS connection errors
  NFSv4.1: send filelayout DS commits to the MDS on invalid deviceid
  NFSv4.1 Check invalid deviceid upon slot table waitq wakeup
  NFSv4.1 wake up all tasks on un-connected DS slot table waitq
  NFSv4.1 ref count nfs_client across filelayout data server io
  NFSv4.1 de reference a disconnected data server client record
  NFSv4.1 check for NULL pnfs_layout_hdr in pnfs scan commit lists

 fs/nfs/internal.h          |   11 ++-
 fs/nfs/nfs4filelayout.c    |  202 +++++++++++++++++++++++++++++---------------
 fs/nfs/nfs4filelayout.h    |   25 +++++-
 fs/nfs/nfs4filelayoutdev.c |   54 ++++++------
 fs/nfs/nfs4proc.c          |   39 +--------
 fs/nfs/pnfs.h              |    3 +-
 fs/nfs/read.c              |    6 +-
 fs/nfs/write.c             |   13 ++--
 8 files changed, 205 insertions(+), 148 deletions(-)

-- 
1.7.6.4



* [PATCH Version 3 01/11] NFSv4.1 move nfs4_reset_read and nfs_reset_write
  2012-03-22 19:19 [PATCH Version 3 00/11] NFSv4.1 file layout data server quick failover andros
@ 2012-03-22 19:19 ` andros
  2012-03-22 19:19 ` [PATCH Version 3 02/11] NFSv4.1: cleanup filelayout invalid deviceid handling andros
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: andros @ 2012-03-22 19:19 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

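nfs4_reset_read and nfs4_reset_write are only called by the file layout code,
so move them into nfs4filelayout.c as filelayout_reset_read and
filelayout_reset_write. Export nfs4_read_done_cb and nfs4_write_done_cb so
that the file layout code can still reference them.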

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/internal.h       |    5 +++--
 fs/nfs/nfs4filelayout.c |   35 +++++++++++++++++++++++++++++++++--
 fs/nfs/nfs4proc.c       |   39 ++++-----------------------------------
 3 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 2476dc6..f9ac1f0 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -344,13 +344,14 @@ extern int nfs_migrate_page(struct address_space *,
 
 /* nfs4proc.c */
 extern void __nfs4_read_done_cb(struct nfs_read_data *);
-extern void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data);
+extern int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data);
+extern int nfs4_write_done_cb(struct rpc_task *task,
+			      struct nfs_write_data *data);
 extern int nfs4_init_client(struct nfs_client *clp,
 			    const struct rpc_timeout *timeparms,
 			    const char *ip_addr,
 			    rpc_authflavor_t authflavour,
 			    int noresvport);
-extern void nfs4_reset_write(struct rpc_task *task, struct nfs_write_data *data);
 extern int _nfs4_call_sync(struct rpc_clnt *clnt,
 			   struct nfs_server *server,
 			   struct rpc_message *msg,
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 634c0bc..36a65ce 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -82,6 +82,37 @@ filelayout_get_dserver_offset(struct pnfs_layout_segment *lseg, loff_t offset)
 	BUG();
 }
 
+/* Reset the nfs_write_data to send the write to the MDS. */
+void filelayout_reset_write(struct rpc_task *task, struct nfs_write_data *data)
+{
+	dprintk("%s Reset task for i/o through MDS\n", __func__);
+	put_lseg(data->lseg);
+	data->lseg          = NULL;
+	data->ds_clp        = NULL;
+	data->write_done_cb = nfs4_write_done_cb;
+	data->args.fh       = NFS_FH(data->inode);
+	data->args.bitmask  = data->res.server->cache_consistency_bitmask;
+	data->args.offset   = data->mds_offset;
+	data->res.fattr     = &data->fattr;
+	task->tk_ops        = data->mds_ops;
+	rpc_task_reset_client(task, NFS_CLIENT(data->inode));
+}
+
+/* Reset the nfs_read_data to send the read to the MDS. */
+void filelayout_reset_read(struct rpc_task *task, struct nfs_read_data *data)
+{
+	dprintk("%s Reset task for i/o through MDS\n", __func__);
+	put_lseg(data->lseg);
+	data->lseg         = NULL;
+	/* offsets will differ in the dense stripe case */
+	data->args.offset  = data->mds_offset;
+	data->ds_clp       = NULL;
+	data->args.fh      = NFS_FH(data->inode);
+	data->read_done_cb = nfs4_read_done_cb;
+	task->tk_ops = data->mds_ops;
+	rpc_task_reset_client(task, NFS_CLIENT(data->inode));
+}
+
 static int filelayout_async_handle_error(struct rpc_task *task,
 					 struct nfs4_state *state,
 					 struct nfs_client *clp,
@@ -158,7 +189,7 @@ static int filelayout_read_done_cb(struct rpc_task *task,
 			__func__, data->ds_clp, data->ds_clp->cl_session);
 		if (reset) {
 			pnfs_set_lo_fail(data->lseg);
-			nfs4_reset_read(task, data);
+			filelayout_reset_read(task, data);
 		}
 		rpc_restart_call_prepare(task);
 		return -EAGAIN;
@@ -239,7 +270,7 @@ static int filelayout_write_done_cb(struct rpc_task *task,
 			__func__, data->ds_clp, data->ds_clp->cl_session);
 		if (reset) {
 			pnfs_set_lo_fail(data->lseg);
-			nfs4_reset_write(task, data);
+			filelayout_reset_write(task, data);
 		}
 		rpc_restart_call_prepare(task);
 		return -EAGAIN;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index b76dd0e..f3c67db 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3284,7 +3284,7 @@ void __nfs4_read_done_cb(struct nfs_read_data *data)
 	nfs_invalidate_atime(data->inode);
 }
 
-static int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
+int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
 {
 	struct nfs_server *server = NFS_SERVER(data->inode);
 
@@ -3298,6 +3298,7 @@ static int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
 		renew_lease(server, data->timestamp);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(nfs4_read_done_cb);
 
 static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
 {
@@ -3329,23 +3330,7 @@ static void nfs4_proc_read_rpc_prepare(struct rpc_task *task, struct nfs_read_da
 	rpc_call_start(task);
 }
 
-/* Reset the the nfs_read_data to send the read to the MDS. */
-void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data)
-{
-	dprintk("%s Reset task for i/o through\n", __func__);
-	put_lseg(data->lseg);
-	data->lseg = NULL;
-	/* offsets will differ in the dense stripe case */
-	data->args.offset = data->mds_offset;
-	data->ds_clp = NULL;
-	data->args.fh     = NFS_FH(data->inode);
-	data->read_done_cb = nfs4_read_done_cb;
-	task->tk_ops = data->mds_ops;
-	rpc_task_reset_client(task, NFS_CLIENT(data->inode));
-}
-EXPORT_SYMBOL_GPL(nfs4_reset_read);
-
-static int nfs4_write_done_cb(struct rpc_task *task, struct nfs_write_data *data)
+int nfs4_write_done_cb(struct rpc_task *task, struct nfs_write_data *data)
 {
 	struct inode *inode = data->inode;
 	
@@ -3359,6 +3344,7 @@ static int nfs4_write_done_cb(struct rpc_task *task, struct nfs_write_data *data
 	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(nfs4_write_done_cb);
 
 static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
 {
@@ -3368,23 +3354,6 @@ static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
 		nfs4_write_done_cb(task, data);
 }
 
-/* Reset the the nfs_write_data to send the write to the MDS. */
-void nfs4_reset_write(struct rpc_task *task, struct nfs_write_data *data)
-{
-	dprintk("%s Reset task for i/o through\n", __func__);
-	put_lseg(data->lseg);
-	data->lseg          = NULL;
-	data->ds_clp        = NULL;
-	data->write_done_cb = nfs4_write_done_cb;
-	data->args.fh       = NFS_FH(data->inode);
-	data->args.bitmask  = data->res.server->cache_consistency_bitmask;
-	data->args.offset   = data->mds_offset;
-	data->res.fattr     = &data->fattr;
-	task->tk_ops        = data->mds_ops;
-	rpc_task_reset_client(task, NFS_CLIENT(data->inode));
-}
-EXPORT_SYMBOL_GPL(nfs4_reset_write);
-
 static void nfs4_proc_write_setup(struct nfs_write_data *data, struct rpc_message *msg)
 {
 	struct nfs_server *server = NFS_SERVER(data->inode);
-- 
1.7.6.4



* [PATCH Version 3 02/11] NFSv4.1: cleanup filelayout invalid deviceid handling
  2012-03-22 19:19 [PATCH Version 3 00/11] NFSv4.1 file layout data server quick failover andros
  2012-03-22 19:19 ` [PATCH Version 3 01/11] NFSv4.1 move nfs4_reset_read and nfs_reset_write andros
@ 2012-03-22 19:19 ` andros
  2012-03-22 19:19 ` [PATCH Version 3 03/11] NFSv4.1 cleanup filelayout invalid layout handling andros
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: andros @ 2012-03-22 19:19 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Move the invalid deviceid test into nfs4_fl_prepare_ds, called by the
filelayout read, write, and commit routines. NFS4_DEVICE_ID_NEG_ENTRY
is no longer needed.
Remove redundant printk's - filelayout_mark_devid_invalid prints a KERN_WARNING.

An invalid device prevents pNFS io.
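
A simplified user-space sketch of the resulting control flow, for
illustration only. The model_* types are hypothetical, and the connect_err
field stands in for the result of the real connect call. The point is that
the invalid test comes first and both failure paths share one label that
marks the device invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins (assumptions), not the kernel structures. */
struct model_ds { bool connected; int connect_err; };
struct model_dev { bool invalid; struct model_ds *ds; };

static struct model_ds *model_prepare_ds(struct model_dev *dev)
{
	struct model_ds *ds = dev->ds;

	/* An invalid device prevents pNFS io: do not even try. */
	if (dev->invalid)
		return NULL;

	if (ds == NULL)
		goto mark_dev_invalid;

	if (!ds->connected) {
		/* connect_err models a failed connection attempt */
		if (ds->connect_err)
			goto mark_dev_invalid;
		ds->connected = true;
	}
	return ds;

mark_dev_invalid:
	dev->invalid = true;
	return NULL;
}
```

Because the flag is tested on entry, a device that failed once is never
reconnected in this model, which is the job the separate negative-entry flag
used to do.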

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c    |   10 ----------
 fs/nfs/nfs4filelayout.h    |   21 +++++++++++++++++----
 fs/nfs/nfs4filelayoutdev.c |   37 +++++++++++--------------------------
 3 files changed, 28 insertions(+), 40 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 36a65ce..cb9ea7e 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -392,9 +392,6 @@ filelayout_read_pagelist(struct nfs_read_data *data)
 		__func__, data->inode->i_ino,
 		data->args.pgbase, (size_t)data->args.count, offset);
 
-	if (test_bit(NFS_DEVICEID_INVALID, &FILELAYOUT_DEVID_NODE(lseg)->flags))
-		return PNFS_NOT_ATTEMPTED;
-
 	/* Retrieve the correct rpc_client for the byte range */
 	j = nfs4_fl_calc_j_index(lseg, offset);
 	idx = nfs4_fl_calc_ds_index(lseg, j);
@@ -434,16 +431,11 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)
 	struct nfs_fh *fh;
 	int status;
 
-	if (test_bit(NFS_DEVICEID_INVALID, &FILELAYOUT_DEVID_NODE(lseg)->flags))
-		return PNFS_NOT_ATTEMPTED;
-
 	/* Retrieve the correct rpc_client for the byte range */
 	j = nfs4_fl_calc_j_index(lseg, offset);
 	idx = nfs4_fl_calc_ds_index(lseg, j);
 	ds = nfs4_fl_prepare_ds(lseg, idx);
 	if (!ds) {
-		printk(KERN_ERR "NFS: %s: prepare_ds failed, use MDS\n",
-			__func__);
 		set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
 		set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
 		return PNFS_NOT_ATTEMPTED;
@@ -922,8 +914,6 @@ static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
 	idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
 	ds = nfs4_fl_prepare_ds(lseg, idx);
 	if (!ds) {
-		printk(KERN_ERR "NFS: %s: prepare_ds failed, use MDS\n",
-			__func__);
 		set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
 		set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
 		prepare_to_resend_writes(data);
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 21190bb..b54b389 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -62,12 +62,8 @@ struct nfs4_pnfs_ds {
 	atomic_t		ds_count;
 };
 
-/* nfs4_file_layout_dsaddr flags */
-#define NFS4_DEVICE_ID_NEG_ENTRY	0x00000001
-
 struct nfs4_file_layout_dsaddr {
 	struct nfs4_deviceid_node	id_node;
-	unsigned long			flags;
 	u32				stripe_count;
 	u8				*stripe_indices;
 	u32				ds_num;
@@ -107,6 +103,23 @@ FILELAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg)
 	return &FILELAYOUT_LSEG(lseg)->dsaddr->id_node;
 }
 
+static inline void
+filelayout_mark_devid_invalid(struct nfs4_deviceid_node *node)
+{
+	u32 *p = (u32 *)&node->deviceid;
+
+	printk(KERN_WARNING "NFS: Deviceid [%x%x%x%x] marked out of use.\n",
+		p[0], p[1], p[2], p[3]);
+
+	set_bit(NFS_DEVICEID_INVALID, &node->flags);
+}
+
+static inline bool
+filelayout_test_devid_invalid(struct nfs4_deviceid_node *node)
+{
+	return test_bit(NFS_DEVICEID_INVALID, &node->flags);
+}
+
 extern struct nfs_fh *
 nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j);
 
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index a866bbd..2b8ae96 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -791,48 +791,33 @@ nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j)
 	return flseg->fh_array[i];
 }
 
-static void
-filelayout_mark_devid_negative(struct nfs4_file_layout_dsaddr *dsaddr,
-			       int err, const char *ds_remotestr)
-{
-	u32 *p = (u32 *)&dsaddr->id_node.deviceid;
-
-	printk(KERN_ERR "NFS: data server %s connection error %d."
-		" Deviceid [%x%x%x%x] marked out of use.\n",
-		ds_remotestr, err, p[0], p[1], p[2], p[3]);
-
-	spin_lock(&nfs4_ds_cache_lock);
-	dsaddr->flags |= NFS4_DEVICE_ID_NEG_ENTRY;
-	spin_unlock(&nfs4_ds_cache_lock);
-}
-
 struct nfs4_pnfs_ds *
 nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
 {
 	struct nfs4_file_layout_dsaddr *dsaddr = FILELAYOUT_LSEG(lseg)->dsaddr;
 	struct nfs4_pnfs_ds *ds = dsaddr->ds_list[ds_idx];
+	struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
+
+	if (filelayout_test_devid_invalid(devid))
+		return NULL;
 
 	if (ds == NULL) {
 		printk(KERN_ERR "NFS: %s: No data server for offset index %d\n",
 			__func__, ds_idx);
-		return NULL;
+		goto mark_dev_invalid;
 	}
 
 	if (!ds->ds_clp) {
 		struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode);
 		int err;
 
-		if (dsaddr->flags & NFS4_DEVICE_ID_NEG_ENTRY) {
-			/* Already tried to connect, don't try again */
-			dprintk("%s Deviceid marked out of use\n", __func__);
-			return NULL;
-		}
 		err = nfs4_ds_connect(s, ds);
-		if (err) {
-			filelayout_mark_devid_negative(dsaddr, err,
-						       ds->ds_remotestr);
-			return NULL;
-		}
+		if (err)
+			goto mark_dev_invalid;
 	}
 	return ds;
+
+mark_dev_invalid:
+	filelayout_mark_devid_invalid(devid);
+	return NULL;
 }
-- 
1.7.6.4



* [PATCH Version 3 03/11] NFSv4.1 cleanup filelayout invalid layout handling
  2012-03-22 19:19 [PATCH Version 3 00/11] NFSv4.1 file layout data server quick failover andros
  2012-03-22 19:19 ` [PATCH Version 3 01/11] NFSv4.1 move nfs4_reset_read and nfs_reset_write andros
  2012-03-22 19:19 ` [PATCH Version 3 02/11] NFSv4.1: cleanup filelayout invalid deviceid handling andros
@ 2012-03-22 19:19 ` andros
  2012-03-22 19:19 ` [PATCH Version 3 04/11] NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls andros
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: andros @ 2012-03-22 19:19 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

The invalid layout bits should only be used to block LAYOUTGETs.

Do not invalidate a layout on deviceid invalidation.
Do not invalidate a layout on un-handled READ, WRITE, COMMIT errors.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |   26 ++++++--------------------
 1 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index cb9ea7e..acafc4d 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -187,10 +187,8 @@ static int filelayout_read_done_cb(struct rpc_task *task,
 					  data->ds_clp, &reset) == -EAGAIN) {
 		dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
 			__func__, data->ds_clp, data->ds_clp->cl_session);
-		if (reset) {
-			pnfs_set_lo_fail(data->lseg);
+		if (reset)
 			filelayout_reset_read(task, data);
-		}
 		rpc_restart_call_prepare(task);
 		return -EAGAIN;
 	}
@@ -268,10 +266,8 @@ static int filelayout_write_done_cb(struct rpc_task *task,
 					  data->ds_clp, &reset) == -EAGAIN) {
 		dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
 			__func__, data->ds_clp, data->ds_clp->cl_session);
-		if (reset) {
-			pnfs_set_lo_fail(data->lseg);
+		if (reset)
 			filelayout_reset_write(task, data);
-		}
 		rpc_restart_call_prepare(task);
 		return -EAGAIN;
 	}
@@ -300,10 +296,9 @@ static int filelayout_commit_done_cb(struct rpc_task *task,
 					  data->ds_clp, &reset) == -EAGAIN) {
 		dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
 			__func__, data->ds_clp, data->ds_clp->cl_session);
-		if (reset) {
+		if (reset)
 			prepare_to_resend_writes(data);
-			pnfs_set_lo_fail(data->lseg);
-		} else
+		else
 			rpc_restart_call_prepare(task);
 		return -EAGAIN;
 	}
@@ -396,12 +391,8 @@ filelayout_read_pagelist(struct nfs_read_data *data)
 	j = nfs4_fl_calc_j_index(lseg, offset);
 	idx = nfs4_fl_calc_ds_index(lseg, j);
 	ds = nfs4_fl_prepare_ds(lseg, idx);
-	if (!ds) {
-		/* Either layout fh index faulty, or ds connect failed */
-		set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
-		set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
+	if (!ds)
 		return PNFS_NOT_ATTEMPTED;
-	}
 	dprintk("%s USE DS: %s\n", __func__, ds->ds_remotestr);
 
 	/* No multipath support. Use first DS */
@@ -435,11 +426,8 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)
 	j = nfs4_fl_calc_j_index(lseg, offset);
 	idx = nfs4_fl_calc_ds_index(lseg, j);
 	ds = nfs4_fl_prepare_ds(lseg, idx);
-	if (!ds) {
-		set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
-		set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
+	if (!ds)
 		return PNFS_NOT_ATTEMPTED;
-	}
 	dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s\n", __func__,
 		data->inode->i_ino, sync, (size_t) data->args.count, offset,
 		ds->ds_remotestr);
@@ -914,8 +902,6 @@ static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
 	idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
 	ds = nfs4_fl_prepare_ds(lseg, idx);
 	if (!ds) {
-		set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
-		set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
 		prepare_to_resend_writes(data);
 		filelayout_commit_release(data);
 		return -EAGAIN;
-- 
1.7.6.4



* [PATCH Version 3 04/11] NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls
  2012-03-22 19:19 [PATCH Version 3 00/11] NFSv4.1 file layout data server quick failover andros
                   ` (2 preceding siblings ...)
  2012-03-22 19:19 ` [PATCH Version 3 03/11] NFSv4.1 cleanup filelayout invalid layout handling andros
@ 2012-03-22 19:19 ` andros
  2012-03-22 19:19 ` [PATCH Version 3 05/11] NFSv4.1: mark deviceid invalid on filelayout DS connection errors andros
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: andros @ 2012-03-22 19:19 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

RPC_TASK_SOFTCONN returns connection errors to the caller which allows the pNFS
file layout to quickly try the MDS or perhaps another DS.
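
The plumbing can be sketched as below: each initiate routine gains a flags
argument that is OR-ed into the task flags, so MDS callers pass 0 and DS
callers pass the soft-connect flag. The MODEL_* bit values are illustrative
assumptions only; the real RPC_TASK_* values are defined in
include/linux/sunrpc/sched.h.

```c
#include <assert.h>

/* Illustrative bit values (assumptions), not the kernel's. */
#define MODEL_TASK_ASYNC	0x1
#define MODEL_TASK_SOFTCONN	0x2

/* Mirrors ".flags = RPC_TASK_ASYNC | flags" in the initiate routines. */
static int model_task_flags(int caller_flags)
{
	return MODEL_TASK_ASYNC | caller_flags;
}

/* io through the MDS keeps the old behaviour. */
static int model_mds_flags(void)
{
	return model_task_flags(0);
}

/* io to a data server asks for connection errors to be returned. */
static int model_ds_flags(void)
{
	return model_task_flags(MODEL_TASK_SOFTCONN);
}
```

Passing the flag from the caller keeps the generic read, write and commit
paths unchanged for non-pNFS io.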

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/internal.h       |    6 +++---
 fs/nfs/nfs4filelayout.c |   10 ++++++----
 fs/nfs/read.c           |    6 +++---
 fs/nfs/write.c          |   13 +++++++------
 4 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index f9ac1f0..eebd7f1 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -297,7 +297,7 @@ extern int nfs4_get_rootfh(struct nfs_server *server, struct nfs_fh *mntfh);
 struct nfs_pageio_descriptor;
 /* read.c */
 extern int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
-			     const struct rpc_call_ops *call_ops);
+			     const struct rpc_call_ops *call_ops, int flags);
 extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
 extern int nfs_generic_pagein(struct nfs_pageio_descriptor *desc,
 		struct list_head *head);
@@ -318,12 +318,12 @@ extern void nfs_commit_free(struct nfs_write_data *p);
 extern int nfs_initiate_write(struct nfs_write_data *data,
 			      struct rpc_clnt *clnt,
 			      const struct rpc_call_ops *call_ops,
-			      int how);
+			      int how, int flags);
 extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
 extern int nfs_initiate_commit(struct nfs_write_data *data,
 			       struct rpc_clnt *clnt,
 			       const struct rpc_call_ops *call_ops,
-			       int how);
+			       int how, int flags);
 extern void nfs_init_commit(struct nfs_write_data *data,
 			    struct list_head *head,
 			    struct pnfs_layout_segment *lseg);
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index acafc4d..3802937 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -406,7 +406,7 @@ filelayout_read_pagelist(struct nfs_read_data *data)
 
 	/* Perform an asynchronous read to ds */
 	status = nfs_initiate_read(data, ds->ds_clp->cl_rpcclient,
-				   &filelayout_read_call_ops);
+				  &filelayout_read_call_ops, RPC_TASK_SOFTCONN);
 	BUG_ON(status != 0);
 	return PNFS_ATTEMPTED;
 }
@@ -445,7 +445,8 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)
 
 	/* Perform an asynchronous write */
 	status = nfs_initiate_write(data, ds->ds_clp->cl_rpcclient,
-				    &filelayout_write_call_ops, sync);
+				    &filelayout_write_call_ops, sync,
+				    RPC_TASK_SOFTCONN);
 	BUG_ON(status != 0);
 	return PNFS_ATTEMPTED;
 }
@@ -913,7 +914,8 @@ static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
 	if (fh)
 		data->args.fh = fh;
 	return nfs_initiate_commit(data, ds->ds_clp->cl_rpcclient,
-				   &filelayout_commit_call_ops, how);
+				   &filelayout_commit_call_ops, how,
+				    RPC_TASK_SOFTCONN);
 }
 
 /*
@@ -1064,7 +1066,7 @@ filelayout_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
 		if (!data->lseg) {
 			nfs_init_commit(data, mds_pages, NULL);
 			nfs_initiate_commit(data, NFS_CLIENT(inode),
-					    data->mds_ops, how);
+					    data->mds_ops, how, 0);
 		} else {
 			nfs_init_commit(data, &FILELAYOUT_LSEG(data->lseg)->commit_buckets[data->ds_commit_index].committing, data->lseg);
 			filelayout_initiate_commit(data, how);
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index cc1f758..da7c0b1 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -171,7 +171,7 @@ static void nfs_readpage_release(struct nfs_page *req)
 }
 
 int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
-		      const struct rpc_call_ops *call_ops)
+		      const struct rpc_call_ops *call_ops, int flags)
 {
 	struct inode *inode = data->inode;
 	int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
@@ -188,7 +188,7 @@ int nfs_initiate_read(struct nfs_read_data *data, struct rpc_clnt *clnt,
 		.callback_ops = call_ops,
 		.callback_data = data,
 		.workqueue = nfsiod_workqueue,
-		.flags = RPC_TASK_ASYNC | swap_flags,
+		.flags = RPC_TASK_ASYNC | swap_flags | flags,
 	};
 
 	/* Set up the initial task struct. */
@@ -241,7 +241,7 @@ static int nfs_do_read(struct nfs_read_data *data,
 {
 	struct inode *inode = data->args.context->dentry->d_inode;
 
-	return nfs_initiate_read(data, NFS_CLIENT(inode), call_ops);
+	return nfs_initiate_read(data, NFS_CLIENT(inode), call_ops, 0);
 }
 
 static int
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 2c68818..3b620e4 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -839,7 +839,7 @@ static int flush_task_priority(int how)
 int nfs_initiate_write(struct nfs_write_data *data,
 		       struct rpc_clnt *clnt,
 		       const struct rpc_call_ops *call_ops,
-		       int how)
+		       int how, int flags)
 {
 	struct inode *inode = data->inode;
 	int priority = flush_task_priority(how);
@@ -856,7 +856,7 @@ int nfs_initiate_write(struct nfs_write_data *data,
 		.callback_ops = call_ops,
 		.callback_data = data,
 		.workqueue = nfsiod_workqueue,
-		.flags = RPC_TASK_ASYNC,
+		.flags = RPC_TASK_ASYNC | flags,
 		.priority = priority,
 	};
 	int ret = 0;
@@ -937,7 +937,7 @@ static int nfs_do_write(struct nfs_write_data *data,
 {
 	struct inode *inode = data->args.context->dentry->d_inode;
 
-	return nfs_initiate_write(data, NFS_CLIENT(inode), call_ops, how);
+	return nfs_initiate_write(data, NFS_CLIENT(inode), call_ops, how, 0);
 }
 
 static int nfs_do_multiple_writes(struct list_head *head,
@@ -1365,7 +1365,7 @@ EXPORT_SYMBOL_GPL(nfs_commitdata_release);
 
 int nfs_initiate_commit(struct nfs_write_data *data, struct rpc_clnt *clnt,
 			const struct rpc_call_ops *call_ops,
-			int how)
+			int how, int flags)
 {
 	struct rpc_task *task;
 	int priority = flush_task_priority(how);
@@ -1381,7 +1381,7 @@ int nfs_initiate_commit(struct nfs_write_data *data, struct rpc_clnt *clnt,
 		.callback_ops = call_ops,
 		.callback_data = data,
 		.workqueue = nfsiod_workqueue,
-		.flags = RPC_TASK_ASYNC,
+		.flags = RPC_TASK_ASYNC | flags,
 		.priority = priority,
 	};
 	/* Set up the initial task struct.  */
@@ -1463,7 +1463,8 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
 
 	/* Set up the argument struct */
 	nfs_init_commit(data, head, NULL);
-	return nfs_initiate_commit(data, NFS_CLIENT(inode), data->mds_ops, how);
+	return nfs_initiate_commit(data, NFS_CLIENT(inode), data->mds_ops,
+				   how, 0);
  out_bad:
 	nfs_retry_commit(head, NULL);
 	nfs_commit_clear_lock(NFS_I(inode));
-- 
1.7.6.4



* [PATCH Version 3 05/11] NFSv4.1: mark deviceid invalid on filelayout DS connection errors
  2012-03-22 19:19 [PATCH Version 3 00/11] NFSv4.1 file layout data server quick failover andros
                   ` (3 preceding siblings ...)
  2012-03-22 19:19 ` [PATCH Version 3 04/11] NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls andros
@ 2012-03-22 19:19 ` andros
  2012-03-22 19:19 ` [PATCH Version 3 06/11] NFSv4.1: send filelayout DS commits to the MDS on invalid deviceid andros
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: andros @ 2012-03-22 19:19 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

This prevents the use of any layout for i/o that references the deviceid.
I/O is redirected through the MDS.

Redirect the unhandled failed I/O to the MDS without marking either the
layout or the deviceid invalid.
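
A user-space sketch of the classification, for illustration. It deliberately
omits the NFSv4 state and session error cases that the real handler deals
with before reaching these labels. model_handle_error and MODEL_RESET_TO_MDS
are hypothetical names; the value 12001 is the one this patch uses for its
internal error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Internal-use code; same value as in the patch, hypothetical name. */
#define MODEL_RESET_TO_MDS 12001

/*
 * Returns 0 on success. Any error handled here is redirected to the
 * MDS; only connection errors also mark the deviceid invalid.
 */
static int model_handle_error(int status, bool *devid_invalid)
{
	if (status >= 0)
		return 0;

	switch (status) {
	/* RPC connection errors */
	case -ECONNREFUSED:
	case -EHOSTDOWN:
	case -EHOSTUNREACH:
	case -ENETUNREACH:
	case -EIO:
	case -ETIMEDOUT:
	case -EPIPE:
		*devid_invalid = true;
		/* fall through */
	default:
		return -MODEL_RESET_TO_MDS;
	}
}
```

A caller would then switch on the return value, resetting the request to the
MDS on -MODEL_RESET_TO_MDS before restarting the call.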

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |   79 +++++++++++++++++++++++++++++------------------
 fs/nfs/nfs4filelayout.h |    3 ++
 2 files changed, 52 insertions(+), 30 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 3802937..869ce26 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -116,14 +116,13 @@ void filelayout_reset_read(struct rpc_task *task, struct nfs_read_data *data)
 static int filelayout_async_handle_error(struct rpc_task *task,
 					 struct nfs4_state *state,
 					 struct nfs_client *clp,
-					 int *reset)
+					 struct nfs4_deviceid_node *devid)
 {
 	struct nfs_server *mds_server = NFS_SERVER(state->inode);
 	struct nfs_client *mds_client = mds_server->nfs_client;
 
 	if (task->tk_status >= 0)
 		return 0;
-	*reset = 0;
 
 	switch (task->tk_status) {
 	/* MDS state errors */
@@ -158,11 +157,22 @@ static int filelayout_async_handle_error(struct rpc_task *task,
 		break;
 	case -NFS4ERR_RETRY_UNCACHED_REP:
 		break;
+	/* RPC connection errors */
+	case -ECONNREFUSED:
+	case -EHOSTDOWN:
+	case -EHOSTUNREACH:
+	case -ENETUNREACH:
+	case -EIO:
+	case -ETIMEDOUT:
+	case -EPIPE:
+		dprintk("%s DS connection error %d\n", __func__,
+			task->tk_status);
+		filelayout_mark_devid_invalid(devid);
+		/* fall through */
 	default:
-		dprintk("%s DS error. Retry through MDS %d\n", __func__,
+		dprintk("%s Retry through MDS. Error %d\n", __func__,
 			task->tk_status);
-		*reset = 1;
-		break;
+		return -NFS4ERR_RESET_TO_MDS;
 	}
 out:
 	task->tk_status = 0;
@@ -179,16 +189,19 @@ wait_on_recovery:
 static int filelayout_read_done_cb(struct rpc_task *task,
 				struct nfs_read_data *data)
 {
-	int reset = 0;
+	struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+	int err;
 
 	dprintk("%s DS read\n", __func__);
 
-	if (filelayout_async_handle_error(task, data->args.context->state,
-					  data->ds_clp, &reset) == -EAGAIN) {
-		dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
-			__func__, data->ds_clp, data->ds_clp->cl_session);
-		if (reset)
-			filelayout_reset_read(task, data);
+	err = filelayout_async_handle_error(task, data->args.context->state,
+					  data->ds_clp, devid);
+
+	switch (err) {
+	case -NFS4ERR_RESET_TO_MDS:
+		filelayout_reset_read(task, data);
+		/* fall through */
+	case -EAGAIN:
 		rpc_restart_call_prepare(task);
 		return -EAGAIN;
 	}
@@ -260,14 +273,17 @@ static void filelayout_read_release(void *data)
 static int filelayout_write_done_cb(struct rpc_task *task,
 				struct nfs_write_data *data)
 {
-	int reset = 0;
-
-	if (filelayout_async_handle_error(task, data->args.context->state,
-					  data->ds_clp, &reset) == -EAGAIN) {
-		dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
-			__func__, data->ds_clp, data->ds_clp->cl_session);
-		if (reset)
-			filelayout_reset_write(task, data);
+	struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+	int err;
+
+	err = filelayout_async_handle_error(task, data->args.context->state,
+					  data->ds_clp, devid);
+
+	switch (err) {
+	case -NFS4ERR_RESET_TO_MDS:
+		filelayout_reset_write(task, data);
+		/* fall through */
+	case -EAGAIN:
 		rpc_restart_call_prepare(task);
 		return -EAGAIN;
 	}
@@ -290,16 +306,19 @@ static void prepare_to_resend_writes(struct nfs_write_data *data)
 static int filelayout_commit_done_cb(struct rpc_task *task,
 				     struct nfs_write_data *data)
 {
-	int reset = 0;
-
-	if (filelayout_async_handle_error(task, data->args.context->state,
-					  data->ds_clp, &reset) == -EAGAIN) {
-		dprintk("%s calling restart ds_clp %p ds_clp->cl_session %p\n",
-			__func__, data->ds_clp, data->ds_clp->cl_session);
-		if (reset)
-			prepare_to_resend_writes(data);
-		else
-			rpc_restart_call_prepare(task);
+	struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
+	int err;
+
+	err = filelayout_async_handle_error(task, data->args.context->state,
+					  data->ds_clp, devid);
+
+	switch (err) {
+	case -NFS4ERR_RESET_TO_MDS:
+		prepare_to_resend_writes(data);
+		return -EAGAIN;
+
+	case -EAGAIN:
+		rpc_restart_call_prepare(task);
 		return -EAGAIN;
 	}
 
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index b54b389..745324c 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -41,6 +41,9 @@
 #define NFS4_PNFS_MAX_STRIPE_CNT 4096
 #define NFS4_PNFS_MAX_MULTI_CNT  256 /* 256 fit into a u8 stripe_index */
 
+/* error codes for internal use */
+#define NFS4ERR_RESET_TO_MDS   12001
+
 enum stripetype4 {
 	STRIPE_SPARSE = 1,
 	STRIPE_DENSE = 2
-- 
1.7.6.4
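
The error dispatch introduced by the patch above can be modelled in user
space. What follows is only a sketch under stated assumptions, not the
kernel code: struct sketch_devid, struct sketch_io, SKETCH_ERR_RETRY_DS and
the sketch_* functions are hypothetical names invented for illustration.
Only the list of connection errors, the NFS4ERR_RESET_TO_MDS value and the
fall-through structure of the done callback are taken from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Internal-only code; the value is the one the patch adds to nfs4filelayout.h. */
#define NFS4ERR_RESET_TO_MDS 12001

/* Hypothetical code standing in for errors that are retried on the same DS. */
#define SKETCH_ERR_RETRY_DS 20000

/* Hypothetical stand-in for struct nfs4_deviceid_node. */
struct sketch_devid {
	bool invalid;
};

/* Hypothetical stand-in for the per-request state. */
struct sketch_io {
	bool via_mds;	/* request was redirected to the MDS */
	int restarts;	/* times the call was sent back to the prepare step */
};

/* Classify a completed task's status into 0, -EAGAIN or -NFS4ERR_RESET_TO_MDS. */
static int sketch_handle_error(int status, struct sketch_devid *devid)
{
	if (status >= 0)
		return 0;

	switch (status) {
	case -SKETCH_ERR_RETRY_DS:
		return -EAGAIN;
	/* RPC connection errors: also block new pNFS I/O on this deviceid */
	case -ECONNREFUSED:
	case -EHOSTDOWN:
	case -EHOSTUNREACH:
	case -ENETUNREACH:
	case -EIO:
	case -ETIMEDOUT:
	case -EPIPE:
		devid->invalid = true;
		/* fall through */
	default:
		return -NFS4ERR_RESET_TO_MDS;
	}
}

/* Shape of the READ/WRITE done callbacks: reset to the MDS, then restart. */
static int sketch_read_done(int status, struct sketch_devid *devid,
			    struct sketch_io *io)
{
	assert(devid != NULL && io != NULL);

	switch (sketch_handle_error(status, devid)) {
	case -NFS4ERR_RESET_TO_MDS:
		io->via_mds = true;
		/* fall through */
	case -EAGAIN:
		io->restarts++;
		return -EAGAIN;
	}
	return 0;
}
```

In this model a connection error such as -ECONNREFUSED both marks the
deviceid invalid and redirects the request, while any other unhandled error
redirects the request but leaves the deviceid usable.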



* [PATCH Version 3 06/11] NFSv4.1: send filelayout DS commits to the MDS on invalid deviceid
  2012-03-22 19:19 [PATCH Version 3 00/11] NFSv4.1 file layout data server quick failover andros
                   ` (4 preceding siblings ...)
  2012-03-22 19:19 ` [PATCH Version 3 05/11] NFSv4.1: mark deviceid invalid on filelayout DS connection errors andros
@ 2012-03-22 19:19 ` andros
  2012-03-22 19:19 ` [PATCH Version 3 07/11] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup andros
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: andros @ 2012-03-22 19:19 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 869ce26..04d9445 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -848,10 +848,11 @@ filelayout_choose_commit_list(struct nfs_page *req,
 			      struct pnfs_layout_segment *lseg)
 {
 	struct nfs4_filelayout_segment *fl = FILELAYOUT_LSEG(lseg);
+	struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
 	u32 i, j;
 	struct list_head *list;
 
-	if (fl->commit_through_mds)
+	if (fl->commit_through_mds || filelayout_test_devid_invalid(devid))
 		return &NFS_I(req->wb_context->dentry->d_inode)->commit_list;
 
 	/* Note that we are calling nfs4_fl_calc_j_index on each page
-- 
1.7.6.4



* [PATCH Version 3 07/11] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup
  2012-03-22 19:19 [PATCH Version 3 00/11] NFSv4.1 file layout data server quick failover andros
                   ` (5 preceding siblings ...)
  2012-03-22 19:19 ` [PATCH Version 3 06/11] NFSv4.1: send filelayout DS commits to the MDS on invalid deviceid andros
@ 2012-03-22 19:19 ` andros
  2012-03-22 19:19 ` [PATCH Version 3 08/11] NFSv4.1 wake up all tasks on un-connected DS slot table waitq andros
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: andros @ 2012-03-22 19:19 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

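Tasks sleeping on the slot table waitq resume in the rpc_prepare_task state
when they are woken. In the read and write prepare routines, reset the task
for io through the MDS if the deviceid is invalid.

COMMIT gets its own prepare routine, filelayout_commit_prepare, which does
not perform this check.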

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |   26 +++++++++++++++++++++++++-
 1 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 04d9445..faba987 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -235,6 +235,12 @@ static void filelayout_read_prepare(struct rpc_task *task, void *data)
 {
 	struct nfs_read_data *rdata = (struct nfs_read_data *)data;
 
+	if (filelayout_test_devid_invalid(FILELAYOUT_DEVID_NODE(rdata->lseg))) {
+		dprintk("%s task %u reset io to MDS\n", __func__, task->tk_pid);
+		filelayout_reset_read(task, rdata);
+		rpc_restart_call_prepare(task);
+		return;
+	}
 	rdata->read_done_cb = filelayout_read_done_cb;
 
 	if (nfs41_setup_sequence(rdata->ds_clp->cl_session,
@@ -329,6 +335,12 @@ static void filelayout_write_prepare(struct rpc_task *task, void *data)
 {
 	struct nfs_write_data *wdata = (struct nfs_write_data *)data;
 
+	if (filelayout_test_devid_invalid(FILELAYOUT_DEVID_NODE(wdata->lseg))) {
+		dprintk("%s task %u reset io to MDS\n", __func__, task->tk_pid);
+		filelayout_reset_write(task, wdata);
+		rpc_restart_call_prepare(task);
+		return;
+	}
 	if (nfs41_setup_sequence(wdata->ds_clp->cl_session,
 				&wdata->args.seq_args, &wdata->res.seq_res,
 				task))
@@ -360,6 +372,18 @@ static void filelayout_write_release(void *data)
 	wdata->mds_ops->rpc_release(data);
 }
 
+static void filelayout_commit_prepare(struct rpc_task *task, void *data)
+{
+	struct nfs_write_data *wdata = (struct nfs_write_data *)data;
+
+	if (nfs41_setup_sequence(wdata->ds_clp->cl_session,
+				&wdata->args.seq_args, &wdata->res.seq_res,
+				task))
+		return;
+
+	rpc_call_start(task);
+}
+
 static void filelayout_commit_release(void *data)
 {
 	struct nfs_write_data *wdata = (struct nfs_write_data *)data;
@@ -386,7 +410,7 @@ static const struct rpc_call_ops filelayout_write_call_ops = {
 };
 
 static const struct rpc_call_ops filelayout_commit_call_ops = {
-	.rpc_call_prepare = filelayout_write_prepare,
+	.rpc_call_prepare = filelayout_commit_prepare,
 	.rpc_call_done = filelayout_write_call_done,
 	.rpc_count_stats = filelayout_write_count_stats,
 	.rpc_release = filelayout_commit_release,
-- 
1.7.6.4
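
The prepare-time check added above can be pictured with a small user-space
model. This is a sketch under assumptions: the SKETCH_DEVID_INVALID flag
bit, the structs and the sketch_* functions are hypothetical and only mimic
the control flow of the patch (test the deviceid, redirect the task, run
the prepare step again); they are not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit marking a deviceid as unusable. */
#define SKETCH_DEVID_INVALID (1UL << 0)

struct sketch_devid {
	unsigned long flags;
};

enum sketch_target { SKETCH_TO_DS, SKETCH_TO_MDS };

/* Hypothetical stand-in for an RPC task plus its read/write data. */
struct sketch_task {
	const struct sketch_devid *devid;
	enum sketch_target target;
	int prepare_calls;
};

static bool sketch_devid_invalid(const struct sketch_devid *d)
{
	return (d->flags & SKETCH_DEVID_INVALID) != 0;
}

/*
 * DS prepare step. Returns false when the task was redirected and the
 * prepare step has to run again, true when the task may transmit.
 */
static bool sketch_ds_prepare(struct sketch_task *t)
{
	if (sketch_devid_invalid(t->devid)) {
		t->target = SKETCH_TO_MDS;	/* models the reset to the MDS */
		return false;			/* models restarting the prepare step */
	}
	return true;
}

/* Run the prepare step until the task is ready to transmit. */
static enum sketch_target sketch_run_prepare(struct sketch_task *t)
{
	assert(t != NULL && t->devid != NULL);

	for (;;) {
		t->prepare_calls++;
		/* In this sketch the MDS prepare step always proceeds. */
		if (t->target == SKETCH_TO_MDS)
			return t->target;
		if (sketch_ds_prepare(t))
			return t->target;
	}
}
```

A task whose deviceid is still valid passes through the prepare step once
and stays on the DS; a task that wakes up after the deviceid was marked
invalid goes through the step twice and ends up targeting the MDS.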



* [PATCH Version 3 08/11] NFSv4.1 wake up all tasks on un-connected DS slot table waitq
  2012-03-22 19:19 [PATCH Version 3 00/11] NFSv4.1 file layout data server quick failover andros
                   ` (6 preceding siblings ...)
  2012-03-22 19:19 ` [PATCH Version 3 07/11] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup andros
@ 2012-03-22 19:19 ` andros
  2012-03-22 19:19 ` [PATCH Version 3 09/11] NFSv4.1 ref count nfs_client across filelayout data server io andros
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: andros @ 2012-03-22 19:19 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

When the DS has a connection error, its deviceid is marked invalid. Drain the
fore channel slot table waitq so that the tasks sleeping on it wake up and
can be redirected.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index faba987..ce8734d 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -120,6 +120,7 @@ static int filelayout_async_handle_error(struct rpc_task *task,
 {
 	struct nfs_server *mds_server = NFS_SERVER(state->inode);
 	struct nfs_client *mds_client = mds_server->nfs_client;
+	struct nfs4_slot_table *tbl = &clp->cl_session->fc_slot_table;
 
 	if (task->tk_status >= 0)
 		return 0;
@@ -168,6 +169,7 @@ static int filelayout_async_handle_error(struct rpc_task *task,
 		dprintk("%s DS connection error %d\n", __func__,
 			task->tk_status);
 		filelayout_mark_devid_invalid(devid);
+		rpc_wake_up(&tbl->slot_tbl_waitq);
 		/* fall through */
 	default:
 		dprintk("%s Retry through MDS. Error %d\n", __func__,
@@ -192,8 +194,6 @@ static int filelayout_read_done_cb(struct rpc_task *task,
 	struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(data->lseg);
 	int err;
 
-	dprintk("%s DS read\n", __func__);
-
 	err = filelayout_async_handle_error(task, data->args.context->state,
 					  data->ds_clp, devid);
 
-- 
1.7.6.4
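
Draining a wait queue, as the patch above does with rpc_wake_up on the
slot table waitq, can be illustrated with a toy queue. The code below is a
sketch only: struct sketch_waitq, struct sketch_task and the sketch_*
functions are hypothetical and share nothing with the SUNRPC
implementation beyond the idea that one call wakes every sleeper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task that can sleep on a queue. */
struct sketch_task {
	struct sketch_task *next;
	bool sleeping;
};

/* Hypothetical FIFO wait queue: head pointer plus pointer to the last link. */
struct sketch_waitq {
	struct sketch_task *head;
	struct sketch_task **tail;
};

static void sketch_waitq_init(struct sketch_waitq *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/* Put a task to sleep at the end of the queue. */
static void sketch_sleep_on(struct sketch_waitq *q, struct sketch_task *t)
{
	assert(q != NULL && t != NULL);
	t->sleeping = true;
	t->next = NULL;
	*q->tail = t;
	q->tail = &t->next;
}

/* Wake every task on the queue and leave it empty; returns how many woke. */
static size_t sketch_wake_up_all(struct sketch_waitq *q)
{
	size_t woken = 0;
	struct sketch_task *t = q->head;

	while (t != NULL) {
		struct sketch_task *next = t->next;

		t->sleeping = false;
		t->next = NULL;
		woken++;
		t = next;
	}
	sketch_waitq_init(q);
	return woken;
}
```

Each woken task would then rerun its prepare step, where the invalid
deviceid check from patch 07 sends it to the MDS.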



* [PATCH Version 3 09/11] NFSv4.1 ref count nfs_client across filelayout data server io
  2012-03-22 19:19 [PATCH Version 3 00/11] NFSv4.1 file layout data server quick failover andros
                   ` (7 preceding siblings ...)
  2012-03-22 19:19 ` [PATCH Version 3 08/11] NFSv4.1 wake up all tasks on un-connected DS slot table waitq andros
@ 2012-03-22 19:19 ` andros
  2012-03-22 19:19 ` [PATCH Version 3 10/11] NFSv4.1 de reference a disconnected data server client record andros
  2012-03-22 19:19 ` [PATCH Version 3 11/11] NFSv4.1 check for NULL pnfs_layout_hdr in pnfs scan commit lists andros
  10 siblings, 0 replies; 12+ messages in thread
From: andros @ 2012-03-22 19:19 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

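Take a reference on the DS nfs_client for each filelayout read, write and
commit, and drop it when the io is released or reset to the MDS. This
prepares for putting the client record of a disconnected DS.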

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c |   22 +++++++++++++++++-----
 1 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index ce8734d..acf6226 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -88,6 +88,8 @@ void filelayout_reset_write(struct rpc_task *task, struct nfs_write_data *data)
 	dprintk("%s Reset task for i/o through MDS\n", __func__);
 	put_lseg(data->lseg);
 	data->lseg          = NULL;
+	/* balance nfs_get_client in filelayout_write_pagelist */
+	nfs_put_client(data->ds_clp);
 	data->ds_clp        = NULL;
 	data->write_done_cb = nfs4_write_done_cb;
 	data->args.fh       = NFS_FH(data->inode);
@@ -106,6 +108,8 @@ void filelayout_reset_read(struct rpc_task *task, struct nfs_read_data *data)
 	data->lseg         = NULL;
 	/* offsets will differ in the dense stripe case */
 	data->args.offset  = data->mds_offset;
+	/* balance nfs_get_client in filelayout_read_pagelist */
+	nfs_put_client(data->ds_clp);
 	data->ds_clp       = NULL;
 	data->args.fh      = NFS_FH(data->inode);
 	data->read_done_cb = nfs4_read_done_cb;
@@ -273,6 +277,7 @@ static void filelayout_read_release(void *data)
 	struct nfs_read_data *rdata = (struct nfs_read_data *)data;
 
 	put_lseg(rdata->lseg);
+	nfs_put_client(rdata->ds_clp);
 	rdata->mds_ops->rpc_release(data);
 }
 
@@ -369,6 +374,7 @@ static void filelayout_write_release(void *data)
 	struct nfs_write_data *wdata = (struct nfs_write_data *)data;
 
 	put_lseg(wdata->lseg);
+	nfs_put_client(wdata->ds_clp);
 	wdata->mds_ops->rpc_release(data);
 }
 
@@ -388,6 +394,7 @@ static void filelayout_commit_release(void *data)
 {
 	struct nfs_write_data *wdata = (struct nfs_write_data *)data;
 
+	nfs_put_client(wdata->ds_clp);
 	nfs_commit_release_pages(wdata);
 	if (atomic_dec_and_test(&NFS_I(wdata->inode)->commits_outstanding))
 		nfs_commit_clear_lock(NFS_I(wdata->inode));
@@ -436,9 +443,11 @@ filelayout_read_pagelist(struct nfs_read_data *data)
 	ds = nfs4_fl_prepare_ds(lseg, idx);
 	if (!ds)
 		return PNFS_NOT_ATTEMPTED;
-	dprintk("%s USE DS: %s\n", __func__, ds->ds_remotestr);
+	dprintk("%s USE DS: %s cl_count %d\n", __func__,
+		ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count));
 
 	/* No multipath support. Use first DS */
+	atomic_inc(&ds->ds_clp->cl_count);
 	data->ds_clp = ds->ds_clp;
 	fh = nfs4_fl_select_ds_fh(lseg, j);
 	if (fh)
@@ -471,11 +480,12 @@ filelayout_write_pagelist(struct nfs_write_data *data, int sync)
 	ds = nfs4_fl_prepare_ds(lseg, idx);
 	if (!ds)
 		return PNFS_NOT_ATTEMPTED;
-	dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s\n", __func__,
-		data->inode->i_ino, sync, (size_t) data->args.count, offset,
-		ds->ds_remotestr);
+	dprintk("%s ino %lu sync %d req %Zu@%llu DS: %s cl_count %d\n",
+		__func__, data->inode->i_ino, sync, (size_t) data->args.count,
+		offset, ds->ds_remotestr, atomic_read(&ds->ds_clp->cl_count));
 
 	data->write_done_cb = filelayout_write_done_cb;
+	atomic_inc(&ds->ds_clp->cl_count);
 	data->ds_clp = ds->ds_clp;
 	fh = nfs4_fl_select_ds_fh(lseg, j);
 	if (fh)
@@ -951,8 +961,10 @@ static int filelayout_initiate_commit(struct nfs_write_data *data, int how)
 		filelayout_commit_release(data);
 		return -EAGAIN;
 	}
-	dprintk("%s ino %lu, how %d\n", __func__, data->inode->i_ino, how);
+	dprintk("%s ino %lu, how %d cl_count %d\n", __func__,
+		data->inode->i_ino, how, atomic_read(&ds->ds_clp->cl_count));
 	data->write_done_cb = filelayout_commit_done_cb;
+	atomic_inc(&ds->ds_clp->cl_count);
 	data->ds_clp = ds->ds_clp;
 	fh = select_ds_fh_from_commit(lseg, data->ds_commit_index);
 	if (fh)
-- 
1.7.6.4



* [PATCH Version 3 10/11] NFSv4.1 de reference a disconnected data server client record
  2012-03-22 19:19 [PATCH Version 3 00/11] NFSv4.1 file layout data server quick failover andros
                   ` (8 preceding siblings ...)
  2012-03-22 19:19 ` [PATCH Version 3 09/11] NFSv4.1 ref count nfs_client across filelayout data server io andros
@ 2012-03-22 19:19 ` andros
  2012-03-22 19:19 ` [PATCH Version 3 11/11] NFSv4.1 check for NULL pnfs_layout_hdr in pnfs scan commit lists andros
  10 siblings, 0 replies; 12+ messages in thread
From: andros @ 2012-03-22 19:19 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

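On a DS connection error, drop the data server cache's reference to the
nfs_client and clear the cached ds_clp pointer. When the last DS io is
processed, the data server client record will be freed.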

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4filelayout.c    |    1 +
 fs/nfs/nfs4filelayout.h    |    1 +
 fs/nfs/nfs4filelayoutdev.c |   17 +++++++++++++++++
 3 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index acf6226..5032e4a 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -174,6 +174,7 @@ static int filelayout_async_handle_error(struct rpc_task *task,
 			task->tk_status);
 		filelayout_mark_devid_invalid(devid);
 		rpc_wake_up(&tbl->slot_tbl_waitq);
+		nfs4_ds_disconnect(clp);
 		/* fall through */
 	default:
 		dprintk("%s Retry through MDS. Error %d\n", __func__,
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 745324c..ff86c86 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -135,5 +135,6 @@ extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
 extern void nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
 struct nfs4_file_layout_dsaddr *
 get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id, gfp_t gfp_flags);
+void nfs4_ds_disconnect(struct nfs_client *clp);
 
 #endif /* FS_NFS_NFS4FILELAYOUT_H */
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 2b8ae96..0e54cdf 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -145,6 +145,23 @@ _data_server_lookup_locked(const struct list_head *dsaddrs)
 }
 
 /*
+ * Lookup DS by nfs_client pointer. Zero data server client pointer
+ */
+void nfs4_ds_disconnect(struct nfs_client *clp)
+{
+	struct nfs4_pnfs_ds *ds;
+
+	dprintk("%s clp %p\n", __func__, clp);
+	spin_lock(&nfs4_ds_cache_lock);
+	list_for_each_entry(ds, &nfs4_data_server_cache, ds_node)
+		if (ds->ds_clp && ds->ds_clp == clp) {
+			nfs_put_client(clp);
+			ds->ds_clp = NULL;
+		}
+	spin_unlock(&nfs4_ds_cache_lock);
+}
+
+/*
  * Create an rpc connection to the nfs4_pnfs_ds data server
  * Currently only supports IPv4 and IPv6 addresses
  */
-- 
1.7.6.4
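
Patches 09 and 10 together describe a reference-count lifecycle: the data
server cache holds one reference, each in-flight io holds another, and the
record goes away with the last put. The following user-space model is a
sketch under assumptions: struct sketch_client, struct sketch_ds and the
sketch_* functions are hypothetical, C11 atomics stand in for the kernel's
atomic_t, and a "freed" flag replaces the actual free so the outcome can
be inspected.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfs_client. */
struct sketch_client {
	atomic_int count;
	bool freed;	/* set instead of freeing, so callers can check it */
};

/* Hypothetical stand-in for a data server cache entry. */
struct sketch_ds {
	struct sketch_client *clp;
};

static void sketch_get_client(struct sketch_client *clp)
{
	atomic_fetch_add(&clp->count, 1);
}

/* Drop one reference; the last put marks the record freed. NULL is ignored. */
static void sketch_put_client(struct sketch_client *clp)
{
	int old;

	if (clp == NULL)
		return;
	old = atomic_fetch_sub(&clp->count, 1);
	assert(old > 0);
	if (old == 1)
		clp->freed = true;
}

/* Start an io: take a reference for its lifetime, or fail if disconnected. */
static struct sketch_client *sketch_start_io(struct sketch_ds *ds)
{
	if (ds->clp == NULL)
		return NULL;
	sketch_get_client(ds->clp);
	return ds->clp;
}

/* Connection error: drop the cache's reference and forget the client. */
static void sketch_disconnect(struct sketch_ds *ds, struct sketch_client *clp)
{
	if (ds->clp != NULL && ds->clp == clp) {
		sketch_put_client(clp);
		ds->clp = NULL;
	}
}
```

With two ios in flight, a disconnect leaves the record alive until both
ios have called sketch_put_client; only the second put marks it freed.
The sketch ignores locking, which the patch handles by holding
nfs4_ds_cache_lock while walking the data server cache.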



* [PATCH Version 3 11/11] NFSv4.1 check for NULL pnfs_layout_hdr in pnfs scan commit lists
  2012-03-22 19:19 [PATCH Version 3 00/11] NFSv4.1 file layout data server quick failover andros
                   ` (9 preceding siblings ...)
  2012-03-22 19:19 ` [PATCH Version 3 10/11] NFSv4.1 de reference a disconnected data server client record andros
@ 2012-03-22 19:19 ` andros
  10 siblings, 0 replies; 12+ messages in thread
From: andros @ 2012-03-22 19:19 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/pnfs.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 442ebf6..3bd7e87 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -296,9 +296,10 @@ static inline int
 pnfs_scan_commit_lists(struct inode *inode, int max, spinlock_t *lock)
 {
 	struct pnfs_layoutdriver_type *ld = NFS_SERVER(inode)->pnfs_curr_ld;
+	struct pnfs_layout_hdr *lh = NFS_I(inode)->layout;
 	int ret;
 
-	if (ld == NULL || ld->scan_commit_lists == NULL)
+	if (ld == NULL || ld->scan_commit_lists == NULL || lh == NULL)
 		return 0;
 	ret = ld->scan_commit_lists(inode, max, lock);
 	if (ret != 0)
-- 
1.7.6.4



Thread overview: 12+ messages
2012-03-22 19:19 [PATCH Version 3 00/11] NFSv4.1 file layout data server quick failover andros
2012-03-22 19:19 ` [PATCH Version 3 01/11] NFSv4.1 move nfs4_reset_read and nfs_reset_write andros
2012-03-22 19:19 ` [PATCH Version 3 02/11] NFSv4.1: cleanup filelayout invalid deviceid handling andros
2012-03-22 19:19 ` [PATCH Version 3 03/11] NFSv4.1 cleanup filelayout invalid layout handling andros
2012-03-22 19:19 ` [PATCH Version 3 04/11] NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls andros
2012-03-22 19:19 ` [PATCH Version 3 05/11] NFSv4.1: mark deviceid invalid on filelayout DS connection errors andros
2012-03-22 19:19 ` [PATCH Version 3 06/11] NFSv4.1: send filelayout DS commits to the MDS on invalid deviceid andros
2012-03-22 19:19 ` [PATCH Version 3 07/11] NFSv4.1 Check invalid deviceid upon slot table waitq wakeup andros
2012-03-22 19:19 ` [PATCH Version 3 08/11] NFSv4.1 wake up all tasks on un-connected DS slot table waitq andros
2012-03-22 19:19 ` [PATCH Version 3 09/11] NFSv4.1 ref count nfs_client across filelayout data server io andros
2012-03-22 19:19 ` [PATCH Version 3 10/11] NFSv4.1 de reference a disconnected data server client record andros
2012-03-22 19:19 ` [PATCH Version 3 11/11] NFSv4.1 check for NULL pnfs_layout_hdr in pnfs scan commit lists andros
