linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/17] Lustre stability patches
@ 2014-03-01  2:16 Oleg Drokin
  2014-03-01  2:16 ` [PATCH 01/17] staging/lustre/llite: fix open lock matching in ll_md_blocking_ast() Oleg Drokin
                   ` (16 more replies)
  0 siblings, 17 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Oleg Drokin

This series of patches fixes most of the issues I hit while running
the Lustre regression test suite. All observed crashes are gone too.

Please consider for inclusion.

Alexey Lyashkov (1):
  lustre/mdc: use ibits_known mask for lock match

Ann Koehler (1):
  lustre/osc: Don't flush active extents.

Bruno Faccini (1):
  lustre/ldlm: set l_lvb_type coherent when layout is returned

Hongchao Zhang (1):
  lustre/recovery: free open/close request promptly

John L. Hammond (3):
  staging/lustre/llite: fix open lock matching in ll_md_blocking_ast()
  lustre/clio: honor O_NOATIME
  lustre/mdc: fix bad ERR_PTR usage in mdc_locks.c

Lai Siyao (1):
  lustre/llite: simplify dentry revalidate

Liang Zhen (2):
  lustre/ptlrpc: rq_commit_cb is called for twice
  lustre/ptlrpc: re-enqueue ptlrpcd worker

Niu Yawei (1):
  lustre/quota: improper assert in osc_quota_chkdq()

Oleg Drokin (3):
  lustre/mdc: Check for all attributes validity in revalidate
  lustre/llite: Do not send parent dir fid in getattr by fid
  lustre/libcfs: warn if all HTs in a core are gone

Peng Tao (1):
  lustre/ptlrpc: skip rpcs that fail ptl_send_rpc

Sebastien Buisson (1):
  lustre/ptlrpc: fix 'data race condition' issues

wang di (1):
  lustre/mdc: comments on LOOKUP and PERM lock

 drivers/staging/lustre/lustre/include/cl_object.h  |   6 +-
 .../lustre/lustre/include/lustre/lustre_idl.h      |  32 ++-
 .../staging/lustre/lustre/include/lustre_export.h  |  17 ++
 .../staging/lustre/lustre/include/lustre_import.h  |  11 +
 drivers/staging/lustre/lustre/include/lustre_net.h |   2 +
 drivers/staging/lustre/lustre/include/obd.h        |   5 +-
 drivers/staging/lustre/lustre/include/obd_class.h  |   4 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c    |   1 +
 .../staging/lustre/lustre/libcfs/linux/linux-cpu.c |  19 +-
 drivers/staging/lustre/lustre/llite/dcache.c       | 290 ++-------------------
 drivers/staging/lustre/lustre/llite/dir.c          |   2 +-
 drivers/staging/lustre/lustre/llite/file.c         |  60 +++--
 .../staging/lustre/lustre/llite/llite_internal.h   |   6 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   3 +-
 drivers/staging/lustre/lustre/llite/namei.c        |  78 +++---
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |   1 -
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |   5 +-
 drivers/staging/lustre/lustre/lov/lov_io.c         |   1 +
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |   2 +-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      | 102 ++++----
 drivers/staging/lustre/lustre/mdc/mdc_reint.c      |   1 +
 drivers/staging/lustre/lustre/mdc/mdc_request.c    |  27 +-
 drivers/staging/lustre/lustre/obdclass/genops.c    |   2 +
 .../lustre/lustre/obdclass/lprocfs_status.c        |   1 +
 drivers/staging/lustre/lustre/osc/osc_cache.c      |   6 +
 drivers/staging/lustre/lustre/osc/osc_io.c         |  14 +-
 drivers/staging/lustre/lustre/osc/osc_quota.c      |   7 +-
 drivers/staging/lustre/lustre/ptlrpc/client.c      | 155 ++++++++---
 drivers/staging/lustre/lustre/ptlrpc/import.c      |  33 ++-
 drivers/staging/lustre/lustre/ptlrpc/niobuf.c      |   4 +
 drivers/staging/lustre/lustre/ptlrpc/recover.c     |  57 +++-
 31 files changed, 480 insertions(+), 474 deletions(-)

-- 
1.8.5.3



* [PATCH 01/17] staging/lustre/llite: fix open lock matching in ll_md_blocking_ast()
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-03 10:01   ` Dan Carpenter
  2014-03-01  2:16 ` [PATCH 02/17] lustre/mdc: Check for all attributes validity in revalidate Oleg Drokin
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

In ll_md_blocking_ast() match open locks before all others, ensuring
that MDS_INODELOCK_OPEN is not cleared from bits by another open lock
with a different mode. Change the int flags parameter of
ll_md_real_close() to fmode_t fmode. Clean up various style issues in
both functions.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/8718
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4429
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/llite/file.c         | 19 +++---
 .../staging/lustre/lustre/llite/llite_internal.h   |  2 +-
 drivers/staging/lustre/lustre/llite/namei.c        | 78 ++++++++++++----------
 3 files changed, 54 insertions(+), 45 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 4c28f39..c9ee574 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -205,7 +205,7 @@ out:
 	return rc;
 }
 
-int ll_md_real_close(struct inode *inode, int flags)
+int ll_md_real_close(struct inode *inode, fmode_t fmode)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct obd_client_handle **och_p;
@@ -213,30 +213,33 @@ int ll_md_real_close(struct inode *inode, int flags)
 	__u64 *och_usecount;
 	int rc = 0;
 
-	if (flags & FMODE_WRITE) {
+	if (fmode & FMODE_WRITE) {
 		och_p = &lli->lli_mds_write_och;
 		och_usecount = &lli->lli_open_fd_write_count;
-	} else if (flags & FMODE_EXEC) {
+	} else if (fmode & FMODE_EXEC) {
 		och_p = &lli->lli_mds_exec_och;
 		och_usecount = &lli->lli_open_fd_exec_count;
 	} else {
-		LASSERT(flags & FMODE_READ);
+		LASSERT(fmode & FMODE_READ);
 		och_p = &lli->lli_mds_read_och;
 		och_usecount = &lli->lli_open_fd_read_count;
 	}
 
 	mutex_lock(&lli->lli_och_mutex);
-	if (*och_usecount) { /* There are still users of this handle, so
-				skip freeing it. */
+	if (*och_usecount > 0) {
+		/* There are still users of this handle, so skip
+		 * freeing it. */
 		mutex_unlock(&lli->lli_och_mutex);
 		return 0;
 	}
+
 	och=*och_p;
 	*och_p = NULL;
 	mutex_unlock(&lli->lli_och_mutex);
 
-	if (och) { /* There might be a race and somebody have freed this och
-		      already */
+	if (och != NULL) {
+		/* There might be a race and this handle may already
+		   be closed. */
 		rc = ll_close_inode_openhandle(ll_i2sbi(inode)->ll_md_exp,
 					       inode, och, NULL);
 	}
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index e27efd1..47c5142 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -775,7 +775,7 @@ int ll_local_open(struct file *file,
 int ll_release_openhandle(struct dentry *, struct lookup_intent *);
 int ll_md_close(struct obd_export *md_exp, struct inode *inode,
 		struct file *file);
-int ll_md_real_close(struct inode *inode, int flags);
+int ll_md_real_close(struct inode *inode, fmode_t fmode);
 void ll_ioepoch_close(struct inode *inode, struct md_op_data *op_data,
 		      struct obd_client_handle **och, unsigned long flags);
 void ll_done_writing_attr(struct inode *inode, struct md_op_data *op_data);
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index 93c3744..86ff708 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -195,101 +195,107 @@ static void ll_invalidate_negative_children(struct inode *dir)
 int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc,
 		       void *data, int flag)
 {
-	int rc;
 	struct lustre_handle lockh;
+	int rc;
 
 	switch (flag) {
 	case LDLM_CB_BLOCKING:
 		ldlm_lock2handle(lock, &lockh);
 		rc = ldlm_cli_cancel(&lockh, LCF_ASYNC);
 		if (rc < 0) {
-			CDEBUG(D_INODE, "ldlm_cli_cancel: %d\n", rc);
+			CDEBUG(D_INODE, "ldlm_cli_cancel: rc = %d\n", rc);
 			return rc;
 		}
 		break;
 	case LDLM_CB_CANCELING: {
 		struct inode *inode = ll_inode_from_resource_lock(lock);
-		struct ll_inode_info *lli;
 		__u64 bits = lock->l_policy_data.l_inodebits.bits;
-		struct lu_fid *fid;
-		ldlm_mode_t mode = lock->l_req_mode;
 
 		/* Inode is set to lock->l_resource->lr_lvb_inode
 		 * for mdc - bug 24555 */
 		LASSERT(lock->l_ast_data == NULL);
 
-		/* Invalidate all dentries associated with this inode */
 		if (inode == NULL)
 			break;
 
+		/* Invalidate all dentries associated with this inode */
 		LASSERT(lock->l_flags & LDLM_FL_CANCELING);
 
-		if (bits & MDS_INODELOCK_XATTR)
+		if (!fid_res_name_eq(ll_inode2fid(inode),
+				     &lock->l_resource->lr_name)) {
+			LDLM_ERROR(lock, "data mismatch with object "DFID"(%p)",
+				   PFID(ll_inode2fid(inode)), inode);
+			LBUG();
+		}
+
+		if (bits & MDS_INODELOCK_XATTR) {
 			ll_xattr_cache_destroy(inode);
+			bits &= ~MDS_INODELOCK_XATTR;
+		}
 
 		/* For OPEN locks we differentiate between lock modes
 		 * LCK_CR, LCK_CW, LCK_PR - bug 22891 */
-		if (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE |
-			    MDS_INODELOCK_LAYOUT | MDS_INODELOCK_PERM))
-			ll_have_md_lock(inode, &bits, LCK_MINMODE);
-
 		if (bits & MDS_INODELOCK_OPEN)
-			ll_have_md_lock(inode, &bits, mode);
-
-		fid = ll_inode2fid(inode);
-		if (!fid_res_name_eq(fid, &lock->l_resource->lr_name))
-			LDLM_ERROR(lock, "data mismatch with object "
-				   DFID" (%p)", PFID(fid), inode);
+			ll_have_md_lock(inode, &bits, lock->l_req_mode);
 
 		if (bits & MDS_INODELOCK_OPEN) {
-			int flags = 0;
+			fmode_t fmode;
+
 			switch (lock->l_req_mode) {
 			case LCK_CW:
-				flags = FMODE_WRITE;
+				fmode = FMODE_WRITE;
 				break;
 			case LCK_PR:
-				flags = FMODE_EXEC;
+				fmode = FMODE_EXEC;
 				break;
 			case LCK_CR:
-				flags = FMODE_READ;
+				fmode = FMODE_READ;
 				break;
 			default:
-				CERROR("Unexpected lock mode for OPEN lock "
-				       "%d, inode %ld\n", lock->l_req_mode,
-				       inode->i_ino);
+				LDLM_ERROR(lock, "bad lock mode for OPEN lock");
+				LBUG();
 			}
-			ll_md_real_close(inode, flags);
+
+			ll_md_real_close(inode, fmode);
 		}
 
-		lli = ll_i2info(inode);
+		if (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE |
+			    MDS_INODELOCK_LAYOUT | MDS_INODELOCK_PERM))
+			ll_have_md_lock(inode, &bits, LCK_MINMODE);
+
 		if (bits & MDS_INODELOCK_LAYOUT) {
-			struct cl_object_conf conf = { { 0 } };
+			struct cl_object_conf conf = {
+				.coc_opc = OBJECT_CONF_INVALIDATE,
+				.coc_inode = inode,
+			};
 
-			conf.coc_opc = OBJECT_CONF_INVALIDATE;
-			conf.coc_inode = inode;
 			rc = ll_layout_conf(inode, &conf);
-			if (rc)
-				CDEBUG(D_INODE, "invaliding layout %d.\n", rc);
+			if (rc < 0)
+				CDEBUG(D_INODE, "cannot invalidate layout of "
+				       DFID": rc = %d\n",
+				       PFID(ll_inode2fid(inode)), rc);
 		}
 
 		if (bits & MDS_INODELOCK_UPDATE) {
+			struct ll_inode_info *lli = ll_i2info(inode);
+
 			spin_lock(&lli->lli_lock);
 			lli->lli_flags &= ~LLIF_MDS_SIZE_LOCK;
 			spin_unlock(&lli->lli_lock);
 		}
 
-		if (S_ISDIR(inode->i_mode) &&
-		     (bits & MDS_INODELOCK_UPDATE)) {
+		if ((bits & MDS_INODELOCK_UPDATE) && S_ISDIR(inode->i_mode)) {
 			CDEBUG(D_INODE, "invalidating inode %lu\n",
 			       inode->i_ino);
 			truncate_inode_pages(inode->i_mapping, 0);
 			ll_invalidate_negative_children(inode);
 		}
 
-		if (inode->i_sb->s_root &&
-		    inode != inode->i_sb->s_root->d_inode &&
-		    (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM)))
+		if ((bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM)) &&
+		    inode->i_sb->s_root != NULL &&
+		    inode != inode->i_sb->s_root->d_inode)
 			ll_invalidate_aliases(inode);
+
 		iput(inode);
 		break;
 	}
-- 
1.8.5.3
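
Reader's note on the ordering change in ll_md_blocking_ast(): the toy model
below is a sketch, not the Lustre implementation. It assumes that
ll_have_md_lock() clears from the mask every bit covered by some other cached
lock of a matching mode, and that LCK_MINMODE matches any mode. All toy_*
names and constants are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins, not the real Lustre definitions. */
#define TOY_LOOKUP 0x01u
#define TOY_OPEN   0x04u

enum toy_mode { TOY_ANY, TOY_CR, TOY_CW };

struct toy_lock {
	uint64_t bits;
	enum toy_mode mode;
};

/* Assumed behaviour of ll_have_md_lock(): drop from *bits every bit that
 * is covered by a held lock whose mode matches (TOY_ANY matches all). */
static void toy_have_md_lock(const struct toy_lock *held, size_t n,
			     uint64_t *bits, enum toy_mode mode)
{
	for (size_t i = 0; i < n; i++)
		if (mode == TOY_ANY || held[i].mode == mode)
			*bits &= ~held[i].bits;
}

/* Old order: any-mode match first, then the OPEN match.
 * Returns 1 if the open handle would be closed. */
static int toy_old_order(const struct toy_lock *held, size_t n,
			 uint64_t bits, enum toy_mode mode)
{
	if (bits & TOY_LOOKUP)
		toy_have_md_lock(held, n, &bits, TOY_ANY);
	if (bits & TOY_OPEN)
		toy_have_md_lock(held, n, &bits, mode);
	return (bits & TOY_OPEN) != 0;
}

/* New order: the OPEN match with the lock's own mode comes first, and the
 * close decision is taken before any any-mode match can touch the mask. */
static int toy_new_order(const struct toy_lock *held, size_t n,
			 uint64_t bits, enum toy_mode mode)
{
	if (bits & TOY_OPEN)
		toy_have_md_lock(held, n, &bits, mode);
	return (bits & TOY_OPEN) != 0;
}

/* Another open lock on the same inode, held in a different mode. */
static const struct toy_lock toy_held[] = {
	{ TOY_OPEN | TOY_LOOKUP, TOY_CR },
};
```

In this model, cancelling a CW lock carrying OPEN|LOOKUP while a CR open lock
is cached skips the close under the old order and performs it under the new
one, which is the case the commit message describes.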



* [PATCH 02/17] lustre/mdc: Check for all attributes validity in revalidate
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
  2014-03-01  2:16 ` [PATCH 01/17] staging/lustre/llite: fix open lock matching in ll_md_blocking_ast() Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 03/17] lustre/llite: Do not send parent dir fid in getattr by fid Oleg Drokin
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel
  Cc: Oleg Drokin, Alexey Lyashkov, Oleg Drokin

GETATTR needs to return attributes protected by different bits, so
we need to ensure we have locks with all of those bits, not
just the UPDATE bit.

Signed-off-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/6460
Xyratex-bug-id: MRP-1052
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3240
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
---
 drivers/staging/lustre/lustre/mdc/mdc_locks.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 288a41e..1336d47 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -1061,7 +1061,20 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
 		fid_build_reg_res_name(fid, &res_id);
 		switch (it->it_op) {
 		case IT_GETATTR:
-			policy.l_inodebits.bits = MDS_INODELOCK_UPDATE;
+			/* File attributes are held under multiple bits:
+			 * nlink is under lookup lock, size and times are
+			 * under UPDATE lock and recently we've also got
+			 * a separate permissions lock for owner/group/acl that
+			 * were protected by lookup lock before.
+			 * Getattr must provide all of that information,
+			 * so we need to ensure we have all of those locks.
+			 * Unfortunately, if the bits are split across multiple
+			 * locks, there's no easy way to match all of them here,
+			 * so an extra RPC would be performed to fetch all
+			 * of those bits at once for now. */
+			policy.l_inodebits.bits = MDS_INODELOCK_UPDATE |
+						  MDS_INODELOCK_LOOKUP |
+						  MDS_INODELOCK_PERM;
 			break;
 		case IT_LAYOUT:
 			policy.l_inodebits.bits = MDS_INODELOCK_LAYOUT;
@@ -1070,6 +1083,7 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
 			policy.l_inodebits.bits = MDS_INODELOCK_LOOKUP;
 			break;
 		}
+
 		mode = ldlm_lock_match(exp->exp_obd->obd_namespace,
 				       LDLM_FL_BLOCK_GRANTED, &res_id,
 				       LDLM_IBITS, &policy,
-- 
1.8.5.3
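
Reader's note on why widening the GETATTR mask can cost an extra RPC: the
sketch below assumes an inodebits match succeeds only when a single cached
lock covers every requested bit, which is what the new comment in
mdc_revalidate_lock() implies. The bit values are the ones defined in
lustre_idl.h (quoted in patch 04/17); toy_ibits_covers() and
toy_getattr_bits() are hypothetical helpers, not Lustre functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined in lustre_idl.h. */
#define MDS_INODELOCK_LOOKUP 0x000001
#define MDS_INODELOCK_UPDATE 0x000002
#define MDS_INODELOCK_PERM   0x000010

/* Hypothetical helper: a cached lock satisfies the request only if it
 * covers every requested bit (assumed matching rule). */
static int toy_ibits_covers(uint64_t lock_bits, uint64_t wanted)
{
	return (lock_bits & wanted) == wanted;
}

/* The mask requested for IT_GETATTR after this patch. */
static uint64_t toy_getattr_bits(void)
{
	return MDS_INODELOCK_UPDATE | MDS_INODELOCK_LOOKUP |
	       MDS_INODELOCK_PERM;
}
```

Under that assumption a lock holding only UPDATE no longer satisfies GETATTR,
and two separate locks that together hold all three bits do not match either,
so the client falls back to an RPC.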



* [PATCH 03/17] lustre/llite: Do not send parent dir fid in getattr by fid
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
  2014-03-01  2:16 ` [PATCH 01/17] staging/lustre/llite: fix open lock matching in ll_md_blocking_ast() Oleg Drokin
  2014-03-01  2:16 ` [PATCH 02/17] lustre/mdc: Check for all attributes validity in revalidate Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 04/17] lustre/mdc: comments on LOOKUP and PERM lock Oleg Drokin
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Oleg Drokin, Oleg Drokin

Sending the parent directory fid with getattr by fid is pointless, as
the parent might have long since changed and we have no control over it,
but it's irrelevant anyway, since we already have the child fid.

Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/7910
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3240
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
---
 drivers/staging/lustre/lustre/llite/dir.c  | 2 +-
 drivers/staging/lustre/lustre/llite/file.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index fd0dd20e..7fbc18e 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -362,7 +362,7 @@ struct page *ll_get_dir_page(struct inode *dir, __u64 hash,
 		struct ptlrpc_request *request;
 		struct md_op_data *op_data;
 
-		op_data = ll_prep_md_op_data(NULL, dir, NULL, NULL, 0, 0,
+		op_data = ll_prep_md_op_data(NULL, dir, dir, NULL, 0, 0,
 		LUSTRE_OPC_ANY, NULL);
 		if (IS_ERR(op_data))
 			return (void *)op_data;
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index c9ee574..36c54aa 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -2891,7 +2891,7 @@ int __ll_inode_revalidate_it(struct dentry *dentry, struct lookup_intent *it,
 			oit.it_op = IT_LOOKUP;
 
 		/* Call getattr by fid, so do not provide name at all. */
-		op_data = ll_prep_md_op_data(NULL, dentry->d_parent->d_inode,
+		op_data = ll_prep_md_op_data(NULL, dentry->d_inode,
 					     dentry->d_inode, NULL, 0, 0,
 					     LUSTRE_OPC_ANY, NULL);
 		if (IS_ERR(op_data))
-- 
1.8.5.3



* [PATCH 04/17] lustre/mdc: comments on LOOKUP and PERM lock
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
                   ` (2 preceding siblings ...)
  2014-03-01  2:16 ` [PATCH 03/17] lustre/llite: Do not send parent dir fid in getattr by fid Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 05/17] lustre/mdc: use ibits_known mask for lock match Oleg Drokin
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: wang di, Oleg Drokin

From: wang di <di.wang@intel.com>

Add more comments for MDS_INODELOCK_PERM and
MDS_INODELOCK_LOOKUP.

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/7937
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3240
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      | 24 ++++++++++++++++------
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |  3 +++
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 4183a35..4c70c06 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2116,12 +2116,24 @@ extern void lustre_swab_generic_32s (__u32 *val);
 #define DISP_OPEN_LEASE      0x04000000
 
 /* INODE LOCK PARTS */
-#define MDS_INODELOCK_LOOKUP 0x000001       /* dentry, mode, owner, group */
-#define MDS_INODELOCK_UPDATE 0x000002       /* size, links, timestamps */
-#define MDS_INODELOCK_OPEN   0x000004       /* For opened files */
-#define MDS_INODELOCK_LAYOUT 0x000008       /* for layout */
-#define MDS_INODELOCK_PERM   0x000010       /* for permission */
-#define MDS_INODELOCK_XATTR  0x000020       /* extended attributes */
+#define MDS_INODELOCK_LOOKUP 0x000001	/* For namespace, dentry etc, and also
+					 * was used to protect permission (mode,
+					 * owner, group etc) before 2.4. */
+#define MDS_INODELOCK_UPDATE 0x000002	/* size, links, timestamps */
+#define MDS_INODELOCK_OPEN   0x000004	/* For opened files */
+#define MDS_INODELOCK_LAYOUT 0x000008	/* for layout */
+
+/* The PERM bit is added int 2.4, and it is used to protect permission(mode,
+ * owner, group, acl etc), so to separate the permission from LOOKUP lock.
+ * Because for remote directories(in DNE), these locks will be granted by
+ * different MDTs(different ldlm namespace).
+ *
+ * For local directory, MDT will always grant UPDATE_LOCK|PERM_LOCK together.
+ * For Remote directory, the master MDT, where the remote directory is, will
+ * grant UPDATE_LOCK|PERM_LOCK, and the remote MDT, where the name entry is,
+ * will grant LOOKUP_LOCK. */
+#define MDS_INODELOCK_PERM   0x000010
+#define MDS_INODELOCK_XATTR  0x000020	/* extended attributes */
 
 #define MDS_INODELOCK_MAXSHIFT 5
 /* This FULL lock is useful to take on unlink sort of operations */
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 1336d47..d9017a5 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -1072,6 +1072,9 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
 			 * locks, there's no easy way to match all of them here,
 			 * so an extra RPC would be performed to fetch all
 			 * of those bits at once for now. */
+			/* For new MDTs(> 2.4), UPDATE|PERM should be enough,
+			 * but for old MDTs (< 2.4), permission is covered
+			 * by LOOKUP lock, so it needs to match all bits here.*/
 			policy.l_inodebits.bits = MDS_INODELOCK_UPDATE |
 						  MDS_INODELOCK_LOOKUP |
 						  MDS_INODELOCK_PERM;
-- 
1.8.5.3



* [PATCH 05/17] lustre/mdc: use ibits_known mask for lock match
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
                   ` (3 preceding siblings ...)
  2014-03-01  2:16 ` [PATCH 04/17] lustre/mdc: comments on LOOKUP and PERM lock Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 06/17] lustre/clio: honor O_NOATIME Oleg Drokin
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel
  Cc: Alexey Lyashkov, Patrick Farrell, Oleg Drokin

From: Alexey Lyashkov <alexey_lyashkov@xyratex.com>

Before revalidating a lock on the client, mask the lock bits against
the lock bits supported by the server (ibits_known), so newer clients
will find valid locks given by older server versions.

Signed-off-by: Patrick Farrell <paf@cray.com>
Signed-off-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Reviewed-on: http://review.whamcloud.com/8636
Xyratex-bug-id: MRP-1583
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4405
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/include/lustre_export.h | 8 ++++++++
 drivers/staging/lustre/lustre/mdc/mdc_locks.c         | 8 +++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_export.h b/drivers/staging/lustre/lustre/include/lustre_export.h
index 2feb38b..82a230b 100644
--- a/drivers/staging/lustre/lustre/include/lustre_export.h
+++ b/drivers/staging/lustre/lustre/include/lustre_export.h
@@ -380,6 +380,14 @@ static inline bool imp_connect_lvb_type(struct obd_import *imp)
 		return false;
 }
 
+static inline __u64 exp_connect_ibits(struct obd_export *exp)
+{
+	struct obd_connect_data *ocd;
+
+	ocd = &exp->exp_connect_data;
+	return ocd->ocd_ibits_known;
+}
+
 extern struct obd_export *class_conn2export(struct lustre_handle *conn);
 extern struct obd_device *class_conn2obd(struct lustre_handle *conn);
 
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index d9017a5..6ef9e28 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -160,6 +160,8 @@ ldlm_mode_t mdc_lock_match(struct obd_export *exp, __u64 flags,
 	ldlm_mode_t rc;
 
 	fid_build_reg_res_name(fid, &res_id);
+	/* LU-4405: Clear bits not supported by server */
+	policy->l_inodebits.bits &= exp_connect_ibits(exp);
 	rc = ldlm_lock_match(class_exp2obd(exp)->obd_namespace, flags,
 			     &res_id, type, policy, mode, lockh, 0);
 	return rc;
@@ -1087,10 +1089,10 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
 			break;
 		}
 
-		mode = ldlm_lock_match(exp->exp_obd->obd_namespace,
-				       LDLM_FL_BLOCK_GRANTED, &res_id,
+		mode = mdc_lock_match(exp, LDLM_FL_BLOCK_GRANTED, fid,
 				       LDLM_IBITS, &policy,
-				       LCK_CR|LCK_CW|LCK_PR|LCK_PW, &lockh, 0);
+				      LCK_CR | LCK_CW | LCK_PR | LCK_PW,
+				      &lockh);
 	}
 
 	if (mode) {
-- 
1.8.5.3
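
Reader's note on the ibits_known masking: a minimal sketch, assuming an older
server that only knows the LOOKUP, UPDATE and OPEN bits (that server mask is
an assumption chosen for illustration). The bit values are the ones defined in
lustre_idl.h; toy_mask_ibits() and toy_ibits_covers() are hypothetical
helpers that mirror the one-line masking added to mdc_lock_match().

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined in lustre_idl.h. */
#define MDS_INODELOCK_LOOKUP 0x000001
#define MDS_INODELOCK_UPDATE 0x000002
#define MDS_INODELOCK_OPEN   0x000004
#define MDS_INODELOCK_PERM   0x000010

/* Assumed ocd_ibits_known of an older server that predates the PERM bit. */
#define TOY_OLD_SERVER_IBITS \
	(MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | MDS_INODELOCK_OPEN)

/* Mirrors the masking step: drop bits the server cannot grant. */
static uint64_t toy_mask_ibits(uint64_t wanted, uint64_t ibits_known)
{
	return wanted & ibits_known;
}

/* Hypothetical match rule: one lock must cover every requested bit. */
static int toy_ibits_covers(uint64_t lock_bits, uint64_t wanted)
{
	return (lock_bits & wanted) == wanted;
}
```

With these assumptions, a GETATTR request for UPDATE|LOOKUP|PERM can never be
covered by a lock from the older server, while the masked request for
UPDATE|LOOKUP can.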



* [PATCH 06/17] lustre/clio: honor O_NOATIME
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
                   ` (4 preceding siblings ...)
  2014-03-01  2:16 ` [PATCH 05/17] lustre/mdc: use ibits_known mask for lock match Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 07/17] lustre/mdc: fix bad ERR_PTR usage in mdc_locks.c Oleg Drokin
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

Add a ci_noatime bit to struct cl_io. In ll_io_init() set this bit if
O_NOATIME is set in f_flags. Ensure that this bit is propagated down
to lower layers. In osc_io_read_start() don't update atime if this bit
is set. Add sanity test 39n to check that passing O_NOATIME to open()
is honored.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/7442
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3832
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/include/cl_object.h |  6 ++++-
 drivers/staging/lustre/lustre/llite/file.c        | 29 +++++++++++++++++++++++
 drivers/staging/lustre/lustre/lov/lov_io.c        |  1 +
 drivers/staging/lustre/lustre/osc/osc_io.c        | 14 ++++-------
 4 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 4d692dc..76e1b68 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -2392,7 +2392,11 @@ struct cl_io {
 	/**
 	 * file is released, restore has to to be triggered by vvp layer
 	 */
-			     ci_restore_needed:1;
+			     ci_restore_needed:1,
+	/**
+	 * O_NOATIME
+	 */
+			     ci_noatime:1;
 	/**
 	 * Number of pages owned by this IO. For invariant checking.
 	 */
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 36c54aa..362f5ec 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1035,6 +1035,33 @@ int ll_glimpse_ioctl(struct ll_sb_info *sbi, struct lov_stripe_md *lsm,
 	return rc;
 }
 
+static bool file_is_noatime(const struct file *file)
+{
+	const struct vfsmount *mnt = file->f_path.mnt;
+	const struct inode *inode = file->f_path.dentry->d_inode;
+
+	/* Adapted from file_accessed() and touch_atime().*/
+	if (file->f_flags & O_NOATIME)
+		return true;
+
+	if (inode->i_flags & S_NOATIME)
+		return true;
+
+	if (IS_NOATIME(inode))
+		return true;
+
+	if (mnt->mnt_flags & (MNT_NOATIME | MNT_READONLY))
+		return true;
+
+	if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
+		return true;
+
+	if ((inode->i_sb->s_flags & MS_NODIRATIME) && S_ISDIR(inode->i_mode))
+		return true;
+
+	return false;
+}
+
 void ll_io_init(struct cl_io *io, const struct file *file, int write)
 {
 	struct inode *inode = file->f_dentry->d_inode;
@@ -1054,6 +1081,8 @@ void ll_io_init(struct cl_io *io, const struct file *file, int write)
 	} else if (file->f_flags & O_APPEND) {
 		io->ci_lockreq = CILR_MANDATORY;
 	}
+
+	io->ci_noatime = file_is_noatime(file);
 }
 
 static ssize_t
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 5a6ab70..65133ea 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -194,6 +194,7 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio,
 		sub_io->ci_lockreq = io->ci_lockreq;
 		sub_io->ci_type    = io->ci_type;
 		sub_io->ci_no_srvlock = io->ci_no_srvlock;
+		sub_io->ci_noatime = io->ci_noatime;
 
 		lov_sub_enter(sub);
 		result = cl_io_sub_init(sub->sub_env, sub_io,
diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c b/drivers/staging/lustre/lustre/osc/osc_io.c
index 777ae24..5f3c545 100644
--- a/drivers/staging/lustre/lustre/osc/osc_io.c
+++ b/drivers/staging/lustre/lustre/osc/osc_io.c
@@ -512,19 +512,15 @@ static int osc_io_read_start(const struct lu_env *env,
 	struct osc_io    *oio   = cl2osc_io(env, slice);
 	struct cl_object *obj   = slice->cis_obj;
 	struct cl_attr   *attr  = &osc_env_info(env)->oti_attr;
-	int	      result = 0;
+	int rc = 0;
 
-	if (oio->oi_lockless == 0) {
+	if (oio->oi_lockless == 0 && !slice->cis_io->ci_noatime) {
 		cl_object_attr_lock(obj);
-		result = cl_object_attr_get(env, obj, attr);
-		if (result == 0) {
-			attr->cat_atime = LTIME_S(CURRENT_TIME);
-			result = cl_object_attr_set(env, obj, attr,
-						    CAT_ATIME);
-		}
+		attr->cat_atime = LTIME_S(CURRENT_TIME);
+		rc = cl_object_attr_set(env, obj, attr, CAT_ATIME);
 		cl_object_attr_unlock(obj);
 	}
-	return result;
+	return rc;
 }
 
 static int osc_io_write_start(const struct lu_env *env,
-- 
1.8.5.3
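
Reader's note on how the ci_noatime bit travels through the IO stack: the
sketch below is a user-space toy, not the cl_io code. struct toy_io and the
toy_* functions are hypothetical; they only model the two steps named in the
commit message, copying the bit into the sub-IO and skipping the atime update
when it is set.

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for struct cl_io. */
struct toy_io {
	unsigned int lockless:1;
	unsigned int noatime:1;
};

/* Models the lov_io_sub_init() hunk: the sub-IO inherits the bit. */
static void toy_sub_init(struct toy_io *sub, const struct toy_io *top)
{
	sub->lockless = top->lockless;
	sub->noatime = top->noatime;
}

/* Models the osc_io_read_start() condition: returns 1 when the read
 * would update atime, 0 when the update is skipped. */
static int toy_read_updates_atime(const struct toy_io *io)
{
	return !io->lockless && !io->noatime;
}

/* Convenience wrapper: top-level flag in, bottom-layer decision out. */
static int toy_read_through_layers(int noatime)
{
	struct toy_io top = { .lockless = 0, .noatime = noatime ? 1 : 0 };
	struct toy_io sub = { 0 };

	toy_sub_init(&sub, &top);
	return toy_read_updates_atime(&sub);
}
```

In this model, dropping the copy in toy_sub_init() would leave the sub-IO bit
at zero and the read would update atime even though the file was opened with
O_NOATIME.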



* [PATCH 07/17] lustre/mdc: fix bad ERR_PTR usage in mdc_locks.c
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
                   ` (5 preceding siblings ...)
  2014-03-01  2:16 ` [PATCH 06/17] lustre/clio: honor O_NOATIME Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 08/17] lustre/recovery: free open/close request promptly Oleg Drokin
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

In mdc_intent_open_pack() return an ERR_PTR() rather than NULL when
ldlm_prep_enqueue_req() fails. In mdc_intent_getattr_async() check the
return value of mdc_intent_getattr_pack() using IS_ERR(). Clean up the
includes in mdc_locks.c.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/7886
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4078
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/mdc/mdc_locks.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 6ef9e28..6110943 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -37,15 +37,15 @@
 #define DEBUG_SUBSYSTEM S_MDC
 
 # include <linux/module.h>
-# include <linux/pagemap.h>
-# include <linux/miscdevice.h>
 
-#include <lustre_acl.h>
+#include <linux/lustre_intent.h>
+#include <obd.h>
 #include <obd_class.h>
 #include <lustre_dlm.h>
-/* fid_res_name_eq() */
-#include <lustre_fid.h>
-#include <lprocfs_status.h>
+#include <lustre_fid.h> /* fid_res_name_eq() */
+#include <lustre_mdc.h>
+#include <lustre_net.h>
+#include <lustre_req_layout.h>
 #include "mdc_internal.h"
 
 struct mdc_getattr_args {
@@ -336,9 +336,9 @@ static struct ptlrpc_request *mdc_intent_open_pack(struct obd_export *exp,
 			     max(lmmsize, obddev->u.cli.cl_default_mds_easize));
 
 	rc = ldlm_prep_enqueue_req(exp, req, &cancels, count);
-	if (rc) {
+	if (rc < 0) {
 		ptlrpc_request_free(req);
-		return NULL;
+		return ERR_PTR(rc);
 	}
 
 	spin_lock(&req->rq_lock);
@@ -1281,8 +1281,8 @@ int mdc_intent_getattr_async(struct obd_export *exp,
 
 	fid_build_reg_res_name(&op_data->op_fid1, &res_id);
 	req = mdc_intent_getattr_pack(exp, it, op_data);
-	if (!req)
-		return -ENOMEM;
+	if (IS_ERR(req))
+		return PTR_ERR(req);
 
 	rc = mdc_enter_request(&obddev->u.cli);
 	if (rc != 0) {
-- 
1.8.5.3



* [PATCH 08/17] lustre/recovery: free open/close request promptly
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
                   ` (6 preceding siblings ...)
  2014-03-01  2:16 ` [PATCH 07/17] lustre/mdc: fix bad ERR_PTR usage in mdc_locks.c Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 09/17] lustre/llite: simplify dentry revalidate Oleg Drokin
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel
  Cc: Hongchao Zhang, Niu Yawei, Oleg Drokin

From: Hongchao Zhang <hongchao.zhang@intel.com>

- For a non-create open or a committed open, the open request
  should be freed along with the close request as soon as the
  close completes, regardless of whether the transno of the
  open/close is greater than the last committed transno known to
  the client.

- Move committed open requests to a separate dedicated list, which
  avoids scanning a huge replay list on receipt of each reply
  (when there are many open files).
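
The first point can be condensed into a small predicate. This is only a
sketch of how mdc_free_open() and ptlrpc_request_committed() in the diff
below appear to combine; struct open_state and can_free_on_close() are
hypothetical names, and locking and list handling are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the state the patch consults. */
struct open_state {
	bool     is_create;       /* models mod_is_create */
	bool     peer_disp_stripe; /* peer advertised OBD_CONNECT_DISP_STRIPE */
	uint64_t transno;         /* transaction number of the open request */
};

/* Returns true if the open request may be dropped once close is done. */
static bool can_free_on_close(const struct open_state *o,
			      uint64_t peer_committed_transno)
{
	/* A non-create open needs no replay after close, so the free is
	 * forced whatever the transno, provided the peer reports stripe
	 * creation through the open disposition. */
	bool force = !o->is_create && o->peer_disp_stripe;

	/* Otherwise fall back to the usual rule: free only what the
	 * server has already committed. */
	return force || o->transno <= peer_committed_transno;
}
```

With this rule an uncommitted create open stays available for replay,
while a plain open of an existing file is released at close time.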

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/6665
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2613
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |  6 +-
 .../staging/lustre/lustre/include/lustre_export.h  |  9 +++
 .../staging/lustre/lustre/include/lustre_import.h  | 11 +++
 drivers/staging/lustre/lustre/include/lustre_net.h |  2 +
 drivers/staging/lustre/lustre/include/obd.h        |  5 +-
 drivers/staging/lustre/lustre/include/obd_class.h  |  4 +-
 drivers/staging/lustre/lustre/llite/file.c         |  2 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |  3 +-
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  4 +-
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |  2 +-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |  2 +-
 drivers/staging/lustre/lustre/mdc/mdc_reint.c      |  1 +
 drivers/staging/lustre/lustre/mdc/mdc_request.c    | 27 +++++++-
 drivers/staging/lustre/lustre/obdclass/genops.c    |  2 +
 .../lustre/lustre/obdclass/lprocfs_status.c        |  1 +
 drivers/staging/lustre/lustre/ptlrpc/client.c      | 78 +++++++++++++++++-----
 drivers/staging/lustre/lustre/ptlrpc/import.c      | 33 ++++++---
 drivers/staging/lustre/lustre/ptlrpc/recover.c     | 57 +++++++++++++---
 18 files changed, 198 insertions(+), 51 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 4c70c06..a55eebf 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -1305,6 +1305,7 @@ extern void lustre_swab_ptlrpc_body(struct ptlrpc_body *pb);
 #define OBD_CONNECT_SHORTIO     0x2000000000000ULL/* short io */
 #define OBD_CONNECT_PINGLESS	0x4000000000000ULL/* pings not required */
 #define OBD_CONNECT_FLOCK_DEAD	0x8000000000000ULL/* flock deadlock detection */
+#define OBD_CONNECT_DISP_STRIPE 0x10000000000000ULL/*create stripe disposition*/
 
 /* XXX README XXX:
  * Please DO NOT add flag values here before first ensuring that this same
@@ -1344,7 +1345,9 @@ extern void lustre_swab_ptlrpc_body(struct ptlrpc_body *pb);
 				OBD_CONNECT_LIGHTWEIGHT | OBD_CONNECT_UMASK | \
 				OBD_CONNECT_LVB_TYPE | OBD_CONNECT_LAYOUTLOCK |\
 				OBD_CONNECT_PINGLESS | OBD_CONNECT_MAX_EASIZE |\
-				OBD_CONNECT_FLOCK_DEAD)
+				OBD_CONNECT_FLOCK_DEAD | \
+				OBD_CONNECT_DISP_STRIPE)
+
 #define OST_CONNECT_SUPPORTED  (OBD_CONNECT_SRVLOCK | OBD_CONNECT_GRANT | \
 				OBD_CONNECT_REQPORTAL | OBD_CONNECT_VERSION | \
 				OBD_CONNECT_TRUNCLOCK | OBD_CONNECT_INDEX | \
@@ -2114,6 +2117,7 @@ extern void lustre_swab_generic_32s (__u32 *val);
 #define DISP_ENQ_CREATE_REF  0x01000000
 #define DISP_OPEN_LOCK       0x02000000
 #define DISP_OPEN_LEASE      0x04000000
+#define DISP_OPEN_STRIPE     0x08000000
 
 /* INODE LOCK PARTS */
 #define MDS_INODELOCK_LOOKUP 0x000001	/* For namespace, dentry etc, and also
diff --git a/drivers/staging/lustre/lustre/include/lustre_export.h b/drivers/staging/lustre/lustre/include/lustre_export.h
index 82a230b..6f7f48c 100644
--- a/drivers/staging/lustre/lustre/include/lustre_export.h
+++ b/drivers/staging/lustre/lustre/include/lustre_export.h
@@ -388,6 +388,15 @@ static inline __u64 exp_connect_ibits(struct obd_export *exp)
 	return ocd->ocd_ibits_known;
 }
 
+static inline bool imp_connect_disp_stripe(struct obd_import *imp)
+{
+	struct obd_connect_data *ocd;
+
+	LASSERT(imp != NULL);
+	ocd = &imp->imp_connect_data;
+	return ocd->ocd_connect_flags & OBD_CONNECT_DISP_STRIPE;
+}
+
 extern struct obd_export *class_conn2export(struct lustre_handle *conn);
 extern struct obd_device *class_conn2obd(struct lustre_handle *conn);
 
diff --git a/drivers/staging/lustre/lustre/include/lustre_import.h b/drivers/staging/lustre/lustre/include/lustre_import.h
index 67259eb..e9833ae 100644
--- a/drivers/staging/lustre/lustre/include/lustre_import.h
+++ b/drivers/staging/lustre/lustre/include/lustre_import.h
@@ -180,6 +180,17 @@ struct obd_import {
 	struct list_head		imp_delayed_list;
 	/** @} */
 
+	/**
+	 * List of requests that are retained for committed open replay. Once
+	 * open is committed, open replay request will be moved from the
+	 * imp_replay_list into the imp_committed_list.
+	 * The imp_replay_cursor is for accelerating searching during replay.
+	 * @{
+	 */
+	struct list_head		imp_committed_list;
+	struct list_head	       *imp_replay_cursor;
+	/** @} */
+
 	/** obd device for this import */
 	struct obd_device	*imp_obd;
 
diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h b/drivers/staging/lustre/lustre/include/lustre_net.h
index d8d0880..11382ab 100644
--- a/drivers/staging/lustre/lustre/include/lustre_net.h
+++ b/drivers/staging/lustre/lustre/include/lustre_net.h
@@ -2621,6 +2621,8 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd);
  * request queues, request management, etc.
  * @{
  */
+void ptlrpc_request_committed(struct ptlrpc_request *req, int force);
+
 void ptlrpc_init_client(int req_portal, int rep_portal, char *name,
 			struct ptlrpc_client *);
 void ptlrpc_cleanup_client(struct obd_import *imp);
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index c3470ce..1b38695 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -1323,7 +1323,8 @@ struct md_open_data {
 	struct obd_client_handle *mod_och;
 	struct ptlrpc_request    *mod_open_req;
 	struct ptlrpc_request    *mod_close_req;
-	atomic_t	      mod_refcount;
+	atomic_t		  mod_refcount;
+	bool			  mod_is_create;
 };
 
 struct lookup_intent;
@@ -1392,7 +1393,7 @@ struct md_ops {
 
 	int (*m_set_open_replay_data)(struct obd_export *,
 				      struct obd_client_handle *,
-				      struct ptlrpc_request *);
+				      struct lookup_intent *);
 	int (*m_clear_open_replay_data)(struct obd_export *,
 					struct obd_client_handle *);
 	int (*m_set_lock_data)(struct obd_export *, __u64 *, void *, __u64 *);
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index 1c2ba19..0a18820 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -2001,11 +2001,11 @@ static inline int md_getxattr(struct obd_export *exp,
 
 static inline int md_set_open_replay_data(struct obd_export *exp,
 					  struct obd_client_handle *och,
-					  struct ptlrpc_request *open_req)
+					  struct lookup_intent *it)
 {
 	EXP_CHECK_MD_OP(exp, set_open_replay_data);
 	EXP_MD_COUNTER_INCREMENT(exp, set_open_replay_data);
-	return MDP(exp->exp_obd, set_open_replay_data)(exp, och, open_req);
+	return MDP(exp->exp_obd, set_open_replay_data)(exp, och, it);
 }
 
 static inline int md_clear_open_replay_data(struct obd_export *exp,
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 362f5ec..7ceec74 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -480,7 +480,7 @@ static int ll_och_fill(struct obd_export *md_exp, struct lookup_intent *it,
 	och->och_magic = OBD_CLIENT_HANDLE_MAGIC;
 	och->och_flags = it->it_flags;
 
-	return md_set_open_replay_data(md_exp, och, req);
+	return md_set_open_replay_data(md_exp, och, it);
 }
 
 int ll_local_open(struct file *file, struct lookup_intent *it,
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 85c01e1..7427f69 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -208,7 +208,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt,
 				  OBD_CONNECT_LAYOUTLOCK |
 				  OBD_CONNECT_PINGLESS |
 				  OBD_CONNECT_MAX_EASIZE |
-				  OBD_CONNECT_FLOCK_DEAD;
+				  OBD_CONNECT_FLOCK_DEAD |
+				  OBD_CONNECT_DISP_STRIPE;
 
 	if (sbi->ll_flags & LL_SBI_SOM_PREVIEW)
 		data->ocd_connect_flags |= OBD_CONNECT_SOM;
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 1bddd8f..40fbd44 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -2593,7 +2593,7 @@ int lmv_free_lustre_md(struct obd_export *exp, struct lustre_md *md)
 
 int lmv_set_open_replay_data(struct obd_export *exp,
 			     struct obd_client_handle *och,
-			     struct ptlrpc_request *open_req)
+			     struct lookup_intent *it)
 {
 	struct obd_device       *obd = exp->exp_obd;
 	struct lmv_obd	  *lmv = &obd->u.lmv;
@@ -2603,7 +2603,7 @@ int lmv_set_open_replay_data(struct obd_export *exp,
 	if (IS_ERR(tgt))
 		return PTR_ERR(tgt);
 
-	return md_set_open_replay_data(tgt->ltd_exp, och, open_req);
+	return md_set_open_replay_data(tgt->ltd_exp, och, it);
 }
 
 int lmv_clear_open_replay_data(struct obd_export *exp,
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_internal.h b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
index fc21777..c78bf00 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_internal.h
+++ b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
@@ -122,7 +122,7 @@ int mdc_free_lustre_md(struct obd_export *exp, struct lustre_md *md);
 
 int mdc_set_open_replay_data(struct obd_export *exp,
 			     struct obd_client_handle *och,
-			     struct ptlrpc_request *open_req);
+			     struct lookup_intent *it);
 
 int mdc_clear_open_replay_data(struct obd_export *exp,
 			       struct obd_client_handle *och);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 6110943..20706e7 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -641,7 +641,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 			 * happens immediately after swabbing below, new reply
 			 * is swabbed by that handler correctly.
 			 */
-			mdc_set_open_replay_data(NULL, NULL, req);
+			mdc_set_open_replay_data(NULL, NULL, it);
 		}
 
 		if ((body->valid & (OBD_MD_FLDIREA | OBD_MD_FLEASIZE)) != 0) {
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_reint.c b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
index 1aea154..d79aa16 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_reint.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
@@ -165,6 +165,7 @@ int mdc_setattr(struct obd_export *exp, struct md_op_data *op_data,
 			req->rq_cb_data = *mod;
 			(*mod)->mod_open_req = req;
 			req->rq_commit_cb = mdc_commit_open;
+			(*mod)->mod_is_create = true;
 			/**
 			 * Take an extra reference on \var mod, it protects \var
 			 * mod from being freed on eviction (commit callback is
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 17c8e14..d9ddb39 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -722,11 +722,12 @@ void mdc_commit_open(struct ptlrpc_request *req)
 
 int mdc_set_open_replay_data(struct obd_export *exp,
 			     struct obd_client_handle *och,
-			     struct ptlrpc_request *open_req)
+			     struct lookup_intent *it)
 {
 	struct md_open_data   *mod;
 	struct mdt_rec_create *rec;
 	struct mdt_body       *body;
+	struct ptlrpc_request *open_req = it->d.lustre.it_data;
 	struct obd_import     *imp = open_req->rq_import;
 
 	if (!open_req->rq_replay)
@@ -760,6 +761,8 @@ int mdc_set_open_replay_data(struct obd_export *exp,
 		spin_lock(&open_req->rq_lock);
 		och->och_mod = mod;
 		mod->mod_och = och;
+		mod->mod_is_create = it_disposition(it, DISP_OPEN_CREATE) ||
+				     it_disposition(it, DISP_OPEN_STRIPE);
 		mod->mod_open_req = open_req;
 		open_req->rq_cb_data = mod;
 		open_req->rq_commit_cb = mdc_commit_open;
@@ -780,6 +783,23 @@ int mdc_set_open_replay_data(struct obd_export *exp,
 	return 0;
 }
 
+static void mdc_free_open(struct md_open_data *mod)
+{
+	int committed = 0;
+
+	if (mod->mod_is_create == 0 &&
+	    imp_connect_disp_stripe(mod->mod_open_req->rq_import))
+		committed = 1;
+
+	LASSERT(mod->mod_open_req->rq_replay == 0);
+
+	DEBUG_REQ(D_RPCTRACE, mod->mod_open_req, "free open request\n");
+
+	ptlrpc_request_committed(mod->mod_open_req, committed);
+	if (mod->mod_close_req)
+		ptlrpc_request_committed(mod->mod_close_req, committed);
+}
+
 int mdc_clear_open_replay_data(struct obd_export *exp,
 			       struct obd_client_handle *och)
 {
@@ -793,6 +813,8 @@ int mdc_clear_open_replay_data(struct obd_export *exp,
 		return 0;
 
 	LASSERT(mod != LP_POISON);
+	LASSERT(mod->mod_open_req != NULL);
+	mdc_free_open(mod);
 
 	mod->mod_och = NULL;
 	och->och_mod = NULL;
@@ -991,6 +1013,9 @@ int mdc_done_writing(struct obd_export *exp, struct md_op_data *op_data,
 	if (mod) {
 		if (rc != 0)
 			mod->mod_close_req = NULL;
+		LASSERT(mod->mod_open_req != NULL);
+		mdc_free_open(mod);
+
 		/* Since now, mod is accessed through setattr req only,
 		 * thus DW req does not keep a reference on mod anymore. */
 		obd_mod_put(mod);
diff --git a/drivers/staging/lustre/lustre/obdclass/genops.c b/drivers/staging/lustre/lustre/obdclass/genops.c
index d27f041..169c9ed 100644
--- a/drivers/staging/lustre/lustre/obdclass/genops.c
+++ b/drivers/staging/lustre/lustre/obdclass/genops.c
@@ -1010,6 +1010,8 @@ struct obd_import *class_new_import(struct obd_device *obd)
 	INIT_LIST_HEAD(&imp->imp_replay_list);
 	INIT_LIST_HEAD(&imp->imp_sending_list);
 	INIT_LIST_HEAD(&imp->imp_delayed_list);
+	INIT_LIST_HEAD(&imp->imp_committed_list);
+	imp->imp_replay_cursor = &imp->imp_committed_list;
 	spin_lock_init(&imp->imp_lock);
 	imp->imp_last_success_conn = 0;
 	imp->imp_state = LUSTRE_IMP_NEW;
diff --git a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
index 6e7d2e5..1432dd7 100644
--- a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
+++ b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
@@ -99,6 +99,7 @@ static const char * const obd_connect_names[] = {
 	"short_io",
 	"pingless",
 	"flock_deadlock",
+	"disp_stripe",
 	"unknown",
 	NULL
 };
diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index eb33bb7..a32b722 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -2360,6 +2360,39 @@ int ptlrpc_unregister_reply(struct ptlrpc_request *request, int async)
 }
 EXPORT_SYMBOL(ptlrpc_unregister_reply);
 
+static void ptlrpc_free_request(struct ptlrpc_request *req)
+{
+	spin_lock(&req->rq_lock);
+	req->rq_replay = 0;
+	spin_unlock(&req->rq_lock);
+
+	if (req->rq_commit_cb != NULL)
+		req->rq_commit_cb(req);
+	list_del_init(&req->rq_replay_list);
+
+	__ptlrpc_req_finished(req, 1);
+}
+
+/**
+ * the request is committed and dropped from the replay list of its import
+ */
+void ptlrpc_request_committed(struct ptlrpc_request *req, int force)
+{
+	struct obd_import	*imp = req->rq_import;
+
+	spin_lock(&imp->imp_lock);
+	if (list_empty(&req->rq_replay_list)) {
+		spin_unlock(&imp->imp_lock);
+		return;
+	}
+
+	if (force || req->rq_transno <= imp->imp_peer_committed_transno)
+		ptlrpc_free_request(req);
+
+	spin_unlock(&imp->imp_lock);
+}
+EXPORT_SYMBOL(ptlrpc_request_committed);
+
 /**
  * Iterates through replay_list on import and prunes
  * all requests have transno smaller than last_committed for the
@@ -2370,9 +2403,9 @@ EXPORT_SYMBOL(ptlrpc_unregister_reply);
  */
 void ptlrpc_free_committed(struct obd_import *imp)
 {
-	struct list_head *tmp, *saved;
-	struct ptlrpc_request *req;
+	struct ptlrpc_request *req, *saved;
 	struct ptlrpc_request *last_req = NULL; /* temporary fire escape */
+	bool		       skip_committed_list = true;
 
 	LASSERT(imp != NULL);
 
@@ -2388,13 +2421,15 @@ void ptlrpc_free_committed(struct obd_import *imp)
 	CDEBUG(D_RPCTRACE, "%s: committing for last_committed "LPU64" gen %d\n",
 	       imp->imp_obd->obd_name, imp->imp_peer_committed_transno,
 	       imp->imp_generation);
+
+	if (imp->imp_generation != imp->imp_last_generation_checked)
+		skip_committed_list = false;
+
 	imp->imp_last_transno_checked = imp->imp_peer_committed_transno;
 	imp->imp_last_generation_checked = imp->imp_generation;
 
-	list_for_each_safe(tmp, saved, &imp->imp_replay_list) {
-		req = list_entry(tmp, struct ptlrpc_request,
-				     rq_replay_list);
-
+	list_for_each_entry_safe(req, saved, &imp->imp_replay_list,
+				 rq_replay_list) {
 		/* XXX ok to remove when 1357 resolved - rread 05/29/03  */
 		LASSERT(req != last_req);
 		last_req = req;
@@ -2408,27 +2443,34 @@ void ptlrpc_free_committed(struct obd_import *imp)
 			GOTO(free_req, 0);
 		}
 
-		if (req->rq_replay) {
-			DEBUG_REQ(D_RPCTRACE, req, "keeping (FL_REPLAY)");
-			continue;
-		}
-
 		/* not yet committed */
 		if (req->rq_transno > imp->imp_peer_committed_transno) {
 			DEBUG_REQ(D_RPCTRACE, req, "stopping search");
 			break;
 		}
 
+		if (req->rq_replay) {
+			DEBUG_REQ(D_RPCTRACE, req, "keeping (FL_REPLAY)");
+			list_move_tail(&req->rq_replay_list,
+				       &imp->imp_committed_list);
+			continue;
+		}
+
 		DEBUG_REQ(D_INFO, req, "commit (last_committed "LPU64")",
 			  imp->imp_peer_committed_transno);
 free_req:
-		spin_lock(&req->rq_lock);
-		req->rq_replay = 0;
-		spin_unlock(&req->rq_lock);
-		if (req->rq_commit_cb != NULL)
-			req->rq_commit_cb(req);
-		list_del_init(&req->rq_replay_list);
-		__ptlrpc_req_finished(req, 1);
+		ptlrpc_free_request(req);
+	}
+	if (skip_committed_list)
+		return;
+
+	list_for_each_entry_safe(req, saved, &imp->imp_committed_list,
+				 rq_replay_list) {
+		LASSERT(req->rq_transno != 0);
+		if (req->rq_import_generation < imp->imp_generation) {
+			DEBUG_REQ(D_RPCTRACE, req, "free stale open request");
+			ptlrpc_free_request(req);
+		}
 	}
 }
 
diff --git a/drivers/staging/lustre/lustre/ptlrpc/import.c b/drivers/staging/lustre/lustre/ptlrpc/import.c
index 82db0ed..537aa62 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/import.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/import.c
@@ -560,17 +560,30 @@ static int ptlrpc_first_transno(struct obd_import *imp, __u64 *transno)
 	struct ptlrpc_request *req;
 	struct list_head *tmp;
 
-	if (list_empty(&imp->imp_replay_list))
-		return 0;
-	tmp = imp->imp_replay_list.next;
-	req = list_entry(tmp, struct ptlrpc_request, rq_replay_list);
-	*transno = req->rq_transno;
-	if (req->rq_transno == 0) {
-		DEBUG_REQ(D_ERROR, req, "zero transno in replay");
-		LBUG();
+	/* The requests in committed_list always have smaller transnos than
+	 * the requests in replay_list */
+	if (!list_empty(&imp->imp_committed_list)) {
+		tmp = imp->imp_committed_list.next;
+		req = list_entry(tmp, struct ptlrpc_request, rq_replay_list);
+		*transno = req->rq_transno;
+		if (req->rq_transno == 0) {
+			DEBUG_REQ(D_ERROR, req,
+				  "zero transno in committed_list");
+			LBUG();
+		}
+		return 1;
 	}
-
-	return 1;
+	if (!list_empty(&imp->imp_replay_list)) {
+		tmp = imp->imp_replay_list.next;
+		req = list_entry(tmp, struct ptlrpc_request, rq_replay_list);
+		*transno = req->rq_transno;
+		if (req->rq_transno == 0) {
+			DEBUG_REQ(D_ERROR, req, "zero transno in replay_list");
+			LBUG();
+		}
+		return 1;
+	}
+	return 0;
 }
 
 /**
diff --git a/drivers/staging/lustre/lustre/ptlrpc/recover.c b/drivers/staging/lustre/lustre/ptlrpc/recover.c
index 84c39e0..48ae328 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/recover.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/recover.c
@@ -105,24 +105,59 @@ int ptlrpc_replay_next(struct obd_import *imp, int *inflight)
 	 * imp_lock is being held by ptlrpc_replay, but it's not. it's
 	 * just a little race...
 	 */
-	list_for_each_safe(tmp, pos, &imp->imp_replay_list) {
+
+	/* Replay all the committed open requests on committed_list first */
+	if (!list_empty(&imp->imp_committed_list)) {
+		tmp = imp->imp_committed_list.prev;
 		req = list_entry(tmp, struct ptlrpc_request,
 				     rq_replay_list);
 
-		/* If need to resend the last sent transno (because a
-		   reconnect has occurred), then stop on the matching
-		   req and send it again. If, however, the last sent
-		   transno has been committed then we continue replay
-		   from the next request. */
+		/* The last request on committed_list hasn't been replayed */
 		if (req->rq_transno > last_transno) {
-			if (imp->imp_resend_replay)
-				lustre_msg_add_flags(req->rq_reqmsg,
-						     MSG_RESENT);
-			break;
+			/* Since the imp_committed_list is immutable until all
+			 * of its requests have been replayed, it's safe to
+			 * use a cursor to accelerate the search */
+			imp->imp_replay_cursor = imp->imp_replay_cursor->next;
+
+			while (imp->imp_replay_cursor !=
+			       &imp->imp_committed_list) {
+				req = list_entry(imp->imp_replay_cursor,
+						 struct ptlrpc_request,
+						 rq_replay_list);
+				if (req->rq_transno > last_transno)
+					break;
+
+				req = NULL;
+				imp->imp_replay_cursor =
+					imp->imp_replay_cursor->next;
+			}
+		} else {
+			/* All requests on committed_list have been replayed */
+			imp->imp_replay_cursor = &imp->imp_committed_list;
+			req = NULL;
+		}
+	}
+
+	/* All the requests in committed list have been replayed, let's replay
+	 * the imp_replay_list */
+	if (req == NULL) {
+		list_for_each_safe(tmp, pos, &imp->imp_replay_list) {
+			req = list_entry(tmp, struct ptlrpc_request,
+					 rq_replay_list);
+
+			if (req->rq_transno > last_transno)
+				break;
+			req = NULL;
 		}
-		req = NULL;
 	}
 
+	/* If need to resend the last sent transno (because a reconnect
+	 * has occurred), then stop on the matching req and send it again.
+	 * If, however, the last sent transno has been committed then we
+	 * continue replay from the next request. */
+	if (req != NULL && imp->imp_resend_replay)
+		lustre_msg_add_flags(req->rq_reqmsg, MSG_RESENT);
+
 	spin_lock(&imp->imp_lock);
 	imp->imp_resend_replay = 0;
 	spin_unlock(&imp->imp_lock);
-- 
1.8.5.3



* [PATCH 09/17] lustre/llite: simplify dentry revalidate
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
                   ` (7 preceding siblings ...)
  2014-03-01  2:16 ` [PATCH 08/17] lustre/recovery: free open/close request promptly Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 10/17] lustre/ldlm: set l_lvb_type coherent when layout is returned Oleg Drokin
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Lai Siyao, Oleg Drokin

From: Lai Siyao <lai.siyao@intel.com>

Lustre client dentry validity is protected by an LDLM lock, so
whenever a dentry is found it is valid and there is no need to
revalidate it from the MDS. Even if it were revalidated, there is
a race: the dentry may be invalidated after revalidation finishes.
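
As a rough illustration of the simplified check, the sketch below mirrors
only the first two tests of the new ll_revalidate_dentry() in the diff
that follows. The SK_* flag values are illustrative assumptions, not the
kernel's LOOKUP_* definitions, and SK_UNDECIDED is a hypothetical marker
for the cases this sketch does not model.

```c
/* Illustrative flag values only; the real LOOKUP_* constants are defined
 * by the kernel and may differ. */
#define SK_LOOKUP_PARENT 0x0010u
#define SK_LOOKUP_OPEN   0x0100u
#define SK_LOOKUP_CREATE 0x0200u

/* Hypothetical marker: the decision is not covered by this sketch. */
#define SK_UNDECIDED (-1)

static int sketch_revalidate(unsigned int lookup_flags)
{
	/* Open with create: return 0 so that lookup talks to the MDS,
	 * which can create the file if necessary. */
	if ((lookup_flags & (SK_LOOKUP_OPEN | SK_LOOKUP_CREATE)) ==
	    (SK_LOOKUP_OPEN | SK_LOOKUP_CREATE))
		return 0;

	/* Any single one of these flags: the cached dentry is trusted. */
	if (lookup_flags &
	    (SK_LOOKUP_PARENT | SK_LOOKUP_OPEN | SK_LOOKUP_CREATE))
		return 1;

	/* Remaining cases are intentionally left out of this sketch. */
	return SK_UNDECIDED;
}
```

No RPC appears anywhere in the decision, which is the simplification the
commit message describes.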

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-on: http://review.whamcloud.com/7475
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3544
Reviewed-by: Peng Tao <bergwolf@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |   2 +-
 drivers/staging/lustre/lustre/llite/dcache.c       | 290 ++-------------------
 drivers/staging/lustre/lustre/llite/file.c         |   8 +-
 .../staging/lustre/lustre/llite/llite_internal.h   |   4 +-
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |   1 -
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |   1 -
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |  52 ++--
 7 files changed, 45 insertions(+), 313 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index a55eebf..5f5b0ba 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2112,7 +2112,7 @@ extern void lustre_swab_generic_32s (__u32 *val);
 #define DISP_LOOKUP_POS      0x00000008
 #define DISP_OPEN_CREATE     0x00000010
 #define DISP_OPEN_OPEN       0x00000020
-#define DISP_ENQ_COMPLETE    0x00400000
+#define DISP_ENQ_COMPLETE    0x00400000		/* obsolete and unused */
 #define DISP_ENQ_OPEN_REF    0x00800000
 #define DISP_ENQ_CREATE_REF  0x01000000
 #define DISP_OPEN_LOCK       0x02000000
diff --git a/drivers/staging/lustre/lustre/llite/dcache.c b/drivers/staging/lustre/lustre/llite/dcache.c
index 3907c87..f971a54 100644
--- a/drivers/staging/lustre/lustre/llite/dcache.c
+++ b/drivers/staging/lustre/lustre/llite/dcache.c
@@ -241,9 +241,6 @@ void ll_intent_release(struct lookup_intent *it)
 		 ptlrpc_req_finished(it->d.lustre.it_data); /* ll_file_open */
 	if (it_disposition(it, DISP_ENQ_CREATE_REF)) /* create rec */
 		ptlrpc_req_finished(it->d.lustre.it_data);
-	if (it_disposition(it, DISP_ENQ_COMPLETE)) /* saved req from revalidate
-						    * to lookup */
-		ptlrpc_req_finished(it->d.lustre.it_data);
 
 	it->d.lustre.it_disposition = 0;
 	it->d.lustre.it_data = NULL;
@@ -328,262 +325,32 @@ void ll_frob_intent(struct lookup_intent **itp, struct lookup_intent *deft)
 
 }
 
-int ll_revalidate_it(struct dentry *de, int lookup_flags,
-		     struct lookup_intent *it)
+static int ll_revalidate_dentry(struct dentry *dentry,
+				unsigned int lookup_flags)
 {
-	struct md_op_data *op_data;
-	struct ptlrpc_request *req = NULL;
-	struct lookup_intent lookup_it = { .it_op = IT_LOOKUP };
-	struct obd_export *exp;
-	struct inode *parent = de->d_parent->d_inode;
-	int rc;
-
-	CDEBUG(D_VFSTRACE, "VFS Op:name=%s,intent=%s\n", de->d_name.name,
-	       LL_IT2STR(it));
-
-	LASSERT(de != de->d_sb->s_root);
-
-	if (de->d_inode == NULL) {
-		__u64 ibits;
-
-		/* We can only use negative dentries if this is stat or lookup,
-		   for opens and stuff we do need to query server. */
-		/* If there is IT_CREAT in intent op set, then we must throw
-		   away this negative dentry and actually do the request to
-		   kernel to create whatever needs to be created (if possible)*/
-		if (it && (it->it_op & IT_CREAT))
-			return 0;
+	struct inode *dir = dentry->d_parent->d_inode;
 
-		if (d_lustre_invalid(de))
-			return 0;
-
-		ibits = MDS_INODELOCK_UPDATE;
-		rc = ll_have_md_lock(parent, &ibits, LCK_MINMODE);
-		GOTO(out_sa, rc);
-	}
-
-	/* Never execute intents for mount points.
-	 * Attributes will be fixed up in ll_inode_revalidate_it */
-	if (d_mountpoint(de))
-		GOTO(out_sa, rc = 1);
-
-	exp = ll_i2mdexp(de->d_inode);
-
-	OBD_FAIL_TIMEOUT(OBD_FAIL_MDC_REVALIDATE_PAUSE, 5);
-	ll_frob_intent(&it, &lookup_it);
-	LASSERT(it);
+	/*
+	 * if open&create is set, talk to MDS to make sure file is created if
+	 * necessary, because we can't do this in ->open() later since that's
+	 * called on an inode. return 0 here to let lookup to handle this.
+	 */
+	if ((lookup_flags & (LOOKUP_OPEN | LOOKUP_CREATE)) ==
+	    (LOOKUP_OPEN | LOOKUP_CREATE))
+		return 0;
 
-	if (it->it_op == IT_LOOKUP && !d_lustre_invalid(de))
+	if (lookup_flags & (LOOKUP_PARENT | LOOKUP_OPEN | LOOKUP_CREATE))
 		return 1;
 
-	if (it->it_op == IT_OPEN) {
-		struct inode *inode = de->d_inode;
-		struct ll_inode_info *lli = ll_i2info(inode);
-		struct obd_client_handle **och_p;
-		__u64 ibits;
-
-		/*
-		 * We used to check for MDS_INODELOCK_OPEN here, but in fact
-		 * just having LOOKUP lock is enough to justify inode is the
-		 * same. And if inode is the same and we have suitable
-		 * openhandle, then there is no point in doing another OPEN RPC
-		 * just to throw away newly received openhandle.  There are no
-		 * security implications too, if file owner or access mode is
-		 * change, LOOKUP lock is revoked.
-		 */
-
-
-		if (it->it_flags & FMODE_WRITE)
-			och_p = &lli->lli_mds_write_och;
-		else if (it->it_flags & FMODE_EXEC)
-			och_p = &lli->lli_mds_exec_och;
-		else
-			och_p = &lli->lli_mds_read_och;
-
-		/* Check for the proper lock. */
-		ibits = MDS_INODELOCK_LOOKUP;
-		if (!ll_have_md_lock(inode, &ibits, LCK_MINMODE))
-			goto do_lock;
-		mutex_lock(&lli->lli_och_mutex);
-		if (*och_p) { /* Everything is open already, do nothing */
-			/* Originally it was idea to do not let them steal our
-			 * open handle from under us by (*och_usecount)++ here.
-			 * But in case we have the handle, but we cannot use it
-			 * due to later checks (e.g. O_CREAT|O_EXCL flags set),
-			 * nobody would decrement counter increased here. So we
-			 * just hope the lock won't be invalidated in between.
-			 * But if it would be, we'll reopen the open request to
-			 * MDS later during file open path.
-			 */
-			mutex_unlock(&lli->lli_och_mutex);
-			return 1;
-		}
-		mutex_unlock(&lli->lli_och_mutex);
-	}
-
-	if (it->it_op == IT_GETATTR) {
-		rc = ll_statahead_enter(parent, &de, 0);
-		if (rc == 1)
-			goto mark;
-		else if (rc != -EAGAIN && rc != 0)
-			GOTO(out, rc = 0);
-	}
-
-do_lock:
-	op_data = ll_prep_md_op_data(NULL, parent, de->d_inode,
-				     de->d_name.name, de->d_name.len,
-				     0, LUSTRE_OPC_ANY, NULL);
-	if (IS_ERR(op_data))
-		return PTR_ERR(op_data);
-
-	if (!IS_POSIXACL(parent) || !exp_connect_umask(exp))
-		it->it_create_mode &= ~current_umask();
-	it->it_create_mode |= M_CHECK_STALE;
-	rc = md_intent_lock(exp, op_data, NULL, 0, it,
-			    lookup_flags,
-			    &req, ll_md_blocking_ast, 0);
-	it->it_create_mode &= ~M_CHECK_STALE;
-	ll_finish_md_op_data(op_data);
-
-	/* If req is NULL, then md_intent_lock only tried to do a lock match;
-	 * if all was well, it will return 1 if it found locks, 0 otherwise. */
-	if (req == NULL && rc >= 0) {
-		if (!rc)
-			goto do_lookup;
-		GOTO(out, rc);
-	}
-
-	if (rc < 0) {
-		if (rc != -ESTALE) {
-			CDEBUG(D_INFO, "ll_intent_lock: rc %d : it->it_status "
-			       "%d\n", rc, it->d.lustre.it_status);
-		}
-		GOTO(out, rc = 0);
-	}
-
-revalidate_finish:
-	rc = ll_revalidate_it_finish(req, it, de);
-	if (rc != 0) {
-		if (rc != -ESTALE && rc != -ENOENT)
-			ll_intent_release(it);
-		GOTO(out, rc = 0);
-	}
-
-	if ((it->it_op & IT_OPEN) && de->d_inode &&
-	    !S_ISREG(de->d_inode->i_mode) &&
-	    !S_ISDIR(de->d_inode->i_mode)) {
-		ll_release_openhandle(de, it);
-	}
-	rc = 1;
-
-out:
-	/* We do not free request as it may be reused during following lookup
-	 * (see comment in mdc/mdc_locks.c::mdc_intent_lock()), request will
-	 * be freed in ll_lookup_it or in ll_intent_release. But if
-	 * request was not completed, we need to free it. (bug 5154, 9903) */
-	if (req != NULL && !it_disposition(it, DISP_ENQ_COMPLETE))
-		ptlrpc_req_finished(req);
-	if (rc == 0) {
-		/* mdt may grant layout lock for the newly created file, so
-		 * release the lock to avoid leaking */
-		ll_intent_drop_lock(it);
-		ll_invalidate_aliases(de->d_inode);
-	} else {
-		__u64 bits = 0;
-		__u64 matched_bits = 0;
-
-		CDEBUG(D_DENTRY, "revalidated dentry %.*s (%p) parent %p "
-		       "inode %p refc %d\n", de->d_name.len,
-		       de->d_name.name, de, de->d_parent, de->d_inode,
-		       d_count(de));
-
-		ll_set_lock_data(exp, de->d_inode, it, &bits);
-
-		/* Note: We have to match both LOOKUP and PERM lock
-		 * here to make sure the dentry is valid and no one
-		 * changing the permission.
-		 * But if the client connects < 2.4 server, which will
-		 * only grant LOOKUP lock, so we can only Match LOOKUP
-		 * lock for old server */
-		if (exp_connect_flags(ll_i2mdexp(de->d_inode)) &&
-							OBD_CONNECT_LVB_TYPE)
-			matched_bits =
-				MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM;
-		else
-			matched_bits = MDS_INODELOCK_LOOKUP;
-
-		if (((bits & matched_bits) == matched_bits) &&
-		    d_lustre_invalid(de))
-			d_lustre_revalidate(de);
-		ll_lookup_finish_locks(it, de);
-	}
-
-mark:
-	if (it != NULL && it->it_op == IT_GETATTR && rc > 0)
-		ll_statahead_mark(parent, de);
-	return rc;
+	if (d_need_statahead(dir, dentry) <= 0)
+		return 1;
 
-	/*
-	 * This part is here to combat evil-evil race in real_lookup on 2.6
-	 * kernels.  The race details are: We enter do_lookup() looking for some
-	 * name, there is nothing in dcache for this name yet and d_lookup()
-	 * returns NULL.  We proceed to real_lookup(), and while we do this,
-	 * another process does open on the same file we looking up (most simple
-	 * reproducer), open succeeds and the dentry is added. Now back to
-	 * us. In real_lookup() we do d_lookup() again and suddenly find the
-	 * dentry, so we call d_revalidate on it, but there is no lock, so
-	 * without this code we would return 0, but unpatched real_lookup just
-	 * returns -ENOENT in such a case instead of retrying the lookup. Once
-	 * this is dealt with in real_lookup(), all of this ugly mess can go and
-	 * we can just check locks in ->d_revalidate without doing any RPCs
-	 * ever.
-	 */
-do_lookup:
-	if (it != &lookup_it) {
-		/* MDS_INODELOCK_UPDATE needed for IT_GETATTR case. */
-		if (it->it_op == IT_GETATTR)
-			lookup_it.it_op = IT_GETATTR;
-		ll_lookup_finish_locks(it, de);
-		it = &lookup_it;
-	}
+	if (lookup_flags & LOOKUP_RCU)
+		return -ECHILD;
 
-	/* Do real lookup here. */
-	op_data = ll_prep_md_op_data(NULL, parent, NULL, de->d_name.name,
-				     de->d_name.len, 0, (it->it_op & IT_CREAT ?
-							 LUSTRE_OPC_CREATE :
-							 LUSTRE_OPC_ANY), NULL);
-	if (IS_ERR(op_data))
-		return PTR_ERR(op_data);
-
-	rc = md_intent_lock(exp, op_data, NULL, 0,  it, 0, &req,
-			    ll_md_blocking_ast, 0);
-	if (rc >= 0) {
-		struct mdt_body *mdt_body;
-		struct lu_fid fid = {.f_seq = 0, .f_oid = 0, .f_ver = 0};
-		mdt_body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
-
-		if (de->d_inode)
-			fid = *ll_inode2fid(de->d_inode);
-
-		/* see if we got same inode, if not - return error */
-		if (lu_fid_eq(&fid, &mdt_body->fid1)) {
-			ll_finish_md_op_data(op_data);
-			op_data = NULL;
-			goto revalidate_finish;
-		}
-		ll_intent_release(it);
-	}
-	ll_finish_md_op_data(op_data);
-	GOTO(out, rc = 0);
-
-out_sa:
-	/*
-	 * For rc == 1 case, should not return directly to prevent losing
-	 * statahead windows; for rc == 0 case, the "lookup" will be done later.
-	 */
-	if (it != NULL && it->it_op == IT_GETATTR && rc == 1)
-		ll_statahead_enter(parent, &de, 1);
-	goto mark;
+	do_statahead_enter(dir, &dentry, dentry->d_inode == NULL);
+	ll_statahead_mark(dir, dentry);
+	return 1;
 }
 
 /*
@@ -591,24 +358,13 @@ out_sa:
  */
 int ll_revalidate_nd(struct dentry *dentry, unsigned int flags)
 {
-	struct inode *parent = dentry->d_parent->d_inode;
-	int unplug = 0;
+	int rc;
 
-	CDEBUG(D_VFSTRACE, "VFS Op:name=%s,flags=%u\n",
+	CDEBUG(D_VFSTRACE, "VFS Op:name=%s, flags=%u\n",
 	       dentry->d_name.name, flags);
 
-	if (!(flags & (LOOKUP_PARENT|LOOKUP_OPEN|LOOKUP_CREATE)) &&
-	    ll_need_statahead(parent, dentry) > 0) {
-		if (flags & LOOKUP_RCU)
-			return -ECHILD;
-
-		if (dentry->d_inode == NULL)
-			unplug = 1;
-		do_statahead_enter(parent, &dentry, unplug);
-		ll_statahead_mark(parent, dentry);
-	}
-
-	return 1;
+	rc = ll_revalidate_dentry(dentry, flags);
+	return rc;
 }
 
 
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 7ceec74..70b48ab 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -446,8 +446,7 @@ static int ll_intent_file_open(struct file *file, void *lmm,
 				 itp, NULL);
 
 out:
-	ptlrpc_req_finished(itp->d.lustre.it_data);
-	it_clear_disposition(itp, DISP_ENQ_COMPLETE);
+	ptlrpc_req_finished(req);
 	ll_intent_drop_lock(itp);
 
 	return rc;
@@ -815,10 +814,7 @@ struct obd_client_handle *ll_lease_open(struct inode *inode, struct file *file,
 	 * doesn't deal with openhandle, so normal openhandle will be leaked. */
 				LDLM_FL_NO_LRU | LDLM_FL_EXCL);
 	ll_finish_md_op_data(op_data);
-	if (req != NULL) {
-		ptlrpc_req_finished(req);
-		it_clear_disposition(&it, DISP_ENQ_COMPLETE);
-	}
+	ptlrpc_req_finished(req);
 	if (rc < 0)
 		GOTO(out_release_it, rc);
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 47c5142..f67c508 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -1309,7 +1309,7 @@ ll_statahead_mark(struct inode *dir, struct dentry *dentry)
 }
 
 static inline int
-ll_need_statahead(struct inode *dir, struct dentry *dentryp)
+d_need_statahead(struct inode *dir, struct dentry *dentryp)
 {
 	struct ll_inode_info  *lli;
 	struct ll_dentry_data *ldd;
@@ -1354,7 +1354,7 @@ ll_statahead_enter(struct inode *dir, struct dentry **dentryp, int only_unplug)
 {
 	int ret;
 
-	ret = ll_need_statahead(dir, *dentryp);
+	ret = d_need_statahead(dir, *dentryp);
 	if (ret <= 0)
 		return ret;
 
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index 56dedce..9ba5a0a 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -119,7 +119,6 @@ static int lmv_intent_remote(struct obd_export *exp, void *lmm,
 	CDEBUG(D_INODE, "REMOTE_INTENT with fid="DFID" -> mds #%d\n",
 	       PFID(&body->fid1), tgt->ltd_idx);
 
-	it->d.lustre.it_disposition &= ~DISP_ENQ_COMPLETE;
 	rc = md_intent_lock(tgt->ltd_exp, op_data, lmm, lmmsize, it,
 			    flags, &req, cb_blocking, extra_lock_flags);
 	if (rc)
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 40fbd44..3ba0a0a 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -1744,7 +1744,6 @@ lmv_enqueue_remote(struct obd_export *exp, struct ldlm_enqueue_info *einfo,
 	it->d.lustre.it_data = NULL;
 	fid1 = body->fid1;
 
-	it->d.lustre.it_disposition &= ~DISP_ENQ_COMPLETE;
 	ptlrpc_req_finished(req);
 
 	tgt = lmv_find_target(lmv, &fid1);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 20706e7..81adc2b 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -968,7 +968,6 @@ static int mdc_finish_intent_lock(struct obd_export *exp,
 	if (fid_is_sane(&op_data->op_fid2) &&
 	    it->it_create_mode & M_CHECK_STALE &&
 	    it->it_op != IT_GETATTR) {
-		it_set_disposition(it, DISP_ENQ_COMPLETE);
 
 		/* Also: did we find the same inode? */
 		/* sever can return one of two fids:
@@ -1139,6 +1138,12 @@ int mdc_intent_lock(struct obd_export *exp, struct md_op_data *op_data,
 		    ldlm_blocking_callback cb_blocking,
 		    __u64 extra_lock_flags)
 {
+	struct ldlm_enqueue_info einfo = {
+		.ei_type	= LDLM_IBITS,
+		.ei_mode	= it_to_lock_mode(it),
+		.ei_cb_bl	= cb_blocking,
+		.ei_cb_cp	= ldlm_completion_ast,
+	};
 	struct lustre_handle lockh;
 	int rc = 0;
 
@@ -1164,42 +1169,19 @@ int mdc_intent_lock(struct obd_export *exp, struct md_op_data *op_data,
 			return rc;
 	}
 
-	/* lookup_it may be called only after revalidate_it has run, because
-	 * revalidate_it cannot return errors, only zero.  Returning zero causes
-	 * this call to lookup, which *can* return an error.
-	 *
-	 * We only want to execute the request associated with the intent one
-	 * time, however, so don't send the request again.  Instead, skip past
-	 * this and use the request from revalidate.  In this case, revalidate
-	 * never dropped its reference, so the refcounts are all OK */
-	if (!it_disposition(it, DISP_ENQ_COMPLETE)) {
-		struct ldlm_enqueue_info einfo = {
-			.ei_type	= LDLM_IBITS,
-			.ei_mode	= it_to_lock_mode(it),
-			.ei_cb_bl	= cb_blocking,
-			.ei_cb_cp	= ldlm_completion_ast,
-		};
-
-		/* For case if upper layer did not alloc fid, do it now. */
-		if (!fid_is_sane(&op_data->op_fid2) && it->it_op & IT_CREAT) {
-			rc = mdc_fid_alloc(exp, &op_data->op_fid2, op_data);
-			if (rc < 0) {
-				CERROR("Can't alloc new fid, rc %d\n", rc);
-				return rc;
-			}
-		}
-		rc = mdc_enqueue(exp, &einfo, it, op_data, &lockh,
-				 lmm, lmmsize, NULL, extra_lock_flags);
-		if (rc < 0)
+	/* For case if upper layer did not alloc fid, do it now. */
+	if (!fid_is_sane(&op_data->op_fid2) && it->it_op & IT_CREAT) {
+		rc = mdc_fid_alloc(exp, &op_data->op_fid2, op_data);
+		if (rc < 0) {
+			CERROR("Can't alloc new fid, rc %d\n", rc);
 			return rc;
-	} else if (!fid_is_sane(&op_data->op_fid2) ||
-		   !(it->it_create_mode & M_CHECK_STALE)) {
-		/* DISP_ENQ_COMPLETE set means there is extra reference on
-		 * request referenced from this intent, saved for subsequent
-		 * lookup.  This path is executed when we proceed to this
-		 * lookup, so we clear DISP_ENQ_COMPLETE */
-		it_clear_disposition(it, DISP_ENQ_COMPLETE);
+		}
 	}
+	rc = mdc_enqueue(exp, &einfo, it, op_data, &lockh, lmm, lmmsize, NULL,
+			 extra_lock_flags);
+	if (rc < 0)
+		return rc;
+
 	*reqp = it->d.lustre.it_data;
 	rc = mdc_finish_intent_lock(exp, *reqp, op_data, it, &lockh);
 	return rc;
-- 
1.8.5.3



* [PATCH 10/17] lustre/ldlm: set l_lvb_type coherent when layout is returned
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
                   ` (8 preceding siblings ...)
  2014-03-01  2:16 ` [PATCH 09/17] lustre/llite: simplify dentry revalidate Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 11/17] lustre/ptlrpc: rq_commit_cb is called for twice Oleg Drokin
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Bruno Faccini, Oleg Drokin

From: Bruno Faccini <bruno.faccini@intel.com>

If the server packs a layout into its reply even though the client
did not request one, the lock's l_lvb_type must be set to match
(LVB_T_LAYOUT).

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Reviewed-on: http://review.whamcloud.com/8270
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4194
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c | 1 +
 drivers/staging/lustre/lustre/mdc/mdc_locks.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
index 3ed020e..d87048d 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
@@ -228,6 +228,7 @@ static void ldlm_handle_cp_callback(struct ptlrpc_request *req,
 
 			lock_res_and_lock(lock);
 			LASSERT(lock->l_lvb_data == NULL);
+			lock->l_lvb_type = LVB_T_LAYOUT;
 			lock->l_lvb_data = lvb_data;
 			lock->l_lvb_len = lvb_len;
 			unlock_res_and_lock(lock);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 81adc2b..b0d0e2a 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -753,6 +753,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 		/* install lvb_data */
 		lock_res_and_lock(lock);
 		if (lock->l_lvb_data == NULL) {
+			lock->l_lvb_type = LVB_T_LAYOUT;
 			lock->l_lvb_data = lmm;
 			lock->l_lvb_len = lvb_len;
 			lmm = NULL;
-- 
1.8.5.3



* [PATCH 11/17] lustre/ptlrpc: rq_commit_cb is called for twice
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
                   ` (9 preceding siblings ...)
  2014-03-01  2:16 ` [PATCH 10/17] lustre/ldlm: set l_lvb_type coherent when layout is returned Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 12/17] lustre/ptlrpc: skip rpcs that fail ptl_send_rpc Oleg Drokin
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Liang Zhen, Oleg Drokin

From: Liang Zhen <liang.zhen@intel.com>

If a ptlrpc_request is already on imp::imp_replay_list when it is
replayed and replied to, after_reply() calls req::rq_commit_cb for
the request, and ptlrpc_free_committed() then calls it a second time.
Skip the call in after_reply() when the request is still on
rq_replay_list.
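
For illustration only, a minimal user-space sketch of the rule this patch
enforces: the commit callback fires from exactly one place. The toy_* names
and the boolean standing in for !list_empty(&req->rq_replay_list) are
assumptions made for the example, not the real ptlrpc structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request; not the real struct ptlrpc_request. */
struct toy_req {
	bool on_replay_list;	/* models !list_empty(&req->rq_replay_list) */
	int  commit_calls;	/* how many times the callback has run */
	void (*commit_cb)(struct toy_req *req);
};

static void toy_commit_cb(struct toy_req *req)
{
	req->commit_calls++;
}

/* Models the after_reply() branch: only call the callback here if the
 * request is not on the replay list, i.e. nobody else will call it. */
static void toy_after_reply(struct toy_req *req)
{
	assert(req != NULL);
	if (req->commit_cb != NULL && !req->on_replay_list)
		req->commit_cb(req);
}

/* Models the later "free committed" pass, which calls the callback for
 * requests that are still on the replay list and takes them off it. */
static void toy_free_committed(struct toy_req *req)
{
	assert(req != NULL);
	if (req->on_replay_list && req->commit_cb != NULL) {
		req->on_replay_list = false;
		req->commit_cb(req);
	}
}
```

Running toy_after_reply() followed by toy_free_committed() on a request
leaves commit_calls at 1 whether or not the request started on the replay
list.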

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-on: http://review.whamcloud.com/8815
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3618
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/ptlrpc/client.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index a32b722..b6d831a 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -1313,7 +1313,11 @@ static int after_reply(struct ptlrpc_request *req)
 			/** version recovery */
 			ptlrpc_save_versions(req);
 			ptlrpc_retain_replayable_request(req, imp);
-		} else if (req->rq_commit_cb != NULL) {
+		} else if (req->rq_commit_cb != NULL &&
+			   list_empty(&req->rq_replay_list)) {
+			/* NB: don't call rq_commit_cb if it's already on
+			 * rq_replay_list, ptlrpc_free_committed() will call
+			 * it later, see LU-3618 for details */
 			spin_unlock(&imp->imp_lock);
 			req->rq_commit_cb(req);
 			spin_lock(&imp->imp_lock);
-- 
1.8.5.3



* [PATCH 12/17] lustre/ptlrpc: skip rpcs that fail ptl_send_rpc
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
                   ` (10 preceding siblings ...)
  2014-03-01  2:16 ` [PATCH 11/17] lustre/ptlrpc: rq_commit_cb is called for twice Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 13/17] lustre/ptlrpc: fix 'data race condition' issues Oleg Drokin
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel
  Cc: Peng Tao, Keith Mannthey, Oleg Drokin

From: Peng Tao <bergwolf@gmail.com>

ptl_send_rpc() does not handle -ENOMEM in some situations.
When ptl_send_rpc() fails we need to set the error and skip
further processing of that request; otherwise we trigger
an LBUG.
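
As a rough sketch of the control flow (user-space C, hypothetical toy_*
names, not the real ptlrpc_check_set()), a failed send marks the request
and moves on to the next one instead of falling through to the code that
assumes the send succeeded:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-request state for the illustration. */
struct toy_rpc {
	int send_rc;		/* result of the (imaginary) send attempt */
	int net_err;		/* set when the send failed */
	int timer_reset;	/* set by the post-send processing */
};

/* Walk the set; return how many requests went through post-send
 * processing. Requests whose send failed are flagged and skipped. */
static int toy_check_set(struct toy_rpc *rpcs, size_t n)
{
	int processed = 0;

	assert(rpcs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (rpcs[i].send_rc != 0) {
			rpcs[i].net_err = 1;
			continue;	/* skip further processing */
		}
		rpcs[i].timer_reset = 1;
		processed++;
	}
	return processed;
}
```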

Signed-off-by: Keith Mannthey <keith.mannthey@intel.com>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Reviewed-on: http://review.whamcloud.com/7411
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3698
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/ptlrpc/client.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index b6d831a..98041e8 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -1692,6 +1692,7 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
 					spin_lock(&req->rq_lock);
 					req->rq_net_err = 1;
 					spin_unlock(&req->rq_lock);
+					continue;
 				}
 				/* need to reset the timeout */
 				force_timer_recalc = 1;
-- 
1.8.5.3



* [PATCH 13/17] lustre/ptlrpc: fix 'data race condition' issues
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
                   ` (11 preceding siblings ...)
  2014-03-01  2:16 ` [PATCH 12/17] lustre/ptlrpc: skip rpcs that fail ptl_send_rpc Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 14/17] lustre/ptlrpc: re-enqueue ptlrpcd worker Oleg Drokin
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Sebastien Buisson, Oleg Drokin

From: Sebastien Buisson <sebastien.buisson@bull.net>

Fix 'data race condition' defects found by Coverity version
6.5.0:
Data race condition (MISSING_LOCK)
Accessing variable without holding lock. Elsewhere,
this variable is accessed with lock held.
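
A sketch of why the lock matters, assuming the flags involved are
single-bit fields that share a storage word: writing one bitfield is a
read-modify-write of the whole word, so two unlocked writers can lose each
other's update. The toy_* names and the C11 atomic_flag spinlock below are
stand-ins for the example, not the kernel's rq_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical request with a spinlock and three flag bits that share
 * one word. */
struct toy_req {
	atomic_flag  lock;	/* stands in for rq_lock */
	unsigned int err:1;
	unsigned int resend:1;
	unsigned int net_err:1;
};

static void toy_req_init(struct toy_req *req)
{
	assert(req != NULL);
	atomic_flag_clear(&req->lock);
	req->err = 0;
	req->resend = 0;
	req->net_err = 0;
}

static void toy_lock(struct toy_req *req)
{
	/* spin until the previous holder clears the flag */
	while (atomic_flag_test_and_set_explicit(&req->lock,
						 memory_order_acquire))
		;
}

static void toy_unlock(struct toy_req *req)
{
	atomic_flag_clear_explicit(&req->lock, memory_order_release);
}

/* Every flag update takes the lock, so concurrent setters of different
 * bits cannot overwrite each other's read-modify-write. */
static void toy_set_resend(struct toy_req *req)
{
	toy_lock(req);
	req->resend = 1;
	toy_unlock(req);
}

static void toy_set_net_err(struct toy_req *req)
{
	toy_lock(req);
	req->net_err = 1;
	toy_unlock(req);
}
```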

Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net>
Reviewed-on: http://review.whamcloud.com/6575
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2744
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/ptlrpc/client.c | 6 ++++++
 drivers/staging/lustre/lustre/ptlrpc/niobuf.c | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index 98041e8..7b97c64 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -1190,7 +1190,9 @@ static int after_reply(struct ptlrpc_request *req)
 		 * will roundup it */
 		req->rq_replen       = req->rq_nob_received;
 		req->rq_nob_received = 0;
+		spin_lock(&req->rq_lock);
 		req->rq_resend       = 1;
+		spin_unlock(&req->rq_lock);
 		return 0;
 	}
 
@@ -1412,7 +1414,9 @@ static int ptlrpc_send_new_req(struct ptlrpc_request *req)
 			req->rq_status = rc;
 			return 1;
 		} else {
+			spin_lock(&req->rq_lock);
 			req->rq_wait_ctx = 1;
+			spin_unlock(&req->rq_lock);
 			return 0;
 		}
 	}
@@ -1427,7 +1431,9 @@ static int ptlrpc_send_new_req(struct ptlrpc_request *req)
 	rc = ptl_send_rpc(req, 0);
 	if (rc) {
 		DEBUG_REQ(D_HA, req, "send failed (%d); expect timeout", rc);
+		spin_lock(&req->rq_lock);
 		req->rq_net_err = 1;
+		spin_unlock(&req->rq_lock);
 		return rc;
 	}
 	return 0;
diff --git a/drivers/staging/lustre/lustre/ptlrpc/niobuf.c b/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
index 1e94597..a47a8d8 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
@@ -511,7 +511,9 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 		CDEBUG(D_HA, "muting rpc for failed imp obd %s\n",
 		       request->rq_import->imp_obd->obd_name);
 		/* this prevents us from waiting in ptlrpc_queue_wait */
+		spin_lock(&request->rq_lock);
 		request->rq_err = 1;
+		spin_unlock(&request->rq_lock);
 		request->rq_status = -ENODEV;
 		return -ENODEV;
 	}
@@ -553,7 +555,9 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 			if (rc) {
 				/* this prevents us from looping in
 				 * ptlrpc_queue_wait */
+				spin_lock(&request->rq_lock);
 				request->rq_err = 1;
+				spin_unlock(&request->rq_lock);
 				request->rq_status = rc;
 				GOTO(cleanup_bulk, rc);
 			}
-- 
1.8.5.3



* [PATCH 14/17] lustre/ptlrpc: re-enqueue ptlrpcd worker
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
                   ` (12 preceding siblings ...)
  2014-03-01  2:16 ` [PATCH 13/17] lustre/ptlrpc: fix 'data race condition' issues Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 15/17] lustre/osc: Don't flush active extents Oleg Drokin
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Liang Zhen, Oleg Drokin

From: Liang Zhen <liang.zhen@intel.com>

osc_extent_wait() can get stuck in a scenario like this:

1) thread-1 held an active extent
2) thread-2 called flush cache, and marked this extent as "urgent"
   and "sync_wait"
3) thread-3 wants to write to the same extent, osc_extent_find will
   get "conflict" because this extent is "sync_wait", so it starts
   to wait...
4) cl_writeback_work has been scheduled by thread-4 to write some
   other extents, it has sent RPCs but not returned yet.
5) thread-1 finished his work, and called osc_extent_release()->
   osc_io_unplug_async()->ptlrpcd_queue_work(), but found
   cl_writeback_work is still running, so it's ignored (-EBUSY)
6) thread-3 is stuck because nobody will wake him up.

This patch allows ptlrpcd_work to be rescheduled while it is still
running, so a queue request is no longer dropped with -EBUSY.
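
The refcount protocol can be sketched in user-space C as below. This is a
single-threaded model with hypothetical toy_* names; the "queue" is just a
flag, and C11 atomic_fetch_add()/atomic_fetch_sub() are adjusted by one to
mimic the kernel's atomic_inc_return()/atomic_dec_return().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical work item.
 * refcount: 1 = idle, 2 = queued or running, >2 = re-queue requested. */
struct toy_work {
	atomic_int refcount;
	int        queued;	/* 1 while on the (imaginary) ptlrpcd queue */
	int        runs;	/* how many times the callback has executed */
};

static void toy_work_init(struct toy_work *w)
{
	atomic_init(&w->refcount, 1);
	w->queued = 0;
	w->runs = 0;
}

static void toy_add_work(struct toy_work *w)
{
	w->queued = 1;
}

/* Always succeeds: a request made while the work is running is recorded
 * in the refcount instead of being rejected. */
static int toy_queue_work(struct toy_work *w)
{
	assert(atomic_load(&w->refcount) > 0);
	if (atomic_fetch_add(&w->refcount, 1) + 1 == 2)
		toy_add_work(w);
	return 0;
}

/* The worker picks the item up and runs the callback. */
static void toy_work_begin(struct toy_work *w)
{
	w->queued = 0;
	w->runs++;
}

/* After the callback: if anyone asked for the work while it was running,
 * collapse all those requests into one and queue the item again. */
static void toy_work_finish(struct toy_work *w)
{
	if (atomic_fetch_sub(&w->refcount, 1) - 1 > 1) {
		atomic_store(&w->refcount, 2);
		toy_add_work(w);
	}
}
```

In the scenario above, step 5 corresponds to calling toy_queue_work()
between toy_work_begin() and toy_work_finish(): the item is put back on
the queue once the current run completes, so the waiter in step 6 is
eventually woken.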

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-on: http://review.whamcloud.com/8922
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4509
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/ptlrpc/client.c | 64 +++++++++++++++++----------
 1 file changed, 40 insertions(+), 24 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index 7b97c64..4c9e006 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -48,6 +48,7 @@
 #include "ptlrpc_internal.h"
 
 static int ptlrpc_send_new_req(struct ptlrpc_request *req);
+static int ptlrpcd_check_work(struct ptlrpc_request *req);
 
 /**
  * Initialize passed in client structure \a cl.
@@ -1784,6 +1785,10 @@ interpret:
 
 		ptlrpc_req_interpret(env, req, req->rq_status);
 
+		if (ptlrpcd_check_work(req)) {
+			atomic_dec(&set->set_remaining);
+			continue;
+		}
 		ptlrpc_rqphase_move(req, RQ_PHASE_COMPLETE);
 
 		CDEBUG(req->rq_reqmsg != NULL ? D_RPCTRACE : 0,
@@ -2957,22 +2962,50 @@ EXPORT_SYMBOL(ptlrpc_sample_next_xid);
  *    have delay before it really runs by ptlrpcd thread.
  */
 struct ptlrpc_work_async_args {
-	__u64   magic;
 	int   (*cb)(const struct lu_env *, void *);
 	void   *cbdata;
 };
 
-#define PTLRPC_WORK_MAGIC 0x6655436b676f4f44ULL /* magic code */
+static void ptlrpcd_add_work_req(struct ptlrpc_request *req)
+{
+	/* re-initialize the req */
+	req->rq_timeout		= obd_timeout;
+	req->rq_sent		= cfs_time_current_sec();
+	req->rq_deadline	= req->rq_sent + req->rq_timeout;
+	req->rq_reply_deadline	= req->rq_deadline;
+	req->rq_phase		= RQ_PHASE_INTERPRET;
+	req->rq_next_phase	= RQ_PHASE_COMPLETE;
+	req->rq_xid		= ptlrpc_next_xid();
+	req->rq_import_generation = req->rq_import->imp_generation;
+
+	ptlrpcd_add_req(req, PDL_POLICY_ROUND, -1);
+}
 
 static int work_interpreter(const struct lu_env *env,
 			    struct ptlrpc_request *req, void *data, int rc)
 {
 	struct ptlrpc_work_async_args *arg = data;
 
-	LASSERT(arg->magic == PTLRPC_WORK_MAGIC);
+	LASSERT(ptlrpcd_check_work(req));
 	LASSERT(arg->cb != NULL);
 
-	return arg->cb(env, arg->cbdata);
+	rc = arg->cb(env, arg->cbdata);
+
+	list_del_init(&req->rq_set_chain);
+	req->rq_set = NULL;
+
+	if (atomic_dec_return(&req->rq_refcount) > 1) {
+		atomic_set(&req->rq_refcount, 2);
+		ptlrpcd_add_work_req(req);
+	}
+	return rc;
+}
+
+static int worker_format;
+
+static int ptlrpcd_check_work(struct ptlrpc_request *req)
+{
+	return req->rq_pill.rc_fmt == (void *)&worker_format;
 }
 
 /**
@@ -3005,6 +3038,7 @@ void *ptlrpcd_alloc_work(struct obd_import *imp,
 	req->rq_receiving_reply = 0;
 	req->rq_must_unlink = 0;
 	req->rq_no_delay = req->rq_no_resend = 1;
+	req->rq_pill.rc_fmt = (void *)&worker_format;
 
 	spin_lock_init(&req->rq_lock);
 	INIT_LIST_HEAD(&req->rq_list);
@@ -3018,7 +3052,6 @@ void *ptlrpcd_alloc_work(struct obd_import *imp,
 
 	CLASSERT(sizeof(*args) <= sizeof(req->rq_async_args));
 	args = ptlrpc_req_async_args(req);
-	args->magic  = PTLRPC_WORK_MAGIC;
 	args->cb     = cb;
 	args->cbdata = cbdata;
 
@@ -3048,25 +3081,8 @@ int ptlrpcd_queue_work(void *handler)
 	 * req as opaque data. - Jinshan
 	 */
 	LASSERT(atomic_read(&req->rq_refcount) > 0);
-	if (atomic_read(&req->rq_refcount) > 1)
-		return -EBUSY;
-
-	if (atomic_inc_return(&req->rq_refcount) > 2) { /* race */
-		atomic_dec(&req->rq_refcount);
-		return -EBUSY;
-	}
-
-	/* re-initialize the req */
-	req->rq_timeout	= obd_timeout;
-	req->rq_sent	   = cfs_time_current_sec();
-	req->rq_deadline       = req->rq_sent + req->rq_timeout;
-	req->rq_reply_deadline = req->rq_deadline;
-	req->rq_phase	  = RQ_PHASE_INTERPRET;
-	req->rq_next_phase     = RQ_PHASE_COMPLETE;
-	req->rq_xid	    = ptlrpc_next_xid();
-	req->rq_import_generation = req->rq_import->imp_generation;
-
-	ptlrpcd_add_req(req, PDL_POLICY_ROUND, -1);
+	if (atomic_inc_return(&req->rq_refcount) == 2)
+		ptlrpcd_add_work_req(req);
 	return 0;
 }
 EXPORT_SYMBOL(ptlrpcd_queue_work);
-- 
1.8.5.3



* [PATCH 15/17] lustre/osc: Don't flush active extents.
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
                   ` (13 preceding siblings ...)
  2014-03-01  2:16 ` [PATCH 14/17] lustre/ptlrpc: re-enqueue ptlrpcd worker Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 16/17] lustre/quota: improper assert in osc_quota_chkdq() Oleg Drokin
  2014-03-01  2:16 ` [PATCH 17/17] lustre/libcfs: warn if all HTs in a core are gone Oleg Drokin
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Ann Koehler, Oleg Drokin

From: Ann Koehler <amk@cray.com>

The extent is active so we need to abort and let the caller
re-dirty the page. If we continued on here, and we were the
one making the extent active, we could deadlock waiting for
the page writeback to clear but it won't because the extent
is active and won't be written out.
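
A minimal sketch of the resulting decision, with a hypothetical enum and
function rather than the real osc_flush_async_page(): an active extent is
treated like a truncate race, and the caller is told to retry.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical subset of extent states for the illustration. */
enum toy_ext_state {
	TOY_OES_CACHE,
	TOY_OES_TRUNC,
	TOY_OES_ACTIVE,
};

/* Return 0 if the flush may proceed, -EAGAIN if the caller should
 * re-dirty the page and try again later. */
static int toy_flush_page(enum toy_ext_state state)
{
	assert(state >= TOY_OES_CACHE && state <= TOY_OES_ACTIVE);
	switch (state) {
	case TOY_OES_TRUNC:	/* race with truncate */
	case TOY_OES_ACTIVE:	/* extent in use, will not be written out */
		return -EAGAIN;
	default:
		return 0;
	}
}
```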

Signed-off-by: Ann Koehler <amk@cray.com>
Reviewed-on: http://review.whamcloud.com/8278
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4253
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/osc/osc_cache.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index b92a02e..af25c19 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -2394,6 +2394,12 @@ int osc_flush_async_page(const struct lu_env *env, struct cl_io *io,
 		 * really sending the RPC. */
 	case OES_TRUNC:
 		/* race with truncate, page will be redirtied */
+	case OES_ACTIVE:
+		/* The extent is active so we need to abort and let the caller
+		 * re-dirty the page. If we continued on here, and we were the
+		 * one making the extent active, we could deadlock waiting for
+		 * the page writeback to clear but it won't because the extent
+		 * is active and won't be written out. */
 		GOTO(out, rc = -EAGAIN);
 	default:
 		break;
-- 
1.8.5.3



* [PATCH 16/17] lustre/quota: improper assert in osc_quota_chkdq()
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
                   ` (14 preceding siblings ...)
  2014-03-01  2:16 ` [PATCH 15/17] lustre/osc: Don't flush active extents Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  2014-03-01  2:16 ` [PATCH 17/17] lustre/libcfs: warn if all HTs in a core are gone Oleg Drokin
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Niu Yawei, Oleg Drokin

From: Niu Yawei <yawei.niu@intel.com>

In osc_quota_chkdq(), we should never try to access oqi found
from hash, since it could have been freed by osc_quota_setdq().
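
The idea, sketched with a hypothetical flat table instead of the real
cfs_hash (all toy_* and TOY_* names are assumptions for the example): the
lookup result is only compared against NULL and never dereferenced, so it
does not matter if the entry is freed concurrently.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_QUOTA_OK = 0, TOY_NO_QUOTA = 1 };

/* Hypothetical lookup table indexed by id; a non-NULL slot means the id
 * is about to run out of quota. */
static int toy_quota_chkdq(const void *const *table, size_t nslots,
			   unsigned int id)
{
	const void *entry;

	assert(table != NULL);
	entry = id < nslots ? table[id] : NULL;

	/* Only test for presence. Do not read through 'entry': in the
	 * real code the object may already have been freed. */
	return entry != NULL ? TOY_NO_QUOTA : TOY_QUOTA_OK;
}
```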

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-on: http://review.whamcloud.com/8460
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4336
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/osc/osc_quota.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lustre/osc/osc_quota.c b/drivers/staging/lustre/lustre/osc/osc_quota.c
index 6045a78..f395ae4 100644
--- a/drivers/staging/lustre/lustre/osc/osc_quota.c
+++ b/drivers/staging/lustre/lustre/osc/osc_quota.c
@@ -51,11 +51,8 @@ int osc_quota_chkdq(struct client_obd *cli, const unsigned int qid[])
 
 		oqi = cfs_hash_lookup(cli->cl_quota_hash[type], &qid[type]);
 		if (oqi) {
-			obd_uid id = oqi->oqi_id;
-
-			LASSERTF(id == qid[type],
-				 "The ids don't match %u != %u\n",
-				 id, qid[type]);
+			/* do not try to access oqi here, it could have been
+			 * freed by osc_quota_setdq() */
 
 			/* the slot is busy, the user is about to run out of
 			 * quota space on this OST */
-- 
1.8.5.3



* [PATCH 17/17] lustre/libcfs: warn if all HTs in a core are gone
  2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
                   ` (15 preceding siblings ...)
  2014-03-01  2:16 ` [PATCH 16/17] lustre/quota: improper assert in osc_quota_chkdq() Oleg Drokin
@ 2014-03-01  2:16 ` Oleg Drokin
  16 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2014-03-01  2:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel
  Cc: Oleg Drokin, Liang Zhen, Oleg Drokin

The libcfs CPU partition code does not support CPU hotplug, but
plugging in a new CPU or enabling/disabling hyper-threading is safe.
Only plugging out a CPU is potentially risky, because it may break
the CPU affinity of Lustre threads.

Currently libcfs prints a warning for every CPU notification. This
patch changes that behavior and only prints a warning when all HTs
in a CPU core have been lost, which may have broken the affinity of
Lustre threads.

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-on: http://review.whamcloud.com/8770
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4454
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 .../staging/lustre/lustre/libcfs/linux/linux-cpu.c    | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lustre/libcfs/linux/linux-cpu.c
index 58bb256..77b1ef6 100644
--- a/drivers/staging/lustre/lustre/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lustre/libcfs/linux/linux-cpu.c
@@ -952,6 +952,7 @@ static int
 cfs_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
 {
 	unsigned int  cpu = (unsigned long)hcpu;
+	bool	     warn;
 
 	switch (action) {
 	case CPU_DEAD:
@@ -962,9 +963,21 @@ cfs_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
 		cpt_data.cpt_version++;
 		spin_unlock(&cpt_data.cpt_lock);
 	default:
-		CWARN("Lustre: can't support CPU hotplug well now, "
-		      "performance and stability could be impacted"
-		      "[CPU %u notify: %lx]\n", cpu, action);
+		if (action != CPU_DEAD && action != CPU_DEAD_FROZEN) {
+			CDEBUG(D_INFO, "CPU changed [cpu %u action %lx]\n",
+			       cpu, action);
+			break;
+		}
+
+		down(&cpt_data.cpt_mutex);
+		/* if all HTs in a core are offline, it may break affinity */
+		cfs_cpu_ht_siblings(cpu, cpt_data.cpt_cpumask);
+		warn = any_online_cpu(*cpt_data.cpt_cpumask) >= nr_cpu_ids;
+		up(&cpt_data.cpt_mutex);
+		CDEBUG(warn ? D_WARNING : D_INFO,
+		       "Lustre: can't support CPU plug-out well now, "
+		       "performance and stability could be impacted "
+		       "[CPU %u action: %lx]\n", cpu, action);
 	}
 
 	return NOTIFY_OK;
-- 
1.8.5.3



* Re: [PATCH 01/17] staging/lustre/llite: fix open lock matching in ll_md_blocking_ast()
  2014-03-01  2:16 ` [PATCH 01/17] staging/lustre/llite: fix open lock matching in ll_md_blocking_ast() Oleg Drokin
@ 2014-03-03 10:01   ` Dan Carpenter
  0 siblings, 0 replies; 19+ messages in thread
From: Dan Carpenter @ 2014-03-03 10:01 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: Greg Kroah-Hartman, linux-kernel, devel, Oleg Drokin, John L. Hammond

On Fri, Feb 28, 2014 at 09:16:30PM -0500, Oleg Drokin wrote:
>  
> -	if (och) { /* There might be a race and somebody have freed this och
> -		      already */
> +	if (och != NULL) {
> +		/* There might be a race and this handle may already
> +		   be closed. */

This is a random unrelated whitespace change and the style was better in
the original (double negatives are not not stupid).

regards,
dan carpenter



Thread overview: 19+ messages
2014-03-01  2:16 [PATCH 00/17] Lustre stability patches Oleg Drokin
2014-03-01  2:16 ` [PATCH 01/17] staging/lustre/llite: fix open lock matching in ll_md_blocking_ast() Oleg Drokin
2014-03-03 10:01   ` Dan Carpenter
2014-03-01  2:16 ` [PATCH 02/17] lustre/mdc: Check for all attributes validity in revalidate Oleg Drokin
2014-03-01  2:16 ` [PATCH 03/17] lustre/llite: Do not send parent dir fid in getattr by fid Oleg Drokin
2014-03-01  2:16 ` [PATCH 04/17] lustre/mdc: comments on LOOKUP and PERM lock Oleg Drokin
2014-03-01  2:16 ` [PATCH 05/17] lustre/mdc: use ibits_known mask for lock match Oleg Drokin
2014-03-01  2:16 ` [PATCH 06/17] lustre/clio: honor O_NOATIME Oleg Drokin
2014-03-01  2:16 ` [PATCH 07/17] lustre/mdc: fix bad ERR_PTR usage in mdc_locks.c Oleg Drokin
2014-03-01  2:16 ` [PATCH 08/17] lustre/recovery: free open/close request promptly Oleg Drokin
2014-03-01  2:16 ` [PATCH 09/17] lustre/llite: simplify dentry revalidate Oleg Drokin
2014-03-01  2:16 ` [PATCH 10/17] lustre/ldlm: set l_lvb_type coherent when layout is returned Oleg Drokin
2014-03-01  2:16 ` [PATCH 11/17] lustre/ptlrpc: rq_commit_cb is called for twice Oleg Drokin
2014-03-01  2:16 ` [PATCH 12/17] lustre/ptlrpc: skip rpcs that fail ptl_send_rpc Oleg Drokin
2014-03-01  2:16 ` [PATCH 13/17] lustre/ptlrpc: fix 'data race condition' issues Oleg Drokin
2014-03-01  2:16 ` [PATCH 14/17] lustre/ptlrpc: re-enqueue ptlrpcd worker Oleg Drokin
2014-03-01  2:16 ` [PATCH 15/17] lustre/osc: Don't flush active extents Oleg Drokin
2014-03-01  2:16 ` [PATCH 16/17] lustre/quota: improper assert in osc_quota_chkdq() Oleg Drokin
2014-03-01  2:16 ` [PATCH 17/17] lustre/libcfs: warn if all HTs in a core are gone Oleg Drokin
