* [PATCH 00/18] Lustre fixes
@ 2014-06-23  1:32 Oleg Drokin
From: Oleg Drokin @ 2014-06-23  1:32 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Oleg Drokin

These patches represent most of the fixes we have recently added
to our tree.
The first patch also fixes the total breakage of Lustre that was
introduced by commit 80db2734acbc78db12798cfb611d6acc7fe389e6.

The changes seem to pass my testing.
checkpatch output is clean except for the last patch:
#144: FILE: drivers/staging/lustre/lnet/lnet/lib-move.c:821:
+               CNETERR("Aborting message for %s: LNetM[DE]Unlink() already "
+                       "called on the MD/ME.\n",

This one cannot be helped, I guess.

WARNING: return of an errno should typically be -ve (return -ECANCELED)
#150: FILE: drivers/staging/lustre/lnet/lnet/lib-move.c:827:
+               return ECANCELED;

This one will be addressed by two other patches that I am working on right now.
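
For reference, checkpatch flags the positive return because kernel code conventionally returns 0 on success and a negative errno on failure, so callers commonly test rc < 0. A minimal userspace sketch of the difference follows; the function names are hypothetical, not taken from lib-move.c, and it does not say whether the lnet callers rely on the positive value:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helpers that both report a cancelled message. */
int abort_msg_positive(void)
{
	return ECANCELED;	/* positive, as flagged by checkpatch */
}

int abort_msg_negative(void)
{
	return -ECANCELED;	/* kernel convention: negative errno */
}

/* Generic caller that treats any negative value as an error.
 * Returns 1 if fn() is seen as failing, 0 otherwise. */
int caller_sees_error(int (*fn)(void))
{
	int rc = fn();

	assert(rc != 0);	/* both helpers are meant to fail */
	return rc < 0;
}
```

Here caller_sees_error(abort_msg_positive) returns 0: a generic rc < 0 check treats the positive value as success.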

Please consider for inclusion.

Alexander.Boyko (1):
  staging/lustre/ptlrpc: race at req processing

Alexey Lyashkov (1):
  staging/lustre/ptlrpc: unlink request buffer correctly

Andriy Skulysh (2):
  staging/lustre/mgc: mgc import reconnect race
  staging/lustre/osc: osc_extent_truncate()) ASSERTION( !ext->oe_urgent
    ) failed

Bob Glossman (1):
  staging/lustre/obdclass: runtime load lustre client when needed

Bobi Jam (1):
  staging/lustre/osc: get rid of old checksum initial value

Cheng Shao (1):
  staging/lustre/mgc: replace hard-coded MGC_ENQUEUE_LIMIT value

Christopher J. Morrone (1):
  staging/lustre/ptlrpc: Add schedule point to ptlrpc_check_set()

Dmitry Eremin (4):
  staging/lustre: fix frong ldlm flags type used
  staging/lustre/ptlrpc: fix NULL pointer dereference of {exp,imp}_obd
  staging/lustre/obdclass: Fix uninitialized variables
  staging/lustre/llite: Fix uninitialized variable

Isaac Huang (1):
  staging/lustre/lnet: abort messages whose MD has been unlinked

Li Xi (1):
  staging/lustre/llite: fix a flag bug of vvp_io_kernel_fault()

Nathaniel Clark (1):
  staging/lustre/llite: Only kill SGID/SUID bits

Oleg Drokin (2):
  staging/lustre/libcfs: revert changes to libcfs_sock_ioctl
  staging/lustre/ptlrpc: Protect request buffer changing

Patrick Farrell (1):
  staging/lustre/vvp: release mmap_sem in error case

 .../staging/lustre/include/linux/lnet/lib-types.h  |  1 +
 drivers/staging/lustre/lnet/lnet/lib-md.c          | 10 ++---
 drivers/staging/lustre/lnet/lnet/lib-me.c          | 11 ++---
 drivers/staging/lustre/lnet/lnet/lib-move.c        | 49 +++++++++++++++-------
 drivers/staging/lustre/lustre/include/lustre_dlm.h |  2 +-
 drivers/staging/lustre/lustre/include/lustre_net.h |  6 ++-
 drivers/staging/lustre/lustre/include/obd_class.h  |  2 +-
 drivers/staging/lustre/lustre/include/obd_ost.h    |  4 ++
 drivers/staging/lustre/lustre/ldlm/ldlm_request.c  |  2 +-
 .../lustre/lustre/libcfs/linux/linux-tcpip.c       | 21 ++++++++--
 drivers/staging/lustre/lustre/llite/file.c         |  6 +--
 drivers/staging/lustre/lustre/llite/llite_lib.c    |  6 ++-
 drivers/staging/lustre/lustre/llite/vvp_io.c       |  6 ++-
 drivers/staging/lustre/lustre/llite/xattr.c        |  1 +
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  2 +-
 drivers/staging/lustre/lustre/lov/lov_internal.h   |  2 +-
 drivers/staging/lustre/lustre/lov/lov_request.c    | 10 ++---
 drivers/staging/lustre/lustre/mgc/mgc_request.c    |  5 ++-
 drivers/staging/lustre/lustre/obdclass/capa.c      |  5 +++
 drivers/staging/lustre/lustre/obdclass/obd_mount.c | 18 +++-----
 drivers/staging/lustre/lustre/osc/osc_cache.c      |  7 ++--
 drivers/staging/lustre/lustre/osc/osc_internal.h   |  2 +-
 drivers/staging/lustre/lustre/osc/osc_page.c       |  2 +-
 drivers/staging/lustre/lustre/osc/osc_request.c    | 17 +++-----
 drivers/staging/lustre/lustre/ptlrpc/client.c      | 32 +++++++++++---
 drivers/staging/lustre/lustre/ptlrpc/events.c      | 11 +++--
 drivers/staging/lustre/lustre/ptlrpc/gss/sec_gss.c | 29 +++++++++++++
 drivers/staging/lustre/lustre/ptlrpc/import.c      | 41 ++++++++++++++----
 drivers/staging/lustre/lustre/ptlrpc/niobuf.c      | 14 ++++---
 drivers/staging/lustre/lustre/ptlrpc/pinger.c      |  5 +++
 drivers/staging/lustre/lustre/ptlrpc/sec_null.c    | 11 +++++
 drivers/staging/lustre/lustre/ptlrpc/sec_plain.c   | 12 ++++++
 drivers/staging/lustre/lustre/ptlrpc/service.c     | 10 ++---
 33 files changed, 254 insertions(+), 108 deletions(-)

-- 
1.9.0


* [PATCH 01/18] staging/lustre/libcfs: revert changes to libcfs_sock_ioctl
From: Oleg Drokin @ 2014-06-23  1:32 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel
  Cc: Oleg Drokin, Fredrick John Berchmans

The changes introduced by commit 80db2734acbc78db12798cfb611d6acc7fe389e6
unfortunately break Lustre completely: we use this function to access
not only socket protocol ops, but also device ioctls such as
SIOCGIFCONF, which now fail.
Revert part of that commit to regain the needed functionality.
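
To illustrate the distinction, SIOCGIFCONF is a device (interface configuration) ioctl that is nevertheless issued on an ordinary socket descriptor. The userspace sketch below is an analogy only: it uses the libc socket API, not libcfs or kernel interfaces, and count_ipv4_interfaces() is a hypothetical helper.

```c
#define _DEFAULT_SOURCE 1	/* for struct ifconf in glibc's <net/if.h> */
#include <assert.h>
#include <errno.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: number of entries SIOCGIFCONF reports (up to
 * 32), or -errno on failure. */
int count_ipv4_interfaces(void)
{
	struct ifreq reqs[32];
	struct ifconf ifc;
	int sock, rc;

	/* An ordinary IPv4 stream socket, like the one libcfs creates. */
	sock = socket(AF_INET, SOCK_STREAM, 0);
	if (sock < 0)
		return -errno;

	memset(&ifc, 0, sizeof(ifc));
	ifc.ifc_len = sizeof(reqs);
	ifc.ifc_req = reqs;

	/* A device ioctl, passed through the socket descriptor. */
	if (ioctl(sock, SIOCGIFCONF, &ifc) < 0) {
		rc = -errno;
	} else {
		/* The kernel never reports more bytes than we offered. */
		assert(ifc.ifc_len <= (int)sizeof(reqs));
		rc = ifc.ifc_len / (int)sizeof(struct ifreq);
	}

	close(sock);
	return rc;
}
```

A return value of 0 or more is the number of entries that fit in the 32-entry buffer; a negative value is -errno.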

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
CC: Fredrick John Berchmans <fredrickprashanth@gmail.com>
---
 .../lustre/lustre/libcfs/linux/linux-tcpip.c        | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/libcfs/linux/linux-tcpip.c b/drivers/staging/lustre/lustre/libcfs/linux/linux-tcpip.c
index a21b426..e52e33a 100644
--- a/drivers/staging/lustre/lustre/libcfs/linux/linux-tcpip.c
+++ b/drivers/staging/lustre/lustre/libcfs/linux/linux-tcpip.c
@@ -46,16 +46,31 @@
 int
 libcfs_sock_ioctl(int cmd, unsigned long arg)
 {
+	mm_segment_t	oldmm = get_fs();
 	struct socket  *sock;
-	int	     rc;
+	int		rc;
+	struct file    *sock_filp;
 
 	rc = sock_create (PF_INET, SOCK_STREAM, 0, &sock);
 	if (rc != 0) {
 		CERROR ("Can't create socket: %d\n", rc);
 		return rc;
 	}
-	rc = kernel_sock_ioctl(sock, cmd, arg);
-	sock_release(sock);
+
+	sock_filp = sock_alloc_file(sock, 0, NULL);
+	if (IS_ERR(sock_filp)) {
+		sock_release(sock);
+		rc = PTR_ERR(sock_filp);
+		goto out;
+	}
+
+	set_fs(KERNEL_DS);
+	if (sock_filp->f_op->unlocked_ioctl)
+		rc = sock_filp->f_op->unlocked_ioctl(sock_filp, cmd, arg);
+	set_fs(oldmm);
+
+	fput(sock_filp);
+out:
 	return rc;
 }
 

* [PATCH 02/18] staging/lustre/ptlrpc: Protect request buffer changing
From: Oleg Drokin @ 2014-06-23  1:32 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Oleg Drokin, Oleg Drokin

The *_enlarge_reqbuf class of functions can change the location of the
request body for a request that is already on the replay list. As a
result, a parallel traverser of that list (after_reply ->
ptlrpc_free_committed) might access freed and scrambled memory, causing
an assertion failure.

Since all such users can only reach this request under imp_lock, take
imp_lock in *_enlarge_reqbuf to protect against them.
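
The locking pattern can be sketched in userspace as follows. This is a model, not the ptlrpc code: struct demo_req, demo_enlarge() and the C11 atomic_flag spinlock standing in for spin_lock(&imp->imp_lock) are all hypothetical. The allocation stays outside the lock (in the kernel it may sleep), while the copy, the free of the old buffer and the update of the interior pointer all happen under the lock that list traversers also take.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a kernel spinlock such as imp_lock. */
struct demo_spinlock {
	atomic_flag held;
};

void demo_spin_lock(struct demo_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held,
						 memory_order_acquire))
		;	/* busy-wait until the holder releases it */
}

void demo_spin_unlock(struct demo_spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical request: msg points into buf, much like rq_reqmsg
 * points into rq_reqbuf. */
struct demo_req {
	char   *buf;
	size_t  buf_len;
	char   *msg;
};

/* Grow req->buf to new_len bytes, keeping its contents. */
int demo_enlarge(struct demo_spinlock *lock, struct demo_req *req,
		 size_t new_len)
{
	char *newbuf;

	assert(new_len >= req->buf_len);

	/* Allocate before taking the lock. */
	newbuf = calloc(1, new_len);
	if (newbuf == NULL)
		return -ENOMEM;

	/* Readers that take the same lock see either the old buffer or
	 * the new one, never a freed one. */
	demo_spin_lock(lock);
	memcpy(newbuf, req->buf, req->buf_len);
	free(req->buf);
	req->buf = newbuf;
	req->buf_len = new_len;
	req->msg = newbuf;	/* re-derive the interior pointer */
	demo_spin_unlock(lock);

	return 0;
}
```

The important property is that the old buffer is freed and msg re-derived inside the same critical section; a reader that also takes the lock cannot observe the window in between.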

Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/10074
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3333
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
---
 drivers/staging/lustre/lustre/ptlrpc/gss/sec_gss.c | 29 ++++++++++++++++++++++
 drivers/staging/lustre/lustre/ptlrpc/sec_null.c    | 11 ++++++++
 drivers/staging/lustre/lustre/ptlrpc/sec_plain.c   | 12 +++++++++
 3 files changed, 52 insertions(+)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/gss/sec_gss.c b/drivers/staging/lustre/lustre/ptlrpc/gss/sec_gss.c
index 383601c..ef44e09 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/gss/sec_gss.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/gss/sec_gss.c
@@ -1687,12 +1687,24 @@ int gss_enlarge_reqbuf_intg(struct ptlrpc_sec *sec,
 		if (newbuf == NULL)
 			return -ENOMEM;
 
+		/* Must lock this, so that otherwise unprotected change of
+		 * rq_reqmsg is not racing with parallel processing of
+		 * imp_replay_list traversing threads. See LU-3333
+		 * This is a bandaid at best, we really need to deal with this
+		 * in request enlarging code before unpacking that's already
+		 * there */
+		if (req->rq_import)
+			spin_lock(&req->rq_import->imp_lock);
+
 		memcpy(newbuf, req->rq_reqbuf, req->rq_reqbuf_len);
 
 		OBD_FREE_LARGE(req->rq_reqbuf, req->rq_reqbuf_len);
 		req->rq_reqbuf = newbuf;
 		req->rq_reqbuf_len = newbuf_size;
 		req->rq_reqmsg = lustre_msg_buf(req->rq_reqbuf, 1, 0);
+
+		if (req->rq_import)
+			spin_unlock(&req->rq_import->imp_lock);
 	}
 
 	/* do enlargement, from wrapper to embedded, from end to begin */
@@ -1753,6 +1765,8 @@ int gss_enlarge_reqbuf_priv(struct ptlrpc_sec *sec,
 		if (newclrbuf_size + newcipbuf_size <= req->rq_reqbuf_len) {
 			void *src, *dst;
 
+			if (req->rq_import)
+				spin_lock(&req->rq_import->imp_lock);
 			/* move clear text backward. */
 			src = req->rq_clrbuf;
 			dst = (char *) req->rq_reqbuf + newcipbuf_size;
@@ -1762,6 +1776,9 @@ int gss_enlarge_reqbuf_priv(struct ptlrpc_sec *sec,
 			req->rq_clrbuf = (struct lustre_msg *) dst;
 			req->rq_clrbuf_len = newclrbuf_size;
 			req->rq_reqmsg = lustre_msg_buf(req->rq_clrbuf, 0, 0);
+
+			if (req->rq_import)
+				spin_unlock(&req->rq_import->imp_lock);
 		} else {
 			/* sadly we have to split out the clear buffer */
 			LASSERT(req->rq_reqbuf_len >= newcipbuf_size);
@@ -1776,6 +1793,15 @@ int gss_enlarge_reqbuf_priv(struct ptlrpc_sec *sec,
 		if (newclrbuf == NULL)
 			return -ENOMEM;
 
+		/* Must lock this, so that otherwise unprotected change of
+		 * rq_reqmsg is not racing with parallel processing of
+		 * imp_replay_list traversing threads. See LU-3333
+		 * This is a bandaid at best, we really need to deal with this
+		 * in request enlarging code before unpacking that's already
+		 * there */
+		if (req->rq_import)
+			spin_lock(&req->rq_import->imp_lock);
+
 		memcpy(newclrbuf, req->rq_clrbuf, req->rq_clrbuf_len);
 
 		if (req->rq_reqbuf == NULL ||
@@ -1788,6 +1814,9 @@ int gss_enlarge_reqbuf_priv(struct ptlrpc_sec *sec,
 		req->rq_clrbuf = newclrbuf;
 		req->rq_clrbuf_len = newclrbuf_size;
 		req->rq_reqmsg = lustre_msg_buf(req->rq_clrbuf, 0, 0);
+
+		if (req->rq_import)
+			spin_unlock(&req->rq_import->imp_lock);
 	}
 
 	_sptlrpc_enlarge_msg_inplace(req->rq_clrbuf, 0, newmsg_size);
diff --git a/drivers/staging/lustre/lustre/ptlrpc/sec_null.c b/drivers/staging/lustre/lustre/ptlrpc/sec_null.c
index ff1137f..ac967cb 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/sec_null.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/sec_null.c
@@ -260,11 +260,22 @@ int null_enlarge_reqbuf(struct ptlrpc_sec *sec,
 		if (newbuf == NULL)
 			return -ENOMEM;
 
+		/* Must lock this, so that otherwise unprotected change of
+		 * rq_reqmsg is not racing with parallel processing of
+		 * imp_replay_list traversing threads. See LU-3333
+		 * This is a bandaid at best, we really need to deal with this
+		 * in request enlarging code before unpacking that's already
+		 * there */
+		if (req->rq_import)
+			spin_lock(&req->rq_import->imp_lock);
 		memcpy(newbuf, req->rq_reqbuf, req->rq_reqlen);
 
 		OBD_FREE_LARGE(req->rq_reqbuf, req->rq_reqbuf_len);
 		req->rq_reqbuf = req->rq_reqmsg = newbuf;
 		req->rq_reqbuf_len = alloc_size;
+
+		if (req->rq_import)
+			spin_unlock(&req->rq_import->imp_lock);
 	}
 
 	_sptlrpc_enlarge_msg_inplace(req->rq_reqmsg, segment, newsize);
diff --git a/drivers/staging/lustre/lustre/ptlrpc/sec_plain.c b/drivers/staging/lustre/lustre/ptlrpc/sec_plain.c
index 416401b..12c6cef 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/sec_plain.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/sec_plain.c
@@ -669,6 +669,15 @@ int plain_enlarge_reqbuf(struct ptlrpc_sec *sec,
 		if (newbuf == NULL)
 			return -ENOMEM;
 
+		/* Must lock this, so that otherwise unprotected change of
+		 * rq_reqmsg is not racing with parallel processing of
+		 * imp_replay_list traversing threads. See LU-3333
+		 * This is a bandaid at best, we really need to deal with this
+		 * in request enlarging code before unpacking that's already
+		 * there */
+		if (req->rq_import)
+			spin_lock(&req->rq_import->imp_lock);
+
 		memcpy(newbuf, req->rq_reqbuf, req->rq_reqbuf_len);
 
 		OBD_FREE_LARGE(req->rq_reqbuf, req->rq_reqbuf_len);
@@ -676,6 +685,9 @@ int plain_enlarge_reqbuf(struct ptlrpc_sec *sec,
 		req->rq_reqbuf_len = newbuf_size;
 		req->rq_reqmsg = lustre_msg_buf(req->rq_reqbuf,
 						PLAIN_PACK_MSG_OFF, 0);
+
+		if (req->rq_import)
+			spin_unlock(&req->rq_import->imp_lock);
 	}
 
 	_sptlrpc_enlarge_msg_inplace(req->rq_reqbuf, PLAIN_PACK_MSG_OFF,

* [PATCH 03/18] staging/lustre/llite: Only kill SGID/SUID bits
From: Oleg Drokin @ 2014-06-23  1:32 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Nathaniel Clark, Oleg Drokin

From: Nathaniel Clark <nathaniel.l.clark@intel.com>

Check that the attr mode is valid before using it to determine whether
to clear the SGID and SUID bits in ll_setattr().
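
The corrected check can be modeled as a pure function. The DEMO_ATTR_* values below are illustrative stand-ins for ATTR_MODE, ATTR_KILL_SUID and ATTR_KILL_SGID, not the kernel's values, and mode is assumed to hold the inode's current mode, as the mode variable does in ll_setattr().

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Illustrative flag values for this sketch only. */
#define DEMO_ATTR_MODE      0x1u
#define DEMO_ATTR_KILL_SUID 0x2u
#define DEMO_ATTR_KILL_SGID 0x4u

/* mode: current inode mode; ia_mode/ia_valid: the requested change.
 * Returns ia_valid with any kill bits added. */
unsigned int demo_kill_bits(unsigned int mode, unsigned int ia_mode,
			    unsigned int ia_valid)
{
	const unsigned int sgid_exec = S_ISGID | S_IXGRP;
	/* ia_mode is only meaningful when the mode is being set. */
	bool mode_valid = (ia_valid & DEMO_ATTR_MODE) != 0;

	if (mode_valid && (mode & S_ISUID) && !(ia_mode & S_ISUID))
		ia_valid |= DEMO_ATTR_KILL_SUID;

	/* SGID without group exec is not a privilege bit, leave it. */
	if (mode_valid && (mode & sgid_exec) == sgid_exec &&
	    !(ia_mode & S_ISGID))
		ia_valid |= DEMO_ATTR_KILL_SGID;

	return ia_valid;
}
```

With DEMO_ATTR_MODE clear (for example, a setattr that only changes the size), ia_mode is ignored and no kill bit is added, even for a setuid file.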

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-on: http://review.whamcloud.com/10153
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4924
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/llite/llite_lib.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index deca27e..7eadd60 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1537,12 +1537,14 @@ int ll_setattr(struct dentry *de, struct iattr *attr)
 	      !(attr->ia_mode & S_ISGID))))
 		attr->ia_valid |= ATTR_FORCE;
 
-	if ((mode & S_ISUID) &&
+	if ((attr->ia_valid & ATTR_MODE) &&
+	    (mode & S_ISUID) &&
 	    !(attr->ia_mode & S_ISUID) &&
 	    !(attr->ia_valid & ATTR_KILL_SUID))
 		attr->ia_valid |= ATTR_KILL_SUID;
 
-	if (((mode & (S_ISGID|S_IXGRP)) == (S_ISGID|S_IXGRP)) &&
+	if ((attr->ia_valid & ATTR_MODE) &&
+	    ((mode & (S_ISGID|S_IXGRP)) == (S_ISGID|S_IXGRP)) &&
 	    !(attr->ia_mode & S_ISGID) &&
 	    !(attr->ia_valid & ATTR_KILL_SGID))
 		attr->ia_valid |= ATTR_KILL_SGID;

* [PATCH 04/18] staging/lustre: fix frong ldlm flags type used
From: Oleg Drokin @ 2014-06-23  1:32 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Dmitry Eremin, Oleg Drokin

From: Dmitry Eremin <dmitry.eremin@intel.com>

Fix implicit conversions of ldlm lock flags from 'unsigned long long' to 'int'.
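
A minimal sketch of why the narrowing matters, assuming a hypothetical flag above bit 31. It uses uint32_t rather than int so that the conversion is well-defined in standard C; with int the result is implementation-defined, and GCC likewise keeps only the low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit lock flag living above bit 31. */
#define DEMO_FL_HIGH (1ULL << 40)

/* Old-style prototype: flags are narrowed to 32 bits on the way in. */
int demo_has_high_narrow(uint32_t flags)
{
	return (flags & DEMO_FL_HIGH) != 0;
}

/* Fixed prototype: flags keep all 64 bits. */
int demo_has_high_wide(uint64_t flags)
{
	return (flags & DEMO_FL_HIGH) != 0;
}
```

A flags value carrying DEMO_FL_HIGH silently loses that bit when passed through the 32-bit prototype, which is why the patch widens these parameters and locals to __u64 (and prints them with %#llx).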

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: http://review.whamcloud.com/7799
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4023
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/include/lustre_dlm.h |  2 +-
 drivers/staging/lustre/lustre/include/obd_class.h  |  2 +-
 drivers/staging/lustre/lustre/include/obd_ost.h    |  4 ++++
 drivers/staging/lustre/lustre/ldlm/ldlm_request.c  |  2 +-
 drivers/staging/lustre/lustre/llite/file.c         |  6 +++---
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  2 +-
 drivers/staging/lustre/lustre/lov/lov_internal.h   |  2 +-
 drivers/staging/lustre/lustre/lov/lov_request.c    | 10 +++-------
 drivers/staging/lustre/lustre/osc/osc_internal.h   |  2 +-
 drivers/staging/lustre/lustre/osc/osc_page.c       |  2 +-
 drivers/staging/lustre/lustre/osc/osc_request.c    | 10 +++++-----
 11 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm.h b/drivers/staging/lustre/lustre/include/lustre_dlm.h
index 0c6b784..bf5b2cb 100644
--- a/drivers/staging/lustre/lustre/include/lustre_dlm.h
+++ b/drivers/staging/lustre/lustre/include/lustre_dlm.h
@@ -1390,7 +1390,7 @@ int ldlm_cli_cancel_req(struct obd_export *exp, struct list_head *head,
 int ldlm_cancel_resource_local(struct ldlm_resource *res,
 			       struct list_head *cancels,
 			       ldlm_policy_data_t *policy,
-			       ldlm_mode_t mode, int lock_flags,
+			       ldlm_mode_t mode, __u64 lock_flags,
 			       ldlm_cancel_flags_t cancel_flags, void *opaque);
 int ldlm_cli_cancel_list_local(struct list_head *cancels, int count,
 			       ldlm_cancel_flags_t flags);
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index e265820..f8a9d7c 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -1818,7 +1818,7 @@ static inline int md_enqueue(struct obd_export *exp,
 			     struct lustre_handle *lockh,
 			     void *lmm, int lmmsize,
 			     struct ptlrpc_request **req,
-			     int extra_lock_flags)
+			     __u64 extra_lock_flags)
 {
 	int rc;
 
diff --git a/drivers/staging/lustre/lustre/include/obd_ost.h b/drivers/staging/lustre/lustre/include/obd_ost.h
index af89843..54ef540 100644
--- a/drivers/staging/lustre/lustre/include/obd_ost.h
+++ b/drivers/staging/lustre/lustre/include/obd_ost.h
@@ -87,6 +87,10 @@ struct osc_enqueue_args {
 	unsigned int	      oa_agl:1;
 };
 
+extern void osc_update_enqueue(struct lustre_handle *lov_lockhp,
+			       struct lov_oinfo *loi, __u64 flags,
+			       struct ost_lvb *lvb, __u32 mode, int rc);
+
 #if 0
 int osc_extent_blocking_cb(struct ldlm_lock *lock,
 			   struct ldlm_lock_desc *new, void *data,
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
index fcc7a99..3accbce 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
@@ -1768,7 +1768,7 @@ int ldlm_cancel_lru(struct ldlm_namespace *ns, int nr,
 int ldlm_cancel_resource_local(struct ldlm_resource *res,
 			       struct list_head *cancels,
 			       ldlm_policy_data_t *policy,
-			       ldlm_mode_t mode, int lock_flags,
+			       ldlm_mode_t mode, __u64 lock_flags,
 			       ldlm_cancel_flags_t cancel_flags, void *opaque)
 {
 	struct ldlm_lock *lock;
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 716e1ee..660fd4d 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -290,7 +290,7 @@ static int ll_md_close(struct obd_export *md_exp, struct inode *inode,
 	   we can skip talking to MDS */
 	if (file->f_dentry->d_inode) { /* Can this ever be false? */
 		int lockmode;
-		int flags = LDLM_FL_BLOCK_GRANTED | LDLM_FL_TEST_LOCK;
+		__u64 flags = LDLM_FL_BLOCK_GRANTED | LDLM_FL_TEST_LOCK;
 		struct lustre_handle lockh;
 		struct inode *inode = file->f_dentry->d_inode;
 		ldlm_policy_data_t policy = {.l_inodebits={MDS_INODELOCK_OPEN}};
@@ -2623,7 +2623,7 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 	struct md_op_data *op_data;
 	struct lustre_handle lockh = {0};
 	ldlm_policy_data_t flock = {{0}};
-	int flags = 0;
+	__u64 flags = 0;
 	int rc;
 	int rc2 = 0;
 
@@ -2708,7 +2708,7 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
-	CDEBUG(D_DLMTRACE, "inode=%lu, pid=%u, flags=%#x, mode=%u, "
+	CDEBUG(D_DLMTRACE, "inode=%lu, pid=%u, flags=%#llx, mode=%u, "
 	       "start="LPU64", end="LPU64"\n", inode->i_ino, flock.l_flock.pid,
 	       flags, einfo.ei_mode, flock.l_flock.start, flock.l_flock.end);
 
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 4edf8a3..c17a49e 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -1715,7 +1715,7 @@ static int
 lmv_enqueue_remote(struct obd_export *exp, struct ldlm_enqueue_info *einfo,
 		   struct lookup_intent *it, struct md_op_data *op_data,
 		   struct lustre_handle *lockh, void *lmm, int lmmsize,
-		   int extra_lock_flags)
+		   __u64 extra_lock_flags)
 {
 	struct ptlrpc_request      *req = it->d.lustre.it_data;
 	struct obd_device	  *obd = exp->exp_obd;
diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index 38508a5..2232643 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -252,7 +252,7 @@ int lov_prep_match_set(struct obd_export *exp, struct obd_info *oinfo,
 		       ldlm_policy_data_t *policy, __u32 mode,
 		       struct lustre_handle *lockh,
 		       struct lov_request_set **reqset);
-int lov_fini_match_set(struct lov_request_set *set, __u32 mode, int flags);
+int lov_fini_match_set(struct lov_request_set *set, __u32 mode, __u64 flags);
 int lov_prep_cancel_set(struct obd_export *exp, struct obd_info *oinfo,
 			struct lov_stripe_md *lsm,
 			__u32 mode, struct lustre_handle *lockh,
diff --git a/drivers/staging/lustre/lustre/lov/lov_request.c b/drivers/staging/lustre/lustre/lov/lov_request.c
index bd6490d..984f4c3 100644
--- a/drivers/staging/lustre/lustre/lov/lov_request.c
+++ b/drivers/staging/lustre/lustre/lov/lov_request.c
@@ -39,8 +39,8 @@
 #include <linux/libcfs/libcfs.h>
 
 #include <obd_class.h>
+#include <obd_ost.h>
 #include <lustre/lustre_idl.h>
-
 #include "lov_internal.h"
 
 static void lov_init_set(struct lov_request_set *set)
@@ -194,13 +194,9 @@ out:
 	return rc;
 }
 
-extern void osc_update_enqueue(struct lustre_handle *lov_lockhp,
-			       struct lov_oinfo *loi, int flags,
-			       struct ost_lvb *lvb, __u32 mode, int rc);
-
 static int lov_update_enqueue_lov(struct obd_export *exp,
 				  struct lustre_handle *lov_lockhp,
-				  struct lov_oinfo *loi, int flags, int idx,
+				  struct lov_oinfo *loi, __u64 flags, int idx,
 				  struct ost_id *oi, int rc)
 {
 	struct lov_obd *lov = &exp->exp_obd->u.lov;
@@ -443,7 +439,7 @@ out_set:
 	return rc;
 }
 
-int lov_fini_match_set(struct lov_request_set *set, __u32 mode, int flags)
+int lov_fini_match_set(struct lov_request_set *set, __u32 mode, __u64 flags)
 {
 	int rc = 0;
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_internal.h b/drivers/staging/lustre/lustre/osc/osc_internal.h
index efc5db4..9c4a189 100644
--- a/drivers/staging/lustre/lustre/osc/osc_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_internal.h
@@ -112,7 +112,7 @@ int osc_cancel_base(struct lustre_handle *lockh, __u32 mode);
 
 int osc_match_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 		   __u32 type, ldlm_policy_data_t *policy, __u32 mode,
-		   int *flags, void *data, struct lustre_handle *lockh,
+		   __u64 *flags, void *data, struct lustre_handle *lockh,
 		   int unref);
 
 int osc_setattr_async_base(struct obd_export *exp, struct obd_info *oinfo,
diff --git a/drivers/staging/lustre/lustre/osc/osc_page.c b/drivers/staging/lustre/lustre/osc/osc_page.c
index 96cb6e2..71a2447 100644
--- a/drivers/staging/lustre/lustre/osc/osc_page.c
+++ b/drivers/staging/lustre/lustre/osc/osc_page.c
@@ -70,7 +70,7 @@ static int osc_page_is_dlocked(const struct lu_env *env,
 	struct lustre_handle   *lockh;
 	ldlm_policy_data_t     *policy;
 	ldlm_mode_t	     dlmmode;
-	int		     flags;
+	__u64                   flags;
 
 	might_sleep();
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 294db84..5804104 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -635,7 +635,7 @@ static int osc_sync(const struct lu_env *env, struct obd_export *exp,
  * locks added to @cancels list. */
 static int osc_resource_get_unused(struct obd_export *exp, struct obdo *oa,
 				   struct list_head *cancels,
-				   ldlm_mode_t mode, int lock_flags)
+				   ldlm_mode_t mode, __u64 lock_flags)
 {
 	struct ldlm_namespace *ns = exp->exp_obd->obd_namespace;
 	struct ldlm_res_id res_id;
@@ -2398,7 +2398,7 @@ static int osc_enqueue_interpret(const struct lu_env *env,
 }
 
 void osc_update_enqueue(struct lustre_handle *lov_lockhp,
-			struct lov_oinfo *loi, int flags,
+			struct lov_oinfo *loi, __u64 flags,
 			struct ost_lvb *lvb, __u32 mode, int rc)
 {
 	struct ldlm_lock *lock = ldlm_handle2lock(lov_lockhp);
@@ -2462,7 +2462,7 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 	struct obd_device *obd = exp->exp_obd;
 	struct ptlrpc_request *req = NULL;
 	int intent = *flags & LDLM_FL_HAS_INTENT;
-	int match_lvb = (agl != 0 ? 0 : LDLM_FL_LVB_READY);
+	__u64 match_lvb = (agl != 0 ? 0 : LDLM_FL_LVB_READY);
 	ldlm_mode_t mode;
 	int rc;
 
@@ -2613,11 +2613,11 @@ static int osc_enqueue(struct obd_export *exp, struct obd_info *oinfo,
 
 int osc_match_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 		   __u32 type, ldlm_policy_data_t *policy, __u32 mode,
-		   int *flags, void *data, struct lustre_handle *lockh,
+		   __u64 *flags, void *data, struct lustre_handle *lockh,
 		   int unref)
 {
 	struct obd_device *obd = exp->exp_obd;
-	int lflags = *flags;
+	__u64 lflags = *flags;
 	ldlm_mode_t rc;
 
 	if (OBD_FAIL_CHECK(OBD_FAIL_OSC_MATCH))

* [PATCH 05/18] staging/lustre/ptlrpc: fix NULL pointer dereference of {exp,imp}_obd
From: Oleg Drokin @ 2014-06-23  1:32 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Dmitry Eremin, Oleg Drokin

From: Dmitry Eremin <dmitry.eremin@intel.com>

The pointer 'obd', checked for NULL at line 694, may be dereferenced
at line 813.

The pointer 'req->rq_export->exp_obd', checked for NULL at line 1155,
may be dereferenced at line 1164. There is one similar error on
line 1170.
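
The shape of the fix, sketched with hypothetical, trimmed-down structures (struct demo_obd and struct demo_export are not the Lustre types): load the pointer into a local once, test it once, and make sure no later branch, including the debug message, dereferences it when it is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for obd_device / obd_export. */
struct demo_obd {
	int         fail;
	const char *name;
};

struct demo_export {
	struct demo_obd *obd;	/* may be NULL */
};

/* Returns the name to log when the request must be dropped, or NULL
 * if it may proceed. A missing obd is treated as a failed one. */
const char *demo_drop_reason(const struct demo_export *exp)
{
	const struct demo_obd *obd;

	assert(exp != NULL);
	obd = exp->obd;		/* load once */

	if (obd == NULL || obd->fail)
		return obd != NULL ? obd->name : "unknown";

	return NULL;
}
```

Caching the pointer in a local also keeps later checks short, as in the service.c hunk where !obd->obd_recovering replaces the repeated req->rq_export->exp_obd chain.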

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: http://review.whamcloud.com/10062
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4629
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/ptlrpc/niobuf.c  |  7 +++----
 drivers/staging/lustre/lustre/ptlrpc/service.c | 10 +++++-----
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/niobuf.c b/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
index a47a8d8..ef18639 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
@@ -506,10 +506,9 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 	 * cleanly from the previous attempt */
 	LASSERT(!request->rq_receiving_reply);
 
-	if (request->rq_import->imp_obd &&
-	    request->rq_import->imp_obd->obd_fail) {
+	if (unlikely(obd != NULL && obd->obd_fail)) {
 		CDEBUG(D_HA, "muting rpc for failed imp obd %s\n",
-		       request->rq_import->imp_obd->obd_name);
+			obd->obd_name);
 		/* this prevents us from waiting in ptlrpc_queue_wait */
 		spin_lock(&request->rq_lock);
 		request->rq_err = 1;
@@ -625,7 +624,7 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 
 	/* add references on request for request_out_callback */
 	ptlrpc_request_addref(request);
-	if (obd->obd_svc_stats != NULL)
+	if (obd != NULL && obd->obd_svc_stats != NULL)
 		lprocfs_counter_add(obd->obd_svc_stats, PTLRPC_REQACTIVE_CNTR,
 			atomic_read(&request->rq_import->imp_inflight));
 
diff --git a/drivers/staging/lustre/lustre/ptlrpc/service.c b/drivers/staging/lustre/lustre/ptlrpc/service.c
index d278f2e..214daa2 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/service.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/service.c
@@ -1100,6 +1100,7 @@ static void ptlrpc_update_export_timer(struct obd_export *exp, long extra_delay)
  */
 static int ptlrpc_check_req(struct ptlrpc_request *req)
 {
+	struct obd_device *obd = req->rq_export->exp_obd;
 	int rc = 0;
 
 	if (unlikely(lustre_msg_get_conn_cnt(req->rq_reqmsg) <
@@ -1110,24 +1111,23 @@ static int ptlrpc_check_req(struct ptlrpc_request *req)
 			  req->rq_export->exp_conn_cnt);
 		return -EEXIST;
 	}
-	if (unlikely(req->rq_export->exp_obd &&
-		     req->rq_export->exp_obd->obd_fail)) {
+	if (unlikely(obd == NULL || obd->obd_fail)) {
 		/*
 		 * Failing over, don't handle any more reqs, send
 		 * error response instead.
 		 */
 		CDEBUG(D_RPCTRACE, "Dropping req %p for failed obd %s\n",
-		       req, req->rq_export->exp_obd->obd_name);
+		       req, (obd != NULL) ? obd->obd_name : "unknown");
 		rc = -ENODEV;
 	} else if (lustre_msg_get_flags(req->rq_reqmsg) &
 		   (MSG_REPLAY | MSG_REQ_REPLAY_DONE) &&
-		   !(req->rq_export->exp_obd->obd_recovering)) {
+		   !obd->obd_recovering) {
 			DEBUG_REQ(D_ERROR, req,
 				  "Invalid replay without recovery");
 			class_fail_export(req->rq_export);
 			rc = -ENODEV;
 	} else if (lustre_msg_get_transno(req->rq_reqmsg) != 0 &&
-		   !(req->rq_export->exp_obd->obd_recovering)) {
+		   !obd->obd_recovering) {
 			DEBUG_REQ(D_ERROR, req, "Invalid req with transno "
 				  LPU64" without recovery",
 				  lustre_msg_get_transno(req->rq_reqmsg));

* [PATCH 06/18] staging/lustre/mgc: mgc import reconnect race
From: Oleg Drokin @ 2014-06-23  1:32 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Andriy Skulysh, Oleg Drokin

From: Andriy Skulysh <Andriy_Skulysh@xyratex.com>

The MGC import can be reconnected either by the pinger or by
ptlrpc_reconnect_import().
ptlrpc_invalidate_import() isn't protected against concurrent
changes to the imp_invalid state: the pinger can reconnect the
import, which sets imp_invalid back to false, and then
LASSERT(imp->imp_invalid) fails in ptlrpc_invalidate_import().

It is safe to call ptlrpc_invalidate_import() while the import
is deactivated, but ptlrpc_reconnect_import() doesn't
deactivate it.
When the pinger is available, use only the pinger to reconnect
the import.

Signed-off-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com>
Reviewed-on: http://review.whamcloud.com/9967
Xyratex-bug-id: MRP-1746
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4913
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/obdclass/obd_mount.c | 13 ++-----
 drivers/staging/lustre/lustre/ptlrpc/import.c      | 41 +++++++++++++++++-----
 drivers/staging/lustre/lustre/ptlrpc/pinger.c      |  5 +++
 3 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/obd_mount.c b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
index a034aee..03d9a6a 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_mount.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
@@ -219,7 +219,6 @@ int lustre_start_mgc(struct super_block *sb)
 	lnet_nid_t nid;
 	char *mgcname = NULL, *niduuid = NULL, *mgssec = NULL;
 	char *ptr;
-	int recov_bk;
 	int rc = 0, i = 0, j, len;
 
 	LASSERT(lsi->lsi_lmd);
@@ -269,6 +268,8 @@ int lustre_start_mgc(struct super_block *sb)
 
 	obd = class_name2obd(mgcname);
 	if (obd && !obd->obd_stopping) {
+		int recov_bk;
+
 		rc = obd_set_info_async(NULL, obd->obd_self_export,
 					strlen(KEY_MGSSEC), KEY_MGSSEC,
 					strlen(mgssec), mgssec, NULL);
@@ -429,16 +430,6 @@ int lustre_start_mgc(struct super_block *sb)
 	   so we know when we can get rid of the mgc. */
 	atomic_set(&obd->u.cli.cl_mgc_refcount, 1);
 
-	/* Try all connections, but only once. */
-	recov_bk = 1;
-	rc = obd_set_info_async(NULL, obd->obd_self_export,
-				sizeof(KEY_INIT_RECOV_BACKUP),
-				KEY_INIT_RECOV_BACKUP,
-				sizeof(recov_bk), &recov_bk, NULL);
-	if (rc)
-		/* nonfatal */
-		CWARN("can't set %s %d\n", KEY_INIT_RECOV_BACKUP, rc);
-
 	/* We connect to the MGS at setup, and don't disconnect until cleanup */
 	data->ocd_connect_flags = OBD_CONNECT_VERSION | OBD_CONNECT_AT |
 				  OBD_CONNECT_FULL20 | OBD_CONNECT_IMP_RECOV |
diff --git a/drivers/staging/lustre/lustre/ptlrpc/import.c b/drivers/staging/lustre/lustre/ptlrpc/import.c
index 8573f32..b4def8a 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/import.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/import.c
@@ -275,6 +275,7 @@ void ptlrpc_invalidate_import(struct obd_import *imp)
 	if (!imp->imp_invalid || imp->imp_obd->obd_no_recov)
 		ptlrpc_deactivate_import(imp);
 
+	CFS_FAIL_TIMEOUT(OBD_FAIL_MGS_CONNECT_NET, 3 * cfs_fail_val / 2);
 	LASSERT(imp->imp_invalid);
 
 	/* Wait forever until inflight == 0. We really can't do it another
@@ -392,6 +393,19 @@ void ptlrpc_activate_import(struct obd_import *imp)
 }
 EXPORT_SYMBOL(ptlrpc_activate_import);
 
+static void ptlrpc_pinger_force(struct obd_import *imp)
+{
+	CDEBUG(D_HA, "%s: waking up pinger s:%s\n", obd2cli_tgt(imp->imp_obd),
+	       ptlrpc_import_state_name(imp->imp_state));
+
+	spin_lock(&imp->imp_lock);
+	imp->imp_force_verify = 1;
+	spin_unlock(&imp->imp_lock);
+
+	if (imp->imp_state != LUSTRE_IMP_CONNECTING)
+		ptlrpc_pinger_wake_up();
+}
+
 void ptlrpc_fail_import(struct obd_import *imp, __u32 conn_cnt)
 {
 	LASSERT(!imp->imp_dlm_fake);
@@ -406,20 +420,30 @@ void ptlrpc_fail_import(struct obd_import *imp, __u32 conn_cnt)
 			ptlrpc_deactivate_import(imp);
 		}
 
-		CDEBUG(D_HA, "%s: waking up pinger\n",
-		       obd2cli_tgt(imp->imp_obd));
-
-		spin_lock(&imp->imp_lock);
-		imp->imp_force_verify = 1;
-		spin_unlock(&imp->imp_lock);
-
-		ptlrpc_pinger_wake_up();
+		ptlrpc_pinger_force(imp);
 	}
 }
 EXPORT_SYMBOL(ptlrpc_fail_import);
 
 int ptlrpc_reconnect_import(struct obd_import *imp)
 {
+#ifdef ENABLE_PINGER
+	struct l_wait_info lwi;
+	int secs = cfs_time_seconds(obd_timeout);
+	int rc;
+
+	ptlrpc_pinger_force(imp);
+
+	CDEBUG(D_HA, "%s: recovery started, waiting %u seconds\n",
+	       obd2cli_tgt(imp->imp_obd), secs);
+
+	lwi = LWI_TIMEOUT(secs, NULL, NULL);
+	rc = l_wait_event(imp->imp_recovery_waitq,
+			  !ptlrpc_import_in_recovery(imp), &lwi);
+	CDEBUG(D_HA, "%s: recovery finished s:%s\n", obd2cli_tgt(imp->imp_obd),
+	       ptlrpc_import_state_name(imp->imp_state));
+	return rc;
+#else
 	ptlrpc_set_import_discon(imp, 0);
 	/* Force a new connect attempt */
 	ptlrpc_invalidate_import(imp);
@@ -444,6 +468,7 @@ int ptlrpc_reconnect_import(struct obd_import *imp)
 	/* Attempt a new connect */
 	ptlrpc_recover_import(imp, NULL, 0);
 	return 0;
+#endif
 }
 EXPORT_SYMBOL(ptlrpc_reconnect_import);
 
diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
index 38099d9..2898087 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
@@ -224,6 +224,11 @@ static void ptlrpc_pinger_process_import(struct obd_import *imp,
 		       "or recovery disabled: %s)\n",
 		       imp->imp_obd->obd_uuid.uuid, obd2cli_tgt(imp->imp_obd),
 		       ptlrpc_import_state_name(level));
+		if (force) {
+			spin_lock(&imp->imp_lock);
+			imp->imp_force_verify = 1;
+			spin_unlock(&imp->imp_lock);
+		}
 	} else if ((imp->imp_pingable && !suppress) || force_next || force) {
 		ptlrpc_ping(imp);
 	}
-- 
1.9.0
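In the ENABLE_PINGER case, the patched ptlrpc_reconnect_import() no longer reconnects the import itself. It sets a "force verify" flag under the import lock, lets the pinger do the reconnect, and waits with a timeout for recovery to finish. A minimal userspace sketch of that hand-off follows. POSIX mutexes and condition variables stand in for imp_lock, l_wait_event() and LWI_TIMEOUT. The struct and every function name are hypothetical, and `in_recovery` is a single flag standing in for ptlrpc_import_in_recovery().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of an import; not the Lustre struct obd_import. */
struct import {
	pthread_mutex_t lock;           /* stands in for imp_lock */
	pthread_cond_t recovery_waitq;  /* stands in for imp_recovery_waitq */
	bool force_verify;              /* like imp_force_verify */
	bool in_recovery;               /* like ptlrpc_import_in_recovery() */
};

/* Like ptlrpc_pinger_force(): set the flag under the lock. The real code
 * then wakes the pinger unless the import is already connecting. */
static void pinger_force(struct import *imp)
{
	pthread_mutex_lock(&imp->lock);
	imp->force_verify = true;
	pthread_mutex_unlock(&imp->lock);
}

/* What the pinger side would do once reconnection has finished. */
static void recovery_done(struct import *imp)
{
	pthread_mutex_lock(&imp->lock);
	imp->in_recovery = false;
	imp->force_verify = false;
	pthread_cond_broadcast(&imp->recovery_waitq);
	pthread_mutex_unlock(&imp->lock);
}

/* Like the ENABLE_PINGER branch of ptlrpc_reconnect_import(): hand the
 * reconnect to the pinger, then wait at most `secs` seconds for recovery
 * to end. Returns 0 or ETIMEDOUT. */
static int reconnect_import(struct import *imp, int secs)
{
	struct timespec deadline;
	int rc = 0;

	assert(secs >= 0);
	pinger_force(imp);

	clock_gettime(CLOCK_REALTIME, &deadline); /* default condvar clock */
	deadline.tv_sec += secs;

	pthread_mutex_lock(&imp->lock);
	while (imp->in_recovery && rc == 0)
		rc = pthread_cond_timedwait(&imp->recovery_waitq,
					    &imp->lock, &deadline);
	if (!imp->in_recovery)
		rc = 0;  /* finished, even if the wait also timed out */
	pthread_mutex_unlock(&imp->lock);
	return rc;
}
```

Only one side, the pinger, ever moves the import out of the invalid state, so ptlrpc_invalidate_import() is no longer raced by a second reconnect path.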



* [PATCH 07/18] staging/lustre/osc: get rid of old checksum initial value
From: Oleg Drokin @ 2014-06-23  1:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Bobi Jam, Oleg Drokin

From: Bobi Jam <bobijam.xu@intel.com>

Leftover code assumes the initial checksum value is ~0 and relies on
that to check whether the OST has calculated the bulk data checksum.
That is no longer the case.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Reviewed-on: http://review.whamcloud.com/10354
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4937
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/osc/osc_request.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 5804104..90e8912 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -1570,12 +1570,7 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 			router = libcfs_nid2str(req->rq_bulk->bd_sender);
 		}
 
-		if (server_cksum == ~0 && rc > 0) {
-			CERROR("Protocol error: server %s set the 'checksum' "
-			       "bit, but didn't send a checksum.  Not fatal, "
-			       "but please notify on http://bugs.whamcloud.com/\n",
-			       libcfs_nid2str(peer->nid));
-		} else if (server_cksum != client_cksum) {
+		if (server_cksum != client_cksum) {
 			LCONSOLE_ERROR_MSG(0x133, "%s: BAD READ CHECKSUM: from "
 					   "%s%s%s inode "DFID" object "DOSTID
 					   " extent ["LPU64"-"LPU64"]\n",
-- 
1.9.0
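The small sketch below compares the old and new decision logic from osc_brw_fini_request(). It shows what dropping the ~0 special case changes. The function names and the enum are hypothetical, and `rc > 0` mirrors the condition in the removed branch.

```c
#include <assert.h>
#include <stdint.h>

enum cksum_result { CKSUM_MATCH, CKSUM_MISMATCH, CKSUM_MISSING };

/* Old logic (removed by the patch): ~0 was read as "the server set the
 * checksum flag but sent no checksum", whatever the client computed. */
static enum cksum_result old_check(uint32_t server, uint32_t client, int rc)
{
	if (server == ~0U && rc > 0)
		return CKSUM_MISSING;
	return server == client ? CKSUM_MATCH : CKSUM_MISMATCH;
}

/* New logic: ~0 is compared like any other value. */
static enum cksum_result new_check(uint32_t server, uint32_t client)
{
	return server == client ? CKSUM_MATCH : CKSUM_MISMATCH;
}
```

With the old logic, a server checksum of ~0 was always reported as a protocol error. That happened even when it differed from the client's value, which hid a genuine BAD READ CHECKSUM. It also happened when the two values matched.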



* [PATCH 08/18] staging/lustre/ptlrpc: race at req processing
From: Oleg Drokin @ 2014-06-23  1:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Alexander.Boyko, Oleg Drokin

From: "Alexander.Boyko" <alexander_boyko@xyratex.com>

There is a race between ptlrpc_resend_req() and ptlrpc_check_set():
1. thread 1 runs ptlrpc_check_set()->after_reply();
2. thread 2 runs ptlrpc_resend_req().
The result is a request with rq_resend = 1 and the MSG_REPLAY flag set.
When such a request reaches the server, it causes client eviction.
This patch skips the ptlrpc_resend_req() logic if rq_replied is set,
and clears the rq_resend flag in reply_in_callback() when the client
gets a reply.

Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Xyratex-bug-id: MRP-1888
Reviewed-on: http://review.whamcloud.com/10471
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5116
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/ptlrpc/client.c | 11 ++++++++++-
 drivers/staging/lustre/lustre/ptlrpc/events.c |  2 ++
 drivers/staging/lustre/lustre/ptlrpc/niobuf.c |  2 ++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index 7246e8c..d806257 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -2530,10 +2530,19 @@ EXPORT_SYMBOL(ptlrpc_cleanup_client);
 void ptlrpc_resend_req(struct ptlrpc_request *req)
 {
 	DEBUG_REQ(D_HA, req, "going to resend");
+	spin_lock(&req->rq_lock);
+
+	/* Request got reply but linked to the import list still.
+	   Let ptlrpc_check_set() to process it. */
+	if (ptlrpc_client_replied(req)) {
+		spin_unlock(&req->rq_lock);
+		DEBUG_REQ(D_HA, req, "it has reply, so skip it");
+		return;
+	}
+
 	lustre_msg_set_handle(req->rq_reqmsg, &(struct lustre_handle){ 0 });
 	req->rq_status = -EAGAIN;
 
-	spin_lock(&req->rq_lock);
 	req->rq_resend = 1;
 	req->rq_net_err = 0;
 	req->rq_timedout = 0;
diff --git a/drivers/staging/lustre/lustre/ptlrpc/events.c b/drivers/staging/lustre/lustre/ptlrpc/events.c
index aa85239..9f9b8d1 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/events.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/events.c
@@ -145,6 +145,8 @@ void reply_in_callback(lnet_event_t *ev)
 		/* Real reply */
 		req->rq_rep_swab_mask = 0;
 		req->rq_replied = 1;
+		/* Got reply, no resend required */
+		req->rq_resend = 0;
 		req->rq_reply_off = ev->offset;
 		req->rq_nob_received = ev->mlength;
 		/* LNetMDUnlink can't be called under the LNET_LOCK,
diff --git a/drivers/staging/lustre/lustre/ptlrpc/niobuf.c b/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
index ef18639..f760504 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
@@ -505,6 +505,8 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 	/* If this is a re-transmit, we're required to have disengaged
 	 * cleanly from the previous attempt */
 	LASSERT(!request->rq_receiving_reply);
+	LASSERT(!((lustre_msg_get_flags(request->rq_reqmsg) & MSG_REPLAY) &&
+		(request->rq_import->imp_state == LUSTRE_IMP_FULL)));
 
 	if (unlikely(obd != NULL && obd->obd_fail)) {
 		CDEBUG(D_HA, "muting rpc for failed imp obd %s\n",
-- 
1.9.0
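The key change in ptlrpc_resend_req() is that the "already replied?" test and the resend update now happen inside one critical section. The sketch below models this in userspace with a pthread mutex in place of rq_lock. A plain `replied` flag stands in for ptlrpc_client_replied(), which may check more state than a single field. The struct and functions are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical request model; field names mirror the patch. */
struct request {
	pthread_mutex_t lock;  /* stands in for rq_lock */
	bool replied;          /* rq_replied */
	bool resend;           /* rq_resend */
};

/* Like the patched ptlrpc_resend_req(): check and update under one lock,
 * so a reply that already arrived is not turned into a resend.
 * Returns true if the request was marked for resend. */
static bool resend_req(struct request *req)
{
	pthread_mutex_lock(&req->lock);
	if (req->replied) {
		pthread_mutex_unlock(&req->lock);
		return false;  /* leave it to the reply path */
	}
	req->resend = true;
	pthread_mutex_unlock(&req->lock);
	return true;
}

/* Like the reply_in_callback() change: a real reply cancels a resend. */
static void reply_in(struct request *req)
{
	pthread_mutex_lock(&req->lock);
	req->replied = true;
	req->resend = false;
	pthread_mutex_unlock(&req->lock);
}
```

Clearing `resend` on reply covers the other ordering: if the resend is marked first and the reply arrives afterwards, the reply wins.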



* [PATCH 09/18] staging/lustre/mgc: replace hard-coded MGC_ENQUEUE_LIMIT value
From: Oleg Drokin @ 2014-06-23  1:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Cheng Shao, Oleg Drokin

From: Cheng Shao <cheng_shao@xyratex.com>

During client mount, the client sends an LDLM_ENQUEUE request to the
MGS with the send delay set to MGC_ENQUEUE_LIMIT, which is hard-coded
to 50 seconds. The pinger interval, on the other hand, is derived from
obd_timeout, so when obd_timeout is configured to a longer period the
pinger interval grows as well. Connecting to the secondary MGS node is
triggered by the pinger. With a longer interval, the pinger does not
get a chance to try the secondary before the LDLM_ENQUEUE request,
still bounded by the fixed 50-second limit, fails the mount.

This change replaces the hard-coded send delay described above with a
value long enough to give the client a chance to connect to the
secondary MGS, if one exists.

Signed-off-by: Cheng Shao <cheng_shao@xyratex.com>
Reviewed-on: http://review.whamcloud.com/9217
Xyratex-bug-id: MRP-1516
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4582
Reviewed-by: Ryan Haasken <haasken@cray.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/mgc/mgc_request.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/mgc/mgc_request.c b/drivers/staging/lustre/lustre/mgc/mgc_request.c
index a806aef..28960f9 100644
--- a/drivers/staging/lustre/lustre/mgc/mgc_request.c
+++ b/drivers/staging/lustre/lustre/mgc/mgc_request.c
@@ -950,7 +950,10 @@ static int mgc_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc,
 }
 
 /* Not sure where this should go... */
-#define  MGC_ENQUEUE_LIMIT 50
+/* This is the timeout value for MGS_CONNECT request plus a ping interval, such
+ * that we can have a chance to try the secondary MGS if any. */
+#define  MGC_ENQUEUE_LIMIT (INITIAL_CONNECT_TIMEOUT + (AT_OFF ? 0 : at_min) \
+				+ PING_INTERVAL)
 #define  MGC_TARGET_REG_LIMIT 10
 #define  MGC_SEND_PARAM_LIMIT 10
 
-- 
1.9.0
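The new limit can be evaluated for a few settings. The helper below assumes INITIAL_CONNECT_TIMEOUT = max(CONNECTION_SWITCH_MIN, obd_timeout / 20) with CONNECTION_SWITCH_MIN = 5, PING_INTERVAL = max(obd_timeout / 4, 1) and AT_OFF = (at_max == 0). These definitions are recalled from the Lustre headers of this period rather than taken from the patch, so check them against the tree before relying on the numbers. The function name is hypothetical.

```c
#include <assert.h>

/* ASSUMED value, modelled on the Lustre headers; not from the patch. */
#define CONNECTION_SWITCH_MIN 5U

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Hypothetical evaluation of MGC_ENQUEUE_LIMIT in seconds. */
static unsigned int mgc_enqueue_limit(unsigned int obd_timeout,
				      unsigned int at_min, unsigned int at_max)
{
	/* ASSUMED: INITIAL_CONNECT_TIMEOUT */
	unsigned int initial_connect = max_u(CONNECTION_SWITCH_MIN,
					     obd_timeout / 20);
	/* ASSUMED: PING_INTERVAL */
	unsigned int ping_interval = max_u(obd_timeout / 4, 1U);
	/* ASSUMED: AT_OFF, adaptive timeouts disabled */
	int at_off = (at_max == 0);

	return initial_connect + (at_off ? 0 : at_min) + ping_interval;
}
```

Under those assumptions, obd_timeout = 100 with at_min = 0 gives 5 + 0 + 25 = 30 seconds, and obd_timeout = 300 gives 15 + 0 + 75 = 90 seconds. Unlike the old constant 50, the limit grows with obd_timeout, so it always covers at least one ping interval.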



* [PATCH 10/18] staging/lustre/ptlrpc: Add schedule point to ptlrpc_check_set()
From: Oleg Drokin @ 2014-06-23  1:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel
  Cc: Christopher J. Morrone, Oleg Drokin

From: "Christopher J. Morrone" <morrone2@llnl.gov>

Most ptlrpc sets are believed to be small and bounded in length.  However
at the very least the ptlrpcd reuses the ptlrpc sets as its primary work
queue.  This work queue can easily have work added faster than the ptlrpcd
thread can process the work.  The unbounded work can lead to the ptlrpcd
monopolizing a CPU for hundreds of seconds.  Obviously a well-behaved
kernel function should obey the scheduler and share the processor.

We address that problem by inserting a cond_resched() at the top of the
main loop of ptlrpc_check_set().

Some have suggested putting the cond_resched() lower in the loop.  However,
putting the call at the top of the loop is currently the only way to bound
the number of iterations that can run past our allocated run time.  Putting
it lower would allow an unknown number of passes through the loop (and since
it is unknown, it might at times be excessively large) before a resched is
allowed.

Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-on: http://review.whamcloud.com/10358
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5053
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/ptlrpc/client.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index d806257..1890482 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -1496,6 +1496,8 @@ static inline int ptlrpc_set_producer(struct ptlrpc_request_set *set)
  * and no more replies are expected.
  * (it is possible to get less replies than requests sent e.g. due to timed out
  * requests or requests that we had trouble to send out)
+ *
+ * NOTE: This function contains a potential schedule point (cond_resched()).
  */
 int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
 {
@@ -1513,6 +1515,14 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
 		int unregistered = 0;
 		int rc = 0;
 
+		/* This schedule point is mainly for the ptlrpcd caller of this
+		 * function.  Most ptlrpc sets are not long-lived and unbounded
+		 * in length, but at the least the set used by the ptlrpcd is.
+		 * Since the processing time is unbounded, we need to insert an
+		 * explicit schedule point to make the thread well-behaved.
+		 */
+		cond_resched();
+
 		if (req->rq_phase == RQ_PHASE_NEW &&
 		    ptlrpc_send_new_req(req)) {
 			force_timer_recalc = 1;
-- 
1.9.0
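The placement argument above can be illustrated with a userspace analogue of the loop. sched_yield() stands in for cond_resched(). Note the difference: sched_yield() always yields, while cond_resched() only yields when a reschedule is pending. The `work` struct and `process_set()` are hypothetical. The sketch reads the "unknown number of cycles" concern as coming from early `continue` paths in the loop body, which is an interpretation, not a statement from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical work item standing in for a request in a ptlrpc set. */
struct work {
	int done;
};

/* Userspace analogue of the ptlrpc_check_set() loop after the patch.
 * The yield sits at the top, so no `continue` further down can skip it
 * and at most one item is handled between two scheduling points. */
static size_t process_set(struct work *items, size_t n)
{
	size_t completed = 0;

	for (size_t i = 0; i < n; i++) {
		sched_yield();          /* schedule point first */

		if (items[i].done)
			continue;       /* would skip a yield placed below */

		items[i].done = 1;      /* placeholder for real processing */
		completed++;
	}
	return completed;
}
```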



* [PATCH 11/18] staging/lustre/obdclass: Fix uninitialized variables
From: Oleg Drokin @ 2014-06-23  1:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Dmitry Eremin, Oleg Drokin

From: Dmitry Eremin <dmitry.eremin@intel.com>

'sd.page_link' is used uninitialized in this function.
'ss.page_link' is used uninitialized in this function.
'sl.page_link' is used uninitialized in this function.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: http://review.whamcloud.com/10613
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4629
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/obdclass/capa.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/staging/lustre/lustre/obdclass/capa.c b/drivers/staging/lustre/lustre/obdclass/capa.c
index be1c613..cf1c497 100644
--- a/drivers/staging/lustre/lustre/obdclass/capa.c
+++ b/drivers/staging/lustre/lustre/obdclass/capa.c
@@ -279,6 +279,7 @@ int capa_hmac(__u8 *hmac, struct lustre_capa *capa, __u8 *key)
 	}
 	keylen = alg->ha_keylen;
 
+	sg_init_table(&sl, 1);
 	sg_set_page(&sl, virt_to_page(capa),
 		    offsetof(struct lustre_capa, lc_hmac),
 		    (unsigned long)(capa) % PAGE_CACHE_SIZE);
@@ -320,9 +321,11 @@ int capa_encrypt_id(__u32 *d, __u32 *s, __u8 *key, int keylen)
 		GOTO(out, rc);
 	}
 
+	sg_init_table(&sd, 1);
 	sg_set_page(&sd, virt_to_page(d), 16,
 		    (unsigned long)(d) % PAGE_CACHE_SIZE);
 
+	sg_init_table(&ss, 1);
 	sg_set_page(&ss, virt_to_page(s), 16,
 		    (unsigned long)(s) % PAGE_CACHE_SIZE);
 	desc.tfm   = tfm;
@@ -370,9 +373,11 @@ int capa_decrypt_id(__u32 *d, __u32 *s, __u8 *key, int keylen)
 		GOTO(out, rc);
 	}
 
+	sg_init_table(&sd, 1);
 	sg_set_page(&sd, virt_to_page(d), 16,
 		    (unsigned long)(d) % PAGE_CACHE_SIZE);
 
+	sg_init_table(&ss, 1);
 	sg_set_page(&ss, virt_to_page(s), 16,
 		    (unsigned long)(s) % PAGE_CACHE_SIZE);
 
-- 
1.9.0
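The toy userspace model below shows why calling sg_set_page() on a stack scatterlist that was never initialized is a problem. It assumes the kernel layout of this period: the two low bits of page_link hold the chain and end markers, and sg_set_page() (through sg_assign_page()) preserves those bits. All toy_* names are hypothetical. With CONFIG_DEBUG_SG, the real sg_assign_page() also checks an sg_magic value that only sg_init_table() sets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of struct scatterlist: low bits of page_link are flags. */
#define SG_CHAIN 0x1UL
#define SG_END   0x2UL

struct toy_sg {
	uintptr_t page_link;
	unsigned int offset;
	unsigned int length;
};

/* Like sg_init_table(): zero every entry and mark the last one. */
static void toy_sg_init_table(struct toy_sg *sg, unsigned int nents)
{
	assert(nents > 0);
	memset(sg, 0, sizeof(*sg) * nents);
	sg[nents - 1].page_link |= SG_END;
}

/* Like sg_set_page(): keeps whatever flag bits page_link already had. */
static void toy_sg_set_page(struct toy_sg *sg, uintptr_t page,
			    unsigned int len, unsigned int off)
{
	uintptr_t flags = sg->page_link & (SG_CHAIN | SG_END);

	sg->page_link = page | flags;
	sg->offset = off;
	sg->length = len;
}
```

Without the init call, stack garbage in the low bits survives into page_link. A stray chain bit then makes the entry look like a link to another table.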



* [PATCH 12/18] staging/lustre/osc: osc_extent_truncate()) ASSERTION( !ext->oe_urgent ) failed
From: Oleg Drokin @ 2014-06-23  1:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Andriy Skulysh, Oleg Drokin

From: Andriy Skulysh <Andriy_Skulysh@xyratex.com>

The bug was caused by a race between truncate and fsync.
osc_extent_wait() doesn't take oe_trunc_pending into account
when setting oe_urgent; the race arises after osc_object_unlock().
osc_extent_wait() should ignore extents with oe_trunc_pending set
while waiting for OES_INV; osc_cache_truncate_end() will then set
oe_urgent and call osc_io_unplug_async().

Signed-off-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com>
Reviewed-on: http://review.whamcloud.com/10204
Xyratex-bug-id: LELUS-239
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4852
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/osc/osc_cache.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index 00f38ee..f075b69 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -871,7 +871,8 @@ static int osc_extent_wait(const struct lu_env *env, struct osc_extent *ext,
 	LASSERT(sanity_check_nolock(ext) == 0);
 	/* `Kick' this extent only if the caller is waiting for it to be
 	 * written out. */
-	if (state == OES_INV && !ext->oe_urgent && !ext->oe_hp) {
+	if (state == OES_INV && !ext->oe_urgent && !ext->oe_hp &&
+	    !ext->oe_trunc_pending) {
 		if (ext->oe_state == OES_ACTIVE) {
 			ext->oe_urgent = 1;
 		} else if (ext->oe_state == OES_CACHE) {
@@ -922,8 +923,8 @@ static int osc_extent_truncate(struct osc_extent *ext, pgoff_t trunc_index,
 	int		    rc       = 0;
 
 	LASSERT(sanity_check(ext) == 0);
-	LASSERT(ext->oe_state == OES_TRUNC);
-	LASSERT(!ext->oe_urgent);
+	EASSERT(ext->oe_state == OES_TRUNC, ext);
+	EASSERT(!ext->oe_urgent, ext);
 
 	/* Request new lu_env.
 	 * We can't use that env from osc_cache_truncate_start() because
-- 
1.9.0
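The patched condition in osc_extent_wait() reads naturally as a predicate. The sketch below uses hypothetical types. The real struct osc_extent has many more fields, and its state enum is larger, so the values here do not match Lustre's.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the OSC extent states; values do not match Lustre's enum. */
enum ext_state { OES_INV, OES_ACTIVE, OES_CACHE, OES_TRUNC };

/* Hypothetical stand-in for struct osc_extent. */
struct extent {
	enum ext_state state;
	bool urgent;         /* oe_urgent */
	bool hp;             /* oe_hp */
	bool trunc_pending;  /* oe_trunc_pending */
};

/* Kick (mark urgent) only when the caller waits for write-out (OES_INV),
 * and never when a truncate is pending: osc_cache_truncate_end() sets
 * oe_urgent for that extent later. */
static bool should_kick(const struct extent *ext, enum ext_state want)
{
	return want == OES_INV && !ext->urgent && !ext->hp &&
	       !ext->trunc_pending;
}
```

Skipping such extents keeps oe_urgent clear until truncation is done, which is exactly what the EASSERT(!ext->oe_urgent, ext) in osc_extent_truncate() requires.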



* [PATCH 13/18] staging/lustre/llite: Fix uninitialized variable
From: Oleg Drokin @ 2014-06-23  1:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Dmitry Eremin, Oleg Drokin

From: Dmitry Eremin <dmitry.eremin@intel.com>

'f.f_flags' might be used uninitialized in this function.

xattr.c:248: 'f.f_flags' is declared.
xattr.c:244: lump!= ( (void* )0) is true
xattr.c:254: 'f.f_flags' is used, but is uninitialized.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: http://review.whamcloud.com/10663
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4629
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/llite/xattr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index c6c27bb..c1eff65 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -246,6 +246,7 @@ int ll_setxattr(struct dentry *dentry, const char *name,
 			int lum_size = (lump->lmm_magic == LOV_USER_MAGIC_V1) ?
 				sizeof(*lump) : sizeof(struct lov_user_md_v3);
 
+			memset(&f, 0, sizeof(f)); /* f.f_flags is used below */
 			f.f_dentry = dentry;
 			rc = ll_lov_setstripe_ea_info(inode, &f, flags, lump,
 						      lum_size);
-- 
1.9.0
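The fix follows a general rule: zero a stack struct before setting the one field you care about, whenever a callee may read other fields. The sketch below uses hypothetical names throughout. `toy_file` stands in for struct file, and `callee_reads_flags()` plays the role of ll_lov_setstripe_ea_info() reading f_flags.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct file: only two fields matter here. */
struct toy_file {
	void *dentry;
	unsigned int flags;
};

/* Hypothetical callee that, like the real one, reads flags. */
static unsigned int callee_reads_flags(const struct toy_file *f)
{
	return f->flags;
}

static unsigned int setxattr_like(void *dentry)
{
	struct toy_file f;

	memset(&f, 0, sizeof(f)); /* every field, including flags, is 0 */
	f.dentry = dentry;
	return callee_reads_flags(&f);
}
```

In C99, a designated initializer such as `struct toy_file f = { .dentry = dentry };` has the same effect, because members that are not named are zero-initialized.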



* [PATCH 14/18] staging/lustre/ptlrpc: unlink request buffer correctly
From: Oleg Drokin @ 2014-06-23  1:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Alexey Lyashkov, Oleg Drokin

From: Alexey Lyashkov <alexey_lyashkov@xyratex.com>

The outgoing request buffer may be held by LNet and not be unlinked
quickly. This breaks unloading of the Lustre modules, because the
request holds a reference to the export/obd.

Signed-off-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Xyratex-bug-id: MRP-1848
Reviewed-on: http://review.whamcloud.com/10353
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5073
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/include/lustre_net.h |  6 ++++--
 drivers/staging/lustre/lustre/ptlrpc/client.c      | 11 ++++++-----
 drivers/staging/lustre/lustre/ptlrpc/events.c      |  9 +++++----
 drivers/staging/lustre/lustre/ptlrpc/niobuf.c      |  5 +++--
 4 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h b/drivers/staging/lustre/lustre/include/lustre_net.h
index f6b7d10..b837d34 100644
--- a/drivers/staging/lustre/lustre/include/lustre_net.h
+++ b/drivers/staging/lustre/lustre/include/lustre_net.h
@@ -1591,7 +1591,8 @@ struct ptlrpc_request {
 		rq_replay:1,
 		rq_no_resend:1, rq_waiting:1, rq_receiving_reply:1,
 		rq_no_delay:1, rq_net_err:1, rq_wait_ctx:1,
-		rq_early:1, rq_must_unlink:1,
+		rq_early:1,
+		rq_req_unlink:1, rq_reply_unlink:1,
 		rq_memalloc:1,      /* req originated from "kswapd" */
 		/* server-side flags */
 		rq_packed_final:1,  /* packed final reply */
@@ -3039,7 +3040,8 @@ ptlrpc_client_recv_or_unlink(struct ptlrpc_request *req)
 		spin_unlock(&req->rq_lock);
 		return 1;
 	}
-	rc = req->rq_receiving_reply || req->rq_must_unlink;
+	rc = req->rq_receiving_reply;
+	rc = rc || req->rq_req_unlink || req->rq_reply_unlink;
 	spin_unlock(&req->rq_lock);
 	return rc;
 }
diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index 1890482..0e0ea5c 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -1202,7 +1202,7 @@ static int after_reply(struct ptlrpc_request *req)
 
 	LASSERT(obd != NULL);
 	/* repbuf must be unlinked */
-	LASSERT(!req->rq_receiving_reply && !req->rq_must_unlink);
+	LASSERT(!req->rq_receiving_reply && !req->rq_reply_unlink);
 
 	if (req->rq_reply_truncate) {
 		if (ptlrpc_no_resend(req)) {
@@ -2406,9 +2406,10 @@ int ptlrpc_unregister_reply(struct ptlrpc_request *request, int async)
 		}
 
 		LASSERT(rc == -ETIMEDOUT);
-		DEBUG_REQ(D_WARNING, request, "Unexpectedly long timeout "
-			  "rvcng=%d unlnk=%d", request->rq_receiving_reply,
-			  request->rq_must_unlink);
+		DEBUG_REQ(D_WARNING, request,
+			  "Unexpectedly long timeout rvcng=%d unlnk=%d/%d",
+			  request->rq_receiving_reply,
+			  request->rq_req_unlink, request->rq_reply_unlink);
 	}
 	return 0;
 }
@@ -3081,7 +3082,7 @@ void *ptlrpcd_alloc_work(struct obd_import *imp,
 	req->rq_interpret_reply = work_interpreter;
 	/* don't want reply */
 	req->rq_receiving_reply = 0;
-	req->rq_must_unlink = 0;
+	req->rq_req_unlink = req->rq_reply_unlink = 0;
 	req->rq_no_delay = req->rq_no_resend = 1;
 	req->rq_pill.rc_fmt = (void *)&worker_format;
 
diff --git a/drivers/staging/lustre/lustre/ptlrpc/events.c b/drivers/staging/lustre/lustre/ptlrpc/events.c
index 9f9b8d1..209fcc1 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/events.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/events.c
@@ -63,19 +63,20 @@ void request_out_callback(lnet_event_t *ev)
 	DEBUG_REQ(D_NET, req, "type %d, status %d", ev->type, ev->status);
 
 	sptlrpc_request_out_callback(req);
+	spin_lock(&req->rq_lock);
 	req->rq_real_sent = cfs_time_current_sec();
+	if (ev->unlinked)
+		req->rq_req_unlink = 0;
 
 	if (ev->type == LNET_EVENT_UNLINK || ev->status != 0) {
 
 		/* Failed send: make it seem like the reply timed out, just
 		 * like failing sends in client.c does currently...  */
 
-		spin_lock(&req->rq_lock);
 		req->rq_net_err = 1;
-		spin_unlock(&req->rq_lock);
-
 		ptlrpc_client_wake_req(req);
 	}
+	spin_unlock(&req->rq_lock);
 
 	ptlrpc_req_finished(req);
 }
@@ -102,7 +103,7 @@ void reply_in_callback(lnet_event_t *ev)
 	req->rq_receiving_reply = 0;
 	req->rq_early = 0;
 	if (ev->unlinked)
-		req->rq_must_unlink = 0;
+		req->rq_reply_unlink = 0;
 
 	if (ev->status)
 		goto out_wake;
diff --git a/drivers/staging/lustre/lustre/ptlrpc/niobuf.c b/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
index f760504..3f0ca23 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
@@ -580,8 +580,9 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 	spin_lock(&request->rq_lock);
 	/* If the MD attach succeeds, there _will_ be a reply_in callback */
 	request->rq_receiving_reply = !noreply;
+	request->rq_req_unlink = 1;
 	/* We are responsible for unlinking the reply buffer */
-	request->rq_must_unlink = !noreply;
+	request->rq_reply_unlink = !noreply;
 	/* Clear any flags that may be present from previous sends. */
 	request->rq_replied = 0;
 	request->rq_err = 0;
@@ -604,7 +605,7 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 		reply_md.user_ptr  = &request->rq_reply_cbid;
 		reply_md.eq_handle = ptlrpc_eq_h;
 
-		/* We must see the unlink callback to unset rq_must_unlink,
+		/* We must see the unlink callback to unset rq_reply_unlink,
 		   so we can't auto-unlink */
 		rc = LNetMDAttach(reply_me_h, reply_md, LNET_RETAIN,
 				  &request->rq_reply_md_h);
-- 
1.9.0
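Splitting rq_must_unlink into two flags lets the client track both LNet buffers separately. The request is treated as busy until both have been unlinked. A minimal userspace model follows, with a pthread mutex in place of rq_lock. The struct and the function names are hypothetical, and each function mirrors only the flag updates shown in the diff above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the request fields touched by the patch. */
struct request {
	pthread_mutex_t lock;  /* stands in for rq_lock */
	bool receiving_reply;  /* rq_receiving_reply */
	bool req_unlink;       /* rq_req_unlink: request MD still in LNet */
	bool reply_unlink;     /* rq_reply_unlink: reply MD still in LNet */
};

/* Like ptl_send_rpc(): LNet now holds the request buffer, and the reply
 * buffer too unless no reply is expected. */
static void send_rpc(struct request *r, bool noreply)
{
	pthread_mutex_lock(&r->lock);
	r->receiving_reply = !noreply;
	r->req_unlink = true;
	r->reply_unlink = !noreply;
	pthread_mutex_unlock(&r->lock);
}

/* Like request_out_callback() with ev->unlinked set. */
static void request_md_unlinked(struct request *r)
{
	pthread_mutex_lock(&r->lock);
	r->req_unlink = false;
	pthread_mutex_unlock(&r->lock);
}

/* Like reply_in_callback() with ev->unlinked set. */
static void reply_md_unlinked(struct request *r)
{
	pthread_mutex_lock(&r->lock);
	r->receiving_reply = false;
	r->reply_unlink = false;
	pthread_mutex_unlock(&r->lock);
}

/* Like ptlrpc_client_recv_or_unlink(): true while LNet may still touch
 * either buffer, so callers waiting on it keep the request alive. */
static bool recv_or_unlink(struct request *r)
{
	bool busy;

	pthread_mutex_lock(&r->lock);
	busy = r->receiving_reply || r->req_unlink || r->reply_unlink;
	pthread_mutex_unlock(&r->lock);
	return busy;
}
```

Before the patch only the reply buffer was tracked, so a request could look idle while LNet still referenced the outgoing buffer. The request, along with its export/obd reference, could then outlive what the unload path expected.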



* [PATCH 15/18] staging/lustre/obdclass: runtime load lustre client when needed
From: Oleg Drokin @ 2014-06-23  1:32 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Bob Glossman, Oleg Drokin

From: Bob Glossman <bob.glossman@intel.com>

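During a client mount, if no client fill_super handler has been
registered yet, call request_module("lustre") to load lustre.ko
explicitly for cases where module autoloading doesn't happen, then
check again before failing. Also declare MODULE_ALIAS_FS("lustre")
for the lustre filesystem type.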
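
The fallback follows a "load on demand, then re-check" pattern. The
userspace sketch below models only that control flow:
sim_client_fill_super, sim_request_module() and sim_module_present are
hypothetical stand-ins for the client_fill_super hook, request_module()
and a successful module load. None of it is Lustre or kernel API, and
the -ENODEV return value is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the hook that lustre.ko fills in on load. */
typedef int (*sim_fill_super_fn)(void *sb);
static sim_fill_super_fn sim_client_fill_super;

/* Hypothetical switch: whether "loading" the module would succeed. */
static int sim_module_present;

static int sim_llite_fill_super(void *sb)
{
	(void)sb;
	return 0;
}

/* Hypothetical stand-in for request_module(): registers the hook only
 * if the named module is "available". */
static void sim_request_module(const char *name)
{
	assert(name != NULL);
	if (sim_module_present && strcmp(name, "lustre") == 0)
		sim_client_fill_super = sim_llite_fill_super;
}

/* Same shape as the patched check: try to load, then test again. */
static int sim_fill_super(void *sb)
{
	if (sim_client_fill_super == NULL)
		sim_request_module("lustre");
	if (sim_client_fill_super == NULL) {
		fprintf(stderr, "Nothing registered for client mount!\n");
		return -ENODEV;
	}
	return sim_client_fill_super(sb);
}
```

With sim_module_present cleared, sim_fill_super() still fails after the
load attempt; once it is set, the second check finds the registered hook.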

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-on: http://review.whamcloud.com/10587
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4800
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/obdclass/obd_mount.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/obd_mount.c b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
index 03d9a6a..4e77f07 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_mount.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
@@ -1216,7 +1216,9 @@ int lustre_fill_super(struct super_block *sb, void *data, int silent)
 
 	if (lmd_is_client(lmd)) {
 		CDEBUG(D_MOUNT, "Mounting client %s\n", lmd->lmd_profile);
-		if (!client_fill_super) {
+		if (client_fill_super == NULL)
+			request_module("lustre");
+		if (client_fill_super == NULL) {
 			LCONSOLE_ERROR_MSG(0x165, "Nothing registered for "
 					   "client mount! Is the 'lustre' "
 					   "module loaded?\n");
@@ -1299,6 +1301,7 @@ struct file_system_type lustre_fs_type = {
 	.fs_flags     = FS_BINARY_MOUNTDATA | FS_REQUIRES_DEV |
 			FS_HAS_FIEMAP | FS_RENAME_DOES_D_MOVE,
 };
+MODULE_ALIAS_FS("lustre");
 
 int lustre_register_fs(void)
 {
-- 
1.9.0



* [PATCH 16/18] staging/lustre/vvp: release mmap_sem in error case
From: Oleg Drokin @ 2014-06-23  1:32 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Patrick Farrell, Oleg Drokin

From: Patrick Farrell <paf@cray.com>

vvp_mmap_locks() takes mmap_sem with down_read(), but when
cl_io_lock_alloc_add() fails it returns the error without calling
up_read(), leaking the read lock. Release mmap_sem before returning.

Credit to Paul Casella at Cray for finding this.

Signed-off-by: Patrick Farrell <paf@cray.com>
Reviewed-on: http://review.whamcloud.com/10741
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5221
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/llite/vvp_io.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index 0e0b404..04230ed 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -269,8 +269,10 @@ static int vvp_mmap_locks(const struct lu_env *env,
 			       descr->cld_mode, descr->cld_start,
 			       descr->cld_end);
 
-			if (result < 0)
+			if (result < 0) {
+				up_read(&mm->mmap_sem);
 				return result;
+			}
 
 			if (vma->vm_end - addr >= count)
 				break;
-- 
1.9.0



* [PATCH 17/18] staging/lustre/llite: fix a flag bug of vvp_io_kernel_fault()
From: Oleg Drokin @ 2014-06-23  1:32 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Li Xi, Oleg Drokin

From: Li Xi <lixi@ddn.com>

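After vvp_io_kernel_fault() locks the page, it should set
VM_FAULT_LOCKED in cfio->fault.ft_flags. The code used "&=" instead
of "|=", which cleared all the flags rather than setting
VM_FAULT_LOCKED.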
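
The difference between the two operators can be shown in isolation. The
sketch below is illustrative only: the SIM_VM_FAULT_* values are local
assumptions chosen for the example, not the kernel's <linux/mm.h>
definitions, and the two helper functions are hypothetical.

```c
#include <assert.h>

/* Illustrative bit values; not taken from <linux/mm.h>. */
#define SIM_VM_FAULT_MAJOR  0x0004u
#define SIM_VM_FAULT_LOCKED 0x0200u

/* Buggy variant: '&=' keeps only the LOCKED bit, which the caller has
 * just checked is clear, so every flag ends up cleared. */
static unsigned int sim_mark_locked_buggy(unsigned int flags)
{
	assert(!(flags & SIM_VM_FAULT_LOCKED)); /* caller's precondition */
	flags &= SIM_VM_FAULT_LOCKED;
	return flags;
}

/* Fixed variant: '|=' sets LOCKED and preserves the other bits. */
static unsigned int sim_mark_locked_fixed(unsigned int flags)
{
	assert(!(flags & SIM_VM_FAULT_LOCKED)); /* caller's precondition */
	flags |= SIM_VM_FAULT_LOCKED;
	return flags;
}
```

For an input of SIM_VM_FAULT_MAJOR, the buggy variant returns 0, while
the fixed one returns SIM_VM_FAULT_MAJOR | SIM_VM_FAULT_LOCKED.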

Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: http://review.whamcloud.com/10740
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lustre/llite/vvp_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index 04230ed..2539a89 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -624,7 +624,7 @@ static int vvp_io_kernel_fault(struct vvp_fault_io *cfio)
 		       page_private(vmf->page), vmf->virtual_address);
 		if (unlikely(!(cfio->fault.ft_flags & VM_FAULT_LOCKED))) {
 			lock_page(vmf->page);
-			cfio->fault.ft_flags &= VM_FAULT_LOCKED;
+			cfio->fault.ft_flags |= VM_FAULT_LOCKED;
 		}
 
 		cfio->ft_vmpage = vmf->page;
-- 
1.9.0



* [PATCH 18/18] staging/lustre/lnet: abort messages whose MD has been unlinked
From: Oleg Drokin @ 2014-06-23  1:32 UTC
  To: Greg Kroah-Hartman, linux-kernel, devel; +Cc: Isaac Huang, Oleg Drokin

From: Isaac Huang <he.huang@intel.com>

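Once LNetMDUnlink() or LNetMEUnlink() has been called, all outgoing
messages on that MD should be aborted before lnet_ni_send() is
called. Mark the MD with a new LNET_MD_FLAG_ABORTED flag on unlink,
and make lnet_post_send_locked() fail messages on such an MD with
ECANCELED.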
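
The flag-then-check scheme can be modelled in a few lines. In this
userspace sketch, struct sim_md, struct sim_msg and the sim_* functions
are hypothetical stand-ins; they mirror only the new ABORTED bit, the
positive-errno convention of lnet_post_send_locked() and its conversion
to a negative errno in lnet_send(). Locking, credits and the EAGAIN path
are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SIM_MD_FLAG_ABORTED (1u << 2) /* mirrors LNET_MD_FLAG_ABORTED */

struct sim_md {
	unsigned int md_flags;
};

struct sim_msg {
	struct sim_md *msg_md;
};

/* Unlinking marks the MD aborted so later sends can see it. */
static void sim_md_unlink(struct sim_md *md)
{
	assert(md != NULL);
	md->md_flags |= SIM_MD_FLAG_ABORTED;
}

/* Like lnet_post_send_locked(): positive ECANCELED means "don't send". */
static int sim_post_send(const struct sim_msg *msg)
{
	if (msg->msg_md != NULL &&
	    (msg->msg_md->md_flags & SIM_MD_FLAG_ABORTED) != 0)
		return ECANCELED;
	return 0;
}

/* Like lnet_send(): turn the internal positive code into -errno. */
static int sim_send(const struct sim_msg *msg, int *sent)
{
	int rc = sim_post_send(msg);

	*sent = 0;
	if (rc == ECANCELED)
		return -rc;
	*sent = 1; /* stands in for lnet_ni_send() */
	return 0;
}
```

A message sent before sim_md_unlink() goes out normally; after the
unlink, sim_send() returns -ECANCELED without "sending". A message with
no MD is never affected by the check.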

Signed-off-by: Isaac Huang <he.huang@intel.com>
Reviewed-on: http://review.whamcloud.com/8041
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4006
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |  1 +
 drivers/staging/lustre/lnet/lnet/lib-md.c          | 10 ++---
 drivers/staging/lustre/lnet/lnet/lib-me.c          | 11 ++---
 drivers/staging/lustre/lnet/lnet/lib-move.c        | 49 +++++++++++++++-------
 4 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index a63654b..6816aa0 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -280,6 +280,7 @@ typedef struct lnet_libmd {
 
 #define LNET_MD_FLAG_ZOMBIE	   (1 << 0)
 #define LNET_MD_FLAG_AUTO_UNLINK      (1 << 1)
+#define LNET_MD_FLAG_ABORTED	 (1 << 2)
 
 #ifdef LNET_USE_LIB_FREELIST
 typedef struct {
diff --git a/drivers/staging/lustre/lnet/lnet/lib-md.c b/drivers/staging/lustre/lnet/lnet/lib-md.c
index ae643f2..d68c6e0 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-md.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-md.c
@@ -387,7 +387,8 @@ EXPORT_SYMBOL(LNetMDBind);
 
 /**
  * Unlink the memory descriptor from any ME it may be linked to and release
- * the internal resources associated with it.
+ * the internal resources associated with it. As a result, active messages
+ * associated with the MD may get aborted.
  *
  * This function does not free the memory region associated with the MD;
  * i.e., the memory the user allocated for this MD. If the ME associated with
@@ -433,12 +434,11 @@ LNetMDUnlink (lnet_handle_md_t mdh)
 		return -ENOENT;
 	}
 
+	md->md_flags |= LNET_MD_FLAG_ABORTED;
 	/* If the MD is busy, lnet_md_unlink just marks it for deletion, and
-	 * when the NAL is done, the completion event flags that the MD was
+	 * when the LND is done, the completion event flags that the MD was
 	 * unlinked.  Otherwise, we enqueue an event now... */
-
-	if (md->md_eq != NULL &&
-	    md->md_refcount == 0) {
+	if (md->md_eq != NULL && md->md_refcount == 0) {
 		lnet_build_unlink_event(md, &ev);
 		lnet_eq_enqueue_event(md->md_eq, &ev);
 	}
diff --git a/drivers/staging/lustre/lnet/lnet/lib-me.c b/drivers/staging/lustre/lnet/lnet/lib-me.c
index 0081075..0e42209 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-me.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-me.c
@@ -246,11 +246,12 @@ LNetMEUnlink(lnet_handle_me_t meh)
 	}
 
 	md = me->me_md;
-	if (md != NULL &&
-	    md->md_eq != NULL &&
-	    md->md_refcount == 0) {
-		lnet_build_unlink_event(md, &ev);
-		lnet_eq_enqueue_event(md->md_eq, &ev);
+	if (md != NULL) {
+		md->md_flags |= LNET_MD_FLAG_ABORTED;
+		if (md->md_eq != NULL && md->md_refcount == 0) {
+			lnet_build_unlink_event(md, &ev);
+			lnet_eq_enqueue_event(md->md_eq, &ev);
+		}
 	}
 
 	lnet_me_unlink(me);
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index bbf43ae..95bf41f 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -773,26 +773,30 @@ lnet_peer_alive_locked(lnet_peer_t *lp)
 	return 0;
 }
 
-int
+/**
+ * \param msg The message to be sent.
+ * \param do_send True if lnet_ni_send() should be called in this function.
+ *	  lnet_send() is going to lnet_net_unlock immediately after this, so
+ *	  it sets do_send FALSE and I don't do the unlock/send/lock bit.
+ *
+ * \retval 0 If \a msg sent or OK to send.
+ * \retval EAGAIN If \a msg blocked for credit.
+ * \retval EHOSTUNREACH If the next hop of the message appears dead.
+ * \retval ECANCELED If the MD of the message has been unlinked.
+ */
+static int
 lnet_post_send_locked(lnet_msg_t *msg, int do_send)
 {
-	/* lnet_send is going to lnet_net_unlock immediately after this,
-	 * so it sets do_send FALSE and I don't do the unlock/send/lock bit.
-	 * I return EAGAIN if msg blocked, EHOSTUNREACH if msg_txpeer
-	 * appears dead, and 0 if sent or OK to send */
-	struct lnet_peer	*lp = msg->msg_txpeer;
-	struct lnet_ni		*ni = lp->lp_ni;
-	struct lnet_tx_queue	*tq;
-	int			cpt;
+	lnet_peer_t		*lp = msg->msg_txpeer;
+	lnet_ni_t		*ni = lp->lp_ni;
+	int			cpt = msg->msg_tx_cpt;
+	struct lnet_tx_queue	*tq = ni->ni_tx_queues[cpt];
 
 	/* non-lnet_send() callers have checked before */
 	LASSERT(!do_send || msg->msg_tx_delayed);
 	LASSERT(!msg->msg_receiving);
 	LASSERT(msg->msg_tx_committed);
 
-	cpt = msg->msg_tx_cpt;
-	tq = ni->ni_tx_queues[cpt];
-
 	/* NB 'lp' is always the next hop */
 	if ((msg->msg_target.pid & LNET_PID_USERFLAG) == 0 &&
 	    lnet_peer_alive_locked(lp) == 0) {
@@ -809,6 +813,20 @@ lnet_post_send_locked(lnet_msg_t *msg, int do_send)
 		return EHOSTUNREACH;
 	}
 
+	if (msg->msg_md != NULL &&
+	    (msg->msg_md->md_flags & LNET_MD_FLAG_ABORTED) != 0) {
+		lnet_net_unlock(cpt);
+
+		CNETERR("Aborting message for %s: LNetM[DE]Unlink() already "
+			"called on the MD/ME.\n",
+			libcfs_id2str(msg->msg_target));
+		if (do_send)
+			lnet_finalize(ni, msg, -ECANCELED);
+
+		lnet_net_lock(cpt);
+		return ECANCELED;
+	}
+
 	if (!msg->msg_peertxcredit) {
 		LASSERT((lp->lp_txcredits < 0) ==
 			 !list_empty(&lp->lp_txq));
@@ -1327,13 +1345,13 @@ lnet_send(lnet_nid_t src_nid, lnet_msg_t *msg, lnet_nid_t rtr_nid)
 	rc = lnet_post_send_locked(msg, 0);
 	lnet_net_unlock(cpt);
 
-	if (rc == EHOSTUNREACH)
-		return -EHOSTUNREACH;
+	if (rc == EHOSTUNREACH || rc == ECANCELED)
+		return -rc;
 
 	if (rc == 0)
 		lnet_ni_send(src_ni, msg);
 
-	return 0;
+	return 0; /* rc == 0 or EAGAIN */
 }
 
 static void
@@ -2288,7 +2306,6 @@ LNetGet(lnet_nid_t self, lnet_handle_md_t mdh,
 		lnet_res_unlock(cpt);
 
 		lnet_msg_free(msg);
-
 		return -ENOENT;
 	}
 
-- 
1.9.0




Thread overview: 19+ messages
2014-06-23  1:32 [PATCH 00/18] Lustre fixes Oleg Drokin
2014-06-23  1:32 ` [PATCH 01/18] staging/lustre/libcfs: revert changes to libcfs_sock_ioctl Oleg Drokin
2014-06-23  1:32 ` [PATCH 02/18] staging/lustre/ptlrpc: Protect request buffer changing Oleg Drokin
2014-06-23  1:32 ` [PATCH 03/18] staging/lustre/llite: Only kill SGID/SUID bits Oleg Drokin
2014-06-23  1:32 ` [PATCH 04/18] staging/lustre: fix frong ldlm flags type used Oleg Drokin
2014-06-23  1:32 ` [PATCH 05/18] staging/lustre/ptlrpc: fix NULL pointer dereference of {exp,imp}_obd Oleg Drokin
2014-06-23  1:32 ` [PATCH 06/18] staging/lustre/mgc: mgc import reconnect race Oleg Drokin
2014-06-23  1:32 ` [PATCH 07/18] staging/lustre/osc: get rid of old checksum initial value Oleg Drokin
2014-06-23  1:32 ` [PATCH 08/18] staging/lustre/ptlrpc: race at req processing Oleg Drokin
2014-06-23  1:32 ` [PATCH 09/18] staging/lustre/mgc: replace hard-coded MGC_ENQUEUE_LIMIT value Oleg Drokin
2014-06-23  1:32 ` [PATCH 10/18] staging/lustre/ptlrpc: Add schedule point to ptlrpc_check_set() Oleg Drokin
2014-06-23  1:32 ` [PATCH 11/18] staging/lustre/obdclass: Fix uninitialized variables Oleg Drokin
2014-06-23  1:32 ` [PATCH 12/18] staging/lustre/osc: osc_extent_truncate()) ASSERTION( !ext->oe_urgent ) failed Oleg Drokin
2014-06-23  1:32 ` [PATCH 13/18] staging/lustre/llite: Fix uninitialized variable Oleg Drokin
2014-06-23  1:32 ` [PATCH 14/18] staging/lustre/ptlrpc: unlink request buffer correctly Oleg Drokin
2014-06-23  1:32 ` [PATCH 15/18] staging/lustre/obdclass: runtime load lustre client when needed Oleg Drokin
2014-06-23  1:32 ` [PATCH 16/18] staging/lustre/vvp: release mmap_sem in error case Oleg Drokin
2014-06-23  1:32 ` [PATCH 17/18] staging/lustre/llite: fix a flag bug of vvp_io_kernel_fault() Oleg Drokin
2014-06-23  1:32 ` [PATCH 18/18] staging/lustre/lnet: abort messages whose MD has been unlinked Oleg Drokin
