* [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS
@ 2018-11-23  7:15 NeilBrown
  2018-11-23  7:15 ` [lustre-devel] [PATCH 2/9] lustre: remove EIOCBRETRY handling NeilBrown
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: NeilBrown @ 2018-11-23  7:15 UTC (permalink / raw)
  To: lustre-devel

This is a miscellaneous set of patches for lustre-in-linux.
Apart from the first, they are (based on) patches from OpenSFS lustre.
I've been trawling through OpenSFS looking for patches that aren't
in linux-lustre.  Most of what I find are only relevant on the
server, but these seem to be appropriate for the client.
In most cases, I don't really know what is going on, so they
might be inappropriate for the client.  I'm hoping that more
knowledgeable people can put me straight.

These are just the first few - I think there are quite a few more
(before we get to e.g. pfl which hasn't been attempted yet).  If
I've got these mostly right, I'll continue looking through
the rest of my list.

Thanks,
NeilBrown

---

Bruno Faccini (2):
      lustre: llite: clear LLIF_FILE_RESTORING when done
      lustre: obdclass: health_check to report unhealthy upon LBUG

Doug Oucharek (1):
      lustre: lnet: Stop MLX5 triggering a dump_cqe

Fan Yong (1):
      lustre: statahead: skip agl for the file in restoring

James Simmons (1):
      lustre: remove EIOCBRETRY handling

Lai Siyao (2):
      lustre: rename: DNE2 should return -EXDEV upon remote rename
      lustre: statahead: add smp_mb() to serialize ops

NeilBrown (1):
      lustre: obdclass: fix formating of connection flags

Vladimir Saveliev (1):
      lustre: ptlrpc: use smp unsafe at_init only for initialization


 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    7 ++++-
 .../staging/lustre/lustre/include/lustre_errno.h   |    1 -
 .../staging/lustre/lustre/include/lustre_import.h  |   19 ++++++++++++--
 drivers/staging/lustre/lustre/llite/llite_lib.c    |    6 ++++
 drivers/staging/lustre/lustre/llite/statahead.c    |   28 ++++++++++++++++++--
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |    2 +
 .../lustre/lustre/obdclass/lprocfs_status.c        |    4 +--
 drivers/staging/lustre/lustre/obdclass/obd_sysfs.c |    4 ++-
 drivers/staging/lustre/lustre/ptlrpc/import.c      |    2 +
 9 files changed, 61 insertions(+), 12 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 1/9] lustre: obdclass: fix formating of connection flags
  2018-11-23  7:15 [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS NeilBrown
  2018-11-23  7:15 ` [lustre-devel] [PATCH 2/9] lustre: remove EIOCBRETRY handling NeilBrown
@ 2018-11-23  7:15 ` NeilBrown
  2018-11-26  1:30   ` James Simmons
  2018-11-23  7:15 ` [lustre-devel] [PATCH 3/9] lustre: ptlrpc: use smp unsafe at_init only for initialization NeilBrown
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2018-11-23  7:15 UTC (permalink / raw)
  To: lustre-devel

The current code puts the separator *only*
at the start rather than *never* at the start.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../lustre/lustre/obdclass/lprocfs_status.c        |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
index feba2ef5a3bc..acfea7a44350 100644
--- a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
+++ b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
@@ -709,14 +709,14 @@ static void obd_connect_seq_flags2str(struct seq_file *m, u64 flags,
 	for (i = 0, mask = 1; i < 64; i++, mask <<= 1) {
 		if (flags & mask) {
 			seq_printf(m, "%s%s",
-				   first ? sep : "", obd_connect_names[i]);
+				   first ? "" : sep, obd_connect_names[i]);
 			first = false;
 		}
 	}
 
 	if (flags & ~(mask - 1)) {
 		seq_printf(m, "%sunknown flags %#llx",
-			   first ? sep : "", flags & ~(mask - 1));
+			   first ? "" : sep, flags & ~(mask - 1));
 		first = false;
 	}
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread
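For readers who want to see the separator idiom from patch 1/9 in isolation, here is a minimal userspace sketch. It is not the Lustre code: demo_names and demo_flags2str() are hypothetical names invented for illustration, and a plain buffer stands in for the seq_file. With the old expression (first ? sep : "") the same input would come out as ",readexec"; with the corrected one it is "read,exec".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag names, for illustration only; not Lustre's table. */
static const char *const demo_names[] = { "read", "write", "exec" };

/*
 * Write the names of the bits set in 'flags' into 'buf', putting 'sep'
 * between entries but never before the first one.
 * Returns the number of characters written (excluding the NUL).
 */
static size_t demo_flags2str(char *buf, size_t size, uint64_t flags,
			     const char *sep)
{
	bool first = true;
	size_t len = 0;
	size_t i;

	assert(buf && size > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(demo_names) / sizeof(demo_names[0]); i++) {
		int n;

		if (!(flags & (UINT64_C(1) << i)))
			continue;
		n = snprintf(buf + len, size - len, "%s%s",
			     first ? "" : sep, demo_names[i]);
		if (n < 0 || (size_t)n >= size - len) {
			buf[len] = '\0';	/* drop the partial entry */
			break;
		}
		len += (size_t)n;
		first = false;
	}
	return len;
}
```

Calling demo_flags2str(buf, sizeof(buf), 0x5, ",") selects bits 0 and 2 of the made-up table.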

* [lustre-devel] [PATCH 2/9] lustre: remove EIOCBRETRY handling
  2018-11-23  7:15 [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS NeilBrown
@ 2018-11-23  7:15 ` NeilBrown
  2018-11-26  1:30   ` James Simmons
  2018-11-23  7:15 ` [lustre-devel] [PATCH 1/9] lustre: obdclass: fix formating of connection flags NeilBrown
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2018-11-23  7:15 UTC (permalink / raw)
  To: lustre-devel

From: James Simmons <uja.ornl@yahoo.com>

With linux commit 41003a7bcfed ("aio: remove retry-based AIO")
AIO retry handling was removed due to it being buggy and no
one using it, including lustre. Since this is the case
remove EIOCBRETRY since it is no longer in the kernel starting
with version 3.18.

Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/14507
WC-bug-id: https://jira.whamcloud.com/browse/LU-6426
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Blake Caldwell <blakec@ornl.gov>
Reviewed-by: Thomas Stibor <thomas@stibor.net>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/lustre/include/lustre_errno.h   |    1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_errno.h b/drivers/staging/lustre/lustre/include/lustre_errno.h
index 59fbb9f47ff1..806e79672482 100644
--- a/drivers/staging/lustre/lustre/include/lustre_errno.h
+++ b/drivers/staging/lustre/lustre/include/lustre_errno.h
@@ -181,7 +181,6 @@
 #define LUSTRE_EBADTYPE		527	/* Type not supported by server */
 #define LUSTRE_EJUKEBOX		528	/* Request won't finish until timeout */
 #define LUSTRE_EIOCBQUEUED	529	/* iocb queued await completion event */
-#define LUSTRE_EIOCBRETRY	530	/* iocb queued, will trigger a retry */
 
 /*
  * Translations are optimized away on x86.  Host errnos that shouldn't be put

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 3/9] lustre: ptlrpc: use smp unsafe at_init only for initialization
  2018-11-23  7:15 [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS NeilBrown
  2018-11-23  7:15 ` [lustre-devel] [PATCH 2/9] lustre: remove EIOCBRETRY handling NeilBrown
  2018-11-23  7:15 ` [lustre-devel] [PATCH 1/9] lustre: obdclass: fix formating of connection flags NeilBrown
@ 2018-11-23  7:15 ` NeilBrown
  2018-11-26  1:32   ` James Simmons
  2018-11-23  7:15 ` [lustre-devel] [PATCH 4/9] lustre: rename: DNE2 should return -EXDEV upon remote rename NeilBrown
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2018-11-23  7:15 UTC (permalink / raw)
  To: lustre-devel

From: Vladimir Saveliev <vladimir.saveliev@seagate.com>

at_init() is not smp safe, so it is not supposed to be used anywhere
but in at initialization.
Add at_reinit() - safe version of at_init().

Signed-off-by: Vladimir Saveliev <vladimir.saveliev@seagate.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-6805
Reviewed-on: http://review.whamcloud.com/15522
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/lustre/include/lustre_import.h  |   19 +++++++++++++++++--
 drivers/staging/lustre/lustre/ptlrpc/import.c      |    2 +-
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_import.h b/drivers/staging/lustre/lustre/include/lustre_import.h
index a629f6bba814..8a8a125bd130 100644
--- a/drivers/staging/lustre/lustre/include/lustre_import.h
+++ b/drivers/staging/lustre/lustre/include/lustre_import.h
@@ -331,12 +331,17 @@ static inline unsigned int at_timeout2est(unsigned int val)
 	return (max((val << 2) / 5, 5U) - 4);
 }
 
-static inline void at_reset(struct adaptive_timeout *at, int val)
+static inline void at_reset_nolock(struct adaptive_timeout *at, int val)
 {
-	spin_lock(&at->at_lock);
 	at->at_current = val;
 	at->at_worst_ever = val;
 	at->at_worst_time = ktime_get_real_seconds();
+}
+
+static inline void at_reset(struct adaptive_timeout *at, int val)
+{
+	spin_lock(&at->at_lock);
+	at_reset_nolock(at, val);
 	spin_unlock(&at->at_lock);
 }
 
@@ -348,6 +353,16 @@ static inline void at_init(struct adaptive_timeout *at, int val, int flags)
 	at_reset(at, val);
 }
 
+static inline void at_reinit(struct adaptive_timeout *at, int val, int flags)
+{
+	spin_lock(&at->at_lock);
+	at->at_binstart = 0;
+	memset(at->at_hist, 0, sizeof(at->at_hist));
+	at->at_flags = flags;
+	at_reset_nolock(at, val);
+	spin_unlock(&at->at_lock);
+}
+
 extern unsigned int at_min;
 static inline int at_get(struct adaptive_timeout *at)
 {
diff --git a/drivers/staging/lustre/lustre/ptlrpc/import.c b/drivers/staging/lustre/lustre/ptlrpc/import.c
index 07dc87d9513e..480c860d066e 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/import.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/import.c
@@ -1036,7 +1036,7 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
 	 * The net statistics after (re-)connect is not valid anymore,
 	 * because may reflect other routing, etc.
 	 */
-	at_init(&imp->imp_at.iat_net_latency, 0, 0);
+	at_reinit(&imp->imp_at.iat_net_latency, 0, 0);
 	ptlrpc_at_adj_net_latency(request,
 				  lustre_msg_get_service_time(request->rq_repmsg));
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread
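The locking structure of patch 3/9 (an unlocked helper shared by a locked reset and a locked re-init) can be sketched in userspace as below. This is only an illustration: struct demo_at and the demo_* functions are hypothetical, a pthread mutex stands in for the spinlock, and the sketch assumes that the init routine sets up the lock itself, which is the usual reason such a function must not run on a live object (the hunk above does not show that part of at_init()).

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <time.h>

/* Hypothetical userspace stand-in for struct adaptive_timeout. */
struct demo_at {
	pthread_mutex_t lock;
	unsigned int hist[4];
	int flags;
	int current;
	int worst_ever;
	time_t worst_time;
};

/* Caller must already hold at->lock. */
static void demo_at_reset_nolock(struct demo_at *at, int val)
{
	at->current = val;
	at->worst_ever = val;
	at->worst_time = time(NULL);
}

/* Locked wrapper around the helper above. */
static void demo_at_reset(struct demo_at *at, int val)
{
	pthread_mutex_lock(&at->lock);
	demo_at_reset_nolock(at, val);
	pthread_mutex_unlock(&at->lock);
}

/* One-time setup: (re)creates the lock, so it must not race with users. */
static void demo_at_init(struct demo_at *at, int val, int flags)
{
	assert(at);
	memset(at, 0, sizeof(*at));
	pthread_mutex_init(&at->lock, NULL);
	at->flags = flags;
	demo_at_reset(at, val);
}

/* Re-initialise a live object: every field changes under one lock hold. */
static void demo_at_reinit(struct demo_at *at, int val, int flags)
{
	pthread_mutex_lock(&at->lock);
	memset(at->hist, 0, sizeof(at->hist));
	at->flags = flags;
	demo_at_reset_nolock(at, val);
	pthread_mutex_unlock(&at->lock);
}
```

The point of the _nolock helper is that demo_at_reinit() can update the history, the flags and the current values in a single critical section instead of taking the lock twice.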

* [lustre-devel] [PATCH 4/9] lustre: rename: DNE2 should return -EXDEV upon remote rename
  2018-11-23  7:15 [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS NeilBrown
                   ` (2 preceding siblings ...)
  2018-11-23  7:15 ` [lustre-devel] [PATCH 3/9] lustre: ptlrpc: use smp unsafe at_init only for initialization NeilBrown
@ 2018-11-23  7:15 ` NeilBrown
  2018-11-26  1:31   ` James Simmons
  2018-11-23  7:15 ` [lustre-devel] [PATCH 9/9] lustre: statahead: add smp_mb() to serialize ops NeilBrown
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2018-11-23  7:15 UTC (permalink / raw)
  To: lustre-devel

From: Lai Siyao <lai.siyao@intel.com>

DNE2 MDS should return -EXDEV upon remote rename, so that an old
client can do the rename with copy and delete instead of failing
with -EREMOTE.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I68e8e99259065922f31bee5343be309380715674
WC-bug-id: https://jira.whamcloud.com/browse/LU-6660
Reviewed-on: http://review.whamcloud.com/15323
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lustre/lmv/lmv_obd.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 32bb9fca88c9..7e4ffeb15a63 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -1945,7 +1945,7 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data,
 	}
 
 	rc = md_rename(target_exp, op_data, old, oldlen, new, newlen, request);
-	if (rc && rc != -EREMOTE)
+	if (rc && rc != -EXDEV)
 		return rc;
 
 	body = req_capsule_server_get(&(*request)->rq_pill, &RMF_MDT_BODY);

^ permalink raw reply related	[flat|nested] 25+ messages in thread
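The "rename with copy and delete" that the description of patch 4/9 refers to is the usual reaction of callers to EXDEV from rename(2). A minimal sketch of that fallback follows. It is not Lustre code: demo_copy_file() and demo_move() are hypothetical helpers, and the copy ignores ownership, permissions and timestamps to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Copy src to dst byte for byte; returns 0 on success, -1 on error. */
static int demo_copy_file(const char *src, const char *dst)
{
	char buf[4096];
	size_t n;
	int rc = 0;
	FILE *in = fopen(src, "rb");
	FILE *out;

	if (!in)
		return -1;
	out = fopen(dst, "wb");
	if (!out) {
		fclose(in);
		return -1;
	}
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			rc = -1;
			break;
		}
	}
	if (ferror(in))
		rc = -1;
	fclose(in);
	if (fclose(out) != 0)
		rc = -1;
	return rc;
}

/*
 * Move src to dst.  Try rename(2) first; only when it fails with EXDEV
 * fall back to copy followed by unlink.  Any other error is returned.
 */
static int demo_move(const char *src, const char *dst)
{
	assert(src && dst);
	if (rename(src, dst) == 0)
		return 0;
	if (errno != EXDEV)
		return -1;
	if (demo_copy_file(src, dst) != 0)
		return -1;
	return unlink(src);
}
```

A caller that only knows how to handle EXDEV would take the fallback path here, whereas an error code it does not recognise simply makes the move fail.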

* [lustre-devel] [PATCH 5/9] lustre: llite: clear LLIF_FILE_RESTORING when done
  2018-11-23  7:15 [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS NeilBrown
                   ` (5 preceding siblings ...)
  2018-11-23  7:15 ` [lustre-devel] [PATCH 8/9] lustre: statahead: skip agl for the file in restoring NeilBrown
@ 2018-11-23  7:15 ` NeilBrown
  2018-11-26  1:34   ` James Simmons
  2018-11-23  7:15 ` [lustre-devel] [PATCH 6/9] lustre: obdclass: health_check to report unhealthy upon LBUG NeilBrown
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2018-11-23  7:15 UTC (permalink / raw)
  To: lustre-devel

From: Bruno Faccini <bruno.faccini@intel.com>

Clear LLIF_FILE_RESTORING once the restore is done, so that we
start glimpsing new attrs again.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-6560
Reviewed-on: http://review.whamcloud.com/14609
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lustre/llite/llite_lib.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index fac65845526a..b8693049f3b6 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1905,8 +1905,14 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 	}
 
 	if (body->mbo_valid & OBD_MD_TSTATE) {
+		/* Set LLIF_FILE_RESTORING if restore ongoing and
+		 * clear it when done to ensure to start again
+		 * glimpsing updated attrs
+		 */
 		if (body->mbo_t_state & MS_RESTORE)
 			set_bit(LLIF_FILE_RESTORING, &lli->lli_flags);
+		else
+			clear_bit(LLIF_FILE_RESTORING, &lli->lli_flags);
 	}
 
 	return 0;

^ permalink raw reply related	[flat|nested] 25+ messages in thread
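The change in patch 5/9 makes the flag mirror the reported state in both directions instead of only ever being set. A small userspace sketch of that pattern, using C11 atomics in place of set_bit()/clear_bit(); DEMO_FILE_RESTORING, struct demo_inode and the demo_* functions are hypothetical names, not Lustre's:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_FILE_RESTORING (1UL << 0)	/* hypothetical flag bit */

/* Hypothetical stand-in for the per-inode flags word. */
struct demo_inode {
	atomic_ulong flags;
};

/* Make the flag follow the state reported by the server, both ways. */
static void demo_update_restoring(struct demo_inode *ino, bool restore_running)
{
	assert(ino);
	if (restore_running)
		atomic_fetch_or(&ino->flags, DEMO_FILE_RESTORING);
	else
		atomic_fetch_and(&ino->flags, ~DEMO_FILE_RESTORING);
}

static bool demo_is_restoring(struct demo_inode *ino)
{
	return (atomic_load(&ino->flags) & DEMO_FILE_RESTORING) != 0;
}
```

Without the else branch the flag would stay set after the first restore, and any code that skips work while the flag is set would keep skipping it.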

* [lustre-devel] [PATCH 6/9] lustre: obdclass: health_check to report unhealthy upon LBUG
  2018-11-23  7:15 [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS NeilBrown
                   ` (6 preceding siblings ...)
  2018-11-23  7:15 ` [lustre-devel] [PATCH 5/9] lustre: llite: clear LLIF_FILE_RESTORING when done NeilBrown
@ 2018-11-23  7:15 ` NeilBrown
  2018-11-26  1:46   ` James Simmons
  2018-11-26  1:46   ` James Simmons
  2018-11-23  7:15 ` [lustre-devel] [PATCH 7/9] lustre: lnet: Stop MLX5 triggering a dump_cqe NeilBrown
  2018-11-26  3:47 ` [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS James Simmons
  9 siblings, 2 replies; 25+ messages in thread
From: NeilBrown @ 2018-11-23  7:15 UTC (permalink / raw)
  To: lustre-devel

From: Bruno Faccini <bruno.faccini@intel.com>

When a LBUG has occurred, without panic_on_lbug being set, health_check
/proc file must return an unhealthy state.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-7486
Reviewed-on: http://review.whamcloud.com/17981
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lustre/obdclass/obd_sysfs.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/obd_sysfs.c b/drivers/staging/lustre/lustre/obdclass/obd_sysfs.c
index 6669c235dd51..5fd30a8e2b44 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_sysfs.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_sysfs.c
@@ -173,8 +173,10 @@ health_check_show(struct kobject *kobj, struct attribute *attr, char *buf)
 	int i;
 	size_t len = 0;
 
-	if (libcfs_catastrophe)
+	if (libcfs_catastrophe) {
 		return sprintf(buf, "LBUG\n");
+		healthy = false;
+	}
 
 	read_lock(&obd_dev_lock);
 	for (i = 0; i < class_devno_max(); i++) {

^ permalink raw reply related	[flat|nested] 25+ messages in thread
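Note that in the hunk of patch 6/9 as posted, "healthy = false;" comes after the return statement, so it can never execute and the function still returns right after printing "LBUG". The sketch below is one guess at the intended behaviour (record the LBUG, mark the state unhealthy, and fall through to the final report). It is a userspace model only: demo_health_show(), its parameters and the output strings are assumptions made for illustration, not the real sysfs interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of a health_check show routine.  'catastrophe'
 * stands in for libcfs_catastrophe and 'unhealthy_devs' for the result
 * of the per-device scan.
 */
static int demo_health_show(char *buf, size_t size, bool catastrophe,
			    int unhealthy_devs)
{
	bool healthy = true;
	int len = 0;

	assert(buf && size >= 32);	/* room for both lines */
	buf[0] = '\0';
	if (catastrophe) {
		/* record the LBUG but keep going so 'healthy' is honoured */
		len += snprintf(buf + len, size - (size_t)len, "LBUG\n");
		healthy = false;
	}
	if (unhealthy_devs > 0)
		healthy = false;

	len += snprintf(buf + len, size - (size_t)len, "%s\n",
			healthy ? "healthy" : "NOT HEALTHY");
	return len;
}
```

With the assignment placed before any return, a caller that only looks at the last line of the output still sees the unhealthy state after an LBUG.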

* [lustre-devel] [PATCH 7/9] lustre: lnet: Stop MLX5 triggering a dump_cqe
  2018-11-23  7:15 [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS NeilBrown
                   ` (7 preceding siblings ...)
  2018-11-23  7:15 ` [lustre-devel] [PATCH 6/9] lustre: obdclass: health_check to report unhealthy upon LBUG NeilBrown
@ 2018-11-23  7:15 ` NeilBrown
  2018-11-26  1:49   ` James Simmons
  2018-11-26  3:47 ` [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS James Simmons
  9 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2018-11-23  7:15 UTC (permalink / raw)
  To: lustre-devel

From: Doug Oucharek <doug.s.oucharek@intel.com>

We have found that MLX5 will trigger a dump_cqe if we don't
invalidate the rkey on a newly allocated MR for FastReg usage.

This fix just tags the MR as invalid on its creation if we are
using FastReg and that will force it to do an invalidate of the
rkey on first usage.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8752
Reviewed-on: https://review.whamcloud.com/24306
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index ecdf4dee533d..a5eada8ee354 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -1483,7 +1483,12 @@ static int kiblnd_alloc_freg_pool(struct kib_fmr_poolset *fps,
 			goto out_middle;
 		}
 
-		frd->frd_valid = true;
+		/*
+		 * There appears to be a bug in MLX5 code where you must
+		 * invalidate the rkey of a new FastReg pool before first
+		 * using it. Thus, I am marking the FRD invalid here.
+		 */
+		frd->frd_valid = false;
 
 		list_add_tail(&frd->frd_list, &fpo->fast_reg.fpo_pool_list);
 		fpo->fast_reg.fpo_pool_size++;

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 8/9] lustre: statahead: skip agl for the file in restoring
  2018-11-23  7:15 [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS NeilBrown
                   ` (4 preceding siblings ...)
  2018-11-23  7:15 ` [lustre-devel] [PATCH 9/9] lustre: statahead: add smp_mb() to serialize ops NeilBrown
@ 2018-11-23  7:15 ` NeilBrown
  2018-11-26  2:09   ` James Simmons
  2018-11-26  2:09   ` James Simmons
  2018-11-23  7:15 ` [lustre-devel] [PATCH 5/9] lustre: llite: clear LLIF_FILE_RESTORING when done NeilBrown
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 25+ messages in thread
From: NeilBrown @ 2018-11-23  7:15 UTC (permalink / raw)
  To: lustre-devel

From: Fan Yong <fan.yong@intel.com>

In case of restore, the MDT has the right size and has already sent
it back without granting the layout lock, inode is up-to-date. Then
AGL (async glimpse lock) is useless.

Also to glimpse we need the layout, in case of a running restore the
MDT holds the layout lock so the glimpse will block up to the end of
restore (statahead/agl will block).

Signed-off-by: Fan Yong <fan.yong@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9319
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: https://review.whamcloud.com/26501
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lustre/llite/statahead.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
index 28e85bfb9b82..3d71322aa1c7 100644
--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -504,6 +504,19 @@ static void ll_agl_trigger(struct inode *inode, struct ll_statahead_info *sai)
 		return;
 	}
 
+	/* In case of restore, the MDT has the right size and has already
+	 * sent it back without granting the layout lock, inode is up-to-date.
+	 * Then AGL (async glimpse lock) is useless.
+	 * Also to glimpse we need the layout, in case of a running restore
+	 * the MDT holds the layout lock so the glimpse will block up to the
+	 * end of restore (statahead/agl will block)
+	 */
+	if (test_bit(LLIF_FILE_RESTORING, &lli->lli_flags)) {
+		lli->lli_agl_index = 0;
+		iput(inode);
+		return;
+	}
+
 	/* Someone is in glimpse (sync or async), do nothing. */
 	rc = down_write_trylock(&lli->lli_glimpse_sem);
 	if (rc == 0) {

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 9/9] lustre: statahead: add smp_mb() to serialize ops
  2018-11-23  7:15 [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS NeilBrown
                   ` (3 preceding siblings ...)
  2018-11-23  7:15 ` [lustre-devel] [PATCH 4/9] lustre: rename: DNE2 should return -EXDEV upon remote rename NeilBrown
@ 2018-11-23  7:15 ` NeilBrown
  2018-11-26  2:10   ` James Simmons
  2018-11-23  7:15 ` [lustre-devel] [PATCH 8/9] lustre: statahead: skip agl for the file in restoring NeilBrown
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2018-11-23  7:15 UTC (permalink / raw)
  To: lustre-devel

From: Lai Siyao <lai.siyao@intel.com>

ll_deauthorize_statahead() sets the thread stop flag and then wakes
up the thread. However, the wakeup is called inside the spinlock in
case ll_statahead_info is released, so we need to call smp_mb() to
serialize the setting and the wakeup.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-7994
Reviewed-on: https://review.whamcloud.com/23040
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lustre/llite/statahead.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
index 3d71322aa1c7..24c2335c70a7 100644
--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -1110,8 +1110,9 @@ static int ll_statahead_thread(void *arg)
 		sa_handle_callback(sai);
 
 		set_current_state(TASK_IDLE);
+		/* ensure we see the NULL stored by ll_deauthorize_statahead() */
 		if (!sa_has_callback(sai) &&
-		    sai->sai_task)
+		    smp_load_acquire(&sai->sai_task))
 			schedule();
 		__set_current_state(TASK_RUNNING);
 	}
@@ -1191,9 +1192,17 @@ void ll_deauthorize_statahead(struct inode *dir, void *key)
 		/*
 		 * statahead thread may not quit yet because it needs to cache
 		 * entries, now it's time to tell it to quit.
+		 *
+		 * In case sai is released, wake_up() is called inside spinlock,
+		 * so we use smp_store_release() to serialize ops.
 		 */
-		wake_up_process(sai->sai_task);
-		sai->sai_task = NULL;
+		struct task_struct *task = sai->sai_task;
+
+		/* ensure ll_statahead_thread sees the NULL before
+		 * calling schedule() again.
+		 */
+		smp_store_release(&sai->sai_task, NULL);
+		wake_up_process(task);
 	}
 	spin_unlock(&lli->lli_sa_lock);
 }

^ permalink raw reply related	[flat|nested] 25+ messages in thread
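Patch 9/9 pairs smp_store_release() in the waker with smp_load_acquire() in the thread. The closest portable analogue is C11 memory_order_release / memory_order_acquire, sketched below with pthreads. This is not kernel code and does not model schedule()/wake_up_process(): struct demo_msg and the demo_* functions are hypothetical, and the reader simply polls. What it does show is the guarantee the pairing gives: everything written before the release store is visible to a thread once its acquire load observes that store.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical message slot shared between two threads. */
struct demo_msg {
	int payload;		/* plain data, written before 'ready' */
	atomic_int ready;	/* published with release, read with acquire */
};

static void *demo_reader(void *arg)
{
	struct demo_msg *m = arg;

	/* acquire load pairs with the release store in the writer */
	while (!atomic_load_explicit(&m->ready, memory_order_acquire))
		sched_yield();
	/* the write to payload happened-before this read */
	return (void *)(intptr_t)m->payload;
}

/* Publish 'value' to a reader thread and return what the reader saw. */
static int demo_publish_and_read(int value)
{
	struct demo_msg m = { .payload = 0 };
	pthread_t tid;
	void *res = NULL;

	atomic_init(&m.ready, 0);
	assert(value >= 0);
	if (pthread_create(&tid, NULL, demo_reader, &m) != 0)
		return -1;
	m.payload = value;	/* ordinary store ... */
	atomic_store_explicit(&m.ready, 1, memory_order_release); /* ... then publish */
	pthread_join(tid, &res);
	return (int)(intptr_t)res;
}
```

In the patch the published value is the NULL stored to sai->sai_task, and the wakeup is issued only after that store, so the statahead thread cannot go back to sleep on a stale non-NULL pointer.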

* [lustre-devel] [PATCH 1/9] lustre: obdclass: fix formating of connection flags
  2018-11-23  7:15 ` [lustre-devel] [PATCH 1/9] lustre: obdclass: fix formating of connection flags NeilBrown
@ 2018-11-26  1:30   ` James Simmons
  0 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2018-11-26  1:30 UTC (permalink / raw)
  To: lustre-devel


> The current code puts the separator *only*
> at the start rather than *never* at the start.

Reviewed-by: James Simmons <jsimmons@infradead.org>
 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../lustre/lustre/obdclass/lprocfs_status.c        |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
> index feba2ef5a3bc..acfea7a44350 100644
> --- a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
> +++ b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
> @@ -709,14 +709,14 @@ static void obd_connect_seq_flags2str(struct seq_file *m, u64 flags,
>  	for (i = 0, mask = 1; i < 64; i++, mask <<= 1) {
>  		if (flags & mask) {
>  			seq_printf(m, "%s%s",
> -				   first ? sep : "", obd_connect_names[i]);
> +				   first ? "" : sep, obd_connect_names[i]);
>  			first = false;
>  		}
>  	}
>  
>  	if (flags & ~(mask - 1)) {
>  		seq_printf(m, "%sunknown flags %#llx",
> -			   first ? sep : "", flags & ~(mask - 1));
> +			   first ? "" : sep, flags & ~(mask - 1));
>  		first = false;
>  	}
>  
> 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 2/9] lustre: remove EIOCBRETRY handling
  2018-11-23  7:15 ` [lustre-devel] [PATCH 2/9] lustre: remove EIOCBRETRY handling NeilBrown
@ 2018-11-26  1:30   ` James Simmons
  0 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2018-11-26  1:30 UTC (permalink / raw)
  To: lustre-devel


> From: James Simmons <uja.ornl@yahoo.com>
> 
> With linux commit 41003a7bcfed ("aio: remove retry-based AIO")
> AIO retry handling was removed due to it being buggy and no
> one using it, including lustre. Since this is the case
> remove EIOCBRETRY since it is no longer in the kernel starting
> with version 3.18.

Reviewed-by: James Simmons <jsimmons@infradead.org>
 
> Signed-off-by: James Simmons <uja.ornl@yahoo.com>
> Reviewed-on: http://review.whamcloud.com/14507
> WC-bug-id: https://jira.whamcloud.com/browse/LU-6426
> Reviewed-by: Bob Glossman <bob.glossman@intel.com>
> Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
> Reviewed-by: Blake Caldwell <blakec@ornl.gov>
> Reviewed-by: Thomas Stibor <thomas@stibor.net>
> Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../staging/lustre/lustre/include/lustre_errno.h   |    1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/staging/lustre/lustre/include/lustre_errno.h b/drivers/staging/lustre/lustre/include/lustre_errno.h
> index 59fbb9f47ff1..806e79672482 100644
> --- a/drivers/staging/lustre/lustre/include/lustre_errno.h
> +++ b/drivers/staging/lustre/lustre/include/lustre_errno.h
> @@ -181,7 +181,6 @@
>  #define LUSTRE_EBADTYPE		527	/* Type not supported by server */
>  #define LUSTRE_EJUKEBOX		528	/* Request won't finish until timeout */
>  #define LUSTRE_EIOCBQUEUED	529	/* iocb queued await completion event */
> -#define LUSTRE_EIOCBRETRY	530	/* iocb queued, will trigger a retry */
>  
>  /*
>   * Translations are optimized away on x86.  Host errnos that shouldn't be put
> 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 4/9] lustre: rename: DNE2 should return -EXDEV upon remote rename
  2018-11-23  7:15 ` [lustre-devel] [PATCH 4/9] lustre: rename: DNE2 should return -EXDEV upon remote rename NeilBrown
@ 2018-11-26  1:31   ` James Simmons
  2018-11-26  3:00     ` NeilBrown
  0 siblings, 1 reply; 25+ messages in thread
From: James Simmons @ 2018-11-26  1:31 UTC (permalink / raw)
  To: lustre-devel



On Fri, 23 Nov 2018, NeilBrown wrote:

> From: Lai Siyao <lai.siyao@intel.com>
> 
> DNE2 MDS should return -EXDEV upon remote rename, so that an old
> client can do the rename with copy and delete instead of failing
> with -EREMOTE.

Let me guess: you were debugging the migration failures and found this :-)
I was doing the same thing.

Reviewed-by: James Simmons <jsimmons@infradead.org>
 
> Signed-off-by: Lai Siyao <lai.siyao@intel.com>
> Change-Id: I68e8e99259065922f31bee5343be309380715674
> WC-bug-id: https://jira.whamcloud.com/browse/LU-6660
> Reviewed-on: http://review.whamcloud.com/15323
> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
> Reviewed-by: wangdi <di.wang@intel.com>
> Reviewed-by: Fan Yong <fan.yong@intel.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  drivers/staging/lustre/lustre/lmv/lmv_obd.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
> index 32bb9fca88c9..7e4ffeb15a63 100644
> --- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
> +++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
> @@ -1945,7 +1945,7 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data,
>  	}
>  
>  	rc = md_rename(target_exp, op_data, old, oldlen, new, newlen, request);
> -	if (rc && rc != -EREMOTE)
> +	if (rc && rc != -EXDEV)
>  		return rc;
>  
>  	body = req_capsule_server_get(&(*request)->rq_pill, &RMF_MDT_BODY);
> 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 3/9] lustre: ptlrpc: use smp unsafe at_init only for initialization
  2018-11-23  7:15 ` [lustre-devel] [PATCH 3/9] lustre: ptlrpc: use smp unsafe at_init only for initialization NeilBrown
@ 2018-11-26  1:32   ` James Simmons
  0 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2018-11-26  1:32 UTC (permalink / raw)
  To: lustre-devel


> From: Vladimir Saveliev <vladimir.saveliev@seagate.com>
> 
> at_init() is not smp safe, so it is not supposed to be used anywhere
> but in at initialization.
> Add at_reinit() - safe version of at_init().

Reviewed-by: James Simmons <jsimmons@infradead.org>
 
> Signed-off-by: Vladimir Saveliev <vladimir.saveliev@seagate.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-6805
> Reviewed-on: http://review.whamcloud.com/15522
> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
> Reviewed-by: Chris Horn <hornc@cray.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../staging/lustre/lustre/include/lustre_import.h  |   19 +++++++++++++++++--
>  drivers/staging/lustre/lustre/ptlrpc/import.c      |    2 +-
>  2 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/include/lustre_import.h b/drivers/staging/lustre/lustre/include/lustre_import.h
> index a629f6bba814..8a8a125bd130 100644
> --- a/drivers/staging/lustre/lustre/include/lustre_import.h
> +++ b/drivers/staging/lustre/lustre/include/lustre_import.h
> @@ -331,12 +331,17 @@ static inline unsigned int at_timeout2est(unsigned int val)
>  	return (max((val << 2) / 5, 5U) - 4);
>  }
>  
> -static inline void at_reset(struct adaptive_timeout *at, int val)
> +static inline void at_reset_nolock(struct adaptive_timeout *at, int val)
>  {
> -	spin_lock(&at->at_lock);
>  	at->at_current = val;
>  	at->at_worst_ever = val;
>  	at->at_worst_time = ktime_get_real_seconds();
> +}
> +
> +static inline void at_reset(struct adaptive_timeout *at, int val)
> +{
> +	spin_lock(&at->at_lock);
> +	at_reset_nolock(at, val);
>  	spin_unlock(&at->at_lock);
>  }
>  
> @@ -348,6 +353,16 @@ static inline void at_init(struct adaptive_timeout *at, int val, int flags)
>  	at_reset(at, val);
>  }
>  
> +static inline void at_reinit(struct adaptive_timeout *at, int val, int flags)
> +{
> +	spin_lock(&at->at_lock);
> +	at->at_binstart = 0;
> +	memset(at->at_hist, 0, sizeof(at->at_hist));
> +	at->at_flags = flags;
> +	at_reset_nolock(at, val);
> +	spin_unlock(&at->at_lock);
> +}
> +
>  extern unsigned int at_min;
>  static inline int at_get(struct adaptive_timeout *at)
>  {
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/import.c b/drivers/staging/lustre/lustre/ptlrpc/import.c
> index 07dc87d9513e..480c860d066e 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/import.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/import.c
> @@ -1036,7 +1036,7 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
>  	 * The net statistics after (re-)connect is not valid anymore,
>  	 * because may reflect other routing, etc.
>  	 */
> -	at_init(&imp->imp_at.iat_net_latency, 0, 0);
> +	at_reinit(&imp->imp_at.iat_net_latency, 0, 0);
>  	ptlrpc_at_adj_net_latency(request,
>  				  lustre_msg_get_service_time(request->rq_repmsg));
>  
> 
> 
> 
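The locking split in this patch can be sketched outside the kernel. What follows is a minimal userspace model, not the Lustre code: the `sketch_` names, the `AT_BINS` value, and the C11 `atomic_flag` spinlock standing in for `spinlock_t` are assumptions made for illustration, and the body of `at_init()` is assumed as well, since the hunk above shows only its last line. The point is that `at_reinit()` clears the history under the existing lock instead of wiping and reinitialising a lock that other threads may already be using.

```c
#include <stdatomic.h>
#include <string.h>
#include <time.h>

#define AT_BINS 4 /* hypothetical history size */

/* Stand-in for the kernel spinlock_t (assumption: C11 atomic_flag). */
struct sketch_lock { atomic_flag held; };

static void sketch_lock_init(struct sketch_lock *l)
{
	atomic_flag_clear(&l->held);
}

static void sketch_lock_acquire(struct sketch_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin */
}

static void sketch_lock_release(struct sketch_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct adaptive_timeout {
	time_t at_binstart;
	unsigned int at_hist[AT_BINS];
	unsigned int at_flags;
	unsigned int at_current;
	unsigned int at_worst_ever;
	time_t at_worst_time;
	struct sketch_lock at_lock;
};

/* Caller must hold at->at_lock, or be the only user of *at. */
static void at_reset_nolock(struct adaptive_timeout *at, int val)
{
	at->at_current = val;
	at->at_worst_ever = val;
	at->at_worst_time = time(NULL);
}

/* First-time setup only: wipes the lock too, so not safe on a live object. */
static void at_init(struct adaptive_timeout *at, int val, int flags)
{
	memset(at, 0, sizeof(*at));
	sketch_lock_init(&at->at_lock);
	at->at_flags = flags;
	at_reset_nolock(at, val);
}

/* Safe on a live object: clears the history under the existing lock. */
static void at_reinit(struct adaptive_timeout *at, int val, int flags)
{
	sketch_lock_acquire(&at->at_lock);
	at->at_binstart = 0;
	memset(at->at_hist, 0, sizeof(at->at_hist));
	at->at_flags = flags;
	at_reset_nolock(at, val);
	sketch_lock_release(&at->at_lock);
}
```

In this model a caller would use `at_init()` once when the object is created and `at_reinit()` on every later reconnect.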

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 5/9] lustre: llite: clear LLIF_FILE_RESTORING when done
  2018-11-23  7:15 ` [lustre-devel] [PATCH 5/9] lustre: llite: clear LLIF_FILE_RESTORING when done NeilBrown
@ 2018-11-26  1:34   ` James Simmons
  0 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2018-11-26  1:34 UTC (permalink / raw)
  To: lustre-devel


> From: Bruno Faccini <bruno.faccini@intel.com>
> 
> Clear LLIF_FILE_RESTORING once the restore is done, so that the client
> starts glimpsing the updated attrs again.

Reviewed-by: James Simmons <jsimmons@infradead.org>
 
> Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-6560
> Reviewed-on: http://review.whamcloud.com/14609
> Reviewed-by: John L. Hammond <john.hammond@intel.com>
> Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
> Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  drivers/staging/lustre/lustre/llite/llite_lib.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
> index fac65845526a..b8693049f3b6 100644
> --- a/drivers/staging/lustre/lustre/llite/llite_lib.c
> +++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
> @@ -1905,8 +1905,14 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
>  	}
>  
>  	if (body->mbo_valid & OBD_MD_TSTATE) {
> +		/* Set LLIF_FILE_RESTORING if restore ongoing and
> +		 * clear it when done to ensure to start again
> +		 * glimpsing updated attrs
> +		 */
>  		if (body->mbo_t_state & MS_RESTORE)
>  			set_bit(LLIF_FILE_RESTORING, &lli->lli_flags);
> +		else
> +			clear_bit(LLIF_FILE_RESTORING, &lli->lli_flags);
>  	}
>  
>  	return 0;
> 
> 
> 
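The set-or-clear behaviour added by this hunk can be modelled in userspace. This is only a sketch under stated assumptions: the `sketch_` names and the constant values are placeholders rather than the real Lustre definitions, and C11 atomics stand in for the kernel's `set_bit()` and `clear_bit()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Placeholder values; the real constants live in the Lustre headers. */
#define SKETCH_OBD_MD_TSTATE  (1u << 0)
#define SKETCH_MS_RESTORE     (1u << 0)
#define SKETCH_FILE_RESTORING 0 /* bit number, as passed to set_bit() */

struct sketch_inode_info { atomic_ulong lli_flags; };
struct sketch_mdt_body { unsigned int mbo_valid; unsigned int mbo_t_state; };

static void sketch_update_restoring(struct sketch_inode_info *lli,
				    const struct sketch_mdt_body *body)
{
	unsigned long mask = 1UL << SKETCH_FILE_RESTORING;

	/* No transient state in this reply: leave the flag alone. */
	if (!(body->mbo_valid & SKETCH_OBD_MD_TSTATE))
		return;

	if (body->mbo_t_state & SKETCH_MS_RESTORE)
		atomic_fetch_or(&lli->lli_flags, mask);   /* like set_bit() */
	else
		atomic_fetch_and(&lli->lli_flags, ~mask); /* like clear_bit() */
}

static bool sketch_is_restoring(struct sketch_inode_info *lli)
{
	return atomic_load(&lli->lli_flags) & (1UL << SKETCH_FILE_RESTORING);
}
```

Without the `else` branch the flag would stay set after the first restore, which is the situation the patch description is addressing.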

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 6/9] lustre: obdclass: health_check to report unhealthy upon LBUG
  2018-11-23  7:15 ` [lustre-devel] [PATCH 6/9] lustre: obdclass: health_check to report unhealthy upon LBUG NeilBrown
  2018-11-26  1:46   ` James Simmons
@ 2018-11-26  1:46   ` James Simmons
  2018-11-27  2:32     ` NeilBrown
  1 sibling, 1 reply; 25+ messages in thread
From: James Simmons @ 2018-11-26  1:46 UTC (permalink / raw)
  To: lustre-devel


> From: Bruno Faccini <bruno.faccini@intel.com>
> 
> When a LBUG has occurred, without panic_on_lbug being set, health_check
> /proc file must return an unhealthy state.

I pushed this one to Greg, but it was disliked since it breaks the
one-item-per-file rule for sysfs. See

https://lore.kernel.org/patchwork/patch/755571

I did start a proper port to sysfs at
https://review.whamcloud.com/#/c/25631

but it needs to be updated. I do like Andreas' idea of a sysfs and debugfs
file, since lctl get_param will return the results from both together.
We could land it as is and update the sysfs handling at a later date
(shouldn't be too far down the road). Here is my review in case you want
to land it.

Reviewed-by: James Simmons <jsimmons@infradead.org>

> Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-7486
> Reviewed-on: http://review.whamcloud.com/17981
> Reviewed-by: Bobi Jam <bobijam@hotmail.com>
> Reviewed-by: Niu Yawei <yawei.niu@intel.com>
> Reviewed-by: James Simmons <uja.ornl@yahoo.com>
> Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  drivers/staging/lustre/lustre/obdclass/obd_sysfs.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/lustre/lustre/obdclass/obd_sysfs.c b/drivers/staging/lustre/lustre/obdclass/obd_sysfs.c
> index 6669c235dd51..5fd30a8e2b44 100644
> --- a/drivers/staging/lustre/lustre/obdclass/obd_sysfs.c
> +++ b/drivers/staging/lustre/lustre/obdclass/obd_sysfs.c
> @@ -173,8 +173,10 @@ health_check_show(struct kobject *kobj, struct attribute *attr, char *buf)
>  	int i;
>  	size_t len = 0;
>  
> -	if (libcfs_catastrophe)
> +	if (libcfs_catastrophe) {
>  		return sprintf(buf, "LBUG\n");
> +		healthy = false;
> +	}
>  
>  	read_lock(&obd_dev_lock);
>  	for (i = 0; i < class_devno_max(); i++) {
> 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 7/9] lustre: lnet: Stop MLX5 triggering a dump_cqe
  2018-11-23  7:15 ` [lustre-devel] [PATCH 7/9] lustre: lnet: Stop MLX5 triggering a dump_cqe NeilBrown
@ 2018-11-26  1:49   ` James Simmons
  2018-11-27  2:21     ` NeilBrown
  0 siblings, 1 reply; 25+ messages in thread
From: James Simmons @ 2018-11-26  1:49 UTC (permalink / raw)
  To: lustre-devel


> From: Doug Oucharek <doug.s.oucharek@intel.com>
> 
> We have found that MLX5 will trigger a dump_cqe if we don't
> invalidate the rkey on a newly allocated MR for FastReg usage.
> 
> This fix just tags the MR as invalid on its creation if we are
> using FastReg and that will force it to do an invalidate of the
> rkey on first usage.

I pushed this one already, see https://lkml.org/lkml/2018/3/16/1410.
Dan felt this was more of an InfiniBand layer bug that needed to be fixed.
It could be fixed upstream already; if it is not, then once this problem
is reported we will need to work with the rdma group to fix it.
 
> Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-8752
> Reviewed-on: https://review.whamcloud.com/24306
> Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
> Reviewed-by: Amir Shehata <amir.shehata@intel.com>
> Reviewed-by: James Simmons <uja.ornl@yahoo.com>
> Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> index ecdf4dee533d..a5eada8ee354 100644
> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> @@ -1483,7 +1483,12 @@ static int kiblnd_alloc_freg_pool(struct kib_fmr_poolset *fps,
>  			goto out_middle;
>  		}
>  
> -		frd->frd_valid = true;
> +		/*
> +		 * There appears to be a bug in MLX5 code where you must
> +		 * invalidate the rkey of a new FastReg pool before first
> +		 * using it. Thus, I am marking the FRD invalid here.
> +		 */
> +		frd->frd_valid = false;
>  
>  		list_add_tail(&frd->frd_list, &fpo->fast_reg.fpo_pool_list);
>  		fpo->fast_reg.fpo_pool_size++;
> 
> 
> 
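The effect described in the commit message can be pictured with a deliberately simplified model. It is not the o2iblnd code: the `sketch_` names are hypothetical, and the only thing modelled is that a descriptor marked invalid causes an invalidate request to be issued ahead of the registration on first use.

```c
#include <stdbool.h>

struct sketch_frd { bool frd_valid; };

/* Pool allocation after the patch: a new descriptor starts out invalid. */
static void sketch_frd_alloc(struct sketch_frd *frd)
{
	frd->frd_valid = false;
}

/*
 * Number of work requests the mapping path would post in this model:
 * a local invalidate first when the descriptor is not valid, then the
 * registration itself.
 */
static int sketch_frd_map_wr_count(const struct sketch_frd *frd)
{
	int n = 0;

	if (!frd->frd_valid)
		n++; /* invalidate the rkey before it is used */
	n++;         /* fast-register the memory region */
	return n;
}
```

With the pre-patch initial value of `true`, the first mapping in this model would skip the invalidate, which is the case the commit message reports as triggering the dump_cqe on MLX5.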

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 8/9] lustre: statahead: skip agl for the file in restoring
  2018-11-23  7:15 ` [lustre-devel] [PATCH 8/9] lustre: statahead: skip agl for the file in restoring NeilBrown
@ 2018-11-26  2:09   ` James Simmons
  2018-11-26  2:09   ` James Simmons
  1 sibling, 0 replies; 25+ messages in thread
From: James Simmons @ 2018-11-26  2:09 UTC (permalink / raw)
  To: lustre-devel


> From: Fan Yong <fan.yong@intel.com>
> 
> In case of restore, the MDT has the right size and has already sent
> it back without granting the layout lock, inode is up-to-date. Then
> AGL (async glimpse lock) is useless.
> 
> Also to glimpse we need the layout, in case of a running restore the
> MDT holds the layout lock so the glimpse will block up to the end of
> restore (statahead/agl will block).

Reviewed-by: James Simmons <jsimmons@infradead.org>
 
> Signed-off-by: Fan Yong <fan.yong@intel.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-9319
> Reviewed-by: Lai Siyao <lai.siyao@intel.com>
> Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
> Reviewed-by: John L. Hammond <john.hammond@intel.com>
> Reviewed-on: https://review.whamcloud.com/26501
> Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  drivers/staging/lustre/lustre/llite/statahead.c |   13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
> index 28e85bfb9b82..3d71322aa1c7 100644
> --- a/drivers/staging/lustre/lustre/llite/statahead.c
> +++ b/drivers/staging/lustre/lustre/llite/statahead.c
> @@ -504,6 +504,19 @@ static void ll_agl_trigger(struct inode *inode, struct ll_statahead_info *sai)
>  		return;
>  	}
>  
> +	/* In case of restore, the MDT has the right size and has already
> +	 * sent it back without granting the layout lock, inode is up-to-date.
> +	 * Then AGL (async glimpse lock) is useless.
> +	 * Also to glimpse we need the layout, in case of a running restore
> +	 * the MDT holds the layout lock so the glimpse will block up to the
> +	 * end of restore (statahead/agl will block)
> +	 */
> +	if (test_bit(LLIF_FILE_RESTORING, &lli->lli_flags)) {
> +		lli->lli_agl_index = 0;
> +		iput(inode);
> +		return;
> +	}
> +
>  	/* Someone is in glimpse (sync or async), do nothing. */
>  	rc = down_write_trylock(&lli->lli_glimpse_sem);
>  	if (rc == 0) {
> 
> 
> 
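The early return added to ll_agl_trigger() can be sketched as a small userspace model. The `sketch_` names and the plain fields standing in for the inode reference count and the LLIF_FILE_RESTORING bit are assumptions for illustration only.

```c
#include <stdbool.h>

struct sketch_inode {
	int refcount;           /* stands in for the reference dropped by iput() */
	bool restoring;         /* stands in for LLIF_FILE_RESTORING */
	unsigned int agl_index;
};

/* Returns true if an async glimpse would be issued for this inode. */
static bool sketch_agl_trigger(struct sketch_inode *inode)
{
	if (inode->restoring) {
		/* The size from the MDT is already right; a glimpse would only block. */
		inode->agl_index = 0;
		inode->refcount--; /* like iput(inode) */
		return false;
	}
	return true; /* the real function would go on to take the glimpse lock */
}
```

The reference is dropped on the early-return path because, as in the hunk above, the caller has handed over its reference to the trigger function.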

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 9/9] lustre: statahead: add smp_mb() to serialize ops
  2018-11-23  7:15 ` [lustre-devel] [PATCH 9/9] lustre: statahead: add smp_mb() to serialize ops NeilBrown
@ 2018-11-26  2:10   ` James Simmons
  0 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2018-11-26  2:10 UTC (permalink / raw)
  To: lustre-devel


> From: Lai Siyao <lai.siyao@intel.com>
> 
> ll_deauthorize_statahead() sets the thread stop flag and then wakes
> up the thread. The wakeup is called inside the spinlock in case
> ll_statahead_info is released, so we need to call smp_mb() to
> serialize the flag setting and the wakeup.

Reviewed-by: James Simmons <jsimmons@infradead.org>
 
> Signed-off-by: Lai Siyao <lai.siyao@intel.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-7994
> Reviewed-on: https://review.whamcloud.com/23040
> Reviewed-by: Fan Yong <fan.yong@intel.com>
> Reviewed-by: Bobi Jam <bobijam@hotmail.com>
> Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  drivers/staging/lustre/lustre/llite/statahead.c |   15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
> index 3d71322aa1c7..24c2335c70a7 100644
> --- a/drivers/staging/lustre/lustre/llite/statahead.c
> +++ b/drivers/staging/lustre/lustre/llite/statahead.c
> @@ -1110,8 +1110,9 @@ static int ll_statahead_thread(void *arg)
>  		sa_handle_callback(sai);
>  
>  		set_current_state(TASK_IDLE);
> +		/* ensure we see the NULL stored by ll_deauthorize_statahead() */
>  		if (!sa_has_callback(sai) &&
> -		    sai->sai_task)
> +		    smp_load_acquire(&sai->sai_task))
>  			schedule();
>  		__set_current_state(TASK_RUNNING);
>  	}
> @@ -1191,9 +1192,17 @@ void ll_deauthorize_statahead(struct inode *dir, void *key)
>  		/*
>  		 * statahead thread may not quit yet because it needs to cache
>  		 * entries, now it's time to tell it to quit.
> +		 *
> +		 * In case sai is released, wake_up() is called inside spinlock,
> +		 * so we use smp_store_release() to serialize ops.
>  		 */
> -		wake_up_process(sai->sai_task);
> -		sai->sai_task = NULL;
> +		struct task_struct *task = sai->sai_task;
> +
> +		/* ensure ll_statahead_thread sees the NULL before
> +		 * calling schedule() again.
> +		 */
> +		smp_store_release(&sai->sai_task, NULL);
> +		wake_up_process(task);
>  	}
>  	spin_unlock(&lli->lli_sa_lock);
>  }
> 
> 
> 
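The ordering this hunk establishes (save the task pointer, publish NULL, then wake) can be sketched with C11 atomics. This is a single-threaded userspace model under stated assumptions: the `sketch_` names are hypothetical, `memory_order_release` and `memory_order_acquire` stand in for `smp_store_release()` and `smp_load_acquire()`, and the wakeup is reduced to a counter. It shows the order of operations only and does not reproduce the race itself.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_task { int wakeups; };

struct sketch_sai { _Atomic(struct sketch_task *) sai_task; };

/* Stand-in for wake_up_process(); only counts calls in this model. */
static void sketch_wake_up(struct sketch_task *task)
{
	task->wakeups++;
}

/* Waiter side: sleep only while sai_task is still set. */
static bool sketch_should_sleep(struct sketch_sai *sai)
{
	return atomic_load_explicit(&sai->sai_task, memory_order_acquire) != NULL;
}

/* Stopper side: publish NULL first, then wake the saved task. */
static void sketch_deauthorize(struct sketch_sai *sai)
{
	struct sketch_task *task =
		atomic_load_explicit(&sai->sai_task, memory_order_relaxed);

	if (!task)
		return;
	atomic_store_explicit(&sai->sai_task, (struct sketch_task *)NULL,
			      memory_order_release);
	sketch_wake_up(task);
}
```

The local copy is needed because the pointer is cleared before the wakeup; with the old order (wake, then clear) a woken thread could re-check the pointer, still find it set, and go back to sleep.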

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 4/9] lustre: rename: DNE2 should return -EXDEV upon remote rename
  2018-11-26  1:31   ` James Simmons
@ 2018-11-26  3:00     ` NeilBrown
  0 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2018-11-26  3:00 UTC (permalink / raw)
  To: lustre-devel

On Mon, Nov 26 2018, James Simmons wrote:

> On Fri, 23 Nov 2018, NeilBrown wrote:
>
>> From: Lai Siyao <lai.siyao@intel.com>
>> 
>> DNE2 MDS should return -EXDEV upon remote rename, so that old
>> client can do rename with copy and delete, instead of fail
>> with -EREMOTE.
>
> Let me guess: you were debugging the migration failures and found this :-)
> I was doing the same thing.

Nope - I was just looking through all the missing patches to see if
anything was relevant. :-)

Thanks for all the "Reviewed-by"s !

NeilBrown

>
> Reviewed-by: James Simmons <jsimmons@infradead.org>
>  
>> Signed-off-by: Lai Siyao <lai.siyao@intel.com>
>> Change-Id: I68e8e99259065922f31bee5343be309380715674
>> WC-bug-id: https://jira.whamcloud.com/browse/LU-6660
>> Reviewed-on: http://review.whamcloud.com/15323
>> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
>> Reviewed-by: wangdi <di.wang@intel.com>
>> Reviewed-by: Fan Yong <fan.yong@intel.com>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  drivers/staging/lustre/lustre/lmv/lmv_obd.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
>> index 32bb9fca88c9..7e4ffeb15a63 100644
>> --- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
>> +++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
>> @@ -1945,7 +1945,7 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data,
>>  	}
>>  
>>  	rc = md_rename(target_exp, op_data, old, oldlen, new, newlen, request);
>> -	if (rc && rc != -EREMOTE)
>> +	if (rc && rc != -EXDEV)
>>  		return rc;
>>  
>>  	body = req_capsule_server_get(&(*request)->rq_pill, &RMF_MDT_BODY);
>> 
>> 
>> 
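The "rename with copy and delete" fallback that the commit message refers to is the usual userspace reaction to EXDEV. A minimal sketch follows, assuming hypothetical `sketch_` helpers; the rename function is passed in as a parameter purely so that the EXDEV path can be exercised on a single filesystem, and real tools would also preserve permissions and timestamps.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

typedef int (*sketch_rename_fn)(const char *src, const char *dst);

/* Copy src to dst byte for byte; returns 0 on success, -1 on error. */
static int sketch_copy_file(const char *src, const char *dst)
{
	char buf[4096];
	size_t n;
	int rc = 0;
	FILE *in = fopen(src, "rb");
	FILE *out;

	if (!in)
		return -1;
	out = fopen(dst, "wb");
	if (!out) {
		fclose(in);
		return -1;
	}
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			rc = -1;
			break;
		}
	}
	if (ferror(in))
		rc = -1;
	fclose(in);
	if (fclose(out) != 0)
		rc = -1;
	return rc;
}

/* Try a rename; on EXDEV fall back to copy followed by delete. */
static int sketch_move(const char *src, const char *dst,
		       sketch_rename_fn do_rename)
{
	if (do_rename(src, dst) == 0)
		return 0;
	if (errno != EXDEV)
		return -1; /* any other error is reported as is */
	if (sketch_copy_file(src, dst) != 0)
		return -1;
	return unlink(src);
}

/* Test double: behaves like a rename across a filesystem boundary. */
static int sketch_fake_exdev_rename(const char *src, const char *dst)
{
	(void)src;
	(void)dst;
	errno = EXDEV;
	return -1;
}
```

In normal use `rename` from `<stdio.h>` would be passed as `do_rename`. A caller that only understands EXDEV would treat -EREMOTE as a plain failure, which is why the commit message wants the MDS to return -EXDEV.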

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS
  2018-11-23  7:15 [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS NeilBrown
                   ` (8 preceding siblings ...)
  2018-11-23  7:15 ` [lustre-devel] [PATCH 7/9] lustre: lnet: Stop MLX5 triggering a dump_cqe NeilBrown
@ 2018-11-26  3:47 ` James Simmons
  9 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2018-11-26  3:47 UTC (permalink / raw)
  To: lustre-devel


> This is a miscellaneous set of patches for lustre-in-linux.
> Apart from the first, they are (based on) patches from OpenSFS lustre.
> I've been trawling through OpenSFS looking for patches that aren't
> in linux-lustre.  Most of what I find is only relevant on the
> server, but these seem to be appropriate for the client.
> In most cases, I don't really know what is going on, so they
> might be inappropriate for the client.  I'm hoping that more
> knowledgeable people can put me straight.
> 
> These are just the first few - I think there are quite a few more
> (before we get to e.g. pfl, which hasn't been attempted yet).  If
> I've got these mostly right, I'll continue looking through
> the rest of my list.

Just to let you know, I have ported the PFL patches to the linux client.
I'm currently testing to make sure it works correctly. I need
to work out some bugs, but the major porting is done :-)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 7/9] lustre: lnet: Stop MLX5 triggering a dump_cqe
  2018-11-26  1:49   ` James Simmons
@ 2018-11-27  2:21     ` NeilBrown
  0 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2018-11-27  2:21 UTC (permalink / raw)
  To: lustre-devel

On Mon, Nov 26 2018, James Simmons wrote:

>> From: Doug Oucharek <doug.s.oucharek@intel.com>
>> 
>> We have found that MLX5 will trigger a dump_cqe if we don't
>> invalidate the rkey on a newly allocated MR for FastReg usage.
>> 
>> This fix just tags the MR as invalid on its creation if we are
>> using FastReg and that will force it to do an invalidate of the
>> rkey on first usage.
>
> I pushed this one already, see https://lkml.org/lkml/2018/3/16/1410.
> Dan felt this was more of an InfiniBand layer bug that needed to be fixed.
> It could be fixed upstream already; if it is not, then once this problem
> is reported we will need to work with the rdma group to fix it.

Thanks.  I've dropped it for now.

If I had any idea about infiniband, I might look at the MLX driver - but
I don't :-(

NeilBrown

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 6/9] lustre: obdclass: health_check to report unhealthy upon LBUG
  2018-11-26  1:46   ` James Simmons
@ 2018-11-27  2:32     ` NeilBrown
  0 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2018-11-27  2:32 UTC (permalink / raw)
  To: lustre-devel

On Mon, Nov 26 2018, James Simmons wrote:

>> From: Bruno Faccini <bruno.faccini@intel.com>
>> 
>> When a LBUG has occurred, without panic_on_lbug being set, health_check
>> /proc file must return an unhealthy state.
>
> I pushed this one to Greg, but it was disliked since it breaks the
> one-item-per-file rule for sysfs. See
>
> https://lore.kernel.org/patchwork/patch/755571
>
> I did start a proper port to sysfs at
> https://review.whamcloud.com/#/c/25631
>
> but it needs to be updated. I do like Andreas' idea of a sysfs and debugfs
> file, since lctl get_param will return the results from both together.
> We could land it as is and update the sysfs handling at a later date
> (shouldn't be too far down the road). Here is my review in case you want
> to land it.
>
> Reviewed-by: James Simmons <jsimmons@infradead.org>
>

Thanks, but the patch as it stands is totally broken - I added code
immediately after a 'return'. :-(

I'll just discard this patch.

Thanks,
NeilBrown
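
The problem described in this reply is that a statement placed after `return` inside the same block is never executed, so `healthy = false` in the quoted hunk is dead code. A minimal userspace sketch of a variant that does reach the assignment is below. The `sketch_` name, the parameters, and the output strings are assumptions for illustration, not what the OpenSFS patch or the kernel file does, and the caller is assumed to pass a buffer large enough for both lines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical health report: notes an LBUG and then still falls
 * through to the overall verdict instead of returning early.
 */
static size_t sketch_health_check_show(bool catastrophe, bool devices_ok,
				       char *buf, size_t size)
{
	bool healthy = devices_ok;
	int len = 0;

	if (catastrophe) {
		len += snprintf(buf + len, size - len, "LBUG\n");
		healthy = false; /* reachable: no early return above */
	}
	len += snprintf(buf + len, size - len, "%s\n",
			healthy ? "healthy" : "NOT HEALTHY");
	return (size_t)len;
}
```

Note that emitting two lines like this would still run into the one-item-per-file objection raised earlier in the thread, so it only illustrates the control flow.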


>> Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
>> WC-bug-id: https://jira.whamcloud.com/browse/LU-7486
>> Reviewed-on: http://review.whamcloud.com/17981
>> Reviewed-by: Bobi Jam <bobijam@hotmail.com>
>> Reviewed-by: Niu Yawei <yawei.niu@intel.com>
>> Reviewed-by: James Simmons <uja.ornl@yahoo.com>
>> Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  drivers/staging/lustre/lustre/obdclass/obd_sysfs.c |    4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/staging/lustre/lustre/obdclass/obd_sysfs.c b/drivers/staging/lustre/lustre/obdclass/obd_sysfs.c
>> index 6669c235dd51..5fd30a8e2b44 100644
>> --- a/drivers/staging/lustre/lustre/obdclass/obd_sysfs.c
>> +++ b/drivers/staging/lustre/lustre/obdclass/obd_sysfs.c
>> @@ -173,8 +173,10 @@ health_check_show(struct kobject *kobj, struct attribute *attr, char *buf)
>>  	int i;
>>  	size_t len = 0;
>>  
>> -	if (libcfs_catastrophe)
>> +	if (libcfs_catastrophe) {
>>  		return sprintf(buf, "LBUG\n");
>> +		healthy = false;
>> +	}
>>  
>>  	read_lock(&obd_dev_lock);
>>  	for (i = 0; i < class_devno_max(); i++) {
>> 
>> 
>> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-11-27  2:32 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-23  7:15 [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS NeilBrown
2018-11-23  7:15 ` [lustre-devel] [PATCH 2/9] lustre: remove EIOCBRETRY handling NeilBrown
2018-11-26  1:30   ` James Simmons
2018-11-23  7:15 ` [lustre-devel] [PATCH 1/9] lustre: obdclass: fix formating of connection flags NeilBrown
2018-11-26  1:30   ` James Simmons
2018-11-23  7:15 ` [lustre-devel] [PATCH 3/9] lustre: ptlrpc: use smp unsafe at_init only for initialization NeilBrown
2018-11-26  1:32   ` James Simmons
2018-11-23  7:15 ` [lustre-devel] [PATCH 4/9] lustre: rename: DNE2 should return -EXDEV upon remote rename NeilBrown
2018-11-26  1:31   ` James Simmons
2018-11-26  3:00     ` NeilBrown
2018-11-23  7:15 ` [lustre-devel] [PATCH 9/9] lustre: statahead: add smp_mb() to serialize ops NeilBrown
2018-11-26  2:10   ` James Simmons
2018-11-23  7:15 ` [lustre-devel] [PATCH 8/9] lustre: statahead: skip agl for the file in restoring NeilBrown
2018-11-26  2:09   ` James Simmons
2018-11-26  2:09   ` James Simmons
2018-11-23  7:15 ` [lustre-devel] [PATCH 5/9] lustre: llite: clear LLIF_FILE_RESTORING when done NeilBrown
2018-11-26  1:34   ` James Simmons
2018-11-23  7:15 ` [lustre-devel] [PATCH 6/9] lustre: obdclass: health_check to report unhealthy upon LBUG NeilBrown
2018-11-26  1:46   ` James Simmons
2018-11-26  1:46   ` James Simmons
2018-11-27  2:32     ` NeilBrown
2018-11-23  7:15 ` [lustre-devel] [PATCH 7/9] lustre: lnet: Stop MLX5 triggering a dump_cqe NeilBrown
2018-11-26  1:49   ` James Simmons
2018-11-27  2:21     ` NeilBrown
2018-11-26  3:47 ` [lustre-devel] [PATCH 0/9] Assorted lustre patches - mostly from OpenSFS James Simmons
