lustre-devel-lustre.org archive mirror
* [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021
@ 2021-11-08 15:07 James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 01/15] lustre: sec: keep encryption context in xattr cache James Simmons
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Next batch of patches that landed in the OpenSFS tree, backported
to the native Linux client.

Alex Zhuravlev (1):
  lustre: mdc: add support for grant shrink

Alexander Zarochentsev (1):
  lustre: ptlrpc: recalc timer on EINPROGRESS reply

Andreas Dilger (3):
  lustre: ptlrpc: align function names with param names
  lnet: don't retry allocating router buffers
  lustre: obdclass: add start time to stats files

Andrew Perepechko (1):
  lustre: vfs: set_nlink() is not race-safe

Artem Blagodarenko (1):
  lnet: socklnd: lock ksnc_tx_queue list processing

Chris Horn (1):
  lnet: Fix reference leak in lnet_parse

Lai Siyao (2):
  lustre: dne: dir migrate in QOS mode
  lustre: lmv: update default LMV upon any change

Lei Feng (1):
  lustre: ptlrpc: remove LASSERT in nrs_polices debugfs handler

Sebastien Buisson (1):
  lustre: sec: keep encryption context in xattr cache

Sergey Cheremencev (1):
  lustre: lov: fix error handling in lov_new_pool

Serguei Smirnov (2):
  lnet: socklnd: default conns_per_peer to 0
  lnet: don't use hops to determine the route state

 fs/lustre/include/lprocfs_status.h         |   5 +-
 fs/lustre/include/lustre_lmv.h             |  12 +-
 fs/lustre/include/lustre_osc.h             |   3 +
 fs/lustre/include/obd.h                    |   2 +
 fs/lustre/llite/file.c                     |   3 +
 fs/lustre/llite/llite_internal.h           |   6 +-
 fs/lustre/llite/llite_lib.c                |   6 +-
 fs/lustre/llite/lproc_llite.c              |  23 ++--
 fs/lustre/llite/namei.c                    |  12 +-
 fs/lustre/llite/xattr_cache.c              |  36 +++++-
 fs/lustre/lmv/lmv_intent.c                 |   2 +
 fs/lustre/lmv/lmv_obd.c                    | 176 ++++++++++++++++++++++++++---
 fs/lustre/lov/lov_pool.c                   |   3 +-
 fs/lustre/mdc/lproc_mdc.c                  | 114 ++++++++++++++++++-
 fs/lustre/obdclass/genops.c                |   8 +-
 fs/lustre/obdclass/lprocfs_status.c        |  28 +++--
 fs/lustre/obdclass/lu_tgt_pool.c           |   9 +-
 fs/lustre/osc/lproc_osc.c                  |  15 +--
 fs/lustre/osc/osc_request.c                |   2 +
 fs/lustre/ptlrpc/client.c                  |   4 +-
 fs/lustre/ptlrpc/lproc_ptlrpc.c            | 100 ++++++++++------
 include/linux/lnet/lib-lnet.h              |   2 +-
 net/lnet/klnds/socklnd/socklnd_cb.c        |   5 +
 net/lnet/klnds/socklnd/socklnd_modparams.c |  10 +-
 net/lnet/lnet/lib-move.c                   |   2 +
 net/lnet/lnet/router.c                     |  39 ++++---
 26 files changed, 501 insertions(+), 126 deletions(-)

-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 01/15] lustre: sec: keep encryption context in xattr cache
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
@ 2021-11-08 15:07 ` James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 02/15] lustre: mdc: add support for grant shrink James Simmons
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

When an inode is being cleared, its xattr cache must be completely
wiped. But in case of lock cancel, we want to keep the encryption
context, as further processing might need to check it.
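
For readers unfamiliar with the pattern, here is a minimal user-space
sketch of "empty the cache but keep one named entry". It is not the
Lustre code: the demo_* names, the singly linked list, and the
DEMO_ENC_CTX_NAME string are assumptions made up for illustration, and
locking is left out entirely.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for LL_XATTR_NAME_ENCRYPTION_CONTEXT. */
#define DEMO_ENC_CTX_NAME "demo.encryption_ctx"

struct demo_xattr {
	struct demo_xattr *next;
	char *name;
};

/* Prepend a copy of @name to the list; returns 0, or -1 on OOM. */
static int demo_cache_add(struct demo_xattr **head, const char *name)
{
	struct demo_xattr *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->name = malloc(strlen(name) + 1);
	if (!e->name) {
		free(e);
		return -1;
	}
	strcpy(e->name, name);
	e->next = *head;
	*head = e;
	return 0;
}

/* Free every entry except the encryption context; returns entries kept. */
static int demo_cache_empty(struct demo_xattr **head)
{
	struct demo_xattr **link = head;
	int kept = 0;

	while (*link) {
		struct demo_xattr *e = *link;

		if (strcmp(e->name, DEMO_ENC_CTX_NAME) == 0) {
			link = &e->next;	/* keep this entry, move on */
			kept++;
			continue;
		}
		*link = e->next;		/* unlink before freeing */
		free(e->name);
		free(e);
	}
	return kept;
}
```

After demo_cache_empty() the list holds at most the encryption context
entry, which is the state the lock cancel path wants to leave behind.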

Fixes: b5de088eb4 ("lustre: sec: access to enc file's xattrs")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14989
Lustre-commit: 14b37c763c5751faf ("LU-14989 sec: keep encryption context in xattr cache")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/45148
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_internal.h |  1 +
 fs/lustre/llite/namei.c          |  2 +-
 fs/lustre/llite/xattr_cache.c    | 36 +++++++++++++++++++++++++++++++++++-
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index bd49228..bed0443 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -416,6 +416,7 @@ static inline void ll_layout_version_set(struct ll_inode_info *lli, u32 gen)
 }
 
 int ll_xattr_cache_destroy(struct inode *inode);
+int ll_xattr_cache_empty(struct inode *inode);
 
 int ll_xattr_cache_get(struct inode *inode, const char *name,
 		       char *buffer, size_t size, u64 valid);
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index f942179..fe7fdbb 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -248,7 +248,7 @@ static void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel)
 	}
 
 	if (bits & MDS_INODELOCK_XATTR) {
-		ll_xattr_cache_destroy(inode);
+		ll_xattr_cache_empty(inode);
 		bits &= ~MDS_INODELOCK_XATTR;
 	}
 
diff --git a/fs/lustre/llite/xattr_cache.c b/fs/lustre/llite/xattr_cache.c
index 0641f73..b044c89 100644
--- a/fs/lustre/llite/xattr_cache.c
+++ b/fs/lustre/llite/xattr_cache.c
@@ -272,6 +272,40 @@ int ll_xattr_cache_destroy(struct inode *inode)
 }
 
 /**
+ * ll_xattr_cache_empty - empty xattr cache for @inode
+ *
+ * Similar to ll_xattr_cache_destroy(), but preserves encryption context.
+ * So only LLIF_XATTR_CACHE_FILLED flag is cleared, but not LLIF_XATTR_CACHE.
+ */
+int ll_xattr_cache_empty(struct inode *inode)
+{
+	struct ll_inode_info *lli = ll_i2info(inode);
+	struct ll_xattr_entry *entry, *n;
+
+	down_write(&lli->lli_xattrs_list_rwsem);
+	if (!ll_xattr_cache_valid(lli) ||
+	    !ll_xattr_cache_filled(lli))
+		goto out_empty;
+
+	list_for_each_entry_safe(entry, n, &lli->lli_xattrs, xe_list) {
+		if (strcmp(entry->xe_name,
+			   LL_XATTR_NAME_ENCRYPTION_CONTEXT) == 0)
+			continue;
+
+		CDEBUG(D_CACHE, "delete: %s\n", entry->xe_name);
+		list_del(&entry->xe_list);
+		kfree(entry->xe_name);
+		kfree(entry->xe_value);
+		kmem_cache_free(xattr_kmem, entry);
+	}
+	clear_bit(LLIF_XATTR_CACHE_FILLED, &lli->lli_flags);
+
+out_empty:
+	up_write(&lli->lli_xattrs_list_rwsem);
+	return 0;
+}
+
+/**
  * Match or enqueue a PR lock.
  *
  * Find or request an LDLM lock with xattr data.
@@ -495,7 +529,7 @@ int ll_xattr_cache_get(struct inode *inode, const char *name, char *buffer,
 	 */
 	if ((valid & OBD_MD_FLXATTRLS ||
 	     strcmp(name, LL_XATTR_NAME_ENCRYPTION_CONTEXT) != 0) &&
-	    !ll_xattr_cache_valid(lli)) {
+	    !ll_xattr_cache_filled(lli)) {
 		up_read(&lli->lli_xattrs_list_rwsem);
 		rc = ll_xattr_cache_refill(inode);
 		if (rc)
-- 
1.8.3.1


* [lustre-devel] [PATCH 02/15] lustre: mdc: add support for grant shrink
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 01/15] lustre: sec: keep encryption context in xattr cache James Simmons
@ 2021-11-08 15:07 ` James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 03/15] lnet: Fix reference leak in lnet_parse James Simmons
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Alex Zhuravlev <bzzz@whamcloud.com>

Reuse the existing grant shrink mechanism from OSC.
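
The grant_shrink_interval store handler in the diff rejects malformed
input and a zero interval. A rough user-space sketch of that
validation follows; demo_parse_interval() is a hypothetical helper,
and strtoul() only approximates what kstrtouint() does in the kernel.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-zero unsigned interval.
 * Returns 0 and fills *out on success, -EINVAL on malformed input,
 * -ERANGE on zero or overflow.
 */
static int demo_parse_interval(const char *buf, unsigned int *out)
{
	char *end;
	unsigned long val;

	/* Refuse signs and leading whitespace that strtoul() would accept. */
	if (!isdigit((unsigned char)*buf))
		return -EINVAL;

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (*end == '\n')	/* writes from a shell usually end in a newline */
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (errno == ERANGE || val > UINT_MAX || val == 0)
		return -ERANGE;

	*out = (unsigned int)val;
	return 0;
}
```

On failure *out is left untouched, so a rejected write cannot change
the interval currently in use.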

WC-bug-id: https://jira.whamcloud.com/browse/LU-15010
Lustre-commit: 6e116213e3fd7d726 ("LU-15010 mdc: add support for grant shrink")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44956
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_osc.h |   2 +
 fs/lustre/llite/llite_lib.c    |   1 +
 fs/lustre/mdc/lproc_mdc.c      | 106 +++++++++++++++++++++++++++++++++++++++++
 fs/lustre/osc/osc_request.c    |   2 +
 4 files changed, 111 insertions(+)

diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index cdc9aae..49a5e3b 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -680,6 +680,8 @@ int osc_punch_send(struct obd_export *exp, struct obdo *oa,
 		   obd_enqueue_update_f upcall, void *cookie);
 int osc_fallocate_base(struct obd_export *exp, struct obdo *oa,
 		       obd_enqueue_update_f upcall, void *cookie, int mode);
+void osc_update_next_shrink(struct client_obd *cli);
+void osc_schedule_grant_work(void);
 
 /* osc_io.c */
 int osc_io_submit(const struct lu_env *env, const struct cl_io_slice *ios,
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index abd470a..51823b5 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -303,6 +303,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 				  OBD_CONNECT_SUBTREE	 |
 				  OBD_CONNECT_MULTIMODRPCS |
 				  OBD_CONNECT_GRANT_PARAM |
+				  OBD_CONNECT_GRANT_SHRINK |
 				  OBD_CONNECT_SHORTIO | OBD_CONNECT_FLAGS2;
 
 	data->ocd_connect_flags2 = OBD_CONNECT2_DIR_MIGRATE |
diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c
index d13a6b7..87beb1b 100644
--- a/fs/lustre/mdc/lproc_mdc.c
+++ b/fs/lustre/mdc/lproc_mdc.c
@@ -606,6 +606,108 @@ static ssize_t mdc_dom_min_repsize_seq_write(struct file *file,
 	{ NULL }
 };
 
+static ssize_t cur_lost_grant_bytes_show(struct kobject *kobj,
+					 struct attribute *attr,
+					 char *buf)
+{
+	struct obd_device *obd = container_of(kobj, struct obd_device,
+					      obd_kset.kobj);
+	struct client_obd *cli = &obd->u.cli;
+
+	return scnprintf(buf, PAGE_SIZE, "%lu\n", cli->cl_lost_grant);
+}
+LUSTRE_RO_ATTR(cur_lost_grant_bytes);
+
+static ssize_t cur_dirty_grant_bytes_show(struct kobject *kobj,
+					  struct attribute *attr,
+					  char *buf)
+{
+	struct obd_device *obd = container_of(kobj, struct obd_device,
+					      obd_kset.kobj);
+	struct client_obd *cli = &obd->u.cli;
+
+	return scnprintf(buf, PAGE_SIZE, "%lu\n", cli->cl_dirty_grant);
+}
+LUSTRE_RO_ATTR(cur_dirty_grant_bytes);
+
+static ssize_t grant_shrink_show(struct kobject *kobj, struct attribute *attr,
+				 char *buf)
+{
+	struct obd_device *obd = container_of(kobj, struct obd_device,
+					      obd_kset.kobj);
+	struct obd_import *imp;
+	ssize_t len;
+
+	with_imp_locked(obd, imp, len)
+		len = scnprintf(buf, PAGE_SIZE, "%d\n",
+				!imp->imp_grant_shrink_disabled &&
+				OCD_HAS_FLAG(&imp->imp_connect_data,
+					     GRANT_SHRINK));
+
+	return len;
+}
+
+static ssize_t grant_shrink_store(struct kobject *kobj, struct attribute *attr,
+				  const char *buffer, size_t count)
+{
+	struct obd_device *obd = container_of(kobj, struct obd_device,
+					      obd_kset.kobj);
+	struct obd_import *imp;
+	bool val;
+	int rc;
+
+	if (!obd)
+		return 0;
+
+	rc = kstrtobool(buffer, &val);
+	if (rc)
+		return rc;
+
+	with_imp_locked(obd, imp, rc) {
+		spin_lock(&imp->imp_lock);
+		imp->imp_grant_shrink_disabled = !val;
+		spin_unlock(&imp->imp_lock);
+	}
+
+	return rc ?: count;
+}
+LUSTRE_RW_ATTR(grant_shrink);
+
+static ssize_t grant_shrink_interval_show(struct kobject *kobj,
+					  struct attribute *attr,
+					  char *buf)
+{
+	struct obd_device *obd = container_of(kobj, struct obd_device,
+					      obd_kset.kobj);
+
+	return sprintf(buf, "%lld\n", obd->u.cli.cl_grant_shrink_interval);
+}
+
+static ssize_t grant_shrink_interval_store(struct kobject *kobj,
+					   struct attribute *attr,
+					   const char *buffer,
+					   size_t count)
+{
+	struct obd_device *obd = container_of(kobj, struct obd_device,
+					      obd_kset.kobj);
+	unsigned int val;
+	int rc;
+
+	rc = kstrtouint(buffer, 0, &val);
+	if (rc)
+		return rc;
+
+	if (val == 0)
+		return -ERANGE;
+
+	obd->u.cli.cl_grant_shrink_interval = val;
+	osc_update_next_shrink(&obd->u.cli);
+	osc_schedule_grant_work();
+
+	return count;
+}
+LUSTRE_RW_ATTR(grant_shrink_interval);
+
 static struct attribute *mdc_attrs[] = {
 	&lustre_attr_active.attr,
 	&lustre_attr_checksums.attr,
@@ -616,6 +718,10 @@ static ssize_t mdc_dom_min_repsize_seq_write(struct file *file,
 	&lustre_attr_mds_conn_uuid.attr,
 	&lustre_attr_conn_uuid.attr,
 	&lustre_attr_ping.attr,
+	&lustre_attr_grant_shrink.attr,
+	&lustre_attr_grant_shrink_interval.attr,
+	&lustre_attr_cur_lost_grant_bytes.attr,
+	&lustre_attr_cur_dirty_grant_bytes.attr,
 	NULL,
 };
 
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 22b7e5e..cf79808 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -770,6 +770,7 @@ void osc_update_next_shrink(struct client_obd *cli)
 	CDEBUG(D_CACHE, "next time %lld to shrink grant\n",
 	       cli->cl_next_shrink_grant);
 }
+EXPORT_SYMBOL(osc_update_next_shrink);
 
 static void __osc_update_grant(struct client_obd *cli, u64 grant)
 {
@@ -980,6 +981,7 @@ void osc_schedule_grant_work(void)
 	cancel_delayed_work_sync(&work);
 	schedule_work(&work.work);
 }
+EXPORT_SYMBOL(osc_schedule_grant_work);
 
 /**
  * Start grant thread for returing grant to server for idle clients.
-- 
1.8.3.1


* [lustre-devel] [PATCH 03/15] lnet: Fix reference leak in lnet_parse
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 01/15] lustre: sec: keep encryption context in xattr cache James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 02/15] lustre: mdc: add support for grant shrink James Simmons
@ 2021-11-08 15:07 ` James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 04/15] lnet: socklnd: lock ksnc_tx_queue list processing James Simmons
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

We need to drop the reference taken by lnet_nid2peerni_locked() if we
determine that the message must be dropped because of an asymmetric
route.
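
The general shape of the bug, as a self-contained sketch: a lookup
takes a reference, and an early return forgets to drop it. The demo_*
types and the plain int counter are assumptions for illustration only;
the real code uses lnet_peer_ni_decref_locked() under the net lock.

```c
#include <stdbool.h>

struct demo_peer {
	int refcount;
};

/* Lookup takes a reference that the caller must drop. */
static struct demo_peer *demo_peer_get(struct demo_peer *p)
{
	p->refcount++;
	return p;
}

static void demo_peer_put(struct demo_peer *p)
{
	p->refcount--;
}

/* Returns true if the message is accepted, false if it is dropped. */
static bool demo_parse(struct demo_peer *peer, bool route_is_symmetric)
{
	struct demo_peer *lpni = demo_peer_get(peer);

	if (!route_is_symmetric) {
		/* Early exit: drop the lookup reference before returning. */
		demo_peer_put(lpni);
		return false;
	}

	/* ... normal processing would use lpni here ... */
	demo_peer_put(lpni);
	return true;
}
```

Both exit paths leave the reference count where it started. Without
the put on the early exit, every dropped message would pin the peer.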

HPE-bug-id: LUS-9186
Fixes: 8d00758c6f ("lnet: Correct asymmetric route detection")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15039
Lustre-commit: e69eca08bce47bf85 ("LU-15039 lnet: Fix reference leak in lnet_parse")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/45067
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-move.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 2b38480..170d684 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -4433,6 +4433,8 @@ void lnet_monitor_thr_stop(void)
 			}
 		}
 		if (lnet_drop_asym_route && for_me && !found) {
+			/* Drop ref taken by lnet_nid2peerni_locked() */
+			lnet_peer_ni_decref_locked(lpni);
 			lnet_net_unlock(cpt);
 			/* we would not use from_nid to route a message to
 			 * src_nid
-- 
1.8.3.1


* [lustre-devel] [PATCH 04/15] lnet: socklnd: lock ksnc_tx_queue list processing
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
                   ` (2 preceding siblings ...)
  2021-11-08 15:07 ` [lustre-devel] [PATCH 03/15] lnet: Fix reference leak in lnet_parse James Simmons
@ 2021-11-08 15:07 ` James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 05/15] lustre: ptlrpc: align function names with param names James Simmons
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Artem Blagodarenko, Lustre Development List

From: Artem Blagodarenko <artem.blagodarenko@hpe.com>

A GPF (general protection fault) occurred in
ksocknal_find_timed_out_conn() while processing the ksnc_tx_queue
list.

Take the scheduler's kss_lock while processing this list.
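
A user-space sketch of the idea, assuming a C11 atomic_flag spin lock
in place of the kernel spinlock: the queue state and the deadline are
only read while the lock is held. All demo_* names are hypothetical.
Unlike the diff, the sketch computes the result under the lock and
unlocks once, so there is a single unlock site.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_conn {
	atomic_flag lock;	/* stands in for the scheduler lock */
	int tx_queued;		/* number of queued transmits */
	long tx_deadline;	/* absolute deadline, in seconds */
};

static void demo_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void demo_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* True if data is still queued and the deadline has passed. */
static bool demo_conn_timed_out(struct demo_conn *conn, long now)
{
	bool timed_out;

	demo_lock(&conn->lock);
	timed_out = conn->tx_queued > 0 && now >= conn->tx_deadline;
	demo_unlock(&conn->lock);

	return timed_out;
}
```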

HPE-bug-id: LUS-10248
Fixes: 3f8b895465 ("lnet: handle socklnd tx failure")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15076
Lustre-commit: 13c7c2e3c248c8cdb ("LU-15076 socklnd: lock ksnc_tx_queue list processing")
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/45179
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/socklnd/socklnd_cb.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index edc584a..b2a1267 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -2188,12 +2188,14 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 	/* We're called with a shared lock on ksnd_global_lock */
 	struct ksock_conn *conn;
 	struct ksock_tx *tx;
+	struct ksock_sched *sched;
 
 	list_for_each_entry(conn, &peer_ni->ksnp_conns, ksnc_list) {
 		int error;
 
 		/* Don't need the {get,put}connsock dance to deref ksnc_sock */
 		LASSERT(!conn->ksnc_closing);
+		sched = conn->ksnc_scheduler;
 
 		error = conn->ksnc_sock->sk->sk_err;
 		if (error) {
@@ -2234,6 +2236,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 			return conn;
 		}
 
+		spin_lock_bh(&sched->kss_lock);
 		if ((!list_empty(&conn->ksnc_tx_queue) ||
 		     conn->ksnc_sock->sk->sk_wmem_queued) &&
 		    ktime_get_seconds() >= conn->ksnc_tx_deadline) {
@@ -2249,8 +2252,10 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 			CNETERR("Timeout sending data to %s (%pISp) the network or that node may be down.\n",
 				libcfs_idstr(&peer_ni->ksnp_id),
 				&conn->ksnc_peeraddr);
+			spin_unlock_bh(&sched->kss_lock);
 			return conn;
 		}
+		spin_unlock_bh(&sched->kss_lock);
 	}
 
 	return NULL;
-- 
1.8.3.1


* [lustre-devel] [PATCH 05/15] lustre: ptlrpc: align function names with param names
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
                   ` (3 preceding siblings ...)
  2021-11-08 15:07 ` [lustre-devel] [PATCH 04/15] lnet: socklnd: lock ksnc_tx_queue list processing James Simmons
@ 2021-11-08 15:07 ` James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 06/15] lnet: don't retry allocating router buffers James Simmons
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

Change the internal function names for the ptlrpc proc tunables
to match the parameter names exposed to userspace.  Otherwise it
is needlessly complex to find the function that implements the
"nrs_policies" parameter, since the parameter use itself is wrapped
in a macro that generates the proc handling structure.

Clean up code style in related functions.
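
To illustrate why the names matter, here is a simplified sketch of a
token-pasting macro in the style of LDEBUGFS_SEQ_FOPS_RO(). The
DEMO_SEQ_FOPS_RO() macro and demo_* structures are assumptions and not
the real definitions; the point is only that the base name passed to
the macro determines both the handler name it expects and the
structure name it defines.

```c
struct demo_fops {
	int (*show)(void);
};

/* Hypothetical analogue of LDEBUGFS_SEQ_FOPS_RO(): pastes the base
 * name into the handler it references and the structure it defines.
 */
#define DEMO_SEQ_FOPS_RO(name)					\
	static const struct demo_fops name##_fops = {		\
		.show = name##_seq_show,			\
	}

static int demo_nrs_policies_seq_show(void)
{
	return 42;	/* placeholder payload */
}
DEMO_SEQ_FOPS_RO(demo_nrs_policies);

struct demo_var {
	const char *name;
	const struct demo_fops *fops;
};

/* The parameter name and the base name now match, so searching for
 * "nrs_policies" finds the table entry and the handler together.
 */
static const struct demo_var demo_vars[] = {
	{ .name = "nrs_policies", .fops = &demo_nrs_policies_fops },
};
```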

WC-bug-id: https://jira.whamcloud.com/browse/LU-14976
Lustre-commit: 7fe49f1e7cf0586da ("LU-14976 ptlrpc: align function names with param names")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44817
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/lproc_ptlrpc.c | 49 ++++++++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 23 deletions(-)

diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c
index b291374..0323291 100644
--- a/fs/lustre/ptlrpc/lproc_ptlrpc.c
+++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c
@@ -158,21 +158,22 @@
 
 const char *ll_opcode2str(u32 opcode)
 {
+	u32 offset = opcode_offset(opcode);
+
 	/* When one of the assertions below fail, chances are that:
 	 *     1) A new opcode was added in include/lustre/lustre_idl.h,
-	 *	but is missing from the table above.
+	 *	  but is missing from the table above.
 	 * or  2) The opcode space was renumbered or rearranged,
-	 *	and the opcode_offset() function in
-	 *	ptlrpc_internal.h needs to be modified.
+	 *	  and the opcode_offset() function in
+	 *	  ptlrpc_internal.h needs to be modified.
 	 */
-	u32 offset = opcode_offset(opcode);
-
 	LASSERTF(offset < LUSTRE_MAX_OPCODES,
 		 "offset %u >= LUSTRE_MAX_OPCODES %u\n",
 		 offset, LUSTRE_MAX_OPCODES);
 	LASSERTF(ll_rpc_opcode_table[offset].opcode == opcode,
 		 "ll_rpc_opcode_table[%u].opcode %u != opcode %u\n",
 		 offset, ll_rpc_opcode_table[offset].opcode, opcode);
+
 	return ll_rpc_opcode_table[offset].opname;
 }
 
@@ -249,7 +250,7 @@ static const char *ll_eopcode2str(u32 opcode)
 }
 
 static int
-ptlrpc_lprocfs_req_history_len_seq_show(struct seq_file *m, void *v)
+ptlrpc_lprocfs_req_buffer_history_len_seq_show(struct seq_file *m, void *v)
 {
 	struct ptlrpc_service *svc = m->private;
 	struct ptlrpc_service_part *svcpt;
@@ -260,13 +261,14 @@ static const char *ll_eopcode2str(u32 opcode)
 		total += svcpt->scp_hist_nrqbds;
 
 	seq_printf(m, "%d\n", total);
+
 	return 0;
 }
 
-LDEBUGFS_SEQ_FOPS_RO(ptlrpc_lprocfs_req_history_len);
+LDEBUGFS_SEQ_FOPS_RO(ptlrpc_lprocfs_req_buffer_history_len);
 
 static int
-ptlrpc_lprocfs_req_history_max_seq_show(struct seq_file *m, void *n)
+ptlrpc_lprocfs_req_buffer_history_max_seq_show(struct seq_file *m, void *n)
 {
 	struct ptlrpc_service *svc = m->private;
 	struct ptlrpc_service_part *svcpt;
@@ -281,9 +283,9 @@ static const char *ll_eopcode2str(u32 opcode)
 }
 
 static ssize_t
-ptlrpc_lprocfs_req_history_max_seq_write(struct file *file,
-					 const char __user *buffer,
-					 size_t count, loff_t *off)
+ptlrpc_lprocfs_req_buffer_history_max_seq_write(struct file *file,
+						const char __user *buffer,
+						size_t count, loff_t *off)
 {
 	struct seq_file *m = file->private_data;
 	struct ptlrpc_service *svc = m->private;
@@ -325,7 +327,7 @@ static const char *ll_eopcode2str(u32 opcode)
 	return count;
 }
 
-LDEBUGFS_SEQ_FOPS(ptlrpc_lprocfs_req_history_max);
+LDEBUGFS_SEQ_FOPS(ptlrpc_lprocfs_req_buffer_history_max);
 
 static int
 ptlrpc_lprocfs_req_buffers_max_seq_show(struct seq_file *m, void *n)
@@ -513,7 +515,7 @@ static void nrs_policy_get_info_locked(struct ptlrpc_nrs_policy *policy,
  * Reads and prints policy status information for all policies of a PTLRPC
  * service.
  */
-static int ptlrpc_lprocfs_nrs_seq_show(struct seq_file *m, void *n)
+static int ptlrpc_lprocfs_nrs_policies_seq_show(struct seq_file *m, void *n)
 {
 	struct ptlrpc_service *svc = m->private;
 	struct ptlrpc_service_part *svcpt;
@@ -660,11 +662,13 @@ static int ptlrpc_lprocfs_nrs_seq_show(struct seq_file *m, void *n)
 	return rc;
 }
 
+#define LPROCFS_NRS_WR_MAX_ARG (1024)
 /**
  * The longest valid command string is the maximum policy name size, plus the
  * length of the " reg" substring
  */
-#define LPROCFS_NRS_WR_MAX_CMD	(NRS_POL_NAME_MAX + sizeof(" reg") - 1)
+#define LPROCFS_NRS_WR_MAX_CMD	(NRS_POL_NAME_MAX + sizeof(" reg") - 1 + \
+				 LPROCFS_NRS_WR_MAX_ARG)
 
 /**
  * Starts and stops a given policy on a PTLRPC service.
@@ -673,9 +677,9 @@ static int ptlrpc_lprocfs_nrs_seq_show(struct seq_file *m, void *n)
  * if the optional token is omitted, the operation is performed on both the
  * regular and high-priority (if the service has one) NRS head.
  */
-static ssize_t ptlrpc_lprocfs_nrs_seq_write(struct file *file,
-					    const char __user *buffer,
-					    size_t count, loff_t *off)
+static ssize_t ptlrpc_lprocfs_nrs_policies_seq_write(struct file *file,
+						     const char __user *buffer,
+						     size_t count, loff_t *off)
 {
 	struct seq_file *m = file->private_data;
 	struct ptlrpc_service *svc = m->private;
@@ -753,7 +757,7 @@ static ssize_t ptlrpc_lprocfs_nrs_seq_write(struct file *file,
 	return rc < 0 ? rc : count;
 }
 
-LDEBUGFS_SEQ_FOPS(ptlrpc_lprocfs_nrs);
+LDEBUGFS_SEQ_FOPS(ptlrpc_lprocfs_nrs_policies);
 
 /** @} nrs */
 
@@ -777,8 +781,7 @@ struct ptlrpc_srh_iterator {
 		 * we're searching for a seq on or after it (i.e. more
 		 * recent), search from it onwards.
 		 * Since the service history is LRU (i.e. culled reqs will
-		 * be near the head), we shouldn't have to do long
-		 * re-scans
+		 * be near the head), we shouldn't have to do long re-scans
 		 */
 		LASSERTF(srhi->srhi_seq == srhi->srhi_req->rq_history_seq,
 			 "%s:%d: seek seq %llu, request seq %llu\n",
@@ -1136,16 +1139,16 @@ void ptlrpc_ldebugfs_register_service(struct dentry *entry,
 {
 	struct ldebugfs_vars lproc_vars[] = {
 		{ .name		= "req_buffer_history_len",
-		  .fops		= &ptlrpc_lprocfs_req_history_len_fops,
+		  .fops		= &ptlrpc_lprocfs_req_buffer_history_len_fops,
 		  .data		= svc },
 		{ .name		= "req_buffer_history_max",
-		  .fops		= &ptlrpc_lprocfs_req_history_max_fops,
+		  .fops		= &ptlrpc_lprocfs_req_buffer_history_max_fops,
 		  .data		= svc },
 		{ .name		= "timeouts",
 		  .fops		= &ptlrpc_lprocfs_timeouts_fops,
 		  .data		= svc },
 		{ .name		= "nrs_policies",
-		  .fops		= &ptlrpc_lprocfs_nrs_fops,
+		  .fops		= &ptlrpc_lprocfs_nrs_policies_fops,
 		  .data		= svc },
 		{ .name		= "req_buffers_max",
 		  .fops		= &ptlrpc_lprocfs_req_buffers_max_fops,
-- 
1.8.3.1


* [lustre-devel] [PATCH 06/15] lnet: don't retry allocating router buffers
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
                   ` (4 preceding siblings ...)
  2021-11-08 15:07 ` [lustre-devel] [PATCH 05/15] lustre: ptlrpc: align function names with param names James Simmons
@ 2021-11-08 15:07 ` James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 07/15] lustre: ptlrpc: recalc timer on EINPROGRESS reply James Simmons
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

Don't loop indefinitely trying to allocate router buffer pools if
the number of requested buffers is too large for the system.
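
The diff adds __GFP_NORETRY so that a page allocation can fail instead
of looping, and the caller then unwinds what it already allocated. A
user-space sketch of that fail-fast-and-unwind shape follows. The
demo_* names, the fixed page count, and the demo_alloc_budget failure
injection are all assumptions for illustration.

```c
#include <stdlib.h>

#define DEMO_NPAGES 4

struct demo_rtrbuf {
	void *pages[DEMO_NPAGES];
};

/* Hypothetical failure injection: a negative budget means unlimited,
 * otherwise it is the number of allocations left before failure.
 */
static int demo_alloc_budget = -1;

static void *demo_alloc_page(size_t size)
{
	if (demo_alloc_budget == 0)
		return NULL;	/* fail immediately, no retry loop */
	if (demo_alloc_budget > 0)
		demo_alloc_budget--;
	return malloc(size);
}

static struct demo_rtrbuf *demo_new_rtrbuf(void)
{
	struct demo_rtrbuf *rb = calloc(1, sizeof(*rb));
	int i;

	if (!rb)
		return NULL;

	for (i = 0; i < DEMO_NPAGES; i++) {
		rb->pages[i] = demo_alloc_page(4096);
		if (!rb->pages[i]) {
			/* Undo the partial work and report the failure. */
			while (--i >= 0)
				free(rb->pages[i]);
			free(rb);
			return NULL;
		}
	}
	return rb;
}

static void demo_free_rtrbuf(struct demo_rtrbuf *rb)
{
	int i;

	for (i = 0; i < DEMO_NPAGES; i++)
		free(rb->pages[i]);
	free(rb);
}
```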

WC-bug-id: https://jira.whamcloud.com/browse/LU-2084
Lustre-commit: 3038917f12a53b059 ("LU-2084 lnet: don't retry allocating router buffers")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45174
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/router.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index 6cfcead..7ce33eb 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -1245,18 +1245,19 @@ bool lnet_router_checker_active(void)
 	int sz = offsetof(struct lnet_rtrbuf, rb_kiov[npages]);
 	struct page *page;
 	struct lnet_rtrbuf *rb;
-	int i;
+	int i, node;
 
 	rb = kzalloc_cpt(sz, GFP_NOFS, cpt);
 	if (!rb)
 		return NULL;
 
+	node = cfs_cpt_spread_node(lnet_cpt_table(), cpt);
 	rb->rb_pool = rbp;
 
 	for (i = 0; i < npages; i++) {
-		page = alloc_pages_node(
-				cfs_cpt_spread_node(lnet_cpt_table(), cpt),
-				GFP_KERNEL | __GFP_ZERO, 0);
+		page = alloc_pages_node(node,
+					GFP_KERNEL | __GFP_ZERO | __GFP_NORETRY,
+					0);
 		if (!page) {
 			while (--i >= 0)
 				__free_page(rb->rb_kiov[i].bv_page);
@@ -1344,8 +1345,8 @@ bool lnet_router_checker_active(void)
 	while (num_rb-- > 0) {
 		rb = lnet_new_rtrbuf(rbp, cpt);
 		if (!rb) {
-			CERROR("Failed to allocate %d route bufs of %d pages\n",
-			       nbufs, npages);
+			CERROR("lnet: error allocating %ux%u page router buffers on CPT %u: rc = %d\n",
+			       nbufs, npages, cpt, -ENOMEM);
 
 			lnet_net_lock(cpt);
 			rbp->rbp_req_nbuffers = old_req_nbufs;
@@ -1496,8 +1497,11 @@ bool lnet_router_checker_active(void)
 	} else if (!strcmp(forwarding, "enabled")) {
 		/* explicitly enabled */
 	} else {
-		LCONSOLE_ERROR_MSG(0x10b, "'forwarding' not set to either 'enabled' or 'disabled'\n");
-		return -EINVAL;
+		rc = -EINVAL;
+		LCONSOLE_ERROR_MSG(0x10b,
+				   "lnet: forwarding='%s' not set to either 'enabled' or 'disabled': rc = %d\n",
+				   forwarding, rc);
+		return rc;
 	}
 
 	nrb_tiny = lnet_nrb_tiny_calculate();
@@ -1516,9 +1520,11 @@ bool lnet_router_checker_active(void)
 						LNET_NRBPOOLS *
 						sizeof(*the_lnet.ln_rtrpools[0]));
 	if (!the_lnet.ln_rtrpools) {
+		rc = -ENOMEM;
 		LCONSOLE_ERROR_MSG(0x10c,
-				   "Failed to initialize router buffe pool\n");
-		return -ENOMEM;
+			"lnet: error allocating router buffer pool: rc = %d\n",
+			rc);
+		return rc;
 	}
 
 	cfs_percpt_for_each(rtrp, i, the_lnet.ln_rtrpools) {
-- 
1.8.3.1


* [lustre-devel] [PATCH 07/15] lustre: ptlrpc: recalc timer on EINPROGRESS reply
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
                   ` (5 preceding siblings ...)
  2021-11-08 15:07 ` [lustre-devel] [PATCH 06/15] lnet: don't retry allocating router buffers James Simmons
@ 2021-11-08 15:07 ` James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 08/15] lustre: obdclass: add start time to stats files James Simmons
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexander Zarochentsev, Lustre Development List

From: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>

ptlrpcd doesn't recalculate the wait queue timer after
getting an -EINPROGRESS reply. This may delay the request
resend until the request times out.
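
The effect can be illustrated with a minimal userspace sketch. It is not
the ptlrpc code: struct sketch_req, sketch_next_wakeup() and
sketch_check_set() are hypothetical names, and the "deadline" model is an
assumption made only to show why a cached wake-up time goes stale once a
request is flagged for resend.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified request: only what the sketch needs. */
struct sketch_req {
	long deadline;	/* absolute time at which the request needs attention */
	int resend;	/* set when the reply asked the client to retry later */
};

/* Earliest deadline in the set, i.e. when the waiter should wake up next. */
static long sketch_next_wakeup(const struct sketch_req *reqs, size_t n)
{
	long next;
	size_t i;

	assert(n > 0);
	next = reqs[0].deadline;
	for (i = 1; i < n; i++)
		if (reqs[i].deadline < next)
			next = reqs[i].deadline;
	return next;
}

/*
 * One pass over the set.  A request flagged for resend gets a new
 * deadline (now + resend_delay), so the previously cached wake-up time
 * can no longer be trusted and is recomputed.  Without the flag the
 * cached value is returned unchanged.
 */
static long sketch_check_set(struct sketch_req *reqs, size_t n, long now,
			     long resend_delay, long cached_wakeup)
{
	int force_timer_recalc = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!reqs[i].resend)
			continue;
		reqs[i].deadline = now + resend_delay;
		reqs[i].resend = 0;
		force_timer_recalc = 1;
	}

	return force_timer_recalc ? sketch_next_wakeup(reqs, n) :
				    cached_wakeup;
}
```

If the recalculation were skipped, the waiter in this model would keep
sleeping until the old cached time, which corresponds to the delayed
resend described above.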

HPE-bug-id: LUS-10366
WC-bug-id: https://jira.whamcloud.com/browse/LU-15115
Lustre-commit: 9a5bace55a5ddb8a9 ("LU-15115 ptlrpc: recalc timer on EINPROGRESS reply")
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45266
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/client.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index e800000..dedb5db 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -2047,8 +2047,10 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
 				continue;
 
 			req->rq_status = after_reply(req);
-			if (req->rq_resend)
+			if (req->rq_resend) {
+				force_timer_recalc = 1;
 				continue;
+			}
 
 			/*
 			 * If there is no bulk associated with this request,
-- 
1.8.3.1


* [lustre-devel] [PATCH 08/15] lustre: obdclass: add start time to stats files
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
                   ` (6 preceding siblings ...)
  2021-11-08 15:07 ` [lustre-devel] [PATCH 07/15] lustre: ptlrpc: recalc timer on EINPROGRESS reply James Simmons
@ 2021-11-08 15:07 ` James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 09/15] lustre: dne: dir migrate in QOS mode James Simmons
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

When the stats files are initialized or reset, store the current
timestamp with the stats.  That allows computing average IO and
RPC rates over the accumulated stats lifetime, in addition to the
normal incremental operation rates found by comparing successive
values read from the stats file with the read interval.

Any stats that currently print the "snapshot_time:" header will
now also print "start_time:" and "elapsed_time:" fields.
Consolidate this printing into a helper function instead of
duplicating very similar code in many different functions.  Output
can't be exactly the same for all callers, because these fields are
embedded into different types of output files, but it is very close.
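
A rough userspace sketch of the header layout and of the lifetime rate it
enables is shown below. It assumes timestamps are plain nanosecond
counts; sketch_stats_header() and sketch_avg_rate() are hypothetical
names that only mirror the idea of the helper, not its kernel
implementation.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_NSEC_PER_SEC INT64_C(1000000000)

/*
 * Format the three header lines into buf.  Timestamps are nanosecond
 * counts (an assumption of this sketch).  Returns what snprintf()
 * returns: the number of bytes the full header needs.
 */
static int sketch_stats_header(char *buf, size_t len, int64_t now_ns,
			       int64_t init_ns, int width, const char *colon,
			       int show_units)
{
	const char *units = show_units ? " secs.nsecs" : "";
	int64_t elapsed_ns;

	assert(init_ns >= 0 && now_ns >= init_ns);
	elapsed_ns = now_ns - init_ns;

	return snprintf(buf, len,
			"%-*s%s %" PRId64 ".%09" PRId64 "%s\n"
			"%-*s%s %" PRId64 ".%09" PRId64 "%s\n"
			"%-*s%s %" PRId64 ".%09" PRId64 "%s\n",
			width, "snapshot_time", colon,
			now_ns / SKETCH_NSEC_PER_SEC,
			now_ns % SKETCH_NSEC_PER_SEC, units,
			width, "start_time", colon,
			init_ns / SKETCH_NSEC_PER_SEC,
			init_ns % SKETCH_NSEC_PER_SEC, units,
			width, "elapsed_time", colon,
			elapsed_ns / SKETCH_NSEC_PER_SEC,
			elapsed_ns % SKETCH_NSEC_PER_SEC, units);
}

/* Average events per second over the whole lifetime of the stats. */
static double sketch_avg_rate(uint64_t count, int64_t elapsed_ns)
{
	assert(elapsed_ns > 0);
	return (double)count * 1e9 / (double)elapsed_ns;
}
```

A reader of the stats file can do the same division with any counter
value and the printed elapsed_time, without having to sample the file
twice.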

WC-bug-id: https://jira.whamcloud.com/browse/LU-11407
Lustre-commit: ea2cd3af7bfabfa68 ("LU-11407 obdclass: add start time to stats files")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/33201
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lprocfs_status.h  |  5 ++++-
 fs/lustre/include/lustre_osc.h      |  1 +
 fs/lustre/include/obd.h             |  2 ++
 fs/lustre/llite/llite_internal.h    |  5 ++++-
 fs/lustre/llite/lproc_llite.c       | 23 ++++++++---------------
 fs/lustre/mdc/lproc_mdc.c           |  8 +++-----
 fs/lustre/obdclass/genops.c         |  8 ++------
 fs/lustre/obdclass/lprocfs_status.c | 28 ++++++++++++++++++++--------
 fs/lustre/osc/lproc_osc.c           | 15 +++++----------
 9 files changed, 49 insertions(+), 46 deletions(-)

diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h
index c8923c8..3e86e8e 100644
--- a/fs/lustre/include/lprocfs_status.h
+++ b/fs/lustre/include/lprocfs_status.h
@@ -209,6 +209,7 @@ struct lprocfs_stats {
 	/* 1 + the biggest cpu # whose ls_percpu slot has been allocated */
 	unsigned short			ls_biggest_alloc_num;
 	enum lprocfs_stats_flags	ls_flags;
+	ktime_t				ls_init;
 	/* Lock used when there are no percpu stats areas; For percpu stats,
 	 * it is used to protect ls_biggest_alloc_num change
 	 */
@@ -444,9 +445,11 @@ void ldebugfs_add_vars(struct dentry *parent, struct ldebugfs_vars *var,
 
 int lprocfs_obd_setup(struct obd_device *obd, bool uuid_only);
 int lprocfs_obd_cleanup(struct obd_device *obd);
+void lprocfs_stats_header(struct seq_file *seq, ktime_t now,
+			  ktime_t ts_init, int width, const char *colon,
+			  bool show_units);
 
 /* Generic callbacks */
-
 int ldebugfs_uint(struct seq_file *m, void *data);
 int lprocfs_wr_uint(struct file *file, const char __user *buffer,
 		    unsigned long count, void *data);
diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index 49a5e3b..4c5eb1f 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -114,6 +114,7 @@ struct osc_device {
 
 	/* Write stats is actually protected by client_obd's lock. */
 	struct osc_stats {
+		ktime_t		os_init;
 		u64		os_lockless_writes;	/* by bytes */
 		u64		os_lockless_reads;	/* by bytes */
 	} od_stats;
diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index b3ad511..27acd33 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -270,6 +270,7 @@ struct client_obd {
 	u32			cl_max_pages_per_rpc;
 	u32			cl_max_rpcs_in_flight;
 	u32			cl_max_short_io_bytes;
+	ktime_t			cl_stats_init;
 	struct obd_histogram    cl_read_rpc_hist;
 	struct obd_histogram    cl_write_rpc_hist;
 	struct obd_histogram    cl_read_page_hist;
@@ -330,6 +331,7 @@ struct client_obd {
 	u16			cl_close_rpcs_in_flight;
 	wait_queue_head_t	cl_mod_rpcs_waitq;
 	unsigned long	       *cl_mod_tag_bitmap;
+	ktime_t			cl_mod_rpcs_init;
 	struct obd_histogram	cl_mod_rpcs_hist;
 
 	/* mgc datastruct */
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index bed0443..7768c99 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -581,6 +581,7 @@ struct per_process_info {
 
 /* pp_extents[LL_PROCESS_HIST_MAX] will hold the combined process info */
 struct ll_rw_extents_info {
+	ktime_t pp_init;
 	struct per_process_info pp_extents[LL_PROCESS_HIST_MAX + 1];
 };
 
@@ -696,11 +697,13 @@ struct ll_sb_info {
 
 	struct lu_site		*ll_site;
 	struct cl_device	*ll_cl;
+
 	/* Statistics */
 	struct ll_rw_extents_info ll_rw_extents_info;
 	int			ll_extent_process_count;
-	struct ll_rw_process_info ll_rw_process_info[LL_PROCESS_HIST_MAX];
 	unsigned int		ll_offset_process_count;
+	ktime_t			ll_process_stats_init;
+	struct ll_rw_process_info ll_rw_process_info[LL_PROCESS_HIST_MAX];
 	struct ll_rw_process_info ll_rw_offset_info[LL_OFFSET_HIST_MAX];
 	unsigned int		ll_rw_offset_entry_count;
 	int			ll_stats_track_id;
diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index eac905d..a7eb8e1 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -1967,20 +1967,16 @@ static void ll_display_extents_info(struct ll_rw_extents_info *io_extents,
 
 static int ll_rw_extents_stats_pp_seq_show(struct seq_file *seq, void *v)
 {
-	struct timespec64 now;
 	struct ll_sb_info *sbi = seq->private;
 	struct ll_rw_extents_info *io_extents = &sbi->ll_rw_extents_info;
 	int k;
 
-	ktime_get_real_ts64(&now);
-
 	if (!sbi->ll_rw_stats_on) {
 		seq_puts(seq, "disabled\n"
 			 "write anything in this file to activate, then '0' or 'disabled' to deactivate\n");
 		return 0;
 	}
-	seq_printf(seq, "snapshot_time:	 %llu.%09lu (secs.usecs)\n",
-		   (s64)now.tv_sec, (unsigned long)now.tv_nsec);
+	lprocfs_stats_header(seq, ktime_get(), io_extents->pp_init, 25, ":", 1);
 	seq_printf(seq, "%15s %19s       | %20s\n", " ", "read", "write");
 	seq_printf(seq, "%13s   %14s %4s %4s  | %14s %4s %4s\n",
 		   "extents", "calls", "%", "cum%",
@@ -2019,6 +2015,7 @@ static ssize_t ll_rw_extents_stats_pp_seq_write(struct file *file,
 		sbi->ll_rw_stats_on = 1;
 
 	spin_lock(&sbi->ll_pp_extent_lock);
+	io_extents->pp_init = ktime_get();
 	for (i = 0; i < LL_PROCESS_HIST_MAX; i++) {
 		io_extents->pp_extents[i].pid = 0;
 		lprocfs_oh_clear(&io_extents->pp_extents[i].pp_r_hist);
@@ -2032,19 +2029,16 @@ static ssize_t ll_rw_extents_stats_pp_seq_write(struct file *file,
 
 static int ll_rw_extents_stats_seq_show(struct seq_file *seq, void *v)
 {
-	struct timespec64 now;
 	struct ll_sb_info *sbi = seq->private;
 	struct ll_rw_extents_info *io_extents = &sbi->ll_rw_extents_info;
 
-	ktime_get_real_ts64(&now);
-
 	if (!sbi->ll_rw_stats_on) {
 		seq_puts(seq, "disabled\n"
 			 "write anything in this file to activate, then '0' or 'disabled' to deactivate\n");
 		return 0;
 	}
-	seq_printf(seq, "snapshot_time:	 %llu.%09lu (secs.usecs)\n",
-		   (u64)now.tv_sec, (unsigned long)now.tv_nsec);
+
+	lprocfs_stats_header(seq, ktime_get(), io_extents->pp_init, 25, ":", 1);
 
 	seq_printf(seq, "%15s %19s       | %20s\n", " ", "read", "write");
 	seq_printf(seq, "%13s   %14s %4s %4s  | %14s %4s %4s\n",
@@ -2078,6 +2072,7 @@ static ssize_t ll_rw_extents_stats_seq_write(struct file *file,
 		sbi->ll_rw_stats_on = 1;
 
 	spin_lock(&sbi->ll_pp_extent_lock);
+	io_extents->pp_init = ktime_get();
 	for (i = 0; i <= LL_PROCESS_HIST_MAX; i++) {
 		io_extents->pp_extents[i].pid = 0;
 		lprocfs_oh_clear(&io_extents->pp_extents[i].pp_r_hist);
@@ -2196,23 +2191,20 @@ void ll_rw_stats_tally(struct ll_sb_info *sbi, pid_t pid,
 
 static int ll_rw_offset_stats_seq_show(struct seq_file *seq, void *v)
 {
-	struct timespec64 now;
 	struct ll_sb_info *sbi = seq->private;
 	struct ll_rw_process_info *offset = sbi->ll_rw_offset_info;
 	struct ll_rw_process_info *process = sbi->ll_rw_process_info;
 	int i;
 
-	ktime_get_real_ts64(&now);
-
 	if (!sbi->ll_rw_stats_on) {
 		seq_puts(seq, "disabled\n"
 			 "write anything in this file to activate, then 0 or \"[D/d]isabled\" to deactivate\n");
 		return 0;
 	}
 	spin_lock(&sbi->ll_process_lock);
+	lprocfs_stats_header(seq, ktime_get(), sbi->ll_process_stats_init, 25,
+			     ":", true);
 
-	seq_printf(seq, "snapshot_time:	 %llu.%09lu (secs.usecs)\n",
-		   (s64)now.tv_sec, (unsigned long)now.tv_nsec);
 	seq_printf(seq, "%3s %10s %14s %14s %17s %17s %14s\n",
 		   "R/W", "PID", "RANGE START", "RANGE END",
 		   "SMALLEST EXTENT", "LARGEST EXTENT", "OFFSET");
@@ -2270,6 +2262,7 @@ static ssize_t ll_rw_offset_stats_seq_write(struct file *file,
 	spin_lock(&sbi->ll_process_lock);
 	sbi->ll_offset_process_count = 0;
 	sbi->ll_rw_offset_entry_count = 0;
+	sbi->ll_process_stats_init = ktime_get();
 	memset(process_info, 0, sizeof(struct ll_rw_process_info) *
 	       LL_PROCESS_HIST_MAX);
 	memset(offset_info, 0, sizeof(struct ll_rw_process_info) *
diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c
index 87beb1b..fe93ccd 100644
--- a/fs/lustre/mdc/lproc_mdc.c
+++ b/fs/lustre/mdc/lproc_mdc.c
@@ -510,14 +510,10 @@ static int mdc_rpc_stats_seq_show(struct seq_file *seq, void *v)
 
 static int mdc_stats_seq_show(struct seq_file *seq, void *v)
 {
-	struct timespec64 now;
 	struct obd_device *obd = seq->private;
 	struct osc_stats *stats = &obd2osc_dev(obd)->od_stats;
 
-	ktime_get_real_ts64(&now);
-
-	seq_printf(seq, "snapshot_time:         %lld.%09lu (secs.nsecs)\n",
-		   (s64)now.tv_sec, now.tv_nsec);
+	lprocfs_stats_header(seq, ktime_get(), stats->os_init, 25, ":", true);
 	seq_printf(seq, "lockless_write_bytes\t\t%llu\n",
 		   stats->os_lockless_writes);
 	seq_printf(seq, "lockless_read_bytes\t\t%llu\n",
@@ -534,6 +530,8 @@ static ssize_t mdc_stats_seq_write(struct file *file,
 	struct osc_stats *stats = &obd2osc_dev(obd)->od_stats;
 
 	memset(stats, 0, sizeof(*stats));
+	stats->os_init = ktime_get();
+
 	return len;
 }
 LDEBUGFS_SEQ_FOPS(mdc_stats);
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index 4e89e0a..81e3498 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -1443,15 +1443,11 @@ int obd_set_max_mod_rpcs_in_flight(struct client_obd *cli, u16 max)
 int obd_mod_rpc_stats_seq_show(struct client_obd *cli, struct seq_file *seq)
 {
 	unsigned long mod_tot = 0, mod_cum;
-	struct timespec64 now;
 	int i;
 
-	ktime_get_real_ts64(&now);
-
 	spin_lock(&cli->cl_mod_rpcs_lock);
-
-	seq_printf(seq, "snapshot_time:		%llu.%9lu (secs.nsecs)\n",
-		   (s64)now.tv_sec, (unsigned long)now.tv_nsec);
+	lprocfs_stats_header(seq, ktime_get(), cli->cl_mod_rpcs_init, 25,
+			     ":", true);
 	seq_printf(seq, "modify_RPCs_in_flight:  %hu\n",
 		   cli->cl_mod_rpcs_in_flight);
 
diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c
index db809f3..335fc34 100644
--- a/fs/lustre/obdclass/lprocfs_status.c
+++ b/fs/lustre/obdclass/lprocfs_status.c
@@ -1366,6 +1366,24 @@ static void *lprocfs_stats_seq_next(struct seq_file *p, void *v, loff_t *pos)
 	return lprocfs_stats_seq_start(p, pos);
 }
 
+void lprocfs_stats_header(struct seq_file *seq, ktime_t now, ktime_t ts_init,
+			  int width, const char *colon, bool show_units)
+{
+	const char *units = show_units ? " secs.nsecs" : "";
+	struct timespec64 ts;
+
+	ts = ktime_to_timespec64(now);
+	seq_printf(seq, "%-*s%s %llu.%09lu%s\n", width,
+		   "snapshot_time", colon, (s64)ts.tv_sec, ts.tv_nsec, units);
+	ts = ktime_to_timespec64(ts_init);
+	seq_printf(seq, "%-*s%s %llu.%09lu%s\n", width,
+		   "start_time", colon, (s64)ts.tv_sec, ts.tv_nsec, units);
+	ts = ktime_to_timespec64(ktime_sub(now, ts_init));
+	seq_printf(seq, "%-*s%s %llu.%09lu%s\n", width,
+		   "elapsed_time", colon, (s64)ts.tv_sec, ts.tv_nsec, units);
+}
+EXPORT_SYMBOL(lprocfs_stats_header);
+
 /* seq file export of one lprocfs counter */
 static int lprocfs_stats_seq_show(struct seq_file *p, void *v)
 {
@@ -1374,14 +1392,8 @@ static int lprocfs_stats_seq_show(struct seq_file *p, void *v)
 	struct lprocfs_counter ctr;
 	int idx = *(loff_t *)v;
 
-	if (idx == 0) {
-		struct timespec64 now;
-
-		ktime_get_real_ts64(&now);
-		seq_printf(p, "%-25s %llu.%9lu secs.usecs\n",
-			   "snapshot_time",
-			   (s64)now.tv_sec, (unsigned long)now.tv_nsec);
-	}
+	if (idx == 0)
+		lprocfs_stats_header(p, ktime_get(), stats->ls_init, 25, "", 1);
 
 	hdr = &stats->ls_cnt_header[idx];
 	lprocfs_stats_collect(stats, idx, &ctr);
diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c
index f9878e0..54b86d1 100644
--- a/fs/lustre/osc/lproc_osc.c
+++ b/fs/lustre/osc/lproc_osc.c
@@ -695,18 +695,14 @@ static ssize_t grant_shrink_store(struct kobject *kobj, struct attribute *attr,
 
 static int osc_rpc_stats_seq_show(struct seq_file *seq, void *v)
 {
-	struct timespec64 now;
 	struct obd_device *obd = seq->private;
 	struct client_obd *cli = &obd->u.cli;
 	unsigned long read_tot = 0, write_tot = 0, read_cum, write_cum;
 	int i;
 
-	ktime_get_real_ts64(&now);
-
 	spin_lock(&cli->cl_loi_list_lock);
 
-	seq_printf(seq, "snapshot_time:	 %llu.%9lu (secs.usecs)\n",
-		   (s64)now.tv_sec, (unsigned long)now.tv_nsec);
+	lprocfs_stats_header(seq, ktime_get(), cli->cl_stats_init, 25, ":", 1);
 	seq_printf(seq, "read RPCs in flight:  %d\n",
 		   cli->cl_r_in_flight);
 	seq_printf(seq, "write RPCs in flight: %d\n",
@@ -806,6 +802,7 @@ static ssize_t osc_rpc_stats_seq_write(struct file *file,
 	lprocfs_oh_clear(&cli->cl_write_page_hist);
 	lprocfs_oh_clear(&cli->cl_read_offset_hist);
 	lprocfs_oh_clear(&cli->cl_write_offset_hist);
+	cli->cl_stats_init = ktime_get();
 
 	return len;
 }
@@ -814,14 +811,10 @@ static ssize_t osc_rpc_stats_seq_write(struct file *file,
 
 static int osc_stats_seq_show(struct seq_file *seq, void *v)
 {
-	struct timespec64 now;
 	struct obd_device *obd = seq->private;
 	struct osc_stats *stats = &obd2osc_dev(obd)->od_stats;
 
-	ktime_get_real_ts64(&now);
-
-	seq_printf(seq, "snapshot_time:	 %llu.%9lu (secs.usecs)\n",
-		   (s64)now.tv_sec, (unsigned long)now.tv_nsec);
+	lprocfs_stats_header(seq, ktime_get(), stats->os_init, 25, ":", true);
 	seq_printf(seq, "lockless_write_bytes\t\t%llu\n",
 		   stats->os_lockless_writes);
 	seq_printf(seq, "lockless_read_bytes\t\t%llu\n",
@@ -838,6 +831,8 @@ static ssize_t osc_stats_seq_write(struct file *file,
 	struct osc_stats *stats = &obd2osc_dev(obd)->od_stats;
 
 	memset(stats, 0, sizeof(*stats));
+	stats->os_init = ktime_get();
+
 	return len;
 }
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 09/15] lustre: dne: dir migrate in QOS mode
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
                   ` (7 preceding siblings ...)
  2021-11-08 15:07 ` [lustre-devel] [PATCH 08/15] lustre: obdclass: add start time to stats files James Simmons
@ 2021-11-08 15:07 ` James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 10/15] lustre: lov: fix error handling in lov_new_pool James Simmons
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

Support "lfs migrate -m -1 ..." to migrate a directory to MDTs chosen
by space and inode usage. If the system is balanced, the target MDT is
chosen in round-robin mode; otherwise one of the less full MDTs is
chosen, and the most full MDT is avoided.
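
The "less full" choice can be sketched in userspace as a weighted pick
that skips the target with the least available space. The function name
sketch_pick_less_full() and the flat array of "avail" values are
assumptions of this sketch; the random number is passed in so the
behaviour is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pick a target index weighted by available space, never returning the
 * target with the least available space (the "most full" one).
 *
 * avail[i] is the available space of target i, rand is a random value
 * in [0, sum of avail[] minus the minimum].  Returns the chosen index,
 * or -1 when no candidate qualifies, in which case a caller would fall
 * back to another policy such as round-robin.
 */
static int sketch_pick_less_full(const uint64_t *avail, size_t n,
				 uint64_t rand)
{
	uint64_t acc = 0;
	size_t min = 0;
	size_t i;

	assert(avail || n == 0);
	if (n == 0)
		return -1;

	/* find the most full target, i.e. the one with the least space */
	for (i = 1; i < n; i++)
		if (avail[i] < avail[min])
			min = i;

	/* walk the remaining targets until the running sum reaches rand */
	for (i = 0; i < n; i++) {
		if (i == min)
			continue;
		acc += avail[i];
		if (acc < rand)
			continue;
		return (int)i;
	}

	return -1;
}
```

With avail = {100, 10, 50}, index 1 is never returned, and index 0 is
chosen roughly twice as often as index 2 for a uniformly distributed
rand.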

Another minor change: if a directory is migrated to specific MDTs
and the target stripe count is more than 1, its subdirs may not be
migrated to the MDT specified in the command, but to the MDT where
the parent stripe is located (the subdir will be striped too),
which avoids unnecessary remote directories. NB, for a command like
"lfs migrate -m 0,1,2 ...", though the subdir may be located on
any of MDT0, MDT1 or MDT2, its stripes will be spread over these
three MDTs, but for a command like "lfs migrate -m 0 -c 3 ...", the
subdir may be striped over other MDTs if the subdir is not located on
MDT0.
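
For the "-m 0,1,2" case, the adjustment of the MDT list amounts to
swapping the parent stripe's MDT into the first slot. A minimal
userspace sketch follows, assuming the list is a plain array of MDT
indices; sketch_parent_mdt_first() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reorder a user-specified MDT list so that the MDT holding the parent
 * stripe comes first.  The set of MDTs is unchanged, only the starting
 * MDT differs.  Returns 0 on success or -ENODEV if the parent's MDT is
 * not in the list.
 */
static int sketch_parent_mdt_first(uint32_t *mdts, size_t count,
				   uint32_t parent_mdt)
{
	size_t i;

	assert(mdts && count > 0);

	for (i = 0; i < count; i++)
		if (mdts[i] == parent_mdt)
			break;

	if (i == count)
		return -ENODEV;

	/* swap the parent's MDT with whatever was listed first */
	mdts[i] = mdts[0];
	mdts[0] = parent_mdt;

	return 0;
}
```

For the list {0, 1, 2} and a parent stripe on MDT2, the result is
{2, 1, 0}: the subdir starts on MDT2 but is still striped over the same
three MDTs.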

WC-bug-id: https://jira.whamcloud.com/browse/LU-13076
Lustre-commit: 378c7567876b430d0 ("LU-13076 dne: dir migrate in QOS mode")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44886
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lmv/lmv_obd.c | 176 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 158 insertions(+), 18 deletions(-)

diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index fb64b6c..b31f943 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -1427,7 +1427,7 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data,
 	return md_close(tgt->ltd_exp, op_data, mod, request);
 }
 
-static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt,
+static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 mdt,
 					      unsigned short dir_depth)
 {
 	struct lu_tgt_desc *tgt, *cur = NULL;
@@ -1462,7 +1462,7 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt,
 
 		tgt->ltd_qos.ltq_usable = 1;
 		lu_tgt_qos_weight_calc(tgt);
-		if (tgt->ltd_index == *mdt)
+		if (tgt->ltd_index == mdt)
 			cur = tgt;
 		total_avail += tgt->ltd_qos.ltq_avail;
 		total_weight += tgt->ltd_qos.ltq_weight;
@@ -1477,7 +1477,6 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt,
 	       (total_usable * 256 * (1 + dir_depth / 4));
 	if (cur && cur->ltd_qos.ltq_avail >= rand) {
 		tgt = cur;
-		rc = 0;
 		goto unlock;
 	}
 
@@ -1491,9 +1490,7 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt,
 		if (cur_weight < rand)
 			continue;
 
-		*mdt = tgt->ltd_index;
 		ltd_qos_update(&lmv->lmv_mdt_descs, tgt, &total_weight);
-		rc = 0;
 		goto unlock;
 	}
 
@@ -1506,7 +1503,7 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt,
 	return tgt;
 }
 
-static struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt)
+static struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv)
 {
 	struct lu_tgt_desc *tgt;
 	int i;
@@ -1520,8 +1517,7 @@ static struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt)
 		if (!tgt || !tgt->ltd_exp || !tgt->ltd_active)
 			continue;
 
-		*mdt = tgt->ltd_index;
-		lmv->lmv_qos_rr_index = (*mdt + 1) %
+		lmv->lmv_qos_rr_index = (tgt->ltd_index + 1) %
 					lmv->lmv_mdt_descs.ltd_tgts_size;
 		spin_unlock(&lmv->lmv_lock);
 
@@ -1532,6 +1528,65 @@ static struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt)
 	return ERR_PTR(-ENODEV);
 }
 
+/* locate MDT which is less full (avoid the most full MDT) */
+static struct lu_tgt_desc *lmv_locate_tgt_lf(struct lmv_obd *lmv)
+{
+	struct lu_tgt_desc *min = NULL;
+	struct lu_tgt_desc *tgt;
+	u64 avail = 0;
+	u64 rand;
+
+	if (!ltd_qos_is_usable(&lmv->lmv_mdt_descs))
+		return ERR_PTR(-EAGAIN);
+
+	down_write(&lmv->lmv_qos.lq_rw_sem);
+
+	if (!ltd_qos_is_usable(&lmv->lmv_mdt_descs)) {
+		tgt = ERR_PTR(-EAGAIN);
+		goto unlock;
+	}
+
+	lmv_foreach_tgt(lmv, tgt) {
+		if (!tgt->ltd_exp || !tgt->ltd_active) {
+			tgt->ltd_qos.ltq_usable = 0;
+			continue;
+		}
+
+		tgt->ltd_qos.ltq_usable = 1;
+		lu_tgt_qos_weight_calc(tgt);
+		avail += tgt->ltd_qos.ltq_avail;
+		if (!min || min->ltd_qos.ltq_avail > tgt->ltd_qos.ltq_avail)
+			min = tgt;
+	}
+
+	/* avoid the most full MDT */
+	if (min)
+		avail -= min->ltd_qos.ltq_avail;
+
+	rand = lu_prandom_u64_max(avail);
+	avail = 0;
+	lmv_foreach_connected_tgt(lmv, tgt) {
+		if (!tgt->ltd_qos.ltq_usable)
+			continue;
+
+		if (tgt == min)
+			continue;
+
+		avail += tgt->ltd_qos.ltq_avail;
+		if (avail < rand)
+			continue;
+
+		goto unlock;
+	}
+
+	/* no proper target found */
+	tgt = ERR_PTR(-EAGAIN);
+unlock:
+	up_write(&lmv->lmv_qos.lq_rw_sem);
+
+	return tgt;
+}
+
 /* locate MDT by file name, for striped directory, the file name hash decides
  * which stripe its dirent is stored.
  */
@@ -1847,7 +1902,7 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 	} else if (lmv_op_qos_mkdir(op_data)) {
 		struct lmv_tgt_desc *tmp = tgt;
 
-		tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds,
+		tgt = lmv_locate_tgt_qos(lmv, op_data->op_mds,
 					 op_data->op_dir_depth);
 		if (tgt == ERR_PTR(-EAGAIN)) {
 			if (ltd_qos_is_balanced(&lmv->lmv_mdt_descs) &&
@@ -1858,11 +1913,12 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 				 */
 				tgt = tmp;
 			else
-				tgt = lmv_locate_tgt_rr(lmv, &op_data->op_mds);
+				tgt = lmv_locate_tgt_rr(lmv);
 		}
 		if (IS_ERR(tgt))
 			return PTR_ERR(tgt);
 
+		op_data->op_mds = tgt->ltd_index;
 		/*
 		 * only update statfs after QoS mkdir, this means the cached
 		 * statfs may be stale, and current mkdir may not follow QoS
@@ -2069,6 +2125,53 @@ static int lmv_link(struct obd_export *exp, struct md_op_data *op_data,
 	return md_link(tgt->ltd_exp, op_data, request);
 }
 
+/* migrate the top directory */
+static inline bool lmv_op_topdir_migrate(const struct md_op_data *op_data)
+{
+	if (!S_ISDIR(op_data->op_mode))
+		return false;
+
+	if (lmv_dir_layout_changing(op_data->op_mea1))
+		return false;
+
+	return true;
+}
+
+/* migrate top dir to specific MDTs */
+static inline bool lmv_topdir_specific_migrate(const struct md_op_data *op_data)
+{
+	const struct lmv_user_md *lum = op_data->op_data;
+
+	if (!lmv_op_topdir_migrate(op_data))
+		return false;
+
+	return le32_to_cpu(lum->lum_stripe_offset) != LMV_OFFSET_DEFAULT;
+}
+
+/* migrate top dir in QoS mode if user issued "lfs migrate -m -1..." */
+static inline bool lmv_topdir_qos_migrate(const struct md_op_data *op_data)
+{
+	const struct lmv_user_md *lum = op_data->op_data;
+
+	if (!lmv_op_topdir_migrate(op_data))
+		return false;
+
+	return le32_to_cpu(lum->lum_stripe_offset) == LMV_OFFSET_DEFAULT;
+}
+
+static inline bool lmv_subdir_specific_migrate(const struct md_op_data *op_data)
+{
+	const struct lmv_user_md *lum = op_data->op_data;
+
+	if (!S_ISDIR(op_data->op_mode))
+		return false;
+
+	if (!lmv_dir_layout_changing(op_data->op_mea1))
+		return false;
+
+	return le32_to_cpu(lum->lum_stripe_offset) != LMV_OFFSET_DEFAULT;
+}
+
 static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data,
 			const char *name, size_t namelen,
 			struct ptlrpc_request **request)
@@ -2133,19 +2236,56 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data,
 	if (IS_ERR(child_tgt))
 		return PTR_ERR(child_tgt);
 
-	/* for directory, migrate to MDT specified by lum_stripe_offset;
-	 * otherwise migrate to the target stripe of parent, but parent
-	 * directory may have finished migration (normally current file too),
-	 * allocate FID on MDT lum_stripe_offset, and server will check
-	 * whether file was migrated already.
-	 */
-	if (S_ISDIR(op_data->op_mode) || !tp_tgt) {
+	if (lmv_topdir_specific_migrate(op_data)) {
 		struct lmv_user_md *lum = op_data->op_data;
 
 		op_data->op_mds = le32_to_cpu(lum->lum_stripe_offset);
-	} else  {
+	} else if (lmv_topdir_qos_migrate(op_data)) {
+		tgt = lmv_locate_tgt_lf(lmv);
+		if (tgt == ERR_PTR(-EAGAIN))
+			tgt = lmv_locate_tgt_rr(lmv);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+
+		op_data->op_mds = tgt->ltd_index;
+	} else if (lmv_subdir_specific_migrate(op_data)) {
+		struct lmv_user_md *lum = op_data->op_data;
+		u32 i;
+
+		LASSERT(tp_tgt);
+		if (le32_to_cpu(lum->lum_magic) == LMV_USER_MAGIC_SPECIFIC) {
+			/* adjust MDTs in lum, since subdir is located on where
+			 * its parent stripe is, not the first specified MDT.
+			 */
+			for (i = 0; i < le32_to_cpu(lum->lum_stripe_count);
+			     i++) {
+				if (le32_to_cpu(lum->lum_objects[i].lum_mds) ==
+				    tp_tgt->ltd_index)
+					break;
+			}
+
+			if (i == le32_to_cpu(lum->lum_stripe_count))
+				return -ENODEV;
+
+			lum->lum_objects[i].lum_mds =
+				lum->lum_objects[0].lum_mds;
+			lum->lum_objects[0].lum_mds =
+				cpu_to_le32(tp_tgt->ltd_index);
+		}
+		/* NB, the above adjusts subdir migration for command like
+		 * "lfs migrate -m 0,1,2 ...", but for migration like
+		 * "lfs migrate -m 0 -c 2 ...", the top dir is migrated to MDT0
+		 * and MDT1, however its subdir may be migrated to MDT1 and MDT2
+		 */
+
+		lum->lum_stripe_offset = cpu_to_le32(tp_tgt->ltd_index);
 		op_data->op_mds = tp_tgt->ltd_index;
+	} else if (tp_tgt) {
+		op_data->op_mds = tp_tgt->ltd_index;
+	} else {
+		op_data->op_mds = sp_tgt->ltd_index;
 	}
+
 	rc = lmv_fid_alloc(NULL, exp, &target_fid, op_data);
 	if (rc)
 		return rc;
-- 
1.8.3.1


* [lustre-devel] [PATCH 10/15] lustre: lov: fix error handling in lov_new_pool
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
                   ` (8 preceding siblings ...)
  2021-11-08 15:07 ` [lustre-devel] [PATCH 09/15] lustre: dne: dir migrate in QOS mode James Simmons
@ 2021-11-08 15:07 ` James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 11/15] lustre: vfs: set_nlink() is not race-safe James Simmons
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Sergey Cheremencev, Lustre Development List

From: Sergey Cheremencev <sergey.cheremencev@hpe.com>

- correct error handling in lov_pool_new(): -ENOMEM
  from lu_tgt_pool_init() may cause an incorrect pool_count.
- optimisation in lu_tgt_pool_add(): do not extend
  a pool if the target already exists.
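
The second point can be sketched in userspace as "search first, grow
only when an insert will really happen". struct sketch_pool and the
sketch_pool_*() helpers are hypothetical, and realloc() stands in for
the kernel allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable array of target indices. */
struct sketch_pool {
	uint32_t *array;
	unsigned int count;	/* entries in use */
	unsigned int size;	/* entries allocated */
};

/* Grow the array only when it is full. */
static int sketch_pool_extend(struct sketch_pool *p, unsigned int min_count)
{
	unsigned int new_size;
	uint32_t *tmp;

	if (p->count < p->size)
		return 0;

	new_size = p->size * 2 > min_count ? p->size * 2 : min_count;
	if (new_size == 0)
		new_size = 1;

	tmp = realloc(p->array, new_size * sizeof(*tmp));
	if (!tmp)
		return -ENOMEM;

	p->array = tmp;
	p->size = new_size;
	return 0;
}

/* Add idx to the pool; a duplicate returns -EEXIST without growing it. */
static int sketch_pool_add(struct sketch_pool *p, uint32_t idx,
			   unsigned int min_count)
{
	unsigned int i;
	int rc;

	assert(p);

	/* look for the target before touching the allocation */
	for (i = 0; i < p->count; i++)
		if (p->array[i] == idx)
			return -EEXIST;

	rc = sketch_pool_extend(p, min_count);
	if (rc)
		return rc;

	p->array[p->count++] = idx;
	return 0;
}
```

With the extend step placed before the search, the duplicate add below
would have grown a full one-entry pool for no reason, and could even
have failed with -ENOMEM instead of reporting the duplicate.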

HPE-bug-id: LUS-6995
WC-bug-id: https://jira.whamcloud.com/browse/LU-15067
Lustre-commit: b6ac7490f3a30c80d ("LU-15067 lod: fix error handling in lod_new_pool")
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45137
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_pool.c         | 3 ++-
 fs/lustre/obdclass/lu_tgt_pool.c | 9 +++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/lustre/lov/lov_pool.c b/fs/lustre/lov/lov_pool.c
index 25e980f..267ac2d 100644
--- a/fs/lustre/lov/lov_pool.c
+++ b/fs/lustre/lov/lov_pool.c
@@ -270,7 +270,7 @@ int lov_pool_new(struct obd_device *obd, char *poolname)
 	atomic_set(&new_pool->pool_refcount, 1);
 	rc = lu_tgt_pool_init(&new_pool->pool_obds, 0);
 	if (rc)
-		goto out_err;
+		goto out_free_pool;
 
 	/* get ref for debugfs file */
 	lov_pool_getref(new_pool);
@@ -311,6 +311,7 @@ int lov_pool_new(struct obd_device *obd, char *poolname)
 	spin_unlock(&obd->obd_dev_lock);
 	debugfs_remove_recursive(new_pool->pool_debugfs_entry);
 	lu_tgt_pool_free(&new_pool->pool_obds);
+out_free_pool:
 	kfree(new_pool);
 
 	return rc;
diff --git a/fs/lustre/obdclass/lu_tgt_pool.c b/fs/lustre/obdclass/lu_tgt_pool.c
index 8f52fb4..17bae54 100644
--- a/fs/lustre/obdclass/lu_tgt_pool.c
+++ b/fs/lustre/obdclass/lu_tgt_pool.c
@@ -138,10 +138,6 @@ int lu_tgt_pool_add(struct lu_tgt_pool *op, u32 idx, unsigned int min_count)
 
 	down_write(&op->op_rw_sem);
 
-	rc = lu_tgt_pool_extend(op, min_count);
-	if (rc)
-		goto out;
-
 	/* search ost in pool array */
 	for (i = 0; i < op->op_count; i++) {
 		if (op->op_array[i] == idx) {
@@ -149,6 +145,11 @@ int lu_tgt_pool_add(struct lu_tgt_pool *op, u32 idx, unsigned int min_count)
 			goto out;
 		}
 	}
+
+	rc = lu_tgt_pool_extend(op, min_count);
+	if (rc)
+		goto out;
+
 	/* ost not found we add it */
 	op->op_array[op->op_count] = idx;
 	op->op_count++;
-- 
1.8.3.1


* [lustre-devel] [PATCH 11/15] lustre: vfs: set_nlink() is not race-safe
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
                   ` (9 preceding siblings ...)
  2021-11-08 15:07 ` [lustre-devel] [PATCH 10/15] lustre: lov: fix error handling in lov_new_pool James Simmons
@ 2021-11-08 15:07 ` James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 12/15] lustre: ptlrpc: remove LASSERT in nrs_polices debugfs handler James Simmons
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Andrew Perepechko, Lustre Development List

From: Andrew Perepechko <andrew.perepechko@hpe.com>

set_nlink() is not atomic with respect to concurrent calls to itself,
and the following warning may be triggered by VFS:

WARNING: CPU: 5 PID: 195090 at fs/inode.c:241 __destroy_inode+0xdb/0xf0

The exact nlink value that results from the race does not seem
important. However, we need to protect the superblock remove
counter.
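
The race is a check-then-act pattern. The userspace sketch below models
it under stated assumptions: struct sketch_inode is hypothetical, an
atomic_flag spinlock stands in for inode->i_lock, and the update logic
is a simplified model of how set_nlink() adjusts the remove counter when
the link count crosses zero, not a copy of the VFS code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the pieces of an inode the sketch needs. */
struct sketch_inode {
	atomic_flag lock;		/* stand-in for inode->i_lock */
	unsigned int nlink;
	atomic_long *remove_count;	/* stand-in for the per-superblock
					 * count of unlinked inodes */
};

static void sketch_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin */
}

static void sketch_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * Check-then-act update.  Two unlocked callers that both pass 0 could
 * both observe a non-zero nlink and both bump the counter, leaving it
 * permanently off by one.
 */
static void sketch_set_nlink_unlocked(struct sketch_inode *inode,
				      unsigned int nlink)
{
	if (nlink == 0) {
		if (inode->nlink != 0)
			atomic_fetch_add(inode->remove_count, 1);
	} else if (inode->nlink == 0) {
		atomic_fetch_sub(inode->remove_count, 1);
	}
	inode->nlink = nlink;
}

/* Serialized variant: the check and the update happen under the lock. */
static void sketch_set_nlink(struct sketch_inode *inode, unsigned int nlink)
{
	assert(inode && inode->remove_count);

	sketch_lock(&inode->lock);
	sketch_set_nlink_unlocked(inode, nlink);
	sketch_unlock(&inode->lock);
}
```

Under the lock, repeated calls with the same value leave the counter
balanced regardless of how callers interleave, which is the property the
locking in this patch is meant to restore.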

HPE-bug-id: LUS-9825
WC-bug-id: https://jira.whamcloud.com/browse/LU-15081
Lustre-commit: 12b05772fdb6d0808 ("LU-15081 vfs: set_nlink() is not race-safe")
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-on: https://review.whamcloud.com/45191
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c      |  3 +++
 fs/lustre/llite/llite_lib.c |  5 ++++-
 fs/lustre/llite/namei.c     | 10 ++++++++--
 fs/lustre/lmv/lmv_intent.c  |  2 ++
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 1e4ff49..6755671 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -5041,7 +5041,10 @@ static int ll_merge_md_attr(struct inode *inode)
 	if (rc)
 		return rc;
 
+	spin_lock(&inode->i_lock);
 	set_nlink(inode, attr.cat_nlink);
+	spin_unlock(&inode->i_lock);
+
 	inode->i_blocks = attr.cat_blocks;
 	i_size_write(inode, attr.cat_size);
 
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 51823b5..147e680 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -2448,8 +2448,11 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 		inode->i_gid = make_kgid(&init_user_ns, body->mbo_gid);
 	if (body->mbo_valid & OBD_MD_FLPROJID)
 		lli->lli_projid = body->mbo_projid;
-	if (body->mbo_valid & OBD_MD_FLNLINK)
+	if (body->mbo_valid & OBD_MD_FLNLINK) {
+		spin_lock(&inode->i_lock);
 		set_nlink(inode, body->mbo_nlink);
+		spin_unlock(&inode->i_lock);
+	}
 	if (body->mbo_valid & OBD_MD_FLRDEV)
 		inode->i_rdev = old_decode_dev(body->mbo_rdev);
 
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index fe7fdbb..a0192da 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -1867,8 +1867,11 @@ static int ll_unlink(struct inode *dir, struct dentry *dchild)
 	 * the link count so the inode can be freed immediately.
 	 */
 	body = req_capsule_server_get(&request->rq_pill, &RMF_MDT_BODY);
-	if (body->mbo_valid & OBD_MD_FLNLINK)
+	if (body->mbo_valid & OBD_MD_FLNLINK) {
+		spin_lock(&dchild->d_inode->i_lock);
 		set_nlink(dchild->d_inode, body->mbo_nlink);
+		spin_unlock(&dchild->d_inode->i_lock);
+	}
 
 	ll_update_times(request, dir);
 	ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_UNLINK,
@@ -1938,8 +1941,11 @@ static int ll_rmdir(struct inode *dir, struct dentry *dchild)
 		 * immediately.
 		 */
 		body = req_capsule_server_get(&request->rq_pill, &RMF_MDT_BODY);
-		if (body->mbo_valid & OBD_MD_FLNLINK)
+		if (body->mbo_valid & OBD_MD_FLNLINK) {
+			spin_lock(&dchild->d_inode->i_lock);
 			set_nlink(dchild->d_inode, body->mbo_nlink);
+			spin_unlock(&dchild->d_inode->i_lock);
+		}
 	}
 
 	ptlrpc_req_finished(request);
diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c
index 88201e6..93da2b3 100644
--- a/fs/lustre/lmv/lmv_intent.c
+++ b/fs/lustre/lmv/lmv_intent.c
@@ -252,7 +252,9 @@ int lmv_revalidate_slaves(struct obd_export *exp,
 
 			i_size_write(inode, body->mbo_size);
 			inode->i_blocks = body->mbo_blocks;
+			spin_lock(&inode->i_lock);
 			set_nlink(inode, body->mbo_nlink);
+			spin_unlock(&inode->i_lock);
 			inode->i_atime.tv_sec = body->mbo_atime;
 			inode->i_ctime.tv_sec = body->mbo_ctime;
 			inode->i_mtime.tv_sec = body->mbo_mtime;
-- 
1.8.3.1

* [lustre-devel] [PATCH 12/15] lustre: ptlrpc: remove LASSERT in nrs_polices debugfs handler
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
                   ` (10 preceding siblings ...)
  2021-11-08 15:07 ` [lustre-devel] [PATCH 11/15] lustre: vfs: set_nlink() is not race-safe James Simmons
@ 2021-11-08 15:07 ` James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 13/15] lnet: socklnd: default conns_per_peer to 0 James Simmons
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lei Feng, Lustre Development List

From: Lei Feng <flei@whamcloud.com>

It's not necessary to LASSERT() in the nrs_policies debugfs handler.
Logging with CERROR() and returning an error is good enough.
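
The pattern can be sketched in userspace as follows. This is an
illustration under assumptions, not the handler itself: model_pol_info,
MODEL_NAME_MAX and model_check_pol_info() are hypothetical names, and
fprintf() stands in for CERROR().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MODEL_NAME_MAX 16	/* hypothetical stand-in for the name limit */

/* Hypothetical reduced view of the per-policy info being compared. */
struct model_pol_info {
	char name[MODEL_NAME_MAX];
	char arg[MODEL_NAME_MAX];
	int fallback;
};

/*
 * Compare the info from the first service partition with a later one.
 * A mismatch is reported and turned into -EINVAL instead of asserting,
 * so a read of the debugfs file fails without crashing the node.
 */
static int model_check_pol_info(const struct model_pol_info *first,
				const struct model_pol_info *cur)
{
	assert(first && cur);

	if (strncmp(first->name, cur->name, sizeof(cur->name)) != 0) {
		fprintf(stderr, "failed to check name: rc = %d\n", -EINVAL);
		return -EINVAL;
	}
	if (strncmp(first->arg, cur->arg, sizeof(cur->arg)) != 0) {
		fprintf(stderr, "failed to check arg: rc = %d\n", -EINVAL);
		return -EINVAL;
	}
	if (first->fallback != cur->fallback) {
		fprintf(stderr, "failed to check fallback: rc = %d\n", -EINVAL);
		return -EINVAL;
	}
	return 0;
}
```

The caller is expected to release any locks it holds and free its buffers
on the error path before returning the code to the reader.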

WC-bug-id: https://jira.whamcloud.com/browse/LU-14587
Lustre-commit: 9997f94d4b6ee335d ("LU-14587 ptlrpc: remove LASSERT in nrs_polices proc handler")
Signed-off-by: Lei Feng <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45200
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/lproc_ptlrpc.c | 51 +++++++++++++++++++++++++++++++++--------
 1 file changed, 41 insertions(+), 10 deletions(-)

diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c
index 0323291..b2daf1f 100644
--- a/fs/lustre/ptlrpc/lproc_ptlrpc.c
+++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c
@@ -570,6 +570,8 @@ static int ptlrpc_lprocfs_nrs_policies_seq_show(struct seq_file *m, void *n)
 			if (i == 0) {
 				memcpy(infos[pol_idx].pi_name, tmp.pi_name,
 				       NRS_POL_NAME_MAX);
+				memcpy(infos[pol_idx].pi_arg, tmp.pi_arg,
+				       sizeof(tmp.pi_arg));
 				memcpy(&infos[pol_idx].pi_state, &tmp.pi_state,
 				       sizeof(tmp.pi_state));
 				infos[pol_idx].pi_fallback = tmp.pi_fallback;
@@ -578,17 +580,39 @@ static int ptlrpc_lprocfs_nrs_policies_seq_show(struct seq_file *m, void *n)
 				 * sanity-check the values we get.
 				 */
 			} else {
-				LASSERT(strncmp(infos[pol_idx].pi_name,
-						tmp.pi_name,
-						NRS_POL_NAME_MAX) == 0);
+				if (strncmp(infos[pol_idx].pi_name,
+					    tmp.pi_name,
+					    NRS_POL_NAME_MAX) != 0) {
+					spin_unlock(&nrs->nrs_lock);
+					rc = -EINVAL;
+					CERROR("%s: failed to check pi_name: rc = %d\n",
+					       svc->srv_thread_name, rc);
+					goto unlock;
+				}
+				if (strncmp(infos[pol_idx].pi_arg,
+					    tmp.pi_arg,
+					    sizeof(tmp.pi_arg)) != 0) {
+					spin_unlock(&nrs->nrs_lock);
+					rc = -EINVAL;
+					CERROR("%s: failed to check pi_arg: rc = %d\n",
+					       svc->srv_thread_name, rc);
+					goto unlock;
+				}
 				/**
-				 * Not asserting ptlrpc_nrs_pol_info::pi_state,
+				 * Not checking ptlrpc_nrs_pol_info::pi_state,
 				 * because it may be different between
 				 * instances of the same policy in different
 				 * service partitions.
 				 */
-				LASSERT(infos[pol_idx].pi_fallback ==
-					tmp.pi_fallback);
+
+				if (infos[pol_idx].pi_fallback !=
+				    tmp.pi_fallback) {
+					spin_unlock(&nrs->nrs_lock);
+					rc = -EINVAL;
+					CERROR("%s: failed to check pi_fallback: rc = %d\n",
+					       svc->srv_thread_name, rc);
+					goto unlock;
+				}
 			}
 
 			infos[pol_idx].pi_req_queued += tmp.pi_req_queued;
@@ -633,12 +657,18 @@ static int ptlrpc_lprocfs_nrs_policies_seq_show(struct seq_file *m, void *n)
 		   !hp ?  "\nregular_requests:" : "high_priority_requests:");
 
 	for (pol_idx = 0; pol_idx < num_pols; pol_idx++) {
-		seq_printf(m,  "  - name: %s\n"
-			       "    state: %s\n"
+		if (strlen(infos[pol_idx].pi_arg) > 0)
+			seq_printf(m, "  - name: %s %s\n",
+				   infos[pol_idx].pi_name,
+				   infos[pol_idx].pi_arg);
+		else
+			seq_printf(m, "  - name: %s\n",
+				   infos[pol_idx].pi_name);
+
+		seq_printf(m,  "    state: %s\n"
 			       "    fallback: %s\n"
 			       "    queued: %-20d\n"
 			       "    active: %-20d\n\n",
-			       infos[pol_idx].pi_name,
 			       nrs_state2str(infos[pol_idx].pi_state),
 			       infos[pol_idx].pi_fallback ? "yes" : "no",
 			       (int)infos[pol_idx].pi_req_queued,
@@ -655,8 +685,9 @@ static int ptlrpc_lprocfs_nrs_policies_seq_show(struct seq_file *m, void *n)
 		goto again;
 	}
 
-	kfree(infos);
 unlock:
+	kfree(infos);
+
 	mutex_unlock(&nrs_core.nrs_mutex);
 
 	return rc;
-- 
1.8.3.1

* [lustre-devel] [PATCH 13/15] lnet: socklnd: default conns_per_peer to 0
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
                   ` (11 preceding siblings ...)
  2021-11-08 15:07 ` [lustre-devel] [PATCH 12/15] lustre: ptlrpc: remove LASSERT in nrs_polices debugfs handler James Simmons
@ 2021-11-08 15:07 ` James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 14/15] lnet: don't use hops to determine the route state James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 15/15] lustre: lmv: update default LMV upon any change James Simmons
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

Setting conns_per_peer to 0 triggers socklnd to choose the
(heuristically) optimal setting for the interface given its speed.
Make 0 the default for socklnd conns_per_peer.
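
A rough sketch of the selection logic, assuming a configured value of 0
means "choose automatically". All model_* names are hypothetical, and the
speed-to-connections mapping below is an illustrative guess, not taken
from this patch.

```c
#include <assert.h>

/* Floor of log2(v); the caller guarantees v > 0. */
static int model_ilog2(unsigned int v)
{
	int n = 0;

	while ((v >>= 1) != 0)
		n++;
	return n;
}

/* Hypothetical mapping from link speed in Mbps to connections per peer. */
static int model_speed2cpp(int speed_mbps)
{
	if (speed_mbps < 1000)
		speed_mbps = 1000;
	return model_ilog2((unsigned int)speed_mbps / 1000) / 2 + 1;
}

/* Fall back to a single connection when the speed is unknown (<= 0). */
static int model_lookup_cpp(int speed_mbps)
{
	int cpp = 1;

	if (speed_mbps > 0)
		cpp = model_speed2cpp(speed_mbps);
	return cpp;
}

/* An explicit non-zero setting wins; 0 asks for the heuristic value. */
static int model_resolve_cpp(int configured, int speed_mbps)
{
	assert(configured >= 0);
	return configured ? configured : model_lookup_cpp(speed_mbps);
}
```

With 0 as the default, an administrator who sets nothing gets the
heuristic value, while an explicit setting is still honored as before.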

Fixes: 6374d25cfe ("lnet: socklnd: set conns_per_peer based on link speed")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15136
Lustre-commit: 30a028e2ee2b3eead ("LU-15136 socklnd: default conns_per_peer to 0")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45319
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h              |  2 +-
 net/lnet/klnds/socklnd/socklnd_modparams.c | 10 ++++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 890f61a..104c98d 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -87,7 +87,7 @@
 #define DEFAULT_CREDITS	256
 
 /* default number of connections per peer */
-#define DEFAULT_CONNS_PER_PEER	1
+#define DEFAULT_CONNS_PER_PEER	0
 
 int choose_ipv4_src(u32 *ret, int interface, u32 dst_ipaddr, struct net *ns);
 
diff --git a/net/lnet/klnds/socklnd/socklnd_modparams.c b/net/lnet/klnds/socklnd/socklnd_modparams.c
index 72f9df2..c00ea49 100644
--- a/net/lnet/klnds/socklnd/socklnd_modparams.c
+++ b/net/lnet/klnds/socklnd/socklnd_modparams.c
@@ -163,6 +163,10 @@ static int ksocklnd_ni_get_eth_intf_speed(struct lnet_ni *ni)
 	int intf_idx = -1;
 	int ret = -1;
 
+	/* check if ni has interface assigned */
+	if (!ni->ni_net_ns || !ni->ni_interface)
+		return 0;
+
 	rtnl_lock();
 	for_each_netdev(ni->ni_net_ns, dev) {
 		int flags = dev_get_flags(dev);
@@ -215,10 +219,12 @@ static int ksocklnd_speed2cpp(int speed)
 
 static int ksocklnd_lookup_conns_per_peer(struct lnet_ni *ni)
 {
-	int cpp = DEFAULT_CONNS_PER_PEER;
+	int cpp = 1;
 	int speed = ksocklnd_ni_get_eth_intf_speed(ni);
 
-	CDEBUG(D_NET, "intf %s speed %d\n", ni->ni_interface, speed);
+	if (ni->ni_interface)
+		CDEBUG(D_NET, "intf %s speed %d\n", ni->ni_interface, speed);
+
 	if (speed > 0)
 		cpp = ksocklnd_speed2cpp(speed);
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 14/15] lnet: don't use hops to determine the route state
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
                   ` (12 preceding siblings ...)
  2021-11-08 15:07 ` [lustre-devel] [PATCH 13/15] lnet: socklnd: default conns_per_peer to 0 James Simmons
@ 2021-11-08 15:07 ` James Simmons
  2021-11-08 15:07 ` [lustre-devel] [PATCH 15/15] lustre: lmv: update default LMV upon any change James Simmons
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

NodeA <-tcp1-> GW1 <-tcp2-> GW2 <-tcp3-> NodeB

Assuming GW1 knows how to reach tcp3 network and GW2 knows
how to reach tcp1 network, it should be possible to add routes
without specifying hop=2 on nodes A and B to reach tcp3 and tcp1
respectively and then be able to lnetctl ping between them.
Changes introduced by LU-13785 interpret default hops as equivalent
to an explicitly set hop=1 for the purpose of determining route
aliveness, which causes the routes created as described above to be
considered "down".

Fix it so that the default hop setting doesn't prevent
the multi-hop scenario from working.
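
A simplified sketch of the decision, assuming the only inputs are the
configured hop count, the discovered single-hop flag and whether the
gateway has an interface on the remote net. struct model_route, the
model_* helpers and MODEL_UNDEFINED_HOPS are hypothetical stand-ins; the
real aliveness check considers more state than this.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_UNDEFINED_HOPS (-1)	/* hypothetical: hops not configured */

struct model_route {
	int hops;			/* configured hop count */
	bool single_hop;		/* discovered: gateway sits on the remote net */
	bool gw_alive;			/* gateway itself is reachable */
	bool gw_has_remote_net;		/* gateway has an interface on the remote net */
};

/* Old rule: unset hops were treated like an explicit hops == 1. */
static bool model_check_remote_old(const struct model_route *r, bool avoid_asym)
{
	return avoid_asym &&
	       (r->hops == 1 || r->hops == MODEL_UNDEFINED_HOPS);
}

/* New rule: rely on the discovered single-hop flag instead. */
static bool model_check_remote_new(const struct model_route *r, bool avoid_asym)
{
	return avoid_asym && (r->hops == 1 || r->single_hop);
}

static bool model_route_alive(const struct model_route *r, bool avoid_asym,
			      bool use_new_rule)
{
	bool check;

	assert(r);
	if (!r->gw_alive)
		return false;

	check = use_new_rule ? model_check_remote_new(r, avoid_asym)
			     : model_check_remote_old(r, avoid_asym);
	if (check && !r->gw_has_remote_net)
		return false;
	return true;
}
```

For NodeA's route to tcp3 through GW1 with no hop count configured, the
gateway has no interface on tcp3, so the old rule reports the route as
down and the new rule leaves it up.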

Fixes: 64d703ca18 ("lnet: Use lr_hops for avoid_asym_router_failure")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14945
Lustre-commit: 3f2844dc9333c8645 ("LU-14945 lnet: don't use hops to determine the route state")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44674
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/router.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index 7ce33eb..97e5ab2 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -318,7 +318,7 @@ bool lnet_is_route_alive(struct lnet_route *route)
 	 * routes the next-hop will not have the remote net.
 	 */
 	if (avoid_asym_router_failure &&
-	    (route->lr_hops == 1 || route->lr_hops == LNET_UNDEFINED_HOPS)) {
+	    (route->lr_hops == 1 || route->lr_single_hop)) {
 		rlpn = lnet_peer_get_net_locked(gw, route->lr_net);
 		if (!rlpn)
 			return false;
@@ -470,8 +470,7 @@ bool lnet_is_route_alive(struct lnet_route *route)
 
 		route->lr_single_hop = single_hop;
 		if (avoid_asym_router_failure &&
-		    (route->lr_hops == 1 ||
-		     route->lr_hops == LNET_UNDEFINED_HOPS))
+		    (route->lr_hops == 1 || route->lr_single_hop))
 			lnet_set_route_aliveness(route, net_up);
 		else
 			lnet_set_route_aliveness(route, true);
@@ -764,6 +763,14 @@ static void lnet_shuffle_seed(void)
 	lnet_peer_ni_decref_locked(lpni);
 	lnet_net_unlock(LNET_LOCK_EX);
 
+	/* If avoid_asym_router_failure is enabled and hop count is not
+	 * set to 1 for a route that is actually single-hop, then the
+	 * feature will fail to prevent the router from being selected
+	 * if it is missing a NI on the remote network due to misconfiguration.
+	 */
+	if (avoid_asym_router_failure && hops == LNET_UNDEFINED_HOPS)
+		CWARN("Use hops = 1 for a single-hop route when avoid_asym_router_failure feature is enabled\n");
+
 	rc = 0;
 
 	if (!add_route) {
-- 
1.8.3.1

* [lustre-devel] [PATCH 15/15] lustre: lmv: update default LMV upon any change
  2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
                   ` (13 preceding siblings ...)
  2021-11-08 15:07 ` [lustre-devel] [PATCH 14/15] lnet: don't use hops to determine the route state James Simmons
@ 2021-11-08 15:07 ` James Simmons
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-11-08 15:07 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

max_inherit and max_inherit_rr were newly added but are missing
from lsm_md_eq(), so the client may not update the default LMV when
either of these two fields is changed.
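
The effect of a field missing from the comparison can be sketched like
this. The struct and helpers are hypothetical reductions for illustration
and are not the lmv_stripe_md layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced default-LMV descriptor. */
struct model_lmv {
	unsigned int stripe_count;
	int master_mdt_index;
	unsigned int hash_type;
	unsigned char max_inherit;
	unsigned char max_inherit_rr;
};

/* Equality has to cover every field that a user can change. */
static bool model_lmv_eq(const struct model_lmv *a, const struct model_lmv *b)
{
	assert(a && b);
	return a->stripe_count == b->stripe_count &&
	       a->master_mdt_index == b->master_mdt_index &&
	       a->hash_type == b->hash_type &&
	       a->max_inherit == b->max_inherit &&
	       a->max_inherit_rr == b->max_inherit_rr;
}

/*
 * Hypothetical caller: replace the cached copy only when it differs.
 * Returns true when the cache was refreshed.
 */
static bool model_update_default_lmv(struct model_lmv *cached,
				     const struct model_lmv *fresh)
{
	if (model_lmv_eq(cached, fresh))
		return false;
	*cached = *fresh;
	return true;
}
```

If the two max_inherit comparisons were dropped from model_lmv_eq(), a
change to only those fields would compare equal and the cached copy would
stay stale.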

WC-bug-id: https://jira.whamcloud.com/browse/LU-15070
Lustre-commit: f3314706b4e5c21f1 ("LU-15070 llite: update default LMV upon any change")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45237
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_lmv.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/include/lustre_lmv.h b/fs/lustre/include/lustre_lmv.h
index 6861dd0..b1d8ed9 100644
--- a/fs/lustre/include/lustre_lmv.h
+++ b/fs/lustre/include/lustre_lmv.h
@@ -93,6 +93,8 @@ static inline bool lmv_dir_bad_hash(const struct lmv_stripe_md *lsm)
 	    lsm1->lsm_md_stripe_count != lsm2->lsm_md_stripe_count ||
 	    lsm1->lsm_md_master_mdt_index != lsm2->lsm_md_master_mdt_index ||
 	    lsm1->lsm_md_hash_type != lsm2->lsm_md_hash_type ||
+	    lsm1->lsm_md_max_inherit != lsm2->lsm_md_max_inherit ||
+	    lsm1->lsm_md_max_inherit_rr != lsm2->lsm_md_max_inherit_rr ||
 	    lsm1->lsm_md_layout_version != lsm2->lsm_md_layout_version ||
 	    lsm1->lsm_md_migrate_offset !=
 				lsm2->lsm_md_migrate_offset ||
@@ -108,6 +110,12 @@ static inline bool lmv_dir_bad_hash(const struct lmv_stripe_md *lsm)
 				       &lsm2->lsm_md_oinfo[idx].lmo_fid))
 				return false;
 		}
+	} else if (lsm1->lsm_md_magic == LMV_USER_MAGIC_SPECIFIC) {
+		for (idx = 0; idx < lsm1->lsm_md_stripe_count; idx++) {
+			if (lsm1->lsm_md_oinfo[idx].lmo_mds !=
+			    lsm2->lsm_md_oinfo[idx].lmo_mds)
+				return false;
+		}
 	}
 
 	return true;
@@ -122,13 +130,13 @@ static inline void lsm_md_dump(int mask, const struct lmv_stripe_md *lsm)
 	 * terminated string so only print LOV_MAXPOOLNAME bytes.
 	 */
 	CDEBUG(mask,
-	       "magic %#x stripe count %d master mdt %d hash type %s:%#x max inherit %hhu version %d migrate offset %d migrate hash %#x pool %.*s\n",
+	       "magic %#x stripe count %d master mdt %d hash type %s:%#x max-inherit %hhu max-inherit-rr %hhu version %d migrate offset %d migrate hash %#x pool %.*s\n",
 	       lsm->lsm_md_magic, lsm->lsm_md_stripe_count,
 	       lsm->lsm_md_master_mdt_index,
 	       valid_hash ? "invalid hash" :
 			    mdt_hash_name[lsm->lsm_md_hash_type & (LMV_HASH_TYPE_MAX - 1)],
 	       lsm->lsm_md_hash_type, lsm->lsm_md_max_inherit,
-	       lsm->lsm_md_layout_version,
+	       lsm->lsm_md_max_inherit_rr, lsm->lsm_md_layout_version,
 	       lsm->lsm_md_migrate_offset, lsm->lsm_md_migrate_hash,
 	       LOV_MAXPOOLNAME, lsm->lsm_md_pool_name);
 
-- 
1.8.3.1
