From: James Simmons <jsimmons@infradead.org>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] [PATCH 22/23] lnet: Have LNet routers monitor the ni_fatal flag
Date: Tue, 11 Aug 2020 08:20:18 -0400
Message-ID: <1597148419-20629-23-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1597148419-20629-1-git-send-email-jsimmons@infradead.org>

From: Chris Horn <chris.horn@hpe.com>

Have the LNet monitor thread on LNet routers check each local NI's
ni_fatal_error_on flag and mark the NI down when the flag is set.
When this changes an NI's status, perform a discovery push to all
peers so that they can update their route status accordingly.

HPE-bug-id: LUS-9068
WC-bug-id: https://jira.whamcloud.com/browse/LU-13782
Lustre-commit: 7e0ec0f809ea1 ("LU-13782 lnet: Have LNet routers monitor the ni_fatal flag")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39353
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h | 29 +++++++++++++++++++++++++++++
 net/lnet/lnet/lib-move.c      |  6 +-----
 net/lnet/lnet/router.c        | 35 ++++++++++++++++++++++++-----------
 3 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 299ecf5..d2a39f6 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -98,6 +98,35 @@
 extern struct kmem_cache *lnet_rspt_cachep;
 extern struct kmem_cache *lnet_msg_cachep;
 
+static inline bool
+lnet_ni_set_status_locked(struct lnet_ni *ni, u32 status)
+__must_hold(&ni->ni_lock)
+{
+	bool update = false;
+
+	if (ni->ni_status && ni->ni_status->ns_status != status) {
+		CDEBUG(D_NET, "ni %s status changed from %#x to %#x\n",
+		       libcfs_nid2str(ni->ni_nid),
+		       ni->ni_status->ns_status, status);
+		ni->ni_status->ns_status = status;
+		update = true;
+	}
+
+	return update;
+}
+
+static inline bool
+lnet_ni_set_status(struct lnet_ni *ni, u32 status)
+{
+	bool update;
+
+	spin_lock(&ni->ni_lock);
+	update = lnet_ni_set_status_locked(ni, status);
+	spin_unlock(&ni->ni_lock);
+
+	return update;
+}
+
 bool lnet_is_route_alive(struct lnet_route *route);
 bool lnet_is_gateway_alive(struct lnet_peer *gw);
 
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 85b6453..f521817 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -4012,11 +4012,7 @@ void lnet_monitor_thr_stop(void)
 		spin_lock(&ni->ni_net->net_lock);
 		ni->ni_net->net_last_alive = ktime_get_real_seconds();
 		spin_unlock(&ni->ni_net->net_lock);
-		if (ni->ni_status &&
-		    ni->ni_status->ns_status == LNET_NI_STATUS_DOWN) {
-			ni->ni_status->ns_status = LNET_NI_STATUS_UP;
-			push = true;
-		}
+		push = lnet_ni_set_status_locked(ni, LNET_NI_STATUS_UP);
 		lnet_ni_unlock(ni);
 	}
 
diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index e3b3e71..1253e4c 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -1014,15 +1014,9 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg)
 	struct lnet_ni *ni;
 	bool update = false;
 
-	list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
-		lnet_ni_lock(ni);
-		if (ni->ni_status &&
-		    ni->ni_status->ns_status != status) {
-			ni->ni_status->ns_status = status;
+	list_for_each_entry(ni, &net->net_ni_list, ni_netlist)
+		if (lnet_ni_set_status(ni, status))
 			update = true;
-		}
-		lnet_ni_unlock(ni);
-	}
 
 	return update;
 }
@@ -1031,6 +1025,7 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg)
 lnet_update_ni_status_locked(void)
 {
 	struct lnet_net *net;
+	struct lnet_ni *ni;
 	bool push = false;
 	time64_t now;
 	time64_t timeout;
@@ -1045,13 +1040,13 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg)
 			continue;
 
 		if (now < net->net_last_alive + timeout)
-			continue;
+			goto check_ni_fatal;
 
 		spin_lock(&net->net_lock);
 		/* re-check with lock */
 		if (now < net->net_last_alive + timeout) {
 			spin_unlock(&net->net_lock);
-			continue;
+			goto check_ni_fatal;
 		}
 		spin_unlock(&net->net_lock);
 
@@ -1059,7 +1054,25 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg)
 		 * timeout on any of its constituent NIs, then mark all
 		 * the NIs down.
 		 */
-		push = lnet_net_set_status_locked(net, LNET_NI_STATUS_DOWN);
+		if (lnet_net_set_status_locked(net, LNET_NI_STATUS_DOWN)) {
+			push = true;
+			continue;
+		}
+
+check_ni_fatal:
+		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+			/* lnet_ni_set_status() will perform the same check of
+			 * ni_status while holding the ni lock. We can safely
+			 * check ni_status without that lock because it is only
+			 * written to under net_lock/EX and our caller is
+			 * holding a net lock.
+			 */
+			if (atomic_read(&ni->ni_fatal_error_on) &&
+			    ni->ni_status &&
+			    ni->ni_status->ns_status != LNET_NI_STATUS_DOWN &&
+			    lnet_ni_set_status(ni, LNET_NI_STATUS_DOWN))
+				push = true;
+		}
 	}
 
 	return push;
-- 
1.8.3.1
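
For readers following the control flow, the per-net decision that
lnet_update_ni_status_locked() makes after this patch can be modelled
in userspace roughly as below. This is only a sketch under stated
assumptions: every sketch_* name is hypothetical, the status values
are illustrative rather than the kernel's, and all locking, the
atomic read of ni_fatal_error_on, and the check that skips some nets
at the top of the loop are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for LNET_NI_STATUS_UP / LNET_NI_STATUS_DOWN.
 * The numeric values are illustrative, not the kernel's.
 */
enum sketch_status { SKETCH_UP = 1, SKETCH_DOWN = 2 };

struct sketch_ni {
	bool has_status;	/* models ni->ni_status != NULL */
	uint32_t status;	/* models ni->ni_status->ns_status */
	bool fatal;		/* models ni->ni_fatal_error_on != 0 */
};

struct sketch_net {
	struct sketch_ni *nis;	/* models net->net_ni_list */
	size_t nr_nis;
	long last_alive;	/* models net->net_last_alive (seconds) */
};

/* Store the new status; return true only if the stored value changed.
 * Mirrors the shape of lnet_ni_set_status(), without the ni lock.
 */
static bool sketch_ni_set_status(struct sketch_ni *ni, uint32_t status)
{
	assert(ni != NULL);

	if (ni->has_status && ni->status != status) {
		ni->status = status;
		return true;
	}
	return false;
}

/* Apply one status to every NI on the net; true if any NI changed. */
static bool sketch_net_set_status(struct sketch_net *net, uint32_t status)
{
	bool update = false;
	size_t i;

	for (i = 0; i < net->nr_nis; i++)
		if (sketch_ni_set_status(&net->nis[i], status))
			update = true;

	return update;
}

/* One pass over a single net; returns true when a push is needed. */
static bool sketch_update_net(struct sketch_net *net, long now, long timeout)
{
	bool push = false;
	size_t i;

	assert(net != NULL);

	/* Net timed out and at least one NI changed: push, and skip
	 * the per-NI check (the "continue" in the patch).
	 */
	if (now >= net->last_alive + timeout &&
	    sketch_net_set_status(net, SKETCH_DOWN))
		return true;

	/* Otherwise (the check_ni_fatal label): mark down only those
	 * NIs whose fatal flag is set.
	 */
	for (i = 0; i < net->nr_nis; i++)
		if (net->nis[i].fatal &&
		    sketch_ni_set_status(&net->nis[i], SKETCH_DOWN))
			push = true;

	return push;
}
```

Calling sketch_update_net() twice with the same inputs reports a push
only the first time, because the helper returns true only when the
stored status actually changes. The patch relies on the same property
to avoid repeated pushes.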
