From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
	Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Chris Horn <chris.horn@hpe.com>,
	Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 06/14] lnet: Ensure proper peer, peer NI, peer net hierarchy
Date: Mon,  3 May 2021 20:10:08 -0400
Message-ID: <1620087016-17857-7-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1620087016-17857-1-git-send-email-jsimmons@infradead.org>

From: Chris Horn <chris.horn@hpe.com>

The Multi-Rail (MR) design dictates that the peer nets and peer NIs
are ordered such that the peer net and peer NI for a peer's primary
NID appear first, followed by other peer NIs in the primary NID's
peer net, followed by other peer nets/NIs. This ordering is not
currently enforced, which can trip an assertion if the primary NID
of a peer is deleted. Modify lnet_peer_attach_peer_ni() to check
whether the NI being attached is the peer's primary and, if so, place
it and its associated peer net at the head of their respective lists.
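
As a rough illustration of that ordering rule (not the LNet code
itself), the sketch below uses a minimal user-space stand-in for the
kernel's list_head. The demo_* names and the reduced struct are
assumptions made for the example; only the choice between list_add()
and list_add_tail() mirrors what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal user-space stand-in for the kernel's circular list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link entry between two known neighbours. */
static void list_insert(struct list_head *entry, struct list_head *prev,
			struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Insert right after head, i.e. at the front of the list. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	list_insert(entry, head, head->next);
}

/* Insert right before head, i.e. at the end of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	list_insert(entry, head->prev, head);
}

/* Hypothetical, stripped-down peer NI: just a NID and its list linkage. */
struct demo_peer_ni {
	uint64_t nid;
	struct list_head link;
};

/* The primary NID goes to the front; every other NID is appended. */
static void demo_attach(struct list_head *nis, struct demo_peer_ni *ni,
			uint64_t primary_nid)
{
	if (ni->nid == primary_nid)
		list_add(&ni->link, nis);
	else
		list_add_tail(&ni->link, nis);
}

/* Return the NID stored in the first entry of a non-empty list. */
static uint64_t demo_first_nid(const struct list_head *nis)
{
	const struct demo_peer_ni *ni;

	assert(nis->next != nis);	/* list must not be empty */
	ni = (const struct demo_peer_ni *)
		((const char *)nis->next - offsetof(struct demo_peer_ni, link));
	return ni->nid;
}
```

With this rule the primary ends up first no matter when it is
attached relative to the other NIDs.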

Modify lnet_peer_set_primary_nid() to update lp_primary_nid before
calling lnet_peer_add_nid(), so that lnet_peer_attach_peer_ni() can
detect that the primary is changing and act appropriately. If
lnet_peer_add_nid() fails, the old primary NID is restored.
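
The set-then-add pattern can be sketched in isolation as follows. This
is a simplified model, not the LNet implementation: struct demo_peer,
demo_add_nid() and the rule that NID 0 is rejected are all assumptions
made so the example has a failure path to exercise.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced peer: the primary NID plus what the add step saw. */
struct demo_peer {
	uint64_t primary_nid;
	int added_as_primary;
};

/*
 * Stand-in for the add/attach step. It records whether the NID being
 * added already matched the peer's primary at the time of the call.
 * Rejecting NID 0 is an assumption made only for this demo.
 */
static int demo_add_nid(struct demo_peer *lp, uint64_t nid)
{
	if (nid == 0)
		return -EINVAL;
	lp->added_as_primary = (lp->primary_nid == nid);
	return 0;
}

/* Publish the new primary first, then add it; roll back on failure. */
static int demo_set_primary(struct demo_peer *lp, uint64_t nid)
{
	uint64_t old;
	int rc;

	assert(lp != NULL);
	old = lp->primary_nid;
	if (old == nid)
		return 0;

	lp->primary_nid = nid;
	rc = demo_add_nid(lp, nid);
	if (rc)
		lp->primary_nid = old;	/* the add failed, keep the old primary */
	return rc;
}
```

Because the field is written before the add step runs, the add step
can compare against it; the rollback keeps a failed call from leaving
the peer with a primary NID that was never attached.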

Finally, modify lnet_peer_merge_data() to enforce the hierarchy
after it has finished merging the contents of the ping buffer. This
maintains the correct hierarchy in edge cases where two peers have
to be reconciled. For example, if a peer adds a new interface, the
discovery push may arrive from that new interface. That creates a
second peer object, which must then be reconciled with the original
peer object.
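
The "move to the front unless already first" check used for this
enforcement can be sketched with a user-space stand-in for list_head.
demo_promote() is a hypothetical helper name used only here; the patch
performs the check inline with list_move().

```c
#include <assert.h>

/* Minimal user-space stand-in for the kernel's circular list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head, i.e. at the front of the list. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	struct list_head *first = head->next;

	first->prev = entry;
	entry->next = first;
	entry->prev = head;
	head->next = entry;
}

/* Unlink entry from whatever list it is currently on. */
static void list_del_entry(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Unlink entry and re-insert it at the front of head. */
static void list_move(struct list_head *entry, struct list_head *head)
{
	list_del_entry(entry);
	list_add(entry, head);
}

/*
 * Hypothetical helper: make entry the first element of head.
 * Returns 1 if the entry had to be moved, 0 if it was already first.
 */
static int demo_promote(struct list_head *entry, struct list_head *head)
{
	assert(entry != head);
	if (entry->prev == head)
		return 0;
	list_move(entry, head);
	return 1;
}
```

An entry is first exactly when its prev pointer is the list head, which
is why comparing prev against the head is enough to skip a redundant
move.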

HPE-bug-id: LUS-9630
WC-bug-id: https://jira.whamcloud.com/browse/LU-13806
Lustre-commit: 9eb9474c41c823c7 ("LU-13806 lnet: Ensure proper peer, peer NI, peer net hierarchy")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/40985
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 41 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 0ec1460..db00514 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -1428,7 +1428,10 @@ struct lnet_peer_net *
 
 	/* Add peer_ni to peer_net */
 	lpni->lpni_peer_net = lpn;
-	list_add_tail(&lpni->lpni_peer_nis, &lpn->lpn_peer_nis);
+	if (lp->lp_primary_nid == lpni->lpni_nid)
+		list_add(&lpni->lpni_peer_nis, &lpn->lpn_peer_nis);
+	else
+		list_add_tail(&lpni->lpni_peer_nis, &lpn->lpn_peer_nis);
 	lnet_update_peer_net_healthv(lpni);
 	lnet_peer_net_addref_locked(lpn);
 
@@ -1436,7 +1439,10 @@ struct lnet_peer_net *
 	if (!lpn->lpn_peer) {
 		new_lpn = true;
 		lpn->lpn_peer = lp;
-		list_add_tail(&lpn->lpn_peer_nets, &lp->lp_peer_nets);
+		if (lp->lp_primary_nid == lpni->lpni_nid)
+			list_add(&lpn->lpn_peer_nets, &lp->lp_peer_nets);
+		else
+			list_add_tail(&lpn->lpn_peer_nets, &lp->lp_peer_nets);
 		lnet_peer_addref_locked(lp);
 	}
 
@@ -1678,10 +1684,14 @@ struct lnet_peer_net *
 
 	if (lp->lp_primary_nid == nid)
 		goto out;
+
+	lp->lp_primary_nid = nid;
+
 	rc = lnet_peer_add_nid(lp, nid, flags);
-	if (rc)
+	if (rc) {
+		lp->lp_primary_nid = old;
 		goto out;
-	lp->lp_primary_nid = nid;
+	}
 out:
 	CDEBUG(D_NET, "peer %s NID %s: %d\n",
 	       libcfs_nid2str(old), libcfs_nid2str(nid), rc);
@@ -2777,6 +2787,7 @@ static void lnet_discovery_event_handler(struct lnet_event *event)
 static int lnet_peer_merge_data(struct lnet_peer *lp,
 				struct lnet_ping_buffer *pbuf)
 {
+	struct lnet_peer_net *lpn;
 	struct lnet_peer_ni *lpni;
 	lnet_nid_t *curnis = NULL;
 	struct lnet_ni_status *addnis = NULL;
@@ -2902,6 +2913,28 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 				goto out;
 		}
 	}
+
+	/* The peer net for the primary NID should be the first entry in the
+	 * peer's lp_peer_nets list, and the peer NI for the primary NID should
+	 * be the first entry in its peer net's lpn_peer_nis list.
+	 */
+	lpni = lnet_find_peer_ni_locked(pbuf->pb_info.pi_ni[1].ns_nid);
+	if (!lpni) {
+		CERROR("Internal error: Failed to lookup peer NI for primary NID: %s\n",
+		       libcfs_nid2str(pbuf->pb_info.pi_ni[1].ns_nid));
+		goto out;
+	}
+
+	lnet_peer_ni_decref_locked(lpni);
+
+	lpn = lpni->lpni_peer_net;
+	if (lpn->lpn_peer_nets.prev != &lp->lp_peer_nets)
+		list_move(&lpn->lpn_peer_nets, &lp->lp_peer_nets);
+
+	if (lpni->lpni_peer_nis.prev != &lpni->lpni_peer_net->lpn_peer_nis)
+		list_move(&lpni->lpni_peer_nis,
+			  &lpni->lpni_peer_net->lpn_peer_nis);
+
 	/*
 	 * Errors other than -ENOMEM are due to peers having been
 	 * configured with DLC. Ignore these because DLC overrides
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
