lustre-devel-lustre.org archive mirror
From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
	Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Chris Horn <chris.horn@hpe.com>,
	Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 26/27] lnet: Check if discovery toggled off in ping reply
Date: Sun, 13 Jun 2021 19:11:36 -0400
Message-ID: <1623625897-17706-27-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1623625897-17706-1-git-send-email-jsimmons@infradead.org>

From: Chris Horn <chris.horn@hpe.com>

If a peer is initially discovered and found to have discovery
enabled, but the peer later reloads LNet with discovery disabled,
then we can delete the peer and re-create it the next time the peer
is discovered.

It is safe to delete and re-create the peer as long as it wasn't
configured manually.
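
As a rough illustration of that rule, the state check applied once a ping
reply reports discovery as disabled can be sketched as a standalone
predicate. The flag names mirror the LNET_PEER_* bits used in the patch,
but the bit values and the function name peer_should_mark_deletion() are
assumptions made for this sketch, not the definitions in the LNet headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the real LNET_PEER_* flags live in the
 * LNet headers and may use different values.
 */
#define PEER_CONFIGURED		(1u << 0)	/* set up manually by the user */
#define PEER_NO_DISCOVERY	(1u << 1)	/* already known to have discovery off */
#define PEER_DISCOVERED		(1u << 2)	/* discovery has completed before */
#define PEER_MULTI_RAIL		(1u << 3)	/* currently treated as MR */

/* Hypothetical helper: decide whether a peer whose ping reply reports
 * discovery disabled should be marked for deletion and re-creation.
 */
bool peer_should_mark_deletion(uint32_t state)
{
	/* Manually configured peers are never deleted by discovery, and a
	 * peer already flagged NO_DISCOVERY has not toggled anything.
	 */
	if (state & (PEER_CONFIGURED | PEER_NO_DISCOVERY))
		return false;

	/* Either we discovered the peer earlier, or it is a temporary MR
	 * peer that is being discovered for the first time.
	 */
	return (state & (PEER_DISCOVERED | PEER_MULTI_RAIL)) != 0;
}
```

With these illustrative bits, a peer whose state is only PEER_DISCOVERED
is marked for deletion, while PEER_DISCOVERED | PEER_CONFIGURED is not.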

In lnet_peer_deletion(), we need to use list_del_init() when removing
the peer from the discovery queue because the lnet_peer_del() code
path can result in a call to lnet_peer_queue_for_discovery(), which
checks whether lp_dc_list is empty.
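
The difference matters because an emptiness check on a list entry only
succeeds when the entry points back at itself. A minimal userspace sketch
of that behaviour follows. The sk_* names are hypothetical stand-ins for
the kernel list helpers, and NULL is used where the kernel would write its
poison values.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (sketch only). */
struct sk_list {
	struct sk_list *next, *prev;
};

void sk_list_init(struct sk_list *h)
{
	h->next = h;
	h->prev = h;
}

/* Same test the kernel's list_empty() applies: does the node point at
 * itself?
 */
bool sk_list_empty(const struct sk_list *h)
{
	return h->next == h;
}

void sk_list_add_tail(struct sk_list *entry, struct sk_list *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void sk_unlink(struct sk_list *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Like list_del(): unlink, but leave the entry's own pointers unusable. */
void sk_list_del(struct sk_list *entry)
{
	sk_unlink(entry);
	entry->next = NULL;
	entry->prev = NULL;
}

/* Like list_del_init(): unlink and make the entry self-referential again,
 * so a later emptiness check on the entry reports "not queued".
 */
void sk_list_del_init(struct sk_list *entry)
{
	sk_unlink(entry);
	sk_list_init(entry);
}
```

After sk_list_del() the entry still looks "queued" to sk_list_empty(), so
a later attempt to queue it again that is guarded by that check would be
skipped. After sk_list_del_init() the same check reports the entry as
free.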

HPE-bug-id: LUS-9178
Fixes: 7ec94557b1 ("lnet: Prevent discovery on peer marked deletion")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14661
Lustre-commit: 143893381d428466 ("LU-14661 lnet: Check if discovery toggled off in ping reply")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/43508
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 7630aff..2fc784d 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2254,22 +2254,34 @@ void lnet_peer_push_event(struct lnet_event *ev)
 	/* The peer may have discovery disabled at its end. Set
 	 * NO_DISCOVERY as appropriate.
 	 */
-	if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY)) {
+	if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY) ||
+	    lnet_peer_discovery_disabled) {
 		CDEBUG(D_NET, "Peer %s has discovery disabled\n",
 		       libcfs_nid2str(lp->lp_primary_nid));
-		/* Mark the peer for deletion if we already know about it
-		 * and it's going from discovery set to no discovery set
+
+		/* Detect whether this peer has toggled discovery from on to
+		 * off and whether we can delete and re-create the peer. Peers
+		 * that were manually configured cannot be deleted by discovery.
+		 * We need to delete this peer and re-create it if the peer was
+		 * not configured manually, is currently considered DD capable,
+		 * and either:
+		 * 1. We've already discovered the peer (the peer has toggled
+		 *    the discovery feature from on to off), or
+		 * 2. The peer is considered MR, but it was not user configured
+		 *    (this was a "temporary" peer created via the kernel APIs
+		 *     that we're discovering for the first time)
 		 */
-		if (!(lp->lp_state & (LNET_PEER_NO_DISCOVERY |
-				      LNET_PEER_DISCOVERING)) &&
-		    lp->lp_state & LNET_PEER_DISCOVERED) {
+		if (!(lp->lp_state & (LNET_PEER_CONFIGURED |
+				      LNET_PEER_NO_DISCOVERY)) &&
+		    (lp->lp_state & (LNET_PEER_DISCOVERED |
+				     LNET_PEER_MULTI_RAIL))) {
 			CDEBUG(D_NET, "Marking %s:0x%x for deletion\n",
 			       libcfs_nid2str(lp->lp_primary_nid),
 			       lp->lp_state);
 			lp->lp_state |= LNET_PEER_MARK_DELETION;
 		}
 		lp->lp_state |= LNET_PEER_NO_DISCOVERY;
-	} else if (lp->lp_state & LNET_PEER_NO_DISCOVERY) {
+	} else {
 		CDEBUG(D_NET, "Peer %s has discovery enabled\n",
 		       libcfs_nid2str(lp->lp_primary_nid));
 		lp->lp_state &= ~LNET_PEER_NO_DISCOVERY;
@@ -3083,7 +3095,7 @@ static int lnet_peer_deletion(struct lnet_peer *lp)
 	 * of deleting it.
 	 */
 	if (!list_empty(&lp->lp_dc_list))
-		list_del(&lp->lp_dc_list);
+		list_del_init(&lp->lp_dc_list);
 	list_for_each_entry_safe(route, tmp,
 				 &lp->lp_routes,
 				 lr_gwlist)
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
