From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Chris Horn <chris.horn@hpe.com>,
Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 26/27] lnet: Check if discovery toggled off in ping reply
Date: Sun, 13 Jun 2021 19:11:36 -0400 [thread overview]
Message-ID: <1623625897-17706-27-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1623625897-17706-1-git-send-email-jsimmons@infradead.org>
From: Chris Horn <chris.horn@hpe.com>
If a peer is initially discovered and found to have discovery
enabled, but the peer later reloads LNet with discovery disabled,
then we can delete the peer and re-create it the next time the peer
is discovered.
It is safe to delete and re-create the peer as long as it wasn't
configured manually.
In lnet_peer_deletion(), we need to use lnet_del_init() when removing
the peer from the discovery queue because the lnet_peer_del() code
path can result in a call to lnet_peer_queue_for_discovery() where
we check if the lp_dc_list is empty.
HPE-bug-id: LUS-9178
Fixes: 7ec94557b1 ("lnet: Prevent discovery on peer marked deletion")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14661
Lustre-commit: 143893381d428466 ("LU-14661 lnet: Check if discovery toggled off in ping reply")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/43508
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
net/lnet/lnet/peer.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 7630aff..2fc784d 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2254,22 +2254,34 @@ void lnet_peer_push_event(struct lnet_event *ev)
/* The peer may have discovery disabled at its end. Set
* NO_DISCOVERY as appropriate.
*/
- if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY)) {
+ if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY) ||
+ lnet_peer_discovery_disabled) {
CDEBUG(D_NET, "Peer %s has discovery disabled\n",
libcfs_nid2str(lp->lp_primary_nid));
- /* Mark the peer for deletion if we already know about it
- * and it's going from discovery set to no discovery set
+
+ /* Detect whether this peer has toggled discovery from on to
+ * off and whether we can delete and re-create the peer. Peers
+ * that were manually configured cannot be deleted by discovery.
+ * We need to delete this peer and re-create it if the peer was
+ * not configured manually, is currently considered DD capable,
+ * and either:
+ * 1. We've already discovered the peer (the peer has toggled
+ * the discovery feature from on to off), or
+ * 2. The peer is considered MR, but it was not user configured
+ * (this was a "temporary" peer created via the kernel APIs
+ * that we're discovering for the first time)
*/
- if (!(lp->lp_state & (LNET_PEER_NO_DISCOVERY |
- LNET_PEER_DISCOVERING)) &&
- lp->lp_state & LNET_PEER_DISCOVERED) {
+ if (!(lp->lp_state & (LNET_PEER_CONFIGURED |
+ LNET_PEER_NO_DISCOVERY)) &&
+ (lp->lp_state & (LNET_PEER_DISCOVERED |
+ LNET_PEER_MULTI_RAIL))) {
CDEBUG(D_NET, "Marking %s:0x%x for deletion\n",
libcfs_nid2str(lp->lp_primary_nid),
lp->lp_state);
lp->lp_state |= LNET_PEER_MARK_DELETION;
}
lp->lp_state |= LNET_PEER_NO_DISCOVERY;
- } else if (lp->lp_state & LNET_PEER_NO_DISCOVERY) {
+ } else {
CDEBUG(D_NET, "Peer %s has discovery enabled\n",
libcfs_nid2str(lp->lp_primary_nid));
lp->lp_state &= ~LNET_PEER_NO_DISCOVERY;
@@ -3083,7 +3095,7 @@ static int lnet_peer_deletion(struct lnet_peer *lp)
* of deleting it.
*/
if (!list_empty(&lp->lp_dc_list))
- list_del(&lp->lp_dc_list);
+ list_del_init(&lp->lp_dc_list);
list_for_each_entry_safe(route, tmp,
&lp->lp_routes,
lr_gwlist)
--
1.8.3.1
_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
next prev parent reply other threads:[~2021-06-13 23:12 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-13 23:11 [lustre-devel] [PATCH 00/27] lustre: sync to 2.14.52 James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 01/27] lustre: uapi: add mdt_hash_name James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 02/27] lustre: uapi: rename CONFIG_T_* to MGS_CFG_T_* James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 03/27] lnet: o2iblnd: fix bug in list_first_entry() change James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 04/27] lustre: flr: mmap write/punch does not stale other mirrors James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 05/27] lustre: llite: default lsm update may memory leak James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 06/27] lustre: pcc: don't alloc FID in LLITE for pcc open James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 07/27] lustre: quota: default OST Pool Quotas James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 08/27] lustre: rename tgt_pool_* functions James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 09/27] lustre: llite: refresh layout after mirror merge/split James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 10/27] lustre: ptlrpc: do not match reply with resent RPC James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 11/27] lustre: vvp: wait for nrpages to be updated James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 12/27] lustre: obd: check if sbi->ll_md_exp is initialized James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 13/27] lustre: osc: Batch gang_lookup cbs James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 14/27] lustre: llite: Return errors for aio James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 15/27] lnet: do not crash if lnet_sock_getaddr returns error James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 16/27] lustre: sec: forbid file rename from enc to unencrypted dir James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 17/27] lustre: mdc: start changelog thread upon first access James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 18/27] lustre: llog: changelog purge deletes plain llog James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 19/27] lnet: libcfs: allow comma-separated masks James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 20/27] lustre: osc: cleanup comment in osc_object_is_contended James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 21/27] lnet: simplify lnet_ni_add_interface James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 22/27] lustre: lmv: change default hash type to crush James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 23/27] lustre: ptlrpc: move more members in PTLRPC request into pill James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 24/27] lustre: llite: add selinux testing James Simmons
2021-06-13 23:11 ` [lustre-devel] [PATCH 25/27] lnet: Fix destination NID for discovery PUSH James Simmons
2021-06-13 23:11 ` James Simmons [this message]
2021-06-13 23:11 ` [lustre-devel] [PATCH 27/27] lustre: update version to 2.14.52 James Simmons
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1623625897-17706-27-git-send-email-jsimmons@infradead.org \
--to=jsimmons@infradead.org \
--cc=adilger@whamcloud.com \
--cc=chris.horn@hpe.com \
--cc=green@whamcloud.com \
--cc=lustre-devel@lists.lustre.org \
--cc=neilb@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).