lustre-devel-lustre.org archive mirror
 help / color / mirror / Atom feed
From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
	Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Chris Horn <chris.horn@hpe.com>,
	Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 05/14] lnet: Router ping timeout with discovery disabled
Date: Mon,  3 May 2021 20:10:07 -0400	[thread overview]
Message-ID: <1620087016-17857-6-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1620087016-17857-1-git-send-email-jsimmons@infradead.org>

From: Chris Horn <chris.horn@hpe.com>

Discovery pings are used to determine the health of gateways and
associated routes. Ping replies from gateways with dynamic discovery
(DD) disabled (or if DD is disabled locally) are handled in
a special routine, lnet_router_discovery_ping_reply(), but this
function and related code doesn't handle the case where a discovery
ping hits the response tracker timeout and is unlinked by the
monitor thread. In this case, an UNLINK event is generated and we
do not call the lnet_router_discovery_ping_reply(). For gateways
with DD enabled (and DD enabled locally), we handle this case
in lnet_router_discovery_complete(). If discovery failed then
lp_dc_error is set and we mark all routes down for the gateway. We
can simply extend this logic to the case of gateways w/DD disabled
(or DD disabled locally).

Fixes: dc80207e3a ("lnet: fix asym routing with multi-hop")
HPE-bug-id: LUS-9612
WC-bug-id: https://jira.whamcloud.com/browse/LU-14206
Lustre-commit: 173d86c6e9a704a8 ("LU-14206 lnet: Router ping timeout with discovery disabled")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/40923
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/router.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index ae7582ca..e179997 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -495,11 +495,11 @@ bool lnet_is_route_alive(struct lnet_route *route)
 	lp->lp_alive = lp->lp_dc_error == 0;
 	spin_unlock(&lp->lp_lock);
 
-	/* ping replies are being handled when discovery is disabled */
-	if (lnet_is_discovery_disabled_locked(lp))
-		return;
-
 	if (!lp->lp_dc_error) {
+		/* ping replies are being handled when discovery is disabled */
+		if (lnet_is_discovery_disabled_locked(lp))
+			return;
+
 		/* mark single-hop routes. If the remote net is not configured
 		 * on the gateway we assume this is intentional and we mark the
 		 * gateway as multi-hop
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

  parent reply	other threads:[~2021-05-04  0:10 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
2021-05-04  0:10 ` [lustre-devel] [PATCH 01/14] lustre: llite: Remove last lockahead old compat James Simmons
2021-05-04  0:10 ` [lustre-devel] [PATCH 02/14] lustre: mdc: include linux/idr.h for referenced code James Simmons
2021-05-04  0:10 ` [lustre-devel] [PATCH 03/14] lnet: Recover local NI w/exponential backoff interval James Simmons
2021-05-04  0:10 ` [lustre-devel] [PATCH 04/14] lnet: Deprecate lnet_recovery_interval James Simmons
2021-05-04  0:10 ` James Simmons [this message]
2021-05-04  0:10 ` [lustre-devel] [PATCH 06/14] lnet: Ensure proper peer, peer NI, peer net hierarchy James Simmons
2021-05-04  0:10 ` [lustre-devel] [PATCH 07/14] lnet: libcfs: simplify task management in tracefile.c James Simmons
2021-05-04  0:10 ` [lustre-devel] [PATCH 08/14] lustre: move lu_tgt_pool out of obd_target.h James Simmons
2021-05-04  0:10 ` [lustre-devel] [PATCH 09/14] lnet: libcfs: remove references to Sun Trademark James Simmons
2021-05-04  0:10 ` [lustre-devel] [PATCH 10/14] lnet: Skip discovery in LNetPrimaryNID if DD disabled James Simmons
2021-05-04  0:10 ` [lustre-devel] [PATCH 11/14] lustre: ptlrpc: idle import vs lock enqueue race James Simmons
2021-05-04  0:10 ` [lustre-devel] [PATCH 12/14] lustre: mdc: make rpc set for MDS_STATFS interruptible James Simmons
2021-05-04  0:10 ` [lustre-devel] [PATCH 13/14] lustre: llite: fake symlink type of foreign file/dir James Simmons
2021-05-04  0:10 ` [lustre-devel] [PATCH 14/14] lustre: llite: use d_is_symlink to test if dentry is a symlink James Simmons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1620087016-17857-6-git-send-email-jsimmons@infradead.org \
    --to=jsimmons@infradead.org \
    --cc=adilger@whamcloud.com \
    --cc=chris.horn@hpe.com \
    --cc=green@whamcloud.com \
    --cc=lustre-devel@lists.lustre.org \
    --cc=neilb@suse.de \
    --subject='Re: [lustre-devel] [PATCH 05/14] lnet: Router ping timeout with discovery disabled' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).