lustre-devel-lustre.org archive mirror
 help / color / mirror / Atom feed
From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
	Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Chris Horn <chris.horn@hpe.com>,
	Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 17/29] lnet: Use lr_hops for avoid_asym_router_failure
Date: Sun, 25 Apr 2021 16:08:24 -0400	[thread overview]
Message-ID: <1619381316-7719-18-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1619381316-7719-1-git-send-email-jsimmons@infradead.org>

From: Chris Horn <chris.horn@hpe.com>

In order for the asymmetric route failure avoidance feature to work
properly it needs to know what the hop count of a route should be.
This information is defined by the lr_hops field of the lnet_route.
The lr_single_hop is what discovery was able to determine the hop
count actually is (single or multi) based on the last ping reply.
If a remote interface on a router goes missing, the route may be
classified as multi-hop by discovery, but it should be considered
single-hop for the purposes of avoiding asymmetric route failure.

HPE-bug-id: LUS-9099
WC-bug-id: https://jira.whamcloud.com/browse/LU-13785
Lustre-commit: 2e07619477684f28 ("LU-13785 lnet: Use lr_hops for avoid_asym_router_failure")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39362
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/router.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index ee3c15f..af16263 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -317,7 +317,8 @@ bool lnet_is_route_alive(struct lnet_route *route)
 	 * that the remote net must exist on the gateway. For multi-hop
 	 * routes the next-hop will not have the remote net.
 	 */
-	if (avoid_asym_router_failure && route->lr_single_hop) {
+	if (avoid_asym_router_failure &&
+	    (route->lr_hops == 1 || route->lr_hops == LNET_UNDEFINED_HOPS)) {
 		rlpn = lnet_peer_get_net_locked(gw, route->lr_net);
 		if (!rlpn)
 			return false;
@@ -367,7 +368,8 @@ bool lnet_is_route_alive(struct lnet_route *route)
 static inline void
 lnet_check_route_inconsistency(struct lnet_route *route)
 {
-	if (!route->lr_single_hop && (int)route->lr_hops <= 1) {
+	if (!route->lr_single_hop &&
+	    (route->lr_hops == 1 || route->lr_hops == LNET_UNDEFINED_HOPS)) {
 		CWARN("route %s->%s is detected to be multi-hop but hop count is set to %d\n",
 		      libcfs_net2str(route->lr_net),
 		      libcfs_nid2str(route->lr_gateway->lp_primary_nid),
@@ -482,7 +484,9 @@ bool lnet_is_route_alive(struct lnet_route *route)
 		}
 
 		route->lr_single_hop = single_hop;
-		if (avoid_asym_router_failure && single_hop)
+		if (avoid_asym_router_failure &&
+		    (route->lr_hops == 1 ||
+		     route->lr_hops == LNET_UNDEFINED_HOPS))
 			lnet_set_route_aliveness(route, net_up);
 		else
 			lnet_set_route_aliveness(route, true);
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

  parent reply	other threads:[~2021-04-25 20:09 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-25 20:08 [lustre-devel] [PATCH 00/29] lustre: Update to OpenSFS tree as of April 25, 2020 James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 01/29] lnet: socklnd: use sockaddr instead of u32 addresses James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 02/29] lnet: allow creation of IPv6 socket James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 03/29] lnet: allow lnet_connect() to use IPv6 addresses James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 04/29] lnet: handle possiblity of IPv6 being unavailable James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 05/29] lnet: socklnd: remove tcp bonding James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 06/29] lnet: socklnd: replace route construct James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 07/29] lustre: readahead: limit over reservation James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 08/29] lustre: clio: fix hang on urgent cached pages James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 09/29] lustre: uapi: add mdt_hash_name James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 10/29] lustre: mdc: set fid2path RPC interruptible James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 11/29] lustre: include: remove references to Sun Trademark James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 12/29] lnet: o2iblnd: Use REMOTE_DROPPED for ECONNREFUSED James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 13/29] lustre: lmv: reduce struct lmv_obd size James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 14/29] lustre: uapi: remove obsolete ioctls James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 15/29] lustre: lmv: don't include struct lu_qos_rr in client James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 16/29] lnet: libcfs: fix setting of debug_path James Simmons
2021-04-25 20:08 ` James Simmons [this message]
2021-04-25 20:08 ` [lustre-devel] [PATCH 18/29] lnet: Leverage peer aliveness more efficiently James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 19/29] lustre: mdt: mkdir should return -EEXIST if exists James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 20/29] lnet: o2iblnd: don't resend if there's no listener James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 21/29] lnet: obi2lnd: don't try to reconnect " James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 22/29] lustre: osc: fall back to vmalloc for large RPCs James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 23/29] lustre: ldlm: discard l_lock from struct ldlm_lock James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 24/29] lustre: llite: do fallocate() size checks under lock James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 25/29] lustre: misc: limit CDEBUG console message frequency James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 26/29] lustre: fallocate: Add punch mode to fallocate James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 27/29] lustre: various: only use wake_up_all() on exclusive waitqs James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 28/29] lnet: remove references to Sun Trademark James Simmons
2021-04-25 20:08 ` [lustre-devel] [PATCH 29/29] lustre: " James Simmons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1619381316-7719-18-git-send-email-jsimmons@infradead.org \
    --to=jsimmons@infradead.org \
    --cc=adilger@whamcloud.com \
    --cc=chris.horn@hpe.com \
    --cc=green@whamcloud.com \
    --cc=lustre-devel@lists.lustre.org \
    --cc=neilb@suse.de \
    --subject='Re: [lustre-devel] [PATCH 17/29] lnet: Use lr_hops for avoid_asym_router_failure' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).