Lustre-devel archive on lore.kernel.org
 help / color / Atom feed
From: James Simmons <jsimmons@infradead.org>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] [PATCH 16/22] lnet: Correct the default LND timeout
Date: Tue,  2 Jun 2020 20:59:55 -0400
Message-ID: <1591146001-27171-17-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1591146001-27171-1-git-send-email-jsimmons@infradead.org>

From: Chris Horn <hornc@cray.com>

Default LND timeout is currently too low. To allow for
lnet_retry_count resend attempts within a single
lnet_transaction_timeout window, the LND timeout needs to be less
than lnet_transaction_timeout / lnet_retry_count. If the retry
count is 0, we still want LND timeout to be less than the LNet
transaction timeout.

Also, be sure to update the LND timeout when health is toggled on or
off.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13510
Lustre-commit: 0127d64b8cadd ("LU-13510 lnet: Correct the default LND timeout")
Signed-off-by: Chris Horn <hornc@cray.com>
Reviewed-on: https://review.whamcloud.com/38481
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h |  1 -
 net/lnet/lnet/api-ni.c        | 28 +++++++++++++++++++---------
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index a4a323c..a7825f9 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -83,7 +83,6 @@
 
 /* default timeout */
 #define DEFAULT_PEER_TIMEOUT    180
-#define LNET_LND_DEFAULT_TIMEOUT 5
 
 int choose_ipv4_src(u32 *ret, int interface, u32 dst_ipaddr, struct net *ns);
 
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index a966e64..62b4fa7 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -170,7 +170,15 @@ static int recovery_interval_set(const char *val,
 MODULE_PARM_DESC(lnet_retry_count,
 		 "Maximum number of times to retry transmitting a message");
 
-unsigned int lnet_lnd_timeout = LNET_LND_DEFAULT_TIMEOUT;
+#define LNET_LND_TIMEOUT_DEFAULT ((LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT - 1) / \
+				  (LNET_RETRY_COUNT_HEALTH_DEFAULT + 1))
+unsigned int lnet_lnd_timeout = LNET_LND_TIMEOUT_DEFAULT;
+static void lnet_set_lnd_timeout(void)
+{
+	lnet_lnd_timeout = (lnet_transaction_timeout - 1) /
+			   (lnet_retry_count + 1);
+}
+
 unsigned int lnet_current_net_count;
 
 /*
@@ -220,6 +228,7 @@ static int lnet_discover(struct lnet_process_id id, u32 force,
 		lnet_transaction_timeout =
 			LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT;
 		lnet_retry_count = LNET_RETRY_COUNT_HEALTH_DEFAULT;
+		lnet_set_lnd_timeout();
 	/* if we're turning off health then use the no health timeout
 	 * default.
 	 */
@@ -227,6 +236,7 @@ static int lnet_discover(struct lnet_process_id id, u32 force,
 		lnet_transaction_timeout =
 			LNET_TRANSACTION_TIMEOUT_NO_HEALTH_DEFAULT;
 		lnet_retry_count = 0;
+		lnet_set_lnd_timeout();
 	}
 
 	*sensitivity = value;
@@ -385,10 +395,10 @@ static int lnet_discover(struct lnet_process_id id, u32 force,
 	}
 
 	*transaction_to = value;
-	if (lnet_retry_count == 0)
-		lnet_lnd_timeout = value;
-	else
-		lnet_lnd_timeout = value / lnet_retry_count;
+	/* Update the lnet_lnd_timeout now that we've modified the
+	 * transaction timeout
+	 */
+	lnet_set_lnd_timeout();
 
 	mutex_unlock(&the_lnet.ln_api_mutex);
 
@@ -428,10 +438,10 @@ static int lnet_discover(struct lnet_process_id id, u32 force,
 
 	*retry_count = value;
 
-	if (value == 0)
-		lnet_lnd_timeout = lnet_transaction_timeout;
-	else
-		lnet_lnd_timeout = lnet_transaction_timeout / value;
+	/* Update the lnet_lnd_timeout now that we've modified the
+	 * transaction timeout
+	 */
+	lnet_set_lnd_timeout();
 
 	mutex_unlock(&the_lnet.ln_api_mutex);
 
-- 
1.8.3.1

  parent reply index

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-03  0:59 [lustre-devel] [PATCH 00/22] lustre: OpenSFS backport patches for May 29 2020 James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 01/22] lnet: libcfs: fix CPT handling for UP systems James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 02/22] lustre: use BIT() macro where appropriate in include James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 03/22] lustre: use BIT() macro where appropriate James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 04/22] lustre: ptlrpc: change LONG_UNLINK to PTLRPC_REQ_LONG_UNLINK James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 05/22] lustre: llite: use %pd to report dentry names James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 06/22] lnet: tidy lnet_discover and fix mem accounting bug James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 07/22] lustre: llite: prevent MAX_DIO_SIZE 32-bit truncation James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 08/22] lustre: llite: integrate statx() API with Lustre James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 09/22] lustre: ldlm: no current source if lu_ref_del not in same tsk James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 10/22] lnet: always pass struct lnet_md by reference James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 11/22] lustre: llite: fix read if readahead window smaller than rpc size James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 12/22] lustre: obdclass: bind zombie export cleanup workqueue James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 13/22] lnet: handle discovery off properly James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 14/22] lnet: Force full discovery cycle James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 15/22] lnet: set route aliveness properly James Simmons
2020-06-03  0:59 ` James Simmons [this message]
2020-06-03  0:59 ` [lustre-devel] [PATCH 17/22] lnet: Add lnet_lnd_timeout to sysfs James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 18/22] lnet: lnd: Allow independent ko2iblnd timeout James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 19/22] lnet: lnd: Allow independent socklnd timeout James Simmons
2020-06-03  0:59 ` [lustre-devel] [PATCH 20/22] lnet: lnd: gracefully handle unexpected events James Simmons
2020-06-03  1:00 ` [lustre-devel] [PATCH 21/22] lustre: update version to 2.13.54 James Simmons
2020-06-03  1:00 ` [lustre-devel] [PATCH 22/22] lnet: procs: print new line based on distro James Simmons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1591146001-27171-17-git-send-email-jsimmons@infradead.org \
    --to=jsimmons@infradead.org \
    --cc=lustre-devel@lists.lustre.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Lustre-devel archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lustre-devel/0 lustre-devel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lustre-devel lustre-devel/ https://lore.kernel.org/lustre-devel \
		lustre-devel@lists.lustre.org
	public-inbox-index lustre-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.lustre.lists.lustre-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git