From: James Simmons <jsimmons@infradead.org>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] [PATCH 17/28] lustre: ptlrpc: decrease time between reconnection
Date: Sun, 15 Nov 2020 19:59:50 -0500	[thread overview]
Message-ID: <1605488401-981-18-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1605488401-981-1-git-send-email-jsimmons@infradead.org>

From: Alexander Boyko <alexander.boyko@hpe.com>

When a connection times out or gets an error reply from a server,
the next attempt happens after PING_INTERVAL, which is equal to
obd_timeout/4. When the first reconnection fails, the second goes to
the failover pair, and the third goes back to the original server.
Only about three reconnection attempts fit in before the server
evicts the client based on the blocking AST timeout. Sometimes the
first attempt fails and the last one arrives a bit too late, so the
client is evicted. It is better to retry the connection with a
timeout equal to the connection request deadline; for a large
obd_timeout this increases the number of attempts roughly five-fold.
For example:
    obd_timeout=200
     - [ 1597902357, CONNECTING ]
     - [ 1597902357, FULL ]
     - [ 1597902422, DISCONN ]
     - [ 1597902422, CONNECTING ]
     - [ 1597902433, DISCONN ]
     - [ 1597902473, CONNECTING ]
     - [ 1597902473, DISCONN ] <- ENODEV from a failover pair
     - [ 1597902523, CONNECTING ]
     - [ 1597902539, DISCONN ]
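
A rough back-of-the-envelope sketch of the attempt arithmetic above
(standalone C; the 10s connect request deadline is an illustrative
assumption, not a value taken from this patch):

#include <stdio.h>

int main(void)
{
	int obd_timeout = 200;			/* seconds, as in the trace above */
	int ping_interval = obd_timeout / 4;	/* retry pacing before the patch: 50s */
	int connect_deadline = 10;		/* assumed connect request deadline after the patch */

	/* Shorter pacing means proportionally more attempts before eviction. */
	printf("retry interval: %ds -> ~%ds, roughly %dx more attempts\n",
	       ping_interval, connect_deadline,
	       ping_interval / connect_deadline);
	return 0;
}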

The patch adds logic to wake up the pinger when a connection request
fails with ETIMEDOUT or ENODEV. It also adds imp_next_ping processing
to the time_to_next_wake calculation in ptlrpc_pinger_main(), and
fixes the setting of the imp_next_ping value.
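
As a reading aid for the ptlrpc_connect_interpret() hunk below, here
is a minimal standalone sketch of the rescheduling condition it
introduces; struct fake_import and maybe_reschedule_ping() are
simplified stand-ins, not the real Lustre types:

#include <stdbool.h>

typedef long long time64_t;

/* Simplified stand-in for the two import fields the new check uses. */
struct fake_import {
	time64_t imp_next_ping;
	bool	 imp_force_verify;
};

/*
 * next_connect = last attempt time + (request deadline - request sent),
 * i.e. the earliest moment a retry of the failed connect makes sense.
 * If the scheduled ping is already in the past or lies beyond that
 * point, pull it in to just after next_connect so the pinger retries
 * promptly instead of waiting a full PING_INTERVAL.
 */
bool maybe_reschedule_ping(struct fake_import *imp, time64_t now,
			   time64_t next_connect)
{
	if (!imp->imp_force_verify &&
	    (imp->imp_next_ping <= now || imp->imp_next_ping > next_connect)) {
		imp->imp_next_ping = (now > next_connect ? now : next_connect) + 1;
		return true;	/* caller would then wake the pinger thread */
	}
	return false;
}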

HPE-bug-id: LUS-8520
WC-bug-id: https://jira.whamcloud.com/browse/LU-14031
Lustre-commit: de8ed5f19f0413 ("LU-14031 ptlrpc: decrease time between reconnection")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/40244
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/events.c |  5 ++++
 fs/lustre/ptlrpc/import.c | 36 ++++++++++++++++++++++-
 fs/lustre/ptlrpc/niobuf.c |  2 --
 fs/lustre/ptlrpc/pinger.c | 73 ++++++++++++++++++++++++++++++-----------------
 4 files changed, 87 insertions(+), 29 deletions(-)

diff --git a/fs/lustre/ptlrpc/events.c b/fs/lustre/ptlrpc/events.c
index 0943612..fe33600 100644
--- a/fs/lustre/ptlrpc/events.c
+++ b/fs/lustre/ptlrpc/events.c
@@ -59,6 +59,11 @@ void request_out_callback(struct lnet_event *ev)
 
 	DEBUG_REQ(D_NET, req, "type %d, status %d", ev->type, ev->status);
 
+	/* Do not update imp_next_ping for connection request */
+	if (lustre_msg_get_opc(req->rq_reqmsg) !=
+	    req->rq_import->imp_connect_op)
+		ptlrpc_pinger_sending_on_import(req->rq_import);
+
 	sptlrpc_request_out_callback(req);
 
 	spin_lock(&req->rq_lock);
diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c
index 4e573cd..21ce593 100644
--- a/fs/lustre/ptlrpc/import.c
+++ b/fs/lustre/ptlrpc/import.c
@@ -1037,7 +1037,6 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
 		 */
 		imp->imp_force_reconnect = ptlrpc_busy_reconnect(rc);
 		spin_unlock(&imp->imp_lock);
-		ptlrpc_maybe_ping_import_soon(imp);
 		goto out;
 	}
 
@@ -1303,6 +1302,8 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
 
 	if (rc) {
 		bool inact = false;
+		time64_t now = ktime_get_seconds();
+		time64_t next_connect;
 
 		import_set_state_nolock(imp, LUSTRE_IMP_DISCON);
 		if (rc == -EACCES) {
@@ -1344,7 +1345,28 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
 				import_set_state_nolock(imp, LUSTRE_IMP_CLOSED);
 				inact = true;
 			}
+		} else if (rc == -ENODEV || rc == -ETIMEDOUT) {
+			/* ENODEV means there is no service, force reconnection
+			 * to a pair if attempt happen ptlrpc_next_reconnect
+			 * before now. ETIMEDOUT could be set during network
+			 * error and do not guarantee request deadline happened.
+			 */
+			struct obd_import_conn *conn;
+			time64_t reconnect_time;
+
+			/* Same as ptlrpc_next_reconnect, but in past */
+			reconnect_time = now - INITIAL_CONNECT_TIMEOUT;
+			list_for_each_entry(conn, &imp->imp_conn_list,
+					    oic_item) {
+				if (conn->oic_last_attempt <= reconnect_time) {
+					imp->imp_force_verify = 1;
+					break;
+				}
+			}
 		}
+
+		next_connect = imp->imp_conn_current->oic_last_attempt +
+			       (request->rq_deadline - request->rq_sent);
 		spin_unlock(&imp->imp_lock);
 
 		if (inact)
@@ -1353,6 +1375,18 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
 		if (rc == -EPROTO)
 			return rc;
 
+		/* adjust imp_next_ping to request deadline + 1 and reschedule
+		 * a pinger if import lost processing during CONNECTING or far
+		 * away from request deadline. It could happen when connection
+		 * was initiated outside of pinger, like
+		 * ptlrpc_set_import_discon().
+		 */
+		if (!imp->imp_force_verify && (imp->imp_next_ping <= now ||
+		    imp->imp_next_ping > next_connect)) {
+			imp->imp_next_ping = max(now, next_connect) + 1;
+			ptlrpc_pinger_wake_up();
+		}
+
 		ptlrpc_maybe_ping_import_soon(imp);
 
 		CDEBUG(D_HA, "recovery of %s on %s failed (%d)\n",
diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c
index 924b9c4..a1e6581 100644
--- a/fs/lustre/ptlrpc/niobuf.c
+++ b/fs/lustre/ptlrpc/niobuf.c
@@ -701,8 +701,6 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 	request->rq_deadline = request->rq_sent + request->rq_timeout +
 			       ptlrpc_at_get_net_latency(request);
 
-	ptlrpc_pinger_sending_on_import(imp);
-
 	DEBUG_REQ(D_INFO, request, "send flags=%x",
 		  lustre_msg_get_flags(request->rq_reqmsg));
 	rc = ptl_send_buf(&request->rq_req_md_h,
diff --git a/fs/lustre/ptlrpc/pinger.c b/fs/lustre/ptlrpc/pinger.c
index e23ba3c..178153c 100644
--- a/fs/lustre/ptlrpc/pinger.c
+++ b/fs/lustre/ptlrpc/pinger.c
@@ -108,6 +108,21 @@ static bool ptlrpc_check_import_is_idle(struct obd_import *imp)
 	return true;
 }
 
+static void ptlrpc_update_next_ping(struct obd_import *imp, int soon)
+{
+#ifdef CONFIG_LUSTRE_FS_PINGER
+	time64_t time = soon ? PING_INTERVAL_SHORT : PING_INTERVAL;
+
+	if (imp->imp_state == LUSTRE_IMP_DISCON) {
+		time64_t dtime = max_t(time64_t, CONNECTION_SWITCH_MIN,
+				  AT_OFF ? 0 :
+				  at_get(&imp->imp_at.iat_net_latency));
+		time = min(time, dtime);
+	}
+	imp->imp_next_ping = ktime_get_seconds() + time;
+#endif
+}
+
 static int ptlrpc_ping(struct obd_import *imp)
 {
 	struct ptlrpc_request *req;
@@ -125,26 +140,17 @@ static int ptlrpc_ping(struct obd_import *imp)
 
 	DEBUG_REQ(D_INFO, req, "pinging %s->%s",
 		  imp->imp_obd->obd_uuid.uuid, obd2cli_tgt(imp->imp_obd));
+	/* Updating imp_next_ping early, it allows pinger_check_timeout to
+	 * see an actual time for next awake. request_out_callback update
+	 * happens at another thread, and ptlrpc_pinger_main may sleep
+	 * already.
+	 */
+	ptlrpc_update_next_ping(imp, 0);
 	ptlrpcd_add_req(req);
 
 	return 0;
 }
 
-static void ptlrpc_update_next_ping(struct obd_import *imp, int soon)
-{
-#ifdef CONFIG_LUSTRE_FS_PINGER
-	time64_t time = soon ? PING_INTERVAL_SHORT : PING_INTERVAL;
-
-	if (imp->imp_state == LUSTRE_IMP_DISCON) {
-		time64_t dtime = max_t(time64_t, CONNECTION_SWITCH_MIN,
-				  AT_OFF ? 0 :
-				  at_get(&imp->imp_at.iat_net_latency));
-		time = min(time, dtime);
-	}
-	imp->imp_next_ping = ktime_get_seconds() + time;
-#endif
-}
-
 static inline int imp_is_deactive(struct obd_import *imp)
 {
 	return (imp->imp_deactive ||
@@ -153,17 +159,32 @@ static inline int imp_is_deactive(struct obd_import *imp)
 
 static inline time64_t ptlrpc_next_reconnect(struct obd_import *imp)
 {
-	if (imp->imp_server_timeout)
-		return ktime_get_seconds() + (obd_timeout >> 1);
-	else
-		return ktime_get_seconds() + obd_timeout;
+	return ktime_get_seconds() + INITIAL_CONNECT_TIMEOUT;
 }
 
-static time64_t pinger_check_timeout(time64_t time)
+static timeout_t pinger_check_timeout(time64_t time)
 {
-	time64_t timeout = PING_INTERVAL;
+	timeout_t timeout = PING_INTERVAL;
+	timeout_t next_timeout;
+	time64_t now;
+	struct list_head *iter;
+	struct obd_import *imp;
+
+	mutex_lock(&pinger_mutex);
+	now = ktime_get_seconds();
+	/* Process imports to find a nearest next ping */
+	list_for_each(iter, &pinger_imports) {
+		imp = list_entry(iter, struct obd_import, imp_pinger_chain);
+		if (!imp->imp_pingable || imp->imp_next_ping < now)
+			continue;
+		next_timeout = imp->imp_next_ping - now;
+		/* make sure imp_next_ping in the future from time */
+		if (next_timeout > (now - time) && timeout > next_timeout)
+			timeout = next_timeout;
+	}
+	mutex_unlock(&pinger_mutex);
 
-	return time + timeout - ktime_get_seconds();
+	return timeout - (now - time);
 }
 
 static bool ir_up;
@@ -245,7 +266,8 @@ static void ptlrpc_pinger_process_import(struct obd_import *imp,
 
 static void ptlrpc_pinger_main(struct work_struct *ws)
 {
-	time64_t this_ping, time_after_ping, time_to_next_wake;
+	time64_t this_ping, time_after_ping;
+	timeout_t time_to_next_wake;
 	struct obd_import *imp;
 
 	do {
@@ -276,9 +298,8 @@ static void ptlrpc_pinger_main(struct work_struct *ws)
 		 * we will SKIP the next ping at next_ping, and the
 		 * ping will get sent 2 timeouts from now!  Beware.
 		 */
-		CDEBUG(D_INFO, "next wakeup in %lld (%lld)\n",
-		       time_to_next_wake,
-		       this_ping + PING_INTERVAL);
+		CDEBUG(D_INFO, "next wakeup in %d (%lld)\n",
+		       time_to_next_wake, this_ping + PING_INTERVAL);
 	} while (time_to_next_wake <= 0);
 
 	queue_delayed_work(pinger_wq, &ping_work,
-- 
1.8.3.1
