lustre-devel-lustre.org archive mirror
 help / color / mirror / Atom feed
From: James Simmons <jsimmons@infradead.org>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] [PATCH 18/28] lustre: ptlrpc: throttle RPC resend if network error
Date: Sun, 15 Nov 2020 19:59:51 -0500	[thread overview]
Message-ID: <1605488401-981-19-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1605488401-981-1-git-send-email-jsimmons@infradead.org>

From: Aurelien Degremont <degremoa@amazon.com>

When sending a callback AST to a non-responding client, the server
retries endlessly until the client is eventually evicted. When using
ksocklnd, it will retry after each AST timeout, until the socket is
eventually closed, after sock_timeout sec, where the retry will fail
immediately, returning -110, as no socket could be established.

The thread will spin on retrying and failing, until eventual client
eviction. This will cause high thread CPU usage and possible resource
denial.

To workaround that, this patch avoids re-trying callback resend if:
 - the request is flagged with network error and timeout
 - last try was less than 1 sec ago

In worst case, retry will happen after a timeout based on req->rq_deadline.
If there is nothing else to handle, thread will be sleeping during that
time, removing CPU overhead.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13984
Lustre-commit: 4103527c1c9b38 ("LU-13984 ptlrpc: throttle RPC resend if network error")
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/40020
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/client.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index c9d9fe9..0e01ab33 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -1900,6 +1900,26 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
 					goto interpret;
 				}
 
+				/* don't resend too fast in case of network
+				 * errors.
+				 */
+				if (ktime_get_real_seconds() < (req->rq_sent + 1)
+				    && req->rq_net_err && req->rq_timedout) {
+					DEBUG_REQ(D_INFO, req,
+						  "throttle request");
+					/* Don't try to resend RPC right away
+					 * as it is likely it will fail again
+					 * and ptlrpc_check_set() will be
+					 * called again, keeping this thread
+					 * busy. Instead, wait for the next
+					 * timeout. Flag it as resend to
+					 * ensure we don't wait to long.
+					 */
+					req->rq_resend = 1;
+					spin_unlock(&imp->imp_lock);
+					continue;
+				}
+
 				list_move_tail(&req->rq_list,
 					       &imp->imp_sending_list);
 
-- 
1.8.3.1

  parent reply	other threads:[~2020-11-16  0:59 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 01/28] llite: remove splice_read handling for PCC James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 02/28] lustre: llite: disable statahead_agl for sanity test_56ra James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 03/28] lustre: seq_file .next functions must update *pos James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 04/28] lustre: llite: ASSERTION( last_oap_count > 0 ) failed James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 05/28] lnet: o2ib: raise bind cap before resolving address James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 06/28] lustre: use memalloc_nofs_save() for GFP_NOFS kvmalloc allocations James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 07/28] lnet: o2iblnd: Don't retry indefinitely James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 08/28] lustre: llite: rmdir releases inode on client James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 09/28] lustre: gss: update sequence in case of target disconnect James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 10/28] lustre: lov: doesn't check lov_refcount James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 11/28] lustre: ptlrpc: remove unused code at pinger James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 12/28] lustre: mdc: remote object support getattr from cache James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 13/28] lustre: llite: pass name in getattr by FID James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 14/28] lnet: o2iblnd: 'Timed out tx' error message James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 15/28] lustre: ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 16/28] lustre: ldlm: group locks for DOM IBIT lock James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 17/28] lustre: ptlrpc: decrease time between reconnection James Simmons
2020-11-16  0:59 ` James Simmons [this message]
2020-11-16  0:59 ` [lustre-devel] [PATCH 19/28] lustre: ldlm: BL AST vs failed lock enqueue race James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 20/28] lustre: ptlrpc: don't log connection 'restored' inappropriately James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 21/28] lustre: llite: Avoid eternel retry loops with MAP_POPULATE James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 22/28] lustre: ptlrpc: introduce OST_SEEK RPC James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 23/28] lustre: clio: SEEK_HOLE/SEEK_DATA on client side James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 24/28] lustre: sec: O_DIRECT for encrypted file James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 25/28] lustre: sec: restrict fallocate on encrypted files James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 26/28] lustre: sec: encryption with different client PAGE_SIZE James Simmons
2020-11-16  1:00 ` [lustre-devel] [PATCH 27/28] lustre: sec: require enc key in case of O_CREAT only James Simmons
2020-11-16  1:00 ` [lustre-devel] [PATCH 28/28] lustre: sec: fix O_DIRECT and encrypted files James Simmons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1605488401-981-19-git-send-email-jsimmons@infradead.org \
    --to=jsimmons@infradead.org \
    --cc=lustre-devel@lists.lustre.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).