From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Lai Siyao <lai.siyao@whamcloud.com>,
Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 10/29] lustre: mdc: set fid2path RPC interruptible
Date: Sun, 25 Apr 2021 16:08:17 -0400
Message-ID: <1619381316-7719-11-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1619381316-7719-1-git-send-email-jsimmons@infradead.org>
From: Lai Siyao <lai.siyao@whamcloud.com>
Sometimes OI scrub cannot fix an inconsistency between a FID and its
name, so the server returns -EINPROGRESS for the fid2path request. On
such a failure the client keeps resending the request. Make such
requests interruptible to avoid a deadlock.
WC-bug-id: https://jira.whamcloud.com/browse/LU-14119
Lustre-commit: bf47526261067153 ("LU-14119 mdc: set fid2path RPC interruptible")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41219
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
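Note (not part of the commit): the flag handling below is easier to follow
with a small userspace model. This is only a sketch under stated
assumptions, not the ptlrpc implementation. The names model_req, model_set,
model_set_add_req(), model_interrupt_set() and model_set_wait() are
hypothetical stand-ins; request phases, timeouts and reply unlinking are
left out, and the "server" always answers -EINPROGRESS.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ptlrpc_request. */
struct model_req {
	unsigned int allow_intr:1;	/* mirrors rq_allow_intr */
	unsigned int intr:1;		/* mirrors rq_intr */
	int status;			/* mirrors rq_status */
};

/* Hypothetical stand-in for struct ptlrpc_request_set. */
struct model_set {
	unsigned int allow_intr:1;	/* mirrors set_allow_intr */
	struct model_req *reqs[4];
	size_t nreqs;
};

/* Like ptlrpc_set_add_req(): one interruptible request marks the set. */
static int model_set_add_req(struct model_set *set, struct model_req *req)
{
	if (set->nreqs >= sizeof(set->reqs) / sizeof(set->reqs[0]))
		return -ENOSPC;
	if (req->allow_intr)
		set->allow_intr = 1;
	set->reqs[set->nreqs++] = req;
	return 0;
}

/*
 * Loosely like ptlrpc_interrupted_set(), with phases ignored: skip
 * requests already marked, and mark only those that opted in.
 */
static void model_interrupt_set(struct model_set *set)
{
	for (size_t i = 0; i < set->nreqs; i++) {
		struct model_req *req = set->reqs[i];

		if (req->intr || !req->allow_intr)
			continue;
		req->intr = 1;
	}
}

/*
 * Simplified wait loop. signal_at is the iteration at which a signal
 * arrives (negative means never); max_iters bounds the simulation.
 * Returns the number of resends performed.
 */
static int model_set_wait(struct model_set *set, int signal_at, int max_iters)
{
	int resends = 0;

	for (int i = 0; i < max_iters; i++) {
		bool pending = false;

		/* The signal is only acted on when the set allows it. */
		if (i == signal_at && set->allow_intr)
			model_interrupt_set(set);

		for (size_t j = 0; j < set->nreqs; j++) {
			struct model_req *req = set->reqs[j];

			if (req->allow_intr && req->intr) {
				req->status = -EINTR;	/* forced completion */
				continue;
			}
			req->status = -EINPROGRESS;	/* server says retry */
			pending = true;
		}
		if (!pending)
			break;
		resends++;
	}
	return resends;
}
```

In this model a request added with allow_intr set stops resending at the
iteration where the signal arrives and ends with -EINTR, while a request
without the flag keeps resending until max_iters runs out.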
fs/lustre/include/lustre_net.h | 4 +++-
fs/lustre/mdc/mdc_request.c | 7 +++++++
fs/lustre/ptlrpc/client.c | 35 +++++++++++++++++++++++++++++++----
3 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h
index 2b98468..abd16ea 100644
--- a/fs/lustre/include/lustre_net.h
+++ b/fs/lustre/include/lustre_net.h
@@ -445,6 +445,7 @@ struct ptlrpc_request_set {
set_producer_func set_producer;
/** opaq argument passed to the producer callback */
void *set_producer_arg;
+ unsigned int set_allow_intr:1;
};
struct ptlrpc_bulk_desc;
@@ -825,7 +826,8 @@ struct ptlrpc_request {
rq_allow_replay:1,
/* bulk request, sent to server, but uncommitted */
rq_unstable:1,
- rq_early_free_repbuf:1; /* free reply buffer in advance */
+ rq_early_free_repbuf:1, /* free reply buffer in advance */
+ rq_allow_intr:1;
/** @} */
/** server-side flags @{ */
diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
index ef27af6..6ac3a39 100644
--- a/fs/lustre/mdc/mdc_request.c
+++ b/fs/lustre/mdc/mdc_request.c
@@ -2293,6 +2293,13 @@ static int mdc_get_info_rpc(struct obd_export *exp,
RCL_SERVER, vallen);
ptlrpc_request_set_replen(req);
+ /* If the server fails to resolve the FID and OI scrub cannot fix it,
+ * the server returns -EINPROGRESS and ptlrpc_queue_wait() keeps
+ * retrying. Make the request interruptible to avoid a deadlock.
+ */
+ if (KEY_IS(KEY_FID2PATH))
+ req->rq_allow_intr = 1;
+
rc = ptlrpc_queue_wait(req);
/* -EREMOTE means the get_info result is partial, and it needs to
* continue on another MDT, see fid2path part in lmv_iocontrol
diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index 04e8fec..3c57b69 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -1127,6 +1127,9 @@ void ptlrpc_set_add_req(struct ptlrpc_request_set *set,
LASSERT(req->rq_import->imp_state != LUSTRE_IMP_IDLE);
LASSERT(list_empty(&req->rq_set_chain));
+ if (req->rq_allow_intr)
+ set->set_allow_intr = 1;
+
/* The set takes over the caller's request reference */
list_add_tail(&req->rq_set_chain, &set->set_requests);
req->rq_set = set;
@@ -1725,6 +1728,7 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
list_for_each_entry_safe(req, next, &set->set_requests, rq_set_chain) {
struct obd_import *imp = req->rq_import;
int unregistered = 0;
+ int async = 1;
int rc = 0;
/*
@@ -1736,6 +1740,24 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
*/
cond_resched();
+ /*
+ * If the caller allows the request to be interrupted by force
+ * and it has really been interrupted, then move the request
+ * to RQ_PHASE_INTERPRET phase in spite of what the current
+ * phase is.
+ */
+ if (unlikely(req->rq_allow_intr && req->rq_intr)) {
+ req->rq_status = -EINTR;
+ ptlrpc_rqphase_move(req, RQ_PHASE_INTERPRET);
+
+ /*
+ * Since it is interrupted and we have to wait for
+ * the reply to be unlinked, use sync mode.
+ */
+ async = 0;
+ goto interpret;
+ }
+
if (req->rq_phase == RQ_PHASE_NEW &&
ptlrpc_send_new_req(req)) {
force_timer_recalc = 1;
@@ -2067,13 +2089,13 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
* This moves to "unregistering" phase we need to wait for
* reply unlink.
*/
- if (!unregistered && !ptlrpc_unregister_reply(req, 1)) {
+ if (!unregistered && !ptlrpc_unregister_reply(req, async)) {
/* start async bulk unlink too */
ptlrpc_unregister_bulk(req, 1);
continue;
}
- if (!ptlrpc_unregister_bulk(req, 1))
+ if (!ptlrpc_unregister_bulk(req, async))
continue;
/* When calling interpret receive should already be finished. */
@@ -2271,8 +2293,12 @@ static void ptlrpc_interrupted_set(struct ptlrpc_request_set *set)
CDEBUG(D_RPCTRACE, "INTERRUPTED SET %p\n", set);
list_for_each_entry(req, &set->set_requests, rq_set_chain) {
+ if (req->rq_intr)
+ continue;
+
if (req->rq_phase != RQ_PHASE_RPC &&
- req->rq_phase != RQ_PHASE_UNREG_RPC)
+ req->rq_phase != RQ_PHASE_UNREG_RPC &&
+ !req->rq_allow_intr)
continue;
spin_lock(&req->rq_lock);
@@ -2368,7 +2394,8 @@ int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set)
CDEBUG(D_RPCTRACE, "set %p going to sleep for %lld seconds\n",
set, timeout);
- if (timeout == 0 && !signal_pending(current)) {
+ if ((timeout == 0 && !signal_pending(current)) ||
+ set->set_allow_intr) {
/*
* No requests are in-flight (ether timed out
* or delayed), so we can allow interrupts.
--
1.8.3.1