All of lore.kernel.org
 help / color / mirror / Atom feed
From: James Simmons <jsimmons@infradead.org>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] [PATCH 2/8] lustre: ptlrpc: Fix an rq_no_reply assertion failure
Date: Wed, 24 Jul 2019 22:44:01 -0400	[thread overview]
Message-ID: <1564022647-17351-3-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1564022647-17351-1-git-send-email-jsimmons@infradead.org>

From: Li Wei <wei.g.li@intel.com>

An OSS had an assertion failure:

  LustreError: 5366:0:(ldlm_lib.c:2689:target_bulk_io()) @@@ timeout
  on bulk GET after 0+0s  req at ffff88083a61b400
  x1476486691018500/t0(4300509964)
  o4->8dda3382-83f8-6445-5eea-828fd59e4a06 at 192.168.1.116@o2ib1:0/0
  lens 504/448 e 391470 to 0 dl 1408494729 ref 2 fl Complete:/4/0 rc
  0/0
  LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) ASSERTION(
  req->rq_no_reply == 0 ) failed:
  Lustre: soaked-OST0000: Bulk IO write error with
  8dda3382-83f8-6445-5eea-828fd59e4a06 (at 192.168.1.116 at o2ib1),
  client will retry: rc -110
  LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) LBUG
  Pid: 5432, comm: ll_ost_io03_003

  Call Trace:
  [<ffffffffa0641895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
  [<ffffffffa0641e97>] lbug_with_loc+0x47/0xb0 [libcfs]
  [<ffffffffa09cda4c>] ptlrpc_send_reply+0x4ec/0x7f0 [ptlrpc]
  [<ffffffffa09d4aae>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
  [<ffffffffa09e4d75>] ptlrpc_at_check_timed+0xcd5/0x1370 [ptlrpc]
  [<ffffffffa09dc1e9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
  [<ffffffffa09e66f8>] ptlrpc_main+0x12e8/0x1990 [ptlrpc]
  [<ffffffff81069290>] ? pick_next_task_fair+0xd0/0x130
  [<ffffffff81529246>] ? schedule+0x176/0x3b0
  [<ffffffffa09e5410>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
  [<ffffffff8109abf6>] kthread+0x96/0xa0
  [<ffffffff8100c20a>] child_rip+0xa/0x20
  [<ffffffff8109ab60>] ? kthread+0x0/0xa0
  [<ffffffff8100c200>] ? child_rip+0x0/0x20

The thread in tgt_brw_write() had decided not to reply by setting
rq_no_reply, right before another thread tried to send an early reply
for the request.

WC-bug-id: https://jira.whamcloud.com/browse/LU-5537
Lustre-commit: a8d448e4cd5978c546911f98067232bcdd30b651
Signed-off-by: Li Wei <wei.g.li@intel.com>
Reviewed-on: http://review.whamcloud.com/11740
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
---
 fs/lustre/ptlrpc/service.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c
index a40e964..c9ab9c3 100644
--- a/fs/lustre/ptlrpc/service.c
+++ b/fs/lustre/ptlrpc/service.c
@@ -1098,6 +1098,16 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req)
 	reqcopy->rq_reqmsg = reqmsg;
 	memcpy(reqmsg, req->rq_reqmsg, req->rq_reqlen);
 
+	/*
+	 * tgt_brw_read() and tgt_brw_write() may have decided not to reply.
+	 * Without this check, we would fail the rq_no_reply assertion in
+	 * ptlrpc_send_reply().
+	 */
+	if (reqcopy->rq_no_reply) {
+		rc = -ETIMEDOUT;
+		goto out;
+	}
+
 	LASSERT(atomic_read(&req->rq_refcount));
 	/** if it is last refcount then early reply isn't needed */
 	if (atomic_read(&req->rq_refcount) == 1) {
-- 
1.8.3.1

  parent reply	other threads:[~2019-07-25  2:44 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-25  2:43 [lustre-devel] [PATCH 0/8] lustre: some old patches from whamcloud tree James Simmons
2019-07-25  2:44 ` [lustre-devel] [PATCH 1/8] lustre: seq: make seq_proc_write_common() safer James Simmons
2019-07-25 23:55   ` NeilBrown
2019-07-26  3:31     ` James Simmons
2019-07-25  2:44 ` James Simmons [this message]
2019-08-14 16:58   ` [lustre-devel] [PATCH 2/8] lustre: ptlrpc: Fix an rq_no_reply assertion failure Andreas Dilger
2019-07-25  2:44 ` [lustre-devel] [PATCH 3/8] lustre: fld: resend seq lookup RPC if it is on LWP James Simmons
2019-08-14 16:58   ` Andreas Dilger
2019-07-25  2:44 ` [lustre-devel] [PATCH 4/8] lustre: fld: retry fld rpc even for ESHUTDOWN James Simmons
2019-08-14 16:58   ` Andreas Dilger
2019-08-14 16:58   ` Andreas Dilger
2019-07-25  2:44 ` [lustre-devel] [PATCH 5/8] lustre: fld: retry fld rpc until the import is closed James Simmons
2019-08-14 16:58   ` Andreas Dilger
2019-07-25  2:44 ` [lustre-devel] [PATCH 6/8] lustre: fld: fld client lookup should retry James Simmons
2019-08-14 16:58   ` Andreas Dilger
2019-07-25  2:44 ` [lustre-devel] [PATCH 7/8] lustre: tests: testcases for multiple modify RPCs feature James Simmons
2019-08-14 16:58   ` Andreas Dilger
2019-07-25  2:44 ` [lustre-devel] [PATCH 8/8] lustre: ldlm: Don't check opcode with NULL rq_reqmsg James Simmons
2019-08-14 16:58   ` Andreas Dilger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1564022647-17351-3-git-send-email-jsimmons@infradead.org \
    --to=jsimmons@infradead.org \
    --cc=lustre-devel@lists.lustre.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.