From: Andreas Dilger <adilger@whamcloud.com>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] [PATCH 2/8] lustre: ptlrpc: Fix an rq_no_reply assertion failure
Date: Wed, 14 Aug 2019 16:58:49 +0000
Message-ID: <1F5F7B07-3D83-4CF2-B798-88F454EB69B8@whamcloud.com>
In-Reply-To: <1564022647-17351-3-git-send-email-jsimmons@infradead.org>

This is definitely server code.

Cheers, Andreas

> On Jul 24, 2019, at 19:44, James Simmons <jsimmons@infradead.org> wrote:
> 
> From: Li Wei <wei.g.li@intel.com>
> 
> An OSS had an assertion failure:
> 
>  LustreError: 5366:0:(ldlm_lib.c:2689:target_bulk_io()) @@@ timeout
>  on bulk GET after 0+0s  req@ffff88083a61b400
>  x1476486691018500/t0(4300509964)
>  o4->8dda3382-83f8-6445-5eea-828fd59e4a06@192.168.1.116@o2ib1:0/0
>  lens 504/448 e 391470 to 0 dl 1408494729 ref 2 fl Complete:/4/0 rc
>  0/0
>  LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) ASSERTION(
>  req->rq_no_reply == 0 ) failed:
>  Lustre: soaked-OST0000: Bulk IO write error with
>  8dda3382-83f8-6445-5eea-828fd59e4a06 (at 192.168.1.116@o2ib1),
>  client will retry: rc -110
>  LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) LBUG
>  Pid: 5432, comm: ll_ost_io03_003
> 
>  Call Trace:
>  [<ffffffffa0641895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
>  [<ffffffffa0641e97>] lbug_with_loc+0x47/0xb0 [libcfs]
>  [<ffffffffa09cda4c>] ptlrpc_send_reply+0x4ec/0x7f0 [ptlrpc]
>  [<ffffffffa09d4aae>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
>  [<ffffffffa09e4d75>] ptlrpc_at_check_timed+0xcd5/0x1370 [ptlrpc]
>  [<ffffffffa09dc1e9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
>  [<ffffffffa09e66f8>] ptlrpc_main+0x12e8/0x1990 [ptlrpc]
>  [<ffffffff81069290>] ? pick_next_task_fair+0xd0/0x130
>  [<ffffffff81529246>] ? schedule+0x176/0x3b0
>  [<ffffffffa09e5410>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
>  [<ffffffff8109abf6>] kthread+0x96/0xa0
>  [<ffffffff8100c20a>] child_rip+0xa/0x20
>  [<ffffffff8109ab60>] ? kthread+0x0/0xa0
>  [<ffffffff8100c200>] ? child_rip+0x0/0x20
> 
> The thread in tgt_brw_write() had decided not to reply by setting
> rq_no_reply, right before another thread tried to send an early reply
> for the request.
> 
> WC-bug-id: https://jira.whamcloud.com/browse/LU-5537
> Lustre-commit: a8d448e4cd5978c546911f98067232bcdd30b651
> Signed-off-by: Li Wei <wei.g.li@intel.com>
> Reviewed-on: http://review.whamcloud.com/11740
> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
> Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
> ---
> fs/lustre/ptlrpc/service.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
> 
> diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c
> index a40e964..c9ab9c3 100644
> --- a/fs/lustre/ptlrpc/service.c
> +++ b/fs/lustre/ptlrpc/service.c
> @@ -1098,6 +1098,16 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req)
>    reqcopy->rq_reqmsg = reqmsg;
>    memcpy(reqmsg, req->rq_reqmsg, req->rq_reqlen);
> 
> +    /*
> +     * tgt_brw_read() and tgt_brw_write() may have decided not to reply.
> +     * Without this check, we would fail the rq_no_reply assertion in
> +     * ptlrpc_send_reply().
> +     */
> +    if (reqcopy->rq_no_reply) {
> +        rc = -ETIMEDOUT;
> +        goto out;
> +    }
> +
>    LASSERT(atomic_read(&req->rq_refcount));
>    /** if it is last refcount then early reply isn't needed */
>    if (atomic_read(&req->rq_refcount) == 1) {
> -- 
> 1.8.3.1
> 
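
For readers following the thread, the check added by the hunk above can be
sketched in isolation. This is a minimal single-threaded model, not the
Lustre code: struct toy_request and the toy_* functions are hypothetical
stand-ins, and only the rq_no_reply flag, the -ETIMEDOUT return value, and
the assertion in the send path are taken from the patch. It does not model
the concurrency between the two threads.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct ptlrpc_request; only the flag matters. */
struct toy_request {
	bool rq_no_reply;	/* set once the I/O handler decides not to reply */
};

/* Models the assertion in ptlrpc_send_reply(): never send for such a request. */
static int toy_send_reply(const struct toy_request *req)
{
	assert(!req->rq_no_reply);
	return 0;
}

/*
 * Models the patched early-reply path: work on a copy of the request and
 * bail out with -ETIMEDOUT before the send path can trip the assertion.
 */
static int toy_send_early_reply(const struct toy_request *req)
{
	struct toy_request reqcopy = *req;

	if (reqcopy.rq_no_reply)
		return -ETIMEDOUT;

	return toy_send_reply(&reqcopy);
}
```

With the flag clear, toy_send_early_reply() reaches the send path and
returns 0; with the flag set, it returns -ETIMEDOUT and the assertion is
never evaluated.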
