linux-rdma.vger.kernel.org archive mirror
From: Chuck Lever III <chuck.lever@oracle.com>
To: Dan Aloni <dan@kernelim.com>
Cc: Trond Myklebust <trondmy@hammerspace.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH v3 15/26] xprtrdma: Do not recycle MR after FastReg/LocalInv flushes
Date: Sun, 25 Apr 2021 16:21:03 +0000
Message-ID: <53AD14DC-65A2-4E93-A467-1DE43894DC03@oracle.com>
In-Reply-To: <20210425141914.6govk2lm2hfosdie@gmail.com>



> On Apr 25, 2021, at 10:19 AM, Dan Aloni <dan@kernelim.com> wrote:
> 
> On Mon, Apr 19, 2021 at 02:03:12PM -0400, Chuck Lever wrote:
>> Better not to touch MRs involved in a flush or post error until the
>> Send and Receive Queues are drained and the transport is fully
>> quiescent. Simply don't insert such MRs back onto the free list.
>> They remain on mr_all and will be released when the connection is
>> torn down.
>> 
>> I had thought that recycling would prevent hardware resources from
>> being tied up for a long time. However, since v5.7, a transport
>> disconnect destroys the QP and other hardware-owned resources. The
>> MRs get cleaned up nicely at that point.
>> 
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> 
> Is this a fix for the crash below?

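Yes, that is plausible. It is a familiar backtrace.

However, that backtrace usually means the provider called the LocalInv
completion handler twice for the same CQE: the first call's list_del()
poisons the MR's list pointers, so the second list_del() in
frwr_mr_recycle() trips the LIST_POISON1 check. Which provider is this?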


> I just wonder if it appeared for
> others in the wild, and the fix is not just theoretical.
> 
>    WARNING: CPU: 5 PID: 20312 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0
>    list_del corruption, ffff9df150b06768->next is LIST_POISON1 (dead000000000100)
> 
>    Call Trace:
>     [<ffffffff99764147>] dump_stack+0x19/0x1b
>     [<ffffffff99098848>] __warn+0xd8/0x100
>     [<ffffffff990988cf>] warn_slowpath_fmt+0x5f/0x80
>     [<ffffffff9921d5f6>] ? kfree+0x106/0x140
>     [<ffffffff99396953>] __list_del_entry+0x63/0xd0
>     [<ffffffff993969cd>] list_del+0xd/0x30
>     [<ffffffffc0bb307f>] frwr_mr_recycle+0xaf/0x150 [rpcrdma]
>     [<ffffffffc0bb3264>] frwr_wc_localinv+0x94/0xa0 [rpcrdma]
>     [<ffffffffc067d20e>] __ib_process_cq+0x8e/0x100 [ib_core]
>     [<ffffffffc067d2f9>] ib_cq_poll_work+0x29/0x70 [ib_core]
>     [<ffffffff990baf9f>] process_one_work+0x17f/0x440
>     [<ffffffff990bc036>] worker_thread+0x126/0x3c0
>     [<ffffffff990bbf10>] ? manage_workers.isra.25+0x2a0/0x2a0
>     [<ffffffff990c2e81>] kthread+0xd1/0xe0
>     [<ffffffff990c2db0>] ? insert_kthread_work+0x40/0x40
>     [<ffffffff99776c37>] ret_from_fork_nospec_begin+0x21/0x21
>     [<ffffffff990c2db0>] ? insert_kthread_work+0x40/0x40
> 
> -- 
> Dan Aloni

--
Chuck Lever




Thread overview: 34+ messages
2021-04-19 18:01 [PATCH v3 00/26] NFS/RDMA client patches for next Chuck Lever
2021-04-19 18:01 ` [PATCH v3 01/26] SUNRPC: Move fault injection call sites Chuck Lever
2021-04-19 18:05   ` Chuck Lever III
2021-04-19 18:01 ` [PATCH v3 02/26] SUNRPC: Remove trace_xprt_transmit_queued Chuck Lever
2021-04-19 18:01 ` [PATCH v3 03/26] SUNRPC: Add tracepoint that fires when an RPC is retransmitted Chuck Lever
2021-04-19 18:02 ` [PATCH v3 04/26] xprtrdma: Avoid Receive Queue wrapping Chuck Lever
2021-04-19 18:02 ` [PATCH v3 05/26] xprtrdma: Do not refresh Receive Queue while it is draining Chuck Lever
2021-04-19 18:02 ` [PATCH v3 06/26] xprtrdma: Put flushed Receives on free list instead of destroying them Chuck Lever
2021-04-19 18:02 ` [PATCH v3 07/26] xprtrdma: Improve locking around rpcrdma_rep destruction Chuck Lever
2021-04-23 21:06   ` Trond Myklebust
2021-04-24 17:39     ` Chuck Lever III
2021-04-24 17:59       ` Trond Myklebust
2021-04-19 18:02 ` [PATCH v3 08/26] xprtrdma: Improve commentary around rpcrdma_reps_unmap() Chuck Lever
2021-04-19 18:02 ` [PATCH v3 09/26] xprtrdma: Improve locking around rpcrdma_rep creation Chuck Lever
2021-04-19 18:02 ` [PATCH v3 10/26] xprtrdma: Fix cwnd update ordering Chuck Lever
2021-04-19 18:02 ` [PATCH v3 11/26] xprtrdma: Delete rpcrdma_recv_buffer_put() Chuck Lever
2021-04-19 18:02 ` [PATCH v3 12/26] xprtrdma: rpcrdma_mr_pop() already does list_del_init() Chuck Lever
2021-04-19 18:03 ` [PATCH v3 13/26] xprtrdma: Rename frwr_release_mr() Chuck Lever
2021-04-19 18:03 ` [PATCH v3 14/26] xprtrdma: Clarify use of barrier in frwr_wc_localinv_done() Chuck Lever
2021-04-19 18:03 ` [PATCH v3 15/26] xprtrdma: Do not recycle MR after FastReg/LocalInv flushes Chuck Lever
2021-04-25 14:19   ` Dan Aloni
2021-04-25 16:21     ` Chuck Lever III [this message]
2021-04-25 17:00       ` Dan Aloni
2021-04-19 18:03 ` [PATCH v3 16/26] xprtrdma: Do not wake RPC consumer on a failed LocalInv Chuck Lever
2021-04-19 18:03 ` [PATCH v3 17/26] xprtrdma: Avoid Send Queue wrapping Chuck Lever
2021-04-19 18:03 ` [PATCH v3 18/26] xprtrdma: Add tracepoints showing FastReg WRs and remote invalidation Chuck Lever
2021-04-19 18:03 ` [PATCH v3 19/26] xprtrdma: Add an rpcrdma_mr_completion_class Chuck Lever
2021-04-19 18:03 ` [PATCH v3 20/26] xprtrdma: Don't display r_xprt memory addresses in tracepoints Chuck Lever
2021-04-19 18:03 ` [PATCH v3 21/26] xprtrdma: Remove the RPC/RDMA QP event handler Chuck Lever
2021-04-19 18:03 ` [PATCH v3 22/26] xprtrdma: Move fr_cid to struct rpcrdma_mr Chuck Lever
2021-04-19 18:04 ` [PATCH v3 23/26] xprtrdma: Move cqe " Chuck Lever
2021-04-19 18:04 ` [PATCH v3 24/26] xprtrdma: Move fr_linv_done field " Chuck Lever
2021-04-19 18:04 ` [PATCH v3 25/26] xprtrdma: Move the Work Request union " Chuck Lever
2021-04-19 18:04 ` [PATCH v3 26/26] xprtrdma: Move fr_mr field " Chuck Lever
