From: Bixuan Cui <cuibixuan@linux.alibaba.com>
To: Trond Myklebust <trondmy@hammerspace.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Cc: "pete.wl@alibaba-inc.com" <pete.wl@alibaba-inc.com>,
"xiaoh.peixh@alibaba-inc.com" <xiaoh.peixh@alibaba-inc.com>,
"davem@davemloft.net" <davem@davemloft.net>,
"chuck.lever@oracle.com" <chuck.lever@oracle.com>,
"kuba@kernel.org" <kuba@kernel.org>,
"bfields@fieldses.org" <bfields@fieldses.org>,
"weipu.zy@alibaba-inc.com" <weipu.zy@alibaba-inc.com>,
"wenan.mwa@alibaba-inc.com" <wenan.mwa@alibaba-inc.com>,
"anna.schumaker@netapp.com" <anna.schumaker@netapp.com>
Subject: Re: [PATCH -next] SUNRPC: Clean XPRT_CONGESTED of xprt->state when rpc task is killed
Date: Fri, 31 Dec 2021 10:00:20 +0800 [thread overview]
Message-ID: <b94123d2-5a39-b635-4471-8962ba2a69fb@linux.alibaba.com> (raw)
In-Reply-To: <b8c236d99fd0f4e08dd0ee12a81274bd643a7690.camel@hammerspace.com>
在 2021/12/22 下午11:02, Trond Myklebust 写道:
> On Wed, 2021-12-22 at 10:55 +0800, Bixuan Cui wrote:
>> 在 2021/12/21 上午2:22, Trond Myklebust 写道:
>>
>>> On Mon, 2021-12-20 at 11:39 +0800, Bixuan Cui wrote:
>>>
>>>> ping~
>>>>
>>>> 在 2021/12/14 下午9:53, Bixuan Cui 写道:
>>>>
>>>>> When the values of tcp_max_slot_table_entries and
>>>>> sunrpc.tcp_slot_table_entries are lower than the number of rpc
>>>>> tasks,
>>>>> xprt_dynamic_alloc_slot() in xprt_alloc_slot() will return -
>>>>> EAGAIN,
>>>>> and
>>>>> then set xprt->state to XPRT_CONGESTED:
>>>>> xprt_retry_reserve
>>>>> ->xprt_do_reserve
>>>>> ->xprt_alloc_slot
>>>>> ->xprt_dynamic_alloc_slot // return -EAGAIN and task-
>>>>>
>>>>>> tk_rqstp is NULL
>>>>> ->xprt_add_backlog // set_bit(XPRT_CONGESTED, &xprt-
>>>>>
>>>>>> state);
>>>>> When rpc task is killed, XPRT_CONGESTED bit of xprt->state will
>>>>> not
>>>>> be
>>>>> cleaned up and nfs hangs:
>>>>> rpc_exit_task
>>>>> ->xprt_release // if (req == NULL) is true, then
>>>>> XPRT_CONGESTED
>>>>> // bit not clean
>>>>>
>>>>> Add xprt_wake_up_backlog(xprt) to clean XPRT_CONGESTED bit in
>>>>> xprt_release().
>>> I'm not seeing how this explanation makes sense. If the task
>>> doesn't
>>> hold a slot, then freeing that task isn't going to clear the
>>> congestion
>>> caused by all the slots being in use.
>> Hi,
>> If the rpc task is free, call xprt_release() :
>> void xprt_release(struct rpc_task *task)
>> {
>> if (req == NULL) {
>> if (task->tk_client) {
>> xprt = task->tk_xprt;
>> xprt_release_write(xprt, task); // 1.
>> release xprt
>> }
>> return;
>> }
>> ....
>> if (likely(!bc_prealloc(req)))
>> xprt->ops->free_slot(xprt, req); // 2. release slot
>> and call xprt_wake_up_backlog(xprt, req) to wakeup next task(clear
>> XPRT_CONGESTED bit if next is NULL) in xprt_free_slot()
>> else
>> xprt_free_bc_request(req);
>> }
>> I mean that in step 1, xprt was only released, but
>> xprt_wake_up_backlog was not called (I don’t know if it is necessary,
>> but xprt->state may still be XPRT_CONGESTED), which causes xprt to
>> hold up. I think it happens when the task that does not hold a slot
>> is the last released task,xprt_wake_up_backlog(clear XPRT_CONGESTED)
>> will not be executed. :-)
>> Thanks,
>> Bixuan Cui
>>
>>
> My point is that in that case 1, there is no slot to free, so there is
> no change to the congestion state.
>
> IOW: your patch is incorrect because it is trying to assign a slot in a
> case where there is no slot to assign.
Hi,
I found the correct way to fix it, that is, do not free the request when
there are tasks in the xprt->backlog :-)
And it has been fixed by e877a88d1f06 (SUNRPC in case of backlog, hand
free slots directly to waiting task)
commit e877a88d1f069edced4160792f42c2a8e2dba942
Author: NeilBrown <neilb@suse.de>
Date: Mon May 17 09:59:10 2021 +1000
SUNRPC in case of backlog, hand free slots directly to waiting task
If sunrpc.tcp_max_slot_table_entries is small and there are tasks
on the backlog queue, then when a request completes it is freed and the
first task on the queue is woken. The expectation is that it will wake
and claim that request. However if it was a sync task and the waiting
process was killed at just that moment, it will wake and NOT claim the
request.
Thanks for your advice.
Thanks,
Bixuan Cui
prev parent reply other threads:[~2021-12-31 2:00 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-14 13:53 [PATCH -next] SUNRPC: Clean XPRT_CONGESTED of xprt->state when rpc task is killed Bixuan Cui
2021-12-14 13:53 ` Bixuan Cui
2021-12-20 3:39 ` Bixuan Cui
2021-12-20 18:22 ` Trond Myklebust
[not found] ` <efbf73f3-c6cd-90f6-ef22-bde14be708cc@linux.alibaba.com>
2021-12-22 15:02 ` Trond Myklebust
2021-12-31 2:00 ` Bixuan Cui [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b94123d2-5a39-b635-4471-8962ba2a69fb@linux.alibaba.com \
--to=cuibixuan@linux.alibaba.com \
--cc=anna.schumaker@netapp.com \
--cc=bfields@fieldses.org \
--cc=chuck.lever@oracle.com \
--cc=davem@davemloft.net \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pete.wl@alibaba-inc.com \
--cc=trondmy@hammerspace.com \
--cc=weipu.zy@alibaba-inc.com \
--cc=wenan.mwa@alibaba-inc.com \
--cc=xiaoh.peixh@alibaba-inc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).