From: Chuck Lever <chuck.lever@oracle.com>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks
Date: Wed, 2 Jan 2019 14:33:37 -0500
Message-ID: <5F5AF2D8-EB72-4B1C-BE5E-7593037FFB92@oracle.com>
In-Reply-To: <f451e8ac200370c520f2fb3371f4b2ae637fd3a8.camel@hammerspace.com>

> On Jan 2, 2019, at 2:24 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Wed, 2019-01-02 at 14:06 -0500, Trond Myklebust wrote:
>> On Wed, 2019-01-02 at 13:57 -0500, Trond Myklebust wrote:
>>> On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote:
>>>>> On Jan 2, 2019, at 1:45 PM, Trond Myklebust <
>>>>> trondmy@hammerspace.com> wrote:
>>>>>
>>>>> On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
>>>>>>> On Dec 31, 2018, at 2:21 PM, Trond Myklebust <
>>>>>>> trondmy@hammerspace.com> wrote:
>>>>>>>
>>>>>>> On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
>>>>>>>> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
>>>>>>>>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
>>>>>>>>>> trondmy@hammerspace.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> The test for rpcauth_xmit_need_reencode() happens when we
>>>>>>>> call
>>>>>>>> xprt_request_transmit() to actually put the RPC call on
>>>>>>>> the
>>>>>>>> wire.
>>>>>>>> The
>>>>>>>> enqueue order should not be able to defeat that test.
>>>>>>>>
>>>>>>>> Hmm... Is it perhaps the test for req->rq_bytes_sent that
>>>>>>>> is
>>>>>>>> failing
>>>>>>>> because this is a retransmission after a
>>>>>>>> disconnect/reconnect
>>>>>>>> that
>>>>>>>> didn't trigger a re-encode?
>>>>>>>
>>>>>>> Actually, it might be worth a try to move the test for
>>>>>>> rpcauth_xmit_need_reencode() outside the enclosing test for
>>>>>>> req->rq_bytes_sent as that is just a minor optimisation.
>>>>>>
>>>>>> Perhaps that's the case for TCP, but RPCs sent via xprtrdma
>>>>>> never
>>>>>> set
>>>>>> req->rq_bytes_sent to a non-zero value. The body of the "if"
>>>>>> statement
>>>>>> is always executed for those RPCs.
>>>>>>
>>>>>
>>>>> Then the question is what is defeating the call to
>>>>> rpcauth_xmit_need_reencode() in xprt_request_transmit() and
>>>>> causing
>>>>> it
>>>>> not to trigger in the misordered cases?
>>>>
>>>> Here's a sample RPC/RDMA case.
>>>>
>>>> My instrumented server reports this:
>>>>
>>>> Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped:
>>>> seq_num=141220 sd->sd_max=141360
>>>>
>>>>
>>>> ftrace log on the client shows this:
>>>>
>>>> kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode:
>>>> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336
>>>> reencode
>>>> unneeded
>>>> kworker/u28:12-2191 [004] 194.048534:
>>>> xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220
>>>> status=-57
>>>> kworker/u28:12-2191 [004] 194.048534:
>>>> rpc_task_run_action: task:1779@5 flags=ASYNC
>>>> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57
>>>> action=call_transmit_status
>>>> kworker/u28:12-2191 [004] 194.048535:
>>>> rpc_task_run_action: task:1779@5 flags=ASYNC
>>>> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0
>>>> action=call_transmit
>>>> kworker/u28:12-2191 [004] 194.048535:
>>>> rpc_task_sleep: task:1779@5 flags=ASYNC
>>>> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0
>>>> queue=xprt_sending
>>>>
>>>>
>>>> kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode:
>>>> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336
>>>> reencode
>>>> unneeded
>>>> kworker/u28:12-2191 [004] 194.048557:
>>>> xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220
>>>> status=0
>>>>
>>>>
>>>> kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode:
>>>> task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336
>>>> reencode
>>>> unneeded
>>>> kworker/u28:12-2191 [004] 194.048563:
>>>> xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360
>>>> status=0
>>>>
>>>>
>>>> Note that first need_reencode: the sequence numbers show that the
>>>> xmit
>>>> queue has been significantly re-ordered. The request being
>>>> transmitted is
>>>> already very close to the lower end of the GSS sequence number
>>>> window.
>>>>
>>>> The server then reorders these two slightly because the first
>>>> one
>>>> had
>>>> some Read chunks that need to be pulled over, the second was pure
>>>> inline
>>>> and therefore could be processed immediately. That is enough to
>>>> force
>>>> the
>>>> first one outside the GSS sequence number window.
>>>>
>>>> I haven't looked closely at the pathology of the TCP case.
>>>
>>> Wait a minute... That's not OK. The client can't be expected to
>>> take
>>> into account reordering that happens on the server side.
>>
>> If that's the case, then we would need to halt transmission as soon
>> as
>> we hit the RPCSEC_GSS window edge. Off the cuff, I'm not sure how to
>> do
>> that, since those windows are per session (i.e. per user).
>
> So here is something we probably could do: modify
> xprt_request_enqueue_transmit() to order the list in req->rq_xmit2 by
> req->rq_seqno.

Why not add " && !req->rq_seqno " to the third arm? Calls are already
enqueued in sequence number order.

> Since task->tk_owner is actually a pid, then that's not
> a perfect solution, but we could further mitigate by modifying
> gss_xmit_need_reencode() to only allow transmission of requests that
> are within 2/3 of the window.
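
For reference, the window arithmetic behind the trace above can be
sketched in a few lines of C. This is a simplified illustration, not
the kernel code: the function names and SEQ_WIN are hypothetical, and
the window size of 128 is an assumption, since the thread never states
it. Under those assumptions, rq_seqno=141220 passes the client-side
check against seq_xmit=141336, is dropped by the server once it has
seen 141360, and would have been re-encoded if the client trusted only
2/3 of the window.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed window size; the thread does not state the real value. */
#define SEQ_WIN 128u

/* True if a is newer than b, tolerating 32-bit wraparound. */
static bool seq_newer(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/*
 * Client side (hypothetical helper): the request must be re-encoded
 * when its sequence number is no longer newer than seq_xmit - margin.
 * margin is the part of the window the client is willing to rely on.
 */
static bool client_needs_reencode(uint32_t seqno, uint32_t seq_xmit,
				  uint32_t margin)
{
	return !seq_newer(seqno, seq_xmit - margin);
}

/*
 * Server side (hypothetical helper): a request is dropped when it is
 * not newer than the highest sequence number seen so far and lags it
 * by SEQ_WIN or more.
 */
static bool server_drops(uint32_t seqno, uint32_t seen_max)
{
	return !seq_newer(seqno, seen_max) && seen_max - seqno >= SEQ_WIN;
}
```

Calling client_needs_reencode(141220, 141336, SEQ_WIN) returns false,
matching "reencode unneeded" in the trace, while
server_drops(141220, 141360) returns true. Passing SEQ_WIN * 2 / 3 as
the margin makes the client-side check return true for the same
request.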
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
--
Chuck Lever