* [PATCH v3 00/44] Convert RPC client transmission to a queued model
@ 2018-09-17 13:02 Trond Myklebust
  2018-09-17 13:02 ` [PATCH v3 01/44] SUNRPC: Clean up initialisation of the struct rpc_rqst Trond Myklebust
  0 siblings, 1 reply; 76+ messages in thread
From: Trond Myklebust @ 2018-09-17 13:02 UTC (permalink / raw)
  To: linux-nfs

For historical reasons, the RPC client is heavily serialised during the
process of transmitting a request by the XPRT_LOCK. A request must take
that lock before it can start XDR encoding, and must hold it until it is
done transmitting. In essence, the lock protects the following
functions:

- Stream-based transport connect/reconnect
- RPCSEC_GSS encoding of the RPC message
- Transmission of a single RPC message

This patch set assumes that we do not need to do much to improve
performance of the connect/reconnect case, as that is supposed to be a
rare occurrence.

The set deals with the RPCSEC_GSS issue by removing serialisation while
encoding: if, after grabbing the XPRT_LOCK, we detect that we are about
to transmit a message whose sequence number has fallen outside the
window allowed by RFC 2203, we abort the transmission of that message
and schedule it for re-encoding. Since window sizes are typically
expected to lie above 100 messages or so, we expect these cases where we
miss the window to be rare in general.

We avoid requiring that every request be woken up to grab the XPRT_LOCK
in order to transmit itself, by allowing the request that currently
holds the XPRT_LOCK to grab other requests from an ordered queue and
transmit them too. The bulk of the changes in this patchset are
dedicated to providing this functionality.
In addition, the XPRT_LOCK queue provides some extra functionality:

- Throttling of the TCP slot allocation (as Chuck pointed out)
- Fair queuing, to ensure batch jobs don't crowd out interactive ones

The patchset does add functionality to ensure that the resulting
transmission queue is fair, and also fixes up the RPC wait queues to
ensure that they don't compromise fairness.

For now, this patchset discards the TCP slot throttling. We may still
want to throttle in the case where the connection is lost, but if we do
so, we should ensure we do not serialise all requests when in the
connected state.

The last few patches also take a new look at the client receive code now
that we have the iterator method for reading socket data into page
buffers. They convert the TCP and the UNIX stream code to the iterator
method and perform some cleanups.

---
v2:
- Address feedback by Chuck
- Handle UDP/RDMA credits correctly
- Remove throttling of TCP slot allocations
- Minor nits
- Clean up the write_space handling
- Fair queueing
v3:
- Performance improvements, bugfixes and cleanups
- Socket stream receive queue improvements

Trond Myklebust (44):
  SUNRPC: Clean up initialisation of the struct rpc_rqst
  SUNRPC: If there is no reply expected, bail early from call_decode
  SUNRPC: The transmitted message must lie in the RPCSEC window of validity
  SUNRPC: Simplify identification of when the message send/receive is complete
  SUNRPC: Avoid holding locks across the XDR encoding of the RPC message
  SUNRPC: Rename TCP receive-specific state variables
  SUNRPC: Move reset of TCP state variables into the reconnect code
  SUNRPC: Add socket transmit queue offset tracking
  SUNRPC: Simplify dealing with aborted partially transmitted messages
  SUNRPC: Refactor the transport request pinning
  SUNRPC: Add a helper to wake up a sleeping rpc_task and set its status
  SUNRPC: Test whether the task is queued before grabbing the queue spinlocks
  SUNRPC: Don't wake queued RPC calls multiple times in xprt_transmit
  SUNRPC: Rename xprt->recv_lock to xprt->queue_lock
  SUNRPC: Refactor xprt_transmit() to remove the reply queue code
  SUNRPC: Refactor xprt_transmit() to remove wait for reply code
  SUNRPC: Minor cleanup for call_transmit()
  SUNRPC: Distinguish between the slot allocation list and receive queue
  SUNRPC: Add a transmission queue for RPC requests
  SUNRPC: Refactor RPC call encoding
  SUNRPC: Fix up the back channel transmit
  SUNRPC: Treat the task and request as separate in the xprt_ops->send_request()
  SUNRPC: Don't reset the request 'bytes_sent' counter when releasing XPRT_LOCK
  SUNRPC: Simplify xprt_prepare_transmit()
  SUNRPC: Move RPC retransmission stat counter to xprt_transmit()
  SUNRPC: Improve latency for interactive tasks
  SUNRPC: Support for congestion control when queuing is enabled
  SUNRPC: Enqueue swapper tagged RPCs at the head of the transmit queue
  SUNRPC: Allow calls to xprt_transmit() to drain the entire transmit queue
  SUNRPC: Allow soft RPC calls to time out when waiting for the XPRT_LOCK
  SUNRPC: Turn off throttling of RPC slots for TCP sockets
  SUNRPC: Clean up transport write space handling
  SUNRPC: Cleanup: remove the unused 'task' argument from the request_send()
  SUNRPC: Don't take transport->lock unnecessarily when taking XPRT_LOCK
  SUNRPC: Convert xprt receive queue to use an rbtree
  SUNRPC: Fix priority queue fairness
  SUNRPC: Convert the xprt->sending queue back to an ordinary wait queue
  SUNRPC: Add a label for RPC calls that require allocation on receive
  SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter()
  SUNRPC: Simplify TCP receive code by switching to using iterators
  SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive()
  SUNRPC: Allow AF_LOCAL sockets to use the generic stream receive
  SUNRPC: Clean up xs_udp_data_receive()
  SUNRPC: Unexport xdr_partial_copy_from_skb()

 fs/nfs/nfs3xdr.c                           |    4 +-
 include/linux/sunrpc/auth.h                |    2 +
 include/linux/sunrpc/auth_gss.h            |    1 +
 include/linux/sunrpc/bc_xprt.h             |    1 +
 include/linux/sunrpc/sched.h               |   10 +-
 include/linux/sunrpc/svc_xprt.h            |    1 -
 include/linux/sunrpc/xdr.h                 |   11 +-
 include/linux/sunrpc/xprt.h                |   35 +-
 include/linux/sunrpc/xprtsock.h            |   36 +-
 include/trace/events/sunrpc.h              |   37 +-
 net/sunrpc/auth.c                          |   10 +
 net/sunrpc/auth_gss/auth_gss.c             |   41 +
 net/sunrpc/auth_gss/gss_rpc_xdr.c          |    1 +
 net/sunrpc/backchannel_rqst.c              |    1 -
 net/sunrpc/clnt.c                          |  174 ++--
 net/sunrpc/sched.c                         |  178 ++--
 net/sunrpc/socklib.c                       |   10 +-
 net/sunrpc/svc_xprt.c                      |    2 -
 net/sunrpc/svcsock.c                       |    6 +-
 net/sunrpc/xdr.c                           |   34 +
 net/sunrpc/xprt.c                          |  893 ++++++++++++-----
 net/sunrpc/xprtrdma/backchannel.c          |    4 +-
 net/sunrpc/xprtrdma/rpc_rdma.c             |   12 +-
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |   14 +-
 net/sunrpc/xprtrdma/transport.c            |   10 +-
 net/sunrpc/xprtsock.c                      | 1060 +++++++++-----------
 26 files changed, 1474 insertions(+), 1114 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 76+ messages in thread
* [PATCH v3 01/44] SUNRPC: Clean up initialisation of the struct rpc_rqst
  2018-09-17 13:02 [PATCH v3 00/44] Convert RPC client transmission to a queued model Trond Myklebust
@ 2018-09-17 13:02 ` Trond Myklebust
  2018-09-17 13:02   ` [PATCH v3 02/44] SUNRPC: If there is no reply expected, bail early from call_decode Trond Myklebust
  0 siblings, 1 reply; 76+ messages in thread
From: Trond Myklebust @ 2018-09-17 13:02 UTC (permalink / raw)
  To: linux-nfs

Move the initialisation back into xprt.c.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/xprt.h |  1 -
 net/sunrpc/clnt.c           |  1 -
 net/sunrpc/xprt.c           | 91 +++++++++++++++++++++----------
 3 files changed, 51 insertions(+), 42 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 336fd1a19cca..3d80524e92d6 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -325,7 +325,6 @@ struct xprt_class {
 struct rpc_xprt		*xprt_create_transport(struct xprt_create *args);
 void			xprt_connect(struct rpc_task *task);
 void			xprt_reserve(struct rpc_task *task);
-void			xprt_request_init(struct rpc_task *task);
 void			xprt_retry_reserve(struct rpc_task *task);
 int			xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task);
 int			xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 8ea2f5fadd96..bc9d020bf71f 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1558,7 +1558,6 @@ call_reserveresult(struct rpc_task *task)
 	task->tk_status = 0;
 	if (status >= 0) {
 		if (task->tk_rqstp) {
-			xprt_request_init(task);
 			task->tk_action = call_refresh;
 			return;
 		}
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a8db2e3f8904..6aa09edc9567 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1250,6 +1250,55 @@ void xprt_free(struct rpc_xprt *xprt)
 }
 EXPORT_SYMBOL_GPL(xprt_free);
 
+static __be32
+xprt_alloc_xid(struct rpc_xprt *xprt)
+{
+	__be32 xid;
+
+	spin_lock(&xprt->reserve_lock);
+	xid = (__force __be32)xprt->xid++;
+	spin_unlock(&xprt->reserve_lock);
+	return xid;
+}
+
+static void
+xprt_init_xid(struct rpc_xprt *xprt)
+{
+	xprt->xid = prandom_u32();
+}
+
+static void
+xprt_request_init(struct rpc_task *task)
+{
+	struct rpc_xprt *xprt = task->tk_xprt;
+	struct rpc_rqst *req = task->tk_rqstp;
+
+	INIT_LIST_HEAD(&req->rq_list);
+	req->rq_timeout = task->tk_client->cl_timeout->to_initval;
+	req->rq_task = task;
+	req->rq_xprt = xprt;
+	req->rq_buffer = NULL;
+	req->rq_xid = xprt_alloc_xid(xprt);
+	req->rq_connect_cookie = xprt->connect_cookie - 1;
+	req->rq_bytes_sent = 0;
+	req->rq_snd_buf.len = 0;
+	req->rq_snd_buf.buflen = 0;
+	req->rq_rcv_buf.len = 0;
+	req->rq_rcv_buf.buflen = 0;
+	req->rq_release_snd_buf = NULL;
+	xprt_reset_majortimeo(req);
+	dprintk("RPC: %5u reserved req %p xid %08x\n", task->tk_pid,
+			req, ntohl(req->rq_xid));
+}
+
+static void
+xprt_do_reserve(struct rpc_xprt *xprt, struct rpc_task *task)
+{
+	xprt->ops->alloc_slot(xprt, task);
+	if (task->tk_rqstp != NULL)
+		xprt_request_init(task);
+}
+
 /**
  * xprt_reserve - allocate an RPC request slot
  * @task: RPC task requesting a slot allocation
@@ -1269,7 +1318,7 @@ void xprt_reserve(struct rpc_task *task)
 	task->tk_timeout = 0;
 	task->tk_status = -EAGAIN;
 	if (!xprt_throttle_congested(xprt, task))
-		xprt->ops->alloc_slot(xprt, task);
+		xprt_do_reserve(xprt, task);
 }
 
 /**
@@ -1291,45 +1340,7 @@ void xprt_retry_reserve(struct rpc_task *task)
 	task->tk_timeout = 0;
 	task->tk_status = -EAGAIN;
 
-	xprt->ops->alloc_slot(xprt, task);
-}
-
-static inline __be32 xprt_alloc_xid(struct rpc_xprt *xprt)
-{
-	__be32 xid;
-
-	spin_lock(&xprt->reserve_lock);
-	xid = (__force __be32)xprt->xid++;
-	spin_unlock(&xprt->reserve_lock);
-	return xid;
-}
-
-static inline void xprt_init_xid(struct rpc_xprt *xprt)
-{
-	xprt->xid = prandom_u32();
-}
-
-void xprt_request_init(struct rpc_task *task)
-{
-	struct rpc_xprt *xprt = task->tk_xprt;
-	struct rpc_rqst *req = task->tk_rqstp;
-
-	INIT_LIST_HEAD(&req->rq_list);
-	req->rq_timeout = task->tk_client->cl_timeout->to_initval;
-	req->rq_task = task;
-	req->rq_xprt = xprt;
-	req->rq_buffer = NULL;
-	req->rq_xid = xprt_alloc_xid(xprt);
-	req->rq_connect_cookie = xprt->connect_cookie - 1;
-	req->rq_bytes_sent = 0;
-	req->rq_snd_buf.len = 0;
-	req->rq_snd_buf.buflen = 0;
-	req->rq_rcv_buf.len = 0;
-	req->rq_rcv_buf.buflen = 0;
-	req->rq_release_snd_buf = NULL;
-	xprt_reset_majortimeo(req);
-	dprintk("RPC: %5u reserved req %p xid %08x\n", task->tk_pid,
-			req, ntohl(req->rq_xid));
+	xprt_do_reserve(xprt, task);
 }
 
 /**
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread
* [PATCH v3 02/44] SUNRPC: If there is no reply expected, bail early from call_decode
  2018-09-17 13:02 ` [PATCH v3 01/44] SUNRPC: Clean up initialisation of the struct rpc_rqst Trond Myklebust
@ 2018-09-17 13:02   ` Trond Myklebust
  2018-09-17 13:02     ` [PATCH v3 03/44] SUNRPC: The transmitted message must lie in the RPCSEC window of validity Trond Myklebust
  0 siblings, 1 reply; 76+ messages in thread
From: Trond Myklebust @ 2018-09-17 13:02 UTC (permalink / raw)
  To: linux-nfs

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/clnt.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index bc9d020bf71f..4f1ec8013332 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2260,6 +2260,11 @@ call_decode(struct rpc_task *task)
 
 	dprint_status(task);
 
+	if (!decode) {
+		task->tk_action = rpc_exit_task;
+		return;
+	}
+
 	if (task->tk_flags & RPC_CALL_MAJORSEEN) {
 		if (clnt->cl_chatty) {
 			printk(KERN_NOTICE "%s: server %s OK\n",
@@ -2297,13 +2302,11 @@ call_decode(struct rpc_task *task)
 			goto out_retry;
 		return;
 	}
-	task->tk_action = rpc_exit_task;
 
-	if (decode) {
-		task->tk_status = rpcauth_unwrap_resp(task, decode, req, p,
-						      task->tk_msg.rpc_resp);
-	}
+	task->tk_action = rpc_exit_task;
+	task->tk_status = rpcauth_unwrap_resp(task, decode, req, p,
+					      task->tk_msg.rpc_resp);
+
 	dprintk("RPC: %5u call_decode result %d\n", task->tk_pid,
 			task->tk_status);
 	return;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread
* [PATCH v3 03/44] SUNRPC: The transmitted message must lie in the RPCSEC window of validity
  2018-09-17 13:02     ` [PATCH v3 02/44] SUNRPC: If there is no reply expected, bail early from call_decode Trond Myklebust
@ 2018-09-17 13:02       ` Trond Myklebust
  2018-09-17 13:02         ` [PATCH v3 04/44] SUNRPC: Simplify identification of when the message send/receive is complete Trond Myklebust
  0 siblings, 1 reply; 76+ messages in thread
From: Trond Myklebust @ 2018-09-17 13:02 UTC (permalink / raw)
  To: linux-nfs

If a message has been encoded using RPCSEC_GSS, the server maintains a
window of sequence numbers that it considers valid. The client should
normally be tracking that window, and needs to verify that the sequence
number used by the message being transmitted still lies inside the
window of validity.

So far, we have been able to assume this condition would be realised
automatically, since the client has been encoding the message only after
taking the socket lock. Once we change that assumption, we will need an
explicit check.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/auth.h     |  2 ++
 include/linux/sunrpc/auth_gss.h |  1 +
 net/sunrpc/auth.c               | 10 ++++++++
 net/sunrpc/auth_gss/auth_gss.c  | 41 +++++++++++++++++++++++++++++++++
 net/sunrpc/clnt.c               |  3 +++
 net/sunrpc/xprt.c               |  7 ++++++
 6 files changed, 64 insertions(+)

diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h
index 58a6765c1c5e..2c97a3933ef9 100644
--- a/include/linux/sunrpc/auth.h
+++ b/include/linux/sunrpc/auth.h
@@ -157,6 +157,7 @@ struct rpc_credops {
 	int			(*crkey_timeout)(struct rpc_cred *);
 	bool			(*crkey_to_expire)(struct rpc_cred *);
 	char *			(*crstringify_acceptor)(struct rpc_cred *);
+	bool			(*crneed_reencode)(struct rpc_task *);
 };
 
 extern const struct rpc_authops	authunix_ops;
@@ -192,6 +193,7 @@ __be32 *	rpcauth_marshcred(struct rpc_task *, __be32 *);
 __be32 *	rpcauth_checkverf(struct rpc_task *, __be32 *);
 int		rpcauth_wrap_req(struct rpc_task *task, kxdreproc_t encode, void *rqstp, __be32 *data, void *obj);
 int		rpcauth_unwrap_resp(struct rpc_task *task, kxdrdproc_t decode, void *rqstp, __be32 *data, void *obj);
+bool		rpcauth_xmit_need_reencode(struct rpc_task *task);
 int		rpcauth_refreshcred(struct rpc_task *);
 void		rpcauth_invalcred(struct rpc_task *);
 int		rpcauth_uptodatecred(struct rpc_task *);
diff --git a/include/linux/sunrpc/auth_gss.h b/include/linux/sunrpc/auth_gss.h
index 0c9eac351aab..30427b729070 100644
--- a/include/linux/sunrpc/auth_gss.h
+++ b/include/linux/sunrpc/auth_gss.h
@@ -70,6 +70,7 @@ struct gss_cl_ctx {
 	refcount_t		count;
 	enum rpc_gss_proc	gc_proc;
 	u32			gc_seq;
+	u32			gc_seq_xmit;
 	spinlock_t		gc_seq_lock;
 	struct gss_ctx		*gc_gss_ctx;
 	struct xdr_netobj	gc_wire_ctx;
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 305ecea92170..59df5cdba0ac 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -817,6 +817,16 @@ rpcauth_unwrap_resp(struct rpc_task *task, kxdrdproc_t decode, void *rqstp,
 	return rpcauth_unwrap_req_decode(decode, rqstp, data, obj);
 }
 
+bool
+rpcauth_xmit_need_reencode(struct rpc_task *task)
+{
+	struct rpc_cred *cred = task->tk_rqstp->rq_cred;
+
+	if (!cred || !cred->cr_ops->crneed_reencode)
+		return false;
+	return cred->cr_ops->crneed_reencode(task);
+}
+
 int
 rpcauth_refreshcred(struct rpc_task *task)
 {
diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 21c0aa0a0d1d..c898a7c75e84 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -1984,6 +1984,46 @@ gss_unwrap_req_decode(kxdrdproc_t decode, struct rpc_rqst *rqstp,
 	return decode(rqstp, &xdr, obj);
 }
 
+static bool
+gss_seq_is_newer(u32 new, u32 old)
+{
+	return (s32)(new - old) > 0;
+}
+
+static bool
+gss_xmit_need_reencode(struct rpc_task *task)
+{
+	struct rpc_rqst *req = task->tk_rqstp;
+	struct rpc_cred *cred = req->rq_cred;
+	struct gss_cl_ctx *ctx = gss_cred_get_ctx(cred);
+	u32 win, seq_xmit;
+	bool ret = true;
+
+	if (!ctx)
+		return true;
+
+	if (gss_seq_is_newer(req->rq_seqno, READ_ONCE(ctx->gc_seq)))
+		goto out;
+
+	seq_xmit = READ_ONCE(ctx->gc_seq_xmit);
+	while (gss_seq_is_newer(req->rq_seqno, seq_xmit)) {
+		u32 tmp = seq_xmit;
+
+		seq_xmit = cmpxchg(&ctx->gc_seq_xmit, tmp, req->rq_seqno);
+		if (seq_xmit == tmp) {
+			ret = false;
+			goto out;
+		}
+	}
+
+	win = ctx->gc_win;
+	if (win > 0)
+		ret = !gss_seq_is_newer(req->rq_seqno, seq_xmit - win);
+out:
+	gss_put_ctx(ctx);
+	return ret;
+}
+
 static int
 gss_unwrap_resp(struct rpc_task *task, kxdrdproc_t decode, void *rqstp,
 		__be32 *p, void *obj)
@@ -2052,6 +2092,7 @@ static const struct rpc_credops gss_credops = {
 	.crunwrap_resp		= gss_unwrap_resp,
 	.crkey_timeout		= gss_key_timeout,
 	.crstringify_acceptor	= gss_stringify_acceptor,
+	.crneed_reencode	= gss_xmit_need_reencode,
 };
 
 static const struct rpc_credops gss_nullops = {
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 4f1ec8013332..d41b5ac1d4e8 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2184,6 +2184,9 @@ call_status(struct rpc_task *task)
 		/* shutdown or soft timeout */
 		rpc_exit(task, status);
 		break;
+	case -EBADMSG:
+		task->tk_action = call_transmit;
+		break;
 	default:
 		if (clnt->cl_chatty)
 			printk("%s: RPC call returned error %d\n",
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 6aa09edc9567..3973e10ea2bd 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1014,6 +1014,13 @@ void xprt_transmit(struct rpc_task *task)
 	dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);
 
 	if (!req->rq_reply_bytes_recvd) {
+
+		/* Verify that our message lies in the RPCSEC_GSS window */
+		if (!req->rq_bytes_sent && rpcauth_xmit_need_reencode(task)) {
+			task->tk_status = -EBADMSG;
+			return;
+		}
+
 		if (list_empty(&req->rq_list) && rpc_reply_expected(task)) {
 			/*
 			 * Add to the list only if we're expecting a reply
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread
* [PATCH v3 04/44] SUNRPC: Simplify identification of when the message send/receive is complete
  2018-09-17 13:02         ` [PATCH v3 03/44] SUNRPC: The transmitted message must lie in the RPCSEC window of validity Trond Myklebust
@ 2018-09-17 13:02           ` Trond Myklebust
  2018-09-17 13:02             ` [PATCH v3 05/44] SUNRPC: Avoid holding locks across the XDR encoding of the RPC message Trond Myklebust
  0 siblings, 1 reply; 76+ messages in thread
From: Trond Myklebust @ 2018-09-17 13:02 UTC (permalink / raw)
  To: linux-nfs

Add states to indicate that the message send and receive are not yet
complete.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/sched.h |  6 ++++--
 net/sunrpc/clnt.c            | 19 +++++++------------
 net/sunrpc/xprt.c            | 17 ++++++++++++++---
 3 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 592653becd91..9e655df70131 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -140,8 +140,10 @@ struct rpc_task_setup {
 #define RPC_TASK_RUNNING	0
 #define RPC_TASK_QUEUED		1
 #define RPC_TASK_ACTIVE		2
-#define RPC_TASK_MSG_RECV	3
-#define RPC_TASK_MSG_RECV_WAIT	4
+#define RPC_TASK_NEED_XMIT	3
+#define RPC_TASK_NEED_RECV	4
+#define RPC_TASK_MSG_RECV	5
+#define RPC_TASK_MSG_RECV_WAIT	6
 
 #define RPC_IS_RUNNING(t)	test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate)
 #define rpc_set_running(t)	set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index d41b5ac1d4e8..e5ac35e803ad 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1156,6 +1156,7 @@ struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req)
 	 */
 	xbufp->len = xbufp->head[0].iov_len + xbufp->page_len +
 			xbufp->tail[0].iov_len;
+	set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
 
 	task->tk_action = call_bc_transmit;
 	atomic_inc(&task->tk_count);
@@ -1720,17 +1721,10 @@ call_allocate(struct rpc_task *task)
 	rpc_exit(task, -ERESTARTSYS);
 }
 
-static inline int
+static int
 rpc_task_need_encode(struct rpc_task *task)
 {
-	return task->tk_rqstp->rq_snd_buf.len == 0;
-}
-
-static inline void
-rpc_task_force_reencode(struct rpc_task *task)
-{
-	task->tk_rqstp->rq_snd_buf.len = 0;
-	task->tk_rqstp->rq_bytes_sent = 0;
+	return test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate) == 0;
 }
 
 /*
@@ -1765,6 +1759,8 @@ rpc_xdr_encode(struct rpc_task *task)
 
 	task->tk_status = rpcauth_wrap_req(task, encode, req, p,
 			task->tk_msg.rpc_argp);
+	if (task->tk_status == 0)
+		set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
 }
 
 /*
@@ -1999,7 +1995,6 @@ call_transmit_status(struct rpc_task *task)
 	 */
 	if (task->tk_status == 0) {
 		xprt_end_transmit(task);
-		rpc_task_force_reencode(task);
 		return;
 	}
 
@@ -2010,7 +2005,6 @@ call_transmit_status(struct rpc_task *task)
 	default:
 		dprint_status(task);
 		xprt_end_transmit(task);
-		rpc_task_force_reencode(task);
 		break;
 	/*
 	 * Special cases: if we've been waiting on the
@@ -2038,7 +2032,7 @@ call_transmit_status(struct rpc_task *task)
 	case -EADDRINUSE:
 	case -ENOTCONN:
 	case -EPIPE:
-		rpc_task_force_reencode(task);
+		break;
 	}
 }
 
@@ -2185,6 +2179,7 @@ call_status(struct rpc_task *task)
 		rpc_exit(task, status);
 		break;
 	case -EBADMSG:
+		clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
 		task->tk_action = call_transmit;
 		break;
 	default:
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 3973e10ea2bd..45d580cd93ac 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -936,10 +936,18 @@ void xprt_complete_rqst(struct rpc_task *task, int copied)
 	/* req->rq_reply_bytes_recvd */
 	smp_wmb();
 	req->rq_reply_bytes_recvd = copied;
+	clear_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
 	rpc_wake_up_queued_task(&xprt->pending, task);
 }
 EXPORT_SYMBOL_GPL(xprt_complete_rqst);
 
+static bool
+xprt_request_data_received(struct rpc_task *task)
+{
+	return !test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) &&
+		task->tk_rqstp->rq_reply_bytes_recvd != 0;
+}
+
 static void xprt_timer(struct rpc_task *task)
 {
 	struct rpc_rqst *req = task->tk_rqstp;
@@ -1031,12 +1039,13 @@ void xprt_transmit(struct rpc_task *task)
 			/* Add request to the receive list */
 			spin_lock(&xprt->recv_lock);
 			list_add_tail(&req->rq_list, &xprt->recv);
+			set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
 			spin_unlock(&xprt->recv_lock);
 			xprt_reset_majortimeo(req);
 			/* Turn off autodisconnect */
 			del_singleshot_timer_sync(&xprt->timer);
 		}
-	} else if (!req->rq_bytes_sent)
+	} else if (xprt_request_data_received(task) && !req->rq_bytes_sent)
 		return;
 
 	connect_cookie = xprt->connect_cookie;
@@ -1046,9 +1055,11 @@ void xprt_transmit(struct rpc_task *task)
 		task->tk_status = status;
 		return;
 	}
+
 	xprt_inject_disconnect(xprt);
 
 	dprintk("RPC: %5u xmit complete\n", task->tk_pid);
+	clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
 	task->tk_flags |= RPC_TASK_SENT;
 	spin_lock_bh(&xprt->transport_lock);
 
@@ -1062,14 +1073,14 @@ void xprt_transmit(struct rpc_task *task)
 	spin_unlock_bh(&xprt->transport_lock);
 
 	req->rq_connect_cookie = connect_cookie;
-	if (rpc_reply_expected(task) && !READ_ONCE(req->rq_reply_bytes_recvd)) {
+	if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
 		/*
 		 * Sleep on the pending queue if we're expecting a reply.
 		 * The spinlock ensures atomicity between the test of
 		 * req->rq_reply_bytes_recvd, and the call to rpc_sleep_on().
 		 */
 		spin_lock(&xprt->recv_lock);
-		if (!req->rq_reply_bytes_recvd) {
+		if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
 			rpc_sleep_on(&xprt->pending, task, xprt_timer);
 			/*
 			 * Send an extra queue wakeup call if the
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread
* [PATCH v3 05/44] SUNRPC: Avoid holding locks across the XDR encoding of the RPC message
  2018-09-17 13:02             ` [PATCH v3 04/44] SUNRPC: Simplify identification of when the message send/receive is complete Trond Myklebust
@ 2018-09-17 13:02               ` Trond Myklebust
  2018-09-17 13:02                 ` [PATCH v3 06/44] SUNRPC: Rename TCP receive-specific state variables Trond Myklebust
  0 siblings, 1 reply; 76+ messages in thread
From: Trond Myklebust @ 2018-09-17 13:02 UTC (permalink / raw)
  To: linux-nfs

Currently, we grab the socket bit lock before we allow the message to be
XDR encoded. That significantly slows down the transmission rate, since
we serialise on a potentially blocking operation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/clnt.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index e5ac35e803ad..a858366cd15d 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1949,9 +1949,6 @@ call_transmit(struct rpc_task *task)
 	task->tk_action = call_status;
 	if (task->tk_status < 0)
 		return;
-	if (!xprt_prepare_transmit(task))
-		return;
-	task->tk_action = call_transmit_status;
 	/* Encode here so that rpcsec_gss can use correct sequence number. */
 	if (rpc_task_need_encode(task)) {
 		rpc_xdr_encode(task);
@@ -1965,6 +1962,9 @@ call_transmit(struct rpc_task *task)
 			return;
 		}
 	}
+	if (!xprt_prepare_transmit(task))
+		return;
+	task->tk_action = call_transmit_status;
 	xprt_transmit(task);
 	if (task->tk_status < 0)
 		return;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread
* [PATCH v3 06/44] SUNRPC: Rename TCP receive-specific state variables
  2018-09-17 13:02                 ` [PATCH v3 05/44] SUNRPC: Avoid holding locks across the XDR encoding of the RPC message Trond Myklebust
@ 2018-09-17 13:02                   ` Trond Myklebust
  2018-09-17 13:02                     ` [PATCH v3 07/44] SUNRPC: Move reset of TCP state variables into the reconnect code Trond Myklebust
  0 siblings, 1 reply; 76+ messages in thread
From: Trond Myklebust @ 2018-09-17 13:02 UTC (permalink / raw)
  To: linux-nfs

Since we will want to introduce similar TCP state variables for the
transmission of requests, let's rename the existing ones to indicate
that they are for the receive side.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/xprtsock.h |  16 +--
 include/trace/events/sunrpc.h   |  10 +-
 net/sunrpc/xprtsock.c           | 178 ++++++++++++++++----------------
 3 files changed, 103 insertions(+), 101 deletions(-)

diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
index ae0f99b9b965..90d5ca8e65f4 100644
--- a/include/linux/sunrpc/xprtsock.h
+++ b/include/linux/sunrpc/xprtsock.h
@@ -30,15 +30,17 @@ struct sock_xprt {
 	/*
 	 * State of TCP reply receive
 	 */
-	__be32			tcp_fraghdr,
-				tcp_xid,
-				tcp_calldir;
+	struct {
+		__be32		fraghdr,
+				xid,
+				calldir;
 
-	u32			tcp_offset,
-				tcp_reclen;
+		u32		offset,
+				len;
 
-	unsigned long		tcp_copied,
-				tcp_flags;
+		unsigned long	copied,
+				flags;
+	} recv;
 
 	/*
 	 * Connection of transports
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index bbb08a3ef5cc..0aa347194e0f 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -525,11 +525,11 @@ TRACE_EVENT(xs_tcp_data_recv,
 	TP_fast_assign(
 		__assign_str(addr, xs->xprt.address_strings[RPC_DISPLAY_ADDR]);
 		__assign_str(port, xs->xprt.address_strings[RPC_DISPLAY_PORT]);
-		__entry->xid = be32_to_cpu(xs->tcp_xid);
-		__entry->flags = xs->tcp_flags;
-		__entry->copied = xs->tcp_copied;
-		__entry->reclen = xs->tcp_reclen;
-		__entry->offset = xs->tcp_offset;
+		__entry->xid = be32_to_cpu(xs->recv.xid);
+		__entry->flags = xs->recv.flags;
+		__entry->copied = xs->recv.copied;
+		__entry->reclen = xs->recv.len;
+		__entry->offset = xs->recv.offset;
 	),
 
 	TP_printk("peer=[%s]:%s xid=0x%08x flags=%s copied=%lu reclen=%u offset=%lu",
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 6b7539c0466e..cd7d093721ae 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1169,42 +1169,42 @@ static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_rea
 	size_t len, used;
 	char *p;
 
-	p = ((char *) &transport->tcp_fraghdr) + transport->tcp_offset;
-	len = sizeof(transport->tcp_fraghdr) - transport->tcp_offset;
+	p = ((char *) &transport->recv.fraghdr) + transport->recv.offset;
+	len = sizeof(transport->recv.fraghdr) - transport->recv.offset;
 	used = xdr_skb_read_bits(desc, p, len);
-	transport->tcp_offset += used;
+	transport->recv.offset += used;
 	if (used != len)
 		return;
 
-	transport->tcp_reclen = ntohl(transport->tcp_fraghdr);
-	if (transport->tcp_reclen & RPC_LAST_STREAM_FRAGMENT)
-		transport->tcp_flags |= TCP_RCV_LAST_FRAG;
+	transport->recv.len = ntohl(transport->recv.fraghdr);
+	if (transport->recv.len & RPC_LAST_STREAM_FRAGMENT)
+		transport->recv.flags |= TCP_RCV_LAST_FRAG;
 	else
-		transport->tcp_flags &= ~TCP_RCV_LAST_FRAG;
-	transport->tcp_reclen &= RPC_FRAGMENT_SIZE_MASK;
+		transport->recv.flags &= ~TCP_RCV_LAST_FRAG;
+	transport->recv.len &= RPC_FRAGMENT_SIZE_MASK;
 
-	transport->tcp_flags &= ~TCP_RCV_COPY_FRAGHDR;
-	transport->tcp_offset = 0;
+	transport->recv.flags &= ~TCP_RCV_COPY_FRAGHDR;
+	transport->recv.offset = 0;
 
 	/* Sanity check of the record length */
-	if (unlikely(transport->tcp_reclen < 8)) {
+	if (unlikely(transport->recv.len < 8)) {
 		dprintk("RPC: invalid TCP record fragment length\n");
 		xs_tcp_force_close(xprt);
 		return;
 	}
 	dprintk("RPC: reading TCP record fragment of length %d\n",
-			transport->tcp_reclen);
+			transport->recv.len);
 }
 
 static void xs_tcp_check_fraghdr(struct sock_xprt *transport)
 {
-	if (transport->tcp_offset == transport->tcp_reclen) {
-		transport->tcp_flags |= TCP_RCV_COPY_FRAGHDR;
-		transport->tcp_offset = 0;
-		if (transport->tcp_flags & TCP_RCV_LAST_FRAG) {
-			transport->tcp_flags &= ~TCP_RCV_COPY_DATA;
-			transport->tcp_flags |= TCP_RCV_COPY_XID;
-			transport->tcp_copied = 0;
+	if (transport->recv.offset == transport->recv.len) {
+		transport->recv.flags |= TCP_RCV_COPY_FRAGHDR;
+		transport->recv.offset = 0;
+		if (transport->recv.flags & TCP_RCV_LAST_FRAG) {
+			transport->recv.flags &= ~TCP_RCV_COPY_DATA;
+			transport->recv.flags |= TCP_RCV_COPY_XID;
+			transport->recv.copied = 0;
 		}
 	}
 }
@@ -1214,20 +1214,20 @@ static inline void xs_tcp_read_xid(struct sock_xprt *transport, struct xdr_skb_r
 	size_t len, used;
 	char *p;
 
-	len = sizeof(transport->tcp_xid) - transport->tcp_offset;
+	len = sizeof(transport->recv.xid) - transport->recv.offset;
 	dprintk("RPC: reading XID (%zu bytes)\n", len);
-	p = ((char *) &transport->tcp_xid) + transport->tcp_offset;
+	p = ((char *) &transport->recv.xid) + transport->recv.offset;
 	used = xdr_skb_read_bits(desc, p, len);
-	transport->tcp_offset += used;
+	transport->recv.offset += used;
 	if (used != len)
 		return;
-	transport->tcp_flags &= ~TCP_RCV_COPY_XID;
-	transport->tcp_flags |= TCP_RCV_READ_CALLDIR;
-	transport->tcp_copied = 4;
+	transport->recv.flags &= ~TCP_RCV_COPY_XID;
+	transport->recv.flags |= TCP_RCV_READ_CALLDIR;
+	transport->recv.copied = 4;
 	dprintk("RPC: reading %s XID %08x\n",
-			(transport->tcp_flags & TCP_RPC_REPLY) ? "reply for"
+			(transport->recv.flags & TCP_RPC_REPLY) ? "reply for"
 							      : "request with",
-			ntohl(transport->tcp_xid));
+			ntohl(transport->recv.xid));
 	xs_tcp_check_fraghdr(transport);
 }
 
@@ -1239,34 +1239,34 @@ static inline void xs_tcp_read_calldir(struct sock_xprt *transport,
 	char *p;
 
 	/*
-	 * We want transport->tcp_offset to be 8 at the end of this routine
+	 * We want transport->recv.offset to be 8 at the end of this routine
 	 * (4 bytes for the xid and 4 bytes for the call/reply flag).
 	 * When this function is called for the first time,
-	 * transport->tcp_offset is 4 (after having already read the xid).
+	 * transport->recv.offset is 4 (after having already read the xid).
 	 */
-	offset = transport->tcp_offset - sizeof(transport->tcp_xid);
-	len = sizeof(transport->tcp_calldir) - offset;
+	offset = transport->recv.offset - sizeof(transport->recv.xid);
+	len = sizeof(transport->recv.calldir) - offset;
 	dprintk("RPC: reading CALL/REPLY flag (%zu bytes)\n", len);
-	p = ((char *) &transport->tcp_calldir) + offset;
+	p = ((char *) &transport->recv.calldir) + offset;
 	used = xdr_skb_read_bits(desc, p, len);
-	transport->tcp_offset += used;
+	transport->recv.offset += used;
 	if (used != len)
 		return;
-	transport->tcp_flags &= ~TCP_RCV_READ_CALLDIR;
+	transport->recv.flags &= ~TCP_RCV_READ_CALLDIR;
 	/*
 	 * We don't yet have the XDR buffer, so we will write the calldir
 	 * out after we get the buffer from the 'struct rpc_rqst'
 	 */
-	switch (ntohl(transport->tcp_calldir)) {
+	switch (ntohl(transport->recv.calldir)) {
 	case RPC_REPLY:
-		transport->tcp_flags |= TCP_RCV_COPY_CALLDIR;
-		transport->tcp_flags |= TCP_RCV_COPY_DATA;
-		transport->tcp_flags |= TCP_RPC_REPLY;
+		transport->recv.flags |= TCP_RCV_COPY_CALLDIR;
+		transport->recv.flags |= TCP_RCV_COPY_DATA;
+		transport->recv.flags |= TCP_RPC_REPLY;
 		break;
 	case RPC_CALL:
-		transport->tcp_flags |= TCP_RCV_COPY_CALLDIR;
-		transport->tcp_flags |= TCP_RCV_COPY_DATA;
-		transport->tcp_flags &= ~TCP_RPC_REPLY;
+		transport->recv.flags |= TCP_RCV_COPY_CALLDIR;
+		transport->recv.flags |= TCP_RCV_COPY_DATA;
+		transport->recv.flags &= ~TCP_RPC_REPLY;
 		break;
 	default:
 		dprintk("RPC: invalid request message type\n");
@@ -1287,21 +1287,21 @@ static inline void xs_tcp_read_common(struct rpc_xprt *xprt,
 
 	rcvbuf = &req->rq_private_buf;
 
-	if (transport->tcp_flags & TCP_RCV_COPY_CALLDIR) {
+	if (transport->recv.flags & TCP_RCV_COPY_CALLDIR) {
 		/*
 		 * Save the RPC direction in the XDR buffer
 		 */
-		memcpy(rcvbuf->head[0].iov_base + transport->tcp_copied,
-			&transport->tcp_calldir,
-			sizeof(transport->tcp_calldir));
-		transport->tcp_copied += sizeof(transport->tcp_calldir);
-		transport->tcp_flags &= ~TCP_RCV_COPY_CALLDIR;
+		memcpy(rcvbuf->head[0].iov_base + transport->recv.copied,
+			&transport->recv.calldir,
+			sizeof(transport->recv.calldir));
+		transport->recv.copied += sizeof(transport->recv.calldir);
+		transport->recv.flags &= ~TCP_RCV_COPY_CALLDIR;
 	}
 
 	len = desc->count;
-	if (len > transport->tcp_reclen - transport->tcp_offset)
-		desc->count = transport->tcp_reclen - transport->tcp_offset;
-	r = xdr_partial_copy_from_skb(rcvbuf, transport->tcp_copied,
+	if (len > transport->recv.len - transport->recv.offset)
+		desc->count = transport->recv.len - transport->recv.offset;
+	r = xdr_partial_copy_from_skb(rcvbuf, transport->recv.copied,
 					  desc, xdr_skb_read_bits);
 
 	if (desc->count) {
@@ -1314,31 +1314,31 @@ static inline void xs_tcp_read_common(struct rpc_xprt *xprt,
 		 * Any remaining data from this record will
 		 * be discarded.
 		 */
-		transport->tcp_flags &= ~TCP_RCV_COPY_DATA;
+		transport->recv.flags &= ~TCP_RCV_COPY_DATA;
 		dprintk("RPC: XID %08x truncated request\n",
-				ntohl(transport->tcp_xid));
-		dprintk("RPC: xprt = %p, tcp_copied = %lu, "
-				"tcp_offset = %u, tcp_reclen = %u\n",
-				xprt, transport->tcp_copied,
-				transport->tcp_offset, transport->tcp_reclen);
+				ntohl(transport->recv.xid));
+		dprintk("RPC: xprt = %p, recv.copied = %lu, "
+				"recv.offset = %u, recv.len = %u\n",
+				xprt, transport->recv.copied,
+				transport->recv.offset, transport->recv.len);
 		return;
 	}
 
-	transport->tcp_copied += r;
-	transport->tcp_offset += r;
+	transport->recv.copied += r;
+	transport->recv.offset += r;
 	desc->count = len - r;
 
 	dprintk("RPC: XID %08x read %zd bytes\n",
-			ntohl(transport->tcp_xid), r);
-	dprintk("RPC: xprt = %p, tcp_copied = %lu, tcp_offset = %u, "
-			"tcp_reclen = %u\n", xprt, transport->tcp_copied,
-			transport->tcp_offset, transport->tcp_reclen);
-
-	if (transport->tcp_copied == req->rq_private_buf.buflen)
-		transport->tcp_flags &= ~TCP_RCV_COPY_DATA;
-	else if (transport->tcp_offset == transport->tcp_reclen) {
-		if (transport->tcp_flags & TCP_RCV_LAST_FRAG)
-			transport->tcp_flags &= ~TCP_RCV_COPY_DATA;
+			ntohl(transport->recv.xid), r);
+	dprintk("RPC: xprt = %p, recv.copied = %lu, recv.offset = %u, "
+			"recv.len = %u\n", xprt, transport->recv.copied,
+			transport->recv.offset, transport->recv.len);
+
+	if (transport->recv.copied == req->rq_private_buf.buflen)
+		transport->recv.flags &= ~TCP_RCV_COPY_DATA;
+	else if (transport->recv.offset == transport->recv.len) {
+		if (transport->recv.flags & TCP_RCV_LAST_FRAG)
+			transport->recv.flags &= ~TCP_RCV_COPY_DATA;
 	}
 }
 
@@ -1353,14 +1353,14 @@ static inline int xs_tcp_read_reply(struct rpc_xprt *xprt,
 		container_of(xprt, struct sock_xprt, xprt);
 	struct rpc_rqst *req;
 
-	dprintk("RPC: read reply XID %08x\n", ntohl(transport->tcp_xid));
+	dprintk("RPC: read reply XID %08x\n", ntohl(transport->recv.xid));
 
 	/* Find and lock the request corresponding to this
xid */ spin_lock(&xprt->recv_lock); - req = xprt_lookup_rqst(xprt, transport->tcp_xid); + req = xprt_lookup_rqst(xprt, transport->recv.xid); if (!req) { dprintk("RPC: XID %08x request not found!\n", - ntohl(transport->tcp_xid)); + ntohl(transport->recv.xid)); spin_unlock(&xprt->recv_lock); return -1; } @@ -1370,8 +1370,8 @@ static inline int xs_tcp_read_reply(struct rpc_xprt *xprt, xs_tcp_read_common(xprt, desc, req); spin_lock(&xprt->recv_lock); - if (!(transport->tcp_flags & TCP_RCV_COPY_DATA)) - xprt_complete_rqst(req->rq_task, transport->tcp_copied); + if (!(transport->recv.flags & TCP_RCV_COPY_DATA)) + xprt_complete_rqst(req->rq_task, transport->recv.copied); xprt_unpin_rqst(req); spin_unlock(&xprt->recv_lock); return 0; @@ -1393,7 +1393,7 @@ static int xs_tcp_read_callback(struct rpc_xprt *xprt, struct rpc_rqst *req; /* Look up the request corresponding to the given XID */ - req = xprt_lookup_bc_request(xprt, transport->tcp_xid); + req = xprt_lookup_bc_request(xprt, transport->recv.xid); if (req == NULL) { printk(KERN_WARNING "Callback slot table overflowed\n"); xprt_force_disconnect(xprt); @@ -1403,8 +1403,8 @@ static int xs_tcp_read_callback(struct rpc_xprt *xprt, dprintk("RPC: read callback XID %08x\n", ntohl(req->rq_xid)); xs_tcp_read_common(xprt, desc, req); - if (!(transport->tcp_flags & TCP_RCV_COPY_DATA)) - xprt_complete_bc_request(req, transport->tcp_copied); + if (!(transport->recv.flags & TCP_RCV_COPY_DATA)) + xprt_complete_bc_request(req, transport->recv.copied); return 0; } @@ -1415,7 +1415,7 @@ static inline int _xs_tcp_read_data(struct rpc_xprt *xprt, struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); - return (transport->tcp_flags & TCP_RPC_REPLY) ? + return (transport->recv.flags & TCP_RPC_REPLY) ? xs_tcp_read_reply(xprt, desc) : xs_tcp_read_callback(xprt, desc); } @@ -1458,9 +1458,9 @@ static void xs_tcp_read_data(struct rpc_xprt *xprt, else { /* * The transport_lock protects the request handling. 
- * There's no need to hold it to update the tcp_flags. + * There's no need to hold it to update the recv.flags. */ - transport->tcp_flags &= ~TCP_RCV_COPY_DATA; + transport->recv.flags &= ~TCP_RCV_COPY_DATA; } } @@ -1468,12 +1468,12 @@ static inline void xs_tcp_read_discard(struct sock_xprt *transport, struct xdr_s { size_t len; - len = transport->tcp_reclen - transport->tcp_offset; + len = transport->recv.len - transport->recv.offset; if (len > desc->count) len = desc->count; desc->count -= len; desc->offset += len; - transport->tcp_offset += len; + transport->recv.offset += len; dprintk("RPC: discarded %zu bytes\n", len); xs_tcp_check_fraghdr(transport); } @@ -1494,22 +1494,22 @@ static int xs_tcp_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb, uns trace_xs_tcp_data_recv(transport); /* Read in a new fragment marker if necessary */ /* Can we ever really expect to get completely empty fragments? */ - if (transport->tcp_flags & TCP_RCV_COPY_FRAGHDR) { + if (transport->recv.flags & TCP_RCV_COPY_FRAGHDR) { xs_tcp_read_fraghdr(xprt, &desc); continue; } /* Read in the xid if necessary */ - if (transport->tcp_flags & TCP_RCV_COPY_XID) { + if (transport->recv.flags & TCP_RCV_COPY_XID) { xs_tcp_read_xid(transport, &desc); continue; } /* Read in the call/reply flag */ - if (transport->tcp_flags & TCP_RCV_READ_CALLDIR) { + if (transport->recv.flags & TCP_RCV_READ_CALLDIR) { xs_tcp_read_calldir(transport, &desc); continue; } /* Read in the request data */ - if (transport->tcp_flags & TCP_RCV_COPY_DATA) { + if (transport->recv.flags & TCP_RCV_COPY_DATA) { xs_tcp_read_data(xprt, &desc); continue; } @@ -1602,10 +1602,10 @@ static void xs_tcp_state_change(struct sock *sk) if (!xprt_test_and_set_connected(xprt)) { /* Reset TCP record info */ - transport->tcp_offset = 0; - transport->tcp_reclen = 0; - transport->tcp_copied = 0; - transport->tcp_flags = + transport->recv.offset = 0; + transport->recv.len = 0; + transport->recv.copied = 0; + transport->recv.flags = 
TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID; xprt->connect_cookie++; clear_bit(XPRT_SOCK_CONNECTING, &transport->sock_state); -- 2.17.1
* [PATCH v3 07/44] SUNRPC: Move reset of TCP state variables into the reconnect code
From: Trond Myklebust @ 2018-09-17 13:02 UTC
To: linux-nfs

Rather than resetting state variables in the socket state_change() callback,
do it in the sunrpc TCP connect function itself.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/xprtsock.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index cd7d093721ae..ec1e3f93e707 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1600,13 +1600,6 @@ static void xs_tcp_state_change(struct sock *sk)
 	case TCP_ESTABLISHED:
 		spin_lock(&xprt->transport_lock);
 		if (!xprt_test_and_set_connected(xprt)) {
-
-			/* Reset TCP record info */
-			transport->recv.offset = 0;
-			transport->recv.len = 0;
-			transport->recv.copied = 0;
-			transport->recv.flags =
-				TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
 			xprt->connect_cookie++;
 			clear_bit(XPRT_SOCK_CONNECTING,
 				  &transport->sock_state);
 			xprt_clear_connecting(xprt);
@@ -2386,6 +2379,12 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 
 	xs_set_memalloc(xprt);
 
+	/* Reset TCP record info */
+	transport->recv.offset = 0;
+	transport->recv.len = 0;
+	transport->recv.copied = 0;
+	transport->recv.flags = TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
+
 	/* Tell the socket layer to start connecting... */
 	xprt->stat.connect_count++;
 	xprt->stat.connect_start = jiffies;
-- 
2.17.1
* [PATCH v3 08/44] SUNRPC: Add socket transmit queue offset tracking 2018-09-17 13:02 ` [PATCH v3 07/44] SUNRPC: Move reset of TCP state variables into the reconnect code Trond Myklebust @ 2018-09-17 13:02 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 09/44] SUNRPC: Simplify dealing with aborted partially transmitted messages Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:02 UTC (permalink / raw) To: linux-nfs Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/xprtsock.h | 7 ++++++ net/sunrpc/xprtsock.c | 40 ++++++++++++++++++--------------- 2 files changed, 29 insertions(+), 18 deletions(-) diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h index 90d5ca8e65f4..005cfb6e7238 100644 --- a/include/linux/sunrpc/xprtsock.h +++ b/include/linux/sunrpc/xprtsock.h @@ -42,6 +42,13 @@ struct sock_xprt { flags; } recv; + /* + * State of TCP transmit queue + */ + struct { + u32 offset; + } xmit; + /* * Connection of transports */ diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index ec1e3f93e707..629cc45e1e6c 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -461,7 +461,7 @@ static int xs_nospace(struct rpc_task *task) int ret = -EAGAIN; dprintk("RPC: %5u xmit incomplete (%u left of %u)\n", - task->tk_pid, req->rq_slen - req->rq_bytes_sent, + task->tk_pid, req->rq_slen - transport->xmit.offset, req->rq_slen); /* Protect against races with write_space */ @@ -528,19 +528,22 @@ static int xs_local_send_request(struct rpc_task *task) req->rq_svec->iov_base, req->rq_svec->iov_len); req->rq_xtime = ktime_get(); - status = xs_sendpages(transport->sock, NULL, 0, xdr, req->rq_bytes_sent, + status = xs_sendpages(transport->sock, NULL, 0, xdr, + transport->xmit.offset, true, &sent); dprintk("RPC: %s(%u) = %d\n", - __func__, xdr->len - req->rq_bytes_sent, status); + __func__, xdr->len - transport->xmit.offset, status); if (status == -EAGAIN && 
sock_writeable(transport->inet)) status = -ENOBUFS; if (likely(sent > 0) || status == 0) { - req->rq_bytes_sent += sent; - req->rq_xmit_bytes_sent += sent; + transport->xmit.offset += sent; + req->rq_bytes_sent = transport->xmit.offset; if (likely(req->rq_bytes_sent >= req->rq_slen)) { + req->rq_xmit_bytes_sent += transport->xmit.offset; req->rq_bytes_sent = 0; + transport->xmit.offset = 0; return 0; } status = -EAGAIN; @@ -592,10 +595,10 @@ static int xs_udp_send_request(struct rpc_task *task) return -ENOTCONN; req->rq_xtime = ktime_get(); status = xs_sendpages(transport->sock, xs_addr(xprt), xprt->addrlen, - xdr, req->rq_bytes_sent, true, &sent); + xdr, 0, true, &sent); dprintk("RPC: xs_udp_send_request(%u) = %d\n", - xdr->len - req->rq_bytes_sent, status); + xdr->len, status); /* firewall is blocking us, don't return -EAGAIN or we end up looping */ if (status == -EPERM) @@ -684,17 +687,20 @@ static int xs_tcp_send_request(struct rpc_task *task) while (1) { sent = 0; status = xs_sendpages(transport->sock, NULL, 0, xdr, - req->rq_bytes_sent, zerocopy, &sent); + transport->xmit.offset, + zerocopy, &sent); dprintk("RPC: xs_tcp_send_request(%u) = %d\n", - xdr->len - req->rq_bytes_sent, status); + xdr->len - transport->xmit.offset, status); /* If we've sent the entire packet, immediately * reset the count of bytes sent. 
*/ - req->rq_bytes_sent += sent; - req->rq_xmit_bytes_sent += sent; + transport->xmit.offset += sent; + req->rq_bytes_sent = transport->xmit.offset; if (likely(req->rq_bytes_sent >= req->rq_slen)) { + req->rq_xmit_bytes_sent += transport->xmit.offset; req->rq_bytes_sent = 0; + transport->xmit.offset = 0; return 0; } @@ -760,18 +766,13 @@ static int xs_tcp_send_request(struct rpc_task *task) */ static void xs_tcp_release_xprt(struct rpc_xprt *xprt, struct rpc_task *task) { - struct rpc_rqst *req; + struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); if (task != xprt->snd_task) return; if (task == NULL) goto out_release; - req = task->tk_rqstp; - if (req == NULL) - goto out_release; - if (req->rq_bytes_sent == 0) - goto out_release; - if (req->rq_bytes_sent == req->rq_snd_buf.len) + if (transport->xmit.offset == 0 || !xprt_connected(xprt)) goto out_release; set_bit(XPRT_CLOSE_WAIT, &xprt->state); out_release: @@ -2021,6 +2022,8 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt, write_unlock_bh(&sk->sk_callback_lock); } + transport->xmit.offset = 0; + /* Tell the socket layer to start connecting... */ xprt->stat.connect_count++; xprt->stat.connect_start = jiffies; @@ -2384,6 +2387,7 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock) transport->recv.len = 0; transport->recv.copied = 0; transport->recv.flags = TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID; + transport->xmit.offset = 0; /* Tell the socket layer to start connecting... */ xprt->stat.connect_count++; -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 09/44] SUNRPC: Simplify dealing with aborted partially transmitted messages 2018-09-17 13:02 ` [PATCH v3 08/44] SUNRPC: Add socket transmit queue offset tracking Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 10/44] SUNRPC: Refactor the transport request pinning Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs If the previous message was only partially transmitted, we need to close the socket in order to avoid corruption of the message stream. To do so, we currently hijack the unlocking of the socket in order to schedule the close. Now that we track the message offset in the socket state, we can move that kind of checking out of the socket lock code, which is needed to allow messages to remain queued after dropping the socket lock. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- net/sunrpc/xprtsock.c | 51 +++++++++++++++++++++---------------------- 1 file changed, 25 insertions(+), 26 deletions(-) diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 629cc45e1e6c..3fbccebd0b10 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -491,6 +491,16 @@ static int xs_nospace(struct rpc_task *task) return ret; } +/* + * Determine if the previous message in the stream was aborted before it + * could complete transmission. + */ +static bool +xs_send_request_was_aborted(struct sock_xprt *transport, struct rpc_rqst *req) +{ + return transport->xmit.offset != 0 && req->rq_bytes_sent == 0; +} + /* * Construct a stream transport record marker in @buf. 
*/ @@ -522,6 +532,12 @@ static int xs_local_send_request(struct rpc_task *task) int status; int sent = 0; + /* Close the stream if the previous transmission was incomplete */ + if (xs_send_request_was_aborted(transport, req)) { + xs_close(xprt); + return -ENOTCONN; + } + xs_encode_stream_record_marker(&req->rq_snd_buf); xs_pktdump("packet data:", @@ -665,6 +681,13 @@ static int xs_tcp_send_request(struct rpc_task *task) int status; int sent; + /* Close the stream if the previous transmission was incomplete */ + if (xs_send_request_was_aborted(transport, req)) { + if (transport->sock != NULL) + kernel_sock_shutdown(transport->sock, SHUT_RDWR); + return -ENOTCONN; + } + xs_encode_stream_record_marker(&req->rq_snd_buf); xs_pktdump("packet data:", @@ -755,30 +778,6 @@ static int xs_tcp_send_request(struct rpc_task *task) return status; } -/** - * xs_tcp_release_xprt - clean up after a tcp transmission - * @xprt: transport - * @task: rpc task - * - * This cleans up if an error causes us to abort the transmission of a request. - * In this case, the socket may need to be reset in order to avoid confusing - * the server. 
- */ -static void xs_tcp_release_xprt(struct rpc_xprt *xprt, struct rpc_task *task) -{ - struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); - - if (task != xprt->snd_task) - return; - if (task == NULL) - goto out_release; - if (transport->xmit.offset == 0 || !xprt_connected(xprt)) - goto out_release; - set_bit(XPRT_CLOSE_WAIT, &xprt->state); -out_release: - xprt_release_xprt(xprt, task); -} - static void xs_save_old_callbacks(struct sock_xprt *transport, struct sock *sk) { transport->old_data_ready = sk->sk_data_ready; @@ -2764,7 +2763,7 @@ static void bc_destroy(struct rpc_xprt *xprt) static const struct rpc_xprt_ops xs_local_ops = { .reserve_xprt = xprt_reserve_xprt, - .release_xprt = xs_tcp_release_xprt, + .release_xprt = xprt_release_xprt, .alloc_slot = xprt_alloc_slot, .free_slot = xprt_free_slot, .rpcbind = xs_local_rpcbind, @@ -2806,7 +2805,7 @@ static const struct rpc_xprt_ops xs_udp_ops = { static const struct rpc_xprt_ops xs_tcp_ops = { .reserve_xprt = xprt_reserve_xprt, - .release_xprt = xs_tcp_release_xprt, + .release_xprt = xprt_release_xprt, .alloc_slot = xprt_lock_and_alloc_slot, .free_slot = xprt_free_slot, .rpcbind = rpcb_getport_async, -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 10/44] SUNRPC: Refactor the transport request pinning 2018-09-17 13:03 ` [PATCH v3 09/44] SUNRPC: Simplify dealing with aborted partially transmitted messages Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 11/44] SUNRPC: Add a helper to wake up a sleeping rpc_task and set its status Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs We are going to need to pin for both send and receive. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/sched.h | 3 +-- include/linux/sunrpc/xprt.h | 1 + net/sunrpc/xprt.c | 43 +++++++++++++++++++----------------- 3 files changed, 25 insertions(+), 22 deletions(-) diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index 9e655df70131..8062ce6b18e5 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -142,8 +142,7 @@ struct rpc_task_setup { #define RPC_TASK_ACTIVE 2 #define RPC_TASK_NEED_XMIT 3 #define RPC_TASK_NEED_RECV 4 -#define RPC_TASK_MSG_RECV 5 -#define RPC_TASK_MSG_RECV_WAIT 6 +#define RPC_TASK_MSG_PIN_WAIT 5 #define RPC_IS_RUNNING(t) test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate) #define rpc_set_running(t) set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate) diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index 3d80524e92d6..bd743c51a865 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -103,6 +103,7 @@ struct rpc_rqst { /* A cookie used to track the state of the transport connection */ + atomic_t rq_pin; /* * Partial send handling diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 45d580cd93ac..649a40cfae6d 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -847,16 +847,22 @@ struct rpc_rqst *xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid) } EXPORT_SYMBOL_GPL(xprt_lookup_rqst); +static bool +xprt_is_pinned_rqst(struct rpc_rqst *req) +{ + return 
atomic_read(&req->rq_pin) != 0; +} + /** * xprt_pin_rqst - Pin a request on the transport receive list * @req: Request to pin * * Caller must ensure this is atomic with the call to xprt_lookup_rqst() - * so should be holding the xprt transport lock. + * so should be holding the xprt receive lock. */ void xprt_pin_rqst(struct rpc_rqst *req) { - set_bit(RPC_TASK_MSG_RECV, &req->rq_task->tk_runstate); + atomic_inc(&req->rq_pin); } EXPORT_SYMBOL_GPL(xprt_pin_rqst); @@ -864,31 +870,22 @@ EXPORT_SYMBOL_GPL(xprt_pin_rqst); * xprt_unpin_rqst - Unpin a request on the transport receive list * @req: Request to pin * - * Caller should be holding the xprt transport lock. + * Caller should be holding the xprt receive lock. */ void xprt_unpin_rqst(struct rpc_rqst *req) { - struct rpc_task *task = req->rq_task; - - clear_bit(RPC_TASK_MSG_RECV, &task->tk_runstate); - if (test_bit(RPC_TASK_MSG_RECV_WAIT, &task->tk_runstate)) - wake_up_bit(&task->tk_runstate, RPC_TASK_MSG_RECV); + if (!test_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate)) { + atomic_dec(&req->rq_pin); + return; + } + if (atomic_dec_and_test(&req->rq_pin)) + wake_up_var(&req->rq_pin); } EXPORT_SYMBOL_GPL(xprt_unpin_rqst); static void xprt_wait_on_pinned_rqst(struct rpc_rqst *req) -__must_hold(&req->rq_xprt->recv_lock) { - struct rpc_task *task = req->rq_task; - - if (task && test_bit(RPC_TASK_MSG_RECV, &task->tk_runstate)) { - spin_unlock(&req->rq_xprt->recv_lock); - set_bit(RPC_TASK_MSG_RECV_WAIT, &task->tk_runstate); - wait_on_bit(&task->tk_runstate, RPC_TASK_MSG_RECV, - TASK_UNINTERRUPTIBLE); - clear_bit(RPC_TASK_MSG_RECV_WAIT, &task->tk_runstate); - spin_lock(&req->rq_xprt->recv_lock); - } + wait_var_event(&req->rq_pin, !xprt_is_pinned_rqst(req)); } /** @@ -1388,7 +1385,13 @@ void xprt_release(struct rpc_task *task) spin_lock(&xprt->recv_lock); if (!list_empty(&req->rq_list)) { list_del_init(&req->rq_list); - xprt_wait_on_pinned_rqst(req); + if (xprt_is_pinned_rqst(req)) { + set_bit(RPC_TASK_MSG_PIN_WAIT, 
&req->rq_task->tk_runstate); + spin_unlock(&xprt->recv_lock); + xprt_wait_on_pinned_rqst(req); + spin_lock(&xprt->recv_lock); + clear_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate); + } } spin_unlock(&xprt->recv_lock); spin_lock_bh(&xprt->transport_lock); -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 11/44] SUNRPC: Add a helper to wake up a sleeping rpc_task and set its status 2018-09-17 13:03 ` [PATCH v3 10/44] SUNRPC: Refactor the transport request pinning Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 12/44] SUNRPC: Test whether the task is queued before grabbing the queue spinlocks Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs Add a helper that will wake up a task that is sleeping on a specific queue, and will set the value of task->tk_status. This is mainly intended for use by the transport layer to notify the task of an error condition. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/sched.h | 3 ++ net/sunrpc/sched.c | 65 ++++++++++++++++++++++++++++++------ 2 files changed, 58 insertions(+), 10 deletions(-) diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index 8062ce6b18e5..8840a420cf4c 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -235,6 +235,9 @@ void rpc_wake_up_queued_task_on_wq(struct workqueue_struct *wq, struct rpc_task *task); void rpc_wake_up_queued_task(struct rpc_wait_queue *, struct rpc_task *); +void rpc_wake_up_queued_task_set_status(struct rpc_wait_queue *, + struct rpc_task *, + int); void rpc_wake_up(struct rpc_wait_queue *); struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *); struct rpc_task *rpc_wake_up_first_on_wq(struct workqueue_struct *wq, diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index 3fe5d60ab0e2..dec01bd1b71c 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -440,14 +440,28 @@ static void __rpc_do_wake_up_task_on_wq(struct workqueue_struct *wq, /* * Wake up a queued task while the queue lock is being held */ -static void rpc_wake_up_task_on_wq_queue_locked(struct workqueue_struct *wq, - struct rpc_wait_queue *queue, struct rpc_task *task) +static struct rpc_task * 
+rpc_wake_up_task_on_wq_queue_action_locked(struct workqueue_struct *wq, + struct rpc_wait_queue *queue, struct rpc_task *task, + bool (*action)(struct rpc_task *, void *), void *data) { if (RPC_IS_QUEUED(task)) { smp_rmb(); - if (task->tk_waitqueue == queue) - __rpc_do_wake_up_task_on_wq(wq, queue, task); + if (task->tk_waitqueue == queue) { + if (action == NULL || action(task, data)) { + __rpc_do_wake_up_task_on_wq(wq, queue, task); + return task; + } + } } + return NULL; +} + +static void +rpc_wake_up_task_on_wq_queue_locked(struct workqueue_struct *wq, + struct rpc_wait_queue *queue, struct rpc_task *task) +{ + rpc_wake_up_task_on_wq_queue_action_locked(wq, queue, task, NULL, NULL); } /* @@ -481,6 +495,40 @@ void rpc_wake_up_queued_task(struct rpc_wait_queue *queue, struct rpc_task *task } EXPORT_SYMBOL_GPL(rpc_wake_up_queued_task); +static bool rpc_task_action_set_status(struct rpc_task *task, void *status) +{ + task->tk_status = *(int *)status; + return true; +} + +static void +rpc_wake_up_task_queue_set_status_locked(struct rpc_wait_queue *queue, + struct rpc_task *task, int status) +{ + rpc_wake_up_task_on_wq_queue_action_locked(rpciod_workqueue, queue, + task, rpc_task_action_set_status, &status); +} + +/** + * rpc_wake_up_queued_task_set_status - wake up a task and set task->tk_status + * @queue: pointer to rpc_wait_queue + * @task: pointer to rpc_task + * @status: integer error value + * + * If @task is queued on @queue, then it is woken up, and @task->tk_status is + * set to the value of @status. + */ +void +rpc_wake_up_queued_task_set_status(struct rpc_wait_queue *queue, + struct rpc_task *task, int status) +{ + if (!RPC_IS_QUEUED(task)) + return; + spin_lock_bh(&queue->lock); + rpc_wake_up_task_queue_set_status_locked(queue, task, status); + spin_unlock_bh(&queue->lock); +} + /* * Wake up the next task on a priority queue. 
*/ @@ -553,12 +601,9 @@ struct rpc_task *rpc_wake_up_first_on_wq(struct workqueue_struct *wq, queue, rpc_qname(queue)); spin_lock_bh(&queue->lock); task = __rpc_find_next_queued(queue); - if (task != NULL) { - if (func(task, data)) - rpc_wake_up_task_on_wq_queue_locked(wq, queue, task); - else - task = NULL; - } + if (task != NULL) + task = rpc_wake_up_task_on_wq_queue_action_locked(wq, queue, + task, func, data); spin_unlock_bh(&queue->lock); return task; -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 12/44] SUNRPC: Test whether the task is queued before grabbing the queue spinlocks
From: Trond Myklebust @ 2018-09-17 13:03 UTC
To: linux-nfs

When asked to wake up an RPC task, it makes sense to test whether or not
the task is still queued.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/sched.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index dec01bd1b71c..9a8ec012b449 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -479,6 +479,8 @@ void rpc_wake_up_queued_task_on_wq(struct workqueue_struct *wq,
 		struct rpc_wait_queue *queue,
 		struct rpc_task *task)
 {
+	if (!RPC_IS_QUEUED(task))
+		return;
 	spin_lock_bh(&queue->lock);
 	rpc_wake_up_task_on_wq_queue_locked(wq, queue, task);
 	spin_unlock_bh(&queue->lock);
@@ -489,6 +491,8 @@ void rpc_wake_up_queued_task_on_wq(struct workqueue_struct *wq,
  */
 void rpc_wake_up_queued_task(struct rpc_wait_queue *queue, struct rpc_task *task)
 {
+	if (!RPC_IS_QUEUED(task))
+		return;
 	spin_lock_bh(&queue->lock);
 	rpc_wake_up_task_queue_locked(queue, task);
 	spin_unlock_bh(&queue->lock);
-- 
2.17.1
* [PATCH v3 13/44] SUNRPC: Don't wake queued RPC calls multiple times in xprt_transmit
From: Trond Myklebust @ 2018-09-17 13:03 UTC
To: linux-nfs

Rather than waking up the entire queue of RPC messages a second time, just
wake up the task that was put to sleep.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/xprt.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 649a40cfae6d..3a3b3445a7c0 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1079,13 +1079,10 @@ void xprt_transmit(struct rpc_task *task)
 		spin_lock(&xprt->recv_lock);
 		if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
 			rpc_sleep_on(&xprt->pending, task, xprt_timer);
-			/*
-			 * Send an extra queue wakeup call if the
-			 * connection was dropped in case the call to
-			 * rpc_sleep_on() raced.
-			 */
+			/* Wake up immediately if the connection was dropped */
 			if (!xprt_connected(xprt))
-				xprt_wake_pending_tasks(xprt, -ENOTCONN);
+				rpc_wake_up_queued_task_set_status(&xprt->pending,
+						task, -ENOTCONN);
 		}
 		spin_unlock(&xprt->recv_lock);
-- 
2.17.1
* [PATCH v3 14/44] SUNRPC: Rename xprt->recv_lock to xprt->queue_lock

From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw)
To: linux-nfs

We will use the same lock to protect both the transmit and receive queues.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/xprt.h                |  2 +-
 net/sunrpc/svcsock.c                       |  6 ++---
 net/sunrpc/xprt.c                          | 24 ++++++++---------
 net/sunrpc/xprtrdma/rpc_rdma.c             | 10 ++++----
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |  4 +--
 net/sunrpc/xprtsock.c                      | 30 +++++++++++-----------
 6 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index bd743c51a865..c25d0a5fda69 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -235,7 +235,7 @@ struct rpc_xprt {
 	 */
 	spinlock_t		transport_lock;	/* lock transport info */
 	spinlock_t		reserve_lock;	/* lock slot table */
-	spinlock_t		recv_lock;	/* lock receive list */
+	spinlock_t		queue_lock;	/* send/receive queue lock */
 	u32			xid;		/* Next XID value to use */
 	struct rpc_task *	snd_task;	/* Task blocked in send */
 	struct svc_xprt		*bc_xprt;	/* NFSv4.1 backchannel */
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 5445145e639c..db8bb6b3a2b0 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1004,7 +1004,7 @@ static int receive_cb_reply(struct svc_sock *svsk, struct svc_rqst *rqstp)
 	if (!bc_xprt)
 		return -EAGAIN;
-	spin_lock(&bc_xprt->recv_lock);
+	spin_lock(&bc_xprt->queue_lock);
 	req = xprt_lookup_rqst(bc_xprt, xid);
 	if (!req)
 		goto unlock_notfound;
@@ -1022,7 +1022,7 @@ static int receive_cb_reply(struct svc_sock *svsk, struct svc_rqst *rqstp)
 	memcpy(dst->iov_base, src->iov_base, src->iov_len);
 	xprt_complete_rqst(req->rq_task, rqstp->rq_arg.len);
 	rqstp->rq_arg.len = 0;
-	spin_unlock(&bc_xprt->recv_lock);
+	spin_unlock(&bc_xprt->queue_lock);
 	return 0;
 unlock_notfound:
 	printk(KERN_NOTICE
@@ -1031,7 +1031,7 @@ static int receive_cb_reply(struct svc_sock *svsk, struct svc_rqst *rqstp)
 		__func__, ntohl(calldir), bc_xprt, ntohl(xid));
 unlock_eagain:
-	spin_unlock(&bc_xprt->recv_lock);
+	spin_unlock(&bc_xprt->queue_lock);
 	return -EAGAIN;
 }
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 3a3b3445a7c0..6e3d4b4ee79e 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -826,7 +826,7 @@ static void xprt_connect_status(struct rpc_task *task)
  * @xprt: transport on which the original request was transmitted
  * @xid: RPC XID of incoming reply
  *
- * Caller holds xprt->recv_lock.
+ * Caller holds xprt->queue_lock.
  */
 struct rpc_rqst *xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid)
 {
@@ -892,7 +892,7 @@ static void xprt_wait_on_pinned_rqst(struct rpc_rqst *req)
  * xprt_update_rtt - Update RPC RTT statistics
  * @task: RPC request that recently completed
  *
- * Caller holds xprt->recv_lock.
+ * Caller holds xprt->queue_lock.
  */
 void xprt_update_rtt(struct rpc_task *task)
 {
@@ -914,7 +914,7 @@ EXPORT_SYMBOL_GPL(xprt_update_rtt);
  * @task: RPC request that recently completed
  * @copied: actual number of bytes received from the transport
  *
- * Caller holds xprt->recv_lock.
+ * Caller holds xprt->queue_lock.
  */
 void xprt_complete_rqst(struct rpc_task *task, int copied)
 {
@@ -1034,10 +1034,10 @@ void xprt_transmit(struct rpc_task *task)
 			memcpy(&req->rq_private_buf, &req->rq_rcv_buf,
 					sizeof(req->rq_private_buf));
 			/* Add request to the receive list */
-			spin_lock(&xprt->recv_lock);
+			spin_lock(&xprt->queue_lock);
 			list_add_tail(&req->rq_list, &xprt->recv);
 			set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
-			spin_unlock(&xprt->recv_lock);
+			spin_unlock(&xprt->queue_lock);
 			xprt_reset_majortimeo(req);
 			/* Turn off autodisconnect */
 			del_singleshot_timer_sync(&xprt->timer);
@@ -1076,7 +1076,7 @@ void xprt_transmit(struct rpc_task *task)
 		 * The spinlock ensures atomicity between the test of
 		 * req->rq_reply_bytes_recvd, and the call to rpc_sleep_on().
 		 */
-		spin_lock(&xprt->recv_lock);
+		spin_lock(&xprt->queue_lock);
 		if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
 			rpc_sleep_on(&xprt->pending, task, xprt_timer);
 			/* Wake up immediately if the connection was dropped */
@@ -1084,7 +1084,7 @@ void xprt_transmit(struct rpc_task *task)
 				rpc_wake_up_queued_task_set_status(&xprt->pending,
 						task, -ENOTCONN);
 		}
-		spin_unlock(&xprt->recv_lock);
+		spin_unlock(&xprt->queue_lock);
 	}
 }
@@ -1379,18 +1379,18 @@ void xprt_release(struct rpc_task *task)
 		task->tk_ops->rpc_count_stats(task, task->tk_calldata);
 	else if (task->tk_client)
 		rpc_count_iostats(task, task->tk_client->cl_metrics);
-	spin_lock(&xprt->recv_lock);
+	spin_lock(&xprt->queue_lock);
 	if (!list_empty(&req->rq_list)) {
 		list_del_init(&req->rq_list);
 		if (xprt_is_pinned_rqst(req)) {
 			set_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate);
-			spin_unlock(&xprt->recv_lock);
+			spin_unlock(&xprt->queue_lock);
 			xprt_wait_on_pinned_rqst(req);
-			spin_lock(&xprt->recv_lock);
+			spin_lock(&xprt->queue_lock);
 			clear_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate);
 		}
 	}
-	spin_unlock(&xprt->recv_lock);
+	spin_unlock(&xprt->queue_lock);
 	spin_lock_bh(&xprt->transport_lock);
 	xprt->ops->release_xprt(xprt, task);
 	if (xprt->ops->release_request)
@@ -1420,7 +1420,7 @@ static void xprt_init(struct rpc_xprt *xprt, struct net *net)
 	spin_lock_init(&xprt->transport_lock);
 	spin_lock_init(&xprt->reserve_lock);
-	spin_lock_init(&xprt->recv_lock);
+	spin_lock_init(&xprt->queue_lock);
 	INIT_LIST_HEAD(&xprt->free);
 	INIT_LIST_HEAD(&xprt->recv);
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index c8ae983c6cc0..0020dc401215 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -1238,7 +1238,7 @@ void rpcrdma_complete_rqst(struct rpcrdma_rep *rep)
 		goto out_badheader;
 out:
-	spin_lock(&xprt->recv_lock);
+	spin_lock(&xprt->queue_lock);
 	cwnd = xprt->cwnd;
 	xprt->cwnd = r_xprt->rx_buf.rb_credits << RPC_CWNDSHIFT;
 	if (xprt->cwnd > cwnd)
@@ -1246,7 +1246,7 @@ void rpcrdma_complete_rqst(struct rpcrdma_rep *rep)
 	xprt_complete_rqst(rqst->rq_task, status);
 	xprt_unpin_rqst(rqst);
-	spin_unlock(&xprt->recv_lock);
+	spin_unlock(&xprt->queue_lock);
 	return;
 /* If the incoming reply terminated a pending RPC, the next
@@ -1345,7 +1345,7 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)
 	/* Match incoming rpcrdma_rep to an rpcrdma_req to
 	 * get context for handling any incoming chunks.
 	 */
-	spin_lock(&xprt->recv_lock);
+	spin_lock(&xprt->queue_lock);
 	rqst = xprt_lookup_rqst(xprt, rep->rr_xid);
 	if (!rqst)
 		goto out_norqst;
@@ -1357,7 +1357,7 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)
 		credits = buf->rb_max_requests;
 	buf->rb_credits = credits;
-	spin_unlock(&xprt->recv_lock);
+	spin_unlock(&xprt->queue_lock);
 	req = rpcr_to_rdmar(rqst);
 	req->rl_reply = rep;
@@ -1378,7 +1378,7 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)
  * is corrupt.
  */
 out_norqst:
-	spin_unlock(&xprt->recv_lock);
+	spin_unlock(&xprt->queue_lock);
 	trace_xprtrdma_reply_rqst(rep);
 	goto repost;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index a68180090554..09b12b7568fe 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -56,7 +56,7 @@ int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt, __be32 *rdma_resp,
 	if (src->iov_len < 24)
 		goto out_shortreply;
-	spin_lock(&xprt->recv_lock);
+	spin_lock(&xprt->queue_lock);
 	req = xprt_lookup_rqst(xprt, xid);
 	if (!req)
 		goto out_notfound;
@@ -86,7 +86,7 @@ int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt, __be32 *rdma_resp,
 	rcvbuf->len = 0;
 out_unlock:
-	spin_unlock(&xprt->recv_lock);
+	spin_unlock(&xprt->queue_lock);
 out:
 	return ret;
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 3fbccebd0b10..8d6404259ff9 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -966,12 +966,12 @@ static void xs_local_data_read_skb(struct rpc_xprt *xprt,
 		return;
 	/* Look up and lock the request corresponding to the given XID */
-	spin_lock(&xprt->recv_lock);
+	spin_lock(&xprt->queue_lock);
 	rovr = xprt_lookup_rqst(xprt, *xp);
 	if (!rovr)
 		goto out_unlock;
 	xprt_pin_rqst(rovr);
-	spin_unlock(&xprt->recv_lock);
+	spin_unlock(&xprt->queue_lock);
 	task = rovr->rq_task;
 	copied = rovr->rq_private_buf.buflen;
@@ -980,16 +980,16 @@ static void xs_local_data_read_skb(struct rpc_xprt *xprt,
 	if (xs_local_copy_to_xdr(&rovr->rq_private_buf, skb)) {
 		dprintk("RPC: sk_buff copy failed\n");
-		spin_lock(&xprt->recv_lock);
+		spin_lock(&xprt->queue_lock);
 		goto out_unpin;
 	}
-	spin_lock(&xprt->recv_lock);
+	spin_lock(&xprt->queue_lock);
 	xprt_complete_rqst(task, copied);
 out_unpin:
 	xprt_unpin_rqst(rovr);
 out_unlock:
-	spin_unlock(&xprt->recv_lock);
+	spin_unlock(&xprt->queue_lock);
 }
 static void xs_local_data_receive(struct sock_xprt *transport)
@@ -1058,13 +1058,13 @@ static void xs_udp_data_read_skb(struct rpc_xprt *xprt,
 		return;
 	/* Look up and lock the request corresponding to the given XID */
-	spin_lock(&xprt->recv_lock);
+	spin_lock(&xprt->queue_lock);
 	rovr = xprt_lookup_rqst(xprt, *xp);
 	if (!rovr)
 		goto out_unlock;
 	xprt_pin_rqst(rovr);
 	xprt_update_rtt(rovr->rq_task);
-	spin_unlock(&xprt->recv_lock);
+	spin_unlock(&xprt->queue_lock);
 	task = rovr->rq_task;
 	if ((copied = rovr->rq_private_buf.buflen) > repsize)
@@ -1072,7 +1072,7 @@ static void xs_udp_data_read_skb(struct rpc_xprt *xprt,
 	/* Suck it into the iovec, verify checksum if not done by hw. */
 	if (csum_partial_copy_to_xdr(&rovr->rq_private_buf, skb)) {
-		spin_lock(&xprt->recv_lock);
+		spin_lock(&xprt->queue_lock);
 		__UDPX_INC_STATS(sk, UDP_MIB_INERRORS);
 		goto out_unpin;
 	}
@@ -1081,13 +1081,13 @@ static void xs_udp_data_read_skb(struct rpc_xprt *xprt,
 	spin_lock_bh(&xprt->transport_lock);
 	xprt_adjust_cwnd(xprt, task, copied);
 	spin_unlock_bh(&xprt->transport_lock);
-	spin_lock(&xprt->recv_lock);
+	spin_lock(&xprt->queue_lock);
 	xprt_complete_rqst(task, copied);
 	__UDPX_INC_STATS(sk, UDP_MIB_INDATAGRAMS);
 out_unpin:
 	xprt_unpin_rqst(rovr);
 out_unlock:
-	spin_unlock(&xprt->recv_lock);
+	spin_unlock(&xprt->queue_lock);
 }
 static void xs_udp_data_receive(struct sock_xprt *transport)
@@ -1356,24 +1356,24 @@ static inline int xs_tcp_read_reply(struct rpc_xprt *xprt,
 	dprintk("RPC: read reply XID %08x\n", ntohl(transport->recv.xid));
 	/* Find and lock the request corresponding to this xid */
-	spin_lock(&xprt->recv_lock);
+	spin_lock(&xprt->queue_lock);
 	req = xprt_lookup_rqst(xprt, transport->recv.xid);
 	if (!req) {
 		dprintk("RPC: XID %08x request not found!\n",
 				ntohl(transport->recv.xid));
-		spin_unlock(&xprt->recv_lock);
+		spin_unlock(&xprt->queue_lock);
 		return -1;
 	}
 	xprt_pin_rqst(req);
-	spin_unlock(&xprt->recv_lock);
+	spin_unlock(&xprt->queue_lock);
 	xs_tcp_read_common(xprt, desc, req);
-	spin_lock(&xprt->recv_lock);
+	spin_lock(&xprt->queue_lock);
 	if (!(transport->recv.flags & TCP_RCV_COPY_DATA))
 		xprt_complete_rqst(req->rq_task, transport->recv.copied);
 	xprt_unpin_rqst(req);
-	spin_unlock(&xprt->recv_lock);
+	spin_unlock(&xprt->queue_lock);
 	return 0;
 }
--
2.17.1
* [PATCH v3 15/44] SUNRPC: Refactor xprt_transmit() to remove the reply queue code

From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw)
To: linux-nfs

Separate out the action of adding a request to the reply queue so that the
backchannel code can simply skip calling it altogether.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/xprt.h       |   1 +
 net/sunrpc/backchannel_rqst.c     |   1 -
 net/sunrpc/clnt.c                 |   5 ++
 net/sunrpc/xprt.c                 | 126 +++++++++++++++++++-----------
 net/sunrpc/xprtrdma/backchannel.c |   1 -
 5 files changed, 88 insertions(+), 46 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index c25d0a5fda69..0250294c904a 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -334,6 +334,7 @@ void xprt_free_slot(struct rpc_xprt *xprt, struct rpc_rqst *req);
 void			xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task);
 bool			xprt_prepare_transmit(struct rpc_task *task);
+void			xprt_request_enqueue_receive(struct rpc_task *task);
 void			xprt_transmit(struct rpc_task *task);
 void			xprt_end_transmit(struct rpc_task *task);
 int			xprt_adjust_timeout(struct rpc_rqst *req);
diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
index 3c15a99b9700..fa5ba6ed3197 100644
--- a/net/sunrpc/backchannel_rqst.c
+++ b/net/sunrpc/backchannel_rqst.c
@@ -91,7 +91,6 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt, gfp_t gfp_flags)
 		return NULL;
 	req->rq_xprt = xprt;
-	INIT_LIST_HEAD(&req->rq_list);
 	INIT_LIST_HEAD(&req->rq_bc_list);
 	/* Preallocate one XDR receive buffer */
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index a858366cd15d..414966273a3f 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1962,6 +1962,11 @@ call_transmit(struct rpc_task *task)
 			return;
 		}
 	}
+
+	/* Add task to reply queue before transmission to avoid races */
+	if (rpc_reply_expected(task))
+		xprt_request_enqueue_receive(task);
+
 	if (!xprt_prepare_transmit(task))
 		return;
 	task->tk_action = call_transmit_status;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 6e3d4b4ee79e..d8f870b5dd46 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -888,6 +888,61 @@ static void xprt_wait_on_pinned_rqst(struct rpc_rqst *req)
 	wait_var_event(&req->rq_pin, !xprt_is_pinned_rqst(req));
 }
+static bool
+xprt_request_data_received(struct rpc_task *task)
+{
+	return !test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) &&
+		READ_ONCE(task->tk_rqstp->rq_reply_bytes_recvd) != 0;
+}
+
+static bool
+xprt_request_need_enqueue_receive(struct rpc_task *task, struct rpc_rqst *req)
+{
+	return !xprt_request_data_received(task);
+}
+
+/**
+ * xprt_request_enqueue_receive - Add an request to the receive queue
+ * @task: RPC task
+ *
+ */
+void
+xprt_request_enqueue_receive(struct rpc_task *task)
+{
+	struct rpc_rqst *req = task->tk_rqstp;
+	struct rpc_xprt *xprt = req->rq_xprt;
+
+	if (!xprt_request_need_enqueue_receive(task, req))
+		return;
+	spin_lock(&xprt->queue_lock);
+
+	/* Update the softirq receive buffer */
+	memcpy(&req->rq_private_buf, &req->rq_rcv_buf,
+			sizeof(req->rq_private_buf));
+
+	/* Add request to the receive list */
+	list_add_tail(&req->rq_list, &xprt->recv);
+	set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
+	spin_unlock(&xprt->queue_lock);
+
+	xprt_reset_majortimeo(req);
+	/* Turn off autodisconnect */
+	del_singleshot_timer_sync(&xprt->timer);
+}
+
+/**
+ * xprt_request_dequeue_receive_locked - Remove a request from the receive queue
+ * @task: RPC task
+ *
+ * Caller must hold xprt->queue_lock.
+ */
+static void
+xprt_request_dequeue_receive_locked(struct rpc_task *task)
+{
+	if (test_and_clear_bit(RPC_TASK_NEED_RECV, &task->tk_runstate))
+		list_del(&task->tk_rqstp->rq_list);
+}
+
 /**
  * xprt_update_rtt - Update RPC RTT statistics
  * @task: RPC request that recently completed
@@ -927,24 +982,16 @@ void xprt_complete_rqst(struct rpc_task *task, int copied)
 	xprt->stat.recvs++;
-	list_del_init(&req->rq_list);
 	req->rq_private_buf.len = copied;
 	/* Ensure all writes are done before we update */
 	/* req->rq_reply_bytes_recvd */
 	smp_wmb();
 	req->rq_reply_bytes_recvd = copied;
-	clear_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
+	xprt_request_dequeue_receive_locked(task);
 	rpc_wake_up_queued_task(&xprt->pending, task);
 }
 EXPORT_SYMBOL_GPL(xprt_complete_rqst);
-static bool
-xprt_request_data_received(struct rpc_task *task)
-{
-	return !test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) &&
-		task->tk_rqstp->rq_reply_bytes_recvd != 0;
-}
-
 static void xprt_timer(struct rpc_task *task)
 {
 	struct rpc_rqst *req = task->tk_rqstp;
@@ -1018,32 +1065,15 @@ void xprt_transmit(struct rpc_task *task)
 	dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);
-	if (!req->rq_reply_bytes_recvd) {
-
+	if (!req->rq_bytes_sent) {
+		if (xprt_request_data_received(task))
+			return;
 		/* Verify that our message lies in the RPCSEC_GSS window */
-		if (!req->rq_bytes_sent && rpcauth_xmit_need_reencode(task)) {
+		if (rpcauth_xmit_need_reencode(task)) {
 			task->tk_status = -EBADMSG;
 			return;
 		}
-
-		if (list_empty(&req->rq_list) && rpc_reply_expected(task)) {
-			/*
-			 * Add to the list only if we're expecting a reply
-			 */
-			/* Update the softirq receive buffer */
-			memcpy(&req->rq_private_buf, &req->rq_rcv_buf,
-					sizeof(req->rq_private_buf));
-			/* Add request to the receive list */
-			spin_lock(&xprt->queue_lock);
-			list_add_tail(&req->rq_list, &xprt->recv);
-			set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
-			spin_unlock(&xprt->queue_lock);
-			xprt_reset_majortimeo(req);
-			/* Turn off autodisconnect */
-			del_singleshot_timer_sync(&xprt->timer);
-		}
-	} else if (xprt_request_data_received(task) && !req->rq_bytes_sent)
-		return;
+	}
 	connect_cookie = xprt->connect_cookie;
 	status = xprt->ops->send_request(task);
@@ -1285,7 +1315,6 @@ xprt_request_init(struct rpc_task *task)
 	struct rpc_xprt *xprt = task->tk_xprt;
 	struct rpc_rqst	*req = task->tk_rqstp;
-	INIT_LIST_HEAD(&req->rq_list);
 	req->rq_timeout = task->tk_client->cl_timeout->to_initval;
 	req->rq_task	= task;
 	req->rq_xprt    = xprt;
@@ -1355,6 +1384,26 @@ void xprt_retry_reserve(struct rpc_task *task)
 	xprt_do_reserve(xprt, task);
 }
+static void
+xprt_request_dequeue_all(struct rpc_task *task, struct rpc_rqst *req)
+{
+	struct rpc_xprt *xprt = req->rq_xprt;
+
+	if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) ||
+	    xprt_is_pinned_rqst(req)) {
+		spin_lock(&xprt->queue_lock);
+		xprt_request_dequeue_receive_locked(task);
+		while (xprt_is_pinned_rqst(req)) {
+			set_bit(RPC_TASK_MSG_PIN_WAIT, &task->tk_runstate);
+			spin_unlock(&xprt->queue_lock);
+			xprt_wait_on_pinned_rqst(req);
+			spin_lock(&xprt->queue_lock);
+			clear_bit(RPC_TASK_MSG_PIN_WAIT, &task->tk_runstate);
+		}
+		spin_unlock(&xprt->queue_lock);
+	}
+}
+
 /**
  * xprt_release - release an RPC request slot
  * @task: task which is finished with the slot
@@ -1379,18 +1428,7 @@ void xprt_release(struct rpc_task *task)
 		task->tk_ops->rpc_count_stats(task, task->tk_calldata);
 	else if (task->tk_client)
 		rpc_count_iostats(task, task->tk_client->cl_metrics);
-	spin_lock(&xprt->queue_lock);
-	if (!list_empty(&req->rq_list)) {
-		list_del_init(&req->rq_list);
-		if (xprt_is_pinned_rqst(req)) {
-			set_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate);
-			spin_unlock(&xprt->queue_lock);
-			xprt_wait_on_pinned_rqst(req);
-			spin_lock(&xprt->queue_lock);
-			clear_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate);
-		}
-	}
-	spin_unlock(&xprt->queue_lock);
+	xprt_request_dequeue_all(task, req);
 	spin_lock_bh(&xprt->transport_lock);
 	xprt->ops->release_xprt(xprt, task);
 	if (xprt->ops->release_request)
diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index 90adeff4c06b..ed58761e6b23 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -51,7 +51,6 @@ static int rpcrdma_bc_setup_reqs(struct rpcrdma_xprt *r_xprt,
 		rqst = &req->rl_slot;
 		rqst->rq_xprt = xprt;
-		INIT_LIST_HEAD(&rqst->rq_list);
 		INIT_LIST_HEAD(&rqst->rq_bc_list);
 		__set_bit(RPC_BC_PA_IN_USE, &rqst->rq_bc_pa_state);
 		spin_lock_bh(&xprt->bc_pa_lock);
--
2.17.1
* [PATCH v3 16/44] SUNRPC: Refactor xprt_transmit() to remove wait for reply code

From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw)
To: linux-nfs

Allow the caller in clnt.c to call into the code to wait for a reply
after calling xprt_transmit(). Again, the reason is that the backchannel
code does not need this functionality.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/xprt.h |  1 +
 net/sunrpc/clnt.c           | 10 +----
 net/sunrpc/xprt.c           | 74 ++++++++++++++++++++++++++-----------
 3 files changed, 54 insertions(+), 31 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 0250294c904a..4fa2af087cff 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -335,6 +335,7 @@ void xprt_free_slot(struct rpc_xprt *xprt, struct rpc_rqst *req);
 void			xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task);
 bool			xprt_prepare_transmit(struct rpc_task *task);
 void			xprt_request_enqueue_receive(struct rpc_task *task);
+void			xprt_request_wait_receive(struct rpc_task *task);
 void			xprt_transmit(struct rpc_task *task);
 void			xprt_end_transmit(struct rpc_task *task);
 int			xprt_adjust_timeout(struct rpc_rqst *req);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 414966273a3f..775d6e80b6e8 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1975,15 +1975,6 @@ call_transmit(struct rpc_task *task)
 		return;
 	if (is_retrans)
 		task->tk_client->cl_stats->rpcretrans++;
-	/*
-	 * On success, ensure that we call xprt_end_transmit() before sleeping
-	 * in order to allow access to the socket to other RPC requests.
-	 */
-	call_transmit_status(task);
-	if (rpc_reply_expected(task))
-		return;
-	task->tk_action = rpc_exit_task;
-	rpc_wake_up_queued_task(&task->tk_rqstp->rq_xprt->pending, task);
 }
 /*
@@ -2000,6 +1991,7 @@ call_transmit_status(struct rpc_task *task)
 	 */
 	if (task->tk_status == 0) {
 		xprt_end_transmit(task);
+		xprt_request_wait_receive(task);
 		return;
 	}
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index d8f870b5dd46..fe857ab18ee2 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -654,6 +654,22 @@ void xprt_force_disconnect(struct rpc_xprt *xprt)
 }
 EXPORT_SYMBOL_GPL(xprt_force_disconnect);
+static unsigned int
+xprt_connect_cookie(struct rpc_xprt *xprt)
+{
+	return READ_ONCE(xprt->connect_cookie);
+}
+
+static bool
+xprt_request_retransmit_after_disconnect(struct rpc_task *task)
+{
+	struct rpc_rqst *req = task->tk_rqstp;
+	struct rpc_xprt *xprt = req->rq_xprt;
+
+	return req->rq_connect_cookie != xprt_connect_cookie(xprt) ||
+		!xprt_connected(xprt);
+}
+
 /**
  * xprt_conditional_disconnect - force a transport to disconnect
  * @xprt: transport to disconnect
@@ -1008,6 +1024,39 @@ static void xprt_timer(struct rpc_task *task)
 		task->tk_status = 0;
 }
+/**
+ * xprt_request_wait_receive - wait for the reply to an RPC request
+ * @task: RPC task about to send a request
+ *
+ */
+void xprt_request_wait_receive(struct rpc_task *task)
+{
+	struct rpc_rqst *req = task->tk_rqstp;
+	struct rpc_xprt *xprt = req->rq_xprt;
+
+	if (!test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate))
+		return;
+	/*
+	 * Sleep on the pending queue if we're expecting a reply.
+	 * The spinlock ensures atomicity between the test of
+	 * req->rq_reply_bytes_recvd, and the call to rpc_sleep_on().
+	 */
+	spin_lock(&xprt->queue_lock);
+	if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
+		xprt->ops->set_retrans_timeout(task);
+		rpc_sleep_on(&xprt->pending, task, xprt_timer);
+		/*
+		 * Send an extra queue wakeup call if the
+		 * connection was dropped in case the call to
+		 * rpc_sleep_on() raced.
+		 */
+		if (xprt_request_retransmit_after_disconnect(task))
+			rpc_wake_up_queued_task_set_status(&xprt->pending,
+					task, -ENOTCONN);
+	}
+	spin_unlock(&xprt->queue_lock);
+}
+
 /**
  * xprt_prepare_transmit - reserve the transport before sending a request
  * @task: RPC task about to send a request
@@ -1027,9 +1076,8 @@ bool xprt_prepare_transmit(struct rpc_task *task)
 			task->tk_status = req->rq_reply_bytes_recvd;
 			goto out_unlock;
 		}
-		if ((task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT)
-		    && xprt_connected(xprt)
-		    && req->rq_connect_cookie == xprt->connect_cookie) {
+		if ((task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT) &&
+		    !xprt_request_retransmit_after_disconnect(task)) {
 			xprt->ops->set_retrans_timeout(task);
 			rpc_sleep_on(&xprt->pending, task, xprt_timer);
 			goto out_unlock;
@@ -1090,8 +1138,6 @@ void xprt_transmit(struct rpc_task *task)
 	task->tk_flags |= RPC_TASK_SENT;
 	spin_lock_bh(&xprt->transport_lock);
-	xprt->ops->set_retrans_timeout(task);
-
 	xprt->stat.sends++;
 	xprt->stat.req_u += xprt->stat.sends - xprt->stat.recvs;
 	xprt->stat.bklog_u += xprt->backlog.qlen;
@@ -1100,22 +1146,6 @@ void xprt_transmit(struct rpc_task *task)
 	spin_unlock_bh(&xprt->transport_lock);
 	req->rq_connect_cookie = connect_cookie;
-	if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
-		/*
-		 * Sleep on the pending queue if we're expecting a reply.
-		 * The spinlock ensures atomicity between the test of
-		 * req->rq_reply_bytes_recvd, and the call to rpc_sleep_on().
-		 */
-		spin_lock(&xprt->queue_lock);
-		if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
-			rpc_sleep_on(&xprt->pending, task, xprt_timer);
-			/* Wake up immediately if the connection was dropped */
-			if (!xprt_connected(xprt))
-				rpc_wake_up_queued_task_set_status(&xprt->pending,
-						task, -ENOTCONN);
-		}
-		spin_unlock(&xprt->queue_lock);
-	}
 }
@@ -1320,7 +1350,7 @@ xprt_request_init(struct rpc_task *task)
 	req->rq_xprt    = xprt;
 	req->rq_buffer  = NULL;
 	req->rq_xid	= xprt_alloc_xid(xprt);
-	req->rq_connect_cookie = xprt->connect_cookie - 1;
+	req->rq_connect_cookie = xprt_connect_cookie(xprt) - 1;
 	req->rq_bytes_sent = 0;
 	req->rq_snd_buf.len = 0;
 	req->rq_snd_buf.buflen = 0;
--
2.17.1
* [PATCH v3 17/44] SUNRPC: Minor cleanup for call_transmit()

From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw)
To: linux-nfs

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/clnt.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 775d6e80b6e8..be0f06a8156b 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1946,9 +1946,7 @@ call_transmit(struct rpc_task *task)
 	dprint_status(task);
-	task->tk_action = call_status;
-	if (task->tk_status < 0)
-		return;
+	task->tk_action = call_transmit_status;
 	/* Encode here so that rpcsec_gss can use correct sequence number. */
 	if (rpc_task_need_encode(task)) {
 		rpc_xdr_encode(task);
@@ -1969,7 +1967,6 @@ call_transmit(struct rpc_task *task)
 	if (!xprt_prepare_transmit(task))
 		return;
-	task->tk_action = call_transmit_status;
 	xprt_transmit(task);
 	if (task->tk_status < 0)
 		return;
@@ -1996,19 +1993,29 @@ call_transmit_status(struct rpc_task *task)
 	}
 	switch (task->tk_status) {
-	case -EAGAIN:
-	case -ENOBUFS:
-		break;
 	default:
 		dprint_status(task);
 		xprt_end_transmit(task);
 		break;
+	case -EBADMSG:
+		clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
+		task->tk_action = call_transmit;
+		task->tk_status = 0;
+		xprt_end_transmit(task);
+		break;
 	/*
 	 * Special cases: if we've been waiting on the
 	 * socket's write_space() callback, or if the
 	 * socket just returned a connection error,
 	 * then hold onto the transport lock.
 	 */
+	case -ENOBUFS:
+		rpc_delay(task, HZ>>2);
+		/* fall through */
+	case -EAGAIN:
+		task->tk_action = call_transmit;
+		task->tk_status = 0;
+		break;
 	case -ECONNREFUSED:
 	case -EHOSTDOWN:
 	case -ENETDOWN:
@@ -2163,22 +2170,13 @@ call_status(struct rpc_task *task)
 		/* fall through */
 	case -EPIPE:
 	case -ENOTCONN:
-		task->tk_action = call_bind;
-		break;
-	case -ENOBUFS:
-		rpc_delay(task, HZ>>2);
-		/* fall through */
 	case -EAGAIN:
-		task->tk_action = call_transmit;
+		task->tk_action = call_bind;
 		break;
 	case -EIO:
 		/* shutdown or soft timeout */
 		rpc_exit(task, status);
 		break;
-	case -EBADMSG:
-		clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
-		task->tk_action = call_transmit;
-		break;
 	default:
 		if (clnt->cl_chatty)
 			printk("%s: RPC call returned error %d\n",
--
2.17.1
* [PATCH v3 18/44] SUNRPC: Distinguish between the slot allocation list and receive queue

From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw)
To: linux-nfs

When storing a struct rpc_rqst on the slot allocation list, we currently
use the same field 'rq_list' as we use to store the request on the
receive queue. Since the structure is never on both lists at the same
time, this is OK.

However, for clarity, let's make that a union with different names for
the different lists so that we can more easily distinguish between the
two states.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/xprt.h |  9 +++++++--
 net/sunrpc/xprt.c           | 12 ++++++------
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 4fa2af087cff..9cec2d0811f2 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -82,7 +82,11 @@ struct rpc_rqst {
 	struct page		**rq_enc_pages;	/* scratch pages for use by
 						   gss privacy code */
 	void (*rq_release_snd_buf)(struct rpc_rqst *); /* release rq_enc_pages */
-	struct list_head	rq_list;
+
+	union {
+		struct list_head	rq_list;	/* Slot allocation list */
+		struct list_head	rq_recv;	/* Receive queue */
+	};
 	void			*rq_buffer;	/* Call XDR encode buffer */
 	size_t			rq_callsize;
@@ -249,7 +253,8 @@ struct rpc_xprt {
 	struct list_head	bc_pa_list;	/* List of preallocated
 						 * backchannel rpc_rqst's */
 #endif /* CONFIG_SUNRPC_BACKCHANNEL */
-	struct list_head	recv;
+
+	struct list_head	recv_queue;	/* Receive queue */
 	struct {
 		unsigned long		bind_count,	/* total number of binds */
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index fe857ab18ee2..b242a1c78f8a 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -708,7 +708,7 @@ static void
 xprt_schedule_autodisconnect(struct rpc_xprt *xprt)
 	__must_hold(&xprt->transport_lock)
 {
-	if (list_empty(&xprt->recv) && xprt_has_timer(xprt))
+	if (list_empty(&xprt->recv_queue) && xprt_has_timer(xprt))
 		mod_timer(&xprt->timer,
 			xprt->last_used + xprt->idle_timeout);
 }
@@ -718,7 +718,7 @@ xprt_init_autodisconnect(struct timer_list *t)
 	struct rpc_xprt *xprt = from_timer(xprt, t, timer);
 	spin_lock(&xprt->transport_lock);
-	if (!list_empty(&xprt->recv))
+	if (!list_empty(&xprt->recv_queue))
 		goto out_abort;
 	/* Reset xprt->last_used to avoid connect/autodisconnect cycling */
 	xprt->last_used = jiffies;
@@ -848,7 +848,7 @@ struct rpc_rqst *xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid)
 {
 	struct rpc_rqst *entry;
-	list_for_each_entry(entry, &xprt->recv, rq_list)
+	list_for_each_entry(entry, &xprt->recv_queue, rq_recv)
 		if (entry->rq_xid == xid) {
 			trace_xprt_lookup_rqst(xprt, xid, 0);
 			entry->rq_rtt = ktime_sub(ktime_get(), entry->rq_xtime);
@@ -937,7 +937,7 @@ xprt_request_enqueue_receive(struct rpc_task *task)
 			sizeof(req->rq_private_buf));
 	/* Add request to the receive list */
-	list_add_tail(&req->rq_list, &xprt->recv);
+	list_add_tail(&req->rq_recv, &xprt->recv_queue);
 	set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
 	spin_unlock(&xprt->queue_lock);
@@ -956,7 +956,7 @@ static void
 xprt_request_dequeue_receive_locked(struct rpc_task *task)
 {
 	if (test_and_clear_bit(RPC_TASK_NEED_RECV, &task->tk_runstate))
-		list_del(&task->tk_rqstp->rq_list);
+		list_del(&task->tk_rqstp->rq_recv);
 }
 /**
@@ -1491,7 +1491,7 @@ static void xprt_init(struct rpc_xprt *xprt, struct net *net)
 	spin_lock_init(&xprt->queue_lock);
 	INIT_LIST_HEAD(&xprt->free);
-	INIT_LIST_HEAD(&xprt->recv);
+	INIT_LIST_HEAD(&xprt->recv_queue);
 #if defined(CONFIG_SUNRPC_BACKCHANNEL)
 	spin_lock_init(&xprt->bc_pa_lock);
 	INIT_LIST_HEAD(&xprt->bc_pa_list);
--
2.17.1
* [PATCH v3 19/44] SUNRPC: Add a transmission queue for RPC requests

From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw)
To: linux-nfs

Add the queue that will enforce the ordering of RPC task transmission.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/xprt.h |  6 +++
 net/sunrpc/clnt.c           |  6 +--
 net/sunrpc/xprt.c           | 84 +++++++++++++++++++++++++++++++++----
 3 files changed, 83 insertions(+), 13 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 9cec2d0811f2..81a6c2c8dfc7 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -88,6 +88,8 @@ struct rpc_rqst {
 		struct list_head	rq_recv;	/* Receive queue */
 	};
+	struct list_head	rq_xmit;	/* Send queue */
+
 	void			*rq_buffer;	/* Call XDR encode buffer */
 	size_t			rq_callsize;
 	void			*rq_rbuffer;	/* Reply XDR decode buffer */
@@ -242,6 +244,9 @@ struct rpc_xprt {
 	spinlock_t		queue_lock;	/* send/receive queue lock */
 	u32			xid;		/* Next XID value to use */
 	struct rpc_task *	snd_task;	/* Task blocked in send */
+
+	struct list_head	xmit_queue;	/* Send queue */
+
 	struct svc_xprt		*bc_xprt;	/* NFSv4.1 backchannel */
 #if defined(CONFIG_SUNRPC_BACKCHANNEL)
 	struct svc_serv		*bc_serv;	/* The RPC service which will */
@@ -339,6 +344,7 @@ void xprt_free_slot(struct rpc_xprt *xprt, struct rpc_rqst *req);
 void			xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task);
 bool			xprt_prepare_transmit(struct rpc_task *task);
+void			xprt_request_enqueue_transmit(struct rpc_task *task);
 void			xprt_request_enqueue_receive(struct rpc_task *task);
 void			xprt_request_wait_receive(struct rpc_task *task);
 void			xprt_transmit(struct rpc_task *task);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index be0f06a8156b..c1a19a3e1356 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1156,11 +1156,11 @@ struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req)
 	 */
 	xbufp->len = xbufp->head[0].iov_len + xbufp->page_len +
 			xbufp->tail[0].iov_len;
-	set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
 	task->tk_action = call_bc_transmit;
 	atomic_inc(&task->tk_count);
 	WARN_ON_ONCE(atomic_read(&task->tk_count) != 2);
+	xprt_request_enqueue_transmit(task);
 	rpc_execute(task);
 	dprintk("RPC: rpc_run_bc_task: task= %p\n", task);
@@ -1759,8 +1759,6 @@ rpc_xdr_encode(struct rpc_task *task)
 	task->tk_status = rpcauth_wrap_req(task, encode, req, p,
 			task->tk_msg.rpc_argp);
-	if (task->tk_status == 0)
-		set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
 }
 /*
@@ -1964,6 +1962,7 @@ call_transmit(struct rpc_task *task)
 	/* Add task to reply queue before transmission to avoid races */
 	if (rpc_reply_expected(task))
 		xprt_request_enqueue_receive(task);
+	xprt_request_enqueue_transmit(task);
 	if (!xprt_prepare_transmit(task))
 		return;
@@ -1998,7 +1997,6 @@ call_transmit_status(struct rpc_task *task)
 		xprt_end_transmit(task);
 		break;
 	case -EBADMSG:
-		clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
 		task->tk_action = call_transmit;
 		task->tk_status = 0;
 		xprt_end_transmit(task);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index b242a1c78f8a..39a6f6e8ae01 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1057,6 +1057,72 @@ void xprt_request_wait_receive(struct rpc_task *task)
 	spin_unlock(&xprt->queue_lock);
 }
+static bool
+xprt_request_need_transmit(struct rpc_task *task)
+{
+	return !(task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT) ||
+		xprt_request_retransmit_after_disconnect(task);
+}
+
+static bool
+xprt_request_need_enqueue_transmit(struct rpc_task *task, struct rpc_rqst *req)
+{
+	return xprt_request_need_transmit(task) &&
+		!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
+}
+
+/**
+ * xprt_request_enqueue_transmit - queue a task for transmission
+ * @task: pointer to rpc_task
+ *
+ * Add a task to the transmission queue.
+ */
+void
+xprt_request_enqueue_transmit(struct rpc_task *task)
+{
+	struct rpc_rqst *req = task->tk_rqstp;
+	struct rpc_xprt *xprt = req->rq_xprt;
+
+	if (xprt_request_need_enqueue_transmit(task, req)) {
+		spin_lock(&xprt->queue_lock);
+		list_add_tail(&req->rq_xmit, &xprt->xmit_queue);
+		set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
+		spin_unlock(&xprt->queue_lock);
+	}
+}
+
+/**
+ * xprt_request_dequeue_transmit_locked - remove a task from the transmission queue
+ * @task: pointer to rpc_task
+ *
+ * Remove a task from the transmission queue
+ * Caller must hold xprt->queue_lock
+ */
+static void
+xprt_request_dequeue_transmit_locked(struct rpc_task *task)
+{
+	xprt_task_clear_bytes_sent(task);
+	if (test_and_clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
+		list_del(&task->tk_rqstp->rq_xmit);
+}
+
+/**
+ * xprt_request_dequeue_transmit - remove a task from the transmission queue
+ * @task: pointer to rpc_task
+ *
+ * Remove a task from the transmission queue
+ */
+static void
+xprt_request_dequeue_transmit(struct rpc_task *task)
+{
+	struct rpc_rqst *req = task->tk_rqstp;
+	struct rpc_xprt *xprt = req->rq_xprt;
+
+	spin_lock(&xprt->queue_lock);
+	xprt_request_dequeue_transmit_locked(task);
+	spin_unlock(&xprt->queue_lock);
+}
+
 /**
  * xprt_prepare_transmit - reserve the transport before sending a request
  * @task: RPC task about to send a request
@@ -1076,12 +1142,8 @@ bool xprt_prepare_transmit(struct rpc_task *task)
 			task->tk_status = req->rq_reply_bytes_recvd;
 			goto out_unlock;
 		}
-		if ((task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT) &&
-		    !xprt_request_retransmit_after_disconnect(task)) {
-			xprt->ops->set_retrans_timeout(task);
-			rpc_sleep_on(&xprt->pending, task, xprt_timer);
+		if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
 			goto out_unlock;
-		}
 	}
 	if (!xprt->ops->reserve_xprt(xprt, task)) {
 		task->tk_status = -EAGAIN;
@@ -1115,11 +1177,11 @@ void
xprt_transmit(struct rpc_task *task) if (!req->rq_bytes_sent) { if (xprt_request_data_received(task)) - return; + goto out_dequeue; /* Verify that our message lies in the RPCSEC_GSS window */ if (rpcauth_xmit_need_reencode(task)) { task->tk_status = -EBADMSG; - return; + goto out_dequeue; } } @@ -1134,7 +1196,6 @@ void xprt_transmit(struct rpc_task *task) xprt_inject_disconnect(xprt); dprintk("RPC: %5u xmit complete\n", task->tk_pid); - clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate); task->tk_flags |= RPC_TASK_SENT; spin_lock_bh(&xprt->transport_lock); @@ -1146,6 +1207,8 @@ void xprt_transmit(struct rpc_task *task) spin_unlock_bh(&xprt->transport_lock); req->rq_connect_cookie = connect_cookie; +out_dequeue: + xprt_request_dequeue_transmit(task); } static void xprt_add_backlog(struct rpc_xprt *xprt, struct rpc_task *task) @@ -1419,9 +1482,11 @@ xprt_request_dequeue_all(struct rpc_task *task, struct rpc_rqst *req) { struct rpc_xprt *xprt = req->rq_xprt; - if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) || + if (test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate) || + test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) || xprt_is_pinned_rqst(req)) { spin_lock(&xprt->queue_lock); + xprt_request_dequeue_transmit_locked(task); xprt_request_dequeue_receive_locked(task); while (xprt_is_pinned_rqst(req)) { set_bit(RPC_TASK_MSG_PIN_WAIT, &task->tk_runstate); @@ -1492,6 +1557,7 @@ static void xprt_init(struct rpc_xprt *xprt, struct net *net) INIT_LIST_HEAD(&xprt->free); INIT_LIST_HEAD(&xprt->recv_queue); + INIT_LIST_HEAD(&xprt->xmit_queue); #if defined(CONFIG_SUNRPC_BACKCHANNEL) spin_lock_init(&xprt->bc_pa_lock); INIT_LIST_HEAD(&xprt->bc_pa_list); -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
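A key invariant in the patch above is that membership on `xmit_queue` is tied to the `RPC_TASK_NEED_XMIT` runstate bit: enqueue is a no-op if the bit is already set, and dequeue removes the entry only when `test_and_clear_bit()` succeeds, making both operations idempotent. A single-threaded sketch of that flag-guarded discipline (the kernel version holds `xprt->queue_lock` and uses atomic bitops; this stand-in uses a plain `bool`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for an rpc_task with the RPC_TASK_NEED_XMIT runstate bit. */
struct task {
    bool need_xmit;        /* kernel: set_bit/test_and_clear_bit on tk_runstate */
    struct task *next;
};

struct xmit_queue {
    struct task *head;
    int len;
};

/* Enqueue only if not already queued (kernel: xprt_request_enqueue_transmit). */
static void enqueue_transmit(struct xmit_queue *q, struct task *t)
{
    if (t->need_xmit)
        return;             /* already queued: keep the operation idempotent */
    t->next = q->head;
    q->head = t;
    q->len++;
    t->need_xmit = true;
}

/* Dequeue only if the flag was set (kernel: test_and_clear_bit + list_del). */
static void dequeue_transmit(struct xmit_queue *q, struct task *t)
{
    if (!t->need_xmit)
        return;             /* not on the queue: nothing to unlink */
    t->need_xmit = false;
    for (struct task **p = &q->head; *p; p = &(*p)->next) {
        if (*p == t) {
            *p = t->next;
            q->len--;
            break;
        }
    }
}
```

Because both paths check the flag first, `xprt_request_dequeue_all()` can safely call the dequeue helper even for tasks that were never queued.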
* [PATCH v3 20/44] SUNRPC: Refactor RPC call encoding 2018-09-17 13:03 ` [PATCH v3 19/44] SUNRPC: Add a transmission queue for RPC requests Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 21/44] SUNRPC: Fix up the back channel transmit Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs Move the call encoding so that it occurs before the transport connection etc. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/xprt.h | 1 + net/sunrpc/clnt.c | 81 ++++++++++++++++++++++--------------- net/sunrpc/xprt.c | 22 +++++----- 3 files changed, 63 insertions(+), 41 deletions(-) diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index 81a6c2c8dfc7..b8a7de161f67 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -347,6 +347,7 @@ bool xprt_prepare_transmit(struct rpc_task *task); void xprt_request_enqueue_transmit(struct rpc_task *task); void xprt_request_enqueue_receive(struct rpc_task *task); void xprt_request_wait_receive(struct rpc_task *task); +bool xprt_request_need_retransmit(struct rpc_task *task); void xprt_transmit(struct rpc_task *task); void xprt_end_transmit(struct rpc_task *task); int xprt_adjust_timeout(struct rpc_rqst *req); diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index c1a19a3e1356..64159716be30 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -61,6 +61,7 @@ static void call_start(struct rpc_task *task); static void call_reserve(struct rpc_task *task); static void call_reserveresult(struct rpc_task *task); static void call_allocate(struct rpc_task *task); +static void call_encode(struct rpc_task *task); static void call_decode(struct rpc_task *task); static void call_bind(struct rpc_task *task); static void call_bind_status(struct rpc_task *task); @@ -1140,7 +1141,8 @@ struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req) struct xdr_buf 
*xbufp = &req->rq_snd_buf; struct rpc_task_setup task_setup_data = { .callback_ops = &rpc_default_ops, - .flags = RPC_TASK_SOFTCONN, + .flags = RPC_TASK_SOFTCONN | + RPC_TASK_NO_RETRANS_TIMEOUT, }; dprintk("RPC: rpc_run_bc_task req= %p\n", req); @@ -1160,7 +1162,6 @@ struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req) task->tk_action = call_bc_transmit; atomic_inc(&task->tk_count); WARN_ON_ONCE(atomic_read(&task->tk_count) != 2); - xprt_request_enqueue_transmit(task); rpc_execute(task); dprintk("RPC: rpc_run_bc_task: task= %p\n", task); @@ -1680,7 +1681,7 @@ call_allocate(struct rpc_task *task) dprint_status(task); task->tk_status = 0; - task->tk_action = call_bind; + task->tk_action = call_encode; if (req->rq_buffer) return; @@ -1724,12 +1725,12 @@ call_allocate(struct rpc_task *task) static int rpc_task_need_encode(struct rpc_task *task) { - return test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate) == 0; + return test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate) == 0 && + (!(task->tk_flags & RPC_TASK_SENT) || + !(task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT) || + xprt_request_need_retransmit(task)); } -/* - * 3. Encode arguments of an RPC call - */ static void rpc_xdr_encode(struct rpc_task *task) { @@ -1745,6 +1746,7 @@ rpc_xdr_encode(struct rpc_task *task) xdr_buf_init(&req->rq_rcv_buf, req->rq_rbuffer, req->rq_rcvsize); + req->rq_bytes_sent = 0; p = rpc_encode_header(task); if (p == NULL) { @@ -1761,6 +1763,34 @@ rpc_xdr_encode(struct rpc_task *task) task->tk_msg.rpc_argp); } +/* + * 3. Encode arguments of an RPC call + */ +static void +call_encode(struct rpc_task *task) +{ + if (!rpc_task_need_encode(task)) + goto out; + /* Encode here so that rpcsec_gss can use correct sequence number. */ + rpc_xdr_encode(task); + /* Did the encode result in an error condition? */ + if (task->tk_status != 0) { + /* Was the error nonfatal? 
*/ + if (task->tk_status == -EAGAIN) + rpc_delay(task, HZ >> 4); + else + rpc_exit(task, task->tk_status); + return; + } + + /* Add task to reply queue before transmission to avoid races */ + if (rpc_reply_expected(task)) + xprt_request_enqueue_receive(task); + xprt_request_enqueue_transmit(task); +out: + task->tk_action = call_bind; +} + /* * 4. Get the server port number if not yet set */ @@ -1945,24 +1975,8 @@ call_transmit(struct rpc_task *task) dprint_status(task); task->tk_action = call_transmit_status; - /* Encode here so that rpcsec_gss can use correct sequence number. */ - if (rpc_task_need_encode(task)) { - rpc_xdr_encode(task); - /* Did the encode result in an error condition? */ - if (task->tk_status != 0) { - /* Was the error nonfatal? */ - if (task->tk_status == -EAGAIN) - rpc_delay(task, HZ >> 4); - else - rpc_exit(task, task->tk_status); - return; - } - } - - /* Add task to reply queue before transmission to avoid races */ - if (rpc_reply_expected(task)) - xprt_request_enqueue_receive(task); - xprt_request_enqueue_transmit(task); + if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) + return; if (!xprt_prepare_transmit(task)) return; @@ -1997,9 +2011,9 @@ call_transmit_status(struct rpc_task *task) xprt_end_transmit(task); break; case -EBADMSG: - task->tk_action = call_transmit; - task->tk_status = 0; xprt_end_transmit(task); + task->tk_status = 0; + task->tk_action = call_encode; break; /* * Special cases: if we've been waiting on the @@ -2048,6 +2062,9 @@ call_bc_transmit(struct rpc_task *task) { struct rpc_rqst *req = task->tk_rqstp; + if (rpc_task_need_encode(task)) + xprt_request_enqueue_transmit(task); + if (!xprt_prepare_transmit(task)) goto out_retry; @@ -2169,7 +2186,7 @@ call_status(struct rpc_task *task) case -EPIPE: case -ENOTCONN: case -EAGAIN: - task->tk_action = call_bind; + task->tk_action = call_encode; break; case -EIO: /* shutdown or soft timeout */ @@ -2234,7 +2251,7 @@ call_timeout(struct rpc_task *task) 
rpcauth_invalcred(task); retry: - task->tk_action = call_bind; + task->tk_action = call_encode; task->tk_status = 0; } @@ -2278,7 +2295,7 @@ call_decode(struct rpc_task *task) if (req->rq_rcv_buf.len < 12) { if (!RPC_IS_SOFT(task)) { - task->tk_action = call_bind; + task->tk_action = call_encode; goto out_retry; } dprintk("RPC: %s: too small RPC reply size (%d bytes)\n", @@ -2409,7 +2426,7 @@ rpc_verify_header(struct rpc_task *task) task->tk_garb_retry--; dprintk("RPC: %5u %s: retry garbled creds\n", task->tk_pid, __func__); - task->tk_action = call_bind; + task->tk_action = call_encode; goto out_retry; case RPC_AUTH_TOOWEAK: printk(KERN_NOTICE "RPC: server %s requires stronger " @@ -2478,7 +2495,7 @@ rpc_verify_header(struct rpc_task *task) task->tk_garb_retry--; dprintk("RPC: %5u %s: retrying\n", task->tk_pid, __func__); - task->tk_action = call_bind; + task->tk_action = call_encode; out_retry: return ERR_PTR(-EAGAIN); } diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 39a6f6e8ae01..426a3a05e075 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1057,18 +1057,10 @@ void xprt_request_wait_receive(struct rpc_task *task) spin_unlock(&xprt->queue_lock); } -static bool -xprt_request_need_transmit(struct rpc_task *task) -{ - return !(task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT) || - xprt_request_retransmit_after_disconnect(task); -} - static bool xprt_request_need_enqueue_transmit(struct rpc_task *task, struct rpc_rqst *req) { - return xprt_request_need_transmit(task) && - !test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate); + return !test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate); } /** @@ -1123,6 +1115,18 @@ xprt_request_dequeue_transmit(struct rpc_task *task) spin_unlock(&xprt->queue_lock); } +/** + * xprt_request_need_retransmit - Test if a task needs retransmission + * @task: pointer to rpc_task + * + * Test for whether a connection breakage requires the task to retransmit + */ +bool +xprt_request_need_retransmit(struct rpc_task *task) +{ + return 
xprt_request_retransmit_after_disconnect(task); +} + /** * xprt_prepare_transmit - reserve the transport before sending a request * @task: RPC task about to send a request -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
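The reworked `rpc_task_need_encode()` above compresses the retransmission policy into one predicate: re-encode only if the task is not already queued for transmission, and then only if it has never been sent, retransmit timeouts are allowed, or the connection broke since the last send. A pure-function sketch of that truth table (the function name and boolean parameters are stand-ins for the kernel's flag tests):

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the logic of rpc_task_need_encode() after this patch:
 * skip encoding when already queued (NEED_XMIT set), and only
 * retransmit a NO_RETRANS_TIMEOUT task after a connection break. */
static bool task_need_encode(bool need_xmit_set, bool was_sent,
                             bool no_retrans_timeout, bool disconnected)
{
    return !need_xmit_set &&
           (!was_sent || !no_retrans_timeout || disconnected);
}
```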
* [PATCH v3 21/44] SUNRPC: Fix up the back channel transmit 2018-09-17 13:03 ` [PATCH v3 20/44] SUNRPC: Refactor RPC call encoding Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 22/44] SUNRPC: Treat the task and request as separate in the xprt_ops->send_request() Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs Fix up the back channel code to recognise that it has already been transmitted, so does not need to be called again. Also ensure that we set req->rq_task. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/bc_xprt.h | 1 + net/sunrpc/clnt.c | 19 +++++-------------- net/sunrpc/xprt.c | 27 ++++++++++++++++++++++++++- 3 files changed, 32 insertions(+), 15 deletions(-) diff --git a/include/linux/sunrpc/bc_xprt.h b/include/linux/sunrpc/bc_xprt.h index 4397a4824c81..28721cf73ec3 100644 --- a/include/linux/sunrpc/bc_xprt.h +++ b/include/linux/sunrpc/bc_xprt.h @@ -34,6 +34,7 @@ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
#ifdef CONFIG_SUNRPC_BACKCHANNEL struct rpc_rqst *xprt_lookup_bc_request(struct rpc_xprt *xprt, __be32 xid); void xprt_complete_bc_request(struct rpc_rqst *req, uint32_t copied); +void xprt_init_bc_request(struct rpc_rqst *req, struct rpc_task *task); void xprt_free_bc_request(struct rpc_rqst *req); int xprt_setup_backchannel(struct rpc_xprt *, unsigned int min_reqs); void xprt_destroy_backchannel(struct rpc_xprt *, unsigned int max_reqs); diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 64159716be30..dcefbf406482 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1138,7 +1138,6 @@ EXPORT_SYMBOL_GPL(rpc_call_async); struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req) { struct rpc_task *task; - struct xdr_buf *xbufp = &req->rq_snd_buf; struct rpc_task_setup task_setup_data = { .callback_ops = &rpc_default_ops, .flags = RPC_TASK_SOFTCONN | @@ -1150,14 +1149,7 @@ struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req) * Create an rpc_task to send the data */ task = rpc_new_task(&task_setup_data); - task->tk_rqstp = req; - - /* - * Set up the xdr_buf length. - * This also indicates that the buffer is XDR encoded already. 
- */ - xbufp->len = xbufp->head[0].iov_len + xbufp->page_len + - xbufp->tail[0].iov_len; + xprt_init_bc_request(req, task); task->tk_action = call_bc_transmit; atomic_inc(&task->tk_count); @@ -2064,6 +2056,8 @@ call_bc_transmit(struct rpc_task *task) if (rpc_task_need_encode(task)) xprt_request_enqueue_transmit(task); + if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) + goto out_wakeup; if (!xprt_prepare_transmit(task)) goto out_retry; @@ -2073,13 +2067,11 @@ call_bc_transmit(struct rpc_task *task) "error: %d\n", task->tk_status); goto out_done; } - if (req->rq_connect_cookie != req->rq_xprt->connect_cookie) - req->rq_bytes_sent = 0; xprt_transmit(task); if (task->tk_status == -EAGAIN) - goto out_nospace; + goto out_retry; xprt_end_transmit(task); dprint_status(task); @@ -2119,12 +2111,11 @@ call_bc_transmit(struct rpc_task *task) "error: %d\n", task->tk_status); break; } +out_wakeup: rpc_wake_up_queued_task(&req->rq_xprt->pending, task); out_done: task->tk_action = rpc_exit_task; return; -out_nospace: - req->rq_connect_cookie = req->rq_xprt->connect_cookie; out_retry: task->tk_status = 0; } diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 426a3a05e075..d418bd4db7ff 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1389,6 +1389,12 @@ void xprt_free(struct rpc_xprt *xprt) } EXPORT_SYMBOL_GPL(xprt_free); +static void +xprt_init_connect_cookie(struct rpc_rqst *req, struct rpc_xprt *xprt) +{ + req->rq_connect_cookie = xprt_connect_cookie(xprt) - 1; +} + static __be32 xprt_alloc_xid(struct rpc_xprt *xprt) { @@ -1417,7 +1423,7 @@ xprt_request_init(struct rpc_task *task) req->rq_xprt = xprt; req->rq_buffer = NULL; req->rq_xid = xprt_alloc_xid(xprt); - req->rq_connect_cookie = xprt_connect_cookie(xprt) - 1; + xprt_init_connect_cookie(req, xprt); req->rq_bytes_sent = 0; req->rq_snd_buf.len = 0; req->rq_snd_buf.buflen = 0; @@ -1551,6 +1557,25 @@ void xprt_release(struct rpc_task *task) xprt_free_bc_request(req); } +#ifdef CONFIG_SUNRPC_BACKCHANNEL 
+void +xprt_init_bc_request(struct rpc_rqst *req, struct rpc_task *task) +{ + struct xdr_buf *xbufp = &req->rq_snd_buf; + + task->tk_rqstp = req; + req->rq_task = task; + xprt_init_connect_cookie(req, req->rq_xprt); + /* + * Set up the xdr_buf length. + * This also indicates that the buffer is XDR encoded already. + */ + xbufp->len = xbufp->head[0].iov_len + xbufp->page_len + + xbufp->tail[0].iov_len; + req->rq_bytes_sent = 0; +} +#endif + static void xprt_init(struct rpc_xprt *xprt, struct net *net) { kref_init(&xprt->kref); -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
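The new `xprt_init_bc_request()` above computes the `xdr_buf` length as the sum of its three segments (head iovec, page run, tail iovec); a non-zero `len` also serves as the marker that the backchannel reply is already XDR encoded. A minimal sketch of that computation with simplified types:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified view of struct xdr_buf: a head iovec, a run of pages,
 * and a tail iovec (only the lengths matter here). */
struct xdr_buf_lens {
    size_t head_len;
    size_t page_len;
    size_t tail_len;
};

/* Total length as set up in xprt_init_bc_request(); a non-zero result
 * doubles as the "buffer already encoded" indication for the backchannel. */
static size_t xdr_buf_total(const struct xdr_buf_lens *b)
{
    return b->head_len + b->page_len + b->tail_len;
}
```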
* [PATCH v3 22/44] SUNRPC: Treat the task and request as separate in the xprt_ops->send_request() 2018-09-17 13:03 ` [PATCH v3 21/44] SUNRPC: Fix up the back channel transmit Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 23/44] SUNRPC: Don't reset the request 'bytes_sent' counter when releasing XPRT_LOCK Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs When we shift to using the transmit queue, then the task that holds the write lock will not necessarily be the same as the one being transmitted. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/xprt.h | 2 +- net/sunrpc/xprt.c | 2 +- net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 3 +-- net/sunrpc/xprtrdma/transport.c | 5 ++-- net/sunrpc/xprtsock.c | 27 +++++++++++----------- 5 files changed, 18 insertions(+), 21 deletions(-) diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index b8a7de161f67..8c2bb078f00c 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -140,7 +140,7 @@ struct rpc_xprt_ops { void (*connect)(struct rpc_xprt *xprt, struct rpc_task *task); int (*buf_alloc)(struct rpc_task *task); void (*buf_free)(struct rpc_task *task); - int (*send_request)(struct rpc_task *task); + int (*send_request)(struct rpc_rqst *req, struct rpc_task *task); void (*set_retrans_timeout)(struct rpc_task *task); void (*timer)(struct rpc_xprt *xprt, struct rpc_task *task); void (*release_request)(struct rpc_task *task); diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index d418bd4db7ff..00b17cb49910 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1190,7 +1190,7 @@ void xprt_transmit(struct rpc_task *task) } connect_cookie = xprt->connect_cookie; - status = xprt->ops->send_request(task); + status = xprt->ops->send_request(req, task); trace_xprt_transmit(xprt, req->rq_xid, status); if (status != 0) { 
task->tk_status = status; diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c index 09b12b7568fe..d1618c70edb4 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c @@ -215,9 +215,8 @@ rpcrdma_bc_send_request(struct svcxprt_rdma *rdma, struct rpc_rqst *rqst) * connection. */ static int -xprt_rdma_bc_send_request(struct rpc_task *task) +xprt_rdma_bc_send_request(struct rpc_rqst *rqst, struct rpc_task *task) { - struct rpc_rqst *rqst = task->tk_rqstp; struct svc_xprt *sxprt = rqst->rq_xprt->bc_xprt; struct svcxprt_rdma *rdma; int ret; diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 143ce2579ba9..fa684bf4d090 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -706,9 +706,8 @@ xprt_rdma_free(struct rpc_task *task) * sent. Do not try to send this message again. */ static int -xprt_rdma_send_request(struct rpc_task *task) +xprt_rdma_send_request(struct rpc_rqst *rqst, struct rpc_task *task) { - struct rpc_rqst *rqst = task->tk_rqstp; struct rpc_xprt *xprt = rqst->rq_xprt; struct rpcrdma_req *req = rpcr_to_rdmar(rqst); struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt); @@ -741,7 +740,7 @@ xprt_rdma_send_request(struct rpc_task *task) /* An RPC with no reply will throw off credit accounting, * so drop the connection to reset the credit grant. 
*/ - if (!rpc_reply_expected(task)) + if (!rpc_reply_expected(rqst->rq_task)) goto drop_connection; return 0; diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 8d6404259ff9..b8143eded4af 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -449,12 +449,12 @@ static void xs_nospace_callback(struct rpc_task *task) /** * xs_nospace - place task on wait queue if transmit was incomplete + * @req: pointer to RPC request * @task: task to put to sleep * */ -static int xs_nospace(struct rpc_task *task) +static int xs_nospace(struct rpc_rqst *req, struct rpc_task *task) { - struct rpc_rqst *req = task->tk_rqstp; struct rpc_xprt *xprt = req->rq_xprt; struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); struct sock *sk = transport->inet; @@ -513,6 +513,7 @@ static inline void xs_encode_stream_record_marker(struct xdr_buf *buf) /** * xs_local_send_request - write an RPC request to an AF_LOCAL socket + * @req: pointer to RPC request * @task: RPC task that manages the state of an RPC request * * Return values: @@ -522,9 +523,8 @@ static inline void xs_encode_stream_record_marker(struct xdr_buf *buf) * ENOTCONN: Caller needs to invoke connect logic then call again * other: Some other error occured, the request was not sent */ -static int xs_local_send_request(struct rpc_task *task) +static int xs_local_send_request(struct rpc_rqst *req, struct rpc_task *task) { - struct rpc_rqst *req = task->tk_rqstp; struct rpc_xprt *xprt = req->rq_xprt; struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); @@ -569,7 +569,7 @@ static int xs_local_send_request(struct rpc_task *task) case -ENOBUFS: break; case -EAGAIN: - status = xs_nospace(task); + status = xs_nospace(req, task); break; default: dprintk("RPC: sendmsg returned unrecognized error %d\n", @@ -585,6 +585,7 @@ static int xs_local_send_request(struct rpc_task *task) /** * xs_udp_send_request - write an RPC request to a UDP socket + * @req: pointer to RPC request * 
@task: address of RPC task that manages the state of an RPC request * * Return values: @@ -594,9 +595,8 @@ static int xs_local_send_request(struct rpc_task *task) * ENOTCONN: Caller needs to invoke connect logic then call again * other: Some other error occurred, the request was not sent */ -static int xs_udp_send_request(struct rpc_task *task) +static int xs_udp_send_request(struct rpc_rqst *req, struct rpc_task *task) { - struct rpc_rqst *req = task->tk_rqstp; struct rpc_xprt *xprt = req->rq_xprt; struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); struct xdr_buf *xdr = &req->rq_snd_buf; @@ -638,7 +638,7 @@ static int xs_udp_send_request(struct rpc_task *task) /* Should we call xs_close() here? */ break; case -EAGAIN: - status = xs_nospace(task); + status = xs_nospace(req, task); break; case -ENETUNREACH: case -ENOBUFS: @@ -658,6 +658,7 @@ static int xs_udp_send_request(struct rpc_task *task) /** * xs_tcp_send_request - write an RPC request to a TCP socket + * @req: pointer to RPC request * @task: address of RPC task that manages the state of an RPC request * * Return values: @@ -670,9 +671,8 @@ static int xs_udp_send_request(struct rpc_task *task) * XXX: In the case of soft timeouts, should we eventually give up * if sendmsg is not able to make progress? */ -static int xs_tcp_send_request(struct rpc_task *task) +static int xs_tcp_send_request(struct rpc_rqst *req, struct rpc_task *task) { - struct rpc_rqst *req = task->tk_rqstp; struct rpc_xprt *xprt = req->rq_xprt; struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); struct xdr_buf *xdr = &req->rq_snd_buf; @@ -697,7 +697,7 @@ static int xs_tcp_send_request(struct rpc_task *task) * completes while the socket holds a reference to the pages, * then we may end up resending corrupted data. 
*/ - if (task->tk_flags & RPC_TASK_SENT) + if (req->rq_task->tk_flags & RPC_TASK_SENT) zerocopy = false; if (test_bit(XPRT_SOCK_UPD_TIMEOUT, &transport->sock_state)) @@ -761,7 +761,7 @@ static int xs_tcp_send_request(struct rpc_task *task) /* Should we call xs_close() here? */ break; case -EAGAIN: - status = xs_nospace(task); + status = xs_nospace(req, task); break; case -ECONNRESET: case -ECONNREFUSED: @@ -2706,9 +2706,8 @@ static int bc_sendto(struct rpc_rqst *req) /* * The send routine. Borrows from svc_send */ -static int bc_send_request(struct rpc_task *task) +static int bc_send_request(struct rpc_rqst *req, struct rpc_task *task) { - struct rpc_rqst *req = task->tk_rqstp; struct svc_xprt *xprt; int len; -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 23/44] SUNRPC: Don't reset the request 'bytes_sent' counter when releasing XPRT_LOCK 2018-09-17 13:03 ` [PATCH v3 22/44] SUNRPC: Treat the task and request as separate in the xprt_ops->send_request() Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 24/44] SUNRPC: Simplify xprt_prepare_transmit() Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs If the request is still on the queue, this will be incorrect behaviour. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- net/sunrpc/clnt.c | 4 ---- net/sunrpc/xprt.c | 14 -------------- 2 files changed, 18 deletions(-) diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index dcefbf406482..4ca23a6607ba 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -2128,15 +2128,11 @@ static void call_status(struct rpc_task *task) { struct rpc_clnt *clnt = task->tk_client; - struct rpc_rqst *req = task->tk_rqstp; int status; if (!task->tk_msg.rpc_proc->p_proc) trace_xprt_ping(task->tk_xprt, task->tk_status); - if (req->rq_reply_bytes_recvd > 0 && !req->rq_bytes_sent) - task->tk_status = req->rq_reply_bytes_recvd; - dprint_status(task); status = task->tk_status; diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 00b17cb49910..3b31830ef851 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -332,15 +332,6 @@ static void __xprt_lock_write_next_cong(struct rpc_xprt *xprt) xprt_clear_locked(xprt); } -static void xprt_task_clear_bytes_sent(struct rpc_task *task) -{ - if (task != NULL) { - struct rpc_rqst *req = task->tk_rqstp; - if (req != NULL) - req->rq_bytes_sent = 0; - } -} - /** * xprt_release_xprt - allow other requests to use a transport * @xprt: transport with other tasks potentially waiting @@ -351,7 +342,6 @@ static void xprt_task_clear_bytes_sent(struct rpc_task *task) void xprt_release_xprt(struct rpc_xprt *xprt, struct rpc_task *task) { if (xprt->snd_task == 
task) { - xprt_task_clear_bytes_sent(task); xprt_clear_locked(xprt); __xprt_lock_write_next(xprt); } @@ -369,7 +359,6 @@ EXPORT_SYMBOL_GPL(xprt_release_xprt); void xprt_release_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task) { if (xprt->snd_task == task) { - xprt_task_clear_bytes_sent(task); xprt_clear_locked(xprt); __xprt_lock_write_next_cong(xprt); } @@ -742,7 +731,6 @@ bool xprt_lock_connect(struct rpc_xprt *xprt, goto out; if (xprt->snd_task != task) goto out; - xprt_task_clear_bytes_sent(task); xprt->snd_task = cookie; ret = true; out: @@ -788,7 +776,6 @@ void xprt_connect(struct rpc_task *task) xprt->ops->close(xprt); if (!xprt_connected(xprt)) { - task->tk_rqstp->rq_bytes_sent = 0; task->tk_timeout = task->tk_rqstp->rq_timeout; task->tk_rqstp->rq_connect_cookie = xprt->connect_cookie; rpc_sleep_on(&xprt->pending, task, xprt_connect_status); @@ -1093,7 +1080,6 @@ xprt_request_enqueue_transmit(struct rpc_task *task) static void xprt_request_dequeue_transmit_locked(struct rpc_task *task) { - xprt_task_clear_bytes_sent(task); if (test_and_clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) list_del(&task->tk_rqstp->rq_xmit); } -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 24/44] SUNRPC: Simplify xprt_prepare_transmit() 2018-09-17 13:03 ` [PATCH v3 23/44] SUNRPC: Don't reset the request 'bytes_sent' counter when releasing XPRT_LOCK Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 25/44] SUNRPC: Move RPC retransmission stat counter to xprt_transmit() Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs Remove the checks for whether or not we need to transmit, and whether or not a reply has been received. Those are already handled in call_transmit() itself. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- net/sunrpc/xprt.c | 23 +++++++---------------- 1 file changed, 7 insertions(+), 16 deletions(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 3b31830ef851..385ee9f64353 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1122,27 +1122,18 @@ bool xprt_prepare_transmit(struct rpc_task *task) { struct rpc_rqst *req = task->tk_rqstp; struct rpc_xprt *xprt = req->rq_xprt; - bool ret = false; dprintk("RPC: %5u xprt_prepare_transmit\n", task->tk_pid); - spin_lock_bh(&xprt->transport_lock); - if (!req->rq_bytes_sent) { - if (req->rq_reply_bytes_recvd) { - task->tk_status = req->rq_reply_bytes_recvd; - goto out_unlock; - } + if (!xprt_lock_write(xprt, task)) { + /* Race breaker: someone may have transmitted us */ if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) - goto out_unlock; - } - if (!xprt->ops->reserve_xprt(xprt, task)) { - task->tk_status = -EAGAIN; - goto out_unlock; + rpc_wake_up_queued_task_set_status(&xprt->sending, + task, 0); + return false; + } - ret = true; -out_unlock: - spin_unlock_bh(&xprt->transport_lock); - return ret; + return true; } void xprt_end_transmit(struct rpc_task *task) -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
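The simplified `xprt_prepare_transmit()` above reduces to a single decision: take the write lock and transmit, or, if the lock is busy, check whether the lock holder has already drained this request from the transmit queue (its `NEED_XMIT` bit cleared) and wake it with status 0 rather than leaving it asleep. A sketch of that race-breaker decision as a pure function (the enum names are hypothetical, not kernel identifiers):

```c
#include <assert.h>
#include <stdbool.h>

enum prep_result {
    PREP_TRANSMIT,    /* we hold the write lock: go ahead and transmit */
    PREP_DONE,        /* another task transmitted us: wake with status 0 */
    PREP_WAIT,        /* lock busy, still pending: stay on the sending queue */
};

/* Decision logic of the simplified xprt_prepare_transmit(). */
static enum prep_result prepare_transmit(bool got_write_lock, bool need_xmit)
{
    if (got_write_lock)
        return PREP_TRANSMIT;
    /* Race breaker: the lock holder may have pulled our request off the
     * transmit queue and sent it while we were waiting to run. */
    if (!need_xmit)
        return PREP_DONE;
    return PREP_WAIT;
}
```

This is exactly why queued transmission can drop the old `rq_reply_bytes_recvd`/`rq_bytes_sent` checks: the `NEED_XMIT` bit alone now says whether there is still work to do.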
* [PATCH v3 25/44] SUNRPC: Move RPC retransmission stat counter to xprt_transmit() 2018-09-17 13:03 ` [PATCH v3 24/44] SUNRPC: Simplify xprt_prepare_transmit() Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- net/sunrpc/clnt.c | 6 ------ net/sunrpc/xprt.c | 19 ++++++++++++------- 2 files changed, 12 insertions(+), 13 deletions(-) diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 4ca23a6607ba..8dc3d33827c4 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1962,8 +1962,6 @@ call_connect_status(struct rpc_task *task) static void call_transmit(struct rpc_task *task) { - int is_retrans = RPC_WAS_SENT(task); - dprint_status(task); task->tk_action = call_transmit_status; @@ -1973,10 +1971,6 @@ call_transmit(struct rpc_task *task) if (!xprt_prepare_transmit(task)) return; xprt_transmit(task); - if (task->tk_status < 0) - return; - if (is_retrans) - task->tk_client->cl_stats->rpcretrans++; } /* diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 385ee9f64353..35f5df367591 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -191,8 +191,6 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task) goto out_sleep; } xprt->snd_task = task; - if (req != NULL) - req->rq_ntrans++; return 1; @@ -247,7 +245,6 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task) } if (__xprt_get_cong(xprt, task)) { xprt->snd_task = task; - req->rq_ntrans++; return 1; } xprt_clear_locked(xprt); @@ -281,12 +278,8 @@ static inline int xprt_lock_write(struct rpc_xprt *xprt, struct rpc_task *task) static bool __xprt_lock_write_func(struct rpc_task *task, void *data) { struct rpc_xprt *xprt = data; - struct rpc_rqst *req; - req = task->tk_rqstp; xprt->snd_task 
= task; - if (req) - req->rq_ntrans++; return true; } @@ -1152,6 +1145,7 @@ void xprt_transmit(struct rpc_task *task) struct rpc_rqst *req = task->tk_rqstp; struct rpc_xprt *xprt = req->rq_xprt; unsigned int connect_cookie; + int is_retrans = RPC_WAS_SENT(task); int status; dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen); @@ -1166,14 +1160,25 @@ void xprt_transmit(struct rpc_task *task) } } + /* + * Update req->rq_ntrans before transmitting to avoid races with + * xprt_update_rtt(), which needs to know that it is recording a + * reply to the first transmission. + */ + req->rq_ntrans++; + connect_cookie = xprt->connect_cookie; status = xprt->ops->send_request(req, task); trace_xprt_transmit(xprt, req->rq_xid, status); if (status != 0) { + req->rq_ntrans--; task->tk_status = status; return; } + if (is_retrans) + task->tk_client->cl_stats->rpcretrans++; + xprt_inject_disconnect(xprt); dprintk("RPC: %5u xmit complete\n", task->tk_pid); -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2018-09-17 13:03 ` [PATCH v3 25/44] SUNRPC: Move RPC retransmission stat counter to xprt_transmit() Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 27/44] SUNRPC: Support for congestion control when queuing is enabled Trond Myklebust 2018-12-27 19:21 ` [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks Chuck Lever 0 siblings, 2 replies; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs One of the intentions with the priority queues was to ensure that no single process can hog the transport. The field task->tk_owner therefore identifies the RPC call's origin, and is intended to allow the RPC layer to organise queues for fairness. This commit therefore modifies the transmit queue to group requests by task->tk_owner, and ensures that we round robin among those groups. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/xprt.h | 1 + net/sunrpc/xprt.c | 27 ++++++++++++++++++++++++--- 2 files changed, 25 insertions(+), 3 deletions(-) diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index 8c2bb078f00c..e377620b9744 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -89,6 +89,7 @@ struct rpc_rqst { }; struct list_head rq_xmit; /* Send queue */ + struct list_head rq_xmit2; /* Send queue */ void *rq_buffer; /* Call XDR encode buffer */ size_t rq_callsize; diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 35f5df367591..3e68f35f71f6 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1052,12 +1052,21 @@ xprt_request_need_enqueue_transmit(struct rpc_task *task, struct rpc_rqst *req) void xprt_request_enqueue_transmit(struct rpc_task *task) { - struct rpc_rqst *req = task->tk_rqstp; + struct rpc_rqst *pos, *req = task->tk_rqstp; struct rpc_xprt *xprt = req->rq_xprt; if (xprt_request_need_enqueue_transmit(task, req)) { 
spin_lock(&xprt->queue_lock); + list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { + if (pos->rq_task->tk_owner != task->tk_owner) + continue; + list_add_tail(&req->rq_xmit2, &pos->rq_xmit2); + INIT_LIST_HEAD(&req->rq_xmit); + goto out; + } list_add_tail(&req->rq_xmit, &xprt->xmit_queue); + INIT_LIST_HEAD(&req->rq_xmit2); +out: set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate); spin_unlock(&xprt->queue_lock); } @@ -1073,8 +1082,20 @@ xprt_request_enqueue_transmit(struct rpc_task *task) static void xprt_request_dequeue_transmit_locked(struct rpc_task *task) { - if (test_and_clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) - list_del(&task->tk_rqstp->rq_xmit); + struct rpc_rqst *req = task->tk_rqstp; + + if (!test_and_clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) + return; + if (!list_empty(&req->rq_xmit)) { + list_del(&req->rq_xmit); + if (!list_empty(&req->rq_xmit2)) { + struct rpc_rqst *next = list_first_entry(&req->rq_xmit2, + struct rpc_rqst, rq_xmit2); + list_del(&req->rq_xmit2); + list_add_tail(&next->rq_xmit, &next->rq_xprt->xmit_queue); + } + } else + list_del(&req->rq_xmit2); } /** -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 27/44] SUNRPC: Support for congestion control when queuing is enabled 2018-09-17 13:03 ` [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 28/44] SUNRPC: Enqueue swapper tagged RPCs at the head of the transmit queue Trond Myklebust 2018-12-27 19:21 ` [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks Chuck Lever 1 sibling, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs Both RDMA and UDP transports require the request to get a "congestion control" credit before they can be transmitted. Right now, this is done when the request locks the socket. We'd like it to happen when a request attempts to be transmitted for the first time. In order to support retransmission of requests that already hold such credits, we also want to ensure that they get queued first, so that we don't deadlock with requests that have yet to obtain a credit. 
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/xprt.h | 2 + net/sunrpc/clnt.c | 5 ++ net/sunrpc/xprt.c | 128 +++++++++++++++++++++--------- net/sunrpc/xprtrdma/backchannel.c | 3 + net/sunrpc/xprtrdma/transport.c | 3 + net/sunrpc/xprtsock.c | 4 + 6 files changed, 109 insertions(+), 36 deletions(-) diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index e377620b9744..0d0cc127615e 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -397,6 +397,7 @@ void xprt_complete_rqst(struct rpc_task *task, int copied); void xprt_pin_rqst(struct rpc_rqst *req); void xprt_unpin_rqst(struct rpc_rqst *req); void xprt_release_rqst_cong(struct rpc_task *task); +bool xprt_request_get_cong(struct rpc_xprt *xprt, struct rpc_rqst *req); void xprt_disconnect_done(struct rpc_xprt *xprt); void xprt_force_disconnect(struct rpc_xprt *xprt); void xprt_conditional_disconnect(struct rpc_xprt *xprt, unsigned int cookie); @@ -415,6 +416,7 @@ void xprt_unlock_connect(struct rpc_xprt *, void *); #define XPRT_BINDING (5) #define XPRT_CLOSING (6) #define XPRT_CONGESTED (9) +#define XPRT_CWND_WAIT (10) static inline void xprt_set_connected(struct rpc_xprt *xprt) { diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 8dc3d33827c4..f03911f84953 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1996,6 +1996,11 @@ call_transmit_status(struct rpc_task *task) dprint_status(task); xprt_end_transmit(task); break; + case -EBADSLT: + xprt_end_transmit(task); + task->tk_action = call_transmit; + task->tk_status = 0; + break; case -EBADMSG: xprt_end_transmit(task); task->tk_status = 0; diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 3e68f35f71f6..e07a54fbe1e7 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -68,8 +68,6 @@ static void xprt_init(struct rpc_xprt *xprt, struct net *net); static __be32 xprt_alloc_xid(struct rpc_xprt *xprt); static void xprt_connect_status(struct rpc_task *task); 
-static int __xprt_get_cong(struct rpc_xprt *, struct rpc_task *); -static void __xprt_put_cong(struct rpc_xprt *, struct rpc_rqst *); static void xprt_destroy(struct rpc_xprt *xprt); static DEFINE_SPINLOCK(xprt_list_lock); @@ -221,6 +219,31 @@ static void xprt_clear_locked(struct rpc_xprt *xprt) queue_work(xprtiod_workqueue, &xprt->task_cleanup); } +static bool +xprt_need_congestion_window_wait(struct rpc_xprt *xprt) +{ + return test_bit(XPRT_CWND_WAIT, &xprt->state); +} + +static void +xprt_set_congestion_window_wait(struct rpc_xprt *xprt) +{ + if (!list_empty(&xprt->xmit_queue)) { + /* Peek at head of queue to see if it can make progress */ + if (list_first_entry(&xprt->xmit_queue, struct rpc_rqst, + rq_xmit)->rq_cong) + return; + } + set_bit(XPRT_CWND_WAIT, &xprt->state); +} + +static void +xprt_test_and_clear_congestion_window_wait(struct rpc_xprt *xprt) +{ + if (!RPCXPRT_CONGESTED(xprt)) + clear_bit(XPRT_CWND_WAIT, &xprt->state); +} + /* * xprt_reserve_xprt_cong - serialize write access to transports * @task: task that is requesting access to the transport @@ -228,6 +251,7 @@ static void xprt_clear_locked(struct rpc_xprt *xprt) * Same as xprt_reserve_xprt, but Van Jacobson congestion control is * integrated into the decision of whether a request is allowed to be * woken up and given access to the transport. + * Note that the lock is only granted if we know there are free slots. 
*/ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task) { @@ -243,14 +267,12 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task) xprt->snd_task = task; return 1; } - if (__xprt_get_cong(xprt, task)) { + if (!xprt_need_congestion_window_wait(xprt)) { xprt->snd_task = task; return 1; } xprt_clear_locked(xprt); out_sleep: - if (req) - __xprt_put_cong(xprt, req); dprintk("RPC: %5u failed to lock transport %p\n", task->tk_pid, xprt); task->tk_timeout = 0; task->tk_status = -EAGAIN; @@ -294,32 +316,14 @@ static void __xprt_lock_write_next(struct rpc_xprt *xprt) xprt_clear_locked(xprt); } -static bool __xprt_lock_write_cong_func(struct rpc_task *task, void *data) -{ - struct rpc_xprt *xprt = data; - struct rpc_rqst *req; - - req = task->tk_rqstp; - if (req == NULL) { - xprt->snd_task = task; - return true; - } - if (__xprt_get_cong(xprt, task)) { - xprt->snd_task = task; - req->rq_ntrans++; - return true; - } - return false; -} - static void __xprt_lock_write_next_cong(struct rpc_xprt *xprt) { if (test_and_set_bit(XPRT_LOCKED, &xprt->state)) return; - if (RPCXPRT_CONGESTED(xprt)) + if (xprt_need_congestion_window_wait(xprt)) goto out_unlock; if (rpc_wake_up_first_on_wq(xprtiod_workqueue, &xprt->sending, - __xprt_lock_write_cong_func, xprt)) + __xprt_lock_write_func, xprt)) return; out_unlock: xprt_clear_locked(xprt); @@ -370,16 +374,16 @@ static inline void xprt_release_write(struct rpc_xprt *xprt, struct rpc_task *ta * overflowed. Put the task to sleep if this is the case. 
*/ static int -__xprt_get_cong(struct rpc_xprt *xprt, struct rpc_task *task) +__xprt_get_cong(struct rpc_xprt *xprt, struct rpc_rqst *req) { - struct rpc_rqst *req = task->tk_rqstp; - if (req->rq_cong) return 1; dprintk("RPC: %5u xprt_cwnd_limited cong = %lu cwnd = %lu\n", - task->tk_pid, xprt->cong, xprt->cwnd); - if (RPCXPRT_CONGESTED(xprt)) + req->rq_task->tk_pid, xprt->cong, xprt->cwnd); + if (RPCXPRT_CONGESTED(xprt)) { + xprt_set_congestion_window_wait(xprt); return 0; + } req->rq_cong = 1; xprt->cong += RPC_CWNDSCALE; return 1; @@ -396,9 +400,31 @@ __xprt_put_cong(struct rpc_xprt *xprt, struct rpc_rqst *req) return; req->rq_cong = 0; xprt->cong -= RPC_CWNDSCALE; + xprt_test_and_clear_congestion_window_wait(xprt); __xprt_lock_write_next_cong(xprt); } +/** + * xprt_request_get_cong - Request congestion control credits + * @xprt: pointer to transport + * @req: pointer to RPC request + * + * Useful for transports that require congestion control. + */ +bool +xprt_request_get_cong(struct rpc_xprt *xprt, struct rpc_rqst *req) +{ + bool ret = false; + + if (req->rq_cong) + return true; + spin_lock_bh(&xprt->transport_lock); + ret = __xprt_get_cong(xprt, req) != 0; + spin_unlock_bh(&xprt->transport_lock); + return ret; +} +EXPORT_SYMBOL_GPL(xprt_request_get_cong); + /** * xprt_release_rqst_cong - housekeeping when request is complete * @task: RPC request that recently completed @@ -413,6 +439,20 @@ void xprt_release_rqst_cong(struct rpc_task *task) } EXPORT_SYMBOL_GPL(xprt_release_rqst_cong); +/* + * Clear the congestion window wait flag and wake up the next + * entry on xprt->sending + */ +static void +xprt_clear_congestion_window_wait(struct rpc_xprt *xprt) +{ + if (test_and_clear_bit(XPRT_CWND_WAIT, &xprt->state)) { + spin_lock_bh(&xprt->transport_lock); + __xprt_lock_write_next_cong(xprt); + spin_unlock_bh(&xprt->transport_lock); + } +} + /** * xprt_adjust_cwnd - adjust transport congestion window * @xprt: pointer to xprt @@ -1057,12 +1097,28 @@ 
xprt_request_enqueue_transmit(struct rpc_task *task) if (xprt_request_need_enqueue_transmit(task, req)) { spin_lock(&xprt->queue_lock); - list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { - if (pos->rq_task->tk_owner != task->tk_owner) - continue; - list_add_tail(&req->rq_xmit2, &pos->rq_xmit2); - INIT_LIST_HEAD(&req->rq_xmit); - goto out; + /* + * Requests that carry congestion control credits are added + * to the head of the list to avoid starvation issues. + */ + if (req->rq_cong) { + xprt_clear_congestion_window_wait(xprt); + list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { + if (pos->rq_cong) + continue; + /* Note: req is added _before_ pos */ + list_add_tail(&req->rq_xmit, &pos->rq_xmit); + INIT_LIST_HEAD(&req->rq_xmit2); + goto out; + } + } else { + list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { + if (pos->rq_task->tk_owner != task->tk_owner) + continue; + list_add_tail(&req->rq_xmit2, &pos->rq_xmit2); + INIT_LIST_HEAD(&req->rq_xmit); + goto out; + } } list_add_tail(&req->rq_xmit, &xprt->xmit_queue); INIT_LIST_HEAD(&req->rq_xmit2); diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c index ed58761e6b23..e7c445cee16f 100644 --- a/net/sunrpc/xprtrdma/backchannel.c +++ b/net/sunrpc/xprtrdma/backchannel.c @@ -200,6 +200,9 @@ int xprt_rdma_bc_send_reply(struct rpc_rqst *rqst) if (!xprt_connected(rqst->rq_xprt)) goto drop_connection; + if (!xprt_request_get_cong(rqst->rq_xprt, rqst)) + return -EBADSLT; + rc = rpcrdma_bc_marshal_reply(rqst); if (rc < 0) goto failed_marshal; diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index fa684bf4d090..9ff322e53f37 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -721,6 +721,9 @@ xprt_rdma_send_request(struct rpc_rqst *rqst, struct rpc_task *task) if (!xprt_connected(xprt)) goto drop_connection; + if (!xprt_request_get_cong(xprt, rqst)) + return -EBADSLT; + rc = rpcrdma_marshal_req(r_xprt, rqst); if (rc < 0) 
goto failed_marshal; diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index b8143eded4af..8831e84a058a 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -609,6 +609,10 @@ static int xs_udp_send_request(struct rpc_rqst *req, struct rpc_task *task) if (!xprt_bound(xprt)) return -ENOTCONN; + + if (!xprt_request_get_cong(xprt, req)) + return -EBADSLT; + req->rq_xtime = ktime_get(); status = xs_sendpages(transport->sock, xs_addr(xprt), xprt->addrlen, xdr, 0, true, &sent); -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 28/44] SUNRPC: Enqueue swapper tagged RPCs at the head of the transmit queue 2018-09-17 13:03 ` [PATCH v3 27/44] SUNRPC: Support for congestion control when queuing is enabled Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 29/44] SUNRPC: Allow calls to xprt_transmit() to drain the entire " Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs Avoid memory starvation by giving RPCs that are tagged with the RPC_TASK_SWAPPER flag the highest priority. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- net/sunrpc/xprt.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index e07a54fbe1e7..68974966b2e4 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1111,6 +1111,17 @@ xprt_request_enqueue_transmit(struct rpc_task *task) INIT_LIST_HEAD(&req->rq_xmit2); goto out; } + } else if (RPC_IS_SWAPPER(task)) { + list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { + if (pos->rq_cong || pos->rq_bytes_sent) + continue; + if (RPC_IS_SWAPPER(pos->rq_task)) + continue; + /* Note: req is added _before_ pos */ + list_add_tail(&req->rq_xmit, &pos->rq_xmit); + INIT_LIST_HEAD(&req->rq_xmit2); + goto out; + } } else { list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { if (pos->rq_task->tk_owner != task->tk_owner) -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 29/44] SUNRPC: Allow calls to xprt_transmit() to drain the entire transmit queue 2018-09-17 13:03 ` [PATCH v3 28/44] SUNRPC: Enqueue swapper tagged RPCs at the head of the transmit queue Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 30/44] SUNRPC: Allow soft RPC calls to time out when waiting for the XPRT_LOCK Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs Rather than forcing each and every RPC task to grab the socket write lock in order to send itself, we allow whichever task is holding the write lock to attempt to drain the entire transmit queue. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- net/sunrpc/xprt.c | 71 +++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 60 insertions(+), 11 deletions(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 68974966b2e4..ae1109c7b9b4 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1223,15 +1223,20 @@ void xprt_end_transmit(struct rpc_task *task) } /** - * xprt_transmit - send an RPC request on a transport - * @task: controlling RPC task + * xprt_request_transmit - send an RPC request on a transport + * @req: pointer to request to transmit + * @snd_task: RPC task that owns the transport lock * - * We have to copy the iovec because sendmsg fiddles with its contents. + * This performs the transmission of a single request. + * Note that if the request is not the same as snd_task, then it + * does need to be pinned. + * Returns '0' on success. 
*/ -void xprt_transmit(struct rpc_task *task) +static int +xprt_request_transmit(struct rpc_rqst *req, struct rpc_task *snd_task) { - struct rpc_rqst *req = task->tk_rqstp; - struct rpc_xprt *xprt = req->rq_xprt; + struct rpc_xprt *xprt = req->rq_xprt; + struct rpc_task *task = req->rq_task; unsigned int connect_cookie; int is_retrans = RPC_WAS_SENT(task); int status; @@ -1239,11 +1244,13 @@ void xprt_transmit(struct rpc_task *task) dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen); if (!req->rq_bytes_sent) { - if (xprt_request_data_received(task)) + if (xprt_request_data_received(task)) { + status = 0; goto out_dequeue; + } /* Verify that our message lies in the RPCSEC_GSS window */ if (rpcauth_xmit_need_reencode(task)) { - task->tk_status = -EBADMSG; + status = -EBADMSG; goto out_dequeue; } } @@ -1256,12 +1263,11 @@ void xprt_transmit(struct rpc_task *task) req->rq_ntrans++; connect_cookie = xprt->connect_cookie; - status = xprt->ops->send_request(req, task); + status = xprt->ops->send_request(req, snd_task); trace_xprt_transmit(xprt, req->rq_xid, status); if (status != 0) { req->rq_ntrans--; - task->tk_status = status; - return; + return status; } if (is_retrans) @@ -1283,6 +1289,49 @@ void xprt_transmit(struct rpc_task *task) req->rq_connect_cookie = connect_cookie; out_dequeue: xprt_request_dequeue_transmit(task); + rpc_wake_up_queued_task_set_status(&xprt->sending, task, status); + return status; +} + +/** + * xprt_transmit - send an RPC request on a transport + * @task: controlling RPC task + * + * Attempts to drain the transmit queue. On exit, either the transport + * signalled an error that needs to be handled before transmission can + * resume, or @task finished transmitting, and detected that it already + * received a reply. 
+ */ +void +xprt_transmit(struct rpc_task *task) +{ + struct rpc_rqst *next, *req = task->tk_rqstp; + struct rpc_xprt *xprt = req->rq_xprt; + int status; + + spin_lock(&xprt->queue_lock); + while (!list_empty(&xprt->xmit_queue)) { + next = list_first_entry(&xprt->xmit_queue, + struct rpc_rqst, rq_xmit); + xprt_pin_rqst(next); + spin_unlock(&xprt->queue_lock); + status = xprt_request_transmit(next, task); + if (status == -EBADMSG && next != req) + status = 0; + cond_resched(); + spin_lock(&xprt->queue_lock); + xprt_unpin_rqst(next); + if (status == 0) { + if (!xprt_request_data_received(task) || + test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) + continue; + } else if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) + rpc_wake_up_queued_task(&xprt->pending, task); + else + task->tk_status = status; + break; + } + spin_unlock(&xprt->queue_lock); } static void xprt_add_backlog(struct rpc_xprt *xprt, struct rpc_task *task) -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 30/44] SUNRPC: Allow soft RPC calls to time out when waiting for the XPRT_LOCK 2018-09-17 13:03 ` [PATCH v3 29/44] SUNRPC: Allow calls to xprt_transmit() to drain the entire " Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 31/44] SUNRPC: Turn off throttling of RPC slots for TCP sockets Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs This no longer causes them to lose their place in the transmission queue. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- net/sunrpc/xprt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index ae1109c7b9b4..a523e59a074e 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -195,7 +195,7 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task) out_sleep: dprintk("RPC: %5u failed to lock transport %p\n", task->tk_pid, xprt); - task->tk_timeout = 0; + task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0; task->tk_status = -EAGAIN; if (req == NULL) priority = RPC_PRIORITY_LOW; @@ -274,7 +274,7 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task) xprt_clear_locked(xprt); out_sleep: dprintk("RPC: %5u failed to lock transport %p\n", task->tk_pid, xprt); - task->tk_timeout = 0; + task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0; task->tk_status = -EAGAIN; if (req == NULL) priority = RPC_PRIORITY_LOW; -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 31/44] SUNRPC: Turn off throttling of RPC slots for TCP sockets 2018-09-17 13:03 ` [PATCH v3 30/44] SUNRPC: Allow soft RPC calls to time out when waiting for the XPRT_LOCK Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 32/44] SUNRPC: Clean up transport write space handling Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs The theory was that we would need to grab the socket lock anyway, so we might as well use it to gate the allocation of RPC slots for a TCP socket. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/xprt.h | 1 - net/sunrpc/xprt.c | 14 -------------- net/sunrpc/xprtsock.c | 2 +- 3 files changed, 1 insertion(+), 16 deletions(-) diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index 0d0cc127615e..14c9b4d49fb4 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -343,7 +343,6 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task); void xprt_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task); void xprt_free_slot(struct rpc_xprt *xprt, struct rpc_rqst *req); -void xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task); bool xprt_prepare_transmit(struct rpc_task *task); void xprt_request_enqueue_transmit(struct rpc_task *task); void xprt_request_enqueue_receive(struct rpc_task *task); diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index a523e59a074e..6bdc10147297 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1428,20 +1428,6 @@ void xprt_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task) } EXPORT_SYMBOL_GPL(xprt_alloc_slot); -void xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task) -{ - /* Note: grabbing the xprt_lock_write() ensures that we throttle - * new slot allocation if the transport is congested (i.e. 
when - * reconnecting a stream transport or when out of socket write - * buffer space). - */ - if (xprt_lock_write(xprt, task)) { - xprt_alloc_slot(xprt, task); - xprt_release_write(xprt, task); - } -} -EXPORT_SYMBOL_GPL(xprt_lock_and_alloc_slot); - void xprt_free_slot(struct rpc_xprt *xprt, struct rpc_rqst *req) { spin_lock(&xprt->reserve_lock); diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 8831e84a058a..f54e8110f4c6 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -2809,7 +2809,7 @@ static const struct rpc_xprt_ops xs_udp_ops = { static const struct rpc_xprt_ops xs_tcp_ops = { .reserve_xprt = xprt_reserve_xprt, .release_xprt = xprt_release_xprt, - .alloc_slot = xprt_lock_and_alloc_slot, + .alloc_slot = xprt_alloc_slot, .free_slot = xprt_free_slot, .rpcbind = rpcb_getport_async, .set_port = xs_set_port, -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 32/44] SUNRPC: Clean up transport write space handling 2018-09-17 13:03 ` [PATCH v3 31/44] SUNRPC: Turn off throttling of RPC slots for TCP sockets Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 33/44] SUNRPC: Cleanup: remove the unused 'task' argument from the request_send() Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs Treat socket write space handling in the same way we now treat transport congestion: by denying the XPRT_LOCK until the transport signals that it has free buffer space. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/svc_xprt.h | 1 - include/linux/sunrpc/xprt.h | 5 +- net/sunrpc/clnt.c | 28 +++----- net/sunrpc/svc_xprt.c | 2 - net/sunrpc/xprt.c | 77 +++++++++++++--------- net/sunrpc/xprtrdma/rpc_rdma.c | 2 +- net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 7 +- net/sunrpc/xprtsock.c | 33 ++++------ 8 files changed, 73 insertions(+), 82 deletions(-) diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index c3d72066d4b1..6b7a86c4d6e6 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -84,7 +84,6 @@ struct svc_xprt { struct sockaddr_storage xpt_remote; /* remote peer's address */ size_t xpt_remotelen; /* length of address */ char xpt_remotebuf[INET6_ADDRSTRLEN + 10]; - struct rpc_wait_queue xpt_bc_pending; /* backchannel wait queue */ struct list_head xpt_users; /* callbacks on free */ struct net *xpt_net; diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index 14c9b4d49fb4..5600242ccbf9 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -387,8 +387,8 @@ int xprt_load_transport(const char *); void xprt_set_retrans_timeout_def(struct rpc_task *task); void xprt_set_retrans_timeout_rtt(struct rpc_task *task); void xprt_wake_pending_tasks(struct rpc_xprt *xprt, int status); 
-void xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action action); -void xprt_write_space(struct rpc_xprt *xprt); +void xprt_wait_for_buffer_space(struct rpc_xprt *xprt); +bool xprt_write_space(struct rpc_xprt *xprt); void xprt_adjust_cwnd(struct rpc_xprt *xprt, struct rpc_task *task, int result); struct rpc_rqst * xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid); void xprt_update_rtt(struct rpc_task *task); @@ -416,6 +416,7 @@ void xprt_unlock_connect(struct rpc_xprt *, void *); #define XPRT_CLOSING (6) #define XPRT_CONGESTED (9) #define XPRT_CWND_WAIT (10) +#define XPRT_WRITE_SPACE (11) static inline void xprt_set_connected(struct rpc_xprt *xprt) { diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index f03911f84953..0c4b2e7d791f 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1964,13 +1964,14 @@ call_transmit(struct rpc_task *task) { dprint_status(task); + task->tk_status = 0; + if (test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) { + if (!xprt_prepare_transmit(task)) + return; + xprt_transmit(task); + } task->tk_action = call_transmit_status; - if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) - return; - - if (!xprt_prepare_transmit(task)) - return; - xprt_transmit(task); + xprt_end_transmit(task); } /* @@ -1986,7 +1987,6 @@ call_transmit_status(struct rpc_task *task) * test first. 
*/ if (task->tk_status == 0) { - xprt_end_transmit(task); xprt_request_wait_receive(task); return; } @@ -1994,15 +1994,8 @@ call_transmit_status(struct rpc_task *task) switch (task->tk_status) { default: dprint_status(task); - xprt_end_transmit(task); - break; - case -EBADSLT: - xprt_end_transmit(task); - task->tk_action = call_transmit; - task->tk_status = 0; break; case -EBADMSG: - xprt_end_transmit(task); task->tk_status = 0; task->tk_action = call_encode; break; @@ -2015,6 +2008,7 @@ call_transmit_status(struct rpc_task *task) case -ENOBUFS: rpc_delay(task, HZ>>2); /* fall through */ + case -EBADSLT: case -EAGAIN: task->tk_action = call_transmit; task->tk_status = 0; @@ -2026,7 +2020,6 @@ call_transmit_status(struct rpc_task *task) case -ENETUNREACH: case -EPERM: if (RPC_IS_SOFTCONN(task)) { - xprt_end_transmit(task); if (!task->tk_msg.rpc_proc->p_proc) trace_xprt_ping(task->tk_xprt, task->tk_status); @@ -2069,9 +2062,6 @@ call_bc_transmit(struct rpc_task *task) xprt_transmit(task); - if (task->tk_status == -EAGAIN) - goto out_retry; - xprt_end_transmit(task); dprint_status(task); switch (task->tk_status) { @@ -2087,6 +2077,8 @@ call_bc_transmit(struct rpc_task *task) case -ENOTCONN: case -EPIPE: break; + case -EAGAIN: + goto out_retry; case -ETIMEDOUT: /* * Problem reaching the server. 
Disconnect and let the diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 5185efb9027b..87533fbb96cf 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -171,7 +171,6 @@ void svc_xprt_init(struct net *net, struct svc_xprt_class *xcl, mutex_init(&xprt->xpt_mutex); spin_lock_init(&xprt->xpt_lock); set_bit(XPT_BUSY, &xprt->xpt_flags); - rpc_init_wait_queue(&xprt->xpt_bc_pending, "xpt_bc_pending"); xprt->xpt_net = get_net(net); strcpy(xprt->xpt_remotebuf, "uninitialized"); } @@ -895,7 +894,6 @@ int svc_send(struct svc_rqst *rqstp) else len = xprt->xpt_ops->xpo_sendto(rqstp); mutex_unlock(&xprt->xpt_mutex); - rpc_wake_up(&xprt->xpt_bc_pending); trace_svc_send(rqstp, len); svc_xprt_release(rqstp); diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 6bdc10147297..e4d57f5be5e2 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -169,6 +169,17 @@ int xprt_load_transport(const char *transport_name) } EXPORT_SYMBOL_GPL(xprt_load_transport); +static void xprt_clear_locked(struct rpc_xprt *xprt) +{ + xprt->snd_task = NULL; + if (!test_bit(XPRT_CLOSE_WAIT, &xprt->state)) { + smp_mb__before_atomic(); + clear_bit(XPRT_LOCKED, &xprt->state); + smp_mb__after_atomic(); + } else + queue_work(xprtiod_workqueue, &xprt->task_cleanup); +} + /** * xprt_reserve_xprt - serialize write access to transports * @task: task that is requesting access to the transport @@ -188,10 +199,14 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task) return 1; goto out_sleep; } + if (test_bit(XPRT_WRITE_SPACE, &xprt->state)) + goto out_unlock; xprt->snd_task = task; return 1; +out_unlock: + xprt_clear_locked(xprt); out_sleep: dprintk("RPC: %5u failed to lock transport %p\n", task->tk_pid, xprt); @@ -208,17 +223,6 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task) } EXPORT_SYMBOL_GPL(xprt_reserve_xprt); -static void xprt_clear_locked(struct rpc_xprt *xprt) -{ - xprt->snd_task = NULL; - if (!test_bit(XPRT_CLOSE_WAIT, &xprt->state)) { - 
smp_mb__before_atomic(); - clear_bit(XPRT_LOCKED, &xprt->state); - smp_mb__after_atomic(); - } else - queue_work(xprtiod_workqueue, &xprt->task_cleanup); -} - static bool xprt_need_congestion_window_wait(struct rpc_xprt *xprt) { @@ -267,10 +271,13 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task) xprt->snd_task = task; return 1; } + if (test_bit(XPRT_WRITE_SPACE, &xprt->state)) + goto out_unlock; if (!xprt_need_congestion_window_wait(xprt)) { xprt->snd_task = task; return 1; } +out_unlock: xprt_clear_locked(xprt); out_sleep: dprintk("RPC: %5u failed to lock transport %p\n", task->tk_pid, xprt); @@ -309,10 +316,12 @@ static void __xprt_lock_write_next(struct rpc_xprt *xprt) { if (test_and_set_bit(XPRT_LOCKED, &xprt->state)) return; - + if (test_bit(XPRT_WRITE_SPACE, &xprt->state)) + goto out_unlock; if (rpc_wake_up_first_on_wq(xprtiod_workqueue, &xprt->sending, __xprt_lock_write_func, xprt)) return; +out_unlock: xprt_clear_locked(xprt); } @@ -320,6 +329,8 @@ static void __xprt_lock_write_next_cong(struct rpc_xprt *xprt) { if (test_and_set_bit(XPRT_LOCKED, &xprt->state)) return; + if (test_bit(XPRT_WRITE_SPACE, &xprt->state)) + goto out_unlock; if (xprt_need_congestion_window_wait(xprt)) goto out_unlock; if (rpc_wake_up_first_on_wq(xprtiod_workqueue, &xprt->sending, @@ -510,39 +521,46 @@ EXPORT_SYMBOL_GPL(xprt_wake_pending_tasks); /** * xprt_wait_for_buffer_space - wait for transport output buffer to clear - * @task: task to be put to sleep - * @action: function pointer to be executed after wait + * @xprt: transport * * Note that we only set the timer for the case of RPC_IS_SOFT(), since * we don't in general want to force a socket disconnection due to * an incomplete RPC call transmission. 
*/ -void xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action action) +void xprt_wait_for_buffer_space(struct rpc_xprt *xprt) { - struct rpc_rqst *req = task->tk_rqstp; - struct rpc_xprt *xprt = req->rq_xprt; - - task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0; - rpc_sleep_on(&xprt->pending, task, action); + set_bit(XPRT_WRITE_SPACE, &xprt->state); } EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space); +static bool +xprt_clear_write_space_locked(struct rpc_xprt *xprt) +{ + if (test_and_clear_bit(XPRT_WRITE_SPACE, &xprt->state)) { + __xprt_lock_write_next(xprt); + dprintk("RPC: write space: waking waiting task on " + "xprt %p\n", xprt); + return true; + } + return false; +} + /** * xprt_write_space - wake the task waiting for transport output buffer space * @xprt: transport with waiting tasks * * Can be called in a soft IRQ context, so xprt_write_space never sleeps. */ -void xprt_write_space(struct rpc_xprt *xprt) +bool xprt_write_space(struct rpc_xprt *xprt) { + bool ret; + + if (!test_bit(XPRT_WRITE_SPACE, &xprt->state)) + return false; spin_lock_bh(&xprt->transport_lock); - if (xprt->snd_task) { - dprintk("RPC: write space: waking waiting task on " - "xprt %p\n", xprt); - rpc_wake_up_queued_task_on_wq(xprtiod_workqueue, - &xprt->pending, xprt->snd_task); - } + ret = xprt_clear_write_space_locked(xprt); spin_unlock_bh(&xprt->transport_lock); + return ret; } EXPORT_SYMBOL_GPL(xprt_write_space); @@ -653,6 +671,7 @@ void xprt_disconnect_done(struct rpc_xprt *xprt) dprintk("RPC: disconnected transport %p\n", xprt); spin_lock_bh(&xprt->transport_lock); xprt_clear_connected(xprt); + xprt_clear_write_space_locked(xprt); xprt_wake_pending_tasks(xprt, -EAGAIN); spin_unlock_bh(&xprt->transport_lock); } @@ -1325,9 +1344,7 @@ xprt_transmit(struct rpc_task *task) if (!xprt_request_data_received(task) || test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) continue; - } else if (!test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) - 
rpc_wake_up_queued_task(&xprt->pending, task); - else + } else if (test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) task->tk_status = status; break; } diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c index 0020dc401215..53fa95d60015 100644 --- a/net/sunrpc/xprtrdma/rpc_rdma.c +++ b/net/sunrpc/xprtrdma/rpc_rdma.c @@ -866,7 +866,7 @@ rpcrdma_marshal_req(struct rpcrdma_xprt *r_xprt, struct rpc_rqst *rqst) out_err: switch (ret) { case -EAGAIN: - xprt_wait_for_buffer_space(rqst->rq_task, NULL); + xprt_wait_for_buffer_space(rqst->rq_xprt); break; case -ENOBUFS: break; diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c index d1618c70edb4..35a8c3aab302 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c @@ -224,12 +224,7 @@ xprt_rdma_bc_send_request(struct rpc_rqst *rqst, struct rpc_task *task) dprintk("svcrdma: sending bc call with xid: %08x\n", be32_to_cpu(rqst->rq_xid)); - if (!mutex_trylock(&sxprt->xpt_mutex)) { - rpc_sleep_on(&sxprt->xpt_bc_pending, task, NULL); - if (!mutex_trylock(&sxprt->xpt_mutex)) - return -EAGAIN; - rpc_wake_up_queued_task(&sxprt->xpt_bc_pending, task); - } + mutex_lock(&sxprt->xpt_mutex); ret = -ENOTCONN; rdma = container_of(sxprt, struct svcxprt_rdma, sc_xprt); diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index f54e8110f4c6..ef8d0e81cbda 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -440,20 +440,12 @@ static int xs_sendpages(struct socket *sock, struct sockaddr *addr, int addrlen, return err; } -static void xs_nospace_callback(struct rpc_task *task) -{ - struct sock_xprt *transport = container_of(task->tk_rqstp->rq_xprt, struct sock_xprt, xprt); - - transport->inet->sk_write_pending--; -} - /** - * xs_nospace - place task on wait queue if transmit was incomplete + * xs_nospace - handle transmit was incomplete * @req: pointer to RPC request - * @task: task to put to sleep * */ 
-static int xs_nospace(struct rpc_rqst *req, struct rpc_task *task) +static int xs_nospace(struct rpc_rqst *req) { struct rpc_xprt *xprt = req->rq_xprt; struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); @@ -461,7 +453,8 @@ static int xs_nospace(struct rpc_rqst *req, struct rpc_task *task) int ret = -EAGAIN; dprintk("RPC: %5u xmit incomplete (%u left of %u)\n", - task->tk_pid, req->rq_slen - transport->xmit.offset, + req->rq_task->tk_pid, + req->rq_slen - transport->xmit.offset, req->rq_slen); /* Protect against races with write_space */ @@ -471,7 +464,7 @@ static int xs_nospace(struct rpc_rqst *req, struct rpc_task *task) if (xprt_connected(xprt)) { /* wait for more buffer space */ sk->sk_write_pending++; - xprt_wait_for_buffer_space(task, xs_nospace_callback); + xprt_wait_for_buffer_space(xprt); } else ret = -ENOTCONN; @@ -569,7 +562,7 @@ static int xs_local_send_request(struct rpc_rqst *req, struct rpc_task *task) case -ENOBUFS: break; case -EAGAIN: - status = xs_nospace(req, task); + status = xs_nospace(req); break; default: dprintk("RPC: sendmsg returned unrecognized error %d\n", @@ -642,7 +635,7 @@ static int xs_udp_send_request(struct rpc_rqst *req, struct rpc_task *task) /* Should we call xs_close() here? */ break; case -EAGAIN: - status = xs_nospace(req, task); + status = xs_nospace(req); break; case -ENETUNREACH: case -ENOBUFS: @@ -765,7 +758,7 @@ static int xs_tcp_send_request(struct rpc_rqst *req, struct rpc_task *task) /* Should we call xs_close() here? 
*/ break; case -EAGAIN: - status = xs_nospace(req, task); + status = xs_nospace(req); break; case -ECONNRESET: case -ECONNREFUSED: @@ -1672,7 +1665,8 @@ static void xs_write_space(struct sock *sk) if (!wq || test_and_clear_bit(SOCKWQ_ASYNC_NOSPACE, &wq->flags) == 0) goto out; - xprt_write_space(xprt); + if (xprt_write_space(xprt)) + sk->sk_write_pending--; out: rcu_read_unlock(); } @@ -2725,12 +2719,7 @@ static int bc_send_request(struct rpc_rqst *req, struct rpc_task *task) * Grab the mutex to serialize data as the connection is shared * with the fore channel */ - if (!mutex_trylock(&xprt->xpt_mutex)) { - rpc_sleep_on(&xprt->xpt_bc_pending, task, NULL); - if (!mutex_trylock(&xprt->xpt_mutex)) - return -EAGAIN; - rpc_wake_up_queued_task(&xprt->xpt_bc_pending, task); - } + mutex_lock(&xprt->xpt_mutex); if (test_bit(XPT_DEAD, &xprt->xpt_flags)) len = -ENOTCONN; else -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
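The write-space rework in patch 32 above replaces the old rpc_sleep_on()-based wait with a single XPRT_WRITE_SPACE state bit: a sender that fills the socket buffer sets the bit and backs off, and the socket's write_space callback clears it exactly once and hands the lock to the next queued sender. A minimal user-space sketch of that handshake (hypothetical `model_*` names; plain bit operations stand in for the kernel's set_bit()/test_and_clear_bit()):

```c
#include <assert.h>
#include <stdbool.h>

#define XPRT_WRITE_SPACE 0x01

/* Toy model of the reworked handshake: instead of sleeping on the
 * ->pending wait queue, a sender that runs out of socket buffer just
 * sets XPRT_WRITE_SPACE; the write_space callback clears it exactly
 * once and reports whether a wakeup is owed to the sending queue. */
struct xprt_model {
	unsigned int state;	/* stands in for rpc_xprt::state bits */
};

/* Was: rpc_sleep_on(&xprt->pending, task, action). */
static void model_wait_for_buffer_space(struct xprt_model *x)
{
	x->state |= XPRT_WRITE_SPACE;
}

/* Stands in for test_and_clear_bit() + __xprt_lock_write_next(). */
static bool model_write_space(struct xprt_model *x)
{
	if (!(x->state & XPRT_WRITE_SPACE))
		return false;	/* no sender is waiting for space */
	x->state &= ~XPRT_WRITE_SPACE;
	return true;		/* caller wakes the next queued sender */
}

/* New lock takers back off while a write-space wait is pending,
 * mirroring the "goto out_unlock" checks added to reserve_xprt. */
static bool model_can_take_lock(const struct xprt_model *x)
{
	return !(x->state & XPRT_WRITE_SPACE);
}
```

The boolean return is what lets the xs_write_space() hunk decrement sk_write_pending only when a waiter was actually present.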
* [PATCH v3 33/44] SUNRPC: Cleanup: remove the unused 'task' argument from the request_send() 2018-09-17 13:03 ` [PATCH v3 32/44] SUNRPC: Clean up transport write space handling Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 34/44] SUNRPC: Don't take transport->lock unnecessarily when taking XPRT_LOCK Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/xprt.h | 2 +- net/sunrpc/xprt.c | 2 +- net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 2 +- net/sunrpc/xprtrdma/transport.c | 4 ++-- net/sunrpc/xprtsock.c | 11 ++++------- 5 files changed, 9 insertions(+), 12 deletions(-) diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index 5600242ccbf9..823860cce0bc 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -141,7 +141,7 @@ struct rpc_xprt_ops { void (*connect)(struct rpc_xprt *xprt, struct rpc_task *task); int (*buf_alloc)(struct rpc_task *task); void (*buf_free)(struct rpc_task *task); - int (*send_request)(struct rpc_rqst *req, struct rpc_task *task); + int (*send_request)(struct rpc_rqst *req); void (*set_retrans_timeout)(struct rpc_task *task); void (*timer)(struct rpc_xprt *xprt, struct rpc_task *task); void (*release_request)(struct rpc_task *task); diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index e4d57f5be5e2..d1ea88b3f9d4 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1282,7 +1282,7 @@ xprt_request_transmit(struct rpc_rqst *req, struct rpc_task *snd_task) req->rq_ntrans++; connect_cookie = xprt->connect_cookie; - status = xprt->ops->send_request(req, snd_task); + status = xprt->ops->send_request(req); trace_xprt_transmit(xprt, req->rq_xid, status); if (status != 0) { req->rq_ntrans--; diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c index 
35a8c3aab302..992312504cfd 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c @@ -215,7 +215,7 @@ rpcrdma_bc_send_request(struct svcxprt_rdma *rdma, struct rpc_rqst *rqst) * connection. */ static int -xprt_rdma_bc_send_request(struct rpc_rqst *rqst, struct rpc_task *task) +xprt_rdma_bc_send_request(struct rpc_rqst *rqst) { struct svc_xprt *sxprt = rqst->rq_xprt->bc_xprt; struct svcxprt_rdma *rdma; diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 9ff322e53f37..a5a6a4a353f2 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -693,7 +693,7 @@ xprt_rdma_free(struct rpc_task *task) /** * xprt_rdma_send_request - marshal and send an RPC request - * @task: RPC task with an RPC message in rq_snd_buf + * @rqst: RPC message in rq_snd_buf * * Caller holds the transport's write lock. * @@ -706,7 +706,7 @@ xprt_rdma_free(struct rpc_task *task) * sent. Do not try to send this message again. 
*/ static int -xprt_rdma_send_request(struct rpc_rqst *rqst, struct rpc_task *task) +xprt_rdma_send_request(struct rpc_rqst *rqst) { struct rpc_xprt *xprt = rqst->rq_xprt; struct rpcrdma_req *req = rpcr_to_rdmar(rqst); diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index ef8d0e81cbda..f16406228ead 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -507,7 +507,6 @@ static inline void xs_encode_stream_record_marker(struct xdr_buf *buf) /** * xs_local_send_request - write an RPC request to an AF_LOCAL socket * @req: pointer to RPC request - * @task: RPC task that manages the state of an RPC request * * Return values: * 0: The request has been sent @@ -516,7 +515,7 @@ static inline void xs_encode_stream_record_marker(struct xdr_buf *buf) * ENOTCONN: Caller needs to invoke connect logic then call again * other: Some other error occured, the request was not sent */ -static int xs_local_send_request(struct rpc_rqst *req, struct rpc_task *task) +static int xs_local_send_request(struct rpc_rqst *req) { struct rpc_xprt *xprt = req->rq_xprt; struct sock_xprt *transport = @@ -579,7 +578,6 @@ static int xs_local_send_request(struct rpc_rqst *req, struct rpc_task *task) /** * xs_udp_send_request - write an RPC request to a UDP socket * @req: pointer to RPC request - * @task: address of RPC task that manages the state of an RPC request * * Return values: * 0: The request has been sent @@ -588,7 +586,7 @@ static int xs_local_send_request(struct rpc_rqst *req, struct rpc_task *task) * ENOTCONN: Caller needs to invoke connect logic then call again * other: Some other error occurred, the request was not sent */ -static int xs_udp_send_request(struct rpc_rqst *req, struct rpc_task *task) +static int xs_udp_send_request(struct rpc_rqst *req) { struct rpc_xprt *xprt = req->rq_xprt; struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); @@ -656,7 +654,6 @@ static int xs_udp_send_request(struct rpc_rqst *req, struct rpc_task *task) /** * 
xs_tcp_send_request - write an RPC request to a TCP socket * @req: pointer to RPC request - * @task: address of RPC task that manages the state of an RPC request * * Return values: * 0: The request has been sent @@ -668,7 +665,7 @@ static int xs_udp_send_request(struct rpc_rqst *req, struct rpc_task *task) * XXX: In the case of soft timeouts, should we eventually give up * if sendmsg is not able to make progress? */ -static int xs_tcp_send_request(struct rpc_rqst *req, struct rpc_task *task) +static int xs_tcp_send_request(struct rpc_rqst *req) { struct rpc_xprt *xprt = req->rq_xprt; struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); @@ -2704,7 +2701,7 @@ static int bc_sendto(struct rpc_rqst *req) /* * The send routine. Borrows from svc_send */ -static int bc_send_request(struct rpc_rqst *req, struct rpc_task *task) +static int bc_send_request(struct rpc_rqst *req) { struct svc_xprt *xprt; int len; -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 34/44] SUNRPC: Don't take transport->lock unnecessarily when taking XPRT_LOCK
  2018-09-17 13:03 ` [PATCH v3 33/44] SUNRPC: Cleanup: remove the unused 'task' argument from the request_send() Trond Myklebust
@ 2018-09-17 13:03 ` Trond Myklebust
  2018-09-17 13:03   ` [PATCH v3 35/44] SUNRPC: Convert xprt receive queue to use an rbtree Trond Myklebust

  0 siblings, 1 reply; 76+ messages in thread
From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw)
  To: linux-nfs

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/xprt.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index d1ea88b3f9d4..a1cb28a4adad 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -298,6 +298,8 @@ static inline int xprt_lock_write(struct rpc_xprt *xprt, struct rpc_task *task)
 {
 	int retval;
 
+	if (test_bit(XPRT_LOCKED, &xprt->state) && xprt->snd_task == task)
+		return 1;
 	spin_lock_bh(&xprt->transport_lock);
 	retval = xprt->ops->reserve_xprt(xprt, task);
 	spin_unlock_bh(&xprt->transport_lock);
@@ -375,6 +377,8 @@ EXPORT_SYMBOL_GPL(xprt_release_xprt_cong);
 
 static inline void xprt_release_write(struct rpc_xprt *xprt, struct rpc_task *task)
 {
+	if (xprt->snd_task != task)
+		return;
 	spin_lock_bh(&xprt->transport_lock);
 	xprt->ops->release_xprt(xprt, task);
 	spin_unlock_bh(&xprt->transport_lock);
@@ -1644,8 +1648,7 @@ void xprt_release(struct rpc_task *task)
 	if (req == NULL) {
 		if (task->tk_client) {
 			xprt = task->tk_xprt;
-			if (xprt->snd_task == task)
-				xprt_release_write(xprt, task);
+			xprt_release_write(xprt, task);
 		}
 		return;
 	}
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread
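The two hunks in patch 34 add a lockless fast path: reading xprt->snd_task without transport_lock is safe for the owning task, because only that task can set the field to itself, and it is only cleared while that task holds the write lock. A single-threaded sketch of the idea (hypothetical `model_*` names; a counter stands in for the spinlock round trips being avoided):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded model of the fast path: a task that already owns the
 * transport write lock may observe its own ownership without taking
 * transport_lock, since no other thread can make snd_task equal to
 * "me", and only "me" can clear it while I am the owner. */
struct xprt_model {
	int slowpath_locks;	/* how often the "spinlock" was taken */
	void *snd_task;		/* current owner, NULL when unlocked */
};

static bool model_lock_write(struct xprt_model *x, void *task)
{
	if (x->snd_task == task)
		return true;		/* fast path: no lock traffic */
	x->slowpath_locks++;		/* spin_lock_bh(&transport_lock) */
	if (x->snd_task == NULL) {
		x->snd_task = task;
		return true;
	}
	return false;			/* contended; caller would sleep */
}

static void model_release_write(struct xprt_model *x, void *task)
{
	if (x->snd_task != task)
		return;			/* not ours: skip the lock entirely */
	x->slowpath_locks++;		/* slow path for the real release */
	x->snd_task = NULL;
}
```

The early-return in release is also what lets the xprt_release() hunk drop its own `snd_task == task` check: the helper is now safe to call unconditionally.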
* [PATCH v3 35/44] SUNRPC: Convert xprt receive queue to use an rbtree 2018-09-17 13:03 ` [PATCH v3 34/44] SUNRPC: Don't take transport->lock unnecessarily when taking XPRT_LOCK Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 36/44] SUNRPC: Fix priority queue fairness Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs If the server is slow, we can find ourselves with quite a lot of entries on the receive queue. Converting the search from an O(n) to O(log(n)) can make a significant difference, particularly since we have to hold a number of locks while searching. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/xprt.h | 4 +- net/sunrpc/xprt.c | 93 ++++++++++++++++++++++++++++++++----- 2 files changed, 84 insertions(+), 13 deletions(-) diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index 823860cce0bc..9be399020dab 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -85,7 +85,7 @@ struct rpc_rqst { union { struct list_head rq_list; /* Slot allocation list */ - struct list_head rq_recv; /* Receive queue */ + struct rb_node rq_recv; /* Receive queue */ }; struct list_head rq_xmit; /* Send queue */ @@ -260,7 +260,7 @@ struct rpc_xprt { * backchannel rpc_rqst's */ #endif /* CONFIG_SUNRPC_BACKCHANNEL */ - struct list_head recv_queue; /* Receive queue */ + struct rb_root recv_queue; /* Receive queue */ struct { unsigned long bind_count, /* total number of binds */ diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index a1cb28a4adad..051638d5b39c 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -753,7 +753,7 @@ static void xprt_schedule_autodisconnect(struct rpc_xprt *xprt) __must_hold(&xprt->transport_lock) { - if (list_empty(&xprt->recv_queue) && xprt_has_timer(xprt)) + if (RB_EMPTY_ROOT(&xprt->recv_queue) && xprt_has_timer(xprt)) mod_timer(&xprt->timer, 
xprt->last_used + xprt->idle_timeout); } @@ -763,7 +763,7 @@ xprt_init_autodisconnect(struct timer_list *t) struct rpc_xprt *xprt = from_timer(xprt, t, timer); spin_lock(&xprt->transport_lock); - if (!list_empty(&xprt->recv_queue)) + if (!RB_EMPTY_ROOT(&xprt->recv_queue)) goto out_abort; /* Reset xprt->last_used to avoid connect/autodisconnect cycling */ xprt->last_used = jiffies; @@ -880,6 +880,75 @@ static void xprt_connect_status(struct rpc_task *task) } } +enum xprt_xid_rb_cmp { + XID_RB_EQUAL, + XID_RB_LEFT, + XID_RB_RIGHT, +}; +static enum xprt_xid_rb_cmp +xprt_xid_cmp(__be32 xid1, __be32 xid2) +{ + if (xid1 == xid2) + return XID_RB_EQUAL; + if ((__force u32)xid1 < (__force u32)xid2) + return XID_RB_LEFT; + return XID_RB_RIGHT; +} + +static struct rpc_rqst * +xprt_request_rb_find(struct rpc_xprt *xprt, __be32 xid) +{ + struct rb_node *n = xprt->recv_queue.rb_node; + struct rpc_rqst *req; + + while (n != NULL) { + req = rb_entry(n, struct rpc_rqst, rq_recv); + switch (xprt_xid_cmp(xid, req->rq_xid)) { + case XID_RB_LEFT: + n = n->rb_left; + break; + case XID_RB_RIGHT: + n = n->rb_right; + break; + case XID_RB_EQUAL: + return req; + } + } + return NULL; +} + +static void +xprt_request_rb_insert(struct rpc_xprt *xprt, struct rpc_rqst *new) +{ + struct rb_node **p = &xprt->recv_queue.rb_node; + struct rb_node *n = NULL; + struct rpc_rqst *req; + + while (*p != NULL) { + n = *p; + req = rb_entry(n, struct rpc_rqst, rq_recv); + switch(xprt_xid_cmp(new->rq_xid, req->rq_xid)) { + case XID_RB_LEFT: + p = &n->rb_left; + break; + case XID_RB_RIGHT: + p = &n->rb_right; + break; + case XID_RB_EQUAL: + WARN_ON_ONCE(new != req); + return; + } + } + rb_link_node(&new->rq_recv, n, p); + rb_insert_color(&new->rq_recv, &xprt->recv_queue); +} + +static void +xprt_request_rb_remove(struct rpc_xprt *xprt, struct rpc_rqst *req) +{ + rb_erase(&req->rq_recv, &xprt->recv_queue); +} + /** * xprt_lookup_rqst - find an RPC request corresponding to an XID * @xprt: transport on which the 
original request was transmitted @@ -891,12 +960,12 @@ struct rpc_rqst *xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid) { struct rpc_rqst *entry; - list_for_each_entry(entry, &xprt->recv_queue, rq_recv) - if (entry->rq_xid == xid) { - trace_xprt_lookup_rqst(xprt, xid, 0); - entry->rq_rtt = ktime_sub(ktime_get(), entry->rq_xtime); - return entry; - } + entry = xprt_request_rb_find(xprt, xid); + if (entry != NULL) { + trace_xprt_lookup_rqst(xprt, xid, 0); + entry->rq_rtt = ktime_sub(ktime_get(), entry->rq_xtime); + return entry; + } dprintk("RPC: xprt_lookup_rqst did not find xid %08x\n", ntohl(xid)); @@ -980,7 +1049,7 @@ xprt_request_enqueue_receive(struct rpc_task *task) sizeof(req->rq_private_buf)); /* Add request to the receive list */ - list_add_tail(&req->rq_recv, &xprt->recv_queue); + xprt_request_rb_insert(xprt, req); set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate); spin_unlock(&xprt->queue_lock); @@ -998,8 +1067,10 @@ xprt_request_enqueue_receive(struct rpc_task *task) static void xprt_request_dequeue_receive_locked(struct rpc_task *task) { + struct rpc_rqst *req = task->tk_rqstp; + if (test_and_clear_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) - list_del(&task->tk_rqstp->rq_recv); + xprt_request_rb_remove(req->rq_xprt, req); } /** @@ -1710,7 +1781,7 @@ static void xprt_init(struct rpc_xprt *xprt, struct net *net) spin_lock_init(&xprt->queue_lock); INIT_LIST_HEAD(&xprt->free); - INIT_LIST_HEAD(&xprt->recv_queue); + xprt->recv_queue = RB_ROOT; INIT_LIST_HEAD(&xprt->xmit_queue); #if defined(CONFIG_SUNRPC_BACKCHANNEL) spin_lock_init(&xprt->bc_pa_lock); -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
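The new xprt_request_rb_find()/xprt_request_rb_insert() pair in patch 35 relies on the kernel's struct rb_node machinery. Outside the kernel, the same three-way XID walk can be sketched with a plain (unbalanced) binary search tree; the `rqst_*` names here are hypothetical stand-ins, and a real rbtree additionally rebalances on insert to guarantee the O(log n) bound the commit message cites:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct rqst {
	uint32_t xid;			/* compared as u32, like the (__force u32) cast */
	struct rqst *left, *right;
};

/* Mirror of xprt_request_rb_find(): descend left/right on the compare. */
static struct rqst *rqst_find(struct rqst *root, uint32_t xid)
{
	while (root) {
		if (xid == root->xid)
			return root;
		root = (xid < root->xid) ? root->left : root->right;
	}
	return NULL;
}

/* Mirror of xprt_request_rb_insert(): walk to a NULL link and attach.
 * Returns the (possibly new) root. */
static struct rqst *rqst_insert(struct rqst *root, struct rqst *new)
{
	struct rqst **p = &root;

	while (*p) {
		if (new->xid == (*p)->xid)
			return root;	/* duplicate XID: already queued */
		p = (new->xid < (*p)->xid) ? &(*p)->left : &(*p)->right;
	}
	new->left = new->right = NULL;
	*p = new;
	return root;
}

static struct rqst *rqst_new(uint32_t xid)
{
	struct rqst *r = calloc(1, sizeof(*r));

	r->xid = xid;
	return r;
}
```

Each reply's XID lookup then costs a tree descent instead of a scan of every request still waiting for a reply, which matters because the search runs under xprt->queue_lock.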
* [PATCH v3 36/44] SUNRPC: Fix priority queue fairness 2018-09-17 13:03 ` [PATCH v3 35/44] SUNRPC: Convert xprt receive queue to use an rbtree Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 37/44] SUNRPC: Convert the xprt->sending queue back to an ordinary wait queue Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs Fix up the priority queue to not batch by owner, but by queue, so that we allow '1 << priority' elements to be dequeued before switching to the next priority queue. The owner field is still used to wake up requests in round robin order by owner to avoid single processes hogging the RPC layer by loading the queues. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/sched.h | 2 - net/sunrpc/sched.c | 109 +++++++++++++++++------------------ 2 files changed, 54 insertions(+), 57 deletions(-) diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index 8840a420cf4c..7b540c066594 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -189,7 +189,6 @@ struct rpc_timer { struct rpc_wait_queue { spinlock_t lock; struct list_head tasks[RPC_NR_PRIORITY]; /* task queue for each priority level */ - pid_t owner; /* process id of last task serviced */ unsigned char maxpriority; /* maximum priority (0 if queue is not a priority queue) */ unsigned char priority; /* current priority */ unsigned char nr; /* # tasks remaining for cookie */ @@ -205,7 +204,6 @@ struct rpc_wait_queue { * from a single cookie. The aim is to improve * performance of NFS operations such as read/write. 
*/ -#define RPC_BATCH_COUNT 16 #define RPC_IS_PRIORITY(q) ((q)->maxpriority > 0) /* diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index 9a8ec012b449..57ca5bead1cb 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -99,64 +99,78 @@ __rpc_add_timer(struct rpc_wait_queue *queue, struct rpc_task *task) list_add(&task->u.tk_wait.timer_list, &queue->timer_list.list); } -static void rpc_rotate_queue_owner(struct rpc_wait_queue *queue) -{ - struct list_head *q = &queue->tasks[queue->priority]; - struct rpc_task *task; - - if (!list_empty(q)) { - task = list_first_entry(q, struct rpc_task, u.tk_wait.list); - if (task->tk_owner == queue->owner) - list_move_tail(&task->u.tk_wait.list, q); - } -} - static void rpc_set_waitqueue_priority(struct rpc_wait_queue *queue, int priority) { if (queue->priority != priority) { - /* Fairness: rotate the list when changing priority */ - rpc_rotate_queue_owner(queue); queue->priority = priority; + queue->nr = 1U << priority; } } -static void rpc_set_waitqueue_owner(struct rpc_wait_queue *queue, pid_t pid) -{ - queue->owner = pid; - queue->nr = RPC_BATCH_COUNT; -} - static void rpc_reset_waitqueue_priority(struct rpc_wait_queue *queue) { rpc_set_waitqueue_priority(queue, queue->maxpriority); - rpc_set_waitqueue_owner(queue, 0); } /* - * Add new request to a priority queue. 
+ * Add a request to a queue list */ -static void __rpc_add_wait_queue_priority(struct rpc_wait_queue *queue, - struct rpc_task *task, - unsigned char queue_priority) +static void +__rpc_list_enqueue_task(struct list_head *q, struct rpc_task *task) { - struct list_head *q; struct rpc_task *t; - INIT_LIST_HEAD(&task->u.tk_wait.links); - if (unlikely(queue_priority > queue->maxpriority)) - queue_priority = queue->maxpriority; - if (queue_priority > queue->priority) - rpc_set_waitqueue_priority(queue, queue_priority); - q = &queue->tasks[queue_priority]; list_for_each_entry(t, q, u.tk_wait.list) { if (t->tk_owner == task->tk_owner) { - list_add_tail(&task->u.tk_wait.list, &t->u.tk_wait.links); + list_add_tail(&task->u.tk_wait.links, + &t->u.tk_wait.links); + /* Cache the queue head in task->u.tk_wait.list */ + task->u.tk_wait.list.next = q; + task->u.tk_wait.list.prev = NULL; return; } } + INIT_LIST_HEAD(&task->u.tk_wait.links); list_add_tail(&task->u.tk_wait.list, q); } +/* + * Remove request from a queue list + */ +static void +__rpc_list_dequeue_task(struct rpc_task *task) +{ + struct list_head *q; + struct rpc_task *t; + + if (task->u.tk_wait.list.prev == NULL) { + list_del(&task->u.tk_wait.links); + return; + } + if (!list_empty(&task->u.tk_wait.links)) { + t = list_first_entry(&task->u.tk_wait.links, + struct rpc_task, + u.tk_wait.links); + /* Assume __rpc_list_enqueue_task() cached the queue head */ + q = t->u.tk_wait.list.next; + list_add_tail(&t->u.tk_wait.list, q); + list_del(&task->u.tk_wait.links); + } + list_del(&task->u.tk_wait.list); +} + +/* + * Add new request to a priority queue. + */ +static void __rpc_add_wait_queue_priority(struct rpc_wait_queue *queue, + struct rpc_task *task, + unsigned char queue_priority) +{ + if (unlikely(queue_priority > queue->maxpriority)) + queue_priority = queue->maxpriority; + __rpc_list_enqueue_task(&queue->tasks[queue_priority], task); +} + /* * Add new request to wait queue. 
* @@ -194,13 +208,7 @@ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, */ static void __rpc_remove_wait_queue_priority(struct rpc_task *task) { - struct rpc_task *t; - - if (!list_empty(&task->u.tk_wait.links)) { - t = list_entry(task->u.tk_wait.links.next, struct rpc_task, u.tk_wait.list); - list_move(&t->u.tk_wait.list, &task->u.tk_wait.list); - list_splice_init(&task->u.tk_wait.links, &t->u.tk_wait.links); - } + __rpc_list_dequeue_task(task); } /* @@ -212,7 +220,8 @@ static void __rpc_remove_wait_queue(struct rpc_wait_queue *queue, struct rpc_tas __rpc_disable_timer(queue, task); if (RPC_IS_PRIORITY(queue)) __rpc_remove_wait_queue_priority(task); - list_del(&task->u.tk_wait.list); + else + list_del(&task->u.tk_wait.list); queue->qlen--; dprintk("RPC: %5u removed from queue %p \"%s\"\n", task->tk_pid, queue, rpc_qname(queue)); @@ -545,17 +554,9 @@ static struct rpc_task *__rpc_find_next_queued_priority(struct rpc_wait_queue *q * Service a batch of tasks from a single owner. */ q = &queue->tasks[queue->priority]; - if (!list_empty(q)) { - task = list_entry(q->next, struct rpc_task, u.tk_wait.list); - if (queue->owner == task->tk_owner) { - if (--queue->nr) - goto out; - list_move_tail(&task->u.tk_wait.list, q); - } - /* - * Check if we need to switch queues. 
- */ - goto new_owner; + if (!list_empty(q) && --queue->nr) { + task = list_first_entry(q, struct rpc_task, u.tk_wait.list); + goto out; } /* @@ -567,7 +568,7 @@ static struct rpc_task *__rpc_find_next_queued_priority(struct rpc_wait_queue *q else q = q - 1; if (!list_empty(q)) { - task = list_entry(q->next, struct rpc_task, u.tk_wait.list); + task = list_first_entry(q, struct rpc_task, u.tk_wait.list); goto new_queue; } } while (q != &queue->tasks[queue->priority]); @@ -577,8 +578,6 @@ static struct rpc_task *__rpc_find_next_queued_priority(struct rpc_wait_queue *q new_queue: rpc_set_waitqueue_priority(queue, (unsigned int)(q - &queue->tasks[0])); -new_owner: - rpc_set_waitqueue_owner(queue, task->tk_owner); out: return task; } -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
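After patch 36, __rpc_find_next_queued_priority() services up to `1 << priority` tasks from one level before moving on, scanning downward with wrap-around to the top when the batch is spent. A simplified standalone model of that scan (hypothetical `fq_*` names; per-level task lists are collapsed to counters, and the batch budget is always refreshed on a rescan, which glosses over some edge cases in the real code):

```c
#include <assert.h>

#define NR_PRIO 4

struct fair_queue {
	int count[NR_PRIO];	/* tasks queued at each priority level */
	int priority;		/* level currently being serviced */
	int nr;			/* dequeues left in the current batch */
};

/* Returns the priority level a task was dequeued from, or -1 if the
 * queue is empty. Mirrors the batch-per-level behaviour: a level gets
 * (1 << priority) dequeues, then the scan moves on, so higher levels
 * get bigger batches but can no longer starve the lower ones. */
static int fq_dequeue(struct fair_queue *q)
{
	int i, p = q->priority;

	/* Stay on the current level while tasks and budget remain. */
	if (q->count[p] > 0 && q->nr > 0) {
		q->nr--;
		q->count[p]--;
		return p;
	}
	/* Budget spent or level empty: scan downward, wrapping to the
	 * top, and start a fresh batch at whatever level we find. */
	for (i = 1; i <= NR_PRIO; i++) {
		p = (q->priority - i + NR_PRIO) % NR_PRIO;
		if (q->count[p] > 0) {
			q->priority = p;
			q->nr = (1 << p) - 1;	/* one dequeued right now */
			q->count[p]--;
			return p;
		}
	}
	return -1;	/* every level is empty */
}
```

With ten tasks at level 2 and ten at level 0, the dequeue order comes out as four from level 2, one from level 0, four from level 2, and so on: batch sizes scale with priority, but every level keeps making progress.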
* [PATCH v3 37/44] SUNRPC: Convert the xprt->sending queue back to an ordinary wait queue 2018-09-17 13:03 ` [PATCH v3 36/44] SUNRPC: Fix priority queue fairness Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 38/44] SUNRPC: Add a label for RPC calls that require allocation on receive Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs We no longer need priority semantics on the xprt->sending queue, because the order in which tasks are sent is now dictated by their position in the send queue. Note that the backlog queue remains a priority queue, meaning that slot resources are still managed in order of task priority. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- net/sunrpc/xprt.c | 20 +++----------------- 1 file changed, 3 insertions(+), 17 deletions(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 051638d5b39c..d1a67e97e7d3 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -192,7 +192,6 @@ static void xprt_clear_locked(struct rpc_xprt *xprt) int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task) { struct rpc_rqst *req = task->tk_rqstp; - int priority; if (test_and_set_bit(XPRT_LOCKED, &xprt->state)) { if (task == xprt->snd_task) @@ -212,13 +211,7 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task) task->tk_pid, xprt); task->tk_timeout = RPC_IS_SOFT(task) ? 
req->rq_timeout : 0; task->tk_status = -EAGAIN; - if (req == NULL) - priority = RPC_PRIORITY_LOW; - else if (!req->rq_ntrans) - priority = RPC_PRIORITY_NORMAL; - else - priority = RPC_PRIORITY_HIGH; - rpc_sleep_on_priority(&xprt->sending, task, NULL, priority); + rpc_sleep_on(&xprt->sending, task, NULL); return 0; } EXPORT_SYMBOL_GPL(xprt_reserve_xprt); @@ -260,7 +253,6 @@ xprt_test_and_clear_congestion_window_wait(struct rpc_xprt *xprt) int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task) { struct rpc_rqst *req = task->tk_rqstp; - int priority; if (test_and_set_bit(XPRT_LOCKED, &xprt->state)) { if (task == xprt->snd_task) @@ -283,13 +275,7 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task) dprintk("RPC: %5u failed to lock transport %p\n", task->tk_pid, xprt); task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0; task->tk_status = -EAGAIN; - if (req == NULL) - priority = RPC_PRIORITY_LOW; - else if (!req->rq_ntrans) - priority = RPC_PRIORITY_NORMAL; - else - priority = RPC_PRIORITY_HIGH; - rpc_sleep_on_priority(&xprt->sending, task, NULL, priority); + rpc_sleep_on(&xprt->sending, task, NULL); return 0; } EXPORT_SYMBOL_GPL(xprt_reserve_xprt_cong); @@ -1795,7 +1781,7 @@ static void xprt_init(struct rpc_xprt *xprt, struct net *net) rpc_init_wait_queue(&xprt->binding, "xprt_binding"); rpc_init_wait_queue(&xprt->pending, "xprt_pending"); - rpc_init_priority_wait_queue(&xprt->sending, "xprt_sending"); + rpc_init_wait_queue(&xprt->sending, "xprt_sending"); rpc_init_priority_wait_queue(&xprt->backlog, "xprt_backlog"); xprt_init_xid(xprt); -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 38/44] SUNRPC: Add a label for RPC calls that require allocation on receive 2018-09-17 13:03 ` [PATCH v3 37/44] SUNRPC: Convert the xprt->sending queue back to an ordinary wait queue Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 39/44] SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter() Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs If the RPC call relies on the receive call allocating pages as buffers, then let's label it so that we a) Don't leak memory by allocating pages for requests that do not expect this behaviour b) Can optimise for the common case where calls do not require allocation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/nfs3xdr.c | 4 +++- include/linux/sunrpc/xdr.h | 1 + net/sunrpc/auth_gss/gss_rpc_xdr.c | 1 + net/sunrpc/socklib.c | 2 +- 4 files changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c index 64e4fa33d89f..d8c4c10b15f7 100644 --- a/fs/nfs/nfs3xdr.c +++ b/fs/nfs/nfs3xdr.c @@ -1364,10 +1364,12 @@ static void nfs3_xdr_enc_getacl3args(struct rpc_rqst *req, encode_nfs_fh3(xdr, args->fh); encode_uint32(xdr, args->mask); - if (args->mask & (NFS_ACL | NFS_DFACL)) + if (args->mask & (NFS_ACL | NFS_DFACL)) { prepare_reply_buffer(req, args->pages, 0, NFSACL_MAXPAGES << PAGE_SHIFT, ACL3_getaclres_sz); + req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES; + } } static void nfs3_xdr_enc_setacl3args(struct rpc_rqst *req, diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 2bd68177a442..431829233392 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -58,6 +58,7 @@ struct xdr_buf { flags; /* Flags for data disposition */ #define XDRBUF_READ 0x01 /* target of file read */ #define XDRBUF_WRITE 0x02 /* source of file write */ +#define XDRBUF_SPARSE_PAGES 0x04 /* Page array is sparse */ unsigned 
int buflen, /* Total length of storage buffer */ len; /* Length of XDR encoded message */ diff --git a/net/sunrpc/auth_gss/gss_rpc_xdr.c b/net/sunrpc/auth_gss/gss_rpc_xdr.c index 444380f968f1..006062ad5f58 100644 --- a/net/sunrpc/auth_gss/gss_rpc_xdr.c +++ b/net/sunrpc/auth_gss/gss_rpc_xdr.c @@ -784,6 +784,7 @@ void gssx_enc_accept_sec_context(struct rpc_rqst *req, xdr_inline_pages(&req->rq_rcv_buf, PAGE_SIZE/2 /* pretty arbitrary */, arg->pages, 0 /* page base */, arg->npages * PAGE_SIZE); + req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES; done: if (err) dprintk("RPC: gssx_enc_accept_sec_context: %d\n", err); diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c index f217c348b341..08f00a98151f 100644 --- a/net/sunrpc/socklib.c +++ b/net/sunrpc/socklib.c @@ -104,7 +104,7 @@ ssize_t xdr_partial_copy_from_skb(struct xdr_buf *xdr, unsigned int base, struct /* ACL likes to be lazy in allocating pages - ACLs * are small by default but can get huge. */ - if (unlikely(*ppage == NULL)) { + if ((xdr->flags & XDRBUF_SPARSE_PAGES) && *ppage == NULL) { *ppage = alloc_page(GFP_ATOMIC); if (unlikely(*ppage == NULL)) { if (copied == 0) -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 39/44] SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter() 2018-09-17 13:03 ` [PATCH v3 38/44] SUNRPC: Add a label for RPC calls that require allocation on receive Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs Add a bvec array to struct xdr_buf, and have the client allocate it when we need to receive data into pages. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/xdr.h | 7 +++++++ include/linux/sunrpc/xprt.h | 2 ++ net/sunrpc/clnt.c | 4 +++- net/sunrpc/xdr.c | 34 ++++++++++++++++++++++++++++++++++ net/sunrpc/xprt.c | 17 +++++++++++++++++ 5 files changed, 63 insertions(+), 1 deletion(-) diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 431829233392..745587132a87 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -18,6 +18,7 @@ #include <asm/unaligned.h> #include <linux/scatterlist.h> +struct bio_vec; struct rpc_rqst; /* @@ -52,6 +53,7 @@ struct xdr_buf { struct kvec head[1], /* RPC header + non-page data */ tail[1]; /* Appended after page data */ + struct bio_vec *bvec; struct page ** pages; /* Array of pages */ unsigned int page_base, /* Start of page data */ page_len, /* Length of page data */ @@ -70,6 +72,8 @@ xdr_buf_init(struct xdr_buf *buf, void *start, size_t len) buf->head[0].iov_base = start; buf->head[0].iov_len = len; buf->tail[0].iov_len = 0; + buf->bvec = NULL; + buf->pages = NULL; buf->page_len = 0; buf->flags = 0; buf->len = 0; @@ -116,6 +120,9 @@ __be32 *xdr_decode_netobj(__be32 *p, struct xdr_netobj *); void xdr_inline_pages(struct xdr_buf *, unsigned int, struct page **, unsigned int, unsigned int); void xdr_terminate_string(struct xdr_buf *, const u32); +size_t xdr_buf_pagecount(struct 
xdr_buf *buf); +int xdr_alloc_bvec(struct xdr_buf *buf, gfp_t gfp); +void xdr_free_bvec(struct xdr_buf *buf); static inline __be32 *xdr_encode_array(__be32 *p, const void *s, unsigned int len) { diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index 9be399020dab..a4ab4f8d9140 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -141,6 +141,7 @@ struct rpc_xprt_ops { void (*connect)(struct rpc_xprt *xprt, struct rpc_task *task); int (*buf_alloc)(struct rpc_task *task); void (*buf_free)(struct rpc_task *task); + void (*prepare_request)(struct rpc_rqst *req); int (*send_request)(struct rpc_rqst *req); void (*set_retrans_timeout)(struct rpc_task *task); void (*timer)(struct rpc_xprt *xprt, struct rpc_task *task); @@ -343,6 +344,7 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task); void xprt_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task); void xprt_free_slot(struct rpc_xprt *xprt, struct rpc_rqst *req); +void xprt_request_prepare(struct rpc_rqst *req); bool xprt_prepare_transmit(struct rpc_task *task); void xprt_request_enqueue_transmit(struct rpc_task *task); void xprt_request_enqueue_receive(struct rpc_task *task); diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 0c4b2e7d791f..ae3b8145da35 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1753,6 +1753,8 @@ rpc_xdr_encode(struct rpc_task *task) task->tk_status = rpcauth_wrap_req(task, encode, req, p, task->tk_msg.rpc_argp); + if (task->tk_status == 0) + xprt_request_prepare(req); } /* @@ -1768,7 +1770,7 @@ call_encode(struct rpc_task *task) /* Did the encode result in an error condition? */ if (task->tk_status != 0) { /* Was the error nonfatal? 
*/ - if (task->tk_status == -EAGAIN) + if (task->tk_status == -EAGAIN || task->tk_status == -ENOMEM) rpc_delay(task, HZ >> 4); else rpc_exit(task, task->tk_status); diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index 30afbd236656..2bbb8d38d2bf 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -15,6 +15,7 @@ #include <linux/errno.h> #include <linux/sunrpc/xdr.h> #include <linux/sunrpc/msg_prot.h> +#include <linux/bvec.h> /* * XDR functions for basic NFS types @@ -128,6 +129,39 @@ xdr_terminate_string(struct xdr_buf *buf, const u32 len) } EXPORT_SYMBOL_GPL(xdr_terminate_string); +size_t +xdr_buf_pagecount(struct xdr_buf *buf) +{ + if (!buf->page_len) + return 0; + return (buf->page_base + buf->page_len + PAGE_SIZE - 1) >> PAGE_SHIFT; +} + +int +xdr_alloc_bvec(struct xdr_buf *buf, gfp_t gfp) +{ + size_t i, n = xdr_buf_pagecount(buf); + + if (n != 0 && buf->bvec == NULL) { + buf->bvec = kmalloc_array(n, sizeof(buf->bvec[0]), gfp); + if (!buf->bvec) + return -ENOMEM; + for (i = 0; i < n; i++) { + buf->bvec[i].bv_page = buf->pages[i]; + buf->bvec[i].bv_len = PAGE_SIZE; + buf->bvec[i].bv_offset = 0; + } + } + return 0; +} + +void +xdr_free_bvec(struct xdr_buf *buf) +{ + kfree(buf->bvec); + buf->bvec = NULL; +} + void xdr_inline_pages(struct xdr_buf *xdr, unsigned int offset, struct page **pages, unsigned int base, unsigned int len) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index d1a67e97e7d3..547519f25878 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1262,6 +1262,22 @@ xprt_request_dequeue_transmit(struct rpc_task *task) spin_unlock(&xprt->queue_lock); } +/** + * xprt_request_prepare - prepare an encoded request for transport + * @req: pointer to rpc_rqst + * + * Calls into the transport layer to do whatever is needed to prepare + * the request for transmission or receive. 
+ */ +void +xprt_request_prepare(struct rpc_rqst *req) +{ + struct rpc_xprt *xprt = req->rq_xprt; + + if (xprt->ops->prepare_request) + xprt->ops->prepare_request(req); +} + /** * xprt_request_need_retransmit - Test if a task needs retransmission * @task: pointer to rpc_task @@ -1726,6 +1742,7 @@ void xprt_release(struct rpc_task *task) if (req->rq_buffer) xprt->ops->buf_free(task); xprt_inject_disconnect(xprt); + xdr_free_bvec(&req->rq_rcv_buf); if (req->rq_cred != NULL) put_rpccred(req->rq_cred); task->tk_rqstp = NULL; -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators 2018-09-17 13:03 ` [PATCH v3 39/44] SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter() Trond Myklebust @ 2018-09-17 13:03 ` Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 41/44] SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive() Trond Myklebust ` (2 more replies) 0 siblings, 3 replies; 76+ messages in thread From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw) To: linux-nfs Most of this code should also be reusable with other socket types. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- include/linux/sunrpc/xprtsock.h | 19 +- include/trace/events/sunrpc.h | 15 +- net/sunrpc/xprtsock.c | 694 +++++++++++++++----------------- 3 files changed, 335 insertions(+), 393 deletions(-) diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h index 005cfb6e7238..458bfe0137f5 100644 --- a/include/linux/sunrpc/xprtsock.h +++ b/include/linux/sunrpc/xprtsock.h @@ -31,15 +31,16 @@ struct sock_xprt { * State of TCP reply receive */ struct { - __be32 fraghdr, + struct { + __be32 fraghdr, xid, calldir; + } __attribute__((packed)); u32 offset, len; - unsigned long copied, - flags; + unsigned long copied; } recv; /* @@ -76,21 +77,9 @@ struct sock_xprt { void (*old_error_report)(struct sock *); }; -/* - * TCP receive state flags - */ -#define TCP_RCV_LAST_FRAG (1UL << 0) -#define TCP_RCV_COPY_FRAGHDR (1UL << 1) -#define TCP_RCV_COPY_XID (1UL << 2) -#define TCP_RCV_COPY_DATA (1UL << 3) -#define TCP_RCV_READ_CALLDIR (1UL << 4) -#define TCP_RCV_COPY_CALLDIR (1UL << 5) - /* * TCP RPC flags */ -#define TCP_RPC_REPLY (1UL << 6) - #define XPRT_SOCK_CONNECTING 1U #define XPRT_SOCK_DATA_READY (2) #define XPRT_SOCK_UPD_TIMEOUT (3) diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 0aa347194e0f..19e08d12696c 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ 
-497,16 +497,6 @@ TRACE_EVENT(xs_tcp_data_ready, __get_str(port), __entry->err, __entry->total) ); -#define rpc_show_sock_xprt_flags(flags) \ - __print_flags(flags, "|", \ - { TCP_RCV_LAST_FRAG, "TCP_RCV_LAST_FRAG" }, \ - { TCP_RCV_COPY_FRAGHDR, "TCP_RCV_COPY_FRAGHDR" }, \ - { TCP_RCV_COPY_XID, "TCP_RCV_COPY_XID" }, \ - { TCP_RCV_COPY_DATA, "TCP_RCV_COPY_DATA" }, \ - { TCP_RCV_READ_CALLDIR, "TCP_RCV_READ_CALLDIR" }, \ - { TCP_RCV_COPY_CALLDIR, "TCP_RCV_COPY_CALLDIR" }, \ - { TCP_RPC_REPLY, "TCP_RPC_REPLY" }) - TRACE_EVENT(xs_tcp_data_recv, TP_PROTO(struct sock_xprt *xs), @@ -516,7 +506,6 @@ TRACE_EVENT(xs_tcp_data_recv, __string(addr, xs->xprt.address_strings[RPC_DISPLAY_ADDR]) __string(port, xs->xprt.address_strings[RPC_DISPLAY_PORT]) __field(u32, xid) - __field(unsigned long, flags) __field(unsigned long, copied) __field(unsigned int, reclen) __field(unsigned long, offset) @@ -526,15 +515,13 @@ TRACE_EVENT(xs_tcp_data_recv, __assign_str(addr, xs->xprt.address_strings[RPC_DISPLAY_ADDR]); __assign_str(port, xs->xprt.address_strings[RPC_DISPLAY_PORT]); __entry->xid = be32_to_cpu(xs->recv.xid); - __entry->flags = xs->recv.flags; __entry->copied = xs->recv.copied; __entry->reclen = xs->recv.len; __entry->offset = xs->recv.offset; ), - TP_printk("peer=[%s]:%s xid=0x%08x flags=%s copied=%lu reclen=%u offset=%lu", + TP_printk("peer=[%s]:%s xid=0x%08x copied=%lu reclen=%u offset=%lu", __get_str(addr), __get_str(port), __entry->xid, - rpc_show_sock_xprt_flags(__entry->flags), __entry->copied, __entry->reclen, __entry->offset) ); diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index f16406228ead..5269ad98bb08 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -47,13 +47,13 @@ #include <net/checksum.h> #include <net/udp.h> #include <net/tcp.h> +#include <linux/bvec.h> +#include <linux/uio.h> #include <trace/events/sunrpc.h> #include "sunrpc.h" -#define RPC_TCP_READ_CHUNK_SZ (3*512*1024) - static void xs_close(struct rpc_xprt *xprt); static void 
xs_tcp_set_socket_timeouts(struct rpc_xprt *xprt, struct socket *sock); @@ -325,6 +325,320 @@ static void xs_free_peer_addresses(struct rpc_xprt *xprt) } } +static size_t +xs_alloc_sparse_pages(struct xdr_buf *buf, size_t want, gfp_t gfp) +{ + size_t i,n; + + if (!(buf->flags & XDRBUF_SPARSE_PAGES)) + return want; + if (want > buf->page_len) + want = buf->page_len; + n = (buf->page_base + want + PAGE_SIZE - 1) >> PAGE_SHIFT; + for (i = 0; i < n; i++) { + if (buf->pages[i]) + continue; + buf->bvec[i].bv_page = buf->pages[i] = alloc_page(gfp); + if (!buf->pages[i]) { + buf->page_len = (i * PAGE_SIZE) - buf->page_base; + return buf->page_len; + } + } + return want; +} + +static ssize_t +xs_sock_recvmsg(struct socket *sock, struct msghdr *msg, int flags, size_t seek) +{ + ssize_t ret; + if (seek != 0) + iov_iter_advance(&msg->msg_iter, seek); + ret = sock_recvmsg(sock, msg, flags); + return ret > 0 ? ret + seek : ret; +} + +static ssize_t +xs_read_kvec(struct socket *sock, struct msghdr *msg, int flags, + struct kvec *kvec, size_t count, size_t seek) +{ + iov_iter_kvec(&msg->msg_iter, READ | ITER_KVEC, kvec, 1, count); + return xs_sock_recvmsg(sock, msg, flags, seek); +} + +static ssize_t +xs_read_bvec(struct socket *sock, struct msghdr *msg, int flags, + struct bio_vec *bvec, unsigned long nr, size_t count, + size_t seek) +{ + iov_iter_bvec(&msg->msg_iter, READ | ITER_BVEC, bvec, nr, count); + return xs_sock_recvmsg(sock, msg, flags, seek); +} + +static ssize_t +xs_read_discard(struct socket *sock, struct msghdr *msg, int flags, + size_t count) +{ + struct kvec kvec = { 0 }; + return xs_read_kvec(sock, msg, flags | MSG_TRUNC, &kvec, count, 0); +} + +static ssize_t +xs_read_xdr_buf(struct socket *sock, struct msghdr *msg, int flags, + struct xdr_buf *buf, size_t count, size_t seek, size_t *read) +{ + size_t want, seek_init = seek, offset = 0; + ssize_t ret; + + if (seek < buf->head[0].iov_len) { + want = min_t(size_t, count, buf->head[0].iov_len); + ret = 
xs_read_kvec(sock, msg, flags, &buf->head[0], want, seek); + if (ret <= 0) + goto sock_err; + offset += ret; + if (offset == count || msg->msg_flags & (MSG_EOR|MSG_TRUNC)) + goto out; + if (ret != want) + goto eagain; + seek = 0; + } else { + seek -= buf->head[0].iov_len; + offset += buf->head[0].iov_len; + } + if (buf->page_len && seek < buf->page_len) { + want = min_t(size_t, count - offset, buf->page_len); + want = xs_alloc_sparse_pages(buf, want, GFP_NOWAIT); + ret = xs_read_bvec(sock, msg, flags, buf->bvec, + xdr_buf_pagecount(buf), + want + buf->page_base, + seek + buf->page_base); + if (ret <= 0) + goto sock_err; + offset += ret; + if (offset == count || msg->msg_flags & (MSG_EOR|MSG_TRUNC)) + goto out; + if (ret != want) + goto eagain; + seek = 0; + } else { + seek -= buf->page_len; + offset += buf->page_len; + } + if (buf->tail[0].iov_len && seek < buf->tail[0].iov_len) { + want = min_t(size_t, count - offset, buf->tail[0].iov_len); + ret = xs_read_kvec(sock, msg, flags, &buf->tail[0], want, seek); + if (ret <= 0) + goto sock_err; + offset += ret; + if (offset == count || msg->msg_flags & (MSG_EOR|MSG_TRUNC)) + goto out; + if (ret != want) + goto eagain; + } else + offset += buf->tail[0].iov_len; + ret = -EMSGSIZE; + msg->msg_flags |= MSG_TRUNC; +out: + *read = offset - seek_init; + return ret; +eagain: + ret = -EAGAIN; + goto out; +sock_err: + offset += seek; + goto out; +} + +static void +xs_read_header(struct sock_xprt *transport, struct xdr_buf *buf) +{ + if (!transport->recv.copied) { + if (buf->head[0].iov_len >= transport->recv.offset) + memcpy(buf->head[0].iov_base, + &transport->recv.xid, + transport->recv.offset); + transport->recv.copied = transport->recv.offset; + } +} + +static bool +xs_read_stream_request_done(struct sock_xprt *transport) +{ + return transport->recv.fraghdr & cpu_to_be32(RPC_LAST_STREAM_FRAGMENT); +} + +static ssize_t +xs_read_stream_request(struct sock_xprt *transport, struct msghdr *msg, + int flags, struct rpc_rqst *req) 
+{ + struct xdr_buf *buf = &req->rq_private_buf; + size_t want, read; + ssize_t ret; + + xs_read_header(transport, buf); + + want = transport->recv.len - transport->recv.offset; + ret = xs_read_xdr_buf(transport->sock, msg, flags, buf, + transport->recv.copied + want, transport->recv.copied, + &read); + transport->recv.offset += read; + transport->recv.copied += read; + if (transport->recv.offset == transport->recv.len) { + if (xs_read_stream_request_done(transport)) + msg->msg_flags |= MSG_EOR; + return transport->recv.copied; + } + + switch (ret) { + case -EMSGSIZE: + return transport->recv.copied; + case 0: + return -ESHUTDOWN; + default: + if (ret < 0) + return ret; + } + return -EAGAIN; +} + +static size_t +xs_read_stream_headersize(bool isfrag) +{ + if (isfrag) + return sizeof(__be32); + return 3 * sizeof(__be32); +} + +static ssize_t +xs_read_stream_header(struct sock_xprt *transport, struct msghdr *msg, + int flags, size_t want, size_t seek) +{ + struct kvec kvec = { + .iov_base = &transport->recv.fraghdr, + .iov_len = want, + }; + return xs_read_kvec(transport->sock, msg, flags, &kvec, want, seek); +} + +#if defined(CONFIG_SUNRPC_BACKCHANNEL) +static ssize_t +xs_read_stream_call(struct sock_xprt *transport, struct msghdr *msg, int flags) +{ + struct rpc_xprt *xprt = &transport->xprt; + struct rpc_rqst *req; + ssize_t ret; + + /* Look up and lock the request corresponding to the given XID */ + req = xprt_lookup_bc_request(xprt, transport->recv.xid); + if (!req) { + printk(KERN_WARNING "Callback slot table overflowed\n"); + return -ESHUTDOWN; + } + + ret = xs_read_stream_request(transport, msg, flags, req); + if (msg->msg_flags & (MSG_EOR|MSG_TRUNC)) + xprt_complete_bc_request(req, ret); + + return ret; +} +#else /* CONFIG_SUNRPC_BACKCHANNEL */ +static ssize_t +xs_read_stream_call(struct sock_xprt *transport, struct msghdr *msg, int flags) +{ + return -ESHUTDOWN; +} +#endif /* CONFIG_SUNRPC_BACKCHANNEL */ + +static ssize_t +xs_read_stream_reply(struct 
sock_xprt *transport, struct msghdr *msg, int flags) +{ + struct rpc_xprt *xprt = &transport->xprt; + struct rpc_rqst *req; + ssize_t ret = 0; + + /* Look up and lock the request corresponding to the given XID */ + spin_lock(&xprt->queue_lock); + req = xprt_lookup_rqst(xprt, transport->recv.xid); + if (!req) { + msg->msg_flags |= MSG_TRUNC; + goto out; + } + xprt_pin_rqst(req); + spin_unlock(&xprt->queue_lock); + + ret = xs_read_stream_request(transport, msg, flags, req); + + spin_lock(&xprt->queue_lock); + if (msg->msg_flags & (MSG_EOR|MSG_TRUNC)) + xprt_complete_rqst(req->rq_task, ret); + xprt_unpin_rqst(req); +out: + spin_unlock(&xprt->queue_lock); + return ret; +} + +static ssize_t +xs_read_stream(struct sock_xprt *transport, int flags) +{ + struct msghdr msg = { 0 }; + size_t want, read = 0; + ssize_t ret = 0; + + if (transport->recv.len == 0) { + want = xs_read_stream_headersize(transport->recv.copied != 0); + ret = xs_read_stream_header(transport, &msg, flags, want, + transport->recv.offset); + if (ret <= 0) + goto out_err; + transport->recv.offset = ret; + if (ret != want) { + ret = -EAGAIN; + goto out_err; + } + transport->recv.len = be32_to_cpu(transport->recv.fraghdr) & + RPC_FRAGMENT_SIZE_MASK; + transport->recv.offset -= sizeof(transport->recv.fraghdr); + read = ret; + } + + switch (be32_to_cpu(transport->recv.calldir)) { + case RPC_CALL: + ret = xs_read_stream_call(transport, &msg, flags); + break; + case RPC_REPLY: + ret = xs_read_stream_reply(transport, &msg, flags); + } + if (msg.msg_flags & MSG_TRUNC) { + transport->recv.calldir = cpu_to_be32(-1); + transport->recv.copied = -1; + } + if (ret < 0) + goto out_err; + read += ret; + if (transport->recv.offset < transport->recv.len) { + ret = xs_read_discard(transport->sock, &msg, flags, + transport->recv.len - transport->recv.offset); + if (ret <= 0) + goto out_err; + transport->recv.offset += ret; + read += ret; + } + if (xs_read_stream_request_done(transport)) { + trace_xs_tcp_data_recv(transport); 
+ transport->recv.copied = 0; + } + transport->recv.offset = 0; + transport->recv.len = 0; + return read; +out_err: + switch (ret) { + case 0: + case -ESHUTDOWN: + xprt_force_disconnect(&transport->xprt); + return -ESHUTDOWN; + } + return ret; +} + #define XS_SENDMSG_FLAGS (MSG_DONTWAIT | MSG_NOSIGNAL) static int xs_send_kvec(struct socket *sock, struct sockaddr *addr, int addrlen, struct kvec *vec, unsigned int base, int more) @@ -484,6 +798,12 @@ static int xs_nospace(struct rpc_rqst *req) return ret; } +static void +xs_stream_prepare_request(struct rpc_rqst *req) +{ + req->rq_task->tk_status = xdr_alloc_bvec(&req->rq_rcv_buf, GFP_NOIO); +} + /* * Determine if the previous message in the stream was aborted before it * could complete transmission. @@ -1157,263 +1477,7 @@ static void xs_tcp_force_close(struct rpc_xprt *xprt) xprt_force_disconnect(xprt); } -static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_reader *desc) -{ - struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); - size_t len, used; - char *p; - - p = ((char *) &transport->recv.fraghdr) + transport->recv.offset; - len = sizeof(transport->recv.fraghdr) - transport->recv.offset; - used = xdr_skb_read_bits(desc, p, len); - transport->recv.offset += used; - if (used != len) - return; - - transport->recv.len = ntohl(transport->recv.fraghdr); - if (transport->recv.len & RPC_LAST_STREAM_FRAGMENT) - transport->recv.flags |= TCP_RCV_LAST_FRAG; - else - transport->recv.flags &= ~TCP_RCV_LAST_FRAG; - transport->recv.len &= RPC_FRAGMENT_SIZE_MASK; - - transport->recv.flags &= ~TCP_RCV_COPY_FRAGHDR; - transport->recv.offset = 0; - - /* Sanity check of the record length */ - if (unlikely(transport->recv.len < 8)) { - dprintk("RPC: invalid TCP record fragment length\n"); - xs_tcp_force_close(xprt); - return; - } - dprintk("RPC: reading TCP record fragment of length %d\n", - transport->recv.len); -} - -static void xs_tcp_check_fraghdr(struct sock_xprt *transport) -{ - if 
(transport->recv.offset == transport->recv.len) { - transport->recv.flags |= TCP_RCV_COPY_FRAGHDR; - transport->recv.offset = 0; - if (transport->recv.flags & TCP_RCV_LAST_FRAG) { - transport->recv.flags &= ~TCP_RCV_COPY_DATA; - transport->recv.flags |= TCP_RCV_COPY_XID; - transport->recv.copied = 0; - } - } -} - -static inline void xs_tcp_read_xid(struct sock_xprt *transport, struct xdr_skb_reader *desc) -{ - size_t len, used; - char *p; - - len = sizeof(transport->recv.xid) - transport->recv.offset; - dprintk("RPC: reading XID (%zu bytes)\n", len); - p = ((char *) &transport->recv.xid) + transport->recv.offset; - used = xdr_skb_read_bits(desc, p, len); - transport->recv.offset += used; - if (used != len) - return; - transport->recv.flags &= ~TCP_RCV_COPY_XID; - transport->recv.flags |= TCP_RCV_READ_CALLDIR; - transport->recv.copied = 4; - dprintk("RPC: reading %s XID %08x\n", - (transport->recv.flags & TCP_RPC_REPLY) ? "reply for" - : "request with", - ntohl(transport->recv.xid)); - xs_tcp_check_fraghdr(transport); -} - -static inline void xs_tcp_read_calldir(struct sock_xprt *transport, - struct xdr_skb_reader *desc) -{ - size_t len, used; - u32 offset; - char *p; - - /* - * We want transport->recv.offset to be 8 at the end of this routine - * (4 bytes for the xid and 4 bytes for the call/reply flag). - * When this function is called for the first time, - * transport->recv.offset is 4 (after having already read the xid). 
- */ - offset = transport->recv.offset - sizeof(transport->recv.xid); - len = sizeof(transport->recv.calldir) - offset; - dprintk("RPC: reading CALL/REPLY flag (%zu bytes)\n", len); - p = ((char *) &transport->recv.calldir) + offset; - used = xdr_skb_read_bits(desc, p, len); - transport->recv.offset += used; - if (used != len) - return; - transport->recv.flags &= ~TCP_RCV_READ_CALLDIR; - /* - * We don't yet have the XDR buffer, so we will write the calldir - * out after we get the buffer from the 'struct rpc_rqst' - */ - switch (ntohl(transport->recv.calldir)) { - case RPC_REPLY: - transport->recv.flags |= TCP_RCV_COPY_CALLDIR; - transport->recv.flags |= TCP_RCV_COPY_DATA; - transport->recv.flags |= TCP_RPC_REPLY; - break; - case RPC_CALL: - transport->recv.flags |= TCP_RCV_COPY_CALLDIR; - transport->recv.flags |= TCP_RCV_COPY_DATA; - transport->recv.flags &= ~TCP_RPC_REPLY; - break; - default: - dprintk("RPC: invalid request message type\n"); - xs_tcp_force_close(&transport->xprt); - } - xs_tcp_check_fraghdr(transport); -} - -static inline void xs_tcp_read_common(struct rpc_xprt *xprt, - struct xdr_skb_reader *desc, - struct rpc_rqst *req) -{ - struct sock_xprt *transport = - container_of(xprt, struct sock_xprt, xprt); - struct xdr_buf *rcvbuf; - size_t len; - ssize_t r; - - rcvbuf = &req->rq_private_buf; - - if (transport->recv.flags & TCP_RCV_COPY_CALLDIR) { - /* - * Save the RPC direction in the XDR buffer - */ - memcpy(rcvbuf->head[0].iov_base + transport->recv.copied, - &transport->recv.calldir, - sizeof(transport->recv.calldir)); - transport->recv.copied += sizeof(transport->recv.calldir); - transport->recv.flags &= ~TCP_RCV_COPY_CALLDIR; - } - - len = desc->count; - if (len > transport->recv.len - transport->recv.offset) - desc->count = transport->recv.len - transport->recv.offset; - r = xdr_partial_copy_from_skb(rcvbuf, transport->recv.copied, - desc, xdr_skb_read_bits); - - if (desc->count) { - /* Error when copying to the receive buffer, - * usually 
because we weren't able to allocate - * additional buffer pages. All we can do now - * is turn off TCP_RCV_COPY_DATA, so the request - * will not receive any additional updates, - * and time out. - * Any remaining data from this record will - * be discarded. - */ - transport->recv.flags &= ~TCP_RCV_COPY_DATA; - dprintk("RPC: XID %08x truncated request\n", - ntohl(transport->recv.xid)); - dprintk("RPC: xprt = %p, recv.copied = %lu, " - "recv.offset = %u, recv.len = %u\n", - xprt, transport->recv.copied, - transport->recv.offset, transport->recv.len); - return; - } - - transport->recv.copied += r; - transport->recv.offset += r; - desc->count = len - r; - - dprintk("RPC: XID %08x read %zd bytes\n", - ntohl(transport->recv.xid), r); - dprintk("RPC: xprt = %p, recv.copied = %lu, recv.offset = %u, " - "recv.len = %u\n", xprt, transport->recv.copied, - transport->recv.offset, transport->recv.len); - - if (transport->recv.copied == req->rq_private_buf.buflen) - transport->recv.flags &= ~TCP_RCV_COPY_DATA; - else if (transport->recv.offset == transport->recv.len) { - if (transport->recv.flags & TCP_RCV_LAST_FRAG) - transport->recv.flags &= ~TCP_RCV_COPY_DATA; - } -} - -/* - * Finds the request corresponding to the RPC xid and invokes the common - * tcp read code to read the data. 
- */ -static inline int xs_tcp_read_reply(struct rpc_xprt *xprt, - struct xdr_skb_reader *desc) -{ - struct sock_xprt *transport = - container_of(xprt, struct sock_xprt, xprt); - struct rpc_rqst *req; - - dprintk("RPC: read reply XID %08x\n", ntohl(transport->recv.xid)); - - /* Find and lock the request corresponding to this xid */ - spin_lock(&xprt->queue_lock); - req = xprt_lookup_rqst(xprt, transport->recv.xid); - if (!req) { - dprintk("RPC: XID %08x request not found!\n", - ntohl(transport->recv.xid)); - spin_unlock(&xprt->queue_lock); - return -1; - } - xprt_pin_rqst(req); - spin_unlock(&xprt->queue_lock); - - xs_tcp_read_common(xprt, desc, req); - - spin_lock(&xprt->queue_lock); - if (!(transport->recv.flags & TCP_RCV_COPY_DATA)) - xprt_complete_rqst(req->rq_task, transport->recv.copied); - xprt_unpin_rqst(req); - spin_unlock(&xprt->queue_lock); - return 0; -} - #if defined(CONFIG_SUNRPC_BACKCHANNEL) -/* - * Obtains an rpc_rqst previously allocated and invokes the common - * tcp read code to read the data. The result is placed in the callback - * queue. - * If we're unable to obtain the rpc_rqst we schedule the closing of the - * connection and return -1. 
- */ -static int xs_tcp_read_callback(struct rpc_xprt *xprt, - struct xdr_skb_reader *desc) -{ - struct sock_xprt *transport = - container_of(xprt, struct sock_xprt, xprt); - struct rpc_rqst *req; - - /* Look up the request corresponding to the given XID */ - req = xprt_lookup_bc_request(xprt, transport->recv.xid); - if (req == NULL) { - printk(KERN_WARNING "Callback slot table overflowed\n"); - xprt_force_disconnect(xprt); - return -1; - } - - dprintk("RPC: read callback XID %08x\n", ntohl(req->rq_xid)); - xs_tcp_read_common(xprt, desc, req); - - if (!(transport->recv.flags & TCP_RCV_COPY_DATA)) - xprt_complete_bc_request(req, transport->recv.copied); - - return 0; -} - -static inline int _xs_tcp_read_data(struct rpc_xprt *xprt, - struct xdr_skb_reader *desc) -{ - struct sock_xprt *transport = - container_of(xprt, struct sock_xprt, xprt); - - return (transport->recv.flags & TCP_RPC_REPLY) ? - xs_tcp_read_reply(xprt, desc) : - xs_tcp_read_callback(xprt, desc); -} - static int xs_tcp_bc_up(struct svc_serv *serv, struct net *net) { int ret; @@ -1429,106 +1493,14 @@ static size_t xs_tcp_bc_maxpayload(struct rpc_xprt *xprt) { return PAGE_SIZE; } -#else -static inline int _xs_tcp_read_data(struct rpc_xprt *xprt, - struct xdr_skb_reader *desc) -{ - return xs_tcp_read_reply(xprt, desc); -} #endif /* CONFIG_SUNRPC_BACKCHANNEL */ -/* - * Read data off the transport. This can be either an RPC_CALL or an - * RPC_REPLY. Relay the processing to helper functions. - */ -static void xs_tcp_read_data(struct rpc_xprt *xprt, - struct xdr_skb_reader *desc) -{ - struct sock_xprt *transport = - container_of(xprt, struct sock_xprt, xprt); - - if (_xs_tcp_read_data(xprt, desc) == 0) - xs_tcp_check_fraghdr(transport); - else { - /* - * The transport_lock protects the request handling. - * There's no need to hold it to update the recv.flags. 
- */ - transport->recv.flags &= ~TCP_RCV_COPY_DATA; - } -} - -static inline void xs_tcp_read_discard(struct sock_xprt *transport, struct xdr_skb_reader *desc) -{ - size_t len; - - len = transport->recv.len - transport->recv.offset; - if (len > desc->count) - len = desc->count; - desc->count -= len; - desc->offset += len; - transport->recv.offset += len; - dprintk("RPC: discarded %zu bytes\n", len); - xs_tcp_check_fraghdr(transport); -} - -static int xs_tcp_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb, unsigned int offset, size_t len) -{ - struct rpc_xprt *xprt = rd_desc->arg.data; - struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); - struct xdr_skb_reader desc = { - .skb = skb, - .offset = offset, - .count = len, - }; - size_t ret; - - dprintk("RPC: xs_tcp_data_recv started\n"); - do { - trace_xs_tcp_data_recv(transport); - /* Read in a new fragment marker if necessary */ - /* Can we ever really expect to get completely empty fragments? */ - if (transport->recv.flags & TCP_RCV_COPY_FRAGHDR) { - xs_tcp_read_fraghdr(xprt, &desc); - continue; - } - /* Read in the xid if necessary */ - if (transport->recv.flags & TCP_RCV_COPY_XID) { - xs_tcp_read_xid(transport, &desc); - continue; - } - /* Read in the call/reply flag */ - if (transport->recv.flags & TCP_RCV_READ_CALLDIR) { - xs_tcp_read_calldir(transport, &desc); - continue; - } - /* Read in the request data */ - if (transport->recv.flags & TCP_RCV_COPY_DATA) { - xs_tcp_read_data(xprt, &desc); - continue; - } - /* Skip over any trailing bytes on short reads */ - xs_tcp_read_discard(transport, &desc); - } while (desc.count); - ret = len - desc.count; - if (ret < rd_desc->count) - rd_desc->count -= ret; - else - rd_desc->count = 0; - trace_xs_tcp_data_recv(transport); - dprintk("RPC: xs_tcp_data_recv done\n"); - return ret; -} - static void xs_tcp_data_receive(struct sock_xprt *transport) { struct rpc_xprt *xprt = &transport->xprt; struct sock *sk; - read_descriptor_t rd_desc = { - 
.arg.data = xprt, - }; - unsigned long total = 0; - int read = 0; + size_t read = 0; + ssize_t ret = 0; restart: mutex_lock(&transport->recv_mutex); @@ -1536,18 +1508,12 @@ static void xs_tcp_data_receive(struct sock_xprt *transport) if (sk == NULL) goto out; - /* We use rd_desc to pass struct xprt to xs_tcp_data_recv */ for (;;) { - rd_desc.count = RPC_TCP_READ_CHUNK_SZ; - lock_sock(sk); - read = tcp_read_sock(sk, &rd_desc, xs_tcp_data_recv); - if (rd_desc.count != 0 || read < 0) { - clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state); - release_sock(sk); + clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state); + ret = xs_read_stream(transport, MSG_DONTWAIT | MSG_NOSIGNAL); + if (ret < 0) break; - } - release_sock(sk); - total += read; + read += ret; if (need_resched()) { mutex_unlock(&transport->recv_mutex); cond_resched(); @@ -1558,7 +1524,7 @@ static void xs_tcp_data_receive(struct sock_xprt *transport) queue_work(xprtiod_workqueue, &transport->recv_worker); out: mutex_unlock(&transport->recv_mutex); - trace_xs_tcp_data_ready(xprt, read, total); + trace_xs_tcp_data_ready(xprt, ret, read); } static void xs_tcp_data_receive_workfn(struct work_struct *work) @@ -2380,7 +2346,6 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock) transport->recv.offset = 0; transport->recv.len = 0; transport->recv.copied = 0; - transport->recv.flags = TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID; transport->xmit.offset = 0; /* Tell the socket layer to start connecting... */ @@ -2802,6 +2767,7 @@ static const struct rpc_xprt_ops xs_tcp_ops = { .connect = xs_connect, .buf_alloc = rpc_malloc, .buf_free = rpc_free, + .prepare_request = xs_stream_prepare_request, .send_request = xs_tcp_send_request, .set_retrans_timeout = xprt_set_retrans_timeout_def, .close = xs_tcp_shutdown, -- 2.17.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 41/44] SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive()
  2018-09-17 13:03 ` [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators Trond Myklebust
@ 2018-09-17 13:03 ` Trond Myklebust
  2018-09-17 13:03   ` [PATCH v3 42/44] SUNRPC: Allow AF_LOCAL sockets to use the generic stream receive Trond Myklebust
  2018-09-17 20:44 ` [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators Trond Myklebust
  2018-11-09 11:19 ` Catalin Marinas
  2 siblings, 1 reply; 76+ messages in thread
From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw)
  To: linux-nfs

In preparation for sharing with AF_LOCAL.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/trace/events/sunrpc.h | 16 ++++----
 net/sunrpc/xprtsock.c         | 71 +++++++++++++++--------------------
 2 files changed, 38 insertions(+), 49 deletions(-)

diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 19e08d12696c..28e384186c35 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -470,14 +470,14 @@ TRACE_EVENT(xprt_ping,
 			__get_str(addr), __get_str(port), __entry->status)
 );

-TRACE_EVENT(xs_tcp_data_ready,
-	TP_PROTO(struct rpc_xprt *xprt, int err, unsigned int total),
+TRACE_EVENT(xs_stream_read_data,
+	TP_PROTO(struct rpc_xprt *xprt, ssize_t err, size_t total),

 	TP_ARGS(xprt, err, total),

 	TP_STRUCT__entry(
-		__field(int, err)
-		__field(unsigned int, total)
+		__field(ssize_t, err)
+		__field(size_t, total)
 		__string(addr, xprt ? xprt->address_strings[RPC_DISPLAY_ADDR] : "(null)")
 		__string(port, xprt ? xprt->address_strings[RPC_DISPLAY_PORT] :
@@ -493,11 +493,11 @@ TRACE_EVENT(xs_tcp_data_ready,
 				xprt->address_strings[RPC_DISPLAY_PORT] : "(null)");
 	),

-	TP_printk("peer=[%s]:%s err=%d total=%u", __get_str(addr),
+	TP_printk("peer=[%s]:%s err=%zd total=%zu", __get_str(addr),
 			__get_str(port), __entry->err, __entry->total)
 );

-TRACE_EVENT(xs_tcp_data_recv,
+TRACE_EVENT(xs_stream_read_request,
 	TP_PROTO(struct sock_xprt *xs),

 	TP_ARGS(xs),

@@ -508,7 +508,7 @@ TRACE_EVENT(xs_tcp_data_recv,
 		__field(u32, xid)
 		__field(unsigned long, copied)
 		__field(unsigned int, reclen)
-		__field(unsigned long, offset)
+		__field(unsigned int, offset)
 	),

 	TP_fast_assign(
@@ -520,7 +520,7 @@ TRACE_EVENT(xs_tcp_data_recv,
 		__entry->offset = xs->recv.offset;
 	),

-	TP_printk("peer=[%s]:%s xid=0x%08x copied=%lu reclen=%u offset=%lu",
+	TP_printk("peer=[%s]:%s xid=0x%08x copied=%lu reclen=%u offset=%u",
 			__get_str(addr), __get_str(port), __entry->xid,
 			__entry->copied, __entry->reclen, __entry->offset)
 );

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 5269ad98bb08..15364e2746bd 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -623,7 +623,7 @@ xs_read_stream(struct sock_xprt *transport, int flags)
 		read += ret;
 	}
 	if (xs_read_stream_request_done(transport)) {
-		trace_xs_tcp_data_recv(transport);
+		trace_xs_stream_read_request(transport);
 		transport->recv.copied = 0;
 	}
 	transport->recv.offset = 0;
@@ -639,6 +639,34 @@ xs_read_stream(struct sock_xprt *transport, int flags)
 	return ret;
 }

+static void xs_stream_data_receive(struct sock_xprt *transport)
+{
+	size_t read = 0;
+	ssize_t ret = 0;
+
+	mutex_lock(&transport->recv_mutex);
+	if (transport->sock == NULL)
+		goto out;
+	clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state);
+	for (;;) {
+		ret = xs_read_stream(transport, MSG_DONTWAIT | MSG_NOSIGNAL);
+		if (ret <= 0)
+			break;
+		read += ret;
+		cond_resched();
+	}
+out:
+	mutex_unlock(&transport->recv_mutex);
+	trace_xs_stream_read_data(&transport->xprt, ret, read);
+}
+
+static void xs_stream_data_receive_workfn(struct work_struct *work)
+{
+	struct sock_xprt *transport =
+		container_of(work, struct sock_xprt, recv_worker);
+	xs_stream_data_receive(transport);
+}
+
 #define XS_SENDMSG_FLAGS	(MSG_DONTWAIT | MSG_NOSIGNAL)

 static int xs_send_kvec(struct socket *sock, struct sockaddr *addr, int addrlen, struct kvec *vec, unsigned int base, int more)
@@ -1495,45 +1523,6 @@ static size_t xs_tcp_bc_maxpayload(struct rpc_xprt *xprt)
 }
 #endif /* CONFIG_SUNRPC_BACKCHANNEL */

-static void xs_tcp_data_receive(struct sock_xprt *transport)
-{
-	struct rpc_xprt *xprt = &transport->xprt;
-	struct sock *sk;
-	size_t read = 0;
-	ssize_t ret = 0;
-
-restart:
-	mutex_lock(&transport->recv_mutex);
-	sk = transport->inet;
-	if (sk == NULL)
-		goto out;
-
-	for (;;) {
-		clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state);
-		ret = xs_read_stream(transport, MSG_DONTWAIT | MSG_NOSIGNAL);
-		if (ret < 0)
-			break;
-		read += ret;
-		if (need_resched()) {
-			mutex_unlock(&transport->recv_mutex);
-			cond_resched();
-			goto restart;
-		}
-	}
-	if (test_bit(XPRT_SOCK_DATA_READY, &transport->sock_state))
-		queue_work(xprtiod_workqueue, &transport->recv_worker);
-out:
-	mutex_unlock(&transport->recv_mutex);
-	trace_xs_tcp_data_ready(xprt, ret, read);
-}
-
-static void xs_tcp_data_receive_workfn(struct work_struct *work)
-{
-	struct sock_xprt *transport =
-		container_of(work, struct sock_xprt, recv_worker);
-	xs_tcp_data_receive(transport);
-}
-
 /**
  * xs_tcp_state_change - callback to handle TCP socket state changes
  * @sk: socket whose state has changed
@@ -3063,7 +3052,7 @@ static struct rpc_xprt *xs_setup_tcp(struct xprt_create *args)
 	xprt->connect_timeout = xprt->timeout->to_initval *
 		(xprt->timeout->to_retries + 1);

-	INIT_WORK(&transport->recv_worker, xs_tcp_data_receive_workfn);
+	INIT_WORK(&transport->recv_worker, xs_stream_data_receive_workfn);
 	INIT_DELAYED_WORK(&transport->connect_worker, xs_tcp_setup_socket);

 	switch (addr->sa_family) {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread
* [PATCH v3 42/44] SUNRPC: Allow AF_LOCAL sockets to use the generic stream receive
  2018-09-17 13:03 ` [PATCH v3 41/44] SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive() Trond Myklebust
@ 2018-09-17 13:03 ` Trond Myklebust
  2018-09-17 13:03   ` [PATCH v3 43/44] SUNRPC: Clean up xs_udp_data_receive() Trond Myklebust
  0 siblings, 1 reply; 76+ messages in thread
From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw)
  To: linux-nfs

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/xdr.h |   1 -
 net/sunrpc/socklib.c       |   4 +-
 net/sunrpc/xprtsock.c      | 137 +++++--------------------------------
 3 files changed, 18 insertions(+), 124 deletions(-)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 745587132a87..8815be7cae72 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -185,7 +185,6 @@ struct xdr_skb_reader {

 typedef size_t (*xdr_skb_read_actor)(struct xdr_skb_reader *desc, void *to, size_t len);

-size_t xdr_skb_read_bits(struct xdr_skb_reader *desc, void *to, size_t len);
 extern int csum_partial_copy_to_xdr(struct xdr_buf *, struct sk_buff *);
 extern ssize_t xdr_partial_copy_from_skb(struct xdr_buf *, unsigned int,
 		struct xdr_skb_reader *, xdr_skb_read_actor);

diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
index 08f00a98151f..0e7c0dee7578 100644
--- a/net/sunrpc/socklib.c
+++ b/net/sunrpc/socklib.c
@@ -26,7 +26,8 @@
 * Possibly called several times to iterate over an sk_buff and copy
 * data out of it.
 */
-size_t xdr_skb_read_bits(struct xdr_skb_reader *desc, void *to, size_t len)
+static size_t
+xdr_skb_read_bits(struct xdr_skb_reader *desc, void *to, size_t len)
 {
 	if (len > desc->count)
 		len = desc->count;
@@ -36,7 +37,6 @@ size_t xdr_skb_read_bits(struct xdr_skb_reader *desc, void *to, size_t len)
 	desc->offset += len;
 	return len;
 }
-EXPORT_SYMBOL_GPL(xdr_skb_read_bits);

 /**
  * xdr_skb_read_and_csum_bits - copy and checksum from skb to buffer

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 15364e2746bd..1daa179b7706 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -667,6 +667,17 @@ static void xs_stream_data_receive_workfn(struct work_struct *work)
 	xs_stream_data_receive(transport);
 }

+static void
+xs_stream_reset_connect(struct sock_xprt *transport)
+{
+	transport->recv.offset = 0;
+	transport->recv.len = 0;
+	transport->recv.copied = 0;
+	transport->xmit.offset = 0;
+	transport->xprt.stat.connect_count++;
+	transport->xprt.stat.connect_start = jiffies;
+}
+
 #define XS_SENDMSG_FLAGS	(MSG_DONTWAIT | MSG_NOSIGNAL)

 static int xs_send_kvec(struct socket *sock, struct sockaddr *addr, int addrlen, struct kvec *vec, unsigned int base, int more)
@@ -1263,114 +1274,6 @@ static void xs_destroy(struct rpc_xprt *xprt)
 	module_put(THIS_MODULE);
 }

-static int xs_local_copy_to_xdr(struct xdr_buf *xdr, struct sk_buff *skb)
-{
-	struct xdr_skb_reader desc = {
-		.skb = skb,
-		.offset = sizeof(rpc_fraghdr),
-		.count = skb->len - sizeof(rpc_fraghdr),
-	};
-
-	if (xdr_partial_copy_from_skb(xdr, 0, &desc, xdr_skb_read_bits) < 0)
-		return -1;
-	if (desc.count)
-		return -1;
-	return 0;
-}
-
-/**
- * xs_local_data_read_skb
- * @xprt: transport
- * @sk: socket
- * @skb: skbuff
- *
- * Currently this assumes we can read the whole reply in a single gulp.
- */
-static void xs_local_data_read_skb(struct rpc_xprt *xprt,
-		struct sock *sk,
-		struct sk_buff *skb)
-{
-	struct rpc_task *task;
-	struct rpc_rqst *rovr;
-	int repsize, copied;
-	u32 _xid;
-	__be32 *xp;
-
-	repsize = skb->len - sizeof(rpc_fraghdr);
-	if (repsize < 4) {
-		dprintk("RPC: impossible RPC reply size %d\n", repsize);
-		return;
-	}
-
-	/* Copy the XID from the skb... */
-	xp = skb_header_pointer(skb, sizeof(rpc_fraghdr), sizeof(_xid), &_xid);
-	if (xp == NULL)
-		return;
-
-	/* Look up and lock the request corresponding to the given XID */
-	spin_lock(&xprt->queue_lock);
-	rovr = xprt_lookup_rqst(xprt, *xp);
-	if (!rovr)
-		goto out_unlock;
-	xprt_pin_rqst(rovr);
-	spin_unlock(&xprt->queue_lock);
-	task = rovr->rq_task;
-
-	copied = rovr->rq_private_buf.buflen;
-	if (copied > repsize)
-		copied = repsize;
-
-	if (xs_local_copy_to_xdr(&rovr->rq_private_buf, skb)) {
-		dprintk("RPC: sk_buff copy failed\n");
-		spin_lock(&xprt->queue_lock);
-		goto out_unpin;
-	}
-
-	spin_lock(&xprt->queue_lock);
-	xprt_complete_rqst(task, copied);
-out_unpin:
-	xprt_unpin_rqst(rovr);
- out_unlock:
-	spin_unlock(&xprt->queue_lock);
-}
-
-static void xs_local_data_receive(struct sock_xprt *transport)
-{
-	struct sk_buff *skb;
-	struct sock *sk;
-	int err;
-
-restart:
-	mutex_lock(&transport->recv_mutex);
-	sk = transport->inet;
-	if (sk == NULL)
-		goto out;
-	for (;;) {
-		skb = skb_recv_datagram(sk, 0, 1, &err);
-		if (skb != NULL) {
-			xs_local_data_read_skb(&transport->xprt, sk, skb);
-			skb_free_datagram(sk, skb);
-			continue;
-		}
-		if (!test_and_clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state))
-			break;
-		if (need_resched()) {
-			mutex_unlock(&transport->recv_mutex);
-			cond_resched();
-			goto restart;
-		}
-	}
-out:
-	mutex_unlock(&transport->recv_mutex);
-}
-
-static void xs_local_data_receive_workfn(struct work_struct *work)
-{
-	struct sock_xprt *transport =
-		container_of(work, struct sock_xprt, recv_worker);
-	xs_local_data_receive(transport);
-}
-
 /**
  * xs_udp_data_read_skb - receive callback for UDP sockets
  * @xprt: transport
@@ -1971,11 +1874,8 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt,
 		write_unlock_bh(&sk->sk_callback_lock);
 	}

-	transport->xmit.offset = 0;
+	xs_stream_reset_connect(transport);

-	/* Tell the socket layer to start connecting... */
-	xprt->stat.connect_count++;
-	xprt->stat.connect_start = jiffies;
 	return kernel_connect(sock, xs_addr(xprt), xprt->addrlen, 0);
 }

@@ -2332,14 +2232,9 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 	xs_set_memalloc(xprt);

 	/* Reset TCP record info */
-	transport->recv.offset = 0;
-	transport->recv.len = 0;
-	transport->recv.copied = 0;
-	transport->xmit.offset = 0;
+	xs_stream_reset_connect(transport);

 	/* Tell the socket layer to start connecting... */
-	xprt->stat.connect_count++;
-	xprt->stat.connect_start = jiffies;
 	set_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
 	ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
 	switch (ret) {
@@ -2714,6 +2609,7 @@ static const struct rpc_xprt_ops xs_local_ops = {
 	.connect		= xs_local_connect,
 	.buf_alloc		= rpc_malloc,
 	.buf_free		= rpc_free,
+	.prepare_request	= xs_stream_prepare_request,
 	.send_request		= xs_local_send_request,
 	.set_retrans_timeout	= xprt_set_retrans_timeout_def,
 	.close			= xs_close,
@@ -2898,9 +2794,8 @@ static struct rpc_xprt *xs_setup_local(struct xprt_create *args)
 	xprt->ops = &xs_local_ops;
 	xprt->timeout = &xs_local_default_timeout;

-	INIT_WORK(&transport->recv_worker, xs_local_data_receive_workfn);
-	INIT_DELAYED_WORK(&transport->connect_worker,
-			xs_dummy_setup_socket);
+	INIT_WORK(&transport->recv_worker, xs_stream_data_receive_workfn);
+	INIT_DELAYED_WORK(&transport->connect_worker, xs_dummy_setup_socket);

 	switch (sun->sun_family) {
 	case AF_LOCAL:
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread
* [PATCH v3 43/44] SUNRPC: Clean up xs_udp_data_receive()
  2018-09-17 13:03 ` [PATCH v3 42/44] SUNRPC: Allow AF_LOCAL sockets to use the generic stream receive Trond Myklebust
@ 2018-09-17 13:03 ` Trond Myklebust
  2018-09-17 13:03   ` [PATCH v3 44/44] SUNRPC: Unexport xdr_partial_copy_from_skb() Trond Myklebust
  0 siblings, 1 reply; 76+ messages in thread
From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw)
  To: linux-nfs

Simplify the retry logic.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/xprtsock.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 1daa179b7706..175347f62875 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1341,25 +1341,18 @@ static void xs_udp_data_receive(struct sock_xprt *transport)
 	struct sock *sk;
 	int err;

-restart:
 	mutex_lock(&transport->recv_mutex);
 	sk = transport->inet;
 	if (sk == NULL)
 		goto out;
+	clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state);
 	for (;;) {
 		skb = skb_recv_udp(sk, 0, 1, &err);
-		if (skb != NULL) {
-			xs_udp_data_read_skb(&transport->xprt, sk, skb);
-			consume_skb(skb);
-			continue;
-		}
-		if (!test_and_clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state))
+		if (skb == NULL)
 			break;
-		if (need_resched()) {
-			mutex_unlock(&transport->recv_mutex);
-			cond_resched();
-			goto restart;
-		}
+		xs_udp_data_read_skb(&transport->xprt, sk, skb);
+		consume_skb(skb);
+		cond_resched();
 	}
 out:
 	mutex_unlock(&transport->recv_mutex);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread
* [PATCH v3 44/44] SUNRPC: Unexport xdr_partial_copy_from_skb()
  2018-09-17 13:03 ` [PATCH v3 43/44] SUNRPC: Clean up xs_udp_data_receive() Trond Myklebust
@ 2018-09-17 13:03 ` Trond Myklebust
  0 siblings, 0 replies; 76+ messages in thread
From: Trond Myklebust @ 2018-09-17 13:03 UTC (permalink / raw)
  To: linux-nfs

It is no longer used outside of net/sunrpc/socklib.c

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/xdr.h | 2 --
 net/sunrpc/socklib.c       | 4 ++--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 8815be7cae72..43106ffa6788 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -186,8 +186,6 @@ struct xdr_skb_reader {
 typedef size_t (*xdr_skb_read_actor)(struct xdr_skb_reader *desc, void *to, size_t len);

 extern int csum_partial_copy_to_xdr(struct xdr_buf *, struct sk_buff *);
-extern ssize_t xdr_partial_copy_from_skb(struct xdr_buf *, unsigned int,
-		struct xdr_skb_reader *, xdr_skb_read_actor);
 extern int xdr_encode_word(struct xdr_buf *, unsigned int, u32);
 extern int xdr_decode_word(struct xdr_buf *, unsigned int, u32 *);

diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
index 0e7c0dee7578..9062967575c4 100644
--- a/net/sunrpc/socklib.c
+++ b/net/sunrpc/socklib.c
@@ -69,7 +69,8 @@ static size_t xdr_skb_read_and_csum_bits(struct xdr_skb_reader *desc, void *to,
 * @copy_actor: virtual method for copying data
 *
 */
-ssize_t xdr_partial_copy_from_skb(struct xdr_buf *xdr, unsigned int base, struct xdr_skb_reader *desc, xdr_skb_read_actor copy_actor)
+static ssize_t
+xdr_partial_copy_from_skb(struct xdr_buf *xdr, unsigned int base, struct xdr_skb_reader *desc, xdr_skb_read_actor copy_actor)
 {
 	struct page **ppage = xdr->pages;
 	unsigned int len, pglen = xdr->page_len;
@@ -140,7 +141,6 @@ ssize_t xdr_partial_copy_from_skb(struct xdr_buf *xdr, unsigned int base, struct
 out:
 	return copied;
 }
-EXPORT_SYMBOL_GPL(xdr_partial_copy_from_skb);

 /**
  * csum_partial_copy_to_xdr - checksum and copy data
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread
* Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators
  2018-09-17 13:03 ` [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators Trond Myklebust
  2018-09-17 13:03   ` [PATCH v3 41/44] SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive() Trond Myklebust
@ 2018-09-17 20:44 ` Trond Myklebust
  2018-11-09 11:19 ` Catalin Marinas
  2 siblings, 0 replies; 76+ messages in thread
From: Trond Myklebust @ 2018-09-17 20:44 UTC (permalink / raw)
  To: linux-nfs

On Mon, 2018-09-17 at 09:03 -0400, Trond Myklebust wrote:
> Most of this code should also be reusable with other socket types.
> 
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  include/linux/sunrpc/xprtsock.h |  19 +-
>  include/trace/events/sunrpc.h   |  15 +-
>  net/sunrpc/xprtsock.c           | 694 +++++++++++++++---------------
> --
>  3 files changed, 335 insertions(+), 393 deletions(-)
[...]
> +	if (buf->page_len && seek < buf->page_len) {
> +		want = min_t(size_t, count - offset, buf->page_len);
> +		want = xs_alloc_sparse_pages(buf, want, GFP_NOWAIT);
> +		ret = xs_read_bvec(sock, msg, flags, buf->bvec,
> +				xdr_buf_pagecount(buf),
> +				want + buf->page_base,
> +				seek + buf->page_base);
> +		if (ret <= 0)
> +			goto sock_err;
> +		offset += ret;

There is a bug here that has been fixed up in the linux-nfs.org testing
branch.

[...]
> +	if (transport->recv.offset < transport->recv.len) {
> +		ret = xs_read_discard(transport->sock, &msg, flags,
> +				transport->recv.len - transport->recv.offset);
> +		if (ret <= 0)
> +			goto out_err;
> +		transport->recv.offset += ret;
> +		read += ret;

...and another bug here.

[...]
dCAqcmVxKQ0KPiAgCXJldHVybiByZXQ7DQo+ICB9DQo+ICANCj4gK3N0YXRpYyB2b2lkDQo+ICt4 c19zdHJlYW1fcHJlcGFyZV9yZXF1ZXN0KHN0cnVjdCBycGNfcnFzdCAqcmVxKQ0KPiArew0KPiAr CXJlcS0+cnFfdGFzay0+dGtfc3RhdHVzID0geGRyX2FsbG9jX2J2ZWMoJnJlcS0+cnFfcmN2X2J1 ZiwNCj4gR0ZQX05PSU8pOw0KPiArfQ0KPiArDQo+ICAvKg0KPiAgICogRGV0ZXJtaW5lIGlmIHRo ZSBwcmV2aW91cyBtZXNzYWdlIGluIHRoZSBzdHJlYW0gd2FzIGFib3J0ZWQNCj4gYmVmb3JlIGl0 DQo+ICAgKiBjb3VsZCBjb21wbGV0ZSB0cmFuc21pc3Npb24uDQo+IEBAIC0xMTU3LDI2MyArMTQ3 Nyw3IEBAIHN0YXRpYyB2b2lkIHhzX3RjcF9mb3JjZV9jbG9zZShzdHJ1Y3QNCj4gcnBjX3hwcnQg KnhwcnQpDQo+ICAJeHBydF9mb3JjZV9kaXNjb25uZWN0KHhwcnQpOw0KPiAgfQ0KPiAgDQo+IC1z dGF0aWMgaW5saW5lIHZvaWQgeHNfdGNwX3JlYWRfZnJhZ2hkcihzdHJ1Y3QgcnBjX3hwcnQgKnhw cnQsIHN0cnVjdA0KPiB4ZHJfc2tiX3JlYWRlciAqZGVzYykNCj4gLXsNCj4gLQlzdHJ1Y3Qgc29j a194cHJ0ICp0cmFuc3BvcnQgPSBjb250YWluZXJfb2YoeHBydCwgc3RydWN0DQo+IHNvY2tfeHBy dCwgeHBydCk7DQo+IC0Jc2l6ZV90IGxlbiwgdXNlZDsNCj4gLQljaGFyICpwOw0KPiAtDQo+IC0J cCA9ICgoY2hhciAqKSAmdHJhbnNwb3J0LT5yZWN2LmZyYWdoZHIpICsgdHJhbnNwb3J0LQ0KPiA+ cmVjdi5vZmZzZXQ7DQo+IC0JbGVuID0gc2l6ZW9mKHRyYW5zcG9ydC0+cmVjdi5mcmFnaGRyKSAt IHRyYW5zcG9ydC0+cmVjdi5vZmZzZXQ7DQo+IC0JdXNlZCA9IHhkcl9za2JfcmVhZF9iaXRzKGRl c2MsIHAsIGxlbik7DQo+IC0JdHJhbnNwb3J0LT5yZWN2Lm9mZnNldCArPSB1c2VkOw0KPiAtCWlm ICh1c2VkICE9IGxlbikNCj4gLQkJcmV0dXJuOw0KPiAtDQo+IC0JdHJhbnNwb3J0LT5yZWN2Lmxl biA9IG50b2hsKHRyYW5zcG9ydC0+cmVjdi5mcmFnaGRyKTsNCj4gLQlpZiAodHJhbnNwb3J0LT5y ZWN2LmxlbiAmIFJQQ19MQVNUX1NUUkVBTV9GUkFHTUVOVCkNCj4gLQkJdHJhbnNwb3J0LT5yZWN2 LmZsYWdzIHw9IFRDUF9SQ1ZfTEFTVF9GUkFHOw0KPiAtCWVsc2UNCj4gLQkJdHJhbnNwb3J0LT5y ZWN2LmZsYWdzICY9IH5UQ1BfUkNWX0xBU1RfRlJBRzsNCj4gLQl0cmFuc3BvcnQtPnJlY3YubGVu ICY9IFJQQ19GUkFHTUVOVF9TSVpFX01BU0s7DQo+IC0NCj4gLQl0cmFuc3BvcnQtPnJlY3YuZmxh Z3MgJj0gflRDUF9SQ1ZfQ09QWV9GUkFHSERSOw0KPiAtCXRyYW5zcG9ydC0+cmVjdi5vZmZzZXQg PSAwOw0KPiAtDQo+IC0JLyogU2FuaXR5IGNoZWNrIG9mIHRoZSByZWNvcmQgbGVuZ3RoICovDQo+ IC0JaWYgKHVubGlrZWx5KHRyYW5zcG9ydC0+cmVjdi5sZW4gPCA4KSkgew0KPiAtCQlkcHJpbnRr 
KCJSUEM6ICAgICAgIGludmFsaWQgVENQIHJlY29yZCBmcmFnbWVudA0KPiBsZW5ndGhcbiIpOw0K PiAtCQl4c190Y3BfZm9yY2VfY2xvc2UoeHBydCk7DQo+IC0JCXJldHVybjsNCj4gLQl9DQo+IC0J ZHByaW50aygiUlBDOiAgICAgICByZWFkaW5nIFRDUCByZWNvcmQgZnJhZ21lbnQgb2YgbGVuZ3Ro DQo+ICVkXG4iLA0KPiAtCQkJdHJhbnNwb3J0LT5yZWN2Lmxlbik7DQo+IC19DQo+IC0NCj4gLXN0 YXRpYyB2b2lkIHhzX3RjcF9jaGVja19mcmFnaGRyKHN0cnVjdCBzb2NrX3hwcnQgKnRyYW5zcG9y dCkNCj4gLXsNCj4gLQlpZiAodHJhbnNwb3J0LT5yZWN2Lm9mZnNldCA9PSB0cmFuc3BvcnQtPnJl Y3YubGVuKSB7DQo+IC0JCXRyYW5zcG9ydC0+cmVjdi5mbGFncyB8PSBUQ1BfUkNWX0NPUFlfRlJB R0hEUjsNCj4gLQkJdHJhbnNwb3J0LT5yZWN2Lm9mZnNldCA9IDA7DQo+IC0JCWlmICh0cmFuc3Bv cnQtPnJlY3YuZmxhZ3MgJiBUQ1BfUkNWX0xBU1RfRlJBRykgew0KPiAtCQkJdHJhbnNwb3J0LT5y ZWN2LmZsYWdzICY9IH5UQ1BfUkNWX0NPUFlfREFUQTsNCj4gLQkJCXRyYW5zcG9ydC0+cmVjdi5m bGFncyB8PSBUQ1BfUkNWX0NPUFlfWElEOw0KPiAtCQkJdHJhbnNwb3J0LT5yZWN2LmNvcGllZCA9 IDA7DQo+IC0JCX0NCj4gLQl9DQo+IC19DQo+IC0NCj4gLXN0YXRpYyBpbmxpbmUgdm9pZCB4c190 Y3BfcmVhZF94aWQoc3RydWN0IHNvY2tfeHBydCAqdHJhbnNwb3J0LA0KPiBzdHJ1Y3QgeGRyX3Nr Yl9yZWFkZXIgKmRlc2MpDQo+IC17DQo+IC0Jc2l6ZV90IGxlbiwgdXNlZDsNCj4gLQljaGFyICpw Ow0KPiAtDQo+IC0JbGVuID0gc2l6ZW9mKHRyYW5zcG9ydC0+cmVjdi54aWQpIC0gdHJhbnNwb3J0 LT5yZWN2Lm9mZnNldDsNCj4gLQlkcHJpbnRrKCJSUEM6ICAgICAgIHJlYWRpbmcgWElEICglenUg Ynl0ZXMpXG4iLCBsZW4pOw0KPiAtCXAgPSAoKGNoYXIgKikgJnRyYW5zcG9ydC0+cmVjdi54aWQp ICsgdHJhbnNwb3J0LT5yZWN2Lm9mZnNldDsNCj4gLQl1c2VkID0geGRyX3NrYl9yZWFkX2JpdHMo ZGVzYywgcCwgbGVuKTsNCj4gLQl0cmFuc3BvcnQtPnJlY3Yub2Zmc2V0ICs9IHVzZWQ7DQo+IC0J aWYgKHVzZWQgIT0gbGVuKQ0KPiAtCQlyZXR1cm47DQo+IC0JdHJhbnNwb3J0LT5yZWN2LmZsYWdz ICY9IH5UQ1BfUkNWX0NPUFlfWElEOw0KPiAtCXRyYW5zcG9ydC0+cmVjdi5mbGFncyB8PSBUQ1Bf UkNWX1JFQURfQ0FMTERJUjsNCj4gLQl0cmFuc3BvcnQtPnJlY3YuY29waWVkID0gNDsNCj4gLQlk cHJpbnRrKCJSUEM6ICAgICAgIHJlYWRpbmcgJXMgWElEICUwOHhcbiIsDQo+IC0JCQkodHJhbnNw b3J0LT5yZWN2LmZsYWdzICYgVENQX1JQQ19SRVBMWSkgPw0KPiAicmVwbHkgZm9yIg0KPiAtCQkJ CQkJCSAgICAgIDoNCj4gInJlcXVlc3Qgd2l0aCIsDQo+IC0JCQludG9obCh0cmFuc3BvcnQtPnJl 
Y3YueGlkKSk7DQo+IC0JeHNfdGNwX2NoZWNrX2ZyYWdoZHIodHJhbnNwb3J0KTsNCj4gLX0NCj4g LQ0KPiAtc3RhdGljIGlubGluZSB2b2lkIHhzX3RjcF9yZWFkX2NhbGxkaXIoc3RydWN0IHNvY2tf eHBydCAqdHJhbnNwb3J0LA0KPiAtCQkJCSAgICAgICBzdHJ1Y3QgeGRyX3NrYl9yZWFkZXIgKmRl c2MpDQo+IC17DQo+IC0Jc2l6ZV90IGxlbiwgdXNlZDsNCj4gLQl1MzIgb2Zmc2V0Ow0KPiAtCWNo YXIgKnA7DQo+IC0NCj4gLQkvKg0KPiAtCSAqIFdlIHdhbnQgdHJhbnNwb3J0LT5yZWN2Lm9mZnNl dCB0byBiZSA4IGF0IHRoZSBlbmQgb2YgdGhpcw0KPiByb3V0aW5lDQo+IC0JICogKDQgYnl0ZXMg Zm9yIHRoZSB4aWQgYW5kIDQgYnl0ZXMgZm9yIHRoZSBjYWxsL3JlcGx5IGZsYWcpLg0KPiAtCSAq IFdoZW4gdGhpcyBmdW5jdGlvbiBpcyBjYWxsZWQgZm9yIHRoZSBmaXJzdCB0aW1lLA0KPiAtCSAq IHRyYW5zcG9ydC0+cmVjdi5vZmZzZXQgaXMgNCAoYWZ0ZXIgaGF2aW5nIGFscmVhZHkgcmVhZCB0 aGUNCj4geGlkKS4NCj4gLQkgKi8NCj4gLQlvZmZzZXQgPSB0cmFuc3BvcnQtPnJlY3Yub2Zmc2V0 IC0gc2l6ZW9mKHRyYW5zcG9ydC0+cmVjdi54aWQpOw0KPiAtCWxlbiA9IHNpemVvZih0cmFuc3Bv cnQtPnJlY3YuY2FsbGRpcikgLSBvZmZzZXQ7DQo+IC0JZHByaW50aygiUlBDOiAgICAgICByZWFk aW5nIENBTEwvUkVQTFkgZmxhZyAoJXp1IGJ5dGVzKVxuIiwNCj4gbGVuKTsNCj4gLQlwID0gKChj aGFyICopICZ0cmFuc3BvcnQtPnJlY3YuY2FsbGRpcikgKyBvZmZzZXQ7DQo+IC0JdXNlZCA9IHhk cl9za2JfcmVhZF9iaXRzKGRlc2MsIHAsIGxlbik7DQo+IC0JdHJhbnNwb3J0LT5yZWN2Lm9mZnNl dCArPSB1c2VkOw0KPiAtCWlmICh1c2VkICE9IGxlbikNCj4gLQkJcmV0dXJuOw0KPiAtCXRyYW5z cG9ydC0+cmVjdi5mbGFncyAmPSB+VENQX1JDVl9SRUFEX0NBTExESVI7DQo+IC0JLyoNCj4gLQkg KiBXZSBkb24ndCB5ZXQgaGF2ZSB0aGUgWERSIGJ1ZmZlciwgc28gd2Ugd2lsbCB3cml0ZSB0aGUN Cj4gY2FsbGRpcg0KPiAtCSAqIG91dCBhZnRlciB3ZSBnZXQgdGhlIGJ1ZmZlciBmcm9tIHRoZSAn c3RydWN0IHJwY19ycXN0Jw0KPiAtCSAqLw0KPiAtCXN3aXRjaCAobnRvaGwodHJhbnNwb3J0LT5y ZWN2LmNhbGxkaXIpKSB7DQo+IC0JY2FzZSBSUENfUkVQTFk6DQo+IC0JCXRyYW5zcG9ydC0+cmVj di5mbGFncyB8PSBUQ1BfUkNWX0NPUFlfQ0FMTERJUjsNCj4gLQkJdHJhbnNwb3J0LT5yZWN2LmZs YWdzIHw9IFRDUF9SQ1ZfQ09QWV9EQVRBOw0KPiAtCQl0cmFuc3BvcnQtPnJlY3YuZmxhZ3MgfD0g VENQX1JQQ19SRVBMWTsNCj4gLQkJYnJlYWs7DQo+IC0JY2FzZSBSUENfQ0FMTDoNCj4gLQkJdHJh bnNwb3J0LT5yZWN2LmZsYWdzIHw9IFRDUF9SQ1ZfQ09QWV9DQUxMRElSOw0KPiAtCQl0cmFuc3Bv 
cnQtPnJlY3YuZmxhZ3MgfD0gVENQX1JDVl9DT1BZX0RBVEE7DQo+IC0JCXRyYW5zcG9ydC0+cmVj di5mbGFncyAmPSB+VENQX1JQQ19SRVBMWTsNCj4gLQkJYnJlYWs7DQo+IC0JZGVmYXVsdDoNCj4g LQkJZHByaW50aygiUlBDOiAgICAgICBpbnZhbGlkIHJlcXVlc3QgbWVzc2FnZSB0eXBlXG4iKTsN Cj4gLQkJeHNfdGNwX2ZvcmNlX2Nsb3NlKCZ0cmFuc3BvcnQtPnhwcnQpOw0KPiAtCX0NCj4gLQl4 c190Y3BfY2hlY2tfZnJhZ2hkcih0cmFuc3BvcnQpOw0KPiAtfQ0KPiAtDQo+IC1zdGF0aWMgaW5s aW5lIHZvaWQgeHNfdGNwX3JlYWRfY29tbW9uKHN0cnVjdCBycGNfeHBydCAqeHBydCwNCj4gLQkJ CQkgICAgIHN0cnVjdCB4ZHJfc2tiX3JlYWRlciAqZGVzYywNCj4gLQkJCQkgICAgIHN0cnVjdCBy cGNfcnFzdCAqcmVxKQ0KPiAtew0KPiAtCXN0cnVjdCBzb2NrX3hwcnQgKnRyYW5zcG9ydCA9DQo+ IC0JCQkJY29udGFpbmVyX29mKHhwcnQsIHN0cnVjdCBzb2NrX3hwcnQsDQo+IHhwcnQpOw0KPiAt CXN0cnVjdCB4ZHJfYnVmICpyY3ZidWY7DQo+IC0Jc2l6ZV90IGxlbjsNCj4gLQlzc2l6ZV90IHI7 DQo+IC0NCj4gLQlyY3ZidWYgPSAmcmVxLT5ycV9wcml2YXRlX2J1ZjsNCj4gLQ0KPiAtCWlmICh0 cmFuc3BvcnQtPnJlY3YuZmxhZ3MgJiBUQ1BfUkNWX0NPUFlfQ0FMTERJUikgew0KPiAtCQkvKg0K PiAtCQkgKiBTYXZlIHRoZSBSUEMgZGlyZWN0aW9uIGluIHRoZSBYRFIgYnVmZmVyDQo+IC0JCSAq Lw0KPiAtCQltZW1jcHkocmN2YnVmLT5oZWFkWzBdLmlvdl9iYXNlICsgdHJhbnNwb3J0LQ0KPiA+ cmVjdi5jb3BpZWQsDQo+IC0JCQkmdHJhbnNwb3J0LT5yZWN2LmNhbGxkaXIsDQo+IC0JCQlzaXpl b2YodHJhbnNwb3J0LT5yZWN2LmNhbGxkaXIpKTsNCj4gLQkJdHJhbnNwb3J0LT5yZWN2LmNvcGll ZCArPSBzaXplb2YodHJhbnNwb3J0LQ0KPiA+cmVjdi5jYWxsZGlyKTsNCj4gLQkJdHJhbnNwb3J0 LT5yZWN2LmZsYWdzICY9IH5UQ1BfUkNWX0NPUFlfQ0FMTERJUjsNCj4gLQl9DQo+IC0NCj4gLQls ZW4gPSBkZXNjLT5jb3VudDsNCj4gLQlpZiAobGVuID4gdHJhbnNwb3J0LT5yZWN2LmxlbiAtIHRy YW5zcG9ydC0+cmVjdi5vZmZzZXQpDQo+IC0JCWRlc2MtPmNvdW50ID0gdHJhbnNwb3J0LT5yZWN2 LmxlbiAtIHRyYW5zcG9ydC0NCj4gPnJlY3Yub2Zmc2V0Ow0KPiAtCXIgPSB4ZHJfcGFydGlhbF9j b3B5X2Zyb21fc2tiKHJjdmJ1ZiwgdHJhbnNwb3J0LT5yZWN2LmNvcGllZCwNCj4gLQkJCQkJICBk ZXNjLCB4ZHJfc2tiX3JlYWRfYml0cyk7DQo+IC0NCj4gLQlpZiAoZGVzYy0+Y291bnQpIHsNCj4g LQkJLyogRXJyb3Igd2hlbiBjb3B5aW5nIHRvIHRoZSByZWNlaXZlIGJ1ZmZlciwNCj4gLQkJICog dXN1YWxseSBiZWNhdXNlIHdlIHdlcmVuJ3QgYWJsZSB0byBhbGxvY2F0ZQ0KPiAtCQkgKiBhZGRp 
dGlvbmFsIGJ1ZmZlciBwYWdlcy4gQWxsIHdlIGNhbiBkbyBub3cNCj4gLQkJICogaXMgdHVybiBv ZmYgVENQX1JDVl9DT1BZX0RBVEEsIHNvIHRoZSByZXF1ZXN0DQo+IC0JCSAqIHdpbGwgbm90IHJl Y2VpdmUgYW55IGFkZGl0aW9uYWwgdXBkYXRlcywNCj4gLQkJICogYW5kIHRpbWUgb3V0Lg0KPiAt CQkgKiBBbnkgcmVtYWluaW5nIGRhdGEgZnJvbSB0aGlzIHJlY29yZCB3aWxsDQo+IC0JCSAqIGJl IGRpc2NhcmRlZC4NCj4gLQkJICovDQo+IC0JCXRyYW5zcG9ydC0+cmVjdi5mbGFncyAmPSB+VENQ X1JDVl9DT1BZX0RBVEE7DQo+IC0JCWRwcmludGsoIlJQQzogICAgICAgWElEICUwOHggdHJ1bmNh dGVkIHJlcXVlc3RcbiIsDQo+IC0JCQkJbnRvaGwodHJhbnNwb3J0LT5yZWN2LnhpZCkpOw0KPiAt CQlkcHJpbnRrKCJSUEM6ICAgICAgIHhwcnQgPSAlcCwgcmVjdi5jb3BpZWQgPSAlbHUsICINCj4g LQkJCQkicmVjdi5vZmZzZXQgPSAldSwgcmVjdi5sZW4gPSAldVxuIiwNCj4gLQkJCQl4cHJ0LCB0 cmFuc3BvcnQtPnJlY3YuY29waWVkLA0KPiAtCQkJCXRyYW5zcG9ydC0+cmVjdi5vZmZzZXQsIHRy YW5zcG9ydC0NCj4gPnJlY3YubGVuKTsNCj4gLQkJcmV0dXJuOw0KPiAtCX0NCj4gLQ0KPiAtCXRy YW5zcG9ydC0+cmVjdi5jb3BpZWQgKz0gcjsNCj4gLQl0cmFuc3BvcnQtPnJlY3Yub2Zmc2V0ICs9 IHI7DQo+IC0JZGVzYy0+Y291bnQgPSBsZW4gLSByOw0KPiAtDQo+IC0JZHByaW50aygiUlBDOiAg ICAgICBYSUQgJTA4eCByZWFkICV6ZCBieXRlc1xuIiwNCj4gLQkJCW50b2hsKHRyYW5zcG9ydC0+ cmVjdi54aWQpLCByKTsNCj4gLQlkcHJpbnRrKCJSUEM6ICAgICAgIHhwcnQgPSAlcCwgcmVjdi5j b3BpZWQgPSAlbHUsIHJlY3Yub2Zmc2V0ID0NCj4gJXUsICINCj4gLQkJCSJyZWN2LmxlbiA9ICV1 XG4iLCB4cHJ0LCB0cmFuc3BvcnQtDQo+ID5yZWN2LmNvcGllZCwNCj4gLQkJCXRyYW5zcG9ydC0+ cmVjdi5vZmZzZXQsIHRyYW5zcG9ydC0+cmVjdi5sZW4pOw0KPiAtDQo+IC0JaWYgKHRyYW5zcG9y dC0+cmVjdi5jb3BpZWQgPT0gcmVxLT5ycV9wcml2YXRlX2J1Zi5idWZsZW4pDQo+IC0JCXRyYW5z cG9ydC0+cmVjdi5mbGFncyAmPSB+VENQX1JDVl9DT1BZX0RBVEE7DQo+IC0JZWxzZSBpZiAodHJh bnNwb3J0LT5yZWN2Lm9mZnNldCA9PSB0cmFuc3BvcnQtPnJlY3YubGVuKSB7DQo+IC0JCWlmICh0 cmFuc3BvcnQtPnJlY3YuZmxhZ3MgJiBUQ1BfUkNWX0xBU1RfRlJBRykNCj4gLQkJCXRyYW5zcG9y dC0+cmVjdi5mbGFncyAmPSB+VENQX1JDVl9DT1BZX0RBVEE7DQo+IC0JfQ0KPiAtfQ0KPiAtDQo+ IC0vKg0KPiAtICogRmluZHMgdGhlIHJlcXVlc3QgY29ycmVzcG9uZGluZyB0byB0aGUgUlBDIHhp ZCBhbmQgaW52b2tlcyB0aGUNCj4gY29tbW9uDQo+IC0gKiB0Y3AgcmVhZCBjb2RlIHRvIHJlYWQg 
dGhlIGRhdGEuDQo+IC0gKi8NCj4gLXN0YXRpYyBpbmxpbmUgaW50IHhzX3RjcF9yZWFkX3JlcGx5 KHN0cnVjdCBycGNfeHBydCAqeHBydCwNCj4gLQkJCQkgICAgc3RydWN0IHhkcl9za2JfcmVhZGVy ICpkZXNjKQ0KPiAtew0KPiAtCXN0cnVjdCBzb2NrX3hwcnQgKnRyYW5zcG9ydCA9DQo+IC0JCQkJ Y29udGFpbmVyX29mKHhwcnQsIHN0cnVjdCBzb2NrX3hwcnQsDQo+IHhwcnQpOw0KPiAtCXN0cnVj dCBycGNfcnFzdCAqcmVxOw0KPiAtDQo+IC0JZHByaW50aygiUlBDOiAgICAgICByZWFkIHJlcGx5 IFhJRCAlMDh4XG4iLCBudG9obCh0cmFuc3BvcnQtDQo+ID5yZWN2LnhpZCkpOw0KPiAtDQo+IC0J LyogRmluZCBhbmQgbG9jayB0aGUgcmVxdWVzdCBjb3JyZXNwb25kaW5nIHRvIHRoaXMgeGlkICov DQo+IC0Jc3Bpbl9sb2NrKCZ4cHJ0LT5xdWV1ZV9sb2NrKTsNCj4gLQlyZXEgPSB4cHJ0X2xvb2t1 cF9ycXN0KHhwcnQsIHRyYW5zcG9ydC0+cmVjdi54aWQpOw0KPiAtCWlmICghcmVxKSB7DQo+IC0J CWRwcmludGsoIlJQQzogICAgICAgWElEICUwOHggcmVxdWVzdCBub3QgZm91bmQhXG4iLA0KPiAt CQkJCW50b2hsKHRyYW5zcG9ydC0+cmVjdi54aWQpKTsNCj4gLQkJc3Bpbl91bmxvY2soJnhwcnQt PnF1ZXVlX2xvY2spOw0KPiAtCQlyZXR1cm4gLTE7DQo+IC0JfQ0KPiAtCXhwcnRfcGluX3Jxc3Qo cmVxKTsNCj4gLQlzcGluX3VubG9jaygmeHBydC0+cXVldWVfbG9jayk7DQo+IC0NCj4gLQl4c190 Y3BfcmVhZF9jb21tb24oeHBydCwgZGVzYywgcmVxKTsNCj4gLQ0KPiAtCXNwaW5fbG9jaygmeHBy dC0+cXVldWVfbG9jayk7DQo+IC0JaWYgKCEodHJhbnNwb3J0LT5yZWN2LmZsYWdzICYgVENQX1JD Vl9DT1BZX0RBVEEpKQ0KPiAtCQl4cHJ0X2NvbXBsZXRlX3Jxc3QocmVxLT5ycV90YXNrLCB0cmFu c3BvcnQtDQo+ID5yZWN2LmNvcGllZCk7DQo+IC0JeHBydF91bnBpbl9ycXN0KHJlcSk7DQo+IC0J c3Bpbl91bmxvY2soJnhwcnQtPnF1ZXVlX2xvY2spOw0KPiAtCXJldHVybiAwOw0KPiAtfQ0KPiAt DQo+ICAjaWYgZGVmaW5lZChDT05GSUdfU1VOUlBDX0JBQ0tDSEFOTkVMKQ0KPiAtLyoNCj4gLSAq IE9idGFpbnMgYW4gcnBjX3Jxc3QgcHJldmlvdXNseSBhbGxvY2F0ZWQgYW5kIGludm9rZXMgdGhl IGNvbW1vbg0KPiAtICogdGNwIHJlYWQgY29kZSB0byByZWFkIHRoZSBkYXRhLiAgVGhlIHJlc3Vs dCBpcyBwbGFjZWQgaW4gdGhlDQo+IGNhbGxiYWNrDQo+IC0gKiBxdWV1ZS4NCj4gLSAqIElmIHdl J3JlIHVuYWJsZSB0byBvYnRhaW4gdGhlIHJwY19ycXN0IHdlIHNjaGVkdWxlIHRoZSBjbG9zaW5n IG9mDQo+IHRoZQ0KPiAtICogY29ubmVjdGlvbiBhbmQgcmV0dXJuIC0xLg0KPiAtICovDQo+IC1z dGF0aWMgaW50IHhzX3RjcF9yZWFkX2NhbGxiYWNrKHN0cnVjdCBycGNfeHBydCAqeHBydCwNCj4g 
LQkJCQkgICAgICAgc3RydWN0IHhkcl9za2JfcmVhZGVyICpkZXNjKQ0KPiAtew0KPiAtCXN0cnVj dCBzb2NrX3hwcnQgKnRyYW5zcG9ydCA9DQo+IC0JCQkJY29udGFpbmVyX29mKHhwcnQsIHN0cnVj dCBzb2NrX3hwcnQsDQo+IHhwcnQpOw0KPiAtCXN0cnVjdCBycGNfcnFzdCAqcmVxOw0KPiAtDQo+ IC0JLyogTG9vayB1cCB0aGUgcmVxdWVzdCBjb3JyZXNwb25kaW5nIHRvIHRoZSBnaXZlbiBYSUQg Ki8NCj4gLQlyZXEgPSB4cHJ0X2xvb2t1cF9iY19yZXF1ZXN0KHhwcnQsIHRyYW5zcG9ydC0+cmVj di54aWQpOw0KPiAtCWlmIChyZXEgPT0gTlVMTCkgew0KPiAtCQlwcmludGsoS0VSTl9XQVJOSU5H ICJDYWxsYmFjayBzbG90IHRhYmxlDQo+IG92ZXJmbG93ZWRcbiIpOw0KPiAtCQl4cHJ0X2ZvcmNl X2Rpc2Nvbm5lY3QoeHBydCk7DQo+IC0JCXJldHVybiAtMTsNCj4gLQl9DQo+IC0NCj4gLQlkcHJp bnRrKCJSUEM6ICAgICAgIHJlYWQgY2FsbGJhY2sgIFhJRCAlMDh4XG4iLCBudG9obChyZXEtDQo+ ID5ycV94aWQpKTsNCj4gLQl4c190Y3BfcmVhZF9jb21tb24oeHBydCwgZGVzYywgcmVxKTsNCj4g LQ0KPiAtCWlmICghKHRyYW5zcG9ydC0+cmVjdi5mbGFncyAmIFRDUF9SQ1ZfQ09QWV9EQVRBKSkN Cj4gLQkJeHBydF9jb21wbGV0ZV9iY19yZXF1ZXN0KHJlcSwgdHJhbnNwb3J0LT5yZWN2LmNvcGll ZCk7DQo+IC0NCj4gLQlyZXR1cm4gMDsNCj4gLX0NCj4gLQ0KPiAtc3RhdGljIGlubGluZSBpbnQg X3hzX3RjcF9yZWFkX2RhdGEoc3RydWN0IHJwY194cHJ0ICp4cHJ0LA0KPiAtCQkJCQlzdHJ1Y3Qg eGRyX3NrYl9yZWFkZXIgKmRlc2MpDQo+IC17DQo+IC0Jc3RydWN0IHNvY2tfeHBydCAqdHJhbnNw b3J0ID0NCj4gLQkJCQljb250YWluZXJfb2YoeHBydCwgc3RydWN0IHNvY2tfeHBydCwNCj4geHBy dCk7DQo+IC0NCj4gLQlyZXR1cm4gKHRyYW5zcG9ydC0+cmVjdi5mbGFncyAmIFRDUF9SUENfUkVQ TFkpID8NCj4gLQkJeHNfdGNwX3JlYWRfcmVwbHkoeHBydCwgZGVzYykgOg0KPiAtCQl4c190Y3Bf cmVhZF9jYWxsYmFjayh4cHJ0LCBkZXNjKTsNCj4gLX0NCj4gLQ0KPiAgc3RhdGljIGludCB4c190 Y3BfYmNfdXAoc3RydWN0IHN2Y19zZXJ2ICpzZXJ2LCBzdHJ1Y3QgbmV0ICpuZXQpDQo+ICB7DQo+ ICAJaW50IHJldDsNCj4gQEAgLTE0MjksMTA2ICsxNDkzLDE0IEBAIHN0YXRpYyBzaXplX3QgeHNf dGNwX2JjX21heHBheWxvYWQoc3RydWN0DQo+IHJwY194cHJ0ICp4cHJ0KQ0KPiAgew0KPiAgCXJl dHVybiBQQUdFX1NJWkU7DQo+ICB9DQo+IC0jZWxzZQ0KPiAtc3RhdGljIGlubGluZSBpbnQgX3hz X3RjcF9yZWFkX2RhdGEoc3RydWN0IHJwY194cHJ0ICp4cHJ0LA0KPiAtCQkJCQlzdHJ1Y3QgeGRy X3NrYl9yZWFkZXIgKmRlc2MpDQo+IC17DQo+IC0JcmV0dXJuIHhzX3RjcF9yZWFkX3JlcGx5KHhw 
cnQsIGRlc2MpOw0KPiAtfQ0KPiAgI2VuZGlmIC8qIENPTkZJR19TVU5SUENfQkFDS0NIQU5ORUwg Ki8NCj4gIA0KPiAtLyoNCj4gLSAqIFJlYWQgZGF0YSBvZmYgdGhlIHRyYW5zcG9ydC4gIFRoaXMg Y2FuIGJlIGVpdGhlciBhbiBSUENfQ0FMTCBvcg0KPiBhbg0KPiAtICogUlBDX1JFUExZLiAgUmVs YXkgdGhlIHByb2Nlc3NpbmcgdG8gaGVscGVyIGZ1bmN0aW9ucy4NCj4gLSAqLw0KPiAtc3RhdGlj IHZvaWQgeHNfdGNwX3JlYWRfZGF0YShzdHJ1Y3QgcnBjX3hwcnQgKnhwcnQsDQo+IC0JCQkJICAg IHN0cnVjdCB4ZHJfc2tiX3JlYWRlciAqZGVzYykNCj4gLXsNCj4gLQlzdHJ1Y3Qgc29ja194cHJ0 ICp0cmFuc3BvcnQgPQ0KPiAtCQkJCWNvbnRhaW5lcl9vZih4cHJ0LCBzdHJ1Y3Qgc29ja194cHJ0 LA0KPiB4cHJ0KTsNCj4gLQ0KPiAtCWlmIChfeHNfdGNwX3JlYWRfZGF0YSh4cHJ0LCBkZXNjKSA9 PSAwKQ0KPiAtCQl4c190Y3BfY2hlY2tfZnJhZ2hkcih0cmFuc3BvcnQpOw0KPiAtCWVsc2Ugew0K PiAtCQkvKg0KPiAtCQkgKiBUaGUgdHJhbnNwb3J0X2xvY2sgcHJvdGVjdHMgdGhlIHJlcXVlc3Qg aGFuZGxpbmcuDQo+IC0JCSAqIFRoZXJlJ3Mgbm8gbmVlZCB0byBob2xkIGl0IHRvIHVwZGF0ZSB0 aGUgcmVjdi5mbGFncy4NCj4gLQkJICovDQo+IC0JCXRyYW5zcG9ydC0+cmVjdi5mbGFncyAmPSB+ VENQX1JDVl9DT1BZX0RBVEE7DQo+IC0JfQ0KPiAtfQ0KPiAtDQo+IC1zdGF0aWMgaW5saW5lIHZv aWQgeHNfdGNwX3JlYWRfZGlzY2FyZChzdHJ1Y3Qgc29ja194cHJ0ICp0cmFuc3BvcnQsDQo+IHN0 cnVjdCB4ZHJfc2tiX3JlYWRlciAqZGVzYykNCj4gLXsNCj4gLQlzaXplX3QgbGVuOw0KPiAtDQo+ IC0JbGVuID0gdHJhbnNwb3J0LT5yZWN2LmxlbiAtIHRyYW5zcG9ydC0+cmVjdi5vZmZzZXQ7DQo+ IC0JaWYgKGxlbiA+IGRlc2MtPmNvdW50KQ0KPiAtCQlsZW4gPSBkZXNjLT5jb3VudDsNCj4gLQlk ZXNjLT5jb3VudCAtPSBsZW47DQo+IC0JZGVzYy0+b2Zmc2V0ICs9IGxlbjsNCj4gLQl0cmFuc3Bv cnQtPnJlY3Yub2Zmc2V0ICs9IGxlbjsNCj4gLQlkcHJpbnRrKCJSUEM6ICAgICAgIGRpc2NhcmRl ZCAlenUgYnl0ZXNcbiIsIGxlbik7DQo+IC0JeHNfdGNwX2NoZWNrX2ZyYWdoZHIodHJhbnNwb3J0 KTsNCj4gLX0NCj4gLQ0KPiAtc3RhdGljIGludCB4c190Y3BfZGF0YV9yZWN2KHJlYWRfZGVzY3Jp cHRvcl90ICpyZF9kZXNjLCBzdHJ1Y3QNCj4gc2tfYnVmZiAqc2tiLCB1bnNpZ25lZCBpbnQgb2Zm c2V0LCBzaXplX3QgbGVuKQ0KPiAtew0KPiAtCXN0cnVjdCBycGNfeHBydCAqeHBydCA9IHJkX2Rl c2MtPmFyZy5kYXRhOw0KPiAtCXN0cnVjdCBzb2NrX3hwcnQgKnRyYW5zcG9ydCA9IGNvbnRhaW5l cl9vZih4cHJ0LCBzdHJ1Y3QNCj4gc29ja194cHJ0LCB4cHJ0KTsNCj4gLQlzdHJ1Y3QgeGRyX3Nr 
Yl9yZWFkZXIgZGVzYyA9IHsNCj4gLQkJLnNrYgk9IHNrYiwNCj4gLQkJLm9mZnNldAk9IG9mZnNl dCwNCj4gLQkJLmNvdW50CT0gbGVuLA0KPiAtCX07DQo+IC0Jc2l6ZV90IHJldDsNCj4gLQ0KPiAt CWRwcmludGsoIlJQQzogICAgICAgeHNfdGNwX2RhdGFfcmVjdiBzdGFydGVkXG4iKTsNCj4gLQlk byB7DQo+IC0JCXRyYWNlX3hzX3RjcF9kYXRhX3JlY3YodHJhbnNwb3J0KTsNCj4gLQkJLyogUmVh ZCBpbiBhIG5ldyBmcmFnbWVudCBtYXJrZXIgaWYgbmVjZXNzYXJ5ICovDQo+IC0JCS8qIENhbiB3 ZSBldmVyIHJlYWxseSBleHBlY3QgdG8gZ2V0IGNvbXBsZXRlbHkgZW1wdHkNCj4gZnJhZ21lbnRz PyAqLw0KPiAtCQlpZiAodHJhbnNwb3J0LT5yZWN2LmZsYWdzICYgVENQX1JDVl9DT1BZX0ZSQUdI RFIpIHsNCj4gLQkJCXhzX3RjcF9yZWFkX2ZyYWdoZHIoeHBydCwgJmRlc2MpOw0KPiAtCQkJY29u dGludWU7DQo+IC0JCX0NCj4gLQkJLyogUmVhZCBpbiB0aGUgeGlkIGlmIG5lY2Vzc2FyeSAqLw0K PiAtCQlpZiAodHJhbnNwb3J0LT5yZWN2LmZsYWdzICYgVENQX1JDVl9DT1BZX1hJRCkgew0KPiAt CQkJeHNfdGNwX3JlYWRfeGlkKHRyYW5zcG9ydCwgJmRlc2MpOw0KPiAtCQkJY29udGludWU7DQo+ IC0JCX0NCj4gLQkJLyogUmVhZCBpbiB0aGUgY2FsbC9yZXBseSBmbGFnICovDQo+IC0JCWlmICh0 cmFuc3BvcnQtPnJlY3YuZmxhZ3MgJiBUQ1BfUkNWX1JFQURfQ0FMTERJUikgew0KPiAtCQkJeHNf dGNwX3JlYWRfY2FsbGRpcih0cmFuc3BvcnQsICZkZXNjKTsNCj4gLQkJCWNvbnRpbnVlOw0KPiAt CQl9DQo+IC0JCS8qIFJlYWQgaW4gdGhlIHJlcXVlc3QgZGF0YSAqLw0KPiAtCQlpZiAodHJhbnNw b3J0LT5yZWN2LmZsYWdzICYgVENQX1JDVl9DT1BZX0RBVEEpIHsNCj4gLQkJCXhzX3RjcF9yZWFk X2RhdGEoeHBydCwgJmRlc2MpOw0KPiAtCQkJY29udGludWU7DQo+IC0JCX0NCj4gLQkJLyogU2tp cCBvdmVyIGFueSB0cmFpbGluZyBieXRlcyBvbiBzaG9ydCByZWFkcyAqLw0KPiAtCQl4c190Y3Bf cmVhZF9kaXNjYXJkKHRyYW5zcG9ydCwgJmRlc2MpOw0KPiAtCX0gd2hpbGUgKGRlc2MuY291bnQp Ow0KPiAtCXJldCA9IGxlbiAtIGRlc2MuY291bnQ7DQo+IC0JaWYgKHJldCA8IHJkX2Rlc2MtPmNv dW50KQ0KPiAtCQlyZF9kZXNjLT5jb3VudCAtPSByZXQ7DQo+IC0JZWxzZQ0KPiAtCQlyZF9kZXNj LT5jb3VudCA9IDA7DQo+IC0JdHJhY2VfeHNfdGNwX2RhdGFfcmVjdih0cmFuc3BvcnQpOw0KPiAt CWRwcmludGsoIlJQQzogICAgICAgeHNfdGNwX2RhdGFfcmVjdiBkb25lXG4iKTsNCj4gLQlyZXR1 cm4gcmV0Ow0KPiAtfQ0KPiAtDQo+ICBzdGF0aWMgdm9pZCB4c190Y3BfZGF0YV9yZWNlaXZlKHN0 cnVjdCBzb2NrX3hwcnQgKnRyYW5zcG9ydCkNCj4gIHsNCj4gIAlzdHJ1Y3QgcnBjX3hwcnQgKnhw 
cnQgPSAmdHJhbnNwb3J0LT54cHJ0Ow0KPiAgCXN0cnVjdCBzb2NrICpzazsNCj4gLQlyZWFkX2Rl c2NyaXB0b3JfdCByZF9kZXNjID0gew0KPiAtCQkuYXJnLmRhdGEgPSB4cHJ0LA0KPiAtCX07DQo+ IC0JdW5zaWduZWQgbG9uZyB0b3RhbCA9IDA7DQo+IC0JaW50IHJlYWQgPSAwOw0KPiArCXNpemVf dCByZWFkID0gMDsNCj4gKwlzc2l6ZV90IHJldCA9IDA7DQo+ICANCj4gIHJlc3RhcnQ6DQo+ICAJ bXV0ZXhfbG9jaygmdHJhbnNwb3J0LT5yZWN2X211dGV4KTsNCj4gQEAgLTE1MzYsMTggKzE1MDgs MTIgQEAgc3RhdGljIHZvaWQgeHNfdGNwX2RhdGFfcmVjZWl2ZShzdHJ1Y3QNCj4gc29ja194cHJ0 ICp0cmFuc3BvcnQpDQo+ICAJaWYgKHNrID09IE5VTEwpDQo+ICAJCWdvdG8gb3V0Ow0KPiAgDQo+ IC0JLyogV2UgdXNlIHJkX2Rlc2MgdG8gcGFzcyBzdHJ1Y3QgeHBydCB0byB4c190Y3BfZGF0YV9y ZWN2ICovDQo+ICAJZm9yICg7Oykgew0KPiAtCQlyZF9kZXNjLmNvdW50ID0gUlBDX1RDUF9SRUFE X0NIVU5LX1NaOw0KPiAtCQlsb2NrX3NvY2soc2spOw0KPiAtCQlyZWFkID0gdGNwX3JlYWRfc29j ayhzaywgJnJkX2Rlc2MsIHhzX3RjcF9kYXRhX3JlY3YpOw0KPiAtCQlpZiAocmRfZGVzYy5jb3Vu dCAhPSAwIHx8IHJlYWQgPCAwKSB7DQo+IC0JCQljbGVhcl9iaXQoWFBSVF9TT0NLX0RBVEFfUkVB RFksICZ0cmFuc3BvcnQtDQo+ID5zb2NrX3N0YXRlKTsNCj4gLQkJCXJlbGVhc2Vfc29jayhzayk7 DQo+ICsJCWNsZWFyX2JpdChYUFJUX1NPQ0tfREFUQV9SRUFEWSwgJnRyYW5zcG9ydC0NCj4gPnNv Y2tfc3RhdGUpOw0KPiArCQlyZXQgPSB4c19yZWFkX3N0cmVhbSh0cmFuc3BvcnQsIE1TR19ET05U V0FJVCB8DQo+IE1TR19OT1NJR05BTCk7DQo+ICsJCWlmIChyZXQgPCAwKQ0KPiAgCQkJYnJlYWs7 DQo+IC0JCX0NCj4gLQkJcmVsZWFzZV9zb2NrKHNrKTsNCj4gLQkJdG90YWwgKz0gcmVhZDsNCj4g KwkJcmVhZCArPSByZXQ7DQo+ICAJCWlmIChuZWVkX3Jlc2NoZWQoKSkgew0KPiAgCQkJbXV0ZXhf dW5sb2NrKCZ0cmFuc3BvcnQtPnJlY3ZfbXV0ZXgpOw0KPiAgCQkJY29uZF9yZXNjaGVkKCk7DQo+ IEBAIC0xNTU4LDcgKzE1MjQsNyBAQCBzdGF0aWMgdm9pZCB4c190Y3BfZGF0YV9yZWNlaXZlKHN0 cnVjdA0KPiBzb2NrX3hwcnQgKnRyYW5zcG9ydCkNCj4gIAkJcXVldWVfd29yayh4cHJ0aW9kX3dv cmtxdWV1ZSwgJnRyYW5zcG9ydC0+cmVjdl93b3JrZXIpOw0KPiAgb3V0Og0KPiAgCW11dGV4X3Vu bG9jaygmdHJhbnNwb3J0LT5yZWN2X211dGV4KTsNCj4gLQl0cmFjZV94c190Y3BfZGF0YV9yZWFk eSh4cHJ0LCByZWFkLCB0b3RhbCk7DQo+ICsJdHJhY2VfeHNfdGNwX2RhdGFfcmVhZHkoeHBydCwg cmV0LCByZWFkKTsNCj4gIH0NCj4gIA0KPiAgc3RhdGljIHZvaWQgeHNfdGNwX2RhdGFfcmVjZWl2 
ZV93b3JrZm4oc3RydWN0IHdvcmtfc3RydWN0ICp3b3JrKQ0KPiBAQCAtMjM4MCw3ICsyMzQ2LDYg QEAgc3RhdGljIGludCB4c190Y3BfZmluaXNoX2Nvbm5lY3Rpbmcoc3RydWN0DQo+IHJwY194cHJ0 ICp4cHJ0LCBzdHJ1Y3Qgc29ja2V0ICpzb2NrKQ0KPiAgCXRyYW5zcG9ydC0+cmVjdi5vZmZzZXQg PSAwOw0KPiAgCXRyYW5zcG9ydC0+cmVjdi5sZW4gPSAwOw0KPiAgCXRyYW5zcG9ydC0+cmVjdi5j b3BpZWQgPSAwOw0KPiAtCXRyYW5zcG9ydC0+cmVjdi5mbGFncyA9IFRDUF9SQ1ZfQ09QWV9GUkFH SERSIHwNCj4gVENQX1JDVl9DT1BZX1hJRDsNCj4gIAl0cmFuc3BvcnQtPnhtaXQub2Zmc2V0ID0g MDsNCj4gIA0KPiAgCS8qIFRlbGwgdGhlIHNvY2tldCBsYXllciB0byBzdGFydCBjb25uZWN0aW5n Li4uICovDQo+IEBAIC0yODAyLDYgKzI3NjcsNyBAQCBzdGF0aWMgY29uc3Qgc3RydWN0IHJwY194 cHJ0X29wcyB4c190Y3Bfb3BzID0gew0KPiAgCS5jb25uZWN0CQk9IHhzX2Nvbm5lY3QsDQo+ICAJ LmJ1Zl9hbGxvYwkJPSBycGNfbWFsbG9jLA0KPiAgCS5idWZfZnJlZQkJPSBycGNfZnJlZSwNCj4g KwkucHJlcGFyZV9yZXF1ZXN0CT0geHNfc3RyZWFtX3ByZXBhcmVfcmVxdWVzdCwNCj4gIAkuc2Vu ZF9yZXF1ZXN0CQk9IHhzX3RjcF9zZW5kX3JlcXVlc3QsDQo+ICAJLnNldF9yZXRyYW5zX3RpbWVv dXQJPSB4cHJ0X3NldF9yZXRyYW5zX3RpbWVvdXRfZGVmLA0KPiAgCS5jbG9zZQkJCT0geHNfdGNw X3NodXRkb3duLA0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRh aW5lciwgSGFtbWVyc3BhY2UNCnRyb25kLm15a2xlYnVzdEBoYW1tZXJzcGFjZS5jb20NCg0KDQo= ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators
  @ 2018-11-09 11:19 UTC
From: Catalin Marinas
To: Trond Myklebust; +Cc: linux-nfs

Hi Trond,

On Mon, Sep 17, 2018 at 09:03:31AM -0400, Trond Myklebust wrote:
> Most of this code should also be reusable with other socket types.
>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  include/linux/sunrpc/xprtsock.h |  19 +-
>  include/trace/events/sunrpc.h   |  15 +-
>  net/sunrpc/xprtsock.c           | 694 +++++++++++++++-----------------
>  3 files changed, 335 insertions(+), 393 deletions(-)

With latest mainline (24ccea7e102d, which includes Al Viro's iov_iter
fixup) I started hitting some severe slowdowns and systemd timeouts with
nfsroot on arm64 machines (physical, or guests under KVM).
Interestingly, it only happens when the client kernel is configured with
64K pages; the 4K pages configuration runs fine. It also runs fine if I
add rsize=65536 to the nfsroot= argument.

Bisecting led me to commit 277e4ab7d530 ("SUNRPC: Simplify TCP receive
code by switching to using iterators"). Prior to this commit, it works
fine.

Some more info:

- defconfig with CONFIG_ARM64_64K_PAGES enabled

- kernel cmdline arg: nfsroot=<some-server>:/srv/nfs/debian-arm64,tcp,v4

- if it matters, the server is also an arm64 machine running 4.19 with
  a 4K pages configuration

I haven't figured out what's wrong or even how to debug this, as I'm not
familiar with the sunrpc code. Any suggestions?

Thanks.

-- 
Catalin
* Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators 2018-11-09 11:19 ` Catalin Marinas @ 2018-11-29 19:28 ` Cristian Marussi 2018-11-29 19:56 ` Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Cristian Marussi @ 2018-11-29 19:28 UTC (permalink / raw) To: Catalin Marinas, Trond Myklebust; +Cc: linux-nfs Hi Trond, Catalin On 09/11/2018 11:19, Catalin Marinas wrote: > Hi Trond, > > On Mon, Sep 17, 2018 at 09:03:31AM -0400, Trond Myklebust wrote: >> Most of this code should also be reusable with other socket types. >> >> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> >> --- >> include/linux/sunrpc/xprtsock.h | 19 +- >> include/trace/events/sunrpc.h | 15 +- >> net/sunrpc/xprtsock.c | 694 +++++++++++++++----------------- >> 3 files changed, 335 insertions(+), 393 deletions(-) > > With latest mainline (24ccea7e102d, it includes Al Viro's iov_iter > fixup) I started hitting some severe slowdown and systemd timeouts with > nfsroot on arm64 machines (physical or guests under KVM). Interestingly, > it only happens when the client kernel is configured with 64K pages, the > 4K pages configuration runs fine. It also runs fine if I add rsize=65536 > to the nfsroot= argument. > > Bisecting led me to commit 277e4ab7d530 ("SUNRPC: Simplify TCP receive > code by switching to using iterators"). Prior to this commit, it works > fine. > > Some more info: > > - defconfig with CONFIG_ARM64_64K_PAGES enabled > > - kernel cmdline arg: nfsroot=<some-server>:/srv/nfs/debian-arm64,tcp,v4 > > - if it matters, the server is also an arm64 machine running 4.19 with > 4K pages configuration > > I haven't figured out what's wrong or even how to debug this as I'm not > familiar with the sunrpc code. Any suggestion? > > Thanks. > I've done a bit of experiments/observations with this since it was seriously impacting all form of testing on arm64 with a 64K pages configuration. 
I can confirm rsize=65536 workaround above mentioned by Catalin is effective also for me, as it is to reset back before the commit mentioned in the subject. In the following I tested instead with: - linus arm64 v4.20-rc1 64K pages + "Debug Lockups and Hangs" Enabled - hw Juno-r2 - fully NFS mounted rootfs (Debian 9) - NO rsize workaround - NFS Client config(nfsstat -m) Flags: rw,relatime,vers=4.0,rsize=4096,wsize=4096,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.1,local_lock=none,addr=192.168.0.254 Observations: 1. despite some general boot slow down (not so evident in my setup), I hit the issue when simply trying to launch LTP or LKP tests (all is in NFS mounted rootfs): the immediately observable behavior is just that the application gets 'apparently' stuck straight away (NO output whatsoever). Waiting some seconds yields no progress or result NOR any Lockup is detected by Kernel. A good deal of effort is needed to kill the process at this point...BUT it is feasible (many SIGSTOP + KILL)...and the system is back alive. 2. running again LKP via 'strace' we can observe the process apparently starting fine but then suddenly hanging randomly multiple times: at first on an execve() and then on some read() while trying to load its own file components; each hang lasts 30-45 seconds approximately. In LKP as an example: $ strace lkp run ./dbench-100%.yaml .... newfstatat(AT_FDCWD, "/opt/lkp-tests/bin/run-local", {st_mode=S_IFREG|0755, st_size=4367, ...}, 0) = 0 faccessat(AT_FDCWD, "/opt/lkp-tests/bin/run-local", X_OK) = 0 execve("/opt/lkp-tests/bin/run-local", ["/opt/lkp-tests/bin/run-local", "./dbench-100%.yaml"], [/* 12 vars */] <<< HANGGG ... ... 30-40 secs.. .... 
openat(AT_FDCWD, "/usr/lib/ruby/2.3.0/rubygems.rb", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7 fstat(7, {st_mode=S_IFREG|0644, st_size=33018, ...}) = 0 close(7) = 0 getuid() = 0 geteuid() = 0 getgid() = 0 getegid() = 0 oopenat(AT_FDCWD, "/usr/lib/ruby/2.3.0/rubygems.rb", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7 fcntl(7, F_SETFL, O_RDONLY) = 0 fstat(7, {st_mode=S_IFREG|0644, st_size=33018, ...}) = 0 fstat(7, {st_mode=S_IFREG|0644, st_size=33018, ...}) = 0 ioctl(7, TCGETS, 0xffffdd0895d8) = -1 ENOTTY (Inappropriate ioctl for device) read(7, <<<HANGGSSS .... ~30-40 secs .... "# frozen_string_literal: true\n# "..., 8192) = 8192 read(7, ") as the standard configuration "..., 8192) = 8192 brk(0xaaaaeea70000) = 0xaaaaeea70000 read(7, "ady been required, then we have "..., 8192) = 8192 read(7, "lf.user_home\n @user_home ||= "..., 8192) = 8192 read(7, " require \"rubygems/defaults/#"..., 8192) = 250 read(7, "", 8192) = 0 close(7) = 0 .... .... openat(AT_FDCWD, "/usr/lib/ruby/2.3.0/rubygems/specification.rb", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7 fstat(7, {st_mode=S_IFREG|0644, st_size=81998, ...}) = 0 close(7) = 0 getuid() = 0 geteuid() = 0 getgid() = 0 getegid() = 0 openat(AT_FDCWD, "/usr/lib/ruby/2.3.0/rubygems/specification.rb", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7 fcntl(7, F_SETFL, O_RDONLY) = 0 fstat(7, {st_mode=S_IFREG|0644, st_size=81998, ...}) = 0 fstat(7, {st_mode=S_IFREG|0644, st_size=81998, ...}) = 0 ioctl(7, TCGETS, 0xffffef390b38) = -1 ENOTTY (Inappropriate ioctl for device) read(7, "# -*- coding: utf-8 -*-\n# frozen"..., 8192) = 8192 read(7, "e platform attribute appropriate"..., 8192) = 8192 read(7, "sealicense.com/.\n #\n # You sho"..., 8192) = 8192 read(7, " TODO: find all extraneous adds\n"..., 8192) = 8192 read(7, "rn a list of all outdated local "..., 8192) = 8192 read(7, " = authors.collect { |a"..., 8192) = 8192 read(7, "ends on.\n #\n # Use #add_depend"..., 8192) = 8192 read(7, "ns true you\n # probably want to"..., 8192) = 8192 read(7, <<<< HANGGGGG Note that this last 
hang happens halfway through a file read!

3. Having a look at the underlying network traffic with Wireshark, I could in
fact see that the NFS packets stop flowing completely for 30-40s when all of the
above happens... but I cannot see any error or timeout or NFS retries. The same
happened when I tried to reduce the NFS timeo to 150 (15 secs) from the original
600 (60 secs). This lack of retries was confirmed by the stats:

root@sqwt-ubuntu:/opt/lkp-tests# nfsstat -r
Client rpc stats:
calls      retrans    authrefrsh
16008      0          16019

The only notable thing is a routine TCP KeepAlive sent by the NFS client TCP
stack during the 30/40 secs quiet window: NFS data flow restarts anyway another
10/12 secs after the KeepAlive is sent and ACKed by the server, so it does not
seem to be the trigger for the restart itself.

4. Waiting forever (minutes), I was finally able to see LKP completing
initialization and the dbench test being run. Below you can see the results with
and without the rsize workaround:

DBENCH SANE RESULTS WITH WORKAROUND RSIZE=65536
------------------------------------------------
...
  6    122759     5.62 MB/sec  execute 599 sec  latency 84.304 ms
  6  cleanup 600 sec
  0  cleanup 600 sec

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX     108397    11.100   358.921
 Close          79762    16.243   406.213
 Rename          4582    19.399   259.180
 Unlink         21728     3.610   190.564
 Qpathinfo      98367     1.773   289.554
 Qfileinfo      17163     9.917   232.687
 Qfsinfo        17903     2.130   216.804
 Sfileinfo       8828    17.427   234.069
 Find           37915     3.478   287.326
 WriteX         53503     0.048     2.992
 ReadX         169707     0.592   199.341
 LockX            350    13.536   242.800
 UnlockX          350     2.801   124.317
 Flush           7548    20.248   229.864

Throughput 5.61504 MB/sec  6 clients  6 procs  max_latency=406.225 ms

DBENCH RESULTS WITHOUT WORKAROUND
---------------------------------
...
  6    111273     5.06 MB/sec  execute 599 sec  latency 4066.072 ms
  6  cleanup 600 sec
  3  cleanup 601 sec
  0  cleanup 601 sec

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX      97674    12.244  13773.786
 Close          71583    19.126  13774.836
 Rename          4135    19.177    277.171
 Unlink         19881     4.286  12842.441
 Qpathinfo      88303     1.942  12923.151
 Qfileinfo      15305     9.636    203.481
 Qfsinfo        16305     1.801    227.405
 Sfileinfo       7960    15.871    186.799
 Find           34164     3.409    255.098
 WriteX         48105     0.053      5.428
 ReadX         152460     0.926  13759.913
 LockX            314     7.562     53.131
 UnlockX          314     1.847     47.083
 Flush           6872    19.222    200.180

Throughput 5.06232 MB/sec  6 clients  6 procs  max_latency=13774.850 ms

Then I also tried running:

./nfstest_io -d /mnt/t/data -v all -n 10 -r 3600

with WORKAROUND it took:  INFO: 2018-11-28 16:31:25.031280 TIME: 449 secs
WITHOUT:                  INFO: 2018-11-28 17:55:39.438688 TIME: 1348 secs

5. All of the above 'slowness' disappeared when I re-ran the same tests a second
time, probably because NFS had cached everything locally during the first run.

6. Reboot hangs similarly.

The fact that the traffic stops without triggering any NFS timeo and retry makes
me think that it is the egressing NFS RPC requests themselves that get stuck
somehow (but I'm far from being an NFS expert).

Any ideas or thoughts? Additional sensible test cases to run? Or hints on where
to look inside the NFS code, Trond? (I'll re-test with a newer 4.20 RC and try
to ftrace something... in the next days)

Thanks

Cristian

^ permalink raw reply [flat|nested] 76+ messages in thread
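Putting the two dbench runs side by side, the scale of the regression can be
quantified directly from the figures quoted in this mail (a small illustrative
calculation by the editor, not part of the original report):

```python
# dbench max_latency (ms) from the two runs quoted above
with_workaround = 406.225       # rsize=65536
without_workaround = 13774.850

print(f"max_latency ratio: ~{without_workaround / with_workaround:.0f}x")

# nfstest_io wall-clock times (secs) from the same mail
print(f"nfstest_io ratio:  ~{1348 / 449:.1f}x")
```

That is, worst-case operation latency is roughly 34x worse and the nfstest_io
run roughly 3x slower without the rsize workaround, even though average
throughput only drops from 5.62 to 5.06 MB/sec.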
* Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators 2018-11-29 19:28 ` Cristian Marussi @ 2018-11-29 19:56 ` Trond Myklebust 2018-11-30 16:19 ` Cristian Marussi 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-11-29 19:56 UTC (permalink / raw) To: cristian.marussi, catalin.marinas; +Cc: linux-nfs On Thu, 2018-11-29 at 19:28 +0000, Cristian Marussi wrote: > Hi Trond, Catalin > > On 09/11/2018 11:19, Catalin Marinas wrote: > > Hi Trond, > > > > On Mon, Sep 17, 2018 at 09:03:31AM -0400, Trond Myklebust wrote: > > > Most of this code should also be reusable with other socket > > > types. > > > > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > > --- > > > include/linux/sunrpc/xprtsock.h | 19 +- > > > include/trace/events/sunrpc.h | 15 +- > > > net/sunrpc/xprtsock.c | 694 +++++++++++++++----------- > > > ------ > > > 3 files changed, 335 insertions(+), 393 deletions(-) > > > > With latest mainline (24ccea7e102d, it includes Al Viro's iov_iter > > fixup) I started hitting some severe slowdown and systemd timeouts > > with > > nfsroot on arm64 machines (physical or guests under KVM). > > Interestingly, > > it only happens when the client kernel is configured with 64K > > pages, the > > 4K pages configuration runs fine. It also runs fine if I add > > rsize=65536 > > to the nfsroot= argument. > > > > Bisecting led me to commit 277e4ab7d530 ("SUNRPC: Simplify TCP > > receive > > code by switching to using iterators"). Prior to this commit, it > > works > > fine. > > > > Some more info: > > > > - defconfig with CONFIG_ARM64_64K_PAGES enabled > > > > - kernel cmdline arg: nfsroot=<some-server>:/srv/nfs/debian- > > arm64,tcp,v4 > > > > - if it matters, the server is also an arm64 machine running 4.19 > > with > > 4K pages configuration > > Question to you both: when this happens, does /proc/*/stack show any of the processes hanging in the socket or sunrpc code? 
If so, can you please send me examples of those stack traces (i.e. the contents
of /proc/<pid>/stack for the processes that are hanging)? I'd be particularly
interested if the processes in question are related to the rpciod workqueue.

Thanks

Trond
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
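One way to collect what Trond asks for here is to sweep every /proc/&lt;pid&gt;/stack
and keep only the tasks whose stacks mention RPC/NFS symbols. The sketch below
is illustrative: the symbol-prefix heuristic is an assumption of mine, and
reading /proc/&lt;pid&gt;/stack generally requires root.

```python
import glob
import re

# Symbol prefixes worth flagging (an assumed heuristic, not an official list)
RPC_PATTERN = re.compile(r"rpc_|xprt_|nfs|sunrpc")

def rpc_frames(stack_text):
    """Return the RPC/NFS-looking symbols from a /proc/<pid>/stack dump."""
    frames = []
    for line in stack_text.splitlines():
        sym = line.split("]", 1)[-1].strip()     # drop the "[<0>]" prefix
        if RPC_PATTERN.search(sym):
            frames.append(sym.split("+", 1)[0])  # drop "+0x.../0x..." offsets
    return frames

def scan():
    """Print every task whose kernel stack currently contains RPC/NFS frames."""
    for path in glob.glob("/proc/[0-9]*/stack"):
        try:
            hits = rpc_frames(open(path).read())
        except OSError:
            continue  # task exited, or we are not running as root
        if hits:
            print(path, "->", ", ".join(hits))
```

Run `scan()` during the stall; tasks parked in rpc_wait_bit_killable (as in the
traces later in this thread) show up immediately.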
* Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators
  2018-11-29 19:56 ` Trond Myklebust
@ 2018-11-30 16:19 ` Cristian Marussi
  2018-11-30 19:31 ` Trond Myklebust
  0 siblings, 1 reply; 76+ messages in thread
From: Cristian Marussi @ 2018-11-30 16:19 UTC (permalink / raw)
To: Trond Myklebust; +Cc: catalin.marinas, linux-nfs

Hi

On 29/11/2018 19:56, Trond Myklebust wrote:
> On Thu, 2018-11-29 at 19:28 +0000, Cristian Marussi wrote:
>> Hi Trond, Catalin
[snip]
>
> Question to you both: when this happens, does /proc/*/stack show any of
> the processes hanging in the socket or sunrpc code? If so, can you
> please send me examples of those stack traces (i.e. the contents of
> /proc/<pid>/stack for the processes that are hanging)

(using a reverse shell since starting ssh causes a lot of pain and traffic)

Looking at NFS traffic holes (30-40 secs) to detect client-side various HANGS
-----------------------------------------------------------------------------

root@sqwt-ubuntu:/opt/lkp-tests# nc -lk -e /bin/bash -s 192.168.0.1 -p 1235 &
root@sqwt-ubuntu:/opt/lkp-tests# lkp run ./dbench-100%.yaml

$ nc 192.168.0.1 1235

cat /proc/2833/cmdline
ruby/opt/lkp-tests/bin/run-local./dbench-100%.yaml

HANG CLOSE
----------
cat /proc/2833/stack
[<0>] __switch_to+0x6c/0x90
[<0>] rpc_wait_bit_killable+0x2c/0xb0
[<0>] __rpc_wait_for_completion_task+0x3c/0x48
[<0>] nfs4_do_close+0x1ec/0x2b0
[<0>] __nfs4_close+0x130/0x198
[<0>] nfs4_close_sync+0x34/0x40
[<0>] nfs4_close_context+0x40/0x50
[<0>] __put_nfs_open_context+0xac/0x118
[<0>] nfs_file_clear_open_context+0x38/0x58
[<0>] nfs_file_release+0x7c/0x90
[<0>] __fput+0x94/0x1c0
[<0>] ____fput+0x20/0x30
[<0>] task_work_run+0x98/0xb8
[<0>] do_notify_resume+0x2d0/0x318
[<0>] work_pending+0x8/0x10
[<0>] 0xffffffffffffffff

HANG READ
---------
cat /proc/2833/stack
[<0>] __switch_to+0x6c/0x90
[<0>] io_schedule+0x20/0x40
[<0>] wait_on_page_bit_killable+0x164/0x260
[<0>] generic_file_read_iter+0x1c4/0x820
[<0>] nfs_file_read+0xa4/0x108
[<0>] __vfs_read+0x120/0x170
[<0>] vfs_read+0x94/0x150
[<0>] ksys_read+0x6c/0xd8
[<0>] __arm64_sys_read+0x24/0x30
[<0>] el0_svc_handler+0x7c/0x118
[<0>] el0_svc+0x8/0xc
[<0>] 0xffffffffffffffff

HANG STAT
---------
cat /proc/2833/stack
[<0>] __switch_to+0x6c/0x90
[<0>] rpc_wait_bit_killable+0x2c/0xb0
[<0>] __rpc_execute+0x1cc/0x528
[<0>] rpc_execute+0xe4/0x1b0
[<0>] rpc_run_task+0x130/0x168
[<0>] nfs4_call_sync_sequence+0x80/0xc8
[<0>] _nfs4_proc_getattr+0xc8/0xf8
[<0>] nfs4_proc_getattr+0x88/0x1d8
[<0>] __nfs_revalidate_inode+0x1f8/0x468
[<0>] nfs_getattr+0x14c/0x420
[<0>] vfs_getattr_nosec+0x7c/0x98
[<0>] vfs_getattr+0x48/0x58
[<0>] vfs_statx+0xb4/0x118
[<0>] __se_sys_newfstatat+0x58/0x98
[<0>] __arm64_sys_newfstatat+0x24/0x30
[<0>] el0_svc_handler+0x7c/0x118
[<0>] el0_svc+0x8/0xc
[<0>] 0xffffffffffffffff

....

Looking at a straced lkp to detect HANGS
----------------------------------------
cat /proc/2878/cmdline
ruby/opt/lkp-tests/bin/run-local./dbench-100%.yaml

HANG READ
----------
cat /proc/2878/stack
[<0>] __switch_to+0x6c/0x90
[<0>] io_schedule+0x20/0x40
[<0>] wait_on_page_bit_killable+0x164/0x260
[<0>] generic_file_read_iter+0x1c4/0x820
[<0>] nfs_file_read+0xa4/0x108
[<0>] __vfs_read+0x120/0x170
[<0>] vfs_read+0x94/0x150
[<0>] ksys_read+0x6c/0xd8
[<0>] __arm64_sys_read+0x24/0x30
[<0>] el0_svc_handler+0x7c/0x118
[<0>] el0_svc+0x8/0xc
[<0>] 0xffffffffffffffff

...
cat /proc/2878/status
Name:   ruby
Umask:  0022
State:  D (disk sleep)
Tgid:   2878
Ngid:   0
Pid:    2878
PPid:   2876
TracerPid:      2876
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 64
Groups:
NStgid: 2878
NSpid:  2878
NSpgid: 2876
NSsid:  2822
VmPeak:    24192 kB
VmSize:    24192 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:     13376 kB
VmRSS:     13376 kB
RssAnon:    8768 kB
RssFile:    4608 kB
RssShmem:      0 kB
VmData:     9792 kB
VmStk:      8192 kB
VmExe:        64 kB
VmLib:      5888 kB
VmPTE:       320 kB
VmSwap:        0 kB
HugetlbPages:  0 kB
CoreDumping:    0
Threads:        2
SigQ:   0/7534
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 00000001c2007e4f
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs:     0
Seccomp:        0
Speculation_Store_Bypass:       unknown
Cpus_allowed:   3f
Cpus_allowed_list:      0-5
Mems_allowed:   1
Mems_allowed_list:      0
voluntary_ctxt_switches:        7547
nonvoluntary_ctxt_switches:     564

Thanks

Cristian
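The "State: D (disk sleep)" field in the status dump above is the telltale for
tasks stuck like this. A small sketch (helper names are mine, not from the
thread) that parses /proc/&lt;pid&gt;/status and flags tasks in uninterruptible
sleep:

```python
import glob

def parse_status(text):
    """Turn a /proc/<pid>/status dump into a dict of field -> value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields

def d_state_tasks():
    """Yield (pid, name) for tasks in 'D' state, like the ruby task above."""
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            fields = parse_status(open(path).read())
        except OSError:
            continue  # task exited between glob() and open()
        if fields.get("State", "").startswith("D"):
            yield path.split("/")[2], fields.get("Name", "?")
```

During the 30-40s quiet window, `list(d_state_tasks())` should contain the
hanging dbench/ruby processes.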
* Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators 2018-11-30 16:19 ` Cristian Marussi @ 2018-11-30 19:31 ` Trond Myklebust 2018-12-02 16:44 ` Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-11-30 19:31 UTC (permalink / raw) To: cristian.marussi; +Cc: catalin.marinas, linux-nfs On Fri, 2018-11-30 at 16:19 +0000, Cristian Marussi wrote: > Hi > > On 29/11/2018 19:56, Trond Myklebust wrote: > > On Thu, 2018-11-29 at 19:28 +0000, Cristian Marussi wrote: > > > Hi Trond, Catalin > [snip] > > Question to you both: when this happens, does /proc/*/stack show > > any of > > the processes hanging in the socket or sunrpc code? If so, can you > > please send me examples of those stack traces (i.e. the contents of > > /proc/<pid>/stack for the processes that are hanging) > > (using a reverse shell since starting ssh causes a lot of pain and > traffic) > > Looking at NFS traffic holes(30-40 secs) to detect Client side > various HANGS > ------------------------------------------------------------------- > --------- > > root@sqwt-ubuntu:/opt/lkp-tests# nc -lk -e /bin/bash -s 192.168.0.1 > -p 1235 & > root@sqwt-ubuntu:/opt/lkp-tests# lkp run ./dbench-100%.yaml > > $ nc 192.168.0.1 1235 > > cat /proc/2833/cmdline > ruby/opt/lkp-tests/bin/run-local./dbench-100%.yaml > > HANG CLOSE > ---------- > cat /proc/2833/stack > [<0>] __switch_to+0x6c/0x90 > [<0>] rpc_wait_bit_killable+0x2c/0xb0 > [<0>] __rpc_wait_for_completion_task+0x3c/0x48 > [<0>] nfs4_do_close+0x1ec/0x2b0 > [<0>] __nfs4_close+0x130/0x198 > [<0>] nfs4_close_sync+0x34/0x40 > [<0>] nfs4_close_context+0x40/0x50 > [<0>] __put_nfs_open_context+0xac/0x118 > [<0>] nfs_file_clear_open_context+0x38/0x58 > [<0>] nfs_file_release+0x7c/0x90 > [<0>] __fput+0x94/0x1c0 > [<0>] ____fput+0x20/0x30 > [<0>] task_work_run+0x98/0xb8 > [<0>] do_notify_resume+0x2d0/0x318 > [<0>] work_pending+0x8/0x10 > [<0>] 0xffffffffffffffff > > HANG READ > --------- > cat 
/proc/2833/stack > [<0>] __switch_to+0x6c/0x90 > [<0>] io_schedule+0x20/0x40 > [<0>] wait_on_page_bit_killable+0x164/0x260 > [<0>] generic_file_read_iter+0x1c4/0x820 > [<0>] nfs_file_read+0xa4/0x108 > [<0>] __vfs_read+0x120/0x170 > [<0>] vfs_read+0x94/0x150 > [<0>] ksys_read+0x6c/0xd8 > [<0>] __arm64_sys_read+0x24/0x30 > [<0>] el0_svc_handler+0x7c/0x118 > [<0>] el0_svc+0x8/0xc > [<0>] 0xffffffffffffffff > > > HANG STAT > --------- > cat /proc/2833/stack > [<0>] __switch_to+0x6c/0x90 > [<0>] rpc_wait_bit_killable+0x2c/0xb0 > [<0>] __rpc_execute+0x1cc/0x528 > [<0>] rpc_execute+0xe4/0x1b0 > [<0>] rpc_run_task+0x130/0x168 > [<0>] nfs4_call_sync_sequence+0x80/0xc8 > [<0>] _nfs4_proc_getattr+0xc8/0xf8 > [<0>] nfs4_proc_getattr+0x88/0x1d8 > [<0>] __nfs_revalidate_inode+0x1f8/0x468 > [<0>] nfs_getattr+0x14c/0x420 > [<0>] vfs_getattr_nosec+0x7c/0x98 > [<0>] vfs_getattr+0x48/0x58 > [<0>] vfs_statx+0xb4/0x118 > [<0>] __se_sys_newfstatat+0x58/0x98 > [<0>] __arm64_sys_newfstatat+0x24/0x30 > [<0>] el0_svc_handler+0x7c/0x118 > [<0>] el0_svc+0x8/0xc > [<0>] 0xffffffffffffffff > > .... Is there anything else blocked in the RPC layer? The above are all standard tasks waiting for the rpciod/xprtiod workqueues to complete the calls to the server. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators 2018-11-30 19:31 ` Trond Myklebust @ 2018-12-02 16:44 ` Trond Myklebust 2018-12-03 11:45 ` Catalin Marinas 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-12-02 16:44 UTC (permalink / raw) To: cristian.marussi; +Cc: catalin.marinas, linux-nfs On Fri, 2018-11-30 at 14:31 -0500, Trond Myklebust wrote: > On Fri, 2018-11-30 at 16:19 +0000, Cristian Marussi wrote: > > Hi > > > > On 29/11/2018 19:56, Trond Myklebust wrote: > > > On Thu, 2018-11-29 at 19:28 +0000, Cristian Marussi wrote: > > > > Hi Trond, Catalin > > [snip] > > > Question to you both: when this happens, does /proc/*/stack show > > > any of > > > the processes hanging in the socket or sunrpc code? If so, can > > > you > > > please send me examples of those stack traces (i.e. the contents > > > of > > > /proc/<pid>/stack for the processes that are hanging) > > > > (using a reverse shell since starting ssh causes a lot of pain and > > traffic) > > > > Looking at NFS traffic holes(30-40 secs) to detect Client side > > various HANGS > > ------------------------------------------------------------------- > > Hi Cristian and Catalin Chuck and I have identified a few issues that might have an effect on the hangs you report. Could you please give the linux-next branch in my repository on git.linux-nfs.org ( https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/linux-next ) a try? git pull git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next Thanks! Trond -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators 2018-12-02 16:44 ` Trond Myklebust @ 2018-12-03 11:45 ` Catalin Marinas 2018-12-03 11:53 ` Cristian Marussi 0 siblings, 1 reply; 76+ messages in thread From: Catalin Marinas @ 2018-12-03 11:45 UTC (permalink / raw) To: Trond Myklebust; +Cc: cristian.marussi, linux-nfs Hi Trond, On Sun, Dec 02, 2018 at 04:44:49PM +0000, Trond Myklebust wrote: > On Fri, 2018-11-30 at 14:31 -0500, Trond Myklebust wrote: > > On Fri, 2018-11-30 at 16:19 +0000, Cristian Marussi wrote: > > > On 29/11/2018 19:56, Trond Myklebust wrote: > > > > On Thu, 2018-11-29 at 19:28 +0000, Cristian Marussi wrote: > > > > Question to you both: when this happens, does /proc/*/stack show > > > > any of the processes hanging in the socket or sunrpc code? If > > > > so, can you please send me examples of those stack traces (i.e. > > > > the contents of /proc/<pid>/stack for the processes that are > > > > hanging) > > > > > > (using a reverse shell since starting ssh causes a lot of pain and > > > traffic) > > > > > > Looking at NFS traffic holes(30-40 secs) to detect Client side > > > various HANGS > > Chuck and I have identified a few issues that might have an effect on > the hangs you report. Could you please give the linux-next branch in my > repository on git.linux-nfs.org ( > https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/linux-next > ) a try? > > git pull git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next I tried, unfortunately there's no difference for me (I merged the above branch on top of 4.20-rc5). -- Catalin ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators 2018-12-03 11:45 ` Catalin Marinas @ 2018-12-03 11:53 ` Cristian Marussi 2018-12-03 18:54 ` Cristian Marussi 0 siblings, 1 reply; 76+ messages in thread From: Cristian Marussi @ 2018-12-03 11:53 UTC (permalink / raw) To: Catalin Marinas, Trond Myklebust; +Cc: linux-nfs Hi On 03/12/2018 11:45, Catalin Marinas wrote: > Hi Trond, > > On Sun, Dec 02, 2018 at 04:44:49PM +0000, Trond Myklebust wrote: >> On Fri, 2018-11-30 at 14:31 -0500, Trond Myklebust wrote: >>> On Fri, 2018-11-30 at 16:19 +0000, Cristian Marussi wrote: >>>> On 29/11/2018 19:56, Trond Myklebust wrote: >>>>> On Thu, 2018-11-29 at 19:28 +0000, Cristian Marussi wrote: >>>>> Question to you both: when this happens, does /proc/*/stack show >>>>> any of the processes hanging in the socket or sunrpc code? If >>>>> so, can you please send me examples of those stack traces (i.e. >>>>> the contents of /proc/<pid>/stack for the processes that are >>>>> hanging) >>>> >>>> (using a reverse shell since starting ssh causes a lot of pain and >>>> traffic) >>>> >>>> Looking at NFS traffic holes(30-40 secs) to detect Client side >>>> various HANGS >> >> Chuck and I have identified a few issues that might have an effect on >> the hangs you report. Could you please give the linux-next branch in my >> repository on git.linux-nfs.org ( >> https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/linux-next >> ) a try? >> >> git pull git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next > > I tried, unfortunately there's no difference for me (I merged the above > branch on top of 4.20-rc5). > same for me. Issue still there. Beside I saw some differences in the dbench result which I used for testing. From the dbench (comparing with previous mail) it seems that Unlink and Qpathinfo MaxLat has normalized. 
 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX      90820    13.613  13855.620
 Close          66565    18.075  13853.289
 Rename          3845    23.668    326.642
 Unlink         18450     4.581    186.062
 Qpathinfo      82068     2.677    280.203
 Qfileinfo      14235    10.357    176.373
 Qfsinfo        15156     2.822    242.794
 Sfileinfo       7400    17.018    240.546
 Find           31812     5.988    277.332
 WriteX         44735     0.155     14.685
 ReadX         141872     0.741  13817.870
 LockX            288    10.558     96.179
 UnlockX          288     3.307     57.939
 Flush           6389    20.427    187.429

> Is there anything else blocked in the RPC layer? The above are all
> standard tasks waiting for the rpciod/xprtiod workqueues to complete
> the calls to the server.

cat /proc/692/stack
[<0>] __switch_to+0x6c/0x90
[<0>] rescuer_thread+0x2e8/0x360
[<0>] kthread+0x134/0x138
[<0>] ret_from_fork+0x10/0x1c
[<0>] 0xffffffffffffffff

I am now trying to collect more evidence by ftracing during the quiet/stuck
period until the restart happens.

Thanks

Cristian
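The ftrace capture mentioned here (and attached in the follow-up mail) can be
reproduced with the stock tracefs interface. A sketch, assuming tracefs is
mounted at /sys/kernel/tracing (older kernels expose it under
/sys/kernel/debug/tracing) and that you are root; the output filename is
illustrative:

```sh
cd /sys/kernel/tracing
echo 0 > tracing_on                     # stop any tracer already running
echo function_graph > current_tracer    # same tracer as the attached capture
echo 'nfs* rpc* xprt* tcp*' > set_ftrace_filter   # the filter quoted below
echo > trace                            # clear the ring buffer
echo 1 > tracing_on                     # start once the traffic stall is seen
sleep 3                                 # capture ~3 seconds of the stall
echo 0 > tracing_on
cp trace /tmp/nfs_64k_stuck_ftrace.txt
```

This is a configuration fragment for the kernel tracing interface; it only
makes sense on a live system exhibiting the stall.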
* Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators 2018-12-03 11:53 ` Cristian Marussi @ 2018-12-03 18:54 ` Cristian Marussi 0 siblings, 0 replies; 76+ messages in thread From: Cristian Marussi @ 2018-12-03 18:54 UTC (permalink / raw) To: Catalin Marinas, Trond Myklebust; +Cc: linux-nfs [-- Attachment #1: Type: text/plain, Size: 1911 bytes --] Hi On 03/12/2018 11:53, Cristian Marussi wrote: > Hi > [snip] > same for me. Issue still there. > > Beside I saw some differences in the dbench result which I used for testing. > > From the dbench (comparing with previous mail) it seems that > Unlink and Qpathinfo MaxLat has normalized. > > Operation Count AvgLat MaxLat > ---------------------------------------- > NTCreateX 90820 13.613 13855.620 > Close 66565 18.075 13853.289 > Rename 3845 23.668 326.642 > Unlink 18450 4.581 186.062 > Qpathinfo 82068 2.677 280.203 > Qfileinfo 14235 10.357 176.373 > Qfsinfo 15156 2.822 242.794 > Sfileinfo 7400 17.018 240.546 > Find 31812 5.988 277.332 > WriteX 44735 0.155 14.685 > ReadX 141872 0.741 13817.870 > LockX 288 10.558 96.179 > UnlockX 288 3.307 57.939 > Flush 6389 20.427 187.429 > > >> Is there anything else blocked in the RPC layer? The above are all >> standard tasks waiting for the rpciod/xprtiod workqueues to complete >> the calls to the server. > cat /proc/692/stack > [<0>] __switch_to+0x6c/0x90 > [<0>] rescuer_thread+0x2e8/0x360 > [<0>] kthread+0x134/0x138 > [<0>] ret_from_fork+0x10/0x1c > [<0>] 0xffffffffffffffff > > I was now trying to collect more evidence ftracing during the quiet-stuck-period > till the restart happens. > attached to this mail there is a 3secs ftrace function-graph taken during the quiet/stalled period of an 'LKP run dbench'; issued directly from console (no ssh or netcat shell traffic). Ftrace filter was pre-set as: set_ftrace_filter was set to : nfs* rpc* xprt* tcp* and tracing started once NO traffic was observed flowing on Wireshark. 
Using ARM64 64k pages on Linux NFS next branch like previous mail this morning. Thanks Cristian [-- Attachment #2: nfs_64k_stuck_ftrace_filtered_3secs_stalled.txt --] [-- Type: text/plain, Size: 57577 bytes --] # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 3) | nfs_permission() { 3) | rpc_lookup_cred_nonblock() { 3) | rpcauth_lookupcred() { 3) 3.720 us | rpcauth_lookup_credcache(); 3) 8.100 us | } 3) + 11.500 us | } 3) | nfs_do_access() { 3) | nfs_check_cache_invalid() { 3) | nfs4_have_delegation() { 3) 1.220 us | nfs4_is_valid_delegation(); 3) 4.560 us | } 3) 8.040 us | } 3) + 11.420 us | } 3) + 36.140 us | } 3) | nfs4_lookup_revalidate() { 3) | nfs4_do_lookup_revalidate() { 3) | nfs_do_lookup_revalidate() { 3) | nfs4_have_delegation() { 3) 1.200 us | nfs4_is_valid_delegation(); 3) 3.860 us | } 3) | nfs_check_verifier() { 3) | nfs_mapping_need_revalidate_inode() { 3) | nfs_check_cache_invalid() { 3) | nfs4_have_delegation() { 3) 1.180 us | nfs4_is_valid_delegation(); 3) 3.740 us | } 3) 6.280 us | } 3) 8.840 us | } 3) + 11.860 us | } 3) 1.360 us | nfs_lookup_verify_inode(); 3) 1.440 us | nfs_advise_use_readdirplus(); 3) + 27.520 us | } 3) + 30.940 us | } 3) + 34.260 us | } 3) | nfs_permission() { 3) | rpc_lookup_cred_nonblock() { 3) | rpcauth_lookupcred() { 3) 1.720 us | rpcauth_lookup_credcache(); 3) 4.300 us | } 3) 6.700 us | } 3) | nfs_do_access() { 3) | nfs_check_cache_invalid() { 3) | nfs4_have_delegation() { 3) 1.220 us | nfs4_is_valid_delegation(); 3) 3.700 us | } 3) 6.320 us | } 3) 9.840 us | } 3) + 20.480 us | } 3) | nfs4_lookup_revalidate() { 3) | nfs4_do_lookup_revalidate() { 3) | nfs4_have_delegation() { 3) 1.580 us | nfs4_is_valid_delegation(); 3) 1.180 us | nfs_mark_delegation_referenced(); 3) 7.060 us | } 3) 9.620 us | } 3) + 12.320 us | } 3) 1.260 us | nfs_permission(); 3) | nfs4_file_open() { 3) 1.360 us | nfs_check_flags(); 3) | rpc_lookup_cred() { 3) | rpcauth_lookupcred() { 3) 1.580 us | rpcauth_lookup_credcache(); 3) 
4.080 us | } 3) 6.640 us | } 3) 1.700 us | nfs_sb_active(); 3) | nfs4_atomic_open() { 3) | nfs4_do_open() { 3) 2.780 us | nfs4_get_state_owner(); 3) | nfs4_client_recover_expired_lease() { 3) | nfs4_wait_clnt_recover() { 3) | nfs_put_client() { 3) 1.940 us | nfs_put_client.part.2(); 3) 4.920 us | } 3) 8.140 us | } 3) + 11.180 us | } 3) | nfs4_opendata_alloc() { 3) 1.360 us | nfs4_label_alloc(); 3) 1.160 us | nfs4_label_alloc(); 3) 2.180 us | nfs_alloc_seqid(); 3) 1.300 us | nfs_sb_active(); 3) | nfs_fattr_init() { 3) 1.380 us | nfs_inc_attr_generation_counter(); 3) 4.760 us | } 3) 1.160 us | nfs_fattr_init_names(); 3) + 25.480 us | } 3) 3.520 us | nfs4_get_open_state(); 3) | nfs4_run_open_task() { 3) | rpc_run_task() { 3) | rpc_new_task() { 3) 1.280 us | xprt_get(); 3) 6.040 us | } 3) | xprt_iter_get_next() { 3) | xprt_iter_get_helper() { 3) 1.220 us | xprt_iter_first_entry(); 3) 1.480 us | xprt_get(); 3) 7.100 us | } 3) + 10.300 us | } 3) | rpc_execute() { 3) + 12.100 us | rpc_make_runnable(); 3) + 15.260 us | } 3) + 39.520 us | } 3) ! 129.680 us | rpc_wait_bit_killable(); 2) | rpc_async_schedule() { 2) | rpc_prepare_task() { 2) | nfs4_open_prepare() { 2) 1.260 us | nfs_wait_on_sequence(); 2) 0.820 us | nfs_mark_delegation_referenced(); 2) 0.740 us | nfs4_sequence_done(); 2) 8.780 us | } 2) + 11.360 us | } 2) | rpc_release_resources_task() { 2) 1.480 us | xprt_release(); 2) | rpc_task_release_client() { 2) 0.780 us | rpc_release_client(); 2) 1.120 us | xprt_put(); 2) 5.380 us | } 2) + 10.780 us | } 2) + 32.280 us | } 3) | rpc_put_task() { 3) | rpc_do_put_task() { 3) | rpc_release_resources_task() { 3) 1.460 us | xprt_release(); 3) 1.380 us | rpc_task_release_client(); 3) 7.340 us | } 3) | rpc_free_task() { 3) | nfs4_open_release() { 3) 1.520 us | nfs4_opendata_put.part.8(); 3) 4.480 us | } 3) 8.080 us | } 3) + 19.980 us | } 3) + 23.560 us | } 3) ! 
203.540 us | } 3) 1.420 us | nfs_mark_delegation_referenced(); 3) 1.500 us | nfs_release_seqid(); 3) | nfs_may_open() { 3) | nfs_do_access() { 3) | nfs_check_cache_invalid() { 3) | nfs4_have_delegation() { 3) 1.200 us | nfs4_is_valid_delegation(); 3) 1.200 us | nfs_mark_delegation_referenced(); 3) 6.260 us | } 3) 8.860 us | } 3) + 11.860 us | } 3) + 14.480 us | } 3) 1.260 us | nfs_mark_delegation_referenced(); 3) 1.220 us | nfs4_state_set_mode_locked(); 3) 1.500 us | nfs_release_seqid(); 3) 1.540 us | nfs_inode_attach_open_context(); 3) 1.420 us | nfs4_sequence_free_slot(); 3) | nfs4_opendata_put.part.8() { 3) 1.380 us | nfs4_lgopen_release(); 3) | nfs_free_seqid() { 3) 1.380 us | nfs_release_seqid(); 3) 4.260 us | } 3) 1.280 us | nfs4_sequence_free_slot(); 3) 1.620 us | nfs4_put_open_state(); 3) 1.400 us | nfs4_put_state_owner(); 3) 1.280 us | nfs_sb_deactive(); 3) 1.340 us | nfs_fattr_free_names(); 3) + 25.680 us | } 3) 1.260 us | nfs4_put_state_owner(); 3) ! 329.940 us | } 3) ! 333.420 us | } 3) 1.360 us | nfs_file_set_open_context(); 3) ! 
356.720 us | } 1) 2.520 us | nfs4_xattr_get_nfs4_label(); 1) | nfs_file_read() { 1) 0.900 us | nfs_start_io_read(); 1) | nfs_revalidate_mapping() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.820 us | nfs4_is_valid_delegation(); 1) 0.800 us | nfs_mark_delegation_referenced(); 1) 4.700 us | } 1) 6.540 us | } 1) 8.220 us | } 1) + 10.360 us | } 1) 0.860 us | nfs_end_io_read(); 1) + 21.200 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 1.280 us | rpcauth_lookup_credcache(); 1) 3.400 us | } 1) 5.400 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 0.760 us | nfs_mark_delegation_referenced(); 1) 4.040 us | } 1) 5.620 us | } 1) 7.360 us | } 1) + 16.080 us | } 1) | nfs_file_read() { 1) 0.800 us | nfs_start_io_read(); 1) | nfs_revalidate_mapping() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.860 us | nfs4_is_valid_delegation(); 1) 0.780 us | nfs_mark_delegation_referenced(); 1) 4.020 us | } 1) 6.120 us | } 1) 7.660 us | } 1) 9.320 us | } 1) 0.920 us | nfs_end_io_read(); 1) + 15.620 us | } 1) | nfs_file_read() { 1) 0.800 us | nfs_start_io_read(); 1) | nfs_revalidate_mapping() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 0.900 us | nfs_mark_delegation_referenced(); 1) 4.040 us | } 1) 5.580 us | } 1) 7.120 us | } 1) 8.720 us | } 1) 0.740 us | nfs_end_io_read(); 1) + 14.000 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.880 us | rpcauth_lookup_credcache(); 1) 2.440 us | } 1) 4.020 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.360 us | } 1) 4.040 us | } 1) 5.920 
us | } 1) + 12.820 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.460 us | } 1) | nfs_check_verifier() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.860 us | nfs4_is_valid_delegation(); 1) 2.420 us | } 1) 4.180 us | } 1) 5.700 us | } 1) 7.240 us | } 1) 1.020 us | nfs_lookup_verify_inode(); 1) 0.780 us | nfs_advise_use_readdirplus(); 1) + 16.180 us | } 1) + 17.860 us | } 1) + 20.040 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.820 us | rpcauth_lookup_credcache(); 1) 2.520 us | } 1) 4.100 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.720 us | nfs4_is_valid_delegation(); 1) 2.400 us | } 1) 3.960 us | } 1) 5.760 us | } 1) + 12.360 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.380 us | } 1) | nfs_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.280 us | } 1) | nfs_check_verifier() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.320 us | } 1) 3.820 us | } 1) 5.340 us | } 1) 6.860 us | } 1) 0.840 us | nfs_lookup_verify_inode(); 1) 1.100 us | nfs_advise_use_readdirplus(); 1) + 15.400 us | } 1) + 20.240 us | } 1) + 21.800 us | } 1) | nfs_get_link() { 1) | nfs_revalidate_mapping_rcu() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.320 us | } 1) 3.940 us | } 1) 5.600 us | } 1) 7.360 us | } 1) 9.500 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.820 us | 
rpcauth_lookup_credcache(); 1) 2.380 us | } 1) 3.940 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.280 us | } 1) 3.860 us | } 1) 5.380 us | } 1) + 11.960 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.720 us | nfs4_is_valid_delegation(); 1) 2.300 us | } 1) | nfs_check_verifier() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.300 us | } 1) 3.840 us | } 1) 5.360 us | } 1) 6.860 us | } 1) 0.740 us | nfs_lookup_verify_inode(); 1) 0.740 us | nfs_advise_use_readdirplus(); 1) + 14.680 us | } 1) + 16.260 us | } 1) + 17.940 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.840 us | rpcauth_lookup_credcache(); 1) 2.520 us | } 1) 4.040 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.280 us | } 1) 3.820 us | } 1) 5.620 us | } 1) + 12.020 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.820 us | nfs4_is_valid_delegation(); 1) 0.760 us | nfs_mark_delegation_referenced(); 1) 4.120 us | } 1) 5.720 us | } 1) 7.280 us | } 1) 0.800 us | nfs_permission(); 1) | nfs4_file_open() { 1) 0.760 us | nfs_check_flags(); 1) | rpc_lookup_cred() { 1) | rpcauth_lookupcred() { 1) 0.900 us | rpcauth_lookup_credcache(); 1) 2.420 us | } 1) 3.920 us | } 1) 0.900 us | nfs_sb_active(); 1) | nfs4_atomic_open() { 1) | nfs4_do_open() { 1) 1.400 us | nfs4_get_state_owner(); 1) | nfs4_client_recover_expired_lease() { 1) | nfs4_wait_clnt_recover() { 1) | nfs_put_client() { 1) 0.960 us | nfs_put_client.part.2(); 1) 2.620 us | } 1) 4.500 us | } 1) 6.120 us | } 1) | nfs4_opendata_alloc() { 1) 0.980 us | nfs4_label_alloc(); 
1) 0.720 us | nfs4_label_alloc(); 1) 0.840 us | nfs_alloc_seqid(); 1) 0.800 us | nfs_sb_active(); 1) | nfs_fattr_init() { 1) 0.780 us | nfs_inc_attr_generation_counter(); 1) 2.460 us | } 1) 0.740 us | nfs_fattr_init_names(); 1) + 13.720 us | } 1) 1.020 us | nfs4_get_open_state(); 1) | nfs4_run_open_task() { 1) | rpc_run_task() { 1) | rpc_new_task() { 1) 0.820 us | xprt_get(); 1) 3.600 us | } 1) | xprt_iter_get_next() { 1) | xprt_iter_get_helper() { 1) 0.760 us | xprt_iter_first_entry(); 1) 0.760 us | xprt_get(); 1) 4.160 us | } 1) 5.980 us | } 1) | rpc_execute() { 1) 6.000 us | rpc_make_runnable(); 1) 7.740 us | } 1) + 21.180 us | } 1) ! 104.520 us | rpc_wait_bit_killable(); 2) | rpc_async_schedule() { 2) | rpc_prepare_task() { 2) | nfs4_open_prepare() { 2) 1.040 us | nfs_wait_on_sequence(); 2) 0.920 us | nfs_mark_delegation_referenced(); 2) 0.740 us | nfs4_sequence_done(); 2) 6.600 us | } 2) 8.480 us | } 2) | rpc_release_resources_task() { 2) 0.780 us | xprt_release(); 2) | rpc_task_release_client() { 2) 0.780 us | rpc_release_client(); 2) 0.760 us | xprt_put(); 2) 4.220 us | } 2) 7.880 us | } 2) + 24.700 us | } 1) | rpc_put_task() { 1) | rpc_do_put_task() { 1) | rpc_release_resources_task() { 1) 0.840 us | xprt_release(); 1) 0.760 us | rpc_task_release_client(); 1) 4.080 us | } 1) | rpc_free_task() { 1) | nfs4_open_release() { 1) 0.800 us | nfs4_opendata_put.part.8(); 1) 2.400 us | } 1) 4.420 us | } 1) + 11.240 us | } 1) + 13.020 us | } 1) ! 
143.480 us | } 1) 0.760 us | nfs_mark_delegation_referenced(); 1) 0.820 us | nfs_release_seqid(); 1) | nfs_may_open() { 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 0.860 us | nfs_mark_delegation_referenced(); 1) 4.060 us | } 1) 5.640 us | } 1) 7.480 us | } 1) 9.040 us | } 1) 0.860 us | nfs_mark_delegation_referenced(); 1) 0.760 us | nfs4_state_set_mode_locked(); 1) 0.880 us | nfs_release_seqid(); 1) 1.420 us | nfs_inode_attach_open_context(); 1) 0.720 us | nfs4_sequence_free_slot(); 1) | nfs4_opendata_put.part.8() { 1) 0.720 us | nfs4_lgopen_release(); 1) | nfs_free_seqid() { 1) 0.760 us | nfs_release_seqid(); 1) 2.540 us | } 1) 0.740 us | nfs4_sequence_free_slot(); 1) 0.800 us | nfs4_put_open_state(); 1) 0.800 us | nfs4_put_state_owner(); 1) 0.780 us | nfs_sb_deactive(); 1) 0.780 us | nfs_fattr_free_names(); 1) + 15.020 us | } 1) 0.760 us | nfs4_put_state_owner(); 1) ! 213.580 us | } 1) ! 215.300 us | } 1) 0.760 us | nfs_file_set_open_context(); 1) ! 
227.840 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.900 us | rpcauth_lookup_credcache(); 1) 2.480 us | } 1) 4.080 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.780 us | nfs4_is_valid_delegation(); 1) 0.760 us | nfs_mark_delegation_referenced(); 1) 3.960 us | } 1) 5.520 us | } 1) 7.100 us | } 1) + 14.040 us | } 1) | nfs_file_read() { 1) 0.780 us | nfs_start_io_read(); 1) | nfs_revalidate_mapping() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 0.740 us | nfs_mark_delegation_referenced(); 1) 3.900 us | } 1) 5.440 us | } 1) 7.000 us | } 1) 8.720 us | } 1) 0.780 us | nfs_end_io_read(); 1) + 15.060 us | } 1) | nfs_file_read() { 1) 0.760 us | nfs_start_io_read(); 1) | nfs_revalidate_mapping() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 0.740 us | nfs_mark_delegation_referenced(); 1) 3.860 us | } 1) 5.360 us | } 1) 6.900 us | } 1) 8.500 us | } 1) 0.760 us | nfs_end_io_read(); 1) + 14.420 us | } 1) | nfs_file_mmap() { 1) | nfs_revalidate_mapping() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 0.780 us | nfs_mark_delegation_referenced(); 1) 4.080 us | } 1) 5.800 us | } 1) 7.400 us | } 1) 9.140 us | } 1) + 11.760 us | } 1) | nfs_file_mmap() { 1) | nfs_revalidate_mapping() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 0.740 us | nfs_mark_delegation_referenced(); 1) 3.900 us | } 1) 5.420 us | } 1) 6.920 us | } 1) 8.540 us | } 1) + 10.300 us | } 1) | nfs_file_mmap() { 1) | nfs_revalidate_mapping() { 1) | nfs_mapping_need_revalidate_inode() { 1) | 
nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 0.780 us | nfs_mark_delegation_referenced(); 1) 4.020 us | } 1) 5.600 us | } 1) 7.160 us | } 1) 8.820 us | } 1) + 10.740 us | } 1) | nfs_file_mmap() { 1) | nfs_revalidate_mapping() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 0.760 us | nfs_mark_delegation_referenced(); 1) 3.880 us | } 1) 5.400 us | } 1) 6.920 us | } 1) 8.740 us | } 1) + 10.400 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 1.080 us | rpcauth_lookup_credcache(); 1) 2.860 us | } 1) 4.560 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.760 us | nfs4_is_valid_delegation(); 1) 2.400 us | } 1) 4.080 us | } 1) 5.980 us | } 1) + 13.600 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.380 us | } 1) | nfs_check_verifier() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.260 us | } 1) 3.800 us | } 1) 5.620 us | } 1) 7.180 us | } 1) 0.900 us | nfs_lookup_verify_inode(); 1) 0.740 us | nfs_advise_use_readdirplus(); 1) + 15.580 us | } 1) + 17.120 us | } 1) + 18.720 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.900 us | rpcauth_lookup_credcache(); 1) 2.520 us | } 1) 4.240 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.760 us | nfs4_is_valid_delegation(); 1) 2.600 us | } 1) 4.260 us | } 1) 6.400 us | } 1) + 13.080 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs_do_lookup_revalidate() { 1) | nfs_check_verifier() { 1) | 
nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.480 us | } 1) 4.080 us | } 1) 5.600 us | } 1) 7.240 us | } 1) 0.740 us | nfs_lookup_revalidate_done(); 1) + 10.560 us | } 1) + 12.080 us | } 1) + 13.840 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 1.080 us | rpcauth_lookup_credcache(); 1) 2.740 us | } 1) 4.340 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.720 us | nfs4_is_valid_delegation(); 1) 2.340 us | } 1) 4.420 us | } 1) 6.040 us | } 1) + 13.200 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.480 us | } 1) | nfs_check_verifier() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.280 us | } 1) 3.960 us | } 1) 5.460 us | } 1) 7.000 us | } 1) 0.720 us | nfs_lookup_verify_inode(); 1) 0.880 us | nfs_advise_use_readdirplus(); 1) + 15.200 us | } 1) + 16.720 us | } 1) + 18.260 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.820 us | rpcauth_lookup_credcache(); 1) 2.460 us | } 1) 4.040 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.300 us | } 1) 3.820 us | } 1) 5.360 us | } 1) + 11.800 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs_do_lookup_revalidate() { 1) | nfs_check_verifier() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.860 us | nfs4_is_valid_delegation(); 1) 2.440 us | } 1) 3.980 us | } 1) 5.500 us | } 1) 6.980 us | } 1) 0.740 us | nfs_lookup_revalidate_done(); 1) + 10.060 us | } 1) + 
11.580 us | } 1) + 13.260 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.860 us | rpcauth_lookup_credcache(); 1) 2.580 us | } 1) 4.180 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.420 us | } 1) 3.980 us | } 1) 5.560 us | } 1) + 12.300 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.880 us | nfs4_is_valid_delegation(); 1) 2.420 us | } 1) | nfs_check_verifier() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.720 us | nfs4_is_valid_delegation(); 1) 2.400 us | } 1) 3.920 us | } 1) 5.420 us | } 1) 6.960 us | } 1) 0.740 us | nfs_lookup_verify_inode(); 1) 0.740 us | nfs_advise_use_readdirplus(); 1) + 14.880 us | } 1) + 16.440 us | } 1) + 17.980 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.840 us | rpcauth_lookup_credcache(); 1) 2.380 us | } 1) 3.940 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.320 us | } 1) 3.980 us | } 1) 5.500 us | } 1) + 11.840 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.960 us | nfs4_is_valid_delegation(); 1) 0.740 us | nfs_mark_delegation_referenced(); 1) 4.520 us | } 1) 6.160 us | } 1) 7.880 us | } 1) 0.800 us | nfs_permission(); 1) | nfs4_file_open() { 1) 0.740 us | nfs_check_flags(); 1) | rpc_lookup_cred() { 1) | rpcauth_lookupcred() { 1) 0.900 us | rpcauth_lookup_credcache(); 1) 2.440 us | } 1) 3.960 us | } 1) 0.800 us | nfs_sb_active(); 1) | nfs4_atomic_open() { 1) | nfs4_do_open() { 1) 1.020 us | nfs4_get_state_owner(); 1) | nfs4_client_recover_expired_lease() { 1) | nfs4_wait_clnt_recover() { 1) | nfs_put_client() { 1) 0.800 us | 
nfs_put_client.part.2(); 1) 2.320 us | } 1) 3.880 us | } 1) 5.460 us | } 1) | nfs4_opendata_alloc() { 1) 0.760 us | nfs4_label_alloc(); 1) 0.740 us | nfs4_label_alloc(); 1) 0.840 us | nfs_alloc_seqid(); 1) 0.800 us | nfs_sb_active(); 1) | nfs_fattr_init() { 1) 0.760 us | nfs_inc_attr_generation_counter(); 1) 2.440 us | } 1) 0.740 us | nfs_fattr_init_names(); 1) + 12.880 us | } 1) 1.540 us | nfs4_get_open_state(); 1) | nfs4_run_open_task() { 1) | rpc_run_task() { 1) | rpc_new_task() { 1) 0.740 us | xprt_get(); 1) 2.840 us | } 1) | xprt_iter_get_next() { 1) | xprt_iter_get_helper() { 1) 0.760 us | xprt_iter_first_entry(); 1) 0.760 us | xprt_get(); 1) 3.920 us | } 1) 5.540 us | } 1) | rpc_execute() { 1) 5.240 us | rpc_make_runnable(); 1) 6.900 us | } 1) + 18.840 us | } 1) ! 105.220 us | rpc_wait_bit_killable(); 2) | rpc_async_schedule() { 2) | rpc_prepare_task() { 2) | nfs4_open_prepare() { 2) 0.980 us | nfs_wait_on_sequence(); 2) 0.780 us | nfs_mark_delegation_referenced(); 2) 0.740 us | nfs4_sequence_done(); 2) 6.340 us | } 2) 8.180 us | } 2) | rpc_release_resources_task() { 2) 0.760 us | xprt_release(); 2) | rpc_task_release_client() { 2) 0.920 us | rpc_release_client(); 2) 0.760 us | xprt_put(); 2) 4.280 us | } 2) 7.740 us | } 2) + 24.480 us | } 1) | rpc_put_task() { 1) | rpc_do_put_task() { 1) | rpc_release_resources_task() { 1) 0.900 us | xprt_release(); 1) 0.760 us | rpc_task_release_client(); 1) 4.140 us | } 1) | rpc_free_task() { 1) | nfs4_open_release() { 1) 0.760 us | nfs4_opendata_put.part.8(); 1) 2.320 us | } 1) 4.120 us | } 1) + 10.820 us | } 1) + 12.500 us | } 1) ! 
140.660 us | } 1) 0.740 us | nfs_mark_delegation_referenced(); 1) 0.840 us | nfs_release_seqid(); 1) | nfs_may_open() { 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 0.760 us | nfs_mark_delegation_referenced(); 1) 4.140 us | } 1) 5.740 us | } 1) 7.620 us | } 1) 9.120 us | } 1) 0.820 us | nfs_mark_delegation_referenced(); 1) 0.800 us | nfs4_state_set_mode_locked(); 1) 0.760 us | nfs_release_seqid(); 1) 0.980 us | nfs_inode_attach_open_context(); 1) 0.740 us | nfs4_sequence_free_slot(); 1) | nfs4_opendata_put.part.8() { 1) 0.740 us | nfs4_lgopen_release(); 1) | nfs_free_seqid() { 1) 0.740 us | nfs_release_seqid(); 1) 2.340 us | } 1) 0.760 us | nfs4_sequence_free_slot(); 1) 0.800 us | nfs4_put_open_state(); 1) 0.760 us | nfs4_put_state_owner(); 1) 0.760 us | nfs_sb_deactive(); 1) 0.720 us | nfs_fattr_free_names(); 1) + 13.880 us | } 1) 0.740 us | nfs4_put_state_owner(); 1) ! 206.100 us | } 1) ! 207.740 us | } 1) 0.880 us | nfs_file_set_open_context(); 1) ! 
219.600 us | } 1) | nfs_getattr() { 1) | nfs_writepages() { 1) | nfs_pageio_init_write() { 1) 0.860 us | nfs_pageio_init(); 1) 2.600 us | } 1) | nfs_pageio_complete() { 1) | nfs_pageio_doio() { 1) 0.720 us | nfs_pgio_current_mirror(); 1) 2.720 us | } 1) 4.960 us | } 1) | nfs_io_completion_put.part.0() { 1) | nfs_io_completion_commit() { 1) 1.080 us | nfs_commit_end(); 1) 3.380 us | } 1) 5.440 us | } 1) + 20.500 us | } 1) | nfs_attribute_cache_expired() { 1) | nfs4_have_delegation() { 1) 0.760 us | nfs4_is_valid_delegation(); 1) 0.760 us | nfs_mark_delegation_referenced(); 1) 3.980 us | } 1) 5.580 us | } 1) | nfs_readdirplus_parent_cache_hit.part.5() { 1) 0.800 us | nfs_advise_use_readdirplus(); 1) 2.460 us | } 1) + 36.240 us | } 1) | nfs_file_mmap() { 1) | nfs_revalidate_mapping() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 0.740 us | nfs_mark_delegation_referenced(); 1) 3.820 us | } 1) 5.600 us | } 1) 7.160 us | } 1) 8.780 us | } 1) + 10.440 us | } 1) 0.840 us | nfs4_file_flush(); 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.940 us | rpcauth_lookup_credcache(); 1) 2.640 us | } 1) 4.220 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.320 us | } 1) 3.940 us | } 1) 5.540 us | } 1) + 12.320 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.720 us | nfs4_is_valid_delegation(); 1) 2.400 us | } 1) | nfs_check_verifier() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.300 us | } 1) 3.820 us | } 1) 5.340 us | } 1) 6.860 us | } 1) 0.760 us | nfs_lookup_verify_inode(); 1) 0.860 us | nfs_advise_use_readdirplus(); 1) + 15.000 us | } 1) + 
16.560 us | } 1) + 18.080 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.840 us | rpcauth_lookup_credcache(); 1) 2.400 us | } 1) 3.920 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.300 us | } 1) 3.820 us | } 1) 5.380 us | } 1) + 11.720 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs_do_lookup_revalidate() { 1) | nfs_check_verifier() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.880 us | nfs4_is_valid_delegation(); 1) 2.440 us | } 1) 3.980 us | } 1) 5.580 us | } 1) 7.120 us | } 1) 0.740 us | nfs_lookup_revalidate_done(); 1) + 10.280 us | } 1) + 11.820 us | } 1) + 13.540 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.860 us | rpcauth_lookup_credcache(); 1) 2.560 us | } 1) 4.100 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.320 us | } 1) 3.860 us | } 1) 5.440 us | } 1) + 12.020 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.760 us | nfs4_is_valid_delegation(); 1) 2.340 us | } 1) | nfs_check_verifier() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.280 us | } 1) 3.840 us | } 1) 5.360 us | } 1) 6.980 us | } 1) 0.740 us | nfs_lookup_verify_inode(); 1) 0.740 us | nfs_advise_use_readdirplus(); 1) + 15.000 us | } 1) + 16.520 us | } 1) + 18.040 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.840 us | rpcauth_lookup_credcache(); 1) 2.400 us | } 1) 3.960 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() 
{ 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.300 us | } 1) 3.860 us | } 1) 5.580 us | } 1) + 11.920 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.300 us | } 1) | nfs_check_verifier() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.280 us | } 1) 3.820 us | } 1) 5.340 us | } 1) 6.820 us | } 1) 0.760 us | nfs_lookup_verify_inode(); 1) 0.740 us | nfs_advise_use_readdirplus(); 1) + 14.540 us | } 1) + 16.060 us | } 1) + 17.640 us | } 1) | nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.840 us | rpcauth_lookup_credcache(); 1) 2.360 us | } 1) 3.900 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.300 us | } 1) 3.820 us | } 1) 5.420 us | } 1) + 11.660 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.400 us | } 1) | nfs_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.320 us | } 1) | nfs_check_verifier() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.720 us | nfs4_is_valid_delegation(); 1) 2.320 us | } 1) 3.860 us | } 1) 5.400 us | } 1) 6.880 us | } 1) 0.820 us | nfs_lookup_verify_inode(); 1) 0.740 us | nfs_advise_use_readdirplus(); 1) + 14.840 us | } 1) + 19.700 us | } 1) + 21.240 us | } 1) | nfs_get_link() { 1) | nfs_revalidate_mapping_rcu() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.300 us | } 1) 3.940 us | } 1) 5.600 us | } 1) 7.320 us | } 1) 9.340 us | } 1) | 
nfs_permission() { 1) | rpc_lookup_cred_nonblock() { 1) | rpcauth_lookupcred() { 1) 0.840 us | rpcauth_lookup_credcache(); 1) 2.360 us | } 1) 3.920 us | } 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 2.320 us | } 1) 3.860 us | } 1) 5.400 us | } 1) + 11.720 us | } 1) | nfs4_lookup_revalidate() { 1) | nfs4_do_lookup_revalidate() { 1) | nfs4_have_delegation() { 1) 0.980 us | nfs4_is_valid_delegation(); 1) 0.760 us | nfs_mark_delegation_referenced(); 1) 4.240 us | } 1) 5.780 us | } 1) 7.340 us | } 1) 0.840 us | nfs_permission(); 1) | nfs4_file_open() { 1) 0.740 us | nfs_check_flags(); 1) | rpc_lookup_cred() { 1) | rpcauth_lookupcred() { 1) 0.900 us | rpcauth_lookup_credcache(); 1) 2.480 us | } 1) 4.020 us | } 1) 0.800 us | nfs_sb_active(); 1) | nfs4_atomic_open() { 1) | nfs4_do_open() { 1) 0.960 us | nfs4_get_state_owner(); 1) | nfs4_client_recover_expired_lease() { 1) | nfs4_wait_clnt_recover() { 1) | nfs_put_client() { 1) 1.040 us | nfs_put_client.part.2(); 1) 2.580 us | } 1) 4.260 us | } 1) 5.840 us | } 1) | nfs4_opendata_alloc() { 1) 0.740 us | nfs4_label_alloc(); 1) 0.740 us | nfs4_label_alloc(); 1) 0.840 us | nfs_alloc_seqid(); 1) 0.820 us | nfs_sb_active(); 1) | nfs_fattr_init() { 1) 0.740 us | nfs_inc_attr_generation_counter(); 1) 2.280 us | } 1) 0.760 us | nfs_fattr_init_names(); 1) + 12.420 us | } 1) 0.980 us | nfs4_get_open_state(); 1) | nfs4_run_open_task() { 1) | rpc_run_task() { 1) | rpc_new_task() { 1) 0.740 us | xprt_get(); 1) 2.600 us | } 1) | xprt_iter_get_next() { 1) | xprt_iter_get_helper() { 1) 0.740 us | xprt_iter_first_entry(); 1) 0.740 us | xprt_get(); 1) 3.880 us | } 1) 5.480 us | } 1) | rpc_execute() { 1) 4.240 us | rpc_make_runnable(); 1) 5.920 us | } 1) + 17.600 us | } 1) ! 
104.260 us | rpc_wait_bit_killable(); 2) | rpc_async_schedule() { 2) | rpc_prepare_task() { 2) | nfs4_open_prepare() { 2) 1.000 us | nfs_wait_on_sequence(); 2) 0.800 us | nfs_mark_delegation_referenced(); 2) 0.740 us | nfs4_sequence_done(); 2) 6.540 us | } 2) 8.480 us | } 2) | rpc_release_resources_task() { 2) 0.780 us | xprt_release(); 2) | rpc_task_release_client() { 2) 0.780 us | rpc_release_client(); 2) 0.740 us | xprt_put(); 2) 4.540 us | } 2) 8.000 us | } 2) + 24.700 us | } 1) | rpc_put_task() { 1) | rpc_do_put_task() { 1) | rpc_release_resources_task() { 1) 0.880 us | xprt_release(); 1) 0.720 us | rpc_task_release_client(); 1) 4.060 us | } 1) | rpc_free_task() { 1) | nfs4_open_release() { 1) 0.760 us | nfs4_opendata_put.part.8(); 1) 2.340 us | } 1) 4.120 us | } 1) + 10.700 us | } 1) + 12.320 us | } 1) ! 138.360 us | } 1) 0.760 us | nfs_mark_delegation_referenced(); 1) 0.820 us | nfs_release_seqid(); 1) | nfs_may_open() { 1) | nfs_do_access() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.740 us | nfs4_is_valid_delegation(); 1) 0.740 us | nfs_mark_delegation_referenced(); 1) 4.060 us | } 1) 5.620 us | } 1) 7.420 us | } 1) 8.940 us | } 1) 0.820 us | nfs_mark_delegation_referenced(); 1) 0.740 us | nfs4_state_set_mode_locked(); 1) 0.740 us | nfs_release_seqid(); 1) 1.220 us | nfs_inode_attach_open_context(); 1) 0.740 us | nfs4_sequence_free_slot(); 1) | nfs4_opendata_put.part.8() { 1) 0.720 us | nfs4_lgopen_release(); 1) | nfs_free_seqid() { 1) 0.740 us | nfs_release_seqid(); 1) 2.380 us | } 1) 0.760 us | nfs4_sequence_free_slot(); 1) 0.740 us | nfs4_put_open_state(); 1) 0.740 us | nfs4_put_state_owner(); 1) 0.740 us | nfs_sb_deactive(); 1) 0.740 us | nfs_fattr_free_names(); 1) + 13.740 us | } 1) 0.760 us | nfs4_put_state_owner(); 1) ! 202.420 us | } 1) ! 204.020 us | } 1) 0.880 us | nfs_file_set_open_context(); 1) ! 
216.020 us | } 1) | nfs_file_read() { 1) 0.760 us | nfs_start_io_read(); 1) | nfs_revalidate_mapping() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.760 us | nfs4_is_valid_delegation(); 1) 0.880 us | nfs_mark_delegation_referenced(); 1) 4.020 us | } 1) 5.560 us | } 1) 7.120 us | } 1) 8.740 us | } 1) 0.800 us | nfs_end_io_read(); 1) + 15.660 us | } 1) | nfs_getattr() { 1) | nfs_writepages() { 1) | nfs_pageio_init_write() { 1) 0.740 us | nfs_pageio_init(); 1) 2.260 us | } 1) | nfs_pageio_complete() { 1) | nfs_pageio_doio() { 1) 0.740 us | nfs_pgio_current_mirror(); 1) 2.260 us | } 1) 3.960 us | } 1) | nfs_io_completion_put.part.0() { 1) | nfs_io_completion_commit() { 1) 0.920 us | nfs_commit_end(); 1) 2.560 us | } 1) 4.200 us | } 1) + 14.760 us | } 1) | nfs_attribute_cache_expired() { 1) | nfs4_have_delegation() { 1) 0.780 us | nfs4_is_valid_delegation(); 1) 0.760 us | nfs_mark_delegation_referenced(); 1) 3.940 us | } 1) 5.540 us | } 1) | nfs_readdirplus_parent_cache_hit.part.5() { 1) 0.720 us | nfs_advise_use_readdirplus(); 1) 2.540 us | } 1) + 27.060 us | } 1) | nfs_file_mmap() { 1) | nfs_revalidate_mapping() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.760 us | nfs4_is_valid_delegation(); 1) 0.740 us | nfs_mark_delegation_referenced(); 1) 3.900 us | } 1) 5.420 us | } 1) 7.000 us | } 1) 8.580 us | } 1) + 10.320 us | } 1) | nfs_file_mmap() { 1) | nfs_revalidate_mapping() { 1) | nfs_mapping_need_revalidate_inode() { 1) | nfs_check_cache_invalid() { 1) | nfs4_have_delegation() { 1) 0.780 us | nfs4_is_valid_delegation(); 1) 0.760 us | nfs_mark_delegation_referenced(); 1) 3.960 us | } 1) 5.660 us | } 1) 7.200 us | } 1) 8.900 us | } 1) + 10.680 us | } 1) 1.040 us | nfs4_file_flush(); 1) | nfs_file_release() { 1) | nfs_file_clear_open_context() { 1) | nfs4_close_context() { 1) | nfs4_close_sync() { 1) 0.800 us | 
nfs4_state_set_mode_locked(); 1) | nfs4_put_open_state() { 1) 0.740 us | nfs4_put_state_owner(); 1) 2.920 us | } 1) 0.760 us | nfs4_put_state_owner(); 1) 8.600 us | } 1) + 10.460 us | } 1) 0.800 us | nfs_sb_deactive(); 1) + 14.740 us | } 1) + 16.820 us | } 1) 0.940 us | nfs_dentry_delete(); 1) | nfs_file_release() { 1) | nfs_file_clear_open_context() { 1) | nfs4_close_context() { 1) | nfs4_close_sync() { 1) 0.840 us | nfs4_state_set_mode_locked(); 1) | nfs4_put_open_state() { 1) 0.900 us | nfs4_put_state_owner(); 1) 3.880 us | } 1) 0.760 us | nfs4_put_state_owner(); 1) + 10.520 us | } 1) + 13.100 us | } 1) 0.860 us | nfs_sb_deactive(); 1) + 19.060 us | } 1) + 25.300 us | } 1) 0.900 us | nfs_dentry_delete(); 1) | nfs_file_release() { 1) | nfs_file_clear_open_context() { 1) | nfs4_close_context() { 1) | nfs4_close_sync() { 1) 0.920 us | nfs4_state_set_mode_locked(); 1) 0.860 us | nfs4_put_open_state(); 1) 0.780 us | nfs4_put_state_owner(); 1) 6.120 us | } 1) 7.740 us | } 1) 0.760 us | nfs_sb_deactive(); 1) + 11.680 us | } 1) + 13.580 us | } 1) | nfs_file_release() { 1) | nfs_file_clear_open_context() { 1) | nfs4_close_context() { 1) | nfs4_close_sync() { 1) 0.920 us | nfs4_state_set_mode_locked(); 1) 0.780 us | nfs4_put_open_state(); 1) 0.740 us | nfs4_put_state_owner(); 1) 5.940 us | } 1) 7.520 us | } 1) 0.760 us | nfs_sb_deactive(); 1) + 11.380 us | } 1) + 13.040 us | } ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks
  2018-09-17 13:03 ` [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks Trond Myklebust
  2018-09-17 13:03 ` [PATCH v3 27/44] SUNRPC: Support for congestion control when queuing is enabled Trond Myklebust
@ 2018-12-27 19:21 ` Chuck Lever
  2018-12-27 22:14   ` Trond Myklebust
  1 sibling, 1 reply; 76+ messages in thread
From: Chuck Lever @ 2018-12-27 19:21 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Linux NFS Mailing List

> On Sep 17, 2018, at 9:03 AM, Trond Myklebust <trondmy@gmail.com> wrote:
>
> One of the intentions with the priority queues was to ensure that no
> single process can hog the transport. The field task->tk_owner therefore
> identifies the RPC call's origin, and is intended to allow the RPC layer
> to organise queues for fairness.
> This commit therefore modifies the transmit queue to group requests
> by task->tk_owner, and ensures that we round robin among those groups.
>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  include/linux/sunrpc/xprt.h |  1 +
>  net/sunrpc/xprt.c           | 27 ++++++++++++++++++++++++---
>  2 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> index 8c2bb078f00c..e377620b9744 100644
> --- a/include/linux/sunrpc/xprt.h
> +++ b/include/linux/sunrpc/xprt.h
> @@ -89,6 +89,7 @@ struct rpc_rqst {
>  	};
>
>  	struct list_head	rq_xmit;	/* Send queue */
> +	struct list_head	rq_xmit2;	/* Send queue */
>
>  	void			*rq_buffer;	/* Call XDR encode buffer */
>  	size_t			rq_callsize;
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 35f5df367591..3e68f35f71f6 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1052,12 +1052,21 @@ xprt_request_need_enqueue_transmit(struct rpc_task *task, struct rpc_rqst *req)
>  void
>  xprt_request_enqueue_transmit(struct rpc_task *task)
>  {
> -	struct rpc_rqst *req = task->tk_rqstp;
> +	struct rpc_rqst *pos, *req = task->tk_rqstp;
>  	struct rpc_xprt *xprt = req->rq_xprt;
>
>  	if (xprt_request_need_enqueue_transmit(task, req)) {
>  		spin_lock(&xprt->queue_lock);
> +		list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
> +			if (pos->rq_task->tk_owner != task->tk_owner)
> +				continue;
> +			list_add_tail(&req->rq_xmit2, &pos->rq_xmit2);
> +			INIT_LIST_HEAD(&req->rq_xmit);
> +			goto out;
> +		}
>  		list_add_tail(&req->rq_xmit, &xprt->xmit_queue);
> +		INIT_LIST_HEAD(&req->rq_xmit2);
> +out:
>  		set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
>  		spin_unlock(&xprt->queue_lock);
>  	}
> @@ -1073,8 +1082,20 @@ xprt_request_enqueue_transmit(struct rpc_task *task)
>  static void
>  xprt_request_dequeue_transmit_locked(struct rpc_task *task)
>  {
> -	if (test_and_clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
> -		list_del(&task->tk_rqstp->rq_xmit);
> +	struct rpc_rqst *req = task->tk_rqstp;
> +
> +	if (!test_and_clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
> +		return;
> +	if (!list_empty(&req->rq_xmit)) {
> +		list_del(&req->rq_xmit);
> +		if (!list_empty(&req->rq_xmit2)) {
> +			struct rpc_rqst *next = list_first_entry(&req->rq_xmit2,
> +					struct rpc_rqst, rq_xmit2);
> +			list_del(&req->rq_xmit2);
> +			list_add_tail(&next->rq_xmit, &next->rq_xprt->xmit_queue);
> +		}
> +	} else
> +		list_del(&req->rq_xmit2);
>  }
>
>  /**
> --
> 2.17.1

Hi Trond-

I've chased down a couple of remaining regressions with the v4.20 NFS
client, and they seem to be rooted in this commit.

When using sec=krb5, krb5i, or krb5p I found that multi-threaded
workloads trigger a lot of server-side disconnects. This is with TCP
and RDMA transports. An instrumented server shows that the client is
under-running the GSS sequence number window. I monitored the order in
which GSS sequence numbers appear on the wire, and after this commit,
the sequence numbers are wildly misordered. If I revert the hunk in
xprt_request_enqueue_transmit, the problem goes away.

I also found that reverting that hunk results in a 3-4% improvement in
fio IOPS rates, as well as improvement in average and maximum latency
as reported by fio.

--
Chuck Lever
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks
  2018-12-27 19:21 ` [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks Chuck Lever
@ 2018-12-27 22:14   ` Trond Myklebust
  2018-12-27 22:34     ` Chuck Lever
  0 siblings, 1 reply; 76+ messages in thread
From: Trond Myklebust @ 2018-12-27 22:14 UTC (permalink / raw)
To: Charles Edward Lever; +Cc: Linux NFS Mailing List

> On Dec 27, 2018, at 20:21, Chuck Lever <chuck.lever@oracle.com> wrote:
>
> Hi Trond-
>
> I've chased down a couple of remaining regressions with the v4.20 NFS client,
> and they seem to be rooted in this commit.
>
> When using sec=krb5, krb5i, or krb5p I found that multi-threaded workloads
> trigger a lot of server-side disconnects. This is with TCP and RDMA transports.
> An instrumented server shows that the client is under-running the GSS sequence
> number window. I monitored the order in which GSS sequence numbers appear on
> the wire, and after this commit, the sequence numbers are wildly misordered.
> If I revert the hunk in xprt_request_enqueue_transmit, the problem goes away.
>
> I also found that reverting that hunk results in a 3-4% improvement in fio
> IOPS rates, as well as improvement in average and maximum latency as reported
> by fio.

Hmm… Provided the sequence numbers still lie within the window, then why
would the order matter?

Cheers
  Trond

_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks
  2018-12-27 22:14 ` Trond Myklebust
@ 2018-12-27 22:34   ` Chuck Lever
  2018-12-31 18:09     ` Trond Myklebust
  0 siblings, 1 reply; 76+ messages in thread
From: Chuck Lever @ 2018-12-27 22:34 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Linux NFS Mailing List

> On Dec 27, 2018, at 5:14 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
>> On Dec 27, 2018, at 20:21, Chuck Lever <chuck.lever@oracle.com> wrote:
>>
>> Hi Trond-
>>
>> I've chased down a couple of remaining regressions with the v4.20 NFS client,
>> and they seem to be rooted in this commit.
>>
>> When using sec=krb5, krb5i, or krb5p I found that multi-threaded workloads
>> trigger a lot of server-side disconnects. This is with TCP and RDMA transports.
>> An instrumented server shows that the client is under-running the GSS sequence
>> number window. I monitored the order in which GSS sequence numbers appear on
>> the wire, and after this commit, the sequence numbers are wildly misordered.
>> If I revert the hunk in xprt_request_enqueue_transmit, the problem goes away.
>>
>> I also found that reverting that hunk results in a 3-4% improvement in fio
>> IOPS rates, as well as improvement in average and maximum latency as reported
>> by fio.
>
> Hmm… Provided the sequence numbers still lie within the window, then why
> would the order matter?

The misordering is so bad that one request is delayed long enough to fall
outside the window. The new “need re-encode” logic does not trigger.

> Cheers
> Trond
>
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2018-12-27 22:34 ` Chuck Lever @ 2018-12-31 18:09 ` Trond Myklebust 2018-12-31 18:44 ` Chuck Lever 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-12-31 18:09 UTC (permalink / raw) To: chuck.lever; +Cc: linux-nfs On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote: > > On Dec 27, 2018, at 5:14 PM, Trond Myklebust < > > trondmy@hammerspace.com> wrote: > > > > > > > > > On Dec 27, 2018, at 20:21, Chuck Lever <chuck.lever@oracle.com> > > > wrote: > > > > > > Hi Trond- > > > > > > I've chased down a couple of remaining regressions with the v4.20 > > > NFS client, > > > and they seem to be rooted in this commit. > > > > > > When using sec=krb5, krb5i, or krb5p I found that multi-threaded > > > workloads > > > trigger a lot of server-side disconnects. This is with TCP and > > > RDMA transports. > > > An instrumented server shows that the client is under-running the > > > GSS sequence > > > number window. I monitored the order in which GSS sequence > > > numbers appear on > > > the wire, and after this commit, the sequence numbers are wildly > > > misordered. > > > If I revert the hunk in xprt_request_enqueue_transmit, the > > > problem goes away. > > > > > > I also found that reverting that hunk results in a 3-4% > > > improvement in fio > > > IOPS rates, as well as improvement in average and maximum latency > > > as reported > > > by fio. > > > > > > > Hmm… Provided the sequence numbers still lie within the window, > > then why would the order matter? > > The misordering is so bad that one request is delayed long enough to > fall outside the window. The new “need re-encode” logic does not > trigger. > That's weird. I can't see anything wrong with need re-encode at this point. Do the window sizes agree on the client and the server? 
-- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2018-12-31 18:09 ` Trond Myklebust @ 2018-12-31 18:44 ` Chuck Lever 2018-12-31 18:59 ` Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Chuck Lever @ 2018-12-31 18:44 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing List > On Dec 31, 2018, at 1:09 PM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote: >>> On Dec 27, 2018, at 5:14 PM, Trond Myklebust < >>> trondmy@hammerspace.com> wrote: >>> >>> >>> >>>> On Dec 27, 2018, at 20:21, Chuck Lever <chuck.lever@oracle.com> >>>> wrote: >>>> >>>> Hi Trond- >>>> >>>> I've chased down a couple of remaining regressions with the v4.20 >>>> NFS client, >>>> and they seem to be rooted in this commit. >>>> >>>> When using sec=krb5, krb5i, or krb5p I found that multi-threaded >>>> workloads >>>> trigger a lot of server-side disconnects. This is with TCP and >>>> RDMA transports. >>>> An instrumented server shows that the client is under-running the >>>> GSS sequence >>>> number window. I monitored the order in which GSS sequence >>>> numbers appear on >>>> the wire, and after this commit, the sequence numbers are wildly >>>> misordered. >>>> If I revert the hunk in xprt_request_enqueue_transmit, the >>>> problem goes away. >>>> >>>> I also found that reverting that hunk results in a 3-4% >>>> improvement in fio >>>> IOPS rates, as well as improvement in average and maximum latency >>>> as reported >>>> by fio. >>>> >>> >>> Hmm… Provided the sequence numbers still lie within the window, >>> then why would the order matter? >> >> The misordering is so bad that one request is delayed long enough to >> fall outside the window. The new “need re-encode” logic does not >> trigger. >> > > That's weird. I can't see anything wrong with need re-encode at this > point. I don't think there is anything wrong with it, it looks like it's not called in this case. 
> Do the window sizes agree on the client and the server? Yes, both are 128. I also tried with 64 on the client side and 128 on the server side. That reduces the frequency of disconnects, but does not eliminate them. I'm not clear what problem the logic in xprt_request_enqueue_transmit is trying to address. It seems to me that the initial, simple implementation of this function is entirely adequate..? -- Chuck Lever ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2018-12-31 18:44 ` Chuck Lever @ 2018-12-31 18:59 ` Trond Myklebust 2018-12-31 19:09 ` Chuck Lever 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-12-31 18:59 UTC (permalink / raw) To: chuck.lever; +Cc: linux-nfs On Mon, 2018-12-31 at 13:44 -0500, Chuck Lever wrote: > > On Dec 31, 2018, at 1:09 PM, Trond Myklebust < > > trondmy@hammerspace.com> wrote: > > > > On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote: > > > > On Dec 27, 2018, at 5:14 PM, Trond Myklebust < > > > > trondmy@hammerspace.com> wrote: > > > > > > > > > > > > > > > > > On Dec 27, 2018, at 20:21, Chuck Lever < > > > > > chuck.lever@oracle.com> > > > > > wrote: > > > > > > > > > > Hi Trond- > > > > > > > > > > I've chased down a couple of remaining regressions with the > > > > > v4.20 > > > > > NFS client, > > > > > and they seem to be rooted in this commit. > > > > > > > > > > When using sec=krb5, krb5i, or krb5p I found that multi- > > > > > threaded > > > > > workloads > > > > > trigger a lot of server-side disconnects. This is with TCP > > > > > and > > > > > RDMA transports. > > > > > An instrumented server shows that the client is under-running > > > > > the > > > > > GSS sequence > > > > > number window. I monitored the order in which GSS sequence > > > > > numbers appear on > > > > > the wire, and after this commit, the sequence numbers are > > > > > wildly > > > > > misordered. > > > > > If I revert the hunk in xprt_request_enqueue_transmit, the > > > > > problem goes away. > > > > > > > > > > I also found that reverting that hunk results in a 3-4% > > > > > improvement in fio > > > > > IOPS rates, as well as improvement in average and maximum > > > > > latency > > > > > as reported > > > > > by fio. > > > > > > > > > > > > > Hmm… Provided the sequence numbers still lie within the window, > > > > then why would the order matter? 
> > > > > > The misordering is so bad that one request is delayed long enough > > > to > > > fall outside the window. The new “need re-encode” logic does not > > > trigger. > > > > > > > That's weird. I can't see anything wrong with need re-encode at > > this > > point. > > I don't think there is anything wrong with it, it looks like it's > not called in this case. So you are saying that the call to rpcauth_xmit_need_reencode() is triggering the EBADMSG, but that this fails to cause a re-encode of the message? > > > Do the window sizes agree on the client and the server? > > Yes, both are 128. I also tried with 64 on the client side and 128 > on the server side. That reduces the frequency of disconnects, but > does not eliminate them. > > I'm not clear what problem the logic in xprt_request_enqueue_transmit > is trying to address. It seems to me that the initial, simple > implementation of this function is entirely adequate..? I agree that the fair queueing code could result in a reordering that could screw up the RPCSEC_GSS sequencing. However, we do expect the need reencode stuff to catch that. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2018-12-31 18:59 ` Trond Myklebust @ 2018-12-31 19:09 ` Chuck Lever 2018-12-31 19:18 ` Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Chuck Lever @ 2018-12-31 19:09 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing List > On Dec 31, 2018, at 1:59 PM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Mon, 2018-12-31 at 13:44 -0500, Chuck Lever wrote: >>> On Dec 31, 2018, at 1:09 PM, Trond Myklebust < >>> trondmy@hammerspace.com> wrote: >>> >>> On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote: >>>>> On Dec 27, 2018, at 5:14 PM, Trond Myklebust < >>>>> trondmy@hammerspace.com> wrote: >>>>> >>>>> >>>>> >>>>>> On Dec 27, 2018, at 20:21, Chuck Lever < >>>>>> chuck.lever@oracle.com> >>>>>> wrote: >>>>>> >>>>>> Hi Trond- >>>>>> >>>>>> I've chased down a couple of remaining regressions with the >>>>>> v4.20 >>>>>> NFS client, >>>>>> and they seem to be rooted in this commit. >>>>>> >>>>>> When using sec=krb5, krb5i, or krb5p I found that multi- >>>>>> threaded >>>>>> workloads >>>>>> trigger a lot of server-side disconnects. This is with TCP >>>>>> and >>>>>> RDMA transports. >>>>>> An instrumented server shows that the client is under-running >>>>>> the >>>>>> GSS sequence >>>>>> number window. I monitored the order in which GSS sequence >>>>>> numbers appear on >>>>>> the wire, and after this commit, the sequence numbers are >>>>>> wildly >>>>>> misordered. >>>>>> If I revert the hunk in xprt_request_enqueue_transmit, the >>>>>> problem goes away. >>>>>> >>>>>> I also found that reverting that hunk results in a 3-4% >>>>>> improvement in fio >>>>>> IOPS rates, as well as improvement in average and maximum >>>>>> latency >>>>>> as reported >>>>>> by fio. >>>>>> >>>>> >>>>> Hmm… Provided the sequence numbers still lie within the window, >>>>> then why would the order matter? 
>>>> >>>> The misordering is so bad that one request is delayed long enough >>>> to >>>> fall outside the window. The new “need re-encode” logic does not >>>> trigger. >>>> >>> >>> That's weird. I can't see anything wrong with need re-encode at >>> this >>> point. >> >> I don't think there is anything wrong with it, it looks like it's >> not called in this case. > > So you are saying that the call to rpcauth_xmit_need_reencode() is > triggering the EBADMSG, but that this fails to cause a re-encode of the > message? No, I think what's going on is that the need_reencode happens when the RPC is enqueued, and is successful. But xprt_request_enqueue_transmit places the RPC somewhere in the middle of xmit_queue. xmit_queue is long enough that more than 128 requests are before the enqueued request. >>> Do the window sizes agree on the client and the server? >> >> Yes, both are 128. I also tried with 64 on the client side and 128 >> on the server side. That reduces the frequency of disconnects, but >> does not eliminate them. >> >> I'm not clear what problem the logic in xprt_request_enqueue_transmit >> is trying to address. It seems to me that the initial, simple >> implementation of this function is entirely adequate..? > > I agree that the fair queueing code could result in a reordering that > could screw up the RPCSEC_GSS sequencing. However, we do expect the > need reencode stuff to catch that. -- Chuck Lever ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2018-12-31 19:09 ` Chuck Lever @ 2018-12-31 19:18 ` Trond Myklebust 2018-12-31 19:21 ` Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-12-31 19:18 UTC (permalink / raw) To: chuck.lever; +Cc: linux-nfs On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote: > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust < > > trondmy@hammerspace.com> wrote: > > > > On Mon, 2018-12-31 at 13:44 -0500, Chuck Lever wrote: > > > > On Dec 31, 2018, at 1:09 PM, Trond Myklebust < > > > > trondmy@hammerspace.com> wrote: > > > > > > > > On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote: > > > > > > On Dec 27, 2018, at 5:14 PM, Trond Myklebust < > > > > > > trondmy@hammerspace.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > On Dec 27, 2018, at 20:21, Chuck Lever < > > > > > > > chuck.lever@oracle.com> > > > > > > > wrote: > > > > > > > > > > > > > > Hi Trond- > > > > > > > > > > > > > > I've chased down a couple of remaining regressions with > > > > > > > the > > > > > > > v4.20 > > > > > > > NFS client, > > > > > > > and they seem to be rooted in this commit. > > > > > > > > > > > > > > When using sec=krb5, krb5i, or krb5p I found that multi- > > > > > > > threaded > > > > > > > workloads > > > > > > > trigger a lot of server-side disconnects. This is with > > > > > > > TCP > > > > > > > and > > > > > > > RDMA transports. > > > > > > > An instrumented server shows that the client is under- > > > > > > > running > > > > > > > the > > > > > > > GSS sequence > > > > > > > number window. I monitored the order in which GSS > > > > > > > sequence > > > > > > > numbers appear on > > > > > > > the wire, and after this commit, the sequence numbers are > > > > > > > wildly > > > > > > > misordered. > > > > > > > If I revert the hunk in xprt_request_enqueue_transmit, > > > > > > > the > > > > > > > problem goes away. 
> > > > > > > > > > > > > > I also found that reverting that hunk results in a 3-4% > > > > > > > improvement in fio > > > > > > > IOPS rates, as well as improvement in average and maximum > > > > > > > latency > > > > > > > as reported > > > > > > > by fio. > > > > > > > > > > > > > > > > > > > Hmm… Provided the sequence numbers still lie within the > > > > > > window, > > > > > > then why would the order matter? > > > > > > > > > > The misordering is so bad that one request is delayed long > > > > > enough > > > > > to > > > > > fall outside the window. The new “need re-encode” logic does > > > > > not > > > > > trigger. > > > > > > > > > > > > > That's weird. I can't see anything wrong with need re-encode at > > > > this > > > > point. > > > > > > I don't think there is anything wrong with it, it looks like it's > > > not called in this case. > > > > So you are saying that the call to rpcauth_xmit_need_reencode() is > > triggering the EBADMSG, but that this fails to cause a re-encode of > > the > > message? > > No, I think what's going on is that the need_reencode happens when > the > RPC is enqueued, and is successful. > > But xprt_request_enqueue_transmit places the RPC somewhere in the > middle > of xmit_queue. xmit_queue is long enough that more than 128 requests > are > before the enqueued request. The test for rpcauth_xmit_need_reencode() happens when we call xprt_request_transmit() to actually put the RPC call on the wire. The enqueue order should not be able to defeat that test. Hmm... Is it perhaps the test for req->rq_bytes_sent that is failing because this is a retransmission after a disconnect/reconnect that didn't trigger a re-encode? > > > > Do the window sizes agree on the client and the server? > > > > > > Yes, both are 128. I also tried with 64 on the client side and > > > 128 > > > on the server side. That reduces the frequency of disconnects, > > > but > > > does not eliminate them. 
> > > > > > I'm not clear what problem the logic in > > > xprt_request_enqueue_transmit > > > is trying to address. It seems to me that the initial, simple > > > implementation of this function is entirely adequate..? > > > > I agree that the fair queueing code could result in a reordering > > that > > could screw up the RPCSEC_GSS sequencing. However, we do expect the > > need reencode stuff to catch that. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2018-12-31 19:18 ` Trond Myklebust @ 2018-12-31 19:21 ` Trond Myklebust 2019-01-02 18:17 ` Chuck Lever 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2018-12-31 19:21 UTC (permalink / raw) To: chuck.lever; +Cc: linux-nfs On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote: > On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote: > > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust < > > > trondmy@hammerspace.com> wrote: > > > > > > On Mon, 2018-12-31 at 13:44 -0500, Chuck Lever wrote: > > > > > On Dec 31, 2018, at 1:09 PM, Trond Myklebust < > > > > > trondmy@hammerspace.com> wrote: > > > > > > > > > > On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote: > > > > > > > On Dec 27, 2018, at 5:14 PM, Trond Myklebust < > > > > > > > trondmy@hammerspace.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Dec 27, 2018, at 20:21, Chuck Lever < > > > > > > > > chuck.lever@oracle.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > Hi Trond- > > > > > > > > > > > > > > > > I've chased down a couple of remaining regressions with > > > > > > > > the > > > > > > > > v4.20 > > > > > > > > NFS client, > > > > > > > > and they seem to be rooted in this commit. > > > > > > > > > > > > > > > > When using sec=krb5, krb5i, or krb5p I found that > > > > > > > > multi- > > > > > > > > threaded > > > > > > > > workloads > > > > > > > > trigger a lot of server-side disconnects. This is with > > > > > > > > TCP > > > > > > > > and > > > > > > > > RDMA transports. > > > > > > > > An instrumented server shows that the client is under- > > > > > > > > running > > > > > > > > the > > > > > > > > GSS sequence > > > > > > > > number window. I monitored the order in which GSS > > > > > > > > sequence > > > > > > > > numbers appear on > > > > > > > > the wire, and after this commit, the sequence numbers > > > > > > > > are > > > > > > > > wildly > > > > > > > > misordered. 
> > > > > > > > If I revert the hunk in xprt_request_enqueue_transmit, > > > > > > > > the > > > > > > > > problem goes away. > > > > > > > > > > > > > > > > I also found that reverting that hunk results in a 3-4% > > > > > > > > improvement in fio > > > > > > > > IOPS rates, as well as improvement in average and > > > > > > > > maximum > > > > > > > > latency > > > > > > > > as reported > > > > > > > > by fio. > > > > > > > > > > > > > > > > > > > > > > Hmm… Provided the sequence numbers still lie within the > > > > > > > window, > > > > > > > then why would the order matter? > > > > > > > > > > > > The misordering is so bad that one request is delayed long > > > > > > enough > > > > > > to > > > > > > fall outside the window. The new “need re-encode” logic > > > > > > does > > > > > > not > > > > > > trigger. > > > > > > > > > > > > > > > > That's weird. I can't see anything wrong with need re-encode > > > > > at > > > > > this > > > > > point. > > > > > > > > I don't think there is anything wrong with it, it looks like > > > > it's > > > > not called in this case. > > > > > > So you are saying that the call to rpcauth_xmit_need_reencode() > > > is > > > triggering the EBADMSG, but that this fails to cause a re-encode > > > of > > > the > > > message? > > > > No, I think what's going on is that the need_reencode happens when > > the > > RPC is enqueued, and is successful. > > > > But xprt_request_enqueue_transmit places the RPC somewhere in the > > middle > > of xmit_queue. xmit_queue is long enough that more than 128 > > requests > > are > > before the enqueued request. > > The test for rpcauth_xmit_need_reencode() happens when we call > xprt_request_transmit() to actually put the RPC call on the wire. The > enqueue order should not be able to defeat that test. > > Hmm... Is it perhaps the test for req->rq_bytes_sent that is failing > because this is a retransmission after a disconnect/reconnect that > didn't trigger a re-encode? 
Actually, it might be worth a try to move the test for rpcauth_xmit_need_reencode() outside the enclosing test for req->rq_bytes_sent, as that is just a minor optimisation. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2018-12-31 19:21 ` Trond Myklebust @ 2019-01-02 18:17 ` Chuck Lever 2019-01-02 18:45 ` Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Chuck Lever @ 2019-01-02 18:17 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing List > On Dec 31, 2018, at 2:21 PM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote: >> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote: >>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust < >>>> trondmy@hammerspace.com> wrote: >>>> >>>> On Mon, 2018-12-31 at 13:44 -0500, Chuck Lever wrote: >>>>>> On Dec 31, 2018, at 1:09 PM, Trond Myklebust < >>>>>> trondmy@hammerspace.com> wrote: >>>>>> >>>>>> On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote: >>>>>>>> On Dec 27, 2018, at 5:14 PM, Trond Myklebust < >>>>>>>> trondmy@hammerspace.com> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On Dec 27, 2018, at 20:21, Chuck Lever < >>>>>>>>> chuck.lever@oracle.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi Trond- >>>>>>>>> >>>>>>>>> I've chased down a couple of remaining regressions with >>>>>>>>> the >>>>>>>>> v4.20 >>>>>>>>> NFS client, >>>>>>>>> and they seem to be rooted in this commit. >>>>>>>>> >>>>>>>>> When using sec=krb5, krb5i, or krb5p I found that >>>>>>>>> multi- >>>>>>>>> threaded >>>>>>>>> workloads >>>>>>>>> trigger a lot of server-side disconnects. This is with >>>>>>>>> TCP >>>>>>>>> and >>>>>>>>> RDMA transports. >>>>>>>>> An instrumented server shows that the client is under- >>>>>>>>> running >>>>>>>>> the >>>>>>>>> GSS sequence >>>>>>>>> number window. I monitored the order in which GSS >>>>>>>>> sequence >>>>>>>>> numbers appear on >>>>>>>>> the wire, and after this commit, the sequence numbers >>>>>>>>> are >>>>>>>>> wildly >>>>>>>>> misordered. >>>>>>>>> If I revert the hunk in xprt_request_enqueue_transmit, >>>>>>>>> the >>>>>>>>> problem goes away. 
>>>>>>>>> >>>>>>>>> I also found that reverting that hunk results in a 3-4% >>>>>>>>> improvement in fio >>>>>>>>> IOPS rates, as well as improvement in average and >>>>>>>>> maximum >>>>>>>>> latency >>>>>>>>> as reported >>>>>>>>> by fio. >>>>>>>>> >>>>>>>> >>>>>>>> Hmm… Provided the sequence numbers still lie within the >>>>>>>> window, >>>>>>>> then why would the order matter? >>>>>>> >>>>>>> The misordering is so bad that one request is delayed long >>>>>>> enough >>>>>>> to >>>>>>> fall outside the window. The new “need re-encode” logic >>>>>>> does >>>>>>> not >>>>>>> trigger. >>>>>>> >>>>>> >>>>>> That's weird. I can't see anything wrong with need re-encode >>>>>> at >>>>>> this >>>>>> point. >>>>> >>>>> I don't think there is anything wrong with it, it looks like >>>>> it's >>>>> not called in this case. >>>> >>>> So you are saying that the call to rpcauth_xmit_need_reencode() >>>> is >>>> triggering the EBADMSG, but that this fails to cause a re-encode >>>> of >>>> the >>>> message? >>> >>> No, I think what's going on is that the need_reencode happens when >>> the >>> RPC is enqueued, and is successful. >>> >>> But xprt_request_enqueue_transmit places the RPC somewhere in the >>> middle >>> of xmit_queue. xmit_queue is long enough that more than 128 >>> requests >>> are >>> before the enqueued request. >> >> The test for rpcauth_xmit_need_reencode() happens when we call >> xprt_request_transmit() to actually put the RPC call on the wire. The >> enqueue order should not be able to defeat that test. >> >> Hmm... Is it perhaps the test for req->rq_bytes_sent that is failing >> because this is a retransmission after a disconnect/reconnect that >> didn't trigger a re-encode? > > Actually, it might be worth a try to move the test for > rpcauth_xmit_need_reencode() outside the enclosing test for req- >> rq_bytes_sent as that is just a minor optimisation. Perhaps that's the case for TCP, but RPCs sent via xprtrdma never set req->rq_bytes_sent to a non-zero value. 
The body of the "if" statement is always executed for those RPCs. -- Chuck Lever ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2019-01-02 18:17 ` Chuck Lever @ 2019-01-02 18:45 ` Trond Myklebust 2019-01-02 18:51 ` Chuck Lever 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2019-01-02 18:45 UTC (permalink / raw) To: chuck.lever; +Cc: linux-nfs On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote: > > On Dec 31, 2018, at 2:21 PM, Trond Myklebust < > > trondmy@hammerspace.com> wrote: > > > > On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote: > > > On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote: > > > > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust < > > > > > trondmy@hammerspace.com> wrote: > > > > > > > > > > > > > The test for rpcauth_xmit_need_reencode() happens when we call > > > xprt_request_transmit() to actually put the RPC call on the wire. > > > The > > > enqueue order should not be able to defeat that test. > > > > > > Hmm... Is it perhaps the test for req->rq_bytes_sent that is > > > failing > > > because this is a retransmission after a disconnect/reconnect > > > that > > > didn't trigger a re-encode? > > > > Actually, it might be worth a try to move the test for > > rpcauth_xmit_need_reencode() outside the enclosing test for req- > > > rq_bytes_sent as that is just a minor optimisation. > > Perhaps that's the case for TCP, but RPCs sent via xprtrdma never set > req->rq_bytes_sent to a non-zero value. The body of the "if" > statement > is always executed for those RPCs. > Then the question is what is defeating the call to rpcauth_xmit_need_reencode() in xprt_request_transmit() and causing it not to trigger in the misordered cases? -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2019-01-02 18:45 ` Trond Myklebust @ 2019-01-02 18:51 ` Chuck Lever 2019-01-02 18:57 ` Trond Myklebust 0 siblings, 1 reply; 76+ messages in thread From: Chuck Lever @ 2019-01-02 18:51 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing List > On Jan 2, 2019, at 1:45 PM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote: >>> On Dec 31, 2018, at 2:21 PM, Trond Myklebust < >>> trondmy@hammerspace.com> wrote: >>> >>> On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote: >>>> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote: >>>>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust < >>>>>> trondmy@hammerspace.com> wrote: >>>>>> >>>>>> >>>> The test for rpcauth_xmit_need_reencode() happens when we call >>>> xprt_request_transmit() to actually put the RPC call on the wire. >>>> The >>>> enqueue order should not be able to defeat that test. >>>> >>>> Hmm... Is it perhaps the test for req->rq_bytes_sent that is >>>> failing >>>> because this is a retransmission after a disconnect/reconnect >>>> that >>>> didn't trigger a re-encode? >>> >>> Actually, it might be worth a try to move the test for >>> rpcauth_xmit_need_reencode() outside the enclosing test for req- >>>> rq_bytes_sent as that is just a minor optimisation. >> >> Perhaps that's the case for TCP, but RPCs sent via xprtrdma never set >> req->rq_bytes_sent to a non-zero value. The body of the "if" >> statement >> is always executed for those RPCs. >> > > Then the question is what is defeating the call to > rpcauth_xmit_need_reencode() in xprt_request_transmit() and causing it > not to trigger in the misordered cases? Here's a sample RPC/RDMA case. 
My instrumented server reports this: Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped: seq_num=141220 sd->sd_max=141360 ftrace log on the client shows this: kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode: task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode unneeded kworker/u28:12-2191 [004] 194.048534: xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 status=-57 kworker/u28:12-2191 [004] 194.048534: rpc_task_run_action: task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57 action=call_transmit_status kworker/u28:12-2191 [004] 194.048535: rpc_task_run_action: task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 action=call_transmit kworker/u28:12-2191 [004] 194.048535: rpc_task_sleep: task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0 queue=xprt_sending kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode: task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode unneeded kworker/u28:12-2191 [004] 194.048557: xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 status=0 kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode: task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336 reencode unneeded kworker/u28:12-2191 [004] 194.048563: xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360 status=0 Note that first need_reencode: the sequence numbers show that the xmit queue has been significantly re-ordered. The request being transmitted is already very close to the lower end of the GSS sequence number window. The server then re-orders these two slightly because the first one had some Read chunks that need to be pulled over, the second was pure inline and therefore could be processed immediately. That is enough to force the first one outside the GSS sequence number window. I haven't looked closely at the pathology of the TCP case. -- Chuck Lever ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2019-01-02 18:51 ` Chuck Lever @ 2019-01-02 18:57 ` Trond Myklebust 2019-01-02 19:06 ` Trond Myklebust 2019-01-02 19:08 ` Chuck Lever 0 siblings, 2 replies; 76+ messages in thread From: Trond Myklebust @ 2019-01-02 18:57 UTC (permalink / raw) To: chuck.lever; +Cc: linux-nfs On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote: > > On Jan 2, 2019, at 1:45 PM, Trond Myklebust < > > trondmy@hammerspace.com> wrote: > > > > On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote: > > > > On Dec 31, 2018, at 2:21 PM, Trond Myklebust < > > > > trondmy@hammerspace.com> wrote: > > > > > > > > On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote: > > > > > On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote: > > > > > > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust < > > > > > > > trondmy@hammerspace.com> wrote: > > > > > > > > > > > > > > > > > > > The test for rpcauth_xmit_need_reencode() happens when we > > > > > call > > > > > xprt_request_transmit() to actually put the RPC call on the > > > > > wire. > > > > > The > > > > > enqueue order should not be able to defeat that test. > > > > > > > > > > Hmm... Is it perhaps the test for req->rq_bytes_sent that is > > > > > failing > > > > > because this is a retransmission after a disconnect/reconnect > > > > > that > > > > > didn't trigger a re-encode? > > > > > > > > Actually, it might be worth a try to move the test for > > > > rpcauth_xmit_need_reencode() outside the enclosing test for > > > > req- > > > > > rq_bytes_sent as that is just a minor optimisation. > > > > > > Perhaps that's the case for TCP, but RPCs sent via xprtrdma never > > > set > > > req->rq_bytes_sent to a non-zero value. The body of the "if" > > > statement > > > is always executed for those RPCs. > > > > > > > Then the question is what is defeating the call to > > rpcauth_xmit_need_reencode() in xprt_request_transmit() and causing > > it > > not to trigger in the misordered cases? 
> > Here's a sample RPC/RDMA case. > > My instrumented server reports this: > > Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped: > seq_num=141220 sd->sd_max=141360 > > > ftrace log on the client shows this: > > kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode: > task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode > unneeded > kworker/u28:12-2191 [004] 194.048534: > xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 > status=-57 > kworker/u28:12-2191 [004] 194.048534: > rpc_task_run_action: task:1779@5 flags=ASYNC > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57 > action=call_transmit_status > kworker/u28:12-2191 [004] 194.048535: > rpc_task_run_action: task:1779@5 flags=ASYNC > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 > action=call_transmit > kworker/u28:12-2191 [004] 194.048535: > rpc_task_sleep: task:1779@5 flags=ASYNC > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0 > queue=xprt_sending > > > kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode: > task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode > unneeded > kworker/u28:12-2191 [004] 194.048557: > xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 > status=0 > > > kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode: > task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336 reencode > unneeded > kworker/u28:12-2191 [004] 194.048563: > xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360 > status=0 > > > Note that first need_reencode: the sequence numbers show that the > xmit > queue has been significantly re-ordered. The request being > transmitted is > already very close to the lower end of the GSS sequence number > window. > > The server then re-ordereds these two slightly because the first one > had > some Read chunks that need to be pulled over, the second was pure > inline > and therefore could be processed immediately. 
That is enough to force > the > first one outside the GSS sequence number window. > > I haven't looked closely at the pathology of the TCP case. Wait a minute... That's not OK. The client can't be expected to take into account reordering that happens on the server side. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 76+ messages in thread
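The server-side drop in the log above ("seq_num=141220 sd->sd_max=141360") is the RFC 2203 sequence-number window check. Below is a minimal userspace sketch of that check, assuming a window of 128 (the Linux server's GSS_SEQ_WIN); the real gss_check_seq_num also keeps a bitmap to reject replays inside the window, and the struct and function names here are illustrative, not the kernel's:

```c
#include <stdbool.h>

#define GSS_SEQ_WIN 128  /* assumed window size; fits the drop above: 141360 - 141220 = 140 >= 128 */

struct seq_data {
	unsigned int sd_max;	/* highest sequence number seen so far */
};

/* Return false (drop the request) when seq_num has fallen below the window. */
static bool seq_num_in_window(struct seq_data *sd, unsigned int seq_num)
{
	if (seq_num > sd->sd_max) {
		sd->sd_max = seq_num;	/* window slides forward */
		return true;
	}
	return sd->sd_max - seq_num < GSS_SEQ_WIN;
}
```

With sd_max at 141360, a request carrying sequence number 141220 is 140 behind the high-water mark and is dropped, exactly as the instrumented server reports.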
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2019-01-02 18:57 ` Trond Myklebust @ 2019-01-02 19:06 ` Trond Myklebust 2019-01-02 19:24 ` Trond Myklebust 2019-01-02 19:08 ` Chuck Lever 1 sibling, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2019-01-02 19:06 UTC (permalink / raw) To: chuck.lever; +Cc: linux-nfs On Wed, 2019-01-02 at 13:57 -0500, Trond Myklebust wrote: > On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote: > > > On Jan 2, 2019, at 1:45 PM, Trond Myklebust < > > > trondmy@hammerspace.com> wrote: > > > > > > On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote: > > > > > On Dec 31, 2018, at 2:21 PM, Trond Myklebust < > > > > > trondmy@hammerspace.com> wrote: > > > > > > > > > > On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote: > > > > > > On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote: > > > > > > > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust < > > > > > > > > trondmy@hammerspace.com> wrote: > > > > > > > > > > > > > > > > > > > > > > The test for rpcauth_xmit_need_reencode() happens when we > > > > > > call > > > > > > xprt_request_transmit() to actually put the RPC call on the > > > > > > wire. > > > > > > The > > > > > > enqueue order should not be able to defeat that test. > > > > > > > > > > > > Hmm... Is it perhaps the test for req->rq_bytes_sent that > > > > > > is > > > > > > failing > > > > > > because this is a retransmission after a > > > > > > disconnect/reconnect > > > > > > that > > > > > > didn't trigger a re-encode? > > > > > > > > > > Actually, it might be worth a try to move the test for > > > > > rpcauth_xmit_need_reencode() outside the enclosing test for > > > > > req- > > > > > > rq_bytes_sent as that is just a minor optimisation. > > > > > > > > Perhaps that's the case for TCP, but RPCs sent via xprtrdma > > > > never > > > > set > > > > req->rq_bytes_sent to a non-zero value. The body of the "if" > > > > statement > > > > is always executed for those RPCs. 
> > > > > > > > > > Then the question is what is defeating the call to > > > rpcauth_xmit_need_reencode() in xprt_request_transmit() and > > > causing > > > it > > > not to trigger in the misordered cases? > > > > Here's a sample RPC/RDMA case. > > > > My instrumented server reports this: > > > > Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped: > > seq_num=141220 sd->sd_max=141360 > > > > > > ftrace log on the client shows this: > > > > kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode: > > task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode > > unneeded > > kworker/u28:12-2191 [004] 194.048534: > > xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 > > status=-57 > > kworker/u28:12-2191 [004] 194.048534: > > rpc_task_run_action: task:1779@5 flags=ASYNC > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57 > > action=call_transmit_status > > kworker/u28:12-2191 [004] 194.048535: > > rpc_task_run_action: task:1779@5 flags=ASYNC > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 > > action=call_transmit > > kworker/u28:12-2191 [004] 194.048535: > > rpc_task_sleep: task:1779@5 flags=ASYNC > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0 > > queue=xprt_sending > > > > > > kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode: > > task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode > > unneeded > > kworker/u28:12-2191 [004] 194.048557: > > xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 > > status=0 > > > > > > kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode: > > task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336 reencode > > unneeded > > kworker/u28:12-2191 [004] 194.048563: > > xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360 > > status=0 > > > > > > Note that first need_reencode: the sequence numbers show that the > > xmit > > queue has been significantly re-ordered. 
The request being > > transmitted is > > already very close to the lower end of the GSS sequence number > > window. > > > > The server then re-orders these two slightly because the first > > one > > had > > some Read chunks that need to be pulled over, the second was pure > > inline > > and therefore could be processed immediately. That is enough to > > force > > the > > first one outside the GSS sequence number window. > > > > I haven't looked closely at the pathology of the TCP case. > > Wait a minute... That's not OK. The client can't be expected to take > into account reordering that happens on the server side. If that's the case, then we would need to halt transmission as soon as we hit the RPCSEC_GSS window edge. Off the cuff, I'm not sure how to do that, since those windows are per session (i.e. per user). -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 76+ messages in thread
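The client-side check that keeps reporting "reencode unneeded" in the ftrace log can be modelled the same way: the request's sequence number is compared against the highest sequence number the client has already put on the wire (seq_xmit). A simplified sketch, again assuming a 128-entry window; the real gss_xmit_need_reencode in net/sunrpc/auth_gss/auth_gss.c is more involved, and the signed cast below is just one way to tolerate requests whose seqno is ahead of seq_xmit:

```c
#include <stdbool.h>

#define GSS_SEQ_WINDOW 128  /* assumed; must match the server's window */

/*
 * Simplified model of the xmit-time re-encode test: the request must be
 * re-encoded when its sequence number has fallen out of the window
 * relative to the highest sequence number already transmitted.  The
 * signed cast keeps requests ahead of seq_xmit from being re-encoded.
 */
static bool need_reencode(unsigned int rq_seqno, unsigned int seq_xmit)
{
	return (int)(seq_xmit - rq_seqno) >= GSS_SEQ_WINDOW;
}
```

This exposes the race in the log: at transmit time seq_xmit was 141336, so 141336 - 141220 = 116 is still inside the window and no re-encode is triggered, but by the time the server processed the request its own high-water mark had advanced to 141360, putting the request 140 behind and outside the window.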
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2019-01-02 19:06 ` Trond Myklebust @ 2019-01-02 19:24 ` Trond Myklebust 2019-01-02 19:33 ` Chuck Lever 0 siblings, 1 reply; 76+ messages in thread From: Trond Myklebust @ 2019-01-02 19:24 UTC (permalink / raw) To: chuck.lever; +Cc: linux-nfs On Wed, 2019-01-02 at 14:06 -0500, Trond Myklebust wrote: > On Wed, 2019-01-02 at 13:57 -0500, Trond Myklebust wrote: > > On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote: > > > > On Jan 2, 2019, at 1:45 PM, Trond Myklebust < > > > > trondmy@hammerspace.com> wrote: > > > > > > > > On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote: > > > > > > On Dec 31, 2018, at 2:21 PM, Trond Myklebust < > > > > > > trondmy@hammerspace.com> wrote: > > > > > > > > > > > > On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote: > > > > > > > On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote: > > > > > > > > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust < > > > > > > > > > trondmy@hammerspace.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > The test for rpcauth_xmit_need_reencode() happens when we > > > > > > > call > > > > > > > xprt_request_transmit() to actually put the RPC call on > > > > > > > the > > > > > > > wire. > > > > > > > The > > > > > > > enqueue order should not be able to defeat that test. > > > > > > > > > > > > > > Hmm... Is it perhaps the test for req->rq_bytes_sent that > > > > > > > is > > > > > > > failing > > > > > > > because this is a retransmission after a > > > > > > > disconnect/reconnect > > > > > > > that > > > > > > > didn't trigger a re-encode? > > > > > > > > > > > > Actually, it might be worth a try to move the test for > > > > > > rpcauth_xmit_need_reencode() outside the enclosing test for > > > > > > req- > > > > > > > rq_bytes_sent as that is just a minor optimisation. 
> > > > > > > > > > Perhaps that's the case for TCP, but RPCs sent via xprtrdma > > > > > never > > > > > set > > > > > req->rq_bytes_sent to a non-zero value. The body of the "if" > > > > > statement > > > > > is always executed for those RPCs. > > > > > > > > > > > > > Then the question is what is defeating the call to > > > > rpcauth_xmit_need_reencode() in xprt_request_transmit() and > > > > causing > > > > it > > > > not to trigger in the misordered cases? > > > > > > Here's a sample RPC/RDMA case. > > > > > > My instrumented server reports this: > > > > > > Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped: > > > seq_num=141220 sd->sd_max=141360 > > > > > > > > > ftrace log on the client shows this: > > > > > > kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode: > > > task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 > > > reencode > > > unneeded > > > kworker/u28:12-2191 [004] 194.048534: > > > xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 > > > status=-57 > > > kworker/u28:12-2191 [004] 194.048534: > > > rpc_task_run_action: task:1779@5 flags=ASYNC > > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57 > > > action=call_transmit_status > > > kworker/u28:12-2191 [004] 194.048535: > > > rpc_task_run_action: task:1779@5 flags=ASYNC > > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 > > > action=call_transmit > > > kworker/u28:12-2191 [004] 194.048535: > > > rpc_task_sleep: task:1779@5 flags=ASYNC > > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0 > > > queue=xprt_sending > > > > > > > > > kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode: > > > task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 > > > reencode > > > unneeded > > > kworker/u28:12-2191 [004] 194.048557: > > > xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 > > > status=0 > > > > > > > > > kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode: > > > task:1902@5 xid=0x14f5f47c rq_seqno=141360 
seq_xmit=141336 > > > reencode > > > unneeded > > > kworker/u28:12-2191 [004] 194.048563: > > > xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360 > > > status=0 > > > > > > > > > Note that first need_reencode: the sequence numbers show that the > > > xmit > > > queue has been significantly re-ordered. The request being > > > transmitted is > > > already very close to the lower end of the GSS sequence number > > > window. > > > > > > The server then re-orders these two slightly because the first > > > one > > > had > > > some Read chunks that need to be pulled over, the second was pure > > > inline > > > and therefore could be processed immediately. That is enough to > > > force > > > the > > > first one outside the GSS sequence number window. > > > > > > I haven't looked closely at the pathology of the TCP case. > > > > Wait a minute... That's not OK. The client can't be expected to > > take > > into account reordering that happens on the server side. > > If that's the case, then we would need to halt transmission as soon > as > we hit the RPCSEC_GSS window edge. Off the cuff, I'm not sure how to > do > that, since those windows are per session (i.e. per user). So here is something we probably could do: modify xprt_request_enqueue_transmit() to order the list in req->rq_xmit2 by req->rq_seqno. Since task->tk_owner is actually a pid, that's not a perfect solution, but we could further mitigate by modifying gss_xmit_need_reencode() to only allow transmission of requests that are within 2/3 of the window. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 76+ messages in thread
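The two mitigations proposed here can be sketched together: keep the transmit list sorted by rq_seqno on enqueue, and refuse to transmit anything that has already consumed more than 2/3 of the window. Both helpers below are illustrative only; a hypothetical singly linked list stands in for the kernel's list_head machinery in xprt_request_enqueue_transmit(), and the window size is an assumption:

```c
#include <stddef.h>
#include <stdbool.h>

#define GSS_SEQ_WINDOW 128  /* assumed window size */

struct rqst {
	unsigned int rq_seqno;
	struct rqst *next;
};

/* Insert so the queue stays ordered by rq_seqno, oldest request first. */
static void enqueue_ordered(struct rqst **head, struct rqst *req)
{
	while (*head && (*head)->rq_seqno < req->rq_seqno)
		head = &(*head)->next;
	req->next = *head;
	*head = req;
}

/*
 * The proposed 2/3 mitigation: only transmit requests still comfortably
 * inside the window, leaving headroom for server-side reordering.
 */
static bool xmit_allowed(unsigned int rq_seqno, unsigned int seq_xmit)
{
	return (int)(seq_xmit - rq_seqno) < (2 * GSS_SEQ_WINDOW) / 3;
}
```

With the numbers from the trace, xmit_allowed(141220, 141336) is false, since a lag of 116 exceeds 2/3 of 128 (85); the request the server ended up dropping would instead have been held back for re-encoding.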
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2019-01-02 19:24 ` Trond Myklebust @ 2019-01-02 19:33 ` Chuck Lever 0 siblings, 0 replies; 76+ messages in thread From: Chuck Lever @ 2019-01-02 19:33 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing List > On Jan 2, 2019, at 2:24 PM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Wed, 2019-01-02 at 14:06 -0500, Trond Myklebust wrote: >> On Wed, 2019-01-02 at 13:57 -0500, Trond Myklebust wrote: >>> On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote: >>>>> On Jan 2, 2019, at 1:45 PM, Trond Myklebust < >>>>> trondmy@hammerspace.com> wrote: >>>>> >>>>> On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote: >>>>>>> On Dec 31, 2018, at 2:21 PM, Trond Myklebust < >>>>>>> trondmy@hammerspace.com> wrote: >>>>>>> >>>>>>> On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote: >>>>>>>> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote: >>>>>>>>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust < >>>>>>>>>> trondmy@hammerspace.com> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>> The test for rpcauth_xmit_need_reencode() happens when we >>>>>>>> call >>>>>>>> xprt_request_transmit() to actually put the RPC call on >>>>>>>> the >>>>>>>> wire. >>>>>>>> The >>>>>>>> enqueue order should not be able to defeat that test. >>>>>>>> >>>>>>>> Hmm... Is it perhaps the test for req->rq_bytes_sent that >>>>>>>> is >>>>>>>> failing >>>>>>>> because this is a retransmission after a >>>>>>>> disconnect/reconnect >>>>>>>> that >>>>>>>> didn't trigger a re-encode? >>>>>>> >>>>>>> Actually, it might be worth a try to move the test for >>>>>>> rpcauth_xmit_need_reencode() outside the enclosing test for >>>>>>> req- >>>>>>>> rq_bytes_sent as that is just a minor optimisation. >>>>>> >>>>>> Perhaps that's the case for TCP, but RPCs sent via xprtrdma >>>>>> never >>>>>> set >>>>>> req->rq_bytes_sent to a non-zero value. The body of the "if" >>>>>> statement >>>>>> is always executed for those RPCs. 
>>>>>> >>>>> >>>>> Then the question is what is defeating the call to >>>>> rpcauth_xmit_need_reencode() in xprt_request_transmit() and >>>>> causing >>>>> it >>>>> not to trigger in the misordered cases? >>>> >>>> Here's a sample RPC/RDMA case. >>>> >>>> My instrumented server reports this: >>>> >>>> Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped: >>>> seq_num=141220 sd->sd_max=141360 >>>> >>>> >>>> ftrace log on the client shows this: >>>> >>>> kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode: >>>> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 >>>> reencode >>>> unneeded >>>> kworker/u28:12-2191 [004] 194.048534: >>>> xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 >>>> status=-57 >>>> kworker/u28:12-2191 [004] 194.048534: >>>> rpc_task_run_action: task:1779@5 flags=ASYNC >>>> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57 >>>> action=call_transmit_status >>>> kworker/u28:12-2191 [004] 194.048535: >>>> rpc_task_run_action: task:1779@5 flags=ASYNC >>>> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 >>>> action=call_transmit >>>> kworker/u28:12-2191 [004] 194.048535: >>>> rpc_task_sleep: task:1779@5 flags=ASYNC >>>> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0 >>>> queue=xprt_sending >>>> >>>> >>>> kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode: >>>> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 >>>> reencode >>>> unneeded >>>> kworker/u28:12-2191 [004] 194.048557: >>>> xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 >>>> status=0 >>>> >>>> >>>> kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode: >>>> task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336 >>>> reencode >>>> unneeded >>>> kworker/u28:12-2191 [004] 194.048563: >>>> xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360 >>>> status=0 >>>> >>>> >>>> Note that first need_reencode: the sequence numbers show that the >>>> xmit >>>> queue has been significantly re-ordered. 
The request being >>>> transmitted is >>>> already very close to the lower end of the GSS sequence number >>>> window. >>>> >>>> The server then re-orders these two slightly because the first >>>> one >>>> had >>>> some Read chunks that need to be pulled over, the second was pure >>>> inline >>>> and therefore could be processed immediately. That is enough to >>>> force >>>> the >>>> first one outside the GSS sequence number window. >>>> >>>> I haven't looked closely at the pathology of the TCP case. >>> >>> Wait a minute... That's not OK. The client can't be expected to >>> take >>> into account reordering that happens on the server side. >> >> If that's the case, then we would need to halt transmission as soon >> as >> we hit the RPCSEC_GSS window edge. Off the cuff, I'm not sure how to >> do >> that, since those windows are per session (i.e. per user). > > So here is something we probably could do: modify > xprt_request_enqueue_transmit() to order the list in req->rq_xmit2 by > req->rq_seqno. Why not add " && !req->rq_seq_no " to the third arm? Calls are already enqueued in sequence number order. > Since task->tk_owner is actually a pid, then that's not > a perfect solution, but we could further mitigate by modifying > gss_xmit_need_reencode() to only allow transmission of requests that > are within 2/3 of the window. > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com -- Chuck Lever ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2019-01-02 18:57 ` Trond Myklebust 2019-01-02 19:06 ` Trond Myklebust @ 2019-01-02 19:08 ` Chuck Lever 2019-01-02 19:11 ` Trond Myklebust 1 sibling, 1 reply; 76+ messages in thread From: Chuck Lever @ 2019-01-02 19:08 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing List > On Jan 2, 2019, at 1:57 PM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote: >>> On Jan 2, 2019, at 1:45 PM, Trond Myklebust < >>> trondmy@hammerspace.com> wrote: >>> >>> On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote: >>>>> On Dec 31, 2018, at 2:21 PM, Trond Myklebust < >>>>> trondmy@hammerspace.com> wrote: >>>>> >>>>> On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote: >>>>>> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote: >>>>>>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust < >>>>>>>> trondmy@hammerspace.com> wrote: >>>>>>>> >>>>>>>> >>>>>> The test for rpcauth_xmit_need_reencode() happens when we >>>>>> call >>>>>> xprt_request_transmit() to actually put the RPC call on the >>>>>> wire. >>>>>> The >>>>>> enqueue order should not be able to defeat that test. >>>>>> >>>>>> Hmm... Is it perhaps the test for req->rq_bytes_sent that is >>>>>> failing >>>>>> because this is a retransmission after a disconnect/reconnect >>>>>> that >>>>>> didn't trigger a re-encode? >>>>> >>>>> Actually, it might be worth a try to move the test for >>>>> rpcauth_xmit_need_reencode() outside the enclosing test for >>>>> req- >>>>>> rq_bytes_sent as that is just a minor optimisation. >>>> >>>> Perhaps that's the case for TCP, but RPCs sent via xprtrdma never >>>> set >>>> req->rq_bytes_sent to a non-zero value. The body of the "if" >>>> statement >>>> is always executed for those RPCs. 
>>>> >>> >>> Then the question is what is defeating the call to >>> rpcauth_xmit_need_reencode() in xprt_request_transmit() and causing >>> it >>> not to trigger in the misordered cases? >> >> Here's a sample RPC/RDMA case. >> >> My instrumented server reports this: >> >> Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped: >> seq_num=141220 sd->sd_max=141360 >> >> >> ftrace log on the client shows this: >> >> kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode: >> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode >> unneeded >> kworker/u28:12-2191 [004] 194.048534: >> xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 >> status=-57 >> kworker/u28:12-2191 [004] 194.048534: >> rpc_task_run_action: task:1779@5 flags=ASYNC >> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57 >> action=call_transmit_status >> kworker/u28:12-2191 [004] 194.048535: >> rpc_task_run_action: task:1779@5 flags=ASYNC >> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 >> action=call_transmit >> kworker/u28:12-2191 [004] 194.048535: >> rpc_task_sleep: task:1779@5 flags=ASYNC >> runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0 >> queue=xprt_sending >> >> >> kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode: >> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode >> unneeded >> kworker/u28:12-2191 [004] 194.048557: >> xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 >> status=0 >> >> >> kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode: >> task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336 reencode >> unneeded >> kworker/u28:12-2191 [004] 194.048563: >> xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360 >> status=0 >> >> >> Note that first need_reencode: the sequence numbers show that the >> xmit >> queue has been significantly re-ordered. The request being >> transmitted is >> already very close to the lower end of the GSS sequence number >> window. 
>> >> The server then re-orders these two slightly because the first one >> had >> some Read chunks that need to be pulled over, the second was pure >> inline >> and therefore could be processed immediately. That is enough to force >> the >> first one outside the GSS sequence number window. >> >> I haven't looked closely at the pathology of the TCP case. > > Wait a minute... That's not OK. The client can't be expected to take > into account reordering that happens on the server side. Conversely, the client can't assume the transport and the server don't re-order. This does not appear to be a problem for the v4.19 client: I don't see disconnect storms with that client. -- Chuck Lever ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks 2019-01-02 19:08 ` Chuck Lever @ 2019-01-02 19:11 ` Trond Myklebust 0 siblings, 0 replies; 76+ messages in thread From: Trond Myklebust @ 2019-01-02 19:11 UTC (permalink / raw) To: chuck.lever; +Cc: linux-nfs On Wed, 2019-01-02 at 14:08 -0500, Chuck Lever wrote: > > On Jan 2, 2019, at 1:57 PM, Trond Myklebust < > > trondmy@hammerspace.com> wrote: > > > > On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote: > > > > On Jan 2, 2019, at 1:45 PM, Trond Myklebust < > > > > trondmy@hammerspace.com> wrote: > > > > > > > > On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote: > > > > > > On Dec 31, 2018, at 2:21 PM, Trond Myklebust < > > > > > > trondmy@hammerspace.com> wrote: > > > > > > > > > > > > On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote: > > > > > > > On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote: > > > > > > > > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust < > > > > > > > > > trondmy@hammerspace.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > The test for rpcauth_xmit_need_reencode() happens when we > > > > > > > call > > > > > > > xprt_request_transmit() to actually put the RPC call on > > > > > > > the > > > > > > > wire. > > > > > > > The > > > > > > > enqueue order should not be able to defeat that test. > > > > > > > > > > > > > > Hmm... Is it perhaps the test for req->rq_bytes_sent that > > > > > > > is > > > > > > > failing > > > > > > > because this is a retransmission after a > > > > > > > disconnect/reconnect > > > > > > > that > > > > > > > didn't trigger a re-encode? > > > > > > > > > > > > Actually, it might be worth a try to move the test for > > > > > > rpcauth_xmit_need_reencode() outside the enclosing test for > > > > > > req- > > > > > > > rq_bytes_sent as that is just a minor optimisation. 
> > > > > > > > > > Perhaps that's the case for TCP, but RPCs sent via xprtrdma > > > > > never > > > > > set > > > > > req->rq_bytes_sent to a non-zero value. The body of the "if" > > > > > statement > > > > > is always executed for those RPCs. > > > > > > > > > > > > > Then the question is what is defeating the call to > > > > rpcauth_xmit_need_reencode() in xprt_request_transmit() and > > > > causing > > > > it > > > > not to trigger in the misordered cases? > > > > > > Here's a sample RPC/RDMA case. > > > > > > My instrumented server reports this: > > > > > > Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped: > > > seq_num=141220 sd->sd_max=141360 > > > > > > > > > ftrace log on the client shows this: > > > > > > kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode: > > > task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 > > > reencode > > > unneeded > > > kworker/u28:12-2191 [004] 194.048534: > > > xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 > > > status=-57 > > > kworker/u28:12-2191 [004] 194.048534: > > > rpc_task_run_action: task:1779@5 flags=ASYNC > > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57 > > > action=call_transmit_status > > > kworker/u28:12-2191 [004] 194.048535: > > > rpc_task_run_action: task:1779@5 flags=ASYNC > > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 > > > action=call_transmit > > > kworker/u28:12-2191 [004] 194.048535: > > > rpc_task_sleep: task:1779@5 flags=ASYNC > > > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0 > > > queue=xprt_sending > > > > > > > > > kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode: > > > task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 > > > reencode > > > unneeded > > > kworker/u28:12-2191 [004] 194.048557: > > > xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 > > > status=0 > > > > > > > > > kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode: > > > task:1902@5 xid=0x14f5f47c rq_seqno=141360 
seq_xmit=141336 > > > reencode > > > unneeded > > > kworker/u28:12-2191 [004] 194.048563: > > > xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360 > > > status=0 > > > > > > > > > Note that first need_reencode: the sequence numbers show that the > > > xmit > > > queue has been significantly re-ordered. The request being > > > transmitted is > > > already very close to the lower end of the GSS sequence number > > > window. > > > > > > The server then re-orders these two slightly because the first > > > one > > > had > > > some Read chunks that need to be pulled over, the second was pure > > > inline > > > and therefore could be processed immediately. That is enough to > > > force > > > the > > > first one outside the GSS sequence number window. > > > > > > I haven't looked closely at the pathology of the TCP case. > > > > Wait a minute... That's not OK. The client can't be expected to > > take > > into account reordering that happens on the server side. > > Conversely, the client can't assume the transport and the server > don't > re-order. This does not appear to be a problem for the v4.19 client: > I don't see disconnect storms with that client. There is absolutely nothing stopping it from happening in 4.19. It's just very unlikely because the stream is strictly ordered on the client side. So the misordering on the server would have to be pretty extreme. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 15/44] SUNRPC: Refactor xprt_transmit() to remove the reply queue code 2018-09-17 13:03 ` [PATCH v3 15/44] SUNRPC: Refactor xprt_transmit() to remove the reply queue code Trond Myklebust 2018-09-17 13:03 ` [PATCH v3 16/44] SUNRPC: Refactor xprt_transmit() to remove wait for reply code Trond Myklebust @ 2018-09-18 21:01 ` Anna Schumaker 2018-09-19 15:48 ` Trond Myklebust 1 sibling, 1 reply; 76+ messages in thread From: Anna Schumaker @ 2018-09-18 21:01 UTC (permalink / raw) To: Trond Myklebust, linux-nfs Hi Trond, I'm seeing this crash while running cthon tests (on any NFS version) after applying this patch: [ 50.780104] general protection fault: 0000 [#1] PREEMPT SMP PTI [ 50.780796] CPU: 0 PID: 384 Comm: kworker/u5:1 Not tainted 4.19.0-rc4-ANNA+ #7455 [ 50.781601] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 50.782232] Workqueue: xprtiod xs_tcp_data_receive_workfn [sunrpc] [ 50.782911] RIP: 0010:xprt_lookup_rqst+0x2c/0x150 [sunrpc] [ 50.783510] Code: 48 8d 97 58 04 00 00 41 54 49 89 fc 55 89 f5 53 48 8b 87 58 04 00 00 48 39 c2 74 26 48 8d 98 48 ff ff ff 3b 70 e0 75 07 eb 3f <39> 68 e0 74 3a 48 8b 83 b8 00 00 00 48 8d 98 48 ff ff ff 48 39 c2 [ 50.785501] RSP: 0018:ffffc90000bebd60 EFLAGS: 00010202 [ 50.786090] RAX: dead000000000100 RBX: dead000000000048 RCX: 0000000000000051 [ 50.786853] RDX: ffff8800b915dc58 RSI: 000000005a1c5631 RDI: ffff8800b915d800 [ 50.787616] RBP: 000000005a1c5631 R08: 0000000000000000 R09: 00646f6974727078 [ 50.788380] R10: 8080808080808080 R11: 00000000000ee5f3 R12: ffff8800b915d800 [ 50.789153] R13: ffff8800b915dc18 R14: ffff8800b915d800 R15: ffffffffa03265b4 [ 50.789930] FS: 0000000000000000(0000) GS:ffff8800bca00000(0000) knlGS:0000000000000000 [ 50.790797] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 50.791416] CR2: 00007f9b670538b0 CR3: 000000000200a001 CR4: 00000000001606f0 [ 50.792182] Call Trace: [ 50.792471] xs_tcp_data_recv+0x3a6/0x780 [sunrpc] [ 50.792993] ? __switch_to_asm+0x34/0x70 [ 50.793426] ? 
xs_tcp_check_fraghdr.part.1+0x40/0x40 [sunrpc] [ 50.794047] tcp_read_sock+0x93/0x1b0 [ 50.794447] ? __switch_to_asm+0x40/0x70 [ 50.794879] xs_tcp_data_receive_workfn+0xb2/0x190 [sunrpc] [ 50.795482] process_one_work+0x1e6/0x3c0 [ 50.795928] worker_thread+0x28/0x3c0 [ 50.796337] ? process_one_work+0x3c0/0x3c0 [ 50.796814] kthread+0x10d/0x130 [ 50.797170] ? kthread_park+0x80/0x80 [ 50.797570] ret_from_fork+0x35/0x40 [ 50.797961] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache cfg80211 rpcrdma rfkill crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev pcbc mousedev aesni_intel psmouse aes_x86_64 evdev crypto_simd cryptd input_leds glue_helper led_class mac_hid pcspkr intel_agp intel_gtt i2c_piix4 nfsd button auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel ip_tables x_tables ata_generic pata_acpi ata_piix serio_raw uhci_hcd atkbd ehci_pci libps2 ehci_hcd libata usbcore usb_common i8042 floppy serio scsi_mod xfs virtio_balloon virtio_net net_failover failover virtio_pci virtio_blk virtio_ring virtio Cheers, Anna On Mon, 2018-09-17 at 09:03 -0400, Trond Myklebust wrote: > Separate out the action of adding a request to the reply queue so that the > backchannel code can simply skip calling it altogether. 
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  include/linux/sunrpc/xprt.h       |   1 +
>  net/sunrpc/backchannel_rqst.c     |   1 -
>  net/sunrpc/clnt.c                 |   5 ++
>  net/sunrpc/xprt.c                 | 126 +++++++++++++++++++-----------
>  net/sunrpc/xprtrdma/backchannel.c |   1 -
>  5 files changed, 88 insertions(+), 46 deletions(-)
> 
> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> index c25d0a5fda69..0250294c904a 100644
> --- a/include/linux/sunrpc/xprt.h
> +++ b/include/linux/sunrpc/xprt.h
> @@ -334,6 +334,7 @@ void		xprt_free_slot(struct rpc_xprt *xprt,
>  				struct rpc_rqst *req);
>  void		xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task);
>  bool		xprt_prepare_transmit(struct rpc_task *task);
> +void		xprt_request_enqueue_receive(struct rpc_task *task);
>  void		xprt_transmit(struct rpc_task *task);
>  void		xprt_end_transmit(struct rpc_task *task);
>  int		xprt_adjust_timeout(struct rpc_rqst *req);
> diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
> index 3c15a99b9700..fa5ba6ed3197 100644
> --- a/net/sunrpc/backchannel_rqst.c
> +++ b/net/sunrpc/backchannel_rqst.c
> @@ -91,7 +91,6 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt, gfp_t gfp_flags)
>  		return NULL;
>  
>  	req->rq_xprt = xprt;
> -	INIT_LIST_HEAD(&req->rq_list);
>  	INIT_LIST_HEAD(&req->rq_bc_list);
>  
>  	/* Preallocate one XDR receive buffer */
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index a858366cd15d..414966273a3f 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1962,6 +1962,11 @@ call_transmit(struct rpc_task *task)
>  			return;
>  		}
>  	}
> +
> +	/* Add task to reply queue before transmission to avoid races */
> +	if (rpc_reply_expected(task))
> +		xprt_request_enqueue_receive(task);
> +
>  	if (!xprt_prepare_transmit(task))
>  		return;
>  	task->tk_action = call_transmit_status;
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 6e3d4b4ee79e..d8f870b5dd46 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -888,6 +888,61 @@ static void xprt_wait_on_pinned_rqst(struct rpc_rqst *req)
>  	wait_var_event(&req->rq_pin, !xprt_is_pinned_rqst(req));
>  }
>  
> +static bool
> +xprt_request_data_received(struct rpc_task *task)
> +{
> +	return !test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) &&
> +		READ_ONCE(task->tk_rqstp->rq_reply_bytes_recvd) != 0;
> +}
> +
> +static bool
> +xprt_request_need_enqueue_receive(struct rpc_task *task, struct rpc_rqst *req)
> +{
> +	return !xprt_request_data_received(task);
> +}
> +
> +/**
> + * xprt_request_enqueue_receive - Add an request to the receive queue
> + * @task: RPC task
> + *
> + */
> +void
> +xprt_request_enqueue_receive(struct rpc_task *task)
> +{
> +	struct rpc_rqst *req = task->tk_rqstp;
> +	struct rpc_xprt *xprt = req->rq_xprt;
> +
> +	if (!xprt_request_need_enqueue_receive(task, req))
> +		return;
> +	spin_lock(&xprt->queue_lock);
> +
> +	/* Update the softirq receive buffer */
> +	memcpy(&req->rq_private_buf, &req->rq_rcv_buf,
> +			sizeof(req->rq_private_buf));
> +
> +	/* Add request to the receive list */
> +	list_add_tail(&req->rq_list, &xprt->recv);
> +	set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
> +	spin_unlock(&xprt->queue_lock);
> +
> +	xprt_reset_majortimeo(req);
> +	/* Turn off autodisconnect */
> +	del_singleshot_timer_sync(&xprt->timer);
> +}
> +
> +/**
> + * xprt_request_dequeue_receive_locked - Remove a request from the receive queue
> + * @task: RPC task
> + *
> + * Caller must hold xprt->queue_lock.
> + */
> +static void
> +xprt_request_dequeue_receive_locked(struct rpc_task *task)
> +{
> +	if (test_and_clear_bit(RPC_TASK_NEED_RECV, &task->tk_runstate))
> +		list_del(&task->tk_rqstp->rq_list);
> +}
> +
>  /**
>   * xprt_update_rtt - Update RPC RTT statistics
>   * @task: RPC request that recently completed
> @@ -927,24 +982,16 @@ void xprt_complete_rqst(struct rpc_task *task, int copied)
>  
>  	xprt->stat.recvs++;
>  
> -	list_del_init(&req->rq_list);
>  	req->rq_private_buf.len = copied;
>  	/* Ensure all writes are done before we update */
>  	/* req->rq_reply_bytes_recvd */
>  	smp_wmb();
>  	req->rq_reply_bytes_recvd = copied;
> -	clear_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
> +	xprt_request_dequeue_receive_locked(task);
>  	rpc_wake_up_queued_task(&xprt->pending, task);
>  }
>  EXPORT_SYMBOL_GPL(xprt_complete_rqst);
>  
> -static bool
> -xprt_request_data_received(struct rpc_task *task)
> -{
> -	return !test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) &&
> -		task->tk_rqstp->rq_reply_bytes_recvd != 0;
> -}
> -
>  static void xprt_timer(struct rpc_task *task)
>  {
>  	struct rpc_rqst *req = task->tk_rqstp;
> @@ -1018,32 +1065,15 @@ void xprt_transmit(struct rpc_task *task)
>  
>  	dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);
>  
> -	if (!req->rq_reply_bytes_recvd) {
> -
> +	if (!req->rq_bytes_sent) {
> +		if (xprt_request_data_received(task))
> +			return;
>  		/* Verify that our message lies in the RPCSEC_GSS window */
> -		if (!req->rq_bytes_sent && rpcauth_xmit_need_reencode(task)) {
> +		if (rpcauth_xmit_need_reencode(task)) {
>  			task->tk_status = -EBADMSG;
>  			return;
>  		}
> -
> -		if (list_empty(&req->rq_list) && rpc_reply_expected(task)) {
> -			/*
> -			 * Add to the list only if we're expecting a reply
> -			 */
> -			/* Update the softirq receive buffer */
> -			memcpy(&req->rq_private_buf, &req->rq_rcv_buf,
> -			       sizeof(req->rq_private_buf));
> -			/* Add request to the receive list */
> -			spin_lock(&xprt->queue_lock);
> -			list_add_tail(&req->rq_list, &xprt->recv);
> -			set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
> -			spin_unlock(&xprt->queue_lock);
> -			xprt_reset_majortimeo(req);
> -			/* Turn off autodisconnect */
> -			del_singleshot_timer_sync(&xprt->timer);
> -		}
> -	} else if (xprt_request_data_received(task) && !req->rq_bytes_sent)
> -		return;
> +	}
>  
>  	connect_cookie = xprt->connect_cookie;
>  	status = xprt->ops->send_request(task);
> @@ -1285,7 +1315,6 @@ xprt_request_init(struct rpc_task *task)
>  	struct rpc_xprt *xprt = task->tk_xprt;
>  	struct rpc_rqst	*req = task->tk_rqstp;
>  
> -	INIT_LIST_HEAD(&req->rq_list);
>  	req->rq_timeout = task->tk_client->cl_timeout->to_initval;
>  	req->rq_task	= task;
>  	req->rq_xprt    = xprt;
> @@ -1355,6 +1384,26 @@ void xprt_retry_reserve(struct rpc_task *task)
>  	xprt_do_reserve(xprt, task);
>  }
>  
> +static void
> +xprt_request_dequeue_all(struct rpc_task *task, struct rpc_rqst *req)
> +{
> +	struct rpc_xprt *xprt = req->rq_xprt;
> +
> +	if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate) ||
> +	    xprt_is_pinned_rqst(req)) {
> +		spin_lock(&xprt->queue_lock);
> +		xprt_request_dequeue_receive_locked(task);
> +		while (xprt_is_pinned_rqst(req)) {
> +			set_bit(RPC_TASK_MSG_PIN_WAIT, &task->tk_runstate);
> +			spin_unlock(&xprt->queue_lock);
> +			xprt_wait_on_pinned_rqst(req);
> +			spin_lock(&xprt->queue_lock);
> +			clear_bit(RPC_TASK_MSG_PIN_WAIT, &task->tk_runstate);
> +		}
> +		spin_unlock(&xprt->queue_lock);
> +	}
> +}
> +
>  /**
>   * xprt_release - release an RPC request slot
>   * @task: task which is finished with the slot
> @@ -1379,18 +1428,7 @@ void xprt_release(struct rpc_task *task)
>  		task->tk_ops->rpc_count_stats(task, task->tk_calldata);
>  	else if (task->tk_client)
>  		rpc_count_iostats(task, task->tk_client->cl_metrics);
> -	spin_lock(&xprt->queue_lock);
> -	if (!list_empty(&req->rq_list)) {
> -		list_del_init(&req->rq_list);
> -		if (xprt_is_pinned_rqst(req)) {
> -			set_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate);
> -			spin_unlock(&xprt->queue_lock);
> -			xprt_wait_on_pinned_rqst(req);
> -			spin_lock(&xprt->queue_lock);
> -			clear_bit(RPC_TASK_MSG_PIN_WAIT, &req->rq_task->tk_runstate);
> -		}
> -	}
> -	spin_unlock(&xprt->queue_lock);
> +	xprt_request_dequeue_all(task, req);
>  	spin_lock_bh(&xprt->transport_lock);
>  	xprt->ops->release_xprt(xprt, task);
>  	if (xprt->ops->release_request)
> diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
> index 90adeff4c06b..ed58761e6b23 100644
> --- a/net/sunrpc/xprtrdma/backchannel.c
> +++ b/net/sunrpc/xprtrdma/backchannel.c
> @@ -51,7 +51,6 @@ static int rpcrdma_bc_setup_reqs(struct rpcrdma_xprt *r_xprt,
>  	rqst = &req->rl_slot;
>  
>  	rqst->rq_xprt = xprt;
> -	INIT_LIST_HEAD(&rqst->rq_list);
>  	INIT_LIST_HEAD(&rqst->rq_bc_list);
>  	__set_bit(RPC_BC_PA_IN_USE, &rqst->rq_bc_pa_state);
>  	spin_lock_bh(&xprt->bc_pa_lock);

^ permalink raw reply	[flat|nested] 76+ messages in thread
* Re: [PATCH v3 15/44] SUNRPC: Refactor xprt_transmit() to remove the reply queue code
  2018-09-18 21:01 ` [PATCH v3 15/44] SUNRPC: Refactor xprt_transmit() to remove the reply queue code Anna Schumaker
@ 2018-09-19 15:48 ` Trond Myklebust
  2018-09-19 17:30 ` Anna Schumaker
  0 siblings, 1 reply; 76+ messages in thread
From: Trond Myklebust @ 2018-09-19 15:48 UTC (permalink / raw)
To: linux-nfs, schumaker.anna

On Tue, 2018-09-18 at 17:01 -0400, Anna Schumaker wrote:
> Hi Trond,
> 
> I'm seeing this crash while running cthon tests (on any NFS version)
> after
> applying this patch:
> 
> [   50.780104] general protection fault: 0000 [#1] PREEMPT SMP PTI
> [   50.780796] CPU: 0 PID: 384 Comm: kworker/u5:1 Not tainted 4.19.0-
> rc4-ANNA+
> #7455
> [   50.781601] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [   50.782232] Workqueue: xprtiod xs_tcp_data_receive_workfn [sunrpc]
> [   50.782911] RIP: 0010:xprt_lookup_rqst+0x2c/0x150 [sunrpc]
> [   50.783510] Code: 48 8d 97 58 04 00 00 41 54 49 89 fc 55 89 f5 53
> 48 8b 87 58
> 04 00 00 48 39 c2 74 26 48 8d 98 48 ff ff ff 3b 70 e0 75 07 eb 3f
> <39> 68 e0 74
> 3a 48 8b 83 b8 00 00 00 48 8d 98 48 ff ff ff 48 39 c2
> [   50.785501] RSP: 0018:ffffc90000bebd60 EFLAGS: 00010202
> [   50.786090] RAX: dead000000000100 RBX: dead000000000048 RCX:
> 0000000000000051
> [   50.786853] RDX: ffff8800b915dc58 RSI: 000000005a1c5631 RDI:
> ffff8800b915d800
> [   50.787616] RBP: 000000005a1c5631 R08: 0000000000000000 R09:
> 00646f6974727078
> [   50.788380] R10: 8080808080808080 R11: 00000000000ee5f3 R12:
> ffff8800b915d800
> [   50.789153] R13: ffff8800b915dc18 R14: ffff8800b915d800 R15:
> ffffffffa03265b4
> [   50.789930] FS:  0000000000000000(0000) GS:ffff8800bca00000(0000)
> knlGS:0000000000000000
> [   50.790797] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   50.791416] CR2: 00007f9b670538b0 CR3: 000000000200a001 CR4:
> 00000000001606f0
> [   50.792182] Call Trace:
> [   50.792471]  xs_tcp_data_recv+0x3a6/0x780 [sunrpc]
> [   50.792993]  ? __switch_to_asm+0x34/0x70
> [   50.793426]  ? xs_tcp_check_fraghdr.part.1+0x40/0x40 [sunrpc]
> [   50.794047]  tcp_read_sock+0x93/0x1b0
> [   50.794447]  ? __switch_to_asm+0x40/0x70
> [   50.794879]  xs_tcp_data_receive_workfn+0xb2/0x190 [sunrpc]
> [   50.795482]  process_one_work+0x1e6/0x3c0
> [   50.795928]  worker_thread+0x28/0x3c0
> [   50.796337]  ? process_one_work+0x3c0/0x3c0
> [   50.796814]  kthread+0x10d/0x130
> [   50.797170]  ? kthread_park+0x80/0x80
> [   50.797570]  ret_from_fork+0x35/0x40
> [   50.797961] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs
> fscache
> cfg80211 rpcrdma rfkill crct10dif_pclmul crc32_pclmul crc32c_intel
> ghash_clmulni_intel joydev pcbc mousedev aesni_intel psmouse
> aes_x86_64 evdev
> crypto_simd cryptd input_leds glue_helper led_class mac_hid pcspkr
> intel_agp
> intel_gtt i2c_piix4 nfsd button auth_rpcgss nfs_acl lockd grace
> sunrpc
> sch_fq_codel ip_tables x_tables ata_generic pata_acpi ata_piix
> serio_raw
> uhci_hcd atkbd ehci_pci libps2 ehci_hcd libata usbcore usb_common
> i8042 floppy
> serio scsi_mod xfs virtio_balloon virtio_net net_failover failover
> virtio_pci
> virtio_blk virtio_ring virtio
> 

Thanks for finding that! It looks like the definition of
xprt_request_need_enqueue_receive() was incorrect so I've pushed out a
fixed version to the 'testing' branch.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 76+ messages in thread
* Re: [PATCH v3 15/44] SUNRPC: Refactor xprt_transmit() to remove the reply queue code
  2018-09-19 15:48 ` Trond Myklebust
@ 2018-09-19 17:30 ` Anna Schumaker
  0 siblings, 0 replies; 76+ messages in thread
From: Anna Schumaker @ 2018-09-19 17:30 UTC (permalink / raw)
To: Trond Myklebust, linux-nfs

On Wed, 2018-09-19 at 15:48 +0000, Trond Myklebust wrote:
> On Tue, 2018-09-18 at 17:01 -0400, Anna Schumaker wrote:
> > Hi Trond,
> > 
> > I'm seeing this crash while running cthon tests (on any NFS version)
> > after
> > applying this patch:
> > 
> > [   50.780104] general protection fault: 0000 [#1] PREEMPT SMP PTI
> > [   50.780796] CPU: 0 PID: 384 Comm: kworker/u5:1 Not tainted 4.19.0-
> > rc4-ANNA+
> > #7455
> > [   50.781601] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [   50.782232] Workqueue: xprtiod xs_tcp_data_receive_workfn [sunrpc]
> > [   50.782911] RIP: 0010:xprt_lookup_rqst+0x2c/0x150 [sunrpc]
> > [   50.783510] Code: 48 8d 97 58 04 00 00 41 54 49 89 fc 55 89 f5 53
> > 48 8b 87 58
> > 04 00 00 48 39 c2 74 26 48 8d 98 48 ff ff ff 3b 70 e0 75 07 eb 3f
> > <39> 68 e0 74
> > 3a 48 8b 83 b8 00 00 00 48 8d 98 48 ff ff ff 48 39 c2
> > [   50.785501] RSP: 0018:ffffc90000bebd60 EFLAGS: 00010202
> > [   50.786090] RAX: dead000000000100 RBX: dead000000000048 RCX:
> > 0000000000000051
> > [   50.786853] RDX: ffff8800b915dc58 RSI: 000000005a1c5631 RDI:
> > ffff8800b915d800
> > [   50.787616] RBP: 000000005a1c5631 R08: 0000000000000000 R09:
> > 00646f6974727078
> > [   50.788380] R10: 8080808080808080 R11: 00000000000ee5f3 R12:
> > ffff8800b915d800
> > [   50.789153] R13: ffff8800b915dc18 R14: ffff8800b915d800 R15:
> > ffffffffa03265b4
> > [   50.789930] FS:  0000000000000000(0000) GS:ffff8800bca00000(0000)
> > knlGS:0000000000000000
> > [   50.790797] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   50.791416] CR2: 00007f9b670538b0 CR3: 000000000200a001 CR4:
> > 00000000001606f0
> > [   50.792182] Call Trace:
> > [   50.792471]  xs_tcp_data_recv+0x3a6/0x780 [sunrpc]
> > [   50.792993]  ? __switch_to_asm+0x34/0x70
> > [   50.793426]  ? xs_tcp_check_fraghdr.part.1+0x40/0x40 [sunrpc]
> > [   50.794047]  tcp_read_sock+0x93/0x1b0
> > [   50.794447]  ? __switch_to_asm+0x40/0x70
> > [   50.794879]  xs_tcp_data_receive_workfn+0xb2/0x190 [sunrpc]
> > [   50.795482]  process_one_work+0x1e6/0x3c0
> > [   50.795928]  worker_thread+0x28/0x3c0
> > [   50.796337]  ? process_one_work+0x3c0/0x3c0
> > [   50.796814]  kthread+0x10d/0x130
> > [   50.797170]  ? kthread_park+0x80/0x80
> > [   50.797570]  ret_from_fork+0x35/0x40
> > [   50.797961] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs
> > fscache
> > cfg80211 rpcrdma rfkill crct10dif_pclmul crc32_pclmul crc32c_intel
> > ghash_clmulni_intel joydev pcbc mousedev aesni_intel psmouse
> > aes_x86_64 evdev
> > crypto_simd cryptd input_leds glue_helper led_class mac_hid pcspkr
> > intel_agp
> > intel_gtt i2c_piix4 nfsd button auth_rpcgss nfs_acl lockd grace
> > sunrpc
> > sch_fq_codel ip_tables x_tables ata_generic pata_acpi ata_piix
> > serio_raw
> > uhci_hcd atkbd ehci_pci libps2 ehci_hcd libata usbcore usb_common
> > i8042 floppy
> > serio scsi_mod xfs virtio_balloon virtio_net net_failover failover
> > virtio_pci
> > virtio_blk virtio_ring virtio
> > 
> 
> Thanks for finding that! It looks like the definition of
> xprt_request_need_enqueue_receive() was incorrect so I've pushed out a
> fixed version to the 'testing' branch.

The new version works for me, thanks!

Anna

> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread
end of thread, other threads:[~2019-01-02 19:33 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-17 13:02 [PATCH v3 00/44] Convert RPC client transmission to a queued model Trond Myklebust
2018-09-17 13:02 ` [PATCH v3 01/44] SUNRPC: Clean up initialisation of the struct rpc_rqst Trond Myklebust
2018-09-17 13:02 ` [PATCH v3 02/44] SUNRPC: If there is no reply expected, bail early from call_decode Trond Myklebust
2018-09-17 13:02 ` [PATCH v3 03/44] SUNRPC: The transmitted message must lie in the RPCSEC window of validity Trond Myklebust
2018-09-17 13:02 ` [PATCH v3 04/44] SUNRPC: Simplify identification of when the message send/receive is complete Trond Myklebust
2018-09-17 13:02 ` [PATCH v3 05/44] SUNRPC: Avoid holding locks across the XDR encoding of the RPC message Trond Myklebust
2018-09-17 13:02 ` [PATCH v3 06/44] SUNRPC: Rename TCP receive-specific state variables Trond Myklebust
2018-09-17 13:02 ` [PATCH v3 07/44] SUNRPC: Move reset of TCP state variables into the reconnect code Trond Myklebust
2018-09-17 13:02 ` [PATCH v3 08/44] SUNRPC: Add socket transmit queue offset tracking Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 09/44] SUNRPC: Simplify dealing with aborted partially transmitted messages Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 10/44] SUNRPC: Refactor the transport request pinning Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 11/44] SUNRPC: Add a helper to wake up a sleeping rpc_task and set its status Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 12/44] SUNRPC: Test whether the task is queued before grabbing the queue spinlocks Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 13/44] SUNRPC: Don't wake queued RPC calls multiple times in xprt_transmit Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 14/44] SUNRPC: Rename xprt->recv_lock to xprt->queue_lock Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 15/44] SUNRPC: Refactor xprt_transmit() to remove the reply queue code Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 16/44] SUNRPC: Refactor xprt_transmit() to remove wait for reply code Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 17/44] SUNRPC: Minor cleanup for call_transmit() Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 18/44] SUNRPC: Distinguish between the slot allocation list and receive queue Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 19/44] SUNRPC: Add a transmission queue for RPC requests Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 20/44] SUNRPC: Refactor RPC call encoding Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 21/44] SUNRPC: Fix up the back channel transmit Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 22/44] SUNRPC: Treat the task and request as separate in the xprt_ops->send_request() Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 23/44] SUNRPC: Don't reset the request 'bytes_sent' counter when releasing XPRT_LOCK Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 24/44] SUNRPC: Simplify xprt_prepare_transmit() Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 25/44] SUNRPC: Move RPC retransmission stat counter to xprt_transmit() Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 27/44] SUNRPC: Support for congestion control when queuing is enabled Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 28/44] SUNRPC: Enqueue swapper tagged RPCs at the head of the transmit queue Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 29/44] SUNRPC: Allow calls to xprt_transmit() to drain the entire " Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 30/44] SUNRPC: Allow soft RPC calls to time out when waiting for the XPRT_LOCK Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 31/44] SUNRPC: Turn off throttling of RPC slots for TCP sockets Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 32/44] SUNRPC: Clean up transport write space handling Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 33/44] SUNRPC: Cleanup: remove the unused 'task' argument from the request_send() Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 34/44] SUNRPC: Don't take transport->lock unnecessarily when taking XPRT_LOCK Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 35/44] SUNRPC: Convert xprt receive queue to use an rbtree Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 36/44] SUNRPC: Fix priority queue fairness Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 37/44] SUNRPC: Convert the xprt->sending queue back to an ordinary wait queue Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 38/44] SUNRPC: Add a label for RPC calls that require allocation on receive Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 39/44] SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter() Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 41/44] SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive() Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 42/44] SUNRPC: Allow AF_LOCAL sockets to use the generic stream receive Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 43/44] SUNRPC: Clean up xs_udp_data_receive() Trond Myklebust
2018-09-17 13:03 ` [PATCH v3 44/44] SUNRPC: Unexport xdr_partial_copy_from_skb() Trond Myklebust
2018-09-17 20:44 ` [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators Trond Myklebust
2018-11-09 11:19 ` Catalin Marinas
2018-11-29 19:28 ` Cristian Marussi
2018-11-29 19:56 ` Trond Myklebust
2018-11-30 16:19 ` Cristian Marussi
2018-11-30 19:31 ` Trond Myklebust
2018-12-02 16:44 ` Trond Myklebust
2018-12-03 11:45 ` Catalin Marinas
2018-12-03 11:53 ` Cristian Marussi
2018-12-03 18:54 ` Cristian Marussi
2018-12-27 19:21 ` [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks Chuck Lever
2018-12-27 22:14 ` Trond Myklebust
2018-12-27 22:34 ` Chuck Lever
2018-12-31 18:09 ` Trond Myklebust
2018-12-31 18:44 ` Chuck Lever
2018-12-31 18:59 ` Trond Myklebust
2018-12-31 19:09 ` Chuck Lever
2018-12-31 19:18 ` Trond Myklebust
2018-12-31 19:21 ` Trond Myklebust
2019-01-02 18:17 ` Chuck Lever
2019-01-02 18:45 ` Trond Myklebust
2019-01-02 18:51 ` Chuck Lever
2019-01-02 18:57 ` Trond Myklebust
2019-01-02 19:06 ` Trond Myklebust
2019-01-02 19:24 ` Trond Myklebust
2019-01-02 19:33 ` Chuck Lever
2019-01-02 19:08 ` Chuck Lever
2019-01-02 19:11 ` Trond Myklebust
2018-09-18 21:01 ` [PATCH v3 15/44] SUNRPC: Refactor xprt_transmit() to remove the reply queue code Anna Schumaker
2018-09-19 15:48 ` Trond Myklebust
2018-09-19 17:30 ` Anna Schumaker