From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from aserp1040.oracle.com ([141.146.126.69]:46984 "EHLO
	aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752463AbbKXBom convert rfc822-to-8bit (ORCPT );
	Mon, 23 Nov 2015 20:44:42 -0500
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\))
Subject: Re: [PATCH v1 1/9] xprtrdma: Add a safety margin for receive buffers
From: Chuck Lever
In-Reply-To: <5653BBCB.8010107@talpey.com>
Date: Mon, 23 Nov 2015 20:44:38 -0500
Cc: linux-rdma@vger.kernel.org, Linux NFS Mailing List
Message-Id: 
References: <20151123220627.32702.62667.stgit@manet.1015granger.net>
	<20151123221357.32702.59922.stgit@manet.1015granger.net>
	<5653B586.705@talpey.com> <5653BBCB.8010107@talpey.com>
To: Tom Talpey
Sender: linux-nfs-owner@vger.kernel.org
List-ID: 

> On Nov 23, 2015, at 8:22 PM, Tom Talpey wrote:
>
> On 11/23/2015 8:16 PM, Chuck Lever wrote:
>>
>>> On Nov 23, 2015, at 7:55 PM, Tom Talpey wrote:
>>>
>>> On 11/23/2015 5:13 PM, Chuck Lever wrote:
>>>> Rarely, senders post a Send that is larger than the client's inline
>>>> threshold. That can be due to a bug, or the client and server may
>>>> not have communicated about their inline limits. RPC-over-RDMA
>>>> currently doesn't specify any particular limit on inline size, so
>>>> peers have to guess what it is.
>>>>
>>>> It is fatal to the connection if the size of a Send is larger than
>>>> the client's receive buffer. The sender is likely to retry with the
>>>> same message size, so the workload is stuck at that point.
>>>>
>>>> Follow Postel's robustness principle: Be conservative in what you
>>>> do, be liberal in what you accept from others. Increase the size of
>>>> client receive buffers by a safety margin, and add a warning when
>>>> the inline threshold is exceeded during receive.
>>>
>>> Safety is good, but how do you know the chosen value is enough?
>>> Isn't it better to fail the badly-composed request and be done
>>> with it? Even if the stupid sender loops, which it will do
>>> anyway.
>>
>> It’s good enough to compensate for the most common sender bug,
>> which is that the sender did not account for the 28 bytes of
>> the RPC-over-RDMA header when it built the send buffer. The
>> additional 100-byte margin is gravy.
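
To spell out where those 28 bytes come from, here is a
back-of-the-envelope sketch. This is ordinary userspace C, not
kernel code; the total is the same RPCRDMA_HDRLEN_MIN floor that
rpcrdma_reply_handler() tests in the patch below:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
            /* Four fixed 4-byte words: XID, version, credits,
             * procedure. */
            uint32_t fixed_words = 4 * sizeof(uint32_t);

            /* Three 4-byte "not present" discriminators, one per
             * chunk list: read list, write list, reply chunk. */
            uint32_t empty_chunk_lists = 3 * sizeof(uint32_t);

            /* A sender that sizes its Send by the RPC message
             * alone comes up exactly this much short. */
            assert(fixed_words + empty_chunk_lists == 28);
            return 0;
    }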
> I think it's good to have sympathy for and resilience to differing
> designs on the other end of the wire, but I fail to have it for
> stupid bugs. Unless this can take down the receiver, fail it fast.
>
> MHO.

See above: the client can’t tell why it failed. Again, the Send on
the server side fails with LOCAL_LEN_ERR and the connection is
terminated.

The client sees only the connection loss. There’s no distinction
between this and a cable pull or a server crash, where you do want
the client to retransmit.

I agree it’s a dumb server bug. But the idea is to preserve the
connection, since NFS retransmits are a hazard.

Just floating this idea here; this is v1. This one can be dropped.

>> The loop occurs because the server gets a completion error.
>> The client just sees a connection loss. There’s no way for it
>> to know it should fail the RPC, so it keeps trying.
>>
>> Perhaps the server could remember that the reply failed, and
>> when the client retransmits, it can simply return that XID
>> with an RDMA_ERROR.
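
To make that floated idea a bit more concrete, here is a hand-wavy
userspace model. Every name in it is invented for illustration;
none of this is existing svcrdma code:

    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define FAILED_XID_SLOTS 64

    /* Direct-mapped cache of XIDs whose reply Send failed with
     * LOCAL_LEN_ERR; a collision just overwrites an older hint. */
    static uint32_t failed_xids[FAILED_XID_SLOTS];

    static void remember_failed_reply(uint32_t xid)
    {
            failed_xids[xid % FAILED_XID_SLOTS] = xid;
    }

    /* On an incoming call: should the server answer with
     * RDMA_ERROR instead of rebuilding the same oversized reply? */
    static bool should_send_rdma_error(uint32_t xid)
    {
            return failed_xids[xid % FAILED_XID_SLOTS] == xid;
    }

    int main(void)
    {
            uint32_t xid = 0x1234abcd;

            /* The reply Send for this XID completed in error. */
            remember_failed_reply(xid);

            /* The client reconnects and retransmits the same XID. */
            if (should_send_rdma_error(xid))
                    printf("XID 0x%08" PRIx32 ": answer with RDMA_ERROR\n",
                           xid);
            return 0;
    }

The point is that the client would then hold a definitive error for
that XID and could fail the RPC, rather than seeing an ambiguous
connection loss it can only answer with another retransmit.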
>>
>>>> Note the Linux server's receive buffers are already page-sized.
>>>>
>>>> Signed-off-by: Chuck Lever
>>>> ---
>>>>  net/sunrpc/xprtrdma/rpc_rdma.c  | 7 +++++++
>>>>  net/sunrpc/xprtrdma/verbs.c     | 6 +++++-
>>>>  net/sunrpc/xprtrdma/xprt_rdma.h | 5 +++++
>>>>  3 files changed, 17 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> index c10d969..a169252 100644
>>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> @@ -776,6 +776,7 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
>>>>  	int rdmalen, status;
>>>>  	unsigned long cwnd;
>>>>  	u32 credits;
>>>> +	RPC_IFDEBUG(struct rpcrdma_create_data_internal *cdata);
>>>>
>>>>  	dprintk("RPC: %s: incoming rep %p\n", __func__, rep);
>>>>
>>>> @@ -783,6 +784,12 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
>>>>  		goto out_badstatus;
>>>>  	if (rep->rr_len < RPCRDMA_HDRLEN_MIN)
>>>>  		goto out_shortreply;
>>>> +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
>>>> +	cdata = &r_xprt->rx_data;
>>>> +	if (rep->rr_len > cdata->inline_rsize)
>>>> +		pr_warn("RPC: %u byte reply exceeds inline threshold\n",
>>>> +			rep->rr_len);
>>>> +#endif
>>>>
>>>>  	headerp = rdmab_to_msg(rep->rr_rdmabuf);
>>>>  	if (headerp->rm_vers != rpcrdma_version)
>>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>>> index eadd1655..e3f12e2 100644
>>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>>> @@ -924,7 +924,11 @@ rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
>>>>  	if (rep == NULL)
>>>>  		goto out;
>>>>
>>>> -	rep->rr_rdmabuf = rpcrdma_alloc_regbuf(ia, cdata->inline_rsize,
>>>> +	/* The actual size of our receive buffers is increased slightly
>>>> +	 * to prevent small receive overruns from killing our connection.
>>>> +	 */
>>>> +	rep->rr_rdmabuf = rpcrdma_alloc_regbuf(ia, cdata->inline_rsize +
>>>> +					       RPCRDMA_RECV_MARGIN,
>>>>  					       GFP_KERNEL);
>>>>  	if (IS_ERR(rep->rr_rdmabuf)) {
>>>>  		rc = PTR_ERR(rep->rr_rdmabuf);
>>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> index ac7f8d4..1b72ab1 100644
>>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> @@ -337,6 +337,11 @@ struct rpcrdma_create_data_internal {
>>>>  #define RPCRDMA_INLINE_PAD_VALUE(rq)\
>>>>  	rpcx_to_rdmad(rq->rq_xprt).padding
>>>>
>>>> +/* To help prevent spurious connection shutdown, allow senders to
>>>> + * overrun our receive inline threshold by a small bit.
>>>> + */
>>>> +#define RPCRDMA_RECV_MARGIN	(128)
>>>> +
>>>>  /*
>>>>   * Statistics for RPCRDMA
>>>>   */

>> --
>> Chuck Lever

--
Chuck Lever