All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tom Talpey <tom-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
To: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Linux NFS Mailing List
	<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH v1 1/9] xprtrdma: Add a safety margin for receive buffers
Date: Mon, 23 Nov 2015 20:22:19 -0500	[thread overview]
Message-ID: <5653BBCB.8010107@talpey.com> (raw)
In-Reply-To: <D946BAC3-26D5-4801-BD50-9F026EEF6551-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

On 11/23/2015 8:16 PM, Chuck Lever wrote:
>
>> On Nov 23, 2015, at 7:55 PM, Tom Talpey <tom-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org> wrote:
>>
>> On 11/23/2015 5:13 PM, Chuck Lever wrote:
>>> Rarely, senders post a Send that is larger than the client's inline
>>> threshold. That can be due to a bug, or the client and server may
>>> not have communicated about their inline limits. RPC-over-RDMA
>>> currently doesn't specify any particular limit on inline size, so
>>> peers have to guess what it is.
>>>
>>> It is fatal to the connection if the size of a Send is larger than
>>> the client's receive buffer. The sender is likely to retry with the
>>> same message size, so the workload is stuck at that point.
>>>
>>> Follow Postel's robustness principal: Be conservative in what you
>>> do, be liberal in what you accept from others. Increase the size of
>>> client receive buffers by a safety margin, and add a warning when
>>> the inline threshold is exceeded during receive.
>>
>> Safety is good, but how do know the chosen value is enough?
>> Isn't it better to fail the badly-composed request and be done
>> with it? Even if the stupid sender loops, which it will do
>> anyway.
>
> It’s good enough to compensate for the most common sender bug,
> which is that the sender did not account for the 28 bytes of
> the RPC-over-RDMA header when it built the send buffer. The
> additional 100 byte margin is gravy.

I think it's good to have sympathy and resilience to differing
designs on the other end of the wire, but I fail to have it for
stupid bugs. Unless this can take down the receiver, fail it fast.

MHO.

>
> The loop occurs because the server gets a completion error.
> The client just sees a connection loss. There’s no way for it
> to know it should fail the RPC, so it keeps trying.
>
> Perhaps the server could remember that the reply failed, and
> when the client retransmits, it can simply return that XID
> with an RDMA_ERROR.
>
>
>>> Note the Linux server's receive buffers are already page-sized.
>>>
>>> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>> ---
>>>   net/sunrpc/xprtrdma/rpc_rdma.c  |    7 +++++++
>>>   net/sunrpc/xprtrdma/verbs.c     |    6 +++++-
>>>   net/sunrpc/xprtrdma/xprt_rdma.h |    5 +++++
>>>   3 files changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>>> index c10d969..a169252 100644
>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>> @@ -776,6 +776,7 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
>>>   	int rdmalen, status;
>>>   	unsigned long cwnd;
>>>   	u32 credits;
>>> +	RPC_IFDEBUG(struct rpcrdma_create_data_internal *cdata);
>>>
>>>   	dprintk("RPC:       %s: incoming rep %p\n", __func__, rep);
>>>
>>> @@ -783,6 +784,12 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
>>>   		goto out_badstatus;
>>>   	if (rep->rr_len < RPCRDMA_HDRLEN_MIN)
>>>   		goto out_shortreply;
>>> +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
>>> +	cdata = &r_xprt->rx_data;
>>> +	if (rep->rr_len > cdata->inline_rsize)
>>> +		pr_warn("RPC: %u byte reply exceeds inline threshold\n",
>>> +			rep->rr_len);
>>> +#endif
>>>
>>>   	headerp = rdmab_to_msg(rep->rr_rdmabuf);
>>>   	if (headerp->rm_vers != rpcrdma_version)
>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>> index eadd1655..e3f12e2 100644
>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>> @@ -924,7 +924,11 @@ rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
>>>   	if (rep == NULL)
>>>   		goto out;
>>>
>>> -	rep->rr_rdmabuf = rpcrdma_alloc_regbuf(ia, cdata->inline_rsize,
>>> +	/* The actual size of our receive buffers is increased slightly
>>> +	 * to prevent small receive overruns from killing our connection.
>>> +	 */
>>> +	rep->rr_rdmabuf = rpcrdma_alloc_regbuf(ia, cdata->inline_rsize +
>>> +					       RPCRDMA_RECV_MARGIN,
>>>   					       GFP_KERNEL);
>>>   	if (IS_ERR(rep->rr_rdmabuf)) {
>>>   		rc = PTR_ERR(rep->rr_rdmabuf);
>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> index ac7f8d4..1b72ab1 100644
>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> @@ -337,6 +337,11 @@ struct rpcrdma_create_data_internal {
>>>   #define RPCRDMA_INLINE_PAD_VALUE(rq)\
>>>   	rpcx_to_rdmad(rq->rq_xprt).padding
>>>
>>> +/* To help prevent spurious connection shutdown, allow senders to
>>> + * overrun our receive inline threshold by a small bit.
>>> + */
>>> +#define RPCRDMA_RECV_MARGIN	(128)
>>> +
>>>   /*
>>>    * Statistics for RPCRDMA
>>>    */
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>
> --
> Chuck Lever
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

WARNING: multiple messages have this Message-ID (diff)
From: Tom Talpey <tom@talpey.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-rdma@vger.kernel.org,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v1 1/9] xprtrdma: Add a safety margin for receive buffers
Date: Mon, 23 Nov 2015 20:22:19 -0500	[thread overview]
Message-ID: <5653BBCB.8010107@talpey.com> (raw)
In-Reply-To: <D946BAC3-26D5-4801-BD50-9F026EEF6551@oracle.com>

On 11/23/2015 8:16 PM, Chuck Lever wrote:
>
>> On Nov 23, 2015, at 7:55 PM, Tom Talpey <tom@talpey.com> wrote:
>>
>> On 11/23/2015 5:13 PM, Chuck Lever wrote:
>>> Rarely, senders post a Send that is larger than the client's inline
>>> threshold. That can be due to a bug, or the client and server may
>>> not have communicated about their inline limits. RPC-over-RDMA
>>> currently doesn't specify any particular limit on inline size, so
>>> peers have to guess what it is.
>>>
>>> It is fatal to the connection if the size of a Send is larger than
>>> the client's receive buffer. The sender is likely to retry with the
>>> same message size, so the workload is stuck at that point.
>>>
>>> Follow Postel's robustness principal: Be conservative in what you
>>> do, be liberal in what you accept from others. Increase the size of
>>> client receive buffers by a safety margin, and add a warning when
>>> the inline threshold is exceeded during receive.
>>
>> Safety is good, but how do know the chosen value is enough?
>> Isn't it better to fail the badly-composed request and be done
>> with it? Even if the stupid sender loops, which it will do
>> anyway.
>
> It’s good enough to compensate for the most common sender bug,
> which is that the sender did not account for the 28 bytes of
> the RPC-over-RDMA header when it built the send buffer. The
> additional 100 byte margin is gravy.

I think it's good to have sympathy and resilience to differing
designs on the other end of the wire, but I fail to have it for
stupid bugs. Unless this can take down the receiver, fail it fast.

MHO.

>
> The loop occurs because the server gets a completion error.
> The client just sees a connection loss. There’s no way for it
> to know it should fail the RPC, so it keeps trying.
>
> Perhaps the server could remember that the reply failed, and
> when the client retransmits, it can simply return that XID
> with an RDMA_ERROR.
>
>
>>> Note the Linux server's receive buffers are already page-sized.
>>>
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>   net/sunrpc/xprtrdma/rpc_rdma.c  |    7 +++++++
>>>   net/sunrpc/xprtrdma/verbs.c     |    6 +++++-
>>>   net/sunrpc/xprtrdma/xprt_rdma.h |    5 +++++
>>>   3 files changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>>> index c10d969..a169252 100644
>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>> @@ -776,6 +776,7 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
>>>   	int rdmalen, status;
>>>   	unsigned long cwnd;
>>>   	u32 credits;
>>> +	RPC_IFDEBUG(struct rpcrdma_create_data_internal *cdata);
>>>
>>>   	dprintk("RPC:       %s: incoming rep %p\n", __func__, rep);
>>>
>>> @@ -783,6 +784,12 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
>>>   		goto out_badstatus;
>>>   	if (rep->rr_len < RPCRDMA_HDRLEN_MIN)
>>>   		goto out_shortreply;
>>> +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
>>> +	cdata = &r_xprt->rx_data;
>>> +	if (rep->rr_len > cdata->inline_rsize)
>>> +		pr_warn("RPC: %u byte reply exceeds inline threshold\n",
>>> +			rep->rr_len);
>>> +#endif
>>>
>>>   	headerp = rdmab_to_msg(rep->rr_rdmabuf);
>>>   	if (headerp->rm_vers != rpcrdma_version)
>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>> index eadd1655..e3f12e2 100644
>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>> @@ -924,7 +924,11 @@ rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
>>>   	if (rep == NULL)
>>>   		goto out;
>>>
>>> -	rep->rr_rdmabuf = rpcrdma_alloc_regbuf(ia, cdata->inline_rsize,
>>> +	/* The actual size of our receive buffers is increased slightly
>>> +	 * to prevent small receive overruns from killing our connection.
>>> +	 */
>>> +	rep->rr_rdmabuf = rpcrdma_alloc_regbuf(ia, cdata->inline_rsize +
>>> +					       RPCRDMA_RECV_MARGIN,
>>>   					       GFP_KERNEL);
>>>   	if (IS_ERR(rep->rr_rdmabuf)) {
>>>   		rc = PTR_ERR(rep->rr_rdmabuf);
>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> index ac7f8d4..1b72ab1 100644
>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> @@ -337,6 +337,11 @@ struct rpcrdma_create_data_internal {
>>>   #define RPCRDMA_INLINE_PAD_VALUE(rq)\
>>>   	rpcx_to_rdmad(rq->rq_xprt).padding
>>>
>>> +/* To help prevent spurious connection shutdown, allow senders to
>>> + * overrun our receive inline threshold by a small bit.
>>> + */
>>> +#define RPCRDMA_RECV_MARGIN	(128)
>>> +
>>>   /*
>>>    * Statistics for RPCRDMA
>>>    */
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>
> --
> Chuck Lever
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

  parent reply	other threads:[~2015-11-24  1:22 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-23 22:13 [PATCH v1 0/9] NFS/RDMA client patches for 4.5 Chuck Lever
2015-11-23 22:13 ` Chuck Lever
     [not found] ` <20151123220627.32702.62667.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2015-11-23 22:13   ` [PATCH v1 1/9] xprtrdma: Add a safety margin for receive buffers Chuck Lever
2015-11-23 22:13     ` Chuck Lever
     [not found]     ` <20151123221357.32702.59922.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2015-11-24  0:55       ` Tom Talpey
2015-11-24  0:55         ` Tom Talpey
     [not found]         ` <5653B586.705-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
2015-11-24  1:16           ` Chuck Lever
2015-11-24  1:16             ` Chuck Lever
     [not found]             ` <D946BAC3-26D5-4801-BD50-9F026EEF6551-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-11-24  1:22               ` Tom Talpey [this message]
2015-11-24  1:22                 ` Tom Talpey
     [not found]                 ` <5653BBCB.8010107-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
2015-11-24  1:44                   ` Chuck Lever
2015-11-24  1:44                     ` Chuck Lever
2015-11-23 22:14   ` [PATCH v1 2/9] xprtrdma: Move struct ib_send_wr off the stack Chuck Lever
2015-11-23 22:14     ` Chuck Lever
2015-11-23 22:14   ` [PATCH v1 3/9] xprtrdma: Introduce ro_unmap_sync method Chuck Lever
2015-11-23 22:14     ` Chuck Lever
     [not found]     ` <20151123221414.32702.87638.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2015-11-24  6:45       ` Christoph Hellwig
2015-11-24  6:45         ` Christoph Hellwig
     [not found]         ` <20151124064556.GA29141-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-24  7:28           ` Jason Gunthorpe
2015-11-24  7:28             ` Jason Gunthorpe
2015-11-24 10:59           ` Sagi Grimberg
2015-11-24 10:59             ` Sagi Grimberg
     [not found]             ` <565442F5.7080400-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-11-24 13:43               ` Tom Talpey
2015-11-24 13:43                 ` Tom Talpey
     [not found]                 ` <5654697E.1030809-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
2015-11-24 14:40                   ` Sagi Grimberg
2015-11-24 14:40                     ` Sagi Grimberg
2015-11-24 14:39               ` Chuck Lever
2015-11-24 14:39                 ` Chuck Lever
     [not found]                 ` <4B2D7C66-31AC-44F3-A8CC-22CC7136015C-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-11-24 14:44                   ` Sagi Grimberg
2015-11-24 14:44                     ` Sagi Grimberg
     [not found]                     ` <565477CC.5070309-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-11-24 15:20                       ` Chuck Lever
2015-11-24 15:20                         ` Chuck Lever
2015-11-24 18:57                   ` Jason Gunthorpe
2015-11-24 18:57                     ` Jason Gunthorpe
2015-11-23 22:14   ` [PATCH v1 4/9] xprtrdma: Add ro_unmap_sync method for FRWR Chuck Lever
2015-11-23 22:14     ` Chuck Lever
2015-11-23 22:14   ` [PATCH v1 5/9] xprtrdma: Add ro_unmap_sync method for FMR Chuck Lever
2015-11-23 22:14     ` Chuck Lever
     [not found]     ` <20151123221430.32702.86114.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2015-11-24  0:57       ` Tom Talpey
2015-11-24  0:57         ` Tom Talpey
     [not found]         ` <5653B606.3070700-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
2015-11-24  6:52           ` Future of FMR support, was: " Christoph Hellwig
2015-11-24  6:52             ` Christoph Hellwig
     [not found]             ` <20151124065225.GB29141-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-24  7:12               ` Jason Gunthorpe
2015-11-24  7:12                 ` Jason Gunthorpe
     [not found]                 ` <20151124071215.GC23597-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-12-01 15:33                   ` Chuck Lever
2015-12-01 15:33                     ` Chuck Lever
     [not found]                     ` <AE86B182-6000-4437-8502-9D2C5EC3B09D-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-12-02 12:27                       ` Christoph Hellwig
2015-12-02 12:27                         ` Christoph Hellwig
2015-11-24 12:36               ` Tom Talpey
2015-11-24 12:36                 ` Tom Talpey
2015-11-24 21:54               ` santosh shilimkar
2015-11-24 21:54                 ` santosh shilimkar
     [not found]                 ` <5654DC7A.7080807-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-11-25  9:00                   ` Christoph Hellwig
2015-11-25  9:00                     ` Christoph Hellwig
     [not found]                     ` <20151125090009.GA11255-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-25 17:09                       ` santosh shilimkar
2015-11-25 17:09                         ` santosh shilimkar
     [not found]                         ` <5655EB40.8070508-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-11-25 18:22                           ` Or Gerlitz
2015-11-25 18:22                             ` Or Gerlitz
     [not found]                             ` <CAJ3xEMgC5aa8pvvvs0y8O+h_WBk-em07gHvOgP=DFDrv4ydGAg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-11-25 19:17                               ` santosh shilimkar
2015-11-25 19:17                                 ` santosh shilimkar
2015-11-25 19:28                           ` Jason Gunthorpe
2015-11-25 19:28                             ` Jason Gunthorpe
     [not found]                             ` <20151125192824.GD3223-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-11-26 10:01                               ` Sagi Grimberg
2015-11-26 10:01                                 ` Sagi Grimberg
2015-11-23 22:14   ` [PATCH v1 6/9] xprtrdma: Add ro_unmap_sync method for all-physical registration Chuck Lever
2015-11-23 22:14     ` Chuck Lever
2015-11-23 22:14   ` [PATCH v1 7/9] SUNRPC: Introduct xprt_commit_rqst() Chuck Lever
2015-11-23 22:14     ` Chuck Lever
     [not found]     ` <20151123221446.32702.24797.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2015-11-24 19:54       ` Anna Schumaker
2015-11-24 19:54         ` Anna Schumaker
     [not found]         ` <5654C079.7060507-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
2015-11-24 19:56           ` Chuck Lever
2015-11-24 19:56             ` Chuck Lever
2015-11-23 22:14   ` [PATCH v1 8/9] xprtrdma: Invalidate in the RPC reply handler Chuck Lever
2015-11-23 22:14     ` Chuck Lever
     [not found]     ` <20151123221454.32702.76062.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2015-11-24  1:01       ` Tom Talpey
2015-11-24  1:01         ` Tom Talpey
2015-11-23 22:15   ` [PATCH v1 9/9] xprtrdma: Revert commit e7104a2a9606 ('xprtrdma: Cap req_cqinit') Chuck Lever
2015-11-23 22:15     ` Chuck Lever

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5653BBCB.8010107@talpey.com \
    --to=tom-cls1zie5n5hqt0dzr+alfa@public.gmane.org \
    --cc=chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    --cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.