All of lore.kernel.org
 help / color / mirror / Atom feed
From: "J.Bruce Fields" <bfields@citi.umich.edu>
To: NeilBrown <neilb@suse.de>
Cc: Ben Myers <bpm@sgi.com>, Olga Kornievskaia <aglo@citi.umich.edu>,
	NFS <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH] NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure.
Date: Thu, 25 Jul 2013 16:18:05 -0400	[thread overview]
Message-ID: <20130725201805.GB17962@fieldses.org> (raw)
In-Reply-To: <20130725113023.7bcbc347@notabene.brown>

On Thu, Jul 25, 2013 at 11:30:23AM +1000, NeilBrown wrote:
> 
> Since we enabled auto-tuning for sunrpc TCP connections we do not
> guarantee that there is enough write-space on each connection to
> queue a reply.
> 
> If memory pressure causes the window to shrink too small, the request
> throttling in sunrpc/svc will not accept any requests so no more requests
> will be handled.  Even when pressure decreases the window will not
> grow again until data is sent on the connection.
> This means we get a deadlock:  no requests will be handled until there
> is more space, and no space will be allocated until a request is
> handled.
> 
> This can be simulated by modifying svc_tcp_has_wspace to inflate the
> number of byte required and removing the 'svc_sock_setbufsize' calls
> in svc_setup_socket.

Ah-hah!

> I found that multiplying by 16 was enough to make the requirement
> exceed the default allocation.  With this modification in place:
>    mount -o vers=3,proto=tcp 127.0.0.1:/home /mnt
> would block and eventually time out because the nfs server could not
> accept any requests.

So, this?:

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 305374d..36de50d 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1193,6 +1193,7 @@ static int svc_tcp_has_wspace(struct svc_xprt *xprt)
 	if (test_bit(XPT_LISTENER, &xprt->xpt_flags))
 		return 1;
 	required = atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg;
+	required *= 16;
 	if (sk_stream_wspace(svsk->sk_sk) >= required)
 		return 1;
 	set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
@@ -1378,14 +1379,8 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 	/* Initialize the socket */
 	if (sock->type == SOCK_DGRAM)
 		svc_udp_init(svsk, serv);
-	else {
-		/* initialise setting must have enough space to
-		 * receive and respond to one request.
-		 */
-		svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg,
-					4 * serv->sv_max_mesg);
+	else
 		svc_tcp_init(svsk, serv);
-	}
 
 	dprintk("svc: svc_setup_socket created %p (inet %p)\n",
 				svsk, svsk->sk_sk);

> This patch relaxes the request throttling to always allow at least one
> request through per connection.  It does this by checking both
>   sk_stream_min_wspace() and xprt->xpt_reserved
> are zero.
> The first is zero when the TCP transmit queue is empty.
> The second is zero when there are no RPC requests being processed.
> When both of these are zero the socket is idle and so one more
> request can safely be allowed through.
> 
> Applying this patch allows the above mount command to succeed cleanly.
> Tracing shows that the allocated write buffer space quickly grows and
> after a few requests are handled, the extra tests are no longer needed
> to permit further requests to be processed.
> 
> The main purpose of request throttling is to handle the case when one
> client is slow at collecting replies and the send queue gets full of
> replies that the client hasn't acknowledged (at the TCP level) yet.
> As we only change behaviour when the send queue is empty this main
> purpose is still preserved.
> 
> Reported-by: Ben Myers <bpm@sgi.com>
> Signed-off-by: NeilBrown <neilb@suse.de>
> 
> --
> As you can see I've changed the patch.  While writing up the above 
> description realised there was a weakness and so added the sk_stream_min_wspace
> test.  That allowed me to write the final paragraph.

This is great, thanks!

Inclined to queue it up for 3.11 and stable....

--b.

> 
> NeilBrown
> 
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 305374d..7762b9f 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1193,7 +1193,9 @@ static int svc_tcp_has_wspace(struct svc_xprt *xprt)
>  	if (test_bit(XPT_LISTENER, &xprt->xpt_flags))
>  		return 1;
>  	required = atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg;
> -	if (sk_stream_wspace(svsk->sk_sk) >= required)
> +	if (sk_stream_wspace(svsk->sk_sk) >= required ||
> +	    (sk_stream_min_wspace(svsk->sk_sk) == 0 &&
> +	     atomic_read(&xprt->xpt_reserved) == 0))
>  		return 1;
>  	set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
>  	return 0;



  parent reply	other threads:[~2013-07-25 20:18 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20130710092255.0240a36d@notabene.brown>
2013-07-10  2:27 ` Is tcp autotuning really what NFS wants? J.Bruce Fields
2013-07-10  4:32   ` NeilBrown
2013-07-10 19:07     ` J.Bruce Fields
2013-07-15  4:32       ` NeilBrown
2013-07-16  1:58         ` J.Bruce Fields
2013-07-16  4:00           ` NeilBrown
2013-07-16 14:24             ` J.Bruce Fields
2013-07-18  0:03               ` Ben Myers
2013-07-24 21:07                 ` J.Bruce Fields
2013-07-25  1:30                   ` [PATCH] NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure NeilBrown
2013-07-25 12:35                     ` Jim Rees
2013-07-25 20:18                     ` J.Bruce Fields [this message]
2013-07-25 20:33                       ` NeilBrown
2013-07-26 14:19                         ` J.Bruce Fields
2013-07-30  2:48                           ` NeilBrown
2013-08-01  2:49                             ` J.Bruce Fields
2013-07-10 17:33   ` Is tcp autotuning really what NFS wants? Dean
2013-07-10 17:39     ` Ben Greear
2013-07-15  4:35       ` NeilBrown
2013-07-15 23:32         ` Ben Greear
2013-07-16  4:46           ` NeilBrown
2013-07-10 19:59     ` Michael Richardson
2013-07-15  1:26   ` Jim Rees
2013-07-15  5:02     ` NeilBrown
2013-07-15 11:57       ` Jim Rees
2013-07-15 13:42   ` Jim Rees
2013-07-16  1:10     ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130725201805.GB17962@fieldses.org \
    --to=bfields@citi.umich.edu \
    --cc=aglo@citi.umich.edu \
    --cc=bpm@sgi.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.