All of lore.kernel.org
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
To: David Vrabel <david.vrabel-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
Cc: List Linux NFS Mailing
	<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Schumaker Anna
	<anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>,
	List Linux Network Devel Mailing
	<netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCHv1] sunrpc: fix write space race causing stalls
Date: Fri, 16 Sep 2016 17:06:39 +0000	[thread overview]
Message-ID: <4BCBEAB3-1AD4-455E-9C0B-CCD3172C2C17@primarydata.com> (raw)
In-Reply-To: <57DC20C8.5000306-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>


> On Sep 16, 2016, at 12:41, David Vrabel <david.vrabel-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> wrote:
> 
> On 16/09/16 17:01, Trond Myklebust wrote:
>> 
>>> On Sep 16, 2016, at 08:28, David Vrabel <david.vrabel-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> wrote:
>>> 
>>> Write space becoming available may race with putting the task to sleep
>>> in xprt_wait_for_buffer_space().  The existing mechanism to avoid the
>>> race does not work.
>>> 
>>> This (edited) partial trace illustrates the problem:
>>> 
>>>  [1] rpc_task_run_action: task:43546@5 ... action=call_transmit
>>>  [2] xs_write_space <-xs_tcp_write_space
>>>  [3] xprt_write_space <-xs_write_space
>>>  [4] rpc_task_sleep: task:43546@5 ...
>>>  [5] xs_write_space <-xs_tcp_write_space
>>> 
>>> [1] Task 43546 runs but is out of write space.
>>> 
>>> [2] Space becomes available, xs_write_space() clears the
>>>   SOCKWQ_ASYNC_NOSPACE bit.
>>> 
>>> [3] xprt_write_space() attemts to wake xprt->snd_task (== 43546), but
>>>   this has not yet been queued and the wake up is lost.
>>> 
>>> [4] xs_nospace() is called which calls xprt_wait_for_buffer_space()
>>>   which queues task 43546.
>>> 
>>> [5] The call to sk->sk_write_space() at the end of xs_nospace() (which
>>>   is supposed to handle the above race) does not call
>>>   xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and
>>>   thus the task is not woken.
>>> 
>>> Fix the race by have xprt_wait_for_buffer_space() check for write
>>> space after putting the task to sleep.
>>> 
>>> Signed-off-by: David Vrabel <david.vrabel-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
>>> ---
>>> include/linux/sunrpc/xprt.h |  1 +
>>> net/sunrpc/xprt.c           |  4 ++++
>>> net/sunrpc/xprtsock.c       | 21 +++++++++++++++++++--
>>> 3 files changed, 24 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
>>> index a16070d..621e74b 100644
>>> --- a/include/linux/sunrpc/xprt.h
>>> +++ b/include/linux/sunrpc/xprt.h
>>> @@ -129,6 +129,7 @@ struct rpc_xprt_ops {
>>> 	void		(*connect)(struct rpc_xprt *xprt, struct rpc_task *task);
>>> 	void *		(*buf_alloc)(struct rpc_task *task, size_t size);
>>> 	void		(*buf_free)(void *buffer);
>>> +	bool            (*have_write_space)(struct rpc_xprt *task);
>>> 	int		(*send_request)(struct rpc_task *task);
>>> 	void		(*set_retrans_timeout)(struct rpc_task *task);
>>> 	void		(*timer)(struct rpc_xprt *xprt, struct rpc_task *task);
>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>>> index ea244b2..d3c1b1e 100644
>>> --- a/net/sunrpc/xprt.c
>>> +++ b/net/sunrpc/xprt.c
>>> @@ -502,6 +502,10 @@ void xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action action)
>>> 
>>> 	task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0;
>>> 	rpc_sleep_on(&xprt->pending, task, action);
>>> +
>>> +	/* Write space notification may race with putting task to sleep. */
>>> +	if (xprt->ops->have_write_space(xprt))
>>> +		rpc_wake_up_queued_task(&xprt->pending, task);
>>> }
>>> EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space);
>>> 
>>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>>> index bf16883..211de5b 100644
>>> --- a/net/sunrpc/xprtsock.c
>>> +++ b/net/sunrpc/xprtsock.c
>>> @@ -472,8 +472,6 @@ static int xs_nospace(struct rpc_task *task)
>>> 
>>> 	spin_unlock_bh(&xprt->transport_lock);
>>> 
>>> -	/* Race breaker in case memory is freed before above code is called */
>>> -	sk->sk_write_space(sk);
>>> 	return ret;
>>> }
>> 
>> Instead of these callbacks, why not just add a call to
>> sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk) after queueing the task in
>> xs_nospace()? Won’t that fix the existing race breaker?
> 
> I don't see how that would help.  If sk->sk_write_space was already
> called, SOCKWQ_ASYNC_NOSPACE will still be clear and the next call to
> sk->sk_write_space will still be a nop.

Sorry. Copy+paste error. I meant SOCKWQ_ASYNC_NOSPACE.

> 
> Or did you mean SOCKWQ_ASYNC_NOSPACE here?  It doesn't seem right to set
> this bit when we don't know if there's space or not.

Why not?


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

WARNING: multiple messages have this Message-ID (diff)
From: Trond Myklebust <trondmy@primarydata.com>
To: David Vrabel <david.vrabel@citrix.com>
Cc: List Linux NFS Mailing <linux-nfs@vger.kernel.org>,
	Schumaker Anna <anna.schumaker@netapp.com>,
	List Linux Network Devel Mailing <netdev@vger.kernel.org>
Subject: Re: [PATCHv1] sunrpc: fix write space race causing stalls
Date: Fri, 16 Sep 2016 17:06:39 +0000	[thread overview]
Message-ID: <4BCBEAB3-1AD4-455E-9C0B-CCD3172C2C17@primarydata.com> (raw)
In-Reply-To: <57DC20C8.5000306@citrix.com>


> On Sep 16, 2016, at 12:41, David Vrabel <david.vrabel@citrix.com> wrote:
>=20
> On 16/09/16 17:01, Trond Myklebust wrote:
>>=20
>>> On Sep 16, 2016, at 08:28, David Vrabel <david.vrabel@citrix.com> wrote=
:
>>>=20
>>> Write space becoming available may race with putting the task to sleep
>>> in xprt_wait_for_buffer_space().  The existing mechanism to avoid the
>>> race does not work.
>>>=20
>>> This (edited) partial trace illustrates the problem:
>>>=20
>>>  [1] rpc_task_run_action: task:43546@5 ... action=3Dcall_transmit
>>>  [2] xs_write_space <-xs_tcp_write_space
>>>  [3] xprt_write_space <-xs_write_space
>>>  [4] rpc_task_sleep: task:43546@5 ...
>>>  [5] xs_write_space <-xs_tcp_write_space
>>>=20
>>> [1] Task 43546 runs but is out of write space.
>>>=20
>>> [2] Space becomes available, xs_write_space() clears the
>>>   SOCKWQ_ASYNC_NOSPACE bit.
>>>=20
>>> [3] xprt_write_space() attemts to wake xprt->snd_task (=3D=3D 43546), b=
ut
>>>   this has not yet been queued and the wake up is lost.
>>>=20
>>> [4] xs_nospace() is called which calls xprt_wait_for_buffer_space()
>>>   which queues task 43546.
>>>=20
>>> [5] The call to sk->sk_write_space() at the end of xs_nospace() (which
>>>   is supposed to handle the above race) does not call
>>>   xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and
>>>   thus the task is not woken.
>>>=20
>>> Fix the race by have xprt_wait_for_buffer_space() check for write
>>> space after putting the task to sleep.
>>>=20
>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>>> ---
>>> include/linux/sunrpc/xprt.h |  1 +
>>> net/sunrpc/xprt.c           |  4 ++++
>>> net/sunrpc/xprtsock.c       | 21 +++++++++++++++++++--
>>> 3 files changed, 24 insertions(+), 2 deletions(-)
>>>=20
>>> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
>>> index a16070d..621e74b 100644
>>> --- a/include/linux/sunrpc/xprt.h
>>> +++ b/include/linux/sunrpc/xprt.h
>>> @@ -129,6 +129,7 @@ struct rpc_xprt_ops {
>>> =09void=09=09(*connect)(struct rpc_xprt *xprt, struct rpc_task *task);
>>> =09void *=09=09(*buf_alloc)(struct rpc_task *task, size_t size);
>>> =09void=09=09(*buf_free)(void *buffer);
>>> +=09bool            (*have_write_space)(struct rpc_xprt *task);
>>> =09int=09=09(*send_request)(struct rpc_task *task);
>>> =09void=09=09(*set_retrans_timeout)(struct rpc_task *task);
>>> =09void=09=09(*timer)(struct rpc_xprt *xprt, struct rpc_task *task);
>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>>> index ea244b2..d3c1b1e 100644
>>> --- a/net/sunrpc/xprt.c
>>> +++ b/net/sunrpc/xprt.c
>>> @@ -502,6 +502,10 @@ void xprt_wait_for_buffer_space(struct rpc_task *t=
ask, rpc_action action)
>>>=20
>>> =09task->tk_timeout =3D RPC_IS_SOFT(task) ? req->rq_timeout : 0;
>>> =09rpc_sleep_on(&xprt->pending, task, action);
>>> +
>>> +=09/* Write space notification may race with putting task to sleep. */
>>> +=09if (xprt->ops->have_write_space(xprt))
>>> +=09=09rpc_wake_up_queued_task(&xprt->pending, task);
>>> }
>>> EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space);
>>>=20
>>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>>> index bf16883..211de5b 100644
>>> --- a/net/sunrpc/xprtsock.c
>>> +++ b/net/sunrpc/xprtsock.c
>>> @@ -472,8 +472,6 @@ static int xs_nospace(struct rpc_task *task)
>>>=20
>>> =09spin_unlock_bh(&xprt->transport_lock);
>>>=20
>>> -=09/* Race breaker in case memory is freed before above code is called=
 */
>>> -=09sk->sk_write_space(sk);
>>> =09return ret;
>>> }
>>=20
>> Instead of these callbacks, why not just add a call to
>> sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk) after queueing the task in
>> xs_nospace()? Won=92t that fix the existing race breaker?
>=20
> I don't see how that would help.  If sk->sk_write_space was already
> called, SOCKWQ_ASYNC_NOSPACE will still be clear and the next call to
> sk->sk_write_space will still be a nop.

Sorry. Copy+paste error. I meant SOCKWQ_ASYNC_NOSPACE.

>=20
> Or did you mean SOCKWQ_ASYNC_NOSPACE here?  It doesn't seem right to set
> this bit when we don't know if there's space or not.

Why not?



  parent reply	other threads:[~2016-09-16 17:06 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-16 12:28 [PATCHv1] sunrpc: fix write space race causing stalls David Vrabel
2016-09-16 16:01 ` Trond Myklebust
2016-09-16 16:01   ` Trond Myklebust
2016-09-16 16:41   ` David Vrabel
     [not found]     ` <57DC20C8.5000306-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
2016-09-16 17:06       ` Trond Myklebust [this message]
2016-09-16 17:06         ` Trond Myklebust
2016-09-16 17:29         ` David Vrabel
2016-09-16 18:04           ` Trond Myklebust
2016-09-16 18:04             ` Trond Myklebust

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4BCBEAB3-1AD4-455E-9C0B-CCD3172C2C17@primarydata.com \
    --to=trondmy-7i+n7zu2hftekmmhf/gkza@public.gmane.org \
    --cc=anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org \
    --cc=david.vrabel-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org \
    --cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.