Netdev Archive on lore.kernel.org
* [PATCH net] tcp: make sure EPOLLOUT wont be missed
@ 2019-08-17  4:26 Eric Dumazet
  2019-08-17 12:39 ` Soheil Hassas Yeganeh
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Eric Dumazet @ 2019-08-17  4:26 UTC
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Soheil Hassas Yeganeh, Neal Cardwell,
	Eric Dumazet, Jason Baron, Vladimir Rutsky

As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
under memory pressure"), it is crucial we properly set SOCK_NOSPACE
when needed.

However, Jason's patch had a bug, because the 'nonblocking' status
as far as sk_stream_wait_memory() is concerned is governed
by the MSG_DONTWAIT flag passed at sendmsg() time:

    long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);

So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
value.

This patch removes the 'noblock' variable since we must always
set SOCK_NOSPACE if -EAGAIN is returned.

It also renames the do_nonblock label since we might reach this
code path even if we were in blocking mode.

Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Reported-by: Vladimir Rutsky  <rutsky@google.com>
---
 net/core/stream.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/core/stream.c b/net/core/stream.c
index e94bb02a56295ec2db34ab423a8c7c890df0a696..4f1d4aa5fb38d989a9c81f32dfce3f31bbc1fa47 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -120,7 +120,6 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
 	int err = 0;
 	long vm_wait = 0;
 	long current_timeo = *timeo_p;
-	bool noblock = (*timeo_p ? false : true);
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 
 	if (sk_stream_memory_free(sk))
@@ -133,11 +132,8 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
 
 		if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 			goto do_error;
-		if (!*timeo_p) {
-			if (noblock)
-				set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-			goto do_nonblock;
-		}
+		if (!*timeo_p)
+			goto do_eagain;
 		if (signal_pending(current))
 			goto do_interrupted;
 		sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
@@ -169,7 +165,13 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
 do_error:
 	err = -EPIPE;
 	goto out;
-do_nonblock:
+do_eagain:
+	/* Make sure that whenever EAGAIN is returned, EPOLLOUT event can
+	 * be generated later.
+	 * When TCP receives ACK packets that make room, tcp_check_space()
+	 * only calls tcp_new_space() if SOCK_NOSPACE is set.
+	 */
+	set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 	err = -EAGAIN;
 	goto out;
 do_interrupted:
-- 
2.23.0.rc1.153.gdeed80330f-goog
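
To illustrate the scenario described in the commit message, here is a
minimal userspace model (not kernel code). The names model_sock,
model_sndtimeo(), model_eagain_exit() and model_check_space() are
hypothetical; they only mimic the decisions made by sock_sndtimeo(), by
the -EAGAIN exit of sk_stream_wait_memory(), and by the SOCK_NOSPACE test
in tcp_check_space(). The model assumes the bit is clear when the timeout
expires, which is the case the commit message describes.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two pieces of socket state involved. */
struct model_sock {
	bool nospace;   /* models the SOCK_NOSPACE bit */
	long sndtimeo;  /* models sk->sk_sndtimeo (SO_SNDTIMEO), in ticks */
};

/* Mimics sock_sndtimeo(): MSG_DONTWAIT forces a zero timeout,
 * otherwise the per-socket send timeout applies.
 */
static long model_sndtimeo(const struct model_sock *sk, bool noblock)
{
	return noblock ? 0 : sk->sndtimeo;
}

/* Models only the timeout-expired exit of sk_stream_wait_memory().
 * Pre-patch, 'noblock' was sampled from the timeout on entry, so a
 * caller with a small non-zero timeout never got the bit set here.
 */
static int model_eagain_exit(struct model_sock *sk, long entry_timeo,
			     bool patched)
{
	bool noblock = (entry_timeo == 0);

	if (patched || noblock)
		sk->nospace = true;
	return -EAGAIN;
}

/* Models the ACK path: a write-space wakeup (and thus EPOLLOUT) is
 * only produced when the bit is set. Returns true if a wakeup fires.
 */
static bool model_check_space(struct model_sock *sk)
{
	if (!sk->nospace)
		return false;
	sk->nospace = false;
	return true;
}
```

With sndtimeo set to a small non-zero value and no MSG_DONTWAIT, the
pre-patch exit returns -EAGAIN and leaves the bit clear, so the later
model_check_space() call produces no wakeup. With patched set to true,
the same sequence produces one.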



* Re: [PATCH net] tcp: make sure EPOLLOUT wont be missed
  2019-08-17  4:26 [PATCH net] tcp: make sure EPOLLOUT wont be missed Eric Dumazet
@ 2019-08-17 12:39 ` Soheil Hassas Yeganeh
  2019-08-17 14:19 ` Jason Baron
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Soheil Hassas Yeganeh @ 2019-08-17 12:39 UTC
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Neal Cardwell, Eric Dumazet,
	Jason Baron, Vladimir Rutsky

On Sat, Aug 17, 2019 at 12:26 AM Eric Dumazet <edumazet@google.com> wrote:
>
> As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
> under memory pressure"), it is crucial we properly set SOCK_NOSPACE
> when needed.
>
> However, Jason patch had a bug, because the 'nonblocking' status
> as far as sk_stream_wait_memory() is concerned is governed
> by MSG_DONTWAIT flag passed at sendmsg() time :
>
>     long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
>
> So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
> and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
> cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
> value.
>
> This patch removes the 'noblock' variable since we must always
> set SOCK_NOSPACE if -EAGAIN is returned.
>
> It also renames the do_nonblock label since we might reach this
> code path even if we were in blocking mode.
>
> Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Reported-by: Vladimir Rutsky  <rutsky@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Thank you for the fix!

> ---
>  net/core/stream.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/net/core/stream.c b/net/core/stream.c
> index e94bb02a56295ec2db34ab423a8c7c890df0a696..4f1d4aa5fb38d989a9c81f32dfce3f31bbc1fa47 100644
> --- a/net/core/stream.c
> +++ b/net/core/stream.c
> @@ -120,7 +120,6 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
>         int err = 0;
>         long vm_wait = 0;
>         long current_timeo = *timeo_p;
> -       bool noblock = (*timeo_p ? false : true);
>         DEFINE_WAIT_FUNC(wait, woken_wake_function);
>
>         if (sk_stream_memory_free(sk))
> @@ -133,11 +132,8 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
>
>                 if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
>                         goto do_error;
> -               if (!*timeo_p) {
> -                       if (noblock)
> -                               set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> -                       goto do_nonblock;
> -               }
> +               if (!*timeo_p)
> +                       goto do_eagain;
>                 if (signal_pending(current))
>                         goto do_interrupted;
>                 sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
> @@ -169,7 +165,13 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
>  do_error:
>         err = -EPIPE;
>         goto out;
> -do_nonblock:
> +do_eagain:
> +       /* Make sure that whenever EAGAIN is returned, EPOLLOUT event can
> +        * be generated later.
> +        * When TCP receives ACK packets that make room, tcp_check_space()
> +        * only calls tcp_new_space() if SOCK_NOSPACE is set.
> +        */
> +       set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
>         err = -EAGAIN;
>         goto out;
>  do_interrupted:
> --
> 2.23.0.rc1.153.gdeed80330f-goog
>


* Re: [PATCH net] tcp: make sure EPOLLOUT wont be missed
  2019-08-17  4:26 [PATCH net] tcp: make sure EPOLLOUT wont be missed Eric Dumazet
  2019-08-17 12:39 ` Soheil Hassas Yeganeh
@ 2019-08-17 14:19 ` Jason Baron
  2019-08-17 16:26   ` Eric Dumazet
  2019-08-17 17:10 ` Neal Cardwell
  2019-08-19 20:08 ` David Miller
  3 siblings, 1 reply; 7+ messages in thread
From: Jason Baron @ 2019-08-17 14:19 UTC
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Soheil Hassas Yeganeh, Neal Cardwell, Eric Dumazet,
	Vladimir Rutsky



On 8/17/19 12:26 AM, Eric Dumazet wrote:
> As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
> under memory pressure"), it is crucial we properly set SOCK_NOSPACE
> when needed.
> 
> However, Jason patch had a bug, because the 'nonblocking' status
> as far as sk_stream_wait_memory() is concerned is governed
> by MSG_DONTWAIT flag passed at sendmsg() time :
> 
>     long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
> 
> So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
> and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
> cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
> value.

Is MSG_DONTWAIT not set in this case? The original patch was intended
only for the explicit non-blocking case. The epoll manpage says:
"EPOLLET flag should use nonblocking file descriptors". So the original
intention was not to impact the blocking case. This seems to me like
a different use-case.

Thanks,

-Jason


> This patch removes the 'noblock' variable since we must always
> set SOCK_NOSPACE if -EAGAIN is returned.
> 
> It also renames the do_nonblock label since we might reach this
> code path even if we were in blocking mode.
> 
> Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Reported-by: Vladimir Rutsky  <rutsky@google.com>
> ---
>  net/core/stream.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/net/core/stream.c b/net/core/stream.c
> index e94bb02a56295ec2db34ab423a8c7c890df0a696..4f1d4aa5fb38d989a9c81f32dfce3f31bbc1fa47 100644
> --- a/net/core/stream.c
> +++ b/net/core/stream.c
> @@ -120,7 +120,6 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
>  	int err = 0;
>  	long vm_wait = 0;
>  	long current_timeo = *timeo_p;
> -	bool noblock = (*timeo_p ? false : true);
>  	DEFINE_WAIT_FUNC(wait, woken_wake_function);
>  
>  	if (sk_stream_memory_free(sk))
> @@ -133,11 +132,8 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
>  
>  		if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
>  			goto do_error;
> -		if (!*timeo_p) {
> -			if (noblock)
> -				set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> -			goto do_nonblock;
> -		}
> +		if (!*timeo_p)
> +			goto do_eagain;
>  		if (signal_pending(current))
>  			goto do_interrupted;
>  		sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
> @@ -169,7 +165,13 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
>  do_error:
>  	err = -EPIPE;
>  	goto out;
> -do_nonblock:
> +do_eagain:
> +	/* Make sure that whenever EAGAIN is returned, EPOLLOUT event can
> +	 * be generated later.
> +	 * When TCP receives ACK packets that make room, tcp_check_space()
> +	 * only calls tcp_new_space() if SOCK_NOSPACE is set.
> +	 */
> +	set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
>  	err = -EAGAIN;
>  	goto out;
>  do_interrupted:
> 


* Re: [PATCH net] tcp: make sure EPOLLOUT wont be missed
  2019-08-17 14:19 ` Jason Baron
@ 2019-08-17 16:26   ` Eric Dumazet
  2019-08-19 18:40     ` Jason Baron
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2019-08-17 16:26 UTC
  To: Jason Baron, Eric Dumazet, David S . Miller
  Cc: netdev, Soheil Hassas Yeganeh, Neal Cardwell, Eric Dumazet,
	Vladimir Rutsky



On 8/17/19 4:19 PM, Jason Baron wrote:
> 
> 
> On 8/17/19 12:26 AM, Eric Dumazet wrote:
>> As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
>> under memory pressure"), it is crucial we properly set SOCK_NOSPACE
>> when needed.
>>
>> However, Jason patch had a bug, because the 'nonblocking' status
>> as far as sk_stream_wait_memory() is concerned is governed
>> by MSG_DONTWAIT flag passed at sendmsg() time :
>>
>>     long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
>>
>> So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
>> and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
>> cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
>> value.
> 
> Is MSG_DONTWAIT not set in this case? The original patch was intended
> only for the explicit non-blocking case. The epoll manpage says:
> "EPOLLET flag should use nonblocking file descriptors". So the original
> intention was not to impact the blocking case. This seems to me like
> a different use-case.
>

I guess the problem is how we define 'non-blocking' ...

SO_SNDTIMEO can be used by an application to implement a variation of non-blocking,
by waiting for a socket event with a short timeout, to maybe recover
from memory pressure conditions in a more efficient way than simply looping.

Note that the man page for epoll() only _suggests_ using nonblocking file descriptors.

<quote>
       The  suggested  way  to use epoll as an edge-triggered (EPOLLET)
       interface is as follows:

              i   with nonblocking file descriptors; and

              ii  by  waiting  for  an  event  only  after  read(2)  or
                  write(2) return EAGAIN.
</quote>
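
For readers unfamiliar with this usage, a rough userspace sketch of the
pattern follows: a blocking socket with a short SO_SNDTIMEO, registered
for edge-triggered EPOLLOUT, that only waits in epoll after send() has
returned EAGAIN. The helper names set_short_send_timeout(),
watch_writable_edge() and send_or_wait() are hypothetical and the single
retry is an arbitrary choice; this is not taken from any real application.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: give a blocking socket a short send timeout. */
static int set_short_send_timeout(int fd, long usec)
{
	struct timeval tv;

	tv.tv_sec = usec / 1000000;
	tv.tv_usec = usec % 1000000;
	return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: register fd for edge-triggered EPOLLOUT. */
static int watch_writable_edge(int epfd, int fd)
{
	struct epoll_event ev = { 0 };

	ev.events = EPOLLOUT | EPOLLET;
	ev.data.fd = fd;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: send without MSG_DONTWAIT, so the kernel may
 * wait up to SO_SNDTIMEO. If that wait expires with EAGAIN, wait for
 * the EPOLLOUT edge (up to wait_ms) and retry once.
 */
static ssize_t send_or_wait(int epfd, int fd, const void *buf, size_t len,
			    int wait_ms)
{
	struct epoll_event ev;
	ssize_t n = send(fd, buf, len, 0);

	if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
		return n;
	/* This wait relies on the kernel generating EPOLLOUT later. */
	if (epoll_wait(epfd, &ev, 1, wait_ms) <= 0)
		return -1;
	return send(fd, buf, len, 0);
}
```

An application written this way depends on EPOLLOUT being generated
after any EAGAIN, whether or not MSG_DONTWAIT was passed.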










* Re: [PATCH net] tcp: make sure EPOLLOUT wont be missed
  2019-08-17  4:26 [PATCH net] tcp: make sure EPOLLOUT wont be missed Eric Dumazet
  2019-08-17 12:39 ` Soheil Hassas Yeganeh
  2019-08-17 14:19 ` Jason Baron
@ 2019-08-17 17:10 ` Neal Cardwell
  2019-08-19 20:08 ` David Miller
  3 siblings, 0 replies; 7+ messages in thread
From: Neal Cardwell @ 2019-08-17 17:10 UTC
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Soheil Hassas Yeganeh, Eric Dumazet,
	Jason Baron, Vladimir Rutsky

On Sat, Aug 17, 2019 at 12:26 AM Eric Dumazet <edumazet@google.com> wrote:
>
> As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
> under memory pressure"), it is crucial we properly set SOCK_NOSPACE
> when needed.
>
> However, Jason patch had a bug, because the 'nonblocking' status
> as far as sk_stream_wait_memory() is concerned is governed
> by MSG_DONTWAIT flag passed at sendmsg() time :
>
>     long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
>
> So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
> and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
> cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
> value.
>
> This patch removes the 'noblock' variable since we must always
> set SOCK_NOSPACE if -EAGAIN is returned.
>
> It also renames the do_nonblock label since we might reach this
> code path even if we were in blocking mode.
>
> Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Reported-by: Vladimir Rutsky  <rutsky@google.com>
> ---
>  net/core/stream.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric!

neal


* Re: [PATCH net] tcp: make sure EPOLLOUT wont be missed
  2019-08-17 16:26   ` Eric Dumazet
@ 2019-08-19 18:40     ` Jason Baron
  0 siblings, 0 replies; 7+ messages in thread
From: Jason Baron @ 2019-08-19 18:40 UTC
  To: Eric Dumazet, Eric Dumazet, David S . Miller
  Cc: netdev, Soheil Hassas Yeganeh, Neal Cardwell, Vladimir Rutsky



On 8/17/19 12:26 PM, Eric Dumazet wrote:
> 
> 
> On 8/17/19 4:19 PM, Jason Baron wrote:
>>
>>
>> On 8/17/19 12:26 AM, Eric Dumazet wrote:
>>> As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
>>> under memory pressure"), it is crucial we properly set SOCK_NOSPACE
>>> when needed.
>>>
>>> However, Jason patch had a bug, because the 'nonblocking' status
>>> as far as sk_stream_wait_memory() is concerned is governed
>>> by MSG_DONTWAIT flag passed at sendmsg() time :
>>>
>>>     long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
>>>
>>> So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
>>> and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
>>> cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
>>> value.
>>
>> Is MSG_DONTWAIT not set in this case? The original patch was intended
>> only for the explicit non-blocking case. The epoll manpage says:
>> "EPOLLET flag should use nonblocking file descriptors". So the original
>> intention was not to impact the blocking case. This seems to me like
>> a different use-case.
>>
> 
> I guess the problem is how we define 'non-blocking' ...
> 
> SO_SNDTIMEO can be used by application to implement a variation of non-blocking,
> by waiting for a socket event with a short timeout, to maybe recover
> from memory pressure conditions in a more efficient way than simply looping.
> 
> Note that the man page for epoll() only _suggests_ to use nonblocking file descriptors.
> 
> <quote>
>        The  suggested  way  to use epoll as an edge-triggered (EPOLLET)
>        interface is as follows:
> 
>               i   with nonblocking file descriptors; and
> 
>               ii  by  waiting  for  an  event  only  after  read(2)  or
>                   write(2) return EAGAIN.
> </quote>
> 
> 

Ok, seems reasonable:
Acked-by: Jason Baron <jbaron@akamai.com>

I found a similar pattern in net/smc/smc_tx.c, which I also just sent a
patch for.

Thanks,

-Jason





* Re: [PATCH net] tcp: make sure EPOLLOUT wont be missed
  2019-08-17  4:26 [PATCH net] tcp: make sure EPOLLOUT wont be missed Eric Dumazet
                   ` (2 preceding siblings ...)
  2019-08-17 17:10 ` Neal Cardwell
@ 2019-08-19 20:08 ` David Miller
  3 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2019-08-19 20:08 UTC
  To: edumazet; +Cc: netdev, soheil, ncardwell, eric.dumazet, jbaron, rutsky

From: Eric Dumazet <edumazet@google.com>
Date: Fri, 16 Aug 2019 21:26:22 -0700

> As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
> under memory pressure"), it is crucial we properly set SOCK_NOSPACE
> when needed.
> 
> However, Jason patch had a bug, because the 'nonblocking' status
> as far as sk_stream_wait_memory() is concerned is governed
> by MSG_DONTWAIT flag passed at sendmsg() time :
> 
>     long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
> 
> So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
> and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
> cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
> value.
> 
> This patch removes the 'noblock' variable since we must always
> set SOCK_NOSPACE if -EAGAIN is returned.
> 
> It also renames the do_nonblock label since we might reach this
> code path even if we were in blocking mode.
> 
> Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Reported-by: Vladimir Rutsky  <rutsky@google.com>

Applied and queued up for -stable.


Thread overview: 7+ messages
2019-08-17  4:26 [PATCH net] tcp: make sure EPOLLOUT wont be missed Eric Dumazet
2019-08-17 12:39 ` Soheil Hassas Yeganeh
2019-08-17 14:19 ` Jason Baron
2019-08-17 16:26   ` Eric Dumazet
2019-08-19 18:40     ` Jason Baron
2019-08-17 17:10 ` Neal Cardwell
2019-08-19 20:08 ` David Miller
