Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH net] tcp: remove empty skb from write queue in error cases
@ 2019-08-26 16:19 Eric Dumazet
  2019-08-26 16:50 ` Soheil Hassas Yeganeh
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Eric Dumazet @ 2019-08-26 16:19 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Soheil Hassas Yeganeh, Neal Cardwell,
	Eric Dumazet, Jason Baron, Vladimir Rutsky

Vladimir Rutsky reported stuck TCP sessions after memory pressure
events. Edge Trigger epoll() user would never receive an EPOLLOUT
notification allowing them to retry a sendmsg().

Jason tested the case of sk_stream_alloc_skb() returning NULL,
but there are other paths that could lead both sendmsg() and sendpage()
to return -1 (EAGAIN), with an empty skb queued on the write queue.

This patch makes sure we remove this empty skb so that
Jason code can detect that the queue is empty, and
call sk->sk_write_space(sk) accordingly.

Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Reported-by: Vladimir Rutsky <rutsky@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 77b485d60b9d0e00edc4e2f0d6c5bb3a9460b23b..61082065b26a068975c411b74eb46739ab0632ca 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -935,6 +935,22 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
 	return mss_now;
 }
 
+/* In some cases, both sendpage() and sendmsg() could have added
+ * an skb to the write queue, but failed adding payload on it.
+ * We need to remove it to consume less memory, but more
+ * importantly be able to generate EPOLLOUT for Edge Trigger epoll()
+ * users.
+ */
+static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
+{
+	if (skb && !skb->len) {
+		tcp_unlink_write_queue(skb, sk);
+		if (tcp_write_queue_empty(sk))
+			tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
+		sk_wmem_free_skb(sk, skb);
+	}
+}
+
 ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 			 size_t size, int flags)
 {
@@ -1064,6 +1080,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 	return copied;
 
 do_error:
+	tcp_remove_empty_skb(sk, tcp_write_queue_tail(sk));
 	if (copied)
 		goto out;
 out_err:
@@ -1388,18 +1405,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 	sock_zerocopy_put(uarg);
 	return copied + copied_syn;
 
+do_error:
+	skb = tcp_write_queue_tail(sk);
 do_fault:
-	if (!skb->len) {
-		tcp_unlink_write_queue(skb, sk);
-		/* It is the one place in all of TCP, except connection
-		 * reset, where we can be unlinking the send_head.
-		 */
-		if (tcp_write_queue_empty(sk))
-			tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
-		sk_wmem_free_skb(sk, skb);
-	}
+	tcp_remove_empty_skb(sk, skb);
 
-do_error:
 	if (copied + copied_syn)
 		goto out;
 out_err:
-- 
2.23.0.187.g17f5b7556c-goog


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] tcp: remove empty skb from write queue in error cases
  2019-08-26 16:19 [PATCH net] tcp: remove empty skb from write queue in error cases Eric Dumazet
@ 2019-08-26 16:50 ` Soheil Hassas Yeganeh
  2019-08-26 17:18 ` Neal Cardwell
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Soheil Hassas Yeganeh @ 2019-08-26 16:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Neal Cardwell, Eric Dumazet,
	Jason Baron, Vladimir Rutsky

On Mon, Aug 26, 2019 at 12:19 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Vladimir Rutsky reported stuck TCP sessions after memory pressure
> events. Edge Trigger epoll() user would never receive an EPOLLOUT
> notification allowing them to retry a sendmsg().
>
> Jason tested the case of sk_stream_alloc_skb() returning NULL,
> but there are other paths that could lead both sendmsg() and sendpage()
> to return -1 (EAGAIN), with an empty skb queued on the write queue.
>
> This patch makes sure we remove this empty skb so that
> Jason code can detect that the queue is empty, and
> call sk->sk_write_space(sk) accordingly.
>
> Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Reported-by: Vladimir Rutsky <rutsky@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Nice find!

> ---
>  net/ipv4/tcp.c | 30 ++++++++++++++++++++----------
>  1 file changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 77b485d60b9d0e00edc4e2f0d6c5bb3a9460b23b..61082065b26a068975c411b74eb46739ab0632ca 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -935,6 +935,22 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
>         return mss_now;
>  }
>
> +/* In some cases, both sendpage() and sendmsg() could have added
> + * an skb to the write queue, but failed adding payload on it.
> + * We need to remove it to consume less memory, but more
> + * importantly be able to generate EPOLLOUT for Edge Trigger epoll()
> + * users.
> + */
> +static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
> +{
> +       if (skb && !skb->len) {
> +               tcp_unlink_write_queue(skb, sk);
> +               if (tcp_write_queue_empty(sk))
> +                       tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
> +               sk_wmem_free_skb(sk, skb);
> +       }
> +}
> +
>  ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>                          size_t size, int flags)
>  {
> @@ -1064,6 +1080,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>         return copied;
>
>  do_error:
> +       tcp_remove_empty_skb(sk, tcp_write_queue_tail(sk));
>         if (copied)
>                 goto out;
>  out_err:
> @@ -1388,18 +1405,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>         sock_zerocopy_put(uarg);
>         return copied + copied_syn;
>
> +do_error:
> +       skb = tcp_write_queue_tail(sk);
>  do_fault:
> -       if (!skb->len) {
> -               tcp_unlink_write_queue(skb, sk);
> -               /* It is the one place in all of TCP, except connection
> -                * reset, where we can be unlinking the send_head.
> -                */
> -               if (tcp_write_queue_empty(sk))
> -                       tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
> -               sk_wmem_free_skb(sk, skb);
> -       }
> +       tcp_remove_empty_skb(sk, skb);
>
> -do_error:
>         if (copied + copied_syn)
>                 goto out;
>  out_err:
> --
> 2.23.0.187.g17f5b7556c-goog
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] tcp: remove empty skb from write queue in error cases
  2019-08-26 16:19 [PATCH net] tcp: remove empty skb from write queue in error cases Eric Dumazet
  2019-08-26 16:50 ` Soheil Hassas Yeganeh
@ 2019-08-26 17:18 ` Neal Cardwell
  2019-08-26 19:56 ` Jason Baron
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Neal Cardwell @ 2019-08-26 17:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Soheil Hassas Yeganeh, Eric Dumazet,
	Jason Baron, Vladimir Rutsky

On Mon, Aug 26, 2019 at 12:19 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Vladimir Rutsky reported stuck TCP sessions after memory pressure
> events. Edge Trigger epoll() user would never receive an EPOLLOUT
> notification allowing them to retry a sendmsg().
>
> Jason tested the case of sk_stream_alloc_skb() returning NULL,
> but there are other paths that could lead both sendmsg() and sendpage()
> to return -1 (EAGAIN), with an empty skb queued on the write queue.
>
> This patch makes sure we remove this empty skb so that
> Jason code can detect that the queue is empty, and
> call sk->sk_write_space(sk) accordingly.
>
> Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Reported-by: Vladimir Rutsky <rutsky@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

Nice detective work. :-) Thanks, Eric!

neal

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] tcp: remove empty skb from write queue in error cases
  2019-08-26 16:19 [PATCH net] tcp: remove empty skb from write queue in error cases Eric Dumazet
  2019-08-26 16:50 ` Soheil Hassas Yeganeh
  2019-08-26 17:18 ` Neal Cardwell
@ 2019-08-26 19:56 ` Jason Baron
  2019-08-26 20:04   ` Eric Dumazet
  2019-08-28  4:38 ` David Miller
  2019-09-11 17:36 ` Christoph Paasch
  4 siblings, 1 reply; 8+ messages in thread
From: Jason Baron @ 2019-08-26 19:56 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Soheil Hassas Yeganeh, Neal Cardwell, Eric Dumazet,
	Vladimir Rutsky



On 8/26/19 12:19 PM, Eric Dumazet wrote:
> Vladimir Rutsky reported stuck TCP sessions after memory pressure
> events. Edge Trigger epoll() user would never receive an EPOLLOUT
> notification allowing them to retry a sendmsg().
> 
> Jason tested the case of sk_stream_alloc_skb() returning NULL,
> but there are other paths that could lead both sendmsg() and sendpage()
> to return -1 (EAGAIN), with an empty skb queued on the write queue.
> 
> This patch makes sure we remove this empty skb so that
> Jason code can detect that the queue is empty, and
> call sk->sk_write_space(sk) accordingly.
> 

Makes sense, thanks. I think this check for empty queue could also
benefit from and update to use tcp_write_queue_empty(). I will send a
follow-up for that.

Thanks,

-Jason



> Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Reported-by: Vladimir Rutsky <rutsky@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> ---
>  net/ipv4/tcp.c | 30 ++++++++++++++++++++----------
>  1 file changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 77b485d60b9d0e00edc4e2f0d6c5bb3a9460b23b..61082065b26a068975c411b74eb46739ab0632ca 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -935,6 +935,22 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
>  	return mss_now;
>  }
>  
> +/* In some cases, both sendpage() and sendmsg() could have added
> + * an skb to the write queue, but failed adding payload on it.
> + * We need to remove it to consume less memory, but more
> + * importantly be able to generate EPOLLOUT for Edge Trigger epoll()
> + * users.
> + */
> +static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
> +{
> +	if (skb && !skb->len) {
> +		tcp_unlink_write_queue(skb, sk);
> +		if (tcp_write_queue_empty(sk))
> +			tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
> +		sk_wmem_free_skb(sk, skb);
> +	}
> +}
> +
>  ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>  			 size_t size, int flags)
>  {
> @@ -1064,6 +1080,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>  	return copied;
>  
>  do_error:
> +	tcp_remove_empty_skb(sk, tcp_write_queue_tail(sk));
>  	if (copied)
>  		goto out;
>  out_err:
> @@ -1388,18 +1405,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>  	sock_zerocopy_put(uarg);
>  	return copied + copied_syn;
>  
> +do_error:
> +	skb = tcp_write_queue_tail(sk);
>  do_fault:
> -	if (!skb->len) {
> -		tcp_unlink_write_queue(skb, sk);
> -		/* It is the one place in all of TCP, except connection
> -		 * reset, where we can be unlinking the send_head.
> -		 */
> -		if (tcp_write_queue_empty(sk))
> -			tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
> -		sk_wmem_free_skb(sk, skb);
> -	}
> +	tcp_remove_empty_skb(sk, skb);
>  
> -do_error:
>  	if (copied + copied_syn)
>  		goto out;
>  out_err:
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] tcp: remove empty skb from write queue in error cases
  2019-08-26 19:56 ` Jason Baron
@ 2019-08-26 20:04   ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2019-08-26 20:04 UTC (permalink / raw)
  To: Jason Baron, Eric Dumazet, David S . Miller
  Cc: netdev, Soheil Hassas Yeganeh, Neal Cardwell, Vladimir Rutsky



On 8/26/19 9:56 PM, Jason Baron wrote:
> 
> 
> On 8/26/19 12:19 PM, Eric Dumazet wrote:
>> Vladimir Rutsky reported stuck TCP sessions after memory pressure
>> events. Edge Trigger epoll() user would never receive an EPOLLOUT
>> notification allowing them to retry a sendmsg().
>>
>> Jason tested the case of sk_stream_alloc_skb() returning NULL,
>> but there are other paths that could lead both sendmsg() and sendpage()
>> to return -1 (EAGAIN), with an empty skb queued on the write queue.
>>
>> This patch makes sure we remove this empty skb so that
>> Jason code can detect that the queue is empty, and
>> call sk->sk_write_space(sk) accordingly.
>>
> 
> Makes sense, thanks. I think this check for empty queue could also
> benefit from and update to use tcp_write_queue_empty(). I will send a
> follow-up for that.
>

My plan (for net-next) is to submit something more like this one instead.

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 051ef10374f6..908dbe91e04b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1068,7 +1068,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
                goto out;
 out_err:
        /* make sure we wake any epoll edge trigger waiter */
-       if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 &&
+       if (unlikely(tcp_rtx_and_write_queues_empty(sk) &&
                     err == -EAGAIN)) {
                sk->sk_write_space(sk);
                tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
@@ -1407,7 +1407,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
        sock_zerocopy_put_abort(uarg, true);
        err = sk_stream_error(sk, flags, err);
        /* make sure we wake any epoll edge trigger waiter */
-       if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 &&
+       if (unlikely(tcp_rtx_and_write_queues_empty(sk) &&
                     err == -EAGAIN)) {
                sk->sk_write_space(sk);
                tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] tcp: remove empty skb from write queue in error cases
  2019-08-26 16:19 [PATCH net] tcp: remove empty skb from write queue in error cases Eric Dumazet
                   ` (2 preceding siblings ...)
  2019-08-26 19:56 ` Jason Baron
@ 2019-08-28  4:38 ` David Miller
  2019-09-11 17:36 ` Christoph Paasch
  4 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2019-08-28  4:38 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, soheil, ncardwell, eric.dumazet, jbaron, rutsky

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 26 Aug 2019 09:19:15 -0700

> Vladimir Rutsky reported stuck TCP sessions after memory pressure
> events. Edge Trigger epoll() user would never receive an EPOLLOUT
> notification allowing them to retry a sendmsg().
> 
> Jason tested the case of sk_stream_alloc_skb() returning NULL,
> but there are other paths that could lead both sendmsg() and sendpage()
> to return -1 (EAGAIN), with an empty skb queued on the write queue.
> 
> This patch makes sure we remove this empty skb so that
> Jason code can detect that the queue is empty, and
> call sk->sk_write_space(sk) accordingly.
> 
> Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Reported-by: Vladimir Rutsky <rutsky@google.com>

Applied and queued up for -stable.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] tcp: remove empty skb from write queue in error cases
  2019-08-26 16:19 [PATCH net] tcp: remove empty skb from write queue in error cases Eric Dumazet
                   ` (3 preceding siblings ...)
  2019-08-28  4:38 ` David Miller
@ 2019-09-11 17:36 ` Christoph Paasch
  2019-09-11 20:59   ` Eric Dumazet
  4 siblings, 1 reply; 8+ messages in thread
From: Christoph Paasch @ 2019-09-11 17:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Soheil Hassas Yeganeh, Neal Cardwell,
	Eric Dumazet, Jason Baron, Vladimir Rutsky

Hello,

On Mon, Aug 26, 2019 at 11:04 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Vladimir Rutsky reported stuck TCP sessions after memory pressure
> events. Edge Trigger epoll() user would never receive an EPOLLOUT
> notification allowing them to retry a sendmsg().
>
> Jason tested the case of sk_stream_alloc_skb() returning NULL,
> but there are other paths that could lead both sendmsg() and sendpage()
> to return -1 (EAGAIN), with an empty skb queued on the write queue.
>
> This patch makes sure we remove this empty skb so that
> Jason code can detect that the queue is empty, and
> call sk->sk_write_space(sk) accordingly.
>
> Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Reported-by: Vladimir Rutsky <rutsky@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> ---
>  net/ipv4/tcp.c | 30 ++++++++++++++++++++----------
>  1 file changed, 20 insertions(+), 10 deletions(-)

I got syzkaller complaining now on 4.14.143 with the following reproducer:

# {Threaded:true Collide:true Repeat:true RepeatTimes:0 Procs:1
Sandbox: Fault:false FaultCall:-1 FaultNth:0 EnableTun:false
UseTmpDir:false EnableCgroups:false EnableNetdev:false ResetNet:false
HandleSegv:false Repro:false Trace:false}
r0 = socket$inet_tcp(0x2, 0x1, 0x0)
setsockopt$inet_tcp_TCP_REPAIR(r0, 0x6, 0x13, &(0x7f0000000040)=0x1, 0x4)
setsockopt$inet_tcp_TCP_REPAIR_QUEUE(r0, 0x6, 0x14, &(0x7f00000012c0)=0x2, 0x4)
setsockopt$inet_tcp_int(r0, 0x6, 0x19, &(0x7f0000000000)=0x9, 0x4)
setsockopt$inet_tcp_TCP_MD5SIG(r0, 0x6, 0xe,
&(0x7f00000001c0)={@in={{0x2, 0x0, @empty}}, 0x0, 0x2, 0x0,
"c157cf4809151e5e89cfd6d934fbe981ec8ff6afc252ccf486c325c7ff3d35f3a89412a5cb6430e169092617df2ba65bf0ab844572e4e7dd4ece8ec1de5ac1ccd870067b018cb3b1f05f2391d872b67d"},
0xd8)
connect$inet(r0, &(0x7f0000000080)={0x2, 0x0, @dev={0xac, 0x14, 0x14,
0x1d}}, 0x10)
sendto(r0, 0x0, 0x87, 0x0, 0x0, 0x391)

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 2529 Comm: syz-executor709 Not tainted 4.14.143 #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
task: ffff8880677fdc00 task.stack: ffff8880642b0000
RIP: 0010:tcp_sendmsg_locked+0x6b4/0x4390 net/ipv4/tcp.c:1350
RSP: 0018:ffff8880642bf718 EFLAGS: 00010206
RAX: 0000000000000014 RBX: 0000000000000087 RCX: ffff88806a794f50
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000a0
RBP: ffff8880642bfaa8 R08: 0000000000000006 R09: ffff8880677fe3a0
R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
R13: ffff88806a794f50 R14: ffff88806a794d00 R15: 0000000000000087
FS:  00007f644b697700(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffcd37370b0 CR3: 00000000679f2006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 tcp_sendmsg+0x2a/0x40 net/ipv4/tcp.c:1533
 inet_sendmsg+0x173/0x4e0 net/ipv4/af_inet.c:784
 sock_sendmsg_nosec net/socket.c:646 [inline]
 sock_sendmsg+0xc3/0x100 net/socket.c:656
 SYSC_sendto+0x35d/0x5e0 net/socket.c:1766
 do_syscall_64+0x241/0x680 arch/x86/entry/common.c:292
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x7f644afc6469
RSP: 002b:00007f644b696f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000602130 RCX: 00007f644afc6469
RDX: 0000000000000087 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000602138 R08: 0000000000000000 R09: 0000000000000391
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000060213c
R13: 00007ffcd373700f R14: 00007f644b677000 R15: 0000000000000003
Code: 74 08 3c 03 0f 8e f1 32 00 00 8b 85 98 fd ff ff 89 85 60 fd ff
ff 48 8b 85 70 fd ff ff 48 8d b8 a0 00 00 00 48 89 f8 48 c1 e8 03 <42>
0f b6 04 20 84 c0 74 06 0f 8e d2 32 00 00 4c 8b bd 70 fd ff
RIP: tcp_sendmsg_locked+0x6b4/0x4390 net/ipv4/tcp.c:1350 RSP: ffff8880642bf718
---[ end trace 70f07f242cd3b9d8 ]---


It's because skb is NULL in tcp_sendmsg_locked at:
                  skb = tcp_write_queue_tail(sk);
                  if (tcp_send_head(sk)) {
                          if (skb->ip_summed == CHECKSUM_NONE)


I think we need this here on pre-rb-tree kernels :

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5ce069ce2a97..efe767e20d01 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -924,8 +924,7 @@ static void tcp_remove_empty_skb(struct sock *sk,
struct sk_buff *skb)
 {
  if (skb && !skb->len) {
  tcp_unlink_write_queue(skb, sk);
- if (tcp_write_queue_empty(sk))
- tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
+ tcp_check_send_head(sk, skb);
  sk_wmem_free_skb(sk, skb);
  }
 }

Does that look good?


Thanks,
Christoph

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] tcp: remove empty skb from write queue in error cases
  2019-09-11 17:36 ` Christoph Paasch
@ 2019-09-11 20:59   ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2019-09-11 20:59 UTC (permalink / raw)
  To: Christoph Paasch, Greg Kroah-Hartman
  Cc: David S . Miller, netdev, Soheil Hassas Yeganeh, Neal Cardwell,
	Eric Dumazet, Jason Baron, Vladimir Rutsky

On Wed, Sep 11, 2019 at 7:36 PM Christoph Paasch
<christoph.paasch@gmail.com> wrote:
>
> Hello,
>
> On Mon, Aug 26, 2019 at 11:04 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > Vladimir Rutsky reported stuck TCP sessions after memory pressure
> > events. Edge Trigger epoll() user would never receive an EPOLLOUT
> > notification allowing them to retry a sendmsg().
> >
> > Jason tested the case of sk_stream_alloc_skb() returning NULL,
> > but there are other paths that could lead both sendmsg() and sendpage()
> > to return -1 (EAGAIN), with an empty skb queued on the write queue.
> >
> > This patch makes sure we remove this empty skb so that
> > Jason code can detect that the queue is empty, and
> > call sk->sk_write_space(sk) accordingly.
> >
> > Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Jason Baron <jbaron@akamai.com>
> > Reported-by: Vladimir Rutsky <rutsky@google.com>
> > Cc: Soheil Hassas Yeganeh <soheil@google.com>
> > Cc: Neal Cardwell <ncardwell@google.com>
> > ---
> >  net/ipv4/tcp.c | 30 ++++++++++++++++++++----------
> >  1 file changed, 20 insertions(+), 10 deletions(-)
>
> I got syzkaller complaining now on 4.14.143 with the following reproducer:
>
> # {Threaded:true Collide:true Repeat:true RepeatTimes:0 Procs:1
> Sandbox: Fault:false FaultCall:-1 FaultNth:0 EnableTun:false
> UseTmpDir:false EnableCgroups:false EnableNetdev:false ResetNet:false
> HandleSegv:false Repro:false Trace:false}
> r0 = socket$inet_tcp(0x2, 0x1, 0x0)
> setsockopt$inet_tcp_TCP_REPAIR(r0, 0x6, 0x13, &(0x7f0000000040)=0x1, 0x4)
> setsockopt$inet_tcp_TCP_REPAIR_QUEUE(r0, 0x6, 0x14, &(0x7f00000012c0)=0x2, 0x4)
> setsockopt$inet_tcp_int(r0, 0x6, 0x19, &(0x7f0000000000)=0x9, 0x4)
> setsockopt$inet_tcp_TCP_MD5SIG(r0, 0x6, 0xe,
> &(0x7f00000001c0)={@in={{0x2, 0x0, @empty}}, 0x0, 0x2, 0x0,
> "c157cf4809151e5e89cfd6d934fbe981ec8ff6afc252ccf486c325c7ff3d35f3a89412a5cb6430e169092617df2ba65bf0ab844572e4e7dd4ece8ec1de5ac1ccd870067b018cb3b1f05f2391d872b67d"},
> 0xd8)
> connect$inet(r0, &(0x7f0000000080)={0x2, 0x0, @dev={0xac, 0x14, 0x14,
> 0x1d}}, 0x10)
> sendto(r0, 0x0, 0x87, 0x0, 0x0, 0x391)
>
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] SMP KASAN PTI
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 2529 Comm: syz-executor709 Not tainted 4.14.143 #5
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
> task: ffff8880677fdc00 task.stack: ffff8880642b0000
> RIP: 0010:tcp_sendmsg_locked+0x6b4/0x4390 net/ipv4/tcp.c:1350
> RSP: 0018:ffff8880642bf718 EFLAGS: 00010206
> RAX: 0000000000000014 RBX: 0000000000000087 RCX: ffff88806a794f50
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000a0
> RBP: ffff8880642bfaa8 R08: 0000000000000006 R09: ffff8880677fe3a0
> R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
> R13: ffff88806a794f50 R14: ffff88806a794d00 R15: 0000000000000087
> FS:  00007f644b697700(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffcd37370b0 CR3: 00000000679f2006 CR4: 00000000003606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  tcp_sendmsg+0x2a/0x40 net/ipv4/tcp.c:1533
>  inet_sendmsg+0x173/0x4e0 net/ipv4/af_inet.c:784
>  sock_sendmsg_nosec net/socket.c:646 [inline]
>  sock_sendmsg+0xc3/0x100 net/socket.c:656
>  SYSC_sendto+0x35d/0x5e0 net/socket.c:1766
>  do_syscall_64+0x241/0x680 arch/x86/entry/common.c:292
>  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x7f644afc6469
> RSP: 002b:00007f644b696f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000000000602130 RCX: 00007f644afc6469
> RDX: 0000000000000087 RSI: 0000000000000000 RDI: 0000000000000003
> RBP: 0000000000602138 R08: 0000000000000000 R09: 0000000000000391
> R10: 0000000000000000 R11: 0000000000000246 R12: 000000000060213c
> R13: 00007ffcd373700f R14: 00007f644b677000 R15: 0000000000000003
> Code: 74 08 3c 03 0f 8e f1 32 00 00 8b 85 98 fd ff ff 89 85 60 fd ff
> ff 48 8b 85 70 fd ff ff 48 8d b8 a0 00 00 00 48 89 f8 48 c1 e8 03 <42>
> 0f b6 04 20 84 c0 74 06 0f 8e d2 32 00 00 4c 8b bd 70 fd ff
> RIP: tcp_sendmsg_locked+0x6b4/0x4390 net/ipv4/tcp.c:1350 RSP: ffff8880642bf718
> ---[ end trace 70f07f242cd3b9d8 ]---
>
>
> It's because skb is NULL in tcp_sendmsg_locked at:
>                   skb = tcp_write_queue_tail(sk);
>                   if (tcp_send_head(sk)) {
>                           if (skb->ip_summed == CHECKSUM_NONE)
>
>
> I think we need this here on pre-rb-tree kernels :
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 5ce069ce2a97..efe767e20d01 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -924,8 +924,7 @@ static void tcp_remove_empty_skb(struct sock *sk,
> struct sk_buff *skb)
>  {
>   if (skb && !skb->len) {
>   tcp_unlink_write_queue(skb, sk);
> - if (tcp_write_queue_empty(sk))
> - tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
> + tcp_check_send_head(sk, skb);
>   sk_wmem_free_skb(sk, skb);
>   }
>  }
>
> Does that look good?
>

Yes the backport to 4.14.143 was not done properly.

Thanks

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, back to index

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-26 16:19 [PATCH net] tcp: remove empty skb from write queue in error cases Eric Dumazet
2019-08-26 16:50 ` Soheil Hassas Yeganeh
2019-08-26 17:18 ` Neal Cardwell
2019-08-26 19:56 ` Jason Baron
2019-08-26 20:04   ` Eric Dumazet
2019-08-28  4:38 ` David Miller
2019-09-11 17:36 ` Christoph Paasch
2019-09-11 20:59   ` Eric Dumazet

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org netdev@archiver.kernel.org
	public-inbox-index netdev


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox