All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] tcp: do not send empty skb from tcp_write_xmit()
@ 2019-12-11  7:34 Eric Dumazet
  2019-12-11 19:17 ` Neal Cardwell
  2019-12-11 20:05 ` Neal Cardwell
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2019-12-11  7:34 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Christoph Paasch,
	Neal Cardwell, Jason Baron

Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from
write queue in error cases") in linux-4.14 stable triggered
various bugs. One of them has been fixed in commit ba2ddb43f270
("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
we still have crashes in some occasions.

Root-cause is that when tcp_sendmsg() has allocated a fresh
skb and could not append a fragment before being blocked
in sk_stream_wait_memory(), tcp_write_xmit() might be called
and decide to send this fresh and empty skb.

Sending an empty packet is not only silly, it might have caused
many issues we had in the past with tp->packets_out being
out of sync.

Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christoph Paasch <cpaasch@apple.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Jason Baron <jbaron@akamai.com>
---
 net/ipv4/tcp_output.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		if (tcp_small_queue_check(sk, skb, 0))
 			break;
 
+		/* Argh, we hit an empty skb(), presumably a thread
+		 * is sleeping in sendmsg()/sk_stream_wait_memory().
+		 * We do not want to send a pure-ack packet and have
+		 * a strange looking rtx queue with empty packet(s).
+		 */
+		if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq)
+			break;
+
 		if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
 			break;
 
-- 
2.24.0.525.g8f36a354ae-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: do not send empty skb from tcp_write_xmit()
  2019-12-11  7:34 [PATCH net] tcp: do not send empty skb from tcp_write_xmit() Eric Dumazet
@ 2019-12-11 19:17 ` Neal Cardwell
  2019-12-11 19:39   ` Eric Dumazet
  2019-12-11 20:05 ` Neal Cardwell
  1 sibling, 1 reply; 6+ messages in thread
From: Neal Cardwell @ 2019-12-11 19:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Eric Dumazet, Christoph Paasch,
	Jason Baron, Yuchung Cheng, Soheil Hassas Yeganeh

On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from
> write queue in error cases") in linux-4.14 stable triggered
> various bugs. One of them has been fixed in commit ba2ddb43f270
> ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
> we still have crashes in some occasions.
>
> Root-cause is that when tcp_sendmsg() has allocated a fresh
> skb and could not append a fragment before being blocked
> in sk_stream_wait_memory(), tcp_write_xmit() might be called
> and decide to send this fresh and empty skb.
>
> Sending an empty packet is not only silly, it might have caused
> many issues we had in the past with tp->packets_out being
> out of sync.
>
> Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Christoph Paasch <cpaasch@apple.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Jason Baron <jbaron@akamai.com>
> ---
>  net/ipv4/tcp_output.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
>                 if (tcp_small_queue_check(sk, skb, 0))
>                         break;
>
> +               /* Argh, we hit an empty skb(), presumably a thread
> +                * is sleeping in sendmsg()/sk_stream_wait_memory().
> +                * We do not want to send a pure-ack packet and have
> +                * a strange looking rtx queue with empty packet(s).
> +                */
> +               if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq)
> +                       break;
> +
>                 if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
>                         break;
>
> --

Thanks for the fix, Eric!

Is there any risk that any current or future bugs that create
persistently empty skbs could cause the connection to "freeze", unable
to reach the tcp_transmit_skb() call in tcp_write_xmit()?

To avoid this risk, would it make sense to delete the empty skb and
continue the tcp_write_xmit() transmit loop, rather than breaking out
of the loop?

Just curious to learn. :-)

thanks,
neal

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: do not send empty skb from tcp_write_xmit()
  2019-12-11 19:17 ` Neal Cardwell
@ 2019-12-11 19:39   ` Eric Dumazet
  2019-12-11 19:43     ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2019-12-11 19:39 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: David S . Miller, netdev, Eric Dumazet, Christoph Paasch,
	Jason Baron, Yuchung Cheng, Soheil Hassas Yeganeh

On Wed, Dec 11, 2019 at 11:17 AM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from
> > write queue in error cases") in linux-4.14 stable triggered
> > various bugs. One of them has been fixed in commit ba2ddb43f270
> > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
> > we still have crashes in some occasions.
> >
> > Root-cause is that when tcp_sendmsg() has allocated a fresh
> > skb and could not append a fragment before being blocked
> > in sk_stream_wait_memory(), tcp_write_xmit() might be called
> > and decide to send this fresh and empty skb.
> >
> > Sending an empty packet is not only silly, it might have caused
> > many issues we had in the past with tp->packets_out being
> > out of sync.
> >
> > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Christoph Paasch <cpaasch@apple.com>
> > Cc: Neal Cardwell <ncardwell@google.com>
> > Cc: Jason Baron <jbaron@akamai.com>
> > ---
> >  net/ipv4/tcp_output.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
> >                 if (tcp_small_queue_check(sk, skb, 0))
> >                         break;
> >
> > +               /* Argh, we hit an empty skb(), presumably a thread
> > +                * is sleeping in sendmsg()/sk_stream_wait_memory().
> > +                * We do not want to send a pure-ack packet and have
> > +                * a strange looking rtx queue with empty packet(s).
> > +                */
> > +               if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq)
> > +                       break;
> > +
> >                 if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
> >                         break;
> >
> > --
>
> Thanks for the fix, Eric!
>
> Is there any risk that any current or future bugs that create
> persistently empty skbs could cause the connection to "freeze", unable
> to reach the tcp_transmit_skb() call in tcp_write_xmit()?
>
> To avoid this risk, would it make sense to delete the empty skb and
> continue the tcp_write_xmit() transmit loop, rather than breaking out
> of the loop?

This 'empty' skb must be the last in the queue.

Removing it from the queue would not prevent tcp_write_xmit() from
breaking the loop.

If we remove it, then we force sendmsg() to re-allocate a fresh skb,
which seems more work, and would paper around the bugs that would be
un-noticed.

Another question to ask is if we need to reconsider how we use
tcp_write_queue_empty() as an indicator
for 'I  have something in the write queue'

This is obviously wrong if the write queue has a single empty skb.

Maybe we should instead use tp->write_seq != tp->snd_nxt

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: do not send empty skb from tcp_write_xmit()
  2019-12-11 19:39   ` Eric Dumazet
@ 2019-12-11 19:43     ` Eric Dumazet
  2019-12-11 20:04       ` Neal Cardwell
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2019-12-11 19:43 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: David S . Miller, netdev, Eric Dumazet, Christoph Paasch,
	Jason Baron, Yuchung Cheng, Soheil Hassas Yeganeh

On Wed, Dec 11, 2019 at 11:39 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Dec 11, 2019 at 11:17 AM Neal Cardwell <ncardwell@google.com> wrote:
> >
> > On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from
> > > write queue in error cases") in linux-4.14 stable triggered
> > > various bugs. One of them has been fixed in commit ba2ddb43f270
> > > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
> > > we still have crashes in some occasions.
> > >
> > > Root-cause is that when tcp_sendmsg() has allocated a fresh
> > > skb and could not append a fragment before being blocked
> > > in sk_stream_wait_memory(), tcp_write_xmit() might be called
> > > and decide to send this fresh and empty skb.
> > >
> > > Sending an empty packet is not only silly, it might have caused
> > > many issues we had in the past with tp->packets_out being
> > > out of sync.
> > >
> > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > Cc: Christoph Paasch <cpaasch@apple.com>
> > > Cc: Neal Cardwell <ncardwell@google.com>
> > > Cc: Jason Baron <jbaron@akamai.com>
> > > ---
> > >  net/ipv4/tcp_output.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
> > >                 if (tcp_small_queue_check(sk, skb, 0))
> > >                         break;
> > >
> > > +               /* Argh, we hit an empty skb(), presumably a thread
> > > +                * is sleeping in sendmsg()/sk_stream_wait_memory().
> > > +                * We do not want to send a pure-ack packet and have
> > > +                * a strange looking rtx queue with empty packet(s).
> > > +                */
> > > +               if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq)
> > > +                       break;
> > > +
> > >                 if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
> > >                         break;
> > >
> > > --
> >
> > Thanks for the fix, Eric!
> >
> > Is there any risk that any current or future bugs that create
> > persistently empty skbs could cause the connection to "freeze", unable
> > to reach the tcp_transmit_skb() call in tcp_write_xmit()?
> >
> > To avoid this risk, would it make sense to delete the empty skb and
> > continue the tcp_write_xmit() transmit loop, rather than breaking out
> > of the loop?
>
> This 'empty' skb must be the last in the queue.
>
> Removing it from the queue would not prevent tcp_write_xmit() from
> breaking the loop.
>
> If we remove it, then we force sendmsg() to re-allocate a fresh skb,
> which seems more work, and would paper around the bugs that would be
> un-noticed.
>
> Another question to ask is if we need to reconsider how we use
> tcp_write_queue_empty() as an indicator
> for 'I  have something in the write queue'
>
> This is obviously wrong if the write queue has a single empty skb.
>
> Maybe we should instead use tp->write_seq != tp->snd_nxt

Also we use skb_queue_len(&sk->sk_write_queue) == 0 in
tcp_sendmsg_locked() and this seems not good.

We could have changed tcp_sendmsg_locked() to leave the empty skb in
the write queue.
( and we can thus remove tcp_remove_empty_skb() completely since it
has caused many problems in stable kernels)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8a39ee79489192c02385aaadc8d1ae969fb55d23..9ba3294de9cb4e929afdc0e00b01b6b534c84af6
100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1419,7 +1419,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct
msghdr *msg, size_t size)
        sock_zerocopy_put_abort(uarg, true);
        err = sk_stream_error(sk, flags, err);
        /* make sure we wake any epoll edge trigger waiter */
-       if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 &&
+       if (unlikely(tp->write_seq == tp->snd_nxt &&
                     err == -EAGAIN)) {
                sk->sk_write_space(sk);
                tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: do not send empty skb from tcp_write_xmit()
  2019-12-11 19:43     ` Eric Dumazet
@ 2019-12-11 20:04       ` Neal Cardwell
  0 siblings, 0 replies; 6+ messages in thread
From: Neal Cardwell @ 2019-12-11 20:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Eric Dumazet, Christoph Paasch,
	Jason Baron, Yuchung Cheng, Soheil Hassas Yeganeh

On Wed, Dec 11, 2019 at 2:44 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Dec 11, 2019 at 11:39 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Wed, Dec 11, 2019 at 11:17 AM Neal Cardwell <ncardwell@google.com> wrote:
> > >
> > > On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from
> > > > write queue in error cases") in linux-4.14 stable triggered
> > > > various bugs. One of them has been fixed in commit ba2ddb43f270
> > > > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
> > > > we still have crashes in some occasions.
> > > >
> > > > Root-cause is that when tcp_sendmsg() has allocated a fresh
> > > > skb and could not append a fragment before being blocked
> > > > in sk_stream_wait_memory(), tcp_write_xmit() might be called
> > > > and decide to send this fresh and empty skb.
> > > >
> > > > Sending an empty packet is not only silly, it might have caused
> > > > many issues we had in the past with tp->packets_out being
> > > > out of sync.
> > > >
> > > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
> > > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > > Cc: Christoph Paasch <cpaasch@apple.com>
> > > > Cc: Neal Cardwell <ncardwell@google.com>
> > > > Cc: Jason Baron <jbaron@akamai.com>
> > > > ---
> > > >  net/ipv4/tcp_output.c | 8 ++++++++
> > > >  1 file changed, 8 insertions(+)
> > > >
> > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > > index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644
> > > > --- a/net/ipv4/tcp_output.c
> > > > +++ b/net/ipv4/tcp_output.c
> > > > @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
> > > >                 if (tcp_small_queue_check(sk, skb, 0))
> > > >                         break;
> > > >
> > > > +               /* Argh, we hit an empty skb(), presumably a thread
> > > > +                * is sleeping in sendmsg()/sk_stream_wait_memory().
> > > > +                * We do not want to send a pure-ack packet and have
> > > > +                * a strange looking rtx queue with empty packet(s).
> > > > +                */
> > > > +               if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq)
> > > > +                       break;
> > > > +
> > > >                 if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
> > > >                         break;
> > > >
> > > > --
> > >
> > > Thanks for the fix, Eric!
> > >
> > > Is there any risk that any current or future bugs that create
> > > persistently empty skbs could cause the connection to "freeze", unable
> > > to reach the tcp_transmit_skb() call in tcp_write_xmit()?
> > >
> > > To avoid this risk, would it make sense to delete the empty skb and
> > > continue the tcp_write_xmit() transmit loop, rather than breaking out
> > > of the loop?
> >
> > This 'empty' skb must be the last in the queue.
> >
> > Removing it from the queue would not prevent tcp_write_xmit() from
> > breaking the loop.
> >
> > If we remove it, then we force sendmsg() to re-allocate a fresh skb,
> > which seems more work, and would paper around the bugs that would be
> > un-noticed.
> >
> > Another question to ask is if we need to reconsider how we use
> > tcp_write_queue_empty() as an indicator
> > for 'I  have something in the write queue'
> >
> > This is obviously wrong if the write queue has a single empty skb.
> >
> > Maybe we should instead use tp->write_seq != tp->snd_nxt
>
> Also we use skb_queue_len(&sk->sk_write_queue) == 0 in
> tcp_sendmsg_locked() and this seems not good.
>
> We could have changed tcp_sendmsg_locked() to leave the empty skb in
> the write queue.
> ( and we can thus remove tcp_remove_empty_skb() completely since it
> has caused many problems in stable kernels)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 8a39ee79489192c02385aaadc8d1ae969fb55d23..9ba3294de9cb4e929afdc0e00b01b6b534c84af6
> 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1419,7 +1419,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct
> msghdr *msg, size_t size)
>         sock_zerocopy_put_abort(uarg, true);
>         err = sk_stream_error(sk, flags, err);
>         /* make sure we wake any epoll edge trigger waiter */
> -       if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 &&
> +       if (unlikely(tp->write_seq == tp->snd_nxt &&
>                      err == -EAGAIN)) {
>                 sk->sk_write_space(sk);
>                 tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);

Thanks, Eric.

I like your idea to audit the TCP calls to tcp_write_queue_empty() and
skb_queue_len(&sk->sk_write_queue) to see which ones should be
replaced with comparisons of tp->write_seq and tp->snd_nxt (including
this one). Good catch!

neal

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: do not send empty skb from tcp_write_xmit()
  2019-12-11  7:34 [PATCH net] tcp: do not send empty skb from tcp_write_xmit() Eric Dumazet
  2019-12-11 19:17 ` Neal Cardwell
@ 2019-12-11 20:05 ` Neal Cardwell
  1 sibling, 0 replies; 6+ messages in thread
From: Neal Cardwell @ 2019-12-11 20:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Eric Dumazet, Christoph Paasch, Jason Baron

On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from
> write queue in error cases") in linux-4.14 stable triggered
> various bugs. One of them has been fixed in commit ba2ddb43f270
> ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
> we still have crashes in some occasions.
>
> Root-cause is that when tcp_sendmsg() has allocated a fresh
> skb and could not append a fragment before being blocked
> in sk_stream_wait_memory(), tcp_write_xmit() might be called
> and decide to send this fresh and empty skb.
>
> Sending an empty packet is not only silly, it might have caused
> many issues we had in the past with tp->packets_out being
> out of sync.
>
> Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Christoph Paasch <cpaasch@apple.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Jason Baron <jbaron@akamai.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

thanks,
neal

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-12-11 20:05 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-11  7:34 [PATCH net] tcp: do not send empty skb from tcp_write_xmit() Eric Dumazet
2019-12-11 19:17 ` Neal Cardwell
2019-12-11 19:39   ` Eric Dumazet
2019-12-11 19:43     ` Eric Dumazet
2019-12-11 20:04       ` Neal Cardwell
2019-12-11 20:05 ` Neal Cardwell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.