* [PATCH net] tcp: do not send empty skb from tcp_write_xmit() @ 2019-12-11 7:34 Eric Dumazet 2019-12-11 19:17 ` Neal Cardwell 2019-12-11 20:05 ` Neal Cardwell 0 siblings, 2 replies; 6+ messages in thread From: Eric Dumazet @ 2019-12-11 7:34 UTC (permalink / raw) To: David S . Miller Cc: netdev, Eric Dumazet, Eric Dumazet, Christoph Paasch, Neal Cardwell, Jason Baron Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases") in linux-4.14 stable triggered various bugs. One of them has been fixed in commit ba2ddb43f270 ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but we still have crashes in some occasions. Root-cause is that when tcp_sendmsg() has allocated a fresh skb and could not append a fragment before being blocked in sk_stream_wait_memory(), tcp_write_xmit() might be called and decide to send this fresh and empty skb. Sending an empty packet is not only silly, it might have caused many issues we had in the past with tp->packets_out being out of sync. Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Christoph Paasch <cpaasch@apple.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Jason Baron <jbaron@akamai.com> --- net/ipv4/tcp_output.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, if (tcp_small_queue_check(sk, skb, 0)) break; + /* Argh, we hit an empty skb(), presumably a thread + * is sleeping in sendmsg()/sk_stream_wait_memory(). + * We do not want to send a pure-ack packet and have + * a strange looking rtx queue with empty packet(s). + */ + if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq) + break; + if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp))) break; -- 2.24.0.525.g8f36a354ae-goog ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net] tcp: do not send empty skb from tcp_write_xmit() 2019-12-11 7:34 [PATCH net] tcp: do not send empty skb from tcp_write_xmit() Eric Dumazet @ 2019-12-11 19:17 ` Neal Cardwell 2019-12-11 19:39 ` Eric Dumazet 2019-12-11 20:05 ` Neal Cardwell 1 sibling, 1 reply; 6+ messages in thread From: Neal Cardwell @ 2019-12-11 19:17 UTC (permalink / raw) To: Eric Dumazet Cc: David S . Miller, netdev, Eric Dumazet, Christoph Paasch, Jason Baron, Yuchung Cheng, Soheil Hassas Yeganeh On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote: > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from > write queue in error cases") in linux-4.14 stable triggered > various bugs. One of them has been fixed in commit ba2ddb43f270 > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but > we still have crashes in some occasions. > > Root-cause is that when tcp_sendmsg() has allocated a fresh > skb and could not append a fragment before being blocked > in sk_stream_wait_memory(), tcp_write_xmit() might be called > and decide to send this fresh and empty skb. > > Sending an empty packet is not only silly, it might have caused > many issues we had in the past with tp->packets_out being > out of sync. > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Christoph Paasch <cpaasch@apple.com> > Cc: Neal Cardwell <ncardwell@google.com> > Cc: Jason Baron <jbaron@akamai.com> > --- > net/ipv4/tcp_output.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, > if (tcp_small_queue_check(sk, skb, 0)) > break; > > + /* Argh, we hit an empty skb(), presumably a thread > + * is sleeping in sendmsg()/sk_stream_wait_memory(). > + * We do not want to send a pure-ack packet and have > + * a strange looking rtx queue with empty packet(s). > + */ > + if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq) > + break; > + > if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp))) > break; > > -- Thanks for the fix, Eric! Is there any risk that any current or future bugs that create persistently empty skbs could cause the connection to "freeze", unable to reach the tcp_transmit_skb() call in tcp_write_xmit()? To avoid this risk, would it make sense to delete the empty skb and continue the tcp_write_xmit() transmit loop, rather than breaking out of the loop? Just curious to learn. :-) thanks, neal ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net] tcp: do not send empty skb from tcp_write_xmit() 2019-12-11 19:17 ` Neal Cardwell @ 2019-12-11 19:39 ` Eric Dumazet 2019-12-11 19:43 ` Eric Dumazet 0 siblings, 1 reply; 6+ messages in thread From: Eric Dumazet @ 2019-12-11 19:39 UTC (permalink / raw) To: Neal Cardwell Cc: David S . Miller, netdev, Eric Dumazet, Christoph Paasch, Jason Baron, Yuchung Cheng, Soheil Hassas Yeganeh On Wed, Dec 11, 2019 at 11:17 AM Neal Cardwell <ncardwell@google.com> wrote: > > On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote: > > > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from > > write queue in error cases") in linux-4.14 stable triggered > > various bugs. One of them has been fixed in commit ba2ddb43f270 > > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but > > we still have crashes in some occasions. > > > > Root-cause is that when tcp_sendmsg() has allocated a fresh > > skb and could not append a fragment before being blocked > > in sk_stream_wait_memory(), tcp_write_xmit() might be called > > and decide to send this fresh and empty skb. > > > > Sending an empty packet is not only silly, it might have caused > > many issues we had in the past with tp->packets_out being > > out of sync. > > > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.") > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > Cc: Christoph Paasch <cpaasch@apple.com> > > Cc: Neal Cardwell <ncardwell@google.com> > > Cc: Jason Baron <jbaron@akamai.com> > > --- > > net/ipv4/tcp_output.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > > index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644 > > --- a/net/ipv4/tcp_output.c > > +++ b/net/ipv4/tcp_output.c > > @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, > > if (tcp_small_queue_check(sk, skb, 0)) > > break; > > > > + /* Argh, we hit an empty skb(), presumably a thread > > + * is sleeping in sendmsg()/sk_stream_wait_memory(). > > + * We do not want to send a pure-ack packet and have > > + * a strange looking rtx queue with empty packet(s). > > + */ > > + if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq) > > + break; > > + > > if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp))) > > break; > > > > -- > > Thanks for the fix, Eric! > > Is there any risk that any current or future bugs that create > persistently empty skbs could cause the connection to "freeze", unable > to reach the tcp_transmit_skb() call in tcp_write_xmit()? > > To avoid this risk, would it make sense to delete the empty skb and > continue the tcp_write_xmit() transmit loop, rather than breaking out > of the loop? This 'empty' skb must be the last in the queue. Removing it from the queue would not prevent tcp_write_xmit() from breaking the loop. If we remove it, then we force sendmsg() to re-allocate a fresh skb, which seems more work, and would paper around the bugs that would be un-noticed. Another question to ask is if we need to reconsider how we use tcp_write_queue_empty() as an indicator for 'I have something in the write queue' This is obviously wrong if the write queue has a single empty skb. Maybe we should instead use tp->write_seq != tp->snd_nxt ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net] tcp: do not send empty skb from tcp_write_xmit() 2019-12-11 19:39 ` Eric Dumazet @ 2019-12-11 19:43 ` Eric Dumazet 2019-12-11 20:04 ` Neal Cardwell 0 siblings, 1 reply; 6+ messages in thread From: Eric Dumazet @ 2019-12-11 19:43 UTC (permalink / raw) To: Neal Cardwell Cc: David S . Miller, netdev, Eric Dumazet, Christoph Paasch, Jason Baron, Yuchung Cheng, Soheil Hassas Yeganeh On Wed, Dec 11, 2019 at 11:39 AM Eric Dumazet <edumazet@google.com> wrote: > > On Wed, Dec 11, 2019 at 11:17 AM Neal Cardwell <ncardwell@google.com> wrote: > > > > On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from > > > write queue in error cases") in linux-4.14 stable triggered > > > various bugs. One of them has been fixed in commit ba2ddb43f270 > > > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but > > > we still have crashes in some occasions. > > > > > > Root-cause is that when tcp_sendmsg() has allocated a fresh > > > skb and could not append a fragment before being blocked > > > in sk_stream_wait_memory(), tcp_write_xmit() might be called > > > and decide to send this fresh and empty skb. > > > > > > Sending an empty packet is not only silly, it might have caused > > > many issues we had in the past with tp->packets_out being > > > out of sync. > > > > > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.") > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > Cc: Christoph Paasch <cpaasch@apple.com> > > > Cc: Neal Cardwell <ncardwell@google.com> > > > Cc: Jason Baron <jbaron@akamai.com> > > > --- > > > net/ipv4/tcp_output.c | 8 ++++++++ > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > > > index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644 > > > --- a/net/ipv4/tcp_output.c > > > +++ b/net/ipv4/tcp_output.c > > > @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, > > > if (tcp_small_queue_check(sk, skb, 0)) > > > break; > > > > > > + /* Argh, we hit an empty skb(), presumably a thread > > > + * is sleeping in sendmsg()/sk_stream_wait_memory(). > > > + * We do not want to send a pure-ack packet and have > > > + * a strange looking rtx queue with empty packet(s). > > > + */ > > > + if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq) > > > + break; > > > + > > > if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp))) > > > break; > > > > > > -- > > > > Thanks for the fix, Eric! > > > > Is there any risk that any current or future bugs that create > > persistently empty skbs could cause the connection to "freeze", unable > > to reach the tcp_transmit_skb() call in tcp_write_xmit()? > > > > To avoid this risk, would it make sense to delete the empty skb and > > continue the tcp_write_xmit() transmit loop, rather than breaking out > > of the loop? > > This 'empty' skb must be the last in the queue. > > Removing it from the queue would not prevent tcp_write_xmit() from > breaking the loop. > > If we remove it, then we force sendmsg() to re-allocate a fresh skb, > which seems more work, and would paper around the bugs that would be > un-noticed. > > Another question to ask is if we need to reconsider how we use > tcp_write_queue_empty() as an indicator > for 'I have something in the write queue' > > This is obviously wrong if the write queue has a single empty skb. > > Maybe we should instead use tp->write_seq != tp->snd_nxt Also we use skb_queue_len(&sk->sk_write_queue) == 0 in tcp_sendmsg_locked() and this seems not good. We could have changed tcp_sendmsg_locked() to leave the empty skb in the write queue. ( and we can thus remove tcp_remove_empty_skb() completely since it has caused many problems in stable kernels) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 8a39ee79489192c02385aaadc8d1ae969fb55d23..9ba3294de9cb4e929afdc0e00b01b6b534c84af6 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1419,7 +1419,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) sock_zerocopy_put_abort(uarg, true); err = sk_stream_error(sk, flags, err); /* make sure we wake any epoll edge trigger waiter */ - if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && + if (unlikely(tp->write_seq == tp->snd_nxt && err == -EAGAIN)) { sk->sk_write_space(sk); tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED); ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net] tcp: do not send empty skb from tcp_write_xmit() 2019-12-11 19:43 ` Eric Dumazet @ 2019-12-11 20:04 ` Neal Cardwell 0 siblings, 0 replies; 6+ messages in thread From: Neal Cardwell @ 2019-12-11 20:04 UTC (permalink / raw) To: Eric Dumazet Cc: David S . Miller, netdev, Eric Dumazet, Christoph Paasch, Jason Baron, Yuchung Cheng, Soheil Hassas Yeganeh On Wed, Dec 11, 2019 at 2:44 PM Eric Dumazet <edumazet@google.com> wrote: > > On Wed, Dec 11, 2019 at 11:39 AM Eric Dumazet <edumazet@google.com> wrote: > > > > On Wed, Dec 11, 2019 at 11:17 AM Neal Cardwell <ncardwell@google.com> wrote: > > > > > > On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from > > > > write queue in error cases") in linux-4.14 stable triggered > > > > various bugs. One of them has been fixed in commit ba2ddb43f270 > > > > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but > > > > we still have crashes in some occasions. > > > > > > > > Root-cause is that when tcp_sendmsg() has allocated a fresh > > > > skb and could not append a fragment before being blocked > > > > in sk_stream_wait_memory(), tcp_write_xmit() might be called > > > > and decide to send this fresh and empty skb. > > > > > > > > Sending an empty packet is not only silly, it might have caused > > > > many issues we had in the past with tp->packets_out being > > > > out of sync. > > > > > > > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.") > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > > Cc: Christoph Paasch <cpaasch@apple.com> > > > > Cc: Neal Cardwell <ncardwell@google.com> > > > > Cc: Jason Baron <jbaron@akamai.com> > > > > --- > > > > net/ipv4/tcp_output.c | 8 ++++++++ > > > > 1 file changed, 8 insertions(+) > > > > > > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > > > > index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644 > > > > --- a/net/ipv4/tcp_output.c > > > > +++ b/net/ipv4/tcp_output.c > > > > @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, > > > > if (tcp_small_queue_check(sk, skb, 0)) > > > > break; > > > > > > > > + /* Argh, we hit an empty skb(), presumably a thread > > > > + * is sleeping in sendmsg()/sk_stream_wait_memory(). > > > > + * We do not want to send a pure-ack packet and have > > > > + * a strange looking rtx queue with empty packet(s). > > > > + */ > > > > + if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq) > > > > + break; > > > > + > > > > if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp))) > > > > break; > > > > > > > > -- > > > > > > Thanks for the fix, Eric! > > > > > > Is there any risk that any current or future bugs that create > > > persistently empty skbs could cause the connection to "freeze", unable > > > to reach the tcp_transmit_skb() call in tcp_write_xmit()? > > > > > > To avoid this risk, would it make sense to delete the empty skb and > > > continue the tcp_write_xmit() transmit loop, rather than breaking out > > > of the loop? > > > > This 'empty' skb must be the last in the queue. > > > > Removing it from the queue would not prevent tcp_write_xmit() from > > breaking the loop. > > > > If we remove it, then we force sendmsg() to re-allocate a fresh skb, > > which seems more work, and would paper around the bugs that would be > > un-noticed. > > > > Another question to ask is if we need to reconsider how we use > > tcp_write_queue_empty() as an indicator > > for 'I have something in the write queue' > > > > This is obviously wrong if the write queue has a single empty skb. > > > > Maybe we should instead use tp->write_seq != tp->snd_nxt > > Also we use skb_queue_len(&sk->sk_write_queue) == 0 in > tcp_sendmsg_locked() and this seems not good. > > We could have changed tcp_sendmsg_locked() to leave the empty skb in > the write queue. > ( and we can thus remove tcp_remove_empty_skb() completely since it > has caused many problems in stable kernels) > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index 8a39ee79489192c02385aaadc8d1ae969fb55d23..9ba3294de9cb4e929afdc0e00b01b6b534c84af6 > 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -1419,7 +1419,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct > msghdr *msg, size_t size) > sock_zerocopy_put_abort(uarg, true); > err = sk_stream_error(sk, flags, err); > /* make sure we wake any epoll edge trigger waiter */ > - if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && > + if (unlikely(tp->write_seq == tp->snd_nxt && > err == -EAGAIN)) { > sk->sk_write_space(sk); > tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED); Thanks, Eric. I like your idea to audit the TCP calls to tcp_write_queue_empty() and skb_queue_len(&sk->sk_write_queue) to see which ones should be replaced with comparisons of tp->write_seq and tp->snd_nxt (including this one). Good catch! neal ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net] tcp: do not send empty skb from tcp_write_xmit() 2019-12-11 7:34 [PATCH net] tcp: do not send empty skb from tcp_write_xmit() Eric Dumazet 2019-12-11 19:17 ` Neal Cardwell @ 2019-12-11 20:05 ` Neal Cardwell 1 sibling, 0 replies; 6+ messages in thread From: Neal Cardwell @ 2019-12-11 20:05 UTC (permalink / raw) To: Eric Dumazet Cc: David S . Miller, netdev, Eric Dumazet, Christoph Paasch, Jason Baron On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote: > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from > write queue in error cases") in linux-4.14 stable triggered > various bugs. One of them has been fixed in commit ba2ddb43f270 > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but > we still have crashes in some occasions. > > Root-cause is that when tcp_sendmsg() has allocated a fresh > skb and could not append a fragment before being blocked > in sk_stream_wait_memory(), tcp_write_xmit() might be called > and decide to send this fresh and empty skb. > > Sending an empty packet is not only silly, it might have caused > many issues we had in the past with tp->packets_out being > out of sync. > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Christoph Paasch <cpaasch@apple.com> > Cc: Neal Cardwell <ncardwell@google.com> > Cc: Jason Baron <jbaron@akamai.com> > --- Acked-by: Neal Cardwell <ncardwell@google.com> thanks, neal ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-12-11 20:05 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-12-11 7:34 [PATCH net] tcp: do not send empty skb from tcp_write_xmit() Eric Dumazet 2019-12-11 19:17 ` Neal Cardwell 2019-12-11 19:39 ` Eric Dumazet 2019-12-11 19:43 ` Eric Dumazet 2019-12-11 20:04 ` Neal Cardwell 2019-12-11 20:05 ` Neal Cardwell
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.