All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] tcp: fix SO_RCVLOWAT hangs with fat skbs
@ 2020-05-12 13:54 Eric Dumazet
  2020-05-12 18:10 ` Soheil Hassas Yeganeh
  2020-05-12 19:50 ` David Miller
  0 siblings, 2 replies; 3+ messages in thread
From: Eric Dumazet @ 2020-05-12 13:54 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

We autotune rcvbuf whenever SO_RCVLOWAT is set to account for 100%
overhead in tcp_set_rcvlowat()

This works well when skb->len/skb->truesize ratio is bigger than 0.5

But if we receive packets with small MSS, we can end up in a situation
where not enough bytes are available in the receive queue to satisfy
RCVLOWAT setting.
As our sk_rcvbuf limit is hit, we send zero windows in ACK packets,
preventing remote peer from sending more data.

Even autotuning does not help, because it only triggers at the time
user process drains the queue. If no EPOLLIN is generated, this
can not happen.

Note poll() has a similar issue, after commit
c7004482e8dc ("tcp: Respect SO_RCVLOWAT in tcp_poll().")

Fixes: 03f45c883c6f ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/tcp.h    | 13 +++++++++++++
 net/ipv4/tcp.c       | 14 +++++++++++---
 net/ipv4/tcp_input.c |  3 ++-
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 64f84683feaede11fa0789baad277e2b4655ec7a..6f8e60c6fbc746ea7ed2c2ddc97bffdbb7da4fc1 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1420,6 +1420,19 @@ static inline int tcp_full_space(const struct sock *sk)
 	return tcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf));
 }
 
+/* We provision sk_rcvbuf around 200% of sk_rcvlowat.
+ * If 87.5 % (7/8) of the space has been consumed, we want to override
+ * SO_RCVLOWAT constraint, since we are receiving skbs with too small
+ * len/truesize ratio.
+ */
+static inline bool tcp_rmem_pressure(const struct sock *sk)
+{
+	int rcvbuf = READ_ONCE(sk->sk_rcvbuf);
+	int threshold = rcvbuf - (rcvbuf >> 3);
+
+	return atomic_read(&sk->sk_rmem_alloc) > threshold;
+}
+
 extern void tcp_openreq_init_rwin(struct request_sock *req,
 				  const struct sock *sk_listener,
 				  const struct dst_entry *dst);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e72bd651d21acff036b856c1050bd86c31c468a0..a385fcaaa03beed9bfeabdebc12371e34e0649de 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -476,9 +476,17 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
 static inline bool tcp_stream_is_readable(const struct tcp_sock *tp,
 					  int target, struct sock *sk)
 {
-	return (READ_ONCE(tp->rcv_nxt) - READ_ONCE(tp->copied_seq) >= target) ||
-		(sk->sk_prot->stream_memory_read ?
-		sk->sk_prot->stream_memory_read(sk) : false);
+	int avail = READ_ONCE(tp->rcv_nxt) - READ_ONCE(tp->copied_seq);
+
+	if (avail > 0) {
+		if (avail >= target)
+			return true;
+		if (tcp_rmem_pressure(sk))
+			return true;
+	}
+	if (sk->sk_prot->stream_memory_read)
+		return sk->sk_prot->stream_memory_read(sk);
+	return false;
 }
 
 /*
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b996dc1069c53ead2e1f3552716a6cd427942afd..29c6fc8c7716881ec37ad08fbd3497747b9350fe 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4757,7 +4757,8 @@ void tcp_data_ready(struct sock *sk)
 	const struct tcp_sock *tp = tcp_sk(sk);
 	int avail = tp->rcv_nxt - tp->copied_seq;
 
-	if (avail < sk->sk_rcvlowat && !sock_flag(sk, SOCK_DONE))
+	if (avail < sk->sk_rcvlowat && !tcp_rmem_pressure(sk) &&
+	    !sock_flag(sk, SOCK_DONE))
 		return;
 
 	sk->sk_data_ready(sk);
-- 
2.26.2.645.ge9eca65c58-goog


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net] tcp: fix SO_RCVLOWAT hangs with fat skbs
  2020-05-12 13:54 [PATCH net] tcp: fix SO_RCVLOWAT hangs with fat skbs Eric Dumazet
@ 2020-05-12 18:10 ` Soheil Hassas Yeganeh
  2020-05-12 19:50 ` David Miller
  1 sibling, 0 replies; 3+ messages in thread
From: Soheil Hassas Yeganeh @ 2020-05-12 18:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, netdev, Eric Dumazet

On Tue, May 12, 2020 at 9:54 AM Eric Dumazet <edumazet@google.com> wrote:
>
> We autotune rcvbuf whenever SO_RCVLOWAT is set to account for 100%
> overhead in tcp_set_rcvlowat()
>
> This works well when skb->len/skb->truesize ratio is bigger than 0.5
>
> But if we receive packets with small MSS, we can end up in a situation
> where not enough bytes are available in the receive queue to satisfy
> RCVLOWAT setting.
> As our sk_rcvbuf limit is hit, we send zero windows in ACK packets,
> preventing remote peer from sending more data.
>
> Even autotuning does not help, because it only triggers at the time
> user process drains the queue. If no EPOLLIN is generated, this
> can not happen.
>
> Note poll() has a similar issue, after commit
> c7004482e8dc ("tcp: Respect SO_RCVLOWAT in tcp_poll().")
>
> Fixes: 03f45c883c6f ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Nice find! Thank you for the fix!

> ---
>  include/net/tcp.h    | 13 +++++++++++++
>  net/ipv4/tcp.c       | 14 +++++++++++---
>  net/ipv4/tcp_input.c |  3 ++-
>  3 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 64f84683feaede11fa0789baad277e2b4655ec7a..6f8e60c6fbc746ea7ed2c2ddc97bffdbb7da4fc1 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1420,6 +1420,19 @@ static inline int tcp_full_space(const struct sock *sk)
>         return tcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf));
>  }
>
> +/* We provision sk_rcvbuf around 200% of sk_rcvlowat.
> + * If 87.5 % (7/8) of the space has been consumed, we want to override
> + * SO_RCVLOWAT constraint, since we are receiving skbs with too small
> + * len/truesize ratio.
> + */
> +static inline bool tcp_rmem_pressure(const struct sock *sk)
> +{
> +       int rcvbuf = READ_ONCE(sk->sk_rcvbuf);
> +       int threshold = rcvbuf - (rcvbuf >> 3);
> +
> +       return atomic_read(&sk->sk_rmem_alloc) > threshold;
> +}
> +
>  extern void tcp_openreq_init_rwin(struct request_sock *req,
>                                   const struct sock *sk_listener,
>                                   const struct dst_entry *dst);
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e72bd651d21acff036b856c1050bd86c31c468a0..a385fcaaa03beed9bfeabdebc12371e34e0649de 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -476,9 +476,17 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>  static inline bool tcp_stream_is_readable(const struct tcp_sock *tp,
>                                           int target, struct sock *sk)
>  {
> -       return (READ_ONCE(tp->rcv_nxt) - READ_ONCE(tp->copied_seq) >= target) ||
> -               (sk->sk_prot->stream_memory_read ?
> -               sk->sk_prot->stream_memory_read(sk) : false);
> +       int avail = READ_ONCE(tp->rcv_nxt) - READ_ONCE(tp->copied_seq);
> +
> +       if (avail > 0) {
> +               if (avail >= target)
> +                       return true;
> +               if (tcp_rmem_pressure(sk))
> +                       return true;
> +       }
> +       if (sk->sk_prot->stream_memory_read)
> +               return sk->sk_prot->stream_memory_read(sk);
> +       return false;
>  }
>
>  /*
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index b996dc1069c53ead2e1f3552716a6cd427942afd..29c6fc8c7716881ec37ad08fbd3497747b9350fe 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4757,7 +4757,8 @@ void tcp_data_ready(struct sock *sk)
>         const struct tcp_sock *tp = tcp_sk(sk);
>         int avail = tp->rcv_nxt - tp->copied_seq;
>
> -       if (avail < sk->sk_rcvlowat && !sock_flag(sk, SOCK_DONE))
> +       if (avail < sk->sk_rcvlowat && !tcp_rmem_pressure(sk) &&
> +           !sock_flag(sk, SOCK_DONE))
>                 return;
>
>         sk->sk_data_ready(sk);
> --
> 2.26.2.645.ge9eca65c58-goog
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH net] tcp: fix SO_RCVLOWAT hangs with fat skbs
  2020-05-12 13:54 [PATCH net] tcp: fix SO_RCVLOWAT hangs with fat skbs Eric Dumazet
  2020-05-12 18:10 ` Soheil Hassas Yeganeh
@ 2020-05-12 19:50 ` David Miller
  1 sibling, 0 replies; 3+ messages in thread
From: David Miller @ 2020-05-12 19:50 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 12 May 2020 06:54:30 -0700

> We autotune rcvbuf whenever SO_RCVLOWAT is set to account for 100%
> overhead in tcp_set_rcvlowat()
> 
> This works well when skb->len/skb->truesize ratio is bigger than 0.5
> 
> But if we receive packets with small MSS, we can end up in a situation
> where not enough bytes are available in the receive queue to satisfy
> RCVLOWAT setting.
> As our sk_rcvbuf limit is hit, we send zero windows in ACK packets,
> preventing remote peer from sending more data.
> 
> Even autotuning does not help, because it only triggers at the time
> user process drains the queue. If no EPOLLIN is generated, this
> can not happen.
> 
> Note poll() has a similar issue, after commit
> c7004482e8dc ("tcp: Respect SO_RCVLOWAT in tcp_poll().")
> 
> Fixes: 03f45c883c6f ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks Eric.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-05-12 19:50 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-12 13:54 [PATCH net] tcp: fix SO_RCVLOWAT hangs with fat skbs Eric Dumazet
2020-05-12 18:10 ` Soheil Hassas Yeganeh
2020-05-12 19:50 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.