netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 net-next 0/2] tcp: tcp_info extensions
@ 2015-04-28 22:28 Eric Dumazet
  2015-04-28 22:28 ` [PATCH v2 net-next 1/2] tcp: add tcpi_bytes_acked to tcp_info Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Eric Dumazet @ 2015-04-28 22:28 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Yuchung Cheng, Matt Mathis,
	Eric Salo, Martin Lau, Chris Rapier

As discussed during Chris Rapier presentation in Ottawa / netdev0.1,
we add to tcp_info the first two fields are highly wanted.

Each field is added into a single patch for easy code review.

(Corresponding iproute2/ss patches will be sent)

Next fields will follow once consensus is reached.

v2: Addressed Yuchung feedback about FastOpen and bytes_received

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>

Eric Dumazet (2):
  tcp: add tcpi_bytes_acked to tcp_info
  tcp: add tcpi_bytes_received to tcp_info

 include/linux/tcp.h      |  8 ++++++++
 include/net/tcp.h        |  2 +-
 include/uapi/linux/tcp.h |  2 ++
 net/ipv4/tcp.c           |  7 ++++++-
 net/ipv4/tcp_input.c     | 30 ++++++++++++++++++++++++------
 5 files changed, 41 insertions(+), 8 deletions(-)

-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 net-next 1/2] tcp: add tcpi_bytes_acked to tcp_info
  2015-04-28 22:28 [PATCH v2 net-next 0/2] tcp: tcp_info extensions Eric Dumazet
@ 2015-04-28 22:28 ` Eric Dumazet
  2015-04-29 22:12   ` David Miller
  2015-04-28 22:28 ` [PATCH v2 net-next 2/2] tcp: add tcpi_bytes_received " Eric Dumazet
  2015-05-12 13:08 ` [PATCH v2 net-next 0/2] tcp: tcp_info extensions Marcelo Ricardo Leitner
  2 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2015-04-28 22:28 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Matt Mathis, Eric Salo,
	Martin Lau, Chris Rapier

This patch tracks total number of bytes acked for a TCP socket.
This is the sum of all changes done to tp->snd_una, and allows
for precise tracking of delivered data.

RFC4898 named this : tcpEStatsAppHCThruOctetsAcked

This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->bytes_acked was placed near tp->snd_una for
best data locality and minimal performance impact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
---
 include/linux/tcp.h      |  4 ++++
 include/net/tcp.h        |  2 +-
 include/uapi/linux/tcp.h |  1 +
 net/ipv4/tcp.c           |  6 +++++-
 net/ipv4/tcp_input.c     | 13 +++++++++++--
 5 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 0caa3a2d4106..0f73b43171da 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -150,6 +150,10 @@ struct tcp_sock {
 	u32	rcv_wup;	/* rcv_nxt on last window update sent	*/
  	u32	snd_nxt;	/* Next sequence we send		*/
 
+	u64	bytes_acked;	/* RFC4898 tcpEStatsAppHCThruOctetsAcked
+				 * sum(delta(snd_una)), or how many bytes
+				 * were acked.
+				 */
  	u32	snd_una;	/* First byte we want an ack for	*/
  	u32	snd_sml;	/* Last byte of the most recently transmitted small packet */
 	u32	rcv_tstamp;	/* timestamp of last received ACK (for keepalives) */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 051dc5c2802d..dd7b4ea6a10c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -576,7 +576,7 @@ static inline int tcp_bound_to_half_wnd(struct tcp_sock *tp, int pktsize)
 }
 
 /* tcp.c */
-void tcp_get_info(const struct sock *, struct tcp_info *);
+void tcp_get_info(struct sock *, struct tcp_info *);
 
 /* Read 'sendfile()'-style from a TCP socket */
 typedef int (*sk_read_actor_t)(read_descriptor_t *, struct sk_buff *,
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 3b9718328d8b..6666e98a0af9 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -189,6 +189,7 @@ struct tcp_info {
 
 	__u64	tcpi_pacing_rate;
 	__u64	tcpi_max_pacing_rate;
+	__u64	tcpi_bytes_acked; /* RFC4898 tcpEStatsAppHCThruOctetsAcked */
 };
 
 /* for TCP_MD5SIG socket option */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8c5cd9efebbc..4bf0e8ca7b5b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2592,7 +2592,7 @@ EXPORT_SYMBOL(compat_tcp_setsockopt);
 #endif
 
 /* Return information about state of tcp endpoint in API format. */
-void tcp_get_info(const struct sock *sk, struct tcp_info *info)
+void tcp_get_info(struct sock *sk, struct tcp_info *info)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -2663,6 +2663,10 @@ void tcp_get_info(const struct sock *sk, struct tcp_info *info)
 
 	rate = READ_ONCE(sk->sk_max_pacing_rate);
 	info->tcpi_max_pacing_rate = rate != ~0U ? rate : ~0ULL;
+
+	spin_lock_bh(&sk->sk_lock.slock);
+	info->tcpi_bytes_acked = tp->bytes_acked;
+	spin_unlock_bh(&sk->sk_lock.slock);
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3a4d9b34bed4..378d3f4d4dc3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3280,6 +3280,15 @@ static inline bool tcp_may_update_window(const struct tcp_sock *tp,
 		(ack_seq == tp->snd_wl1 && nwin > tp->snd_wnd);
 }
 
+/* If we update tp->snd_una, also update tp->bytes_acked */
+static void tcp_snd_una_update(struct tcp_sock *tp, u32 ack)
+{
+	u32 delta = ack - tp->snd_una;
+
+	tp->bytes_acked += delta;
+	tp->snd_una = ack;
+}
+
 /* Update our send window.
  *
  * Window update algorithm, described in RFC793/RFC1122 (used in linux-2.2
@@ -3315,7 +3324,7 @@ static int tcp_ack_update_window(struct sock *sk, const struct sk_buff *skb, u32
 		}
 	}
 
-	tp->snd_una = ack;
+	tcp_snd_una_update(tp, ack);
 
 	return flag;
 }
@@ -3497,7 +3506,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 		 * Note, we use the fact that SND.UNA>=SND.WL2.
 		 */
 		tcp_update_wl(tp, ack_seq);
-		tp->snd_una = ack;
+		tcp_snd_una_update(tp, ack);
 		flag |= FLAG_WIN_UPDATE;
 
 		tcp_in_ack_event(sk, CA_ACK_WIN_UPDATE);
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 net-next 2/2] tcp: add tcpi_bytes_received to tcp_info
  2015-04-28 22:28 [PATCH v2 net-next 0/2] tcp: tcp_info extensions Eric Dumazet
  2015-04-28 22:28 ` [PATCH v2 net-next 1/2] tcp: add tcpi_bytes_acked to tcp_info Eric Dumazet
@ 2015-04-28 22:28 ` Eric Dumazet
  2015-04-28 22:56   ` Yuchung Cheng
  2015-04-29 22:12   ` David Miller
  2015-05-12 13:08 ` [PATCH v2 net-next 0/2] tcp: tcp_info extensions Marcelo Ricardo Leitner
  2 siblings, 2 replies; 28+ messages in thread
From: Eric Dumazet @ 2015-04-28 22:28 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Yuchung Cheng, Matt Mathis,
	Eric Salo, Martin Lau, Chris Rapier

This patch tracks total number of payload bytes received on a TCP socket.
This is the sum of all changes done to tp->rcv_nxt

RFC4898 named this : tcpEStatsAppHCThruOctetsReceived

This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->bytes_received was placed near tp->rcv_nxt for
best data locality and minimal performance impact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
---
 include/linux/tcp.h      |  4 ++++
 include/uapi/linux/tcp.h |  1 +
 net/ipv4/tcp.c           |  1 +
 net/ipv4/tcp_fastopen.c  |  1 +
 net/ipv4/tcp_input.c     | 17 +++++++++++++----
 5 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 0f73b43171da..3b2911502a8c 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -145,6 +145,10 @@ struct tcp_sock {
  *	read the code and the spec side by side (and laugh ...)
  *	See RFC793 and RFC1122. The RFC writes these in capitals.
  */
+	u64	bytes_received;	/* RFC4898 tcpEStatsAppHCThruOctetsReceived
+				 * sum(delta(rcv_nxt)), or how many bytes
+				 * were acked.
+				 */
  	u32	rcv_nxt;	/* What we want to receive next 	*/
 	u32	copied_seq;	/* Head of yet unread data		*/
 	u32	rcv_wup;	/* rcv_nxt on last window update sent	*/
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 6666e98a0af9..a48f93f3207b 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -190,6 +190,7 @@ struct tcp_info {
 	__u64	tcpi_pacing_rate;
 	__u64	tcpi_max_pacing_rate;
 	__u64	tcpi_bytes_acked; /* RFC4898 tcpEStatsAppHCThruOctetsAcked */
+	__u64	tcpi_bytes_received; /* RFC4898 tcpEStatsAppHCThruOctetsReceived */
 };
 
 /* for TCP_MD5SIG socket option */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4bf0e8ca7b5b..99fcc0b22c92 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2666,6 +2666,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 
 	spin_lock_bh(&sk->sk_lock.slock);
 	info->tcpi_bytes_acked = tp->bytes_acked;
+	info->tcpi_bytes_received = tp->bytes_received;
 	spin_unlock_bh(&sk->sk_lock.slock);
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index e3d87aca6be8..b1b110d07816 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -206,6 +206,7 @@ static bool tcp_fastopen_create_child(struct sock *sk,
 			skb_set_owner_r(skb2, child);
 			__skb_queue_tail(&child->sk_receive_queue, skb2);
 			tp->syn_data_acked = 1;
+			tp->bytes_received = end_seq - TCP_SKB_CB(skb)->seq - 1;
 		} else {
 			end_seq = TCP_SKB_CB(skb)->seq + 1;
 		}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 378d3f4d4dc3..7e6962bcfc30 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3289,6 +3289,15 @@ static void tcp_snd_una_update(struct tcp_sock *tp, u32 ack)
 	tp->snd_una = ack;
 }
 
+/* If we update tp->rcv_nxt, also update tp->bytes_received */
+static void tcp_rcv_nxt_update(struct tcp_sock *tp, u32 seq)
+{
+	u32 delta = seq - tp->rcv_nxt;
+
+	tp->bytes_received += delta;
+	tp->rcv_nxt = seq;
+}
+
 /* Update our send window.
  *
  * Window update algorithm, described in RFC793/RFC1122 (used in linux-2.2
@@ -4245,7 +4254,7 @@ static void tcp_ofo_queue(struct sock *sk)
 
 		tail = skb_peek_tail(&sk->sk_receive_queue);
 		eaten = tail && tcp_try_coalesce(sk, tail, skb, &fragstolen);
-		tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
+		tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);
 		if (!eaten)
 			__skb_queue_tail(&sk->sk_receive_queue, skb);
 		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
@@ -4413,7 +4422,7 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int
 	__skb_pull(skb, hdrlen);
 	eaten = (tail &&
 		 tcp_try_coalesce(sk, tail, skb, fragstolen)) ? 1 : 0;
-	tcp_sk(sk)->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
+	tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);
 	if (!eaten) {
 		__skb_queue_tail(&sk->sk_receive_queue, skb);
 		skb_set_owner_r(skb, sk);
@@ -4506,7 +4515,7 @@ queue_and_out:
 
 			eaten = tcp_queue_rcv(sk, skb, 0, &fragstolen);
 		}
-		tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
+		tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);
 		if (skb->len)
 			tcp_event_data_recv(sk, skb);
 		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
@@ -5254,7 +5263,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 					tcp_rcv_rtt_measure_ts(sk, skb);
 
 					__skb_pull(skb, tcp_header_len);
-					tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
+					tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);
 					NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPHITSTOUSER);
 					eaten = 1;
 				}
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 net-next 2/2] tcp: add tcpi_bytes_received to tcp_info
  2015-04-28 22:28 ` [PATCH v2 net-next 2/2] tcp: add tcpi_bytes_received " Eric Dumazet
@ 2015-04-28 22:56   ` Yuchung Cheng
  2015-04-28 23:07     ` Eric Dumazet
  2015-04-29 22:12   ` David Miller
  1 sibling, 1 reply; 28+ messages in thread
From: Yuchung Cheng @ 2015-04-28 22:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, netdev, Eric Dumazet, Matt Mathis, Eric Salo,
	Martin Lau, Chris Rapier

On Tue, Apr 28, 2015 at 3:28 PM, Eric Dumazet <edumazet@google.com> wrote:
> This patch tracks total number of payload bytes received on a TCP socket.
> This is the sum of all changes done to tp->rcv_nxt
>
> RFC4898 named this : tcpEStatsAppHCThruOctetsReceived
>
> This is a 64bit field, and can be fetched both from TCP_INFO
> getsockopt() if one has a handle on a TCP socket, or from inet_diag
> netlink facility (iproute2/ss patch will follow)
>
> Note that tp->bytes_received was placed near tp->rcv_nxt for
> best data locality and minimal performance impact.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Matt Mathis <mattmathis@google.com>
> Cc: Eric Salo <salo@google.com>
> Cc: Martin Lau <kafai@fb.com>
> Cc: Chris Rapier <rapier@psc.edu>
Acked-by: Yuchung Cheng <ycheng@google.com>

tho I slightly prefer to call tcp_rcv_nxt_update() when rcv_nxt is
updated in TFO for consistency.

> ---
>  include/linux/tcp.h      |  4 ++++
>  include/uapi/linux/tcp.h |  1 +
>  net/ipv4/tcp.c           |  1 +
>  net/ipv4/tcp_fastopen.c  |  1 +
>  net/ipv4/tcp_input.c     | 17 +++++++++++++----
>  5 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index 0f73b43171da..3b2911502a8c 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -145,6 +145,10 @@ struct tcp_sock {
>   *     read the code and the spec side by side (and laugh ...)
>   *     See RFC793 and RFC1122. The RFC writes these in capitals.
>   */
> +       u64     bytes_received; /* RFC4898 tcpEStatsAppHCThruOctetsReceived
> +                                * sum(delta(rcv_nxt)), or how many bytes
> +                                * were acked.
> +                                */
>         u32     rcv_nxt;        /* What we want to receive next         */
>         u32     copied_seq;     /* Head of yet unread data              */
>         u32     rcv_wup;        /* rcv_nxt on last window update sent   */
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index 6666e98a0af9..a48f93f3207b 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -190,6 +190,7 @@ struct tcp_info {
>         __u64   tcpi_pacing_rate;
>         __u64   tcpi_max_pacing_rate;
>         __u64   tcpi_bytes_acked; /* RFC4898 tcpEStatsAppHCThruOctetsAcked */
> +       __u64   tcpi_bytes_received; /* RFC4898 tcpEStatsAppHCThruOctetsReceived */
>  };
>
>  /* for TCP_MD5SIG socket option */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 4bf0e8ca7b5b..99fcc0b22c92 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2666,6 +2666,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
>
>         spin_lock_bh(&sk->sk_lock.slock);
>         info->tcpi_bytes_acked = tp->bytes_acked;
> +       info->tcpi_bytes_received = tp->bytes_received;
>         spin_unlock_bh(&sk->sk_lock.slock);
>  }
>  EXPORT_SYMBOL_GPL(tcp_get_info);
> diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
> index e3d87aca6be8..b1b110d07816 100644
> --- a/net/ipv4/tcp_fastopen.c
> +++ b/net/ipv4/tcp_fastopen.c
> @@ -206,6 +206,7 @@ static bool tcp_fastopen_create_child(struct sock *sk,
>                         skb_set_owner_r(skb2, child);
>                         __skb_queue_tail(&child->sk_receive_queue, skb2);
>                         tp->syn_data_acked = 1;
> +                       tp->bytes_received = end_seq - TCP_SKB_CB(skb)->seq - 1;
>                 } else {
>                         end_seq = TCP_SKB_CB(skb)->seq + 1;
>                 }
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 378d3f4d4dc3..7e6962bcfc30 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3289,6 +3289,15 @@ static void tcp_snd_una_update(struct tcp_sock *tp, u32 ack)
>         tp->snd_una = ack;
>  }
>
> +/* If we update tp->rcv_nxt, also update tp->bytes_received */
> +static void tcp_rcv_nxt_update(struct tcp_sock *tp, u32 seq)
> +{
> +       u32 delta = seq - tp->rcv_nxt;
> +
> +       tp->bytes_received += delta;
> +       tp->rcv_nxt = seq;
> +}
> +
>  /* Update our send window.
>   *
>   * Window update algorithm, described in RFC793/RFC1122 (used in linux-2.2
> @@ -4245,7 +4254,7 @@ static void tcp_ofo_queue(struct sock *sk)
>
>                 tail = skb_peek_tail(&sk->sk_receive_queue);
>                 eaten = tail && tcp_try_coalesce(sk, tail, skb, &fragstolen);
> -               tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
> +               tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);
>                 if (!eaten)
>                         __skb_queue_tail(&sk->sk_receive_queue, skb);
>                 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
> @@ -4413,7 +4422,7 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int
>         __skb_pull(skb, hdrlen);
>         eaten = (tail &&
>                  tcp_try_coalesce(sk, tail, skb, fragstolen)) ? 1 : 0;
> -       tcp_sk(sk)->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
> +       tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);
>         if (!eaten) {
>                 __skb_queue_tail(&sk->sk_receive_queue, skb);
>                 skb_set_owner_r(skb, sk);
> @@ -4506,7 +4515,7 @@ queue_and_out:
>
>                         eaten = tcp_queue_rcv(sk, skb, 0, &fragstolen);
>                 }
> -               tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
> +               tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);
>                 if (skb->len)
>                         tcp_event_data_recv(sk, skb);
>                 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
> @@ -5254,7 +5263,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
>                                         tcp_rcv_rtt_measure_ts(sk, skb);
>
>                                         __skb_pull(skb, tcp_header_len);
> -                                       tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
> +                                       tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);
>                                         NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPHITSTOUSER);
>                                         eaten = 1;
>                                 }
> --
> 2.2.0.rc0.207.ga3a616c
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 net-next 2/2] tcp: add tcpi_bytes_received to tcp_info
  2015-04-28 22:56   ` Yuchung Cheng
@ 2015-04-28 23:07     ` Eric Dumazet
  0 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2015-04-28 23:07 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Eric Dumazet, David S. Miller, netdev, Matt Mathis, Eric Salo,
	Martin Lau, Chris Rapier

On Tue, 2015-04-28 at 15:56 -0700, Yuchung Cheng wrote:

> Acked-by: Yuchung Cheng <ycheng@google.com>
> 

Thanks !

> tho I slightly prefer to call tcp_rcv_nxt_update() when rcv_nxt is
> updated in TFO for consistency.

Right, but is the tp->rcv_nxt prior value even valid at this point ? :)

Anyway, this would need to make tcp_rcv_nxt_update() non static and
would convince compiler to not inline the code.

At this point, we can simply set tp->bytes_received.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 net-next 1/2] tcp: add tcpi_bytes_acked to tcp_info
  2015-04-28 22:28 ` [PATCH v2 net-next 1/2] tcp: add tcpi_bytes_acked to tcp_info Eric Dumazet
@ 2015-04-29 22:12   ` David Miller
  0 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2015-04-29 22:12 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, mattmathis, salo, kafai, rapier

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 28 Apr 2015 15:28:17 -0700

> This patch tracks total number of bytes acked for a TCP socket.
> This is the sum of all changes done to tp->snd_una, and allows
> for precise tracking of delivered data.
> 
> RFC4898 named this : tcpEStatsAppHCThruOctetsAcked
> 
> This is a 64bit field, and can be fetched both from TCP_INFO
> getsockopt() if one has a handle on a TCP socket, or from inet_diag
> netlink facility (iproute2/ss patch will follow)
> 
> Note that tp->bytes_acked was placed near tp->snd_una for
> best data locality and minimal performance impact.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>

Applied.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 net-next 2/2] tcp: add tcpi_bytes_received to tcp_info
  2015-04-28 22:28 ` [PATCH v2 net-next 2/2] tcp: add tcpi_bytes_received " Eric Dumazet
  2015-04-28 22:56   ` Yuchung Cheng
@ 2015-04-29 22:12   ` David Miller
  1 sibling, 0 replies; 28+ messages in thread
From: David Miller @ 2015-04-29 22:12 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, ycheng, mattmathis, salo, kafai, rapier

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 28 Apr 2015 15:28:18 -0700

> This patch tracks total number of payload bytes received on a TCP socket.
> This is the sum of all changes done to tp->rcv_nxt
> 
> RFC4898 named this : tcpEStatsAppHCThruOctetsReceived
> 
> This is a 64bit field, and can be fetched both from TCP_INFO
> getsockopt() if one has a handle on a TCP socket, or from inet_diag
> netlink facility (iproute2/ss patch will follow)
> 
> Note that tp->bytes_received was placed near tp->rcv_nxt for
> best data locality and minimal performance impact.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 net-next 0/2] tcp: tcp_info extensions
  2015-04-28 22:28 [PATCH v2 net-next 0/2] tcp: tcp_info extensions Eric Dumazet
  2015-04-28 22:28 ` [PATCH v2 net-next 1/2] tcp: add tcpi_bytes_acked to tcp_info Eric Dumazet
  2015-04-28 22:28 ` [PATCH v2 net-next 2/2] tcp: add tcpi_bytes_received " Eric Dumazet
@ 2015-05-12 13:08 ` Marcelo Ricardo Leitner
  2015-05-12 20:31   ` Eric Dumazet
  2 siblings, 1 reply; 28+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-05-12 13:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, netdev, Eric Dumazet, Yuchung Cheng,
	Matt Mathis, Eric Salo, Martin Lau, Chris Rapier

On Tue, Apr 28, 2015 at 03:28:16PM -0700, Eric Dumazet wrote:
> As discussed during Chris Rapier presentation in Ottawa / netdev0.1,
> we add to tcp_info the first two fields are highly wanted.
> 
> Each field is added into a single patch for easy code review.
> 
> (Corresponding iproute2/ss patches will be sent)
> 
> Next fields will follow once consensus is reached.

Is tcpEStatsPerfSegsOut amongst them? I have a need for this one and can
submit a patch if you're not tracking it already.

  Marcelo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 net-next 0/2] tcp: tcp_info extensions
  2015-05-12 13:08 ` [PATCH v2 net-next 0/2] tcp: tcp_info extensions Marcelo Ricardo Leitner
@ 2015-05-12 20:31   ` Eric Dumazet
  2015-05-12 22:21     ` Marcelo Ricardo Leitner
  2015-05-20 23:35     ` [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info Eric Dumazet
  0 siblings, 2 replies; 28+ messages in thread
From: Eric Dumazet @ 2015-05-12 20:31 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Eric Dumazet, David S. Miller, netdev, Yuchung Cheng,
	Matt Mathis, Craig Gallek, Martin Lau, Chris Rapier

On Tue, 2015-05-12 at 10:08 -0300, Marcelo Ricardo Leitner wrote:
> On Tue, Apr 28, 2015 at 03:28:16PM -0700, Eric Dumazet wrote:
> > As discussed during Chris Rapier presentation in Ottawa / netdev0.1,
> > we add to tcp_info the first two fields are highly wanted.
> > 
> > Each field is added into a single patch for easy code review.
> > 
> > (Corresponding iproute2/ss patches will be sent)
> > 
> > Next fields will follow once consensus is reached.
> 
> Is tcpEStatsPerfSegsOut amongst them? I have a need for this one and can
> submit a patch if you're not tracking it already.

Yes, along with tcpEStatsPerfSegsIn

Because tcp_info is aligned to 64bits, we cannot cook a patch simply
adding tcpEStatsPerfSegsOut (u32), otherwise we would introduce a hole.

If you can cook the patch adding tcpEStatsPerfSegsOut and
tcpEStatsPerfSegsIn, I would be very happy to review it.

Thanks !

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 net-next 0/2] tcp: tcp_info extensions
  2015-05-12 20:31   ` Eric Dumazet
@ 2015-05-12 22:21     ` Marcelo Ricardo Leitner
  2015-05-20 23:35     ` [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info Eric Dumazet
  1 sibling, 0 replies; 28+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-05-12 22:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, netdev, Yuchung Cheng, Matt Mathis,
	Craig Gallek, Martin Lau, Chris Rapier

On Tue, May 12, 2015 at 01:31:53PM -0700, Eric Dumazet wrote:
> On Tue, 2015-05-12 at 10:08 -0300, Marcelo Ricardo Leitner wrote:
> > On Tue, Apr 28, 2015 at 03:28:16PM -0700, Eric Dumazet wrote:
> > > As discussed during Chris Rapier presentation in Ottawa / netdev0.1,
> > > we add to tcp_info the first two fields are highly wanted.
> > > 
> > > Each field is added into a single patch for easy code review.
> > > 
> > > (Corresponding iproute2/ss patches will be sent)
> > > 
> > > Next fields will follow once consensus is reached.
> > 
> > Is tcpEStatsPerfSegsOut amongst them? I have a need for this one and can
> > submit a patch if you're not tracking it already.
> 
> Yes, along with tcpEStatsPerfSegsIn
> 
> Because tcp_info is aligned to 64bits, we cannot cook a patch simply
> adding tcpEStatsPerfSegsOut (u32), otherwise we would introduce a hole.

Oh ok, thanks for pointing this out.

> If you can cook the patch adding tcpEStatsPerfSegsOut and
> tcpEStatsPerfSegsIn, I would be very happy to review it.

Cool, will do it, thanks!

  Marcelo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
  2015-05-12 20:31   ` Eric Dumazet
  2015-05-12 22:21     ` Marcelo Ricardo Leitner
@ 2015-05-20 23:35     ` Eric Dumazet
  2015-05-21  0:06       ` Rick Jones
                         ` (2 more replies)
  1 sibling, 3 replies; 28+ messages in thread
From: Eric Dumazet @ 2015-05-20 23:35 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Eric Dumazet, David S. Miller, netdev, Yuchung Cheng,
	Matt Mathis, Craig Gallek, Martin Lau, Chris Rapier

From: Marcelo Ricardo Leitner <mleitner@redhat.com>

This patch tracks the total number of inbound and outbound segments on a
TCP socket. One may use this number to have an idea on connection
quality when compared against the retransmissions.

RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut

These are a 32bit field each and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->segs_out was placed near tp->snd_nxt for good data
locality and minimal performance impact, while tp->segs_in was placed
near tp->bytes_received for the same reason.

Join work with Eric Dumazet.

Note that received SYN are accounted on the listener, but sent SYNACK 
are not accounted.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/tcp.h      |    7 ++++++-
 include/uapi/linux/tcp.h |    4 +++-
 net/ipv4/tcp.c           |    2 ++
 net/ipv4/tcp_ipv4.c      |    1 +
 net/ipv4/tcp_minisocks.c |    1 +
 net/ipv4/tcp_output.c    |    1 +
 net/ipv6/tcp_ipv6.c      |    1 +
 7 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index e6fb5df22db1fb3a2a902581d958e6d4881b399b..f0212026c77fc1d74db96c0312fe9892f56c2a64 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -149,11 +149,16 @@ struct tcp_sock {
 				 * sum(delta(rcv_nxt)), or how many bytes
 				 * were acked.
 				 */
+	u32	segs_in;	/* RFC4898 tcpEStatsPerfSegsIn
+				 * total number of segments in.
+				 */
  	u32	rcv_nxt;	/* What we want to receive next 	*/
 	u32	copied_seq;	/* Head of yet unread data		*/
 	u32	rcv_wup;	/* rcv_nxt on last window update sent	*/
  	u32	snd_nxt;	/* Next sequence we send		*/
-
+	u32	segs_out;	/* RFC4898 tcpEStatsPerfSegsOut
+				 * The total number of segments sent.
+				 */
 	u64	bytes_acked;	/* RFC4898 tcpEStatsAppHCThruOctetsAcked
 				 * sum(delta(snd_una)), or how many bytes
 				 * were acked.
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 51ebedba577f36e75b9aefd1cdf9e191f47f734f..65a77b071e22bec39225799e808b44b35bb1910c 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -192,8 +192,10 @@ struct tcp_info {
 
 	__u64	tcpi_pacing_rate;
 	__u64	tcpi_max_pacing_rate;
-	__u64	tcpi_bytes_acked; /* RFC4898 tcpEStatsAppHCThruOctetsAcked */
+	__u64	tcpi_bytes_acked;    /* RFC4898 tcpEStatsAppHCThruOctetsAcked */
 	__u64	tcpi_bytes_received; /* RFC4898 tcpEStatsAppHCThruOctetsReceived */
+	__u32	tcpi_segs_out;	     /* RFC4898 tcpEStatsPerfSegsOut */
+	__u32	tcpi_segs_in;	     /* RFC4898 tcpEStatsPerfSegsIn */
 };
 
 /* for TCP_MD5SIG socket option */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bb9bb844204f9f0cf9197fe323f287dce5e5bbd9..f283aba62cf313651f54f0e63448e89e4bafa689 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2686,6 +2686,8 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 	spin_lock_bh(&sk->sk_lock.slock);
 	info->tcpi_bytes_acked = tp->bytes_acked;
 	info->tcpi_bytes_received = tp->bytes_received;
+	info->tcpi_segs_out = tp->segs_out;
+	info->tcpi_segs_in = tp->segs_in;
 	spin_unlock_bh(&sk->sk_lock.slock);
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0cc4b5a630cd49c60ea0767debf4ee171f41ad3e..feb875769b8d57dcdb85e12b782f3f5e0fb6193a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1626,6 +1626,7 @@ process:
 	skb->dev = NULL;
 
 	bh_lock_sock_nested(sk);
+	tcp_sk(sk)->segs_in += max_t(u16, 1, skb_shinfo(skb)->gso_segs);
 	ret = 0;
 	if (!sock_owned_by_user(sk)) {
 		if (!tcp_prequeue(sk, skb))
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index ebe2ab2596ed3c2fbd27ecb3bf6f6b66c9e06e08..b62d15c8694679ed8688ab628f7edeb0b065dfe8 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -448,6 +448,7 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
 
 		newtp->rcv_wup = newtp->copied_seq =
 		newtp->rcv_nxt = treq->rcv_isn + 1;
+		newtp->segs_in = 0;
 
 		newtp->snd_sml = newtp->snd_una =
 		newtp->snd_nxt = newtp->snd_up = treq->snt_isn + 1;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e29d43b5a0bb46305b0ab4af29bd05a61abd522d..e19594ac540a5c6091c43afc30c7510287260968 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1027,6 +1027,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 		TCP_ADD_STATS(sock_net(sk), TCP_MIB_OUTSEGS,
 			      tcp_skb_pcount(skb));
 
+	tp->segs_out += tcp_skb_pcount(skb);
 	/* OK, its time to fill skb_shinfo(skb)->gso_segs */
 	skb_shinfo(skb)->gso_segs = tcp_skb_pcount(skb);
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index b6575d6655681e8e84993a5db929c7309d47d4d3..beac6bf840b9a9d1e2f281d2b1c71b5a3414b824 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1421,6 +1421,7 @@ process:
 	skb->dev = NULL;
 
 	bh_lock_sock_nested(sk);
+	tcp_sk(sk)->segs_in += max_t(u16, 1, skb_shinfo(skb)->gso_segs);
 	ret = 0;
 	if (!sock_owned_by_user(sk)) {
 		if (!tcp_prequeue(sk, skb))

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
  2015-05-20 23:35     ` [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info Eric Dumazet
@ 2015-05-21  0:06       ` Rick Jones
  2015-05-21  0:37         ` Eric Dumazet
  2015-05-21 19:41       ` Yuchung Cheng
  2015-05-22  3:25       ` David Miller
  2 siblings, 1 reply; 28+ messages in thread
From: Rick Jones @ 2015-05-21  0:06 UTC (permalink / raw)
  To: Eric Dumazet, Marcelo Ricardo Leitner
  Cc: Eric Dumazet, David S. Miller, netdev, Yuchung Cheng,
	Matt Mathis, Craig Gallek, Martin Lau, Chris Rapier

On 05/20/2015 04:35 PM, Eric Dumazet wrote:
> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
>
> This patch tracks the total number of inbound and outbound segments on a
> TCP socket. One may use this number to have an idea on connection
> quality when compared against the retransmissions.
>
> RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut
>
> These are a 32bit field each and can be fetched both from TCP_INFO
> getsockopt() if one has a handle on a TCP socket, or from inet_diag
> netlink facility (iproute2/ss patch will follow)

I suppose it is far-fetched, but is it a concern that at 100 Gbit/s and 
1500 byte MTU the 32 bit segment counter would wrap in something like 
500 seconds and change?

rick jones

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
  2015-05-21  0:06       ` Rick Jones
@ 2015-05-21  0:37         ` Eric Dumazet
  2015-05-21  0:48           ` Rick Jones
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2015-05-21  0:37 UTC (permalink / raw)
  To: Rick Jones
  Cc: Marcelo Ricardo Leitner, Eric Dumazet, David S. Miller, netdev,
	Yuchung Cheng, Matt Mathis, Craig Gallek, Martin Lau,
	Chris Rapier

On Wed, 2015-05-20 at 17:06 -0700, Rick Jones wrote:
> On 05/20/2015 04:35 PM, Eric Dumazet wrote:
> > From: Marcelo Ricardo Leitner <mleitner@redhat.com>
> >
> > This patch tracks the total number of inbound and outbound segments on a
> > TCP socket. One may use this number to have an idea on connection
> > quality when compared against the retransmissions.
> >
> > RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut
> >
> > These are a 32bit field each and can be fetched both from TCP_INFO
> > getsockopt() if one has a handle on a TCP socket, or from inet_diag
> > netlink facility (iproute2/ss patch will follow)
> 
> I suppose it is far-fetched, but is it a concern that at 100 Gbit/s and 
> 1500 byte MTU the 32 bit segment counter would wrap in something like 
> 500 seconds and change?


Like SNMP counters, applications are responsible to adapt snapshot
intervals and deals with wraps around.

For these counters, applications are probably not interested in absolute
numbers, but deltas every couple of seconds to study the behavior of a
flow during time.

The byte counters are 64bits and shall not overflow.

Anyway, if we can send tcp data at 100Gbits on one flow, I guess we are
doing a terrific job and do not need to tweak TCP stack anymore ;)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
  2015-05-21  0:37         ` Eric Dumazet
@ 2015-05-21  0:48           ` Rick Jones
  2015-05-21  0:52             ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Rick Jones @ 2015-05-21  0:48 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marcelo Ricardo Leitner, Eric Dumazet, David S. Miller, netdev,
	Yuchung Cheng, Matt Mathis, Craig Gallek, Martin Lau,
	Chris Rapier

On 05/20/2015 05:37 PM, Eric Dumazet wrote:

> Anyway, if we can send tcp data at 100Gbits on one flow, I guess we are
> doing a terrific job and do not need to tweak TCP stack anymore ;)

:)

rick

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
  2015-05-21  0:48           ` Rick Jones
@ 2015-05-21  0:52             ` Eric Dumazet
  2015-05-21 12:38               ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2015-05-21  0:52 UTC (permalink / raw)
  To: Rick Jones
  Cc: Marcelo Ricardo Leitner, Eric Dumazet, David S. Miller, netdev,
	Yuchung Cheng, Matt Mathis, Craig Gallek, Martin Lau,
	Chris Rapier

On Wed, 2015-05-20 at 17:48 -0700, Rick Jones wrote:
> On 05/20/2015 05:37 PM, Eric Dumazet wrote:
> 
> > Anyway, if we can send tcp data at 100Gbits on one flow, I guess we are
> > doing a terrific job and do not need to tweak TCP stack anymore ;)
> 
> :)
> 

Note that I have no particular strong feelings with these counters being
32 or 64bits. Its adding 8 extra bytes and a bit more overhead, nothing
more.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
  2015-05-21  0:52             ` Eric Dumazet
@ 2015-05-21 12:38               ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 28+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-05-21 12:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Rick Jones, Eric Dumazet, David S. Miller, netdev, Yuchung Cheng,
	Matt Mathis, Craig Gallek, Martin Lau, Chris Rapier

On Wed, May 20, 2015 at 05:52:34PM -0700, Eric Dumazet wrote:
> On Wed, 2015-05-20 at 17:48 -0700, Rick Jones wrote:
> > On 05/20/2015 05:37 PM, Eric Dumazet wrote:
> > 
> > > Anyway, if we can send tcp data at 100Gbits on one flow, I guess we are
> > > doing a terrific job and do not need to tweak TCP stack anymore ;)
> > 
> > :)
> > 
> 
> Note that I have no particular strong feelings with these counters being
> 32 or 64bits. Its adding 8 extra bytes and a bit more overhead, nothing
> more.

And if they really need an absolute number, 500s is still pretty much
enough for polling these and adding the delta to some local u64 var.

  Marcelo
  

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
  2015-05-20 23:35     ` [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info Eric Dumazet
  2015-05-21  0:06       ` Rick Jones
@ 2015-05-21 19:41       ` Yuchung Cheng
  2015-05-21 20:06         ` Eric Dumazet
  2015-05-21 21:53         ` Marcelo Ricardo Leitner
  2015-05-22  3:25       ` David Miller
  2 siblings, 2 replies; 28+ messages in thread
From: Yuchung Cheng @ 2015-05-21 19:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marcelo Ricardo Leitner, Eric Dumazet, David S. Miller, netdev,
	Matt Mathis, Craig Gallek, Martin Lau, Chris Rapier

On Wed, May 20, 2015 at 4:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
>
> This patch tracks the total number of inbound and outbound segments on a
> TCP socket. One may use this number to have an idea on connection
> quality when compared against the retransmissions.
>
> RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut
>
> These are a 32bit field each and can be fetched both from TCP_INFO
> getsockopt() if one has a handle on a TCP socket, or from inet_diag
> netlink facility (iproute2/ss patch will follow)
>
> Note that tp->segs_out was placed near tp->snd_nxt for good data
> locality and minimal performance impact, while tp->segs_in was placed
> near tp->bytes_received for the same reason.
>
> Join work with Eric Dumazet.
>
> Note that received SYN are accounted on the listener, but sent SYNACK
> are not accounted.
>
> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/linux/tcp.h      |    7 ++++++-
>  include/uapi/linux/tcp.h |    4 +++-
>  net/ipv4/tcp.c           |    2 ++
>  net/ipv4/tcp_ipv4.c      |    1 +
>  net/ipv4/tcp_minisocks.c |    1 +
>  net/ipv4/tcp_output.c    |    1 +
>  net/ipv6/tcp_ipv6.c      |    1 +
>  7 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index e6fb5df22db1fb3a2a902581d958e6d4881b399b..f0212026c77fc1d74db96c0312fe9892f56c2a64 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -149,11 +149,16 @@ struct tcp_sock {
>                                  * sum(delta(rcv_nxt)), or how many bytes
>                                  * were acked.
>                                  */
> +       u32     segs_in;        /* RFC4898 tcpEStatsPerfSegsIn
> +                                * total number of segments in.
> +                                */
>         u32     rcv_nxt;        /* What we want to receive next         */
>         u32     copied_seq;     /* Head of yet unread data              */
>         u32     rcv_wup;        /* rcv_nxt on last window update sent   */
>         u32     snd_nxt;        /* Next sequence we send                */
> -
> +       u32     segs_out;       /* RFC4898 tcpEStatsPerfSegsOut
> +                                * The total number of segments sent.
> +                                */
>         u64     bytes_acked;    /* RFC4898 tcpEStatsAppHCThruOctetsAcked
>                                  * sum(delta(snd_una)), or how many bytes
>                                  * were acked.
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index 51ebedba577f36e75b9aefd1cdf9e191f47f734f..65a77b071e22bec39225799e808b44b35bb1910c 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -192,8 +192,10 @@ struct tcp_info {
>
>         __u64   tcpi_pacing_rate;
>         __u64   tcpi_max_pacing_rate;
> -       __u64   tcpi_bytes_acked; /* RFC4898 tcpEStatsAppHCThruOctetsAcked */
> +       __u64   tcpi_bytes_acked;    /* RFC4898 tcpEStatsAppHCThruOctetsAcked */
>         __u64   tcpi_bytes_received; /* RFC4898 tcpEStatsAppHCThruOctetsReceived */
> +       __u32   tcpi_segs_out;       /* RFC4898 tcpEStatsPerfSegsOut */
> +       __u32   tcpi_segs_in;        /* RFC4898 tcpEStatsPerfSegsIn */
>  };
>
>  /* for TCP_MD5SIG socket option */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index bb9bb844204f9f0cf9197fe323f287dce5e5bbd9..f283aba62cf313651f54f0e63448e89e4bafa689 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2686,6 +2686,8 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
>         spin_lock_bh(&sk->sk_lock.slock);
>         info->tcpi_bytes_acked = tp->bytes_acked;
>         info->tcpi_bytes_received = tp->bytes_received;
> +       info->tcpi_segs_out = tp->segs_out;
> +       info->tcpi_segs_in = tp->segs_in;
>         spin_unlock_bh(&sk->sk_lock.slock);
>  }
>  EXPORT_SYMBOL_GPL(tcp_get_info);
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 0cc4b5a630cd49c60ea0767debf4ee171f41ad3e..feb875769b8d57dcdb85e12b782f3f5e0fb6193a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1626,6 +1626,7 @@ process:
>         skb->dev = NULL;
>
>         bh_lock_sock_nested(sk);
> +       tcp_sk(sk)->segs_in += max_t(u16, 1, skb_shinfo(skb)->gso_segs);
>         ret = 0;
>         if (!sock_owned_by_user(sk)) {
>                 if (!tcp_prequeue(sk, skb))
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index ebe2ab2596ed3c2fbd27ecb3bf6f6b66c9e06e08..b62d15c8694679ed8688ab628f7edeb0b065dfe8 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -448,6 +448,7 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
>
>                 newtp->rcv_wup = newtp->copied_seq =
>                 newtp->rcv_nxt = treq->rcv_isn + 1;
> +               newtp->segs_in = 0;
It'd be nice to count SYN and SYNACKs for apps tracking the handshake stats.
For syn-cookies we can't do much. But for the rest we can account
req->num_retrans for SYN-ACKs sent, and perhaps track SYN received in
request sock?

>
>                 newtp->snd_sml = newtp->snd_una =
>                 newtp->snd_nxt = newtp->snd_up = treq->snt_isn + 1;
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index e29d43b5a0bb46305b0ab4af29bd05a61abd522d..e19594ac540a5c6091c43afc30c7510287260968 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1027,6 +1027,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
>                 TCP_ADD_STATS(sock_net(sk), TCP_MIB_OUTSEGS,
>                               tcp_skb_pcount(skb));
>
> +       tp->segs_out += tcp_skb_pcount(skb);
>         /* OK, its time to fill skb_shinfo(skb)->gso_segs */
>         skb_shinfo(skb)->gso_segs = tcp_skb_pcount(skb);
>
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index b6575d6655681e8e84993a5db929c7309d47d4d3..beac6bf840b9a9d1e2f281d2b1c71b5a3414b824 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -1421,6 +1421,7 @@ process:
>         skb->dev = NULL;
>
>         bh_lock_sock_nested(sk);
> +       tcp_sk(sk)->segs_in += max_t(u16, 1, skb_shinfo(skb)->gso_segs);
some helper like e.g. tcp_event_pkt_recv() for v4/v6?
i am curious why use max instead of the ternary op?

>         ret = 0;
>         if (!sock_owned_by_user(sk)) {
>                 if (!tcp_prequeue(sk, skb))
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
  2015-05-21 19:41       ` Yuchung Cheng
@ 2015-05-21 20:06         ` Eric Dumazet
  2015-05-21 21:53         ` Marcelo Ricardo Leitner
  1 sibling, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2015-05-21 20:06 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Marcelo Ricardo Leitner, Eric Dumazet, David S. Miller, netdev,
	Matt Mathis, Craig Gallek, Martin Lau, Chris Rapier

On Thu, 2015-05-21 at 12:41 -0700, Yuchung Cheng wrote:

> It'd be nice to count SYN and SYNACKs for apps tracking the handshake stats.
> For syn-cookies we can't do much. But for the rest we can account
> req->num_retrans for SYN-ACKs sent, and perhaps track SYN received in
> request sock?

Well, this is partly because we can not guarantee the behavior that I
chose to explicitly state that we were not tracking SYNACK for passive
sessions.

Also TCP_DEFER_ACCEPT comes into play here...

If we want to handle this in a deterministic way, we would have to add
some bit to warn the application reading tcp_info that the session was
initiated after a valid syncookie was received, and that initial value
of tcp_info.tcpi_segs_{in|out} can be wrong.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
  2015-05-21 19:41       ` Yuchung Cheng
  2015-05-21 20:06         ` Eric Dumazet
@ 2015-05-21 21:53         ` Marcelo Ricardo Leitner
  1 sibling, 0 replies; 28+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-05-21 21:53 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Eric Dumazet, Eric Dumazet, David S. Miller, netdev, Matt Mathis,
	Craig Gallek, Martin Lau, Chris Rapier

On Thu, May 21, 2015 at 12:41:21PM -0700, Yuchung Cheng wrote:
> On Wed, May 20, 2015 at 4:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
...
> > diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> > index b6575d6655681e8e84993a5db929c7309d47d4d3..beac6bf840b9a9d1e2f281d2b1c71b5a3414b824 100644
> > --- a/net/ipv6/tcp_ipv6.c
> > +++ b/net/ipv6/tcp_ipv6.c
> > @@ -1421,6 +1421,7 @@ process:
> >         skb->dev = NULL;
> >
> >         bh_lock_sock_nested(sk);
> > +       tcp_sk(sk)->segs_in += max_t(u16, 1, skb_shinfo(skb)->gso_segs);
> some helper like e.g. tcp_event_pkt_recv() for v4/v6?
> i am curious why use max instead of the ternary op?

Just because it was based this line, nothing else:

ip_rcv()
   ...
        IP_ADD_STATS_BH(dev_net(dev),                                     
                        IPSTATS_MIB_NOECTPKTS + (iph->tos & INET_ECN_MASK),
                        max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs));

Including these new lines, this calculation is being done 5 times now,
being just these 2 on tcp, then ip_input.c, ip_output and
ip_tunnel_core.c. I'd rather wait some more to add a helper as coming to
a reasonable name is hard at this point. For iptunnel, it's used at
iptunnel_xmit() for example. Or perhaps you have another suggestion of
a name?

Thanks,
Marcelo

> >         ret = 0;
> >         if (!sock_owned_by_user(sk)) {
> >                 if (!tcp_prequeue(sk, skb))
> >
> >
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
  2015-05-20 23:35     ` [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info Eric Dumazet
  2015-05-21  0:06       ` Rick Jones
  2015-05-21 19:41       ` Yuchung Cheng
@ 2015-05-22  3:25       ` David Miller
  2015-05-22  3:43         ` Eric Dumazet
  2 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2015-05-22  3:25 UTC (permalink / raw)
  To: eric.dumazet
  Cc: mleitner, edumazet, netdev, ycheng, mattmathis, cgallek, kafai, rapier

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 May 2015 16:35:41 -0700

> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
> 
> This patch tracks the total number of inbound and outbound segments on a
> TCP socket. One may use this number to have an idea on connection
> quality when compared against the retransmissions.
> 
> RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut
> 
> These are a 32bit field each and can be fetched both from TCP_INFO
> getsockopt() if one has a handle on a TCP socket, or from inet_diag
> netlink facility (iproute2/ss patch will follow)
> 
> Note that tp->segs_out was placed near tp->snd_nxt for good data
> locality and minimal performance impact, while tp->segs_in was placed
> near tp->bytes_received for the same reason.
> 
> Join work with Eric Dumazet.
> 
> Note that received SYN are accounted on the listener, but sent SYNACK 
> are not accounted.
> 
> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied to net-next, thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
  2015-05-22  3:25       ` David Miller
@ 2015-05-22  3:43         ` Eric Dumazet
  2015-05-22  4:51           ` [PATCH net-next] tcp: fix a potential deadlock in tcp_get_info() Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2015-05-22  3:43 UTC (permalink / raw)
  To: David Miller
  Cc: mleitner, edumazet, netdev, ycheng, mattmathis, cgallek, kafai, rapier

On Thu, 2015-05-21 at 23:25 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 20 May 2015 16:35:41 -0700
> 
> > From: Marcelo Ricardo Leitner <mleitner@redhat.com>
> > 
> > This patch tracks the total number of inbound and outbound segments on a
> > TCP socket. One may use this number to have an idea on connection
> > quality when compared against the retransmissions.
> > 
> > RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut
> > 
> > These are a 32bit field each and can be fetched both from TCP_INFO
> > getsockopt() if one has a handle on a TCP socket, or from inet_diag
> > netlink facility (iproute2/ss patch will follow)
> > 
> > Note that tp->segs_out was placed near tp->snd_nxt for good data
> > locality and minimal performance impact, while tp->segs_in was placed
> > near tp->bytes_received for the same reason.
> > 
> > Join work with Eric Dumazet.
> > 
> > Note that received SYN are accounted on the listener, but sent SYNACK 
> > are not accounted.
> > 
> > Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> Applied to net-next, thanks.

Thanks David.

I'll send a fix for the stuff I added earlier, as the spin_lock_bh() in
get_tcp_info() can deadlock since the caller inet_diag_dump_icsk() might
hold the &hashinfo->ehash_locks[i]

I need to instead use
u64_stats_fetch_begin_irq()/u64_stats_fetch_retry_irq()

(and as a bonus optimize 64bit arches)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH net-next] tcp: fix a potential deadlock in tcp_get_info()
  2015-05-22  3:43         ` Eric Dumazet
@ 2015-05-22  4:51           ` Eric Dumazet
  2015-05-22 17:50             ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2015-05-22  4:51 UTC (permalink / raw)
  To: David Miller
  Cc: mleitner, edumazet, netdev, ycheng, mattmathis, cgallek, kafai

From: Eric Dumazet <edumazet@google.com>

Taking socket spinlock in tcp_get_info() can deadlock, as
inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
while packet processing can use the reverse locking order.

We could avoid this locking for TCP_LISTEN states, but lockdep would
certainly get confused as all TCP sockets share same lockdep classes.

[  523.722504] ======================================================
[  523.728706] [ INFO: possible circular locking dependency detected ]
[  523.734990] 4.1.0-dbg-DEV #1676 Not tainted
[  523.739202] -------------------------------------------------------
[  523.745474] ss/18032 is trying to acquire lock:
[  523.750002]  (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360
[  523.758129]
[  523.758129] but task is already holding lock:
[  523.763968]  (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
[  523.774661]
[  523.774661] which lock already depends on the new lock.
[  523.774661]
[  523.782850]
[  523.782850] the existing dependency chain (in reverse order) is:
[  523.790326]
-> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
[  523.796599]        [<ffffffff811126bb>] lock_acquire+0xbb/0x270
[  523.802565]        [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50
[  523.808628]        [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
[  523.815273]        [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
[  523.822067]        [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500
[  523.828199]        [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
[  523.834331]        [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
[  523.840202]        [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
[  523.847214]        [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
[  523.853440]        [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
[  523.859624]        [<ffffffff81659db7>] ip_rcv+0x307/0x420

Lets use u64_sync infrastructure instead. As a bonus, 64bit
arches get optimized, as these are nop for them.

Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/tcp.h     |    2 ++
 net/ipv4/tcp.c          |   12 ++++++++----
 net/ipv4/tcp_fastopen.c |    4 ++++
 net/ipv4/tcp_input.c    |    4 ++++
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index f0212026c77f..48c3696e8645 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -163,6 +163,8 @@ struct tcp_sock {
 				 * sum(delta(snd_una)), or how many bytes
 				 * were acked.
 				 */
+	struct u64_stats_sync syncp; /* protects 64bit vars (cf tcp_get_info()) */
+
  	u32	snd_una;	/* First byte we want an ack for	*/
  	u32	snd_sml;	/* Last byte of the most recently transmitted small packet */
 	u32	rcv_tstamp;	/* timestamp of last received ACK (for keepalives) */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7f3e721b9e69..7bab7dd130bf 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -402,6 +402,7 @@ void tcp_init_sock(struct sock *sk)
 	tp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
 	tp->snd_cwnd_clamp = ~0;
 	tp->mss_cache = TCP_MSS_DEFAULT;
+	u64_stats_init(&tp->syncp);
 
 	tp->reordering = sysctl_tcp_reordering;
 	tcp_enable_early_retrans(tp);
@@ -2625,6 +2626,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 	const struct tcp_sock *tp = tcp_sk(sk);
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	u32 now = tcp_time_stamp;
+	unsigned int start;
 	u32 rate;
 
 	memset(info, 0, sizeof(*info));
@@ -2692,12 +2694,14 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 	rate = READ_ONCE(sk->sk_max_pacing_rate);
 	info->tcpi_max_pacing_rate = rate != ~0U ? rate : ~0ULL;
 
-	spin_lock_bh(&sk->sk_lock.slock);
-	info->tcpi_bytes_acked = tp->bytes_acked;
-	info->tcpi_bytes_received = tp->bytes_received;
+	do {
+		start = u64_stats_fetch_begin_irq(&tp->syncp);
+		info->tcpi_bytes_acked = tp->bytes_acked;
+		info->tcpi_bytes_received = tp->bytes_received;
+	} while (u64_stats_fetch_retry_irq(&tp->syncp, start));
+
 	info->tcpi_segs_out = tp->segs_out;
 	info->tcpi_segs_in = tp->segs_in;
-	spin_unlock_bh(&sk->sk_lock.slock);
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
 
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 3c673d5e6cff..46b087a27503 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -206,6 +206,10 @@ static bool tcp_fastopen_create_child(struct sock *sk,
 			skb_set_owner_r(skb2, child);
 			__skb_queue_tail(&child->sk_receive_queue, skb2);
 			tp->syn_data_acked = 1;
+
+			/* u64_stats_update_begin(&tp->syncp) not needed here,
+			 * as we certainly are not changing upper 32bit value (0)
+			 */
 			tp->bytes_received = end_seq - TCP_SKB_CB(skb)->seq - 1;
 		} else {
 			end_seq = TCP_SKB_CB(skb)->seq + 1;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 40c435997e54..23bfd5492a32 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3281,7 +3281,9 @@ static void tcp_snd_una_update(struct tcp_sock *tp, u32 ack)
 {
 	u32 delta = ack - tp->snd_una;
 
+	u64_stats_update_begin(&tp->syncp);
 	tp->bytes_acked += delta;
+	u64_stats_update_end(&tp->syncp);
 	tp->snd_una = ack;
 }
 
@@ -3290,7 +3292,9 @@ static void tcp_rcv_nxt_update(struct tcp_sock *tp, u32 seq)
 {
 	u32 delta = seq - tp->rcv_nxt;
 
+	u64_stats_update_begin(&tp->syncp);
 	tp->bytes_received += delta;
+	u64_stats_update_end(&tp->syncp);
 	tp->rcv_nxt = seq;
 }
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: fix a potential deadlock in tcp_get_info()
  2015-05-22  4:51           ` [PATCH net-next] tcp: fix a potential deadlock in tcp_get_info() Eric Dumazet
@ 2015-05-22 17:50             ` David Miller
  2015-05-22 17:56               ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2015-05-22 17:50 UTC (permalink / raw)
  To: eric.dumazet
  Cc: mleitner, edumazet, netdev, ycheng, mattmathis, cgallek, kafai

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 21 May 2015 21:51:19 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Taking socket spinlock in tcp_get_info() can deadlock, as
> inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
> while packet processing can use the reverse locking order.
> 
> We could avoid this locking for TCP_LISTEN states, but lockdep would
> certainly get confused as all TCP sockets share same lockdep classes.
 ...
> Lets use u64_sync infrastructure instead. As a bonus, 64bit
> arches get optimized, as these are nop for them.
> 
> Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

This bug exists in 'net' so I've applied it there.

Thanks Eric.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: fix a potential deadlock in tcp_get_info()
  2015-05-22 17:50             ` David Miller
@ 2015-05-22 17:56               ` Eric Dumazet
  2015-05-22 18:03                 ` Eric Dumazet
  2015-05-22 18:12                 ` David Miller
  0 siblings, 2 replies; 28+ messages in thread
From: Eric Dumazet @ 2015-05-22 17:56 UTC (permalink / raw)
  To: David Miller
  Cc: mleitner, edumazet, netdev, ycheng, mattmathis, cgallek, kafai

On Fri, 2015-05-22 at 13:50 -0400, David Miller wrote:

> This bug exists in 'net' so I've applied it there.
> 
> Thanks Eric.

Oh thats right, sorry for this, as you'll probably have a conflict when
merging net into net-next.

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: fix a potential deadlock in tcp_get_info()
  2015-05-22 17:56               ` Eric Dumazet
@ 2015-05-22 18:03                 ` Eric Dumazet
  2015-05-22 18:08                   ` Eric Dumazet
  2015-05-22 18:12                 ` David Miller
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2015-05-22 18:03 UTC (permalink / raw)
  To: David Miller
  Cc: mleitner, edumazet, netdev, ycheng, mattmathis, cgallek, kafai

On Fri, 2015-05-22 at 10:56 -0700, Eric Dumazet wrote:
> On Fri, 2015-05-22 at 13:50 -0400, David Miller wrote:
> 
> > This bug exists in 'net' so I've applied it there.
> > 
> > Thanks Eric.
> 
> Oh thats right, sorry for this, as you'll probably have a conflict when
> merging net into net-next.

Also I am wondering if not explicitly include <linux/u64_stats_sync.h>
could break one arch...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: fix a potential deadlock in tcp_get_info()
  2015-05-22 18:03                 ` Eric Dumazet
@ 2015-05-22 18:08                   ` Eric Dumazet
  2015-05-22 18:21                     ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2015-05-22 18:08 UTC (permalink / raw)
  To: David Miller
  Cc: mleitner, edumazet, netdev, ycheng, mattmathis, cgallek, kafai

On Fri, 2015-05-22 at 11:03 -0700, Eric Dumazet wrote:

> Also I am wondering if not explicitly include <linux/u64_stats_sync.h>
> could break one arch...

Sorry for the false alarm, we should be good because of snmp.h

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: fix a potential deadlock in tcp_get_info()
  2015-05-22 17:56               ` Eric Dumazet
  2015-05-22 18:03                 ` Eric Dumazet
@ 2015-05-22 18:12                 ` David Miller
  1 sibling, 0 replies; 28+ messages in thread
From: David Miller @ 2015-05-22 18:12 UTC (permalink / raw)
  To: eric.dumazet
  Cc: mleitner, edumazet, netdev, ycheng, mattmathis, cgallek, kafai

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 22 May 2015 10:56:36 -0700

> On Fri, 2015-05-22 at 13:50 -0400, David Miller wrote:
> 
>> This bug exists in 'net' so I've applied it there.
>> 
>> Thanks Eric.
> 
> Oh thats right, sorry for this, as you'll probably have a conflict when
> merging net into net-next.

I know, but thanks for the heads up.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] tcp: fix a potential deadlock in tcp_get_info()
  2015-05-22 18:08                   ` Eric Dumazet
@ 2015-05-22 18:21                     ` David Miller
  0 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2015-05-22 18:21 UTC (permalink / raw)
  To: eric.dumazet
  Cc: mleitner, edumazet, netdev, ycheng, mattmathis, cgallek, kafai

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 22 May 2015 11:08:39 -0700

> On Fri, 2015-05-22 at 11:03 -0700, Eric Dumazet wrote:
> 
>> Also I am wondering if not explicitly include <linux/u64_stats_sync.h>
>> could break one arch...
> 
> Sorry for the false alarm, we should be good because of snmp.h

Right.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2015-05-22 18:21 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-28 22:28 [PATCH v2 net-next 0/2] tcp: tcp_info extensions Eric Dumazet
2015-04-28 22:28 ` [PATCH v2 net-next 1/2] tcp: add tcpi_bytes_acked to tcp_info Eric Dumazet
2015-04-29 22:12   ` David Miller
2015-04-28 22:28 ` [PATCH v2 net-next 2/2] tcp: add tcpi_bytes_received " Eric Dumazet
2015-04-28 22:56   ` Yuchung Cheng
2015-04-28 23:07     ` Eric Dumazet
2015-04-29 22:12   ` David Miller
2015-05-12 13:08 ` [PATCH v2 net-next 0/2] tcp: tcp_info extensions Marcelo Ricardo Leitner
2015-05-12 20:31   ` Eric Dumazet
2015-05-12 22:21     ` Marcelo Ricardo Leitner
2015-05-20 23:35     ` [PATCH net-next] tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info Eric Dumazet
2015-05-21  0:06       ` Rick Jones
2015-05-21  0:37         ` Eric Dumazet
2015-05-21  0:48           ` Rick Jones
2015-05-21  0:52             ` Eric Dumazet
2015-05-21 12:38               ` Marcelo Ricardo Leitner
2015-05-21 19:41       ` Yuchung Cheng
2015-05-21 20:06         ` Eric Dumazet
2015-05-21 21:53         ` Marcelo Ricardo Leitner
2015-05-22  3:25       ` David Miller
2015-05-22  3:43         ` Eric Dumazet
2015-05-22  4:51           ` [PATCH net-next] tcp: fix a potential deadlock in tcp_get_info() Eric Dumazet
2015-05-22 17:50             ` David Miller
2015-05-22 17:56               ` Eric Dumazet
2015-05-22 18:03                 ` Eric Dumazet
2015-05-22 18:08                   ` Eric Dumazet
2015-05-22 18:21                     ` David Miller
2015-05-22 18:12                 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).