netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback
@ 2019-07-01 20:48 Stanislav Fomichev
  2019-07-01 20:48 ` [PATCH bpf-next 1/8] bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT Stanislav Fomichev
                   ` (9 more replies)
  0 siblings, 10 replies; 17+ messages in thread
From: Stanislav Fomichev @ 2019-07-01 20:48 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
	Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh

Congestion control team would like to have a periodic callback to
track some TCP statistics. Let's add a sock_ops callback that can be
selectively enabled on a socket by socket basis and is executed for
every RTT. BPF program frequency can be further controlled by calling
bpf_ktime_get_ns and bailing out early.

I run neper tcp_stream and tcp_rr tests with the sample program
from the last patch and didn't observe any noticeable performance
difference.

Suggested-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>

Stanislav Fomichev (8):
  bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT
  bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation
  bpf: add dsack_dups/delivered{,_ce} to bpf_tcp_sock
  bpf: add icsk_retransmits to bpf_tcp_sock
  bpf/tools: sync bpf.h
  selftests/bpf: test BPF_SOCK_OPS_RTT_CB
  samples/bpf: add sample program that periodically dumps TCP stats
  samples/bpf: fix tcp_bpf.readme detach command

 include/net/tcp.h                           |   8 +
 include/uapi/linux/bpf.h                    |  12 +-
 net/core/filter.c                           | 207 +++++++++++-----
 net/ipv4/tcp_input.c                        |   4 +
 samples/bpf/Makefile                        |   1 +
 samples/bpf/tcp_bpf.readme                  |   2 +-
 samples/bpf/tcp_dumpstats_kern.c            |  65 +++++
 tools/include/uapi/linux/bpf.h              |  12 +-
 tools/testing/selftests/bpf/Makefile        |   3 +-
 tools/testing/selftests/bpf/progs/tcp_rtt.c |  61 +++++
 tools/testing/selftests/bpf/test_tcp_rtt.c  | 253 ++++++++++++++++++++
 11 files changed, 570 insertions(+), 58 deletions(-)
 create mode 100644 samples/bpf/tcp_dumpstats_kern.c
 create mode 100644 tools/testing/selftests/bpf/progs/tcp_rtt.c
 create mode 100644 tools/testing/selftests/bpf/test_tcp_rtt.c

-- 
2.22.0.410.gd8fdbe21b5-goog

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH bpf-next 1/8] bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT
  2019-07-01 20:48 [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
@ 2019-07-01 20:48 ` Stanislav Fomichev
  2019-07-01 20:48 ` [PATCH bpf-next 2/8] bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation Stanislav Fomichev
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Stanislav Fomichev @ 2019-07-01 20:48 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
	Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh

Performance impact should be minimal because it's under a new
BPF_SOCK_OPS_RTT_CB_FLAG flag that has to be explicitly enabled.

Suggested-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/net/tcp.h        | 8 ++++++++
 include/uapi/linux/bpf.h | 6 +++++-
 net/ipv4/tcp_input.c     | 4 ++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9d36cc88d043..e16d8a3fd3b4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2221,6 +2221,14 @@ static inline bool tcp_bpf_ca_needs_ecn(struct sock *sk)
 	return (tcp_call_bpf(sk, BPF_SOCK_OPS_NEEDS_ECN, 0, NULL) == 1);
 }
 
+static inline void tcp_bpf_rtt(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RTT_CB_FLAG))
+		tcp_call_bpf(sk, BPF_SOCK_OPS_RTT_CB, 0, NULL);
+}
+
 #if IS_ENABLED(CONFIG_SMC)
 extern struct static_key_false tcp_have_smc;
 #endif
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index cffea1826a1f..9cdd0aaeba06 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1770,6 +1770,7 @@ union bpf_attr {
  * 		* **BPF_SOCK_OPS_RTO_CB_FLAG** (retransmission time out)
  * 		* **BPF_SOCK_OPS_RETRANS_CB_FLAG** (retransmission)
  * 		* **BPF_SOCK_OPS_STATE_CB_FLAG** (TCP state change)
+ * 		* **BPF_SOCK_OPS_RTT_CB_FLAG** (every RTT)
  *
  * 		Therefore, this function can be used to clear a callback flag by
  * 		setting the appropriate bit to zero. e.g. to disable the RTO
@@ -3314,7 +3315,8 @@ struct bpf_sock_ops {
 #define BPF_SOCK_OPS_RTO_CB_FLAG	(1<<0)
 #define BPF_SOCK_OPS_RETRANS_CB_FLAG	(1<<1)
 #define BPF_SOCK_OPS_STATE_CB_FLAG	(1<<2)
-#define BPF_SOCK_OPS_ALL_CB_FLAGS       0x7		/* Mask of all currently
+#define BPF_SOCK_OPS_RTT_CB_FLAG	(1<<3)
+#define BPF_SOCK_OPS_ALL_CB_FLAGS       0xF		/* Mask of all currently
 							 * supported cb flags
 							 */
 
@@ -3369,6 +3371,8 @@ enum {
 	BPF_SOCK_OPS_TCP_LISTEN_CB,	/* Called on listen(2), right after
 					 * socket transition to LISTEN state.
 					 */
+	BPF_SOCK_OPS_RTT_CB,		/* Called on every RTT.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b71efeb0ae5b..c21e8a22fb3b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -778,6 +778,8 @@ static void tcp_rtt_estimator(struct sock *sk, long mrtt_us)
 				tp->rttvar_us -= (tp->rttvar_us - tp->mdev_max_us) >> 2;
 			tp->rtt_seq = tp->snd_nxt;
 			tp->mdev_max_us = tcp_rto_min_us(sk);
+
+			tcp_bpf_rtt(sk);
 		}
 	} else {
 		/* no previous measure. */
@@ -786,6 +788,8 @@ static void tcp_rtt_estimator(struct sock *sk, long mrtt_us)
 		tp->rttvar_us = max(tp->mdev_us, tcp_rto_min_us(sk));
 		tp->mdev_max_us = tp->rttvar_us;
 		tp->rtt_seq = tp->snd_nxt;
+
+		tcp_bpf_rtt(sk);
 	}
 	tp->srtt_us = max(1U, srtt);
 }
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf-next 2/8] bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation
  2019-07-01 20:48 [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
  2019-07-01 20:48 ` [PATCH bpf-next 1/8] bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT Stanislav Fomichev
@ 2019-07-01 20:48 ` Stanislav Fomichev
  2019-07-01 20:48 ` [PATCH bpf-next 3/8] bpf: add dsack_dups/delivered{,_ce} to bpf_tcp_sock Stanislav Fomichev
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Stanislav Fomichev @ 2019-07-01 20:48 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
	Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh

We've added bpf_tcp_sock member to bpf_sock_ops and don't expect
any new tcp_sock fields in bpf_sock_ops. Let's remove
CONVERT_COMMON_TCP_SOCK_FIELDS so bpf_tcp_sock can be independently
extended.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 net/core/filter.c | 180 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 126 insertions(+), 54 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 4836264f82ee..ad908526545d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5194,54 +5194,6 @@ static const struct bpf_func_proto bpf_lwt_seg6_adjust_srh_proto = {
 };
 #endif /* CONFIG_IPV6_SEG6_BPF */
 
-#define CONVERT_COMMON_TCP_SOCK_FIELDS(md_type, CONVERT)		\
-do {									\
-	switch (si->off) {						\
-	case offsetof(md_type, snd_cwnd):				\
-		CONVERT(snd_cwnd); break;				\
-	case offsetof(md_type, srtt_us):				\
-		CONVERT(srtt_us); break;				\
-	case offsetof(md_type, snd_ssthresh):				\
-		CONVERT(snd_ssthresh); break;				\
-	case offsetof(md_type, rcv_nxt):				\
-		CONVERT(rcv_nxt); break;				\
-	case offsetof(md_type, snd_nxt):				\
-		CONVERT(snd_nxt); break;				\
-	case offsetof(md_type, snd_una):				\
-		CONVERT(snd_una); break;				\
-	case offsetof(md_type, mss_cache):				\
-		CONVERT(mss_cache); break;				\
-	case offsetof(md_type, ecn_flags):				\
-		CONVERT(ecn_flags); break;				\
-	case offsetof(md_type, rate_delivered):				\
-		CONVERT(rate_delivered); break;				\
-	case offsetof(md_type, rate_interval_us):			\
-		CONVERT(rate_interval_us); break;			\
-	case offsetof(md_type, packets_out):				\
-		CONVERT(packets_out); break;				\
-	case offsetof(md_type, retrans_out):				\
-		CONVERT(retrans_out); break;				\
-	case offsetof(md_type, total_retrans):				\
-		CONVERT(total_retrans); break;				\
-	case offsetof(md_type, segs_in):				\
-		CONVERT(segs_in); break;				\
-	case offsetof(md_type, data_segs_in):				\
-		CONVERT(data_segs_in); break;				\
-	case offsetof(md_type, segs_out):				\
-		CONVERT(segs_out); break;				\
-	case offsetof(md_type, data_segs_out):				\
-		CONVERT(data_segs_out); break;				\
-	case offsetof(md_type, lost_out):				\
-		CONVERT(lost_out); break;				\
-	case offsetof(md_type, sacked_out):				\
-		CONVERT(sacked_out); break;				\
-	case offsetof(md_type, bytes_received):				\
-		CONVERT(bytes_received); break;				\
-	case offsetof(md_type, bytes_acked):				\
-		CONVERT(bytes_acked); break;				\
-	}								\
-} while (0)
-
 #ifdef CONFIG_INET
 static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple,
 			      int dif, int sdif, u8 family, u8 proto)
@@ -5623,9 +5575,6 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
 				      offsetof(struct tcp_sock, FIELD)); \
 	} while (0)
 
-	CONVERT_COMMON_TCP_SOCK_FIELDS(struct bpf_tcp_sock,
-				       BPF_TCP_SOCK_GET_COMMON);
-
 	if (insn > insn_buf)
 		return insn - insn_buf;
 
@@ -5640,6 +5589,69 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
 				      offsetof(struct tcp_sock, rtt_min) +
 				      offsetof(struct minmax_sample, v));
 		break;
+	case offsetof(struct bpf_tcp_sock, snd_cwnd):
+		BPF_TCP_SOCK_GET_COMMON(snd_cwnd);
+		break;
+	case offsetof(struct bpf_tcp_sock, srtt_us):
+		BPF_TCP_SOCK_GET_COMMON(srtt_us);
+		break;
+	case offsetof(struct bpf_tcp_sock, snd_ssthresh):
+		BPF_TCP_SOCK_GET_COMMON(snd_ssthresh);
+		break;
+	case offsetof(struct bpf_tcp_sock, rcv_nxt):
+		BPF_TCP_SOCK_GET_COMMON(rcv_nxt);
+		break;
+	case offsetof(struct bpf_tcp_sock, snd_nxt):
+		BPF_TCP_SOCK_GET_COMMON(snd_nxt);
+		break;
+	case offsetof(struct bpf_tcp_sock, snd_una):
+		BPF_TCP_SOCK_GET_COMMON(snd_una);
+		break;
+	case offsetof(struct bpf_tcp_sock, mss_cache):
+		BPF_TCP_SOCK_GET_COMMON(mss_cache);
+		break;
+	case offsetof(struct bpf_tcp_sock, ecn_flags):
+		BPF_TCP_SOCK_GET_COMMON(ecn_flags);
+		break;
+	case offsetof(struct bpf_tcp_sock, rate_delivered):
+		BPF_TCP_SOCK_GET_COMMON(rate_delivered);
+		break;
+	case offsetof(struct bpf_tcp_sock, rate_interval_us):
+		BPF_TCP_SOCK_GET_COMMON(rate_interval_us);
+		break;
+	case offsetof(struct bpf_tcp_sock, packets_out):
+		BPF_TCP_SOCK_GET_COMMON(packets_out);
+		break;
+	case offsetof(struct bpf_tcp_sock, retrans_out):
+		BPF_TCP_SOCK_GET_COMMON(retrans_out);
+		break;
+	case offsetof(struct bpf_tcp_sock, total_retrans):
+		BPF_TCP_SOCK_GET_COMMON(total_retrans);
+		break;
+	case offsetof(struct bpf_tcp_sock, segs_in):
+		BPF_TCP_SOCK_GET_COMMON(segs_in);
+		break;
+	case offsetof(struct bpf_tcp_sock, data_segs_in):
+		BPF_TCP_SOCK_GET_COMMON(data_segs_in);
+		break;
+	case offsetof(struct bpf_tcp_sock, segs_out):
+		BPF_TCP_SOCK_GET_COMMON(segs_out);
+		break;
+	case offsetof(struct bpf_tcp_sock, data_segs_out):
+		BPF_TCP_SOCK_GET_COMMON(data_segs_out);
+		break;
+	case offsetof(struct bpf_tcp_sock, lost_out):
+		BPF_TCP_SOCK_GET_COMMON(lost_out);
+		break;
+	case offsetof(struct bpf_tcp_sock, sacked_out):
+		BPF_TCP_SOCK_GET_COMMON(sacked_out);
+		break;
+	case offsetof(struct bpf_tcp_sock, bytes_received):
+		BPF_TCP_SOCK_GET_COMMON(bytes_received);
+		break;
+	case offsetof(struct bpf_tcp_sock, bytes_acked):
+		BPF_TCP_SOCK_GET_COMMON(bytes_acked);
+		break;
 	}
 
 	return insn - insn_buf;
@@ -7913,9 +7925,6 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 			SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ);	      \
 	} while (0)
 
-	CONVERT_COMMON_TCP_SOCK_FIELDS(struct bpf_sock_ops,
-				       SOCK_OPS_GET_TCP_SOCK_FIELD);
-
 	if (insn > insn_buf)
 		return insn - insn_buf;
 
@@ -8085,6 +8094,69 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 		SOCK_OPS_GET_OR_SET_FIELD(sk_txhash, sk_txhash,
 					  struct sock, type);
 		break;
+	case offsetof(struct bpf_sock_ops, snd_cwnd):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(snd_cwnd);
+		break;
+	case offsetof(struct bpf_sock_ops, srtt_us):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(srtt_us);
+		break;
+	case offsetof(struct bpf_sock_ops, snd_ssthresh):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(snd_ssthresh);
+		break;
+	case offsetof(struct bpf_sock_ops, rcv_nxt):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(rcv_nxt);
+		break;
+	case offsetof(struct bpf_sock_ops, snd_nxt):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(snd_nxt);
+		break;
+	case offsetof(struct bpf_sock_ops, snd_una):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(snd_una);
+		break;
+	case offsetof(struct bpf_sock_ops, mss_cache):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(mss_cache);
+		break;
+	case offsetof(struct bpf_sock_ops, ecn_flags):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(ecn_flags);
+		break;
+	case offsetof(struct bpf_sock_ops, rate_delivered):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(rate_delivered);
+		break;
+	case offsetof(struct bpf_sock_ops, rate_interval_us):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(rate_interval_us);
+		break;
+	case offsetof(struct bpf_sock_ops, packets_out):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(packets_out);
+		break;
+	case offsetof(struct bpf_sock_ops, retrans_out):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(retrans_out);
+		break;
+	case offsetof(struct bpf_sock_ops, total_retrans):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(total_retrans);
+		break;
+	case offsetof(struct bpf_sock_ops, segs_in):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(segs_in);
+		break;
+	case offsetof(struct bpf_sock_ops, data_segs_in):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(data_segs_in);
+		break;
+	case offsetof(struct bpf_sock_ops, segs_out):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(segs_out);
+		break;
+	case offsetof(struct bpf_sock_ops, data_segs_out):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(data_segs_out);
+		break;
+	case offsetof(struct bpf_sock_ops, lost_out):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(lost_out);
+		break;
+	case offsetof(struct bpf_sock_ops, sacked_out):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(sacked_out);
+		break;
+	case offsetof(struct bpf_sock_ops, bytes_received):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(bytes_received);
+		break;
+	case offsetof(struct bpf_sock_ops, bytes_acked):
+		SOCK_OPS_GET_TCP_SOCK_FIELD(bytes_acked);
+		break;
 	case offsetof(struct bpf_sock_ops, sk):
 		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
 						struct bpf_sock_ops_kern,
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf-next 3/8] bpf: add dsack_dups/delivered{,_ce} to bpf_tcp_sock
  2019-07-01 20:48 [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
  2019-07-01 20:48 ` [PATCH bpf-next 1/8] bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT Stanislav Fomichev
  2019-07-01 20:48 ` [PATCH bpf-next 2/8] bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation Stanislav Fomichev
@ 2019-07-01 20:48 ` Stanislav Fomichev
  2019-07-01 20:48 ` [PATCH bpf-next 4/8] bpf: add icsk_retransmits " Stanislav Fomichev
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Stanislav Fomichev @ 2019-07-01 20:48 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
	Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh

Add more fields to bpf_tcp_sock that might be useful for debugging
congestion control issues.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/uapi/linux/bpf.h |  5 +++++
 net/core/filter.c        | 11 ++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9cdd0aaeba06..bfb0b1a76684 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3073,6 +3073,11 @@ struct bpf_tcp_sock {
 				 * sum(delta(snd_una)), or how many bytes
 				 * were acked.
 				 */
+	__u32 dsack_dups;	/* RFC4898 tcpEStatsStackDSACKDups
+				 * total number of DSACK blocks received
+				 */
+	__u32 delivered;	/* Total data packets delivered incl. rexmits */
+	__u32 delivered_ce;	/* Like the above but only ECE marked packets */
 };
 
 struct bpf_sock_tuple {
diff --git a/net/core/filter.c b/net/core/filter.c
index ad908526545d..3da4b6c38b46 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5544,7 +5544,7 @@ static const struct bpf_func_proto bpf_sock_addr_sk_lookup_udp_proto = {
 bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type,
 				  struct bpf_insn_access_aux *info)
 {
-	if (off < 0 || off >= offsetofend(struct bpf_tcp_sock, bytes_acked))
+	if (off < 0 || off >= offsetofend(struct bpf_tcp_sock, delivered_ce))
 		return false;
 
 	if (off % size != 0)
@@ -5652,6 +5652,15 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
 	case offsetof(struct bpf_tcp_sock, bytes_acked):
 		BPF_TCP_SOCK_GET_COMMON(bytes_acked);
 		break;
+	case offsetof(struct bpf_tcp_sock, dsack_dups):
+		BPF_TCP_SOCK_GET_COMMON(dsack_dups);
+		break;
+	case offsetof(struct bpf_tcp_sock, delivered):
+		BPF_TCP_SOCK_GET_COMMON(delivered);
+		break;
+	case offsetof(struct bpf_tcp_sock, delivered_ce):
+		BPF_TCP_SOCK_GET_COMMON(delivered_ce);
+		break;
 	}
 
 	return insn - insn_buf;
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf-next 4/8] bpf: add icsk_retransmits to bpf_tcp_sock
  2019-07-01 20:48 [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
                   ` (2 preceding siblings ...)
  2019-07-01 20:48 ` [PATCH bpf-next 3/8] bpf: add dsack_dups/delivered{,_ce} to bpf_tcp_sock Stanislav Fomichev
@ 2019-07-01 20:48 ` Stanislav Fomichev
  2019-07-01 20:48 ` [PATCH bpf-next 5/8] bpf/tools: sync bpf.h Stanislav Fomichev
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Stanislav Fomichev @ 2019-07-01 20:48 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
	Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh

Add some inet_connection_sock fields to bpf_tcp_sock that might be useful
for debugging congestion control issues.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/uapi/linux/bpf.h |  1 +
 net/core/filter.c        | 20 +++++++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index bfb0b1a76684..ead27aebf491 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3078,6 +3078,7 @@ struct bpf_tcp_sock {
 				 */
 	__u32 delivered;	/* Total data packets delivered incl. rexmits */
 	__u32 delivered_ce;	/* Like the above but only ECE marked packets */
+	__u32 icsk_retransmits;	/* Number of unrecovered [RTO] timeouts */
 };
 
 struct bpf_sock_tuple {
diff --git a/net/core/filter.c b/net/core/filter.c
index 3da4b6c38b46..089aaea0ccc6 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5544,7 +5544,8 @@ static const struct bpf_func_proto bpf_sock_addr_sk_lookup_udp_proto = {
 bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type,
 				  struct bpf_insn_access_aux *info)
 {
-	if (off < 0 || off >= offsetofend(struct bpf_tcp_sock, delivered_ce))
+	if (off < 0 || off >= offsetofend(struct bpf_tcp_sock,
+					  icsk_retransmits))
 		return false;
 
 	if (off % size != 0)
@@ -5575,6 +5576,20 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
 				      offsetof(struct tcp_sock, FIELD)); \
 	} while (0)
 
+#define BPF_INET_SOCK_GET_COMMON(FIELD)					\
+	do {								\
+		BUILD_BUG_ON(FIELD_SIZEOF(struct inet_connection_sock,	\
+					  FIELD) >			\
+			     FIELD_SIZEOF(struct bpf_tcp_sock, FIELD));	\
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			\
+					struct inet_connection_sock,	\
+					FIELD),				\
+				      si->dst_reg, si->src_reg,		\
+				      offsetof(				\
+					struct inet_connection_sock,	\
+					FIELD));			\
+	} while (0)
+
 	if (insn > insn_buf)
 		return insn - insn_buf;
 
@@ -5661,6 +5676,9 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
 	case offsetof(struct bpf_tcp_sock, delivered_ce):
 		BPF_TCP_SOCK_GET_COMMON(delivered_ce);
 		break;
+	case offsetof(struct bpf_tcp_sock, icsk_retransmits):
+		BPF_INET_SOCK_GET_COMMON(icsk_retransmits);
+		break;
 	}
 
 	return insn - insn_buf;
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf-next 5/8] bpf/tools: sync bpf.h
  2019-07-01 20:48 [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
                   ` (3 preceding siblings ...)
  2019-07-01 20:48 ` [PATCH bpf-next 4/8] bpf: add icsk_retransmits " Stanislav Fomichev
@ 2019-07-01 20:48 ` Stanislav Fomichev
  2019-07-01 20:48 ` [PATCH bpf-next 6/8] selftests/bpf: test BPF_SOCK_OPS_RTT_CB Stanislav Fomichev
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Stanislav Fomichev @ 2019-07-01 20:48 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
	Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh

Sync new bpf_tcp_sock fields and new BPF_PROG_TYPE_SOCK_OPS RTT callback.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/include/uapi/linux/bpf.h | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index a396b516a2b2..cecf42c871d4 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1767,6 +1767,7 @@ union bpf_attr {
  * 		* **BPF_SOCK_OPS_RTO_CB_FLAG** (retransmission time out)
  * 		* **BPF_SOCK_OPS_RETRANS_CB_FLAG** (retransmission)
  * 		* **BPF_SOCK_OPS_STATE_CB_FLAG** (TCP state change)
+ * 		* **BPF_SOCK_OPS_RTT_CB_FLAG** (every RTT)
  *
  * 		Therefore, this function can be used to clear a callback flag by
  * 		setting the appropriate bit to zero. e.g. to disable the RTO
@@ -3069,6 +3070,12 @@ struct bpf_tcp_sock {
 				 * sum(delta(snd_una)), or how many bytes
 				 * were acked.
 				 */
+	__u32 dsack_dups;	/* RFC4898 tcpEStatsStackDSACKDups
+				 * total number of DSACK blocks received
+				 */
+	__u32 delivered;	/* Total data packets delivered incl. rexmits */
+	__u32 delivered_ce;	/* Like the above but only ECE marked packets */
+	__u32 icsk_retransmits;	/* Number of unrecovered [RTO] timeouts */
 };
 
 struct bpf_sock_tuple {
@@ -3311,7 +3318,8 @@ struct bpf_sock_ops {
 #define BPF_SOCK_OPS_RTO_CB_FLAG	(1<<0)
 #define BPF_SOCK_OPS_RETRANS_CB_FLAG	(1<<1)
 #define BPF_SOCK_OPS_STATE_CB_FLAG	(1<<2)
-#define BPF_SOCK_OPS_ALL_CB_FLAGS       0x7		/* Mask of all currently
+#define BPF_SOCK_OPS_RTT_CB_FLAG	(1<<3)
+#define BPF_SOCK_OPS_ALL_CB_FLAGS       0xF		/* Mask of all currently
 							 * supported cb flags
 							 */
 
@@ -3366,6 +3374,8 @@ enum {
 	BPF_SOCK_OPS_TCP_LISTEN_CB,	/* Called on listen(2), right after
 					 * socket transition to LISTEN state.
 					 */
+	BPF_SOCK_OPS_RTT_CB,		/* Called on every RTT.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf-next 6/8] selftests/bpf: test BPF_SOCK_OPS_RTT_CB
  2019-07-01 20:48 [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
                   ` (4 preceding siblings ...)
  2019-07-01 20:48 ` [PATCH bpf-next 5/8] bpf/tools: sync bpf.h Stanislav Fomichev
@ 2019-07-01 20:48 ` Stanislav Fomichev
  2019-07-01 23:40   ` Y Song
  2019-07-01 20:48 ` [PATCH bpf-next 7/8] samples/bpf: add sample program that periodically dumps TCP stats Stanislav Fomichev
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 17+ messages in thread
From: Stanislav Fomichev @ 2019-07-01 20:48 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
	Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh

Make sure the callback is invoked for syn-ack and data packet.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/Makefile        |   3 +-
 tools/testing/selftests/bpf/progs/tcp_rtt.c |  61 +++++
 tools/testing/selftests/bpf/test_tcp_rtt.c  | 253 ++++++++++++++++++++
 3 files changed, 316 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tcp_rtt.c
 create mode 100644 tools/testing/selftests/bpf/test_tcp_rtt.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index de1754a8f5fe..2620406a53ec 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -27,7 +27,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
 	test_cgroup_storage test_select_reuseport test_section_names \
 	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
 	test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
-	test_sockopt_multi
+	test_sockopt_multi test_tcp_rtt
 
 BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
 TEST_GEN_FILES = $(BPF_OBJ_FILES)
@@ -107,6 +107,7 @@ $(OUTPUT)/test_cgroup_attach: cgroup_helpers.c
 $(OUTPUT)/test_sockopt: cgroup_helpers.c
 $(OUTPUT)/test_sockopt_sk: cgroup_helpers.c
 $(OUTPUT)/test_sockopt_multi: cgroup_helpers.c
+$(OUTPUT)/test_tcp_rtt: cgroup_helpers.c
 
 .PHONY: force
 
diff --git a/tools/testing/selftests/bpf/progs/tcp_rtt.c b/tools/testing/selftests/bpf/progs/tcp_rtt.c
new file mode 100644
index 000000000000..233bdcb1659e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tcp_rtt.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
+
+struct tcp_rtt_storage {
+	__u32 invoked;
+	__u32 dsack_dups;
+	__u32 delivered;
+	__u32 delivered_ce;
+	__u32 icsk_retransmits;
+};
+
+struct bpf_map_def SEC("maps") socket_storage_map = {
+	.type = BPF_MAP_TYPE_SK_STORAGE,
+	.key_size = sizeof(int),
+	.value_size = sizeof(struct tcp_rtt_storage),
+	.map_flags = BPF_F_NO_PREALLOC,
+};
+BPF_ANNOTATE_KV_PAIR(socket_storage_map, int, struct tcp_rtt_storage);
+
+SEC("sockops")
+int _sockops(struct bpf_sock_ops *ctx)
+{
+	struct tcp_rtt_storage *storage;
+	struct bpf_tcp_sock *tcp_sk;
+	int op = (int) ctx->op;
+	struct bpf_sock *sk;
+
+	sk = ctx->sk;
+	if (!sk)
+		return 1;
+
+	storage = bpf_sk_storage_get(&socket_storage_map, sk, 0,
+				     BPF_SK_STORAGE_GET_F_CREATE);
+	if (!storage)
+		return 1;
+
+	if (op == BPF_SOCK_OPS_TCP_CONNECT_CB) {
+		bpf_sock_ops_cb_flags_set(ctx, BPF_SOCK_OPS_RTT_CB_FLAG);
+		return 1;
+	}
+
+	if (op != BPF_SOCK_OPS_RTT_CB)
+		return 1;
+
+	tcp_sk = bpf_tcp_sock(sk);
+	if (!tcp_sk)
+		return 1;
+
+	storage->invoked++;
+
+	storage->dsack_dups = tcp_sk->dsack_dups;
+	storage->delivered = tcp_sk->delivered;
+	storage->delivered_ce = tcp_sk->delivered_ce;
+	storage->icsk_retransmits = tcp_sk->icsk_retransmits;
+
+	return 1;
+}
diff --git a/tools/testing/selftests/bpf/test_tcp_rtt.c b/tools/testing/selftests/bpf/test_tcp_rtt.c
new file mode 100644
index 000000000000..413fd8514adc
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tcp_rtt.c
@@ -0,0 +1,253 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <error.h>
+#include <errno.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <pthread.h>
+
+#include <linux/filter.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "bpf_rlimit.h"
+#include "bpf_util.h"
+#include "cgroup_helpers.h"
+
+#define CG_PATH                                "/tcp_rtt"
+
+struct tcp_rtt_storage {
+	__u32 invoked;
+	__u32 dsack_dups;
+	__u32 delivered;
+	__u32 delivered_ce;
+	__u32 icsk_retransmits;
+};
+
+static void send_byte(int fd)
+{
+	char b = 0x55;
+
+	if (write(fd, &b, sizeof(b)) != 1)
+		error(1, errno, "Failed to send single byte");
+}
+
+static int verify_sk(int map_fd, int client_fd, const char *msg, __u32 invoked,
+		     __u32 dsack_dups, __u32 delivered, __u32 delivered_ce,
+		     __u32 icsk_retransmits)
+{
+	int err = 0;
+	struct tcp_rtt_storage val;
+
+	if (bpf_map_lookup_elem(map_fd, &client_fd, &val) < 0)
+		error(1, errno, "Failed to read socket storage");
+
+	if (val.invoked != invoked) {
+		log_err("%s: unexpected bpf_tcp_sock.invoked %d != %d",
+			msg, val.invoked, invoked);
+		err++;
+	}
+
+	if (val.dsack_dups != dsack_dups) {
+		log_err("%s: unexpected bpf_tcp_sock.dsack_dups %d != %d",
+			msg, val.dsack_dups, dsack_dups);
+		err++;
+	}
+
+	if (val.delivered != delivered) {
+		log_err("%s: unexpected bpf_tcp_sock.delivered %d != %d",
+			msg, val.delivered, delivered);
+		err++;
+	}
+
+	if (val.delivered_ce != delivered_ce) {
+		log_err("%s: unexpected bpf_tcp_sock.delivered_ce %d != %d",
+			msg, val.delivered_ce, delivered_ce);
+		err++;
+	}
+
+	if (val.icsk_retransmits != icsk_retransmits) {
+		log_err("%s: unexpected bpf_tcp_sock.icsk_retransmits %d != %d",
+			msg, val.icsk_retransmits, icsk_retransmits);
+		err++;
+	}
+
+	return err;
+}
+
+static int connect_to_server(int server_fd)
+{
+	struct sockaddr_storage addr;
+	socklen_t len = sizeof(addr);
+	int fd;
+
+	fd = socket(AF_INET, SOCK_STREAM, 0);
+	if (fd < 0) {
+		log_err("Failed to create client socket");
+		return -1;
+	}
+
+	if (getsockname(server_fd, (struct sockaddr *)&addr, &len)) {
+		log_err("Failed to get server addr");
+		goto out;
+	}
+
+	if (connect(fd, (const struct sockaddr *)&addr, len) < 0) {
+		log_err("Fail to connect to server");
+		goto out;
+	}
+
+	return fd;
+
+out:
+	close(fd);
+	return -1;
+}
+
+static int run_test(int cgroup_fd, int server_fd)
+{
+	struct bpf_prog_load_attr attr = {
+		.prog_type = BPF_PROG_TYPE_SOCK_OPS,
+		.file = "./tcp_rtt.o",
+		.expected_attach_type = BPF_CGROUP_SOCK_OPS,
+	};
+	struct bpf_program *prog;
+	struct bpf_object *obj;
+	struct bpf_map *map;
+	int client_fd;
+	int ignored;
+	int map_fd;
+	int err;
+
+	err = bpf_prog_load_xattr(&attr, &obj, &ignored);
+	if (err) {
+		log_err("Failed to load BPF object");
+		return -1;
+	}
+
+	map = bpf_map__next(NULL, obj);
+	map_fd = bpf_map__fd(map);
+
+	prog = bpf_program__next(NULL, obj);
+	err = bpf_prog_attach(bpf_program__fd(prog), cgroup_fd,
+			      BPF_CGROUP_SOCK_OPS, 0);
+	if (err) {
+		log_err("Failed to attach BPF program");
+		goto close_bpf_object;
+	}
+
+	client_fd = connect_to_server(server_fd);
+	if (client_fd < 0) {
+		err = -1;
+		goto close_bpf_object;
+	}
+
+	err += verify_sk(map_fd, client_fd, "syn-ack",
+			 /*invoked=*/1,
+			 /*dsack_dups=*/0,
+			 /*delivered=*/1,
+			 /*delivered_ce=*/0,
+			 /*icsk_retransmits=*/0);
+
+	send_byte(client_fd);
+
+	err += verify_sk(map_fd, client_fd, "first payload byte",
+			 /*invoked=*/2,
+			 /*dsack_dups=*/0,
+			 /*delivered=*/2,
+			 /*delivered_ce=*/0,
+			 /*icsk_retransmits=*/0);
+
+	close(client_fd);
+
+close_bpf_object:
+	bpf_object__close(obj);
+	return err;
+}
+
+static int start_server(void)
+{
+	struct sockaddr_in addr = {
+		.sin_family = AF_INET,
+		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+	};
+	int fd;
+
+	fd = socket(AF_INET, SOCK_STREAM, 0);
+	if (fd < 0) {
+		log_err("Failed to create server socket");
+		return -1;
+	}
+
+	if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
+		log_err("Failed to bind socket");
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static void *server_thread(void *arg)
+{
+	struct sockaddr_storage addr;
+	socklen_t len = sizeof(addr);
+	int fd = *(int *)arg;
+	int client_fd;
+
+	if (listen(fd, 1) < 0)
+		error(1, errno, "Failed to listed on socket");
+
+	client_fd = accept(fd, (struct sockaddr *)&addr, &len);
+	if (client_fd < 0)
+		error(1, errno, "Failed to accept client");
+
+	if (accept(fd, (struct sockaddr *)&addr, &len) >= 0)
+		error(1, errno, "Unexpected success in second accept");
+
+	close(client_fd);
+
+	return NULL;
+}
+
+int main(int args, char **argv)
+{
+	int server_fd, cgroup_fd;
+	int err = EXIT_SUCCESS;
+	pthread_t tid;
+
+	if (setup_cgroup_environment())
+		goto cleanup_obj;
+
+	cgroup_fd = create_and_get_cgroup(CG_PATH);
+	if (cgroup_fd < 0)
+		goto cleanup_cgroup_env;
+
+	if (join_cgroup(CG_PATH))
+		goto cleanup_cgroup;
+
+	server_fd = start_server();
+	if (server_fd < 0) {
+		err = EXIT_FAILURE;
+		goto cleanup_cgroup;
+	}
+
+	pthread_create(&tid, NULL, server_thread, (void *)&server_fd);
+
+	if (run_test(cgroup_fd, server_fd))
+		err = EXIT_FAILURE;
+
+	close(server_fd);
+
+	printf("test_sockopt_sk: %s\n",
+	       err == EXIT_SUCCESS ? "PASSED" : "FAILED");
+
+cleanup_cgroup:
+	close(cgroup_fd);
+cleanup_cgroup_env:
+	cleanup_cgroup_environment();
+cleanup_obj:
+	return err;
+}
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf-next 7/8] samples/bpf: add sample program that periodically dumps TCP stats
  2019-07-01 20:48 [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
                   ` (5 preceding siblings ...)
  2019-07-01 20:48 ` [PATCH bpf-next 6/8] selftests/bpf: test BPF_SOCK_OPS_RTT_CB Stanislav Fomichev
@ 2019-07-01 20:48 ` Stanislav Fomichev
  2019-07-02  0:15   ` Y Song
  2019-07-01 20:48 ` [PATCH bpf-next 8/8] samples/bpf: fix tcp_bpf.readme detach command Stanislav Fomichev
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 17+ messages in thread
From: Stanislav Fomichev @ 2019-07-01 20:48 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
	Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh

Uses new RTT callback to dump stats every second.

$ mkdir -p /tmp/cgroupv2
$ mount -t cgroup2 none /tmp/cgroupv2
$ mkdir -p /tmp/cgroupv2/foo
$ echo $$ >> /tmp/cgroupv2/foo/cgroup.procs
$ bpftool prog load ./tcp_dumpstats_kern.o /sys/fs/bpf/tcp_prog
$ bpftool cgroup attach /tmp/cgroupv2/foo sock_ops pinned /sys/fs/bpf/tcp_prog
$ bpftool prog tracelog
$ # run neper/netperf/etc

Used neper to compare performance with and without this program attached
and didn't see any noticeable performance impact.

Sample output:
  <idle>-0     [015] ..s.  2074.128800: 0: dsack_dups=0 delivered=242526
  <idle>-0     [015] ..s.  2074.128808: 0: delivered_ce=0 icsk_retransmits=0
  <idle>-0     [015] ..s.  2075.130133: 0: dsack_dups=0 delivered=323599
  <idle>-0     [015] ..s.  2075.130138: 0: delivered_ce=0 icsk_retransmits=0
  <idle>-0     [005] .Ns.  2076.131440: 0: dsack_dups=0 delivered=404648
  <idle>-0     [005] .Ns.  2076.131447: 0: delivered_ce=0 icsk_retransmits=0

Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 samples/bpf/Makefile             |  1 +
 samples/bpf/tcp_dumpstats_kern.c | 65 ++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)
 create mode 100644 samples/bpf/tcp_dumpstats_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 0917f8cf4fab..eaebbeead42f 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -154,6 +154,7 @@ always += tcp_iw_kern.o
 always += tcp_clamp_kern.o
 always += tcp_basertt_kern.o
 always += tcp_tos_reflect_kern.o
+always += tcp_dumpstats_kern.o
 always += xdp_redirect_kern.o
 always += xdp_redirect_map_kern.o
 always += xdp_redirect_cpu_kern.o
diff --git a/samples/bpf/tcp_dumpstats_kern.c b/samples/bpf/tcp_dumpstats_kern.c
new file mode 100644
index 000000000000..5d22bf61db65
--- /dev/null
+++ b/samples/bpf/tcp_dumpstats_kern.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define INTERVAL			1000000000ULL
+
+int _version SEC("version") = 1;
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__u32 type;
+	__u32 map_flags;
+	int *key;
+	__u64 *value;
+} bpf_next_dump SEC(".maps") = {
+	.type = BPF_MAP_TYPE_SK_STORAGE,
+	.map_flags = BPF_F_NO_PREALLOC,
+};
+
+SEC("sockops")
+int _sockops(struct bpf_sock_ops *ctx)
+{
+	struct bpf_tcp_sock *tcp_sk;
+	struct bpf_sock *sk;
+	__u64 *next_dump;
+	__u64 now;
+
+	switch (ctx->op) {
+	case BPF_SOCK_OPS_TCP_CONNECT_CB:
+		bpf_sock_ops_cb_flags_set(ctx, BPF_SOCK_OPS_RTT_CB_FLAG);
+		return 1;
+	case BPF_SOCK_OPS_RTT_CB:
+		break;
+	default:
+		return 1;
+	}
+
+	sk = ctx->sk;
+	if (!sk)
+		return 1;
+
+	next_dump = bpf_sk_storage_get(&bpf_next_dump, sk, 0,
+				       BPF_SK_STORAGE_GET_F_CREATE);
+	if (!next_dump)
+		return 1;
+
+	now = bpf_ktime_get_ns();
+	if (now < *next_dump)
+		return 1;
+
+	tcp_sk = bpf_tcp_sock(sk);
+	if (!tcp_sk)
+		return 1;
+
+	*next_dump = now + INTERVAL;
+
+	bpf_printk("dsack_dups=%u delivered=%u\n",
+		   tcp_sk->dsack_dups, tcp_sk->delivered);
+	bpf_printk("delivered_ce=%u icsk_retransmits=%u\n",
+		   tcp_sk->delivered_ce, tcp_sk->icsk_retransmits);
+
+	return 1;
+}
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf-next 8/8] samples/bpf: fix tcp_bpf.readme detach command
  2019-07-01 20:48 [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
                   ` (6 preceding siblings ...)
  2019-07-01 20:48 ` [PATCH bpf-next 7/8] samples/bpf: add sample program that periodically dumps TCP stats Stanislav Fomichev
@ 2019-07-01 20:48 ` Stanislav Fomichev
  2019-07-01 21:03 ` [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback Soheil Hassas Yeganeh
  2019-07-01 21:15 ` Yuchung Cheng
  9 siblings, 0 replies; 17+ messages in thread
From: Stanislav Fomichev @ 2019-07-01 20:48 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev

Copy-paste, should be detach, not attach.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 samples/bpf/tcp_bpf.readme | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/tcp_bpf.readme b/samples/bpf/tcp_bpf.readme
index fee746621aec..78e247f62108 100644
--- a/samples/bpf/tcp_bpf.readme
+++ b/samples/bpf/tcp_bpf.readme
@@ -25,4 +25,4 @@ attached to the cgroupv2).
 
 To remove (unattach) a socket_ops BPF program from a cgroupv2:
 
-  bpftool cgroup attach /tmp/cgroupv2/foo sock_ops pinned /sys/fs/bpf/tcp_prog
+  bpftool cgroup detach /tmp/cgroupv2/foo sock_ops pinned /sys/fs/bpf/tcp_prog
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback
  2019-07-01 20:48 [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
                   ` (7 preceding siblings ...)
  2019-07-01 20:48 ` [PATCH bpf-next 8/8] samples/bpf: fix tcp_bpf.readme detach command Stanislav Fomichev
@ 2019-07-01 21:03 ` Soheil Hassas Yeganeh
  2019-07-01 21:15 ` Yuchung Cheng
  9 siblings, 0 replies; 17+ messages in thread
From: Soheil Hassas Yeganeh @ 2019-07-01 21:03 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, bpf, David Miller, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, Priyaranjan Jha, Yuchung Cheng

On Mon, Jul 1, 2019 at 4:48 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> Congestion control team would like to have a periodic callback to
> track some TCP statistics. Let's add a sock_ops callback that can be
> selectively enabled on a socket by socket basis and is executed for
> every RTT. BPF program frequency can be further controlled by calling
> bpf_ktime_get_ns and bailing out early.
>
> I run neper tcp_stream and tcp_rr tests with the sample program
> from the last patch and didn't observe any noticeable performance
> difference.
>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Priyaranjan Jha <priyarjha@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Thank you for the nice patch series!


> Stanislav Fomichev (8):
>   bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT
>   bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation
>   bpf: add dsack_dups/delivered{,_ce} to bpf_tcp_sock
>   bpf: add icsk_retransmits to bpf_tcp_sock
>   bpf/tools: sync bpf.h
>   selftests/bpf: test BPF_SOCK_OPS_RTT_CB
>   samples/bpf: add sample program that periodically dumps TCP stats
>   samples/bpf: fix tcp_bpf.readme detach command
>
>  include/net/tcp.h                           |   8 +
>  include/uapi/linux/bpf.h                    |  12 +-
>  net/core/filter.c                           | 207 +++++++++++-----
>  net/ipv4/tcp_input.c                        |   4 +
>  samples/bpf/Makefile                        |   1 +
>  samples/bpf/tcp_bpf.readme                  |   2 +-
>  samples/bpf/tcp_dumpstats_kern.c            |  65 +++++
>  tools/include/uapi/linux/bpf.h              |  12 +-
>  tools/testing/selftests/bpf/Makefile        |   3 +-
>  tools/testing/selftests/bpf/progs/tcp_rtt.c |  61 +++++
>  tools/testing/selftests/bpf/test_tcp_rtt.c  | 253 ++++++++++++++++++++
>  11 files changed, 570 insertions(+), 58 deletions(-)
>  create mode 100644 samples/bpf/tcp_dumpstats_kern.c
>  create mode 100644 tools/testing/selftests/bpf/progs/tcp_rtt.c
>  create mode 100644 tools/testing/selftests/bpf/test_tcp_rtt.c
>
> --
> 2.22.0.410.gd8fdbe21b5-goog

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback
  2019-07-01 20:48 [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
                   ` (8 preceding siblings ...)
  2019-07-01 21:03 ` [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback Soheil Hassas Yeganeh
@ 2019-07-01 21:15 ` Yuchung Cheng
  9 siblings, 0 replies; 17+ messages in thread
From: Yuchung Cheng @ 2019-07-01 21:15 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, bpf, David Miller, ast, Daniel Borkmann, Eric Dumazet,
	Priyaranjan Jha, Soheil Hassas Yeganeh

On Mon, Jul 1, 2019 at 1:48 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> Congestion control team would like to have a periodic callback to
> track some TCP statistics. Let's add a sock_ops callback that can be
> selectively enabled on a socket by socket basis and is executed for
> every RTT. BPF program frequency can be further controlled by calling
> bpf_ktime_get_ns and bailing out early.
>
> I run neper tcp_stream and tcp_rr tests with the sample program
> from the last patch and didn't observe any noticeable performance
> difference.
>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Priyaranjan Jha <priyarjha@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>

Thanks!

> Cc: Soheil Hassas Yeganeh <soheil@google.com>
>
> Stanislav Fomichev (8):
>   bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT
>   bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation
>   bpf: add dsack_dups/delivered{,_ce} to bpf_tcp_sock
>   bpf: add icsk_retransmits to bpf_tcp_sock
>   bpf/tools: sync bpf.h
>   selftests/bpf: test BPF_SOCK_OPS_RTT_CB
>   samples/bpf: add sample program that periodically dumps TCP stats
>   samples/bpf: fix tcp_bpf.readme detach command
>
>  include/net/tcp.h                           |   8 +
>  include/uapi/linux/bpf.h                    |  12 +-
>  net/core/filter.c                           | 207 +++++++++++-----
>  net/ipv4/tcp_input.c                        |   4 +
>  samples/bpf/Makefile                        |   1 +
>  samples/bpf/tcp_bpf.readme                  |   2 +-
>  samples/bpf/tcp_dumpstats_kern.c            |  65 +++++
>  tools/include/uapi/linux/bpf.h              |  12 +-
>  tools/testing/selftests/bpf/Makefile        |   3 +-
>  tools/testing/selftests/bpf/progs/tcp_rtt.c |  61 +++++
>  tools/testing/selftests/bpf/test_tcp_rtt.c  | 253 ++++++++++++++++++++
>  11 files changed, 570 insertions(+), 58 deletions(-)
>  create mode 100644 samples/bpf/tcp_dumpstats_kern.c
>  create mode 100644 tools/testing/selftests/bpf/progs/tcp_rtt.c
>  create mode 100644 tools/testing/selftests/bpf/test_tcp_rtt.c
>
> --
> 2.22.0.410.gd8fdbe21b5-goog

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 6/8] selftests/bpf: test BPF_SOCK_OPS_RTT_CB
  2019-07-01 20:48 ` [PATCH bpf-next 6/8] selftests/bpf: test BPF_SOCK_OPS_RTT_CB Stanislav Fomichev
@ 2019-07-01 23:40   ` Y Song
  2019-07-02  0:07     ` Stanislav Fomichev
  0 siblings, 1 reply; 17+ messages in thread
From: Y Song @ 2019-07-01 23:40 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, bpf, David Miller, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, Priyaranjan Jha, Yuchung Cheng,
	Soheil Hassas Yeganeh

On Mon, Jul 1, 2019 at 1:49 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> Make sure the callback is invoked for syn-ack and data packet.
>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Priyaranjan Jha <priyarjha@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  tools/testing/selftests/bpf/Makefile        |   3 +-
>  tools/testing/selftests/bpf/progs/tcp_rtt.c |  61 +++++
>  tools/testing/selftests/bpf/test_tcp_rtt.c  | 253 ++++++++++++++++++++
>  3 files changed, 316 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/tcp_rtt.c
>  create mode 100644 tools/testing/selftests/bpf/test_tcp_rtt.c
>
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index de1754a8f5fe..2620406a53ec 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -27,7 +27,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
>         test_cgroup_storage test_select_reuseport test_section_names \
>         test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
>         test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
> -       test_sockopt_multi
> +       test_sockopt_multi test_tcp_rtt
>
>  BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
>  TEST_GEN_FILES = $(BPF_OBJ_FILES)
> @@ -107,6 +107,7 @@ $(OUTPUT)/test_cgroup_attach: cgroup_helpers.c
>  $(OUTPUT)/test_sockopt: cgroup_helpers.c
>  $(OUTPUT)/test_sockopt_sk: cgroup_helpers.c
>  $(OUTPUT)/test_sockopt_multi: cgroup_helpers.c
> +$(OUTPUT)/test_tcp_rtt: cgroup_helpers.c
>
>  .PHONY: force
>
> diff --git a/tools/testing/selftests/bpf/progs/tcp_rtt.c b/tools/testing/selftests/bpf/progs/tcp_rtt.c
> new file mode 100644
> index 000000000000..233bdcb1659e
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/tcp_rtt.c
> @@ -0,0 +1,61 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/bpf.h>
> +#include "bpf_helpers.h"
> +
> +char _license[] SEC("license") = "GPL";
> +__u32 _version SEC("version") = 1;
> +
> +struct tcp_rtt_storage {
> +       __u32 invoked;
> +       __u32 dsack_dups;
> +       __u32 delivered;
> +       __u32 delivered_ce;
> +       __u32 icsk_retransmits;
> +};
> +
> +struct bpf_map_def SEC("maps") socket_storage_map = {
> +       .type = BPF_MAP_TYPE_SK_STORAGE,
> +       .key_size = sizeof(int),
> +       .value_size = sizeof(struct tcp_rtt_storage),
> +       .map_flags = BPF_F_NO_PREALLOC,
> +};
> +BPF_ANNOTATE_KV_PAIR(socket_storage_map, int, struct tcp_rtt_storage);
> +
> +SEC("sockops")
> +int _sockops(struct bpf_sock_ops *ctx)
> +{
> +       struct tcp_rtt_storage *storage;
> +       struct bpf_tcp_sock *tcp_sk;
> +       int op = (int) ctx->op;
> +       struct bpf_sock *sk;
> +
> +       sk = ctx->sk;
> +       if (!sk)
> +               return 1;
> +
> +       storage = bpf_sk_storage_get(&socket_storage_map, sk, 0,
> +                                    BPF_SK_STORAGE_GET_F_CREATE);
> +       if (!storage)
> +               return 1;
> +
> +       if (op == BPF_SOCK_OPS_TCP_CONNECT_CB) {
> +               bpf_sock_ops_cb_flags_set(ctx, BPF_SOCK_OPS_RTT_CB_FLAG);
> +               return 1;
> +       }
> +
> +       if (op != BPF_SOCK_OPS_RTT_CB)
> +               return 1;
> +
> +       tcp_sk = bpf_tcp_sock(sk);
> +       if (!tcp_sk)
> +               return 1;
> +
> +       storage->invoked++;
> +
> +       storage->dsack_dups = tcp_sk->dsack_dups;
> +       storage->delivered = tcp_sk->delivered;
> +       storage->delivered_ce = tcp_sk->delivered_ce;
> +       storage->icsk_retransmits = tcp_sk->icsk_retransmits;
> +
> +       return 1;
> +}
> diff --git a/tools/testing/selftests/bpf/test_tcp_rtt.c b/tools/testing/selftests/bpf/test_tcp_rtt.c
> new file mode 100644
> index 000000000000..413fd8514adc
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_tcp_rtt.c
> @@ -0,0 +1,253 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <error.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <netinet/in.h>
> +#include <pthread.h>
> +
> +#include <linux/filter.h>
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +
> +#include "bpf_rlimit.h"
> +#include "bpf_util.h"
> +#include "cgroup_helpers.h"
> +
> +#define CG_PATH                                "/tcp_rtt"
> +
> +struct tcp_rtt_storage {
> +       __u32 invoked;
> +       __u32 dsack_dups;
> +       __u32 delivered;
> +       __u32 delivered_ce;
> +       __u32 icsk_retransmits;
> +};
> +
> +static void send_byte(int fd)
> +{
> +       char b = 0x55;
> +
> +       if (write(fd, &b, sizeof(b)) != 1)
> +               error(1, errno, "Failed to send single byte");
> +}
> +
> +static int verify_sk(int map_fd, int client_fd, const char *msg, __u32 invoked,
> +                    __u32 dsack_dups, __u32 delivered, __u32 delivered_ce,
> +                    __u32 icsk_retransmits)
> +{
> +       int err = 0;
> +       struct tcp_rtt_storage val;
> +
> +       if (bpf_map_lookup_elem(map_fd, &client_fd, &val) < 0)
> +               error(1, errno, "Failed to read socket storage");
> +
> +       if (val.invoked != invoked) {
> +               log_err("%s: unexpected bpf_tcp_sock.invoked %d != %d",
> +                       msg, val.invoked, invoked);
> +               err++;
> +       }
> +
> +       if (val.dsack_dups != dsack_dups) {
> +               log_err("%s: unexpected bpf_tcp_sock.dsack_dups %d != %d",
> +                       msg, val.dsack_dups, dsack_dups);
> +               err++;
> +       }
> +
> +       if (val.delivered != delivered) {
> +               log_err("%s: unexpected bpf_tcp_sock.delivered %d != %d",
> +                       msg, val.delivered, delivered);
> +               err++;
> +       }
> +
> +       if (val.delivered_ce != delivered_ce) {
> +               log_err("%s: unexpected bpf_tcp_sock.delivered_ce %d != %d",
> +                       msg, val.delivered_ce, delivered_ce);
> +               err++;
> +       }
> +
> +       if (val.icsk_retransmits != icsk_retransmits) {
> +               log_err("%s: unexpected bpf_tcp_sock.icsk_retransmits %d != %d",
> +                       msg, val.icsk_retransmits, icsk_retransmits);
> +               err++;
> +       }
> +
> +       return err;
> +}
> +
> +static int connect_to_server(int server_fd)
> +{
> +       struct sockaddr_storage addr;
> +       socklen_t len = sizeof(addr);
> +       int fd;
> +
> +       fd = socket(AF_INET, SOCK_STREAM, 0);
> +       if (fd < 0) {
> +               log_err("Failed to create client socket");
> +               return -1;
> +       }
> +
> +       if (getsockname(server_fd, (struct sockaddr *)&addr, &len)) {
> +               log_err("Failed to get server addr");
> +               goto out;
> +       }
> +
> +       if (connect(fd, (const struct sockaddr *)&addr, len) < 0) {
> +               log_err("Fail to connect to server");
> +               goto out;
> +       }
> +
> +       return fd;
> +
> +out:
> +       close(fd);
> +       return -1;
> +}
> +
> +static int run_test(int cgroup_fd, int server_fd)
> +{
> +       struct bpf_prog_load_attr attr = {
> +               .prog_type = BPF_PROG_TYPE_SOCK_OPS,
> +               .file = "./tcp_rtt.o",
> +               .expected_attach_type = BPF_CGROUP_SOCK_OPS,
> +       };
> +       struct bpf_program *prog;
> +       struct bpf_object *obj;
> +       struct bpf_map *map;
> +       int client_fd;
> +       int ignored;
> +       int map_fd;
> +       int err;
> +
> +       err = bpf_prog_load_xattr(&attr, &obj, &ignored);
> +       if (err) {
> +               log_err("Failed to load BPF object");
> +               return -1;
> +       }

The third argument of bpf_prog_load_xattr is prog_fd.
If you have it, you do not need the below retrieving prog_fd
by bpf_program__fd(prog).

> +
> +       map = bpf_map__next(NULL, obj);
> +       map_fd = bpf_map__fd(map);
> +
> +       prog = bpf_program__next(NULL, obj);
> +       err = bpf_prog_attach(bpf_program__fd(prog), cgroup_fd,
> +                             BPF_CGROUP_SOCK_OPS, 0);
> +       if (err) {
> +               log_err("Failed to attach BPF program");
> +               goto close_bpf_object;
> +       }
> +
> +       client_fd = connect_to_server(server_fd);
> +       if (client_fd < 0) {
> +               err = -1;
> +               goto close_bpf_object;
> +       }
> +
> +       err += verify_sk(map_fd, client_fd, "syn-ack",
> +                        /*invoked=*/1,
> +                        /*dsack_dups=*/0,
> +                        /*delivered=*/1,
> +                        /*delivered_ce=*/0,
> +                        /*icsk_retransmits=*/0);
> +
> +       send_byte(client_fd);
> +
> +       err += verify_sk(map_fd, client_fd, "first payload byte",
> +                        /*invoked=*/2,
> +                        /*dsack_dups=*/0,
> +                        /*delivered=*/2,
> +                        /*delivered_ce=*/0,
> +                        /*icsk_retransmits=*/0);
> +
> +       close(client_fd);
> +
> +close_bpf_object:
> +       bpf_object__close(obj);
> +       return err;
> +}
> +
> +static int start_server(void)
> +{
> +       struct sockaddr_in addr = {
> +               .sin_family = AF_INET,
> +               .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
> +       };
> +       int fd;
> +
> +       fd = socket(AF_INET, SOCK_STREAM, 0);
> +       if (fd < 0) {
> +               log_err("Failed to create server socket");
> +               return -1;
> +       }
> +
> +       if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
> +               log_err("Failed to bind socket");
> +               close(fd);
> +               return -1;
> +       }
> +
> +       return fd;
> +}
> +
> +static void *server_thread(void *arg)
> +{
> +       struct sockaddr_storage addr;
> +       socklen_t len = sizeof(addr);
> +       int fd = *(int *)arg;
> +       int client_fd;
> +
> +       if (listen(fd, 1) < 0)
> +               error(1, errno, "Failed to listed on socket");

The error() here only reports the error, right? In case of error,
should the control jumps to the end of this function and return?
The same for several error() calls below.

> +
> +       client_fd = accept(fd, (struct sockaddr *)&addr, &len);
> +       if (client_fd < 0)
> +               error(1, errno, "Failed to accept client");
> +
> +       if (accept(fd, (struct sockaddr *)&addr, &len) >= 0)
> +               error(1, errno, "Unexpected success in second accept");

What is the purpose of this second default to-be-failed accept() call?

> +
> +       close(client_fd);
> +
> +       return NULL;
> +}
> +
> +int main(int args, char **argv)
> +{
> +       int server_fd, cgroup_fd;
> +       int err = EXIT_SUCCESS;
> +       pthread_t tid;
> +
> +       if (setup_cgroup_environment())
> +               goto cleanup_obj;
> +
> +       cgroup_fd = create_and_get_cgroup(CG_PATH);
> +       if (cgroup_fd < 0)
> +               goto cleanup_cgroup_env;
> +
> +       if (join_cgroup(CG_PATH))
> +               goto cleanup_cgroup;
> +
> +       server_fd = start_server();
> +       if (server_fd < 0) {
> +               err = EXIT_FAILURE;
> +               goto cleanup_cgroup;
> +       }
> +
> +       pthread_create(&tid, NULL, server_thread, (void *)&server_fd);
> +
> +       if (run_test(cgroup_fd, server_fd))
> +               err = EXIT_FAILURE;
> +
> +       close(server_fd);
> +
> +       printf("test_sockopt_sk: %s\n",
> +              err == EXIT_SUCCESS ? "PASSED" : "FAILED");
> +
> +cleanup_cgroup:
> +       close(cgroup_fd);
> +cleanup_cgroup_env:
> +       cleanup_cgroup_environment();
> +cleanup_obj:
> +       return err;
> +}
> --
> 2.22.0.410.gd8fdbe21b5-goog
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 6/8] selftests/bpf: test BPF_SOCK_OPS_RTT_CB
  2019-07-01 23:40   ` Y Song
@ 2019-07-02  0:07     ` Stanislav Fomichev
  2019-07-02  0:26       ` Y Song
  0 siblings, 1 reply; 17+ messages in thread
From: Stanislav Fomichev @ 2019-07-02  0:07 UTC (permalink / raw)
  To: Y Song
  Cc: Stanislav Fomichev, netdev, bpf, David Miller,
	Alexei Starovoitov, Daniel Borkmann, Eric Dumazet,
	Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh

On 07/01, Y Song wrote:
> On Mon, Jul 1, 2019 at 1:49 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > Make sure the callback is invoked for syn-ack and data packet.
> >
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Priyaranjan Jha <priyarjha@google.com>
> > Cc: Yuchung Cheng <ycheng@google.com>
> > Cc: Soheil Hassas Yeganeh <soheil@google.com>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  tools/testing/selftests/bpf/Makefile        |   3 +-
> >  tools/testing/selftests/bpf/progs/tcp_rtt.c |  61 +++++
> >  tools/testing/selftests/bpf/test_tcp_rtt.c  | 253 ++++++++++++++++++++
> >  3 files changed, 316 insertions(+), 1 deletion(-)
> >  create mode 100644 tools/testing/selftests/bpf/progs/tcp_rtt.c
> >  create mode 100644 tools/testing/selftests/bpf/test_tcp_rtt.c
> >
> > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> > index de1754a8f5fe..2620406a53ec 100644
> > --- a/tools/testing/selftests/bpf/Makefile
> > +++ b/tools/testing/selftests/bpf/Makefile
> > @@ -27,7 +27,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
> >         test_cgroup_storage test_select_reuseport test_section_names \
> >         test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
> >         test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
> > -       test_sockopt_multi
> > +       test_sockopt_multi test_tcp_rtt
> >
> >  BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
> >  TEST_GEN_FILES = $(BPF_OBJ_FILES)
> > @@ -107,6 +107,7 @@ $(OUTPUT)/test_cgroup_attach: cgroup_helpers.c
> >  $(OUTPUT)/test_sockopt: cgroup_helpers.c
> >  $(OUTPUT)/test_sockopt_sk: cgroup_helpers.c
> >  $(OUTPUT)/test_sockopt_multi: cgroup_helpers.c
> > +$(OUTPUT)/test_tcp_rtt: cgroup_helpers.c
> >
> >  .PHONY: force
> >
> > diff --git a/tools/testing/selftests/bpf/progs/tcp_rtt.c b/tools/testing/selftests/bpf/progs/tcp_rtt.c
> > new file mode 100644
> > index 000000000000..233bdcb1659e
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/tcp_rtt.c
> > @@ -0,0 +1,61 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include <linux/bpf.h>
> > +#include "bpf_helpers.h"
> > +
> > +char _license[] SEC("license") = "GPL";
> > +__u32 _version SEC("version") = 1;
> > +
> > +struct tcp_rtt_storage {
> > +       __u32 invoked;
> > +       __u32 dsack_dups;
> > +       __u32 delivered;
> > +       __u32 delivered_ce;
> > +       __u32 icsk_retransmits;
> > +};
> > +
> > +struct bpf_map_def SEC("maps") socket_storage_map = {
> > +       .type = BPF_MAP_TYPE_SK_STORAGE,
> > +       .key_size = sizeof(int),
> > +       .value_size = sizeof(struct tcp_rtt_storage),
> > +       .map_flags = BPF_F_NO_PREALLOC,
> > +};
> > +BPF_ANNOTATE_KV_PAIR(socket_storage_map, int, struct tcp_rtt_storage);
> > +
> > +SEC("sockops")
> > +int _sockops(struct bpf_sock_ops *ctx)
> > +{
> > +       struct tcp_rtt_storage *storage;
> > +       struct bpf_tcp_sock *tcp_sk;
> > +       int op = (int) ctx->op;
> > +       struct bpf_sock *sk;
> > +
> > +       sk = ctx->sk;
> > +       if (!sk)
> > +               return 1;
> > +
> > +       storage = bpf_sk_storage_get(&socket_storage_map, sk, 0,
> > +                                    BPF_SK_STORAGE_GET_F_CREATE);
> > +       if (!storage)
> > +               return 1;
> > +
> > +       if (op == BPF_SOCK_OPS_TCP_CONNECT_CB) {
> > +               bpf_sock_ops_cb_flags_set(ctx, BPF_SOCK_OPS_RTT_CB_FLAG);
> > +               return 1;
> > +       }
> > +
> > +       if (op != BPF_SOCK_OPS_RTT_CB)
> > +               return 1;
> > +
> > +       tcp_sk = bpf_tcp_sock(sk);
> > +       if (!tcp_sk)
> > +               return 1;
> > +
> > +       storage->invoked++;
> > +
> > +       storage->dsack_dups = tcp_sk->dsack_dups;
> > +       storage->delivered = tcp_sk->delivered;
> > +       storage->delivered_ce = tcp_sk->delivered_ce;
> > +       storage->icsk_retransmits = tcp_sk->icsk_retransmits;
> > +
> > +       return 1;
> > +}
> > diff --git a/tools/testing/selftests/bpf/test_tcp_rtt.c b/tools/testing/selftests/bpf/test_tcp_rtt.c
> > new file mode 100644
> > index 000000000000..413fd8514adc
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/test_tcp_rtt.c
> > @@ -0,0 +1,253 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include <error.h>
> > +#include <errno.h>
> > +#include <stdio.h>
> > +#include <unistd.h>
> > +#include <sys/types.h>
> > +#include <sys/socket.h>
> > +#include <netinet/in.h>
> > +#include <pthread.h>
> > +
> > +#include <linux/filter.h>
> > +#include <bpf/bpf.h>
> > +#include <bpf/libbpf.h>
> > +
> > +#include "bpf_rlimit.h"
> > +#include "bpf_util.h"
> > +#include "cgroup_helpers.h"
> > +
> > +#define CG_PATH                                "/tcp_rtt"
> > +
> > +struct tcp_rtt_storage {
> > +       __u32 invoked;
> > +       __u32 dsack_dups;
> > +       __u32 delivered;
> > +       __u32 delivered_ce;
> > +       __u32 icsk_retransmits;
> > +};
> > +
> > +static void send_byte(int fd)
> > +{
> > +       char b = 0x55;
> > +
> > +       if (write(fd, &b, sizeof(b)) != 1)
> > +               error(1, errno, "Failed to send single byte");
> > +}
> > +
> > +static int verify_sk(int map_fd, int client_fd, const char *msg, __u32 invoked,
> > +                    __u32 dsack_dups, __u32 delivered, __u32 delivered_ce,
> > +                    __u32 icsk_retransmits)
> > +{
> > +       int err = 0;
> > +       struct tcp_rtt_storage val;
> > +
> > +       if (bpf_map_lookup_elem(map_fd, &client_fd, &val) < 0)
> > +               error(1, errno, "Failed to read socket storage");
> > +
> > +       if (val.invoked != invoked) {
> > +               log_err("%s: unexpected bpf_tcp_sock.invoked %d != %d",
> > +                       msg, val.invoked, invoked);
> > +               err++;
> > +       }
> > +
> > +       if (val.dsack_dups != dsack_dups) {
> > +               log_err("%s: unexpected bpf_tcp_sock.dsack_dups %d != %d",
> > +                       msg, val.dsack_dups, dsack_dups);
> > +               err++;
> > +       }
> > +
> > +       if (val.delivered != delivered) {
> > +               log_err("%s: unexpected bpf_tcp_sock.delivered %d != %d",
> > +                       msg, val.delivered, delivered);
> > +               err++;
> > +       }
> > +
> > +       if (val.delivered_ce != delivered_ce) {
> > +               log_err("%s: unexpected bpf_tcp_sock.delivered_ce %d != %d",
> > +                       msg, val.delivered_ce, delivered_ce);
> > +               err++;
> > +       }
> > +
> > +       if (val.icsk_retransmits != icsk_retransmits) {
> > +               log_err("%s: unexpected bpf_tcp_sock.icsk_retransmits %d != %d",
> > +                       msg, val.icsk_retransmits, icsk_retransmits);
> > +               err++;
> > +       }
> > +
> > +       return err;
> > +}
> > +
> > +static int connect_to_server(int server_fd)
> > +{
> > +       struct sockaddr_storage addr;
> > +       socklen_t len = sizeof(addr);
> > +       int fd;
> > +
> > +       fd = socket(AF_INET, SOCK_STREAM, 0);
> > +       if (fd < 0) {
> > +               log_err("Failed to create client socket");
> > +               return -1;
> > +       }
> > +
> > +       if (getsockname(server_fd, (struct sockaddr *)&addr, &len)) {
> > +               log_err("Failed to get server addr");
> > +               goto out;
> > +       }
> > +
> > +       if (connect(fd, (const struct sockaddr *)&addr, len) < 0) {
> > +               log_err("Fail to connect to server");
> > +               goto out;
> > +       }
> > +
> > +       return fd;
> > +
> > +out:
> > +       close(fd);
> > +       return -1;
> > +}
> > +
> > +static int run_test(int cgroup_fd, int server_fd)
> > +{
> > +       struct bpf_prog_load_attr attr = {
> > +               .prog_type = BPF_PROG_TYPE_SOCK_OPS,
> > +               .file = "./tcp_rtt.o",
> > +               .expected_attach_type = BPF_CGROUP_SOCK_OPS,
> > +       };
> > +       struct bpf_program *prog;
> > +       struct bpf_object *obj;
> > +       struct bpf_map *map;
> > +       int client_fd;
> > +       int ignored;
> > +       int map_fd;
> > +       int err;
> > +
> > +       err = bpf_prog_load_xattr(&attr, &obj, &ignored);
> > +       if (err) {
> > +               log_err("Failed to load BPF object");
> > +               return -1;
> > +       }
> 
> The third argument of bpf_prog_load_xattr is prog_fd.
> If you have it, you do not need the below retrieving prog_fd
> by bpf_program__fd(prog).
Ack. I think I copy-pasted it from my other test where
I have multiple progs in the object file and attach them
manually by name. Will s/ignored/prog_fd/ in the v2.

> > +
> > +       map = bpf_map__next(NULL, obj);
> > +       map_fd = bpf_map__fd(map);
> > +
> > +       prog = bpf_program__next(NULL, obj);
> > +       err = bpf_prog_attach(bpf_program__fd(prog), cgroup_fd,
> > +                             BPF_CGROUP_SOCK_OPS, 0);
> > +       if (err) {
> > +               log_err("Failed to attach BPF program");
> > +               goto close_bpf_object;
> > +       }
> > +
> > +       client_fd = connect_to_server(server_fd);
> > +       if (client_fd < 0) {
> > +               err = -1;
> > +               goto close_bpf_object;
> > +       }
> > +
> > +       err += verify_sk(map_fd, client_fd, "syn-ack",
> > +                        /*invoked=*/1,
> > +                        /*dsack_dups=*/0,
> > +                        /*delivered=*/1,
> > +                        /*delivered_ce=*/0,
> > +                        /*icsk_retransmits=*/0);
> > +
> > +       send_byte(client_fd);
> > +
> > +       err += verify_sk(map_fd, client_fd, "first payload byte",
> > +                        /*invoked=*/2,
> > +                        /*dsack_dups=*/0,
> > +                        /*delivered=*/2,
> > +                        /*delivered_ce=*/0,
> > +                        /*icsk_retransmits=*/0);
> > +
> > +       close(client_fd);
> > +
> > +close_bpf_object:
> > +       bpf_object__close(obj);
> > +       return err;
> > +}
> > +
> > +static int start_server(void)
> > +{
> > +       struct sockaddr_in addr = {
> > +               .sin_family = AF_INET,
> > +               .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
> > +       };
> > +       int fd;
> > +
> > +       fd = socket(AF_INET, SOCK_STREAM, 0);
> > +       if (fd < 0) {
> > +               log_err("Failed to create server socket");
> > +               return -1;
> > +       }
> > +
> > +       if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
> > +               log_err("Failed to bind socket");
> > +               close(fd);
> > +               return -1;
> > +       }
> > +
> > +       return fd;
> > +}
> > +
> > +static void *server_thread(void *arg)
> > +{
> > +       struct sockaddr_storage addr;
> > +       socklen_t len = sizeof(addr);
> > +       int fd = *(int *)arg;
> > +       int client_fd;
> > +
> > +       if (listen(fd, 1) < 0)
> > +               error(1, errno, "Failed to listed on socket");
> 
> The error() here only reports the error, right? In case of error,
> should the control jumps to the end of this function and return?
> The same for several error() calls below.
No, error() calls exit(), so the whole process should die. Do you think
it's better to gracefully handle that with pthread_join?

> > +
> > +       client_fd = accept(fd, (struct sockaddr *)&addr, &len);
> > +       if (client_fd < 0)
> > +               error(1, errno, "Failed to accept client");
> > +
> > +       if (accept(fd, (struct sockaddr *)&addr, &len) >= 0)
> > +               error(1, errno, "Unexpected success in second accept");
> 
> What is the purpose of this second default to-be-failed accept() call?
So the server_thread waits here for the next client (that never arrives)
and doesn't exit and call close(client_fd). I can add a comment here to
clarify. Alternatively, I can just drop close(client_fd) and let
the thread exit. WDYT?

> > +
> > +       close(client_fd);
> > +
> > +       return NULL;
> > +}
> > +
> > +int main(int args, char **argv)
> > +{
> > +       int server_fd, cgroup_fd;
> > +       int err = EXIT_SUCCESS;
> > +       pthread_t tid;
> > +
> > +       if (setup_cgroup_environment())
> > +               goto cleanup_obj;
> > +
> > +       cgroup_fd = create_and_get_cgroup(CG_PATH);
> > +       if (cgroup_fd < 0)
> > +               goto cleanup_cgroup_env;
> > +
> > +       if (join_cgroup(CG_PATH))
> > +               goto cleanup_cgroup;
> > +
> > +       server_fd = start_server();
> > +       if (server_fd < 0) {
> > +               err = EXIT_FAILURE;
> > +               goto cleanup_cgroup;
> > +       }
> > +
> > +       pthread_create(&tid, NULL, server_thread, (void *)&server_fd);
> > +
> > +       if (run_test(cgroup_fd, server_fd))
> > +               err = EXIT_FAILURE;
> > +
> > +       close(server_fd);
> > +
> > +       printf("test_sockopt_sk: %s\n",
> > +              err == EXIT_SUCCESS ? "PASSED" : "FAILED");
> > +
> > +cleanup_cgroup:
> > +       close(cgroup_fd);
> > +cleanup_cgroup_env:
> > +       cleanup_cgroup_environment();
> > +cleanup_obj:
> > +       return err;
> > +}
> > --
> > 2.22.0.410.gd8fdbe21b5-goog
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 7/8] samples/bpf: add sample program that periodically dumps TCP stats
  2019-07-01 20:48 ` [PATCH bpf-next 7/8] samples/bpf: add sample program that periodically dumps TCP stats Stanislav Fomichev
@ 2019-07-02  0:15   ` Y Song
  2019-07-02  0:31     ` Stanislav Fomichev
  0 siblings, 1 reply; 17+ messages in thread
From: Y Song @ 2019-07-02  0:15 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, bpf, David Miller, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, Priyaranjan Jha, Yuchung Cheng,
	Soheil Hassas Yeganeh

On Mon, Jul 1, 2019 at 1:49 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> Uses new RTT callback to dump stats every second.
>
> $ mkdir -p /tmp/cgroupv2
> $ mount -t cgroup2 none /tmp/cgroupv2
> $ mkdir -p /tmp/cgroupv2/foo
> $ echo $$ >> /tmp/cgroupv2/foo/cgroup.procs
> $ bpftool prog load ./tcp_dumpstats_kern.o /sys/fs/bpf/tcp_prog
> $ bpftool cgroup attach /tmp/cgroupv2/foo sock_ops pinned /sys/fs/bpf/tcp_prog
> $ bpftool prog tracelog
> $ # run neper/netperf/etc
>
> Used neper to compare performance with and without this program attached
> and didn't see any noticeable performance impact.
>
> Sample output:
>   <idle>-0     [015] ..s.  2074.128800: 0: dsack_dups=0 delivered=242526
>   <idle>-0     [015] ..s.  2074.128808: 0: delivered_ce=0 icsk_retransmits=0
>   <idle>-0     [015] ..s.  2075.130133: 0: dsack_dups=0 delivered=323599
>   <idle>-0     [015] ..s.  2075.130138: 0: delivered_ce=0 icsk_retransmits=0
>   <idle>-0     [005] .Ns.  2076.131440: 0: dsack_dups=0 delivered=404648
>   <idle>-0     [005] .Ns.  2076.131447: 0: delivered_ce=0 icsk_retransmits=0
>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Priyaranjan Jha <priyarjha@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  samples/bpf/Makefile             |  1 +
>  samples/bpf/tcp_dumpstats_kern.c | 65 ++++++++++++++++++++++++++++++++
>  2 files changed, 66 insertions(+)
>  create mode 100644 samples/bpf/tcp_dumpstats_kern.c

Currently, the bpf program into the repo. If we do not have another
script to use
this program for testing, the instructions in the commit message should be
added to the bpf program as comments so people know what to do with this file
without going through git commit message.

Is it possible to create a script to run with this bpf program?

>
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 0917f8cf4fab..eaebbeead42f 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -154,6 +154,7 @@ always += tcp_iw_kern.o
>  always += tcp_clamp_kern.o
>  always += tcp_basertt_kern.o
>  always += tcp_tos_reflect_kern.o
> +always += tcp_dumpstats_kern.o
>  always += xdp_redirect_kern.o
>  always += xdp_redirect_map_kern.o
>  always += xdp_redirect_cpu_kern.o
> diff --git a/samples/bpf/tcp_dumpstats_kern.c b/samples/bpf/tcp_dumpstats_kern.c
> new file mode 100644
> index 000000000000..5d22bf61db65
> --- /dev/null
> +++ b/samples/bpf/tcp_dumpstats_kern.c
> @@ -0,0 +1,65 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/bpf.h>
> +
> +#include "bpf_helpers.h"
> +#include "bpf_endian.h"
> +
> +#define INTERVAL                       1000000000ULL
> +
> +int _version SEC("version") = 1;
> +char _license[] SEC("license") = "GPL";
> +
> +struct {
> +       __u32 type;
> +       __u32 map_flags;
> +       int *key;
> +       __u64 *value;
> +} bpf_next_dump SEC(".maps") = {
> +       .type = BPF_MAP_TYPE_SK_STORAGE,
> +       .map_flags = BPF_F_NO_PREALLOC,
> +};
> +
> +SEC("sockops")
> +int _sockops(struct bpf_sock_ops *ctx)
> +{
> +       struct bpf_tcp_sock *tcp_sk;
> +       struct bpf_sock *sk;
> +       __u64 *next_dump;
> +       __u64 now;
> +
> +       switch (ctx->op) {
> +       case BPF_SOCK_OPS_TCP_CONNECT_CB:
> +               bpf_sock_ops_cb_flags_set(ctx, BPF_SOCK_OPS_RTT_CB_FLAG);
> +               return 1;
> +       case BPF_SOCK_OPS_RTT_CB:
> +               break;
> +       default:
> +               return 1;
> +       }
> +
> +       sk = ctx->sk;
> +       if (!sk)
> +               return 1;
> +
> +       next_dump = bpf_sk_storage_get(&bpf_next_dump, sk, 0,
> +                                      BPF_SK_STORAGE_GET_F_CREATE);
> +       if (!next_dump)
> +               return 1;
> +
> +       now = bpf_ktime_get_ns();
> +       if (now < *next_dump)
> +               return 1;
> +
> +       tcp_sk = bpf_tcp_sock(sk);
> +       if (!tcp_sk)
> +               return 1;
> +
> +       *next_dump = now + INTERVAL;
> +
> +       bpf_printk("dsack_dups=%u delivered=%u\n",
> +                  tcp_sk->dsack_dups, tcp_sk->delivered);
> +       bpf_printk("delivered_ce=%u icsk_retransmits=%u\n",
> +                  tcp_sk->delivered_ce, tcp_sk->icsk_retransmits);
> +
> +       return 1;
> +}
> --
> 2.22.0.410.gd8fdbe21b5-goog
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 6/8] selftests/bpf: test BPF_SOCK_OPS_RTT_CB
  2019-07-02  0:07     ` Stanislav Fomichev
@ 2019-07-02  0:26       ` Y Song
  0 siblings, 0 replies; 17+ messages in thread
From: Y Song @ 2019-07-02  0:26 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Stanislav Fomichev, netdev, bpf, David Miller,
	Alexei Starovoitov, Daniel Borkmann, Eric Dumazet,
	Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh

On Mon, Jul 1, 2019 at 5:07 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 07/01, Y Song wrote:
> > On Mon, Jul 1, 2019 at 1:49 PM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > Make sure the callback is invoked for syn-ack and data packet.
> > >
> > > Cc: Eric Dumazet <edumazet@google.com>
> > > Cc: Priyaranjan Jha <priyarjha@google.com>
> > > Cc: Yuchung Cheng <ycheng@google.com>
> > > Cc: Soheil Hassas Yeganeh <soheil@google.com>
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  tools/testing/selftests/bpf/Makefile        |   3 +-
> > >  tools/testing/selftests/bpf/progs/tcp_rtt.c |  61 +++++
> > >  tools/testing/selftests/bpf/test_tcp_rtt.c  | 253 ++++++++++++++++++++
> > >  3 files changed, 316 insertions(+), 1 deletion(-)
> > >  create mode 100644 tools/testing/selftests/bpf/progs/tcp_rtt.c
> > >  create mode 100644 tools/testing/selftests/bpf/test_tcp_rtt.c
> > >
> > > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> > > index de1754a8f5fe..2620406a53ec 100644
> > > --- a/tools/testing/selftests/bpf/Makefile
> > > +++ b/tools/testing/selftests/bpf/Makefile
> > > @@ -27,7 +27,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
> > >         test_cgroup_storage test_select_reuseport test_section_names \
> > >         test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
> > >         test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
> > > -       test_sockopt_multi
> > > +       test_sockopt_multi test_tcp_rtt
> > >
> > >  BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
> > >  TEST_GEN_FILES = $(BPF_OBJ_FILES)
> > > @@ -107,6 +107,7 @@ $(OUTPUT)/test_cgroup_attach: cgroup_helpers.c
> > >  $(OUTPUT)/test_sockopt: cgroup_helpers.c
> > >  $(OUTPUT)/test_sockopt_sk: cgroup_helpers.c
> > >  $(OUTPUT)/test_sockopt_multi: cgroup_helpers.c
> > > +$(OUTPUT)/test_tcp_rtt: cgroup_helpers.c
> > >
> > >  .PHONY: force
> > >
> > > diff --git a/tools/testing/selftests/bpf/progs/tcp_rtt.c b/tools/testing/selftests/bpf/progs/tcp_rtt.c
> > > new file mode 100644
> > > index 000000000000..233bdcb1659e
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/bpf/progs/tcp_rtt.c
> > > @@ -0,0 +1,61 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +#include <linux/bpf.h>
> > > +#include "bpf_helpers.h"
> > > +
> > > +char _license[] SEC("license") = "GPL";
> > > +__u32 _version SEC("version") = 1;
> > > +
> > > +struct tcp_rtt_storage {
> > > +       __u32 invoked;
> > > +       __u32 dsack_dups;
> > > +       __u32 delivered;
> > > +       __u32 delivered_ce;
> > > +       __u32 icsk_retransmits;
> > > +};
[...]
> > > +
> > > +static void *server_thread(void *arg)
> > > +{
> > > +       struct sockaddr_storage addr;
> > > +       socklen_t len = sizeof(addr);
> > > +       int fd = *(int *)arg;
> > > +       int client_fd;
> > > +
> > > +       if (listen(fd, 1) < 0)
> > > +               error(1, errno, "Failed to listed on socket");
> >
> > The error() here only reports the error, right? In case of error,
> > should the control jumps to the end of this function and return?
> > The same for several error() calls below.
> No, error() calls exit(), so the whole process should die. Do you think
> it's better to gracefully handle that with pthread_join?

Thanks for explanation of error() semantics.
test_tcp_rtt is a standalone a program, so exiting
with a meaningful error message is fine to me. No need to change then.

>
> > > +
> > > +       client_fd = accept(fd, (struct sockaddr *)&addr, &len);
> > > +       if (client_fd < 0)
> > > +               error(1, errno, "Failed to accept client");
> > > +
> > > +       if (accept(fd, (struct sockaddr *)&addr, &len) >= 0)
> > > +               error(1, errno, "Unexpected success in second accept");
> >
> > What is the purpose of this second default to-be-failed accept() call?
> So the server_thread waits here for the next client (that never arrives)
> and doesn't exit and call close(client_fd). I can add a comment here to
> clarify. Alternatively, I can just drop close(client_fd) and let
> the thread exit. WDYT?

Adding a comment to explain should be good enough. Thanks!

>
> > > +
> > > +       close(client_fd);
> > > +
> > > +       return NULL;
> > > +}
> > > +
> > > +int main(int args, char **argv)
> > > +{
> > > +       int server_fd, cgroup_fd;
> > > +       int err = EXIT_SUCCESS;
> > > +       pthread_t tid;
> > > +
> > > +       if (setup_cgroup_environment())
> > > +               goto cleanup_obj;
> > > +
> > > +       cgroup_fd = create_and_get_cgroup(CG_PATH);
> > > +       if (cgroup_fd < 0)
> > > +               goto cleanup_cgroup_env;
> > > +
> > > +       if (join_cgroup(CG_PATH))
> > > +               goto cleanup_cgroup;
> > > +
> > > +       server_fd = start_server();
> > > +       if (server_fd < 0) {
> > > +               err = EXIT_FAILURE;
> > > +               goto cleanup_cgroup;
> > > +       }
> > > +
> > > +       pthread_create(&tid, NULL, server_thread, (void *)&server_fd);
> > > +
> > > +       if (run_test(cgroup_fd, server_fd))
> > > +               err = EXIT_FAILURE;
> > > +
> > > +       close(server_fd);
> > > +
> > > +       printf("test_sockopt_sk: %s\n",
> > > +              err == EXIT_SUCCESS ? "PASSED" : "FAILED");
> > > +
> > > +cleanup_cgroup:
> > > +       close(cgroup_fd);
> > > +cleanup_cgroup_env:
> > > +       cleanup_cgroup_environment();
> > > +cleanup_obj:
> > > +       return err;
> > > +}
> > > --
> > > 2.22.0.410.gd8fdbe21b5-goog
> > >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 7/8] samples/bpf: add sample program that periodically dumps TCP stats
  2019-07-02  0:15   ` Y Song
@ 2019-07-02  0:31     ` Stanislav Fomichev
  2019-07-02  0:39       ` Y Song
  0 siblings, 1 reply; 17+ messages in thread
From: Stanislav Fomichev @ 2019-07-02  0:31 UTC (permalink / raw)
  To: Y Song
  Cc: Stanislav Fomichev, netdev, bpf, David Miller,
	Alexei Starovoitov, Daniel Borkmann, Eric Dumazet,
	Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh

On 07/01, Y Song wrote:
> On Mon, Jul 1, 2019 at 1:49 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > Uses new RTT callback to dump stats every second.
> >
> > $ mkdir -p /tmp/cgroupv2
> > $ mount -t cgroup2 none /tmp/cgroupv2
> > $ mkdir -p /tmp/cgroupv2/foo
> > $ echo $$ >> /tmp/cgroupv2/foo/cgroup.procs
> > $ bpftool prog load ./tcp_dumpstats_kern.o /sys/fs/bpf/tcp_prog
> > $ bpftool cgroup attach /tmp/cgroupv2/foo sock_ops pinned /sys/fs/bpf/tcp_prog
> > $ bpftool prog tracelog
> > $ # run neper/netperf/etc
> >
> > Used neper to compare performance with and without this program attached
> > and didn't see any noticeable performance impact.
> >
> > Sample output:
> >   <idle>-0     [015] ..s.  2074.128800: 0: dsack_dups=0 delivered=242526
> >   <idle>-0     [015] ..s.  2074.128808: 0: delivered_ce=0 icsk_retransmits=0
> >   <idle>-0     [015] ..s.  2075.130133: 0: dsack_dups=0 delivered=323599
> >   <idle>-0     [015] ..s.  2075.130138: 0: delivered_ce=0 icsk_retransmits=0
> >   <idle>-0     [005] .Ns.  2076.131440: 0: dsack_dups=0 delivered=404648
> >   <idle>-0     [005] .Ns.  2076.131447: 0: delivered_ce=0 icsk_retransmits=0
> >
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Priyaranjan Jha <priyarjha@google.com>
> > Cc: Yuchung Cheng <ycheng@google.com>
> > Cc: Soheil Hassas Yeganeh <soheil@google.com>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  samples/bpf/Makefile             |  1 +
> >  samples/bpf/tcp_dumpstats_kern.c | 65 ++++++++++++++++++++++++++++++++
> >  2 files changed, 66 insertions(+)
> >  create mode 100644 samples/bpf/tcp_dumpstats_kern.c
> 
> Currently, the bpf program into the repo. If we do not have another
> script to use
> this program for testing, the instructions in the commit message should be
> added to the bpf program as comments so people know what to do with this file
> without going through git commit message.
> 
> Is it possible to create a script to run with this bpf program?
There is a general instruction in samples/bpf/tcp_bpf.readme
with bpftool examples/etc. Should I just a comment at the top
of the BPF program to point people to that .readme file instead?

> >
> > diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> > index 0917f8cf4fab..eaebbeead42f 100644
> > --- a/samples/bpf/Makefile
> > +++ b/samples/bpf/Makefile
> > @@ -154,6 +154,7 @@ always += tcp_iw_kern.o
> >  always += tcp_clamp_kern.o
> >  always += tcp_basertt_kern.o
> >  always += tcp_tos_reflect_kern.o
> > +always += tcp_dumpstats_kern.o
> >  always += xdp_redirect_kern.o
> >  always += xdp_redirect_map_kern.o
> >  always += xdp_redirect_cpu_kern.o
> > diff --git a/samples/bpf/tcp_dumpstats_kern.c b/samples/bpf/tcp_dumpstats_kern.c
> > new file mode 100644
> > index 000000000000..5d22bf61db65
> > --- /dev/null
> > +++ b/samples/bpf/tcp_dumpstats_kern.c
> > @@ -0,0 +1,65 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include <linux/bpf.h>
> > +
> > +#include "bpf_helpers.h"
> > +#include "bpf_endian.h"
> > +
> > +#define INTERVAL                       1000000000ULL
> > +
> > +int _version SEC("version") = 1;
> > +char _license[] SEC("license") = "GPL";
> > +
> > +struct {
> > +       __u32 type;
> > +       __u32 map_flags;
> > +       int *key;
> > +       __u64 *value;
> > +} bpf_next_dump SEC(".maps") = {
> > +       .type = BPF_MAP_TYPE_SK_STORAGE,
> > +       .map_flags = BPF_F_NO_PREALLOC,
> > +};
> > +
> > +SEC("sockops")
> > +int _sockops(struct bpf_sock_ops *ctx)
> > +{
> > +       struct bpf_tcp_sock *tcp_sk;
> > +       struct bpf_sock *sk;
> > +       __u64 *next_dump;
> > +       __u64 now;
> > +
> > +       switch (ctx->op) {
> > +       case BPF_SOCK_OPS_TCP_CONNECT_CB:
> > +               bpf_sock_ops_cb_flags_set(ctx, BPF_SOCK_OPS_RTT_CB_FLAG);
> > +               return 1;
> > +       case BPF_SOCK_OPS_RTT_CB:
> > +               break;
> > +       default:
> > +               return 1;
> > +       }
> > +
> > +       sk = ctx->sk;
> > +       if (!sk)
> > +               return 1;
> > +
> > +       next_dump = bpf_sk_storage_get(&bpf_next_dump, sk, 0,
> > +                                      BPF_SK_STORAGE_GET_F_CREATE);
> > +       if (!next_dump)
> > +               return 1;
> > +
> > +       now = bpf_ktime_get_ns();
> > +       if (now < *next_dump)
> > +               return 1;
> > +
> > +       tcp_sk = bpf_tcp_sock(sk);
> > +       if (!tcp_sk)
> > +               return 1;
> > +
> > +       *next_dump = now + INTERVAL;
> > +
> > +       bpf_printk("dsack_dups=%u delivered=%u\n",
> > +                  tcp_sk->dsack_dups, tcp_sk->delivered);
> > +       bpf_printk("delivered_ce=%u icsk_retransmits=%u\n",
> > +                  tcp_sk->delivered_ce, tcp_sk->icsk_retransmits);
> > +
> > +       return 1;
> > +}
> > --
> > 2.22.0.410.gd8fdbe21b5-goog
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 7/8] samples/bpf: add sample program that periodically dumps TCP stats
  2019-07-02  0:31     ` Stanislav Fomichev
@ 2019-07-02  0:39       ` Y Song
  0 siblings, 0 replies; 17+ messages in thread
From: Y Song @ 2019-07-02  0:39 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Stanislav Fomichev, netdev, bpf, David Miller,
	Alexei Starovoitov, Daniel Borkmann, Eric Dumazet,
	Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh

On Mon, Jul 1, 2019 at 5:31 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 07/01, Y Song wrote:
> > On Mon, Jul 1, 2019 at 1:49 PM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > Uses new RTT callback to dump stats every second.
> > >
> > > $ mkdir -p /tmp/cgroupv2
> > > $ mount -t cgroup2 none /tmp/cgroupv2
> > > $ mkdir -p /tmp/cgroupv2/foo
> > > $ echo $$ >> /tmp/cgroupv2/foo/cgroup.procs
> > > $ bpftool prog load ./tcp_dumpstats_kern.o /sys/fs/bpf/tcp_prog
> > > $ bpftool cgroup attach /tmp/cgroupv2/foo sock_ops pinned /sys/fs/bpf/tcp_prog
> > > $ bpftool prog tracelog
> > > $ # run neper/netperf/etc
> > >
> > > Used neper to compare performance with and without this program attached
> > > and didn't see any noticeable performance impact.
> > >
> > > Sample output:
> > >   <idle>-0     [015] ..s.  2074.128800: 0: dsack_dups=0 delivered=242526
> > >   <idle>-0     [015] ..s.  2074.128808: 0: delivered_ce=0 icsk_retransmits=0
> > >   <idle>-0     [015] ..s.  2075.130133: 0: dsack_dups=0 delivered=323599
> > >   <idle>-0     [015] ..s.  2075.130138: 0: delivered_ce=0 icsk_retransmits=0
> > >   <idle>-0     [005] .Ns.  2076.131440: 0: dsack_dups=0 delivered=404648
> > >   <idle>-0     [005] .Ns.  2076.131447: 0: delivered_ce=0 icsk_retransmits=0
> > >
> > > Cc: Eric Dumazet <edumazet@google.com>
> > > Cc: Priyaranjan Jha <priyarjha@google.com>
> > > Cc: Yuchung Cheng <ycheng@google.com>
> > > Cc: Soheil Hassas Yeganeh <soheil@google.com>
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  samples/bpf/Makefile             |  1 +
> > >  samples/bpf/tcp_dumpstats_kern.c | 65 ++++++++++++++++++++++++++++++++
> > >  2 files changed, 66 insertions(+)
> > >  create mode 100644 samples/bpf/tcp_dumpstats_kern.c
> >
> > Currently, the bpf program into the repo. If we do not have another
> > script to use
> > this program for testing, the instructions in the commit message should be
> > added to the bpf program as comments so people know what to do with this file
> > without going through git commit message.
> >
> > Is it possible to create a script to run with this bpf program?
> There is a general instruction in samples/bpf/tcp_bpf.readme
> with bpftool examples/etc. Should I just a comment at the top
> of the BPF program to point people to that .readme file instead?

Referring to tcp_bpf.readme should work. Even simpler :-)

>
> > >
> > > diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> > > index 0917f8cf4fab..eaebbeead42f 100644
> > > --- a/samples/bpf/Makefile
> > > +++ b/samples/bpf/Makefile
> > > @@ -154,6 +154,7 @@ always += tcp_iw_kern.o
> > >  always += tcp_clamp_kern.o
> > >  always += tcp_basertt_kern.o
> > >  always += tcp_tos_reflect_kern.o
> > > +always += tcp_dumpstats_kern.o
> > >  always += xdp_redirect_kern.o
> > >  always += xdp_redirect_map_kern.o
> > >  always += xdp_redirect_cpu_kern.o
> > > diff --git a/samples/bpf/tcp_dumpstats_kern.c b/samples/bpf/tcp_dumpstats_kern.c
> > > new file mode 100644
> > > index 000000000000..5d22bf61db65
> > > --- /dev/null
> > > +++ b/samples/bpf/tcp_dumpstats_kern.c
> > > @@ -0,0 +1,65 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +#include <linux/bpf.h>
> > > +
> > > +#include "bpf_helpers.h"
> > > +#include "bpf_endian.h"
> > > +
> > > +#define INTERVAL                       1000000000ULL
> > > +
> > > +int _version SEC("version") = 1;
> > > +char _license[] SEC("license") = "GPL";
[...]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-07-02  0:40 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-01 20:48 [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
2019-07-01 20:48 ` [PATCH bpf-next 1/8] bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT Stanislav Fomichev
2019-07-01 20:48 ` [PATCH bpf-next 2/8] bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation Stanislav Fomichev
2019-07-01 20:48 ` [PATCH bpf-next 3/8] bpf: add dsack_dups/delivered{,_ce} to bpf_tcp_sock Stanislav Fomichev
2019-07-01 20:48 ` [PATCH bpf-next 4/8] bpf: add icsk_retransmits " Stanislav Fomichev
2019-07-01 20:48 ` [PATCH bpf-next 5/8] bpf/tools: sync bpf.h Stanislav Fomichev
2019-07-01 20:48 ` [PATCH bpf-next 6/8] selftests/bpf: test BPF_SOCK_OPS_RTT_CB Stanislav Fomichev
2019-07-01 23:40   ` Y Song
2019-07-02  0:07     ` Stanislav Fomichev
2019-07-02  0:26       ` Y Song
2019-07-01 20:48 ` [PATCH bpf-next 7/8] samples/bpf: add sample program that periodically dumps TCP stats Stanislav Fomichev
2019-07-02  0:15   ` Y Song
2019-07-02  0:31     ` Stanislav Fomichev
2019-07-02  0:39       ` Y Song
2019-07-01 20:48 ` [PATCH bpf-next 8/8] samples/bpf: fix tcp_bpf.readme detach command Stanislav Fomichev
2019-07-01 21:03 ` [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback Soheil Hassas Yeganeh
2019-07-01 21:15 ` Yuchung Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).