* [PATCH bpf-next v2 0/8] bpf: TCP RTT sock_ops bpf callback
@ 2019-07-02 16:13 Stanislav Fomichev
From: Stanislav Fomichev @ 2019-07-02 16:13 UTC (permalink / raw)
To: netdev, bpf
Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh
The congestion control team would like a periodic callback to
track some TCP statistics. Let's add a sock_ops callback that can be
selectively enabled on a socket-by-socket basis and that is executed on
every RTT. BPF program frequency can be further controlled by calling
bpf_ktime_get_ns and bailing out early.
I ran the neper tcp_stream and tcp_rr tests with the sample program
from the last patch and didn't observe any noticeable performance
difference.
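The rate-limiting idea (calling bpf_ktime_get_ns and bailing out early) can be sketched in plain user-space C. The helper name should_sample() and the interval are made up for illustration; in an actual sockops program the timestamp state would live in socket storage and the clock would come from bpf_ktime_get_ns():

```c
#include <stdint.h>

/* Minimum interval between samples: 1 second in nanoseconds. */
#define SAMPLE_INTERVAL_NS 1000000000ull

/* Return 1 if at least SAMPLE_INTERVAL_NS has elapsed since the last
 * recorded sample and record the new timestamp; otherwise return 0,
 * i.e. "bail out early" without doing any work on this RTT callback.
 */
static int should_sample(uint64_t now_ns, uint64_t *last_ns)
{
	if (now_ns - *last_ns < SAMPLE_INTERVAL_NS)
		return 0;
	*last_ns = now_ns;
	return 1;
}
```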
v2:
* add a comment about second accept() in selftest (Yonghong Song)
* refer to tcp_bpf.readme in sample program (Yonghong Song)
Suggested-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Stanislav Fomichev (8):
bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT
bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation
bpf: add dsack_dups/delivered{,_ce} to bpf_tcp_sock
bpf: add icsk_retransmits to bpf_tcp_sock
bpf/tools: sync bpf.h
selftests/bpf: test BPF_SOCK_OPS_RTT_CB
samples/bpf: add sample program that periodically dumps TCP stats
samples/bpf: fix tcp_bpf.readme detach command
include/net/tcp.h | 8 +
include/uapi/linux/bpf.h | 12 +-
net/core/filter.c | 207 +++++++++++-----
net/ipv4/tcp_input.c | 4 +
samples/bpf/Makefile | 1 +
samples/bpf/tcp_bpf.readme | 2 +-
samples/bpf/tcp_dumpstats_kern.c | 68 ++++++
tools/include/uapi/linux/bpf.h | 12 +-
tools/testing/selftests/bpf/Makefile | 3 +-
tools/testing/selftests/bpf/progs/tcp_rtt.c | 61 +++++
tools/testing/selftests/bpf/test_tcp_rtt.c | 254 ++++++++++++++++++++
11 files changed, 574 insertions(+), 58 deletions(-)
create mode 100644 samples/bpf/tcp_dumpstats_kern.c
create mode 100644 tools/testing/selftests/bpf/progs/tcp_rtt.c
create mode 100644 tools/testing/selftests/bpf/test_tcp_rtt.c
--
2.22.0.410.gd8fdbe21b5-goog
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [PATCH bpf-next v2 1/8] bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT
From: Stanislav Fomichev @ 2019-07-02 16:13 UTC (permalink / raw)
To: netdev, bpf
Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh
Performance impact should be minimal because it's under a new
BPF_SOCK_OPS_RTT_CB_FLAG flag that has to be explicitly enabled.
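As a quick consistency check on the new flag, the callback-flag values this patch defines (copied from the bpf.h hunk) must add up to the updated BPF_SOCK_OPS_ALL_CB_FLAGS mask; a minimal C sketch:

```c
/* Values from include/uapi/linux/bpf.h after this patch. */
#define BPF_SOCK_OPS_RTO_CB_FLAG     (1 << 0)
#define BPF_SOCK_OPS_RETRANS_CB_FLAG (1 << 1)
#define BPF_SOCK_OPS_STATE_CB_FLAG   (1 << 2)
#define BPF_SOCK_OPS_RTT_CB_FLAG     (1 << 3)
#define BPF_SOCK_OPS_ALL_CB_FLAGS    0xF

/* The "all" mask must cover every individual flag, including RTT. */
static int all_flags_consistent(void)
{
	return (BPF_SOCK_OPS_RTO_CB_FLAG | BPF_SOCK_OPS_RETRANS_CB_FLAG |
		BPF_SOCK_OPS_STATE_CB_FLAG | BPF_SOCK_OPS_RTT_CB_FLAG) ==
	       BPF_SOCK_OPS_ALL_CB_FLAGS;
}
```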
Suggested-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
include/net/tcp.h | 8 ++++++++
include/uapi/linux/bpf.h | 6 +++++-
net/ipv4/tcp_input.c | 4 ++++
3 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9d36cc88d043..e16d8a3fd3b4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2221,6 +2221,14 @@ static inline bool tcp_bpf_ca_needs_ecn(struct sock *sk)
return (tcp_call_bpf(sk, BPF_SOCK_OPS_NEEDS_ECN, 0, NULL) == 1);
}
+static inline void tcp_bpf_rtt(struct sock *sk)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RTT_CB_FLAG))
+ tcp_call_bpf(sk, BPF_SOCK_OPS_RTT_CB, 0, NULL);
+}
+
#if IS_ENABLED(CONFIG_SMC)
extern struct static_key_false tcp_have_smc;
#endif
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index cffea1826a1f..9cdd0aaeba06 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1770,6 +1770,7 @@ union bpf_attr {
* * **BPF_SOCK_OPS_RTO_CB_FLAG** (retransmission time out)
* * **BPF_SOCK_OPS_RETRANS_CB_FLAG** (retransmission)
* * **BPF_SOCK_OPS_STATE_CB_FLAG** (TCP state change)
+ * * **BPF_SOCK_OPS_RTT_CB_FLAG** (every RTT)
*
* Therefore, this function can be used to clear a callback flag by
* setting the appropriate bit to zero. e.g. to disable the RTO
@@ -3314,7 +3315,8 @@ struct bpf_sock_ops {
#define BPF_SOCK_OPS_RTO_CB_FLAG (1<<0)
#define BPF_SOCK_OPS_RETRANS_CB_FLAG (1<<1)
#define BPF_SOCK_OPS_STATE_CB_FLAG (1<<2)
-#define BPF_SOCK_OPS_ALL_CB_FLAGS 0x7 /* Mask of all currently
+#define BPF_SOCK_OPS_RTT_CB_FLAG (1<<3)
+#define BPF_SOCK_OPS_ALL_CB_FLAGS 0xF /* Mask of all currently
* supported cb flags
*/
@@ -3369,6 +3371,8 @@ enum {
BPF_SOCK_OPS_TCP_LISTEN_CB, /* Called on listen(2), right after
* socket transition to LISTEN state.
*/
+ BPF_SOCK_OPS_RTT_CB, /* Called on every RTT.
+ */
};
/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b71efeb0ae5b..c21e8a22fb3b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -778,6 +778,8 @@ static void tcp_rtt_estimator(struct sock *sk, long mrtt_us)
tp->rttvar_us -= (tp->rttvar_us - tp->mdev_max_us) >> 2;
tp->rtt_seq = tp->snd_nxt;
tp->mdev_max_us = tcp_rto_min_us(sk);
+
+ tcp_bpf_rtt(sk);
}
} else {
/* no previous measure. */
@@ -786,6 +788,8 @@ static void tcp_rtt_estimator(struct sock *sk, long mrtt_us)
tp->rttvar_us = max(tp->mdev_us, tcp_rto_min_us(sk));
tp->mdev_max_us = tp->rttvar_us;
tp->rtt_seq = tp->snd_nxt;
+
+ tcp_bpf_rtt(sk);
}
tp->srtt_us = max(1U, srtt);
}
--
2.22.0.410.gd8fdbe21b5-goog
* [PATCH bpf-next v2 2/8] bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation
From: Stanislav Fomichev @ 2019-07-02 16:13 UTC (permalink / raw)
To: netdev, bpf
Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh
We've added a bpf_tcp_sock member to bpf_sock_ops and don't expect
any new tcp_sock fields in bpf_sock_ops. Let's remove
CONVERT_COMMON_TCP_SOCK_FIELDS so that bpf_tcp_sock can be independently
extended.
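The dispatch pattern that replaces the macro, a switch on offsetof() into the user-visible struct, can be illustrated with a self-contained toy in plain C (struct toy_ctx and its fields are made up for illustration; the kernel converters emit BPF instructions rather than returning an enum):

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a user-visible context struct such as bpf_tcp_sock. */
struct toy_ctx {
	uint32_t snd_cwnd;
	uint32_t srtt_us;
};

enum toy_field { TOY_SND_CWND, TOY_SRTT_US, TOY_UNKNOWN };

/* Map an access offset into the context struct to a field id, the way
 * bpf_tcp_sock_convert_ctx_access() switches on si->off per field.
 */
static enum toy_field classify(size_t off)
{
	switch (off) {
	case offsetof(struct toy_ctx, snd_cwnd):
		return TOY_SND_CWND;
	case offsetof(struct toy_ctx, srtt_us):
		return TOY_SRTT_US;
	default:
		return TOY_UNKNOWN;
	}
}
```

Spelling each field out as its own case, instead of generating the cases from a shared macro, is what lets bpf_tcp_sock and bpf_sock_ops grow different field sets from here on.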
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
net/core/filter.c | 180 ++++++++++++++++++++++++++++++++--------------
1 file changed, 126 insertions(+), 54 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 4836264f82ee..ad908526545d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5194,54 +5194,6 @@ static const struct bpf_func_proto bpf_lwt_seg6_adjust_srh_proto = {
};
#endif /* CONFIG_IPV6_SEG6_BPF */
-#define CONVERT_COMMON_TCP_SOCK_FIELDS(md_type, CONVERT) \
-do { \
- switch (si->off) { \
- case offsetof(md_type, snd_cwnd): \
- CONVERT(snd_cwnd); break; \
- case offsetof(md_type, srtt_us): \
- CONVERT(srtt_us); break; \
- case offsetof(md_type, snd_ssthresh): \
- CONVERT(snd_ssthresh); break; \
- case offsetof(md_type, rcv_nxt): \
- CONVERT(rcv_nxt); break; \
- case offsetof(md_type, snd_nxt): \
- CONVERT(snd_nxt); break; \
- case offsetof(md_type, snd_una): \
- CONVERT(snd_una); break; \
- case offsetof(md_type, mss_cache): \
- CONVERT(mss_cache); break; \
- case offsetof(md_type, ecn_flags): \
- CONVERT(ecn_flags); break; \
- case offsetof(md_type, rate_delivered): \
- CONVERT(rate_delivered); break; \
- case offsetof(md_type, rate_interval_us): \
- CONVERT(rate_interval_us); break; \
- case offsetof(md_type, packets_out): \
- CONVERT(packets_out); break; \
- case offsetof(md_type, retrans_out): \
- CONVERT(retrans_out); break; \
- case offsetof(md_type, total_retrans): \
- CONVERT(total_retrans); break; \
- case offsetof(md_type, segs_in): \
- CONVERT(segs_in); break; \
- case offsetof(md_type, data_segs_in): \
- CONVERT(data_segs_in); break; \
- case offsetof(md_type, segs_out): \
- CONVERT(segs_out); break; \
- case offsetof(md_type, data_segs_out): \
- CONVERT(data_segs_out); break; \
- case offsetof(md_type, lost_out): \
- CONVERT(lost_out); break; \
- case offsetof(md_type, sacked_out): \
- CONVERT(sacked_out); break; \
- case offsetof(md_type, bytes_received): \
- CONVERT(bytes_received); break; \
- case offsetof(md_type, bytes_acked): \
- CONVERT(bytes_acked); break; \
- } \
-} while (0)
-
#ifdef CONFIG_INET
static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple,
int dif, int sdif, u8 family, u8 proto)
@@ -5623,9 +5575,6 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
offsetof(struct tcp_sock, FIELD)); \
} while (0)
- CONVERT_COMMON_TCP_SOCK_FIELDS(struct bpf_tcp_sock,
- BPF_TCP_SOCK_GET_COMMON);
-
if (insn > insn_buf)
return insn - insn_buf;
@@ -5640,6 +5589,69 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
offsetof(struct tcp_sock, rtt_min) +
offsetof(struct minmax_sample, v));
break;
+ case offsetof(struct bpf_tcp_sock, snd_cwnd):
+ BPF_TCP_SOCK_GET_COMMON(snd_cwnd);
+ break;
+ case offsetof(struct bpf_tcp_sock, srtt_us):
+ BPF_TCP_SOCK_GET_COMMON(srtt_us);
+ break;
+ case offsetof(struct bpf_tcp_sock, snd_ssthresh):
+ BPF_TCP_SOCK_GET_COMMON(snd_ssthresh);
+ break;
+ case offsetof(struct bpf_tcp_sock, rcv_nxt):
+ BPF_TCP_SOCK_GET_COMMON(rcv_nxt);
+ break;
+ case offsetof(struct bpf_tcp_sock, snd_nxt):
+ BPF_TCP_SOCK_GET_COMMON(snd_nxt);
+ break;
+ case offsetof(struct bpf_tcp_sock, snd_una):
+ BPF_TCP_SOCK_GET_COMMON(snd_una);
+ break;
+ case offsetof(struct bpf_tcp_sock, mss_cache):
+ BPF_TCP_SOCK_GET_COMMON(mss_cache);
+ break;
+ case offsetof(struct bpf_tcp_sock, ecn_flags):
+ BPF_TCP_SOCK_GET_COMMON(ecn_flags);
+ break;
+ case offsetof(struct bpf_tcp_sock, rate_delivered):
+ BPF_TCP_SOCK_GET_COMMON(rate_delivered);
+ break;
+ case offsetof(struct bpf_tcp_sock, rate_interval_us):
+ BPF_TCP_SOCK_GET_COMMON(rate_interval_us);
+ break;
+ case offsetof(struct bpf_tcp_sock, packets_out):
+ BPF_TCP_SOCK_GET_COMMON(packets_out);
+ break;
+ case offsetof(struct bpf_tcp_sock, retrans_out):
+ BPF_TCP_SOCK_GET_COMMON(retrans_out);
+ break;
+ case offsetof(struct bpf_tcp_sock, total_retrans):
+ BPF_TCP_SOCK_GET_COMMON(total_retrans);
+ break;
+ case offsetof(struct bpf_tcp_sock, segs_in):
+ BPF_TCP_SOCK_GET_COMMON(segs_in);
+ break;
+ case offsetof(struct bpf_tcp_sock, data_segs_in):
+ BPF_TCP_SOCK_GET_COMMON(data_segs_in);
+ break;
+ case offsetof(struct bpf_tcp_sock, segs_out):
+ BPF_TCP_SOCK_GET_COMMON(segs_out);
+ break;
+ case offsetof(struct bpf_tcp_sock, data_segs_out):
+ BPF_TCP_SOCK_GET_COMMON(data_segs_out);
+ break;
+ case offsetof(struct bpf_tcp_sock, lost_out):
+ BPF_TCP_SOCK_GET_COMMON(lost_out);
+ break;
+ case offsetof(struct bpf_tcp_sock, sacked_out):
+ BPF_TCP_SOCK_GET_COMMON(sacked_out);
+ break;
+ case offsetof(struct bpf_tcp_sock, bytes_received):
+ BPF_TCP_SOCK_GET_COMMON(bytes_received);
+ break;
+ case offsetof(struct bpf_tcp_sock, bytes_acked):
+ BPF_TCP_SOCK_GET_COMMON(bytes_acked);
+ break;
}
return insn - insn_buf;
@@ -7913,9 +7925,6 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ); \
} while (0)
- CONVERT_COMMON_TCP_SOCK_FIELDS(struct bpf_sock_ops,
- SOCK_OPS_GET_TCP_SOCK_FIELD);
-
if (insn > insn_buf)
return insn - insn_buf;
@@ -8085,6 +8094,69 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
SOCK_OPS_GET_OR_SET_FIELD(sk_txhash, sk_txhash,
struct sock, type);
break;
+ case offsetof(struct bpf_sock_ops, snd_cwnd):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(snd_cwnd);
+ break;
+ case offsetof(struct bpf_sock_ops, srtt_us):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(srtt_us);
+ break;
+ case offsetof(struct bpf_sock_ops, snd_ssthresh):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(snd_ssthresh);
+ break;
+ case offsetof(struct bpf_sock_ops, rcv_nxt):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(rcv_nxt);
+ break;
+ case offsetof(struct bpf_sock_ops, snd_nxt):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(snd_nxt);
+ break;
+ case offsetof(struct bpf_sock_ops, snd_una):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(snd_una);
+ break;
+ case offsetof(struct bpf_sock_ops, mss_cache):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(mss_cache);
+ break;
+ case offsetof(struct bpf_sock_ops, ecn_flags):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(ecn_flags);
+ break;
+ case offsetof(struct bpf_sock_ops, rate_delivered):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(rate_delivered);
+ break;
+ case offsetof(struct bpf_sock_ops, rate_interval_us):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(rate_interval_us);
+ break;
+ case offsetof(struct bpf_sock_ops, packets_out):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(packets_out);
+ break;
+ case offsetof(struct bpf_sock_ops, retrans_out):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(retrans_out);
+ break;
+ case offsetof(struct bpf_sock_ops, total_retrans):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(total_retrans);
+ break;
+ case offsetof(struct bpf_sock_ops, segs_in):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(segs_in);
+ break;
+ case offsetof(struct bpf_sock_ops, data_segs_in):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(data_segs_in);
+ break;
+ case offsetof(struct bpf_sock_ops, segs_out):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(segs_out);
+ break;
+ case offsetof(struct bpf_sock_ops, data_segs_out):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(data_segs_out);
+ break;
+ case offsetof(struct bpf_sock_ops, lost_out):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(lost_out);
+ break;
+ case offsetof(struct bpf_sock_ops, sacked_out):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(sacked_out);
+ break;
+ case offsetof(struct bpf_sock_ops, bytes_received):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(bytes_received);
+ break;
+ case offsetof(struct bpf_sock_ops, bytes_acked):
+ SOCK_OPS_GET_TCP_SOCK_FIELD(bytes_acked);
+ break;
case offsetof(struct bpf_sock_ops, sk):
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
struct bpf_sock_ops_kern,
--
2.22.0.410.gd8fdbe21b5-goog
* [PATCH bpf-next v2 3/8] bpf: add dsack_dups/delivered{,_ce} to bpf_tcp_sock
From: Stanislav Fomichev @ 2019-07-02 16:13 UTC (permalink / raw)
To: netdev, bpf
Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh
Add more fields to bpf_tcp_sock that might be useful for debugging
congestion control issues.
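As an example of what these counters enable, a monitor could derive the fraction of deliveries that were CE-marked. A plain C sketch of the arithmetic (ce_percent() is a hypothetical helper; only the field semantics, delivered counting all delivered data packets including rexmits and delivered_ce only the ECE-marked ones, come from the patch):

```c
#include <stdint.h>

/* Percentage of delivered packets that were CE-marked. Widen to 64 bits
 * before multiplying so large counters don't overflow.
 */
static uint32_t ce_percent(uint32_t delivered, uint32_t delivered_ce)
{
	if (delivered == 0)
		return 0;
	return (uint32_t)(((uint64_t)delivered_ce * 100) / delivered);
}
```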
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
include/uapi/linux/bpf.h | 5 +++++
net/core/filter.c | 11 ++++++++++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9cdd0aaeba06..bfb0b1a76684 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3073,6 +3073,11 @@ struct bpf_tcp_sock {
* sum(delta(snd_una)), or how many bytes
* were acked.
*/
+ __u32 dsack_dups; /* RFC4898 tcpEStatsStackDSACKDups
+ * total number of DSACK blocks received
+ */
+ __u32 delivered; /* Total data packets delivered incl. rexmits */
+ __u32 delivered_ce; /* Like the above but only ECE marked packets */
};
struct bpf_sock_tuple {
diff --git a/net/core/filter.c b/net/core/filter.c
index ad908526545d..3da4b6c38b46 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5544,7 +5544,7 @@ static const struct bpf_func_proto bpf_sock_addr_sk_lookup_udp_proto = {
bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type,
struct bpf_insn_access_aux *info)
{
- if (off < 0 || off >= offsetofend(struct bpf_tcp_sock, bytes_acked))
+ if (off < 0 || off >= offsetofend(struct bpf_tcp_sock, delivered_ce))
return false;
if (off % size != 0)
@@ -5652,6 +5652,15 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
case offsetof(struct bpf_tcp_sock, bytes_acked):
BPF_TCP_SOCK_GET_COMMON(bytes_acked);
break;
+ case offsetof(struct bpf_tcp_sock, dsack_dups):
+ BPF_TCP_SOCK_GET_COMMON(dsack_dups);
+ break;
+ case offsetof(struct bpf_tcp_sock, delivered):
+ BPF_TCP_SOCK_GET_COMMON(delivered);
+ break;
+ case offsetof(struct bpf_tcp_sock, delivered_ce):
+ BPF_TCP_SOCK_GET_COMMON(delivered_ce);
+ break;
}
return insn - insn_buf;
--
2.22.0.410.gd8fdbe21b5-goog
* [PATCH bpf-next v2 4/8] bpf: add icsk_retransmits to bpf_tcp_sock
From: Stanislav Fomichev @ 2019-07-02 16:13 UTC (permalink / raw)
To: netdev, bpf
Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh
Add some inet_connection_sock fields to bpf_tcp_sock that might be useful
for debugging congestion control issues.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
include/uapi/linux/bpf.h | 1 +
net/core/filter.c | 20 +++++++++++++++++++-
2 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index bfb0b1a76684..ead27aebf491 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3078,6 +3078,7 @@ struct bpf_tcp_sock {
*/
__u32 delivered; /* Total data packets delivered incl. rexmits */
__u32 delivered_ce; /* Like the above but only ECE marked packets */
+ __u32 icsk_retransmits; /* Number of unrecovered [RTO] timeouts */
};
struct bpf_sock_tuple {
diff --git a/net/core/filter.c b/net/core/filter.c
index 3da4b6c38b46..089aaea0ccc6 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5544,7 +5544,8 @@ static const struct bpf_func_proto bpf_sock_addr_sk_lookup_udp_proto = {
bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type,
struct bpf_insn_access_aux *info)
{
- if (off < 0 || off >= offsetofend(struct bpf_tcp_sock, delivered_ce))
+ if (off < 0 || off >= offsetofend(struct bpf_tcp_sock,
+ icsk_retransmits))
return false;
if (off % size != 0)
@@ -5575,6 +5576,20 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
offsetof(struct tcp_sock, FIELD)); \
} while (0)
+#define BPF_INET_SOCK_GET_COMMON(FIELD) \
+ do { \
+ BUILD_BUG_ON(FIELD_SIZEOF(struct inet_connection_sock, \
+ FIELD) > \
+ FIELD_SIZEOF(struct bpf_tcp_sock, FIELD)); \
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
+ struct inet_connection_sock, \
+ FIELD), \
+ si->dst_reg, si->src_reg, \
+ offsetof( \
+ struct inet_connection_sock, \
+ FIELD)); \
+ } while (0)
+
if (insn > insn_buf)
return insn - insn_buf;
@@ -5661,6 +5676,9 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
case offsetof(struct bpf_tcp_sock, delivered_ce):
BPF_TCP_SOCK_GET_COMMON(delivered_ce);
break;
+ case offsetof(struct bpf_tcp_sock, icsk_retransmits):
+ BPF_INET_SOCK_GET_COMMON(icsk_retransmits);
+ break;
}
return insn - insn_buf;
--
2.22.0.410.gd8fdbe21b5-goog
* [PATCH bpf-next v2 5/8] bpf/tools: sync bpf.h
From: Stanislav Fomichev @ 2019-07-02 16:14 UTC (permalink / raw)
To: netdev, bpf
Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh
Sync new bpf_tcp_sock fields and new BPF_PROG_TYPE_SOCK_OPS RTT callback.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
tools/include/uapi/linux/bpf.h | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index a396b516a2b2..cecf42c871d4 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1767,6 +1767,7 @@ union bpf_attr {
* * **BPF_SOCK_OPS_RTO_CB_FLAG** (retransmission time out)
* * **BPF_SOCK_OPS_RETRANS_CB_FLAG** (retransmission)
* * **BPF_SOCK_OPS_STATE_CB_FLAG** (TCP state change)
+ * * **BPF_SOCK_OPS_RTT_CB_FLAG** (every RTT)
*
* Therefore, this function can be used to clear a callback flag by
* setting the appropriate bit to zero. e.g. to disable the RTO
@@ -3069,6 +3070,12 @@ struct bpf_tcp_sock {
* sum(delta(snd_una)), or how many bytes
* were acked.
*/
+ __u32 dsack_dups; /* RFC4898 tcpEStatsStackDSACKDups
+ * total number of DSACK blocks received
+ */
+ __u32 delivered; /* Total data packets delivered incl. rexmits */
+ __u32 delivered_ce; /* Like the above but only ECE marked packets */
+ __u32 icsk_retransmits; /* Number of unrecovered [RTO] timeouts */
};
struct bpf_sock_tuple {
@@ -3311,7 +3318,8 @@ struct bpf_sock_ops {
#define BPF_SOCK_OPS_RTO_CB_FLAG (1<<0)
#define BPF_SOCK_OPS_RETRANS_CB_FLAG (1<<1)
#define BPF_SOCK_OPS_STATE_CB_FLAG (1<<2)
-#define BPF_SOCK_OPS_ALL_CB_FLAGS 0x7 /* Mask of all currently
+#define BPF_SOCK_OPS_RTT_CB_FLAG (1<<3)
+#define BPF_SOCK_OPS_ALL_CB_FLAGS 0xF /* Mask of all currently
* supported cb flags
*/
@@ -3366,6 +3374,8 @@ enum {
BPF_SOCK_OPS_TCP_LISTEN_CB, /* Called on listen(2), right after
* socket transition to LISTEN state.
*/
+ BPF_SOCK_OPS_RTT_CB, /* Called on every RTT.
+ */
};
/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
--
2.22.0.410.gd8fdbe21b5-goog
* [PATCH bpf-next v2 6/8] selftests/bpf: test BPF_SOCK_OPS_RTT_CB
From: Stanislav Fomichev @ 2019-07-02 16:14 UTC (permalink / raw)
To: netdev, bpf
Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh
Make sure the callback is invoked for the syn-ack and for a data packet.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
tools/testing/selftests/bpf/Makefile | 3 +-
tools/testing/selftests/bpf/progs/tcp_rtt.c | 61 +++++
tools/testing/selftests/bpf/test_tcp_rtt.c | 254 ++++++++++++++++++++
3 files changed, 317 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/progs/tcp_rtt.c
create mode 100644 tools/testing/selftests/bpf/test_tcp_rtt.c
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index de1754a8f5fe..2620406a53ec 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -27,7 +27,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
test_cgroup_storage test_select_reuseport test_section_names \
test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
- test_sockopt_multi
+ test_sockopt_multi test_tcp_rtt
BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
TEST_GEN_FILES = $(BPF_OBJ_FILES)
@@ -107,6 +107,7 @@ $(OUTPUT)/test_cgroup_attach: cgroup_helpers.c
$(OUTPUT)/test_sockopt: cgroup_helpers.c
$(OUTPUT)/test_sockopt_sk: cgroup_helpers.c
$(OUTPUT)/test_sockopt_multi: cgroup_helpers.c
+$(OUTPUT)/test_tcp_rtt: cgroup_helpers.c
.PHONY: force
diff --git a/tools/testing/selftests/bpf/progs/tcp_rtt.c b/tools/testing/selftests/bpf/progs/tcp_rtt.c
new file mode 100644
index 000000000000..233bdcb1659e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tcp_rtt.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
+
+struct tcp_rtt_storage {
+ __u32 invoked;
+ __u32 dsack_dups;
+ __u32 delivered;
+ __u32 delivered_ce;
+ __u32 icsk_retransmits;
+};
+
+struct bpf_map_def SEC("maps") socket_storage_map = {
+ .type = BPF_MAP_TYPE_SK_STORAGE,
+ .key_size = sizeof(int),
+ .value_size = sizeof(struct tcp_rtt_storage),
+ .map_flags = BPF_F_NO_PREALLOC,
+};
+BPF_ANNOTATE_KV_PAIR(socket_storage_map, int, struct tcp_rtt_storage);
+
+SEC("sockops")
+int _sockops(struct bpf_sock_ops *ctx)
+{
+ struct tcp_rtt_storage *storage;
+ struct bpf_tcp_sock *tcp_sk;
+ int op = (int) ctx->op;
+ struct bpf_sock *sk;
+
+ sk = ctx->sk;
+ if (!sk)
+ return 1;
+
+ storage = bpf_sk_storage_get(&socket_storage_map, sk, 0,
+ BPF_SK_STORAGE_GET_F_CREATE);
+ if (!storage)
+ return 1;
+
+ if (op == BPF_SOCK_OPS_TCP_CONNECT_CB) {
+ bpf_sock_ops_cb_flags_set(ctx, BPF_SOCK_OPS_RTT_CB_FLAG);
+ return 1;
+ }
+
+ if (op != BPF_SOCK_OPS_RTT_CB)
+ return 1;
+
+ tcp_sk = bpf_tcp_sock(sk);
+ if (!tcp_sk)
+ return 1;
+
+ storage->invoked++;
+
+ storage->dsack_dups = tcp_sk->dsack_dups;
+ storage->delivered = tcp_sk->delivered;
+ storage->delivered_ce = tcp_sk->delivered_ce;
+ storage->icsk_retransmits = tcp_sk->icsk_retransmits;
+
+ return 1;
+}
diff --git a/tools/testing/selftests/bpf/test_tcp_rtt.c b/tools/testing/selftests/bpf/test_tcp_rtt.c
new file mode 100644
index 000000000000..90c3862f74a8
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tcp_rtt.c
@@ -0,0 +1,254 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <error.h>
+#include <errno.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <pthread.h>
+
+#include <linux/filter.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "bpf_rlimit.h"
+#include "bpf_util.h"
+#include "cgroup_helpers.h"
+
+#define CG_PATH "/tcp_rtt"
+
+struct tcp_rtt_storage {
+ __u32 invoked;
+ __u32 dsack_dups;
+ __u32 delivered;
+ __u32 delivered_ce;
+ __u32 icsk_retransmits;
+};
+
+static void send_byte(int fd)
+{
+ char b = 0x55;
+
+ if (write(fd, &b, sizeof(b)) != 1)
+ error(1, errno, "Failed to send single byte");
+}
+
+static int verify_sk(int map_fd, int client_fd, const char *msg, __u32 invoked,
+ __u32 dsack_dups, __u32 delivered, __u32 delivered_ce,
+ __u32 icsk_retransmits)
+{
+ int err = 0;
+ struct tcp_rtt_storage val;
+
+ if (bpf_map_lookup_elem(map_fd, &client_fd, &val) < 0)
+ error(1, errno, "Failed to read socket storage");
+
+ if (val.invoked != invoked) {
+ log_err("%s: unexpected bpf_tcp_sock.invoked %d != %d",
+ msg, val.invoked, invoked);
+ err++;
+ }
+
+ if (val.dsack_dups != dsack_dups) {
+ log_err("%s: unexpected bpf_tcp_sock.dsack_dups %d != %d",
+ msg, val.dsack_dups, dsack_dups);
+ err++;
+ }
+
+ if (val.delivered != delivered) {
+ log_err("%s: unexpected bpf_tcp_sock.delivered %d != %d",
+ msg, val.delivered, delivered);
+ err++;
+ }
+
+ if (val.delivered_ce != delivered_ce) {
+ log_err("%s: unexpected bpf_tcp_sock.delivered_ce %d != %d",
+ msg, val.delivered_ce, delivered_ce);
+ err++;
+ }
+
+ if (val.icsk_retransmits != icsk_retransmits) {
+ log_err("%s: unexpected bpf_tcp_sock.icsk_retransmits %d != %d",
+ msg, val.icsk_retransmits, icsk_retransmits);
+ err++;
+ }
+
+ return err;
+}
+
+static int connect_to_server(int server_fd)
+{
+ struct sockaddr_storage addr;
+ socklen_t len = sizeof(addr);
+ int fd;
+
+ fd = socket(AF_INET, SOCK_STREAM, 0);
+ if (fd < 0) {
+ log_err("Failed to create client socket");
+ return -1;
+ }
+
+ if (getsockname(server_fd, (struct sockaddr *)&addr, &len)) {
+ log_err("Failed to get server addr");
+ goto out;
+ }
+
+ if (connect(fd, (const struct sockaddr *)&addr, len) < 0) {
+ log_err("Fail to connect to server");
+ goto out;
+ }
+
+ return fd;
+
+out:
+ close(fd);
+ return -1;
+}
+
+static int run_test(int cgroup_fd, int server_fd)
+{
+ struct bpf_prog_load_attr attr = {
+ .prog_type = BPF_PROG_TYPE_SOCK_OPS,
+ .file = "./tcp_rtt.o",
+ .expected_attach_type = BPF_CGROUP_SOCK_OPS,
+ };
+ struct bpf_object *obj;
+ struct bpf_map *map;
+ int client_fd;
+ int prog_fd;
+ int map_fd;
+ int err;
+
+ err = bpf_prog_load_xattr(&attr, &obj, &prog_fd);
+ if (err) {
+ log_err("Failed to load BPF object");
+ return -1;
+ }
+
+ map = bpf_map__next(NULL, obj);
+ map_fd = bpf_map__fd(map);
+
+ err = bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_SOCK_OPS, 0);
+ if (err) {
+ log_err("Failed to attach BPF program");
+ goto close_bpf_object;
+ }
+
+ client_fd = connect_to_server(server_fd);
+ if (client_fd < 0) {
+ err = -1;
+ goto close_bpf_object;
+ }
+
+ err += verify_sk(map_fd, client_fd, "syn-ack",
+ /*invoked=*/1,
+ /*dsack_dups=*/0,
+ /*delivered=*/1,
+ /*delivered_ce=*/0,
+ /*icsk_retransmits=*/0);
+
+ send_byte(client_fd);
+
+ err += verify_sk(map_fd, client_fd, "first payload byte",
+ /*invoked=*/2,
+ /*dsack_dups=*/0,
+ /*delivered=*/2,
+ /*delivered_ce=*/0,
+ /*icsk_retransmits=*/0);
+
+ close(client_fd);
+
+close_bpf_object:
+ bpf_object__close(obj);
+ return err;
+}
+
+static int start_server(void)
+{
+ struct sockaddr_in addr = {
+ .sin_family = AF_INET,
+ .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+ };
+ int fd;
+
+ fd = socket(AF_INET, SOCK_STREAM, 0);
+ if (fd < 0) {
+ log_err("Failed to create server socket");
+ return -1;
+ }
+
+ if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
+ log_err("Failed to bind socket");
+ close(fd);
+ return -1;
+ }
+
+ return fd;
+}
+
+static void *server_thread(void *arg)
+{
+ struct sockaddr_storage addr;
+ socklen_t len = sizeof(addr);
+ int fd = *(int *)arg;
+ int client_fd;
+
+ if (listen(fd, 1) < 0)
+ error(1, errno, "Failed to listen on socket");
+
+ client_fd = accept(fd, (struct sockaddr *)&addr, &len);
+ if (client_fd < 0)
+ error(1, errno, "Failed to accept client");
+
+ /* Wait for the next connection (that never arrives)
+ * to keep this thread alive to prevent calling
+ * close() on client_fd.
+ */
+ if (accept(fd, (struct sockaddr *)&addr, &len) >= 0)
+ error(1, errno, "Unexpected success in second accept");
+
+ close(client_fd);
+
+ return NULL;
+}
+
+int main(int argc, char **argv)
+{
+ int server_fd, cgroup_fd;
+ int err = EXIT_SUCCESS;
+ pthread_t tid;
+
+ if (setup_cgroup_environment()) {
+ err = EXIT_FAILURE;
+ goto cleanup_obj;
+ }
+
+ cgroup_fd = create_and_get_cgroup(CG_PATH);
+ if (cgroup_fd < 0) {
+ err = EXIT_FAILURE;
+ goto cleanup_cgroup_env;
+ }
+
+ if (join_cgroup(CG_PATH)) {
+ err = EXIT_FAILURE;
+ goto cleanup_cgroup;
+ }
+
+ server_fd = start_server();
+ if (server_fd < 0) {
+ err = EXIT_FAILURE;
+ goto cleanup_cgroup;
+ }
+
+ pthread_create(&tid, NULL, server_thread, (void *)&server_fd);
+
+ if (run_test(cgroup_fd, server_fd))
+ err = EXIT_FAILURE;
+
+ close(server_fd);
+
+ printf("test_tcp_rtt: %s\n",
+ err == EXIT_SUCCESS ? "PASSED" : "FAILED");
+
+cleanup_cgroup:
+ close(cgroup_fd);
+cleanup_cgroup_env:
+ cleanup_cgroup_environment();
+cleanup_obj:
+ return err;
+}
--
2.22.0.410.gd8fdbe21b5-goog
* [PATCH bpf-next v2 7/8] samples/bpf: add sample program that periodically dumps TCP stats
2019-07-02 16:13 [PATCH bpf-next v2 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
` (5 preceding siblings ...)
2019-07-02 16:14 ` [PATCH bpf-next v2 6/8] selftests/bpf: test BPF_SOCK_OPS_RTT_CB Stanislav Fomichev
@ 2019-07-02 16:14 ` Stanislav Fomichev
2019-07-02 16:14 ` [PATCH bpf-next v2 8/8] samples/bpf: fix tcp_bpf.readme detach command Stanislav Fomichev
` (2 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Stanislav Fomichev @ 2019-07-02 16:14 UTC (permalink / raw)
To: netdev, bpf
Cc: davem, ast, daniel, Stanislav Fomichev, Eric Dumazet,
Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh
Uses the new RTT callback to dump TCP stats once per second.
$ mkdir -p /tmp/cgroupv2
$ mount -t cgroup2 none /tmp/cgroupv2
$ mkdir -p /tmp/cgroupv2/foo
$ echo $$ >> /tmp/cgroupv2/foo/cgroup.procs
$ bpftool prog load ./tcp_dumpstats_kern.o /sys/fs/bpf/tcp_prog
$ bpftool cgroup attach /tmp/cgroupv2/foo sock_ops pinned /sys/fs/bpf/tcp_prog
$ bpftool prog tracelog
$ # run neper/netperf/etc
Used neper to compare performance with and without this program attached
and didn't see any noticeable performance impact.
Sample output:
<idle>-0 [015] ..s. 2074.128800: 0: dsack_dups=0 delivered=242526
<idle>-0 [015] ..s. 2074.128808: 0: delivered_ce=0 icsk_retransmits=0
<idle>-0 [015] ..s. 2075.130133: 0: dsack_dups=0 delivered=323599
<idle>-0 [015] ..s. 2075.130138: 0: delivered_ce=0 icsk_retransmits=0
<idle>-0 [005] .Ns. 2076.131440: 0: dsack_dups=0 delivered=404648
<idle>-0 [005] .Ns. 2076.131447: 0: delivered_ce=0 icsk_retransmits=0
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
samples/bpf/Makefile | 1 +
samples/bpf/tcp_dumpstats_kern.c | 68 ++++++++++++++++++++++++++++++++
2 files changed, 69 insertions(+)
create mode 100644 samples/bpf/tcp_dumpstats_kern.c
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 0917f8cf4fab..eaebbeead42f 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -154,6 +154,7 @@ always += tcp_iw_kern.o
always += tcp_clamp_kern.o
always += tcp_basertt_kern.o
always += tcp_tos_reflect_kern.o
+always += tcp_dumpstats_kern.o
always += xdp_redirect_kern.o
always += xdp_redirect_map_kern.o
always += xdp_redirect_cpu_kern.o
diff --git a/samples/bpf/tcp_dumpstats_kern.c b/samples/bpf/tcp_dumpstats_kern.c
new file mode 100644
index 000000000000..8557913106a0
--- /dev/null
+++ b/samples/bpf/tcp_dumpstats_kern.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Refer to samples/bpf/tcp_bpf.readme for the instructions on
+ * how to run this sample program.
+ */
+#include <linux/bpf.h>
+
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define INTERVAL 1000000000ULL
+
+int _version SEC("version") = 1;
+char _license[] SEC("license") = "GPL";
+
+struct {
+ __u32 type;
+ __u32 map_flags;
+ int *key;
+ __u64 *value;
+} bpf_next_dump SEC(".maps") = {
+ .type = BPF_MAP_TYPE_SK_STORAGE,
+ .map_flags = BPF_F_NO_PREALLOC,
+};
+
+SEC("sockops")
+int _sockops(struct bpf_sock_ops *ctx)
+{
+ struct bpf_tcp_sock *tcp_sk;
+ struct bpf_sock *sk;
+ __u64 *next_dump;
+ __u64 now;
+
+ switch (ctx->op) {
+ case BPF_SOCK_OPS_TCP_CONNECT_CB:
+ bpf_sock_ops_cb_flags_set(ctx, BPF_SOCK_OPS_RTT_CB_FLAG);
+ return 1;
+ case BPF_SOCK_OPS_RTT_CB:
+ break;
+ default:
+ return 1;
+ }
+
+ sk = ctx->sk;
+ if (!sk)
+ return 1;
+
+ next_dump = bpf_sk_storage_get(&bpf_next_dump, sk, 0,
+ BPF_SK_STORAGE_GET_F_CREATE);
+ if (!next_dump)
+ return 1;
+
+ now = bpf_ktime_get_ns();
+ if (now < *next_dump)
+ return 1;
+
+ tcp_sk = bpf_tcp_sock(sk);
+ if (!tcp_sk)
+ return 1;
+
+ *next_dump = now + INTERVAL;
+
+ bpf_printk("dsack_dups=%u delivered=%u\n",
+ tcp_sk->dsack_dups, tcp_sk->delivered);
+ bpf_printk("delivered_ce=%u icsk_retransmits=%u\n",
+ tcp_sk->delivered_ce, tcp_sk->icsk_retransmits);
+
+ return 1;
+}
--
2.22.0.410.gd8fdbe21b5-goog
* [PATCH bpf-next v2 8/8] samples/bpf: fix tcp_bpf.readme detach command
2019-07-02 16:13 [PATCH bpf-next v2 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
` (6 preceding siblings ...)
2019-07-02 16:14 ` [PATCH bpf-next v2 7/8] samples/bpf: add sample program that periodically dumps TCP stats Stanislav Fomichev
@ 2019-07-02 16:14 ` Stanislav Fomichev
2019-07-02 17:55 ` [PATCH bpf-next v2 0/8] bpf: TCP RTT sock_ops bpf callback Y Song
2019-07-03 15:04 ` Daniel Borkmann
9 siblings, 0 replies; 14+ messages in thread
From: Stanislav Fomichev @ 2019-07-02 16:14 UTC (permalink / raw)
To: netdev, bpf
Cc: davem, ast, daniel, Stanislav Fomichev, Soheil Hassas Yeganeh,
Yuchung Cheng
Copy-paste error: the detach example used "attach"; it should be "detach".
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
---
samples/bpf/tcp_bpf.readme | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/samples/bpf/tcp_bpf.readme b/samples/bpf/tcp_bpf.readme
index fee746621aec..78e247f62108 100644
--- a/samples/bpf/tcp_bpf.readme
+++ b/samples/bpf/tcp_bpf.readme
@@ -25,4 +25,4 @@ attached to the cgroupv2).
To remove (unattach) a socket_ops BPF program from a cgroupv2:
- bpftool cgroup attach /tmp/cgroupv2/foo sock_ops pinned /sys/fs/bpf/tcp_prog
+ bpftool cgroup detach /tmp/cgroupv2/foo sock_ops pinned /sys/fs/bpf/tcp_prog
--
2.22.0.410.gd8fdbe21b5-goog
* Re: [PATCH bpf-next v2 0/8] bpf: TCP RTT sock_ops bpf callback
2019-07-02 16:13 [PATCH bpf-next v2 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
` (7 preceding siblings ...)
2019-07-02 16:14 ` [PATCH bpf-next v2 8/8] samples/bpf: fix tcp_bpf.readme detach command Stanislav Fomichev
@ 2019-07-02 17:55 ` Y Song
2019-07-03 3:04 ` Lawrence Brakmo
2019-07-03 15:04 ` Daniel Borkmann
9 siblings, 1 reply; 14+ messages in thread
From: Y Song @ 2019-07-02 17:55 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: netdev, bpf, David Miller, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, Priyaranjan Jha, Yuchung Cheng,
Soheil Hassas Yeganeh
On Tue, Jul 2, 2019 at 9:14 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> Congestion control team would like to have a periodic callback to
> track some TCP statistics. Let's add a sock_ops callback that can be
> selectively enabled on a socket by socket basis and is executed for
> every RTT. BPF program frequency can be further controlled by calling
> bpf_ktime_get_ns and bailing out early.
>
> I run neper tcp_stream and tcp_rr tests with the sample program
> from the last patch and didn't observe any noticeable performance
> difference.
>
> v2:
> * add a comment about second accept() in selftest (Yonghong Song)
> * refer to tcp_bpf.readme in sample program (Yonghong Song)
>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Priyaranjan Jha <priyarjha@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
Ack for the whole series.
Acked-by: Yonghong Song <yhs@fb.com>
>
> Stanislav Fomichev (8):
> bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT
> bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation
> bpf: add dsack_dups/delivered{,_ce} to bpf_tcp_sock
> bpf: add icsk_retransmits to bpf_tcp_sock
> bpf/tools: sync bpf.h
> selftests/bpf: test BPF_SOCK_OPS_RTT_CB
> samples/bpf: add sample program that periodically dumps TCP stats
> samples/bpf: fix tcp_bpf.readme detach command
>
> include/net/tcp.h | 8 +
> include/uapi/linux/bpf.h | 12 +-
> net/core/filter.c | 207 +++++++++++-----
> net/ipv4/tcp_input.c | 4 +
> samples/bpf/Makefile | 1 +
> samples/bpf/tcp_bpf.readme | 2 +-
> samples/bpf/tcp_dumpstats_kern.c | 68 ++++++
> tools/include/uapi/linux/bpf.h | 12 +-
> tools/testing/selftests/bpf/Makefile | 3 +-
> tools/testing/selftests/bpf/progs/tcp_rtt.c | 61 +++++
> tools/testing/selftests/bpf/test_tcp_rtt.c | 254 ++++++++++++++++++++
> 11 files changed, 574 insertions(+), 58 deletions(-)
> create mode 100644 samples/bpf/tcp_dumpstats_kern.c
> create mode 100644 tools/testing/selftests/bpf/progs/tcp_rtt.c
> create mode 100644 tools/testing/selftests/bpf/test_tcp_rtt.c
>
> --
> 2.22.0.410.gd8fdbe21b5-goog
* Re: [PATCH bpf-next v2 0/8] bpf: TCP RTT sock_ops bpf callback
2019-07-02 17:55 ` [PATCH bpf-next v2 0/8] bpf: TCP RTT sock_ops bpf callback Y Song
@ 2019-07-03 3:04 ` Lawrence Brakmo
0 siblings, 0 replies; 14+ messages in thread
From: Lawrence Brakmo @ 2019-07-03 3:04 UTC (permalink / raw)
To: Y Song, Stanislav Fomichev
Cc: netdev, bpf, David Miller, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, Priyaranjan Jha, Yuchung Cheng,
Soheil Hassas Yeganeh
On 7/2/19, 10:55 AM, "netdev-owner@vger.kernel.org on behalf of Y Song" <netdev-owner@vger.kernel.org on behalf of ys114321@gmail.com> wrote:
On Tue, Jul 2, 2019 at 9:14 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> Congestion control team would like to have a periodic callback to
> track some TCP statistics. Let's add a sock_ops callback that can be
> selectively enabled on a socket by socket basis and is executed for
> every RTT. BPF program frequency can be further controlled by calling
> bpf_ktime_get_ns and bailing out early.
>
> I run neper tcp_stream and tcp_rr tests with the sample program
> from the last patch and didn't observe any noticeable performance
> difference.
>
> v2:
> * add a comment about second accept() in selftest (Yonghong Song)
> * refer to tcp_bpf.readme in sample program (Yonghong Song)
>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Priyaranjan Jha <priyarjha@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
Ack for the whole series.
Acked-by: Yonghong Song <yhs@fb.com>
Thanks, this is a very nice feature!
Ack for the whole series.
Acked-by: Lawrence Brakmo <brakmo@fb.com>
>
> Stanislav Fomichev (8):
> bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT
> bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation
> bpf: add dsack_dups/delivered{,_ce} to bpf_tcp_sock
> bpf: add icsk_retransmits to bpf_tcp_sock
> bpf/tools: sync bpf.h
> selftests/bpf: test BPF_SOCK_OPS_RTT_CB
> samples/bpf: add sample program that periodically dumps TCP stats
> samples/bpf: fix tcp_bpf.readme detach command
>
> include/net/tcp.h | 8 +
> include/uapi/linux/bpf.h | 12 +-
> net/core/filter.c | 207 +++++++++++-----
> net/ipv4/tcp_input.c | 4 +
> samples/bpf/Makefile | 1 +
> samples/bpf/tcp_bpf.readme | 2 +-
> samples/bpf/tcp_dumpstats_kern.c | 68 ++++++
> tools/include/uapi/linux/bpf.h | 12 +-
> tools/testing/selftests/bpf/Makefile | 3 +-
> tools/testing/selftests/bpf/progs/tcp_rtt.c | 61 +++++
> tools/testing/selftests/bpf/test_tcp_rtt.c | 254 ++++++++++++++++++++
> 11 files changed, 574 insertions(+), 58 deletions(-)
> create mode 100644 samples/bpf/tcp_dumpstats_kern.c
> create mode 100644 tools/testing/selftests/bpf/progs/tcp_rtt.c
> create mode 100644 tools/testing/selftests/bpf/test_tcp_rtt.c
>
> --
> 2.22.0.410.gd8fdbe21b5-goog
* Re: [PATCH bpf-next v2 0/8] bpf: TCP RTT sock_ops bpf callback
2019-07-02 16:13 [PATCH bpf-next v2 0/8] bpf: TCP RTT sock_ops bpf callback Stanislav Fomichev
` (8 preceding siblings ...)
2019-07-02 17:55 ` [PATCH bpf-next v2 0/8] bpf: TCP RTT sock_ops bpf callback Y Song
@ 2019-07-03 15:04 ` Daniel Borkmann
9 siblings, 0 replies; 14+ messages in thread
From: Daniel Borkmann @ 2019-07-03 15:04 UTC (permalink / raw)
To: Stanislav Fomichev, netdev, bpf
Cc: davem, ast, Eric Dumazet, Priyaranjan Jha, Yuchung Cheng,
Soheil Hassas Yeganeh
On 07/02/2019 06:13 PM, Stanislav Fomichev wrote:
> Congestion control team would like to have a periodic callback to
> track some TCP statistics. Let's add a sock_ops callback that can be
> selectively enabled on a socket by socket basis and is executed for
> every RTT. BPF program frequency can be further controlled by calling
> bpf_ktime_get_ns and bailing out early.
>
> I run neper tcp_stream and tcp_rr tests with the sample program
> from the last patch and didn't observe any noticeable performance
> difference.
>
> v2:
> * add a comment about second accept() in selftest (Yonghong Song)
> * refer to tcp_bpf.readme in sample program (Yonghong Song)
>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Priyaranjan Jha <priyarjha@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
>
> Stanislav Fomichev (8):
> bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT
> bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation
> bpf: add dsack_dups/delivered{,_ce} to bpf_tcp_sock
> bpf: add icsk_retransmits to bpf_tcp_sock
> bpf/tools: sync bpf.h
> selftests/bpf: test BPF_SOCK_OPS_RTT_CB
> samples/bpf: add sample program that periodically dumps TCP stats
> samples/bpf: fix tcp_bpf.readme detach command
>
> include/net/tcp.h | 8 +
> include/uapi/linux/bpf.h | 12 +-
> net/core/filter.c | 207 +++++++++++-----
> net/ipv4/tcp_input.c | 4 +
> samples/bpf/Makefile | 1 +
> samples/bpf/tcp_bpf.readme | 2 +-
> samples/bpf/tcp_dumpstats_kern.c | 68 ++++++
> tools/include/uapi/linux/bpf.h | 12 +-
> tools/testing/selftests/bpf/Makefile | 3 +-
> tools/testing/selftests/bpf/progs/tcp_rtt.c | 61 +++++
> tools/testing/selftests/bpf/test_tcp_rtt.c | 254 ++++++++++++++++++++
> 11 files changed, 574 insertions(+), 58 deletions(-)
> create mode 100644 samples/bpf/tcp_dumpstats_kern.c
> create mode 100644 tools/testing/selftests/bpf/progs/tcp_rtt.c
> create mode 100644 tools/testing/selftests/bpf/test_tcp_rtt.c
>
Applied, thanks!
* Re: [PATCH bpf-next v2 6/8] selftests/bpf: test BPF_SOCK_OPS_RTT_CB
2019-07-02 16:14 ` [PATCH bpf-next v2 6/8] selftests/bpf: test BPF_SOCK_OPS_RTT_CB Stanislav Fomichev
@ 2019-07-03 19:01 ` Andrii Nakryiko
2019-07-03 19:54 ` Stanislav Fomichev
0 siblings, 1 reply; 14+ messages in thread
From: Andrii Nakryiko @ 2019-07-03 19:01 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Networking, bpf, David S. Miller, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, Priyaranjan Jha, Yuchung Cheng,
Soheil Hassas Yeganeh
On Tue, Jul 2, 2019 at 9:14 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> Make sure the callback is invoked for syn-ack and data packet.
>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Priyaranjan Jha <priyarjha@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
> tools/testing/selftests/bpf/Makefile | 3 +-
> tools/testing/selftests/bpf/progs/tcp_rtt.c | 61 +++++
> tools/testing/selftests/bpf/test_tcp_rtt.c | 254 ++++++++++++++++++++
> 3 files changed, 317 insertions(+), 1 deletion(-)
> create mode 100644 tools/testing/selftests/bpf/progs/tcp_rtt.c
> create mode 100644 tools/testing/selftests/bpf/test_tcp_rtt.c
Can you please post a follow-up patch to add test_tcp_rtt to .gitignore?
* Re: [PATCH bpf-next v2 6/8] selftests/bpf: test BPF_SOCK_OPS_RTT_CB
2019-07-03 19:01 ` Andrii Nakryiko
@ 2019-07-03 19:54 ` Stanislav Fomichev
0 siblings, 0 replies; 14+ messages in thread
From: Stanislav Fomichev @ 2019-07-03 19:54 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Stanislav Fomichev, Networking, bpf, David S. Miller,
Alexei Starovoitov, Daniel Borkmann, Eric Dumazet,
Priyaranjan Jha, Yuchung Cheng, Soheil Hassas Yeganeh
On 07/03, Andrii Nakryiko wrote:
> On Tue, Jul 2, 2019 at 9:14 AM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > Make sure the callback is invoked for syn-ack and data packet.
> >
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Priyaranjan Jha <priyarjha@google.com>
> > Cc: Yuchung Cheng <ycheng@google.com>
> > Cc: Soheil Hassas Yeganeh <soheil@google.com>
> > Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> > Acked-by: Yuchung Cheng <ycheng@google.com>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> > tools/testing/selftests/bpf/Makefile | 3 +-
> > tools/testing/selftests/bpf/progs/tcp_rtt.c | 61 +++++
> > tools/testing/selftests/bpf/test_tcp_rtt.c | 254 ++++++++++++++++++++
> > 3 files changed, 317 insertions(+), 1 deletion(-)
> > create mode 100644 tools/testing/selftests/bpf/progs/tcp_rtt.c
> > create mode 100644 tools/testing/selftests/bpf/test_tcp_rtt.c
>
> Can you please post a follow-up patch to add test_tcp_rtt to .gitignore?
Sure, will do, thanks for a report!