* [PATCH v2 bpf-next 0/7] Add __sk_buff->sk, bpf_tcp_sock, BPF_FUNC_tcp_sock and BPF_FUNC_sk_fullsock
From: Martin KaFai Lau @ 2019-02-10 7:22 UTC
To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Lawrence Brakmo
This series adds __sk_buff->sk, "struct bpf_tcp_sock",
BPF_FUNC_sk_fullsock and BPF_FUNC_tcp_sock. Together, they provide
a common way to expose the members of "struct tcp_sock" and
"struct bpf_sock" for the bpf_prog to access.
The patch series first adds a bpf_sock pointer to __sk_buff
and a new helper BPF_FUNC_sk_fullsock.
It then adds BPF_FUNC_tcp_sock to get a bpf_tcp_sock
pointer from a bpf_sock pointer.
The current use case is to allow a cg_skb_bpf_prog to provide
per-cgroup traffic policing/shaping.
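As a rough illustration only (a sketch with a made-up cwnd limit,
not code from this series), such a cg_skb policing program could
look like:

	int cg_skb_policing(struct __sk_buff *skb)
	{
		struct bpf_tcp_sock *tp;
		struct bpf_sock *sk;

		sk = skb->sk;
		if (!sk)
			return 1;
		sk = bpf_sk_fullsock(sk);
		if (!sk || sk->protocol != IPPROTO_TCP)
			return 1;
		tp = bpf_tcp_sock(sk);
		if (!tp)
			return 1;
		/* hypothetical policy: drop (return 0) when the
		 * sending window exceeds a per-cgroup limit
		 */
		return tp->snd_cwnd > 100 ? 0 : 1;
	}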
Please see the individual patches for details.
v2:
- Patch 1 depends on
commit d623876646be ("bpf: Fix narrow load on a bpf_sock returned from sk_lookup()")
in the bpf branch.
- Add sk_to_full_sk() to bpf_sk_fullsock() and bpf_tcp_sock()
such that there is a way to access the listener's sk and tcp_sk
when __sk_buff->sk is a request_sock.
The comments in the uapi bpf.h are updated accordingly.
- bpf_ctx_range_till() is used in bpf_sock_common_is_valid_access()
in patch 1. Saved a few lines.
- Patch 2 is new in v2 and it adds "state", "dst_ip4", "dst_ip6" and
"dst_port" to the bpf_sock. Narrow load is allowed on them.
The "state" (i.e. sk_state) has already been used in
INET_DIAG (e.g. ss -t) and getsockopt(TCP_INFO).
- While at it in the new patch 2, also allow narrow loads on some
existing fields of the bpf_sock: "family", "type", "protocol"
and "src_port". Only loads starting from the first byte are allowed
for now, i.e. a narrow load starting from the 2nd byte is not allowed.
- Add some narrow load tests to the test_verifier's sock.c
Martin KaFai Lau (7):
bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper
bpf: Add state, dst_ip4, dst_ip6 and dst_port to bpf_sock
bpf: Refactor sock_ops_convert_ctx_access
bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock
bpf: Sync bpf.h to tools/
bpf: Add skb->sk, bpf_sk_fullsock and bpf_tcp_sock tests to
test_verifier
bpf: Add test_sock_fields for skb->sk and bpf_tcp_sock
include/linux/bpf.h | 42 ++
include/uapi/linux/bpf.h | 72 ++-
kernel/bpf/verifier.c | 159 ++++--
net/core/filter.c | 495 +++++++++++-------
tools/include/uapi/linux/bpf.h | 72 ++-
tools/testing/selftests/bpf/Makefile | 6 +-
tools/testing/selftests/bpf/bpf_helpers.h | 4 +
tools/testing/selftests/bpf/bpf_util.h | 9 +
.../testing/selftests/bpf/test_sock_fields.c | 327 ++++++++++++
.../selftests/bpf/test_sock_fields_kern.c | 152 ++++++
.../selftests/bpf/verifier/ref_tracking.c | 4 +-
tools/testing/selftests/bpf/verifier/sock.c | 384 ++++++++++++++
tools/testing/selftests/bpf/verifier/unpriv.c | 2 +-
13 files changed, 1493 insertions(+), 235 deletions(-)
create mode 100644 tools/testing/selftests/bpf/test_sock_fields.c
create mode 100644 tools/testing/selftests/bpf/test_sock_fields_kern.c
create mode 100644 tools/testing/selftests/bpf/verifier/sock.c
--
2.17.1
* [PATCH v2 bpf-next 1/7] bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper
From: Martin KaFai Lau @ 2019-02-10 7:22 UTC
To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Lawrence Brakmo
In the kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)"
before accessing the fields in the sock. For example, in __netdev_pick_tx:
static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
struct net_device *sb_dev)
{
/* ... */
struct sock *sk = skb->sk;
if (queue_index != new_index && sk &&
sk_fullsock(sk) &&
rcu_access_pointer(sk->sk_dst_cache))
sk_tx_queue_set(sk, new_index);
/* ... */
return queue_index;
}
This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff"
where a few of the convert_ctx_access() in filter.c has already been
accessing the skb->sk sock_common's fields,
e.g. sock_ops_convert_ctx_access().
"__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
Some of the fields in "bpf_sock" will not be directly
accessible through the "__sk_buff->sk" pointer. Access is limited
by the new "bpf_sock_common_is_valid_access()",
e.g. the existing "type", "protocol", "mark" and "priority" in bpf_sock
are not allowed.
The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
can be used to get a sk with all accessible fields in "bpf_sock".
This helper is added to both cg_skb and sched_(cls|act).
int cg_skb_foo(struct __sk_buff *skb) {
struct bpf_sock *sk;
sk = skb->sk;
if (!sk)
return 1;
sk = bpf_sk_fullsock(sk);
if (!sk)
return 1;
if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
return 1;
/* some_traffic_shaping(); */
return 1;
}
(1) The sk is read only
(2) There is no new "struct bpf_sock_common" introduced.
(3) Future kernel sock members can be added to bpf_sock only,
instead of being repeatedly added at multiple places as is currently
done in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc.
(4) After "sk = skb->sk", the reg holding sk is in type
PTR_TO_SOCK_COMMON_OR_NULL.
(5) After bpf_sk_fullsock(), the return type will be in type
PTR_TO_SOCKET_OR_NULL which is the same as the return type of
bpf_sk_lookup_xxx().
However, bpf_sk_fullsock() does not take a refcnt. The
acquire_reference_state() currently depends only on the return type.
To avoid acquiring a reference here, a new is_acquire_function() is
checked before calling acquire_reference_state().
(6) Hitting the WARN_ON in "release_reference_state()" no longer
indicates an internal verifier bug.
When reg->id is not found in state->refs[], it means the
bpf_prog does something wrong like
"bpf_sk_release(bpf_sk_fullsock(skb->sk))" where a reference has
never been acquired by calling "bpf_sk_fullsock(skb->sk)".
-EINVAL is returned and a verbose message is printed instead of the
WARN_ON. A test is added to the test_verifier in a later patch.
Since the WARN_ON in "release_reference_state()" is no longer
needed, "__release_reference_state()" is also folded into
"release_reference_state()".
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
include/linux/bpf.h | 12 ++++
include/uapi/linux/bpf.h | 12 +++-
kernel/bpf/verifier.c | 132 +++++++++++++++++++++++++++------------
net/core/filter.c | 42 +++++++++++++
4 files changed, 157 insertions(+), 41 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bd169a7bcc93..a60463b45b54 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -194,6 +194,7 @@ enum bpf_arg_type {
ARG_ANYTHING, /* any (initialized) argument is ok */
ARG_PTR_TO_SOCKET, /* pointer to bpf_sock */
ARG_PTR_TO_SPIN_LOCK, /* pointer to bpf_spin_lock */
+ ARG_PTR_TO_SOCK_COMMON, /* pointer to sock_common */
};
/* type of values returned from helper functions */
@@ -256,6 +257,8 @@ enum bpf_reg_type {
PTR_TO_FLOW_KEYS, /* reg points to bpf_flow_keys */
PTR_TO_SOCKET, /* reg points to struct bpf_sock */
PTR_TO_SOCKET_OR_NULL, /* reg points to struct bpf_sock or NULL */
+ PTR_TO_SOCK_COMMON, /* reg points to sock_common */
+ PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
};
/* The information passed from prog-specific *_is_valid_access
@@ -920,6 +923,9 @@ void bpf_user_rnd_init_once(void);
u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
#if defined(CONFIG_NET)
+bool bpf_sock_common_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ struct bpf_insn_access_aux *info);
bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
struct bpf_insn_access_aux *info);
u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
@@ -928,6 +934,12 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
struct bpf_prog *prog,
u32 *target_size);
#else
+static inline bool bpf_sock_common_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ struct bpf_insn_access_aux *info)
+{
+ return false;
+}
static inline bool bpf_sock_is_valid_access(int off, int size,
enum bpf_access_type type,
struct bpf_insn_access_aux *info)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1777fa0c61e4..5d79cba74ddc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2329,6 +2329,14 @@ union bpf_attr {
* "**y**".
* Return
* 0
+ *
+ * struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)
+ * Description
+ * This helper gets a **struct bpf_sock** pointer such
+ * that all the fields in bpf_sock can be accessed.
+ * Return
+ * A **struct bpf_sock** pointer on success, or NULL in
+ * case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -2425,7 +2433,8 @@ union bpf_attr {
FN(msg_pop_data), \
FN(rc_pointer_rel), \
FN(spin_lock), \
- FN(spin_unlock),
+ FN(spin_unlock), \
+ FN(sk_fullsock),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@@ -2545,6 +2554,7 @@ struct __sk_buff {
__u64 tstamp;
__u32 wire_len;
__u32 gso_segs;
+ __bpf_md_ptr(struct bpf_sock *, sk);
};
struct bpf_tunnel_key {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 516dfc6d78de..b755d55a3791 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -331,10 +331,17 @@ static bool type_is_pkt_pointer(enum bpf_reg_type type)
type == PTR_TO_PACKET_META;
}
+static bool type_is_sk_pointer(enum bpf_reg_type type)
+{
+ return type == PTR_TO_SOCKET ||
+ type == PTR_TO_SOCK_COMMON;
+}
+
static bool reg_type_may_be_null(enum bpf_reg_type type)
{
return type == PTR_TO_MAP_VALUE_OR_NULL ||
- type == PTR_TO_SOCKET_OR_NULL;
+ type == PTR_TO_SOCKET_OR_NULL ||
+ type == PTR_TO_SOCK_COMMON_OR_NULL;
}
static bool type_is_refcounted(enum bpf_reg_type type)
@@ -377,6 +384,12 @@ static bool is_release_function(enum bpf_func_id func_id)
return func_id == BPF_FUNC_sk_release;
}
+static bool is_acquire_function(enum bpf_func_id func_id)
+{
+ return func_id == BPF_FUNC_sk_lookup_tcp ||
+ func_id == BPF_FUNC_sk_lookup_udp;
+}
+
/* string representation of 'enum bpf_reg_type' */
static const char * const reg_type_str[] = {
[NOT_INIT] = "?",
@@ -392,6 +405,8 @@ static const char * const reg_type_str[] = {
[PTR_TO_FLOW_KEYS] = "flow_keys",
[PTR_TO_SOCKET] = "sock",
[PTR_TO_SOCKET_OR_NULL] = "sock_or_null",
+ [PTR_TO_SOCK_COMMON] = "sock_common",
+ [PTR_TO_SOCK_COMMON_OR_NULL] = "sock_common_or_null",
};
static char slot_type_char[] = {
@@ -618,13 +633,10 @@ static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
}
/* release function corresponding to acquire_reference_state(). Idempotent. */
-static int __release_reference_state(struct bpf_func_state *state, int ptr_id)
+static int release_reference_state(struct bpf_func_state *state, int ptr_id)
{
int i, last_idx;
- if (!ptr_id)
- return -EFAULT;
-
last_idx = state->acquired_refs - 1;
for (i = 0; i < state->acquired_refs; i++) {
if (state->refs[i].id == ptr_id) {
@@ -636,21 +648,7 @@ static int __release_reference_state(struct bpf_func_state *state, int ptr_id)
return 0;
}
}
- return -EFAULT;
-}
-
-/* variation on the above for cases where we expect that there must be an
- * outstanding reference for the specified ptr_id.
- */
-static int release_reference_state(struct bpf_verifier_env *env, int ptr_id)
-{
- struct bpf_func_state *state = cur_func(env);
- int err;
-
- err = __release_reference_state(state, ptr_id);
- if (WARN_ON_ONCE(err != 0))
- verbose(env, "verifier internal error: can't release reference\n");
- return err;
+ return -EINVAL;
}
static int transfer_reference_state(struct bpf_func_state *dst,
@@ -1209,6 +1207,8 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
case CONST_PTR_TO_MAP:
case PTR_TO_SOCKET:
case PTR_TO_SOCKET_OR_NULL:
+ case PTR_TO_SOCK_COMMON:
+ case PTR_TO_SOCK_COMMON_OR_NULL:
return true;
default:
return false;
@@ -1647,6 +1647,7 @@ static int check_sock_access(struct bpf_verifier_env *env, int insn_idx,
struct bpf_reg_state *regs = cur_regs(env);
struct bpf_reg_state *reg = &regs[regno];
struct bpf_insn_access_aux info = {};
+ bool valid;
if (reg->smin_value < 0) {
verbose(env, "R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n",
@@ -1654,15 +1655,28 @@ static int check_sock_access(struct bpf_verifier_env *env, int insn_idx,
return -EACCES;
}
- if (!bpf_sock_is_valid_access(off, size, t, &info)) {
- verbose(env, "invalid bpf_sock access off=%d size=%d\n",
- off, size);
- return -EACCES;
+ switch (reg->type) {
+ case PTR_TO_SOCK_COMMON:
+ valid = bpf_sock_common_is_valid_access(off, size, t, &info);
+ break;
+ case PTR_TO_SOCKET:
+ valid = bpf_sock_is_valid_access(off, size, t, &info);
+ break;
+ default:
+ valid = false;
}
- env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
- return 0;
+ if (valid) {
+ env->insn_aux_data[insn_idx].ctx_field_size =
+ info.ctx_field_size;
+ return 0;
+ }
+
+ verbose(env, "R%d invalid %s access off=%d size=%d\n",
+ regno, reg_type_str[reg->type], off, size);
+
+ return -EACCES;
}
static bool __is_pointer_value(bool allow_ptr_leaks,
@@ -1688,8 +1702,14 @@ static bool is_ctx_reg(struct bpf_verifier_env *env, int regno)
{
const struct bpf_reg_state *reg = reg_state(env, regno);
- return reg->type == PTR_TO_CTX ||
- reg->type == PTR_TO_SOCKET;
+ return reg->type == PTR_TO_CTX;
+}
+
+static bool is_sk_reg(struct bpf_verifier_env *env, int regno)
+{
+ const struct bpf_reg_state *reg = reg_state(env, regno);
+
+ return type_is_sk_pointer(reg->type);
}
static bool is_pkt_reg(struct bpf_verifier_env *env, int regno)
@@ -1800,6 +1820,9 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,
case PTR_TO_SOCKET:
pointer_desc = "sock ";
break;
+ case PTR_TO_SOCK_COMMON:
+ pointer_desc = "sock_common ";
+ break;
default:
break;
}
@@ -2003,11 +2026,14 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
* PTR_TO_PACKET[_META,_END]. In the latter
* case, we know the offset is zero.
*/
- if (reg_type == SCALAR_VALUE)
+ if (reg_type == SCALAR_VALUE) {
mark_reg_unknown(env, regs, value_regno);
- else
+ } else {
mark_reg_known_zero(env, regs,
value_regno);
+ if (reg_type_may_be_null(reg_type))
+ regs[value_regno].id = ++env->id_gen;
+ }
regs[value_regno].type = reg_type;
}
@@ -2053,9 +2079,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
err = check_flow_keys_access(env, off, size);
if (!err && t == BPF_READ && value_regno >= 0)
mark_reg_unknown(env, regs, value_regno);
- } else if (reg->type == PTR_TO_SOCKET) {
+ } else if (type_is_sk_pointer(reg->type)) {
if (t == BPF_WRITE) {
- verbose(env, "cannot write into socket\n");
+ verbose(env, "R%d cannot write into %s\n",
+ regno, reg_type_str[reg->type]);
return -EACCES;
}
err = check_sock_access(env, insn_idx, regno, off, size, t);
@@ -2102,7 +2129,8 @@ static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_ins
if (is_ctx_reg(env, insn->dst_reg) ||
is_pkt_reg(env, insn->dst_reg) ||
- is_flow_key_reg(env, insn->dst_reg)) {
+ is_flow_key_reg(env, insn->dst_reg) ||
+ is_sk_reg(env, insn->dst_reg)) {
verbose(env, "BPF_XADD stores into R%d %s is not allowed\n",
insn->dst_reg,
reg_type_str[reg_state(env, insn->dst_reg)->type]);
@@ -2369,6 +2397,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
err = check_ctx_reg(env, reg, regno);
if (err < 0)
return err;
+ } else if (arg_type == ARG_PTR_TO_SOCK_COMMON) {
+ expected_type = PTR_TO_SOCK_COMMON;
+ /* Any sk pointer can be ARG_PTR_TO_SOCK_COMMON */
+ if (!type_is_sk_pointer(type))
+ goto err_type;
} else if (arg_type == ARG_PTR_TO_SOCKET) {
expected_type = PTR_TO_SOCKET;
if (type != expected_type)
@@ -2783,7 +2816,7 @@ static int release_reference(struct bpf_verifier_env *env,
for (i = 0; i <= vstate->curframe; i++)
release_reg_references(env, vstate->frame[i], meta->ptr_id);
- return release_reference_state(env, meta->ptr_id);
+ return release_reference_state(cur_func(env), meta->ptr_id);
}
static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
@@ -3049,8 +3082,11 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
}
} else if (is_release_function(func_id)) {
err = release_reference(env, &meta);
- if (err)
+ if (err) {
+ verbose(env, "func %s#%d reference has not been acquired before\n",
+ func_id_name(func_id), func_id);
return err;
+ }
}
regs = cur_regs(env);
@@ -3099,12 +3135,19 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
regs[BPF_REG_0].id = ++env->id_gen;
}
} else if (fn->ret_type == RET_PTR_TO_SOCKET_OR_NULL) {
- int id = acquire_reference_state(env, insn_idx);
- if (id < 0)
- return id;
mark_reg_known_zero(env, regs, BPF_REG_0);
regs[BPF_REG_0].type = PTR_TO_SOCKET_OR_NULL;
- regs[BPF_REG_0].id = id;
+ if (is_acquire_function(func_id)) {
+ int id = acquire_reference_state(env, insn_idx);
+
+ if (id < 0)
+ return id;
+ /* For release_reference() */
+ regs[BPF_REG_0].id = id;
+ } else {
+ /* For mark_ptr_or_null_reg() */
+ regs[BPF_REG_0].id = ++env->id_gen;
+ }
} else {
verbose(env, "unknown return type %d of func %s#%d\n",
fn->ret_type, func_id_name(func_id), func_id);
@@ -3364,6 +3407,8 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
case PTR_TO_PACKET_END:
case PTR_TO_SOCKET:
case PTR_TO_SOCKET_OR_NULL:
+ case PTR_TO_SOCK_COMMON:
+ case PTR_TO_SOCK_COMMON_OR_NULL:
verbose(env, "R%d pointer arithmetic on %s prohibited\n",
dst, reg_type_str[ptr_reg->type]);
return -EACCES;
@@ -4597,6 +4642,8 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state,
}
} else if (reg->type == PTR_TO_SOCKET_OR_NULL) {
reg->type = PTR_TO_SOCKET;
+ } else if (reg->type == PTR_TO_SOCK_COMMON_OR_NULL) {
+ reg->type = PTR_TO_SOCK_COMMON;
}
if (is_null || !(reg_is_refcounted(reg) ||
reg_may_point_to_spin_lock(reg))) {
@@ -4621,7 +4668,7 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno,
int i, j;
if (reg_is_refcounted_or_null(&regs[regno]) && is_null)
- __release_reference_state(state, id);
+ release_reference_state(state, id);
for (i = 0; i < MAX_BPF_REG; i++)
mark_ptr_or_null_reg(state, &regs[i], id, is_null);
@@ -5790,6 +5837,8 @@ static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur,
case PTR_TO_FLOW_KEYS:
case PTR_TO_SOCKET:
case PTR_TO_SOCKET_OR_NULL:
+ case PTR_TO_SOCK_COMMON:
+ case PTR_TO_SOCK_COMMON_OR_NULL:
/* Only valid matches are exact, which memcmp() above
* would have accepted
*/
@@ -6110,6 +6159,8 @@ static bool reg_type_mismatch_ok(enum bpf_reg_type type)
case PTR_TO_CTX:
case PTR_TO_SOCKET:
case PTR_TO_SOCKET_OR_NULL:
+ case PTR_TO_SOCK_COMMON:
+ case PTR_TO_SOCK_COMMON_OR_NULL:
return false;
default:
return true;
@@ -7112,6 +7163,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
convert_ctx_access = ops->convert_ctx_access;
break;
case PTR_TO_SOCKET:
+ case PTR_TO_SOCK_COMMON:
convert_ctx_access = bpf_sock_convert_ctx_access;
break;
default:
diff --git a/net/core/filter.c b/net/core/filter.c
index 3a49f68eda10..401d2e0aebf8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1793,6 +1793,20 @@ static const struct bpf_func_proto bpf_skb_pull_data_proto = {
.arg2_type = ARG_ANYTHING,
};
+BPF_CALL_1(bpf_sk_fullsock, struct sock *, sk)
+{
+ sk = sk_to_full_sk(sk);
+
+ return sk_fullsock(sk) ? (unsigned long)sk : (unsigned long)NULL;
+}
+
+static const struct bpf_func_proto bpf_sk_fullsock_proto = {
+ .func = bpf_sk_fullsock,
+ .gpl_only = false,
+ .ret_type = RET_PTR_TO_SOCKET_OR_NULL,
+ .arg1_type = ARG_PTR_TO_SOCK_COMMON,
+};
+
static inline int sk_skb_try_make_writable(struct sk_buff *skb,
unsigned int write_len)
{
@@ -5406,6 +5420,8 @@ cg_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
switch (func_id) {
case BPF_FUNC_get_local_storage:
return &bpf_get_local_storage_proto;
+ case BPF_FUNC_sk_fullsock:
+ return &bpf_sk_fullsock_proto;
default:
return sk_filter_func_proto(func_id, prog);
}
@@ -5477,6 +5493,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_get_socket_uid_proto;
case BPF_FUNC_fib_lookup:
return &bpf_skb_fib_lookup_proto;
+ case BPF_FUNC_sk_fullsock:
+ return &bpf_sk_fullsock_proto;
#ifdef CONFIG_XFRM
case BPF_FUNC_skb_get_xfrm_state:
return &bpf_skb_get_xfrm_state_proto;
@@ -5764,6 +5782,11 @@ static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type
if (size != sizeof(__u64))
return false;
break;
+ case offsetof(struct __sk_buff, sk):
+ if (type == BPF_WRITE || size != sizeof(__u64))
+ return false;
+ info->reg_type = PTR_TO_SOCK_COMMON_OR_NULL;
+ break;
default:
/* Only narrow read access allowed for now. */
if (type == BPF_WRITE) {
@@ -5950,6 +5973,18 @@ static bool __sock_filter_check_size(int off, int size,
return size == size_default;
}
+bool bpf_sock_common_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ struct bpf_insn_access_aux *info)
+{
+ switch (off) {
+ case bpf_ctx_range_till(struct bpf_sock, type, priority):
+ return false;
+ default:
+ return bpf_sock_is_valid_access(off, size, type, info);
+ }
+}
+
bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
struct bpf_insn_access_aux *info)
{
@@ -6748,6 +6783,13 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
off += offsetof(struct qdisc_skb_cb, pkt_len);
*target_size = 4;
*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, off);
+ break;
+
+ case offsetof(struct __sk_buff, sk):
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, sk),
+ si->dst_reg, si->src_reg,
+ offsetof(struct sk_buff, sk));
+ break;
}
return insn - insn_buf;
--
2.17.1
* [PATCH v2 bpf-next 2/7] bpf: Add state, dst_ip4, dst_ip6 and dst_port to bpf_sock
From: Martin KaFai Lau @ 2019-02-10 7:22 UTC
To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Lawrence Brakmo
This patch adds "state", "dst_ip4", "dst_ip6" and "dst_port" to the
bpf_sock. The userspace has already been using "state",
e.g. inet_diag (ss -t) and getsockopt(TCP_INFO).
This patch also allows narrow loads on the following existing fields:
"family", "type", "protocol" and "src_port". Unlike the IP address
fields, the load offset is restricted to the first byte for them, but
it can be relaxed later if there is a use case.
This patch also folds __sock_filter_check_size() into
bpf_sock_is_valid_access() since it is not called
from anywhere else. All bpf_sock checking is now in
one place.
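For illustration (a sketch, not part of this patch), a bpf_prog can
then do narrow loads like:

	struct bpf_sock *sk;
	__u8 state, family_lo;
	__u16 dport;

	sk = skb->sk;
	if (!sk)
		return 1;
	sk = bpf_sk_fullsock(sk);
	if (!sk)
		return 1;
	state = *(__u8 *)&sk->state;		/* 1-byte narrow load */
	dport = *(__u16 *)&sk->dst_port;	/* 2-byte load, network byte order */
	family_lo = *(__u8 *)&sk->family;	/* first byte only for now */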
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
include/uapi/linux/bpf.h | 17 ++++---
net/core/filter.c | 99 +++++++++++++++++++++++++++++++---------
2 files changed, 85 insertions(+), 31 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 5d79cba74ddc..d8f91777c5b6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2606,15 +2606,14 @@ struct bpf_sock {
__u32 protocol;
__u32 mark;
__u32 priority;
- __u32 src_ip4; /* Allows 1,2,4-byte read.
- * Stored in network byte order.
- */
- __u32 src_ip6[4]; /* Allows 1,2,4-byte read.
- * Stored in network byte order.
- */
- __u32 src_port; /* Allows 4-byte read.
- * Stored in host byte order
- */
+ /* IP address also allows 1 and 2 bytes access */
+ __u32 src_ip4;
+ __u32 src_ip6[4];
+ __u32 src_port; /* host byte order */
+ __u32 dst_port; /* network byte order */
+ __u32 dst_ip4;
+ __u32 dst_ip6[4];
+ __u32 state;
};
struct bpf_sock_tuple {
diff --git a/net/core/filter.c b/net/core/filter.c
index 401d2e0aebf8..01bb64bf2b5e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5958,21 +5958,6 @@ static bool __sock_filter_check_attach_type(int off,
return true;
}
-static bool __sock_filter_check_size(int off, int size,
- struct bpf_insn_access_aux *info)
-{
- const int size_default = sizeof(__u32);
-
- switch (off) {
- case bpf_ctx_range(struct bpf_sock, src_ip4):
- case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
- bpf_ctx_record_field_size(info, size_default);
- return bpf_ctx_narrow_access_ok(off, size, size_default);
- }
-
- return size == size_default;
-}
-
bool bpf_sock_common_is_valid_access(int off, int size,
enum bpf_access_type type,
struct bpf_insn_access_aux *info)
@@ -5988,13 +5973,29 @@ bool bpf_sock_common_is_valid_access(int off, int size,
bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
struct bpf_insn_access_aux *info)
{
+ const int size_default = sizeof(__u32);
+
if (off < 0 || off >= sizeof(struct bpf_sock))
return false;
if (off % size != 0)
return false;
- if (!__sock_filter_check_size(off, size, info))
- return false;
- return true;
+
+ switch (off) {
+ case offsetof(struct bpf_sock, state):
+ case offsetof(struct bpf_sock, family):
+ case offsetof(struct bpf_sock, type):
+ case offsetof(struct bpf_sock, protocol):
+ case offsetof(struct bpf_sock, dst_port):
+ case offsetof(struct bpf_sock, src_port):
+ case bpf_ctx_range(struct bpf_sock, src_ip4):
+ case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
+ case bpf_ctx_range(struct bpf_sock, dst_ip4):
+ case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
+ bpf_ctx_record_field_size(info, size_default);
+ return bpf_ctx_narrow_access_ok(off, size, size_default);
+ }
+
+ return size == size_default;
}
static bool sock_filter_is_valid_access(int off, int size,
@@ -6838,24 +6839,32 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
break;
case offsetof(struct bpf_sock, family):
- BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_family) != 2);
-
- *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
- offsetof(struct sock, sk_family));
+ *insn++ = BPF_LDX_MEM(
+ BPF_FIELD_SIZEOF(struct sock_common, skc_family),
+ si->dst_reg, si->src_reg,
+ bpf_target_off(struct sock_common,
+ skc_family,
+ FIELD_SIZEOF(struct sock_common,
+ skc_family),
+ target_size));
break;
case offsetof(struct bpf_sock, type):
+ BUILD_BUG_ON(HWEIGHT32(SK_FL_TYPE_MASK) != BITS_PER_BYTE * 2);
*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
offsetof(struct sock, __sk_flags_offset));
*insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, SK_FL_TYPE_MASK);
*insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, SK_FL_TYPE_SHIFT);
+ *target_size = 2;
break;
case offsetof(struct bpf_sock, protocol):
+ BUILD_BUG_ON(HWEIGHT32(SK_FL_PROTO_MASK) != BITS_PER_BYTE);
*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
offsetof(struct sock, __sk_flags_offset));
*insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, SK_FL_PROTO_MASK);
*insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, SK_FL_PROTO_SHIFT);
+ *target_size = 1;
break;
case offsetof(struct bpf_sock, src_ip4):
@@ -6867,6 +6876,15 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
target_size));
break;
+ case offsetof(struct bpf_sock, dst_ip4):
+ *insn++ = BPF_LDX_MEM(
+ BPF_SIZE(si->code), si->dst_reg, si->src_reg,
+ bpf_target_off(struct sock_common, skc_daddr,
+ FIELD_SIZEOF(struct sock_common,
+ skc_daddr),
+ target_size));
+ break;
+
case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
#if IS_ENABLED(CONFIG_IPV6)
off = si->off;
@@ -6885,6 +6903,23 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
#endif
break;
+ case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+ off = si->off;
+ off -= offsetof(struct bpf_sock, dst_ip6[0]);
+ *insn++ = BPF_LDX_MEM(
+ BPF_SIZE(si->code), si->dst_reg, si->src_reg,
+ bpf_target_off(struct sock_common,
+ skc_v6_daddr.s6_addr32[0],
+ FIELD_SIZEOF(struct sock_common,
+ skc_v6_daddr.s6_addr32[0]),
+ target_size) + off);
+#else
+ *insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+ *target_size = 4;
+#endif
+ break;
+
case offsetof(struct bpf_sock, src_port):
*insn++ = BPF_LDX_MEM(
BPF_FIELD_SIZEOF(struct sock_common, skc_num),
@@ -6894,6 +6929,26 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
skc_num),
target_size));
break;
+
+ case offsetof(struct bpf_sock, dst_port):
+ *insn++ = BPF_LDX_MEM(
+ BPF_FIELD_SIZEOF(struct sock_common, skc_dport),
+ si->dst_reg, si->src_reg,
+ bpf_target_off(struct sock_common, skc_dport,
+ FIELD_SIZEOF(struct sock_common,
+ skc_dport),
+ target_size));
+ break;
+
+ case offsetof(struct bpf_sock, state):
+ *insn++ = BPF_LDX_MEM(
+ BPF_FIELD_SIZEOF(struct sock_common, skc_state),
+ si->dst_reg, si->src_reg,
+ bpf_target_off(struct sock_common, skc_state,
+ FIELD_SIZEOF(struct sock_common,
+ skc_state),
+ target_size));
+ break;
}
return insn - insn_buf;
--
2.17.1
* [PATCH v2 bpf-next 3/7] bpf: Refactor sock_ops_convert_ctx_access
From: Martin KaFai Lau @ 2019-02-10 7:22 UTC
To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Lawrence Brakmo
The next patch will introduce a new "struct bpf_tcp_sock" which
exposes the same tcp_sock's fields already exposed in
"struct bpf_sock_ops".
This patch refactors the existing convert_ctx_access() code for
"struct bpf_sock_ops" to get it ready to be reused for
"struct bpf_tcp_sock". The "rtt_min" is not refactored
in this patch because its handling is different from the other
fields.
The SOCK_OPS_GET_TCP_SOCK_FIELD is new. All other SOCK_OPS_XXX_FIELD
changes are code move only.
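As a preview of the reuse, the next patch only has to supply a
one-load CONVERT callback for "struct bpf_tcp_sock" and then
dispatch all the common fields through the new macro:

	#define BPF_TCP_SOCK_GET_COMMON(FIELD)					\
		do {								\
			BUILD_BUG_ON(FIELD_SIZEOF(struct tcp_sock, FIELD) >	\
				     FIELD_SIZEOF(struct bpf_tcp_sock, FIELD)); \
			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct tcp_sock, FIELD), \
					      si->dst_reg, si->src_reg,		\
					      offsetof(struct tcp_sock, FIELD)); \
		} while (0)

	CONVERT_COMMON_TCP_SOCK_FIELDS(struct bpf_tcp_sock,
				       BPF_TCP_SOCK_GET_COMMON);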
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
net/core/filter.c | 287 ++++++++++++++++++++--------------------------
1 file changed, 127 insertions(+), 160 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 01bb64bf2b5e..c0d7b9ef279f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5030,6 +5030,54 @@ static const struct bpf_func_proto bpf_lwt_seg6_adjust_srh_proto = {
};
#endif /* CONFIG_IPV6_SEG6_BPF */
+#define CONVERT_COMMON_TCP_SOCK_FIELDS(md_type, CONVERT) \
+do { \
+ switch (si->off) { \
+ case offsetof(md_type, snd_cwnd): \
+ CONVERT(snd_cwnd); break; \
+ case offsetof(md_type, srtt_us): \
+ CONVERT(srtt_us); break; \
+ case offsetof(md_type, snd_ssthresh): \
+ CONVERT(snd_ssthresh); break; \
+ case offsetof(md_type, rcv_nxt): \
+ CONVERT(rcv_nxt); break; \
+ case offsetof(md_type, snd_nxt): \
+ CONVERT(snd_nxt); break; \
+ case offsetof(md_type, snd_una): \
+ CONVERT(snd_una); break; \
+ case offsetof(md_type, mss_cache): \
+ CONVERT(mss_cache); break; \
+ case offsetof(md_type, ecn_flags): \
+ CONVERT(ecn_flags); break; \
+ case offsetof(md_type, rate_delivered): \
+ CONVERT(rate_delivered); break; \
+ case offsetof(md_type, rate_interval_us): \
+ CONVERT(rate_interval_us); break; \
+ case offsetof(md_type, packets_out): \
+ CONVERT(packets_out); break; \
+ case offsetof(md_type, retrans_out): \
+ CONVERT(retrans_out); break; \
+ case offsetof(md_type, total_retrans): \
+ CONVERT(total_retrans); break; \
+ case offsetof(md_type, segs_in): \
+ CONVERT(segs_in); break; \
+ case offsetof(md_type, data_segs_in): \
+ CONVERT(data_segs_in); break; \
+ case offsetof(md_type, segs_out): \
+ CONVERT(segs_out); break; \
+ case offsetof(md_type, data_segs_out): \
+ CONVERT(data_segs_out); break; \
+ case offsetof(md_type, lost_out): \
+ CONVERT(lost_out); break; \
+ case offsetof(md_type, sacked_out): \
+ CONVERT(sacked_out); break; \
+ case offsetof(md_type, bytes_received): \
+ CONVERT(bytes_received); break; \
+ case offsetof(md_type, bytes_acked): \
+ CONVERT(bytes_acked); break; \
+ } \
+} while (0)
+
#ifdef CONFIG_INET
static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple,
int dif, int sdif, u8 family, u8 proto)
@@ -7196,6 +7244,85 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
struct bpf_insn *insn = insn_buf;
int off;
+/* Helper macro for adding read access to tcp_sock or sock fields. */
+#define SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ) \
+ do { \
+ BUILD_BUG_ON(FIELD_SIZEOF(OBJ, OBJ_FIELD) > \
+ FIELD_SIZEOF(struct bpf_sock_ops, BPF_FIELD)); \
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
+ struct bpf_sock_ops_kern, \
+ is_fullsock), \
+ si->dst_reg, si->src_reg, \
+ offsetof(struct bpf_sock_ops_kern, \
+ is_fullsock)); \
+ *insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 2); \
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
+ struct bpf_sock_ops_kern, sk),\
+ si->dst_reg, si->src_reg, \
+ offsetof(struct bpf_sock_ops_kern, sk));\
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(OBJ, \
+ OBJ_FIELD), \
+ si->dst_reg, si->dst_reg, \
+ offsetof(OBJ, OBJ_FIELD)); \
+ } while (0)
+
+#define SOCK_OPS_GET_TCP_SOCK_FIELD(FIELD) \
+ SOCK_OPS_GET_FIELD(FIELD, FIELD, struct tcp_sock)
+
+/* Helper macro for adding write access to tcp_sock or sock fields.
+ * The macro is called with two registers, dst_reg which contains a pointer
+ * to ctx (context) and src_reg which contains the value that should be
+ * stored. However, we need an additional register since we cannot overwrite
+ * dst_reg because it may be used later in the program.
+ * Instead we "borrow" one of the other register. We first save its value
+ * into a new (temp) field in bpf_sock_ops_kern, use it, and then restore
+ * it at the end of the macro.
+ */
+#define SOCK_OPS_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ) \
+ do { \
+ int reg = BPF_REG_9; \
+ BUILD_BUG_ON(FIELD_SIZEOF(OBJ, OBJ_FIELD) > \
+ FIELD_SIZEOF(struct bpf_sock_ops, BPF_FIELD)); \
+ if (si->dst_reg == reg || si->src_reg == reg) \
+ reg--; \
+ if (si->dst_reg == reg || si->src_reg == reg) \
+ reg--; \
+ *insn++ = BPF_STX_MEM(BPF_DW, si->dst_reg, reg, \
+ offsetof(struct bpf_sock_ops_kern, \
+ temp)); \
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
+ struct bpf_sock_ops_kern, \
+ is_fullsock), \
+ reg, si->dst_reg, \
+ offsetof(struct bpf_sock_ops_kern, \
+ is_fullsock)); \
+ *insn++ = BPF_JMP_IMM(BPF_JEQ, reg, 0, 2); \
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
+ struct bpf_sock_ops_kern, sk),\
+ reg, si->dst_reg, \
+ offsetof(struct bpf_sock_ops_kern, sk));\
+ *insn++ = BPF_STX_MEM(BPF_FIELD_SIZEOF(OBJ, OBJ_FIELD), \
+ reg, si->src_reg, \
+ offsetof(OBJ, OBJ_FIELD)); \
+ *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->dst_reg, \
+ offsetof(struct bpf_sock_ops_kern, \
+ temp)); \
+ } while (0)
+
+#define SOCK_OPS_GET_OR_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ, TYPE) \
+ do { \
+ if (TYPE == BPF_WRITE) \
+ SOCK_OPS_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ); \
+ else \
+ SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ); \
+ } while (0)
+
+ CONVERT_COMMON_TCP_SOCK_FIELDS(struct bpf_sock_ops,
+ SOCK_OPS_GET_TCP_SOCK_FIELD);
+
+ if (insn > insn_buf)
+ return insn - insn_buf;
+
switch (si->off) {
case offsetof(struct bpf_sock_ops, op) ...
offsetof(struct bpf_sock_ops, replylong[3]):
@@ -7353,175 +7480,15 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
FIELD_SIZEOF(struct minmax_sample, t));
break;
-/* Helper macro for adding read access to tcp_sock or sock fields. */
-#define SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ) \
- do { \
- BUILD_BUG_ON(FIELD_SIZEOF(OBJ, OBJ_FIELD) > \
- FIELD_SIZEOF(struct bpf_sock_ops, BPF_FIELD)); \
- *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
- struct bpf_sock_ops_kern, \
- is_fullsock), \
- si->dst_reg, si->src_reg, \
- offsetof(struct bpf_sock_ops_kern, \
- is_fullsock)); \
- *insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 2); \
- *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
- struct bpf_sock_ops_kern, sk),\
- si->dst_reg, si->src_reg, \
- offsetof(struct bpf_sock_ops_kern, sk));\
- *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(OBJ, \
- OBJ_FIELD), \
- si->dst_reg, si->dst_reg, \
- offsetof(OBJ, OBJ_FIELD)); \
- } while (0)
-
-/* Helper macro for adding write access to tcp_sock or sock fields.
- * The macro is called with two registers, dst_reg which contains a pointer
- * to ctx (context) and src_reg which contains the value that should be
- * stored. However, we need an additional register since we cannot overwrite
- * dst_reg because it may be used later in the program.
- * Instead we "borrow" one of the other register. We first save its value
- * into a new (temp) field in bpf_sock_ops_kern, use it, and then restore
- * it at the end of the macro.
- */
-#define SOCK_OPS_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ) \
- do { \
- int reg = BPF_REG_9; \
- BUILD_BUG_ON(FIELD_SIZEOF(OBJ, OBJ_FIELD) > \
- FIELD_SIZEOF(struct bpf_sock_ops, BPF_FIELD)); \
- if (si->dst_reg == reg || si->src_reg == reg) \
- reg--; \
- if (si->dst_reg == reg || si->src_reg == reg) \
- reg--; \
- *insn++ = BPF_STX_MEM(BPF_DW, si->dst_reg, reg, \
- offsetof(struct bpf_sock_ops_kern, \
- temp)); \
- *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
- struct bpf_sock_ops_kern, \
- is_fullsock), \
- reg, si->dst_reg, \
- offsetof(struct bpf_sock_ops_kern, \
- is_fullsock)); \
- *insn++ = BPF_JMP_IMM(BPF_JEQ, reg, 0, 2); \
- *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
- struct bpf_sock_ops_kern, sk),\
- reg, si->dst_reg, \
- offsetof(struct bpf_sock_ops_kern, sk));\
- *insn++ = BPF_STX_MEM(BPF_FIELD_SIZEOF(OBJ, OBJ_FIELD), \
- reg, si->src_reg, \
- offsetof(OBJ, OBJ_FIELD)); \
- *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->dst_reg, \
- offsetof(struct bpf_sock_ops_kern, \
- temp)); \
- } while (0)
-
-#define SOCK_OPS_GET_OR_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ, TYPE) \
- do { \
- if (TYPE == BPF_WRITE) \
- SOCK_OPS_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ); \
- else \
- SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ); \
- } while (0)
-
- case offsetof(struct bpf_sock_ops, snd_cwnd):
- SOCK_OPS_GET_FIELD(snd_cwnd, snd_cwnd, struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, srtt_us):
- SOCK_OPS_GET_FIELD(srtt_us, srtt_us, struct tcp_sock);
- break;
-
case offsetof(struct bpf_sock_ops, bpf_sock_ops_cb_flags):
SOCK_OPS_GET_FIELD(bpf_sock_ops_cb_flags, bpf_sock_ops_cb_flags,
struct tcp_sock);
break;
- case offsetof(struct bpf_sock_ops, snd_ssthresh):
- SOCK_OPS_GET_FIELD(snd_ssthresh, snd_ssthresh, struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, rcv_nxt):
- SOCK_OPS_GET_FIELD(rcv_nxt, rcv_nxt, struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, snd_nxt):
- SOCK_OPS_GET_FIELD(snd_nxt, snd_nxt, struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, snd_una):
- SOCK_OPS_GET_FIELD(snd_una, snd_una, struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, mss_cache):
- SOCK_OPS_GET_FIELD(mss_cache, mss_cache, struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, ecn_flags):
- SOCK_OPS_GET_FIELD(ecn_flags, ecn_flags, struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, rate_delivered):
- SOCK_OPS_GET_FIELD(rate_delivered, rate_delivered,
- struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, rate_interval_us):
- SOCK_OPS_GET_FIELD(rate_interval_us, rate_interval_us,
- struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, packets_out):
- SOCK_OPS_GET_FIELD(packets_out, packets_out, struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, retrans_out):
- SOCK_OPS_GET_FIELD(retrans_out, retrans_out, struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, total_retrans):
- SOCK_OPS_GET_FIELD(total_retrans, total_retrans,
- struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, segs_in):
- SOCK_OPS_GET_FIELD(segs_in, segs_in, struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, data_segs_in):
- SOCK_OPS_GET_FIELD(data_segs_in, data_segs_in, struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, segs_out):
- SOCK_OPS_GET_FIELD(segs_out, segs_out, struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, data_segs_out):
- SOCK_OPS_GET_FIELD(data_segs_out, data_segs_out,
- struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, lost_out):
- SOCK_OPS_GET_FIELD(lost_out, lost_out, struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, sacked_out):
- SOCK_OPS_GET_FIELD(sacked_out, sacked_out, struct tcp_sock);
- break;
-
case offsetof(struct bpf_sock_ops, sk_txhash):
SOCK_OPS_GET_OR_SET_FIELD(sk_txhash, sk_txhash,
struct sock, type);
break;
-
- case offsetof(struct bpf_sock_ops, bytes_received):
- SOCK_OPS_GET_FIELD(bytes_received, bytes_received,
- struct tcp_sock);
- break;
-
- case offsetof(struct bpf_sock_ops, bytes_acked):
- SOCK_OPS_GET_FIELD(bytes_acked, bytes_acked, struct tcp_sock);
- break;
-
}
return insn - insn_buf;
}
--
2.17.1
* [PATCH v2 bpf-next 4/7] bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock
From: Martin KaFai Lau @ 2019-02-10 7:22 UTC
To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Lawrence Brakmo
This patch adds a helper function BPF_FUNC_tcp_sock. It
is currently available to cg_skb and sched_(cls|act):
struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk);
int cg_skb_foo(struct __sk_buff *skb) {
struct bpf_tcp_sock *tp;
struct bpf_sock *sk;
__u32 snd_cwnd;
sk = skb->sk;
if (!sk)
return 1;
tp = bpf_tcp_sock(sk);
if (!tp)
return 1;
snd_cwnd = tp->snd_cwnd;
/* ... */
return 1;
}
A 'struct bpf_tcp_sock' is also added to the uapi bpf.h to provide
read-only access. bpf_tcp_sock has all the existing tcp_sock fields
that have already been exposed by bpf_sock_ops,
i.e. no new tcp_sock fields are exposed in bpf.h.
This helper returns a pointer to the tcp_sock. If it is not a tcp_sock
or it cannot be traced back to a tcp_sock by sk_to_full_sk(), it
returns NULL. Hence, the caller needs to check for NULL before
accessing it.
The current use case is to expose members from tcp_sock
to allow a cg_skb_bpf_prog to provide per-cgroup traffic
policing/shaping.
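Note that, as enforced by bpf_tcp_sock_is_valid_access() below,
"bytes_received" and "bytes_acked" are __u64 and must be read with a
full 8-byte load, while all other fields are read with 4-byte loads,
e.g. (sketch):

	__u64 acked = tp->bytes_acked;	/* 8-byte load required */
	__u32 cwnd = tp->snd_cwnd;	/* 4-byte load */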
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
include/linux/bpf.h | 30 +++++++++++++++
include/uapi/linux/bpf.h | 51 +++++++++++++++++++++++++-
kernel/bpf/verifier.c | 31 +++++++++++++++-
net/core/filter.c | 79 ++++++++++++++++++++++++++++++++++++++++
4 files changed, 188 insertions(+), 3 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a60463b45b54..7f58828755fd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -204,6 +204,7 @@ enum bpf_return_type {
RET_PTR_TO_MAP_VALUE, /* returns a pointer to map elem value */
RET_PTR_TO_MAP_VALUE_OR_NULL, /* returns a pointer to map elem value or NULL */
RET_PTR_TO_SOCKET_OR_NULL, /* returns a pointer to a socket or NULL */
+ RET_PTR_TO_TCP_SOCK_OR_NULL, /* returns a pointer to a tcp_sock or NULL */
};
/* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
@@ -259,6 +260,8 @@ enum bpf_reg_type {
PTR_TO_SOCKET_OR_NULL, /* reg points to struct bpf_sock or NULL */
PTR_TO_SOCK_COMMON, /* reg points to sock_common */
PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
+ PTR_TO_TCP_SOCK, /* reg points to struct tcp_sock */
+ PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
};
/* The information passed from prog-specific *_is_valid_access
@@ -956,4 +959,31 @@ static inline u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
}
#endif
+#ifdef CONFIG_INET
+bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type,
+ struct bpf_insn_access_aux *info);
+
+u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
+ const struct bpf_insn *si,
+ struct bpf_insn *insn_buf,
+ struct bpf_prog *prog,
+ u32 *target_size);
+#else
+static inline bool bpf_tcp_sock_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ struct bpf_insn_access_aux *info)
+{
+ return false;
+}
+
+static inline u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
+ const struct bpf_insn *si,
+ struct bpf_insn *insn_buf,
+ struct bpf_prog *prog,
+ u32 *target_size)
+{
+ return 0;
+}
+#endif /* CONFIG_INET */
+
#endif /* _LINUX_BPF_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d8f91777c5b6..25c8c0e62ecf 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2337,6 +2337,15 @@ union bpf_attr {
* Return
* A **struct bpf_sock** pointer on success, or NULL in
* case of failure.
+ *
+ * struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk)
+ * Description
+ * This helper gets a **struct bpf_tcp_sock** pointer from a
+ * **struct bpf_sock** pointer.
+ *
+ * Return
+ * A **struct bpf_tcp_sock** pointer on success, or NULL in
+ * case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -2434,7 +2443,8 @@ union bpf_attr {
FN(rc_pointer_rel), \
FN(spin_lock), \
FN(spin_unlock), \
- FN(sk_fullsock),
+ FN(sk_fullsock), \
+ FN(tcp_sock),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@@ -2616,6 +2626,45 @@ struct bpf_sock {
__u32 state;
};
+struct bpf_tcp_sock {
+ __u32 snd_cwnd; /* Sending congestion window */
+ __u32 srtt_us; /* smoothed round trip time << 3 in usecs */
+ __u32 rtt_min;
+ __u32 snd_ssthresh; /* Slow start size threshold */
+ __u32 rcv_nxt; /* What we want to receive next */
+ __u32 snd_nxt; /* Next sequence we send */
+ __u32 snd_una; /* First byte we want an ack for */
+ __u32 mss_cache; /* Cached effective mss, not including SACKS */
+ __u32 ecn_flags; /* ECN status bits. */
+ __u32 rate_delivered; /* saved rate sample: packets delivered */
+ __u32 rate_interval_us; /* saved rate sample: time elapsed */
+ __u32 packets_out; /* Packets which are "in flight" */
+ __u32 retrans_out; /* Retransmitted packets out */
+ __u32 total_retrans; /* Total retransmits for entire connection */
+ __u32 segs_in; /* RFC4898 tcpEStatsPerfSegsIn
+ * total number of segments in.
+ */
+ __u32 data_segs_in; /* RFC4898 tcpEStatsPerfDataSegsIn
+ * total number of data segments in.
+ */
+ __u32 segs_out; /* RFC4898 tcpEStatsPerfSegsOut
+ * The total number of segments sent.
+ */
+ __u32 data_segs_out; /* RFC4898 tcpEStatsPerfDataSegsOut
+ * total number of data segments sent.
+ */
+ __u32 lost_out; /* Lost packets */
+ __u32 sacked_out; /* SACK'd packets */
+ __u64 bytes_received; /* RFC4898 tcpEStatsAppHCThruOctetsReceived
+ * sum(delta(rcv_nxt)), or how many bytes
+ * were acked.
+ */
+ __u64 bytes_acked; /* RFC4898 tcpEStatsAppHCThruOctetsAcked
+ * sum(delta(snd_una)), or how many bytes
+ * were acked.
+ */
+};
+
struct bpf_sock_tuple {
union {
struct {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b755d55a3791..1b9496c41383 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -334,14 +334,16 @@ static bool type_is_pkt_pointer(enum bpf_reg_type type)
static bool type_is_sk_pointer(enum bpf_reg_type type)
{
return type == PTR_TO_SOCKET ||
- type == PTR_TO_SOCK_COMMON;
+ type == PTR_TO_SOCK_COMMON ||
+ type == PTR_TO_TCP_SOCK;
}
static bool reg_type_may_be_null(enum bpf_reg_type type)
{
return type == PTR_TO_MAP_VALUE_OR_NULL ||
type == PTR_TO_SOCKET_OR_NULL ||
- type == PTR_TO_SOCK_COMMON_OR_NULL;
+ type == PTR_TO_SOCK_COMMON_OR_NULL ||
+ type == PTR_TO_TCP_SOCK_OR_NULL;
}
static bool type_is_refcounted(enum bpf_reg_type type)
@@ -407,6 +409,8 @@ static const char * const reg_type_str[] = {
[PTR_TO_SOCKET_OR_NULL] = "sock_or_null",
[PTR_TO_SOCK_COMMON] = "sock_common",
[PTR_TO_SOCK_COMMON_OR_NULL] = "sock_common_or_null",
+ [PTR_TO_TCP_SOCK] = "tcp_sock",
+ [PTR_TO_TCP_SOCK_OR_NULL] = "tcp_sock_or_null",
};
static char slot_type_char[] = {
@@ -1209,6 +1213,8 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
case PTR_TO_SOCKET_OR_NULL:
case PTR_TO_SOCK_COMMON:
case PTR_TO_SOCK_COMMON_OR_NULL:
+ case PTR_TO_TCP_SOCK:
+ case PTR_TO_TCP_SOCK_OR_NULL:
return true;
default:
return false;
@@ -1662,6 +1668,9 @@ static int check_sock_access(struct bpf_verifier_env *env, int insn_idx,
case PTR_TO_SOCKET:
valid = bpf_sock_is_valid_access(off, size, t, &info);
break;
+ case PTR_TO_TCP_SOCK:
+ valid = bpf_tcp_sock_is_valid_access(off, size, t, &info);
+ break;
default:
valid = false;
}
@@ -1823,6 +1832,9 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,
case PTR_TO_SOCK_COMMON:
pointer_desc = "sock_common ";
break;
+ case PTR_TO_TCP_SOCK:
+ pointer_desc = "tcp_sock ";
+ break;
default:
break;
}
@@ -3148,6 +3160,10 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
/* For mark_ptr_or_null_reg() */
regs[BPF_REG_0].id = ++env->id_gen;
}
+ } else if (fn->ret_type == RET_PTR_TO_TCP_SOCK_OR_NULL) {
+ mark_reg_known_zero(env, regs, BPF_REG_0);
+ regs[BPF_REG_0].type = PTR_TO_TCP_SOCK_OR_NULL;
+ regs[BPF_REG_0].id = ++env->id_gen;
} else {
verbose(env, "unknown return type %d of func %s#%d\n",
fn->ret_type, func_id_name(func_id), func_id);
@@ -3409,6 +3425,8 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
case PTR_TO_SOCKET_OR_NULL:
case PTR_TO_SOCK_COMMON:
case PTR_TO_SOCK_COMMON_OR_NULL:
+ case PTR_TO_TCP_SOCK:
+ case PTR_TO_TCP_SOCK_OR_NULL:
verbose(env, "R%d pointer arithmetic on %s prohibited\n",
dst, reg_type_str[ptr_reg->type]);
return -EACCES;
@@ -4644,6 +4662,8 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state,
reg->type = PTR_TO_SOCKET;
} else if (reg->type == PTR_TO_SOCK_COMMON_OR_NULL) {
reg->type = PTR_TO_SOCK_COMMON;
+ } else if (reg->type == PTR_TO_TCP_SOCK_OR_NULL) {
+ reg->type = PTR_TO_TCP_SOCK;
}
if (is_null || !(reg_is_refcounted(reg) ||
reg_may_point_to_spin_lock(reg))) {
@@ -5839,6 +5859,8 @@ static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur,
case PTR_TO_SOCKET_OR_NULL:
case PTR_TO_SOCK_COMMON:
case PTR_TO_SOCK_COMMON_OR_NULL:
+ case PTR_TO_TCP_SOCK:
+ case PTR_TO_TCP_SOCK_OR_NULL:
/* Only valid matches are exact, which memcmp() above
* would have accepted
*/
@@ -6161,6 +6183,8 @@ static bool reg_type_mismatch_ok(enum bpf_reg_type type)
case PTR_TO_SOCKET_OR_NULL:
case PTR_TO_SOCK_COMMON:
case PTR_TO_SOCK_COMMON_OR_NULL:
+ case PTR_TO_TCP_SOCK:
+ case PTR_TO_TCP_SOCK_OR_NULL:
return false;
default:
return true;
@@ -7166,6 +7190,9 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
case PTR_TO_SOCK_COMMON:
convert_ctx_access = bpf_sock_convert_ctx_access;
break;
+ case PTR_TO_TCP_SOCK:
+ convert_ctx_access = bpf_tcp_sock_convert_ctx_access;
+ break;
default:
continue;
}
diff --git a/net/core/filter.c b/net/core/filter.c
index c0d7b9ef279f..353735575204 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5315,6 +5315,79 @@ static const struct bpf_func_proto bpf_sock_addr_sk_lookup_udp_proto = {
.arg5_type = ARG_ANYTHING,
};
+bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type,
+ struct bpf_insn_access_aux *info)
+{
+ if (off < 0 || off >= offsetofend(struct bpf_tcp_sock, bytes_acked))
+ return false;
+
+ if (off % size != 0)
+ return false;
+
+ switch (off) {
+ case offsetof(struct bpf_tcp_sock, bytes_received):
+ case offsetof(struct bpf_tcp_sock, bytes_acked):
+ return size == sizeof(__u64);
+ default:
+ return size == sizeof(__u32);
+ }
+}
+
+u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
+ const struct bpf_insn *si,
+ struct bpf_insn *insn_buf,
+ struct bpf_prog *prog, u32 *target_size)
+{
+ struct bpf_insn *insn = insn_buf;
+
+#define BPF_TCP_SOCK_GET_COMMON(FIELD) \
+ do { \
+ BUILD_BUG_ON(FIELD_SIZEOF(struct tcp_sock, FIELD) > \
+ FIELD_SIZEOF(struct bpf_tcp_sock, FIELD)); \
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct tcp_sock, FIELD),\
+ si->dst_reg, si->src_reg, \
+ offsetof(struct tcp_sock, FIELD)); \
+ } while (0)
+
+ CONVERT_COMMON_TCP_SOCK_FIELDS(struct bpf_tcp_sock,
+ BPF_TCP_SOCK_GET_COMMON);
+
+ if (insn > insn_buf)
+ return insn - insn_buf;
+
+ switch (si->off) {
+ case offsetof(struct bpf_tcp_sock, rtt_min):
+ BUILD_BUG_ON(FIELD_SIZEOF(struct tcp_sock, rtt_min) !=
+ sizeof(struct minmax));
+ BUILD_BUG_ON(sizeof(struct minmax) <
+ sizeof(struct minmax_sample));
+
+ *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
+ offsetof(struct tcp_sock, rtt_min) +
+ offsetof(struct minmax_sample, v));
+ break;
+ }
+
+ return insn - insn_buf;
+}
+
+BPF_CALL_1(bpf_tcp_sock, struct sock *, sk)
+{
+ sk = sk_to_full_sk(sk);
+
+ if (sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP)
+ return (unsigned long)sk;
+
+ return (unsigned long)NULL;
+}
+
+static const struct bpf_func_proto bpf_tcp_sock_proto = {
+ .func = bpf_tcp_sock,
+ .gpl_only = false,
+ .ret_type = RET_PTR_TO_TCP_SOCK_OR_NULL,
+ .arg1_type = ARG_PTR_TO_SOCK_COMMON,
+};
+
#endif /* CONFIG_INET */
bool bpf_helper_changes_pkt_data(void *func)
@@ -5470,6 +5543,10 @@ cg_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_get_local_storage_proto;
case BPF_FUNC_sk_fullsock:
return &bpf_sk_fullsock_proto;
+#ifdef CONFIG_INET
+ case BPF_FUNC_tcp_sock:
+ return &bpf_tcp_sock_proto;
+#endif
default:
return sk_filter_func_proto(func_id, prog);
}
@@ -5560,6 +5637,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_sk_lookup_udp_proto;
case BPF_FUNC_sk_release:
return &bpf_sk_release_proto;
+ case BPF_FUNC_tcp_sock:
+ return &bpf_tcp_sock_proto;
#endif
default:
return bpf_base_func_proto(func_id);
--
2.17.1
* [PATCH v2 bpf-next 5/7] bpf: Sync bpf.h to tools/
From: Martin KaFai Lau @ 2019-02-10 7:22 UTC
To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Lawrence Brakmo
This patch syncs the uapi bpf.h to tools/.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
tools/include/uapi/linux/bpf.h | 72 ++++++++++++++++++++++++++++++----
1 file changed, 65 insertions(+), 7 deletions(-)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1777fa0c61e4..25c8c0e62ecf 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2329,6 +2329,23 @@ union bpf_attr {
* "**y**".
* Return
* 0
+ *
+ * struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)
+ * Description
+ * This helper gets a **struct bpf_sock** pointer such
+ * that all the fields in bpf_sock can be accessed.
+ * Return
+ * A **struct bpf_sock** pointer on success, or NULL in
+ * case of failure.
+ *
+ * struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk)
+ * Description
+ * This helper gets a **struct bpf_tcp_sock** pointer from a
+ * **struct bpf_sock** pointer.
+ *
+ * Return
+ * A **struct bpf_tcp_sock** pointer on success, or NULL in
+ * case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -2425,7 +2442,9 @@ union bpf_attr {
FN(msg_pop_data), \
FN(rc_pointer_rel), \
FN(spin_lock), \
- FN(spin_unlock),
+ FN(spin_unlock), \
+ FN(sk_fullsock), \
+ FN(tcp_sock),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@@ -2545,6 +2564,7 @@ struct __sk_buff {
__u64 tstamp;
__u32 wire_len;
__u32 gso_segs;
+ __bpf_md_ptr(struct bpf_sock *, sk);
};
struct bpf_tunnel_key {
@@ -2596,14 +2616,52 @@ struct bpf_sock {
__u32 protocol;
__u32 mark;
__u32 priority;
- __u32 src_ip4; /* Allows 1,2,4-byte read.
- * Stored in network byte order.
+ /* IP address fields also allow 1- and 2-byte access */
+ __u32 src_ip4;
+ __u32 src_ip6[4];
+ __u32 src_port; /* host byte order */
+ __u32 dst_port; /* network byte order */
+ __u32 dst_ip4;
+ __u32 dst_ip6[4];
+ __u32 state;
+};
+
+struct bpf_tcp_sock {
+ __u32 snd_cwnd; /* Sending congestion window */
+ __u32 srtt_us; /* smoothed round trip time << 3 in usecs */
+ __u32 rtt_min;
+ __u32 snd_ssthresh; /* Slow start size threshold */
+ __u32 rcv_nxt; /* What we want to receive next */
+ __u32 snd_nxt; /* Next sequence we send */
+ __u32 snd_una; /* First byte we want an ack for */
+ __u32 mss_cache; /* Cached effective mss, not including SACKS */
+ __u32 ecn_flags; /* ECN status bits. */
+ __u32 rate_delivered; /* saved rate sample: packets delivered */
+ __u32 rate_interval_us; /* saved rate sample: time elapsed */
+ __u32 packets_out; /* Packets which are "in flight" */
+ __u32 retrans_out; /* Retransmitted packets out */
+ __u32 total_retrans; /* Total retransmits for entire connection */
+ __u32 segs_in; /* RFC4898 tcpEStatsPerfSegsIn
+ * total number of segments in.
*/
- __u32 src_ip6[4]; /* Allows 1,2,4-byte read.
- * Stored in network byte order.
+ __u32 data_segs_in; /* RFC4898 tcpEStatsPerfDataSegsIn
+ * total number of data segments in.
+ */
+ __u32 segs_out; /* RFC4898 tcpEStatsPerfSegsOut
+ * The total number of segments sent.
+ */
+ __u32 data_segs_out; /* RFC4898 tcpEStatsPerfDataSegsOut
+ * total number of data segments sent.
+ */
+ __u32 lost_out; /* Lost packets */
+ __u32 sacked_out; /* SACK'd packets */
+ __u64 bytes_received; /* RFC4898 tcpEStatsAppHCThruOctetsReceived
+ * sum(delta(rcv_nxt)), or how many bytes
+ * were received.
*/
- __u32 src_port; /* Allows 4-byte read.
- * Stored in host byte order
+ __u64 bytes_acked; /* RFC4898 tcpEStatsAppHCThruOctetsAcked
+ * sum(delta(snd_una)), or how many bytes
+ * were acked.
*/
};
--
2.17.1
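Taken together, the two new helpers are meant to be chained. As a rough
sketch (illustrative only; it uses the bpf_helpers.h wrappers added later
in this series, and the program name is made up), a cgroup/skb program
would look like:

	SEC("cgroup_skb/egress")
	int snd_cwnd_probe(struct __sk_buff *skb)
	{
		struct bpf_sock *sk = skb->sk;	/* may be NULL */
		struct bpf_tcp_sock *tp;

		if (!sk)
			return 1;	/* 1 allows the skb */
		sk = bpf_sk_fullsock(sk);	/* may return NULL */
		if (!sk || sk->protocol != IPPROTO_TCP)
			return 1;
		tp = bpf_tcp_sock(sk);	/* may return NULL */
		if (!tp)
			return 1;
		/* all bpf_tcp_sock fields are readable here, e.g. tp->snd_cwnd */
		return 1;
	}

The verifier enforces the NULL checks above; the test_verifier and
test_sock_fields patches later in this series exercise exactly this
pattern.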
* [PATCH v2 bpf-next 6/7] bpf: Add skb->sk, bpf_sk_fullsock and bpf_tcp_sock tests to test_verifier
From: Martin KaFai Lau @ 2019-02-10 7:22 UTC (permalink / raw)
To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Lawrence Brakmo
This patch tests accessing skb->sk and the new helpers,
bpf_sk_fullsock and bpf_tcp_sock.
The errstr of some existing "reference tracking" tests is changed
with s/bpf_sock/sock/ and s/socket/sock/ where "sock" is from the
verifier's reg_type_str[].
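For orientation, most tests below share the same hand-assembled prologue:

	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
	BPF_MOV64_IMM(BPF_REG_0, 0),
	BPF_EXIT_INSN(),

which corresponds roughly to this C (a sketch, not part of the patch):

	struct bpf_sock *sk = skb->sk;	/* r1 = skb->sk */
	if (!sk)			/* JNE skips the two-insn return */
		return 0;

The "no NULL check" cases omit the JNE guard, and the "beyond last field"
cases load at offsetofend(), i.e. the first byte past the last exposed
field, which the verifier must reject.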
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
tools/testing/selftests/bpf/bpf_util.h | 9 +
.../selftests/bpf/verifier/ref_tracking.c | 4 +-
tools/testing/selftests/bpf/verifier/sock.c | 384 ++++++++++++++++++
tools/testing/selftests/bpf/verifier/unpriv.c | 2 +-
4 files changed, 396 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/bpf/verifier/sock.c
diff --git a/tools/testing/selftests/bpf/bpf_util.h b/tools/testing/selftests/bpf/bpf_util.h
index 315a44fa32af..197347031038 100644
--- a/tools/testing/selftests/bpf/bpf_util.h
+++ b/tools/testing/selftests/bpf/bpf_util.h
@@ -48,4 +48,13 @@ static inline unsigned int bpf_num_possible_cpus(void)
# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#endif
+#ifndef sizeof_field
+#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))
+#endif
+
+#ifndef offsetofend
+#define offsetofend(TYPE, MEMBER) \
+ (offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))
+#endif
+
#endif /* __BPF_UTIL__ */
diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
index dc2cc823df2b..3ed3593bd8b6 100644
--- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
+++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
@@ -547,7 +547,7 @@
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
- .errstr = "cannot write into socket",
+ .errstr = "cannot write into sock",
.result = REJECT,
},
{
@@ -562,7 +562,7 @@
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
- .errstr = "invalid bpf_sock access off=0 size=8",
+ .errstr = "invalid sock access off=0 size=8",
.result = REJECT,
},
{
diff --git a/tools/testing/selftests/bpf/verifier/sock.c b/tools/testing/selftests/bpf/verifier/sock.c
new file mode 100644
index 000000000000..0ddfdf76aba5
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/sock.c
@@ -0,0 +1,384 @@
+{
+ "skb->sk: no NULL check",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1, 0),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = REJECT,
+ .errstr = "invalid mem access 'sock_common_or_null'",
+},
+{
+ "skb->sk: sk->family [non fullsock field]",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1, offsetof(struct bpf_sock, family)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = ACCEPT,
+},
+{
+ "skb->sk: sk->type [fullsock field]",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1, offsetof(struct bpf_sock, type)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = REJECT,
+ .errstr = "invalid sock_common access",
+},
+{
+ "bpf_sk_fullsock(skb->sk): no !skb->sk check",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = REJECT,
+ .errstr = "type=sock_common_or_null expected=sock_common",
+},
+{
+ "sk_fullsock(skb->sk): no NULL check on ret",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_sock, type)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = REJECT,
+ .errstr = "invalid mem access 'sock_or_null'",
+},
+{
+ "sk_fullsock(skb->sk): sk->type [fullsock field]",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_sock, type)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = ACCEPT,
+},
+{
+ "sk_fullsock(skb->sk): sk->family [non fullsock field]",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_sock, family)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = ACCEPT,
+},
+{
+ "sk_fullsock(skb->sk): sk->state [narrow load]",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_sock, state)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = ACCEPT,
+},
+{
+ "sk_fullsock(skb->sk): sk->dst_port [narrow load]",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_sock, dst_port)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = ACCEPT,
+},
+{
+ "sk_fullsock(skb->sk): sk->dst_port [load 2nd byte]",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_sock, dst_port) + 1),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = REJECT,
+ .errstr = "invalid sock access",
+},
+{
+ "sk_fullsock(skb->sk): sk->dst_ip6 [load 2nd byte]",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_sock, dst_ip6[0]) + 1),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = ACCEPT,
+},
+{
+ "sk_fullsock(skb->sk): sk->type [narrow load]",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_sock, type)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = ACCEPT,
+},
+{
+ "sk_fullsock(skb->sk): sk->protocol [narrow load]",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_sock, protocol)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = ACCEPT,
+},
+{
+ "sk_fullsock(skb->sk): beyond last field",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, offsetofend(struct bpf_sock, state)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = REJECT,
+ .errstr = "invalid sock access",
+},
+{
+ "bpf_tcp_sock(skb->sk): no !skb->sk check",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_EMIT_CALL(BPF_FUNC_tcp_sock),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = REJECT,
+ .errstr = "type=sock_common_or_null expected=sock_common",
+},
+{
+ "bpf_tcp_sock(skb->sk): no NULL check on ret",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_tcp_sock),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_tcp_sock, snd_cwnd)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = REJECT,
+ .errstr = "invalid mem access 'tcp_sock_or_null'",
+},
+{
+ "bpf_tcp_sock(skb->sk): tp->snd_cwnd",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_tcp_sock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_tcp_sock, snd_cwnd)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = ACCEPT,
+},
+{
+ "bpf_tcp_sock(skb->sk): tp->bytes_acked",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_tcp_sock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_tcp_sock, bytes_acked)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = ACCEPT,
+},
+{
+ "bpf_tcp_sock(skb->sk): beyond last field",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_tcp_sock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, offsetofend(struct bpf_tcp_sock, bytes_acked)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = REJECT,
+ .errstr = "invalid tcp_sock access",
+},
+{
+ "bpf_tcp_sock(bpf_sk_fullsock(skb->sk)): tp->snd_cwnd",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+ BPF_EXIT_INSN(),
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+ BPF_EMIT_CALL(BPF_FUNC_tcp_sock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+ BPF_EXIT_INSN(),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_tcp_sock, snd_cwnd)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .result = ACCEPT,
+},
+{
+ "bpf_sk_release(skb->sk)",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1),
+ BPF_EMIT_CALL(BPF_FUNC_sk_release),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = REJECT,
+ .errstr = "type=sock_common expected=sock",
+},
+{
+ "bpf_sk_release(bpf_sk_fullsock(skb->sk))",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+ BPF_EXIT_INSN(),
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+ BPF_EMIT_CALL(BPF_FUNC_sk_release),
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = REJECT,
+ .errstr = "reference has not been acquired before",
+},
+{
+ "bpf_sk_release(bpf_tcp_sock(skb->sk))",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ BPF_EMIT_CALL(BPF_FUNC_tcp_sock),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+ BPF_EXIT_INSN(),
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+ BPF_EMIT_CALL(BPF_FUNC_sk_release),
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = REJECT,
+ .errstr = "type=tcp_sock expected=sock",
+},
diff --git a/tools/testing/selftests/bpf/verifier/unpriv.c b/tools/testing/selftests/bpf/verifier/unpriv.c
index 3e046695fad7..dbaf5be947b2 100644
--- a/tools/testing/selftests/bpf/verifier/unpriv.c
+++ b/tools/testing/selftests/bpf/verifier/unpriv.c
@@ -365,7 +365,7 @@
},
.result = REJECT,
//.errstr = "same insn cannot be used with different pointers",
- .errstr = "cannot write into socket",
+ .errstr = "cannot write into sock",
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
{
--
2.17.1
* [PATCH v2 bpf-next 7/7] bpf: Add test_sock_fields for skb->sk and bpf_tcp_sock
From: Martin KaFai Lau @ 2019-02-10 7:22 UTC (permalink / raw)
To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Lawrence Brakmo
This patch adds a C program to show the usage of skb->sk and
bpf_tcp_sock.
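As a rough guide (assuming the usual selftests build flow; the exact
invocation may differ), the test can be built and run with:

	$ make -C tools/testing/selftests/bpf test_sock_fields
	# ./test_sock_fields

It needs root for the cgroup setup and prints PASS on success.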
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
tools/testing/selftests/bpf/Makefile | 6 +-
tools/testing/selftests/bpf/bpf_helpers.h | 4 +
.../testing/selftests/bpf/test_sock_fields.c | 327 ++++++++++++++++++
.../selftests/bpf/test_sock_fields_kern.c | 152 ++++++++
4 files changed, 487 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/test_sock_fields.c
create mode 100644 tools/testing/selftests/bpf/test_sock_fields_kern.c
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 383d2ff13fc7..c7e1e3255448 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -23,7 +23,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user \
test_socket_cookie test_cgroup_storage test_select_reuseport test_section_names \
- test_netcnt test_tcpnotify_user
+ test_netcnt test_tcpnotify_user test_sock_fields
BPF_OBJ_FILES = \
test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \
@@ -35,7 +35,8 @@ BPF_OBJ_FILES = \
sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o test_xdp_vlan.o \
- xdp_dummy.o test_map_in_map.o test_spin_lock.o test_map_lock.o
+ xdp_dummy.o test_map_in_map.o test_spin_lock.o test_map_lock.o \
+ test_sock_fields_kern.o
# Objects are built with default compilation flags and with sub-register
# code-gen enabled.
@@ -111,6 +112,7 @@ $(OUTPUT)/test_progs: trace_helpers.c
$(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c
$(OUTPUT)/test_cgroup_storage: cgroup_helpers.c
$(OUTPUT)/test_netcnt: cgroup_helpers.c
+$(OUTPUT)/test_sock_fields: cgroup_helpers.c
.PHONY: force
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 6a0ce0f055c5..d9999f1ed1d2 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -176,6 +176,10 @@ static void (*bpf_spin_lock)(struct bpf_spin_lock *lock) =
(void *) BPF_FUNC_spin_lock;
static void (*bpf_spin_unlock)(struct bpf_spin_lock *lock) =
(void *) BPF_FUNC_spin_unlock;
+static struct bpf_sock *(*bpf_sk_fullsock)(struct bpf_sock *sk) =
+ (void *) BPF_FUNC_sk_fullsock;
+static struct bpf_tcp_sock *(*bpf_tcp_sock)(struct bpf_sock *sk) =
+ (void *) BPF_FUNC_tcp_sock;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/test_sock_fields.c b/tools/testing/selftests/bpf/test_sock_fields.c
new file mode 100644
index 000000000000..9bb58369b481
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_sock_fields.c
@@ -0,0 +1,327 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019 Facebook */
+
+#include <sys/socket.h>
+#include <sys/epoll.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "cgroup_helpers.h"
+
+enum bpf_array_idx {
+ SRV_IDX,
+ CLI_IDX,
+ __NR_BPF_ARRAY_IDX,
+};
+
+#define CHECK(condition, tag, format...) ({ \
+ int __ret = !!(condition); \
+ if (__ret) { \
+ printf("%s(%d):FAIL:%s ", __func__, __LINE__, tag); \
+ printf(format); \
+ printf("\n"); \
+ exit(-1); \
+ } \
+})
+
+#define TEST_CGROUP "/test-bpf-sock-fields"
+#define DATA "Hello BPF!"
+#define DATA_LEN sizeof(DATA)
+
+static struct sockaddr_in6 srv_sa6, cli_sa6;
+static int linum_map_fd;
+static int addr_map_fd;
+static int tp_map_fd;
+static int sk_map_fd;
+static __u32 srv_idx = SRV_IDX;
+static __u32 cli_idx = CLI_IDX;
+
+static void init_loopback6(struct sockaddr_in6 *sa6)
+{
+ memset(sa6, 0, sizeof(*sa6));
+ sa6->sin6_family = AF_INET6;
+ sa6->sin6_addr = in6addr_loopback;
+}
+
+static void print_sk(const struct bpf_sock *sk)
+{
+ char src_ip4[24], dst_ip4[24];
+ char src_ip6[64], dst_ip6[64];
+
+ inet_ntop(AF_INET, &sk->src_ip4, src_ip4, sizeof(src_ip4));
+ inet_ntop(AF_INET6, &sk->src_ip6, src_ip6, sizeof(src_ip6));
+ inet_ntop(AF_INET, &sk->dst_ip4, dst_ip4, sizeof(dst_ip4));
+ inet_ntop(AF_INET6, &sk->dst_ip6, dst_ip6, sizeof(dst_ip6));
+
+ printf("state:%u bound_dev_if:%u family:%u type:%u protocol:%u mark:%u priority:%u "
+ "src_ip4:%x(%s) src_ip6:%x:%x:%x:%x(%s) src_port:%u "
+ "dst_ip4:%x(%s) dst_ip6:%x:%x:%x:%x(%s) dst_port:%u\n",
+ sk->state, sk->bound_dev_if, sk->family, sk->type, sk->protocol,
+ sk->mark, sk->priority,
+ sk->src_ip4, src_ip4,
+ sk->src_ip6[0], sk->src_ip6[1], sk->src_ip6[2], sk->src_ip6[3],
+ src_ip6, sk->src_port,
+ sk->dst_ip4, dst_ip4,
+ sk->dst_ip6[0], sk->dst_ip6[1], sk->dst_ip6[2], sk->dst_ip6[3],
+ dst_ip6, ntohs(sk->dst_port));
+}
+
+static void print_tp(const struct bpf_tcp_sock *tp)
+{
+ printf("snd_cwnd:%u srtt_us:%u rtt_min:%u snd_ssthresh:%u rcv_nxt:%u "
+ "snd_nxt:%u snd:una:%u mss_cache:%u ecn_flags:%u "
+ "rate_delivered:%u rate_interval_us:%u packets_out:%u "
+ "retrans_out:%u total_retrans:%u segs_in:%u data_segs_in:%u "
+ "segs_out:%u data_segs_out:%u lost_out:%u sacked_out:%u "
+ "bytes_received:%llu bytes_acked:%llu\n",
+ tp->snd_cwnd, tp->srtt_us, tp->rtt_min, tp->snd_ssthresh,
+ tp->rcv_nxt, tp->snd_nxt, tp->snd_una, tp->mss_cache,
+ tp->ecn_flags, tp->rate_delivered, tp->rate_interval_us,
+ tp->packets_out, tp->retrans_out, tp->total_retrans,
+ tp->segs_in, tp->data_segs_in, tp->segs_out,
+ tp->data_segs_out, tp->lost_out, tp->sacked_out,
+ tp->bytes_received, tp->bytes_acked);
+}
+
+static void check_result(void)
+{
+ struct bpf_tcp_sock srv_tp, cli_tp;
+ struct bpf_sock srv_sk, cli_sk;
+ __u32 linum, idx0 = 0;
+ int err;
+
+ err = bpf_map_lookup_elem(linum_map_fd, &idx0, &linum);
+ CHECK(err == -1, "bpf_map_lookup_elem(linum_map_fd)",
+ "err:%d errno:%d", err, errno);
+
+ err = bpf_map_lookup_elem(sk_map_fd, &srv_idx, &srv_sk);
+ CHECK(err == -1, "bpf_map_lookup_elem(sk_map_fd, &srv_idx)",
+ "err:%d errno:%d", err, errno);
+ err = bpf_map_lookup_elem(tp_map_fd, &srv_idx, &srv_tp);
+ CHECK(err == -1, "bpf_map_lookup_elem(tp_map_fd, &srv_idx)",
+ "err:%d errno:%d", err, errno);
+
+ err = bpf_map_lookup_elem(sk_map_fd, &cli_idx, &cli_sk);
+ CHECK(err == -1, "bpf_map_lookup_elem(sk_map_fd, &cli_idx)",
+ "err:%d errno:%d", err, errno);
+ err = bpf_map_lookup_elem(tp_map_fd, &cli_idx, &cli_tp);
+ CHECK(err == -1, "bpf_map_lookup_elem(tp_map_fd, &cli_idx)",
+ "err:%d errno:%d", err, errno);
+
+ printf("srv_sk: ");
+ print_sk(&srv_sk);
+ printf("\n");
+
+ printf("cli_sk: ");
+ print_sk(&cli_sk);
+ printf("\n");
+
+ printf("srv_tp: ");
+ print_tp(&srv_tp);
+ printf("\n");
+
+ printf("cli_tp: ");
+ print_tp(&cli_tp);
+ printf("\n");
+
+ CHECK(srv_sk.state == 10 /* TCP_LISTEN */ ||
+ !srv_sk.state ||
+ srv_sk.family != AF_INET6 ||
+ srv_sk.protocol != IPPROTO_TCP ||
+ memcmp(srv_sk.src_ip6, &in6addr_loopback,
+ sizeof(srv_sk.src_ip6)) ||
+ memcmp(srv_sk.dst_ip6, &in6addr_loopback,
+ sizeof(srv_sk.dst_ip6)) ||
+ srv_sk.src_port != ntohs(srv_sa6.sin6_port) ||
+ srv_sk.dst_port != cli_sa6.sin6_port,
+ "Unexpected srv_sk", "Check srv_sk output. linum:%u", linum);
+
+ CHECK(cli_sk.state == 10 /* TCP_LISTEN */ ||
+ !cli_sk.state ||
+ cli_sk.family != AF_INET6 ||
+ cli_sk.protocol != IPPROTO_TCP ||
+ memcmp(cli_sk.src_ip6, &in6addr_loopback,
+ sizeof(cli_sk.src_ip6)) ||
+ memcmp(cli_sk.dst_ip6, &in6addr_loopback,
+ sizeof(cli_sk.dst_ip6)) ||
+ cli_sk.src_port != ntohs(cli_sa6.sin6_port) ||
+ cli_sk.dst_port != srv_sa6.sin6_port,
+ "Unexpected cli_sk", "Check cli_sk output. linum:%u", linum);
+
+ CHECK(srv_tp.data_segs_out != 1 ||
+ srv_tp.data_segs_in ||
+ srv_tp.snd_cwnd != 10 ||
+ srv_tp.total_retrans ||
+ srv_tp.bytes_acked != DATA_LEN,
+ "Unexpected srv_tp", "Check srv_tp output. linum:%u", linum);
+
+ CHECK(cli_tp.data_segs_out ||
+ cli_tp.data_segs_in != 1 ||
+ cli_tp.snd_cwnd != 10 ||
+ cli_tp.total_retrans ||
+ cli_tp.bytes_received != DATA_LEN,
+ "Unexpected cli_tp", "Check cli_tp output. linum:%u", linum);
+}
+
+static void test(void)
+{
+ int listen_fd, cli_fd, accept_fd, epfd, err;
+ struct epoll_event ev;
+ socklen_t addrlen;
+
+ addrlen = sizeof(struct sockaddr_in6);
+ ev.events = EPOLLIN;
+
+ epfd = epoll_create(1);
+ CHECK(epfd == -1, "epoll_create()", "epfd:%d errno:%d", epfd, errno);
+
+ /* Prepare listen_fd */
+ listen_fd = socket(AF_INET6, SOCK_STREAM | SOCK_NONBLOCK, 0);
+ CHECK(listen_fd == -1, "socket()", "listen_fd:%d errno:%d",
+ listen_fd, errno);
+
+ init_loopback6(&srv_sa6);
+ err = bind(listen_fd, (struct sockaddr *)&srv_sa6, sizeof(srv_sa6));
+ CHECK(err, "bind(listen_fd)", "err:%d errno:%d", err, errno);
+
+ err = getsockname(listen_fd, (struct sockaddr *)&srv_sa6, &addrlen);
+ CHECK(err, "getsockname(listen_fd)", "err:%d errno:%d", err, errno);
+
+ err = listen(listen_fd, 1);
+ CHECK(err, "listen(listen_fd)", "err:%d errno:%d", err, errno);
+
+ /* Prepare cli_fd */
+ cli_fd = socket(AF_INET6, SOCK_STREAM | SOCK_NONBLOCK, 0);
+ CHECK(cli_fd == -1, "socket()", "cli_fd:%d errno:%d", cli_fd, errno);
+
+ init_loopback6(&cli_sa6);
+ err = bind(cli_fd, (struct sockaddr *)&cli_sa6, sizeof(cli_sa6));
+ CHECK(err, "bind(cli_fd)", "err:%d errno:%d", err, errno);
+
+ err = getsockname(cli_fd, (struct sockaddr *)&cli_sa6, &addrlen);
+ CHECK(err, "getsockname(cli_fd)", "err:%d errno:%d",
+ err, errno);
+
+ /* Update addr_map with srv_sa6 and cli_sa6 */
+ err = bpf_map_update_elem(addr_map_fd, &srv_idx, &srv_sa6, 0);
+ CHECK(err, "map_update", "err:%d errno:%d", err, errno);
+
+ err = bpf_map_update_elem(addr_map_fd, &cli_idx, &cli_sa6, 0);
+ CHECK(err, "map_update", "err:%d errno:%d", err, errno);
+
+ /* Connect from cli_sa6 to srv_sa6 */
+ err = connect(cli_fd, (struct sockaddr *)&srv_sa6, addrlen);
+ printf("srv_sa6.sin6_port:%u cli_sa6.sin6_port:%u\n\n",
+ ntohs(srv_sa6.sin6_port), ntohs(cli_sa6.sin6_port));
+ CHECK(err && errno != EINPROGRESS,
+ "connect(cli_fd)", "err:%d errno:%d", err, errno);
+
+ ev.data.fd = listen_fd;
+ err = epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);
+ CHECK(err, "epoll_ctl(EPOLL_CTL_ADD, listen_fd)", "err:%d errno:%d",
+ err, errno);
+
+ /* Accept the connection */
+ /* Have some timeout in accept(listen_fd). Just in case. */
+ err = epoll_wait(epfd, &ev, 1, 1000);
+ CHECK(err != 1 || ev.data.fd != listen_fd,
+ "epoll_wait(listen_fd)",
+ "err:%d errno:%d ev.data.fd:%d listen_fd:%d",
+ err, errno, ev.data.fd, listen_fd);
+
+ accept_fd = accept(listen_fd, NULL, NULL);
+ CHECK(accept_fd == -1, "accept(listen_fd)", "accept_fd:%d errno:%d",
+ accept_fd, errno);
+ close(listen_fd);
+
+ /* Send some data from accept_fd to cli_fd */
+ err = send(accept_fd, DATA, DATA_LEN, 0);
+ CHECK(err != DATA_LEN, "send(accept_fd)", "err:%d errno:%d",
+ err, errno);
+
+ /* Have some timeout in recv(cli_fd). Just in case. */
+ ev.data.fd = cli_fd;
+ err = epoll_ctl(epfd, EPOLL_CTL_ADD, cli_fd, &ev);
+ CHECK(err, "epoll_ctl(EPOLL_CTL_ADD, cli_fd)", "err:%d errno:%d",
+ err, errno);
+
+ err = epoll_wait(epfd, &ev, 1, 1000);
+ CHECK(err != 1 || ev.data.fd != cli_fd,
+ "epoll_wait(cli_fd)", "err:%d errno:%d ev.data.fd:%d cli_fd:%d",
+ err, errno, ev.data.fd, cli_fd);
+
+ err = recv(cli_fd, NULL, 0, MSG_TRUNC);
+ CHECK(err, "recv(cli_fd)", "err:%d errno:%d", err, errno);
+
+ close(epfd);
+ close(accept_fd);
+ close(cli_fd);
+
+ check_result();
+}
+
+int main(int argc, char **argv)
+{
+ struct bpf_prog_load_attr attr = {
+ .file = "test_sock_fields_kern.o",
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ .expected_attach_type = BPF_CGROUP_INET_EGRESS,
+ };
+ int cgroup_fd, prog_fd, err;
+ struct bpf_object *obj;
+ struct bpf_map *map;
+
+ err = setup_cgroup_environment();
+ CHECK(err, "setup_cgroup_environment()", "err:%d errno:%d",
+ err, errno);
+
+ atexit(cleanup_cgroup_environment);
+
+ /* Create a cgroup, get fd, and join it */
+ cgroup_fd = create_and_get_cgroup(TEST_CGROUP);
+ CHECK(cgroup_fd == -1, "create_and_get_cgroup()",
+ "cgroup_fd:%d errno:%d", cgroup_fd, errno);
+
+ err = join_cgroup(TEST_CGROUP);
+ CHECK(err, "join_cgroup", "err:%d errno:%d", err, errno);
+
+ err = bpf_prog_load_xattr(&attr, &obj, &prog_fd);
+ CHECK(err, "bpf_prog_load_xattr()", "err:%d", err);
+
+ err = bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_INET_EGRESS, 0);
+ CHECK(err == -1, "bpf_prog_attach(CPF_CGROUP_INET_EGRESS)",
+ "err:%d errno%d", err, errno);
+ close(cgroup_fd);
+
+ map = bpf_object__find_map_by_name(obj, "addr_map");
+ CHECK(!map, "cannot find addr_map", "(null)");
+ addr_map_fd = bpf_map__fd(map);
+
+ map = bpf_object__find_map_by_name(obj, "sock_result_map");
+ CHECK(!map, "cannot find sock_result_map", "(null)");
+ sk_map_fd = bpf_map__fd(map);
+
+ map = bpf_object__find_map_by_name(obj, "tcp_sock_result_map");
+ CHECK(!map, "cannot find tcp_sock_result_map", "(null)");
+ tp_map_fd = bpf_map__fd(map);
+
+ map = bpf_object__find_map_by_name(obj, "linum_map");
+ CHECK(!map, "cannot find linum_map", "(null)");
+ linum_map_fd = bpf_map__fd(map);
+
+ test();
+
+ bpf_object__close(obj);
+ cleanup_cgroup_environment();
+
+ printf("PASS\n");
+
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/test_sock_fields_kern.c b/tools/testing/selftests/bpf/test_sock_fields_kern.c
new file mode 100644
index 000000000000..de1a43e8f610
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_sock_fields_kern.c
@@ -0,0 +1,152 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019 Facebook */
+
+#include <linux/bpf.h>
+#include <netinet/in.h>
+#include <stdbool.h>
+
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+enum bpf_array_idx {
+ SRV_IDX,
+ CLI_IDX,
+ __NR_BPF_ARRAY_IDX,
+};
+
+struct bpf_map_def SEC("maps") addr_map = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(struct sockaddr_in6),
+ .max_entries = __NR_BPF_ARRAY_IDX,
+};
+
+struct bpf_map_def SEC("maps") sock_result_map = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(struct bpf_sock),
+ .max_entries = __NR_BPF_ARRAY_IDX,
+};
+
+struct bpf_map_def SEC("maps") tcp_sock_result_map = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(struct bpf_tcp_sock),
+ .max_entries = __NR_BPF_ARRAY_IDX,
+};
+
+struct bpf_map_def SEC("maps") linum_map = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(__u32),
+ .max_entries = 1,
+};
+
+static bool is_loopback6(__u32 *a6)
+{
+ return !a6[0] && !a6[1] && !a6[2] && a6[3] == bpf_htonl(1);
+}
+
+static void skcpy(struct bpf_sock *dst,
+ const struct bpf_sock *src)
+{
+ dst->bound_dev_if = src->bound_dev_if;
+ dst->family = src->family;
+ dst->type = src->type;
+ dst->protocol = src->protocol;
+ dst->mark = src->mark;
+ dst->priority = src->priority;
+ dst->src_ip4 = src->src_ip4;
+ dst->src_ip6[0] = src->src_ip6[0];
+ dst->src_ip6[1] = src->src_ip6[1];
+ dst->src_ip6[2] = src->src_ip6[2];
+ dst->src_ip6[3] = src->src_ip6[3];
+ dst->src_port = src->src_port;
+ dst->dst_ip4 = src->dst_ip4;
+ dst->dst_ip6[0] = src->dst_ip6[0];
+ dst->dst_ip6[1] = src->dst_ip6[1];
+ dst->dst_ip6[2] = src->dst_ip6[2];
+ dst->dst_ip6[3] = src->dst_ip6[3];
+ dst->dst_port = src->dst_port;
+ dst->state = src->state;
+}
+
+static void tpcpy(struct bpf_tcp_sock *dst,
+ const struct bpf_tcp_sock *src)
+{
+ dst->snd_cwnd = src->snd_cwnd;
+ dst->srtt_us = src->srtt_us;
+ dst->rtt_min = src->rtt_min;
+ dst->snd_ssthresh = src->snd_ssthresh;
+ dst->rcv_nxt = src->rcv_nxt;
+ dst->snd_nxt = src->snd_nxt;
+ dst->snd_una = src->snd_una;
+ dst->mss_cache = src->mss_cache;
+ dst->ecn_flags = src->ecn_flags;
+ dst->rate_delivered = src->rate_delivered;
+ dst->rate_interval_us = src->rate_interval_us;
+ dst->packets_out = src->packets_out;
+ dst->retrans_out = src->retrans_out;
+ dst->total_retrans = src->total_retrans;
+ dst->segs_in = src->segs_in;
+ dst->data_segs_in = src->data_segs_in;
+ dst->segs_out = src->segs_out;
+ dst->data_segs_out = src->data_segs_out;
+ dst->lost_out = src->lost_out;
+ dst->sacked_out = src->sacked_out;
+ dst->bytes_received = src->bytes_received;
+ dst->bytes_acked = src->bytes_acked;
+}
+
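+/* Record the exiting __LINE__ in linum_map so the userspace part can
+ * report the line this bpf prog last returned from; the skb is always
+ * allowed (return 1).
+ */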
+#define RETURN { \
+ linum = __LINE__; \
+ bpf_map_update_elem(&linum_map, &idx0, &linum, 0); \
+ return 1; \
+}
+
+SEC("cgroup_skb/egress")
+int read_sock_fields(struct __sk_buff *skb)
+{
+ __u32 srv_idx = SRV_IDX, cli_idx = CLI_IDX, idx;
+ struct sockaddr_in6 *srv_sa6, *cli_sa6;
+ struct bpf_tcp_sock *tp, *tp_ret;
+ struct bpf_sock *sk, *sk_ret;
+ __u32 linum, idx0 = 0;
+
+ sk = skb->sk;
+ if (!sk || sk->state == 10) /* TCP_LISTEN */
+ RETURN;
+
+ sk = bpf_sk_fullsock(sk);
+ if (!sk || sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP ||
+ !is_loopback6(sk->src_ip6))
+ RETURN;
+
+ tp = bpf_tcp_sock(sk);
+ if (!tp)
+ RETURN;
+
+ srv_sa6 = bpf_map_lookup_elem(&addr_map, &srv_idx);
+ cli_sa6 = bpf_map_lookup_elem(&addr_map, &cli_idx);
+ if (!srv_sa6 || !cli_sa6)
+ RETURN;
+
+ if (sk->src_port == bpf_ntohs(srv_sa6->sin6_port))
+ idx = srv_idx;
+ else if (sk->src_port == bpf_ntohs(cli_sa6->sin6_port))
+ idx = cli_idx;
+ else
+ RETURN;
+
+ sk_ret = bpf_map_lookup_elem(&sock_result_map, &idx);
+ tp_ret = bpf_map_lookup_elem(&tcp_sock_result_map, &idx);
+ if (!sk_ret || !tp_ret)
+ RETURN;
+
+ skcpy(sk_ret, sk);
+ tpcpy(tp_ret, tp);
+
+ RETURN;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.17.1
* Re: [PATCH v2 bpf-next 0/7] Add __sk_buff->sk, bpf_tcp_sock, BPF_FUNC_tcp_sock and BPF_FUNC_sk_fullsock
From: Alexei Starovoitov @ 2019-02-11 3:55 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
Lawrence Brakmo
On Sat, Feb 09, 2019 at 11:22:20PM -0800, Martin KaFai Lau wrote:
> [...]
Daniel,
I believe this new revision addresses your concerns exactly as we discussed.
So I pushed it to bpf-next.
Please double check that it's what you expected.
We can always revert.
Thanks everyone!
* Re: [PATCH v2 bpf-next 0/7] Add __sk_buff->sk, bpf_tcp_sock, BPF_FUNC_tcp_sock and BPF_FUNC_sk_fullsock
From: Daniel Borkmann @ 2019-02-11 14:56 UTC (permalink / raw)
To: Alexei Starovoitov, Martin KaFai Lau
Cc: netdev, Alexei Starovoitov, kernel-team, Lawrence Brakmo
On 02/11/2019 04:55 AM, Alexei Starovoitov wrote:
> On Sat, Feb 09, 2019 at 11:22:20PM -0800, Martin KaFai Lau wrote:
>> [...]
>
> Daniel,
> I believe this new revision addresses your concerns exactly as we discussed.
> So I pushed it to bpf-next.
> please double check that it's what you expected.
> We can always revert.
> Thanks everyone!
Yep, looks better, thanks!