All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags
@ 2019-07-24 17:00 Stanislav Fomichev
  2019-07-24 17:00 ` [PATCH bpf-next 1/7] bpf/flow_dissector: pass input flags to BPF flow dissector program Stanislav Fomichev
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Stanislav Fomichev @ 2019-07-24 17:00 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Willem de Bruijn, Petar Penkov

C flow dissector supports input flags that tell it to customize parsing
by either stopping early or trying to parse as deep as possible.
BPF flow dissector always parses as deep as possible which is sub-optimal.
Pass input flags to the BPF flow dissector as well so it can make the same
decisions.

Series outline:
* remove unused FLOW_DISSECTOR_F_STOP_AT_L3 flag
* export FLOW_DISSECTOR_F_XXX flags as uapi and pass them to BPF
  flow dissector
* add documentation for the export flags
* support input flags in BPF_PROG_TEST_RUN via ctx_{in,out}
* sync uapi to tools
* support FLOW_DISSECTOR_F_PARSE_1ST_FRAG in selftest
* support FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL in kernel and selftest
* support FLOW_DISSECTOR_F_STOP_AT_ENCAP in selftest

Pros:
* makes BPF flow dissector faster by avoiding burning extra cycles
* existing BPF progs continue to work by ignoring the flags and always
  parsing as deep as possible

Cons:
* new UAPI which we need to support (OTOH, if we need to deprecate some
  flags, we can just stop setting them upon calling BPF programs)

Some numbers (with .repeat = 4000000 in test_flow_dissector):
        test_flow_dissector:PASS:ipv4-frag 35 nsec
        test_flow_dissector:PASS:ipv4-frag 35 nsec
        test_flow_dissector:PASS:ipv4-no-frag 32 nsec
        test_flow_dissector:PASS:ipv4-no-frag 32 nsec

        test_flow_dissector:PASS:ipv6-frag 39 nsec
        test_flow_dissector:PASS:ipv6-frag 39 nsec
        test_flow_dissector:PASS:ipv6-no-frag 36 nsec
        test_flow_dissector:PASS:ipv6-no-frag 36 nsec

        test_flow_dissector:PASS:ipv6-flow-label 36 nsec
        test_flow_dissector:PASS:ipv6-flow-label 36 nsec
        test_flow_dissector:PASS:ipv6-no-flow-label 33 nsec
        test_flow_dissector:PASS:ipv6-no-flow-label 33 nsec

        test_flow_dissector:PASS:ipip-encap 38 nsec
        test_flow_dissector:PASS:ipip-encap 38 nsec
        test_flow_dissector:PASS:ipip-no-encap 32 nsec
        test_flow_dissector:PASS:ipip-no-encap 32 nsec

The improvement is around 10%, but it's in a tight cache-hot
BPF_PROG_TEST_RUN loop.

Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>

Stanislav Fomichev (7):
  bpf/flow_dissector: pass input flags to BPF flow dissector program
  bpf/flow_dissector: document flags
  bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN
  tools/bpf: sync bpf_flow_keys flags
  sefltests/bpf: support FLOW_DISSECTOR_F_PARSE_1ST_FRAG
  bpf/flow_dissector: support ipv6 flow_label and
    FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL
  selftests/bpf: support FLOW_DISSECTOR_F_STOP_AT_ENCAP

 Documentation/bpf/prog_flow_dissector.rst     |  18 ++
 include/linux/skbuff.h                        |   2 +-
 include/net/flow_dissector.h                  |   4 -
 include/uapi/linux/bpf.h                      |   6 +
 net/bpf/test_run.c                            |  39 ++-
 net/core/flow_dissector.c                     |  14 +-
 tools/include/uapi/linux/bpf.h                |   6 +
 .../selftests/bpf/prog_tests/flow_dissector.c | 235 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/bpf_flow.c  |  46 +++-
 9 files changed, 353 insertions(+), 17 deletions(-)

-- 
2.22.0.657.g960e92d24f-goog

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH bpf-next 1/7] bpf/flow_dissector: pass input flags to BPF flow dissector program
  2019-07-24 17:00 [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags Stanislav Fomichev
@ 2019-07-24 17:00 ` Stanislav Fomichev
  2019-07-24 22:21   ` Song Liu
  2019-07-24 17:00 ` [PATCH bpf-next 2/7] bpf/flow_dissector: document flags Stanislav Fomichev
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Stanislav Fomichev @ 2019-07-24 17:00 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Willem de Bruijn, Petar Penkov

C flow dissector supports input flags that tell it to customize parsing
by either stopping early or trying to parse as deep as possible. Pass
those flags to the BPF flow dissector so it can make the same
decisions. In the next commits I'll add support for those flags to
our reference bpf_flow.c

Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/skbuff.h       | 2 +-
 include/net/flow_dissector.h | 4 ----
 include/uapi/linux/bpf.h     | 5 +++++
 net/bpf/test_run.c           | 2 +-
 net/core/flow_dissector.c    | 5 +++--
 5 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 718742b1c505..9b7a8038beec 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1271,7 +1271,7 @@ static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
 
 struct bpf_flow_dissector;
 bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
-		      __be16 proto, int nhoff, int hlen);
+		      __be16 proto, int nhoff, int hlen, unsigned int flags);
 
 bool __skb_flow_dissect(const struct net *net,
 			const struct sk_buff *skb,
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index 90bd210be060..3e2642587b76 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -253,10 +253,6 @@ enum flow_dissector_key_id {
 	FLOW_DISSECTOR_KEY_MAX,
 };
 
-#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG		BIT(0)
-#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL	BIT(1)
-#define FLOW_DISSECTOR_F_STOP_AT_ENCAP		BIT(2)
-
 struct flow_dissector_key {
 	enum flow_dissector_key_id key_id;
 	size_t offset; /* offset of struct flow_dissector_key_*
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index fa1c753dcdbc..b4ad19bd6aa8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3507,6 +3507,10 @@ enum bpf_task_fd_type {
 	BPF_FD_TYPE_URETPROBE,		/* filename + offset */
 };
 
+#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG		(1U << 0)
+#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL	(1U << 1)
+#define FLOW_DISSECTOR_F_STOP_AT_ENCAP		(1U << 2)
+
 struct bpf_flow_keys {
 	__u16	nhoff;
 	__u16	thoff;
@@ -3528,6 +3532,7 @@ struct bpf_flow_keys {
 			__u32	ipv6_dst[4];	/* in6_addr; network order */
 		};
 	};
+	__u32	flags;
 };
 
 struct bpf_func_info {
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 80e6f3a6864d..4e41d15a1098 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -419,7 +419,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	time_start = ktime_get_ns();
 	for (i = 0; i < repeat; i++) {
 		retval = bpf_flow_dissect(prog, &ctx, eth->h_proto, ETH_HLEN,
-					  size);
+					  size, 0);
 
 		if (signal_pending(current)) {
 			preempt_enable();
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 3e6fedb57bc1..a74c4ed1b30d 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -784,7 +784,7 @@ static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys,
 }
 
 bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
-		      __be16 proto, int nhoff, int hlen)
+		      __be16 proto, int nhoff, int hlen, unsigned int flags)
 {
 	struct bpf_flow_keys *flow_keys = ctx->flow_keys;
 	u32 result;
@@ -794,6 +794,7 @@ bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
 	flow_keys->n_proto = proto;
 	flow_keys->nhoff = nhoff;
 	flow_keys->thoff = flow_keys->nhoff;
+	flow_keys->flags = flags;
 
 	preempt_disable();
 	result = BPF_PROG_RUN(prog, ctx);
@@ -914,7 +915,7 @@ bool __skb_flow_dissect(const struct net *net,
 			}
 
 			ret = bpf_flow_dissect(attached, &ctx, n_proto, nhoff,
-					       hlen);
+					       hlen, flags);
 			__skb_flow_bpf_to_target(&flow_keys, flow_dissector,
 						 target_container);
 			rcu_read_unlock();
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next 2/7] bpf/flow_dissector: document flags
  2019-07-24 17:00 [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags Stanislav Fomichev
  2019-07-24 17:00 ` [PATCH bpf-next 1/7] bpf/flow_dissector: pass input flags to BPF flow dissector program Stanislav Fomichev
@ 2019-07-24 17:00 ` Stanislav Fomichev
  2019-07-24 22:24   ` Song Liu
  2019-07-24 17:00 ` [PATCH bpf-next 3/7] bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN Stanislav Fomichev
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Stanislav Fomichev @ 2019-07-24 17:00 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Willem de Bruijn, Petar Penkov

Describe what each input flag does and who uses it.

Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 Documentation/bpf/prog_flow_dissector.rst | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/bpf/prog_flow_dissector.rst b/Documentation/bpf/prog_flow_dissector.rst
index ed343abe541e..0f3f380b2ce4 100644
--- a/Documentation/bpf/prog_flow_dissector.rst
+++ b/Documentation/bpf/prog_flow_dissector.rst
@@ -26,6 +26,7 @@ and output arguments.
   * ``nhoff`` - initial offset of the networking header
   * ``thoff`` - initial offset of the transport header, initialized to nhoff
   * ``n_proto`` - L3 protocol type, parsed out of L2 header
+  * ``flags`` - optional flags
 
 Flow dissector BPF program should fill out the rest of the ``struct
 bpf_flow_keys`` fields. Input arguments ``nhoff/thoff/n_proto`` should be
@@ -101,6 +102,23 @@ can be called for both cases and would have to be written carefully to
 handle both cases.
 
 
+Flags
+=====
+
+``flow_keys->flags`` might contain optional input flags that work as follows:
+
+* ``FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells BPF flow dissector to continue
+  parsing first fragment; the default expected behavior is that flow dissector
+  returns as soon as it finds out that the packet is fragmented;
+  used by ``eth_get_headlen`` to estimate length of all headers for GRO.
+* ``FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells BPF flow dissector to stop
+  parsing as soon as it reaches IPv6 flow label; used by ``___skb_get_hash``
+  and ``__skb_get_hash_symmetric`` to get flow hash.
+* ``FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells BPF flow dissector to stop
+  parsing as soon as it reaches encapsulated headers; used by routing
+  infrastructure.
+
+
 Reference Implementation
 ========================
 
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next 3/7] bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN
  2019-07-24 17:00 [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags Stanislav Fomichev
  2019-07-24 17:00 ` [PATCH bpf-next 1/7] bpf/flow_dissector: pass input flags to BPF flow dissector program Stanislav Fomichev
  2019-07-24 17:00 ` [PATCH bpf-next 2/7] bpf/flow_dissector: document flags Stanislav Fomichev
@ 2019-07-24 17:00 ` Stanislav Fomichev
  2019-07-24 22:33   ` Song Liu
  2019-07-24 17:00 ` [PATCH bpf-next 4/7] tools/bpf: sync bpf_flow_keys flags Stanislav Fomichev
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Stanislav Fomichev @ 2019-07-24 17:00 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Willem de Bruijn, Petar Penkov

This will allow us to write tests for those flags.

Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 net/bpf/test_run.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 4e41d15a1098..444a7baed791 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -377,6 +377,22 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	return ret;
 }
 
+static int verify_user_bpf_flow_keys(struct bpf_flow_keys *ctx)
+{
+	/* make sure the fields we don't use are zeroed */
+	if (!range_is_zero(ctx, 0, offsetof(struct bpf_flow_keys, flags)))
+		return -EINVAL;
+
+	/* flags is allowed */
+
+	if (!range_is_zero(ctx, offsetof(struct bpf_flow_keys, flags) +
+			   FIELD_SIZEOF(struct bpf_flow_keys, flags),
+			   sizeof(struct bpf_flow_keys)))
+		return -EINVAL;
+
+	return 0;
+}
+
 int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 				     const union bpf_attr *kattr,
 				     union bpf_attr __user *uattr)
@@ -384,9 +400,11 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	u32 size = kattr->test.data_size_in;
 	struct bpf_flow_dissector ctx = {};
 	u32 repeat = kattr->test.repeat;
+	struct bpf_flow_keys *user_ctx;
 	struct bpf_flow_keys flow_keys;
 	u64 time_start, time_spent = 0;
 	const struct ethhdr *eth;
+	unsigned int flags = 0;
 	u32 retval, duration;
 	void *data;
 	int ret;
@@ -395,9 +413,6 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR)
 		return -EINVAL;
 
-	if (kattr->test.ctx_in || kattr->test.ctx_out)
-		return -EINVAL;
-
 	if (size < ETH_HLEN)
 		return -EINVAL;
 
@@ -410,6 +425,18 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	if (!repeat)
 		repeat = 1;
 
+	user_ctx = bpf_ctx_init(kattr, sizeof(struct bpf_flow_keys));
+	if (IS_ERR(user_ctx)) {
+		kfree(data);
+		return PTR_ERR(user_ctx);
+	}
+	if (user_ctx) {
+		ret = verify_user_bpf_flow_keys(user_ctx);
+		if (ret)
+			goto out;
+		flags = user_ctx->flags;
+	}
+
 	ctx.flow_keys = &flow_keys;
 	ctx.data = data;
 	ctx.data_end = (__u8 *)data + size;
@@ -419,7 +446,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	time_start = ktime_get_ns();
 	for (i = 0; i < repeat; i++) {
 		retval = bpf_flow_dissect(prog, &ctx, eth->h_proto, ETH_HLEN,
-					  size, 0);
+					  size, flags);
 
 		if (signal_pending(current)) {
 			preempt_enable();
@@ -450,8 +477,12 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 
 	ret = bpf_test_finish(kattr, uattr, &flow_keys, sizeof(flow_keys),
 			      retval, duration);
+	if (!ret)
+		ret = bpf_ctx_finish(kattr, uattr, user_ctx,
+				     sizeof(struct bpf_flow_keys));
 
 out:
 	kfree(data);
+	kfree(user_ctx);
 	return ret;
 }
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next 4/7] tools/bpf: sync bpf_flow_keys flags
  2019-07-24 17:00 [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags Stanislav Fomichev
                   ` (2 preceding siblings ...)
  2019-07-24 17:00 ` [PATCH bpf-next 3/7] bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN Stanislav Fomichev
@ 2019-07-24 17:00 ` Stanislav Fomichev
  2019-07-24 23:17   ` Song Liu
  2019-07-24 17:00 ` [PATCH bpf-next 5/7] sefltests/bpf: support FLOW_DISSECTOR_F_PARSE_1ST_FRAG Stanislav Fomichev
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Stanislav Fomichev @ 2019-07-24 17:00 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Willem de Bruijn, Petar Penkov

Export bpf_flow_keys flags to tools/libbpf/selftests.

Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/include/uapi/linux/bpf.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4e455018da65..a0e1c891b56f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3504,6 +3504,10 @@ enum bpf_task_fd_type {
 	BPF_FD_TYPE_URETPROBE,		/* filename + offset */
 };
 
+#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG		(1U << 0)
+#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL	(1U << 1)
+#define FLOW_DISSECTOR_F_STOP_AT_ENCAP		(1U << 2)
+
 struct bpf_flow_keys {
 	__u16	nhoff;
 	__u16	thoff;
@@ -3525,6 +3529,7 @@ struct bpf_flow_keys {
 			__u32	ipv6_dst[4];	/* in6_addr; network order */
 		};
 	};
+	__u32	flags;
 };
 
 struct bpf_func_info {
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next 5/7] sefltests/bpf: support FLOW_DISSECTOR_F_PARSE_1ST_FRAG
  2019-07-24 17:00 [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags Stanislav Fomichev
                   ` (3 preceding siblings ...)
  2019-07-24 17:00 ` [PATCH bpf-next 4/7] tools/bpf: sync bpf_flow_keys flags Stanislav Fomichev
@ 2019-07-24 17:00 ` Stanislav Fomichev
  2019-07-24 23:21   ` Song Liu
  2019-07-24 17:00 ` [PATCH bpf-next 6/7] bpf/flow_dissector: support ipv6 flow_label and FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL Stanislav Fomichev
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Stanislav Fomichev @ 2019-07-24 17:00 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Willem de Bruijn, Petar Penkov

bpf_flow.c: exit early unless FLOW_DISSECTOR_F_PARSE_1ST_FRAG is passed
in flags. Also, set ip_proto earlier, this makes sure we have correct
value with fragmented packets.

Add selftest cases to test ipv4/ipv6 fragments and skip eth_get_headlen
tests that don't have FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag.

eth_get_headlen calls flow dissector with
FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag so we can't run tests that
have different set of input flags against it.

Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../selftests/bpf/prog_tests/flow_dissector.c | 129 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/bpf_flow.c  |  28 +++-
 2 files changed, 151 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
index c938283ac232..966cb3b06870 100644
--- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
+++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
@@ -5,6 +5,10 @@
 #include <linux/if_tun.h>
 #include <sys/uio.h>
 
+#ifndef IP_MF
+#define IP_MF 0x2000
+#endif
+
 #define CHECK_FLOW_KEYS(desc, got, expected)				\
 	CHECK_ATTR(memcmp(&got, &expected, sizeof(got)) != 0,		\
 	      desc,							\
@@ -49,6 +53,18 @@ struct ipv6_pkt {
 	struct tcphdr tcp;
 } __packed;
 
+struct ipv6_frag_pkt {
+	struct ethhdr eth;
+	struct ipv6hdr iph;
+	struct frag_hdr {
+		__u8 nexthdr;
+		__u8 reserved;
+		__be16 frag_off;
+		__be32 identification;
+	} ipf;
+	struct tcphdr tcp;
+} __packed;
+
 struct dvlan_ipv6_pkt {
 	struct ethhdr eth;
 	__u16 vlan_tci;
@@ -65,9 +81,11 @@ struct test {
 		struct ipv4_pkt ipv4;
 		struct svlan_ipv4_pkt svlan_ipv4;
 		struct ipv6_pkt ipv6;
+		struct ipv6_frag_pkt ipv6_frag;
 		struct dvlan_ipv6_pkt dvlan_ipv6;
 	} pkt;
 	struct bpf_flow_keys keys;
+	__u32 flags;
 };
 
 #define VLAN_HLEN	4
@@ -143,6 +161,102 @@ struct test tests[] = {
 			.n_proto = __bpf_constant_htons(ETH_P_IPV6),
 		},
 	},
+	{
+		.name = "ipv4-frag",
+		.pkt.ipv4 = {
+			.eth.h_proto = __bpf_constant_htons(ETH_P_IP),
+			.iph.ihl = 5,
+			.iph.protocol = IPPROTO_TCP,
+			.iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
+			.iph.frag_off = __bpf_constant_htons(IP_MF),
+			.tcp.doff = 5,
+			.tcp.source = 80,
+			.tcp.dest = 8080,
+		},
+		.keys = {
+			.flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
+			.nhoff = ETH_HLEN,
+			.thoff = ETH_HLEN + sizeof(struct iphdr),
+			.addr_proto = ETH_P_IP,
+			.ip_proto = IPPROTO_TCP,
+			.n_proto = __bpf_constant_htons(ETH_P_IP),
+			.is_frag = true,
+			.is_first_frag = true,
+			.sport = 80,
+			.dport = 8080,
+		},
+		.flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
+	},
+	{
+		.name = "ipv4-no-frag",
+		.pkt.ipv4 = {
+			.eth.h_proto = __bpf_constant_htons(ETH_P_IP),
+			.iph.ihl = 5,
+			.iph.protocol = IPPROTO_TCP,
+			.iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
+			.iph.frag_off = __bpf_constant_htons(IP_MF),
+			.tcp.doff = 5,
+			.tcp.source = 80,
+			.tcp.dest = 8080,
+		},
+		.keys = {
+			.nhoff = ETH_HLEN,
+			.thoff = ETH_HLEN + sizeof(struct iphdr),
+			.addr_proto = ETH_P_IP,
+			.ip_proto = IPPROTO_TCP,
+			.n_proto = __bpf_constant_htons(ETH_P_IP),
+			.is_frag = true,
+			.is_first_frag = true,
+		},
+	},
+	{
+		.name = "ipv6-frag",
+		.pkt.ipv6_frag = {
+			.eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
+			.iph.nexthdr = IPPROTO_FRAGMENT,
+			.iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
+			.ipf.nexthdr = IPPROTO_TCP,
+			.tcp.doff = 5,
+			.tcp.source = 80,
+			.tcp.dest = 8080,
+		},
+		.keys = {
+			.flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
+			.nhoff = ETH_HLEN,
+			.thoff = ETH_HLEN + sizeof(struct ipv6hdr) +
+				sizeof(struct frag_hdr),
+			.addr_proto = ETH_P_IPV6,
+			.ip_proto = IPPROTO_TCP,
+			.n_proto = __bpf_constant_htons(ETH_P_IPV6),
+			.is_frag = true,
+			.is_first_frag = true,
+			.sport = 80,
+			.dport = 8080,
+		},
+		.flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
+	},
+	{
+		.name = "ipv6-no-frag",
+		.pkt.ipv6_frag = {
+			.eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
+			.iph.nexthdr = IPPROTO_FRAGMENT,
+			.iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
+			.ipf.nexthdr = IPPROTO_TCP,
+			.tcp.doff = 5,
+			.tcp.source = 80,
+			.tcp.dest = 8080,
+		},
+		.keys = {
+			.nhoff = ETH_HLEN,
+			.thoff = ETH_HLEN + sizeof(struct ipv6hdr) +
+				sizeof(struct frag_hdr),
+			.addr_proto = ETH_P_IPV6,
+			.ip_proto = IPPROTO_TCP,
+			.n_proto = __bpf_constant_htons(ETH_P_IPV6),
+			.is_frag = true,
+			.is_first_frag = true,
+		},
+	},
 };
 
 static int create_tap(const char *ifname)
@@ -225,6 +339,13 @@ void test_flow_dissector(void)
 			.data_size_in = sizeof(tests[i].pkt),
 			.data_out = &flow_keys,
 		};
+		static struct bpf_flow_keys ctx = {};
+
+		if (tests[i].flags) {
+			tattr.ctx_in = &ctx;
+			tattr.ctx_size_in = sizeof(ctx);
+			ctx.flags = tests[i].flags;
+		}
 
 		err = bpf_prog_test_run_xattr(&tattr);
 		CHECK_ATTR(tattr.data_size_out != sizeof(flow_keys) ||
@@ -255,6 +376,14 @@ void test_flow_dissector(void)
 		struct bpf_prog_test_run_attr tattr = {};
 		__u32 key = 0;
 
+		/* Don't run tests that are not marked as
+		 * FLOW_DISSECTOR_F_PARSE_1ST_FRAG; eth_get_headlen
+		 * sets this flag.
+		 */
+
+		if (tests[i].flags != FLOW_DISSECTOR_F_PARSE_1ST_FRAG)
+			continue;
+
 		err = tx_tap(tap_fd, &tests[i].pkt, sizeof(tests[i].pkt));
 		CHECK(err < 0, "tx_tap", "err %d errno %d\n", err, errno);
 
diff --git a/tools/testing/selftests/bpf/progs/bpf_flow.c b/tools/testing/selftests/bpf/progs/bpf_flow.c
index 5ae485a6af3f..0eabe5e57944 100644
--- a/tools/testing/selftests/bpf/progs/bpf_flow.c
+++ b/tools/testing/selftests/bpf/progs/bpf_flow.c
@@ -153,7 +153,6 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
 	struct tcphdr *tcp, _tcp;
 	struct udphdr *udp, _udp;
 
-	keys->ip_proto = proto;
 	switch (proto) {
 	case IPPROTO_ICMP:
 		icmp = bpf_flow_dissect_get_header(skb, sizeof(*icmp), &_icmp);
@@ -231,7 +230,6 @@ static __always_inline int parse_ipv6_proto(struct __sk_buff *skb, __u8 nexthdr)
 {
 	struct bpf_flow_keys *keys = skb->flow_keys;
 
-	keys->ip_proto = nexthdr;
 	switch (nexthdr) {
 	case IPPROTO_HOPOPTS:
 	case IPPROTO_DSTOPTS:
@@ -266,6 +264,7 @@ PROG(IP)(struct __sk_buff *skb)
 	keys->addr_proto = ETH_P_IP;
 	keys->ipv4_src = iph->saddr;
 	keys->ipv4_dst = iph->daddr;
+	keys->ip_proto = iph->protocol;
 
 	keys->thoff += iph->ihl << 2;
 	if (data + keys->thoff > data_end)
@@ -273,13 +272,19 @@ PROG(IP)(struct __sk_buff *skb)
 
 	if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET)) {
 		keys->is_frag = true;
-		if (iph->frag_off & bpf_htons(IP_OFFSET))
+		if (iph->frag_off & bpf_htons(IP_OFFSET)) {
 			/* From second fragment on, packets do not have headers
 			 * we can parse.
 			 */
 			done = true;
-		else
+		} else {
 			keys->is_first_frag = true;
+			/* No need to parse fragmented packet unless
+			 * explicitly asked for.
+			 */
+			if (!(keys->flags & FLOW_DISSECTOR_F_PARSE_1ST_FRAG))
+				done = true;
+		}
 	}
 
 	if (done)
@@ -301,6 +306,7 @@ PROG(IPV6)(struct __sk_buff *skb)
 	memcpy(&keys->ipv6_src, &ip6h->saddr, 2*sizeof(ip6h->saddr));
 
 	keys->thoff += sizeof(struct ipv6hdr);
+	keys->ip_proto = ip6h->nexthdr;
 
 	return parse_ipv6_proto(skb, ip6h->nexthdr);
 }
@@ -317,7 +323,8 @@ PROG(IPV6OP)(struct __sk_buff *skb)
 	/* hlen is in 8-octets and does not include the first 8 bytes
 	 * of the header
 	 */
-	skb->flow_keys->thoff += (1 + ip6h->hdrlen) << 3;
+	keys->thoff += (1 + ip6h->hdrlen) << 3;
+	keys->ip_proto = ip6h->nexthdr;
 
 	return parse_ipv6_proto(skb, ip6h->nexthdr);
 }
@@ -333,9 +340,18 @@ PROG(IPV6FR)(struct __sk_buff *skb)
 
 	keys->thoff += sizeof(*fragh);
 	keys->is_frag = true;
-	if (!(fragh->frag_off & bpf_htons(IP6_OFFSET)))
+	keys->ip_proto = fragh->nexthdr;
+
+	if (!(fragh->frag_off & bpf_htons(IP6_OFFSET))) {
 		keys->is_first_frag = true;
 
+		/* No need to parse fragmented packet unless
+		 * explicitly asked for.
+		 */
+		if (!(keys->flags & FLOW_DISSECTOR_F_PARSE_1ST_FRAG))
+			return export_flow_keys(keys, BPF_OK);
+	}
+
 	return parse_ipv6_proto(skb, fragh->nexthdr);
 }
 
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next 6/7] bpf/flow_dissector: support ipv6 flow_label and FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL
  2019-07-24 17:00 [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags Stanislav Fomichev
                   ` (4 preceding siblings ...)
  2019-07-24 17:00 ` [PATCH bpf-next 5/7] sefltests/bpf: support FLOW_DISSECTOR_F_PARSE_1ST_FRAG Stanislav Fomichev
@ 2019-07-24 17:00 ` Stanislav Fomichev
  2019-07-24 23:28   ` Song Liu
  2019-07-24 17:00 ` [PATCH bpf-next 7/7] selftests/bpf: support FLOW_DISSECTOR_F_STOP_AT_ENCAP Stanislav Fomichev
  2019-07-24 22:10 ` [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags Willem de Bruijn
  7 siblings, 1 reply; 20+ messages in thread
From: Stanislav Fomichev @ 2019-07-24 17:00 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Willem de Bruijn, Petar Penkov

Add support for exporting ipv6 flow label via bpf_flow_keys.
Export flow label from bpf_flow.c and also return early when
FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL is passed.

Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/uapi/linux/bpf.h                      |  1 +
 net/core/flow_dissector.c                     |  9 ++++
 tools/include/uapi/linux/bpf.h                |  1 +
 .../selftests/bpf/prog_tests/flow_dissector.c | 46 +++++++++++++++++++
 tools/testing/selftests/bpf/progs/bpf_flow.c  | 10 ++++
 5 files changed, 67 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b4ad19bd6aa8..83b4150466af 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3533,6 +3533,7 @@ struct bpf_flow_keys {
 		};
 	};
 	__u32	flags;
+	__be32	flow_label;
 };
 
 struct bpf_func_info {
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index a74c4ed1b30d..bcdb863cad28 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -737,6 +737,7 @@ static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys,
 	struct flow_dissector_key_basic *key_basic;
 	struct flow_dissector_key_addrs *key_addrs;
 	struct flow_dissector_key_ports *key_ports;
+	struct flow_dissector_key_tags *key_tags;
 
 	key_control = skb_flow_dissector_target(flow_dissector,
 						FLOW_DISSECTOR_KEY_CONTROL,
@@ -781,6 +782,14 @@ static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys,
 		key_ports->src = flow_keys->sport;
 		key_ports->dst = flow_keys->dport;
 	}
+
+	if (dissector_uses_key(flow_dissector,
+			       FLOW_DISSECTOR_KEY_FLOW_LABEL)) {
+		key_tags = skb_flow_dissector_target(flow_dissector,
+						     FLOW_DISSECTOR_KEY_FLOW_LABEL,
+						     target_container);
+		key_tags->flow_label = ntohl(flow_keys->flow_label);
+	}
 }
 
 bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index a0e1c891b56f..c26ca432b1b3 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3530,6 +3530,7 @@ struct bpf_flow_keys {
 		};
 	};
 	__u32	flags;
+	__be32	flow_label;
 };
 
 struct bpf_func_info {
diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
index 966cb3b06870..1ea921c4cdc0 100644
--- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
+++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
@@ -20,6 +20,7 @@
 	      "is_encap=%u/%u "						\
 	      "ip_proto=0x%x/0x%x "					\
 	      "n_proto=0x%x/0x%x "					\
+	      "flow_label=0x%x/0x%x "					\
 	      "sport=%u/%u "						\
 	      "dport=%u/%u\n",						\
 	      got.nhoff, expected.nhoff,				\
@@ -30,6 +31,7 @@
 	      got.is_encap, expected.is_encap,				\
 	      got.ip_proto, expected.ip_proto,				\
 	      got.n_proto, expected.n_proto,				\
+	      got.flow_label, expected.flow_label,			\
 	      got.sport, expected.sport,				\
 	      got.dport, expected.dport)
 
@@ -257,6 +259,50 @@ struct test tests[] = {
 			.is_first_frag = true,
 		},
 	},
+	{
+		.name = "ipv6-flow-label",
+		.pkt.ipv6 = {
+			.eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
+			.iph.nexthdr = IPPROTO_TCP,
+			.iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
+			.iph.flow_lbl = { 0xb, 0xee, 0xef },
+			.tcp.doff = 5,
+			.tcp.source = 80,
+			.tcp.dest = 8080,
+		},
+		.keys = {
+			.nhoff = ETH_HLEN,
+			.thoff = ETH_HLEN + sizeof(struct ipv6hdr),
+			.addr_proto = ETH_P_IPV6,
+			.ip_proto = IPPROTO_TCP,
+			.n_proto = __bpf_constant_htons(ETH_P_IPV6),
+			.sport = 80,
+			.dport = 8080,
+			.flow_label = __bpf_constant_htonl(0xbeeef),
+		},
+	},
+	{
+		.name = "ipv6-no-flow-label",
+		.pkt.ipv6 = {
+			.eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
+			.iph.nexthdr = IPPROTO_TCP,
+			.iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
+			.iph.flow_lbl = { 0xb, 0xee, 0xef },
+			.tcp.doff = 5,
+			.tcp.source = 80,
+			.tcp.dest = 8080,
+		},
+		.keys = {
+			.flags = FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL,
+			.nhoff = ETH_HLEN,
+			.thoff = ETH_HLEN + sizeof(struct ipv6hdr),
+			.addr_proto = ETH_P_IPV6,
+			.ip_proto = IPPROTO_TCP,
+			.n_proto = __bpf_constant_htons(ETH_P_IPV6),
+			.flow_label = __bpf_constant_htonl(0xbeeef),
+		},
+		.flags = FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL,
+	},
 };
 
 static int create_tap(const char *ifname)
diff --git a/tools/testing/selftests/bpf/progs/bpf_flow.c b/tools/testing/selftests/bpf/progs/bpf_flow.c
index 0eabe5e57944..7d73b7bfe609 100644
--- a/tools/testing/selftests/bpf/progs/bpf_flow.c
+++ b/tools/testing/selftests/bpf/progs/bpf_flow.c
@@ -83,6 +83,12 @@ static __always_inline int export_flow_keys(struct bpf_flow_keys *keys,
 	return ret;
 }
 
+#define IPV6_FLOWLABEL_MASK		__bpf_constant_htonl(0x000FFFFF)
+static inline __be32 ip6_flowlabel(const struct ipv6hdr *hdr)
+{
+	return *(__be32 *)hdr & IPV6_FLOWLABEL_MASK;
+}
+
 static __always_inline void *bpf_flow_dissect_get_header(struct __sk_buff *skb,
 							 __u16 hdr_size,
 							 void *buffer)
@@ -307,6 +313,10 @@ PROG(IPV6)(struct __sk_buff *skb)
 
 	keys->thoff += sizeof(struct ipv6hdr);
 	keys->ip_proto = ip6h->nexthdr;
+	keys->flow_label = ip6_flowlabel(ip6h);
+
+	if (keys->flags & FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL)
+		return export_flow_keys(keys, BPF_OK);
 
 	return parse_ipv6_proto(skb, ip6h->nexthdr);
 }
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next 7/7] selftests/bpf: support FLOW_DISSECTOR_F_STOP_AT_ENCAP
  2019-07-24 17:00 [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags Stanislav Fomichev
                   ` (5 preceding siblings ...)
  2019-07-24 17:00 ` [PATCH bpf-next 6/7] bpf/flow_dissector: support ipv6 flow_label and FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL Stanislav Fomichev
@ 2019-07-24 17:00 ` Stanislav Fomichev
  2019-07-24 23:29   ` Song Liu
  2019-07-24 22:10 ` [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags Willem de Bruijn
  7 siblings, 1 reply; 20+ messages in thread
From: Stanislav Fomichev @ 2019-07-24 17:00 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Willem de Bruijn, Petar Penkov

Exit as soon as we found that packet is encapped when
FLOW_DISSECTOR_F_STOP_AT_ENCAP is passed.
Add appropriate selftest cases.

Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../selftests/bpf/prog_tests/flow_dissector.c | 60 +++++++++++++++++++
 tools/testing/selftests/bpf/progs/bpf_flow.c  |  8 +++
 2 files changed, 68 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
index 1ea921c4cdc0..e382264fbc40 100644
--- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
+++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
@@ -41,6 +41,13 @@ struct ipv4_pkt {
 	struct tcphdr tcp;
 } __packed;
 
+struct ipip_pkt {
+	struct ethhdr eth;
+	struct iphdr iph;
+	struct iphdr iph_inner;
+	struct tcphdr tcp;
+} __packed;
+
 struct svlan_ipv4_pkt {
 	struct ethhdr eth;
 	__u16 vlan_tci;
@@ -82,6 +89,7 @@ struct test {
 	union {
 		struct ipv4_pkt ipv4;
 		struct svlan_ipv4_pkt svlan_ipv4;
+		struct ipip_pkt ipip;
 		struct ipv6_pkt ipv6;
 		struct ipv6_frag_pkt ipv6_frag;
 		struct dvlan_ipv6_pkt dvlan_ipv6;
@@ -303,6 +311,58 @@ struct test tests[] = {
 		},
 		.flags = FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL,
 	},
+	{
+		.name = "ipip-encap",
+		.pkt.ipip = {
+			.eth.h_proto = __bpf_constant_htons(ETH_P_IP),
+			.iph.ihl = 5,
+			.iph.protocol = IPPROTO_IPIP,
+			.iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
+			.iph_inner.ihl = 5,
+			.iph_inner.protocol = IPPROTO_TCP,
+			.iph_inner.tot_len = __bpf_constant_htons(MAGIC_BYTES),
+			.tcp.doff = 5,
+			.tcp.source = 80,
+			.tcp.dest = 8080,
+		},
+		.keys = {
+			.nhoff = 0,
+			.nhoff = ETH_HLEN,
+			.thoff = ETH_HLEN + sizeof(struct iphdr) +
+				sizeof(struct iphdr),
+			.addr_proto = ETH_P_IP,
+			.ip_proto = IPPROTO_TCP,
+			.n_proto = __bpf_constant_htons(ETH_P_IP),
+			.is_encap = true,
+			.sport = 80,
+			.dport = 8080,
+		},
+	},
+	{
+		.name = "ipip-no-encap",
+		.pkt.ipip = {
+			.eth.h_proto = __bpf_constant_htons(ETH_P_IP),
+			.iph.ihl = 5,
+			.iph.protocol = IPPROTO_IPIP,
+			.iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
+			.iph_inner.ihl = 5,
+			.iph_inner.protocol = IPPROTO_TCP,
+			.iph_inner.tot_len = __bpf_constant_htons(MAGIC_BYTES),
+			.tcp.doff = 5,
+			.tcp.source = 80,
+			.tcp.dest = 8080,
+		},
+		.keys = {
+			.flags = FLOW_DISSECTOR_F_STOP_AT_ENCAP,
+			.nhoff = ETH_HLEN,
+			.thoff = ETH_HLEN + sizeof(struct iphdr),
+			.addr_proto = ETH_P_IP,
+			.ip_proto = IPPROTO_IPIP,
+			.n_proto = __bpf_constant_htons(ETH_P_IP),
+			.is_encap = true,
+		},
+		.flags = FLOW_DISSECTOR_F_STOP_AT_ENCAP,
+	},
 };
 
 static int create_tap(const char *ifname)
diff --git a/tools/testing/selftests/bpf/progs/bpf_flow.c b/tools/testing/selftests/bpf/progs/bpf_flow.c
index 7d73b7bfe609..b6236cdf8564 100644
--- a/tools/testing/selftests/bpf/progs/bpf_flow.c
+++ b/tools/testing/selftests/bpf/progs/bpf_flow.c
@@ -167,9 +167,15 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
 		return export_flow_keys(keys, BPF_OK);
 	case IPPROTO_IPIP:
 		keys->is_encap = true;
+		if (keys->flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
+			return export_flow_keys(keys, BPF_OK);
+
 		return parse_eth_proto(skb, bpf_htons(ETH_P_IP));
 	case IPPROTO_IPV6:
 		keys->is_encap = true;
+		if (keys->flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
+			return export_flow_keys(keys, BPF_OK);
+
 		return parse_eth_proto(skb, bpf_htons(ETH_P_IPV6));
 	case IPPROTO_GRE:
 		gre = bpf_flow_dissect_get_header(skb, sizeof(*gre), &_gre);
@@ -189,6 +195,8 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
 			keys->thoff += 4; /* Step over sequence number */
 
 		keys->is_encap = true;
+		if (keys->flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
+			return export_flow_keys(keys, BPF_OK);
 
 		if (gre->proto == bpf_htons(ETH_P_TEB)) {
 			eth = bpf_flow_dissect_get_header(skb, sizeof(*eth),
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags
  2019-07-24 17:00 [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags Stanislav Fomichev
                   ` (6 preceding siblings ...)
  2019-07-24 17:00 ` [PATCH bpf-next 7/7] selftests/bpf: support FLOW_DISSECTOR_F_STOP_AT_ENCAP Stanislav Fomichev
@ 2019-07-24 22:10 ` Willem de Bruijn
  7 siblings, 0 replies; 20+ messages in thread
From: Willem de Bruijn @ 2019-07-24 22:10 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Network Development, bpf, David Miller, Alexei Starovoitov,
	Daniel Borkmann, Willem de Bruijn, Petar Penkov

On Wed, Jul 24, 2019 at 1:11 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> C flow dissector supports input flags that tell it to customize parsing
> by either stopping early or trying to parse as deep as possible.
> BPF flow dissector always parses as deep as possible which is sub-optimal.
> Pass input flags to the BPF flow dissector as well so it can make the same
> decisions.
>
> Series outline:
> * remove unused FLOW_DISSECTOR_F_STOP_AT_L3 flag
> * export FLOW_DISSECTOR_F_XXX flags as uapi and pass them to BPF
>   flow dissector
> * add documentation for the export flags
> * support input flags in BPF_PROG_TEST_RUN via ctx_{in,out}
> * sync uapi to tools
> * support FLOW_DISSECTOR_F_PARSE_1ST_FRAG in selftest
> * support FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL in kernel and selftest
> * support FLOW_DISSECTOR_F_STOP_AT_ENCAP in selftest
>
> Pros:
> * makes BPF flow dissector faster by avoiding burning extra cycles
> * existing BPF progs continue to work by ignoring the flags and always
>   parsing as deep as possible
>
> Cons:
> * new UAPI which we need to support (OTOH, if we need to deprecate some
>   flags, we can just stop setting them upon calling BPF programs)
>
> Some numbers (with .repeat = 4000000 in test_flow_dissector):
>         test_flow_dissector:PASS:ipv4-frag 35 nsec
>         test_flow_dissector:PASS:ipv4-frag 35 nsec
>         test_flow_dissector:PASS:ipv4-no-frag 32 nsec
>         test_flow_dissector:PASS:ipv4-no-frag 32 nsec
>
>         test_flow_dissector:PASS:ipv6-frag 39 nsec
>         test_flow_dissector:PASS:ipv6-frag 39 nsec
>         test_flow_dissector:PASS:ipv6-no-frag 36 nsec
>         test_flow_dissector:PASS:ipv6-no-frag 36 nsec
>
>         test_flow_dissector:PASS:ipv6-flow-label 36 nsec
>         test_flow_dissector:PASS:ipv6-flow-label 36 nsec
>         test_flow_dissector:PASS:ipv6-no-flow-label 33 nsec
>         test_flow_dissector:PASS:ipv6-no-flow-label 33 nsec
>
>         test_flow_dissector:PASS:ipip-encap 38 nsec
>         test_flow_dissector:PASS:ipip-encap 38 nsec
>         test_flow_dissector:PASS:ipip-no-encap 32 nsec
>         test_flow_dissector:PASS:ipip-no-encap 32 nsec
>
> The improvement is around 10%, but it's in a tight cache-hot
> BPF_PROG_TEST_RUN loop.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>

This looks great to me. Thanks, Stan!

Acked-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/7] bpf/flow_dissector: pass input flags to BPF flow dissector program
  2019-07-24 17:00 ` [PATCH bpf-next 1/7] bpf/flow_dissector: pass input flags to BPF flow dissector program Stanislav Fomichev
@ 2019-07-24 22:21   ` Song Liu
  2019-07-24 22:32     ` Stanislav Fomichev
  0 siblings, 1 reply; 20+ messages in thread
From: Song Liu @ 2019-07-24 22:21 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
	Daniel Borkmann, Willem de Bruijn, Petar Penkov

On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> C flow dissector supports input flags that tell it to customize parsing
> by either stopping early or trying to parse as deep as possible. Pass
> those flags to the BPF flow dissector so it can make the same
> decisions. In the next commits I'll add support for those flags to
> our reference bpf_flow.c
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  include/linux/skbuff.h       | 2 +-
>  include/net/flow_dissector.h | 4 ----
>  include/uapi/linux/bpf.h     | 5 +++++
>  net/bpf/test_run.c           | 2 +-
>  net/core/flow_dissector.c    | 5 +++--
>  5 files changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 718742b1c505..9b7a8038beec 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1271,7 +1271,7 @@ static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
>
>  struct bpf_flow_dissector;
>  bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
> -                     __be16 proto, int nhoff, int hlen);
> +                     __be16 proto, int nhoff, int hlen, unsigned int flags);
>
>  bool __skb_flow_dissect(const struct net *net,
>                         const struct sk_buff *skb,
> diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
> index 90bd210be060..3e2642587b76 100644
> --- a/include/net/flow_dissector.h
> +++ b/include/net/flow_dissector.h
> @@ -253,10 +253,6 @@ enum flow_dissector_key_id {
>         FLOW_DISSECTOR_KEY_MAX,
>  };
>
> -#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG                BIT(0)
> -#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL    BIT(1)
> -#define FLOW_DISSECTOR_F_STOP_AT_ENCAP         BIT(2)
> -
>  struct flow_dissector_key {
>         enum flow_dissector_key_id key_id;
>         size_t offset; /* offset of struct flow_dissector_key_*
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index fa1c753dcdbc..b4ad19bd6aa8 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3507,6 +3507,10 @@ enum bpf_task_fd_type {
>         BPF_FD_TYPE_URETPROBE,          /* filename + offset */
>  };
>
> +#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG                (1U << 0)
> +#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL    (1U << 1)
> +#define FLOW_DISSECTOR_F_STOP_AT_ENCAP         (1U << 2)

Do we have to move these?

Thanks,
Song

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 2/7] bpf/flow_dissector: document flags
  2019-07-24 17:00 ` [PATCH bpf-next 2/7] bpf/flow_dissector: document flags Stanislav Fomichev
@ 2019-07-24 22:24   ` Song Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Song Liu @ 2019-07-24 22:24 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
	Daniel Borkmann, Willem de Bruijn, Petar Penkov

On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> Describe what each input flag does and who uses it.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  Documentation/bpf/prog_flow_dissector.rst | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/Documentation/bpf/prog_flow_dissector.rst b/Documentation/bpf/prog_flow_dissector.rst
> index ed343abe541e..0f3f380b2ce4 100644
> --- a/Documentation/bpf/prog_flow_dissector.rst
> +++ b/Documentation/bpf/prog_flow_dissector.rst
> @@ -26,6 +26,7 @@ and output arguments.
>    * ``nhoff`` - initial offset of the networking header
>    * ``thoff`` - initial offset of the transport header, initialized to nhoff
>    * ``n_proto`` - L3 protocol type, parsed out of L2 header
> +  * ``flags`` - optional flags
>
>  Flow dissector BPF program should fill out the rest of the ``struct
>  bpf_flow_keys`` fields. Input arguments ``nhoff/thoff/n_proto`` should be
> @@ -101,6 +102,23 @@ can be called for both cases and would have to be written carefully to
>  handle both cases.
>
>
> +Flags
> +=====
> +
> +``flow_keys->flags`` might contain optional input flags that work as follows:
> +
> +* ``FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells BPF flow dissector to continue
> +  parsing first fragment; the default expected behavior is that flow dissector
> +  returns as soon as it finds out that the packet is fragmented;
> +  used by ``eth_get_headlen`` to estimate length of all headers for GRO.
> +* ``FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells BPF flow dissector to stop
> +  parsing as soon as it reaches IPv6 flow label; used by ``___skb_get_hash``
> +  and ``__skb_get_hash_symmetric`` to get flow hash.
> +* ``FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells BPF flow dissector to stop
> +  parsing as soon as it reaches encapsulated headers; used by routing
> +  infrastructure.
> +
> +
>  Reference Implementation
>  ========================
>
> --
> 2.22.0.657.g960e92d24f-goog
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/7] bpf/flow_dissector: pass input flags to BPF flow dissector program
  2019-07-24 22:21   ` Song Liu
@ 2019-07-24 22:32     ` Stanislav Fomichev
  2019-07-24 22:45       ` Song Liu
  0 siblings, 1 reply; 20+ messages in thread
From: Stanislav Fomichev @ 2019-07-24 22:32 UTC (permalink / raw)
  To: Song Liu
  Cc: Stanislav Fomichev, Networking, bpf, David S . Miller,
	Alexei Starovoitov, Daniel Borkmann, Willem de Bruijn,
	Petar Penkov

On 07/24, Song Liu wrote:
> On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > C flow dissector supports input flags that tell it to customize parsing
> > by either stopping early or trying to parse as deep as possible. Pass
> > those flags to the BPF flow dissector so it can make the same
> > decisions. In the next commits I'll add support for those flags to
> > our reference bpf_flow.c
> >
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Petar Penkov <ppenkov@google.com>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  include/linux/skbuff.h       | 2 +-
> >  include/net/flow_dissector.h | 4 ----
> >  include/uapi/linux/bpf.h     | 5 +++++
> >  net/bpf/test_run.c           | 2 +-
> >  net/core/flow_dissector.c    | 5 +++--
> >  5 files changed, 10 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 718742b1c505..9b7a8038beec 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -1271,7 +1271,7 @@ static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
> >
> >  struct bpf_flow_dissector;
> >  bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
> > -                     __be16 proto, int nhoff, int hlen);
> > +                     __be16 proto, int nhoff, int hlen, unsigned int flags);
> >
> >  bool __skb_flow_dissect(const struct net *net,
> >                         const struct sk_buff *skb,
> > diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
> > index 90bd210be060..3e2642587b76 100644
> > --- a/include/net/flow_dissector.h
> > +++ b/include/net/flow_dissector.h
> > @@ -253,10 +253,6 @@ enum flow_dissector_key_id {
> >         FLOW_DISSECTOR_KEY_MAX,
> >  };
> >
> > -#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG                BIT(0)
> > -#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL    BIT(1)
> > -#define FLOW_DISSECTOR_F_STOP_AT_ENCAP         BIT(2)
> > -
> >  struct flow_dissector_key {
> >         enum flow_dissector_key_id key_id;
> >         size_t offset; /* offset of struct flow_dissector_key_*
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index fa1c753dcdbc..b4ad19bd6aa8 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -3507,6 +3507,10 @@ enum bpf_task_fd_type {
> >         BPF_FD_TYPE_URETPROBE,          /* filename + offset */
> >  };
> >
> > +#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG                (1U << 0)
> > +#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL    (1U << 1)
> > +#define FLOW_DISSECTOR_F_STOP_AT_ENCAP         (1U << 2)
> 
> Do we have to move these?
These have to be a part of UAPI one way or another. The easiest thing
we can do is to call the existing flags a UAPI and move them into
exported headers. Alternatively, we can introduce new set of UAPI flags
and do some kind of conversion between exported and internal ones.

Since it's pretty easy to add/deprecate them (see cover letter), I've
decided to just move them instead of adding another set and doing
conversion. I'm open to suggestions if you think otherwise.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 3/7] bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN
  2019-07-24 17:00 ` [PATCH bpf-next 3/7] bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN Stanislav Fomichev
@ 2019-07-24 22:33   ` Song Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Song Liu @ 2019-07-24 22:33 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
	Daniel Borkmann, Willem de Bruijn, Petar Penkov

On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> This will allow us to write tests for those flags.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  net/bpf/test_run.c | 39 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 35 insertions(+), 4 deletions(-)
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 4e41d15a1098..444a7baed791 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -377,6 +377,22 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
>         return ret;
>  }
>
> +static int verify_user_bpf_flow_keys(struct bpf_flow_keys *ctx)
> +{
> +       /* make sure the fields we don't use are zeroed */
> +       if (!range_is_zero(ctx, 0, offsetof(struct bpf_flow_keys, flags)))
> +               return -EINVAL;
> +
> +       /* flags is allowed */
> +
> +       if (!range_is_zero(ctx, offsetof(struct bpf_flow_keys, flags) +
> +                          FIELD_SIZEOF(struct bpf_flow_keys, flags),
> +                          sizeof(struct bpf_flow_keys)))
> +               return -EINVAL;
> +
> +       return 0;
> +}
> +
>  int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>                                      const union bpf_attr *kattr,
>                                      union bpf_attr __user *uattr)
> @@ -384,9 +400,11 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>         u32 size = kattr->test.data_size_in;
>         struct bpf_flow_dissector ctx = {};
>         u32 repeat = kattr->test.repeat;
> +       struct bpf_flow_keys *user_ctx;
>         struct bpf_flow_keys flow_keys;
>         u64 time_start, time_spent = 0;
>         const struct ethhdr *eth;
> +       unsigned int flags = 0;
>         u32 retval, duration;
>         void *data;
>         int ret;
> @@ -395,9 +413,6 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>         if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR)
>                 return -EINVAL;
>
> -       if (kattr->test.ctx_in || kattr->test.ctx_out)
> -               return -EINVAL;
> -
>         if (size < ETH_HLEN)
>                 return -EINVAL;
>
> @@ -410,6 +425,18 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>         if (!repeat)
>                 repeat = 1;
>
> +       user_ctx = bpf_ctx_init(kattr, sizeof(struct bpf_flow_keys));
> +       if (IS_ERR(user_ctx)) {
> +               kfree(data);
> +               return PTR_ERR(user_ctx);
> +       }
> +       if (user_ctx) {
> +               ret = verify_user_bpf_flow_keys(user_ctx);
> +               if (ret)
> +                       goto out;
> +               flags = user_ctx->flags;
> +       }
> +
>         ctx.flow_keys = &flow_keys;
>         ctx.data = data;
>         ctx.data_end = (__u8 *)data + size;
> @@ -419,7 +446,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>         time_start = ktime_get_ns();
>         for (i = 0; i < repeat; i++) {
>                 retval = bpf_flow_dissect(prog, &ctx, eth->h_proto, ETH_HLEN,
> -                                         size, 0);
> +                                         size, flags);
>
>                 if (signal_pending(current)) {
>                         preempt_enable();
> @@ -450,8 +477,12 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>
>         ret = bpf_test_finish(kattr, uattr, &flow_keys, sizeof(flow_keys),
>                               retval, duration);
> +       if (!ret)
> +               ret = bpf_ctx_finish(kattr, uattr, user_ctx,
> +                                    sizeof(struct bpf_flow_keys));
>
>  out:
>         kfree(data);
> +       kfree(user_ctx);

nit: it is not really necessary now, but maybe put kfree(user_ctx)
before kfree(data).

>         return ret;
>  }
> --
> 2.22.0.657.g960e92d24f-goog
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/7] bpf/flow_dissector: pass input flags to BPF flow dissector program
  2019-07-24 22:32     ` Stanislav Fomichev
@ 2019-07-24 22:45       ` Song Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Song Liu @ 2019-07-24 22:45 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Stanislav Fomichev, Networking, bpf, David S . Miller,
	Alexei Starovoitov, Daniel Borkmann, Willem de Bruijn,
	Petar Penkov

On Wed, Jul 24, 2019 at 3:32 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 07/24, Song Liu wrote:
> > On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > C flow dissector supports input flags that tell it to customize parsing
> > > by either stopping early or trying to parse as deep as possible. Pass
> > > those flags to the BPF flow dissector so it can make the same
> > > decisions. In the next commits I'll add support for those flags to
> > > our reference bpf_flow.c
> > >
> > > Cc: Willem de Bruijn <willemb@google.com>
> > > Cc: Petar Penkov <ppenkov@google.com>
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  include/linux/skbuff.h       | 2 +-
> > >  include/net/flow_dissector.h | 4 ----
> > >  include/uapi/linux/bpf.h     | 5 +++++
> > >  net/bpf/test_run.c           | 2 +-
> > >  net/core/flow_dissector.c    | 5 +++--
> > >  5 files changed, 10 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > index 718742b1c505..9b7a8038beec 100644
> > > --- a/include/linux/skbuff.h
> > > +++ b/include/linux/skbuff.h
> > > @@ -1271,7 +1271,7 @@ static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
> > >
> > >  struct bpf_flow_dissector;
> > >  bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
> > > -                     __be16 proto, int nhoff, int hlen);
> > > +                     __be16 proto, int nhoff, int hlen, unsigned int flags);
> > >
> > >  bool __skb_flow_dissect(const struct net *net,
> > >                         const struct sk_buff *skb,
> > > diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
> > > index 90bd210be060..3e2642587b76 100644
> > > --- a/include/net/flow_dissector.h
> > > +++ b/include/net/flow_dissector.h
> > > @@ -253,10 +253,6 @@ enum flow_dissector_key_id {
> > >         FLOW_DISSECTOR_KEY_MAX,
> > >  };
> > >
> > > -#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG                BIT(0)
> > > -#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL    BIT(1)
> > > -#define FLOW_DISSECTOR_F_STOP_AT_ENCAP         BIT(2)
> > > -
> > >  struct flow_dissector_key {
> > >         enum flow_dissector_key_id key_id;
> > >         size_t offset; /* offset of struct flow_dissector_key_*
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index fa1c753dcdbc..b4ad19bd6aa8 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -3507,6 +3507,10 @@ enum bpf_task_fd_type {
> > >         BPF_FD_TYPE_URETPROBE,          /* filename + offset */
> > >  };
> > >
> > > +#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG                (1U << 0)
> > > +#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL    (1U << 1)
> > > +#define FLOW_DISSECTOR_F_STOP_AT_ENCAP         (1U << 2)
> >
> > Do we have to move these?
> These have to be a part of UAPI one way or another. The easiest thing
> we can do is to call the existing flags a UAPI and move them into
> exported headers. Alternatively, we can introduce new set of UAPI flags
> and do some kind of conversion between exported and internal ones.
>
> Since it's pretty easy to add/deprecate them (see cover letter), I've
> decided to just move them instead of adding another set and doing
> conversion. I'm open to suggestions if you think otherwise.

I see. This looks good to me. Thanks for the explanation.

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 4/7] tools/bpf: sync bpf_flow_keys flags
  2019-07-24 17:00 ` [PATCH bpf-next 4/7] tools/bpf: sync bpf_flow_keys flags Stanislav Fomichev
@ 2019-07-24 23:17   ` Song Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Song Liu @ 2019-07-24 23:17 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
	Daniel Borkmann, Willem de Bruijn, Petar Penkov

On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> Export bpf_flow_keys flags to tools/libbpf/selftests.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  tools/include/uapi/linux/bpf.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 4e455018da65..a0e1c891b56f 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -3504,6 +3504,10 @@ enum bpf_task_fd_type {
>         BPF_FD_TYPE_URETPROBE,          /* filename + offset */
>  };
>
> +#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG                (1U << 0)
> +#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL    (1U << 1)
> +#define FLOW_DISSECTOR_F_STOP_AT_ENCAP         (1U << 2)
> +
>  struct bpf_flow_keys {
>         __u16   nhoff;
>         __u16   thoff;
> @@ -3525,6 +3529,7 @@ struct bpf_flow_keys {
>                         __u32   ipv6_dst[4];    /* in6_addr; network order */
>                 };
>         };
> +       __u32   flags;
>  };
>
>  struct bpf_func_info {
> --
> 2.22.0.657.g960e92d24f-goog
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 5/7] sefltests/bpf: support FLOW_DISSECTOR_F_PARSE_1ST_FRAG
  2019-07-24 17:00 ` [PATCH bpf-next 5/7] sefltests/bpf: support FLOW_DISSECTOR_F_PARSE_1ST_FRAG Stanislav Fomichev
@ 2019-07-24 23:21   ` Song Liu
  2019-07-24 23:52     ` Stanislav Fomichev
  0 siblings, 1 reply; 20+ messages in thread
From: Song Liu @ 2019-07-24 23:21 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
	Daniel Borkmann, Willem de Bruijn, Petar Penkov

On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> bpf_flow.c: exit early unless FLOW_DISSECTOR_F_PARSE_1ST_FRAG is passed
> in flags. Also, set ip_proto earlier, this makes sure we have correct
> value with fragmented packets.
>
> Add selftest cases to test ipv4/ipv6 fragments and skip eth_get_headlen
> tests that don't have FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag.
>
> eth_get_headlen calls flow dissector with
> FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag so we can't run tests that
> have different set of input flags against it.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  .../selftests/bpf/prog_tests/flow_dissector.c | 129 ++++++++++++++++++
>  tools/testing/selftests/bpf/progs/bpf_flow.c  |  28 +++-
>  2 files changed, 151 insertions(+), 6 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> index c938283ac232..966cb3b06870 100644
> --- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> +++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> @@ -5,6 +5,10 @@
>  #include <linux/if_tun.h>
>  #include <sys/uio.h>
>
> +#ifndef IP_MF
> +#define IP_MF 0x2000
> +#endif
> +
>  #define CHECK_FLOW_KEYS(desc, got, expected)                           \
>         CHECK_ATTR(memcmp(&got, &expected, sizeof(got)) != 0,           \
>               desc,                                                     \
> @@ -49,6 +53,18 @@ struct ipv6_pkt {
>         struct tcphdr tcp;
>  } __packed;
>
> +struct ipv6_frag_pkt {
> +       struct ethhdr eth;
> +       struct ipv6hdr iph;
> +       struct frag_hdr {
> +               __u8 nexthdr;
> +               __u8 reserved;
> +               __be16 frag_off;
> +               __be32 identification;
> +       } ipf;
> +       struct tcphdr tcp;
> +} __packed;
> +
>  struct dvlan_ipv6_pkt {
>         struct ethhdr eth;
>         __u16 vlan_tci;
> @@ -65,9 +81,11 @@ struct test {
>                 struct ipv4_pkt ipv4;
>                 struct svlan_ipv4_pkt svlan_ipv4;
>                 struct ipv6_pkt ipv6;
> +               struct ipv6_frag_pkt ipv6_frag;
>                 struct dvlan_ipv6_pkt dvlan_ipv6;
>         } pkt;
>         struct bpf_flow_keys keys;
> +       __u32 flags;
>  };
>
>  #define VLAN_HLEN      4
> @@ -143,6 +161,102 @@ struct test tests[] = {
>                         .n_proto = __bpf_constant_htons(ETH_P_IPV6),
>                 },
>         },
> +       {
> +               .name = "ipv4-frag",
> +               .pkt.ipv4 = {
> +                       .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> +                       .iph.ihl = 5,
> +                       .iph.protocol = IPPROTO_TCP,
> +                       .iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> +                       .iph.frag_off = __bpf_constant_htons(IP_MF),
> +                       .tcp.doff = 5,
> +                       .tcp.source = 80,
> +                       .tcp.dest = 8080,
> +               },
> +               .keys = {
> +                       .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> +                       .nhoff = ETH_HLEN,
> +                       .thoff = ETH_HLEN + sizeof(struct iphdr),
> +                       .addr_proto = ETH_P_IP,
> +                       .ip_proto = IPPROTO_TCP,
> +                       .n_proto = __bpf_constant_htons(ETH_P_IP),
> +                       .is_frag = true,
> +                       .is_first_frag = true,
> +                       .sport = 80,
> +                       .dport = 8080,
> +               },
> +               .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> +       },
> +       {
> +               .name = "ipv4-no-frag",
> +               .pkt.ipv4 = {
> +                       .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> +                       .iph.ihl = 5,
> +                       .iph.protocol = IPPROTO_TCP,
> +                       .iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> +                       .iph.frag_off = __bpf_constant_htons(IP_MF),
> +                       .tcp.doff = 5,
> +                       .tcp.source = 80,
> +                       .tcp.dest = 8080,
> +               },
> +               .keys = {
> +                       .nhoff = ETH_HLEN,
> +                       .thoff = ETH_HLEN + sizeof(struct iphdr),
> +                       .addr_proto = ETH_P_IP,
> +                       .ip_proto = IPPROTO_TCP,
> +                       .n_proto = __bpf_constant_htons(ETH_P_IP),
> +                       .is_frag = true,
> +                       .is_first_frag = true,
> +               },
> +       },
> +       {
> +               .name = "ipv6-frag",
> +               .pkt.ipv6_frag = {
> +                       .eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
> +                       .iph.nexthdr = IPPROTO_FRAGMENT,
> +                       .iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
> +                       .ipf.nexthdr = IPPROTO_TCP,
> +                       .tcp.doff = 5,
> +                       .tcp.source = 80,
> +                       .tcp.dest = 8080,
> +               },
> +               .keys = {
> +                       .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> +                       .nhoff = ETH_HLEN,
> +                       .thoff = ETH_HLEN + sizeof(struct ipv6hdr) +
> +                               sizeof(struct frag_hdr),
> +                       .addr_proto = ETH_P_IPV6,
> +                       .ip_proto = IPPROTO_TCP,
> +                       .n_proto = __bpf_constant_htons(ETH_P_IPV6),
> +                       .is_frag = true,
> +                       .is_first_frag = true,
> +                       .sport = 80,
> +                       .dport = 8080,
> +               },
> +               .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> +       },
> +       {
> +               .name = "ipv6-no-frag",
> +               .pkt.ipv6_frag = {
> +                       .eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
> +                       .iph.nexthdr = IPPROTO_FRAGMENT,
> +                       .iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
> +                       .ipf.nexthdr = IPPROTO_TCP,
> +                       .tcp.doff = 5,
> +                       .tcp.source = 80,
> +                       .tcp.dest = 8080,
> +               },
> +               .keys = {
> +                       .nhoff = ETH_HLEN,
> +                       .thoff = ETH_HLEN + sizeof(struct ipv6hdr) +
> +                               sizeof(struct frag_hdr),
> +                       .addr_proto = ETH_P_IPV6,
> +                       .ip_proto = IPPROTO_TCP,
> +                       .n_proto = __bpf_constant_htons(ETH_P_IPV6),
> +                       .is_frag = true,
> +                       .is_first_frag = true,
> +               },
> +       },
>  };
>
>  static int create_tap(const char *ifname)
> @@ -225,6 +339,13 @@ void test_flow_dissector(void)
>                         .data_size_in = sizeof(tests[i].pkt),
>                         .data_out = &flow_keys,
>                 };
> +               static struct bpf_flow_keys ctx = {};
> +
> +               if (tests[i].flags) {
> +                       tattr.ctx_in = &ctx;
> +                       tattr.ctx_size_in = sizeof(ctx);
> +                       ctx.flags = tests[i].flags;
> +               }
>
>                 err = bpf_prog_test_run_xattr(&tattr);
>                 CHECK_ATTR(tattr.data_size_out != sizeof(flow_keys) ||
> @@ -255,6 +376,14 @@ void test_flow_dissector(void)
>                 struct bpf_prog_test_run_attr tattr = {};
>                 __u32 key = 0;
>
> +               /* Don't run tests that are not marked as
> +                * FLOW_DISSECTOR_F_PARSE_1ST_FRAG; eth_get_headlen
> +                * sets this flag.
> +                */
> +
> +               if (tests[i].flags != FLOW_DISSECTOR_F_PARSE_1ST_FRAG)
> +                       continue;

Maybe test flags & FLOW_DISSECTOR_F_PARSE_1ST_FRAG == 0 instead?
It is not necessary now, but might be useful in the future.

Thanks,
Song

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 6/7] bpf/flow_dissector: support ipv6 flow_label and FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL
  2019-07-24 17:00 ` [PATCH bpf-next 6/7] bpf/flow_dissector: support ipv6 flow_label and FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL Stanislav Fomichev
@ 2019-07-24 23:28   ` Song Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Song Liu @ 2019-07-24 23:28 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
	Daniel Borkmann, Willem de Bruijn, Petar Penkov

On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> Add support for exporting ipv6 flow label via bpf_flow_keys.
> Export flow label from bpf_flow.c and also return early when
> FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL is passed.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 7/7] selftests/bpf: support FLOW_DISSECTOR_F_STOP_AT_ENCAP
  2019-07-24 17:00 ` [PATCH bpf-next 7/7] selftests/bpf: support FLOW_DISSECTOR_F_STOP_AT_ENCAP Stanislav Fomichev
@ 2019-07-24 23:29   ` Song Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Song Liu @ 2019-07-24 23:29 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
	Daniel Borkmann, Willem de Bruijn, Petar Penkov

On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> Exit as soon as we found that packet is encapped when
> FLOW_DISSECTOR_F_STOP_AT_ENCAP is passed.
> Add appropriate selftest cases.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  .../selftests/bpf/prog_tests/flow_dissector.c | 60 +++++++++++++++++++
>  tools/testing/selftests/bpf/progs/bpf_flow.c  |  8 +++
>  2 files changed, 68 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> index 1ea921c4cdc0..e382264fbc40 100644
> --- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> +++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> @@ -41,6 +41,13 @@ struct ipv4_pkt {
>         struct tcphdr tcp;
>  } __packed;
>
> +struct ipip_pkt {
> +       struct ethhdr eth;
> +       struct iphdr iph;
> +       struct iphdr iph_inner;
> +       struct tcphdr tcp;
> +} __packed;
> +
>  struct svlan_ipv4_pkt {
>         struct ethhdr eth;
>         __u16 vlan_tci;
> @@ -82,6 +89,7 @@ struct test {
>         union {
>                 struct ipv4_pkt ipv4;
>                 struct svlan_ipv4_pkt svlan_ipv4;
> +               struct ipip_pkt ipip;
>                 struct ipv6_pkt ipv6;
>                 struct ipv6_frag_pkt ipv6_frag;
>                 struct dvlan_ipv6_pkt dvlan_ipv6;
> @@ -303,6 +311,58 @@ struct test tests[] = {
>                 },
>                 .flags = FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL,
>         },
> +       {
> +               .name = "ipip-encap",
> +               .pkt.ipip = {
> +                       .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> +                       .iph.ihl = 5,
> +                       .iph.protocol = IPPROTO_IPIP,
> +                       .iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> +                       .iph_inner.ihl = 5,
> +                       .iph_inner.protocol = IPPROTO_TCP,
> +                       .iph_inner.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> +                       .tcp.doff = 5,
> +                       .tcp.source = 80,
> +                       .tcp.dest = 8080,
> +               },
> +               .keys = {
> +                       .nhoff = 0,
> +                       .nhoff = ETH_HLEN,
> +                       .thoff = ETH_HLEN + sizeof(struct iphdr) +
> +                               sizeof(struct iphdr),
> +                       .addr_proto = ETH_P_IP,
> +                       .ip_proto = IPPROTO_TCP,
> +                       .n_proto = __bpf_constant_htons(ETH_P_IP),
> +                       .is_encap = true,
> +                       .sport = 80,
> +                       .dport = 8080,
> +               },
> +       },
> +       {
> +               .name = "ipip-no-encap",
> +               .pkt.ipip = {
> +                       .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> +                       .iph.ihl = 5,
> +                       .iph.protocol = IPPROTO_IPIP,
> +                       .iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> +                       .iph_inner.ihl = 5,
> +                       .iph_inner.protocol = IPPROTO_TCP,
> +                       .iph_inner.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> +                       .tcp.doff = 5,
> +                       .tcp.source = 80,
> +                       .tcp.dest = 8080,
> +               },
> +               .keys = {
> +                       .flags = FLOW_DISSECTOR_F_STOP_AT_ENCAP,
> +                       .nhoff = ETH_HLEN,
> +                       .thoff = ETH_HLEN + sizeof(struct iphdr),
> +                       .addr_proto = ETH_P_IP,
> +                       .ip_proto = IPPROTO_IPIP,
> +                       .n_proto = __bpf_constant_htons(ETH_P_IP),
> +                       .is_encap = true,
> +               },
> +               .flags = FLOW_DISSECTOR_F_STOP_AT_ENCAP,
> +       },
>  };
>
>  static int create_tap(const char *ifname)
> diff --git a/tools/testing/selftests/bpf/progs/bpf_flow.c b/tools/testing/selftests/bpf/progs/bpf_flow.c
> index 7d73b7bfe609..b6236cdf8564 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_flow.c
> +++ b/tools/testing/selftests/bpf/progs/bpf_flow.c
> @@ -167,9 +167,15 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
>                 return export_flow_keys(keys, BPF_OK);
>         case IPPROTO_IPIP:
>                 keys->is_encap = true;
> +               if (keys->flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
> +                       return export_flow_keys(keys, BPF_OK);
> +
>                 return parse_eth_proto(skb, bpf_htons(ETH_P_IP));
>         case IPPROTO_IPV6:
>                 keys->is_encap = true;
> +               if (keys->flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
> +                       return export_flow_keys(keys, BPF_OK);
> +
>                 return parse_eth_proto(skb, bpf_htons(ETH_P_IPV6));
>         case IPPROTO_GRE:
>                 gre = bpf_flow_dissect_get_header(skb, sizeof(*gre), &_gre);
> @@ -189,6 +195,8 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
>                         keys->thoff += 4; /* Step over sequence number */
>
>                 keys->is_encap = true;
> +               if (keys->flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
> +                       return export_flow_keys(keys, BPF_OK);
>
>                 if (gre->proto == bpf_htons(ETH_P_TEB)) {
>                         eth = bpf_flow_dissect_get_header(skb, sizeof(*eth),
> --
> 2.22.0.657.g960e92d24f-goog
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 5/7] sefltests/bpf: support FLOW_DISSECTOR_F_PARSE_1ST_FRAG
  2019-07-24 23:21   ` Song Liu
@ 2019-07-24 23:52     ` Stanislav Fomichev
  2019-07-25  5:36       ` Song Liu
  0 siblings, 1 reply; 20+ messages in thread
From: Stanislav Fomichev @ 2019-07-24 23:52 UTC (permalink / raw)
  To: Song Liu
  Cc: Stanislav Fomichev, Networking, bpf, David S . Miller,
	Alexei Starovoitov, Daniel Borkmann, Willem de Bruijn,
	Petar Penkov

On 07/24, Song Liu wrote:
> On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > bpf_flow.c: exit early unless FLOW_DISSECTOR_F_PARSE_1ST_FRAG is passed
> > in flags. Also, set ip_proto earlier, this makes sure we have correct
> > value with fragmented packets.
> >
> > Add selftest cases to test ipv4/ipv6 fragments and skip eth_get_headlen
> > tests that don't have FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag.
> >
> > eth_get_headlen calls flow dissector with
> > FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag so we can't run tests that
> > have different set of input flags against it.
> >
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Petar Penkov <ppenkov@google.com>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  .../selftests/bpf/prog_tests/flow_dissector.c | 129 ++++++++++++++++++
> >  tools/testing/selftests/bpf/progs/bpf_flow.c  |  28 +++-
> >  2 files changed, 151 insertions(+), 6 deletions(-)
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> > index c938283ac232..966cb3b06870 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> > @@ -5,6 +5,10 @@
> >  #include <linux/if_tun.h>
> >  #include <sys/uio.h>
> >
> > +#ifndef IP_MF
> > +#define IP_MF 0x2000
> > +#endif
> > +
> >  #define CHECK_FLOW_KEYS(desc, got, expected)                           \
> >         CHECK_ATTR(memcmp(&got, &expected, sizeof(got)) != 0,           \
> >               desc,                                                     \
> > @@ -49,6 +53,18 @@ struct ipv6_pkt {
> >         struct tcphdr tcp;
> >  } __packed;
> >
> > +struct ipv6_frag_pkt {
> > +       struct ethhdr eth;
> > +       struct ipv6hdr iph;
> > +       struct frag_hdr {
> > +               __u8 nexthdr;
> > +               __u8 reserved;
> > +               __be16 frag_off;
> > +               __be32 identification;
> > +       } ipf;
> > +       struct tcphdr tcp;
> > +} __packed;
> > +
> >  struct dvlan_ipv6_pkt {
> >         struct ethhdr eth;
> >         __u16 vlan_tci;
> > @@ -65,9 +81,11 @@ struct test {
> >                 struct ipv4_pkt ipv4;
> >                 struct svlan_ipv4_pkt svlan_ipv4;
> >                 struct ipv6_pkt ipv6;
> > +               struct ipv6_frag_pkt ipv6_frag;
> >                 struct dvlan_ipv6_pkt dvlan_ipv6;
> >         } pkt;
> >         struct bpf_flow_keys keys;
> > +       __u32 flags;
> >  };
> >
> >  #define VLAN_HLEN      4
> > @@ -143,6 +161,102 @@ struct test tests[] = {
> >                         .n_proto = __bpf_constant_htons(ETH_P_IPV6),
> >                 },
> >         },
> > +       {
> > +               .name = "ipv4-frag",
> > +               .pkt.ipv4 = {
> > +                       .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> > +                       .iph.ihl = 5,
> > +                       .iph.protocol = IPPROTO_TCP,
> > +                       .iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> > +                       .iph.frag_off = __bpf_constant_htons(IP_MF),
> > +                       .tcp.doff = 5,
> > +                       .tcp.source = 80,
> > +                       .tcp.dest = 8080,
> > +               },
> > +               .keys = {
> > +                       .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> > +                       .nhoff = ETH_HLEN,
> > +                       .thoff = ETH_HLEN + sizeof(struct iphdr),
> > +                       .addr_proto = ETH_P_IP,
> > +                       .ip_proto = IPPROTO_TCP,
> > +                       .n_proto = __bpf_constant_htons(ETH_P_IP),
> > +                       .is_frag = true,
> > +                       .is_first_frag = true,
> > +                       .sport = 80,
> > +                       .dport = 8080,
> > +               },
> > +               .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> > +       },
> > +       {
> > +               .name = "ipv4-no-frag",
> > +               .pkt.ipv4 = {
> > +                       .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> > +                       .iph.ihl = 5,
> > +                       .iph.protocol = IPPROTO_TCP,
> > +                       .iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> > +                       .iph.frag_off = __bpf_constant_htons(IP_MF),
> > +                       .tcp.doff = 5,
> > +                       .tcp.source = 80,
> > +                       .tcp.dest = 8080,
> > +               },
> > +               .keys = {
> > +                       .nhoff = ETH_HLEN,
> > +                       .thoff = ETH_HLEN + sizeof(struct iphdr),
> > +                       .addr_proto = ETH_P_IP,
> > +                       .ip_proto = IPPROTO_TCP,
> > +                       .n_proto = __bpf_constant_htons(ETH_P_IP),
> > +                       .is_frag = true,
> > +                       .is_first_frag = true,
> > +               },
> > +       },
> > +       {
> > +               .name = "ipv6-frag",
> > +               .pkt.ipv6_frag = {
> > +                       .eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
> > +                       .iph.nexthdr = IPPROTO_FRAGMENT,
> > +                       .iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
> > +                       .ipf.nexthdr = IPPROTO_TCP,
> > +                       .tcp.doff = 5,
> > +                       .tcp.source = 80,
> > +                       .tcp.dest = 8080,
> > +               },
> > +               .keys = {
> > +                       .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> > +                       .nhoff = ETH_HLEN,
> > +                       .thoff = ETH_HLEN + sizeof(struct ipv6hdr) +
> > +                               sizeof(struct frag_hdr),
> > +                       .addr_proto = ETH_P_IPV6,
> > +                       .ip_proto = IPPROTO_TCP,
> > +                       .n_proto = __bpf_constant_htons(ETH_P_IPV6),
> > +                       .is_frag = true,
> > +                       .is_first_frag = true,
> > +                       .sport = 80,
> > +                       .dport = 8080,
> > +               },
> > +               .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> > +       },
> > +       {
> > +               .name = "ipv6-no-frag",
> > +               .pkt.ipv6_frag = {
> > +                       .eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
> > +                       .iph.nexthdr = IPPROTO_FRAGMENT,
> > +                       .iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
> > +                       .ipf.nexthdr = IPPROTO_TCP,
> > +                       .tcp.doff = 5,
> > +                       .tcp.source = 80,
> > +                       .tcp.dest = 8080,
> > +               },
> > +               .keys = {
> > +                       .nhoff = ETH_HLEN,
> > +                       .thoff = ETH_HLEN + sizeof(struct ipv6hdr) +
> > +                               sizeof(struct frag_hdr),
> > +                       .addr_proto = ETH_P_IPV6,
> > +                       .ip_proto = IPPROTO_TCP,
> > +                       .n_proto = __bpf_constant_htons(ETH_P_IPV6),
> > +                       .is_frag = true,
> > +                       .is_first_frag = true,
> > +               },
> > +       },
> >  };
> >
> >  static int create_tap(const char *ifname)
> > @@ -225,6 +339,13 @@ void test_flow_dissector(void)
> >                         .data_size_in = sizeof(tests[i].pkt),
> >                         .data_out = &flow_keys,
> >                 };
> > +               static struct bpf_flow_keys ctx = {};
> > +
> > +               if (tests[i].flags) {
> > +                       tattr.ctx_in = &ctx;
> > +                       tattr.ctx_size_in = sizeof(ctx);
> > +                       ctx.flags = tests[i].flags;
> > +               }
> >
> >                 err = bpf_prog_test_run_xattr(&tattr);
> >                 CHECK_ATTR(tattr.data_size_out != sizeof(flow_keys) ||
> > @@ -255,6 +376,14 @@ void test_flow_dissector(void)
> >                 struct bpf_prog_test_run_attr tattr = {};
> >                 __u32 key = 0;
> >
> > +               /* Don't run tests that are not marked as
> > +                * FLOW_DISSECTOR_F_PARSE_1ST_FRAG; eth_get_headlen
> > +                * sets this flag.
> > +                */
> > +
> > +               if (tests[i].flags != FLOW_DISSECTOR_F_PARSE_1ST_FRAG)
> > +                       continue;
> 
> Maybe test flags & FLOW_DISSECTOR_F_PARSE_1ST_FRAG == 0 instead?
> It is not necessary now, but might be useful in the future.
I'm not sure about this one. We want flags here to match flags
from eth_get_headlen:

	const unsigned int flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG;
	...
	if (!skb_flow_dissect_flow_keys_basic(..., flags))

Otherwise the test might break unexpectedly. So I'd rather manually
adjust a test here if eth_get_headlen flags change.

Maybe I should clarify the comment to signify that dependency? Because
currently it might be read as if we only care about
FLOW_DISSECTOR_F_PARSE_1ST_FRAG, but we really care about all flags
in eth_get_headlen; it just happens that it only has one right now.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 5/7] sefltests/bpf: support FLOW_DISSECTOR_F_PARSE_1ST_FRAG
  2019-07-24 23:52     ` Stanislav Fomichev
@ 2019-07-25  5:36       ` Song Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Song Liu @ 2019-07-25  5:36 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Stanislav Fomichev, Networking, bpf, David S . Miller,
	Alexei Starovoitov, Daniel Borkmann, Willem de Bruijn,
	Petar Penkov

On Wed, Jul 24, 2019 at 4:52 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 07/24, Song Liu wrote:
> > On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > bpf_flow.c: exit early unless FLOW_DISSECTOR_F_PARSE_1ST_FRAG is passed
> > > in flags. Also, set ip_proto earlier, this makes sure we have correct
> > > value with fragmented packets.
> > >
> > > Add selftest cases to test ipv4/ipv6 fragments and skip eth_get_headlen
> > > tests that don't have FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag.
> > >
> > > eth_get_headlen calls flow dissector with
> > > FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag so we can't run tests that
> > > have different set of input flags against it.
> > >
> > > Cc: Willem de Bruijn <willemb@google.com>
> > > Cc: Petar Penkov <ppenkov@google.com>
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  .../selftests/bpf/prog_tests/flow_dissector.c | 129 ++++++++++++++++++
> > >  tools/testing/selftests/bpf/progs/bpf_flow.c  |  28 +++-
> > >  2 files changed, 151 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> > > index c938283ac232..966cb3b06870 100644
> > > --- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> > > +++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> > > @@ -5,6 +5,10 @@
> > >  #include <linux/if_tun.h>
> > >  #include <sys/uio.h>
> > >
> > > +#ifndef IP_MF
> > > +#define IP_MF 0x2000
> > > +#endif
> > > +
> > >  #define CHECK_FLOW_KEYS(desc, got, expected)                           \
> > >         CHECK_ATTR(memcmp(&got, &expected, sizeof(got)) != 0,           \
> > >               desc,                                                     \
> > > @@ -49,6 +53,18 @@ struct ipv6_pkt {
> > >         struct tcphdr tcp;
> > >  } __packed;
> > >
> > > +struct ipv6_frag_pkt {
> > > +       struct ethhdr eth;
> > > +       struct ipv6hdr iph;
> > > +       struct frag_hdr {
> > > +               __u8 nexthdr;
> > > +               __u8 reserved;
> > > +               __be16 frag_off;
> > > +               __be32 identification;
> > > +       } ipf;
> > > +       struct tcphdr tcp;
> > > +} __packed;
> > > +
> > >  struct dvlan_ipv6_pkt {
> > >         struct ethhdr eth;
> > >         __u16 vlan_tci;
> > > @@ -65,9 +81,11 @@ struct test {
> > >                 struct ipv4_pkt ipv4;
> > >                 struct svlan_ipv4_pkt svlan_ipv4;
> > >                 struct ipv6_pkt ipv6;
> > > +               struct ipv6_frag_pkt ipv6_frag;
> > >                 struct dvlan_ipv6_pkt dvlan_ipv6;
> > >         } pkt;
> > >         struct bpf_flow_keys keys;
> > > +       __u32 flags;
> > >  };
> > >
> > >  #define VLAN_HLEN      4
> > > @@ -143,6 +161,102 @@ struct test tests[] = {
> > >                         .n_proto = __bpf_constant_htons(ETH_P_IPV6),
> > >                 },
> > >         },
> > > +       {
> > > +               .name = "ipv4-frag",
> > > +               .pkt.ipv4 = {
> > > +                       .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> > > +                       .iph.ihl = 5,
> > > +                       .iph.protocol = IPPROTO_TCP,
> > > +                       .iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> > > +                       .iph.frag_off = __bpf_constant_htons(IP_MF),
> > > +                       .tcp.doff = 5,
> > > +                       .tcp.source = 80,
> > > +                       .tcp.dest = 8080,
> > > +               },
> > > +               .keys = {
> > > +                       .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> > > +                       .nhoff = ETH_HLEN,
> > > +                       .thoff = ETH_HLEN + sizeof(struct iphdr),
> > > +                       .addr_proto = ETH_P_IP,
> > > +                       .ip_proto = IPPROTO_TCP,
> > > +                       .n_proto = __bpf_constant_htons(ETH_P_IP),
> > > +                       .is_frag = true,
> > > +                       .is_first_frag = true,
> > > +                       .sport = 80,
> > > +                       .dport = 8080,
> > > +               },
> > > +               .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> > > +       },
> > > +       {
> > > +               .name = "ipv4-no-frag",
> > > +               .pkt.ipv4 = {
> > > +                       .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> > > +                       .iph.ihl = 5,
> > > +                       .iph.protocol = IPPROTO_TCP,
> > > +                       .iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> > > +                       .iph.frag_off = __bpf_constant_htons(IP_MF),
> > > +                       .tcp.doff = 5,
> > > +                       .tcp.source = 80,
> > > +                       .tcp.dest = 8080,
> > > +               },
> > > +               .keys = {
> > > +                       .nhoff = ETH_HLEN,
> > > +                       .thoff = ETH_HLEN + sizeof(struct iphdr),
> > > +                       .addr_proto = ETH_P_IP,
> > > +                       .ip_proto = IPPROTO_TCP,
> > > +                       .n_proto = __bpf_constant_htons(ETH_P_IP),
> > > +                       .is_frag = true,
> > > +                       .is_first_frag = true,
> > > +               },
> > > +       },
> > > +       {
> > > +               .name = "ipv6-frag",
> > > +               .pkt.ipv6_frag = {
> > > +                       .eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
> > > +                       .iph.nexthdr = IPPROTO_FRAGMENT,
> > > +                       .iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
> > > +                       .ipf.nexthdr = IPPROTO_TCP,
> > > +                       .tcp.doff = 5,
> > > +                       .tcp.source = 80,
> > > +                       .tcp.dest = 8080,
> > > +               },
> > > +               .keys = {
> > > +                       .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> > > +                       .nhoff = ETH_HLEN,
> > > +                       .thoff = ETH_HLEN + sizeof(struct ipv6hdr) +
> > > +                               sizeof(struct frag_hdr),
> > > +                       .addr_proto = ETH_P_IPV6,
> > > +                       .ip_proto = IPPROTO_TCP,
> > > +                       .n_proto = __bpf_constant_htons(ETH_P_IPV6),
> > > +                       .is_frag = true,
> > > +                       .is_first_frag = true,
> > > +                       .sport = 80,
> > > +                       .dport = 8080,
> > > +               },
> > > +               .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> > > +       },
> > > +       {
> > > +               .name = "ipv6-no-frag",
> > > +               .pkt.ipv6_frag = {
> > > +                       .eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
> > > +                       .iph.nexthdr = IPPROTO_FRAGMENT,
> > > +                       .iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
> > > +                       .ipf.nexthdr = IPPROTO_TCP,
> > > +                       .tcp.doff = 5,
> > > +                       .tcp.source = 80,
> > > +                       .tcp.dest = 8080,
> > > +               },
> > > +               .keys = {
> > > +                       .nhoff = ETH_HLEN,
> > > +                       .thoff = ETH_HLEN + sizeof(struct ipv6hdr) +
> > > +                               sizeof(struct frag_hdr),
> > > +                       .addr_proto = ETH_P_IPV6,
> > > +                       .ip_proto = IPPROTO_TCP,
> > > +                       .n_proto = __bpf_constant_htons(ETH_P_IPV6),
> > > +                       .is_frag = true,
> > > +                       .is_first_frag = true,
> > > +               },
> > > +       },
> > >  };
> > >
> > >  static int create_tap(const char *ifname)
> > > @@ -225,6 +339,13 @@ void test_flow_dissector(void)
> > >                         .data_size_in = sizeof(tests[i].pkt),
> > >                         .data_out = &flow_keys,
> > >                 };
> > > +               static struct bpf_flow_keys ctx = {};
> > > +
> > > +               if (tests[i].flags) {
> > > +                       tattr.ctx_in = &ctx;
> > > +                       tattr.ctx_size_in = sizeof(ctx);
> > > +                       ctx.flags = tests[i].flags;
> > > +               }
> > >
> > >                 err = bpf_prog_test_run_xattr(&tattr);
> > >                 CHECK_ATTR(tattr.data_size_out != sizeof(flow_keys) ||
> > > @@ -255,6 +376,14 @@ void test_flow_dissector(void)
> > >                 struct bpf_prog_test_run_attr tattr = {};
> > >                 __u32 key = 0;
> > >
> > > +               /* Don't run tests that are not marked as
> > > +                * FLOW_DISSECTOR_F_PARSE_1ST_FRAG; eth_get_headlen
> > > +                * sets this flag.
> > > +                */
> > > +
> > > +               if (tests[i].flags != FLOW_DISSECTOR_F_PARSE_1ST_FRAG)
> > > +                       continue;
> >
> > Maybe test flags & FLOW_DISSECTOR_F_PARSE_1ST_FRAG == 0 instead?
> > It is not necessary now, but might be useful in the future.
> I'm not sure about this one. We want flags here to match flags
> from eth_get_headlen:
>
>         const unsigned int flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG;
>         ...
>         if (!skb_flow_dissect_flow_keys_basic(..., flags))
>
> Otherwise the test might break unexpectedly. So I'd rather manually
> adjust a test here if eth_get_headlen flags change.

Could we have

  flags ==  FLOW_DISSECTOR_F_PARSE_1ST_FRAG | some_other_flag

in the future? This flag is not equal to FLOW_DISSECTOR_F_PARSE_1ST_FRAG.

>
> Maybe I should clarify the comment to signify that dependency? Because
> currently it might be read as if we only care about
> FLOW_DISSECTOR_F_PARSE_1ST_FRAG, but we really care about all flags
> in eth_get_headlen; it just happens that it only has one right now.

Some clarification will be great.

Thanks,
Song

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-07-25  5:36 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-24 17:00 [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags Stanislav Fomichev
2019-07-24 17:00 ` [PATCH bpf-next 1/7] bpf/flow_dissector: pass input flags to BPF flow dissector program Stanislav Fomichev
2019-07-24 22:21   ` Song Liu
2019-07-24 22:32     ` Stanislav Fomichev
2019-07-24 22:45       ` Song Liu
2019-07-24 17:00 ` [PATCH bpf-next 2/7] bpf/flow_dissector: document flags Stanislav Fomichev
2019-07-24 22:24   ` Song Liu
2019-07-24 17:00 ` [PATCH bpf-next 3/7] bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN Stanislav Fomichev
2019-07-24 22:33   ` Song Liu
2019-07-24 17:00 ` [PATCH bpf-next 4/7] tools/bpf: sync bpf_flow_keys flags Stanislav Fomichev
2019-07-24 23:17   ` Song Liu
2019-07-24 17:00 ` [PATCH bpf-next 5/7] sefltests/bpf: support FLOW_DISSECTOR_F_PARSE_1ST_FRAG Stanislav Fomichev
2019-07-24 23:21   ` Song Liu
2019-07-24 23:52     ` Stanislav Fomichev
2019-07-25  5:36       ` Song Liu
2019-07-24 17:00 ` [PATCH bpf-next 6/7] bpf/flow_dissector: support ipv6 flow_label and FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL Stanislav Fomichev
2019-07-24 23:28   ` Song Liu
2019-07-24 17:00 ` [PATCH bpf-next 7/7] selftests/bpf: support FLOW_DISSECTOR_F_STOP_AT_ENCAP Stanislav Fomichev
2019-07-24 23:29   ` Song Liu
2019-07-24 22:10 ` [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags Willem de Bruijn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.