* [RFC bpf-next v2 0/9] net: flow_dissector: trigger BPF hook when called from eth_get_headlen
@ 2019-03-19 22:19 Stanislav Fomichev
  2019-03-19 22:19 ` [RFC bpf-next v2 1/9] net: introduce __init_skb{,_data,_shinfo} helpers Stanislav Fomichev
                   ` (8 more replies)
  0 siblings, 9 replies; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-19 22:19 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, simon.horman, willemb, peterpenkov96,
	Stanislav Fomichev

Currently, when eth_get_headlen calls the flow dissector, it doesn't pass any
skb. Because we use the passed skb to look up the associated network
namespace and determine whether a BPF program is attached, we always fall
back to the C-based flow dissector in this case.

The goal of this patch series is to add a network namespace argument to
eth_get_headlen and to make BPF flow dissector programs work in the
skb-less case.
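
To illustrate the intended end state (a hedged sketch based on patches 3
and 9, not the final API; RX_HDR_SIZE stands in for a driver's header
size constant):

	/* before: no skb, so no way to find an attached BPF program */
	pull_len = eth_get_headlen(va, RX_HDR_SIZE);

	/* after: the namespace comes from the device, so the BPF flow
	 * dissector can run even though there is no skb yet
	 */
	pull_len = eth_get_headlen(dev_net(netdev), va, RX_HDR_SIZE);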

The series goes like this:
1. introduce a bunch of skb initialization helpers; those will be used to
   initialize temporary skb
2. introduce skb_net which can be used to get networking namespace
   associated with an skb
3. add new optional network namespace argument to __skb_flow_dissect and
   plumb through the callers
4. prepare for new bpf flow dissector flavor, create some common routines
   and do some renames
5. restrict access to a limited set of __sk_buff fields from flow
   dissector programs
6. add new bpf_flow_dissect which uses temporary per-cpu skb and calls
   BPF flow dissector program
7. convert flow dissector BPF_PROG_TEST_RUN to skb-less mode to show that
   it works
8. add selftest that makes sure going over the packet bounds in
   bpf_skb_load_bytes with shinfo-less skb doesn't cause any problems
9. add new net namespace argument to eth_get_headlen and convert the
   callers

v2:
* moved the temporary skb from the stack into per-cpu storage (avoids a
  memset of ~200 bytes per packet)
* restricted access to __sk_buff fields from flow dissector programs to
  avoid touching shinfo (whitelist only the relevant fields)
* addressed suggestions from Willem

Stanislav Fomichev (9):
  net: introduce __init_skb{,_data,_shinfo} helpers
  net: introduce skb_net helper
  net: plumb network namespace into __skb_flow_dissect
  net: flow_dissector: prepare for no-skb use case
  flow_dissector: allow access only to a subset of __sk_buff fields
  net: flow_dissector: handle no-skb use case
  bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode
  selftests/bpf: add flow dissector bpf_skb_load_bytes helper test
  net: flow_dissector: pass net argument to the eth_get_headlen

 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   2 +-
 drivers/net/ethernet/hisilicon/hns/hns_enet.c |   3 +-
 .../net/ethernet/hisilicon/hns3/hns3_enet.c   |   3 +-
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |   2 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   3 +-
 drivers/net/ethernet/intel/iavf/iavf_txrx.c   |   2 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c     |   2 +-
 drivers/net/ethernet/intel/igb/igb_main.c     |   2 +-
 drivers/net/ethernet/intel/igc/igc_main.c     |   2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   2 +-
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |   3 +-
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   |   3 +-
 drivers/net/tun.c                             |   3 +-
 include/linux/etherdevice.h                   |   2 +-
 include/linux/skbuff.h                        |  48 +++++--
 net/bpf/test_run.c                            |  48 ++-----
 net/core/filter.c                             |  13 +-
 net/core/flow_dissector.c                     | 135 +++++++++++++-----
 net/core/skbuff.c                             |  76 +++++-----
 net/ethernet/eth.c                            |   8 +-
 .../prog_tests/flow_dissector_load_bytes.c    |  50 +++++++
 21 files changed, 269 insertions(+), 143 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/flow_dissector_load_bytes.c

-- 
2.21.0.225.g810b269d1ac-goog


* [RFC bpf-next v2 1/9] net: introduce __init_skb{,_data,_shinfo} helpers
  2019-03-19 22:19 [RFC bpf-next v2 0/9] net: flow_dissector: trigger BPF hook when called from eth_get_headlen Stanislav Fomichev
@ 2019-03-19 22:19 ` Stanislav Fomichev
  2019-03-21  3:39   ` Alexei Starovoitov
  2019-03-19 22:19 ` [RFC bpf-next v2 2/9] net: introduce skb_net helper Stanislav Fomichev
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-19 22:19 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, simon.horman, willemb, peterpenkov96,
	Stanislav Fomichev

__init_skb is essentially a version of __build_skb that accepts an skb
as an argument (instead of allocating one with kmem_cache_alloc).

__init_skb_shinfo initializes shinfo.

__init_skb_data initializes skb data pointers.

No functional changes.
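
For illustration only (init_caller_skb is a made-up name; the real users
are __alloc_skb and __build_skb in the diff below), the helpers compose
like this:

	static struct sk_buff *init_caller_skb(struct sk_buff *skb,
					       u8 *data, unsigned int size)
	{
		__init_skb(skb, data, size);	/* zero skb, then set
						 * head/data/tail/end */
		__init_skb_shinfo(skb);		/* zero shinfo, dataref = 1 */
		return skb;
	}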

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/skbuff.h | 14 ++++++++++
 net/core/skbuff.c      | 60 +++++++++++++++++-------------------------
 2 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9027a8c4219f..e8c1d5b97f96 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4399,5 +4399,19 @@ static inline __wsum lco_csum(struct sk_buff *skb)
 	return csum_partial(l4_hdr, csum_start - l4_hdr, partial);
 }
 
+static inline void __init_skb_data(struct sk_buff *skb, u8 *data,
+				   unsigned int size)
+{
+	/* Account for allocated memory : skb + skb->head */
+	skb->truesize = SKB_TRUESIZE(size);
+	refcount_set(&skb->users, 1);
+	skb->head = data;
+	skb->data = data;
+	skb_reset_tail_pointer(skb);
+	skb->end = skb->tail + size;
+	skb->mac_header = (typeof(skb->mac_header))~0U;
+	skb->transport_header = (typeof(skb->transport_header))~0U;
+}
+
 #endif	/* __KERNEL__ */
 #endif	/* _LINUX_SKBUFF_H */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2415d9cb9b89..b413354ee709 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -160,6 +160,26 @@ static void *__kmalloc_reserve(size_t size, gfp_t flags, int node,
  *
  */
 
+static inline void __init_skb(struct sk_buff *skb, u8 *data, unsigned int size)
+{
+	/* Only clear those fields we need to clear, not those that we will
+	 * actually initialize below. Hence, don't put any more fields after
+	 * the tail pointer in struct sk_buff!
+	 */
+	memset(skb, 0, offsetof(struct sk_buff, tail));
+	__init_skb_data(skb, data, size);
+}
+
+static inline void __init_skb_shinfo(struct sk_buff *skb)
+{
+	struct skb_shared_info *shinfo;
+
+	/* make sure we initialize shinfo sequentially */
+	shinfo = skb_shinfo(skb);
+	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
+	atomic_set(&shinfo->dataref, 1);
+}
+
 /**
  *	__alloc_skb	-	allocate a network buffer
  *	@size: size to allocate
@@ -181,7 +201,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 			    int flags, int node)
 {
 	struct kmem_cache *cache;
-	struct skb_shared_info *shinfo;
 	struct sk_buff *skb;
 	u8 *data;
 	bool pfmemalloc;
@@ -215,27 +234,9 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	size = SKB_WITH_OVERHEAD(ksize(data));
 	prefetchw(data + size);
 
-	/*
-	 * Only clear those fields we need to clear, not those that we will
-	 * actually initialise below. Hence, don't put any more fields after
-	 * the tail pointer in struct sk_buff!
-	 */
-	memset(skb, 0, offsetof(struct sk_buff, tail));
-	/* Account for allocated memory : skb + skb->head */
-	skb->truesize = SKB_TRUESIZE(size);
+	__init_skb(skb, data, size);
+	__init_skb_shinfo(skb);
 	skb->pfmemalloc = pfmemalloc;
-	refcount_set(&skb->users, 1);
-	skb->head = data;
-	skb->data = data;
-	skb_reset_tail_pointer(skb);
-	skb->end = skb->tail + size;
-	skb->mac_header = (typeof(skb->mac_header))~0U;
-	skb->transport_header = (typeof(skb->transport_header))~0U;
-
-	/* make sure we initialize shinfo sequentially */
-	shinfo = skb_shinfo(skb);
-	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
-	atomic_set(&shinfo->dataref, 1);
 
 	if (flags & SKB_ALLOC_FCLONE) {
 		struct sk_buff_fclones *fclones;
@@ -277,7 +278,6 @@ EXPORT_SYMBOL(__alloc_skb);
  */
 struct sk_buff *__build_skb(void *data, unsigned int frag_size)
 {
-	struct skb_shared_info *shinfo;
 	struct sk_buff *skb;
 	unsigned int size = frag_size ? : ksize(data);
 
@@ -287,20 +287,8 @@ struct sk_buff *__build_skb(void *data, unsigned int frag_size)
 
 	size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
-	memset(skb, 0, offsetof(struct sk_buff, tail));
-	skb->truesize = SKB_TRUESIZE(size);
-	refcount_set(&skb->users, 1);
-	skb->head = data;
-	skb->data = data;
-	skb_reset_tail_pointer(skb);
-	skb->end = skb->tail + size;
-	skb->mac_header = (typeof(skb->mac_header))~0U;
-	skb->transport_header = (typeof(skb->transport_header))~0U;
-
-	/* make sure we initialize shinfo sequentially */
-	shinfo = skb_shinfo(skb);
-	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
-	atomic_set(&shinfo->dataref, 1);
+	__init_skb(skb, data, size);
+	__init_skb_shinfo(skb);
 
 	return skb;
 }
-- 
2.21.0.225.g810b269d1ac-goog



* [RFC bpf-next v2 2/9] net: introduce skb_net helper
  2019-03-19 22:19 [RFC bpf-next v2 0/9] net: flow_dissector: trigger BPF hook when called from eth_get_headlen Stanislav Fomichev
  2019-03-19 22:19 ` [RFC bpf-next v2 1/9] net: introduce __init_skb{,_data,_shinfo} helpers Stanislav Fomichev
@ 2019-03-19 22:19 ` Stanislav Fomichev
  2019-03-20  2:14   ` Willem de Bruijn
  2019-03-19 22:19 ` [RFC bpf-next v2 3/9] net: plumb network namespace into __skb_flow_dissect Stanislav Fomichev
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-19 22:19 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, simon.horman, willemb, peterpenkov96,
	Stanislav Fomichev

skb_net returns network namespace from the associated device or socket.

This will be used in the next commit.

I tried to inline it in skbuff.h, but I don't think it's feasible.
It depends on 'net/sock.h' for sock_net() and 'linux/netdevice.h' for
dev_net(), both of which we don't include from 'linux/skbuff.h' (but
both sock.h and netdevice.h include skbuff.h). I thought about doing
it as a macro/putting it somewhere else, but we will have
skb_flow_dissect{_xyz} use it in the next commits.
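
A minimal sketch of the lookup the next commit performs with it
(flow_dissector_prog is the existing per-netns program pointer):

	struct bpf_prog *attached = NULL;
	struct net *net = skb_net(skb);

	rcu_read_lock();
	if (net)
		attached = rcu_dereference(net->flow_dissector_prog);
	/* ... run the program ... */
	rcu_read_unlock();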

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/skbuff.h |  2 ++
 net/core/skbuff.c      | 16 ++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e8c1d5b97f96..75e1d4d73cca 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1275,6 +1275,8 @@ static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
 }
 #endif
 
+struct net *skb_net(const struct sk_buff *skb);
+
 struct bpf_flow_keys;
 bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
 			    const struct sk_buff *skb,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b413354ee709..d81f3a95fb4e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5725,3 +5725,19 @@ void __skb_ext_put(struct skb_ext *ext)
 }
 EXPORT_SYMBOL(__skb_ext_put);
 #endif /* CONFIG_SKB_EXTENSIONS */
+
+/**
+ * skb_net - Return network namespace associated with skb.
+ * @skb: skb
+ *
+ * Returns pointer to struct net or NULL.
+ */
+struct net *skb_net(const struct sk_buff *skb)
+{
+	if (skb->dev)
+		return dev_net(skb->dev);
+	else if (skb->sk)
+		return sock_net(skb->sk);
+	return NULL;
+}
+EXPORT_SYMBOL(skb_net);
-- 
2.21.0.225.g810b269d1ac-goog



* [RFC bpf-next v2 3/9] net: plumb network namespace into __skb_flow_dissect
  2019-03-19 22:19 [RFC bpf-next v2 0/9] net: flow_dissector: trigger BPF hook when called from eth_get_headlen Stanislav Fomichev
  2019-03-19 22:19 ` [RFC bpf-next v2 1/9] net: introduce __init_skb{,_data,_shinfo} helpers Stanislav Fomichev
  2019-03-19 22:19 ` [RFC bpf-next v2 2/9] net: introduce skb_net helper Stanislav Fomichev
@ 2019-03-19 22:19 ` Stanislav Fomichev
  2019-03-19 22:19 ` [RFC bpf-next v2 4/9] net: flow_dissector: prepare for no-skb use case Stanislav Fomichev
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-19 22:19 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, simon.horman, willemb, peterpenkov96,
	Stanislav Fomichev

This new argument will be used in the next patches for the
eth_get_headlen use case. eth_get_headlen calls the flow dissector
with only data (without an skb), so there is currently no way to
look up an attached BPF flow dissector program. With this new
argument, we can amend the callers to explicitly pass the network
namespace and use the attached BPF program.
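
After the plumbing, callers that have an skb derive the namespace from
it, while skb-less callers pass it explicitly; condensed from the diff
below and from patch 9:

	/* skb path: namespace recovered from the skb */
	skb_flow_dissect_flow_keys_basic(skb_net(skb), skb, &keys,
					 NULL, 0, 0, 0, 0);

	/* skb-less path (eth_get_headlen): explicit net, no skb */
	skb_flow_dissect_flow_keys_basic(net, NULL, &keys, data,
					 eth->h_proto, sizeof(*eth),
					 len, flags);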

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/skbuff.h    | 19 +++++++++++--------
 net/core/flow_dissector.c | 22 +++++++++++-----------
 net/ethernet/eth.c        |  5 +++--
 3 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 75e1d4d73cca..4ca4c60cbacb 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1282,7 +1282,8 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
 			    const struct sk_buff *skb,
 			    struct flow_dissector *flow_dissector,
 			    struct bpf_flow_keys *flow_keys);
-bool __skb_flow_dissect(const struct sk_buff *skb,
+bool __skb_flow_dissect(struct net *net,
+			const struct sk_buff *skb,
 			struct flow_dissector *flow_dissector,
 			void *target_container,
 			void *data, __be16 proto, int nhoff, int hlen,
@@ -1292,8 +1293,8 @@ static inline bool skb_flow_dissect(const struct sk_buff *skb,
 				    struct flow_dissector *flow_dissector,
 				    void *target_container, unsigned int flags)
 {
-	return __skb_flow_dissect(skb, flow_dissector, target_container,
-				  NULL, 0, 0, 0, flags);
+	return __skb_flow_dissect(skb_net(skb), skb, flow_dissector,
+				  target_container, NULL, 0, 0, 0, flags);
 }
 
 static inline bool skb_flow_dissect_flow_keys(const struct sk_buff *skb,
@@ -1301,18 +1302,19 @@ static inline bool skb_flow_dissect_flow_keys(const struct sk_buff *skb,
 					      unsigned int flags)
 {
 	memset(flow, 0, sizeof(*flow));
-	return __skb_flow_dissect(skb, &flow_keys_dissector, flow,
-				  NULL, 0, 0, 0, flags);
+	return __skb_flow_dissect(skb_net(skb), skb, &flow_keys_dissector,
+				  flow, NULL, 0, 0, 0, flags);
 }
 
 static inline bool
-skb_flow_dissect_flow_keys_basic(const struct sk_buff *skb,
+skb_flow_dissect_flow_keys_basic(struct net *net,
+				 const struct sk_buff *skb,
 				 struct flow_keys_basic *flow, void *data,
 				 __be16 proto, int nhoff, int hlen,
 				 unsigned int flags)
 {
 	memset(flow, 0, sizeof(*flow));
-	return __skb_flow_dissect(skb, &flow_keys_basic_dissector, flow,
+	return __skb_flow_dissect(net, skb, &flow_keys_basic_dissector, flow,
 				  data, proto, nhoff, hlen, flags);
 }
 
@@ -2492,7 +2494,8 @@ static inline void skb_probe_transport_header(struct sk_buff *skb)
 	if (skb_transport_header_was_set(skb))
 		return;
 
-	if (skb_flow_dissect_flow_keys_basic(skb, &keys, NULL, 0, 0, 0, 0))
+	if (skb_flow_dissect_flow_keys_basic(skb_net(skb), skb, &keys,
+					     NULL, 0, 0, 0, 0))
 		skb_set_transport_header(skb, keys.control.thoff);
 }
 
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index bb1a54747d64..e13165e7528c 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -725,6 +725,7 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
 
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
+ * @net: associated network namespace
  * @skb: sk_buff to extract the flow from, can be NULL if the rest are specified
  * @flow_dissector: list of keys to dissect
  * @target_container: target structure to put dissected values into
@@ -739,7 +740,8 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
  *
  * Caller must take care of zeroing target container memory.
  */
-bool __skb_flow_dissect(const struct sk_buff *skb,
+bool __skb_flow_dissect(struct net *net,
+			const struct sk_buff *skb,
 			struct flow_dissector *flow_dissector,
 			void *target_container,
 			void *data, __be16 proto, int nhoff, int hlen,
@@ -797,14 +799,11 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 		struct bpf_flow_keys flow_keys;
 		struct bpf_prog *attached = NULL;
 
-		rcu_read_lock();
+		WARN_ON_ONCE(!net);
 
-		if (skb->dev)
-			attached = rcu_dereference(dev_net(skb->dev)->flow_dissector_prog);
-		else if (skb->sk)
-			attached = rcu_dereference(sock_net(skb->sk)->flow_dissector_prog);
-		else
-			WARN_ON_ONCE(1);
+		rcu_read_lock();
+		if (net)
+			attached = rcu_dereference(net->flow_dissector_prog);
 
 		if (attached) {
 			ret = __skb_flow_bpf_dissect(attached, skb,
@@ -1406,8 +1405,8 @@ u32 __skb_get_hash_symmetric(const struct sk_buff *skb)
 	__flow_hash_secret_init();
 
 	memset(&keys, 0, sizeof(keys));
-	__skb_flow_dissect(skb, &flow_keys_dissector_symmetric, &keys,
-			   NULL, 0, 0, 0,
+	__skb_flow_dissect(skb_net(skb), skb, &flow_keys_dissector_symmetric,
+			   &keys, NULL, 0, 0, 0,
 			   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);
 
 	return __flow_hash_from_keys(&keys, hashrnd);
@@ -1508,7 +1507,8 @@ u32 skb_get_poff(const struct sk_buff *skb)
 {
 	struct flow_keys_basic keys;
 
-	if (!skb_flow_dissect_flow_keys_basic(skb, &keys, NULL, 0, 0, 0, 0))
+	if (!skb_flow_dissect_flow_keys_basic(skb_net(skb), skb, &keys,
+					      NULL, 0, 0, 0, 0))
 		return 0;
 
 	return __skb_get_poff(skb, skb->data, &keys, skb_headlen(skb));
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index f7a3d7a171c7..1e439549c419 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -136,8 +136,9 @@ u32 eth_get_headlen(void *data, unsigned int len)
 		return len;
 
 	/* parse any remaining L2/L3 headers, check for L4 */
-	if (!skb_flow_dissect_flow_keys_basic(NULL, &keys, data, eth->h_proto,
-					      sizeof(*eth), len, flags))
+	if (!skb_flow_dissect_flow_keys_basic(NULL, NULL, &keys, data,
+					      eth->h_proto, sizeof(*eth),
+					      len, flags))
 		return max_t(u32, keys.control.thoff, sizeof(*eth));
 
 	/* parse for any L4 headers */
-- 
2.21.0.225.g810b269d1ac-goog



* [RFC bpf-next v2 4/9] net: flow_dissector: prepare for no-skb use case
  2019-03-19 22:19 [RFC bpf-next v2 0/9] net: flow_dissector: trigger BPF hook when called from eth_get_headlen Stanislav Fomichev
                   ` (2 preceding siblings ...)
  2019-03-19 22:19 ` [RFC bpf-next v2 3/9] net: plumb network namespace into __skb_flow_dissect Stanislav Fomichev
@ 2019-03-19 22:19 ` Stanislav Fomichev
  2019-03-19 22:19 ` [RFC bpf-next v2 5/9] flow_dissector: allow access only to a subset of __sk_buff fields Stanislav Fomichev
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-19 22:19 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, simon.horman, willemb, peterpenkov96,
	Stanislav Fomichev

Create helpers that will be reused by both versions:
* init_flow_keys - initialize flow keys
* clamp_flow_keys - clamp flow keys when done (shared pattern sketched below)

Rename (to have a consistent bpf_flow_ prefix):
* __skb_flow_bpf_dissect to bpf_flow_dissect_skb
* __skb_flow_bpf_to_target to bpf_flow_keys_to_target
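
Condensed, the call pattern both flavors will share looks like this
(assembled from the diff below):

	init_flow_keys(flow_keys, skb, nhoff);	/* zero cb and flow_keys,
						 * seed nhoff/thoff */
	result = BPF_PROG_RUN(prog, skb);
	clamp_flow_keys(flow_keys, hlen);	/* 0 <= nhoff <= thoff <= hlen */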

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/skbuff.h    |  8 +++---
 net/bpf/test_run.c        |  6 ++---
 net/core/flow_dissector.c | 56 +++++++++++++++++++++++++--------------
 3 files changed, 43 insertions(+), 27 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ca4c60cbacb..194dbc2985e5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1278,10 +1278,10 @@ static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
 struct net *skb_net(const struct sk_buff *skb);
 
 struct bpf_flow_keys;
-bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
-			    const struct sk_buff *skb,
-			    struct flow_dissector *flow_dissector,
-			    struct bpf_flow_keys *flow_keys);
+bool bpf_flow_dissect_skb(struct bpf_prog *prog,
+			  const struct sk_buff *skb,
+			  struct flow_dissector *flow_dissector,
+			  struct bpf_flow_keys *flow_keys);
 bool __skb_flow_dissect(struct net *net,
 			const struct sk_buff *skb,
 			struct flow_dissector *flow_dissector,
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index fab142b796ef..512773a95ad5 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -300,9 +300,9 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	preempt_disable();
 	time_start = ktime_get_ns();
 	for (i = 0; i < repeat; i++) {
-		retval = __skb_flow_bpf_dissect(prog, skb,
-						&flow_keys_dissector,
-						&flow_keys);
+		retval = bpf_flow_dissect_skb(prog, skb,
+					      &flow_keys_dissector,
+					      &flow_keys);
 
 		if (signal_pending(current)) {
 			preempt_enable();
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index e13165e7528c..ab43f9bd7ec4 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -629,9 +629,9 @@ static bool skb_flow_dissect_allowed(int *num_hdrs)
 	return (*num_hdrs <= MAX_FLOW_DISSECT_HDRS);
 }
 
-static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys,
-				     struct flow_dissector *flow_dissector,
-				     void *target_container)
+static void bpf_flow_keys_to_target(const struct bpf_flow_keys *flow_keys,
+				    struct flow_dissector *flow_dissector,
+				    void *target_container)
 {
 	struct flow_dissector_key_control *key_control;
 	struct flow_dissector_key_basic *key_basic;
@@ -683,10 +683,32 @@ static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys,
 	}
 }
 
-bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
-			    const struct sk_buff *skb,
-			    struct flow_dissector *flow_dissector,
-			    struct bpf_flow_keys *flow_keys)
+static inline void init_flow_keys(struct bpf_flow_keys *flow_keys,
+				  const struct sk_buff *skb, int nhoff)
+{
+	struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
+
+	memset(cb, 0, sizeof(*cb));
+	memset(flow_keys, 0, sizeof(*flow_keys));
+
+	flow_keys->nhoff = nhoff;
+	flow_keys->thoff = nhoff;
+
+	cb->qdisc_cb.flow_keys = flow_keys;
+}
+
+static inline void clamp_flow_keys(struct bpf_flow_keys *flow_keys,
+				   int hlen)
+{
+	flow_keys->nhoff = clamp_t(u16, flow_keys->nhoff, 0, hlen);
+	flow_keys->thoff = clamp_t(u16, flow_keys->thoff,
+				   flow_keys->nhoff, hlen);
+}
+
+bool bpf_flow_dissect_skb(struct bpf_prog *prog,
+			  const struct sk_buff *skb,
+			  struct flow_dissector *flow_dissector,
+			  struct bpf_flow_keys *flow_keys)
 {
 	struct bpf_skb_data_end cb_saved;
 	struct bpf_skb_data_end *cb;
@@ -702,13 +724,9 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
 
 	/* Save Control Block */
 	memcpy(&cb_saved, cb, sizeof(cb_saved));
-	memset(cb, 0, sizeof(*cb));
 
 	/* Pass parameters to the BPF program */
-	memset(flow_keys, 0, sizeof(*flow_keys));
-	cb->qdisc_cb.flow_keys = flow_keys;
-	flow_keys->nhoff = skb_network_offset(skb);
-	flow_keys->thoff = flow_keys->nhoff;
+	init_flow_keys(flow_keys, skb, skb_network_offset(skb));
 
 	bpf_compute_data_pointers((struct sk_buff *)skb);
 	result = BPF_PROG_RUN(prog, skb);
@@ -716,9 +734,7 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
 	/* Restore state */
 	memcpy(cb, &cb_saved, sizeof(cb_saved));
 
-	flow_keys->nhoff = clamp_t(u16, flow_keys->nhoff, 0, skb->len);
-	flow_keys->thoff = clamp_t(u16, flow_keys->thoff,
-				   flow_keys->nhoff, skb->len);
+	clamp_flow_keys(flow_keys, skb->len);
 
 	return result == BPF_OK;
 }
@@ -806,11 +822,11 @@ bool __skb_flow_dissect(struct net *net,
 			attached = rcu_dereference(net->flow_dissector_prog);
 
 		if (attached) {
-			ret = __skb_flow_bpf_dissect(attached, skb,
-						     flow_dissector,
-						     &flow_keys);
-			__skb_flow_bpf_to_target(&flow_keys, flow_dissector,
-						 target_container);
+			ret = bpf_flow_dissect_skb(attached, skb,
+						   flow_dissector,
+						   &flow_keys);
+			bpf_flow_keys_to_target(&flow_keys, flow_dissector,
+						target_container);
 			rcu_read_unlock();
 			return ret;
 		}
-- 
2.21.0.225.g810b269d1ac-goog



* [RFC bpf-next v2 5/9] flow_dissector: allow access only to a subset of __sk_buff fields
  2019-03-19 22:19 [RFC bpf-next v2 0/9] net: flow_dissector: trigger BPF hook when called from eth_get_headlen Stanislav Fomichev
                   ` (3 preceding siblings ...)
  2019-03-19 22:19 ` [RFC bpf-next v2 4/9] net: flow_dissector: prepare for no-skb use case Stanislav Fomichev
@ 2019-03-19 22:19 ` Stanislav Fomichev
  2019-03-19 22:19 ` [RFC bpf-next v2 6/9] net: flow_dissector: handle no-skb use case Stanislav Fomichev
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-19 22:19 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, simon.horman, willemb, peterpenkov96,
	Stanislav Fomichev

Use a whitelist instead of a blacklist and allow only a small set of
fields that might be relevant in the context of the flow dissector:
* len
* protocol
* vlan_present
* vlan_tci
* vlan_proto
* cb

This is required for the eth_get_headlen case, where we construct a
temporary skb that might not have full/consistent state, so we can't
let flow dissector programs access all the fields (most of which are
irrelevant for the flow dissector program type anyway).
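
As a sketch (the program below is illustrative, not part of this
series), a flow dissector program restricted to the whitelisted fields
still loads fine:

	#include <linux/bpf.h>
	#include "bpf_helpers.h"

	SEC("flow_dissector")
	int allowed_fields(struct __sk_buff *skb)
	{
		/* len, protocol, vlan_* and cb stay accessible */
		if (skb->len && skb->protocol)
			return BPF_OK;
		/* reading e.g. skb->tstamp or skb->data_meta here would
		 * now be rejected at load time for this program type
		 */
		return BPF_DROP;
	}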

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 net/core/filter.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 647c63a7b25b..5f413567ce8a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6632,11 +6632,14 @@ static bool flow_dissector_is_valid_access(int off, int size,
 	case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
 		info->reg_type = PTR_TO_FLOW_KEYS;
 		break;
-	case bpf_ctx_range(struct __sk_buff, tc_classid):
-	case bpf_ctx_range(struct __sk_buff, data_meta):
-	case bpf_ctx_range_till(struct __sk_buff, family, local_port):
-	case bpf_ctx_range(struct __sk_buff, tstamp):
-	case bpf_ctx_range(struct __sk_buff, wire_len):
+	case bpf_ctx_range(struct __sk_buff, len):
+	case bpf_ctx_range(struct __sk_buff, protocol):
+	case bpf_ctx_range(struct __sk_buff, vlan_present):
+	case bpf_ctx_range(struct __sk_buff, vlan_tci):
+	case bpf_ctx_range(struct __sk_buff, vlan_proto):
+	case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
+		break;
+	default:
 		return false;
 	}
 
-- 
2.21.0.225.g810b269d1ac-goog



* [RFC bpf-next v2 6/9] net: flow_dissector: handle no-skb use case
  2019-03-19 22:19 [RFC bpf-next v2 0/9] net: flow_dissector: trigger BPF hook when called from eth_get_headlen Stanislav Fomichev
                   ` (4 preceding siblings ...)
  2019-03-19 22:19 ` [RFC bpf-next v2 5/9] flow_dissector: allow access only to a subset of __sk_buff fields Stanislav Fomichev
@ 2019-03-19 22:19 ` Stanislav Fomichev
  2019-03-19 22:19 ` [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode Stanislav Fomichev
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-19 22:19 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, simon.horman, willemb, peterpenkov96,
	Stanislav Fomichev

When flow_dissector is called without an skb (with only data and hlen),
use a temporary per-cpu skb (which has a linear chunk of the data
passed to the flow dissector). This should let us handle the
eth_get_headlen case, where only the data is provided and we don't
(yet) want to allocate an skb.

Since this per-cpu skb doesn't allocate its own data, we can't
add shinfo and need to be careful to avoid any code paths that use it.
Flow dissector BPF programs can only call the bpf_skb_load_bytes
helper, which doesn't touch shinfo in our case (skb->len is the length
of the linear header, so it exits early).

bpf_flow_dissect can be called only by eth_get_headlen from softirq and task
contexts:
* NIC drivers call it from .ndo_start_xmit/.ndo_change_mtu/napi->poll (softirq)
* TUN driver calls it from fops->write_iter/proto_ops->sendmsg (task)

We protect the per-cpu skb with local_bh_disable/local_bh_enable and
WARN_ON_ONCE if called from hard interrupt context.

Note: WARN_ON_ONCE(!net) will now trigger for eth_get_headlen users.
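
For reference, the skb-less entry point is used like this (roughly what
BPF_PROG_TEST_RUN does in the next patch; variable names are
illustrative):

	struct bpf_flow_keys flow_keys;
	bool ok;

	/* data/hlen describe a linear buffer, no skb anywhere */
	ok = bpf_flow_dissect(prog, data, eth->h_proto, sizeof(*eth),
			      hlen, &flow_keys_dissector, &flow_keys);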

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/skbuff.h    |  5 +++
 net/core/flow_dissector.c | 69 +++++++++++++++++++++++++++++++--------
 2 files changed, 60 insertions(+), 14 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 194dbc2985e5..e8e0f650af0a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1282,6 +1282,11 @@ bool bpf_flow_dissect_skb(struct bpf_prog *prog,
 			  const struct sk_buff *skb,
 			  struct flow_dissector *flow_dissector,
 			  struct bpf_flow_keys *flow_keys);
+bool bpf_flow_dissect(struct bpf_prog *prog,
+		      void *data, __be16 proto,
+		      int nhoff, int hlen,
+		      struct flow_dissector *flow_dissector,
+		      struct bpf_flow_keys *flow_keys);
 bool __skb_flow_dissect(struct net *net,
 			const struct sk_buff *skb,
 			struct flow_dissector *flow_dissector,
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index ab43f9bd7ec4..accf1ed460ec 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -739,6 +739,43 @@ bool bpf_flow_dissect_skb(struct bpf_prog *prog,
 	return result == BPF_OK;
 }
 
+static DEFINE_PER_CPU(struct sk_buff, bpf_flow_skb);
+
+bool bpf_flow_dissect(struct bpf_prog *prog,
+		      void *data, __be16 proto,
+		      int nhoff, int hlen,
+		      struct flow_dissector *flow_dissector,
+		      struct bpf_flow_keys *flow_keys)
+{
+	struct bpf_skb_data_end *cb;
+	/* backed by the per-cpu bpf_flow_skb defined above */
+	struct sk_buff *skb;
+	u32 result;
+
+	if (WARN_ON_ONCE(in_irq()))
+		return false;
+
+	local_bh_disable();
+	skb = this_cpu_ptr(&bpf_flow_skb);
+	skb->len = 0;
+	__init_skb_data(skb, data, hlen);
+	__skb_put(skb, hlen);
+	skb->protocol = proto;
+
+	init_flow_keys(flow_keys, skb, nhoff);
+
+	cb = (struct bpf_skb_data_end *)skb->cb;
+	cb->data_meta = skb->data;
+	cb->data_end  = skb->data + skb_headlen(skb);
+
+	result = BPF_PROG_RUN(prog, skb);
+
+	clamp_flow_keys(flow_keys, hlen);
+	local_bh_enable();
+
+	return result == BPF_OK;
+}
+
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
  * @net: associated network namespace
@@ -770,8 +807,10 @@ bool __skb_flow_dissect(struct net *net,
 	struct flow_dissector_key_icmp *key_icmp;
 	struct flow_dissector_key_tags *key_tags;
 	struct flow_dissector_key_vlan *key_vlan;
-	enum flow_dissect_ret fdret;
 	enum flow_dissector_key_id dissector_vlan = FLOW_DISSECTOR_KEY_MAX;
+	struct bpf_prog *attached = NULL;
+	struct bpf_flow_keys flow_keys;
+	enum flow_dissect_ret fdret;
 	int num_hdrs = 0;
 	u8 ip_proto = 0;
 	bool ret;
@@ -811,28 +850,30 @@ bool __skb_flow_dissect(struct net *net,
 					      FLOW_DISSECTOR_KEY_BASIC,
 					      target_container);
 
-	if (skb) {
-		struct bpf_flow_keys flow_keys;
-		struct bpf_prog *attached = NULL;
+	WARN_ON_ONCE(!net);
 
-		WARN_ON_ONCE(!net);
+	rcu_read_lock();
 
-		rcu_read_lock();
-		if (net)
-			attached = rcu_dereference(net->flow_dissector_prog);
+	if (net)
+		attached = rcu_dereference(net->flow_dissector_prog);
 
-		if (attached) {
+	if (attached) {
+		if (skb)
 			ret = bpf_flow_dissect_skb(attached, skb,
 						   flow_dissector,
 						   &flow_keys);
-			bpf_flow_keys_to_target(&flow_keys, flow_dissector,
-						target_container);
-			rcu_read_unlock();
-			return ret;
-		}
+		else
+			ret = bpf_flow_dissect(attached, data, proto, nhoff,
+					       hlen, flow_dissector,
+					       &flow_keys);
+		bpf_flow_keys_to_target(&flow_keys, flow_dissector,
+					target_container);
 		rcu_read_unlock();
+		return ret;
 	}
 
+	rcu_read_unlock();
+
 	if (dissector_uses_key(flow_dissector,
 			       FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
 		struct ethhdr *eth = eth_hdr(skb);
-- 
2.21.0.225.g810b269d1ac-goog



* [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode
  2019-03-19 22:19 [RFC bpf-next v2 0/9] net: flow_dissector: trigger BPF hook when called from eth_get_headlen Stanislav Fomichev
                   ` (5 preceding siblings ...)
  2019-03-19 22:19 ` [RFC bpf-next v2 6/9] net: flow_dissector: handle no-skb use case Stanislav Fomichev
@ 2019-03-19 22:19 ` Stanislav Fomichev
  2019-03-20  2:14   ` Willem de Bruijn
  2019-03-19 22:19 ` [RFC bpf-next v2 8/9] selftests/bpf: add flow dissector bpf_skb_load_bytes helper test Stanislav Fomichev
  2019-03-19 22:19 ` [RFC bpf-next v2 9/9] net: flow_dissector: pass net argument to the eth_get_headlen Stanislav Fomichev
  8 siblings, 1 reply; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-19 22:19 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, simon.horman, willemb, peterpenkov96,
	Stanislav Fomichev

Now that we have bpf_flow_dissect, which works on raw data (using the
temporary per-cpu skb), use it when doing BPF_PROG_TEST_RUN for the
flow dissector.

This should help us catch any possible bugs due to missing shinfo on
the per-cpu skb.

Note that the existing __skb_flow_bpf_dissect swallows L2 headers and
returns nhoff=0; we need to preserve that behavior, so the test
subtracts ETH_HLEN from the reported offsets (e.g. for a plain IPv4/TCP
packet the program now reports thoff=34 where userspace expects 20).

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 net/bpf/test_run.c | 48 ++++++++++++++--------------------------------
 1 file changed, 14 insertions(+), 34 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 512773a95ad5..90f7eaf129c6 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -252,10 +252,8 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	u32 repeat = kattr->test.repeat;
 	struct bpf_flow_keys flow_keys;
 	u64 time_start, time_spent = 0;
-	struct bpf_skb_data_end *cb;
+	const struct ethhdr *eth;
 	u32 retval, duration;
-	struct sk_buff *skb;
-	struct sock *sk;
 	void *data;
 	int ret;
 	u32 i;
@@ -263,35 +261,14 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR)
 		return -EINVAL;
 
-	data = bpf_test_init(kattr, size, NET_SKB_PAD + NET_IP_ALIGN,
-			     SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
+	if (size < ETH_HLEN)
+		return -EINVAL;
+
+	data = bpf_test_init(kattr, size, 0, 0);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
-	sk = kzalloc(sizeof(*sk), GFP_USER);
-	if (!sk) {
-		kfree(data);
-		return -ENOMEM;
-	}
-	sock_net_set(sk, current->nsproxy->net_ns);
-	sock_init_data(NULL, sk);
-
-	skb = build_skb(data, 0);
-	if (!skb) {
-		kfree(data);
-		kfree(sk);
-		return -ENOMEM;
-	}
-	skb->sk = sk;
-
-	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
-	__skb_put(skb, size);
-	skb->protocol = eth_type_trans(skb,
-				       current->nsproxy->net_ns->loopback_dev);
-	skb_reset_network_header(skb);
-
-	cb = (struct bpf_skb_data_end *)skb->cb;
-	cb->qdisc_cb.flow_keys = &flow_keys;
+	eth = (struct ethhdr *)data;
 
 	if (!repeat)
 		repeat = 1;
@@ -300,9 +277,13 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	preempt_disable();
 	time_start = ktime_get_ns();
 	for (i = 0; i < repeat; i++) {
-		retval = bpf_flow_dissect_skb(prog, skb,
-					      &flow_keys_dissector,
-					      &flow_keys);
+		retval = bpf_flow_dissect(prog, data, eth->h_proto, ETH_HLEN,
+					  size, &flow_keys_dissector,
+					  &flow_keys);
+		if (flow_keys.nhoff >= ETH_HLEN)
+			flow_keys.nhoff -= ETH_HLEN;
+		if (flow_keys.thoff >= ETH_HLEN)
+			flow_keys.thoff -= ETH_HLEN;
 
 		if (signal_pending(current)) {
 			preempt_enable();
@@ -335,7 +316,6 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 			      retval, duration);
 
 out:
-	kfree_skb(skb);
-	kfree(sk);
+	kfree(data);
 	return ret;
 }
-- 
2.21.0.225.g810b269d1ac-goog



* [RFC bpf-next v2 8/9] selftests/bpf: add flow dissector bpf_skb_load_bytes helper test
  2019-03-19 22:19 [RFC bpf-next v2 0/9] net: flow_dissector: trigger BPF hook when called from eth_get_headlen Stanislav Fomichev
                   ` (6 preceding siblings ...)
  2019-03-19 22:19 ` [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode Stanislav Fomichev
@ 2019-03-19 22:19 ` Stanislav Fomichev
  2019-03-19 22:19 ` [RFC bpf-next v2 9/9] net: flow_dissector: pass net argument to the eth_get_headlen Stanislav Fomichev
  8 siblings, 0 replies; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-19 22:19 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, simon.horman, willemb, peterpenkov96,
	Stanislav Fomichev

With the per-cpu skb, we want to make sure we don't trigger any
shinfo access. Add a small test that tries to read data past the
packet boundary.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../prog_tests/flow_dissector_load_bytes.c    | 50 +++++++++++++++++++
 1 file changed, 50 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/flow_dissector_load_bytes.c

diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector_load_bytes.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector_load_bytes.c
new file mode 100644
index 000000000000..facf5734c915
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector_load_bytes.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+void test_flow_dissector_load_bytes(void)
+{
+	struct bpf_flow_keys flow_keys;
+	__u32 duration = 0, retval, size;
+	struct bpf_insn prog[] = {
+		// BPF_REG_1 - 1st argument: context
+		// BPF_REG_2 - 2nd argument: offset, start at last byte + 1
+		BPF_MOV64_IMM(BPF_REG_2, sizeof(pkt_v4)),
+		// BPF_REG_3 - 3rd argument: destination, reserve byte on stack
+		BPF_ALU64_REG(BPF_MOV, BPF_REG_3, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, -1),
+		// BPF_REG_4 - 4th argument: copy one byte
+		BPF_MOV64_IMM(BPF_REG_4, 1),
+		// bpf_skb_load_bytes(ctx, sizeof(pkt_v4), ptr, 1)
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+			     BPF_FUNC_skb_load_bytes),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+		// if (ret == 0) return BPF_DROP (2)
+		BPF_MOV64_IMM(BPF_REG_0, BPF_DROP),
+		BPF_EXIT_INSN(),
+		// if (ret != 0) return BPF_OK (0)
+		BPF_MOV64_IMM(BPF_REG_0, BPF_OK),
+		BPF_EXIT_INSN(),
+	};
+	int fd, err;
+
+	/* make sure bpf_skb_load_bytes helper doesn't cause any
+	 * problems when used with the fake skb in the flow
+	 * dissector (try to read past the last byte)
+	 */
+	fd = bpf_load_program(BPF_PROG_TYPE_FLOW_DISSECTOR, prog,
+			      ARRAY_SIZE(prog), "GPL", 0, NULL, 0);
+	CHECK(fd < 0,
+	      "flow_dissector-bpf_skb_load_bytes-load",
+	      "fd %d errno %d\n",
+	      fd, errno);
+
+	err = bpf_prog_test_run(fd, 1, &pkt_v4, sizeof(pkt_v4),
+				&flow_keys, &size, &retval, &duration);
+	CHECK(size != sizeof(flow_keys) || err || retval != 1,
+	      "flow_dissector-bpf_skb_load_bytes",
+	      "err %d errno %d retval %d duration %d size %u/%lu\n",
+	      err, errno, retval, duration, size, sizeof(flow_keys));
+
+	if (fd >= 0)
+		close(fd);
+}
-- 
2.21.0.225.g810b269d1ac-goog



* [RFC bpf-next v2 9/9] net: flow_dissector: pass net argument to the eth_get_headlen
  2019-03-19 22:19 [RFC bpf-next v2 0/9] net: flow_dissector: trigger BPF hook when called from eth_get_headlen Stanislav Fomichev
                   ` (7 preceding siblings ...)
  2019-03-19 22:19 ` [RFC bpf-next v2 8/9] selftests/bpf: add flow dissector bpf_skb_load_bytes helper test Stanislav Fomichev
@ 2019-03-19 22:19 ` Stanislav Fomichev
  8 siblings, 0 replies; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-19 22:19 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, simon.horman, willemb, peterpenkov96,
	Stanislav Fomichev

This is an example for the RFC only, just to show what the end result
would look like. In a proper submission, we need to introduce a new
helper and convert each driver one by one to make sure we don't break
'git bisect'; one possible staging is sketched below.
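
One possible staging (names hypothetical, not part of this series):

	/* new variant takes the namespace */
	u32 eth_get_headlen_net(struct net *net, void *data,
				unsigned int len);

	/* old signature becomes a wrapper until every driver is
	 * converted, keeping each step bisectable
	 */
	static inline u32 eth_get_headlen(void *data, unsigned int len)
	{
		return eth_get_headlen_net(NULL, data, len);
	}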

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 2 +-
 drivers/net/ethernet/hisilicon/hns/hns_enet.c     | 3 ++-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c   | 3 ++-
 drivers/net/ethernet/intel/fm10k/fm10k_main.c     | 2 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 3 ++-
 drivers/net/ethernet/intel/iavf/iavf_txrx.c       | 2 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c         | 2 +-
 drivers/net/ethernet/intel/igb/igb_main.c         | 2 +-
 drivers/net/ethernet/intel/igc/igc_main.c         | 2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 2 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   | 3 ++-
 drivers/net/tun.c                                 | 3 ++-
 include/linux/etherdevice.h                       | 2 +-
 net/ethernet/eth.c                                | 5 +++--
 15 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 0bb9d7b3a2b6..664e6392f71d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -899,7 +899,7 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
 			     DMA_ATTR_WEAK_ORDERING);
 
 	if (unlikely(!payload))
-		payload = eth_get_headlen(data_ptr, len);
+		payload = eth_get_headlen(dev_net(bp->dev), data_ptr, len);
 
 	skb = napi_alloc_skb(&rxr->bnapi->napi, payload);
 	if (!skb) {
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 60e7d7ae3787..caf075a7449c 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -603,7 +603,8 @@ static int hns_nic_poll_rx_skb(struct hns_nic_ring_data *ring_data,
 	} else {
 		ring->stats.seg_pkt_cnt++;
 
-		pull_len = eth_get_headlen(va, HNS_RX_HEAD_SIZE);
+		pull_len = eth_get_headlen(dev_net(ndev), va,
+					   HNS_RX_HEAD_SIZE);
 		memcpy(__skb_put(skb, pull_len), va,
 		       ALIGN(pull_len, sizeof(long)));
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 1c1f17ec6be2..265bb8e47d48 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2438,7 +2438,8 @@ static int hns3_alloc_skb(struct hns3_enet_ring *ring, int length,
 	ring->stats.seg_pkt_cnt++;
 	u64_stats_update_end(&ring->syncp);
 
-	ring->pull_len = eth_get_headlen(va, HNS3_RX_HEAD_SIZE);
+	ring->pull_len = eth_get_headlen(dev_net(netdev), va,
+					 HNS3_RX_HEAD_SIZE);
 	__skb_put(skb, ring->pull_len);
 	hns3_nic_reuse_page(skb, ring->frag_num++, ring, ring->pull_len,
 			    desc_cb);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 5a0419421511..d1b4b259b90b 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -278,7 +278,7 @@ static bool fm10k_add_rx_frag(struct fm10k_rx_buffer *rx_buffer,
 	/* we need the header to contain the greater of either ETH_HLEN or
 	 * 60 bytes if the skb->len is less than 60 for skb_pad.
 	 */
-	pull_len = eth_get_headlen(va, FM10K_RX_HDR_LEN);
+	pull_len = eth_get_headlen(skb_net(skb), va, FM10K_RX_HDR_LEN);
 
 	/* align pull length to size of long to optimize memcpy performance */
 	memcpy(__skb_put(skb, pull_len), va, ALIGN(pull_len, sizeof(long)));
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 6c97667d20ef..9e42a3526360 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2035,7 +2035,8 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
 	/* Determine available headroom for copy */
 	headlen = size;
 	if (headlen > I40E_RX_HDR_SIZE)
-		headlen = eth_get_headlen(xdp->data, I40E_RX_HDR_SIZE);
+		headlen = eth_get_headlen(skb_net(skb), xdp->data,
+					  I40E_RX_HDR_SIZE);
 
 	/* align pull length to size of long to optimize memcpy performance */
 	memcpy(__skb_put(skb, headlen), xdp->data,
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 9b4d7cec2e18..d3c7fd521fd7 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -1315,7 +1315,7 @@ static struct sk_buff *iavf_construct_skb(struct iavf_ring *rx_ring,
 	/* Determine available headroom for copy */
 	headlen = size;
 	if (headlen > IAVF_RX_HDR_SIZE)
-		headlen = eth_get_headlen(va, IAVF_RX_HDR_SIZE);
+		headlen = eth_get_headlen(skb_net(skb), va, IAVF_RX_HDR_SIZE);
 
 	/* align pull length to size of long to optimize memcpy performance */
 	memcpy(__skb_put(skb, headlen), va, ALIGN(headlen, sizeof(long)));
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index c289d97f477d..665ac75bb8ed 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -704,7 +704,7 @@ static void ice_pull_tail(struct sk_buff *skb)
 	/* we need the header to contain the greater of either ETH_HLEN or
 	 * 60 bytes if the skb->len is less than 60 for skb_pad.
 	 */
-	pull_len = eth_get_headlen(va, ICE_RX_HDR_SIZE);
+	pull_len = eth_get_headlen(skb_net(skb), va, ICE_RX_HDR_SIZE);
 
 	/* align pull length to size of long to optimize memcpy performance */
 	skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 69b230c53fed..55388c2b72b7 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8048,7 +8048,7 @@ static struct sk_buff *igb_construct_skb(struct igb_ring *rx_ring,
 	/* Determine available headroom for copy */
 	headlen = size;
 	if (headlen > IGB_RX_HDR_LEN)
-		headlen = eth_get_headlen(va, IGB_RX_HDR_LEN);
+		headlen = eth_get_headlen(skb_net(skb), va, IGB_RX_HDR_LEN);
 
 	/* align pull length to size of long to optimize memcpy performance */
 	memcpy(__skb_put(skb, headlen), va, ALIGN(headlen, sizeof(long)));
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 87a11879bf2d..9ea32e9d72c4 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1150,7 +1150,7 @@ static struct sk_buff *igc_construct_skb(struct igc_ring *rx_ring,
 	/* Determine available headroom for copy */
 	headlen = size;
 	if (headlen > IGC_RX_HDR_LEN)
-		headlen = eth_get_headlen(va, IGC_RX_HDR_LEN);
+		headlen = eth_get_headlen(skb_net(skb), va, IGC_RX_HDR_LEN);
 
 	/* align pull length to size of long to optimize memcpy performance */
 	memcpy(__skb_put(skb, headlen), va, ALIGN(headlen, sizeof(long)));
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e100054a3765..2fd75d761522 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1800,7 +1800,7 @@ static void ixgbe_pull_tail(struct ixgbe_ring *rx_ring,
 	 * we need the header to contain the greater of either ETH_HLEN or
 	 * 60 bytes if the skb->len is less than 60 for skb_pad.
 	 */
-	pull_len = eth_get_headlen(va, IXGBE_RX_HDR_SIZE);
+	pull_len = eth_get_headlen(skb_net(skb), va, IXGBE_RX_HDR_SIZE);
 
 	/* align pull length to size of long to optimize memcpy performance */
 	skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 49e23afa05a2..c45be7487d57 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -895,7 +895,8 @@ struct sk_buff *ixgbevf_construct_skb(struct ixgbevf_ring *rx_ring,
 	/* Determine available headroom for copy */
 	headlen = size;
 	if (headlen > IXGBEVF_RX_HDR_SIZE)
-		headlen = eth_get_headlen(xdp->data, IXGBEVF_RX_HDR_SIZE);
+		headlen = eth_get_headlen(skb_net(skb), xdp->data,
+					  IXGBEVF_RX_HDR_SIZE);
 
 	/* align pull length to size of long to optimize memcpy performance */
 	memcpy(__skb_put(skb, headlen), xdp->data,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 25a8f8260c14..31beb9f92a2b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -163,7 +163,8 @@ static inline u16 mlx5e_calc_min_inline(enum mlx5_inline_modes mode,
 	case MLX5_INLINE_MODE_NONE:
 		return 0;
 	case MLX5_INLINE_MODE_TCP_UDP:
-		hlen = eth_get_headlen(skb->data, skb_headlen(skb));
+		hlen = eth_get_headlen(skb_net(skb), skb->data,
+				       skb_headlen(skb));
 		if (hlen == ETH_HLEN && !skb_vlan_tag_present(skb))
 			hlen += VLAN_HLEN;
 		break;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e9ca1c088d0b..dc7e343e4f79 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1966,7 +1966,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 
 	if (frags) {
 		/* Exercise flow dissector code path. */
-		u32 headlen = eth_get_headlen(skb->data, skb_headlen(skb));
+		u32 headlen = eth_get_headlen(skb_net(skb), skb->data,
+					      skb_headlen(skb));
 
 		if (unlikely(headlen > skb_headlen(skb))) {
 			this_cpu_inc(tun->pcpu_stats->rx_dropped);
diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index e2f3b21cd72a..782baa417685 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -33,7 +33,7 @@ struct device;
 int eth_platform_get_mac_address(struct device *dev, u8 *mac_addr);
 unsigned char *arch_get_platform_mac_address(void);
 int nvmem_get_mac_address(struct device *dev, void *addrbuf);
-u32 eth_get_headlen(void *data, unsigned int max_len);
+u32 eth_get_headlen(struct net *net, void *data, unsigned int max_len);
 __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
 extern const struct header_ops eth_header_ops;
 
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 1e439549c419..a41e87816a48 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -119,13 +119,14 @@ EXPORT_SYMBOL(eth_header);
 
 /**
  * eth_get_headlen - determine the length of header for an ethernet frame
+ * @net: pointer to device network namespace
  * @data: pointer to start of frame
  * @len: total length of frame
  *
  * Make a best effort attempt to pull the length for all of the headers for
  * a given frame in a linear buffer.
  */
-u32 eth_get_headlen(void *data, unsigned int len)
+u32 eth_get_headlen(struct net *net, void *data, unsigned int len)
 {
 	const unsigned int flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG;
 	const struct ethhdr *eth = (const struct ethhdr *)data;
@@ -136,7 +137,7 @@ u32 eth_get_headlen(void *data, unsigned int len)
 		return len;
 
 	/* parse any remaining L2/L3 headers, check for L4 */
-	if (!skb_flow_dissect_flow_keys_basic(NULL, NULL, &keys, data,
+	if (!skb_flow_dissect_flow_keys_basic(net, NULL, &keys, data,
 					      eth->h_proto, sizeof(*eth),
 					      len, flags))
 		return max_t(u32, keys.control.thoff, sizeof(*eth));
-- 
2.21.0.225.g810b269d1ac-goog



* Re: [RFC bpf-next v2 2/9] net: introduce skb_net helper
  2019-03-19 22:19 ` [RFC bpf-next v2 2/9] net: introduce skb_net helper Stanislav Fomichev
@ 2019-03-20  2:14   ` Willem de Bruijn
  2019-03-20 16:49     ` Stanislav Fomichev
  0 siblings, 1 reply; 29+ messages in thread
From: Willem de Bruijn @ 2019-03-20  2:14 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Network Development, bpf, David Miller, Alexei Starovoitov,
	Daniel Borkmann, simon.horman, Willem de Bruijn, Petar Penkov

On Tue, Mar 19, 2019 at 6:20 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> skb_net returns network namespace from the associated device or socket.
>
> This will be used in the next commit.
>
> I tried to inline it in skbuff.h, but I don't think it's feasible.

If so, is this helper necessary?

All existing cases that use it pass both skb_net(skb) and skb to the
flow dissector, so the flow dissector can do that lookup locally when
the new net argument is NULL.

The new users are in patch 9/9 in the device drivers, but those should
pass the device pointer unconditionally.
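
Something along these lines inside __skb_flow_dissect (untested
sketch):

	if (!net) {
		if (skb->dev)
			net = dev_net(skb->dev);
		else if (skb->sk)
			net = sock_net(skb->sk);
	}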



> It depends on 'net/sock.h' for sock_net() and 'linux/netdevice.h' for
> dev_net(), both of which we don't include from 'linux/skbuff.h' (but
> both sock.h and netdevice.h include skbuff.h). I thought about doing
> it as a macro/putting it somewhere else, but we will have
> skb_flow_dissect{_xyz} use it in the next commits.
>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  include/linux/skbuff.h |  2 ++
>  net/core/skbuff.c      | 16 ++++++++++++++++
>  2 files changed, 18 insertions(+)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index e8c1d5b97f96..75e1d4d73cca 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1275,6 +1275,8 @@ static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
>  }
>  #endif
>
> +struct net *skb_net(const struct sk_buff *skb);
> +
>  struct bpf_flow_keys;
>  bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
>                             const struct sk_buff *skb,
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index b413354ee709..d81f3a95fb4e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5725,3 +5725,19 @@ void __skb_ext_put(struct skb_ext *ext)
>  }
>  EXPORT_SYMBOL(__skb_ext_put);
>  #endif /* CONFIG_SKB_EXTENSIONS */
> +
> +/**
> + * skb_net - Return network namespace associated with skb.
> + * @skb: skb
> + *
> + * Returns pointer to struct net or NULL.
> + */
> +struct net *skb_net(const struct sk_buff *skb)
> +{
> +       if (skb->dev)
> +               return dev_net(skb->dev);
> +       else if (skb->sk)
> +               return sock_net(skb->sk);
> +       return NULL;
> +}
> +EXPORT_SYMBOL(skb_net);
> --
> 2.21.0.225.g810b269d1ac-goog
>


* Re: [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode
  2019-03-19 22:19 ` [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode Stanislav Fomichev
@ 2019-03-20  2:14   ` Willem de Bruijn
  2019-03-20 16:57     ` Stanislav Fomichev
  0 siblings, 1 reply; 29+ messages in thread
From: Willem de Bruijn @ 2019-03-20  2:14 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Network Development, bpf, David Miller, Alexei Starovoitov,
	Daniel Borkmann, simon.horman, Willem de Bruijn, Petar Penkov

On Tue, Mar 19, 2019 at 6:21 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> Now that we have bpf_flow_dissect, which works on raw data (using the
> temporary per-cpu skb), use it when doing BPF_PROG_TEST_RUN for the
> flow dissector.
>
> This should help us catch any possible bugs due to missing shinfo on
> the per-cpu skb.
>
> Note that the existing __skb_flow_bpf_dissect swallows L2 headers and
> returns nhoff=0; we need to preserve that behavior.
>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  net/bpf/test_run.c | 48 ++++++++++++++--------------------------------
>  1 file changed, 14 insertions(+), 34 deletions(-)
>

> @@ -300,9 +277,13 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>         preempt_disable();
>         time_start = ktime_get_ns();
>         for (i = 0; i < repeat; i++) {
> -               retval = bpf_flow_dissect_skb(prog, skb,
> -                                             &flow_keys_dissector,
> -                                             &flow_keys);
> +               retval = bpf_flow_dissect(prog, data, eth->h_proto, ETH_HLEN,
> +                                         size, &flow_keys_dissector,
> +                                         &flow_keys);
> +               if (flow_keys.nhoff >= ETH_HLEN)
> +                       flow_keys.nhoff -= ETH_HLEN;
> +               if (flow_keys.thoff >= ETH_HLEN)
> +                       flow_keys.thoff -= ETH_HLEN;

why are these conditional?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 2/9] net: introduce skb_net helper
  2019-03-20  2:14   ` Willem de Bruijn
@ 2019-03-20 16:49     ` Stanislav Fomichev
  0 siblings, 0 replies; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-20 16:49 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Stanislav Fomichev, Network Development, bpf, David Miller,
	Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn, Petar Penkov

On 03/19, Willem de Bruijn wrote:
> On Tue, Mar 19, 2019 at 6:20 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > skb_net returns network namespace from the associated device or socket.
> >
> > This will be used in the next commit.
> >
> > I tried to inline it in skbuff.h, but I don't think it's feasible.
> 
> If so, is this helper necessary?
> 
> All existing cases that use it pass both skb_net(skb) and skb to the
> flow dissector, so can continue to call it locally if new argument net
> is NULL.
> 
> The new users are in patch 9/9 in the device drivers, but those should
> pass the device pointer unconditionally.
Good point, I can try to drop it.
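
Something like this in __skb_flow_dissect when the new net argument is
NULL (untested sketch):

	if (!net) {
		if (skb->dev)
			net = dev_net(skb->dev);
		else if (skb->sk)
			net = sock_net(skb->sk);
	}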


> > It depends on 'net/sock.h' for sock_net() and 'linux/netdevice.h' for
> > dev_net(), neither of which we include from 'linux/skbuff.h' (but
> > both sock.h and netdevice.h include skbuff.h). I thought about doing
> > it as a macro/putting it somewhere else, but we will have
> > skb_flow_dissect{_xyz} use it in the next commits.
> >
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  include/linux/skbuff.h |  2 ++
> >  net/core/skbuff.c      | 16 ++++++++++++++++
> >  2 files changed, 18 insertions(+)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index e8c1d5b97f96..75e1d4d73cca 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -1275,6 +1275,8 @@ static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
> >  }
> >  #endif
> >
> > +struct net *skb_net(const struct sk_buff *skb);
> > +
> >  struct bpf_flow_keys;
> >  bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
> >                             const struct sk_buff *skb,
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index b413354ee709..d81f3a95fb4e 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5725,3 +5725,19 @@ void __skb_ext_put(struct skb_ext *ext)
> >  }
> >  EXPORT_SYMBOL(__skb_ext_put);
> >  #endif /* CONFIG_SKB_EXTENSIONS */
> > +
> > +/**
> > + * skb_net - Return network namespace associated with skb.
> > + * @skb: skb
> > + *
> > + * Returns pointer to struct net or NULL.
> > + */
> > +struct net *skb_net(const struct sk_buff *skb)
> > +{
> > +       if (skb->dev)
> > +               return dev_net(skb->dev);
> > +       else if (skb->sk)
> > +               return sock_net(skb->sk);
> > +       return NULL;
> > +}
> > +EXPORT_SYMBOL(skb_net);
> > --
> > 2.21.0.225.g810b269d1ac-goog
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode
  2019-03-20  2:14   ` Willem de Bruijn
@ 2019-03-20 16:57     ` Stanislav Fomichev
  2019-03-20 18:29       ` Willem de Bruijn
  0 siblings, 1 reply; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-20 16:57 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Stanislav Fomichev, Network Development, bpf, David Miller,
	Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn, Petar Penkov

On 03/19, Willem de Bruijn wrote:
> On Tue, Mar 19, 2019 at 6:21 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > Now that we have __flow_bpf_dissect, which works on raw data (by
> > constructing a temporary per-cpu skb), use it when doing
> > BPF_PROG_TEST_RUN for the flow dissector.
> >
> > This should help us catch any possible bugs due to missing shinfo on
> > the per-cpu skb.
> >
> > Note that the existing __skb_flow_bpf_dissect swallows L2 headers and returns
> > nhoff=0; we need to preserve the existing behavior.
> >
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  net/bpf/test_run.c | 48 ++++++++++++++--------------------------------
> >  1 file changed, 14 insertions(+), 34 deletions(-)
> >
> 
> > @@ -300,9 +277,13 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> >         preempt_disable();
> >         time_start = ktime_get_ns();
> >         for (i = 0; i < repeat; i++) {
> > -               retval = bpf_flow_dissect_skb(prog, skb,
> > -                                             &flow_keys_dissector,
> > -                                             &flow_keys);
> > +               retval = bpf_flow_dissect(prog, data, eth->h_proto, ETH_HLEN,
> > +                                         size, &flow_keys_dissector,
> > +                                         &flow_keys);
> > +               if (flow_keys.nhoff >= ETH_HLEN)
> > +                       flow_keys.nhoff -= ETH_HLEN;
> > +               if (flow_keys.thoff >= ETH_HLEN)
> > +                       flow_keys.thoff -= ETH_HLEN;
> 
> why are these conditional?
Hm, I didn't want these to be negative, because bpf flow program can set
them to zero and clamp_flow_keys makes sure they are in a "sensible"
range. For this particular case, I think we need to amend
clamp_flow_keys to make sure that flow_keys.nhoff is in the range of
initial_nhoff..hlen, not 0..hlen (and then we can drop these checks).
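
Something like this, assuming clamp_flow_keys grows an initial nhoff
argument (untested sketch):

static void clamp_flow_keys(struct bpf_flow_keys *flow_keys,
			    u16 nhoff, u16 hlen)
{
	flow_keys->nhoff = clamp_t(u16, flow_keys->nhoff, nhoff, hlen);
	flow_keys->thoff = clamp_t(u16, flow_keys->thoff,
				   flow_keys->nhoff, hlen);
}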

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode
  2019-03-20 16:57     ` Stanislav Fomichev
@ 2019-03-20 18:29       ` Willem de Bruijn
  2019-03-20 19:02         ` Stanislav Fomichev
  0 siblings, 1 reply; 29+ messages in thread
From: Willem de Bruijn @ 2019-03-20 18:29 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Stanislav Fomichev, Network Development, bpf, David Miller,
	Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn, Petar Penkov

On Wed, Mar 20, 2019 at 12:57 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 03/19, Willem de Bruijn wrote:
> > On Tue, Mar 19, 2019 at 6:21 PM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > Now that we have __flow_bpf_dissect, which works on raw data (by
> > > constructing a temporary per-cpu skb), use it when doing
> > > BPF_PROG_TEST_RUN for the flow dissector.
> > >
> > > This should help us catch any possible bugs due to missing shinfo on
> > > the per-cpu skb.
> > >
> > > Note that the existing __skb_flow_bpf_dissect swallows L2 headers and returns
> > > nhoff=0; we need to preserve the existing behavior.
> > >
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  net/bpf/test_run.c | 48 ++++++++++++++--------------------------------
> > >  1 file changed, 14 insertions(+), 34 deletions(-)
> > >
> >
> > > @@ -300,9 +277,13 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> > >         preempt_disable();
> > >         time_start = ktime_get_ns();
> > >         for (i = 0; i < repeat; i++) {
> > > -               retval = bpf_flow_dissect_skb(prog, skb,
> > > -                                             &flow_keys_dissector,
> > > -                                             &flow_keys);
> > > +               retval = bpf_flow_dissect(prog, data, eth->h_proto, ETH_HLEN,
> > > +                                         size, &flow_keys_dissector,
> > > +                                         &flow_keys);
> > > +               if (flow_keys.nhoff >= ETH_HLEN)
> > > +                       flow_keys.nhoff -= ETH_HLEN;
> > > +               if (flow_keys.thoff >= ETH_HLEN)
> > > +                       flow_keys.thoff -= ETH_HLEN;
> >
> > why are these conditional?
> Hm, I didn't want these to be negative, because bpf flow program can set
> them to zero and clamp_flow_keys makes sure they are in a "sensible"
> range. For this particular case, I think we need to amend
> clamp_flow_keys to make sure that flow_keys.nhoff is in the range of
> initial_nhoff..hlen, not 0..hlen (and then we can drop these checks).

So, previously eth_type_trans would call with data at the network
header. Now it is called with data at the link layer. How would
__skb_flow_bpf_dissect "swallows L2 headers and returns nhoff=0"? That
sounds incorrect.

Agreed that the output should lie between nhoff and hlen, but as far
as I can tell it is always zero indexed to the data passed, here the
link layer:

        if (!data) {
                data = skb->data;
                proto = skb_vlan_tag_present(skb) ?
                         skb->vlan_proto : skb->protocol;
                nhoff = skb_network_offset(skb);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode
  2019-03-20 18:29       ` Willem de Bruijn
@ 2019-03-20 19:02         ` Stanislav Fomichev
  2019-03-20 19:08           ` Willem de Bruijn
  0 siblings, 1 reply; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-20 19:02 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Stanislav Fomichev, Network Development, bpf, David Miller,
	Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn, Petar Penkov

On 03/20, Willem de Bruijn wrote:
> On Wed, Mar 20, 2019 at 12:57 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 03/19, Willem de Bruijn wrote:
> > > On Tue, Mar 19, 2019 at 6:21 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > >
> > > > Now that we have __flow_bpf_dissect, which works on raw data (by
> > > > constructing a temporary per-cpu skb), use it when doing
> > > > BPF_PROG_TEST_RUN for the flow dissector.
> > > >
> > > > This should help us catch any possible bugs due to missing shinfo on
> > > > the per-cpu skb.
> > > >
> > > > Note that the existing __skb_flow_bpf_dissect swallows L2 headers and returns
> > > > nhoff=0; we need to preserve the existing behavior.
> > > >
> > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > ---
> > > >  net/bpf/test_run.c | 48 ++++++++++++++--------------------------------
> > > >  1 file changed, 14 insertions(+), 34 deletions(-)
> > > >
> > >
> > > > @@ -300,9 +277,13 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> > > >         preempt_disable();
> > > >         time_start = ktime_get_ns();
> > > >         for (i = 0; i < repeat; i++) {
> > > > -               retval = bpf_flow_dissect_skb(prog, skb,
> > > > -                                             &flow_keys_dissector,
> > > > -                                             &flow_keys);
> > > > +               retval = bpf_flow_dissect(prog, data, eth->h_proto, ETH_HLEN,
> > > > +                                         size, &flow_keys_dissector,
> > > > +                                         &flow_keys);
> > > > +               if (flow_keys.nhoff >= ETH_HLEN)
> > > > +                       flow_keys.nhoff -= ETH_HLEN;
> > > > +               if (flow_keys.thoff >= ETH_HLEN)
> > > > +                       flow_keys.thoff -= ETH_HLEN;
> > >
> > > why are these conditional?
> > Hm, I didn't want these to be negative, because bpf flow program can set
> > them to zero and clamp_flow_keys makes sure they are in a "sensible"
> > range. For this particular case, I think we need to amend
> > clamp_flow_keys to make sure that flow_keys.nhoff is in the range of
> > initial_nhoff..hlen, not 0..hlen (and then we can drop these checks).
> 
> So, previously eth_type_trans would call with data at the network
> header. Now it is called with data at the link layer. How would
> __skb_flow_bpf_dissect "swallows L2 headers and returns nhoff=0"? That
s/__skb_flow_bpf_dissect/eth_type_trans/, I'll clarify that in the patch
description.

> sounds incorrect.
Previously, for skb case, eth_type_trans would pull ETH_HLEN (L2) and
after that we did skb_reset_network_header. So when later we initialized
flow keys (flow_keys->nhoff = skb_network_offset(skb)), that would
yield nhoff == 0.
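
I.e. the old test_run setup did roughly:

	__skb_put(skb, size);
	skb->protocol = eth_type_trans(skb,
				       current->nsproxy->net_ns->loopback_dev);
	skb_reset_network_header(skb);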

For example, see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/prog_tests/flow_dissector.c

Now, we explicitly call bpf_flow_dissect with nhoff=ETH_HLEN and have to
undo it, otherwise, it breaks those tests.

We could do something like the following instead:
retval = bpf_flow_dissect(prog, data + ETH_HLEN, eth->h_proto, 0,
                          size, &flow_keys_dissector,
                          &flow_keys);

But I wanted to make sure nhoff != 0 works.

> 
> Agreed that the output should lie between nhoff and hlen, but as far
> as I can tell it is always zero indexed to the data passed, here the
> link layer:
> 
>         if (!data) {
>                 data = skb->data;
>                 proto = skb_vlan_tag_present(skb) ?
>                          skb->vlan_proto : skb->protocol;
>                 nhoff = skb_network_offset(skb);
That's for the skb != NULL case. eth_get_headlen calls with skb == NULL
and passes data and nhoff=sizeof(*eth):

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ethernet/eth.c#n139
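
For reference, it does roughly (simplified):

u32 eth_get_headlen(void *data, unsigned int len)
{
	const struct ethhdr *eth = (const struct ethhdr *)data;
	struct flow_keys_basic keys;

	if (unlikely(len < sizeof(*eth)))
		return len;

	/* skb == NULL, raw data, nhoff = sizeof(*eth) */
	if (!skb_flow_dissect_flow_keys_basic(NULL, &keys, data,
					      eth->h_proto, sizeof(*eth), len,
					      FLOW_DISSECTOR_F_PARSE_1ST_FRAG))
		return max_t(u32, keys.control.thoff, sizeof(*eth));

	return min_t(u32, __skb_get_poff(NULL, data, &keys, len), len);
}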

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode
  2019-03-20 19:02         ` Stanislav Fomichev
@ 2019-03-20 19:08           ` Willem de Bruijn
  2019-03-20 19:19             ` Stanislav Fomichev
  0 siblings, 1 reply; 29+ messages in thread
From: Willem de Bruijn @ 2019-03-20 19:08 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Stanislav Fomichev, Network Development, bpf, David Miller,
	Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn, Petar Penkov

On Wed, Mar 20, 2019 at 3:02 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 03/20, Willem de Bruijn wrote:
> > On Wed, Mar 20, 2019 at 12:57 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > >
> > > On 03/19, Willem de Bruijn wrote:
> > > > On Tue, Mar 19, 2019 at 6:21 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > >
> > > > > Now that we have __flow_bpf_dissect, which works on raw data (by
> > > > > constructing a temporary per-cpu skb), use it when doing
> > > > > BPF_PROG_TEST_RUN for the flow dissector.
> > > > >
> > > > > This should help us catch any possible bugs due to missing shinfo on
> > > > > the per-cpu skb.
> > > > >
> > > > > Note that the existing __skb_flow_bpf_dissect swallows L2 headers and returns
> > > > > nhoff=0; we need to preserve the existing behavior.
> > > > >
> > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > ---
> > > > >  net/bpf/test_run.c | 48 ++++++++++++++--------------------------------
> > > > >  1 file changed, 14 insertions(+), 34 deletions(-)
> > > > >
> > > >
> > > > > @@ -300,9 +277,13 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> > > > >         preempt_disable();
> > > > >         time_start = ktime_get_ns();
> > > > >         for (i = 0; i < repeat; i++) {
> > > > > -               retval = bpf_flow_dissect_skb(prog, skb,
> > > > > -                                             &flow_keys_dissector,
> > > > > -                                             &flow_keys);
> > > > > +               retval = bpf_flow_dissect(prog, data, eth->h_proto, ETH_HLEN,
> > > > > +                                         size, &flow_keys_dissector,
> > > > > +                                         &flow_keys);
> > > > > +               if (flow_keys.nhoff >= ETH_HLEN)
> > > > > +                       flow_keys.nhoff -= ETH_HLEN;
> > > > > +               if (flow_keys.thoff >= ETH_HLEN)
> > > > > +                       flow_keys.thoff -= ETH_HLEN;
> > > >
> > > > why are these conditional?
> > > Hm, I didn't want these to be negative, because bpf flow program can set
> > > them to zero and clamp_flow_keys makes sure they are in a "sensible"
> > > range. For this particular case, I think we need to amend
> > > clamp_flow_keys to make sure that flow_keys.nhoff is in the range of
> > > initial_nhoff..hlen, not 0..hlen (and then we can drop these checks).
> >
> > So, previously eth_type_trans would call with data at the network
> > header. Now it is called with data at the link layer. How would
> > __skb_flow_bpf_dissect "swallows L2 headers and returns nhoff=0"? That
> s/__skb_flow_bpf_dissect/eth_type_trans/, I'll clarify that in the patch
> description.
>
> > sounds incorrect.
> Previously, for skb case, eth_type_trans would pull ETH_HLEN (L2) and
> after that we did skb_reset_network_header. So when later we initialized
> flow keys (flow_keys->nhoff = skb_network_offset(skb)), that would
> yield nhoff == 0.
>
> For example, see:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
>
> Now, we explicitly call bpf_flow_dissect with nhoff=ETH_HLEN and have to
> undo it, otherwise, it breaks those tests.
>
> We could do something like the following instead:
> retval = bpf_flow_dissect(prog, data + ETH_HLEN, eth->h_proto, 0,
>                           size, &flow_keys_dissector,
>                           &flow_keys);
>
> But I wanted to make sure nhoff != 0 works.

Makes sense. Ensuring that nhoff lies within initial_nhoff..hlen
sounds correct to me. But this is a limitation of the test, so should
be in the test logic, not in the generic clamp code. Perhaps just fail
the test if returned nhoff < ETH_HLEN?
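
Something like (sketch, reusing the names from the patch above):

		if (flow_keys.nhoff < ETH_HLEN ||
		    flow_keys.thoff < flow_keys.nhoff) {
			ret = -ERANGE;
			break;
		}
		flow_keys.nhoff -= ETH_HLEN;
		flow_keys.thoff -= ETH_HLEN;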

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode
  2019-03-20 19:08           ` Willem de Bruijn
@ 2019-03-20 19:19             ` Stanislav Fomichev
  2019-03-20 19:23               ` Willem de Bruijn
  0 siblings, 1 reply; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-20 19:19 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Stanislav Fomichev, Network Development, bpf, David Miller,
	Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn, Petar Penkov

On 03/20, Willem de Bruijn wrote:
> On Wed, Mar 20, 2019 at 3:02 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 03/20, Willem de Bruijn wrote:
> > > On Wed, Mar 20, 2019 at 12:57 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > >
> > > > On 03/19, Willem de Bruijn wrote:
> > > > > On Tue, Mar 19, 2019 at 6:21 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > >
> > > > > > Now that we have __flow_bpf_dissect, which works on raw data (by
> > > > > > constructing a temporary per-cpu skb), use it when doing
> > > > > > BPF_PROG_TEST_RUN for the flow dissector.
> > > > > >
> > > > > > This should help us catch any possible bugs due to missing shinfo on
> > > > > > the per-cpu skb.
> > > > > >
> > > > > > Note that the existing __skb_flow_bpf_dissect swallows L2 headers and returns
> > > > > > nhoff=0; we need to preserve the existing behavior.
> > > > > >
> > > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > > ---
> > > > > >  net/bpf/test_run.c | 48 ++++++++++++++--------------------------------
> > > > > >  1 file changed, 14 insertions(+), 34 deletions(-)
> > > > > >
> > > > >
> > > > > > @@ -300,9 +277,13 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> > > > > >         preempt_disable();
> > > > > >         time_start = ktime_get_ns();
> > > > > >         for (i = 0; i < repeat; i++) {
> > > > > > -               retval = bpf_flow_dissect_skb(prog, skb,
> > > > > > -                                             &flow_keys_dissector,
> > > > > > -                                             &flow_keys);
> > > > > > +               retval = bpf_flow_dissect(prog, data, eth->h_proto, ETH_HLEN,
> > > > > > +                                         size, &flow_keys_dissector,
> > > > > > +                                         &flow_keys);
> > > > > > +               if (flow_keys.nhoff >= ETH_HLEN)
> > > > > > +                       flow_keys.nhoff -= ETH_HLEN;
> > > > > > +               if (flow_keys.thoff >= ETH_HLEN)
> > > > > > +                       flow_keys.thoff -= ETH_HLEN;
> > > > >
> > > > > why are these conditional?
> > > > Hm, I didn't want these to be negative, because bpf flow program can set
> > > > them to zero and clamp_flow_keys makes sure they are in a "sensible"
> > > > range. For this particular case, I think we need to amend
> > > > clamp_flow_keys to make sure that flow_keys.nhoff is in the range of
> > > > initial_nhoff..hlen, not 0..hlen (and then we can drop these checks).
> > >
> > > So, previously eth_type_trans would call with data at the network
> > > header. Now it is called with data at the link layer. How would
> > > __skb_flow_bpf_dissect "swallows L2 headers and returns nhoff=0"? That
> > s/__skb_flow_bpf_dissect/eth_type_trans/, I'll clarify that in the patch
> > description.
> >
> > > sounds incorrect.
> > Previously, for skb case, eth_type_trans would pull ETH_HLEN (L2) and
> > after that we did skb_reset_network_header. So when later we initialized
> > flow keys (flow_keys->nhoff = skb_network_offset(skb)), that would
> > yield nhoff == 0.
> >
> > For example, see:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> >
> > Now, we explicitly call bpf_flow_dissect with nhoff=ETH_HLEN and have to
> > undo it, otherwise, it breaks those tests.
> >
> > We could do something like the following instead:
> > retval = bpf_flow_dissect(prog, data + ETH_HLEN, eth->h_proto, 0,
> >                           size, &flow_keys_dissector,
> >                           &flow_keys);
> >
> > But I wanted to make sure nhoff != 0 works.
> 
> Makes sense. Ensuring that nhoff lies within initial_nhoff..hlen
> sounds correct to me. But this is a limitation of the test, so should
> be in the test logic, not in the generic clamp code. Perhaps just fail
> the test if returned nhoff < ETH_HLEN?
I don't think it's only about the tests. BPF program can return
nhoff/thoff out of range as well (if there was some bug in its logic,
for example). We should not blindly trust whatever it returns, right?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode
  2019-03-20 19:19             ` Stanislav Fomichev
@ 2019-03-20 19:23               ` Willem de Bruijn
  2019-03-20 19:48                 ` Stanislav Fomichev
  0 siblings, 1 reply; 29+ messages in thread
From: Willem de Bruijn @ 2019-03-20 19:23 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Stanislav Fomichev, Network Development, bpf, David Miller,
	Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn, Petar Penkov

On Wed, Mar 20, 2019 at 3:19 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 03/20, Willem de Bruijn wrote:
> > On Wed, Mar 20, 2019 at 3:02 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > >
> > > On 03/20, Willem de Bruijn wrote:
> > > > On Wed, Mar 20, 2019 at 12:57 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > > >
> > > > > On 03/19, Willem de Bruijn wrote:
> > > > > > On Tue, Mar 19, 2019 at 6:21 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > > >
> > > > > > > Now that we have __flow_bpf_dissect, which works on raw data (by
> > > > > > > constructing a temporary per-cpu skb), use it when doing
> > > > > > > BPF_PROG_TEST_RUN for the flow dissector.
> > > > > > >
> > > > > > > This should help us catch any possible bugs due to missing shinfo on
> > > > > > > the per-cpu skb.
> > > > > > >
> > > > > > > Note that the existing __skb_flow_bpf_dissect swallows L2 headers and returns
> > > > > > > nhoff=0; we need to preserve the existing behavior.
> > > > > > >
> > > > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > > > ---
> > > > > > >  net/bpf/test_run.c | 48 ++++++++++++++--------------------------------
> > > > > > >  1 file changed, 14 insertions(+), 34 deletions(-)
> > > > > > >
> > > > > >
> > > > > > > @@ -300,9 +277,13 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> > > > > > >         preempt_disable();
> > > > > > >         time_start = ktime_get_ns();
> > > > > > >         for (i = 0; i < repeat; i++) {
> > > > > > > -               retval = bpf_flow_dissect_skb(prog, skb,
> > > > > > > -                                             &flow_keys_dissector,
> > > > > > > -                                             &flow_keys);
> > > > > > > +               retval = bpf_flow_dissect(prog, data, eth->h_proto, ETH_HLEN,
> > > > > > > +                                         size, &flow_keys_dissector,
> > > > > > > +                                         &flow_keys);
> > > > > > > +               if (flow_keys.nhoff >= ETH_HLEN)
> > > > > > > +                       flow_keys.nhoff -= ETH_HLEN;
> > > > > > > +               if (flow_keys.thoff >= ETH_HLEN)
> > > > > > > +                       flow_keys.thoff -= ETH_HLEN;
> > > > > >
> > > > > > why are these conditional?
> > > > > Hm, I didn't want these to be negative, because bpf flow program can set
> > > > > them to zero and clamp_flow_keys makes sure they are in a "sensible"
> > > > > range. For this particular case, I think we need to amend
> > > > > clamp_flow_keys to make sure that flow_keys.nhoff is in the range of
> > > > > initial_nhoff..hlen, not 0..hlen (and then we can drop these checks).
> > > >
> > > > So, previously eth_type_trans would call with data at the network
> > > > header. Now it is called with data at the link layer. How would
> > > > __skb_flow_bpf_dissect "swallows L2 headers and returns nhoff=0"? That
> > > s/__skb_flow_bpf_dissect/eth_type_trans/, I'll clarify that in the patch
> > > description.
> > >
> > > > sounds incorrect.
> > > Previously, for skb case, eth_type_trans would pull ETH_HLEN (L2) and
> > > after that we did skb_reset_network_header. So when later we initialized
> > > flow keys (flow_keys->nhoff = skb_network_offset(skb)), that would
> > > yield nhoff == 0.
> > >
> > > For example, see:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> > >
> > > Now, we explicitly call bpf_flow_dissect with nhoff=ETH_HLEN and have to
> > > undo it, otherwise, it breaks those tests.
> > >
> > > We could do something like the following instead:
> > > retval = bpf_flow_dissect(prog, data + ETH_HLEN, eth->h_proto, 0,
> > >                           size, &flow_keys_dissector,
> > >                           &flow_keys);
> > >
> > > But I wanted to make sure nhoff != 0 works.
> >
> > Makes sense. Ensuring that nhoff lies within initial_nhoff..hlen
> > sounds correct to me. But this is a limitation of the test, so should
> > be in the test logic, not in the generic clamp code. Perhaps just fail
> > the test if returned nhoff < ETH_HLEN?
> I don't think it's only about the tests. BPF program can return
> nhoff/thoff out of range as well (if there was some bug in its logic,
> for example). We should not blindly trust whatever it returns, right?

Definitely. That's why we clamp. I'm not sure that we have to restrict
the minimum offset to initial nhoff, however.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode
  2019-03-20 19:23               ` Willem de Bruijn
@ 2019-03-20 19:48                 ` Stanislav Fomichev
  2019-03-20 20:03                   ` Willem de Bruijn
  0 siblings, 1 reply; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-20 19:48 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Stanislav Fomichev, Network Development, bpf, David Miller,
	Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn, Petar Penkov

On 03/20, Willem de Bruijn wrote:
> On Wed, Mar 20, 2019 at 3:19 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 03/20, Willem de Bruijn wrote:
> > > On Wed, Mar 20, 2019 at 3:02 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > >
> > > > On 03/20, Willem de Bruijn wrote:
> > > > > On Wed, Mar 20, 2019 at 12:57 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > > > >
> > > > > > On 03/19, Willem de Bruijn wrote:
> > > > > > > On Tue, Mar 19, 2019 at 6:21 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > > > >
> > > > > > > > Now that we have __flow_bpf_dissect, which works on raw data (by
> > > > > > > > constructing a temporary per-cpu skb), use it when doing
> > > > > > > > BPF_PROG_TEST_RUN for the flow dissector.
> > > > > > > >
> > > > > > > > This should help us catch any possible bugs due to missing shinfo on
> > > > > > > > the per-cpu skb.
> > > > > > > >
> > > > > > > > Note that the existing __skb_flow_bpf_dissect swallows L2 headers and returns
> > > > > > > > nhoff=0; we need to preserve the existing behavior.
> > > > > > > >
> > > > > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > > > > ---
> > > > > > > >  net/bpf/test_run.c | 48 ++++++++++++++--------------------------------
> > > > > > > >  1 file changed, 14 insertions(+), 34 deletions(-)
> > > > > > > >
> > > > > > >
> > > > > > > > @@ -300,9 +277,13 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> > > > > > > >         preempt_disable();
> > > > > > > >         time_start = ktime_get_ns();
> > > > > > > >         for (i = 0; i < repeat; i++) {
> > > > > > > > -               retval = bpf_flow_dissect_skb(prog, skb,
> > > > > > > > -                                             &flow_keys_dissector,
> > > > > > > > -                                             &flow_keys);
> > > > > > > > +               retval = bpf_flow_dissect(prog, data, eth->h_proto, ETH_HLEN,
> > > > > > > > +                                         size, &flow_keys_dissector,
> > > > > > > > +                                         &flow_keys);
> > > > > > > > +               if (flow_keys.nhoff >= ETH_HLEN)
> > > > > > > > +                       flow_keys.nhoff -= ETH_HLEN;
> > > > > > > > +               if (flow_keys.thoff >= ETH_HLEN)
> > > > > > > > +                       flow_keys.thoff -= ETH_HLEN;
> > > > > > >
> > > > > > > why are these conditional?
> > > > > > Hm, I didn't want these to be negative, because bpf flow program can set
> > > > > > them to zero and clamp_flow_keys makes sure they are in a "sensible"
> > > > > > range. For this particular case, I think we need to amend
> > > > > > clamp_flow_keys to make sure that flow_keys.nhoff is in the range of
> > > > > > initial_nhoff..hlen, not 0..hlen (and then we can drop these checks).
> > > > >
> > > > > So, previously eth_type_trans would call with data at the network
> > > > > header. Now it is called with data at the link layer. How would
> > > > > __skb_flow_bpf_dissect "swallows L2 headers and returns nhoff=0"? That
> > > > s/__skb_flow_bpf_dissect/eth_type_trans/, I'll clarify that in the patch
> > > > description.
> > > >
> > > > > sounds incorrect.
> > > > Previously, for skb case, eth_type_trans would pull ETH_HLEN (L2) and
> > > > after that we did skb_reset_network_header. So when later we initialized
> > > > flow keys (flow_keys->nhoff = skb_network_offset(skb)), that would
> > > > yield nhoff == 0.
> > > >
> > > > For example, see:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> > > >
> > > > Now, we explicitly call bpf_flow_dissect with nhoff=ETH_HLEN and have to
> > > > undo it, otherwise, it breaks those tests.
> > > >
> > > > We could do something like the following instead:
> > > > retval = bpf_flow_dissect(prog, data + ETH_HLEN, eth->h_proto, 0,
> > > >                           size, &flow_keys_dissector,
> > > >                           &flow_keys);
> > > >
> > > > But I wanted to make sure nhoff != 0 works.
> > >
> > > Makes sense. Ensuring that nhoff lies within initial_nhoff..hlen
> > > sounds correct to me. But this is a limitation of the test, so should
> > > be in the test logic, not in the generic clamp code. Perhaps just fail
> > > the test if returned nhoff < ETH_HLEN?
> > I don't think it's only about the tests. BPF program can return
> > nhoff/thoff out of range as well (if there was some bug in its logic,
> > for example). We should not blindly trust whatever it returns, right?
> 
> Definitely. That's why we clamp. I'm not sure that we have to restrict
> the minimum offset to initial nhoff, however.
Makes sense. TBH, only the tests currently care about nhoff that flow
dissector returns. In the kernel we use only thoff from bpf flow dissector
and ignore any modifications to the nhoff.

Do you think there is a use case for nhoff possibly going backwards?
In other words, why not prohibit that from the beginning and set the
expectations straight (i.e. nhoff only grows)?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode
  2019-03-20 19:48                 ` Stanislav Fomichev
@ 2019-03-20 20:03                   ` Willem de Bruijn
  0 siblings, 0 replies; 29+ messages in thread
From: Willem de Bruijn @ 2019-03-20 20:03 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Stanislav Fomichev, Network Development, bpf, David Miller,
	Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn, Petar Penkov

On Wed, Mar 20, 2019 at 3:48 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 03/20, Willem de Bruijn wrote:
> > On Wed, Mar 20, 2019 at 3:19 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > >
> > > On 03/20, Willem de Bruijn wrote:
> > > > On Wed, Mar 20, 2019 at 3:02 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > > >
> > > > > On 03/20, Willem de Bruijn wrote:
> > > > > > On Wed, Mar 20, 2019 at 12:57 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > > > > >
> > > > > > > On 03/19, Willem de Bruijn wrote:
> > > > > > > > On Tue, Mar 19, 2019 at 6:21 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > > > > >
> > > > > > > > > Now that we have __flow_bpf_dissect, which works on raw data (by
> > > > > > > > > constructing a temporary per-cpu skb), use it when doing
> > > > > > > > > BPF_PROG_TEST_RUN for the flow dissector.
> > > > > > > > >
> > > > > > > > > This should help us catch any possible bugs due to missing shinfo on
> > > > > > > > > the per-cpu skb.
> > > > > > > > >
> > > > > > > > > Note that the existing __skb_flow_bpf_dissect swallows L2 headers and returns
> > > > > > > > > nhoff=0; we need to preserve the existing behavior.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > > > > > ---
> > > > > > > > >  net/bpf/test_run.c | 48 ++++++++++++++--------------------------------
> > > > > > > > >  1 file changed, 14 insertions(+), 34 deletions(-)
> > > > > > > > >
> > > > > > > >
> > > > > > > > > @@ -300,9 +277,13 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> > > > > > > > >         preempt_disable();
> > > > > > > > >         time_start = ktime_get_ns();
> > > > > > > > >         for (i = 0; i < repeat; i++) {
> > > > > > > > > -               retval = bpf_flow_dissect_skb(prog, skb,
> > > > > > > > > -                                             &flow_keys_dissector,
> > > > > > > > > -                                             &flow_keys);
> > > > > > > > > +               retval = bpf_flow_dissect(prog, data, eth->h_proto, ETH_HLEN,
> > > > > > > > > +                                         size, &flow_keys_dissector,
> > > > > > > > > +                                         &flow_keys);
> > > > > > > > > +               if (flow_keys.nhoff >= ETH_HLEN)
> > > > > > > > > +                       flow_keys.nhoff -= ETH_HLEN;
> > > > > > > > > +               if (flow_keys.thoff >= ETH_HLEN)
> > > > > > > > > +                       flow_keys.thoff -= ETH_HLEN;
> > > > > > > >
> > > > > > > > why are these conditional?
> > > > > > > Hm, I didn't want these to be negative, because bpf flow program can set
> > > > > > > them to zero and clamp_flow_keys makes sure they are in a "sensible"
> > > > > > > range. For this particular case, I think we need to amend
> > > > > > > clamp_flow_keys to make sure that flow_keys.nhoff is in the range of
> > > > > > > initial_nhoff..hlen, not 0..hlen (and then we can drop these checks).
> > > > > >
> > > > > > So, previously eth_type_trans would call with data at the network
> > > > > > header. Now it is called with data at the link layer. How would
> > > > > > __skb_flow_bpf_dissect "swallows L2 headers and returns nhoff=0"? That
> > > > > s/__skb_flow_bpf_dissect/eth_type_trans/, I'll clarify that in the patch
> > > > > description.
> > > > >
> > > > > > sounds incorrect.
> > > > > Previously, for skb case, eth_type_trans would pull ETH_HLEN (L2) and
> > > > > after that we did skb_reset_network_header. So when later we initialized
> > > > > flow keys (flow_keys->nhoff = skb_network_offset(skb)), that would
> > > > > yield nhoff == 0.
> > > > >
> > > > > For example, see:
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> > > > >
> > > > > Now, we explicitly call bpf_flow_dissect with nhoff=ETH_HLEN and have to
> > > > > undo it, otherwise, it breaks those tests.
> > > > >
> > > > > We could do something like the following instead:
> > > > > retval = bpf_flow_dissect(prog, data + ETH_HLEN, eth->h_proto, 0,
> > > > >                           size, &flow_keys_dissector,
> > > > >                           &flow_keys);
> > > > >
> > > > > But I wanted to make sure nhoff != 0 works.
> > > >
> > > > Makes sense. Ensuring that nhoff lies within initial_nhoff..hlen
> > > > sounds correct to me. But this is a limitation of the test, so should
> > > > be in the test logic, not in the generic clamp code. Perhaps just fail
> > > > the test if returned nhoff < ETH_HLEN?
> > > I don't think it's only about the tests. BPF program can return
> > > nhoff/thoff out of range as well (if there was some bug in its logic,
> > > for example). We should not blindly trust whatever it returns, right?
> >
> > Definitely. That's why we clamp. I'm not sure that we have to restrict
> > the minimum offset to initial nhoff, however.
> Makes sense. TBH, only the tests currently care about nhoff that flow
> dissector returns. In the kernel we use only thoff from bpf flow dissector
> and ignore any modifications to the nhoff.
>
> Do you think there is a use case for nhoff possibly going backwards?
> In other words, why not prohibit that from the beginning and set the
> expectations straight (i.e. nhoff only grows)?

Fair point. That is how the non-bpf flow dissector works. And if the
initial offset is always sensible, indeed I see no reasonable case
where the program would return a lower value. I was a bit concerned
about that precondition. In practice, all but one caller passes 0 and
data at the network header, where this discussion is moot. And the
exception is eth_get_headlen, which hardcodes ETH_HLEN. Given that,
your original suggestion to adjust the clamp function SGTM.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 1/9] net: introduce __init_skb{,_data,_shinfo} helpers
  2019-03-19 22:19 ` [RFC bpf-next v2 1/9] net: introduce __init_skb{,_data,_shinfo} helpers Stanislav Fomichev
@ 2019-03-21  3:39   ` Alexei Starovoitov
  2019-03-21  4:44     ` Eric Dumazet
  0 siblings, 1 reply; 29+ messages in thread
From: Alexei Starovoitov @ 2019-03-21  3:39 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, bpf, davem, ast, daniel, simon.horman, willemb, peterpenkov96

On Tue, Mar 19, 2019 at 03:19:40PM -0700, Stanislav Fomichev wrote:
> __init_skb is essentially a version of __build_skb which accepts skb as
> an argument (instead of doing kmem_cache_alloc to allocate it).
> 
> __init_skb_shinfo initializes shinfo.
> 
> __init_skb_data initializes skb data pointers.
> 
> No functional changes.
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  include/linux/skbuff.h | 14 ++++++++++
>  net/core/skbuff.c      | 60 +++++++++++++++++-------------------------
>  2 files changed, 38 insertions(+), 36 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 9027a8c4219f..e8c1d5b97f96 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -4399,5 +4399,19 @@ static inline __wsum lco_csum(struct sk_buff *skb)
>  	return csum_partial(l4_hdr, csum_start - l4_hdr, partial);
>  }
>  
> +static inline void __init_skb_data(struct sk_buff *skb, u8 *data,
> +				   unsigned int size)
> +{
> +	/* Account for allocated memory : skb + skb->head */
> +	skb->truesize = SKB_TRUESIZE(size);
> +	refcount_set(&skb->users, 1);
> +	skb->head = data;
> +	skb->data = data;
> +	skb_reset_tail_pointer(skb);
> +	skb->end = skb->tail + size;
> +	skb->mac_header = (typeof(skb->mac_header))~0U;
> +	skb->transport_header = (typeof(skb->transport_header))~0U;
> +}
> +
>  #endif	/* __KERNEL__ */
>  #endif	/* _LINUX_SKBUFF_H */
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 2415d9cb9b89..b413354ee709 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -160,6 +160,26 @@ static void *__kmalloc_reserve(size_t size, gfp_t flags, int node,
>   *
>   */
>  
> +static inline void __init_skb(struct sk_buff *skb, u8 *data, unsigned int size)
> +{
> +	/* Only clear those fields we need to clear, not those that we will
> +	 * actually initialize below. Hence, don't put any more fields after
> +	 * the tail pointer in struct sk_buff!
> +	 */
> +	memset(skb, 0, offsetof(struct sk_buff, tail));
> +	__init_skb_data(skb, data, size);
> +}
> +
> +static inline void __init_skb_shinfo(struct sk_buff *skb)
> +{
> +	struct skb_shared_info *shinfo;
> +
> +	/* make sure we initialize shinfo sequentially */
> +	shinfo = skb_shinfo(skb);
> +	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
> +	atomic_set(&shinfo->dataref, 1);
> +}
> +
>  /**
>   *	__alloc_skb	-	allocate a network buffer
>   *	@size: size to allocate
> @@ -181,7 +201,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  			    int flags, int node)
>  {
>  	struct kmem_cache *cache;
> -	struct skb_shared_info *shinfo;
>  	struct sk_buff *skb;
>  	u8 *data;
>  	bool pfmemalloc;
> @@ -215,27 +234,9 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  	size = SKB_WITH_OVERHEAD(ksize(data));
>  	prefetchw(data + size);
>  
> -	/*
> -	 * Only clear those fields we need to clear, not those that we will
> -	 * actually initialise below. Hence, don't put any more fields after
> -	 * the tail pointer in struct sk_buff!
> -	 */
> -	memset(skb, 0, offsetof(struct sk_buff, tail));
> -	/* Account for allocated memory : skb + skb->head */
> -	skb->truesize = SKB_TRUESIZE(size);
> +	__init_skb(skb, data, size);
> +	__init_skb_shinfo(skb);
>  	skb->pfmemalloc = pfmemalloc;
> -	refcount_set(&skb->users, 1);
> -	skb->head = data;
> -	skb->data = data;
> -	skb_reset_tail_pointer(skb);
> -	skb->end = skb->tail + size;
> -	skb->mac_header = (typeof(skb->mac_header))~0U;
> -	skb->transport_header = (typeof(skb->transport_header))~0U;
> -
> -	/* make sure we initialize shinfo sequentially */
> -	shinfo = skb_shinfo(skb);
> -	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
> -	atomic_set(&shinfo->dataref, 1);
>  
>  	if (flags & SKB_ALLOC_FCLONE) {
>  		struct sk_buff_fclones *fclones;
> @@ -277,7 +278,6 @@ EXPORT_SYMBOL(__alloc_skb);
>   */
>  struct sk_buff *__build_skb(void *data, unsigned int frag_size)
>  {
> -	struct skb_shared_info *shinfo;
>  	struct sk_buff *skb;
>  	unsigned int size = frag_size ? : ksize(data);
>  
> @@ -287,20 +287,8 @@ struct sk_buff *__build_skb(void *data, unsigned int frag_size)
>  
>  	size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  
> -	memset(skb, 0, offsetof(struct sk_buff, tail));
> -	skb->truesize = SKB_TRUESIZE(size);
> -	refcount_set(&skb->users, 1);
> -	skb->head = data;
> -	skb->data = data;
> -	skb_reset_tail_pointer(skb);
> -	skb->end = skb->tail + size;
> -	skb->mac_header = (typeof(skb->mac_header))~0U;
> -	skb->transport_header = (typeof(skb->transport_header))~0U;
> -
> -	/* make sure we initialize shinfo sequentially */
> -	shinfo = skb_shinfo(skb);
> -	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
> -	atomic_set(&shinfo->dataref, 1);
> +	__init_skb(skb, data, size);
> +	__init_skb_shinfo(skb);

I think you need to convince Dave and Eric that
the above surgery is necessary to do the hack in patch 6 with
+static DEFINE_PER_CPU(struct sk_buff, bpf_flow_skb);

I think the better option is to introduce a new prog type that works
without skb. I think it can be pretty close in shape and form to xdp.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 1/9] net: introduce __init_skb{,_data,_shinfo} helpers
  2019-03-21  3:39   ` Alexei Starovoitov
@ 2019-03-21  4:44     ` Eric Dumazet
  2019-03-21 13:58       ` Willem de Bruijn
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2019-03-21  4:44 UTC (permalink / raw)
  To: Alexei Starovoitov, Stanislav Fomichev
  Cc: netdev, bpf, davem, ast, daniel, simon.horman, willemb, peterpenkov96



On 03/20/2019 08:39 PM, Alexei Starovoitov wrote:

> I think you need to convince Dave and Eric that
> the above surgery is necessary to do the hack in patch 6 with
> +static DEFINE_PER_CPU(struct sk_buff, bpf_flow_skb);
> 

Yes, this is a huge code churn.

Honestly I believe we are going too far in this series.

> I think the better option is to introduce a new prog type that works
> without skb. I think it can be pretty close in shape and form to xdp.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 1/9] net: introduce __init_skb{,_data,_shinfo} helpers
  2019-03-21  4:44     ` Eric Dumazet
@ 2019-03-21 13:58       ` Willem de Bruijn
  2019-03-21 15:44         ` Stanislav Fomichev
  0 siblings, 1 reply; 29+ messages in thread
From: Willem de Bruijn @ 2019-03-21 13:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, Stanislav Fomichev, Network Development, bpf,
	David Miller, Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn, Petar Penkov

On Thu, Mar 21, 2019 at 12:45 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 03/20/2019 08:39 PM, Alexei Starovoitov wrote:
>
> > I think you need to convince Dave and Eric that
> > the above surgery is necessary to do the hack in patch 6 with
> > +static DEFINE_PER_CPU(struct sk_buff, bpf_flow_skb);
> >
>
> Yes, this is a huge code churn.

It touches a lot of lines. But it deduplicates logic that might become
out of sync if we don't do that now.

> Honestly I believe we are going too far in this series.
>
> > I think the better option is to introduce a new prog type that works
> > without skb. I think it can be pretty close in shape and form to xdp.
> >

That's an interesting alternative. The existing sample dissector
does not access any skb fields aside from vlan. The interface could
be superseded with one that has a more constrained context, like xdp.

When parsing for headlen in the device driver rx path, a program could
conceivably in the same pass also compute an rxhash in case the
device did not supply an L4 one. And perhaps some hash (the same?)
to speed up GRO flow lookup. This would require a few extra fields in
bpf_flow_keys.
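
For concreteness, such a context could be as small as (hypothetical,
names made up):

struct bpf_flow_md {
	__u32 data;
	__u32 data_end;
	__bpf_md_ptr(struct bpf_flow_keys *, flow_keys);
};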

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 1/9] net: introduce __init_skb{,_data,_shinfo} helpers
  2019-03-21 13:58       ` Willem de Bruijn
@ 2019-03-21 15:44         ` Stanislav Fomichev
  2019-03-21 16:00           ` Alexei Starovoitov
  0 siblings, 1 reply; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-21 15:44 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Eric Dumazet, Alexei Starovoitov, Stanislav Fomichev,
	Network Development, bpf, David Miller, Alexei Starovoitov,
	Daniel Borkmann, simon.horman, Willem de Bruijn, Petar Penkov

On 03/21, Willem de Bruijn wrote:
> On Thu, Mar 21, 2019 at 12:45 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> >
> >
> > On 03/20/2019 08:39 PM, Alexei Starovoitov wrote:
> >
> > > I think you need to convince Dave and Eric that
> > > the above surgery is necessary to do the hack in patch 6 with
> > > +static DEFINE_PER_CPU(struct sk_buff, bpf_flow_skb);
> > >
> >
> > Yes, this is a huge code churn.
> 
> It touches a lot of lines. But it deduplicates logic that might become
> out of sync if we don't do that now.
Yeah, I remove a bunch of copy-paste with this change.

> > Honestly I believe we are going too far in this series.
> >
> > > I think the better option is to introduce a new prog type that works
> > > without skb. I think it can be pretty close in shape and form to xdp.
> > >
> 
> That's an interesting alternative. The existing sample dissector
> does not access any skb fields aside from vlan. The interface could
> be superseded with one that has a more constrained context, like xdp.
> 
> When parsing for headlen in the device driver rx path, a program could
> conceivably in the same pass also compute an rxhash in case the
> device did not supply an L4 one. And perhaps some hash (the same?)
> to speed up GRO flow lookup. This would require a few extra fields in
> bpf_flow_keys.
With the xdp-like interface, what would be the end goal? Two different
program types, one triggering from eth_get_headlen, the other from the
existing __skb_flow_dissect? Or we could use the xdp-like interface everywhere
and just pass the first fragment in __skb_flow_dissect (that should be
enough, in theory, to get everything we want)?

If we end up with two interfaces, that can be confusing to users. Which
one to use? Why? Why write essentially the same program twice?

If we can agree that we switch everything to xdp-like, do we deprecate the
skb-one?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 1/9] net: introduce __init_skb{,_data,_shinfo} helpers
  2019-03-21 15:44         ` Stanislav Fomichev
@ 2019-03-21 16:00           ` Alexei Starovoitov
  2019-03-21 16:13             ` Willem de Bruijn
  0 siblings, 1 reply; 29+ messages in thread
From: Alexei Starovoitov @ 2019-03-21 16:00 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Willem de Bruijn, Eric Dumazet, Stanislav Fomichev,
	Network Development, bpf, David Miller, Alexei Starovoitov,
	Daniel Borkmann, simon.horman, Willem de Bruijn, Petar Penkov

On Thu, Mar 21, 2019 at 08:44:33AM -0700, Stanislav Fomichev wrote:
> 
> If we can agree that we switch everything to xdp-like, do we deprecate the
> skb-one?

This whole discussion that has been going on for a long time is an indication
that the initial bpf flow dissector concept was not thought through,
and I regret merging it in the first place.
Adding more hacks on top of it with fake skbs is not going to make it any better.
Since it's been in an official release, we cannot remove it now.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 1/9] net: introduce __init_skb{,_data,_shinfo} helpers
  2019-03-21 16:00           ` Alexei Starovoitov
@ 2019-03-21 16:13             ` Willem de Bruijn
  2019-03-21 20:56               ` Alexei Starovoitov
  0 siblings, 1 reply; 29+ messages in thread
From: Willem de Bruijn @ 2019-03-21 16:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Stanislav Fomichev, Eric Dumazet, Stanislav Fomichev,
	Network Development, bpf, David Miller, Alexei Starovoitov,
	Daniel Borkmann, simon.horman, Willem de Bruijn, Petar Penkov

On Thu, Mar 21, 2019 at 12:01 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Mar 21, 2019 at 08:44:33AM -0700, Stanislav Fomichev wrote:
> >
> > If we can agree that we switch everything to xdp-like, do we deprecate the
> > skb-one?
>
> This whole discussion that has been going on for a long time is an indication
> that the initial bpf flow dissector concept was not thought through,
> and I regret merging it in the first place.
> Adding more hacks on top of it with fake skbs is not going to make it any better.
> Since it's been in an official release, we cannot remove it now.

This patch set addresses the only open issue.

That said, if direction is towards an alternative interface, then it would
make sense for the new interface to supplant the existing one for all
use-cases, even if that existing one cannot be removed.

Essentially a BPF_PROG_TYPE_FLOW_DISSECTOR_RAW that
takes a simpler context than skb. And either that or a program of
type BPF_PROG_TYPE_FLOW_DISSECTOR can be attached in
skb_flow_dissector_bpf_prog_attach, but not both.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 1/9] net: introduce __init_skb{,_data,_shinfo} helpers
  2019-03-21 16:13             ` Willem de Bruijn
@ 2019-03-21 20:56               ` Alexei Starovoitov
  2019-03-21 21:13                 ` Stanislav Fomichev
  0 siblings, 1 reply; 29+ messages in thread
From: Alexei Starovoitov @ 2019-03-21 20:56 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Stanislav Fomichev, Eric Dumazet, Stanislav Fomichev,
	Network Development, bpf, David Miller, Alexei Starovoitov,
	Daniel Borkmann, Simon Horman, Willem de Bruijn, Petar Penkov

On Thu, Mar 21, 2019 at 9:13 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Thu, Mar 21, 2019 at 12:01 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Mar 21, 2019 at 08:44:33AM -0700, Stanislav Fomichev wrote:
> > >
> > > If we can agree that we switch everything to xdp-like, do we deprecate the
> > > skb-one?
> >
> > This whole discussion that has been going on for a long time is an indication
> > that the initial bpf flow dissector concept was not thought through,
> > and I regret merging it in the first place.
> > Adding more hacks on top of it with fake skbs is not going to make it any better.
> > Since it's been in an official release, we cannot remove it now.
>
> This patch set addresses the only open issue.
>
> That said, if direction is towards an alternative interface, then it would
> make sense for the new interface to supplant the existing one for all
> use-cases, even if that existing one cannot be removed.
>
> Essentially a BPF_PROG_TYPE_FLOW_DISSECTOR_RAW that
> takes a simpler context than skb. And either that or a program of
> type BPF_PROG_TYPE_FLOW_DISSECTOR can be attached in
> skb_flow_dissector_bpf_prog_attach, but not both.

another idea is to keep 'struct __sk_buff' as a context,
but have the kernel side be something like struct xdp_buff
and disallow access to fields of __sk_buff depending on
attach_type,
and do a different ctx rewrite for __sk_buff->foo
depending on attach_type.
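
e.g. in flow_dissector_is_valid_access, something like (sketch; the
attach type name is made up):

	if (prog->expected_attach_type == BPF_FLOW_DISSECTOR_NO_SKB) {
		switch (off) {
		case bpf_ctx_range(struct __sk_buff, data):
		case bpf_ctx_range(struct __sk_buff, data_end):
		case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
			break;
		default:
			return false;
		}
	}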

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next v2 1/9] net: introduce __init_skb{,_data,_shinfo} helpers
  2019-03-21 20:56               ` Alexei Starovoitov
@ 2019-03-21 21:13                 ` Stanislav Fomichev
  0 siblings, 0 replies; 29+ messages in thread
From: Stanislav Fomichev @ 2019-03-21 21:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Willem de Bruijn, Eric Dumazet, Stanislav Fomichev,
	Network Development, bpf, David Miller, Alexei Starovoitov,
	Daniel Borkmann, Simon Horman, Willem de Bruijn, Petar Penkov

On 03/21, Alexei Starovoitov wrote:
> On Thu, Mar 21, 2019 at 9:13 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Thu, Mar 21, 2019 at 12:01 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Mar 21, 2019 at 08:44:33AM -0700, Stanislav Fomichev wrote:
> > > >
> > > > If we can agree that we switch everything to xdp-like, do we deprecate the
> > > > skb-one?
> > >
> > > This whole discussion that has been going on for a long time is an indication
> > > that the initial bpf flow dissector concept was not thought through,
> > > and I regret merging it in the first place.
> > > Adding more hacks on top of it with fake skbs is not going to make it any better.
> > > Since it's been in an official release, we cannot remove it now.
> >
> > This patch set addresses the only open issue.
> >
> > That said, if direction is towards an alternative interface, then it would
> > make sense for the new interface to supplant the existing one for all
> > use-cases, even if that existing one cannot be removed.
> >
> > Essentially a BPF_PROG_TYPE_FLOW_DISSECTOR_RAW that
> > takes a simpler context than skb. And either that or a program of
> > type BPF_PROG_TYPE_FLOW_DISSECTOR can be attached in
> > skb_flow_dissector_bpf_prog_attach, but not both.
> 
> another idea is to keep 'struct __sk_buff' as a context,
> but have the kernel side be something like struct xdp_buff
> and disallow access to fields of __sk_buff depending on
> attach_type,
> and do a different ctx rewrite for __sk_buff->foo
> depending on attach_type.
That's a great idea! The only problem is that in this case, I guess, we
need another attach_type and need to ask users to explicitly attach to it.

What if we go a bit further: always have an xdp_buff-like struct in the kernel?
For the real skb it would have data..data_end point to the linear packet data
(or the first fragment). We would always prohibit access to most __sk_buff
fields (like in patch #5 from this series). We'd have to write our
own bpf_skb_load_bytes, but that doesn't sound like a big issue.
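
Something like this for the linear-only case (minimal sketch, the name
is hypothetical):

static int bpf_flow_load_bytes(const void *data, u32 data_len,
			       u32 offset, void *to, u32 len)
{
	if (offset > data_len || len > data_len - offset)
		return -EFAULT;

	memcpy(to, data + offset, len);
	return 0;
}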

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2019-03-21 21:13 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-19 22:19 [RFC bpf-next v2 0/9] net: flow_dissector: trigger BPF hook when called from eth_get_headlen Stanislav Fomichev
2019-03-19 22:19 ` [RFC bpf-next v2 1/9] net: introduce __init_skb{,_data,_shinfo} helpers Stanislav Fomichev
2019-03-21  3:39   ` Alexei Starovoitov
2019-03-21  4:44     ` Eric Dumazet
2019-03-21 13:58       ` Willem de Bruijn
2019-03-21 15:44         ` Stanislav Fomichev
2019-03-21 16:00           ` Alexei Starovoitov
2019-03-21 16:13             ` Willem de Bruijn
2019-03-21 20:56               ` Alexei Starovoitov
2019-03-21 21:13                 ` Stanislav Fomichev
2019-03-19 22:19 ` [RFC bpf-next v2 2/9] net: introduce skb_net helper Stanislav Fomichev
2019-03-20  2:14   ` Willem de Bruijn
2019-03-20 16:49     ` Stanislav Fomichev
2019-03-19 22:19 ` [RFC bpf-next v2 3/9] net: plumb network namespace into __skb_flow_dissect Stanislav Fomichev
2019-03-19 22:19 ` [RFC bpf-next v2 4/9] net: flow_dissector: prepare for no-skb use case Stanislav Fomichev
2019-03-19 22:19 ` [RFC bpf-next v2 5/9] flow_dissector: allow access only to a subset of __sk_buff fields Stanislav Fomichev
2019-03-19 22:19 ` [RFC bpf-next v2 6/9] net: flow_dissector: handle no-skb use case Stanislav Fomichev
2019-03-19 22:19 ` [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode Stanislav Fomichev
2019-03-20  2:14   ` Willem de Bruijn
2019-03-20 16:57     ` Stanislav Fomichev
2019-03-20 18:29       ` Willem de Bruijn
2019-03-20 19:02         ` Stanislav Fomichev
2019-03-20 19:08           ` Willem de Bruijn
2019-03-20 19:19             ` Stanislav Fomichev
2019-03-20 19:23               ` Willem de Bruijn
2019-03-20 19:48                 ` Stanislav Fomichev
2019-03-20 20:03                   ` Willem de Bruijn
2019-03-19 22:19 ` [RFC bpf-next v2 8/9] selftests/bpf: add flow dissector bpf_skb_load_bytes helper test Stanislav Fomichev
2019-03-19 22:19 ` [RFC bpf-next v2 9/9] net: flow_dissector: pass net argument to the eth_get_headlen Stanislav Fomichev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).