netdev.vger.kernel.org archive mirror
* [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload
@ 2017-08-29 23:27 Tom Herbert
  2017-08-29 23:27 ` [PATCH v2 net-next 1/6] flow_dissector: Move ETH_P_TEB processing to main switch Tom Herbert
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Tom Herbert @ 2017-08-29 23:27 UTC
  To: davem; +Cc: netdev, Tom Herbert

This patch set adds a new offload type to perform flow dissection for
specific protocols (either by EtherType or by IP protocol). This is
primarily useful to crack open UDP encapsulations (like VXLAN, GUE) for
the purposes of parsing the encapsulated packet.
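
To make "cracking open" an encapsulation concrete, here is a minimal
userspace sketch (not kernel code) that computes where the inner
Ethernet frame starts in a VXLAN-over-IPv4 packet. The function name
vxlan_inner_offset and the constants are hypothetical, and the sketch
assumes an untagged outer Ethernet header; the 8-byte UDP and VXLAN
header sizes are the standard ones.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed header sizes in bytes. */
enum {
	ETH_HDR_LEN = 14,	/* outer Ethernet, no VLAN tag (assumption) */
	UDP_HDR_LEN = 8,
	VXLAN_HDR_LEN = 8,
};

/* Offset of the inner Ethernet frame in a VXLAN-over-IPv4 packet,
 * or -1 if the packet is too short to contain any inner bytes.
 * Hypothetical helper for illustration only.
 */
static int vxlan_inner_offset(size_t pkt_len, size_t ipv4_hdr_len)
{
	size_t off;

	/* An IPv4 header is 20 bytes without options and at most 60 with. */
	assert(ipv4_hdr_len >= 20 && ipv4_hdr_len <= 60);

	off = ETH_HDR_LEN + ipv4_hdr_len + UDP_HDR_LEN + VXLAN_HDR_LEN;
	return pkt_len > off ? (int)off : -1;
}
```

A dissector that stops at the outer UDP header never looks past that
offset, so it never sees the inner headers that this series wants to
reach.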

Items in this patch set:
- Constify skb argument to UDP lookup functions
- Create new protocol case in __skb_flow_dissect for ETH_P_TEB. This is
  based on the code in the GRE dissect function, and the special
  handling in GRE can now be removed (it sets protocol to ETH_P_TEB and
  returns so that goto proto_again is done)
- Add infrastructure for protocol specific flow dissection offload
- Add infrastructure to perform UDP flow dissection. Uses the same
  model as GRO, where a flow_dissect callback can be associated with a
  UDP socket
- Use the infrastructure to support flow dissection of VXLAN and GUE

Tested:

Forced RPS to call flow dissection for VXLAN, FOU, and GUE. Observed
that inner packet was being properly dissected.

v2: Add Signed-off-by

Tom Herbert (6):
  flow_dissector: Move ETH_P_TEB processing to main switch
  udp: Constify skb argument in lookup functions
  flow_dissector: Add protocol specific flow dissection offload
  udp: flow dissector offload
  fou: Support flow dissection
  vxlan: support flow dissect

 drivers/net/vxlan.c          |  50 ++++++++++++
 include/linux/netdevice.h    |   7 ++
 include/linux/udp.h          |   8 ++
 include/net/flow_dissector.h |   9 +++
 include/net/ip.h             |   2 +-
 include/net/sock_reuseport.h |   2 +-
 include/net/udp.h            |  19 +++--
 include/net/udp_tunnel.h     |   8 ++
 net/core/dev.c               |  14 ++++
 net/core/flow_dissector.c    | 176 +++++++++++++++++++++++++++++--------------
 net/core/sock_reuseport.c    |   5 +-
 net/ipv4/fou.c               |  63 ++++++++++++++++
 net/ipv4/route.c             |   4 +-
 net/ipv4/udp.c               |  11 +--
 net/ipv4/udp_offload.c       |  45 +++++++++++
 net/ipv4/udp_tunnel.c        |   1 +
 net/ipv6/udp.c               |  10 +--
 net/ipv6/udp_offload.c       |  13 ++++
 18 files changed, 369 insertions(+), 78 deletions(-)

-- 
2.11.0


* [PATCH v2 net-next 1/6] flow_dissector: Move ETH_P_TEB processing to main switch
  2017-08-29 23:27 [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload Tom Herbert
@ 2017-08-29 23:27 ` Tom Herbert
  2017-08-29 23:27 ` [PATCH v2 net-next 2/6] udp: Constify skb argument in lookup functions Tom Herbert
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Tom Herbert @ 2017-08-29 23:27 UTC
  To: davem; +Cc: netdev, Tom Herbert

Support for processing TEB is currently in GRE flow dissection as a
special case. This can be moved to be a case in the main proto switch
in __skb_flow_dissect.

Signed-off-by: Tom Herbert <tom@quantonium.net>
---
 net/core/flow_dissector.c | 44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)
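
For readers following along, the ETH_P_TEB case reduces to one step:
read the EtherType of the encapsulated Ethernet header, skip the
header, and dispatch again on the new EtherType. A minimal userspace
sketch of that step follows. The name teb_step and its signature are
hypothetical, and the NET_IP_ALIGN capping of hlen is deliberately
left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN 14	/* dst MAC (6) + src MAC (6) + EtherType (2) */

/* Hypothetical userspace stand-in for the ETH_P_TEB case: read the
 * EtherType of the Ethernet header at *nhoff, advance *nhoff past the
 * header, and return the EtherType in host byte order. Returns -1 when
 * the header does not fit in the buffer (the "goto out_bad" case).
 */
static int teb_step(const uint8_t *data, size_t hlen, size_t *nhoff)
{
	assert(data && nhoff);

	if (*nhoff > hlen || hlen - *nhoff < ETH_HLEN)
		return -1;

	const uint8_t *eth = data + *nhoff;
	int proto = (eth[12] << 8) | eth[13];

	*nhoff += ETH_HLEN;
	return proto;	/* caller re-enters its protocol switch with this */
}
```

Because the step only needs an offset and a protocol, it has no
dependency on GRE, which is why it can live in the main switch.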

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index e2eaa1ff948d..12302acdb073 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -288,27 +288,8 @@ __skb_flow_dissect_gre(const struct sk_buff *skb,
 	if (hdr->flags & GRE_SEQ)
 		offset += sizeof(((struct pptp_gre_header *) 0)->seq);
 
-	if (gre_ver == 0) {
-		if (*p_proto == htons(ETH_P_TEB)) {
-			const struct ethhdr *eth;
-			struct ethhdr _eth;
-
-			eth = __skb_header_pointer(skb, *p_nhoff + offset,
-						   sizeof(_eth),
-						   data, *p_hlen, &_eth);
-			if (!eth)
-				return FLOW_DISSECT_RET_OUT_BAD;
-			*p_proto = eth->h_proto;
-			offset += sizeof(*eth);
-
-			/* Cap headers that we access via pointers at the
-			 * end of the Ethernet header as our maximum alignment
-			 * at that point is only 2 bytes.
-			 */
-			if (NET_IP_ALIGN)
-				*p_hlen = *p_nhoff + offset;
-		}
-	} else { /* version 1, must be PPTP */
+	/* version 1, must be PPTP */
+	if (gre_ver == 1) {
 		u8 _ppp_hdr[PPP_HDRLEN];
 		u8 *ppp_hdr;
 
@@ -573,6 +554,27 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 
 		break;
 	}
+	case htons(ETH_P_TEB): {
+		const struct ethhdr *eth;
+		struct ethhdr _eth;
+
+		eth = __skb_header_pointer(skb, nhoff, sizeof(_eth),
+					   data, hlen, &_eth);
+		if (!eth)
+			goto out_bad;
+
+		proto = eth->h_proto;
+		nhoff += sizeof(*eth);
+
+		/* Cap headers that we access via pointers at the
+		 * end of the Ethernet header as our maximum alignment
+		 * at that point is only 2 bytes.
+		 */
+		if (NET_IP_ALIGN)
+			hlen = nhoff;
+
+		goto proto_again;
+	}
 	case htons(ETH_P_8021AD):
 	case htons(ETH_P_8021Q): {
 		const struct vlan_hdr *vlan;
-- 
2.11.0


* [PATCH v2 net-next 2/6] udp: Constify skb argument in lookup functions
  2017-08-29 23:27 [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload Tom Herbert
  2017-08-29 23:27 ` [PATCH v2 net-next 1/6] flow_dissector: Move ETH_P_TEB processing to main switch Tom Herbert
@ 2017-08-29 23:27 ` Tom Herbert
  2017-08-30  0:58   ` David Miller
  2017-08-29 23:27 ` [PATCH v2 net-next 3/6] flow_dissector: Add protocol specific flow dissection offload Tom Herbert
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Tom Herbert @ 2017-08-29 23:27 UTC
  To: davem; +Cc: netdev, Tom Herbert

For UDP socket lookup functions, and associated functions that take an
skb as an argument, declare the skb argument as const.

One caveat is that reuseport_select_sock can be called from the UDP
lookup functions with an skb argument. This function temporarily
modifies the skb data pointer (in run_bpf via a pull/push sequence).
To resolve the compiler warning, I added a local skb declaration that
is not const and is initialized from the skb argument with an explicit
cast.

Signed-off-by: Tom Herbert <tom@quantonium.net>
---
 include/net/ip.h             |  2 +-
 include/net/sock_reuseport.h |  2 +-
 include/net/udp.h            | 11 ++++++-----
 net/core/sock_reuseport.c    |  5 +++--
 net/ipv4/udp.c               | 11 ++++++-----
 net/ipv6/udp.c               | 10 +++++-----
 6 files changed, 22 insertions(+), 19 deletions(-)
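
The const override described in the commit message follows a common C
pattern: accept a const pointer for the caller's benefit, cast it away
locally, and restore every field touched before returning. A minimal
userspace sketch is below. struct pkt and peek_past_header are
hypothetical stand-ins, not kernel types, and the pattern is only valid
when the object passed in was not itself defined const.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an skb: only the data offset matters here. */
struct pkt {
	size_t data_off;
};

/* Takes a const pointer so that const-correct callers can use it, but
 * temporarily moves data_off (pull) and restores it (push) before
 * returning. Returns the offset that was visible while pulled.
 */
static size_t peek_past_header(const struct pkt *_p, size_t hdr_len)
{
	struct pkt *p = (struct pkt *)_p;	/* override const */
	size_t seen;

	assert(p);

	p->data_off += hdr_len;		/* pull */
	seen = p->data_off;
	p->data_off -= hdr_len;		/* push: restore original state */

	return seen;
}
```

The cast keeps the const qualifier on the public prototypes at the cost
of a promise the compiler cannot check, namely that the object is
unchanged on return.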

diff --git a/include/net/ip.h b/include/net/ip.h
index 9896f46cbbf1..8c0d84ffc659 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -79,7 +79,7 @@ struct ipcm_cookie {
 #define PKTINFO_SKB_CB(skb) ((struct in_pktinfo *)((skb)->cb))
 
 /* return enslaved device index if relevant */
-static inline int inet_sdif(struct sk_buff *skb)
+static inline int inet_sdif(const struct sk_buff *skb)
 {
 #if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
 	if (skb && ipv4_l3mdev_skb(IPCB(skb)->flags))
diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h
index aecd30308d50..d25352a848d9 100644
--- a/include/net/sock_reuseport.h
+++ b/include/net/sock_reuseport.h
@@ -20,7 +20,7 @@ extern int reuseport_add_sock(struct sock *sk, struct sock *sk2);
 extern void reuseport_detach_sock(struct sock *sk);
 extern struct sock *reuseport_select_sock(struct sock *sk,
 					  u32 hash,
-					  struct sk_buff *skb,
+					  const struct sk_buff *skb,
 					  int hdr_len);
 extern struct bpf_prog *reuseport_attach_prog(struct sock *sk,
 					      struct bpf_prog *prog);
diff --git a/include/net/udp.h b/include/net/udp.h
index 4e5f23fec35e..f3d1de6f0983 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -167,7 +167,7 @@ static inline void udp_csum_pull_header(struct sk_buff *skb)
 	UDP_SKB_CB(skb)->cscov -= sizeof(struct udphdr);
 }
 
-typedef struct sock *(*udp_lookup_t)(struct sk_buff *skb, __be16 sport,
+typedef struct sock *(*udp_lookup_t)(const struct sk_buff *skb, __be16 sport,
 				     __be16 dport);
 
 struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
@@ -288,8 +288,9 @@ struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 			     __be32 daddr, __be16 dport, int dif);
 struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 			       __be32 daddr, __be16 dport, int dif, int sdif,
-			       struct udp_table *tbl, struct sk_buff *skb);
-struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
+			       struct udp_table *tbl,
+			       const struct sk_buff *skb);
+struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb,
 				 __be16 sport, __be16 dport);
 struct sock *udp6_lib_lookup(struct net *net,
 			     const struct in6_addr *saddr, __be16 sport,
@@ -299,8 +300,8 @@ struct sock *__udp6_lib_lookup(struct net *net,
 			       const struct in6_addr *saddr, __be16 sport,
 			       const struct in6_addr *daddr, __be16 dport,
 			       int dif, int sdif, struct udp_table *tbl,
-			       struct sk_buff *skb);
-struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
+			       const struct sk_buff *skb);
+struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
 				 __be16 sport, __be16 dport);
 
 /* UDP uses skb->dev_scratch to cache as much information as possible and avoid
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index eed1ebf7f29d..a17f13b33189 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -164,9 +164,10 @@ void reuseport_detach_sock(struct sock *sk)
 EXPORT_SYMBOL(reuseport_detach_sock);
 
 static struct sock *run_bpf(struct sock_reuseport *reuse, u16 socks,
-			    struct bpf_prog *prog, struct sk_buff *skb,
+			    struct bpf_prog *prog, const struct sk_buff *_skb,
 			    int hdr_len)
 {
+	struct sk_buff *skb = (struct sk_buff *)_skb; /* Override const */
 	struct sk_buff *nskb = NULL;
 	u32 index;
 
@@ -205,7 +206,7 @@ static struct sock *run_bpf(struct sock_reuseport *reuse, u16 socks,
  */
 struct sock *reuseport_select_sock(struct sock *sk,
 				   u32 hash,
-				   struct sk_buff *skb,
+				   const struct sk_buff *skb,
 				   int hdr_len)
 {
 	struct sock_reuseport *reuse;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index bf6c406bf5e7..a851026ef28b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -135,7 +135,8 @@ EXPORT_SYMBOL(udp_memory_allocated);
 #define PORTS_PER_CHAIN (MAX_UDP_PORTS / UDP_HTABLE_SIZE_MIN)
 
 /* IPCB reference means this can not be used from early demux */
-static bool udp_lib_exact_dif_match(struct net *net, struct sk_buff *skb)
+static bool udp_lib_exact_dif_match(struct net *net,
+				    const struct sk_buff *skb)
 {
 #if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
 	if (!net->ipv4.sysctl_udp_l3mdev_accept &&
@@ -445,7 +446,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 				     __be32 daddr, unsigned int hnum,
 				     int dif, int sdif, bool exact_dif,
 				     struct udp_hslot *hslot2,
-				     struct sk_buff *skb)
+				     const struct sk_buff *skb)
 {
 	struct sock *sk, *result;
 	int score, badness, matches = 0, reuseport = 0;
@@ -484,7 +485,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
  */
 struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 		__be16 sport, __be32 daddr, __be16 dport, int dif,
-		int sdif, struct udp_table *udptable, struct sk_buff *skb)
+		int sdif, struct udp_table *udptable, const struct sk_buff *skb)
 {
 	struct sock *sk, *result;
 	unsigned short hnum = ntohs(dport);
@@ -552,7 +553,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 }
 EXPORT_SYMBOL_GPL(__udp4_lib_lookup);
 
-static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb,
+static inline struct sock *__udp4_lib_lookup_skb(const struct sk_buff *skb,
 						 __be16 sport, __be16 dport,
 						 struct udp_table *udptable)
 {
@@ -563,7 +564,7 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb,
 				 inet_sdif(skb), udptable, skb);
 }
 
-struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
+struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb,
 				 __be16 sport, __be16 dport)
 {
 	return __udp4_lib_lookup_skb(skb, sport, dport, &udp_table);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 976f30391356..e9aa4db3ba53 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -56,7 +56,7 @@
 #include <trace/events/skb.h>
 #include "udp_impl.h"
 
-static bool udp6_lib_exact_dif_match(struct net *net, struct sk_buff *skb)
+static bool udp6_lib_exact_dif_match(struct net *net, const struct sk_buff *skb)
 {
 #if defined(CONFIG_NET_L3_MASTER_DEV)
 	if (!net->ipv4.sysctl_udp_l3mdev_accept &&
@@ -181,7 +181,7 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 		const struct in6_addr *saddr, __be16 sport,
 		const struct in6_addr *daddr, unsigned int hnum,
 		int dif, int sdif, bool exact_dif,
-		struct udp_hslot *hslot2, struct sk_buff *skb)
+		struct udp_hslot *hslot2, const struct sk_buff *skb)
 {
 	struct sock *sk, *result;
 	int score, badness, matches = 0, reuseport = 0;
@@ -221,7 +221,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
 			       const struct in6_addr *saddr, __be16 sport,
 			       const struct in6_addr *daddr, __be16 dport,
 			       int dif, int sdif, struct udp_table *udptable,
-			       struct sk_buff *skb)
+			       const struct sk_buff *skb)
 {
 	struct sock *sk, *result;
 	unsigned short hnum = ntohs(dport);
@@ -290,7 +290,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
 }
 EXPORT_SYMBOL_GPL(__udp6_lib_lookup);
 
-static struct sock *__udp6_lib_lookup_skb(struct sk_buff *skb,
+static struct sock *__udp6_lib_lookup_skb(const struct sk_buff *skb,
 					  __be16 sport, __be16 dport,
 					  struct udp_table *udptable)
 {
@@ -301,7 +301,7 @@ static struct sock *__udp6_lib_lookup_skb(struct sk_buff *skb,
 				 inet6_sdif(skb), udptable, skb);
 }
 
-struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
+struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
 				 __be16 sport, __be16 dport)
 {
 	const struct ipv6hdr *iph = ipv6_hdr(skb);
-- 
2.11.0


* [PATCH v2 net-next 3/6] flow_dissector: Add protocol specific flow dissection offload
  2017-08-29 23:27 [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload Tom Herbert
  2017-08-29 23:27 ` [PATCH v2 net-next 1/6] flow_dissector: Move ETH_P_TEB processing to main switch Tom Herbert
  2017-08-29 23:27 ` [PATCH v2 net-next 2/6] udp: Constify skb argument in lookup functions Tom Herbert
@ 2017-08-29 23:27 ` Tom Herbert
  2017-08-30  1:00   ` David Miller
  2017-08-29 23:27 ` [PATCH v2 net-next 4/6] udp: flow dissector offload Tom Herbert
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Tom Herbert @ 2017-08-29 23:27 UTC
  To: davem; +Cc: netdev, Tom Herbert

Add offload capability for performing protocol specific flow dissection
(either by EtherType or IP protocol).

Specifically:

- Add flow_dissect to offload callbacks
- Move flow_dissect_ret enum to flow_dissector.h, clean up names and add
  a couple of values
- Create GOTO_BY_RESULT macro to use in the main flow dissector switch to
  simplify handling of functions that return flow_dissect_ret enum
- In __skb_flow_dissect, add default case for switch(proto) as well as
  switch(ip_proto) that looks up and calls protocol specific flow
  dissection

Signed-off-by: Tom Herbert <tom@quantonium.net>
---
 include/linux/netdevice.h    |   7 +++
 include/net/flow_dissector.h |   9 +++
 net/core/dev.c               |  14 +++++
 net/core/flow_dissector.c    | 132 +++++++++++++++++++++++++++++++------------
 net/ipv4/route.c             |   4 +-
 5 files changed, 128 insertions(+), 38 deletions(-)
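
The control flow added here can be modelled in userspace as a table of
per-protocol callbacks whose return code tells the caller what to do
next. The sketch below is an illustration under stated assumptions, not
the kernel implementation: all names are hypothetical, protocols are
small integers rather than EtherTypes, a bounded loop stands in for
"goto proto_again", the depth limit of 8 is an arbitrary choice of this
sketch, and the CONTINUE and IPPROTO_AGAIN results are omitted.

```c
#include <assert.h>
#include <stddef.h>

enum dissect_ret {
	RET_OUT_GOOD,
	RET_OUT_BAD,
	RET_PROTO_AGAIN,
};

/* A protocol-specific callback may rewrite *proto and advance *nhoff. */
typedef enum dissect_ret (*dissect_fn)(int *proto, int *nhoff);

#define MAX_PROTO 4
static dissect_fn offloads[MAX_PROTO];	/* NULL: no offload registered */

/* Look up the callback for the current protocol and act on its verdict.
 * Returns 1 on success and 0 on failure. The loop is bounded so that a
 * misbehaving callback cannot make it spin forever.
 */
static int dissect(int proto, int *nhoff)
{
	int depth;

	assert(nhoff);

	for (depth = 0; depth < 8; depth++) {
		if (proto < 0 || proto >= MAX_PROTO || !offloads[proto])
			return 0;	/* unknown protocol: out_bad */

		switch (offloads[proto](&proto, nhoff)) {
		case RET_OUT_GOOD:
			return 1;
		case RET_PROTO_AGAIN:
			continue;	/* re-dispatch on the updated proto */
		case RET_OUT_BAD:
		default:
			return 0;
		}
	}
	return 0;
}

/* Example callback: an 8-byte encapsulation header carrying protocol 2. */
static enum dissect_ret encap_dissect(int *proto, int *nhoff)
{
	*nhoff += 8;
	*proto = 2;
	return RET_PROTO_AGAIN;
}

/* Example callback: a terminal protocol with nothing further to parse. */
static enum dissect_ret leaf_dissect(int *proto, int *nhoff)
{
	(void)proto;
	(void)nhoff;
	return RET_OUT_GOOD;
}
```

Registering encap_dissect for protocol 1 and leaf_dissect for protocol
2 makes dissect(1, &nhoff) succeed after one re-dispatch, with nhoff
advanced by 8.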

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c5475b37a631..90ccb434e127 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2208,6 +2208,12 @@ struct offload_callbacks {
 	struct sk_buff		**(*gro_receive)(struct sk_buff **head,
 						 struct sk_buff *skb);
 	int			(*gro_complete)(struct sk_buff *skb, int nhoff);
+	enum flow_dissect_ret (*flow_dissect)(const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags);
 };
 
 struct packet_offload {
@@ -3253,6 +3259,7 @@ struct sk_buff *napi_get_frags(struct napi_struct *napi);
 gro_result_t napi_gro_frags(struct napi_struct *napi);
 struct packet_offload *gro_find_receive_by_type(__be16 type);
 struct packet_offload *gro_find_complete_by_type(__be16 type);
+struct packet_offload *flow_dissect_find_by_type(__be16 type);
 
 static inline void napi_free_frags(struct napi_struct *napi)
 {
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index e2663e900b0a..ad75bbfd1c9c 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -19,6 +19,14 @@ struct flow_dissector_key_control {
 #define FLOW_DIS_FIRST_FRAG	BIT(1)
 #define FLOW_DIS_ENCAPSULATION	BIT(2)
 
+enum flow_dissect_ret {
+	FLOW_DISSECT_RET_OUT_GOOD,
+	FLOW_DISSECT_RET_OUT_BAD,
+	FLOW_DISSECT_RET_PROTO_AGAIN,
+	FLOW_DISSECT_RET_IPPROTO_AGAIN,
+	FLOW_DISSECT_RET_CONTINUE,
+};
+
 /**
  * struct flow_dissector_key_basic:
  * @thoff: Transport header offset
@@ -205,6 +213,7 @@ enum flow_dissector_key_id {
 #define FLOW_DISSECTOR_F_STOP_AT_L3		BIT(1)
 #define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL	BIT(2)
 #define FLOW_DISSECTOR_F_STOP_AT_ENCAP		BIT(3)
+#define FLOW_DISSECTOR_F_STOP_AT_L4		BIT(4)
 
 struct flow_dissector_key {
 	enum flow_dissector_key_id key_id;
diff --git a/net/core/dev.c b/net/core/dev.c
index 270b54754821..22ea8daa930c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4860,6 +4860,20 @@ struct packet_offload *gro_find_receive_by_type(__be16 type)
 }
 EXPORT_SYMBOL(gro_find_receive_by_type);
 
+struct packet_offload *flow_dissect_find_by_type(__be16 type)
+{
+	struct list_head *offload_head = &offload_base;
+	struct packet_offload *ptype;
+
+	list_for_each_entry_rcu(ptype, offload_head, list) {
+		if (ptype->type != type || !ptype->callbacks.flow_dissect)
+			continue;
+		return ptype;
+	}
+	return NULL;
+}
+EXPORT_SYMBOL(flow_dissect_find_by_type);
+
 struct packet_offload *gro_find_complete_by_type(__be16 type)
 {
 	struct list_head *offload_head = &offload_base;
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 12302acdb073..6a2cf240069a 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -9,6 +9,7 @@
 #include <net/ipv6.h>
 #include <net/gre.h>
 #include <net/pptp.h>
+#include <net/protocol.h>
 #include <linux/igmp.h>
 #include <linux/icmp.h>
 #include <linux/sctp.h>
@@ -115,12 +116,6 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
 }
 EXPORT_SYMBOL(__skb_flow_get_ports);
 
-enum flow_dissect_ret {
-	FLOW_DISSECT_RET_OUT_GOOD,
-	FLOW_DISSECT_RET_OUT_BAD,
-	FLOW_DISSECT_RET_OUT_PROTO_AGAIN,
-};
-
 static enum flow_dissect_ret
 __skb_flow_dissect_mpls(const struct sk_buff *skb,
 			struct flow_dissector *flow_dissector,
@@ -322,7 +317,7 @@ __skb_flow_dissect_gre(const struct sk_buff *skb,
 	if (flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
 		return FLOW_DISSECT_RET_OUT_GOOD;
 
-	return FLOW_DISSECT_RET_OUT_PROTO_AGAIN;
+	return FLOW_DISSECT_RET_PROTO_AGAIN;
 }
 
 static void
@@ -383,6 +378,27 @@ __skb_flow_dissect_ipv6(const struct sk_buff *skb,
 	key_ip->ttl = iph->hop_limit;
 }
 
+#define GOTO_BY_RESULT(ret) do {				\
+	switch (ret) {						\
+	case FLOW_DISSECT_RET_OUT_GOOD:				\
+		goto out_good;					\
+	case FLOW_DISSECT_RET_PROTO_AGAIN:			\
+		goto proto_again;				\
+	case FLOW_DISSECT_RET_IPPROTO_AGAIN:			\
+		goto ip_proto_again;				\
+	case FLOW_DISSECT_RET_OUT_BAD:				\
+	default:						\
+		goto out_bad;					\
+	}							\
+} while (0)
+
+#define GOTO_OR_CONT_BY_RESULT(ret) do {			\
+	enum flow_dissect_ret __ret = (ret);			\
+								\
+	if (__ret != FLOW_DISSECT_RET_CONTINUE)			\
+		GOTO_BY_RESULT(__ret);				\
+} while (0)
+
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
  * @skb: sk_buff to extract the flow from, can be NULL if the rest are specified
@@ -659,15 +675,10 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 	case htons(ETH_P_MPLS_UC):
 	case htons(ETH_P_MPLS_MC):
 mpls:
-		switch (__skb_flow_dissect_mpls(skb, flow_dissector,
-						target_container, data,
-						nhoff, hlen)) {
-		case FLOW_DISSECT_RET_OUT_GOOD:
-			goto out_good;
-		case FLOW_DISSECT_RET_OUT_BAD:
-		default:
-			goto out_bad;
-		}
+		GOTO_BY_RESULT(__skb_flow_dissect_mpls(skb, flow_dissector,
+						       target_container, data,
+						       nhoff, hlen));
+
 	case htons(ETH_P_FCOE):
 		if ((hlen - nhoff) < FCOE_HEADER_LEN)
 			goto out_bad;
@@ -677,32 +688,44 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 
 	case htons(ETH_P_ARP):
 	case htons(ETH_P_RARP):
-		switch (__skb_flow_dissect_arp(skb, flow_dissector,
-					       target_container, data,
-					       nhoff, hlen)) {
-		case FLOW_DISSECT_RET_OUT_GOOD:
-			goto out_good;
-		case FLOW_DISSECT_RET_OUT_BAD:
-		default:
-			goto out_bad;
+		GOTO_BY_RESULT(__skb_flow_dissect_arp(skb, flow_dissector,
+						      target_container, data,
+						      nhoff, hlen));
+
+	default: {
+		struct packet_offload *ptype;
+		enum flow_dissect_ret ret;
+
+		rcu_read_lock();
+
+		ptype = flow_dissect_find_by_type(proto);
+
+		if (ptype) {
+			ret = ptype->callbacks.flow_dissect(skb, key_control,
+						flow_dissector,
+						target_container,
+						data, &proto, &ip_proto, &nhoff,
+						&hlen, flags);
+			rcu_read_unlock();
+
+			GOTO_BY_RESULT(ret);
+		} else {
+			rcu_read_unlock();
 		}
-	default:
+
 		goto out_bad;
 	}
+	}
 
 ip_proto_again:
 	switch (ip_proto) {
 	case IPPROTO_GRE:
-		switch (__skb_flow_dissect_gre(skb, key_control, flow_dissector,
-					       target_container, data,
-					       &proto, &nhoff, &hlen, flags)) {
-		case FLOW_DISSECT_RET_OUT_GOOD:
-			goto out_good;
-		case FLOW_DISSECT_RET_OUT_BAD:
-			goto out_bad;
-		case FLOW_DISSECT_RET_OUT_PROTO_AGAIN:
-			goto proto_again;
-		}
+		GOTO_BY_RESULT(__skb_flow_dissect_gre(skb, key_control,
+						      flow_dissector,
+						      target_container, data,
+						      &proto, &nhoff, &hlen,
+						      flags));
+
 	case NEXTHDR_HOP:
 	case NEXTHDR_ROUTING:
 	case NEXTHDR_DEST: {
@@ -768,9 +791,43 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 		__skb_flow_dissect_tcp(skb, flow_dissector, target_container,
 				       data, nhoff, hlen);
 		break;
-	default:
+	default: {
+		const struct net_offload *ops = NULL;
+
+		if (flags & FLOW_DISSECTOR_F_STOP_AT_L4)
+			break;
+
+		rcu_read_lock();
+
+		switch (proto) {
+		case htons(ETH_P_IP):
+			ops = rcu_dereference(inet_offloads[ip_proto]);
+			break;
+		case htons(ETH_P_IPV6):
+			ops = rcu_dereference(inet6_offloads[ip_proto]);
+			break;
+		default:
+			break;
+		}
+
+		if (ops && ops->callbacks.flow_dissect) {
+			enum flow_dissect_ret ret;
+
+			ret = ops->callbacks.flow_dissect(skb, key_control,
+						flow_dissector,
+						target_container,
+						data, &proto, &ip_proto, &nhoff,
+						&hlen, flags);
+			rcu_read_unlock();
+
+			GOTO_OR_CONT_BY_RESULT(ret);
+		} else {
+			rcu_read_unlock();
+		}
+
 		break;
 	}
+	}
 
 	if (dissector_uses_key(flow_dissector,
 			       FLOW_DISSECTOR_KEY_PORTS)) {
@@ -935,7 +992,8 @@ static inline u32 ___skb_get_hash(const struct sk_buff *skb,
 				  struct flow_keys *keys, u32 keyval)
 {
 	skb_flow_dissect_flow_keys(skb, keys,
-				   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);
+				   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL |
+				   FLOW_DISSECTOR_F_STOP_AT_L4);
 
 	return __flow_hash_from_keys(keys, keyval);
 }
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 94d4cd2d5ea4..85f12b8e0b7f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1811,7 +1811,9 @@ int fib_multipath_hash(const struct fib_info *fi, const struct flowi4 *fl4,
 	case 1:
 		/* skb is currently provided only when forwarding */
 		if (skb) {
-			unsigned int flag = FLOW_DISSECTOR_F_STOP_AT_ENCAP;
+			unsigned int flag = FLOW_DISSECTOR_F_STOP_AT_ENCAP |
+					    FLOW_DISSECTOR_F_STOP_AT_L4;
+;
 			struct flow_keys keys;
 
 			/* short-circuit if we already have L4 hash present */
-- 
2.11.0


* [PATCH v2 net-next 4/6] udp: flow dissector offload
  2017-08-29 23:27 [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload Tom Herbert
                   ` (2 preceding siblings ...)
  2017-08-29 23:27 ` [PATCH v2 net-next 3/6] flow_dissector: Add protocol specific flow dissection offload Tom Herbert
@ 2017-08-29 23:27 ` Tom Herbert
  2017-08-30 10:36   ` Paolo Abeni
  2017-08-29 23:27 ` [PATCH v2 net-next 5/6] fou: Support flow dissection Tom Herbert
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Tom Herbert @ 2017-08-29 23:27 UTC
  To: davem; +Cc: netdev, Tom Herbert

Add support to perform UDP specific flow dissection. This is
primarily intended for dissecting packets carried in UDP
encapsulation.

This patch adds a flow_dissect offload for UDP4 and UDP6. The backend
function performs a socket lookup and calls the socket's flow_dissect
function if a socket is found and has one set.

Signed-off-by: Tom Herbert <tom@quantonium.net>
---
 include/linux/udp.h      |  8 ++++++++
 include/net/udp.h        |  8 ++++++++
 include/net/udp_tunnel.h |  8 ++++++++
 net/ipv4/udp_offload.c   | 45 +++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/udp_tunnel.c    |  1 +
 net/ipv6/udp_offload.c   | 13 +++++++++++++
 6 files changed, 83 insertions(+)
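
The lookup-then-callback model can be sketched in userspace as follows.
This is an illustration only: struct sock_entry, the linear port table
and fou_like_dissect are hypothetical stand-ins for the kernel's UDP
socket lookup and per-socket callback, and only the destination port is
matched here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum dissect_ret { RET_OUT_BAD, RET_IPPROTO_AGAIN, RET_CONTINUE };

struct sock_entry;
typedef enum dissect_ret (*sock_dissect_fn)(const struct sock_entry *sk,
					    uint8_t *ip_proto, size_t *nhoff);

/* Hypothetical stand-in for a UDP socket bound to one local port. */
struct sock_entry {
	uint16_t port;
	uint8_t inner_proto;		/* payload protocol, FOU style */
	sock_dissect_fn flow_dissect;	/* NULL for ordinary sockets */
};

#define UDP_HLEN 8

/* Linear search standing in for the real socket lookup. */
static const struct sock_entry *lookup(const struct sock_entry *tbl,
				       size_t n, uint16_t dport)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].port == dport)
			return &tbl[i];
	return NULL;
}

/* Parse the UDP header at *nhoff, find the socket for its destination
 * port, and defer to that socket's callback if it has one.
 */
static enum dissect_ret udp_dissect(const struct sock_entry *tbl, size_t n,
				    const uint8_t *data, size_t hlen,
				    uint8_t *ip_proto, size_t *nhoff)
{
	const struct sock_entry *sk;
	uint16_t dport;

	assert(data && ip_proto && nhoff);

	if (*nhoff > hlen || hlen - *nhoff < UDP_HLEN)
		return RET_OUT_BAD;

	/* Destination port is bytes 2..3 of the UDP header, big endian. */
	dport = (uint16_t)((data[*nhoff + 2] << 8) | data[*nhoff + 3]);

	sk = lookup(tbl, n, dport);
	if (sk && sk->flow_dissect)
		return sk->flow_dissect(sk, ip_proto, nhoff);

	return RET_CONTINUE;	/* plain UDP: generic dissection proceeds */
}

/* FOU-style callback: the payload protocol is fixed per socket. */
static enum dissect_ret fou_like_dissect(const struct sock_entry *sk,
					 uint8_t *ip_proto, size_t *nhoff)
{
	*ip_proto = sk->inner_proto;
	*nhoff += UDP_HLEN;
	return RET_IPPROTO_AGAIN;
}
```

Returning a distinct "continue" value for sockets without a callback
lets ordinary UDP traffic fall through to the generic port extraction
unchanged.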

diff --git a/include/linux/udp.h b/include/linux/udp.h
index eaea63bc79bb..2e90b189ef6a 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -79,6 +79,14 @@ struct udp_sock {
 	int			(*gro_complete)(struct sock *sk,
 						struct sk_buff *skb,
 						int nhoff);
+	/* Flow dissector function for a UDP socket */
+	enum flow_dissect_ret (*flow_dissect)(struct sock *sk,
+			const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags);
 
 	/* udp_recvmsg try to use this before splicing sk_receive_queue */
 	struct sk_buff_head	reader_queue ____cacheline_aligned_in_smp;
diff --git a/include/net/udp.h b/include/net/udp.h
index f3d1de6f0983..499e4faf8b14 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -174,6 +174,14 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
 				 struct udphdr *uh, udp_lookup_t lookup);
 int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup);
 
+enum flow_dissect_ret udp_flow_dissect(const struct sk_buff *skb,
+			udp_lookup_t lookup,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags);
+
 static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb)
 {
 	struct udphdr *uh;
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index 10cce0dd4450..b7102e0f41a9 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -69,6 +69,13 @@ typedef struct sk_buff **(*udp_tunnel_gro_receive_t)(struct sock *sk,
 						     struct sk_buff *skb);
 typedef int (*udp_tunnel_gro_complete_t)(struct sock *sk, struct sk_buff *skb,
 					 int nhoff);
+typedef enum flow_dissect_ret (*udp_tunnel_flow_dissect_t)(struct sock *sk,
+			const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags);
 
 struct udp_tunnel_sock_cfg {
 	void *sk_user_data;     /* user data used by encap_rcv call back */
@@ -78,6 +85,7 @@ struct udp_tunnel_sock_cfg {
 	udp_tunnel_encap_destroy_t encap_destroy;
 	udp_tunnel_gro_receive_t gro_receive;
 	udp_tunnel_gro_complete_t gro_complete;
+	udp_tunnel_flow_dissect_t flow_dissect;
 };
 
 /* Setup the given (UDP) sock to receive UDP encapsulated packets */
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 97658bfc1b58..7f0a7ed4a6f7 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -328,11 +328,56 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 	return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb);
 }
 
+enum flow_dissect_ret udp_flow_dissect(const struct sk_buff *skb,
+			udp_lookup_t lookup,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags)
+{
+	enum flow_dissect_ret ret = FLOW_DISSECT_RET_CONTINUE;
+	struct udphdr *uh, _uh;
+	struct sock *sk;
+
+	uh = __skb_header_pointer(skb, *p_nhoff, sizeof(_uh), data,
+				  *p_hlen, &_uh);
+	if (!uh)
+		return FLOW_DISSECT_RET_OUT_BAD;
+
+	rcu_read_lock();
+
+	sk = (*lookup)(skb, uh->source, uh->dest);
+
+	if (sk && udp_sk(sk)->flow_dissect)
+		ret = udp_sk(sk)->flow_dissect(sk, skb, key_control,
+					       flow_dissector, target_container,
+					       data, p_proto, p_ip_proto,
+					       p_nhoff, p_hlen, flags);
+	rcu_read_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL(udp_flow_dissect);
+
+static enum flow_dissect_ret udp4_flow_dissect(const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags)
+{
+	return udp_flow_dissect(skb, udp4_lib_lookup_skb, key_control,
+				flow_dissector, target_container, data,
+				p_proto, p_ip_proto, p_nhoff, p_hlen, flags);
+}
+
 static const struct net_offload udpv4_offload = {
 	.callbacks = {
 		.gso_segment = udp4_tunnel_segment,
 		.gro_receive  =	udp4_gro_receive,
 		.gro_complete =	udp4_gro_complete,
+		.flow_dissect = udp4_flow_dissect,
 	},
 };
 
diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c
index 6539ff15e9a3..a4eec2a044d2 100644
--- a/net/ipv4/udp_tunnel.c
+++ b/net/ipv4/udp_tunnel.c
@@ -71,6 +71,7 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock,
 	udp_sk(sk)->encap_destroy = cfg->encap_destroy;
 	udp_sk(sk)->gro_receive = cfg->gro_receive;
 	udp_sk(sk)->gro_complete = cfg->gro_complete;
+	udp_sk(sk)->flow_dissect = cfg->flow_dissect;
 
 	udp_tunnel_encap_enable(sock);
 }
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 455fd4e39333..99ade504eaf7 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -73,11 +73,24 @@ static int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 	return udp_gro_complete(skb, nhoff, udp6_lib_lookup_skb);
 }
 
+static enum flow_dissect_ret udp6_flow_dissect(const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags)
+{
+	return udp_flow_dissect(skb, udp6_lib_lookup_skb, key_control,
+				flow_dissector, target_container, data,
+				p_proto, p_ip_proto, p_nhoff, p_hlen, flags);
+}
+
 static const struct net_offload udpv6_offload = {
 	.callbacks = {
 		.gso_segment	=	udp6_tunnel_segment,
 		.gro_receive	=	udp6_gro_receive,
 		.gro_complete	=	udp6_gro_complete,
+		.flow_dissect	=	udp6_flow_dissect,
 	},
 };
 
-- 
2.11.0


* [PATCH v2 net-next 5/6] fou: Support flow dissection
  2017-08-29 23:27 [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload Tom Herbert
                   ` (3 preceding siblings ...)
  2017-08-29 23:27 ` [PATCH v2 net-next 4/6] udp: flow dissector offload Tom Herbert
@ 2017-08-29 23:27 ` Tom Herbert
  2017-08-29 23:27 ` [PATCH v2 net-next 6/6] vxlan: support flow dissect Tom Herbert
  2017-08-30  8:41 ` [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload Hannes Frederic Sowa
  6 siblings, 0 replies; 16+ messages in thread
From: Tom Herbert @ 2017-08-29 23:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, Tom Herbert

Populate offload flow_dissect callback appropriately for fou and gue.

Signed-off-by: Tom Herbert <tom@quantonium.net>
---
 net/ipv4/fou.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 1540db65241a..a831dd49fb28 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -282,6 +282,20 @@ static int fou_gro_complete(struct sock *sk, struct sk_buff *skb,
 	return err;
 }
 
+static enum flow_dissect_ret fou_flow_dissect(struct sock *sk,
+			const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags)
+{
+	*p_ip_proto = fou_from_sock(sk)->protocol;
+	*p_nhoff += sizeof(struct udphdr);
+
+	return FLOW_DISSECT_RET_IPPROTO_AGAIN;
+}
+
 static struct guehdr *gue_gro_remcsum(struct sk_buff *skb, unsigned int off,
 				      struct guehdr *guehdr, void *data,
 				      size_t hdrlen, struct gro_remcsum *grc,
@@ -500,6 +514,53 @@ static int gue_gro_complete(struct sock *sk, struct sk_buff *skb, int nhoff)
 	return err;
 }
 
+static enum flow_dissect_ret gue_flow_dissect(struct sock *sk,
+			const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags)
+{
+	struct guehdr *guehdr, _guehdr;
+
+	guehdr = __skb_header_pointer(skb, *p_nhoff + sizeof(struct udphdr),
+				      sizeof(_guehdr), data, *p_hlen, &_guehdr);
+	if (!guehdr)
+		return FLOW_DISSECT_RET_OUT_BAD;
+
+	switch (guehdr->version) {
+	case 0:
+		if (unlikely(guehdr->control))
+			return FLOW_DISSECT_RET_CONTINUE;
+
+		*p_ip_proto = guehdr->proto_ctype;
+		*p_nhoff += sizeof(struct udphdr) +
+		    sizeof(*guehdr) + (guehdr->hlen << 2);
+
+		break;
+	case 1:
+		switch (((struct iphdr *)guehdr)->version) {
+		case 4:
+			*p_ip_proto = IPPROTO_IPIP;
+			break;
+		case 6:
+			*p_ip_proto = IPPROTO_IPV6;
+			break;
+		default:
+			return FLOW_DISSECT_RET_CONTINUE;
+		}
+
+		*p_nhoff += sizeof(struct udphdr);
+
+		break;
+	default:
+		return FLOW_DISSECT_RET_CONTINUE;
+	}
+
+	return FLOW_DISSECT_RET_IPPROTO_AGAIN;
+}
+
 static int fou_add_to_port_list(struct net *net, struct fou *fou)
 {
 	struct fou_net *fn = net_generic(net, fou_net_id);
@@ -570,12 +631,14 @@ static int fou_create(struct net *net, struct fou_cfg *cfg,
 		tunnel_cfg.encap_rcv = fou_udp_recv;
 		tunnel_cfg.gro_receive = fou_gro_receive;
 		tunnel_cfg.gro_complete = fou_gro_complete;
+		tunnel_cfg.flow_dissect = fou_flow_dissect;
 		fou->protocol = cfg->protocol;
 		break;
 	case FOU_ENCAP_GUE:
 		tunnel_cfg.encap_rcv = gue_udp_recv;
 		tunnel_cfg.gro_receive = gue_gro_receive;
 		tunnel_cfg.gro_complete = gue_gro_complete;
+		tunnel_cfg.flow_dissect = gue_flow_dissect;
 		break;
 	default:
 		err = -EINVAL;
-- 
2.11.0
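
The offset arithmetic in gue_flow_dissect above can be modelled in user space. This is only a sketch: it reads the first GUE byte as a 2-bit version, 1-bit control flag and 5-bit hlen (an assumption about the wire layout, not taken from this patch), and gue_dissect_sketch, enum gue_result and the constants are hypothetical names, not kernel symbols.

```c
#include <stddef.h>
#include <stdint.h>

#define UDP_HLEN      8   /* stands in for sizeof(struct udphdr) */
#define GUE_BASE_HLEN 4   /* stands in for sizeof(struct guehdr) */
#define PROTO_IPIP    4   /* value of IPPROTO_IPIP */
#define PROTO_IPV6    41  /* value of IPPROTO_IPV6 */

enum gue_result { GUE_CONTINUE, GUE_OUT_BAD, GUE_IPPROTO_AGAIN };

/* pkt + *nhoff points at the UDP header; on success *nhoff is advanced
 * to the encapsulated header and *ip_proto names what to parse next.
 */
static enum gue_result gue_dissect_sketch(const uint8_t *pkt, size_t len,
					  size_t *nhoff, uint8_t *ip_proto)
{
	const uint8_t *gue;
	uint8_t version, control, hlen;

	if (len < *nhoff + UDP_HLEN + GUE_BASE_HLEN)
		return GUE_OUT_BAD;	/* header not fully present */

	gue = pkt + *nhoff + UDP_HLEN;
	version = gue[0] >> 6;
	control = (gue[0] >> 5) & 1;
	hlen = gue[0] & 0x1f;		/* optional fields, in 32-bit words */

	switch (version) {
	case 0:
		if (control)
			return GUE_CONTINUE;	/* control messages are not dissected */
		*ip_proto = gue[1];		/* proto_ctype */
		*nhoff += UDP_HLEN + GUE_BASE_HLEN + ((size_t)hlen << 2);
		break;
	case 1:
		/* Direct IP encapsulation: the first nibble is the IP version,
		 * so only the UDP header is skipped.
		 */
		switch (gue[0] >> 4) {
		case 4:
			*ip_proto = PROTO_IPIP;
			break;
		case 6:
			*ip_proto = PROTO_IPV6;
			break;
		default:
			return GUE_CONTINUE;
		}
		*nhoff += UDP_HLEN;
		break;
	default:
		return GUE_CONTINUE;
	}

	return GUE_IPPROTO_AGAIN;
}
```

For a version 0 header with hlen = 2 at offset 0, the sketch advances the offset by 8 + 4 + 8 = 20 bytes; for version 1 it advances by the 8-byte UDP header only.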


* [PATCH v2 net-next 6/6] vxlan: support flow dissect
  2017-08-29 23:27 [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload Tom Herbert
                   ` (4 preceding siblings ...)
  2017-08-29 23:27 ` [PATCH v2 net-next 5/6] fou: Support flow dissection Tom Herbert
@ 2017-08-29 23:27 ` Tom Herbert
  2017-08-30  8:41 ` [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload Hannes Frederic Sowa
  6 siblings, 0 replies; 16+ messages in thread
From: Tom Herbert @ 2017-08-29 23:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, Tom Herbert

Populate offload flow_dissect callback appropriately for VXLAN and
VXLAN-GPE.

Signed-off-by: Tom Herbert <tom@quantonium.net>
---
 drivers/net/vxlan.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ae3a1da703c2..41e50de40af4 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1336,6 +1336,55 @@ static bool vxlan_ecn_decapsulate(struct vxlan_sock *vs, void *oiph,
 	return err <= 1;
 }
 
+static enum flow_dissect_ret vxlan_flow_dissect(struct sock *sk,
+			const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags)
+{
+	__be16 protocol = htons(ETH_P_TEB);
+	struct vxlanhdr *vhdr, _vhdr;
+	struct vxlan_sock *vs;
+
+	vhdr = __skb_header_pointer(skb, *p_nhoff + sizeof(struct udphdr),
+				    sizeof(_vhdr), data, *p_hlen, &_vhdr);
+	if (!vhdr)
+		return FLOW_DISSECT_RET_OUT_BAD;
+
+	vs = rcu_dereference_sk_user_data(sk);
+	if (!vs)
+		return FLOW_DISSECT_RET_OUT_BAD;
+
+	if (vs->flags & VXLAN_F_GPE) {
+		struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)vhdr;
+
+		/* Need to have Next Protocol set for interfaces in GPE mode. */
+		if (gpe->version != 0 || !gpe->np_applied || gpe->oam_flag)
+			return FLOW_DISSECT_RET_CONTINUE;
+
+		switch (gpe->next_protocol) {
+		case VXLAN_GPE_NP_IPV4:
+			protocol = htons(ETH_P_IP);
+			break;
+		case VXLAN_GPE_NP_IPV6:
+			protocol = htons(ETH_P_IPV6);
+			break;
+		case VXLAN_GPE_NP_ETHERNET:
+			protocol = htons(ETH_P_TEB);
+			break;
+		default:
+			return FLOW_DISSECT_RET_CONTINUE;
+		}
+	}
+
+	*p_nhoff += sizeof(struct udphdr) + sizeof(_vhdr);
+	*p_proto = protocol;
+
+	return FLOW_DISSECT_RET_PROTO_AGAIN;
+}
+
 /* Callback from net/ipv4/udp.c to receive packets */
 static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
 {
@@ -2864,6 +2913,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,
 	tunnel_cfg.encap_destroy = NULL;
 	tunnel_cfg.gro_receive = vxlan_gro_receive;
 	tunnel_cfg.gro_complete = vxlan_gro_complete;
+	tunnel_cfg.flow_dissect = vxlan_flow_dissect;
 
 	setup_udp_tunnel_sock(net, sock, &tunnel_cfg);
 
-- 
2.11.0
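
The protocol selection in vxlan_flow_dissect above reduces to a small mapping, sketched here in user space. The struct and function names are hypothetical; the numeric values for the GPE next-protocol codes and the EtherTypes are filled in from memory of the kernel headers and should be treated as assumptions.

```c
#include <stdint.h>

/* EtherTypes in host byte order (the patch stores them with htons()). */
#define SK_ETH_P_IP   0x0800
#define SK_ETH_P_IPV6 0x86DD
#define SK_ETH_P_TEB  0x6558

/* Assumed values of VXLAN_GPE_NP_IPV4/IPV6/ETHERNET. */
#define SK_GPE_NP_IPV4     0x01
#define SK_GPE_NP_IPV6     0x02
#define SK_GPE_NP_ETHERNET 0x03

/* Hypothetical flat view of the GPE header fields the patch tests. */
struct gpe_fields {
	uint8_t version;
	uint8_t np_applied;
	uint8_t oam_flag;
	uint8_t next_protocol;
};

/* Returns the inner EtherType, or 0 when the packet should be left to
 * generic dissection (the FLOW_DISSECT_RET_CONTINUE cases in the patch).
 */
static uint16_t vxlan_inner_ethertype(int gpe_mode, const struct gpe_fields *f)
{
	if (!gpe_mode)
		return SK_ETH_P_TEB;	/* plain VXLAN always carries Ethernet */

	/* GPE mode requires version 0, Next Protocol set, and no OAM. */
	if (f->version != 0 || !f->np_applied || f->oam_flag)
		return 0;

	switch (f->next_protocol) {
	case SK_GPE_NP_IPV4:
		return SK_ETH_P_IP;
	case SK_GPE_NP_IPV6:
		return SK_ETH_P_IPV6;
	case SK_GPE_NP_ETHERNET:
		return SK_ETH_P_TEB;
	default:
		return 0;
	}
}
```

In both modes the patch advances the offset past the UDP and VXLAN headers before handing the selected protocol back to the main dissector.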


* Re: [PATCH v2 net-next 2/6] udp: Constify skb argument in lookup functions
  2017-08-29 23:27 ` [PATCH v2 net-next 2/6] udp: Constify skb argument in lookup functions Tom Herbert
@ 2017-08-30  0:58   ` David Miller
  2017-08-30  3:09     ` Tom Herbert
  0 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2017-08-30  0:58 UTC (permalink / raw)
  To: tom; +Cc: netdev

From: Tom Herbert <tom@quantonium.net>
Date: Tue, 29 Aug 2017 16:27:07 -0700

> For UDP socket lookup functions, and associated functions that take an
> skbuf as argument, declare the skb argument as constant.
> 
> One caveat is that reuseport_select_sock can be called from the UDP
> lookup functions with an skb argument. This function temporarily
> modifies the skbuff data pointer (in bpf_run via a pull/push sequence).
> To resolve compiler warning I added a local skbuf declaration that is
> not const and assigned to the skb argument with an explicit cast.
> 
> Signed-off-by: Tom Herbert <tom@quantonium.net>

Please don't do this.

If reuseport_select_sock() modifies anything in the SKB, especially
skb->data, it infects the entire call chain.  So you can't mark it
const in this family of calls.


* Re: [PATCH v2 net-next 3/6] flow_dissector: Add protocol specific flow dissection offload
  2017-08-29 23:27 ` [PATCH v2 net-next 3/6] flow_dissector: Add protocol specific flow dissection offload Tom Herbert
@ 2017-08-30  1:00   ` David Miller
  0 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2017-08-30  1:00 UTC (permalink / raw)
  To: tom; +Cc: netdev

From: Tom Herbert <tom@quantonium.net>
Date: Tue, 29 Aug 2017 16:27:08 -0700

> +#define GOTO_BY_RESULT(ret) do {				\
> +	switch (ret) {						\
> +	case FLOW_DISSECT_RET_OUT_GOOD:				\
> +		goto out_good;					\
> +	case FLOW_DISSECT_RET_PROTO_AGAIN:			\
> +		goto proto_again;				\
> +	case FLOW_DISSECT_RET_IPPROTO_AGAIN:			\
> +		goto ip_proto_again;				\
> +	case FLOW_DISSECT_RET_OUT_BAD:				\
> +	default:						\
> +		goto out_bad;					\
> +	}							\
> +} while (0)
> +
> +#define GOTO_OR_CONT_BY_RESULT(ret) do {			\
> +	enum flow_dissect_ret __ret = (ret);			\
> +								\
> +	if (__ret != FLOW_DISSECT_RET_CONTINUE)			\
> +		GOTO_BY_RESULT(__ret);				\
> +} while (0)

Please don't hide major control flow changes inside of a macro.  This
means returns and gotos.

It makes code impossible to audit.

Yes, this applies even if the macro has the word "GOTO" in it :-)
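
One way to satisfy this objection is to keep the dispatch as an ordinary switch at the call site, so every jump is visible where it happens. The following is a user-space sketch only: the enum values are the ones named in the quoted macro, while run_dissect, dissect_step, struct dissect_trace and demo_step are hypothetical names for illustration and not the code that was eventually merged.

```c
#include <stddef.h>

enum flow_dissect_ret {
	FLOW_DISSECT_RET_OUT_GOOD,
	FLOW_DISSECT_RET_OUT_BAD,
	FLOW_DISSECT_RET_PROTO_AGAIN,
	FLOW_DISSECT_RET_IPPROTO_AGAIN,
	FLOW_DISSECT_RET_CONTINUE,
};

/* Hypothetical stand-in for a protocol-specific flow_dissect callback. */
typedef enum flow_dissect_ret (*dissect_step)(int *state);

struct dissect_trace {
	int proto_passes;	/* times the proto_again label was reached */
	int ip_proto_passes;	/* times the ip_proto_again label was reached */
};

/* Returns 1 on the out_good path and 0 on the out_bad path. */
static int run_dissect(dissect_step step, int *state,
		       struct dissect_trace *trace)
{
	enum flow_dissect_ret ret;

proto_again:
	trace->proto_passes++;
ip_proto_again:
	trace->ip_proto_passes++;

	ret = step(state);
	switch (ret) {		/* every jump is spelled out, no macro */
	case FLOW_DISSECT_RET_OUT_GOOD:
		goto out_good;
	case FLOW_DISSECT_RET_PROTO_AGAIN:
		goto proto_again;
	case FLOW_DISSECT_RET_IPPROTO_AGAIN:
		goto ip_proto_again;
	case FLOW_DISSECT_RET_CONTINUE:
		break;		/* generic handling would run here */
	case FLOW_DISSECT_RET_OUT_BAD:
	default:
		goto out_bad;
	}

out_good:
	return 1;
out_bad:
	return 0;
}

/* Demo callback: asks for one EtherType pass, one IP protocol pass,
 * then reports success; any later call reports a bad packet.
 */
static enum flow_dissect_ret demo_step(int *state)
{
	switch ((*state)++) {
	case 0:
		return FLOW_DISSECT_RET_PROTO_AGAIN;
	case 1:
		return FLOW_DISSECT_RET_IPPROTO_AGAIN;
	case 2:
		return FLOW_DISSECT_RET_OUT_GOOD;
	default:
		return FLOW_DISSECT_RET_OUT_BAD;
	}
}
```

The switch costs a few more lines per call site than the macro, but an auditor can see each goto target without expanding anything.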


* Re: [PATCH v2 net-next 2/6] udp: Constify skb argument in lookup functions
  2017-08-30  0:58   ` David Miller
@ 2017-08-30  3:09     ` Tom Herbert
  0 siblings, 0 replies; 16+ messages in thread
From: Tom Herbert @ 2017-08-30  3:09 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Kernel Network Developers

On Tue, Aug 29, 2017 at 5:58 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <tom@quantonium.net>
> Date: Tue, 29 Aug 2017 16:27:07 -0700
>
>> For UDP socket lookup functions, and associated functions that take an
>> skbuf as argument, declare the skb argument as constant.
>>
>> One caveat is that reuseport_select_sock can be called from the UDP
>> lookup functions with an skb argument. This function temporarily
>> modifies the skbuff data pointer (in bpf_run via a pull/push sequence).
>> To resolve compiler warning I added a local skbuf declaration that is
>> not const and assigned to the skb argument with an explicit cast.
>>
>> Signed-off-by: Tom Herbert <tom@quantonium.net>
>
> Please don't do this.
>
> If reuseport_select_sock() modifies anything in the SKB, especially
> skb->data, it infects the entire call chain.  So you can't mark it
> const in this family of calls.
>
reuseport_select_sock calls run_bpf that calls pskb_pull to
"temporarily advance data past protocol header" and it calls
bpf_prog_run_save_cb which takes non-constant skb argument. This is
the only instance in all the udp lookup functions where non-constant
is needed. It's logical that constant skbuf makes sense for socket
lookup-- I doubt any caller would expect the skbuf to be modified as a
side effect. It's also an implicit characteristic since
reuseport_select_sock may just clone the skb before calling BPF.

The problem is that all the flow dissector functions operate on const
skbs (again that's logical :-) ). So if we want to be able to call
lookup functions or even BPF to do flow dissection, then I think
something needs to change. I really don't want to unconstify the flow
dissector functions. We could just always clone the skb before calling
BPF, but I suppose that is a potential performance hit. Is there a
better way to resolve this?

Thanks,
Tom


* Re: [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload
  2017-08-29 23:27 [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload Tom Herbert
                   ` (5 preceding siblings ...)
  2017-08-29 23:27 ` [PATCH v2 net-next 6/6] vxlan: support flow dissect Tom Herbert
@ 2017-08-30  8:41 ` Hannes Frederic Sowa
  2017-08-30 14:50   ` Tom Herbert
  6 siblings, 1 reply; 16+ messages in thread
From: Hannes Frederic Sowa @ 2017-08-30  8:41 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev

Hello Tom,

Tom Herbert <tom@quantonium.net> writes:

> This patch set adds a new offload type to perform flow dissection for
> specific protocols (either by EtherType or by IP protocol). This is
> primary useful to crack open UDP encapsulations (like VXLAN, GUE) for
> the purposes of parsing the encapsulated packet.
>
> Items in this patch set:
> - Constify skb argument to UDP lookup functions
> - Create new protocol case in __skb_dissect for ETH_P_TEB. This is based
>   on the code in the GRE dissect function and the special handling in
>   GRE can now be removed (it sets protocol to ETH_P_TEB and returns so
>   goto proto_again is done)
> - Add infrastructure for protocol specific flow dissection offload
> - Add infrastructure to perform UDP flow dissection. Uses same model of
>   GRO where a flow_dissect callback can be associated with a UDP
>   socket
> - Use the infrastructure to support flow dissection of VXLAN and GUE
>
> Tested:
>
> Forced RPS to call flow dissection for VXLAN, FOU, and GUE. Observed
> that inner packet was being properly dissected.
>
> v2: Add signed off

[...]

Can you provide more context on why you did this series? Is the entropy
you receive via UDP source ports insufficient? I assume this is the case
for HW RSS hashing but not for the software dissector.

Btw. we forbid hardware to use L4 information if IP_PROTO is UDP but we
allow it in RPS (not in IPv6 if flowlabel is present). Your series could
solve this problem by being more protocol specific and disallowing
fragmentation on a particular quadtuple, much like hw encap offload,
where we tell the specific port number to the hardware and then
disallow using L4 information for all other UDP protocols.

Thanks,
Hannes


* Re: [PATCH v2 net-next 4/6] udp: flow dissector offload
  2017-08-29 23:27 ` [PATCH v2 net-next 4/6] udp: flow dissector offload Tom Herbert
@ 2017-08-30 10:36   ` Paolo Abeni
  2017-08-30 14:56     ` Tom Herbert
  2017-08-31 15:53     ` Willem de Bruijn
  0 siblings, 2 replies; 16+ messages in thread
From: Paolo Abeni @ 2017-08-30 10:36 UTC (permalink / raw)
  To: Tom Herbert, davem; +Cc: netdev

On Tue, 2017-08-29 at 16:27 -0700, Tom Herbert wrote:
> Add support to perform UDP specific flow dissection. This is
> primarily intended for dissecting encapsulated packets in UDP
> encapsulation.
> 
> This patch adds a flow_dissect offload for UDP4 and UDP6. The backend
> function performs a socket lookup and calls the flow_dissect function
> if a socket is found.
> 
> Signed-off-by: Tom Herbert <tom@quantonium.net>
> ---
>  include/linux/udp.h      |  8 ++++++++
>  include/net/udp.h        |  8 ++++++++
>  include/net/udp_tunnel.h |  8 ++++++++
>  net/ipv4/udp_offload.c   | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  net/ipv4/udp_tunnel.c    |  1 +
>  net/ipv6/udp_offload.c   | 13 +++++++++++++
>  6 files changed, 83 insertions(+)
> 
> diff --git a/include/linux/udp.h b/include/linux/udp.h
> index eaea63bc79bb..2e90b189ef6a 100644
> --- a/include/linux/udp.h
> +++ b/include/linux/udp.h
> @@ -79,6 +79,14 @@ struct udp_sock {
>  	int			(*gro_complete)(struct sock *sk,
>  						struct sk_buff *skb,
>  						int nhoff);
> +	/* Flow dissector function for a UDP socket */
> +	enum flow_dissect_ret (*flow_dissect)(struct sock *sk,
> +			const struct sk_buff *skb,
> +			struct flow_dissector_key_control *key_control,
> +			struct flow_dissector *flow_dissector,
> +			void *target_container, void *data,
> +			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
> +			int *p_hlen, unsigned int flags);
>  
>  	/* udp_recvmsg try to use this before splicing sk_receive_queue */
>  	struct sk_buff_head	reader_queue ____cacheline_aligned_in_smp;
> diff --git a/include/net/udp.h b/include/net/udp.h
> index f3d1de6f0983..499e4faf8b14 100644
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -174,6 +174,14 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
>  				 struct udphdr *uh, udp_lookup_t lookup);
>  int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup);
>  
> +enum flow_dissect_ret udp_flow_dissect(const struct sk_buff *skb,
> +			udp_lookup_t lookup,
> +			struct flow_dissector_key_control *key_control,
> +			struct flow_dissector *flow_dissector,
> +			void *target_container, void *data,
> +			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
> +			int *p_hlen, unsigned int flags);
> +
>  static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb)
>  {
>  	struct udphdr *uh;
> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
> index 10cce0dd4450..b7102e0f41a9 100644
> --- a/include/net/udp_tunnel.h
> +++ b/include/net/udp_tunnel.h
> @@ -69,6 +69,13 @@ typedef struct sk_buff **(*udp_tunnel_gro_receive_t)(struct sock *sk,
>  						     struct sk_buff *skb);
>  typedef int (*udp_tunnel_gro_complete_t)(struct sock *sk, struct sk_buff *skb,
>  					 int nhoff);
> +typedef enum flow_dissect_ret (*udp_tunnel_flow_dissect_t)(struct sock *sk,
> +			const struct sk_buff *skb,
> +			struct flow_dissector_key_control *key_control,
> +			struct flow_dissector *flow_dissector,
> +			void *target_container, void *data,
> +			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
> +			int *p_hlen, unsigned int flags);
>  
>  struct udp_tunnel_sock_cfg {
>  	void *sk_user_data;     /* user data used by encap_rcv call back */
> @@ -78,6 +85,7 @@ struct udp_tunnel_sock_cfg {
>  	udp_tunnel_encap_destroy_t encap_destroy;
>  	udp_tunnel_gro_receive_t gro_receive;
>  	udp_tunnel_gro_complete_t gro_complete;
> +	udp_tunnel_flow_dissect_t flow_dissect;
>  };
>  
>  /* Setup the given (UDP) sock to receive UDP encapsulated packets */
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 97658bfc1b58..7f0a7ed4a6f7 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -328,11 +328,56 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
>  	return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb);
>  }
>  
> +enum flow_dissect_ret udp_flow_dissect(const struct sk_buff *skb,
> +			udp_lookup_t lookup,
> +			struct flow_dissector_key_control *key_control,
> +			struct flow_dissector *flow_dissector,
> +			void *target_container, void *data,
> +			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
> +			int *p_hlen, unsigned int flags)
> +{
> +	enum flow_dissect_ret ret = FLOW_DISSECT_RET_CONTINUE;
> +	struct udphdr *uh, _uh;
> +	struct sock *sk;
> +
> +	uh = __skb_header_pointer(skb, *p_nhoff, sizeof(_uh), data,
> +				  *p_hlen, &_uh);
> +	if (!uh)
> +		return FLOW_DISSECT_RET_OUT_BAD;
> +
> +	rcu_read_lock();
> +
> +	sk = (*lookup)(skb, uh->source, uh->dest);
> +
> +	if (sk && udp_sk(sk)->flow_dissect)
> +		ret = udp_sk(sk)->flow_dissect(sk, skb, key_control,
> +					       flow_dissector, target_container,
> +					       data, p_proto, p_ip_proto,
> +					       p_nhoff, p_hlen, flags);
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(udp_flow_dissect);

If I read the above correctly, this is going to add another full UDP
lookup per UDP packet, can we avoid it with some static key enabled by
vxlan/fou/etc. ?

Thanks,

Paolo
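
The gating suggested here can be modelled in user space as follows. This is a sketch under loud assumptions: a plain counter stands in for a kernel static key, and encap_users, encap_enable, fake_lookup and udp_flow_dissect_gated are hypothetical names, not existing kernel symbols.

```c
#include <stddef.h>

enum flow_dissect_ret {
	FLOW_DISSECT_RET_CONTINUE,
	FLOW_DISSECT_RET_PROTO_AGAIN,
};

static int encap_users;		/* stand-in for a static key's enable count */
static int lookup_count;	/* number of (expensive) socket lookups run */

struct fake_sock {
	enum flow_dissect_ret (*flow_dissect)(void);
};

static enum flow_dissect_ret tunnel_dissect(void)
{
	return FLOW_DISSECT_RET_PROTO_AGAIN;
}

static struct fake_sock tunnel_sock = { tunnel_dissect };

/* Called by a tunnel driver when it opens its first socket. */
static void encap_enable(void)
{
	encap_users++;
}

/* Models the full UDP socket lookup; 4789 plays the tunnel port. */
static struct fake_sock *fake_lookup(unsigned short dport)
{
	lookup_count++;
	return dport == 4789 ? &tunnel_sock : NULL;
}

static enum flow_dissect_ret udp_flow_dissect_gated(unsigned short dport)
{
	struct fake_sock *sk;

	/* In the kernel this test would be a patched branch, so hosts
	 * without any UDP tunnel pay almost nothing per packet.
	 */
	if (encap_users == 0)
		return FLOW_DISSECT_RET_CONTINUE;

	sk = fake_lookup(dport);
	if (sk && sk->flow_dissect)
		return sk->flow_dissect();

	return FLOW_DISSECT_RET_CONTINUE;
}
```

Once any tunnel is enabled, every UDP packet still pays for the lookup; the gate only removes the cost on hosts with no UDP encapsulation configured.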


* Re: [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload
  2017-08-30  8:41 ` [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload Hannes Frederic Sowa
@ 2017-08-30 14:50   ` Tom Herbert
  2017-08-31 10:11     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Herbert @ 2017-08-30 14:50 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: David S . Miller, Linux Kernel Network Developers

On Wed, Aug 30, 2017 at 1:41 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hello Tom,
>
> Tom Herbert <tom@quantonium.net> writes:
>
>> This patch set adds a new offload type to perform flow dissection for
>> specific protocols (either by EtherType or by IP protocol). This is
>> primary useful to crack open UDP encapsulations (like VXLAN, GUE) for
>> the purposes of parsing the encapsulated packet.
>>
>> Items in this patch set:
>> - Constify skb argument to UDP lookup functions
>> - Create new protocol case in __skb_dissect for ETH_P_TEB. This is based
>>   on the code in the GRE dissect function and the special handling in
>>   GRE can now be removed (it sets protocol to ETH_P_TEB and returns so
>>   goto proto_again is done)
>> - Add infrastructure for protocol specific flow dissection offload
>> - Add infrastructure to perform UDP flow dissection. Uses same model of
>>   GRO where a flow_dissect callback can be associated with a UDP
>>   socket
>> - Use the infrastructure to support flow dissection of VXLAN and GUE
>>
>> Tested:
>>
>> Forced RPS to call flow dissection for VXLAN, FOU, and GUE. Observed
>> that inner packet was being properly dissected.
>>
>> v2: Add signed off
>
> [...]
>
> Can you provide more context on why you did this series? Is the entropy
> insufficient you receive via UDP source ports? I assume this is the case
> for HW RSS hashing but actually not for the software dissector.
>
Hi Hannes,

I think the entropy from UDP source ports is sufficient, but there is
no universal agreement on that. In any case, there are now many other
uses of the flow dissector. For those that want DPI-like features, such
as getting TCP flags, UDP encapsulation is currently a blind spot.

> Btw. we forbid hardware to use L4 information if IP_PROTO is UDP but we
> allow it in RPS (not in IPv6 if flowlabel is present). Your series could
> solve this problem by being more protocol specific and disallow
> fragmentation on a particular quadtuple, very much the same like hw
> encap offload, where we tell the specific port number to the hardware
> and then disallow using L4 information for all other UDP protocols.
>
IMO the fact that HW is protocol specific and operates solely on ports
is a problem (remember Less Is More...). It's better to be protocol
generic and do the socket lookup in SW which no longer has atomic
operations. Matching by bound socket tuple is more accurate than just
a port. However, technically this solution still isn't 100% correct
since it's possible that macvlan or ipvlan may intercede and steer
packet to a namespace where the socket isn't valid.

Tom


* Re: [PATCH v2 net-next 4/6] udp: flow dissector offload
  2017-08-30 10:36   ` Paolo Abeni
@ 2017-08-30 14:56     ` Tom Herbert
  2017-08-31 15:53     ` Willem de Bruijn
  1 sibling, 0 replies; 16+ messages in thread
From: Tom Herbert @ 2017-08-30 14:56 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: David S . Miller, Linux Kernel Network Developers

On Wed, Aug 30, 2017 at 3:36 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> On Tue, 2017-08-29 at 16:27 -0700, Tom Herbert wrote:
>> Add support to perform UDP specific flow dissection. This is
>> primarily intended for dissecting encapsulated packets in UDP
>> encapsulation.
>>
>> This patch adds a flow_dissect offload for UDP4 and UDP6. The backend
>> function performs a socket lookup and calls the flow_dissect function
>> if a socket is found.
>>
>> Signed-off-by: Tom Herbert <tom@quantonium.net>
>> ---
>>  include/linux/udp.h      |  8 ++++++++
>>  include/net/udp.h        |  8 ++++++++
>>  include/net/udp_tunnel.h |  8 ++++++++
>>  net/ipv4/udp_offload.c   | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>  net/ipv4/udp_tunnel.c    |  1 +
>>  net/ipv6/udp_offload.c   | 13 +++++++++++++
>>  6 files changed, 83 insertions(+)
>>
>> diff --git a/include/linux/udp.h b/include/linux/udp.h
>> index eaea63bc79bb..2e90b189ef6a 100644
>> --- a/include/linux/udp.h
>> +++ b/include/linux/udp.h
>> @@ -79,6 +79,14 @@ struct udp_sock {
>>       int                     (*gro_complete)(struct sock *sk,
>>                                               struct sk_buff *skb,
>>                                               int nhoff);
>> +     /* Flow dissector function for a UDP socket */
>> +     enum flow_dissect_ret (*flow_dissect)(struct sock *sk,
>> +                     const struct sk_buff *skb,
>> +                     struct flow_dissector_key_control *key_control,
>> +                     struct flow_dissector *flow_dissector,
>> +                     void *target_container, void *data,
>> +                     __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
>> +                     int *p_hlen, unsigned int flags);
>>
>>       /* udp_recvmsg try to use this before splicing sk_receive_queue */
>>       struct sk_buff_head     reader_queue ____cacheline_aligned_in_smp;
>> diff --git a/include/net/udp.h b/include/net/udp.h
>> index f3d1de6f0983..499e4faf8b14 100644
>> --- a/include/net/udp.h
>> +++ b/include/net/udp.h
>> @@ -174,6 +174,14 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
>>                                struct udphdr *uh, udp_lookup_t lookup);
>>  int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup);
>>
>> +enum flow_dissect_ret udp_flow_dissect(const struct sk_buff *skb,
>> +                     udp_lookup_t lookup,
>> +                     struct flow_dissector_key_control *key_control,
>> +                     struct flow_dissector *flow_dissector,
>> +                     void *target_container, void *data,
>> +                     __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
>> +                     int *p_hlen, unsigned int flags);
>> +
>>  static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb)
>>  {
>>       struct udphdr *uh;
>> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
>> index 10cce0dd4450..b7102e0f41a9 100644
>> --- a/include/net/udp_tunnel.h
>> +++ b/include/net/udp_tunnel.h
>> @@ -69,6 +69,13 @@ typedef struct sk_buff **(*udp_tunnel_gro_receive_t)(struct sock *sk,
>>                                                    struct sk_buff *skb);
>>  typedef int (*udp_tunnel_gro_complete_t)(struct sock *sk, struct sk_buff *skb,
>>                                        int nhoff);
>> +typedef enum flow_dissect_ret (*udp_tunnel_flow_dissect_t)(struct sock *sk,
>> +                     const struct sk_buff *skb,
>> +                     struct flow_dissector_key_control *key_control,
>> +                     struct flow_dissector *flow_dissector,
>> +                     void *target_container, void *data,
>> +                     __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
>> +                     int *p_hlen, unsigned int flags);
>>
>>  struct udp_tunnel_sock_cfg {
>>       void *sk_user_data;     /* user data used by encap_rcv call back */
>> @@ -78,6 +85,7 @@ struct udp_tunnel_sock_cfg {
>>       udp_tunnel_encap_destroy_t encap_destroy;
>>       udp_tunnel_gro_receive_t gro_receive;
>>       udp_tunnel_gro_complete_t gro_complete;
>> +     udp_tunnel_flow_dissect_t flow_dissect;
>>  };
>>
>>  /* Setup the given (UDP) sock to receive UDP encapsulated packets */
>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
>> index 97658bfc1b58..7f0a7ed4a6f7 100644
>> --- a/net/ipv4/udp_offload.c
>> +++ b/net/ipv4/udp_offload.c
>> @@ -328,11 +328,56 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
>>       return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb);
>>  }
>>
>> +enum flow_dissect_ret udp_flow_dissect(const struct sk_buff *skb,
>> +                     udp_lookup_t lookup,
>> +                     struct flow_dissector_key_control *key_control,
>> +                     struct flow_dissector *flow_dissector,
>> +                     void *target_container, void *data,
>> +                     __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
>> +                     int *p_hlen, unsigned int flags)
>> +{
>> +     enum flow_dissect_ret ret = FLOW_DISSECT_RET_CONTINUE;
>> +     struct udphdr *uh, _uh;
>> +     struct sock *sk;
>> +
>> +     uh = __skb_header_pointer(skb, *p_nhoff, sizeof(_uh), data,
>> +                               *p_hlen, &_uh);
>> +     if (!uh)
>> +             return FLOW_DISSECT_RET_OUT_BAD;
>> +
>> +     rcu_read_lock();
>> +
>> +     sk = (*lookup)(skb, uh->source, uh->dest);
>> +
>> +     if (sk && udp_sk(sk)->flow_dissect)
>> +             ret = udp_sk(sk)->flow_dissect(sk, skb, key_control,
>> +                                            flow_dissector, target_container,
>> +                                            data, p_proto, p_ip_proto,
>> +                                            p_nhoff, p_hlen, flags);
>> +     rcu_read_unlock();
>> +
>> +     return ret;
>> +}
>> +EXPORT_SYMBOL(udp_flow_dissect);
>
> If I read the above correctly, this is going to add another full UDP
> lookup per UDP packet, can we avoid it with some static key enabled by
> vxlan/fou/etc. ?
>
That's a good idea! Should just check udp_encap_needed. Also makes
sense to have in udp_gro_receive.

Tom

> Thanks,
>
> Paolo


* Re: [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload
  2017-08-30 14:50   ` Tom Herbert
@ 2017-08-31 10:11     ` Hannes Frederic Sowa
  0 siblings, 0 replies; 16+ messages in thread
From: Hannes Frederic Sowa @ 2017-08-31 10:11 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David S . Miller, Linux Kernel Network Developers

Hello,

Tom Herbert <tom@quantonium.net> writes:

> On Wed, Aug 30, 2017 at 1:41 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> Hello Tom,
>>
>> Tom Herbert <tom@quantonium.net> writes:
>>
>>> This patch set adds a new offload type to perform flow dissection for
>>> specific protocols (either by EtherType or by IP protocol). This is
>>> primary useful to crack open UDP encapsulations (like VXLAN, GUE) for
>>> the purposes of parsing the encapsulated packet.
>>>
>>> Items in this patch set:
>>> - Constify skb argument to UDP lookup functions
>>> - Create new protocol case in __skb_dissect for ETH_P_TEB. This is based
>>>   on the code in the GRE dissect function and the special handling in
>>>   GRE can now be removed (it sets protocol to ETH_P_TEB and returns so
>>>   goto proto_again is done)
>>> - Add infrastructure for protocol specific flow dissection offload
>>> - Add infrastructure to perform UDP flow dissection. Uses same model of
>>>   GRO where a flow_dissect callback can be associated with a UDP
>>>   socket
>>> - Use the infrastructure to support flow dissection of VXLAN and GUE
>>>
>>> Tested:
>>>
>>> Forced RPS to call flow dissection for VXLAN, FOU, and GUE. Observed
>>> that inner packet was being properly dissected.
>>>
>>> v2: Add signed off
>>
>> [...]
>>
>> Can you provide more context on why you did this series? Is the entropy
>> insufficient you receive via UDP source ports? I assume this is the case
>> for HW RSS hashing but actually not for the software dissector.
>>
> Hi Hannes,
>
> I think entropy is sufficient looking at UDP source ports, but there
> is not universal agreement on that. In any case there are now many
> other uses of flow dissector, for those that want DPI like getting TCP
> flags, UDP encapsulation is currently a blind spot.

Regarding entropy, Toeplitz seems to do a worse job of mixing it in
than the Jenkins hash used for flow dissection in software.

I have a number of things I don't understand yet and haven't wrapped my
head around:

* it seems you implemented boundless looping in the flow dissection.
  Anyone who knows the outer VXLAN tunnel parameters (dst-ip and port)
  can basically make your implementation loop until the packet data is
  exhausted. This is not good. This seriously needs to be limited to
  one layer above the tunnels. Never trust user input! It seems a user
  can even overwrite the VID in the flow keys while reparsing? (I got
  this only from looking at the code)

* MPLS/VPLS do encapsulate IP or Ethernet depending on the label but
  don't have representative sockets, so they would need other ways to
  query the inner content. Is this relevant to you?
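
The depth limit asked for in the first point can be sketched in
user-space C as below. This is only an illustration under stated
assumptions, not code from the series: the one-byte-per-layer packet
format, the toy_* names and the TOY_MAX_ENCAP_DEPTH budget are all
hypothetical. The point is that the loop carries an explicit counter and
refuses to re-enter encapsulation once the budget is spent, regardless
of what the packet claims.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical budget: at most one encapsulated layer is dissected. */
#define TOY_MAX_ENCAP_DEPTH 1

/* Hypothetical toy format: one tag byte per layer. */
#define TOY_TAG_PAYLOAD 0	/* innermost payload, dissection ends here */
#define TOY_TAG_TUNNEL  1	/* tunnel header, another layer follows */

enum toy_dissect_ret {
	TOY_DISSECT_OK,
	TOY_DISSECT_TOO_DEEP,
	TOY_DISSECT_BAD,
};

/*
 * Walk the layers of a toy packet. *depth_out receives the number of
 * tunnel layers entered when the result is OK or TOO_DEEP.
 */
enum toy_dissect_ret toy_dissect(const uint8_t *data, size_t len,
				 unsigned int *depth_out)
{
	unsigned int depth = 0;
	size_t off = 0;

	while (off < len) {
		if (data[off] == TOY_TAG_PAYLOAD) {
			*depth_out = depth;
			return TOY_DISSECT_OK;
		}
		if (data[off] != TOY_TAG_TUNNEL)
			return TOY_DISSECT_BAD;

		/* The packet asks for one more layer: check the budget first. */
		if (depth >= TOY_MAX_ENCAP_DEPTH) {
			*depth_out = depth;
			return TOY_DISSECT_TOO_DEEP;
		}
		depth++;
		off++;
	}

	/* Ran out of data before reaching a payload tag. */
	return TOY_DISSECT_BAD;
}
```

With a budget of 1, a packet carrying one tunnel layer dissects
normally, while a packet that nests a second tunnel is cut off with
TOY_DISSECT_TOO_DEEP instead of being reparsed again.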

>> Btw. we forbid hardware to use L4 information if IP_PROTO is UDP but we
>> allow it in RPS (not in IPv6 if flowlabel is present). Your series could
>> solve this problem by being more protocol specific and disallow
>> fragmentation on a particular quadtuple, very much the same like hw
>> encap offload, where we tell the specific port number to the hardware
>> and then disallow using L4 information for all other UDP protocols.
>>
> IMO the fact that HW is protocol specific and operates solely on ports
> is a problem (remember Less Is More...). It's better to be protocol
> generic and do the socket lookup in SW which no longer has atomic
> operations. Matching by bound socket tuple is more accurate than just
> a port. However, technically this solution still isn't 100% correct
> since it's possible that macvlan or ipvlan may intercede and steer
> packet to a namespace where the socket isn't valid.

Your implementation needs to do hierarchical socket lookups: check the
bound interface, traverse the stack to figure out the next stacked
interface, and use that for the next socket lookup.

I don't think this approach works, to be honest.

Bye,
Hannes


* Re: [PATCH v2 net-next 4/6] udp: flow dissector offload
  2017-08-30 10:36   ` Paolo Abeni
  2017-08-30 14:56     ` Tom Herbert
@ 2017-08-31 15:53     ` Willem de Bruijn
  1 sibling, 0 replies; 16+ messages in thread
From: Willem de Bruijn @ 2017-08-31 15:53 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Tom Herbert, David Miller, Network Development

On Wed, Aug 30, 2017 at 6:36 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> On Tue, 2017-08-29 at 16:27 -0700, Tom Herbert wrote:
>> Add support to perform UDP specific flow dissection. This is
>> primarily intended for dissecting encapsulated packets in UDP
>> encapsulation.
>>
>> This patch adds a flow_dissect offload for UDP4 and UDP6. The backend
>> function performs a socket lookup and calls the flow_dissect function
>> if a socket is found.
>>
>> Signed-off-by: Tom Herbert <tom@quantonium.net>
>> ---
>>  include/linux/udp.h      |  8 ++++++++
>>  include/net/udp.h        |  8 ++++++++
>>  include/net/udp_tunnel.h |  8 ++++++++
>>  net/ipv4/udp_offload.c   | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>  net/ipv4/udp_tunnel.c    |  1 +
>>  net/ipv6/udp_offload.c   | 13 +++++++++++++
>>  6 files changed, 83 insertions(+)
>>
>> diff --git a/include/linux/udp.h b/include/linux/udp.h
>> index eaea63bc79bb..2e90b189ef6a 100644
>> --- a/include/linux/udp.h
>> +++ b/include/linux/udp.h
>> @@ -79,6 +79,14 @@ struct udp_sock {
>>       int                     (*gro_complete)(struct sock *sk,
>>                                               struct sk_buff *skb,
>>                                               int nhoff);
>> +     /* Flow dissector function for a UDP socket */
>> +     enum flow_dissect_ret (*flow_dissect)(struct sock *sk,
>> +                     const struct sk_buff *skb,
>> +                     struct flow_dissector_key_control *key_control,
>> +                     struct flow_dissector *flow_dissector,
>> +                     void *target_container, void *data,
>> +                     __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
>> +                     int *p_hlen, unsigned int flags);
>>
>>       /* udp_recvmsg try to use this before splicing sk_receive_queue */
>>       struct sk_buff_head     reader_queue ____cacheline_aligned_in_smp;
>> diff --git a/include/net/udp.h b/include/net/udp.h
>> index f3d1de6f0983..499e4faf8b14 100644
>> --- a/include/net/udp.h
>> +++ b/include/net/udp.h
>> @@ -174,6 +174,14 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
>>                                struct udphdr *uh, udp_lookup_t lookup);
>>  int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup);
>>
>> +enum flow_dissect_ret udp_flow_dissect(const struct sk_buff *skb,
>> +                     udp_lookup_t lookup,
>> +                     struct flow_dissector_key_control *key_control,
>> +                     struct flow_dissector *flow_dissector,
>> +                     void *target_container, void *data,
>> +                     __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
>> +                     int *p_hlen, unsigned int flags);
>> +
>>  static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb)
>>  {
>>       struct udphdr *uh;
>> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
>> index 10cce0dd4450..b7102e0f41a9 100644
>> --- a/include/net/udp_tunnel.h
>> +++ b/include/net/udp_tunnel.h
>> @@ -69,6 +69,13 @@ typedef struct sk_buff **(*udp_tunnel_gro_receive_t)(struct sock *sk,
>>                                                    struct sk_buff *skb);
>>  typedef int (*udp_tunnel_gro_complete_t)(struct sock *sk, struct sk_buff *skb,
>>                                        int nhoff);
>> +typedef enum flow_dissect_ret (*udp_tunnel_flow_dissect_t)(struct sock *sk,
>> +                     const struct sk_buff *skb,
>> +                     struct flow_dissector_key_control *key_control,
>> +                     struct flow_dissector *flow_dissector,
>> +                     void *target_container, void *data,
>> +                     __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
>> +                     int *p_hlen, unsigned int flags);
>>
>>  struct udp_tunnel_sock_cfg {
>>       void *sk_user_data;     /* user data used by encap_rcv call back */
>> @@ -78,6 +85,7 @@ struct udp_tunnel_sock_cfg {
>>       udp_tunnel_encap_destroy_t encap_destroy;
>>       udp_tunnel_gro_receive_t gro_receive;
>>       udp_tunnel_gro_complete_t gro_complete;
>> +     udp_tunnel_flow_dissect_t flow_dissect;
>>  };
>>
>>  /* Setup the given (UDP) sock to receive UDP encapsulated packets */
>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
>> index 97658bfc1b58..7f0a7ed4a6f7 100644
>> --- a/net/ipv4/udp_offload.c
>> +++ b/net/ipv4/udp_offload.c
>> @@ -328,11 +328,56 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
>>       return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb);
>>  }
>>
>> +enum flow_dissect_ret udp_flow_dissect(const struct sk_buff *skb,
>> +                     udp_lookup_t lookup,
>> +                     struct flow_dissector_key_control *key_control,
>> +                     struct flow_dissector *flow_dissector,
>> +                     void *target_container, void *data,
>> +                     __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
>> +                     int *p_hlen, unsigned int flags)
>> +{
>> +     enum flow_dissect_ret ret = FLOW_DISSECT_RET_CONTINUE;
>> +     struct udphdr *uh, _uh;
>> +     struct sock *sk;
>> +
>> +     uh = __skb_header_pointer(skb, *p_nhoff, sizeof(_uh), data,
>> +                               *p_hlen, &_uh);
>> +     if (!uh)
>> +             return FLOW_DISSECT_RET_OUT_BAD;
>> +
>> +     rcu_read_lock();
>> +
>> +     sk = (*lookup)(skb, uh->source, uh->dest);
>> +
>> +     if (sk && udp_sk(sk)->flow_dissect)
>> +             ret = udp_sk(sk)->flow_dissect(sk, skb, key_control,
>> +                                            flow_dissector, target_container,
>> +                                            data, p_proto, p_ip_proto,
>> +                                            p_nhoff, p_hlen, flags);
>> +     rcu_read_unlock();
>> +
>> +     return ret;
>> +}
>> +EXPORT_SYMBOL(udp_flow_dissect);
>
> If I read the above correctly, this is going to add another full UDP
> lookup per UDP packet, can we avoid it with some static key enabled by
> vxlan/fou/etc. ?

That would also limit the exposure if a bug is discovered. The
vulnerability of this critical path also came up with the recent
flow dissector expansion for ipv6 neighbor discovery.

  http://patchwork.ozlabs.org/patch/722957/

Ideally, flow dissector support for less common protocols can be
disabled administratively.

That said, the parser code looks great to me.
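
The gating idea discussed above can be sketched in user-space C as
below. This is an illustration only, under stated assumptions: the toy_*
names are hypothetical, the lookup is a stand-in that merely matches the
IANA VXLAN port, and a C11 atomic counter plays the role that a static
key (DEFINE_STATIC_KEY_FALSE plus static_branch_unlikely) would
presumably play in the kernel. Tunnel drivers bump the counter when they
install a flow_dissect callback, and the per-packet path skips the
socket lookup entirely while the counter is zero.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Number of users (vxlan, fou, ...) that registered a dissect callback. */
atomic_int toy_dissect_users;

/* Instrumentation for the sketch: how often the costly lookup ran. */
unsigned long toy_lookups;

/* Called by a hypothetical tunnel driver when it sets up its socket. */
void toy_dissect_enable(void)
{
	atomic_fetch_add(&toy_dissect_users, 1);
}

/* Called when that socket goes away. */
void toy_dissect_disable(void)
{
	atomic_fetch_sub(&toy_dissect_users, 1);
}

/* Stand-in for the full UDP socket lookup; 4789 is the VXLAN port. */
bool toy_socket_lookup(unsigned short dport)
{
	toy_lookups++;
	return dport == 4789;
}

/* Per-packet path: returns true if a dissect-capable socket was found. */
bool toy_udp_flow_dissect(unsigned short dport)
{
	/* Cheap check first: nobody registered, so no lookup at all. */
	if (atomic_load_explicit(&toy_dissect_users,
				 memory_order_relaxed) == 0)
		return false;

	return toy_socket_lookup(dport);
}
```

The same counter also gives an obvious place to hang an administrative
off switch: if enabling is refused, the lookup path is never reached.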


end of thread, other threads:[~2017-08-31 15:54 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
2017-08-29 23:27 [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload Tom Herbert
2017-08-29 23:27 ` [PATCH v2 net-next 1/6] flow_dissector: Move ETH_P_TEB processing to main switch Tom Herbert
2017-08-29 23:27 ` [PATCH v2 net-next 2/6] udp: Constify skb argument in lookup functions Tom Herbert
2017-08-30  0:58   ` David Miller
2017-08-30  3:09     ` Tom Herbert
2017-08-29 23:27 ` [PATCH v2 net-next 3/6] flow_dissector: Add protocol specific flow dissection offload Tom Herbert
2017-08-30  1:00   ` David Miller
2017-08-29 23:27 ` [PATCH v2 net-next 4/6] udp: flow dissector offload Tom Herbert
2017-08-30 10:36   ` Paolo Abeni
2017-08-30 14:56     ` Tom Herbert
2017-08-31 15:53     ` Willem de Bruijn
2017-08-29 23:27 ` [PATCH v2 net-next 5/6] fou: Support flow dissection Tom Herbert
2017-08-29 23:27 ` [PATCH v2 net-next 6/6] vxlan: support flow dissect Tom Herbert
2017-08-30  8:41 ` [PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload Hannes Frederic Sowa
2017-08-30 14:50   ` Tom Herbert
2017-08-31 10:11     ` Hannes Frederic Sowa
