netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 00/16] GTP: flow based
@ 2021-01-23 19:59 Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 01/16] Revert "GTP: add support for flow based tunneling API" Jonas Bonn
                   ` (16 more replies)
  0 siblings, 17 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

This series begins by reverting the recently added patch adding support
for GTP with lightweight tunnels.  That patch was added without getting
any ACK from the maintainers and has several issues, as discussed on the
mailing list.

In order to try to make this as painless as possible, I have reworked
Pravin's patch into a series that is, hopefully, a bit more reviewable.
That series is rebased onto a series of other changes that constitute
cleanup work necessary for on-going work to IPv6 support into the
driver.  The IPv6 work should be rebaseable onto top of this series
later on.

I did try to do this other way around:  rebasing the IPv6 series on top
of Pravin's patch.  Given that Pravin's patch contained about 200 lines
of superfluous changes that would have had to be reverted in the process
of realigning the patch series, things got ugly pretty quickly.  The end
result would not have been pretty.

So the result of this is that Pravin's patch is now mostly still in
place.  I've reworked some small bits in order to simplify things.  My
expectation is that Pravin will review and test his bits here.  In
particular, the patch adding GTP control headers needs a bit of
explanation.

This is still an RFC only because I'm not quite convinced that I'm done
with this.  I do want to get this onto the list quickly, though, since
this has implications for the next merge window.  So let's see if we can
sort this out to everyone's satisfaction.

/Jonas

Jonas Bonn (13):
  Revert "GTP: add support for flow based tunneling API"
  gtp: set initial MTU
  gtp: include role in link info
  gtp: really check namespaces before xmit
  gtp: drop unnecessary call to skb_dst_drop
  gtp: set device type
  gtp: rework IPv4 functionality
  gtp: set dev features to enable GSO
  gtp: support GRO
  gtp: refactor check_ms back into version specific handlers
  gtp: drop duplicated assignment
  gtp: update rx_length_errors for abnormally short packets
  gtp: set skb protocol after pulling headers

Pravin B Shelar (3):
  gtp: add support for flow based tunneling
  gtp: add ability to send GTP controls headers
  gtp: add netlink support for setting up flow based tunnels

 drivers/net/gtp.c | 609 +++++++++++++++++++++++++++++-----------------
 1 file changed, 380 insertions(+), 229 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH 01/16] Revert "GTP: add support for flow based tunneling API"
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-24 16:34   ` Harald Welte
  2021-01-23 19:59 ` [RFC PATCH 02/16] gtp: set initial MTU Jonas Bonn
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

This reverts commit 9ab7e76aefc97a9aa664accb59d6e8dc5e52514a.

This patch was committed without maintainer approval and despite a number
of unaddressed concerns from review.  There are several issues that
impede the acceptance of this patch and that make a reversion of this
particular instance of these changes the best way forward:

i)  the patch contains several logically separate changes that would be
better served as smaller patches (for review purposes)
ii) functionality like the handling of end markers has been introduced
without further explanation
iii) symmetry between the handling of GTPv0 and GTPv1 has been
unnecessarily broken
iv) there are no available userspace tools to allow for testing this
functionality
v) there is an unaddressed Coverity report against the patch concering
memory leakage
vi) most importantly, the patch contains a large amount of superfluous
churn that impedes other ongoing work with this driver

This patch will be reworked into a series that aligns with other
ongoing work and facilitates review.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
---
 drivers/net/gtp.c                  | 527 ++++++++---------------------
 include/uapi/linux/gtp.h           |  12 -
 include/uapi/linux/if_link.h       |   1 -
 include/uapi/linux/if_tunnel.h     |   1 -
 tools/include/uapi/linux/if_link.h |   1 -
 5 files changed, 144 insertions(+), 398 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 851364314ecc..4c04e271f184 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -21,7 +21,6 @@
 #include <linux/file.h>
 #include <linux/gtp.h>
 
-#include <net/dst_metadata.h>
 #include <net/net_namespace.h>
 #include <net/protocol.h>
 #include <net/ip.h>
@@ -74,9 +73,6 @@ struct gtp_dev {
 	unsigned int		hash_size;
 	struct hlist_head	*tid_hash;
 	struct hlist_head	*addr_hash;
-	/* Used by LWT tunnel. */
-	bool			collect_md;
-	struct socket		*collect_md_sock;
 };
 
 static unsigned int gtp_net_id __read_mostly;
@@ -183,121 +179,33 @@ static bool gtp_check_ms(struct sk_buff *skb, struct pdp_ctx *pctx,
 	return false;
 }
 
-static int gtp_set_tun_dst(struct gtp_dev *gtp, struct sk_buff *skb,
-			   unsigned int hdrlen, u8 gtp_version,
-			   __be64 tid, u8 flags)
+static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
+			unsigned int hdrlen, unsigned int role)
 {
-	struct metadata_dst *tun_dst;
-	int opts_len = 0;
-
-	if (unlikely(flags & GTP1_F_MASK))
-		opts_len = sizeof(struct gtpu_metadata);
-
-	tun_dst = udp_tun_rx_dst(skb, gtp->sk1u->sk_family, TUNNEL_KEY, tid, opts_len);
-	if (!tun_dst) {
-		netdev_dbg(gtp->dev, "Failed to allocate tun_dst");
-		goto err;
+	if (!gtp_check_ms(skb, pctx, hdrlen, role)) {
+		netdev_dbg(pctx->dev, "No PDP ctx for this MS\n");
+		return 1;
 	}
 
-	netdev_dbg(gtp->dev, "attaching metadata_dst to skb, gtp ver %d hdrlen %d\n",
-		   gtp_version, hdrlen);
-	if (unlikely(opts_len)) {
-		struct gtpu_metadata *opts;
-		struct gtp1_header *gtp1;
-
-		opts = ip_tunnel_info_opts(&tun_dst->u.tun_info);
-		gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
-		opts->ver = GTP_METADATA_V1;
-		opts->flags = gtp1->flags;
-		opts->type = gtp1->type;
-		netdev_dbg(gtp->dev, "recved control pkt: flag %x type: %d\n",
-			   opts->flags, opts->type);
-		tun_dst->u.tun_info.key.tun_flags |= TUNNEL_GTPU_OPT;
-		tun_dst->u.tun_info.options_len = opts_len;
-		skb->protocol = htons(0xffff);         /* Unknown */
-	}
 	/* Get rid of the GTP + UDP headers. */
 	if (iptunnel_pull_header(skb, hdrlen, skb->protocol,
-				 !net_eq(sock_net(gtp->sk1u), dev_net(gtp->dev)))) {
-		gtp->dev->stats.rx_length_errors++;
-		goto err;
-	}
-
-	skb_dst_set(skb, &tun_dst->dst);
-	return 0;
-err:
-	return -1;
-}
-
-static int gtp_rx(struct gtp_dev *gtp, struct sk_buff *skb,
-		  unsigned int hdrlen, u8 gtp_version, unsigned int role,
-		  __be64 tid, u8 flags, u8 type)
-{
-	if (ip_tunnel_collect_metadata() || gtp->collect_md) {
-		int err;
-
-		err = gtp_set_tun_dst(gtp, skb, hdrlen, gtp_version, tid, flags);
-		if (err)
-			goto err;
-	} else {
-		struct pdp_ctx *pctx;
-
-		if (flags & GTP1_F_MASK)
-			hdrlen += 4;
-
-		if (type != GTP_TPDU)
-			return 1;
+				 !net_eq(sock_net(pctx->sk), dev_net(pctx->dev))))
+		return -1;
 
-		if (gtp_version == GTP_V0)
-			pctx = gtp0_pdp_find(gtp, be64_to_cpu(tid));
-		else
-			pctx = gtp1_pdp_find(gtp, be64_to_cpu(tid));
-		if (!pctx) {
-			netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", skb);
-			return 1;
-		}
-
-		if (!gtp_check_ms(skb, pctx, hdrlen, role)) {
-			netdev_dbg(pctx->dev, "No PDP ctx for this MS\n");
-			return 1;
-		}
-		/* Get rid of the GTP + UDP headers. */
-		if (iptunnel_pull_header(skb, hdrlen, skb->protocol,
-					 !net_eq(sock_net(pctx->sk), dev_net(gtp->dev)))) {
-			gtp->dev->stats.rx_length_errors++;
-			goto err;
-		}
-	}
-	netdev_dbg(gtp->dev, "forwarding packet from GGSN to uplink\n");
+	netdev_dbg(pctx->dev, "forwarding packet from GGSN to uplink\n");
 
 	/* Now that the UDP and the GTP header have been removed, set up the
 	 * new network header. This is required by the upper layer to
 	 * calculate the transport header.
 	 */
 	skb_reset_network_header(skb);
-	if (pskb_may_pull(skb, sizeof(struct iphdr))) {
-		struct iphdr *iph;
-
-		iph = ip_hdr(skb);
-		if (iph->version == 4) {
-			netdev_dbg(gtp->dev, "inner pkt: ipv4");
-			skb->protocol = htons(ETH_P_IP);
-		} else if (iph->version == 6) {
-			netdev_dbg(gtp->dev, "inner pkt: ipv6");
-			skb->protocol = htons(ETH_P_IPV6);
-		} else {
-			netdev_dbg(gtp->dev, "inner pkt error: Unknown type");
-		}
-	}
 
-	skb->dev = gtp->dev;
-	dev_sw_netstats_rx_add(gtp->dev, skb->len);
+	skb->dev = pctx->dev;
+
+	dev_sw_netstats_rx_add(pctx->dev, skb->len);
+
 	netif_rx(skb);
 	return 0;
-
-err:
-	gtp->dev->stats.rx_dropped++;
-	return -1;
 }
 
 /* 1 means pass up to the stack, -1 means drop and 0 means decapsulated. */
@@ -306,6 +214,7 @@ static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
 	unsigned int hdrlen = sizeof(struct udphdr) +
 			      sizeof(struct gtp0_header);
 	struct gtp0_header *gtp0;
+	struct pdp_ctx *pctx;
 
 	if (!pskb_may_pull(skb, hdrlen))
 		return -1;
@@ -315,7 +224,16 @@ static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
 	if ((gtp0->flags >> 5) != GTP_V0)
 		return 1;
 
-	return gtp_rx(gtp, skb, hdrlen, GTP_V0, gtp->role, gtp0->tid, gtp0->flags, gtp0->type);
+	if (gtp0->type != GTP_TPDU)
+		return 1;
+
+	pctx = gtp0_pdp_find(gtp, be64_to_cpu(gtp0->tid));
+	if (!pctx) {
+		netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", skb);
+		return 1;
+	}
+
+	return gtp_rx(pctx, skb, hdrlen, gtp->role);
 }
 
 static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
@@ -323,30 +241,41 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
 	unsigned int hdrlen = sizeof(struct udphdr) +
 			      sizeof(struct gtp1_header);
 	struct gtp1_header *gtp1;
+	struct pdp_ctx *pctx;
 
 	if (!pskb_may_pull(skb, hdrlen))
 		return -1;
 
 	gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
 
-	netdev_dbg(gtp->dev, "GTPv1 recv: flags %x\n", gtp1->flags);
 	if ((gtp1->flags >> 5) != GTP_V1)
 		return 1;
 
+	if (gtp1->type != GTP_TPDU)
+		return 1;
+
 	/* From 29.060: "This field shall be present if and only if any one or
 	 * more of the S, PN and E flags are set.".
 	 *
 	 * If any of the bit is set, then the remaining ones also have to be
 	 * set.
 	 */
+	if (gtp1->flags & GTP1_F_MASK)
+		hdrlen += 4;
+
 	/* Make sure the header is larger enough, including extensions. */
 	if (!pskb_may_pull(skb, hdrlen))
 		return -1;
 
 	gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
 
-	return gtp_rx(gtp, skb, hdrlen, GTP_V1, gtp->role,
-		      key32_to_tunnel_id(gtp1->tid), gtp1->flags, gtp1->type);
+	pctx = gtp1_pdp_find(gtp, ntohl(gtp1->tid));
+	if (!pctx) {
+		netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", skb);
+		return 1;
+	}
+
+	return gtp_rx(pctx, skb, hdrlen, gtp->role);
 }
 
 static void __gtp_encap_destroy(struct sock *sk)
@@ -386,11 +315,6 @@ static void gtp_encap_disable(struct gtp_dev *gtp)
 {
 	gtp_encap_disable_sock(gtp->sk0);
 	gtp_encap_disable_sock(gtp->sk1u);
-	if (gtp->collect_md_sock) {
-		udp_tunnel_sock_release(gtp->collect_md_sock);
-		gtp->collect_md_sock = NULL;
-		netdev_dbg(gtp->dev, "GTP socket released.\n");
-	}
 }
 
 /* UDP encapsulation receive handler. See net/ipv4/udp.c.
@@ -405,8 +329,7 @@ static int gtp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (!gtp)
 		return 1;
 
-	netdev_dbg(gtp->dev, "encap_recv sk=%p type %d\n",
-		   sk, udp_sk(sk)->encap_type);
+	netdev_dbg(gtp->dev, "encap_recv sk=%p\n", sk);
 
 	switch (udp_sk(sk)->encap_type) {
 	case UDP_ENCAP_GTP0:
@@ -460,13 +383,12 @@ static void gtp_dev_uninit(struct net_device *dev)
 
 static struct rtable *ip4_route_output_gtp(struct flowi4 *fl4,
 					   const struct sock *sk,
-					   __be32 daddr,
-					   __be32 saddr)
+					   __be32 daddr)
 {
 	memset(fl4, 0, sizeof(*fl4));
 	fl4->flowi4_oif		= sk->sk_bound_dev_if;
 	fl4->daddr		= daddr;
-	fl4->saddr		= saddr;
+	fl4->saddr		= inet_sk(sk)->inet_saddr;
 	fl4->flowi4_tos		= RT_CONN_FLAGS(sk);
 	fl4->flowi4_proto	= sk->sk_protocol;
 
@@ -490,7 +412,7 @@ static inline void gtp0_push_header(struct sk_buff *skb, struct pdp_ctx *pctx)
 	gtp0->tid	= cpu_to_be64(pctx->u.v0.tid);
 }
 
-static inline void gtp1_push_header(struct sk_buff *skb, __be32 tid)
+static inline void gtp1_push_header(struct sk_buff *skb, struct pdp_ctx *pctx)
 {
 	int payload_len = skb->len;
 	struct gtp1_header *gtp1;
@@ -506,63 +428,46 @@ static inline void gtp1_push_header(struct sk_buff *skb, __be32 tid)
 	gtp1->flags	= 0x30; /* v1, GTP-non-prime. */
 	gtp1->type	= GTP_TPDU;
 	gtp1->length	= htons(payload_len);
-	gtp1->tid	= tid;
+	gtp1->tid	= htonl(pctx->u.v1.o_tei);
 
 	/* TODO: Suppport for extension header, sequence number and N-PDU.
 	 *	 Update the length field if any of them is available.
 	 */
 }
 
-static inline int gtp1_push_control_header(struct sk_buff *skb,
-					   __be32 tid,
-					   struct gtpu_metadata *opts,
-					   struct net_device *dev)
-{
-	struct gtp1_header *gtp1c;
-	int payload_len;
-
-	if (opts->ver != GTP_METADATA_V1)
-		return -ENOENT;
+struct gtp_pktinfo {
+	struct sock		*sk;
+	struct iphdr		*iph;
+	struct flowi4		fl4;
+	struct rtable		*rt;
+	struct pdp_ctx		*pctx;
+	struct net_device	*dev;
+	__be16			gtph_port;
+};
 
-	if (opts->type == 0xFE) {
-		/* for end marker ignore skb data. */
-		netdev_dbg(dev, "xmit pkt with null data");
-		pskb_trim(skb, 0);
+static void gtp_push_header(struct sk_buff *skb, struct gtp_pktinfo *pktinfo)
+{
+	switch (pktinfo->pctx->gtp_version) {
+	case GTP_V0:
+		pktinfo->gtph_port = htons(GTP0_PORT);
+		gtp0_push_header(skb, pktinfo->pctx);
+		break;
+	case GTP_V1:
+		pktinfo->gtph_port = htons(GTP1U_PORT);
+		gtp1_push_header(skb, pktinfo->pctx);
+		break;
 	}
-	if (skb_cow_head(skb, sizeof(*gtp1c)) < 0)
-		return -ENOMEM;
-
-	payload_len = skb->len;
-
-	gtp1c = skb_push(skb, sizeof(*gtp1c));
-
-	gtp1c->flags	= opts->flags;
-	gtp1c->type	= opts->type;
-	gtp1c->length	= htons(payload_len);
-	gtp1c->tid	= tid;
-	netdev_dbg(dev, "xmit control pkt: ver %d flags %x type %x pkt len %d tid %x",
-		   opts->ver, opts->flags, opts->type, skb->len, tid);
-	return 0;
 }
 
-struct gtp_pktinfo {
-	struct sock             *sk;
-	__u8                    tos;
-	struct flowi4           fl4;
-	struct rtable           *rt;
-	struct net_device       *dev;
-	__be16                  gtph_port;
-};
-
 static inline void gtp_set_pktinfo_ipv4(struct gtp_pktinfo *pktinfo,
-					struct sock *sk,
-					__u8 tos,
-					struct rtable *rt,
+					struct sock *sk, struct iphdr *iph,
+					struct pdp_ctx *pctx, struct rtable *rt,
 					struct flowi4 *fl4,
 					struct net_device *dev)
 {
 	pktinfo->sk	= sk;
-	pktinfo->tos    = tos;
+	pktinfo->iph	= iph;
+	pktinfo->pctx	= pctx;
 	pktinfo->rt	= rt;
 	pktinfo->fl4	= *fl4;
 	pktinfo->dev	= dev;
@@ -572,99 +477,40 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 			     struct gtp_pktinfo *pktinfo)
 {
 	struct gtp_dev *gtp = netdev_priv(dev);
-	struct gtpu_metadata *opts = NULL;
-	struct sock *sk = NULL;
 	struct pdp_ctx *pctx;
 	struct rtable *rt;
 	struct flowi4 fl4;
-	u8 gtp_version;
-	__be16 df = 0;
-	__be32 tun_id;
-	__be32 daddr;
-	__be32 saddr;
-	__u8 tos;
+	struct iphdr *iph;
+	__be16 df;
 	int mtu;
 
-	if (gtp->collect_md) {
-		/* LWT GTP1U encap */
-		struct ip_tunnel_info *info = NULL;
-
-		info = skb_tunnel_info(skb);
-		if (!info) {
-			netdev_dbg(dev, "missing tunnel info");
-			return -ENOENT;
-		}
-		if (info->key.tp_dst && ntohs(info->key.tp_dst) != GTP1U_PORT) {
-			netdev_dbg(dev, "unexpected GTP dst port: %d", ntohs(info->key.tp_dst));
-			return -EOPNOTSUPP;
-		}
-		pctx = NULL;
-		gtp_version = GTP_V1;
-		tun_id = tunnel_id_to_key32(info->key.tun_id);
-		daddr = info->key.u.ipv4.dst;
-		saddr = info->key.u.ipv4.src;
-		sk = gtp->sk1u;
-		if (!sk) {
-			netdev_dbg(dev, "missing tunnel sock");
-			return -EOPNOTSUPP;
-		}
-		tos = info->key.tos;
-		if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT)
-			df = htons(IP_DF);
-
-		if (info->options_len != 0) {
-			if (info->key.tun_flags & TUNNEL_GTPU_OPT) {
-				opts = ip_tunnel_info_opts(info);
-			} else {
-				netdev_dbg(dev, "missing tunnel metadata for control pkt");
-				return -EOPNOTSUPP;
-			}
-		}
-		netdev_dbg(dev, "flow-based GTP1U encap: tunnel id %d\n",
-			   be32_to_cpu(tun_id));
-	} else {
-		struct iphdr *iph;
-
-		if (ntohs(skb->protocol) != ETH_P_IP)
-			return -EOPNOTSUPP;
-
-		iph = ip_hdr(skb);
-
-		/* Read the IP destination address and resolve the PDP context.
-		 * Prepend PDP header with TEI/TID from PDP ctx.
-		 */
-		if (gtp->role == GTP_ROLE_SGSN)
-			pctx = ipv4_pdp_find(gtp, iph->saddr);
-		else
-			pctx = ipv4_pdp_find(gtp, iph->daddr);
+	/* Read the IP destination address and resolve the PDP context.
+	 * Prepend PDP header with TEI/TID from PDP ctx.
+	 */
+	iph = ip_hdr(skb);
+	if (gtp->role == GTP_ROLE_SGSN)
+		pctx = ipv4_pdp_find(gtp, iph->saddr);
+	else
+		pctx = ipv4_pdp_find(gtp, iph->daddr);
 
-		if (!pctx) {
-			netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
-				   &iph->daddr);
-			return -ENOENT;
-		}
-		sk = pctx->sk;
-		netdev_dbg(dev, "found PDP context %p\n", pctx);
-
-		gtp_version = pctx->gtp_version;
-		tun_id  = htonl(pctx->u.v1.o_tei);
-		daddr = pctx->peer_addr_ip4.s_addr;
-		saddr = inet_sk(sk)->inet_saddr;
-		tos = iph->tos;
-		df = iph->frag_off;
-		netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
-			   &iph->saddr, &iph->daddr);
+	if (!pctx) {
+		netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
+			   &iph->daddr);
+		return -ENOENT;
 	}
+	netdev_dbg(dev, "found PDP context %p\n", pctx);
 
-	rt = ip4_route_output_gtp(&fl4, sk, daddr, saddr);
+	rt = ip4_route_output_gtp(&fl4, pctx->sk, pctx->peer_addr_ip4.s_addr);
 	if (IS_ERR(rt)) {
-		netdev_dbg(dev, "no route to SSGN %pI4\n", &daddr);
+		netdev_dbg(dev, "no route to SSGN %pI4\n",
+			   &pctx->peer_addr_ip4.s_addr);
 		dev->stats.tx_carrier_errors++;
 		goto err;
 	}
 
 	if (rt->dst.dev == dev) {
-		netdev_dbg(dev, "circular route to SSGN %pI4\n", &daddr);
+		netdev_dbg(dev, "circular route to SSGN %pI4\n",
+			   &pctx->peer_addr_ip4.s_addr);
 		dev->stats.collisions++;
 		goto err_rt;
 	}
@@ -672,10 +518,11 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 	skb_dst_drop(skb);
 
 	/* This is similar to tnl_update_pmtu(). */
+	df = iph->frag_off;
 	if (df) {
 		mtu = dst_mtu(&rt->dst) - dev->hard_header_len -
 			sizeof(struct iphdr) - sizeof(struct udphdr);
-		switch (gtp_version) {
+		switch (pctx->gtp_version) {
 		case GTP_V0:
 			mtu -= sizeof(struct gtp0_header);
 			break;
@@ -689,38 +536,17 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 
 	rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu, false);
 
-	if (!skb_is_gso(skb) && (df & htons(IP_DF)) && mtu < skb->len) {
-		netdev_dbg(dev, "packet too big, fragmentation needed");
+	if (!skb_is_gso(skb) && (iph->frag_off & htons(IP_DF)) &&
+	    mtu < ntohs(iph->tot_len)) {
+		netdev_dbg(dev, "packet too big, fragmentation needed\n");
 		memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
 		icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
 			      htonl(mtu));
 		goto err_rt;
 	}
 
-	gtp_set_pktinfo_ipv4(pktinfo, sk, tos, rt, &fl4, dev);
-
-	if (unlikely(opts)) {
-		int err;
-
-		pktinfo->gtph_port = htons(GTP1U_PORT);
-		err = gtp1_push_control_header(skb, tun_id, opts, dev);
-		if (err) {
-			netdev_info(dev, "cntr pkt error %d", err);
-			goto err_rt;
-		}
-		return 0;
-	}
-
-	switch (gtp_version) {
-	case GTP_V0:
-		pktinfo->gtph_port = htons(GTP0_PORT);
-		gtp0_push_header(skb, pctx);
-		break;
-	case GTP_V1:
-		pktinfo->gtph_port = htons(GTP1U_PORT);
-		gtp1_push_header(skb, tun_id);
-		break;
-	}
+	gtp_set_pktinfo_ipv4(pktinfo, pctx->sk, iph, pctx, rt, &fl4, dev);
+	gtp_push_header(skb, pktinfo);
 
 	return 0;
 err_rt:
@@ -731,6 +557,7 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 
 static netdev_tx_t gtp_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 {
+	unsigned int proto = ntohs(skb->protocol);
 	struct gtp_pktinfo pktinfo;
 	int err;
 
@@ -742,22 +569,32 @@ static netdev_tx_t gtp_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	/* PDP context lookups in gtp_build_skb_*() need rcu read-side lock. */
 	rcu_read_lock();
-	err = gtp_build_skb_ip4(skb, dev, &pktinfo);
+	switch (proto) {
+	case ETH_P_IP:
+		err = gtp_build_skb_ip4(skb, dev, &pktinfo);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
 	rcu_read_unlock();
 
 	if (err < 0)
 		goto tx_err;
 
-	udp_tunnel_xmit_skb(pktinfo.rt, pktinfo.sk, skb,
-			    pktinfo.fl4.saddr,
-			    pktinfo.fl4.daddr,
-			    pktinfo.tos,
-			    ip4_dst_hoplimit(&pktinfo.rt->dst),
-			    0,
-			    pktinfo.gtph_port,
-			    pktinfo.gtph_port,
-			    true,
-			    false);
+	switch (proto) {
+	case ETH_P_IP:
+		netdev_dbg(pktinfo.dev, "gtp -> IP src: %pI4 dst: %pI4\n",
+			   &pktinfo.iph->saddr, &pktinfo.iph->daddr);
+		udp_tunnel_xmit_skb(pktinfo.rt, pktinfo.sk, skb,
+				    pktinfo.fl4.saddr, pktinfo.fl4.daddr,
+				    pktinfo.iph->tos,
+				    ip4_dst_hoplimit(&pktinfo.rt->dst),
+				    0,
+				    pktinfo.gtph_port, pktinfo.gtph_port,
+				    true, false);
+		break;
+	}
 
 	return NETDEV_TX_OK;
 tx_err:
@@ -773,19 +610,6 @@ static const struct net_device_ops gtp_netdev_ops = {
 	.ndo_get_stats64	= dev_get_tstats64,
 };
 
-static struct gtp_dev *gtp_find_flow_based_dev(struct net *net)
-{
-	struct gtp_net *gn = net_generic(net, gtp_net_id);
-	struct gtp_dev *gtp;
-
-	list_for_each_entry(gtp, &gn->gtp_dev_list, list) {
-		if (gtp->collect_md)
-			return gtp;
-	}
-
-	return NULL;
-}
-
 static void gtp_link_setup(struct net_device *dev)
 {
 	dev->netdev_ops		= &gtp_netdev_ops;
@@ -810,7 +634,7 @@ static void gtp_link_setup(struct net_device *dev)
 }
 
 static int gtp_hashtable_new(struct gtp_dev *gtp, int hsize);
-static int gtp_encap_enable(struct gtp_dev *gtp, struct net_device *dev, struct nlattr *data[]);
+static int gtp_encap_enable(struct gtp_dev *gtp, struct nlattr *data[]);
 
 static void gtp_destructor(struct net_device *dev)
 {
@@ -828,24 +652,11 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev,
 	struct gtp_net *gn;
 	int hashsize, err;
 
-	if (!data[IFLA_GTP_FD0] && !data[IFLA_GTP_FD1] &&
-	    !data[IFLA_GTP_COLLECT_METADATA])
+	if (!data[IFLA_GTP_FD0] && !data[IFLA_GTP_FD1])
 		return -EINVAL;
 
 	gtp = netdev_priv(dev);
 
-	if (data[IFLA_GTP_COLLECT_METADATA]) {
-		if (data[IFLA_GTP_FD0]) {
-			netdev_dbg(dev, "LWT device does not support setting v0 socket");
-			return -EINVAL;
-		}
-		if (gtp_find_flow_based_dev(src_net)) {
-			netdev_dbg(dev, "LWT device already exist");
-			return -EBUSY;
-		}
-		gtp->collect_md = true;
-	}
-
 	if (!data[IFLA_GTP_PDP_HASHSIZE]) {
 		hashsize = 1024;
 	} else {
@@ -858,7 +669,7 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev,
 	if (err < 0)
 		return err;
 
-	err = gtp_encap_enable(gtp, dev, data);
+	err = gtp_encap_enable(gtp, data);
 	if (err < 0)
 		goto out_hashtable;
 
@@ -872,7 +683,7 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev,
 	list_add_rcu(&gtp->list, &gn->gtp_dev_list);
 	dev->priv_destructor = gtp_destructor;
 
-	netdev_dbg(dev, "registered new GTP interface %s\n", dev->name);
+	netdev_dbg(dev, "registered new GTP interface\n");
 
 	return 0;
 
@@ -903,7 +714,6 @@ static const struct nla_policy gtp_policy[IFLA_GTP_MAX + 1] = {
 	[IFLA_GTP_FD1]			= { .type = NLA_U32 },
 	[IFLA_GTP_PDP_HASHSIZE]		= { .type = NLA_U32 },
 	[IFLA_GTP_ROLE]			= { .type = NLA_U32 },
-	[IFLA_GTP_COLLECT_METADATA]	= { .type = NLA_FLAG },
 };
 
 static int gtp_validate(struct nlattr *tb[], struct nlattr *data[],
@@ -927,9 +737,6 @@ static int gtp_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	if (nla_put_u32(skb, IFLA_GTP_PDP_HASHSIZE, gtp->hash_size))
 		goto nla_put_failure;
 
-	if (gtp->collect_md && nla_put_flag(skb, IFLA_GTP_COLLECT_METADATA))
-		goto nla_put_failure;
-
 	return 0;
 
 nla_put_failure:
@@ -975,24 +782,35 @@ static int gtp_hashtable_new(struct gtp_dev *gtp, int hsize)
 	return -ENOMEM;
 }
 
-static int __gtp_encap_enable_socket(struct socket *sock, int type,
-				     struct gtp_dev *gtp)
+static struct sock *gtp_encap_enable_socket(int fd, int type,
+					    struct gtp_dev *gtp)
 {
 	struct udp_tunnel_sock_cfg tuncfg = {NULL};
+	struct socket *sock;
 	struct sock *sk;
+	int err;
+
+	pr_debug("enable gtp on %d, %d\n", fd, type);
+
+	sock = sockfd_lookup(fd, &err);
+	if (!sock) {
+		pr_debug("gtp socket fd=%d not found\n", fd);
+		return NULL;
+	}
 
 	sk = sock->sk;
 	if (sk->sk_protocol != IPPROTO_UDP ||
 	    sk->sk_type != SOCK_DGRAM ||
 	    (sk->sk_family != AF_INET && sk->sk_family != AF_INET6)) {
-		pr_debug("socket not UDP\n");
-		return -EINVAL;
+		pr_debug("socket fd=%d not UDP\n", fd);
+		sk = ERR_PTR(-EINVAL);
+		goto out_sock;
 	}
 
 	lock_sock(sk);
 	if (sk->sk_user_data) {
-		release_sock(sock->sk);
-		return -EBUSY;
+		sk = ERR_PTR(-EBUSY);
+		goto out_rel_sock;
 	}
 
 	sock_hold(sk);
@@ -1003,58 +821,15 @@ static int __gtp_encap_enable_socket(struct socket *sock, int type,
 	tuncfg.encap_destroy = gtp_encap_destroy;
 
 	setup_udp_tunnel_sock(sock_net(sock->sk), sock, &tuncfg);
-	release_sock(sock->sk);
-	return 0;
-}
 
-static struct sock *gtp_encap_enable_socket(int fd, int type,
-					    struct gtp_dev *gtp)
-{
-	struct socket *sock;
-	int err;
-
-	pr_debug("enable gtp on %d, %d\n", fd, type);
-
-	sock = sockfd_lookup(fd, &err);
-	if (!sock) {
-		pr_debug("gtp socket fd=%d not found\n", fd);
-		return NULL;
-	}
-	err =  __gtp_encap_enable_socket(sock, type, gtp);
+out_rel_sock:
+	release_sock(sock->sk);
+out_sock:
 	sockfd_put(sock);
-	if (err)
-		return ERR_PTR(err);
-
-	return sock->sk;
+	return sk;
 }
 
-static struct socket *gtp_create_gtp_socket(struct gtp_dev *gtp, struct net_device *dev)
-{
-	struct udp_port_cfg udp_conf;
-	struct socket *sock;
-	int err;
-
-	memset(&udp_conf, 0, sizeof(udp_conf));
-	udp_conf.family = AF_INET;
-	udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
-	udp_conf.local_udp_port = htons(GTP1U_PORT);
-
-	err = udp_sock_create(dev_net(dev), &udp_conf, &sock);
-	if (err < 0) {
-		pr_debug("create gtp sock failed: %d\n", err);
-		return ERR_PTR(err);
-	}
-	err = __gtp_encap_enable_socket(sock, UDP_ENCAP_GTP1U, gtp);
-	if (err) {
-		pr_debug("enable gtp sock encap failed: %d\n", err);
-		udp_tunnel_sock_release(sock);
-		return ERR_PTR(err);
-	}
-	pr_debug("create gtp sock done\n");
-	return sock;
-}
-
-static int gtp_encap_enable(struct gtp_dev *gtp, struct net_device *dev, struct nlattr *data[])
+static int gtp_encap_enable(struct gtp_dev *gtp, struct nlattr *data[])
 {
 	struct sock *sk1u = NULL;
 	struct sock *sk0 = NULL;
@@ -1078,25 +853,11 @@ static int gtp_encap_enable(struct gtp_dev *gtp, struct net_device *dev, struct
 		}
 	}
 
-	if (data[IFLA_GTP_COLLECT_METADATA]) {
-		struct socket *sock;
-
-		if (!sk1u) {
-			sock = gtp_create_gtp_socket(gtp, dev);
-			if (IS_ERR(sock))
-				return PTR_ERR(sock);
-
-			gtp->collect_md_sock = sock;
-			sk1u = sock->sk;
-		} else {
-			gtp->collect_md_sock = NULL;
-		}
-	}
-
 	if (data[IFLA_GTP_ROLE]) {
 		role = nla_get_u32(data[IFLA_GTP_ROLE]);
 		if (role > GTP_ROLE_SGSN) {
-			gtp_encap_disable(gtp);
+			gtp_encap_disable_sock(sk0);
+			gtp_encap_disable_sock(sk1u);
 			return -EINVAL;
 		}
 	}
@@ -1655,7 +1416,7 @@ static int __init gtp_init(void)
 	if (err < 0)
 		goto unreg_genl_family;
 
-	pr_info("GTP module loaded (pdp ctx size %zd bytes) with tnl-md support\n",
+	pr_info("GTP module loaded (pdp ctx size %zd bytes)\n",
 		sizeof(struct pdp_ctx));
 	return 0;
 
diff --git a/include/uapi/linux/gtp.h b/include/uapi/linux/gtp.h
index 62aff78b7c56..79f9191bbb24 100644
--- a/include/uapi/linux/gtp.h
+++ b/include/uapi/linux/gtp.h
@@ -2,8 +2,6 @@
 #ifndef _UAPI_LINUX_GTP_H_
 #define _UAPI_LINUX_GTP_H_
 
-#include <linux/types.h>
-
 #define GTP_GENL_MCGRP_NAME	"gtp"
 
 enum gtp_genl_cmds {
@@ -36,14 +34,4 @@ enum gtp_attrs {
 };
 #define GTPA_MAX (__GTPA_MAX + 1)
 
-enum {
-	GTP_METADATA_V1
-};
-
-struct gtpu_metadata {
-	__u8    ver;
-	__u8    flags;
-	__u8    type;
-};
-
 #endif /* _UAPI_LINUX_GTP_H_ */
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 2bd0d8bbcdb2..82708c6db432 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -809,7 +809,6 @@ enum {
 	IFLA_GTP_FD1,
 	IFLA_GTP_PDP_HASHSIZE,
 	IFLA_GTP_ROLE,
-	IFLA_GTP_COLLECT_METADATA,
 	__IFLA_GTP_MAX,
 };
 #define IFLA_GTP_MAX (__IFLA_GTP_MAX - 1)
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 802da679fab1..7d9105533c7b 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -176,7 +176,6 @@ enum {
 #define TUNNEL_VXLAN_OPT	__cpu_to_be16(0x1000)
 #define TUNNEL_NOCACHE		__cpu_to_be16(0x2000)
 #define TUNNEL_ERSPAN_OPT	__cpu_to_be16(0x4000)
-#define TUNNEL_GTPU_OPT		__cpu_to_be16(0x8000)
 
 #define TUNNEL_OPTIONS_PRESENT \
 		(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT)
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index 28d649bda686..d208b2af697f 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -617,7 +617,6 @@ enum {
 	IFLA_GTP_FD1,
 	IFLA_GTP_PDP_HASHSIZE,
 	IFLA_GTP_ROLE,
-	IFLA_GTP_COLLECT_METADATA,
 	__IFLA_GTP_MAX,
 };
 #define IFLA_GTP_MAX (__IFLA_GTP_MAX - 1)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 02/16] gtp: set initial MTU
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 01/16] Revert "GTP: add support for flow based tunneling API" Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 03/16] gtp: include role in link info Jonas Bonn
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

The GTP link is brought up with a default MTU of zero.  This can lead to
some rather unexpected behaviour for users who are more accustomed to
interfaces coming online with reasonable defaults.

This patch sets an initial MTU for the GTP link of 1500 less worst-case
tunnel overhead.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
Acked-by: Harald Welte <laforge@gnumonks.org>
---
 drivers/net/gtp.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 4c04e271f184..5a048f050a9c 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -612,11 +612,16 @@ static const struct net_device_ops gtp_netdev_ops = {
 
 static void gtp_link_setup(struct net_device *dev)
 {
+	unsigned int max_gtp_header_len = sizeof(struct iphdr) +
+					  sizeof(struct udphdr) +
+					  sizeof(struct gtp0_header);
+
 	dev->netdev_ops		= &gtp_netdev_ops;
 	dev->needs_free_netdev	= true;
 
 	dev->hard_header_len = 0;
 	dev->addr_len = 0;
+	dev->mtu = ETH_DATA_LEN - max_gtp_header_len;
 
 	/* Zero header length. */
 	dev->type = ARPHRD_NONE;
@@ -626,11 +631,7 @@ static void gtp_link_setup(struct net_device *dev)
 	dev->features	|= NETIF_F_LLTX;
 	netif_keep_dst(dev);
 
-	/* Assume largest header, ie. GTPv0. */
-	dev->needed_headroom	= LL_MAX_HEADER +
-				  sizeof(struct iphdr) +
-				  sizeof(struct udphdr) +
-				  sizeof(struct gtp0_header);
+	dev->needed_headroom	= LL_MAX_HEADER + max_gtp_header_len;
 }
 
 static int gtp_hashtable_new(struct gtp_dev *gtp, int hsize);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 03/16] gtp: include role in link info
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 01/16] Revert "GTP: add support for flow based tunneling API" Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 02/16] gtp: set initial MTU Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 04/16] gtp: really check namespaces before xmit Jonas Bonn
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

Querying link info for the GTP interface doesn't reveal in which "role" the
device is set to operate.  Include this information in the info query
result.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
Acked-by: Harald Welte <laforge@gnumonks.org>
---
 drivers/net/gtp.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 5a048f050a9c..5682d3ba7aa5 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -728,7 +728,8 @@ static int gtp_validate(struct nlattr *tb[], struct nlattr *data[],
 
 static size_t gtp_get_size(const struct net_device *dev)
 {
-	return nla_total_size(sizeof(__u32));	/* IFLA_GTP_PDP_HASHSIZE */
+	return nla_total_size(sizeof(__u32)) + /* IFLA_GTP_PDP_HASHSIZE */
+		nla_total_size(sizeof(__u32)); /* IFLA_GTP_ROLE */
 }
 
 static int gtp_fill_info(struct sk_buff *skb, const struct net_device *dev)
@@ -737,6 +738,8 @@ static int gtp_fill_info(struct sk_buff *skb, const struct net_device *dev)
 
 	if (nla_put_u32(skb, IFLA_GTP_PDP_HASHSIZE, gtp->hash_size))
 		goto nla_put_failure;
+	if (nla_put_u32(skb, IFLA_GTP_ROLE, gtp->role))
+		goto nla_put_failure;
 
 	return 0;
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 04/16] gtp: really check namespaces before xmit
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
                   ` (2 preceding siblings ...)
  2021-01-23 19:59 ` [RFC PATCH 03/16] gtp: include role in link info Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 05/16] gtp: drop unnecessary call to skb_dst_drop Jonas Bonn
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

Blindly assuming that packet transmission crosses namespaces results in
skb marks being lost in the single namespace case.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
Acked-by: Harald Welte <laforge@gnumonks.org>
---
 drivers/net/gtp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 5682d3ba7aa5..e4e57c0552ee 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -592,7 +592,9 @@ static netdev_tx_t gtp_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 				    ip4_dst_hoplimit(&pktinfo.rt->dst),
 				    0,
 				    pktinfo.gtph_port, pktinfo.gtph_port,
-				    true, false);
+				    !net_eq(sock_net(pktinfo.pctx->sk),
+					    dev_net(dev)),
+				    false);
 		break;
 	}
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 05/16] gtp: drop unnecessary call to skb_dst_drop
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
                   ` (3 preceding siblings ...)
  2021-01-23 19:59 ` [RFC PATCH 04/16] gtp: really check namespaces before xmit Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-24 16:48   ` Harald Welte
  2021-01-23 19:59 ` [RFC PATCH 06/16] gtp: set device type Jonas Bonn
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

The call to skb_dst_drop() is already done as part of udp_tunnel_xmit().

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
---
 drivers/net/gtp.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index e4e57c0552ee..04d9de385549 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -515,8 +515,6 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 		goto err_rt;
 	}
 
-	skb_dst_drop(skb);
-
 	/* This is similar to tnl_update_pmtu(). */
 	df = iph->frag_off;
 	if (df) {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 06/16] gtp: set device type
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
                   ` (4 preceding siblings ...)
  2021-01-23 19:59 ` [RFC PATCH 05/16] gtp: drop unnecessary call to skb_dst_drop Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 07/16] gtp: rework IPv4 functionality Jonas Bonn
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

Set the devtype to 'gtp' when setting up the link.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
Acked-by: Harald Welte <laforge@gnumonks.org>
---
 drivers/net/gtp.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 04d9de385549..a1bb02818977 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -610,6 +610,10 @@ static const struct net_device_ops gtp_netdev_ops = {
 	.ndo_get_stats64	= dev_get_tstats64,
 };
 
+static const struct device_type gtp_type = {
+	.name = "gtp",
+};
+
 static void gtp_link_setup(struct net_device *dev)
 {
 	unsigned int max_gtp_header_len = sizeof(struct iphdr) +
@@ -618,6 +622,7 @@ static void gtp_link_setup(struct net_device *dev)
 
 	dev->netdev_ops		= &gtp_netdev_ops;
 	dev->needs_free_netdev	= true;
+	SET_NETDEV_DEVTYPE(dev, &gtp_type);
 
 	dev->hard_header_len = 0;
 	dev->addr_len = 0;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 07/16] gtp: rework IPv4 functionality
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
                   ` (5 preceding siblings ...)
  2021-01-23 19:59 ` [RFC PATCH 06/16] gtp: set device type Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 08/16] gtp: set dev features to enable GSO Jonas Bonn
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

This patch does some cleanup work in the IPv4 functionality to lay the
groundwork for adding support for IPv6.  The form of these changes is
largely borrowed from the bareudp and geneve drivers, so there shouldn't
be anything here that looks unnecessarily unfamiliar.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
---
 drivers/net/gtp.c | 204 +++++++++++++++++++++-------------------------
 1 file changed, 92 insertions(+), 112 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index a1bb02818977..4a3a52970856 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -25,6 +25,7 @@
 #include <net/protocol.h>
 #include <net/ip.h>
 #include <net/udp.h>
+#include <net/ip_tunnels.h>
 #include <net/udp_tunnel.h>
 #include <net/icmp.h>
 #include <net/xfrm.h>
@@ -381,18 +382,36 @@ static void gtp_dev_uninit(struct net_device *dev)
 	free_percpu(dev->tstats);
 }
 
-static struct rtable *ip4_route_output_gtp(struct flowi4 *fl4,
-					   const struct sock *sk,
-					   __be32 daddr)
+static struct rtable *gtp_get_v4_rt(struct sk_buff *skb,
+				    struct net_device *dev,
+				    struct pdp_ctx *pctx,
+				    __be32 *saddr)
 {
-	memset(fl4, 0, sizeof(*fl4));
-	fl4->flowi4_oif		= sk->sk_bound_dev_if;
-	fl4->daddr		= daddr;
-	fl4->saddr		= inet_sk(sk)->inet_saddr;
-	fl4->flowi4_tos		= RT_CONN_FLAGS(sk);
-	fl4->flowi4_proto	= sk->sk_protocol;
-
-	return ip_route_output_key(sock_net(sk), fl4);
+	const struct sock *sk = pctx->sk;
+	struct rtable *rt = NULL;
+	struct flowi4 fl4;
+
+	memset(&fl4, 0, sizeof(fl4));
+	fl4.flowi4_oif		= sk->sk_bound_dev_if;
+	fl4.daddr		= pctx->peer_addr_ip4.s_addr;
+	fl4.saddr		= inet_sk(sk)->inet_saddr;
+	fl4.flowi4_tos		= RT_CONN_FLAGS(sk);
+	fl4.flowi4_proto	= sk->sk_protocol;
+
+	rt = ip_route_output_key(sock_net(sk), &fl4);
+	if (IS_ERR(rt)) {
+		netdev_dbg(pctx->dev, "no route to %pI4\n", &fl4.daddr);
+		return ERR_PTR(-ENETUNREACH);
+	}
+	if (rt->dst.dev == dev) {
+		netdev_dbg(pctx->dev, "circular route to %pI4\n", &fl4.daddr);
+		ip_rt_put(rt);
+		return ERR_PTR(-ELOOP);
+	}
+
+	*saddr = fl4.saddr;
+
+	return rt;
 }
 
 static inline void gtp0_push_header(struct sk_buff *skb, struct pdp_ctx *pctx)
@@ -435,54 +454,31 @@ static inline void gtp1_push_header(struct sk_buff *skb, struct pdp_ctx *pctx)
 	 */
 }
 
-struct gtp_pktinfo {
-	struct sock		*sk;
-	struct iphdr		*iph;
-	struct flowi4		fl4;
-	struct rtable		*rt;
-	struct pdp_ctx		*pctx;
-	struct net_device	*dev;
-	__be16			gtph_port;
-};
-
-static void gtp_push_header(struct sk_buff *skb, struct gtp_pktinfo *pktinfo)
+static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx,
+						__be16 *port)
 {
-	switch (pktinfo->pctx->gtp_version) {
+	switch (pctx->gtp_version) {
 	case GTP_V0:
-		pktinfo->gtph_port = htons(GTP0_PORT);
-		gtp0_push_header(skb, pktinfo->pctx);
+		*port = htons(GTP0_PORT);
+		gtp0_push_header(skb, pctx);
 		break;
 	case GTP_V1:
-		pktinfo->gtph_port = htons(GTP1U_PORT);
-		gtp1_push_header(skb, pktinfo->pctx);
+		*port = htons(GTP1U_PORT);
+		gtp1_push_header(skb, pctx);
 		break;
 	}
 }
 
-static inline void gtp_set_pktinfo_ipv4(struct gtp_pktinfo *pktinfo,
-					struct sock *sk, struct iphdr *iph,
-					struct pdp_ctx *pctx, struct rtable *rt,
-					struct flowi4 *fl4,
-					struct net_device *dev)
-{
-	pktinfo->sk	= sk;
-	pktinfo->iph	= iph;
-	pktinfo->pctx	= pctx;
-	pktinfo->rt	= rt;
-	pktinfo->fl4	= *fl4;
-	pktinfo->dev	= dev;
-}
-
-static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
-			     struct gtp_pktinfo *pktinfo)
+static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
 {
 	struct gtp_dev *gtp = netdev_priv(dev);
 	struct pdp_ctx *pctx;
 	struct rtable *rt;
-	struct flowi4 fl4;
+	__be32 saddr;
 	struct iphdr *iph;
-	__be16 df;
-	int mtu;
+	int headroom;
+	__be16 port;
+	int r;
 
 	/* Read the IP destination address and resolve the PDP context.
 	 * Prepend PDP header with TEI/TID from PDP ctx.
@@ -500,102 +496,86 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 	}
 	netdev_dbg(dev, "found PDP context %p\n", pctx);
 
-	rt = ip4_route_output_gtp(&fl4, pctx->sk, pctx->peer_addr_ip4.s_addr);
+	rt = gtp_get_v4_rt(skb, dev, pctx, &saddr);
 	if (IS_ERR(rt)) {
-		netdev_dbg(dev, "no route to SSGN %pI4\n",
-			   &pctx->peer_addr_ip4.s_addr);
-		dev->stats.tx_carrier_errors++;
-		goto err;
+		if (PTR_ERR(rt) == -ENETUNREACH)
+			dev->stats.tx_carrier_errors++;
+		else if (PTR_ERR(rt) == -ELOOP)
+			dev->stats.collisions++;
+		return PTR_ERR(rt);
 	}
 
-	if (rt->dst.dev == dev) {
-		netdev_dbg(dev, "circular route to SSGN %pI4\n",
-			   &pctx->peer_addr_ip4.s_addr);
-		dev->stats.collisions++;
-		goto err_rt;
-	}
+	headroom = sizeof(struct iphdr) + sizeof(struct udphdr);
 
-	/* This is similar to tnl_update_pmtu(). */
-	df = iph->frag_off;
-	if (df) {
-		mtu = dst_mtu(&rt->dst) - dev->hard_header_len -
-			sizeof(struct iphdr) - sizeof(struct udphdr);
-		switch (pctx->gtp_version) {
-		case GTP_V0:
-			mtu -= sizeof(struct gtp0_header);
-			break;
-		case GTP_V1:
-			mtu -= sizeof(struct gtp1_header);
-			break;
-		}
-	} else {
-		mtu = dst_mtu(&rt->dst);
+	switch (pctx->gtp_version) {
+	case GTP_V0:
+		headroom += sizeof(struct gtp0_header);
+		break;
+	case GTP_V1:
+		headroom += sizeof(struct gtp1_header);
+		break;
 	}
 
-	rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu, false);
+	r = skb_tunnel_check_pmtu(skb, &rt->dst, headroom,
+					netif_is_any_bridge_port(dev));
+	if (r < 0) {
+		ip_rt_put(rt);
+		return r;
+	} else if (r) {
+		netif_rx(skb);
+		ip_rt_put(rt);
+		return -EMSGSIZE;
+	}
 
-	if (!skb_is_gso(skb) && (iph->frag_off & htons(IP_DF)) &&
-	    mtu < ntohs(iph->tot_len)) {
-		netdev_dbg(dev, "packet too big, fragmentation needed\n");
-		memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
-		icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
-			      htonl(mtu));
+	/* Ensure there is sufficient headroom. */
+	r = skb_cow_head(skb, headroom);
+	if (unlikely(r))
 		goto err_rt;
-	}
 
-	gtp_set_pktinfo_ipv4(pktinfo, pctx->sk, iph, pctx, rt, &fl4, dev);
-	gtp_push_header(skb, pktinfo);
+	skb_reset_inner_headers(skb);
+
+	gtp_push_header(skb, pctx, &port);
+
+	iph = ip_hdr(skb);
+	netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
+		   &iph->saddr, &iph->daddr);
+
+	udp_tunnel_xmit_skb(rt, pctx->sk, skb,
+			    saddr, pctx->peer_addr_ip4.s_addr,
+			    iph->tos,
+			    ip4_dst_hoplimit(&rt->dst),
+			    0,
+			    port, port,
+			    !net_eq(sock_net(pctx->sk),
+				    dev_net(pctx->dev)),
+			    false);
 
 	return 0;
 err_rt:
 	ip_rt_put(rt);
-err:
 	return -EBADMSG;
 }
 
 static netdev_tx_t gtp_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	unsigned int proto = ntohs(skb->protocol);
-	struct gtp_pktinfo pktinfo;
 	int err;
 
-	/* Ensure there is sufficient headroom. */
-	if (skb_cow_head(skb, dev->needed_headroom))
+	if (proto != ETH_P_IP && proto != ETH_P_IPV6) {
+		err = -ENOTSUPP;
 		goto tx_err;
-
-	skb_reset_inner_headers(skb);
+	}
 
 	/* PDP context lookups in gtp_build_skb_*() need rcu read-side lock. */
 	rcu_read_lock();
-	switch (proto) {
-	case ETH_P_IP:
-		err = gtp_build_skb_ip4(skb, dev, &pktinfo);
-		break;
-	default:
-		err = -EOPNOTSUPP;
-		break;
-	}
+
+	err = gtp_xmit_ip4(skb, dev);
+
 	rcu_read_unlock();
 
 	if (err < 0)
 		goto tx_err;
 
-	switch (proto) {
-	case ETH_P_IP:
-		netdev_dbg(pktinfo.dev, "gtp -> IP src: %pI4 dst: %pI4\n",
-			   &pktinfo.iph->saddr, &pktinfo.iph->daddr);
-		udp_tunnel_xmit_skb(pktinfo.rt, pktinfo.sk, skb,
-				    pktinfo.fl4.saddr, pktinfo.fl4.daddr,
-				    pktinfo.iph->tos,
-				    ip4_dst_hoplimit(&pktinfo.rt->dst),
-				    0,
-				    pktinfo.gtph_port, pktinfo.gtph_port,
-				    !net_eq(sock_net(pktinfo.pctx->sk),
-					    dev_net(dev)),
-				    false);
-		break;
-	}
-
 	return NETDEV_TX_OK;
 tx_err:
 	dev->stats.tx_errors++;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 08/16] gtp: set dev features to enable GSO
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
                   ` (6 preceding siblings ...)
  2021-01-23 19:59 ` [RFC PATCH 07/16] gtp: rework IPv4 functionality Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 09/16] gtp: support GRO Jonas Bonn
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
---
 drivers/net/gtp.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 4a3a52970856..df2f227680eb 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -532,7 +532,11 @@ static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
 	if (unlikely(r))
 		goto err_rt;
 
-	skb_reset_inner_headers(skb);
+	r = udp_tunnel_handle_offloads(skb, true);
+	if (unlikely(r))
+		goto err_rt;
+
+	skb_set_inner_protocol(skb, skb->protocol);
 
 	gtp_push_header(skb, pctx, &port);
 
@@ -614,6 +618,8 @@ static void gtp_link_setup(struct net_device *dev)
 
 	dev->priv_flags	|= IFF_NO_QUEUE;
 	dev->features	|= NETIF_F_LLTX;
+	dev->hw_features |= NETIF_F_SG | NETIF_F_GSO_SOFTWARE | NETIF_F_HW_CSUM;
+	dev->features	|= NETIF_F_SG | NETIF_F_GSO_SOFTWARE | NETIF_F_HW_CSUM;
 	netif_keep_dst(dev);
 
 	dev->needed_headroom	= LL_MAX_HEADER + max_gtp_header_len;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 09/16] gtp: support GRO
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
                   ` (7 preceding siblings ...)
  2021-01-23 19:59 ` [RFC PATCH 08/16] gtp: set dev features to enable GSO Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 10/16] gtp: refactor check_ms back into version specific handlers Jonas Bonn
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

This patch implements GRO callbacks for UDP-tunneled GTP traffic.

iperf3 numbers

Without GRO for GTP tunnels:

Accepted connection from 172.99.2.1, port 48783
[  5] local 172.99.0.1 port 5201 connected to 172.99.2.1 port 46095
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   563 MBytes  576306 KBytes/sec
[  5]   1.00-2.00   sec   681 MBytes  697814 KBytes/sec
[  5]   2.00-3.00   sec   677 MBytes  693612 KBytes/sec
[  5]   3.00-4.00   sec   679 MBytes  695690 KBytes/sec
[  5]   4.00-5.00   sec   683 MBytes  699521 KBytes/sec
[  5]   5.00-6.00   sec   682 MBytes  698922 KBytes/sec
[  5]   6.00-7.00   sec   683 MBytes  699820 KBytes/sec
[  5]   7.00-8.00   sec   682 MBytes  698052 KBytes/sec
[  5]   8.00-9.00   sec   683 MBytes  699245 KBytes/sec
[  5]   9.00-10.00  sec   683 MBytes  699554 KBytes/sec
[  5]  10.00-10.00  sec   616 KBytes  687914 KBytes/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  6.54 GBytes  685853 KBytes/sec                  receiver

With GRO for GTP tunnels:

Accepted connection from 172.99.2.1, port 40847
[  5] local 172.99.0.1 port 5201 connected to 172.99.2.1 port 55053
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   989 MBytes  1012640 KBytes/sec
[  5]   1.00-2.00   sec  1.23 GBytes  1291408 KBytes/sec
[  5]   2.00-3.00   sec  1.26 GBytes  1320197 KBytes/sec
[  5]   3.00-4.00   sec  1.29 GBytes  1350097 KBytes/sec
[  5]   4.00-5.00   sec  1.23 GBytes  1284512 KBytes/sec
[  5]   5.00-6.00   sec  1.26 GBytes  1326329 KBytes/sec
[  5]   6.00-7.00   sec  1.28 GBytes  1338620 KBytes/sec
[  5]   7.00-8.00   sec  1.28 GBytes  1346391 KBytes/sec
[  5]   8.00-9.00   sec  1.30 GBytes  1366394 KBytes/sec
[  5]   9.00-10.00  sec  1.26 GBytes  1323848 KBytes/sec
[  5]  10.00-10.00  sec   384 KBytes  1113043 KBytes/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  12.4 GBytes  1296036 KBytes/sec                  receiver

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
---
 drivers/net/gtp.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index df2f227680eb..b20e17988bfa 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -361,6 +361,128 @@ static int gtp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	return ret;
 }
 
+static int gtp_gro_complete(struct sock *sk, struct sk_buff * skb, int nhoff)
+{
+	size_t hdrlen;
+	char* gtphdr = skb->data + nhoff;
+	u8 version;
+	__be16 type;
+	struct packet_offload *ptype;
+	uint8_t ipver;
+	int err = -ENOENT;
+
+	version = *gtphdr >> 5;
+	switch (version) {
+	case GTP_V0:
+		hdrlen = sizeof(struct gtp0_header);
+		break;
+	case GTP_V1:
+		hdrlen = sizeof(struct gtp1_header);
+		if (*gtphdr & GTP1_F_MASK)
+			hdrlen += 4;
+		break;
+	}
+
+	skb_set_inner_network_header(skb, nhoff + hdrlen);
+
+	ipver = inner_ip_hdr(skb)->version;
+	switch (ipver) {
+	case 4:
+		type = cpu_to_be16(ETH_P_IP);
+		break;
+	case 6:
+		type = cpu_to_be16(ETH_P_IPV6);
+		break;
+	default:
+		goto out;
+	}
+
+	rcu_read_lock();
+	ptype = gro_find_complete_by_type(type);
+	if (!ptype)
+		goto out_unlock;
+
+	err = ptype->callbacks.gro_complete(skb, nhoff + hdrlen);
+
+	skb_set_inner_mac_header(skb, nhoff + hdrlen);
+
+out_unlock:
+	rcu_read_unlock();
+out:
+
+	return err;
+
+}
+
+static struct sk_buff *gtp_gro_receive(struct sock *sk,
+				       struct list_head *head,
+				       struct sk_buff *skb)
+{
+	size_t off, hdrlen;
+	char* gtphdr;
+	u8 version;
+	struct sk_buff *pp = NULL;
+	__be16 type;
+	struct packet_offload *ptype;
+
+	off = skb_gro_offset(skb);
+
+	gtphdr = skb_gro_header_fast(skb, off);
+	if (skb_gro_header_hard(skb, off+1)) {
+		gtphdr = skb_gro_header_slow(skb, off+1, off);
+		if (unlikely(!gtphdr))
+			goto out;
+	}
+
+	version = *gtphdr >> 5;
+	switch (version) {
+	case GTP_V0:
+		hdrlen = sizeof(struct gtp0_header);
+		break;
+	case GTP_V1:
+		hdrlen = sizeof(struct gtp1_header);
+		if (*gtphdr & GTP1_F_MASK)
+			hdrlen += 4;
+		break;
+	}
+
+	gtphdr = skb_gro_header_fast(skb, off);
+	if (skb_gro_header_hard(skb, off+hdrlen)) {
+		gtphdr = skb_gro_header_slow(skb, off+hdrlen, off);
+		if (unlikely(!gtphdr))
+			goto out;
+	}
+
+	skb_set_inner_network_header(skb, off + hdrlen);
+
+	switch(inner_ip_hdr(skb)->version) {
+	case 4:
+		type = cpu_to_be16(ETH_P_IP);
+		break;
+	case 6:
+		type = cpu_to_be16(ETH_P_IPV6);
+		break;
+	default:
+		goto out;
+	}
+
+	rcu_read_lock();
+	ptype = gro_find_receive_by_type(type);
+	if (!ptype)
+		goto out_unlock;
+
+	skb_gro_pull(skb, hdrlen);
+	skb_gro_postpull_rcsum(skb, gtphdr, hdrlen);
+
+	pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb);
+
+out_unlock:
+	rcu_read_unlock();
+out:
+
+	return pp;
+}
+
 static int gtp_dev_init(struct net_device *dev)
 {
 	struct gtp_dev *gtp = netdev_priv(dev);
@@ -618,7 +740,9 @@ static void gtp_link_setup(struct net_device *dev)
 
 	dev->priv_flags	|= IFF_NO_QUEUE;
 	dev->features	|= NETIF_F_LLTX;
+	dev->hw_features |= NETIF_F_RXCSUM;
 	dev->hw_features |= NETIF_F_SG | NETIF_F_GSO_SOFTWARE | NETIF_F_HW_CSUM;
+	dev->features	|= NETIF_F_RXCSUM;
 	dev->features	|= NETIF_F_SG | NETIF_F_GSO_SOFTWARE | NETIF_F_HW_CSUM;
 	netif_keep_dst(dev);
 
@@ -814,6 +938,8 @@ static struct sock *gtp_encap_enable_socket(int fd, int type,
 	tuncfg.encap_type = type;
 	tuncfg.encap_rcv = gtp_encap_recv;
 	tuncfg.encap_destroy = gtp_encap_destroy;
+	tuncfg.gro_receive = gtp_gro_receive;
+	tuncfg.gro_complete = gtp_gro_complete;
 
 	setup_udp_tunnel_sock(sock_net(sock->sk), sock, &tuncfg);
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 10/16] gtp: refactor check_ms back into version specific handlers
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
                   ` (8 preceding siblings ...)
  2021-01-23 19:59 ` [RFC PATCH 09/16] gtp: support GRO Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 11/16] gtp: drop duplicated assignment Jonas Bonn
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

This is preparatory work for adding flow based tunneling work by Pravin
Shelar.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
---
 drivers/net/gtp.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index b20e17988bfa..c42092bb505f 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -181,13 +181,8 @@ static bool gtp_check_ms(struct sk_buff *skb, struct pdp_ctx *pctx,
 }
 
 static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
-			unsigned int hdrlen, unsigned int role)
+			unsigned int hdrlen)
 {
-	if (!gtp_check_ms(skb, pctx, hdrlen, role)) {
-		netdev_dbg(pctx->dev, "No PDP ctx for this MS\n");
-		return 1;
-	}
-
 	/* Get rid of the GTP + UDP headers. */
 	if (iptunnel_pull_header(skb, hdrlen, skb->protocol,
 				 !net_eq(sock_net(pctx->sk), dev_net(pctx->dev))))
@@ -234,7 +229,12 @@ static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
 		return 1;
 	}
 
-	return gtp_rx(pctx, skb, hdrlen, gtp->role);
+	if (!gtp_check_ms(skb, pctx, hdrlen, gtp->role)) {
+		netdev_dbg(pctx->dev, "No PDP ctx for this MS\n");
+		return 1;
+	}
+
+	return gtp_rx(pctx, skb, hdrlen);
 }
 
 static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
@@ -276,7 +276,12 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
 		return 1;
 	}
 
-	return gtp_rx(pctx, skb, hdrlen, gtp->role);
+	if (!gtp_check_ms(skb, pctx, hdrlen, gtp->role)) {
+		netdev_dbg(pctx->dev, "No PDP ctx for this MS\n");
+		return 1;
+	}
+
+	return gtp_rx(pctx, skb, hdrlen);
 }
 
 static void __gtp_encap_destroy(struct sock *sk)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 11/16] gtp: drop duplicated assignment
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
                   ` (9 preceding siblings ...)
  2021-01-23 19:59 ` [RFC PATCH 10/16] gtp: refactor check_ms back into version specific handlers Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 12/16] gtp: update rx_length_errors for abnormally short packets Jonas Bonn
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

This assignment is already done a few line earlier.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
---
 drivers/net/gtp.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index c42092bb505f..023d38b1098d 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -268,8 +268,6 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
 	if (!pskb_may_pull(skb, hdrlen))
 		return -1;
 
-	gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
-
 	pctx = gtp1_pdp_find(gtp, ntohl(gtp1->tid));
 	if (!pctx) {
 		netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", skb);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 12/16] gtp: update rx_length_errors for abnormally short packets
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
                   ` (10 preceding siblings ...)
  2021-01-23 19:59 ` [RFC PATCH 11/16] gtp: drop duplicated assignment Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-24 16:50   ` Harald Welte
  2021-01-23 19:59 ` [RFC PATCH 13/16] gtp: set skb protocol after pulling headers Jonas Bonn
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

Based on work by Pravin Shelar.

Update appropriate stats when packet transmission isn't possible.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
---
 drivers/net/gtp.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 023d38b1098d..7ab8540e46d2 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -185,8 +185,10 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
 {
 	/* Get rid of the GTP + UDP headers. */
 	if (iptunnel_pull_header(skb, hdrlen, skb->protocol,
-				 !net_eq(sock_net(pctx->sk), dev_net(pctx->dev))))
-		return -1;
+			 !net_eq(sock_net(pctx->sk), dev_net(pctx->dev)))) {
+		pctx->dev->stats.rx_length_errors++;
+		goto err;
+	}
 
 	netdev_dbg(pctx->dev, "forwarding packet from GGSN to uplink\n");
 
@@ -202,6 +204,10 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
 
 	netif_rx(skb);
 	return 0;
+
+err:
+	pctx->dev->stats.rx_dropped++;
+	return -1;
 }
 
 /* 1 means pass up to the stack, -1 means drop and 0 means decapsulated. */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 13/16] gtp: set skb protocol after pulling headers
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
                   ` (11 preceding siblings ...)
  2021-01-23 19:59 ` [RFC PATCH 12/16] gtp: update rx_length_errors for abnormally short packets Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 14/16] gtp: add support for flow based tunneling Jonas Bonn
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

Based on work by Pravin Shelar.

Once the GTP headers have been the removed, the SKB protocol should be
set to that of the inner packet.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
---
 drivers/net/gtp.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 7ab8540e46d2..8aab46ec8a94 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -197,6 +197,20 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
 	 * calculate the transport header.
 	 */
 	skb_reset_network_header(skb);
+	if (pskb_may_pull(skb, sizeof(struct iphdr))) {
+		struct iphdr *iph;
+
+		iph = ip_hdr(skb);
+		if (iph->version == 4) {
+			netdev_dbg(pctx->dev, "inner pkt: ipv4");
+			skb->protocol = htons(ETH_P_IP);
+		} else if (iph->version == 6) {
+			netdev_dbg(pctx->dev, "inner pkt: ipv6");
+			skb->protocol = htons(ETH_P_IPV6);
+		} else {
+			netdev_dbg(pctx->dev, "inner pkt error: Unknown type");
+		}
+	}
 
 	skb->dev = pctx->dev;
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 14/16] gtp: add support for flow based tunneling
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
                   ` (12 preceding siblings ...)
  2021-01-23 19:59 ` [RFC PATCH 13/16] gtp: set skb protocol after pulling headers Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-24 14:27   ` Jonas Bonn
                     ` (4 more replies)
  2021-01-23 19:59 ` [RFC PATCH 15/16] gtp: add ability to send GTP controls headers Jonas Bonn
                   ` (2 subsequent siblings)
  16 siblings, 5 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

From: Pravin B Shelar <pbshelar@fb.com>

This patch adds support for flow based tunneling, allowing to send and
receive GTP tunneled packets via the (lightweight) tunnel metadata
mechanism.  This would allow integration with OVS and eBPF using flow
based tunneling APIs.

The mechanism used here is to get the required GTP tunnel parameters
from the tunnel metadata instead of looking up a pre-configured PDP
context.  The tunnel metadata contains the necessary information for
creating the GTP header.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
---
 drivers/net/gtp.c                  | 160 +++++++++++++++++++++++++----
 include/uapi/linux/gtp.h           |  12 +++
 include/uapi/linux/if_tunnel.h     |   1 +
 tools/include/uapi/linux/if_link.h |   1 +
 4 files changed, 156 insertions(+), 18 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 8aab46ec8a94..668ed8a4836e 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -21,6 +21,7 @@
 #include <linux/file.h>
 #include <linux/gtp.h>
 
+#include <net/dst_metadata.h>
 #include <net/net_namespace.h>
 #include <net/protocol.h>
 #include <net/ip.h>
@@ -74,6 +75,9 @@ struct gtp_dev {
 	unsigned int		hash_size;
 	struct hlist_head	*tid_hash;
 	struct hlist_head	*addr_hash;
+	/* Used by LWT tunnel. */
+	bool			collect_md;
+	struct socket		*collect_md_sock;
 };
 
 static unsigned int gtp_net_id __read_mostly;
@@ -224,6 +228,51 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
 	return -1;
 }
 
+static int gtp_set_tun_dst(struct pdp_ctx *pctx, struct sk_buff *skb,
+			   unsigned int hdrlen)
+{
+	struct metadata_dst *tun_dst;
+	struct gtp1_header *gtp1;
+	int opts_len = 0;
+	__be64 tid;
+
+	gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
+
+	tid = key32_to_tunnel_id(gtp1->tid);
+
+	if (unlikely(gtp1->flags & GTP1_F_MASK))
+		opts_len = sizeof(struct gtpu_metadata);
+
+	tun_dst = udp_tun_rx_dst(skb,
+			pctx->sk->sk_family, TUNNEL_KEY, tid, opts_len);
+	if (!tun_dst) {
+		netdev_dbg(pctx->dev, "Failed to allocate tun_dst");
+		goto err;
+	}
+
+	netdev_dbg(pctx->dev, "attaching metadata_dst to skb, gtp ver %d hdrlen %d\n",
+		   pctx->gtp_version, hdrlen);
+	if (unlikely(opts_len)) {
+		struct gtpu_metadata *opts;
+
+		opts = ip_tunnel_info_opts(&tun_dst->u.tun_info);
+		opts->ver = GTP_METADATA_V1;
+		opts->flags = gtp1->flags;
+		opts->type = gtp1->type;
+		netdev_dbg(pctx->dev, "recved control pkt: flag %x type: %d\n",
+			   opts->flags, opts->type);
+		tun_dst->u.tun_info.key.tun_flags |= TUNNEL_GTPU_OPT;
+		tun_dst->u.tun_info.options_len = opts_len;
+		skb->protocol = htons(0xffff);         /* Unknown */
+	}
+
+	skb_dst_set(skb, &tun_dst->dst);
+	return 0;
+err:
+	return -1;
+}
+
+
 /* 1 means pass up to the stack, -1 means drop and 0 means decapsulated. */
 static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
 {
@@ -262,6 +311,7 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
 	unsigned int hdrlen = sizeof(struct udphdr) +
 			      sizeof(struct gtp1_header);
 	struct gtp1_header *gtp1;
+	struct pdp_ctx md_pctx;
 	struct pdp_ctx *pctx;
 
 	if (!pskb_may_pull(skb, hdrlen))
@@ -272,6 +322,24 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
 	if ((gtp1->flags >> 5) != GTP_V1)
 		return 1;
 
+	if (ip_tunnel_collect_metadata() || gtp->collect_md) {
+		int err;
+
+		pctx = &md_pctx;
+
+		pctx->gtp_version = GTP_V1;
+		pctx->sk = gtp->sk1u;
+		pctx->dev = gtp->dev;
+
+		err = gtp_set_tun_dst(pctx, skb, hdrlen);
+		if (err) {
+			gtp->dev->stats.rx_dropped++;
+			return -1;
+		}
+
+		return gtp_rx(pctx, skb, hdrlen);
+	}
+
 	if (gtp1->type != GTP_TPDU)
 		return 1;
 
@@ -353,7 +421,8 @@ static int gtp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (!gtp)
 		return 1;
 
-	netdev_dbg(gtp->dev, "encap_recv sk=%p\n", sk);
+	netdev_dbg(gtp->dev, "encap_recv sk=%p type %d\n",
+		   sk, udp_sk(sk)->encap_type);
 
 	switch (udp_sk(sk)->encap_type) {
 	case UDP_ENCAP_GTP0:
@@ -539,7 +608,7 @@ static struct rtable *gtp_get_v4_rt(struct sk_buff *skb,
 	memset(&fl4, 0, sizeof(fl4));
 	fl4.flowi4_oif		= sk->sk_bound_dev_if;
 	fl4.daddr		= pctx->peer_addr_ip4.s_addr;
-	fl4.saddr		= inet_sk(sk)->inet_saddr;
+	fl4.saddr		= *saddr;
 	fl4.flowi4_tos		= RT_CONN_FLAGS(sk);
 	fl4.flowi4_proto	= sk->sk_protocol;
 
@@ -617,29 +686,84 @@ static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx,
 static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
 {
 	struct gtp_dev *gtp = netdev_priv(dev);
+	struct gtpu_metadata *opts = NULL;
+	struct pdp_ctx md_pctx;
 	struct pdp_ctx *pctx;
+	__be16 port;
 	struct rtable *rt;
-	__be32 saddr;
 	struct iphdr *iph;
+	__be32 saddr;
 	int headroom;
-	__be16 port;
+	__u8 tos;
 	int r;
 
-	/* Read the IP destination address and resolve the PDP context.
-	 * Prepend PDP header with TEI/TID from PDP ctx.
-	 */
-	iph = ip_hdr(skb);
-	if (gtp->role == GTP_ROLE_SGSN)
-		pctx = ipv4_pdp_find(gtp, iph->saddr);
-	else
-		pctx = ipv4_pdp_find(gtp, iph->daddr);
+	if (gtp->collect_md) {
+		/* LWT GTP1U encap */
+		struct ip_tunnel_info *info = NULL;
 
-	if (!pctx) {
-		netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
-			   &iph->daddr);
-		return -ENOENT;
+		info = skb_tunnel_info(skb);
+		if (!info) {
+			netdev_dbg(dev, "missing tunnel info");
+			return -ENOENT;
+		}
+		if (info->key.tp_dst && ntohs(info->key.tp_dst) != GTP1U_PORT) {
+			netdev_dbg(dev, "unexpected GTP dst port: %d", ntohs(info->key.tp_dst));
+			return -EOPNOTSUPP;
+		}
+
+		if (!gtp->sk1u) {
+			netdev_dbg(dev, "missing tunnel sock");
+			return -EOPNOTSUPP;
+		}
+
+		pctx = &md_pctx;
+		memset(pctx, 0, sizeof(*pctx));
+		pctx->sk = gtp->sk1u;
+		pctx->gtp_version = GTP_V1;
+		pctx->u.v1.o_tei = ntohl(tunnel_id_to_key32(info->key.tun_id));
+		pctx->peer_addr_ip4.s_addr = info->key.u.ipv4.dst;
+
+		saddr = info->key.u.ipv4.src;
+		tos = info->key.tos;
+
+		if (info->options_len != 0) {
+			if (info->key.tun_flags & TUNNEL_GTPU_OPT) {
+				opts = ip_tunnel_info_opts(info);
+			} else {
+				netdev_dbg(dev, "missing tunnel metadata for control pkt");
+				return -EOPNOTSUPP;
+			}
+		}
+		netdev_dbg(dev, "flow-based GTP1U encap: tunnel id %d\n",
+			   pctx->u.v1.o_tei);
+	} else {
+		struct iphdr *iph;
+
+		if (ntohs(skb->protocol) != ETH_P_IP)
+			return -EOPNOTSUPP;
+
+		iph = ip_hdr(skb);
+
+		/* Read the IP destination address and resolve the PDP context.
+		 * Prepend PDP header with TEI/TID from PDP ctx.
+		 */
+		if (gtp->role == GTP_ROLE_SGSN)
+			pctx = ipv4_pdp_find(gtp, iph->saddr);
+		else
+			pctx = ipv4_pdp_find(gtp, iph->daddr);
+
+		if (!pctx) {
+			netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
+				   &iph->daddr);
+			return -ENOENT;
+		}
+		netdev_dbg(dev, "found PDP context %p\n", pctx);
+
+		saddr = inet_sk(pctx->sk)->inet_saddr;
+		tos = iph->tos;
+		netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
+			   &iph->saddr, &iph->daddr);
 	}
-	netdev_dbg(dev, "found PDP context %p\n", pctx);
 
 	rt = gtp_get_v4_rt(skb, dev, pctx, &saddr);
 	if (IS_ERR(rt)) {
@@ -691,7 +815,7 @@ static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
 
 	udp_tunnel_xmit_skb(rt, pctx->sk, skb,
 			    saddr, pctx->peer_addr_ip4.s_addr,
-			    iph->tos,
+			    tos,
 			    ip4_dst_hoplimit(&rt->dst),
 			    0,
 			    port, port,
diff --git a/include/uapi/linux/gtp.h b/include/uapi/linux/gtp.h
index 79f9191bbb24..62aff78b7c56 100644
--- a/include/uapi/linux/gtp.h
+++ b/include/uapi/linux/gtp.h
@@ -2,6 +2,8 @@
 #ifndef _UAPI_LINUX_GTP_H_
 #define _UAPI_LINUX_GTP_H_
 
+#include <linux/types.h>
+
 #define GTP_GENL_MCGRP_NAME	"gtp"
 
 enum gtp_genl_cmds {
@@ -34,4 +36,14 @@ enum gtp_attrs {
 };
 #define GTPA_MAX (__GTPA_MAX + 1)
 
+enum {
+	GTP_METADATA_V1
+};
+
+struct gtpu_metadata {
+	__u8    ver;
+	__u8    flags;
+	__u8    type;
+};
+
 #endif /* _UAPI_LINUX_GTP_H_ */
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 7d9105533c7b..802da679fab1 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -176,6 +176,7 @@ enum {
 #define TUNNEL_VXLAN_OPT	__cpu_to_be16(0x1000)
 #define TUNNEL_NOCACHE		__cpu_to_be16(0x2000)
 #define TUNNEL_ERSPAN_OPT	__cpu_to_be16(0x4000)
+#define TUNNEL_GTPU_OPT		__cpu_to_be16(0x8000)
 
 #define TUNNEL_OPTIONS_PRESENT \
 		(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT)
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index d208b2af697f..28d649bda686 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -617,6 +617,7 @@ enum {
 	IFLA_GTP_FD1,
 	IFLA_GTP_PDP_HASHSIZE,
 	IFLA_GTP_ROLE,
+	IFLA_GTP_COLLECT_METADATA,
 	__IFLA_GTP_MAX,
 };
 #define IFLA_GTP_MAX (__IFLA_GTP_MAX - 1)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 15/16] gtp: add ability to send GTP controls headers
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
                   ` (13 preceding siblings ...)
  2021-01-23 19:59 ` [RFC PATCH 14/16] gtp: add support for flow based tunneling Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-24 14:21   ` Jonas Bonn
  2021-01-23 19:59 ` [RFC PATCH 16/16] gtp: add netlink support for setting up flow based tunnels Jonas Bonn
  2021-01-24 17:42 ` [RFC PATCH 00/16] GTP: flow based Pravin Shelar
  16 siblings, 1 reply; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

From: Pravin B Shelar <pbshelar@fb.com>

Please explain how this patch actually works... creation of the control
header makes sense, but I don't understand how sending of a
control header is actually triggered.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
---
 drivers/net/gtp.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 668ed8a4836e..bbce2671de2d 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -683,6 +683,38 @@ static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx,
 	}
 }
 
+static inline int gtp1_push_control_header(struct sk_buff *skb,
+					   struct pdp_ctx *pctx,
+					   struct gtpu_metadata *opts,
+					   struct net_device *dev)
+{
+	struct gtp1_header *gtp1c;
+	int payload_len;
+
+	if (opts->ver != GTP_METADATA_V1)
+		return -ENOENT;
+
+	if (opts->type == 0xFE) {
+		/* for end marker ignore skb data. */
+		netdev_dbg(dev, "xmit pkt with null data");
+		pskb_trim(skb, 0);
+	}
+	if (skb_cow_head(skb, sizeof(*gtp1c)) < 0)
+		return -ENOMEM;
+
+	payload_len = skb->len;
+
+	gtp1c = skb_push(skb, sizeof(*gtp1c));
+
+	gtp1c->flags	= opts->flags;
+	gtp1c->type	= opts->type;
+	gtp1c->length	= htons(payload_len);
+	gtp1c->tid	= htonl(pctx->u.v1.o_tei);
+	netdev_dbg(dev, "xmit control pkt: ver %d flags %x type %x pkt len %d tid %x",
+		   opts->ver, opts->flags, opts->type, skb->len, pctx->u.v1.o_tei);
+	return 0;
+}
+
 static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
 {
 	struct gtp_dev *gtp = netdev_priv(dev);
@@ -807,7 +839,16 @@ static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
 
 	skb_set_inner_protocol(skb, skb->protocol);
 
-	gtp_push_header(skb, pctx, &port);
+	if (unlikely(opts)) {
+		port = htons(GTP1U_PORT);
+		r = gtp1_push_control_header(skb, pctx, opts, dev);
+		if (r) {
+			netdev_info(dev, "cntr pkt error %d", r);
+			goto err_rt;
+		}
+	} else {
+		gtp_push_header(skb, pctx, &port);
+	}
 
 	iph = ip_hdr(skb);
 	netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 16/16] gtp: add netlink support for setting up flow based tunnels
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
                   ` (14 preceding siblings ...)
  2021-01-23 19:59 ` [RFC PATCH 15/16] gtp: add ability to send GTP controls headers Jonas Bonn
@ 2021-01-23 19:59 ` Jonas Bonn
  2021-01-24 17:42 ` [RFC PATCH 00/16] GTP: flow based Pravin Shelar
  16 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-23 19:59 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo, Jonas Bonn

From: Pravin B Shelar <pbshelar@fb.com>

This adds the Netlink interface necessary to set up flow based tunnels.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
---
 drivers/net/gtp.c            | 139 +++++++++++++++++++++++++++--------
 include/uapi/linux/if_link.h |   1 +
 2 files changed, 111 insertions(+), 29 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index bbce2671de2d..a4fff0f1e174 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -407,6 +407,11 @@ static void gtp_encap_disable(struct gtp_dev *gtp)
 {
 	gtp_encap_disable_sock(gtp->sk0);
 	gtp_encap_disable_sock(gtp->sk1u);
+	if (gtp->collect_md_sock) {
+		udp_tunnel_sock_release(gtp->collect_md_sock);
+		gtp->collect_md_sock = NULL;
+		netdev_dbg(gtp->dev, "GTP socket released.\n");
+	}
 }
 
 /* UDP encapsulation receive handler. See net/ipv4/udp.c.
@@ -904,6 +909,19 @@ static const struct net_device_ops gtp_netdev_ops = {
 	.ndo_get_stats64	= dev_get_tstats64,
 };
 
+static struct gtp_dev *gtp_find_flow_based_dev(struct net *net)
+{
+	struct gtp_net *gn = net_generic(net, gtp_net_id);
+	struct gtp_dev *gtp;
+
+	list_for_each_entry(gtp, &gn->gtp_dev_list, list) {
+		if (gtp->collect_md)
+			return gtp;
+	}
+
+	return NULL;
+}
+
 static const struct device_type gtp_type = {
 	.name = "gtp",
 };
@@ -938,7 +956,7 @@ static void gtp_link_setup(struct net_device *dev)
 }
 
 static int gtp_hashtable_new(struct gtp_dev *gtp, int hsize);
-static int gtp_encap_enable(struct gtp_dev *gtp, struct nlattr *data[]);
+static int gtp_encap_enable(struct gtp_dev *gtp, struct net_device *dev, struct nlattr *data[]);
 
 static void gtp_destructor(struct net_device *dev)
 {
@@ -956,11 +974,24 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev,
 	struct gtp_net *gn;
 	int hashsize, err;
 
-	if (!data[IFLA_GTP_FD0] && !data[IFLA_GTP_FD1])
+	if (!data[IFLA_GTP_FD0] && !data[IFLA_GTP_FD1] &&
+	    !data[IFLA_GTP_COLLECT_METADATA])
 		return -EINVAL;
 
 	gtp = netdev_priv(dev);
 
+	if (data[IFLA_GTP_COLLECT_METADATA]) {
+		if (data[IFLA_GTP_FD0]) {
+			netdev_dbg(dev, "LWT device does not support setting v0 socket");
+			return -EINVAL;
+		}
+		if (gtp_find_flow_based_dev(src_net)) {
+			netdev_dbg(dev, "LWT device already exist");
+			return -EBUSY;
+		}
+		gtp->collect_md = true;
+	}
+
 	if (!data[IFLA_GTP_PDP_HASHSIZE]) {
 		hashsize = 1024;
 	} else {
@@ -973,7 +1004,7 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev,
 	if (err < 0)
 		return err;
 
-	err = gtp_encap_enable(gtp, data);
+	err = gtp_encap_enable(gtp, dev, data);
 	if (err < 0)
 		goto out_hashtable;
 
@@ -987,7 +1018,7 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev,
 	list_add_rcu(&gtp->list, &gn->gtp_dev_list);
 	dev->priv_destructor = gtp_destructor;
 
-	netdev_dbg(dev, "registered new GTP interface\n");
+	netdev_dbg(dev, "registered new GTP interface %s\n", dev->name);
 
 	return 0;
 
@@ -1018,6 +1049,7 @@ static const struct nla_policy gtp_policy[IFLA_GTP_MAX + 1] = {
 	[IFLA_GTP_FD1]			= { .type = NLA_U32 },
 	[IFLA_GTP_PDP_HASHSIZE]		= { .type = NLA_U32 },
 	[IFLA_GTP_ROLE]			= { .type = NLA_U32 },
+	[IFLA_GTP_COLLECT_METADATA]	= { .type = NLA_FLAG },
 };
 
 static int gtp_validate(struct nlattr *tb[], struct nlattr *data[],
@@ -1044,6 +1076,9 @@ static int gtp_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	if (nla_put_u32(skb, IFLA_GTP_ROLE, gtp->role))
 		goto nla_put_failure;
 
+	if (gtp->collect_md && nla_put_flag(skb, IFLA_GTP_COLLECT_METADATA))
+		goto nla_put_failure;
+
 	return 0;
 
 nla_put_failure:
@@ -1089,35 +1124,24 @@ static int gtp_hashtable_new(struct gtp_dev *gtp, int hsize)
 	return -ENOMEM;
 }
 
-static struct sock *gtp_encap_enable_socket(int fd, int type,
-					    struct gtp_dev *gtp)
+static int __gtp_encap_enable_socket(struct socket *sock, int type,
+				     struct gtp_dev *gtp)
 {
 	struct udp_tunnel_sock_cfg tuncfg = {NULL};
-	struct socket *sock;
 	struct sock *sk;
-	int err;
-
-	pr_debug("enable gtp on %d, %d\n", fd, type);
-
-	sock = sockfd_lookup(fd, &err);
-	if (!sock) {
-		pr_debug("gtp socket fd=%d not found\n", fd);
-		return NULL;
-	}
 
 	sk = sock->sk;
 	if (sk->sk_protocol != IPPROTO_UDP ||
 	    sk->sk_type != SOCK_DGRAM ||
 	    (sk->sk_family != AF_INET && sk->sk_family != AF_INET6)) {
-		pr_debug("socket fd=%d not UDP\n", fd);
-		sk = ERR_PTR(-EINVAL);
-		goto out_sock;
+		pr_debug("socket not UDP\n");
+		return -EINVAL;
 	}
 
 	lock_sock(sk);
 	if (sk->sk_user_data) {
-		sk = ERR_PTR(-EBUSY);
-		goto out_rel_sock;
+		release_sock(sock->sk);
+		return -EBUSY;
 	}
 
 	sock_hold(sk);
@@ -1130,15 +1154,58 @@ static struct sock *gtp_encap_enable_socket(int fd, int type,
 	tuncfg.gro_complete = gtp_gro_complete;
 
 	setup_udp_tunnel_sock(sock_net(sock->sk), sock, &tuncfg);
-
-out_rel_sock:
 	release_sock(sock->sk);
-out_sock:
+	return 0;
+}
+
+static struct sock *gtp_encap_enable_socket(int fd, int type,
+					    struct gtp_dev *gtp)
+{
+	struct socket *sock;
+	int err;
+
+	pr_debug("enable gtp on %d, %d\n", fd, type);
+
+	sock = sockfd_lookup(fd, &err);
+	if (!sock) {
+		pr_debug("gtp socket fd=%d not found\n", fd);
+		return NULL;
+	}
+	err =  __gtp_encap_enable_socket(sock, type, gtp);
 	sockfd_put(sock);
-	return sk;
+	if (err)
+		return ERR_PTR(err);
+
+	return sock->sk;
+}
+
+static struct socket *gtp_create_gtp_socket(struct gtp_dev *gtp, struct net_device *dev)
+{
+	struct udp_port_cfg udp_conf;
+	struct socket *sock;
+	int err;
+
+	memset(&udp_conf, 0, sizeof(udp_conf));
+	udp_conf.family = AF_INET;
+	udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
+	udp_conf.local_udp_port = htons(GTP1U_PORT);
+
+	err = udp_sock_create(dev_net(dev), &udp_conf, &sock);
+	if (err < 0) {
+		pr_debug("create gtp sock failed: %d\n", err);
+		return ERR_PTR(err);
+	}
+	err = __gtp_encap_enable_socket(sock, UDP_ENCAP_GTP1U, gtp);
+	if (err) {
+		pr_debug("enable gtp sock encap failed: %d\n", err);
+		udp_tunnel_sock_release(sock);
+		return ERR_PTR(err);
+	}
+	pr_debug("create gtp sock done\n");
+	return sock;
 }
 
-static int gtp_encap_enable(struct gtp_dev *gtp, struct nlattr *data[])
+static int gtp_encap_enable(struct gtp_dev *gtp, struct net_device *dev, struct nlattr *data[])
 {
 	struct sock *sk1u = NULL;
 	struct sock *sk0 = NULL;
@@ -1162,11 +1229,25 @@ static int gtp_encap_enable(struct gtp_dev *gtp, struct nlattr *data[])
 		}
 	}
 
+	if (data[IFLA_GTP_COLLECT_METADATA]) {
+		struct socket *sock;
+
+		if (!sk1u) {
+			sock = gtp_create_gtp_socket(gtp, dev);
+			if (IS_ERR(sock))
+				return PTR_ERR(sock);
+
+			gtp->collect_md_sock = sock;
+			sk1u = sock->sk;
+		} else {
+			gtp->collect_md_sock = NULL;
+		}
+	}
+
 	if (data[IFLA_GTP_ROLE]) {
 		role = nla_get_u32(data[IFLA_GTP_ROLE]);
 		if (role > GTP_ROLE_SGSN) {
-			gtp_encap_disable_sock(sk0);
-			gtp_encap_disable_sock(sk1u);
+			gtp_encap_disable(gtp);
 			return -EINVAL;
 		}
 	}
@@ -1725,7 +1806,7 @@ static int __init gtp_init(void)
 	if (err < 0)
 		goto unreg_genl_family;
 
-	pr_info("GTP module loaded (pdp ctx size %zd bytes)\n",
+	pr_info("GTP module loaded (pdp ctx size %zd bytes) with tnl-md support\n",
 		sizeof(struct pdp_ctx));
 	return 0;
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 82708c6db432..2bd0d8bbcdb2 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -809,6 +809,7 @@ enum {
 	IFLA_GTP_FD1,
 	IFLA_GTP_PDP_HASHSIZE,
 	IFLA_GTP_ROLE,
+	IFLA_GTP_COLLECT_METADATA,
 	__IFLA_GTP_MAX,
 };
 #define IFLA_GTP_MAX (__IFLA_GTP_MAX - 1)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 15/16] gtp: add ability to send GTP controls headers
  2021-01-23 19:59 ` [RFC PATCH 15/16] gtp: add ability to send GTP controls headers Jonas Bonn
@ 2021-01-24 14:21   ` Jonas Bonn
  2021-01-25 17:41     ` Harald Welte
  2021-01-28 21:29     ` Pravin Shelar
  0 siblings, 2 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-24 14:21 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo

Hi Pravin,

So, this whole GTP metadata thing has me a bit confused.

You've defined a metadata structure like this:

struct gtpu_metadata {
         __u8    ver;
         __u8    flags;
         __u8    type;
};

Here ver is the version of the metadata structure itself, which is fine.
'flags' corresponds to the 3 flag bits of GTP header's first byte:  E, 
S, and PN.
'type' corresponds to the 'message type' field of the GTP header.

The 'control header' (strange name) example below allows the flags to be 
set; however, setting these flags alone is insufficient because each one 
indicates the presence of additional fields in the header and there's 
nothing in the code to account for that.

If E is set, extension headers would need to be added.
If S is set, a sequence number field would need to be added.
If PN is set, a PDU-number header would need to be added.

It's not clear to me who sets up this metadata in the first place.  Is 
that where OVS or eBPF come in?  Can you give some pointers to how this 
works?

Couple of comments below....

On 23/01/2021 20:59, Jonas Bonn wrote:
> From: Pravin B Shelar <pbshelar@fb.com>
> 
> Please explain how this patch actually works... creation of the control
> header makes sense, but I don't understand how sending of a
> control header is actually triggered.
> 
> Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
> ---
>   drivers/net/gtp.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
> index 668ed8a4836e..bbce2671de2d 100644
> --- a/drivers/net/gtp.c
> +++ b/drivers/net/gtp.c
> @@ -683,6 +683,38 @@ static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx,
>   	}
>   }
>   
> +static inline int gtp1_push_control_header(struct sk_buff *skb,

I'm not enamored with the name 'control header' because it makes sound 
like this is some GTP-C thing.  The GTP module is really only about 
GTP-U and the function itself just sets up a GTP-U header.


> +					   struct pdp_ctx *pctx,
> +					   struct gtpu_metadata *opts,
> +					   struct net_device *dev)
> +{
> +	struct gtp1_header *gtp1c;
> +	int payload_len;
> +
> +	if (opts->ver != GTP_METADATA_V1)
> +		return -ENOENT;
> +
> +	if (opts->type == 0xFE) {
> +		/* for end marker ignore skb data. */
> +		netdev_dbg(dev, "xmit pkt with null data");
> +		pskb_trim(skb, 0);
> +	}
> +	if (skb_cow_head(skb, sizeof(*gtp1c)) < 0)
> +		return -ENOMEM;
> +
> +	payload_len = skb->len;
> +
> +	gtp1c = skb_push(skb, sizeof(*gtp1c));
> +
> +	gtp1c->flags	= opts->flags;
> +	gtp1c->type	= opts->type;
> +	gtp1c->length	= htons(payload_len);
> +	gtp1c->tid	= htonl(pctx->u.v1.o_tei);
> +	netdev_dbg(dev, "xmit control pkt: ver %d flags %x type %x pkt len %d tid %x",
> +		   opts->ver, opts->flags, opts->type, skb->len, pctx->u.v1.o_tei);
> +	return 0;
> +}

There's nothing really special about that above function aside from the 
facts that it takes 'opts' as an argument.  Can't we just merge this 
with the regular 'gtp_push_header' function?  Do you have plans for this 
beyond what's here that would complicated by doing so?

/Jonas


> +
>   static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
>   {
>   	struct gtp_dev *gtp = netdev_priv(dev);
> @@ -807,7 +839,16 @@ static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
>   
>   	skb_set_inner_protocol(skb, skb->protocol);
>   
> -	gtp_push_header(skb, pctx, &port);
> +	if (unlikely(opts)) {
> +		port = htons(GTP1U_PORT);
> +		r = gtp1_push_control_header(skb, pctx, opts, dev);
> +		if (r) {
> +			netdev_info(dev, "cntr pkt error %d", r);
> +			goto err_rt;
> +		}
> +	} else {
> +		gtp_push_header(skb, pctx, &port);
> +	}
>   
>   	iph = ip_hdr(skb);
>   	netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 14/16] gtp: add support for flow based tunneling
  2021-01-23 19:59 ` [RFC PATCH 14/16] gtp: add support for flow based tunneling Jonas Bonn
@ 2021-01-24 14:27   ` Jonas Bonn
  2021-01-24 15:11   ` Jonas Bonn
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-24 14:27 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo

Hi Pravin,

A couple more comments around the GTP_METADATA bits:

On 23/01/2021 20:59, Jonas Bonn wrote:
> From: Pravin B Shelar <pbshelar@fb.com>
> 
> This patch adds support for flow based tunneling, allowing to send and
> receive GTP tunneled packets via the (lightweight) tunnel metadata
> mechanism.  This would allow integration with OVS and eBPF using flow
> based tunneling APIs.
> 
> The mechanism used here is to get the required GTP tunnel parameters
> from the tunnel metadata instead of looking up a pre-configured PDP
> context.  The tunnel metadata contains the necessary information for
> creating the GTP header.
> 
> Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
> ---
>   drivers/net/gtp.c                  | 160 +++++++++++++++++++++++++----
>   include/uapi/linux/gtp.h           |  12 +++
>   include/uapi/linux/if_tunnel.h     |   1 +
>   tools/include/uapi/linux/if_link.h |   1 +
>   4 files changed, 156 insertions(+), 18 deletions(-)
> 

<...>

>   
> +static int gtp_set_tun_dst(struct pdp_ctx *pctx, struct sk_buff *skb,
> +			   unsigned int hdrlen)
> +{
> +	struct metadata_dst *tun_dst;
> +	struct gtp1_header *gtp1;
> +	int opts_len = 0;
> +	__be64 tid;
> +
> +	gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
> +
> +	tid = key32_to_tunnel_id(gtp1->tid);
> +
> +	if (unlikely(gtp1->flags & GTP1_F_MASK))
> +		opts_len = sizeof(struct gtpu_metadata);

So if there are GTP flags sets, you're saying that this no longer a 
T-PDU but something else.  That's wrong... the flags indicate the 
presence of extensions to the GTP header itself.

> +
> +	tun_dst = udp_tun_rx_dst(skb,
> +			pctx->sk->sk_family, TUNNEL_KEY, tid, opts_len);
> +	if (!tun_dst) {
> +		netdev_dbg(pctx->dev, "Failed to allocate tun_dst");
> +		goto err;
> +	}
> +
> +	netdev_dbg(pctx->dev, "attaching metadata_dst to skb, gtp ver %d hdrlen %d\n",
> +		   pctx->gtp_version, hdrlen);
> +	if (unlikely(opts_len)) {
> +		struct gtpu_metadata *opts;
> +
> +		opts = ip_tunnel_info_opts(&tun_dst->u.tun_info);
> +		opts->ver = GTP_METADATA_V1;
> +		opts->flags = gtp1->flags;
> +		opts->type = gtp1->type;
> +		netdev_dbg(pctx->dev, "recved control pkt: flag %x type: %d\n",
> +			   opts->flags, opts->type);
> +		tun_dst->u.tun_info.key.tun_flags |= TUNNEL_GTPU_OPT;
> +		tun_dst->u.tun_info.options_len = opts_len;
> +		skb->protocol = htons(0xffff);         /* Unknown */

It might be relevant to set protocol to unknown for 'end markers' (i.e. 
gtp1->type == 0xfe), but not for everything that happens to have flag 
bits set.  'flags' and 'type' are independent of each other and need to 
handled as such.

/Jonas

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 14/16] gtp: add support for flow based tunneling
  2021-01-23 19:59 ` [RFC PATCH 14/16] gtp: add support for flow based tunneling Jonas Bonn
  2021-01-24 14:27   ` Jonas Bonn
@ 2021-01-24 15:11   ` Jonas Bonn
  2021-01-25  8:12   ` Jonas Bonn
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-24 15:11 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo

Hi,

On 23/01/2021 20:59, Jonas Bonn wrote:
> From: Pravin B Shelar <pbshelar@fb.com>
> 
> This patch adds support for flow based tunneling, allowing to send and
> receive GTP tunneled packets via the (lightweight) tunnel metadata
> mechanism.  This would allow integration with OVS and eBPF using flow
> based tunneling APIs.
> 
> The mechanism used here is to get the required GTP tunnel parameters
> from the tunnel metadata instead of looking up a pre-configured PDP
> context.  The tunnel metadata contains the necessary information for
> creating the GTP header.

The GTP driver operates in two modes:  GGSN/PGW/UPF and SGSN/SGW/NG-U. 
For simplicity we'll just refer to these as 'ggsn' and 'sgsn', but these 
are essentially just upstream and downstream nodes.

So, the classic way of adding a tunnel to the driver is:

gtp-tunnel add gtp v1 100 200 192.168.100.1 172.99.0.1

That encapsulates a lot of information about both ends of the tunnel 
including TEID's for both ends of the tunnel, the local IP in use, and 
the remote end.

With this new approach we have ('ggsn' side):

ip route add 192.168.100.1/32 encap ip id 200 dst 172.99.0.1 dev gtp

That has all the information required for sending packets from 'ggsn' to 
'sgsn', but it's missing everything required for the validation of 
incoming packets from the 'sgsn'.  The implementation just ignores the 
TEID on incoming packets and doesn't validate TEID against IP like the 
PDP context variant does.

'sgsn' side we have:

ip route add SOME_IP encap ip id 100 dst 172.99.0.2 dev gtp

'sgsn' side is intended for testing eNodeB-type entities behind which 
there are a large number of simulated UE's.  The PDP context setup 
allows a form of 'source routing' internally in the driver whereby the 
MS/UE address of the PDP context is matched to the source IP of the 
outgoing packet in order to determine the required TEID.  How is 
something similar achievable with the ip route example above; can a 
'src' parameter be added in order to get the right 'id' (TEID) from a 
set of otherwise similar routing rules?


In the one example you posted 
(https://github.com/pshelar/iproute2/commit/d6e99f8342672e6e9ce0b71e153296f8e2b41cfc) 
you have, on the 'ggsn' side:

ip route add 1.1.1.0/24 encap id 0 dst 10.1.0.2 dev gtp1

This amounts to mapping a route to an entire network of 'devices' 
through a single TEID at host 10.1.0.2.  This might work because you 
just ignore the TEID on incoming packets and inject them into the 
receiving host in your 'sgsn' variant of the driver, but with any real 
downstream GTP device I don't think the above is of any practical value.

If I were to consider the above as an 'sgsn' side route setup, then 
you've got an entire network of devices talking to an upstream GTP 
entity (UPF) through a single TEID... security-wise, I'd be surprised if 
anybody actually allowed this.

Thanks for taking the time to consider the above.  I look forward to 
your comments.

/Jonas

> 
> Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
> ---
>   drivers/net/gtp.c                  | 160 +++++++++++++++++++++++++----
>   include/uapi/linux/gtp.h           |  12 +++
>   include/uapi/linux/if_tunnel.h     |   1 +
>   tools/include/uapi/linux/if_link.h |   1 +
>   4 files changed, 156 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
> index 8aab46ec8a94..668ed8a4836e 100644
> --- a/drivers/net/gtp.c
> +++ b/drivers/net/gtp.c
> @@ -21,6 +21,7 @@
>   #include <linux/file.h>
>   #include <linux/gtp.h>
>   
> +#include <net/dst_metadata.h>
>   #include <net/net_namespace.h>
>   #include <net/protocol.h>
>   #include <net/ip.h>
> @@ -74,6 +75,9 @@ struct gtp_dev {
>   	unsigned int		hash_size;
>   	struct hlist_head	*tid_hash;
>   	struct hlist_head	*addr_hash;
> +	/* Used by LWT tunnel. */
> +	bool			collect_md;
> +	struct socket		*collect_md_sock;
>   };
>   
>   static unsigned int gtp_net_id __read_mostly;
> @@ -224,6 +228,51 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
>   	return -1;
>   }
>   
> +static int gtp_set_tun_dst(struct pdp_ctx *pctx, struct sk_buff *skb,
> +			   unsigned int hdrlen)
> +{
> +	struct metadata_dst *tun_dst;
> +	struct gtp1_header *gtp1;
> +	int opts_len = 0;
> +	__be64 tid;
> +
> +	gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
> +
> +	tid = key32_to_tunnel_id(gtp1->tid);
> +
> +	if (unlikely(gtp1->flags & GTP1_F_MASK))
> +		opts_len = sizeof(struct gtpu_metadata);
> +
> +	tun_dst = udp_tun_rx_dst(skb,
> +			pctx->sk->sk_family, TUNNEL_KEY, tid, opts_len);
> +	if (!tun_dst) {
> +		netdev_dbg(pctx->dev, "Failed to allocate tun_dst");
> +		goto err;
> +	}
> +
> +	netdev_dbg(pctx->dev, "attaching metadata_dst to skb, gtp ver %d hdrlen %d\n",
> +		   pctx->gtp_version, hdrlen);
> +	if (unlikely(opts_len)) {
> +		struct gtpu_metadata *opts;
> +
> +		opts = ip_tunnel_info_opts(&tun_dst->u.tun_info);
> +		opts->ver = GTP_METADATA_V1;
> +		opts->flags = gtp1->flags;
> +		opts->type = gtp1->type;
> +		netdev_dbg(pctx->dev, "recved control pkt: flag %x type: %d\n",
> +			   opts->flags, opts->type);
> +		tun_dst->u.tun_info.key.tun_flags |= TUNNEL_GTPU_OPT;
> +		tun_dst->u.tun_info.options_len = opts_len;
> +		skb->protocol = htons(0xffff);         /* Unknown */
> +	}
> +
> +	skb_dst_set(skb, &tun_dst->dst);
> +	return 0;
> +err:
> +	return -1;
> +}
> +
> +
>   /* 1 means pass up to the stack, -1 means drop and 0 means decapsulated. */
>   static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
>   {
> @@ -262,6 +311,7 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
>   	unsigned int hdrlen = sizeof(struct udphdr) +
>   			      sizeof(struct gtp1_header);
>   	struct gtp1_header *gtp1;
> +	struct pdp_ctx md_pctx;
>   	struct pdp_ctx *pctx;
>   
>   	if (!pskb_may_pull(skb, hdrlen))
> @@ -272,6 +322,24 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
>   	if ((gtp1->flags >> 5) != GTP_V1)
>   		return 1;
>   
> +	if (ip_tunnel_collect_metadata() || gtp->collect_md) {
> +		int err;
> +
> +		pctx = &md_pctx;
> +
> +		pctx->gtp_version = GTP_V1;
> +		pctx->sk = gtp->sk1u;
> +		pctx->dev = gtp->dev;
> +
> +		err = gtp_set_tun_dst(pctx, skb, hdrlen);
> +		if (err) {
> +			gtp->dev->stats.rx_dropped++;
> +			return -1;
> +		}
> +
> +		return gtp_rx(pctx, skb, hdrlen);
> +	}
> +
>   	if (gtp1->type != GTP_TPDU)
>   		return 1;
>   
> @@ -353,7 +421,8 @@ static int gtp_encap_recv(struct sock *sk, struct sk_buff *skb)
>   	if (!gtp)
>   		return 1;
>   
> -	netdev_dbg(gtp->dev, "encap_recv sk=%p\n", sk);
> +	netdev_dbg(gtp->dev, "encap_recv sk=%p type %d\n",
> +		   sk, udp_sk(sk)->encap_type);
>   
>   	switch (udp_sk(sk)->encap_type) {
>   	case UDP_ENCAP_GTP0:
> @@ -539,7 +608,7 @@ static struct rtable *gtp_get_v4_rt(struct sk_buff *skb,
>   	memset(&fl4, 0, sizeof(fl4));
>   	fl4.flowi4_oif		= sk->sk_bound_dev_if;
>   	fl4.daddr		= pctx->peer_addr_ip4.s_addr;
> -	fl4.saddr		= inet_sk(sk)->inet_saddr;
> +	fl4.saddr		= *saddr;
>   	fl4.flowi4_tos		= RT_CONN_FLAGS(sk);
>   	fl4.flowi4_proto	= sk->sk_protocol;
>   
> @@ -617,29 +686,84 @@ static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx,
>   static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
>   {
>   	struct gtp_dev *gtp = netdev_priv(dev);
> +	struct gtpu_metadata *opts = NULL;
> +	struct pdp_ctx md_pctx;
>   	struct pdp_ctx *pctx;
> +	__be16 port;
>   	struct rtable *rt;
> -	__be32 saddr;
>   	struct iphdr *iph;
> +	__be32 saddr;
>   	int headroom;
> -	__be16 port;
> +	__u8 tos;
>   	int r;
>   
> -	/* Read the IP destination address and resolve the PDP context.
> -	 * Prepend PDP header with TEI/TID from PDP ctx.
> -	 */
> -	iph = ip_hdr(skb);
> -	if (gtp->role == GTP_ROLE_SGSN)
> -		pctx = ipv4_pdp_find(gtp, iph->saddr);
> -	else
> -		pctx = ipv4_pdp_find(gtp, iph->daddr);
> +	if (gtp->collect_md) {
> +		/* LWT GTP1U encap */
> +		struct ip_tunnel_info *info = NULL;
>   
> -	if (!pctx) {
> -		netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
> -			   &iph->daddr);
> -		return -ENOENT;
> +		info = skb_tunnel_info(skb);
> +		if (!info) {
> +			netdev_dbg(dev, "missing tunnel info");
> +			return -ENOENT;
> +		}
> +		if (info->key.tp_dst && ntohs(info->key.tp_dst) != GTP1U_PORT) {
> +			netdev_dbg(dev, "unexpected GTP dst port: %d", ntohs(info->key.tp_dst));
> +			return -EOPNOTSUPP;
> +		}
> +
> +		if (!gtp->sk1u) {
> +			netdev_dbg(dev, "missing tunnel sock");
> +			return -EOPNOTSUPP;
> +		}
> +
> +		pctx = &md_pctx;
> +		memset(pctx, 0, sizeof(*pctx));
> +		pctx->sk = gtp->sk1u;
> +		pctx->gtp_version = GTP_V1;
> +		pctx->u.v1.o_tei = ntohl(tunnel_id_to_key32(info->key.tun_id));
> +		pctx->peer_addr_ip4.s_addr = info->key.u.ipv4.dst;
> +
> +		saddr = info->key.u.ipv4.src;
> +		tos = info->key.tos;
> +
> +		if (info->options_len != 0) {
> +			if (info->key.tun_flags & TUNNEL_GTPU_OPT) {
> +				opts = ip_tunnel_info_opts(info);
> +			} else {
> +				netdev_dbg(dev, "missing tunnel metadata for control pkt");
> +				return -EOPNOTSUPP;
> +			}
> +		}
> +		netdev_dbg(dev, "flow-based GTP1U encap: tunnel id %d\n",
> +			   pctx->u.v1.o_tei);
> +	} else {
> +		struct iphdr *iph;
> +
> +		if (ntohs(skb->protocol) != ETH_P_IP)
> +			return -EOPNOTSUPP;
> +
> +		iph = ip_hdr(skb);
> +
> +		/* Read the IP destination address and resolve the PDP context.
> +		 * Prepend PDP header with TEI/TID from PDP ctx.
> +		 */
> +		if (gtp->role == GTP_ROLE_SGSN)
> +			pctx = ipv4_pdp_find(gtp, iph->saddr);
> +		else
> +			pctx = ipv4_pdp_find(gtp, iph->daddr);
> +
> +		if (!pctx) {
> +			netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
> +				   &iph->daddr);
> +			return -ENOENT;
> +		}
> +		netdev_dbg(dev, "found PDP context %p\n", pctx);
> +
> +		saddr = inet_sk(pctx->sk)->inet_saddr;
> +		tos = iph->tos;
> +		netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
> +			   &iph->saddr, &iph->daddr);
>   	}
> -	netdev_dbg(dev, "found PDP context %p\n", pctx);
>   
>   	rt = gtp_get_v4_rt(skb, dev, pctx, &saddr);
>   	if (IS_ERR(rt)) {
> @@ -691,7 +815,7 @@ static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
>   
>   	udp_tunnel_xmit_skb(rt, pctx->sk, skb,
>   			    saddr, pctx->peer_addr_ip4.s_addr,
> -			    iph->tos,
> +			    tos,
>   			    ip4_dst_hoplimit(&rt->dst),
>   			    0,
>   			    port, port,
> diff --git a/include/uapi/linux/gtp.h b/include/uapi/linux/gtp.h
> index 79f9191bbb24..62aff78b7c56 100644
> --- a/include/uapi/linux/gtp.h
> +++ b/include/uapi/linux/gtp.h
> @@ -2,6 +2,8 @@
>   #ifndef _UAPI_LINUX_GTP_H_
>   #define _UAPI_LINUX_GTP_H_
>   
> +#include <linux/types.h>
> +
>   #define GTP_GENL_MCGRP_NAME	"gtp"
>   
>   enum gtp_genl_cmds {
> @@ -34,4 +36,14 @@ enum gtp_attrs {
>   };
>   #define GTPA_MAX (__GTPA_MAX + 1)
>   
> +enum {
> +	GTP_METADATA_V1
> +};
> +
> +struct gtpu_metadata {
> +	__u8    ver;
> +	__u8    flags;
> +	__u8    type;
> +};
> +
>   #endif /* _UAPI_LINUX_GTP_H_ */
> diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
> index 7d9105533c7b..802da679fab1 100644
> --- a/include/uapi/linux/if_tunnel.h
> +++ b/include/uapi/linux/if_tunnel.h
> @@ -176,6 +176,7 @@ enum {
>   #define TUNNEL_VXLAN_OPT	__cpu_to_be16(0x1000)
>   #define TUNNEL_NOCACHE		__cpu_to_be16(0x2000)
>   #define TUNNEL_ERSPAN_OPT	__cpu_to_be16(0x4000)
> +#define TUNNEL_GTPU_OPT		__cpu_to_be16(0x8000)
>   
>   #define TUNNEL_OPTIONS_PRESENT \
>   		(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT)
> diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
> index d208b2af697f..28d649bda686 100644
> --- a/tools/include/uapi/linux/if_link.h
> +++ b/tools/include/uapi/linux/if_link.h
> @@ -617,6 +617,7 @@ enum {
>   	IFLA_GTP_FD1,
>   	IFLA_GTP_PDP_HASHSIZE,
>   	IFLA_GTP_ROLE,
> +	IFLA_GTP_COLLECT_METADATA,
>   	__IFLA_GTP_MAX,
>   };
>   #define IFLA_GTP_MAX (__IFLA_GTP_MAX - 1)
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 01/16] Revert "GTP: add support for flow based tunneling API"
  2021-01-23 19:59 ` [RFC PATCH 01/16] Revert "GTP: add support for flow based tunneling API" Jonas Bonn
@ 2021-01-24 16:34   ` Harald Welte
  0 siblings, 0 replies; 39+ messages in thread
From: Harald Welte @ 2021-01-24 16:34 UTC (permalink / raw)
  To: Jonas Bonn; +Cc: netdev, pbshelar, kuba, pablo

Dear Jonas,

thanks for your effort in breaking this down into more digestible chunks
for further review.

> Signed-off-by: Jonas Bonn <jonas@norrbonn.se>

Acked-by: Harald Welte <laforge@gnumonks.org>

-- 
- Harald Welte <laforge@gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 05/16] gtp: drop unnecessary call to skb_dst_drop
  2021-01-23 19:59 ` [RFC PATCH 05/16] gtp: drop unnecessary call to skb_dst_drop Jonas Bonn
@ 2021-01-24 16:48   ` Harald Welte
  0 siblings, 0 replies; 39+ messages in thread
From: Harald Welte @ 2021-01-24 16:48 UTC (permalink / raw)
  To: Jonas Bonn; +Cc: netdev, pbshelar, kuba, pablo

On Sat, Jan 23, 2021 at 08:59:05PM +0100, Jonas Bonn wrote:
> Signed-off-by: Jonas Bonn <jonas@norrbonn.se>

Acked-by: Harald Welte <laforge@gnumonks.org>
-- 
- Harald Welte <laforge@gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 12/16] gtp: update rx_length_errors for abnormally short packets
  2021-01-23 19:59 ` [RFC PATCH 12/16] gtp: update rx_length_errors for abnormally short packets Jonas Bonn
@ 2021-01-24 16:50   ` Harald Welte
  0 siblings, 0 replies; 39+ messages in thread
From: Harald Welte @ 2021-01-24 16:50 UTC (permalink / raw)
  To: Jonas Bonn; +Cc: netdev, pbshelar, kuba, pablo

On Sat, Jan 23, 2021 at 08:59:12PM +0100, Jonas Bonn wrote:
> Signed-off-by: Jonas Bonn <jonas@norrbonn.se>

Acked-by: Harald Welte <laforge@gnumonks.org>
-- 
- Harald Welte <laforge@gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 00/16] GTP: flow based
  2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
                   ` (15 preceding siblings ...)
  2021-01-23 19:59 ` [RFC PATCH 16/16] gtp: add netlink support for setting up flow based tunnels Jonas Bonn
@ 2021-01-24 17:42 ` Pravin Shelar
  16 siblings, 0 replies; 39+ messages in thread
From: Pravin Shelar @ 2021-01-24 17:42 UTC (permalink / raw)
  To: Linux Kernel Network Developers

On Sat, Jan 23, 2021 at 12:06 PM Jonas Bonn <jonas@norrbonn.se> wrote:
>
> This series begins by reverting the recently added patch adding support
> for GTP with lightweight tunnels.  That patch was added without getting
> any ACK from the maintainers and has several issues, as discussed on the
> mailing list.
>
> In order to try to make this as painless as possible, I have reworked
> Pravin's patch into a series that is, hopefully, a bit more reviewable.
> That series is rebased onto a series of other changes that constitute
> cleanup work necessary for on-going work to IPv6 support into the
> driver.  The IPv6 work should be rebaseable onto top of this series
> later on.
>
> I did try to do this other way around:  rebasing the IPv6 series on top
> of Pravin's patch.  Given that Pravin's patch contained about 200 lines
> of superfluous changes that would have had to be reverted in the process
> of realigning the patch series, things got ugly pretty quickly.  The end
> result would not have been pretty.
>
> So the result of this is that Pravin's patch is now mostly still in
> place.  I've reworked some small bits in order to simplify things.  My
> expectation is that Pravin will review and test his bits here.  In
> particular, the patch adding GTP control headers needs a bit of
> explanation.
>
> This is still an RFC only because I'm not quite convinced that I'm done
> with this.  I do want to get this onto the list quickly, though, since
> this has implications for the next merge window.  So let's see if we can
> sort this out to everyone's satisfaction.
>

I am all for making progress forward. Thanks Jonas for working on this.
I will finish the review next week.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 14/16] gtp: add support for flow based tunneling
  2021-01-23 19:59 ` [RFC PATCH 14/16] gtp: add support for flow based tunneling Jonas Bonn
  2021-01-24 14:27   ` Jonas Bonn
  2021-01-24 15:11   ` Jonas Bonn
@ 2021-01-25  8:12   ` Jonas Bonn
  2021-01-25  8:47   ` Jonas Bonn
  2021-02-06 18:04   ` Jonas Bonn
  4 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-25  8:12 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo

Hi Pravin,

I'm going to submit a new series without the GTP_METADATA bits.  I think 
the whole "collect metadata" approach is fine, but the way GTP header 
information is passed through the tunnel via metadata needs a bit more 
thought.  See below...

On 23/01/2021 20:59, Jonas Bonn wrote:
> From: Pravin B Shelar <pbshelar@fb.com>
>   
> +static int gtp_set_tun_dst(struct pdp_ctx *pctx, struct sk_buff *skb,
> +			   unsigned int hdrlen)
> +{
> +	struct metadata_dst *tun_dst;
> +	struct gtp1_header *gtp1;
> +	int opts_len = 0;
> +	__be64 tid;
> +
> +	gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
> +
> +	tid = key32_to_tunnel_id(gtp1->tid);
> +
> +	if (unlikely(gtp1->flags & GTP1_F_MASK))
> +		opts_len = sizeof(struct gtpu_metadata);

This decides that GTP metadata is required if any of the S, E, and PN 
bits are set in the header.  However:

i) even when any of those bits are set, none of the extra headers are 
actually added to the metadata so it's somewhat pointless to even bother 
reporting that they're set

ii) the more interesting case is that you might want to report reception 
of an end marker through the tunnel; that however, is signalled by way 
of the GTP header type and not via the flags; but, see below...


> +
> +	tun_dst = udp_tun_rx_dst(skb,
> +			pctx->sk->sk_family, TUNNEL_KEY, tid, opts_len);
> +	if (!tun_dst) {
> +		netdev_dbg(pctx->dev, "Failed to allocate tun_dst");
> +		goto err;
> +	}

The problem, as I see it, is that end marker messages don't actually 
contain an inner packet, so you won't be able to set up a destination 
for them.  The above fails and you never hit the metadata path below.

> +
> +	netdev_dbg(pctx->dev, "attaching metadata_dst to skb, gtp ver %d hdrlen %d\n",
> +		   pctx->gtp_version, hdrlen);
> +	if (unlikely(opts_len)) {
> +		struct gtpu_metadata *opts;
> +
> +		opts = ip_tunnel_info_opts(&tun_dst->u.tun_info);
> +		opts->ver = GTP_METADATA_V1;
> +		opts->flags = gtp1->flags;
> +		opts->type = gtp1->type;
> +		netdev_dbg(pctx->dev, "recved control pkt: flag %x type: %d\n",
> +			   opts->flags, opts->type);
> +		tun_dst->u.tun_info.key.tun_flags |= TUNNEL_GTPU_OPT;
> +		tun_dst->u.tun_info.options_len = opts_len;
> +		skb->protocol = htons(0xffff);         /* Unknown */
> +	}

Assuming that you do hit this code and are able to set the 'type' field 
in the metadata, who is going to be the recipient.  After you pull the 
GTP headers, the SKB is presumably zero-length...

/Jonas

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 14/16] gtp: add support for flow based tunneling
  2021-01-23 19:59 ` [RFC PATCH 14/16] gtp: add support for flow based tunneling Jonas Bonn
                     ` (2 preceding siblings ...)
  2021-01-25  8:12   ` Jonas Bonn
@ 2021-01-25  8:47   ` Jonas Bonn
  2021-02-06 18:04   ` Jonas Bonn
  4 siblings, 0 replies; 39+ messages in thread
From: Jonas Bonn @ 2021-01-25  8:47 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo

Hi Pravin,

On 23/01/2021 20:59, Jonas Bonn wrote:
> From: Pravin B Shelar <pbshelar@fb.com>
> 
> @@ -617,29 +686,84 @@ static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx,
>   static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
>   {
>   	struct gtp_dev *gtp = netdev_priv(dev);
> +	struct gtpu_metadata *opts = NULL;
> +	struct pdp_ctx md_pctx;
>   	struct pdp_ctx *pctx;
> +	__be16 port;
>   	struct rtable *rt;
> -	__be32 saddr;
>   	struct iphdr *iph;
> +	__be32 saddr;
>   	int headroom;
> -	__be16 port;
> +	__u8 tos;
>   	int r;
>   
> -	/* Read the IP destination address and resolve the PDP context.
> -	 * Prepend PDP header with TEI/TID from PDP ctx.
> -	 */
> -	iph = ip_hdr(skb);
> -	if (gtp->role == GTP_ROLE_SGSN)
> -		pctx = ipv4_pdp_find(gtp, iph->saddr);
> -	else
> -		pctx = ipv4_pdp_find(gtp, iph->daddr);
> +	if (gtp->collect_md) {

Why do we have this restriction that the device be exclusively "collect 
metadata" mode or PDP context mode?  Why are we not able to mix the two?

Furthermore, since the collect_md_sock will effectively always be 
listening on INADDR_ANY, that precludes any other PDP context devices 
from co-existing with it.  So setting up a secondary device for PDP 
contexts isn't a feasible workaround.

If mixing isn't possible, then I suppose PDP context management needs to 
be made to fail gracefully in "collect_md" mode... with the current 
patches I think that contexts can be set up but they are just silently 
ignored, which seems like a potential source of confusion.

/Jonas


> +		/* LWT GTP1U encap */
> +		struct ip_tunnel_info *info = NULL;
>   
> -	if (!pctx) {
> -		netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
> -			   &iph->daddr);
> -		return -ENOENT;
> +		info = skb_tunnel_info(skb);
> +		if (!info) {
> +			netdev_dbg(dev, "missing tunnel info");
> +			return -ENOENT;
> +		}
> +		if (info->key.tp_dst && ntohs(info->key.tp_dst) != GTP1U_PORT) {
> +			netdev_dbg(dev, "unexpected GTP dst port: %d", ntohs(info->key.tp_dst));
> +			return -EOPNOTSUPP;
> +		}
> +
> +		if (!gtp->sk1u) {
> +			netdev_dbg(dev, "missing tunnel sock");
> +			return -EOPNOTSUPP;
> +		}
> +
> +		pctx = &md_pctx;
> +		memset(pctx, 0, sizeof(*pctx));
> +		pctx->sk = gtp->sk1u;
> +		pctx->gtp_version = GTP_V1;
> +		pctx->u.v1.o_tei = ntohl(tunnel_id_to_key32(info->key.tun_id));
> +		pctx->peer_addr_ip4.s_addr = info->key.u.ipv4.dst;
> +
> +		saddr = info->key.u.ipv4.src;
> +		tos = info->key.tos;
> +
> +		if (info->options_len != 0) {
> +			if (info->key.tun_flags & TUNNEL_GTPU_OPT) {
> +				opts = ip_tunnel_info_opts(info);
> +			} else {
> +				netdev_dbg(dev, "missing tunnel metadata for control pkt");
> +				return -EOPNOTSUPP;
> +			}
> +		}
> +		netdev_dbg(dev, "flow-based GTP1U encap: tunnel id %d\n",
> +			   pctx->u.v1.o_tei);
> +	} else {
> +		struct iphdr *iph;
> +
> +		if (ntohs(skb->protocol) != ETH_P_IP)
> +			return -EOPNOTSUPP;
> +
> +		iph = ip_hdr(skb);
> +
> +		/* Read the IP destination address and resolve the PDP context.
> +		 * Prepend PDP header with TEI/TID from PDP ctx.
> +		 */
> +		if (gtp->role == GTP_ROLE_SGSN)
> +			pctx = ipv4_pdp_find(gtp, iph->saddr);
> +		else
> +			pctx = ipv4_pdp_find(gtp, iph->daddr);
> +
> +		if (!pctx) {
> +			netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
> +				   &iph->daddr);
> +			return -ENOENT;
> +		}
> +		netdev_dbg(dev, "found PDP context %p\n", pctx);
> +
> +		saddr = inet_sk(pctx->sk)->inet_saddr;
> +		tos = iph->tos;
> +		netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
> +			   &iph->saddr, &iph->daddr);
>   	}
> -	netdev_dbg(dev, "found PDP context %p\n", pctx);
>   
>   	rt = gtp_get_v4_rt(skb, dev, pctx, &saddr);
>   	if (IS_ERR(rt)) {
> @@ -691,7 +815,7 @@ static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
>   
>   	udp_tunnel_xmit_skb(rt, pctx->sk, skb,
>   			    saddr, pctx->peer_addr_ip4.s_addr,
> -			    iph->tos,
> +			    tos,
>   			    ip4_dst_hoplimit(&rt->dst),
>   			    0,
>   			    port, port,
> diff --git a/include/uapi/linux/gtp.h b/include/uapi/linux/gtp.h
> index 79f9191bbb24..62aff78b7c56 100644
> --- a/include/uapi/linux/gtp.h
> +++ b/include/uapi/linux/gtp.h
> @@ -2,6 +2,8 @@
>   #ifndef _UAPI_LINUX_GTP_H_
>   #define _UAPI_LINUX_GTP_H_
>   
> +#include <linux/types.h>
> +
>   #define GTP_GENL_MCGRP_NAME	"gtp"
>   
>   enum gtp_genl_cmds {
> @@ -34,4 +36,14 @@ enum gtp_attrs {
>   };
>   #define GTPA_MAX (__GTPA_MAX + 1)
>   
> +enum {
> +	GTP_METADATA_V1
> +};
> +
> +struct gtpu_metadata {
> +	__u8    ver;
> +	__u8    flags;
> +	__u8    type;
> +};
> +
>   #endif /* _UAPI_LINUX_GTP_H_ */
> diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
> index 7d9105533c7b..802da679fab1 100644
> --- a/include/uapi/linux/if_tunnel.h
> +++ b/include/uapi/linux/if_tunnel.h
> @@ -176,6 +176,7 @@ enum {
>   #define TUNNEL_VXLAN_OPT	__cpu_to_be16(0x1000)
>   #define TUNNEL_NOCACHE		__cpu_to_be16(0x2000)
>   #define TUNNEL_ERSPAN_OPT	__cpu_to_be16(0x4000)
> +#define TUNNEL_GTPU_OPT		__cpu_to_be16(0x8000)
>   
>   #define TUNNEL_OPTIONS_PRESENT \
>   		(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT)
> diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
> index d208b2af697f..28d649bda686 100644
> --- a/tools/include/uapi/linux/if_link.h
> +++ b/tools/include/uapi/linux/if_link.h
> @@ -617,6 +617,7 @@ enum {
>   	IFLA_GTP_FD1,
>   	IFLA_GTP_PDP_HASHSIZE,
>   	IFLA_GTP_ROLE,
> +	IFLA_GTP_COLLECT_METADATA,
>   	__IFLA_GTP_MAX,
>   };
>   #define IFLA_GTP_MAX (__IFLA_GTP_MAX - 1)
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 15/16] gtp: add ability to send GTP controls headers
  2021-01-24 14:21   ` Jonas Bonn
@ 2021-01-25 17:41     ` Harald Welte
  2021-01-28 21:29       ` Pravin Shelar
  2021-01-28 21:29     ` Pravin Shelar
  1 sibling, 1 reply; 39+ messages in thread
From: Harald Welte @ 2021-01-25 17:41 UTC (permalink / raw)
  To: Jonas Bonn; +Cc: netdev, pbshelar, kuba, pablo

Hi Jonas,

thanks for your detailed analysis and review of the changes.  To me, they
once again show that the original patch was merged too quickly, without
a detailed review by people with strong GTP background.

On Sun, Jan 24, 2021 at 03:21:21PM +0100, Jonas Bonn wrote:
> struct gtpu_metadata {
>         __u8    ver;
>         __u8    flags;
>         __u8    type;
> };
> 
> Here ver is the version of the metadata structure itself, which is fine.
> 'flags' corresponds to the 3 flag bits of GTP header's first byte:  E, S,
> and PN.
> 'type' corresponds to the 'message type' field of the GTP header.

One more comment on the 'type': Of how much use is it?  After all, the
GTP-U kernel driver only handles a single message type at all (G-PDU /
255 - the only message type that encapsulates user IP data), while all
other message types are always processed in userland via the UDP socket.

Side-note: 3GPP TS 29.060 lists 5 other message types that can happen in
GTP-U:
* Echo Request
* Echo Response
* Error Indication
* Supported Extension Headers Notification
* End Marker

It would be interesting to understand how the new flow-based tunnel would
treat those, if those 

> The 'control header' (strange name) example below allows the flags to be
> set; however, setting these flags alone is insufficient because each one
> indicates the presence of additional fields in the header and there's
> nothing in the code to account for that.

Full ACK from my side here.  Setting arbitrary bits in the GTP flags without
then actually encoding the required additional bits that those flags require
will produce broken packets.  IMHO, the GTP driver should never do that.

-- 
- Harald Welte <laforge@gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 15/16] gtp: add ability to send GTP controls headers
  2021-01-24 14:21   ` Jonas Bonn
  2021-01-25 17:41     ` Harald Welte
@ 2021-01-28 21:29     ` Pravin Shelar
       [not found]       ` <9b9476d2-186f-e749-f17d-d191c30347e4@norrbonn.se>
  1 sibling, 1 reply; 39+ messages in thread
From: Pravin Shelar @ 2021-01-28 21:29 UTC (permalink / raw)
  To: Jonas Bonn
  Cc: Harald Welte, Linux Kernel Network Developers, Pravin B Shelar,
	Jakub Kicinski, Pablo Neira Ayuso

On Sun, Jan 24, 2021 at 6:22 AM Jonas Bonn <jonas@norrbonn.se> wrote:
>
> Hi Pravin,
>
> So, this whole GTP metadata thing has me a bit confused.
>
> You've defined a metadata structure like this:
>
> struct gtpu_metadata {
>          __u8    ver;
>          __u8    flags;
>          __u8    type;
> };
>
> Here ver is the version of the metadata structure itself, which is fine.
> 'flags' corresponds to the 3 flag bits of GTP header's first byte:  E,
> S, and PN.
> 'type' corresponds to the 'message type' field of the GTP header.
>
> The 'control header' (strange name) example below allows the flags to be
> set; however, setting these flags alone is insufficient because each one
> indicates the presence of additional fields in the header and there's
> nothing in the code to account for that.
>
> If E is set, extension headers would need to be added.
> If S is set, a sequence number field would need to be added.
> If PN is set, a PDU-number header would need to be added.
>
> It's not clear to me who sets up this metadata in the first place.  Is
> that where OVS or eBPF come in?  Can you give some pointers to how this
> works?
>

Receive path: LWT extracts tunnel metadata into tunnel-metadata
struct. This object has 5-tuple info from outer header and tunnel key.
When there is presence of extension header there is no way to store
the info standard tunnel-metadata object. That is when the optional
section of tunnel-metadata comes in the play.
As you can see the packet data from GTP header onwards is still pushed
to the device, so consumers of LWT can look at tunnel-metadata and
make sense of the inner packet that is received on the device.
OVS does exactly the same. When it receives a GTP packet with optional
metadata, it looks at flags and parses the inner packet and extension
header accordingly.

xmit path: it is set by LWT tunnel user, OVS or eBPF code. it needs to
set the metadata in tunnel dst along with the 5-tuple data and tunel
ID. This is how it can steer the packet at the right tunnel using a
single tunnel net device.

> Couple of comments below....
>
> On 23/01/2021 20:59, Jonas Bonn wrote:
> > From: Pravin B Shelar <pbshelar@fb.com>
> >
> > Please explain how this patch actually works... creation of the control
> > header makes sense, but I don't understand how sending of a
> > control header is actually triggered.
> >
> > Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
> > ---
> >   drivers/net/gtp.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
> >   1 file changed, 42 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
> > index 668ed8a4836e..bbce2671de2d 100644
> > --- a/drivers/net/gtp.c
> > +++ b/drivers/net/gtp.c
> > @@ -683,6 +683,38 @@ static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx,
> >       }
> >   }
> >
> > +static inline int gtp1_push_control_header(struct sk_buff *skb,
>
> I'm not enamored with the name 'control header' because it makes sound
> like this is some GTP-C thing.  The GTP module is really only about
> GTP-U and the function itself just sets up a GTP-U header.
>
sure. lets call ext_hdr.

>
> > +                                        struct pdp_ctx *pctx,
> > +                                        struct gtpu_metadata *opts,
> > +                                        struct net_device *dev)
> > +{
> > +     struct gtp1_header *gtp1c;
> > +     int payload_len;
> > +
> > +     if (opts->ver != GTP_METADATA_V1)
> > +             return -ENOENT;
> > +
> > +     if (opts->type == 0xFE) {
> > +             /* for end marker ignore skb data. */
> > +             netdev_dbg(dev, "xmit pkt with null data");
> > +             pskb_trim(skb, 0);
> > +     }
> > +     if (skb_cow_head(skb, sizeof(*gtp1c)) < 0)
> > +             return -ENOMEM;
> > +
> > +     payload_len = skb->len;
> > +
> > +     gtp1c = skb_push(skb, sizeof(*gtp1c));
> > +
> > +     gtp1c->flags    = opts->flags;
> > +     gtp1c->type     = opts->type;
> > +     gtp1c->length   = htons(payload_len);
> > +     gtp1c->tid      = htonl(pctx->u.v1.o_tei);
> > +     netdev_dbg(dev, "xmit control pkt: ver %d flags %x type %x pkt len %d tid %x",
> > +                opts->ver, opts->flags, opts->type, skb->len, pctx->u.v1.o_tei);
> > +     return 0;
> > +}
>
> There's nothing really special about that above function aside from the
> facts that it takes 'opts' as an argument.  Can't we just merge this
> with the regular 'gtp_push_header' function?  Do you have plans for this
> beyond what's here that would complicated by doing so?
>

Yes, we already have usecase for handling GTP PDU session container
related extension header for 5G UPF endpoitnt.



> /Jonas
>
>
> > +
> >   static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
> >   {
> >       struct gtp_dev *gtp = netdev_priv(dev);
> > @@ -807,7 +839,16 @@ static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
> >
> >       skb_set_inner_protocol(skb, skb->protocol);
> >
> > -     gtp_push_header(skb, pctx, &port);
> > +     if (unlikely(opts)) {
> > +             port = htons(GTP1U_PORT);
> > +             r = gtp1_push_control_header(skb, pctx, opts, dev);
> > +             if (r) {
> > +                     netdev_info(dev, "cntr pkt error %d", r);
> > +                     goto err_rt;
> > +             }
> > +     } else {
> > +             gtp_push_header(skb, pctx, &port);
> > +     }
> >
> >       iph = ip_hdr(skb);
> >       netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
> >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 15/16] gtp: add ability to send GTP controls headers
  2021-01-25 17:41     ` Harald Welte
@ 2021-01-28 21:29       ` Pravin Shelar
  0 siblings, 0 replies; 39+ messages in thread
From: Pravin Shelar @ 2021-01-28 21:29 UTC (permalink / raw)
  To: Harald Welte
  Cc: Jonas Bonn, Linux Kernel Network Developers, Pravin B Shelar,
	Jakub Kicinski, Pablo Neira Ayuso

On Mon, Jan 25, 2021 at 9:52 AM Harald Welte <laforge@gnumonks.org> wrote:
>
> Hi Jonas,
>

> On Sun, Jan 24, 2021 at 03:21:21PM +0100, Jonas Bonn wrote:
> > struct gtpu_metadata {
> >         __u8    ver;
> >         __u8    flags;
> >         __u8    type;
> > };
> >
> > Here ver is the version of the metadata structure itself, which is fine.
> > 'flags' corresponds to the 3 flag bits of GTP header's first byte:  E, S,
> > and PN.
> > 'type' corresponds to the 'message type' field of the GTP header.
>
> One more comment on the 'type': Of how much use is it?  After all, the
> GTP-U kernel driver only handles a single message type at all (G-PDU /
> 255 - the only message type that encapsulates user IP data), while all
> other message types are always processed in userland via the UDP socket.
>
There is NO userland UDP socket for the LWT tunnel. All UDP traffic is
terminated in kernel space. userland only sets rules over LTW tunnel
device to handle traffic. This is how OVS/eBPF handles other UDP based
LWT tunnel devices.


> Side-note: 3GPP TS 29.060 lists 5 other message types that can happen in
> GTP-U:
> * Echo Request
> * Echo Response
> * Error Indication
> * Supported Extension Headers Notification
> * End Marker
>
> It would be interesting to understand how the new flow-based tunnel would
> treat those, if those
>
Current code does handle Echo Request, Response and End marker.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 15/16] gtp: add ability to send GTP controls headers
       [not found]       ` <9b9476d2-186f-e749-f17d-d191c30347e4@norrbonn.se>
@ 2021-01-30  6:59         ` Pravin Shelar
  2021-01-30 18:44           ` Jakub Kicinski
  0 siblings, 1 reply; 39+ messages in thread
From: Pravin Shelar @ 2021-01-30  6:59 UTC (permalink / raw)
  To: Jonas Bonn
  Cc: Harald Welte, Linux Kernel Network Developers, Pravin B Shelar,
	Jakub Kicinski, Pablo Neira Ayuso

On Fri, Jan 29, 2021 at 6:08 AM Jonas Bonn <jonas@norrbonn.se> wrote:
>
> Hi Pravin,
>
> On 28/01/2021 22:29, Pravin Shelar wrote:
> > On Sun, Jan 24, 2021 at 6:22 AM Jonas Bonn <jonas@norrbonn.se> wrote:
> >>
> >> Hi Pravin,
> >>
> >> So, this whole GTP metadata thing has me a bit confused.
> >>
> >> You've defined a metadata structure like this:
> >>
> >> struct gtpu_metadata {
> >>           __u8    ver;
> >>           __u8    flags;
> >>           __u8    type;
> >> };
> >>
> >> Here ver is the version of the metadata structure itself, which is fine.
> >> 'flags' corresponds to the 3 flag bits of GTP header's first byte:  E,
> >> S, and PN.
> >> 'type' corresponds to the 'message type' field of the GTP header.
> >>
> >> The 'control header' (strange name) example below allows the flags to be
> >> set; however, setting these flags alone is insufficient because each one
> >> indicates the presence of additional fields in the header and there's
> >> nothing in the code to account for that.
> >>
> >> If E is set, extension headers would need to be added.
> >> If S is set, a sequence number field would need to be added.
> >> If PN is set, a PDU-number header would need to be added.
> >>
> >> It's not clear to me who sets up this metadata in the first place.  Is
> >> that where OVS or eBPF come in?  Can you give some pointers to how this
> >> works?
> >>
> >
> > Receive path: LWT extracts tunnel metadata into tunnel-metadata
> > struct. This object has 5-tuple info from outer header and tunnel key.
> > When there is presence of extension header there is no way to store
> > the info standard tunnel-metadata object. That is when the optional
> > section of tunnel-metadata comes in the play.
> > As you can see the packet data from GTP header onwards is still pushed
> > to the device, so consumers of LWT can look at tunnel-metadata and
> > make sense of the inner packet that is received on the device.
> > OVS does exactly the same. When it receives a GTP packet with optional
> > metadata, it looks at flags and parses the inner packet and extension
> > header accordingly.
>
> Ah, ok, I see.  So you are pulling _half_ of the GTP header off the
> packet but leaving the optional GTP extension headers in place if they
> exist.  So what OVS receives is a packet with metadata indicating
> whether or not it begins with these extension headers or whether it
> begins with an IP header.
>
> So OVS might need to begin by pulling parts of the packet in order to
> get to the inner IP packet.  In that case, why don't you just leave the
> _entire_ GTP header in place and let OVS work from that?  The header
> contains exactly the data you've copied to the metadata struct PLUS it
> has the incoming TEID value that you really should be validating inner
> IP against.
>

Following are the reasons for extracting the header and populating metadata.
1. That is the design used by other tunneling protocols
implementations for handling optional headers. We need to have a
consistent model across all tunnel devices for upper layers.
2. GTP module is parsing the UDP and GTP header. It would be wasteful
to repeat the same process in upper layers.
3. TIED is part of tunnel metadata, it is already used to validating
inner packets. But TIED is not alone to handle packets with extended
header.

I am fine with processing the entire header in GTP but in case of 'end
marker' there is no data left after pulling entire GTP header. Thats
why I took this path.

> Also, what happens if this is used WITHOUT OVS/eBPF... just a route with
> 'encap' set.  In that case, nothing will be there to pull the extension
> headers off the packet.
>

One way would be, the user can use encap to steer "special" packets to
a netdev that can handle such packets. it would be a userspace process
or eBPF program.

> >
> > xmit path: it is set by LWT tunnel user, OVS or eBPF code. it needs to
> > set the metadata in tunnel dst along with the 5-tuple data and tunel
> > ID. This is how it can steer the packet at the right tunnel using a
> > single tunnel net device.
>
> Right, that part is fine.  However, for setting extension headers you'd
> need to set the flags in the metadata and the extensions themselves
> prepended to the IP packet meaning your SKB contains a pseudo-GTP packet
> before the GTP module can finish adding the header.  I don't know why
> you wouldn't just add the entire GTP header in one place and be done
> with it... going via the GTP module gets you almost nothing at this point.
>
The UDP socket is owned by GTP module, so there is no other way to
inject a packet in the tunnel than sending it over GTP module.
plus this would looks this will break the standard model of
abstracting tunnel implementation in ovs or bridge code. We will need
special code for GTP packets to parse extra outer headers when OVS
extracts the flow from the inner packet.
I am open to handling  the optional headers completely GTP module.

>
> >
> >> Couple of comments below....
> >>
> >> On 23/01/2021 20:59, Jonas Bonn wrote:
> >>> From: Pravin B Shelar <pbshelar@fb.com>
> >>>
> >>> Please explain how this patch actually works... creation of the control
> >>> header makes sense, but I don't understand how sending of a
> >>> control header is actually triggered.
> >>>
> >>> Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
> >>> ---
> >>>    drivers/net/gtp.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
> >>>    1 file changed, 42 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
> >>> index 668ed8a4836e..bbce2671de2d 100644
> >>> --- a/drivers/net/gtp.c
> >>> +++ b/drivers/net/gtp.c
> >>> @@ -683,6 +683,38 @@ static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx,
> >>>        }
> >>>    }
> >>>
> >>> +static inline int gtp1_push_control_header(struct sk_buff *skb,
> >>
> >> I'm not enamored with the name 'control header' because it makes sound
> >> like this is some GTP-C thing.  The GTP module is really only about
> >> GTP-U and the function itself just sets up a GTP-U header.
> >>
> > sure. lets call ext_hdr.
> >
> >>
> >>> +                                        struct pdp_ctx *pctx,
> >>> +                                        struct gtpu_metadata *opts,
> >>> +                                        struct net_device *dev)
> >>> +{
> >>> +     struct gtp1_header *gtp1c;
> >>> +     int payload_len;
> >>> +
> >>> +     if (opts->ver != GTP_METADATA_V1)
> >>> +             return -ENOENT;
> >>> +
> >>> +     if (opts->type == 0xFE) {
> >>> +             /* for end marker ignore skb data. */
> >>> +             netdev_dbg(dev, "xmit pkt with null data");
> >>> +             pskb_trim(skb, 0);
> >>> +     }
> >>> +     if (skb_cow_head(skb, sizeof(*gtp1c)) < 0)
> >>> +             return -ENOMEM;
> >>> +
> >>> +     payload_len = skb->len;
> >>> +
> >>> +     gtp1c = skb_push(skb, sizeof(*gtp1c));
> >>> +
> >>> +     gtp1c->flags    = opts->flags;
> >>> +     gtp1c->type     = opts->type;
> >>> +     gtp1c->length   = htons(payload_len);
> >>> +     gtp1c->tid      = htonl(pctx->u.v1.o_tei);
> >>> +     netdev_dbg(dev, "xmit control pkt: ver %d flags %x type %x pkt len %d tid %x",
> >>> +                opts->ver, opts->flags, opts->type, skb->len, pctx->u.v1.o_tei);
> >>> +     return 0;
> >>> +}
> >>
> >> There's nothing really special about that above function aside from the
> >> facts that it takes 'opts' as an argument.  Can't we just merge this
> >> with the regular 'gtp_push_header' function?  Do you have plans for this
> >> beyond what's here that would complicated by doing so?
> >>
> >
> > Yes, we already have usecase for handling GTP PDU session container
> > related extension header for 5G UPF endpoitnt.
>
> I'll venture a guess that you're referring to QoS indications.  I, too,
> have been wondering whether or not these can be made to be compatible
> with the GTP module... I'm not sure, at this point.
>
Right, It's QoS related headers.
We can use the same optional metadata field to process this header. we
need to extend the option data structure to pass PDU session container
header values.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 15/16] gtp: add ability to send GTP controls headers
  2021-01-30  6:59         ` Pravin Shelar
@ 2021-01-30 18:44           ` Jakub Kicinski
  2021-01-30 20:05             ` Pravin Shelar
  0 siblings, 1 reply; 39+ messages in thread
From: Jakub Kicinski @ 2021-01-30 18:44 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: Jonas Bonn, Harald Welte, Linux Kernel Network Developers,
	Pravin B Shelar, Pablo Neira Ayuso

On Fri, 29 Jan 2021 22:59:06 -0800 Pravin Shelar wrote:
> On Fri, Jan 29, 2021 at 6:08 AM Jonas Bonn <jonas@norrbonn.se> wrote:
> > On 28/01/2021 22:29, Pravin Shelar wrote:  
> > > Receive path: LWT extracts tunnel metadata into tunnel-metadata
> > > struct. This object has 5-tuple info from outer header and tunnel key.
> > > When there is presence of extension header there is no way to store
> > > the info standard tunnel-metadata object. That is when the optional
> > > section of tunnel-metadata comes in the play.
> > > As you can see the packet data from GTP header onwards is still pushed
> > > to the device, so consumers of LWT can look at tunnel-metadata and
> > > make sense of the inner packet that is received on the device.
> > > OVS does exactly the same. When it receives a GTP packet with optional
> > > metadata, it looks at flags and parses the inner packet and extension
> > > header accordingly.  
> >
> > Ah, ok, I see.  So you are pulling _half_ of the GTP header off the
> > packet but leaving the optional GTP extension headers in place if they
> > exist.  So what OVS receives is a packet with metadata indicating
> > whether or not it begins with these extension headers or whether it
> > begins with an IP header.
> >
> > So OVS might need to begin by pulling parts of the packet in order to
> > get to the inner IP packet.  In that case, why don't you just leave the
> > _entire_ GTP header in place and let OVS work from that?  The header
> > contains exactly the data you've copied to the metadata struct PLUS it
> > has the incoming TEID value that you really should be validating inner
> > IP against.
> >  
> 
> Following are the reasons for extracting the header and populating metadata.
> 1. That is the design used by other tunneling protocols
> implementations for handling optional headers. We need to have a
> consistent model across all tunnel devices for upper layers.

Could you clarify with some examples? This does not match intuition, 
I must be missing something.

> 2. GTP module is parsing the UDP and GTP header. It would be wasteful
> to repeat the same process in upper layers.
> 3. TIED is part of tunnel metadata, it is already used to validating
> inner packets. But TIED is not alone to handle packets with extended
> header.
> 
> I am fine with processing the entire header in GTP but in case of 'end
> marker' there is no data left after pulling entire GTP header. Thats
> why I took this path.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 15/16] gtp: add ability to send GTP controls headers
  2021-01-30 18:44           ` Jakub Kicinski
@ 2021-01-30 20:05             ` Pravin Shelar
  2021-02-01 20:44               ` Jakub Kicinski
  0 siblings, 1 reply; 39+ messages in thread
From: Pravin Shelar @ 2021-01-30 20:05 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jonas Bonn, Harald Welte, Linux Kernel Network Developers,
	Pravin B Shelar, Pablo Neira Ayuso

On Sat, Jan 30, 2021 at 10:44 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 29 Jan 2021 22:59:06 -0800 Pravin Shelar wrote:
> > On Fri, Jan 29, 2021 at 6:08 AM Jonas Bonn <jonas@norrbonn.se> wrote:
> > > On 28/01/2021 22:29, Pravin Shelar wrote:
> > > > Receive path: LWT extracts tunnel metadata into tunnel-metadata
> > > > struct. This object has 5-tuple info from outer header and tunnel key.
> > > > When there is presence of extension header there is no way to store
> > > > the info standard tunnel-metadata object. That is when the optional
> > > > section of tunnel-metadata comes in the play.
> > > > As you can see the packet data from GTP header onwards is still pushed
> > > > to the device, so consumers of LWT can look at tunnel-metadata and
> > > > make sense of the inner packet that is received on the device.
> > > > OVS does exactly the same. When it receives a GTP packet with optional
> > > > metadata, it looks at flags and parses the inner packet and extension
> > > > header accordingly.
> > >
> > > Ah, ok, I see.  So you are pulling _half_ of the GTP header off the
> > > packet but leaving the optional GTP extension headers in place if they
> > > exist.  So what OVS receives is a packet with metadata indicating
> > > whether or not it begins with these extension headers or whether it
> > > begins with an IP header.
> > >
> > > So OVS might need to begin by pulling parts of the packet in order to
> > > get to the inner IP packet.  In that case, why don't you just leave the
> > > _entire_ GTP header in place and let OVS work from that?  The header
> > > contains exactly the data you've copied to the metadata struct PLUS it
> > > has the incoming TEID value that you really should be validating inner
> > > IP against.
> > >
> >
> > Following are the reasons for extracting the header and populating metadata.
> > 1. That is the design used by other tunneling protocols
> > implementations for handling optional headers. We need to have a
> > consistent model across all tunnel devices for upper layers.
>
> Could you clarify with some examples? This does not match intuition,
> I must be missing something.
>

You can look at geneve_rx() or vxlan_rcv() that extracts optional
headers in ip_tunnel_info opts.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 15/16] gtp: add ability to send GTP controls headers
  2021-01-30 20:05             ` Pravin Shelar
@ 2021-02-01 20:44               ` Jakub Kicinski
  2021-02-02  5:24                 ` Jonas Bonn
  0 siblings, 1 reply; 39+ messages in thread
From: Jakub Kicinski @ 2021-02-01 20:44 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: Jonas Bonn, Harald Welte, Linux Kernel Network Developers,
	Pravin B Shelar, Pablo Neira Ayuso

On Sat, 30 Jan 2021 12:05:40 -0800 Pravin Shelar wrote:
> On Sat, Jan 30, 2021 at 10:44 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Fri, 29 Jan 2021 22:59:06 -0800 Pravin Shelar wrote:  
> > > On Fri, Jan 29, 2021 at 6:08 AM Jonas Bonn <jonas@norrbonn.se> wrote:  
> > > Following are the reasons for extracting the header and populating metadata.
> > > 1. That is the design used by other tunneling protocols
> > > implementations for handling optional headers. We need to have a
> > > consistent model across all tunnel devices for upper layers.  
> >
> > Could you clarify with some examples? This does not match intuition,
> > I must be missing something.
> 
> You can look at geneve_rx() or vxlan_rcv() that extracts optional
> headers in ip_tunnel_info opts.

Okay, I got confused what Jonas was inquiring about. I thought that the
extension headers were not pulled, rather than not parsed. Copying them
as-is to info->opts is right, thanks!

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 15/16] gtp: add ability to send GTP controls headers
  2021-02-01 20:44               ` Jakub Kicinski
@ 2021-02-02  5:24                 ` Jonas Bonn
  2021-02-02  6:56                   ` Pravin Shelar
  0 siblings, 1 reply; 39+ messages in thread
From: Jonas Bonn @ 2021-02-02  5:24 UTC (permalink / raw)
  To: Jakub Kicinski, Pravin Shelar
  Cc: Harald Welte, Linux Kernel Network Developers, Pravin B Shelar,
	Pablo Neira Ayuso

Hi Jakub,

On 01/02/2021 21:44, Jakub Kicinski wrote:
> On Sat, 30 Jan 2021 12:05:40 -0800 Pravin Shelar wrote:
>> On Sat, Jan 30, 2021 at 10:44 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>> On Fri, 29 Jan 2021 22:59:06 -0800 Pravin Shelar wrote:
>>>> On Fri, Jan 29, 2021 at 6:08 AM Jonas Bonn <jonas@norrbonn.se> wrote:
>>>> Following are the reasons for extracting the header and populating metadata.
>>>> 1. That is the design used by other tunneling protocols
>>>> implementations for handling optional headers. We need to have a
>>>> consistent model across all tunnel devices for upper layers.
>>>
>>> Could you clarify with some examples? This does not match intuition,
>>> I must be missing something.
>>
>> You can look at geneve_rx() or vxlan_rcv() that extracts optional
>> headers in ip_tunnel_info opts.
> 
> Okay, I got confused what Jonas was inquiring about. I thought that the
> extension headers were not pulled, rather than not parsed. Copying them
> as-is to info->opts is right, thanks!
> 

No, you're not confused.  The extension headers are not being pulled in 
the current patchset.

Incoming packet:

---------------------------------------------------------------------
| flags | type | len | TEID | N-PDU | SEQ | Ext | EXT.Hdr | IP | ...
---------------------------------------------------------------------
<--------- GTP header ------<<Optional GTP elements>>-----><- Pkt --->

The "collect metadata" path of the patchset copies 'flags' and 'type' to 
info->opts, but leaves the following:

-----------------------------------------
| N-PDU | SEQ | Ext | EXT.Hdr | IP | ...
-----------------------------------------
<--------- GTP header -------><- Pkt --->

So it's leaving _half_ the header and making it a requirement that there 
be further intelligence down the line that can handle this.  This is far 
from intuitive.

/Jonas

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 15/16] gtp: add ability to send GTP controls headers
  2021-02-02  5:24                 ` Jonas Bonn
@ 2021-02-02  6:56                   ` Pravin Shelar
  2021-02-02  8:03                     ` Jonas Bonn
  0 siblings, 1 reply; 39+ messages in thread
From: Pravin Shelar @ 2021-02-02  6:56 UTC (permalink / raw)
  To: Jonas Bonn
  Cc: Jakub Kicinski, Harald Welte, Linux Kernel Network Developers,
	Pravin B Shelar, Pablo Neira Ayuso

On Mon, Feb 1, 2021 at 9:24 PM Jonas Bonn <jonas@norrbonn.se> wrote:
>
> Hi Jakub,
>
> On 01/02/2021 21:44, Jakub Kicinski wrote:
> > On Sat, 30 Jan 2021 12:05:40 -0800 Pravin Shelar wrote:
> >> On Sat, Jan 30, 2021 at 10:44 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>> On Fri, 29 Jan 2021 22:59:06 -0800 Pravin Shelar wrote:
> >>>> On Fri, Jan 29, 2021 at 6:08 AM Jonas Bonn <jonas@norrbonn.se> wrote:
> >>>> Following are the reasons for extracting the header and populating metadata.
> >>>> 1. That is the design used by other tunneling protocols
> >>>> implementations for handling optional headers. We need to have a
> >>>> consistent model across all tunnel devices for upper layers.
> >>>
> >>> Could you clarify with some examples? This does not match intuition,
> >>> I must be missing something.
> >>
> >> You can look at geneve_rx() or vxlan_rcv() that extracts optional
> >> headers in ip_tunnel_info opts.
> >
> > Okay, I got confused what Jonas was inquiring about. I thought that the
> > extension headers were not pulled, rather than not parsed. Copying them
> > as-is to info->opts is right, thanks!
> >
>
> No, you're not confused.  The extension headers are not being pulled in
> the current patchset.
>
> Incoming packet:
>
> ---------------------------------------------------------------------
> | flags | type | len | TEID | N-PDU | SEQ | Ext | EXT.Hdr | IP | ...
> ---------------------------------------------------------------------
> <--------- GTP header ------<<Optional GTP elements>>-----><- Pkt --->
>
> The "collect metadata" path of the patchset copies 'flags' and 'type' to
> info->opts, but leaves the following:
>
> -----------------------------------------
> | N-PDU | SEQ | Ext | EXT.Hdr | IP | ...
> -----------------------------------------
> <--------- GTP header -------><- Pkt --->
>
> So it's leaving _half_ the header and making it a requirement that there
> be further intelligence down the line that can handle this.  This is far
> from intuitive.
>

The patch supports Echo, Echo response and End marker packet.
Issue with pulling the entire extension header is that it would result
in zero length skb, such packets can not be passed on to the upper
layer. That is the reason I kept the extension header in skb and added
indication in tunnel metadata that it is not a IP packet. so that
upper layer can process the packet.
IP packet without an extension header would be handled in a fast path
without any special handling.

Obviously In case of PDU session container extension header GTP driver
would need to process the entire extension header in the module. This
way we can handle these user data packets in fastpath.
I can make changes to use the same method for all extension headers if needed.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 15/16] gtp: add ability to send GTP controls headers
  2021-02-02  6:56                   ` Pravin Shelar
@ 2021-02-02  8:03                     ` Jonas Bonn
  2021-02-02 22:54                       ` Pravin Shelar
  0 siblings, 1 reply; 39+ messages in thread
From: Jonas Bonn @ 2021-02-02  8:03 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: Jakub Kicinski, Harald Welte, Linux Kernel Network Developers,
	Pravin B Shelar, Pablo Neira Ayuso



On 02/02/2021 07:56, Pravin Shelar wrote:
> On Mon, Feb 1, 2021 at 9:24 PM Jonas Bonn <jonas@norrbonn.se> wrote:
>>
>> Hi Jakub,
>>
>> On 01/02/2021 21:44, Jakub Kicinski wrote:
>>> On Sat, 30 Jan 2021 12:05:40 -0800 Pravin Shelar wrote:
>>>> On Sat, Jan 30, 2021 at 10:44 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>> On Fri, 29 Jan 2021 22:59:06 -0800 Pravin Shelar wrote:
>>>>>> On Fri, Jan 29, 2021 at 6:08 AM Jonas Bonn <jonas@norrbonn.se> wrote:
>>>>>> Following are the reasons for extracting the header and populating metadata.
>>>>>> 1. That is the design used by other tunneling protocols
>>>>>> implementations for handling optional headers. We need to have a
>>>>>> consistent model across all tunnel devices for upper layers.
>>>>>
>>>>> Could you clarify with some examples? This does not match intuition,
>>>>> I must be missing something.
>>>>
>>>> You can look at geneve_rx() or vxlan_rcv() that extracts optional
>>>> headers in ip_tunnel_info opts.
>>>
>>> Okay, I got confused what Jonas was inquiring about. I thought that the
>>> extension headers were not pulled, rather than not parsed. Copying them
>>> as-is to info->opts is right, thanks!
>>>
>>
>> No, you're not confused.  The extension headers are not being pulled in
>> the current patchset.
>>
>> Incoming packet:
>>
>> ---------------------------------------------------------------------
>> | flags | type | len | TEID | N-PDU | SEQ | Ext | EXT.Hdr | IP | ...
>> ---------------------------------------------------------------------
>> <--------- GTP header ------<<Optional GTP elements>>-----><- Pkt --->
>>
>> The "collect metadata" path of the patchset copies 'flags' and 'type' to
>> info->opts, but leaves the following:
>>
>> -----------------------------------------
>> | N-PDU | SEQ | Ext | EXT.Hdr | IP | ...
>> -----------------------------------------
>> <--------- GTP header -------><- Pkt --->
>>
>> So it's leaving _half_ the header and making it a requirement that there
>> be further intelligence down the line that can handle this.  This is far
>> from intuitive.
>>
> 
> The patch supports Echo, Echo response and End marker packet.
> Issue with pulling the entire extension header is that it would result
> in zero length skb, such packets can not be passed on to the upper
> layer. That is the reason I kept the extension header in skb and added
> indication in tunnel metadata that it is not a IP packet. so that
> upper layer can process the packet.
> IP packet without an extension header would be handled in a fast path
> without any special handling.
> 
> Obviously In case of PDU session container extension header GTP driver
> would need to process the entire extension header in the module. This
> way we can handle these user data packets in fastpath.
> I can make changes to use the same method for all extension headers if needed.
> 

The most disturbing bit is the fact that the upper layer needs to 
understand that part of the header info is in info->opts whereas the 
remainder is on the SKB itself.  If it is going to access the SKB 
anyway, why not just leave the entire GTP header in place and let the 
upper layer just get all the information from there?  What's the 
advantage of info->opts in this case?

Normally, the gtp module extracts T-PDU's from the GTP packet and passes 
them on (after validating their IP address) to the network stack.  For 
_everything else_, it just passes them along the socket for handling 
elsewhere.

It sounds like you are trying to do exactly the same thing:  extract 
T-PDU and inject into network stack for T-PDU's, and pass everything
else to another handler.

So what is different in your case from the normal case?
- there's metadata on the packet... can't we detect this and set the 
tunnel ID from the TEID in that case?  Or can't we just always have 
metadata on the packet?
- the upper layer handler is in kernel space instead of userspace; but 
they are doing pretty much the same thing, right?  why does the kernel 
space variant need something (info->opts) that userspace can get by without?

It would be seriously good to see a _real_ example of how you intend to 
use this.  Isn't the PDP context mechanism already sufficient to do all 
of the above?  What's missing?

ip route 192.168.99.0/24 encap gtp id 100 dst 172.99.0.2 dev gtp1

is roughly equivalent to:

gtp-tunnel add gtp1 v1 [LOCAL_TEID] 100 172.99.0.2 [UE_IP]

/Jonas



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 15/16] gtp: add ability to send GTP controls headers
  2021-02-02  8:03                     ` Jonas Bonn
@ 2021-02-02 22:54                       ` Pravin Shelar
  0 siblings, 0 replies; 39+ messages in thread
From: Pravin Shelar @ 2021-02-02 22:54 UTC (permalink / raw)
  To: Jonas Bonn
  Cc: Jakub Kicinski, Harald Welte, Linux Kernel Network Developers,
	Pravin B Shelar, Pablo Neira Ayuso

On Tue, Feb 2, 2021 at 12:03 AM Jonas Bonn <jonas@norrbonn.se> wrote:
>
>
>
> On 02/02/2021 07:56, Pravin Shelar wrote:
> > On Mon, Feb 1, 2021 at 9:24 PM Jonas Bonn <jonas@norrbonn.se> wrote:
> >>
> >> Hi Jakub,
> >>
> >> On 01/02/2021 21:44, Jakub Kicinski wrote:
> >>> On Sat, 30 Jan 2021 12:05:40 -0800 Pravin Shelar wrote:
> >>>> On Sat, Jan 30, 2021 at 10:44 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>>>> On Fri, 29 Jan 2021 22:59:06 -0800 Pravin Shelar wrote:
> >>>>>> On Fri, Jan 29, 2021 at 6:08 AM Jonas Bonn <jonas@norrbonn.se> wrote:
> >>>>>> Following are the reasons for extracting the header and populating metadata.
> >>>>>> 1. That is the design used by other tunneling protocols
> >>>>>> implementations for handling optional headers. We need to have a
> >>>>>> consistent model across all tunnel devices for upper layers.
> >>>>>
> >>>>> Could you clarify with some examples? This does not match intuition,
> >>>>> I must be missing something.
> >>>>
> >>>> You can look at geneve_rx() or vxlan_rcv() that extracts optional
> >>>> headers in ip_tunnel_info opts.
> >>>
> >>> Okay, I got confused what Jonas was inquiring about. I thought that the
> >>> extension headers were not pulled, rather than not parsed. Copying them
> >>> as-is to info->opts is right, thanks!
> >>>
> >>
> >> No, you're not confused.  The extension headers are not being pulled in
> >> the current patchset.
> >>
> >> Incoming packet:
> >>
> >> ---------------------------------------------------------------------
> >> | flags | type | len | TEID | N-PDU | SEQ | Ext | EXT.Hdr | IP | ...
> >> ---------------------------------------------------------------------
> >> <--------- GTP header ------<<Optional GTP elements>>-----><- Pkt --->
> >>
> >> The "collect metadata" path of the patchset copies 'flags' and 'type' to
> >> info->opts, but leaves the following:
> >>
> >> -----------------------------------------
> >> | N-PDU | SEQ | Ext | EXT.Hdr | IP | ...
> >> -----------------------------------------
> >> <--------- GTP header -------><- Pkt --->
> >>
> >> So it's leaving _half_ the header and making it a requirement that there
> >> be further intelligence down the line that can handle this.  This is far
> >> from intuitive.
> >>
> >
> > The patch supports Echo, Echo response and End marker packet.
> > Issue with pulling the entire extension header is that it would result
> > in zero length skb, such packets can not be passed on to the upper
> > layer. That is the reason I kept the extension header in skb and added
> > indication in tunnel metadata that it is not a IP packet. so that
> > upper layer can process the packet.
> > IP packet without an extension header would be handled in a fast path
> > without any special handling.
> >
> > Obviously In case of PDU session container extension header GTP driver
> > would need to process the entire extension header in the module. This
> > way we can handle these user data packets in fastpath.
> > I can make changes to use the same method for all extension headers if needed.
> >
>
> The most disturbing bit is the fact that the upper layer needs to
> understand that part of the header info is in info->opts whereas the
> remainder is on the SKB itself.  If it is going to access the SKB
> anyway, why not just leave the entire GTP header in place and let the
> upper layer just get all the information from there?  What's the
> advantage of info->opts in this case?
>
> Normally, the gtp module extracts T-PDU's from the GTP packet and passes
> them on (after validating their IP address) to the network stack.  For
> _everything else_, it just passes them along the socket for handling
> elsewhere.
>
> It sounds like you are trying to do exactly the same thing:  extract
> T-PDU and inject into network stack for T-PDU's, and pass everything
> else to another handler.
>
> So what is different in your case from the normal case?
> - there's metadata on the packet... can't we detect this and set the
> tunnel ID from the TEID in that case?  Or can't we just always have
> metadata on the packet?
> - the upper layer handler is in kernel space instead of userspace; but
> they are doing pretty much the same thing, right?  why does the kernel
> space variant need something (info->opts) that userspace can get by without?
>
This is how OVS/eBPF abstracts tunneling implementation. Tunnel
context is represented in tunnel metadata. I am trying to integrate
GTP tunnel with LTW framework. This way we can make use of GTP tunnel
as any other LWT device.

I am fine with the GTP extension header handling that passes GTP
packet as is in case extension header. Can you make those changes in
this series and post Non RFC Patch Set.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 14/16] gtp: add support for flow based tunneling
  2021-01-23 19:59 ` [RFC PATCH 14/16] gtp: add support for flow based tunneling Jonas Bonn
                     ` (3 preceding siblings ...)
  2021-01-25  8:47   ` Jonas Bonn
@ 2021-02-06 18:04   ` Jonas Bonn
  2021-02-07 17:56     ` Pravin Shelar
  4 siblings, 1 reply; 39+ messages in thread
From: Jonas Bonn @ 2021-02-06 18:04 UTC (permalink / raw)
  To: laforge, netdev, pbshelar, kuba; +Cc: pablo

Hi Pravin et al;

TL;DR:  we don't need to introduce an entire collect_md mode to the 
driver; we just need to tweak what we've got so that metadata is 
"always" added on RX and respected on TX; make the userspace socket 
optional and dump GTP packets to netstack if it's not present; and make 
a decision on whether to allow packets into netstack without validating 
their IP address.

On 23/01/2021 20:59, Jonas Bonn wrote:
> From: Pravin B Shelar <pbshelar@fb.com>
> 
> This patch adds support for flow based tunneling, allowing to send and
> receive GTP tunneled packets via the (lightweight) tunnel metadata
> mechanism.  This would allow integration with OVS and eBPF using flow
> based tunneling APIs.
> 
> The mechanism used here is to get the required GTP tunnel parameters
> from the tunnel metadata instead of looking up a pre-configured PDP
> context.  The tunnel metadata contains the necessary information for
> creating the GTP header.


So, I've been investigating this a bit further...

What is being introduced in this patch is a variant of "normal 
functionality" that resembles something called "collect metadata" mode 
in other drivers (GRE, VXLAN, etc).  These other drivers operate in one 
of two modes:  more-or-less-point-to-point mode, or this alternative 
collect metadata varaint.  The latter is something that has been bolted 
on to an existing implementation of the former.

For iproute2, a parameter called 'external' is added to the setup of 
links of the above type to switch between the two modes; the 
point-to-point parameters are invalid in 'external' (or 'collect 
metadata') mode.  The name 'external' is because the driver is 
externally controlled, meaning that the tunnel information is not 
hardcoded into the device instance itself.

The GTP driver, however, has never been a point-to-point device.  It is 
already 'externally controlled' in the sense that tunnels can be added 
and removed at any time.  Adding this 'external' mode doesn't 
immediately make sense because that's roughly what we already have.

Looking into how ip_tunnel_collect_metadata() works, it looks to me like 
it's always true if there's _ANY_ routing rule in the system with 
'tunnel ID' set.  Is that correct?  I'll assume it is in order to 
continue my thoughts.

So, with that, either the system is collecting SKB metadata or it isn't. 
  If it is, it seems reasonable that we populate incoming packets with 
the tunnel ID as in this patch.  That said, I'm not convinced that we 
should bypass the PDP context mechanism entirely... there should still 
be a PDP context set up or we drop the packet for security reasons.

For outgoing packets, it seems reasonable that the remote TEID may come 
from metadata OR a PDP context.  That would allow some routing 
alternatives to what we have right now which just does a PDP context 
lookup based on the destination/source address of the packet.  It would 
be nice for OVS/BPF to be able to direct a packet to a remote GTP 
endpoint by providing that endpoint/TEID via a metadata structure.

So we end up with, roughly:

On RX:
i) check TEID in GTP header
ii) lookup PDP context
iii) validate IP of encapsulated packet
iv) if ip(tunnel_collect_metadata()) { /* add tun info */ }
v) decapsulate and pass to network stack

On TX:
i) if SKB has metadata, get destination and TEID from metadata (tunnel ID)
ii) otherwise, lookup PDP context for packet

For RX, only iv) is new; for TX only step i) is new.  The rest is what 
we already have.

The one thing that this complicates is the case where an external entity 
(OVS or BPF) is doing validation of packet IP against incoming TEID.  In 
that case, we've got double validation of the incoming address and, I 
suppose, OVS/BPF would perhaps be more efficient (?)...  In that case, 
holding a PDP context is a bit of a waste of memory.

It's somewhat of a security issue to allow un-checked packets into the 
system, so I'm not super keen on dropping the validation of incoming 
packets; given the previous paragraph, however, we might add a flag when 
creating the link to decide whether or not we allow packets through even 
if we can't validate them.  This would also apply to packets without a 
PDP context for an incoming TEID, too.  This flag, I suppose, looks a 
bit like the 'collect_metadata' flag that Pravin has added here.

The other difference to what we currently have is that this patch sets 
up a socket exclusively in kernel space for the tunnel; currently, all 
sockets terminate in userspace:  userspace receives all packets that 
can't be decapsulated and re-injected in to the network stack.  With 
this patch, ALL packets (without a userspace termination) are 
re-injected into the network stack; it's just that anything that can't 
be decapsulated gets injected as a "GTP packet" with some metadata and 
an UNKNOWN protocol type.  If nothing is looking at the metadata and 
acting on it, then these will just get dropped; and that's what would 
happen if nothing was listening on the userspace end, too.  So we might 
as well just make the FD1 socket parameter to the link setup optional 
and be done with it.

So, thoughts?  What am I missing?

/Jonas

> 
> Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
> ---
>   drivers/net/gtp.c                  | 160 +++++++++++++++++++++++++----
>   include/uapi/linux/gtp.h           |  12 +++
>   include/uapi/linux/if_tunnel.h     |   1 +
>   tools/include/uapi/linux/if_link.h |   1 +
>   4 files changed, 156 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
> index 8aab46ec8a94..668ed8a4836e 100644
> --- a/drivers/net/gtp.c
> +++ b/drivers/net/gtp.c
> @@ -21,6 +21,7 @@
>   #include <linux/file.h>
>   #include <linux/gtp.h>
>   
> +#include <net/dst_metadata.h>
>   #include <net/net_namespace.h>
>   #include <net/protocol.h>
>   #include <net/ip.h>
> @@ -74,6 +75,9 @@ struct gtp_dev {
>   	unsigned int		hash_size;
>   	struct hlist_head	*tid_hash;
>   	struct hlist_head	*addr_hash;
> +	/* Used by LWT tunnel. */
> +	bool			collect_md;
> +	struct socket		*collect_md_sock;
>   };
>   
>   static unsigned int gtp_net_id __read_mostly;
> @@ -224,6 +228,51 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
>   	return -1;
>   }
>   
> +static int gtp_set_tun_dst(struct pdp_ctx *pctx, struct sk_buff *skb,
> +			   unsigned int hdrlen)
> +{
> +	struct metadata_dst *tun_dst;
> +	struct gtp1_header *gtp1;
> +	int opts_len = 0;
> +	__be64 tid;
> +
> +	gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
> +
> +	tid = key32_to_tunnel_id(gtp1->tid);
> +
> +	if (unlikely(gtp1->flags & GTP1_F_MASK))
> +		opts_len = sizeof(struct gtpu_metadata);
> +
> +	tun_dst = udp_tun_rx_dst(skb,
> +			pctx->sk->sk_family, TUNNEL_KEY, tid, opts_len);
> +	if (!tun_dst) {
> +		netdev_dbg(pctx->dev, "Failed to allocate tun_dst");
> +		goto err;
> +	}
> +
> +	netdev_dbg(pctx->dev, "attaching metadata_dst to skb, gtp ver %d hdrlen %d\n",
> +		   pctx->gtp_version, hdrlen);
> +	if (unlikely(opts_len)) {
> +		struct gtpu_metadata *opts;
> +
> +		opts = ip_tunnel_info_opts(&tun_dst->u.tun_info);
> +		opts->ver = GTP_METADATA_V1;
> +		opts->flags = gtp1->flags;
> +		opts->type = gtp1->type;
> +		netdev_dbg(pctx->dev, "recved control pkt: flag %x type: %d\n",
> +			   opts->flags, opts->type);
> +		tun_dst->u.tun_info.key.tun_flags |= TUNNEL_GTPU_OPT;
> +		tun_dst->u.tun_info.options_len = opts_len;
> +		skb->protocol = htons(0xffff);         /* Unknown */
> +	}
> +
> +	skb_dst_set(skb, &tun_dst->dst);
> +	return 0;
> +err:
> +	return -1;
> +}
> +
> +
>   /* 1 means pass up to the stack, -1 means drop and 0 means decapsulated. */
>   static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
>   {
> @@ -262,6 +311,7 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
>   	unsigned int hdrlen = sizeof(struct udphdr) +
>   			      sizeof(struct gtp1_header);
>   	struct gtp1_header *gtp1;
> +	struct pdp_ctx md_pctx;
>   	struct pdp_ctx *pctx;
>   
>   	if (!pskb_may_pull(skb, hdrlen))
> @@ -272,6 +322,24 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
>   	if ((gtp1->flags >> 5) != GTP_V1)
>   		return 1;
>   
> +	if (ip_tunnel_collect_metadata() || gtp->collect_md) {
> +		int err;
> +
> +		pctx = &md_pctx;
> +
> +		pctx->gtp_version = GTP_V1;
> +		pctx->sk = gtp->sk1u;
> +		pctx->dev = gtp->dev;
> +
> +		err = gtp_set_tun_dst(pctx, skb, hdrlen);
> +		if (err) {
> +			gtp->dev->stats.rx_dropped++;
> +			return -1;
> +		}
> +
> +		return gtp_rx(pctx, skb, hdrlen);
> +	}
> +
>   	if (gtp1->type != GTP_TPDU)
>   		return 1;
>   
> @@ -353,7 +421,8 @@ static int gtp_encap_recv(struct sock *sk, struct sk_buff *skb)
>   	if (!gtp)
>   		return 1;
>   
> -	netdev_dbg(gtp->dev, "encap_recv sk=%p\n", sk);
> +	netdev_dbg(gtp->dev, "encap_recv sk=%p type %d\n",
> +		   sk, udp_sk(sk)->encap_type);
>   
>   	switch (udp_sk(sk)->encap_type) {
>   	case UDP_ENCAP_GTP0:
> @@ -539,7 +608,7 @@ static struct rtable *gtp_get_v4_rt(struct sk_buff *skb,
>   	memset(&fl4, 0, sizeof(fl4));
>   	fl4.flowi4_oif		= sk->sk_bound_dev_if;
>   	fl4.daddr		= pctx->peer_addr_ip4.s_addr;
> -	fl4.saddr		= inet_sk(sk)->inet_saddr;
> +	fl4.saddr		= *saddr;
>   	fl4.flowi4_tos		= RT_CONN_FLAGS(sk);
>   	fl4.flowi4_proto	= sk->sk_protocol;
>   
> @@ -617,29 +686,84 @@ static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx,
>   static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
>   {
>   	struct gtp_dev *gtp = netdev_priv(dev);
> +	struct gtpu_metadata *opts = NULL;
> +	struct pdp_ctx md_pctx;
>   	struct pdp_ctx *pctx;
> +	__be16 port;
>   	struct rtable *rt;
> -	__be32 saddr;
>   	struct iphdr *iph;
> +	__be32 saddr;
>   	int headroom;
> -	__be16 port;
> +	__u8 tos;
>   	int r;
>   
> -	/* Read the IP destination address and resolve the PDP context.
> -	 * Prepend PDP header with TEI/TID from PDP ctx.
> -	 */
> -	iph = ip_hdr(skb);
> -	if (gtp->role == GTP_ROLE_SGSN)
> -		pctx = ipv4_pdp_find(gtp, iph->saddr);
> -	else
> -		pctx = ipv4_pdp_find(gtp, iph->daddr);
> +	if (gtp->collect_md) {
> +		/* LWT GTP1U encap */
> +		struct ip_tunnel_info *info = NULL;
>   
> -	if (!pctx) {
> -		netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
> -			   &iph->daddr);
> -		return -ENOENT;
> +		info = skb_tunnel_info(skb);
> +		if (!info) {
> +			netdev_dbg(dev, "missing tunnel info");
> +			return -ENOENT;
> +		}
> +		if (info->key.tp_dst && ntohs(info->key.tp_dst) != GTP1U_PORT) {
> +			netdev_dbg(dev, "unexpected GTP dst port: %d", ntohs(info->key.tp_dst));
> +			return -EOPNOTSUPP;
> +		}
> +
> +		if (!gtp->sk1u) {
> +			netdev_dbg(dev, "missing tunnel sock");
> +			return -EOPNOTSUPP;
> +		}
> +
> +		pctx = &md_pctx;
> +		memset(pctx, 0, sizeof(*pctx));
> +		pctx->sk = gtp->sk1u;
> +		pctx->gtp_version = GTP_V1;
> +		pctx->u.v1.o_tei = ntohl(tunnel_id_to_key32(info->key.tun_id));
> +		pctx->peer_addr_ip4.s_addr = info->key.u.ipv4.dst;
> +
> +		saddr = info->key.u.ipv4.src;
> +		tos = info->key.tos;
> +
> +		if (info->options_len != 0) {
> +			if (info->key.tun_flags & TUNNEL_GTPU_OPT) {
> +				opts = ip_tunnel_info_opts(info);
> +			} else {
> +				netdev_dbg(dev, "missing tunnel metadata for control pkt");
> +				return -EOPNOTSUPP;
> +			}
> +		}
> +		netdev_dbg(dev, "flow-based GTP1U encap: tunnel id %d\n",
> +			   pctx->u.v1.o_tei);
> +	} else {
> +		struct iphdr *iph;
> +
> +		if (ntohs(skb->protocol) != ETH_P_IP)
> +			return -EOPNOTSUPP;
> +
> +		iph = ip_hdr(skb);
> +
> +		/* Read the IP destination address and resolve the PDP context.
> +		 * Prepend PDP header with TEI/TID from PDP ctx.
> +		 */
> +		if (gtp->role == GTP_ROLE_SGSN)
> +			pctx = ipv4_pdp_find(gtp, iph->saddr);
> +		else
> +			pctx = ipv4_pdp_find(gtp, iph->daddr);
> +
> +		if (!pctx) {
> +			netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
> +				   &iph->daddr);
> +			return -ENOENT;
> +		}
> +		netdev_dbg(dev, "found PDP context %p\n", pctx);
> +
> +		saddr = inet_sk(pctx->sk)->inet_saddr;
> +		tos = iph->tos;
> +		netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
> +			   &iph->saddr, &iph->daddr);
>   	}
> -	netdev_dbg(dev, "found PDP context %p\n", pctx);
>   
>   	rt = gtp_get_v4_rt(skb, dev, pctx, &saddr);
>   	if (IS_ERR(rt)) {
> @@ -691,7 +815,7 @@ static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
>   
>   	udp_tunnel_xmit_skb(rt, pctx->sk, skb,
>   			    saddr, pctx->peer_addr_ip4.s_addr,
> -			    iph->tos,
> +			    tos,
>   			    ip4_dst_hoplimit(&rt->dst),
>   			    0,
>   			    port, port,
> diff --git a/include/uapi/linux/gtp.h b/include/uapi/linux/gtp.h
> index 79f9191bbb24..62aff78b7c56 100644
> --- a/include/uapi/linux/gtp.h
> +++ b/include/uapi/linux/gtp.h
> @@ -2,6 +2,8 @@
>   #ifndef _UAPI_LINUX_GTP_H_
>   #define _UAPI_LINUX_GTP_H_
>   
> +#include <linux/types.h>
> +
>   #define GTP_GENL_MCGRP_NAME	"gtp"
>   
>   enum gtp_genl_cmds {
> @@ -34,4 +36,14 @@ enum gtp_attrs {
>   };
>   #define GTPA_MAX (__GTPA_MAX + 1)
>   
> +enum {
> +	GTP_METADATA_V1
> +};
> +
> +struct gtpu_metadata {
> +	__u8    ver;
> +	__u8    flags;
> +	__u8    type;
> +};
> +
>   #endif /* _UAPI_LINUX_GTP_H_ */
> diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
> index 7d9105533c7b..802da679fab1 100644
> --- a/include/uapi/linux/if_tunnel.h
> +++ b/include/uapi/linux/if_tunnel.h
> @@ -176,6 +176,7 @@ enum {
>   #define TUNNEL_VXLAN_OPT	__cpu_to_be16(0x1000)
>   #define TUNNEL_NOCACHE		__cpu_to_be16(0x2000)
>   #define TUNNEL_ERSPAN_OPT	__cpu_to_be16(0x4000)
> +#define TUNNEL_GTPU_OPT		__cpu_to_be16(0x8000)
>   
>   #define TUNNEL_OPTIONS_PRESENT \
>   		(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT)
> diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
> index d208b2af697f..28d649bda686 100644
> --- a/tools/include/uapi/linux/if_link.h
> +++ b/tools/include/uapi/linux/if_link.h
> @@ -617,6 +617,7 @@ enum {
>   	IFLA_GTP_FD1,
>   	IFLA_GTP_PDP_HASHSIZE,
>   	IFLA_GTP_ROLE,
> +	IFLA_GTP_COLLECT_METADATA,
>   	__IFLA_GTP_MAX,
>   };
>   #define IFLA_GTP_MAX (__IFLA_GTP_MAX - 1)
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 14/16] gtp: add support for flow based tunneling
  2021-02-06 18:04   ` Jonas Bonn
@ 2021-02-07 17:56     ` Pravin Shelar
  0 siblings, 0 replies; 39+ messages in thread
From: Pravin Shelar @ 2021-02-07 17:56 UTC (permalink / raw)
  To: Jonas Bonn
  Cc: Harald Welte, Linux Kernel Network Developers, Pravin B Shelar,
	Jakub Kicinski, Pablo Neira Ayuso

On Sat, Feb 6, 2021 at 10:06 AM Jonas Bonn <jonas@norrbonn.se> wrote:
>
> Hi Pravin et al;
>
> TL;DR:  we don't need to introduce an entire collect_md mode to the
> driver; we just need to tweak what we've got so that metadata is
> "always" added on RX and respected on TX; make the userspace socket
> optional and dump GTP packets to netstack if it's not present; and make
> a decision on whether to allow packets into netstack without validating
> their IP address.
>
Thanks for summarizing the LWT mechanism below. Overall I am fine with
the changes, a couple of comments inlined.


> On 23/01/2021 20:59, Jonas Bonn wrote:
> > From: Pravin B Shelar <pbshelar@fb.com>
> >
> > This patch adds support for flow based tunneling, allowing to send and
> > receive GTP tunneled packets via the (lightweight) tunnel metadata
> > mechanism.  This would allow integration with OVS and eBPF using flow
> > based tunneling APIs.
> >
> > The mechanism used here is to get the required GTP tunnel parameters
> > from the tunnel metadata instead of looking up a pre-configured PDP
> > context.  The tunnel metadata contains the necessary information for
> > creating the GTP header.
>
>
> So, I've been investigating this a bit further...
>
> What is being introduced in this patch is a variant of "normal
> functionality" that resembles something called "collect metadata" mode
> in other drivers (GRE, VXLAN, etc).  These other drivers operate in one
> of two modes:  more-or-less-point-to-point mode, or this alternative
> collect metadata varaint.  The latter is something that has been bolted
> on to an existing implementation of the former.
>
> For iproute2, a parameter called 'external' is added to the setup of
> links of the above type to switch between the two modes; the
> point-to-point parameters are invalid in 'external' (or 'collect
> metadata') mode.  The name 'external' is because the driver is
> externally controlled, meaning that the tunnel information is not
> hardcoded into the device instance itself.
>
> The GTP driver, however, has never been a point-to-point device.  It is
> already 'externally controlled' in the sense that tunnels can be added
> and removed at any time.  Adding this 'external' mode doesn't
> immediately make sense because that's roughly what we already have.
>
> Looking into how ip_tunnel_collect_metadata() works, it looks to me like
> it's always true if there's _ANY_ routing rule in the system with
> 'tunnel ID' set.  Is that correct?  I'll assume it is in order to
> continue my thoughts.
>
Right. It is just optimization to avoid allocating tun-dst in datapath.

> So, with that, either the system is collecting SKB metadata or it isn't.
>   If it is, it seems reasonable that we populate incoming packets with
> the tunnel ID as in this patch.  That said, I'm not convinced that we
> should bypass the PDP context mechanism entirely... there should still
> be a PDP context set up or we drop the packet for security reasons.
>
> For outgoing packets, it seems reasonable that the remote TEID may come
> from metadata OR a PDP context.  That would allow some routing
> alternatives to what we have right now which just does a PDP context
> lookup based on the destination/source address of the packet.  It would
> be nice for OVS/BPF to be able to direct a packet to a remote GTP
> endpoint by providing that endpoint/TEID via a metadata structure.
>
> So we end up with, roughly:
>
> On RX:
> i) check TEID in GTP header
> ii) lookup PDP context
> iii) validate IP of encapsulated packet
> iv) if ip(tunnel_collect_metadata()) { /* add tun info */ }
> v) decapsulate and pass to network stack
>
> On TX:
> i) if SKB has metadata, get destination and TEID from metadata (tunnel ID)
> ii) otherwise, lookup PDP context for packet
>
> For RX, only iv) is new; for TX only step i) is new.  The rest is what
> we already have.
>
> The one thing that this complicates is the case where an external entity
> (OVS or BPF) is doing validation of packet IP against incoming TEID.  In
> that case, we've got double validation of the incoming address and, I
> suppose, OVS/BPF would perhaps be more efficient (?)...  In that case,
> holding a PDP context is a bit of a waste of memory.
>
> It's somewhat of a security issue to allow un-checked packets into the
> system, so I'm not super keen on dropping the validation of incoming
> packets; given the previous paragraph, however, we might add a flag when
> creating the link to decide whether or not we allow packets through even
> if we can't validate them.  This would also apply to packets without a
> PDP context for an incoming TEID, too.  This flag, I suppose, looks a
> bit like the 'collect_metadata' flag that Pravin has added here.
>
Yes. user should have the option to use GTP devices in LTW mode and
bypass PDP session context completely. Lets add a flag for GTP link to
have consistent behaviour for all types of LWT devices.

> The other difference to what we currently have is that this patch sets
> up a socket exclusively in kernel space for the tunnel; currently, all
> sockets terminate in userspace:  userspace receives all packets that
> can't be decapsulated and re-injected in to the network stack.  With
> this patch, ALL packets (without a userspace termination) are
> re-injected into the network stack; it's just that anything that can't
> be decapsulated gets injected as a "GTP packet" with some metadata and
> an UNKNOWN protocol type.  If nothing is looking at the metadata and
> acting on it, then these will just get dropped; and that's what would
> happen if nothing was listening on the userspace end, too.  So we might
> as well just make the FD1 socket parameter to the link setup optional
> and be done with it.
>
Good idea to make FD1 optional argument.

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2021-02-07 17:57 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-23 19:59 [RFC PATCH 00/16] GTP: flow based Jonas Bonn
2021-01-23 19:59 ` [RFC PATCH 01/16] Revert "GTP: add support for flow based tunneling API" Jonas Bonn
2021-01-24 16:34   ` Harald Welte
2021-01-23 19:59 ` [RFC PATCH 02/16] gtp: set initial MTU Jonas Bonn
2021-01-23 19:59 ` [RFC PATCH 03/16] gtp: include role in link info Jonas Bonn
2021-01-23 19:59 ` [RFC PATCH 04/16] gtp: really check namespaces before xmit Jonas Bonn
2021-01-23 19:59 ` [RFC PATCH 05/16] gtp: drop unnecessary call to skb_dst_drop Jonas Bonn
2021-01-24 16:48   ` Harald Welte
2021-01-23 19:59 ` [RFC PATCH 06/16] gtp: set device type Jonas Bonn
2021-01-23 19:59 ` [RFC PATCH 07/16] gtp: rework IPv4 functionality Jonas Bonn
2021-01-23 19:59 ` [RFC PATCH 08/16] gtp: set dev features to enable GSO Jonas Bonn
2021-01-23 19:59 ` [RFC PATCH 09/16] gtp: support GRO Jonas Bonn
2021-01-23 19:59 ` [RFC PATCH 10/16] gtp: refactor check_ms back into version specific handlers Jonas Bonn
2021-01-23 19:59 ` [RFC PATCH 11/16] gtp: drop duplicated assignment Jonas Bonn
2021-01-23 19:59 ` [RFC PATCH 12/16] gtp: update rx_length_errors for abnormally short packets Jonas Bonn
2021-01-24 16:50   ` Harald Welte
2021-01-23 19:59 ` [RFC PATCH 13/16] gtp: set skb protocol after pulling headers Jonas Bonn
2021-01-23 19:59 ` [RFC PATCH 14/16] gtp: add support for flow based tunneling Jonas Bonn
2021-01-24 14:27   ` Jonas Bonn
2021-01-24 15:11   ` Jonas Bonn
2021-01-25  8:12   ` Jonas Bonn
2021-01-25  8:47   ` Jonas Bonn
2021-02-06 18:04   ` Jonas Bonn
2021-02-07 17:56     ` Pravin Shelar
2021-01-23 19:59 ` [RFC PATCH 15/16] gtp: add ability to send GTP controls headers Jonas Bonn
2021-01-24 14:21   ` Jonas Bonn
2021-01-25 17:41     ` Harald Welte
2021-01-28 21:29       ` Pravin Shelar
2021-01-28 21:29     ` Pravin Shelar
     [not found]       ` <9b9476d2-186f-e749-f17d-d191c30347e4@norrbonn.se>
2021-01-30  6:59         ` Pravin Shelar
2021-01-30 18:44           ` Jakub Kicinski
2021-01-30 20:05             ` Pravin Shelar
2021-02-01 20:44               ` Jakub Kicinski
2021-02-02  5:24                 ` Jonas Bonn
2021-02-02  6:56                   ` Pravin Shelar
2021-02-02  8:03                     ` Jonas Bonn
2021-02-02 22:54                       ` Pravin Shelar
2021-01-23 19:59 ` [RFC PATCH 16/16] gtp: add netlink support for setting up flow based tunnels Jonas Bonn
2021-01-24 17:42 ` [RFC PATCH 00/16] GTP: flow based Pravin Shelar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).