netdev.vger.kernel.org archive mirror
* [PATCH net-next v7 0/2] Bare UDP L3 Encapsulation Module
@ 2020-02-15  6:19 Martin Varghese
  2020-02-15  6:20 ` [PATCH net-next v7 1/2] net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc Martin Varghese
  2020-02-15  6:20 ` [PATCH net-next v7 2/2] net: Special handling for IP & MPLS Martin Varghese
  0 siblings, 2 replies; 11+ messages in thread
From: Martin Varghese @ 2020-02-15  6:19 UTC
  To: netdev, davem, corbet, kuznet, yoshfuji, scott.drennan, jbenc,
	martin.varghese

From: Martin Varghese <martin.varghese@nokia.com>

There are various L3 encapsulation standards using UDP being discussed to
leverage the UDP based load balancing capability of different networks.
MPLSoUDP (https://tools.ietf.org/html/rfc7510) is one among them.

The Bareudp tunnel module provides a generic L3 encapsulation tunnelling
support for tunnelling different L3 protocols like MPLS, IP, NSH etc. inside
a UDP tunnel.

Special Handling
----------------
The bareudp device supports special handling for MPLS & IP, as each of them can
have multiple ethertypes.
The MPLS protocol can have ethertypes ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast).
The IP protocol can have ethertypes ETH_P_IP (v4) & ETH_P_IPV6 (v6).
This special handling can be enabled only for ethertypes ETH_P_IP & ETH_P_MPLS_UC,
using a flag called multiproto mode.
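
The rule above can be sketched in userspace C. This is only an illustration
under stated assumptions, not the code from patch 2/2: the helper name
bareudp_proto_allowed() is hypothetical, and the assumption is that multiproto
mode lets an ETH_P_IP device also carry ETH_P_IPV6 and an ETH_P_MPLS_UC device
also carry ETH_P_MPLS_MC, while every other combination requires an exact
ethertype match.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ethertype values as defined in <linux/if_ether.h>. */
#define ETH_P_IP      0x0800
#define ETH_P_IPV6    0x86DD
#define ETH_P_MPLS_UC 0x8847
#define ETH_P_MPLS_MC 0x8848

/* Hypothetical helper (not part of the patch): returns true if a packet with
 * ethertype pkt_proto may be carried by a device configured with dev_proto.
 */
static bool bareudp_proto_allowed(uint16_t dev_proto, bool multiproto,
				  uint16_t pkt_proto)
{
	/* An exact match is always accepted. */
	if (pkt_proto == dev_proto)
		return true;

	/* Without multiproto mode nothing else is accepted. */
	if (!multiproto)
		return false;

	/* Assumed multiproto pairs: IPv4 device + IPv6, MPLS UC device + MC. */
	if (dev_proto == ETH_P_IP && pkt_proto == ETH_P_IPV6)
		return true;
	if (dev_proto == ETH_P_MPLS_UC && pkt_proto == ETH_P_MPLS_MC)
		return true;

	return false;
}
```

Note that the relation is deliberately one-directional in this sketch: a
device configured with ETH_P_IPV6 or ETH_P_MPLS_MC gains nothing from the
flag, which matches the statement that the mode can be enabled only for
ETH_P_IP & ETH_P_MPLS_UC.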

Usage
------

1) Device creation & deletion

    a) ip link add dev bareudp0 type bareudp dstport 6635 ethertype 0x8847

       This creates a bareudp tunnel device which tunnels L3 traffic with ethertype
       0x8847 (MPLS traffic). The destination port of the UDP header will be set to
       6635. The device will listen on UDP port 6635 to receive traffic.

    b) ip link delete bareudp0

2) Device creation with multiple proto mode enabled

There are two ways to create a bareudp device for MPLS & IP with multiproto mode
enabled.

    a) ip link add dev bareudp0 type bareudp dstport 6635 ethertype 0x8847 multiproto

    b) ip link add dev bareudp0 type bareudp dstport 6635 ethertype mpls

3) Device Usage

The bareudp device can be used together with OVS or with the flower filter in TC.
The OVS or TC flower layer must set the tunnel information in the SKB dst field
before sending the packet buffer to the bareudp device for transmission. On
reception, the bareudp device extracts the tunnel information and stores it in the
SKB dst field before passing the packet buffer to the network stack.
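
To make the resulting wire format concrete, here is a minimal userspace
sketch of what "bare" UDP encapsulation means for the MPLS example used above:
the L3 payload (here one MPLS label stack entry) follows the 8-byte UDP header
directly, with no tunnel header in between. The helper names mpls_lse(),
put_be16(), put_be32() and bare_udp_encap() are hypothetical and do not appear
in the patch, and the UDP checksum is left at zero purely to keep the example
short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UDP_HDR_LEN 8

/* Store 16-bit and 32-bit values in network byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
	put_be16(p, (uint16_t)(v >> 16));
	put_be16(p + 2, (uint16_t)(v & 0xffff));
}

/* MPLS label stack entry: label (20 bits) | TC (3) | S (1) | TTL (8). */
static uint32_t mpls_lse(uint32_t label, uint8_t tc, int bos, uint8_t ttl)
{
	return ((label & 0xFFFFFu) << 12) | ((uint32_t)(tc & 0x7) << 9) |
	       ((uint32_t)(bos ? 1 : 0) << 8) | ttl;
}

/* Hypothetical helper: write a UDP header immediately followed by the L3
 * payload. Returns the UDP datagram length, or 0 if it does not fit.
 */
static size_t bare_udp_encap(uint8_t *out, size_t out_len,
			     uint16_t sport, uint16_t dport,
			     const uint8_t *l3, size_t l3_len)
{
	size_t total = UDP_HDR_LEN + l3_len;

	if (total > out_len || total > 0xFFFF)
		return 0;

	put_be16(out, sport);			/* source port */
	put_be16(out + 2, dport);		/* destination port, e.g. 6635 */
	put_be16(out + 4, (uint16_t)total);	/* UDP length */
	put_be16(out + 6, 0);			/* checksum omitted in this sketch */
	memcpy(out + UDP_HDR_LEN, l3, l3_len);	/* bare L3 payload */
	return total;
}
```

In the driver itself the source port is chosen per flow (via
udp_flow_src_port()) so that networks hashing on the UDP 5-tuple can load
balance the tunnelled traffic; in this sketch the caller simply passes it in.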

Why not FOU ?
------------
FOU by design does L4 encapsulation. It maps a UDP port to an ipproto (the IP
protocol number of the L4 protocol).
Bareudp achieves a generic L3 encapsulation. It maps a UDP port to an L3 ethertype.
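
The difference in demultiplexing key can be sketched as two lookup tables.
This is purely illustrative: the struct names, table contents and lookup
helpers are hypothetical, and the FOU entry (port 5555 mapped to IP protocol
47, GRE) is just an example value, not something taken from this series.

```c
#include <stddef.h>
#include <stdint.h>

/* FOU: UDP port -> IP protocol number (8 bits, identifies an L4 protocol). */
struct fou_map {
	uint16_t udp_port;
	uint8_t ipproto;
};

/* Bareudp: UDP port -> ethertype (16 bits, identifies an L3 protocol). */
struct bareudp_map {
	uint16_t udp_port;
	uint16_t ethertype;
};

static const struct fou_map fou_table[] = {
	{ 5555, 47 },		/* example: GRE over UDP */
};

static const struct bareudp_map bareudp_table[] = {
	{ 6635, 0x8847 },	/* MPLS unicast over UDP */
};

/* Return the mapped IP protocol number, or -1 if the port is unknown. */
static int fou_lookup(uint16_t port)
{
	size_t i;

	for (i = 0; i < sizeof(fou_table) / sizeof(fou_table[0]); i++)
		if (fou_table[i].udp_port == port)
			return fou_table[i].ipproto;
	return -1;
}

/* Return the mapped ethertype, or -1 if the port is unknown. */
static int bareudp_lookup(uint16_t port)
{
	size_t i;

	for (i = 0; i < sizeof(bareudp_table) / sizeof(bareudp_table[0]); i++)
		if (bareudp_table[i].udp_port == port)
			return bareudp_table[i].ethertype;
	return -1;
}
```

An IP protocol number cannot name MPLS or NSH, which have ethertypes but no
IP protocol number of their own; that is the gap the ethertype-keyed mapping
is meant to fill.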

Martin Varghese (2):
  net: UDP tunnel encapsulation module for tunnelling different
    protocols like     MPLS,IP,NSH etc.
  net: Special handling for IP & MPLS.

 Documentation/networking/bareudp.rst |  53 +++
 Documentation/networking/index.rst   |   1 +
 drivers/net/Kconfig                  |  13 +
 drivers/net/Makefile                 |   1 +
 drivers/net/bareudp.c                | 803 +++++++++++++++++++++++++++++++++++
 include/net/bareudp.h                |  20 +
 include/net/ipv6.h                   |   6 +
 include/net/route.h                  |   6 +
 include/uapi/linux/if_link.h         |  12 +
 net/ipv4/route.c                     |  48 +++
 net/ipv6/ip6_output.c                |  70 +++
 11 files changed, 1033 insertions(+)
 create mode 100644 Documentation/networking/bareudp.rst
 create mode 100644 drivers/net/bareudp.c
 create mode 100644 include/net/bareudp.h

-- 
1.8.3.1



* [PATCH net-next v7 1/2] net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc.
  2020-02-15  6:19 [PATCH net-next v7 0/2] Bare UDP L3 Encapsulation Module Martin Varghese
@ 2020-02-15  6:20 ` Martin Varghese
  2020-02-16 16:58   ` Willem de Bruijn
  2020-02-16 18:26   ` Willem de Bruijn
  2020-02-15  6:20 ` [PATCH net-next v7 2/2] net: Special handling for IP & MPLS Martin Varghese
  1 sibling, 2 replies; 11+ messages in thread
From: Martin Varghese @ 2020-02-15  6:20 UTC
  To: netdev, davem, corbet, kuznet, yoshfuji, scott.drennan, jbenc,
	martin.varghese

From: Martin Varghese <martin.varghese@nokia.com>

The Bareudp tunnel module provides generic L3 encapsulation
for tunnelling different protocols like MPLS, IP, NSH, etc.
inside a UDP tunnel.

Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
---
Changes in v2:
     - Fixed documentation errors.
     - Converted documentation to rst format.
     - Moved ip tunnel rt lookup code to a common location.
     - Removed separate v4 and v6 socket.
     - Added call to skb_ensure_writable before updating ethernet header.
     - Simplified bareudp_destroy_tunnels as deleting devices under a
       namespace is taken care of by the default pernet exit code.
     - Fixed bareudp_change_mtu.

Changes in v3:
     - Re-sending the patch again.

Changes in v4:
     - Converted bareudp device to l3 device.
     - Removed redundant fields in bareudp device.
     - Added device usage section in documentation

Changes in v5:
     - Modified version 4 change log
     - Added Select NET_UDP_TUNNEL in Bareudp config section
     - Fixed bareudp index position in documentation index file.
     - 1500 changed to ETH_DATA_LEN while setting MTU field.
     - Replaced bareudp_change_mtu with core function dev_set_mtu.
     - Removed udp header present redundant check in recv.

Changes in v6:
     - Moved ip tunnel rt lookup code to ipv4/route.c & ipv6/ip6_output.c

Changes in v7:
     - Re-Sending the patch 


 Documentation/networking/bareudp.rst |  35 ++
 Documentation/networking/index.rst   |   1 +
 drivers/net/Kconfig                  |  13 +
 drivers/net/Makefile                 |   1 +
 drivers/net/bareudp.c                | 742 +++++++++++++++++++++++++++++++++++
 include/net/bareudp.h                |  19 +
 include/net/ipv6.h                   |   6 +
 include/net/route.h                  |   6 +
 include/uapi/linux/if_link.h         |  11 +
 net/ipv4/route.c                     |  48 +++
 net/ipv6/ip6_output.c                |  70 ++++
 11 files changed, 952 insertions(+)
 create mode 100644 Documentation/networking/bareudp.rst
 create mode 100644 drivers/net/bareudp.c
 create mode 100644 include/net/bareudp.h

diff --git a/Documentation/networking/bareudp.rst b/Documentation/networking/bareudp.rst
new file mode 100644
index 0000000..4087a1b
--- /dev/null
+++ b/Documentation/networking/bareudp.rst
@@ -0,0 +1,35 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================================
+Bare UDP Tunnelling Module Documentation
+========================================
+
+There are various L3 encapsulation standards using UDP being discussed to
+leverage the UDP based load balancing capability of different networks.
+MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among them.
+
+The Bareudp tunnel module provides a generic L3 encapsulation tunnelling
+support for tunnelling different L3 protocols like MPLS, IP, NSH etc. inside
+a UDP tunnel.
+
+Usage
+------
+
+1) Device creation & deletion
+
+    a) ip link add dev bareudp0 type bareudp dstport 6635 ethertype 0x8847.
+
+       This creates a bareudp tunnel device which tunnels L3 traffic with ethertype
+       0x8847 (MPLS traffic). The destination port of the UDP header will be set to
+       6635. The device will listen on UDP port 6635 to receive traffic.
+
+    b) ip link delete bareudp0
+
+2) Device Usage
+
+The bareudp device could be used along with OVS or flower filter in TC.
+The OVS or TC flower layer must set the tunnel information in SKB dst field before
+sending packet buffer to the bareudp device for transmission. On reception the
+bareudp device extracts and stores the tunnel information in SKB dst field before
+passing the packet buffer to the network stack.
+
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index d07d985..3a83cfb 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -8,6 +8,7 @@ Contents:
 
    netdev-FAQ
    af_xdp
+   bareudp
    batman-adv
    can
    can_ucan_protocol
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 25a8f93..66e410e 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -258,6 +258,19 @@ config GENEVE
 	  To compile this driver as a module, choose M here: the module
 	  will be called geneve.
 
+config BAREUDP
+       tristate "Bare UDP Encapsulation"
+       depends on INET
+       depends on IPV6 || !IPV6
+       select NET_UDP_TUNNEL
+       select GRO_CELLS
+       help
+          This adds a bare UDP tunnel module for tunnelling different
+          kinds of traffic like MPLS, IP, etc. inside a UDP tunnel.
+
+          To compile this driver as a module, choose M here: the module
+          will be called bareudp.
+
 config GTP
 	tristate "GPRS Tunneling Protocol datapath (GTP-U)"
 	depends on INET
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 71b88ff..6596724 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_VETH) += veth.o
 obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
 obj-$(CONFIG_VXLAN) += vxlan.o
 obj-$(CONFIG_GENEVE) += geneve.o
+obj-$(CONFIG_BAREUDP) += bareudp.o
 obj-$(CONFIG_GTP) += gtp.o
 obj-$(CONFIG_NLMON) += nlmon.o
 obj-$(CONFIG_NET_VRF) += vrf.o
diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c
new file mode 100644
index 0000000..0338160
--- /dev/null
+++ b/drivers/net/bareudp.c
@@ -0,0 +1,742 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Bareudp: UDP tunnel encapsulation for different payload types like
+ * MPLS, NSH, IP, etc.
+ * Copyright (c) 2019 Nokia, Inc.
+ * Authors:  Martin Varghese, <martin.varghese@nokia.com>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/hash.h>
+#include <net/dst_metadata.h>
+#include <net/gro_cells.h>
+#include <net/rtnetlink.h>
+#include <net/protocol.h>
+#include <net/ip6_tunnel.h>
+#include <net/ip_tunnels.h>
+#include <net/udp_tunnel.h>
+#include <net/bareudp.h>
+
+#define BAREUDP_BASE_HLEN sizeof(struct udphdr)
+#define BAREUDP_IPV4_HLEN (sizeof(struct iphdr) + \
+			   sizeof(struct udphdr))
+#define BAREUDP_IPV6_HLEN (sizeof(struct ipv6hdr) + \
+			   sizeof(struct udphdr))
+
+static bool log_ecn_error = true;
+module_param(log_ecn_error, bool, 0644);
+MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
+
+/* per-network namespace private data for this module */
+
+static unsigned int bareudp_net_id;
+
+struct bareudp_net {
+	struct list_head        bareudp_list;
+};
+
+/* Pseudo network device */
+struct bareudp_dev {
+	struct net         *net;        /* netns for packet i/o */
+	struct net_device  *dev;        /* netdev for bareudp tunnel */
+	__be16		   ethertype;
+	__be16             port;
+	u16	           sport_min;
+	struct socket      __rcu *sock;
+	struct list_head   next;        /* bareudp node  on namespace list */
+	struct gro_cells   gro_cells;
+};
+
+static int bareudp_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
+{
+	struct bareudp_dev *bareudp;
+	struct metadata_dst *tun_dst = NULL;
+	struct pcpu_sw_netstats *stats;
+	unsigned int len;
+	int err = 0;
+	void *oiph;
+	__be16 proto;
+	unsigned short family;
+
+	bareudp = rcu_dereference_sk_user_data(sk);
+	if (!bareudp)
+		goto drop;
+
+	if (skb->protocol ==  htons(ETH_P_IP))
+		family = AF_INET;
+	else
+		family = AF_INET6;
+
+	proto = bareudp->ethertype;
+
+	if (iptunnel_pull_header(skb, BAREUDP_BASE_HLEN,
+				 proto,
+				 !net_eq(bareudp->net,
+				 dev_net(bareudp->dev)))) {
+		bareudp->dev->stats.rx_dropped++;
+		goto drop;
+	}
+
+	tun_dst = udp_tun_rx_dst(skb, family, TUNNEL_KEY, 0, 0);
+	if (!tun_dst) {
+		bareudp->dev->stats.rx_dropped++;
+		goto drop;
+	}
+	skb_dst_set(skb, &tun_dst->dst);
+	skb->dev = bareudp->dev;
+	oiph = skb_network_header(skb);
+	skb_reset_network_header(skb);
+
+	if (family == AF_INET)
+		err = IP_ECN_decapsulate(oiph, skb);
+#if IS_ENABLED(CONFIG_IPV6)
+	else
+		err = IP6_ECN_decapsulate(oiph, skb);
+#endif
+
+	if (unlikely(err)) {
+		if (log_ecn_error) {
+			if  (family == AF_INET)
+				net_info_ratelimited("non-ECT from %pI4 "
+						     "with TOS=%#x\n",
+						     &((struct iphdr *)oiph)->saddr,
+						     ((struct iphdr *)oiph)->tos);
+#if IS_ENABLED(CONFIG_IPV6)
+			else
+				net_info_ratelimited("non-ECT from %pI6\n",
+						     &((struct ipv6hdr *)oiph)->saddr);
+#endif
+		}
+		if (err > 1) {
+			++bareudp->dev->stats.rx_frame_errors;
+			++bareudp->dev->stats.rx_errors;
+			goto drop;
+		}
+	}
+
+	len = skb->len;
+	err = gro_cells_receive(&bareudp->gro_cells, skb);
+	if (likely(err == NET_RX_SUCCESS)) {
+		stats = this_cpu_ptr(bareudp->dev->tstats);
+		u64_stats_update_begin(&stats->syncp);
+		stats->rx_packets++;
+		stats->rx_bytes += len;
+		u64_stats_update_end(&stats->syncp);
+	}
+	return 0;
+drop:
+	/* Consume bad packet */
+	kfree_skb(skb);
+
+	return 0;
+}
+
+static int bareudp_err_lookup(struct sock *sk, struct sk_buff *skb)
+{
+	return 0;
+}
+
+static int bareudp_init(struct net_device *dev)
+{
+	struct bareudp_dev *bareudp = netdev_priv(dev);
+	int err;
+
+	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
+	if (!dev->tstats)
+		return -ENOMEM;
+
+	err = gro_cells_init(&bareudp->gro_cells, dev);
+	if (err) {
+		free_percpu(dev->tstats);
+		return err;
+	}
+	return 0;
+}
+
+static void bareudp_uninit(struct net_device *dev)
+{
+	struct bareudp_dev *bareudp = netdev_priv(dev);
+
+	gro_cells_destroy(&bareudp->gro_cells);
+	free_percpu(dev->tstats);
+}
+
+static struct socket *bareudp_create_sock(struct net *net, __be16 port)
+{
+	struct socket *sock;
+	struct udp_port_cfg udp_conf;
+	int err;
+
+	memset(&udp_conf, 0, sizeof(udp_conf));
+#if IS_ENABLED(CONFIG_IPV6)
+	udp_conf.family = AF_INET6;
+#else
+	udp_conf.family = AF_INET;
+#endif
+	udp_conf.local_udp_port = port;
+	/* Open UDP socket */
+	err = udp_sock_create(net, &udp_conf, &sock);
+	if (err < 0)
+		return ERR_PTR(err);
+
+	return sock;
+}
+
+/* Create new listen socket if needed */
+static int bareudp_socket_create(struct bareudp_dev *bareudp, __be16 port)
+{
+	struct socket *sock;
+	struct udp_tunnel_sock_cfg tunnel_cfg;
+
+	sock = bareudp_create_sock(bareudp->net, port);
+	if (IS_ERR(sock))
+		return PTR_ERR(sock);
+
+	/* Mark socket as an encapsulation socket */
+	memset(&tunnel_cfg, 0, sizeof(tunnel_cfg));
+	tunnel_cfg.sk_user_data = bareudp;
+	tunnel_cfg.encap_type = 1;
+	tunnel_cfg.encap_rcv = bareudp_udp_encap_recv;
+	tunnel_cfg.encap_err_lookup = bareudp_err_lookup;
+	tunnel_cfg.encap_destroy = NULL;
+	setup_udp_tunnel_sock(bareudp->net, sock, &tunnel_cfg);
+
+	if (sock->sk->sk_family == AF_INET6)
+		udp_encap_enable();
+
+	rcu_assign_pointer(bareudp->sock, sock);
+	return 0;
+}
+
+static int bareudp_open(struct net_device *dev)
+{
+	struct bareudp_dev *bareudp = netdev_priv(dev);
+	int ret = 0;
+
+	ret =  bareudp_socket_create(bareudp, bareudp->port);
+	return ret;
+}
+
+static void bareudp_sock_release(struct bareudp_dev *bareudp)
+{
+	struct socket *sock;
+
+	sock = bareudp->sock;
+	rcu_assign_pointer(bareudp->sock, NULL);
+	synchronize_net();
+	udp_tunnel_sock_release(sock);
+}
+
+static int bareudp_stop(struct net_device *dev)
+{
+	struct bareudp_dev *bareudp = netdev_priv(dev);
+
+	bareudp_sock_release(bareudp);
+	return 0;
+}
+
+static int bareudp_xmit_skb(struct sk_buff *skb, struct net_device *dev,
+			    struct bareudp_dev *bareudp,
+			    const struct ip_tunnel_info *info)
+{
+	bool xnet = !net_eq(bareudp->net, dev_net(bareudp->dev));
+	struct socket *sock = rcu_dereference(bareudp->sock);
+	const struct ip_tunnel_key *key = &info->key;
+	bool udp_sum = !!(info->key.tun_flags & TUNNEL_CSUM);
+	bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
+	int err;
+	struct rtable *rt;
+	__u8 tos, ttl;
+	__be16 sport;
+	__be16 df;
+	int min_headroom;
+	__be32 saddr;
+
+	if (!sock)
+		return -ESHUTDOWN;
+
+	rt = ip_route_output_tunnel(skb, dev, bareudp->net, &saddr, info,
+				    IPPROTO_UDP, use_cache);
+
+	if (IS_ERR(rt))
+		return PTR_ERR(rt);
+
+	skb_tunnel_check_pmtu(skb, &rt->dst,
+			      BAREUDP_IPV4_HLEN + info->options_len);
+
+	sport = udp_flow_src_port(bareudp->net, skb,
+				  bareudp->sport_min, USHRT_MAX,
+				  true);
+	tos = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
+	ttl = key->ttl;
+	df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
+	skb_scrub_packet(skb, xnet);
+
+	if (!skb_pull(skb, skb_network_offset(skb)))
+		goto free_dst;
+
+	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len +
+		BAREUDP_BASE_HLEN + info->options_len + sizeof(struct iphdr);
+
+	err = skb_cow_head(skb, min_headroom);
+	if (unlikely(err))
+		goto free_dst;
+
+	err = udp_tunnel_handle_offloads(skb, udp_sum);
+	if (err)
+		goto free_dst;
+
+	skb_set_inner_protocol(skb, bareudp->ethertype);
+	udp_tunnel_xmit_skb(rt, sock->sk, skb, saddr, info->key.u.ipv4.dst,
+			    tos, ttl, df, sport, bareudp->port,
+			    !net_eq(bareudp->net, dev_net(bareudp->dev)),
+			    !(info->key.tun_flags & TUNNEL_CSUM));
+	return 0;
+
+free_dst:
+	dst_release(&rt->dst);
+	return err;
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+static int bareudp6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
+			     struct bareudp_dev *bareudp,
+			     const struct ip_tunnel_info *info)
+{
+	bool xnet = !net_eq(bareudp->net, dev_net(bareudp->dev));
+	struct socket *sock  = rcu_dereference(bareudp->sock);
+	const struct ip_tunnel_key *key = &info->key;
+	bool udp_sum = !!(info->key.tun_flags & TUNNEL_CSUM);
+	bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
+	struct dst_entry *dst = NULL;
+	struct in6_addr saddr, daddr;
+	__u8 prio, ttl;
+	__be16 sport;
+	int min_headroom;
+	int err;
+
+	if (!sock)
+		return -ESHUTDOWN;
+
+	dst = ip6_dst_lookup_tunnel(skb, dev, bareudp->net, sock, &saddr, info,
+				    IPPROTO_UDP, use_cache);
+	if (IS_ERR(dst))
+		return PTR_ERR(dst);
+
+	skb_tunnel_check_pmtu(skb, dst, BAREUDP_IPV6_HLEN + info->options_len);
+
+	sport = udp_flow_src_port(bareudp->net, skb,
+				  bareudp->sport_min, USHRT_MAX,
+				  true);
+	prio = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
+	ttl = key->ttl;
+
+	skb_scrub_packet(skb, xnet);
+
+	if (!skb_pull(skb, skb_network_offset(skb)))
+		goto free_dst;
+
+	min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len +
+		BAREUDP_BASE_HLEN + info->options_len + sizeof(struct ipv6hdr);
+
+	err = skb_cow_head(skb, min_headroom);
+	if (unlikely(err))
+		goto free_dst;
+
+	err = udp_tunnel_handle_offloads(skb, udp_sum);
+	if (err)
+		goto free_dst;
+
+	daddr = info->key.u.ipv6.dst;
+	udp_tunnel6_xmit_skb(dst, sock->sk, skb, dev,
+			     &saddr, &daddr, prio, ttl,
+			     info->key.label, sport, bareudp->port,
+			     !(info->key.tun_flags & TUNNEL_CSUM));
+	return 0;
+
+free_dst:
+	dst_release(dst);
+	return err;
+}
+#endif
+
+static netdev_tx_t bareudp_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct bareudp_dev *bareudp = netdev_priv(dev);
+	struct ip_tunnel_info *info = NULL;
+	int err;
+
+	if (skb->protocol != bareudp->ethertype) {
+		err = -EINVAL;
+		goto tx_error;
+	}
+
+	info = skb_tunnel_info(skb);
+	if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX))) {
+		err = -EINVAL;
+		goto tx_error;
+	}
+
+	rcu_read_lock();
+#if IS_ENABLED(CONFIG_IPV6)
+	if (info->mode & IP_TUNNEL_INFO_IPV6)
+		err = bareudp6_xmit_skb(skb, dev, bareudp, info);
+	else
+#endif
+		err = bareudp_xmit_skb(skb, dev, bareudp, info);
+
+	rcu_read_unlock();
+
+	if (likely(!err))
+		return NETDEV_TX_OK;
+tx_error:
+	dev_kfree_skb(skb);
+
+	if (err == -ELOOP)
+		dev->stats.collisions++;
+	else if (err == -ENETUNREACH)
+		dev->stats.tx_carrier_errors++;
+
+	dev->stats.tx_errors++;
+	return NETDEV_TX_OK;
+}
+
+static int bareudp_fill_metadata_dst(struct net_device *dev,
+				     struct sk_buff *skb)
+{
+	struct ip_tunnel_info *info = skb_tunnel_info(skb);
+	struct bareudp_dev *bareudp = netdev_priv(dev);
+	bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
+
+	if (ip_tunnel_info_af(info) == AF_INET) {
+		struct rtable *rt;
+		__be32 saddr;
+
+		rt = ip_route_output_tunnel(skb, dev, bareudp->net, &saddr,
+					    info, IPPROTO_UDP, use_cache);
+		if (IS_ERR(rt))
+			return PTR_ERR(rt);
+
+		ip_rt_put(rt);
+		info->key.u.ipv4.src = saddr;
+#if IS_ENABLED(CONFIG_IPV6)
+	} else if (ip_tunnel_info_af(info) == AF_INET6) {
+		struct dst_entry *dst;
+		struct in6_addr saddr;
+		struct socket *sock = rcu_dereference(bareudp->sock);
+
+		dst = ip6_dst_lookup_tunnel(skb, dev, bareudp->net, sock,
+					    &saddr, info, IPPROTO_UDP,
+					    use_cache);
+		if (IS_ERR(dst))
+			return PTR_ERR(dst);
+
+		dst_release(dst);
+		info->key.u.ipv6.src = saddr;
+#endif
+	} else {
+		return -EINVAL;
+	}
+
+	info->key.tp_src = udp_flow_src_port(bareudp->net, skb,
+					     bareudp->sport_min,
+			USHRT_MAX, true);
+	info->key.tp_dst = bareudp->port;
+	return 0;
+}
+
+static const struct net_device_ops bareudp_netdev_ops = {
+	.ndo_init               = bareudp_init,
+	.ndo_uninit             = bareudp_uninit,
+	.ndo_open               = bareudp_open,
+	.ndo_stop               = bareudp_stop,
+	.ndo_start_xmit         = bareudp_xmit,
+	.ndo_get_stats64        = ip_tunnel_get_stats64,
+	.ndo_fill_metadata_dst  = bareudp_fill_metadata_dst,
+};
+
+static const struct nla_policy bareudp_policy[IFLA_BAREUDP_MAX + 1] = {
+	[IFLA_BAREUDP_PORT]                = { .type = NLA_U16 },
+	[IFLA_BAREUDP_ETHERTYPE]	   = { .type = NLA_U16 },
+	[IFLA_BAREUDP_SRCPORT_MIN]         = { .type = NLA_U16 },
+};
+
+/* Info for udev, that this is a virtual tunnel endpoint */
+static struct device_type bareudp_type = {
+	.name = "bareudp",
+};
+
+/* Initialize the device structure. */
+static void bareudp_setup(struct net_device *dev)
+{
+	dev->netdev_ops = &bareudp_netdev_ops;
+	dev->needs_free_netdev = true;
+	SET_NETDEV_DEVTYPE(dev, &bareudp_type);
+	dev->features    |= NETIF_F_SG | NETIF_F_HW_CSUM;
+	dev->features    |= NETIF_F_RXCSUM;
+	dev->features    |= NETIF_F_GSO_SOFTWARE;
+	dev->hw_features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
+	dev->hw_features |= NETIF_F_GSO_SOFTWARE;
+	dev->hard_header_len = 0;
+	dev->addr_len = 0;
+	dev->mtu = ETH_DATA_LEN;
+	dev->min_mtu = IPV4_MIN_MTU;
+	dev->max_mtu = IP_MAX_MTU - BAREUDP_BASE_HLEN;
+	dev->type = ARPHRD_NONE;
+	netif_keep_dst(dev);
+	dev->priv_flags |= IFF_NO_QUEUE;
+	dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
+}
+
+static int bareudp_validate(struct nlattr *tb[], struct nlattr *data[],
+			    struct netlink_ext_ack *extack)
+{
+	if (!data) {
+		NL_SET_ERR_MSG(extack,
+			       "Not enough attributes provided to perform the operation");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int bareudp2info(struct nlattr *data[], struct bareudp_conf *conf)
+{
+	if (!data[IFLA_BAREUDP_PORT] || !data[IFLA_BAREUDP_ETHERTYPE])
+		return -EINVAL;
+
+	if (data[IFLA_BAREUDP_PORT])
+		conf->port =  nla_get_u16(data[IFLA_BAREUDP_PORT]);
+
+	if (data[IFLA_BAREUDP_ETHERTYPE])
+		conf->ethertype =  nla_get_u16(data[IFLA_BAREUDP_ETHERTYPE]);
+
+	if (data[IFLA_BAREUDP_SRCPORT_MIN])
+		conf->sport_min =  nla_get_u16(data[IFLA_BAREUDP_SRCPORT_MIN]);
+
+	return 0;
+}
+
+static struct bareudp_dev *bareudp_find_dev(struct bareudp_net *bn,
+					    const struct bareudp_conf *conf)
+{
+	struct bareudp_dev *bareudp, *t = NULL;
+
+	list_for_each_entry(bareudp, &bn->bareudp_list, next) {
+		if (conf->port == bareudp->port)
+			t = bareudp;
+	}
+	return t;
+}
+
+static int bareudp_configure(struct net *net, struct net_device *dev,
+			     struct bareudp_conf *conf)
+{
+	struct bareudp_net *bn = net_generic(net, bareudp_net_id);
+	struct bareudp_dev *t, *bareudp = netdev_priv(dev);
+	int err;
+
+	bareudp->net = net;
+	bareudp->dev = dev;
+	t = bareudp_find_dev(bn, conf);
+	if (t)
+		return -EBUSY;
+
+	bareudp->port = conf->port;
+	bareudp->ethertype = conf->ethertype;
+	bareudp->sport_min = conf->sport_min;
+	err = register_netdevice(dev);
+	if (err)
+		return err;
+
+	list_add(&bareudp->next, &bn->bareudp_list);
+	return 0;
+}
+
+static int bareudp_link_config(struct net_device *dev,
+			       struct nlattr *tb[])
+{
+	int err;
+
+	if (tb[IFLA_MTU]) {
+		err = dev_set_mtu(dev, nla_get_u32(tb[IFLA_MTU]));
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static int bareudp_newlink(struct net *net, struct net_device *dev,
+			   struct nlattr *tb[], struct nlattr *data[],
+			   struct netlink_ext_ack *extack)
+{
+	struct bareudp_conf conf;
+	int err;
+
+	err = bareudp2info(data, &conf);
+	if (err)
+		return err;
+
+	err = bareudp_configure(net, dev, &conf);
+	if (err)
+		return err;
+
+	err = bareudp_link_config(dev, tb);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static void bareudp_dellink(struct net_device *dev, struct list_head *head)
+{
+	struct bareudp_dev *bareudp = netdev_priv(dev);
+
+	list_del(&bareudp->next);
+	unregister_netdevice_queue(dev, head);
+}
+
+static size_t bareudp_get_size(const struct net_device *dev)
+{
+	return  nla_total_size(sizeof(__be16)) +  /* IFLA_BAREUDP_PORT */
+		nla_total_size(sizeof(__be16)) +  /* IFLA_BAREUDP_ETHERTYPE */
+		nla_total_size(sizeof(__u16))  +  /* IFLA_BAREUDP_SRCPORT_MIN */
+		0;
+}
+
+static int bareudp_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+	struct bareudp_dev *bareudp = netdev_priv(dev);
+
+	if (nla_put_be16(skb, IFLA_BAREUDP_PORT, bareudp->port))
+		goto nla_put_failure;
+	if (nla_put_be16(skb, IFLA_BAREUDP_ETHERTYPE, bareudp->ethertype))
+		goto nla_put_failure;
+	if (nla_put_u16(skb, IFLA_BAREUDP_SRCPORT_MIN, bareudp->sport_min))
+		goto nla_put_failure;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
+static struct rtnl_link_ops bareudp_link_ops __read_mostly = {
+	.kind           = "bareudp",
+	.maxtype        = IFLA_BAREUDP_MAX,
+	.policy         = bareudp_policy,
+	.priv_size      = sizeof(struct bareudp_dev),
+	.setup          = bareudp_setup,
+	.validate       = bareudp_validate,
+	.newlink        = bareudp_newlink,
+	.dellink        = bareudp_dellink,
+	.get_size       = bareudp_get_size,
+	.fill_info      = bareudp_fill_info,
+};
+
+struct net_device *bareudp_dev_create(struct net *net, const char *name,
+				      u8 name_assign_type,
+				      struct bareudp_conf *conf)
+{
+	struct nlattr *tb[IFLA_MAX + 1];
+	struct net_device *dev;
+	LIST_HEAD(list_kill);
+	int err;
+
+	memset(tb, 0, sizeof(tb));
+	dev = rtnl_create_link(net, name, name_assign_type,
+			       &bareudp_link_ops, tb, NULL);
+	if (IS_ERR(dev))
+		return dev;
+
+	err = bareudp_configure(net, dev, conf);
+	if (err) {
+		free_netdev(dev);
+		return ERR_PTR(err);
+	}
+	err = dev_set_mtu(dev, IP_MAX_MTU);
+	if (err)
+		goto err;
+
+	err = rtnl_configure_link(dev, NULL);
+	if (err < 0)
+		goto err;
+
+	return dev;
+err:
+	bareudp_dellink(dev, &list_kill);
+	unregister_netdevice_many(&list_kill);
+	return ERR_PTR(err);
+}
+EXPORT_SYMBOL_GPL(bareudp_dev_create);
+
+static __net_init int bareudp_init_net(struct net *net)
+{
+	struct bareudp_net *bn = net_generic(net, bareudp_net_id);
+
+	INIT_LIST_HEAD(&bn->bareudp_list);
+	return 0;
+}
+
+static void bareudp_destroy_tunnels(struct net *net, struct list_head *head)
+{
+	struct bareudp_net *bn = net_generic(net, bareudp_net_id);
+	struct bareudp_dev *bareudp, *next;
+
+	list_for_each_entry_safe(bareudp, next, &bn->bareudp_list, next)
+		unregister_netdevice_queue(bareudp->dev, head);
+}
+
+static void __net_exit bareudp_exit_batch_net(struct list_head *net_list)
+{
+	struct net *net;
+	LIST_HEAD(list);
+
+	rtnl_lock();
+	list_for_each_entry(net, net_list, exit_list)
+		bareudp_destroy_tunnels(net, &list);
+
+	/* unregister the devices gathered above */
+	unregister_netdevice_many(&list);
+	rtnl_unlock();
+}
+
+static struct pernet_operations bareudp_net_ops = {
+	.init = bareudp_init_net,
+	.exit_batch = bareudp_exit_batch_net,
+	.id   = &bareudp_net_id,
+	.size = sizeof(struct bareudp_net),
+};
+
+static int __init bareudp_init_module(void)
+{
+	int rc;
+
+	rc = register_pernet_subsys(&bareudp_net_ops);
+	if (rc)
+		goto out1;
+
+	rc = rtnl_link_register(&bareudp_link_ops);
+	if (rc)
+		goto out2;
+
+	return 0;
+out2:
+	unregister_pernet_subsys(&bareudp_net_ops);
+out1:
+	return rc;
+}
+late_initcall(bareudp_init_module);
+
+static void __exit bareudp_cleanup_module(void)
+{
+	rtnl_link_unregister(&bareudp_link_ops);
+	unregister_pernet_subsys(&bareudp_net_ops);
+}
+module_exit(bareudp_cleanup_module);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Martin Varghese <martin.varghese@nokia.com>");
+MODULE_DESCRIPTION("Interface driver for UDP encapsulated traffic");
diff --git a/include/net/bareudp.h b/include/net/bareudp.h
new file mode 100644
index 0000000..513fae6
--- /dev/null
+++ b/include/net/bareudp.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __NET_BAREUDP_H
+#define __NET_BAREUDP_H
+
+#include <linux/types.h>
+#include <linux/skbuff.h>
+
+struct bareudp_conf {
+	__be16 ethertype;
+	__be16 port;
+	u16 sport_min;
+};
+
+struct net_device *bareudp_dev_create(struct net *net, const char *name,
+				      u8 name_assign_type,
+				      struct bareudp_conf *info);
+
+#endif
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index cec1a54..1bf8065 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1027,6 +1027,12 @@ struct dst_entry *ip6_dst_lookup_flow(struct net *net, const struct sock *sk, st
 struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
 					 const struct in6_addr *final_dst,
 					 bool connected);
+struct dst_entry *ip6_dst_lookup_tunnel(struct sk_buff *skb,
+					struct net_device *dev,
+					struct net *net, struct socket *sock,
+					struct in6_addr *saddr,
+					const struct ip_tunnel_info *info,
+					u8 protocol, bool use_cache);
 struct dst_entry *ip6_blackhole_route(struct net *net,
 				      struct dst_entry *orig_dst);
 
diff --git a/include/net/route.h b/include/net/route.h
index a9c60fc..81750ae 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -128,6 +128,12 @@ static inline struct rtable *__ip_route_output_key(struct net *net,
 
 struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
 				    const struct sock *sk);
+struct rtable *ip_route_output_tunnel(struct sk_buff *skb,
+				      struct net_device *dev,
+				      struct net *net, __be32 *saddr,
+				      const struct ip_tunnel_info *info,
+				      u8 protocol, bool use_cache);
+
 struct dst_entry *ipv4_blackhole_route(struct net *net,
 				       struct dst_entry *dst_orig);
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 024af2d..fb4b33a 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -590,6 +590,17 @@ enum ifla_geneve_df {
 	GENEVE_DF_MAX = __GENEVE_DF_END - 1,
 };
 
+/* Bareudp section  */
+enum {
+	IFLA_BAREUDP_UNSPEC,
+	IFLA_BAREUDP_PORT,
+	IFLA_BAREUDP_ETHERTYPE,
+	IFLA_BAREUDP_SRCPORT_MIN,
+	__IFLA_BAREUDP_MAX
+};
+
+#define IFLA_BAREUDP_MAX (__IFLA_BAREUDP_MAX - 1)
+
 /* PPP section */
 enum {
 	IFLA_PPP_UNSPEC,
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index ebe7060..66f6cc2 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2774,6 +2774,54 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
 }
 EXPORT_SYMBOL_GPL(ip_route_output_flow);
 
+struct rtable *ip_route_output_tunnel(struct sk_buff *skb,
+				      struct net_device *dev,
+				      struct net *net, __be32 *saddr,
+				      const struct ip_tunnel_info *info,
+				      u8 protocol, bool use_cache)
+{
+#ifdef CONFIG_DST_CACHE
+	struct dst_cache *dst_cache;
+#endif
+	struct rtable *rt = NULL;
+	struct flowi4 fl4;
+	__u8 tos;
+
+	memset(&fl4, 0, sizeof(fl4));
+	fl4.flowi4_mark = skb->mark;
+	fl4.flowi4_proto = protocol;
+	fl4.daddr = info->key.u.ipv4.dst;
+	fl4.saddr = info->key.u.ipv4.src;
+
+	tos = info->key.tos;
+	fl4.flowi4_tos = RT_TOS(tos);
+#ifdef CONFIG_DST_CACHE
+	dst_cache = (struct dst_cache *)&info->dst_cache;
+	if (use_cache) {
+		rt = dst_cache_get_ip4(dst_cache, saddr);
+		if (rt)
+			return rt;
+	}
+#endif
+	rt = ip_route_output_key(net, &fl4);
+	if (IS_ERR(rt)) {
+		netdev_dbg(dev, "no route to %pI4\n", &fl4.daddr);
+		return ERR_PTR(-ENETUNREACH);
+	}
+	if (rt->dst.dev == dev) { /* is this necessary? */
+		netdev_dbg(dev, "circular route to %pI4\n", &fl4.daddr);
+		ip_rt_put(rt);
+		return ERR_PTR(-ELOOP);
+	}
+#ifdef CONFIG_DST_CACHE
+	if (use_cache)
+		dst_cache_set_ip4(dst_cache, &rt->dst, fl4.saddr);
+#endif
+	*saddr = fl4.saddr;
+	return rt;
+}
+EXPORT_SYMBOL_GPL(ip_route_output_tunnel);
+
 /* called with rcu_read_lock held */
 static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
 			struct rtable *rt, u32 table_id, struct flowi4 *fl4,
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 0873044..35663f0 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -54,6 +54,7 @@
 #include <linux/mroute6.h>
 #include <net/l3mdev.h>
 #include <net/lwtunnel.h>
+#include <net/ip_tunnels.h>
 
 static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
@@ -1196,6 +1197,75 @@ struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
 }
 EXPORT_SYMBOL_GPL(ip6_sk_dst_lookup_flow);
 
+/**
+ *      ip6_dst_lookup_tunnel - perform route lookup on tunnel
+ *      @skb: Packet for which lookup is done
+ *      @dev: Tunnel device
+ *      @net: Network namespace of tunnel device
+ *      @sock: Socket which provides route info
+ *      @saddr: Memory to store the src ip address
+ *      @info: Tunnel information
+ *      @protocol: IP protocol
+ *      @use_cache: Flag to enable cache usage
+ *      This function performs a route lookup on a tunnel
+ *
+ *      It returns a valid dst pointer and stores src address to be used in
+ *      tunnel in param saddr on success, else a pointer encoded error code.
+ */
+
+struct dst_entry *ip6_dst_lookup_tunnel(struct sk_buff *skb,
+					struct net_device *dev,
+					struct net *net,
+					struct socket *sock,
+					struct in6_addr *saddr,
+					const struct ip_tunnel_info *info,
+					u8 protocol,
+					bool use_cache)
+{
+	struct dst_entry *dst = NULL;
+#ifdef CONFIG_DST_CACHE
+	struct dst_cache *dst_cache;
+#endif
+	struct flowi6 fl6;
+	__u8 prio;
+
+	memset(&fl6, 0, sizeof(fl6));
+	fl6.flowi6_mark = skb->mark;
+	fl6.flowi6_proto = protocol;
+	fl6.daddr = info->key.u.ipv6.dst;
+	fl6.saddr = info->key.u.ipv6.src;
+	prio = info->key.tos;
+
+	fl6.flowlabel = ip6_make_flowinfo(RT_TOS(prio),
+					  info->key.label);
+#ifdef CONFIG_DST_CACHE
+	dst_cache = (struct dst_cache *)&info->dst_cache;
+	if (use_cache) {
+		dst = dst_cache_get_ip6(dst_cache, saddr);
+		if (dst)
+			return dst;
+	}
+#endif
+	dst = ipv6_stub->ipv6_dst_lookup_flow(net, sock->sk, &fl6,
+					      NULL);
+	if (IS_ERR(dst)) {
+		netdev_dbg(dev, "no route to %pI6\n", &fl6.daddr);
+		return ERR_PTR(-ENETUNREACH);
+	}
+	if (dst->dev == dev) { /* is this necessary? */
+		netdev_dbg(dev, "circular route to %pI6\n", &fl6.daddr);
+		dst_release(dst);
+		return ERR_PTR(-ELOOP);
+	}
+#ifdef CONFIG_DST_CACHE
+	if (use_cache)
+		dst_cache_set_ip6(dst_cache, dst, &fl6.saddr);
+#endif
+	*saddr = fl6.saddr;
+	return dst;
+}
+EXPORT_SYMBOL_GPL(ip6_dst_lookup_tunnel);
+
 static inline struct ipv6_opt_hdr *ip6_opt_dup(struct ipv6_opt_hdr *src,
 					       gfp_t gfp)
 {
-- 
1.8.3.1


* [PATCH net-next v7 2/2] net: Special handling for IP & MPLS.
  2020-02-15  6:19 [PATCH net-next v7 0/2] Bare UDP L3 Encapsulation Module Martin Varghese
  2020-02-15  6:20 ` [PATCH net-next v7 1/2] net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc Martin Varghese
@ 2020-02-15  6:20 ` Martin Varghese
  1 sibling, 0 replies; 11+ messages in thread
From: Martin Varghese @ 2020-02-15  6:20 UTC (permalink / raw)
  To: netdev, davem, corbet, kuznet, yoshfuji, scott.drennan, jbenc,
	martin.varghese

From: Martin Varghese <martin.varghese@nokia.com>

Special handling is needed in the bareudp module for IP & MPLS as they
support more than one ethertype.

MPLS has 2 ethertypes: 0x8847 for MPLS unicast and 0x8848 for MPLS multicast.
While decapsulating an MPLS packet from a UDP packet, the tunnel destination IP
address is checked to determine the ethertype. The ethertype of the packet
is set to 0x8848 if the tunnel destination IP address is a multicast
IP address, and to 0x8847 if the tunnel destination IP address is a
unicast IP address.

IP has 2 ethertypes: 0x0800 for IPv4 and 0x86dd for IPv6. The version
field of the tunnelled IP header is checked to determine the ethertype.

This special handling to tunnel additional ethertypes will be disabled
by default and can be enabled using a flag called multiproto. This flag can
be used only with ethertypes 0x8847 and 0x0800.
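
The receive-side decision described above can be sketched in plain userspace
C as follows. This is only an illustration under stated assumptions, not the
code in this patch: pick_rx_ethertype() and the ETH_* macros are hypothetical
names, values are kept in host byte order for readability, and a return value
of 0 stands for "drop the packet".

```c
#include <stdbool.h>
#include <stdint.h>

/* Ethertype values quoted in the commit message (host byte order here). */
#define ETH_IPV4     0x0800
#define ETH_IPV6     0x86dd
#define ETH_MPLS_UC  0x8847
#define ETH_MPLS_MC  0x8848

/*
 * Hypothetical helper: choose the ethertype of a decapsulated packet.
 * configured      - ethertype the device was created with
 * multiproto      - whether the multiproto flag is set
 * inner_version   - version field of the tunnelled IP header (IP devices)
 * outer_dst_mcast - whether the tunnel destination address is multicast
 * Returns the ethertype, or 0 when the packet would be dropped.
 */
static uint16_t pick_rx_ethertype(uint16_t configured, bool multiproto,
				  unsigned int inner_version,
				  bool outer_dst_mcast)
{
	if (configured == ETH_IPV4) {
		if (inner_version == 4)
			return ETH_IPV4;
		if (multiproto && inner_version == 6)
			return ETH_IPV6;
		return 0;
	}
	if (configured == ETH_MPLS_UC) {
		if (!outer_dst_mcast)
			return ETH_MPLS_UC;
		return multiproto ? ETH_MPLS_MC : 0;
	}
	/* Any other ethertype is passed through unchanged. */
	return configured;
}
```

With multiproto unset, the additional ethertypes (0x86dd and 0x8848) are
never produced, which matches the "disabled by default" behaviour.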

Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Willem de Bruijn <willemb@google.com>
---
Changes in v2:
    - Fixed documentation errors.
    - Changed commit message.

Changes in v3:
    - Re-sending the patch.

Changes in v4:
    - Renamed extmode flag to multiproto
    - Fixed typo in description.

Changes in v5:
    - Changed mentions of extmode to multiproto in the commit msg.
    - Ack from Willem added.

Changes in v6:
    - Sending Again.

Changes in v7:
    - Re-sending the patch. 


 Documentation/networking/bareudp.rst | 20 ++++++++++-
 drivers/net/bareudp.c                | 67 ++++++++++++++++++++++++++++++++++--
 include/net/bareudp.h                |  1 +
 include/uapi/linux/if_link.h         |  1 +
 4 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/bareudp.rst b/Documentation/networking/bareudp.rst
index 4087a1b..9794dd8 100644
--- a/Documentation/networking/bareudp.rst
+++ b/Documentation/networking/bareudp.rst
@@ -12,6 +12,15 @@ The Bareudp tunnel module provides a generic L3 encapsulation tunnelling
 support for tunnelling different L3 protocols like MPLS, IP, NSH etc. inside
 a UDP tunnel.
 
+Special Handling
+----------------
+The bareudp device supports special handling for MPLS & IP as they can have
+multiple ethertypes.
+MPLS protocol can have ethertypes ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast).
+IP protocol can have ethertypes ETH_P_IP (v4) & ETH_P_IPV6 (v6).
+This special handling can be enabled only for ethertypes ETH_P_IP & ETH_P_MPLS_UC
+with a flag called multiproto mode.
+
 Usage
 ------
 
@@ -25,7 +34,16 @@ Usage
 
     b) ip link delete bareudp0
 
-2) Device Usage
+2) Device creation with multiple proto mode enabled
+
+There are two ways to create a bareudp device for MPLS & IP with multiproto mode
+enabled.
+
+    a) ip link add dev  bareudp0 type bareudp dstport 6635 ethertype 0x8847 multiproto
+
+    b) ip link add dev  bareudp0 type bareudp dstport 6635 ethertype mpls
+
+3) Device Usage
 
 The bareudp device could be used along with OVS or flower filter in TC.
 The OVS or TC flower layer must set the tunnel information in SKB dst field before
diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c
index 0338160..88cef80 100644
--- a/drivers/net/bareudp.c
+++ b/drivers/net/bareudp.c
@@ -45,6 +45,7 @@ struct bareudp_dev {
 	__be16		   ethertype;
 	__be16             port;
 	u16	           sport_min;
+	bool               multi_proto_mode;
 	struct socket      __rcu *sock;
 	struct list_head   next;        /* bareudp node  on namespace list */
 	struct gro_cells   gro_cells;
@@ -70,7 +71,52 @@ static int bareudp_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	else
 		family = AF_INET6;
 
-	proto = bareudp->ethertype;
+	if (bareudp->ethertype == htons(ETH_P_IP)) {
+		struct iphdr *iphdr;
+
+		iphdr = (struct iphdr *)(skb->data + BAREUDP_BASE_HLEN);
+		if (iphdr->version == 4) {
+			proto = bareudp->ethertype;
+		} else if (bareudp->multi_proto_mode && (iphdr->version == 6)) {
+			proto = htons(ETH_P_IPV6);
+		} else {
+			bareudp->dev->stats.rx_dropped++;
+			goto drop;
+		}
+	} else if (bareudp->ethertype == htons(ETH_P_MPLS_UC)) {
+		struct iphdr *tunnel_hdr;
+
+		tunnel_hdr = (struct iphdr *)skb_network_header(skb);
+		if (tunnel_hdr->version == 4) {
+			if (!ipv4_is_multicast(tunnel_hdr->daddr)) {
+				proto = bareudp->ethertype;
+			} else if (bareudp->multi_proto_mode &&
+				   ipv4_is_multicast(tunnel_hdr->daddr)) {
+				proto = htons(ETH_P_MPLS_MC);
+			} else {
+				bareudp->dev->stats.rx_dropped++;
+				goto drop;
+			}
+		} else {
+			int addr_type;
+			struct ipv6hdr *tunnel_hdr_v6;
+
+			tunnel_hdr_v6 = (struct ipv6hdr *)skb_network_header(skb);
+			addr_type =
+			ipv6_addr_type((struct in6_addr *)&tunnel_hdr_v6->daddr);
+			if (!(addr_type & IPV6_ADDR_MULTICAST)) {
+				proto = bareudp->ethertype;
+			} else if (bareudp->multi_proto_mode &&
+				   (addr_type & IPV6_ADDR_MULTICAST)) {
+				proto = htons(ETH_P_MPLS_MC);
+			} else {
+				bareudp->dev->stats.rx_dropped++;
+				goto drop;
+			}
+		}
+	} else {
+		proto = bareudp->ethertype;
+	}
 
 	if (iptunnel_pull_header(skb, BAREUDP_BASE_HLEN,
 				 proto,
@@ -370,8 +416,12 @@ static netdev_tx_t bareudp_xmit(struct sk_buff *skb, struct net_device *dev)
 	int err;
 
 	if (skb->protocol != bareudp->ethertype) {
-		err = -EINVAL;
-		goto tx_error;
+		if (!bareudp->multi_proto_mode ||
+		    (skb->protocol !=  htons(ETH_P_MPLS_MC) &&
+		     skb->protocol !=  htons(ETH_P_IPV6))) {
+			err = -EINVAL;
+			goto tx_error;
+		}
 	}
 
 	info = skb_tunnel_info(skb);
@@ -462,6 +512,7 @@ static int bareudp_fill_metadata_dst(struct net_device *dev,
 	[IFLA_BAREUDP_PORT]                = { .type = NLA_U16 },
 	[IFLA_BAREUDP_ETHERTYPE]	   = { .type = NLA_U16 },
 	[IFLA_BAREUDP_SRCPORT_MIN]         = { .type = NLA_U16 },
+	[IFLA_BAREUDP_MULTIPROTO_MODE]     = { .type = NLA_FLAG },
 };
 
 /* Info for udev, that this is a virtual tunnel endpoint */
@@ -544,9 +595,15 @@ static int bareudp_configure(struct net *net, struct net_device *dev,
 	if (t)
 		return -EBUSY;
 
+	if (conf->multi_proto_mode &&
+	    (conf->ethertype != htons(ETH_P_MPLS_UC) &&
+	     conf->ethertype != htons(ETH_P_IP)))
+		return -EINVAL;
+
 	bareudp->port = conf->port;
 	bareudp->ethertype = conf->ethertype;
 	bareudp->sport_min = conf->sport_min;
+	bareudp->multi_proto_mode = conf->multi_proto_mode;
 	err = register_netdevice(dev);
 	if (err)
 		return err;
@@ -603,6 +660,7 @@ static size_t bareudp_get_size(const struct net_device *dev)
 	return  nla_total_size(sizeof(__be16)) +  /* IFLA_BAREUDP_PORT */
 		nla_total_size(sizeof(__be16)) +  /* IFLA_BAREUDP_ETHERTYPE */
 		nla_total_size(sizeof(__u16))  +  /* IFLA_BAREUDP_SRCPORT_MIN */
+		nla_total_size(0)              +  /* IFLA_BAREUDP_MULTIPROTO_MODE */
 		0;
 }
 
@@ -616,6 +674,9 @@ static int bareudp_fill_info(struct sk_buff *skb, const struct net_device *dev)
 		goto nla_put_failure;
 	if (nla_put_u16(skb, IFLA_BAREUDP_SRCPORT_MIN, bareudp->sport_min))
 		goto nla_put_failure;
+	if (bareudp->multi_proto_mode &&
+	    nla_put_flag(skb, IFLA_BAREUDP_MULTIPROTO_MODE))
+		goto nla_put_failure;
 
 	return 0;
 
diff --git a/include/net/bareudp.h b/include/net/bareudp.h
index 513fae6..cb03f6f 100644
--- a/include/net/bareudp.h
+++ b/include/net/bareudp.h
@@ -10,6 +10,7 @@ struct bareudp_conf {
 	__be16 ethertype;
 	__be16 port;
 	u16 sport_min;
+	bool multi_proto_mode;
 };
 
 struct net_device *bareudp_dev_create(struct net *net, const char *name,
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index fb4b33a..61e0801 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -596,6 +596,7 @@ enum {
 	IFLA_BAREUDP_PORT,
 	IFLA_BAREUDP_ETHERTYPE,
 	IFLA_BAREUDP_SRCPORT_MIN,
+	IFLA_BAREUDP_MULTIPROTO_MODE,
 	__IFLA_BAREUDP_MAX
 };
 
-- 
1.8.3.1


* Re: [PATCH net-next v7 1/2] net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc.
  2020-02-15  6:20 ` [PATCH net-next v7 1/2] net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc Martin Varghese
@ 2020-02-16 16:58   ` Willem de Bruijn
  2020-02-17  2:43     ` Martin Varghese
  2020-02-16 18:26   ` Willem de Bruijn
  1 sibling, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2020-02-16 16:58 UTC (permalink / raw)
  To: Martin Varghese
  Cc: Network Development, David Miller, Jonathan Corbet,
	Alexey Kuznetsov, Hideaki YOSHIFUJI, scott.drennan, Jiri Benc,
	martin.varghese

On Fri, Feb 14, 2020 at 11:20 PM Martin Varghese
<martinvarghesenokia@gmail.com> wrote:
>
> From: Martin Varghese <martin.varghese@nokia.com>
>
> The Bareudp tunnel module provides a generic L3 encapsulation
> tunnelling module for tunnelling different protocols like MPLS,
> IP,NSH etc inside a UDP tunnel.
>
> Signed-off-by: Martin Varghese <martin.varghese@nokia.com>

A few small points

>  net/ipv4/route.c                     |  48 +++
>  net/ipv6/ip6_output.c                |  70 ++++

Both protocols have route.c and ip(6)_output.c files. For the sake of
consistency, both should ideally be in route.c. Did you choose
ip6_output.c for a reason?

There are also a couple of reverse christmas tree violations.

> +struct rtable *ip_route_output_tunnel(struct sk_buff *skb,
> +                                     struct net_device *dev,
> +                                     struct net *net, __be32 *saddr,
> +                                     const struct ip_tunnel_info *info,
> +                                     u8 protocol, bool use_cache)
> +{
> +#ifdef CONFIG_DST_CACHE
> +       struct dst_cache *dst_cache;
> +#endif
> +       struct rtable *rt = NULL;
> +       struct flowi4 fl4;
> +       __u8 tos;
> +
> +       memset(&fl4, 0, sizeof(fl4));
> +       fl4.flowi4_mark = skb->mark;
> +       fl4.flowi4_proto = protocol;
> +       fl4.daddr = info->key.u.ipv4.dst;
> +       fl4.saddr = info->key.u.ipv4.src;
> +
> +       tos = info->key.tos;
> +       fl4.flowi4_tos = RT_TOS(tos);
> +#ifdef CONFIG_DST_CACHE
> +       dst_cache = (struct dst_cache *)&info->dst_cache;
> +       if (use_cache) {
> +               rt = dst_cache_get_ip4(dst_cache, saddr);
> +               if (rt)
> +                       return rt;
> +       }
> +#endif

This is the same in geneve, but no need to initialize fl4 on a cache
hit. Then can also be restructured to only have a single #ifdef block.
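
To illustrate only the first point (consulting the cache before building the
flow key), here is a minimal userspace sketch. All demo_* names are
hypothetical stand-ins for the kernel types and helpers, and the counter
exists purely to make the saved work visible; it is not a proposal for the
actual function body.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the kernel types; every demo_* name is hypothetical. */
struct demo_route { int id; };
struct demo_flow { unsigned int daddr; unsigned int saddr; };
struct demo_cache { struct demo_route *rt; unsigned int saddr; };

static int demo_flow_inits;	/* how many times the flow key was built */

/* Pretend routing lookup: always succeeds and picks a source address. */
static struct demo_route *demo_slow_lookup(struct demo_flow *fl)
{
	static struct demo_route rt = { 1 };

	fl->saddr = 0x0a000001;
	return &rt;
}

static struct demo_route *demo_route_lookup(struct demo_cache *cache,
					    unsigned int daddr,
					    unsigned int *saddr,
					    bool use_cache)
{
	struct demo_route *rt;
	struct demo_flow fl;

	/* Try the cache first; a hit needs no flow key at all. */
	if (use_cache && cache->rt) {
		*saddr = cache->saddr;
		return cache->rt;
	}

	/* Cache miss: only now pay for building the flow key. */
	memset(&fl, 0, sizeof(fl));
	fl.daddr = daddr;
	demo_flow_inits++;

	rt = demo_slow_lookup(&fl);
	if (use_cache) {
		cache->rt = rt;
		cache->saddr = fl.saddr;
	}
	*saddr = fl.saddr;
	return rt;
}
```

Calling demo_route_lookup() twice with the same cache builds the flow key
once; the second call returns straight from the cache.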

* Re: [PATCH net-next v7 1/2] net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc.
  2020-02-15  6:20 ` [PATCH net-next v7 1/2] net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc Martin Varghese
  2020-02-16 16:58   ` Willem de Bruijn
@ 2020-02-16 18:26   ` Willem de Bruijn
  2020-02-17  2:49     ` Martin Varghese
  1 sibling, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2020-02-16 18:26 UTC (permalink / raw)
  To: Martin Varghese
  Cc: Network Development, David Miller, Jonathan Corbet,
	Alexey Kuznetsov, Hideaki YOSHIFUJI, scott.drennan, Jiri Benc,
	martin.varghese

On Sat, Feb 15, 2020 at 12:20 AM Martin Varghese
<martinvarghesenokia@gmail.com> wrote:
>
> From: Martin Varghese <martin.varghese@nokia.com>
>
> The Bareudp tunnel module provides a generic L3 encapsulation
> tunnelling module for tunnelling different protocols like MPLS,
> IP,NSH etc inside a UDP tunnel.
>
> Signed-off-by: Martin Varghese <martin.varghese@nokia.com>

> +struct net_device *bareudp_dev_create(struct net *net, const char *name,
> +                                     u8 name_assign_type,
> +                                     struct bareudp_conf *conf)
> +{
> +       struct nlattr *tb[IFLA_MAX + 1];
> +       struct net_device *dev;
> +       LIST_HEAD(list_kill);
> +       int err;
> +
> +       memset(tb, 0, sizeof(tb));
> +       dev = rtnl_create_link(net, name, name_assign_type,
> +                              &bareudp_link_ops, tb, NULL);
> +       if (IS_ERR(dev))
> +               return dev;
> +
> +       err = bareudp_configure(net, dev, conf);
> +       if (err) {
> +               free_netdev(dev);
> +               return ERR_PTR(err);
> +       }
> +       err = dev_set_mtu(dev, IP_MAX_MTU);

does this not exceed dev->max_mtu?

> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index cec1a54..1bf8065 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -1027,6 +1027,12 @@ struct dst_entry *ip6_dst_lookup_flow(struct net *net, const struct sock *sk, st
>  struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
>                                          const struct in6_addr *final_dst,
>                                          bool connected);
> +struct dst_entry *ip6_dst_lookup_tunnel(struct sk_buff *skb,
> +                                       struct net_device *dev,
> +                                       struct net *net, struct socket *sock,
> +                                       struct in6_addr *saddr,
> +                                       const struct ip_tunnel_info *info,
> +                                       u8 protocol, bool use_cache);
>  struct dst_entry *ip6_blackhole_route(struct net *net,
>                                       struct dst_entry *orig_dst);
>
> diff --git a/include/net/route.h b/include/net/route.h
> index a9c60fc..81750ae 100644
> --- a/include/net/route.h
> +++ b/include/net/route.h
> @@ -128,6 +128,12 @@ static inline struct rtable *__ip_route_output_key(struct net *net,
>
>  struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
>                                     const struct sock *sk);
> +struct rtable *ip_route_output_tunnel(struct sk_buff *skb,
> +                                     struct net_device *dev,
> +                                     struct net *net, __be32 *saddr,
> +                                     const struct ip_tunnel_info *info,
> +                                     u8 protocol, bool use_cache);
> +
>  struct dst_entry *ipv4_blackhole_route(struct net *net,
>                                        struct dst_entry *dst_orig);
>

Ah, I now see where the difference between net/ipv4/route.c and
net/ipv6/ip6_output.c come from. It follows from existing locations of
 ip6_sk_dst_lookup_flow and ip_route_output_flow.

Looking for the ipv6 analog of ip_route_output_flow, I see that, e.g.,
ipvlan uses ip6_route_output from net/ipv6/route.c without a NULL sk.
But ping calls ip6_sk_dst_lookup_flow.

It might be a better fit behind ip6_route_output_flags, but it's
probably moot, really.

* Re: [PATCH net-next v7 1/2] net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc.
  2020-02-16 16:58   ` Willem de Bruijn
@ 2020-02-17  2:43     ` Martin Varghese
  2020-02-17  5:16       ` Willem de Bruijn
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Varghese @ 2020-02-17  2:43 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David Miller, Jonathan Corbet,
	Alexey Kuznetsov, Hideaki YOSHIFUJI, scott.drennan, Jiri Benc,
	martin.varghese

On Sun, Feb 16, 2020 at 10:58:30AM -0600, Willem de Bruijn wrote:
> On Fri, Feb 14, 2020 at 11:20 PM Martin Varghese
> <martinvarghesenokia@gmail.com> wrote:
> >
> > From: Martin Varghese <martin.varghese@nokia.com>
> >
> > The Bareudp tunnel module provides a generic L3 encapsulation
> > tunnelling module for tunnelling different protocols like MPLS,
> > IP,NSH etc inside a UDP tunnel.
> >
> > Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
> 
> A few small points
> 
> >  net/ipv4/route.c                     |  48 +++
> >  net/ipv6/ip6_output.c                |  70 ++++
> 
> Both protocols have route.c and ip(6)_output.c files. For the sake of
> consistency, both should ideally be in route.c. Did you choose
> ip6_output.c for a reason?
> 
> There are also a couple of reverse christmas tree violations.
>
In Bareudp.c correct?
Wondering if there is any flag in checkpatch to catch them? 
> > +struct rtable *ip_route_output_tunnel(struct sk_buff *skb,
> > +                                     struct net_device *dev,
> > +                                     struct net *net, __be32 *saddr,
> > +                                     const struct ip_tunnel_info *info,
> > +                                     u8 protocol, bool use_cache)
> > +{
> > +#ifdef CONFIG_DST_CACHE
> > +       struct dst_cache *dst_cache;
> > +#endif
> > +       struct rtable *rt = NULL;
> > +       struct flowi4 fl4;
> > +       __u8 tos;
> > +
> > +       memset(&fl4, 0, sizeof(fl4));
> > +       fl4.flowi4_mark = skb->mark;
> > +       fl4.flowi4_proto = protocol;
> > +       fl4.daddr = info->key.u.ipv4.dst;
> > +       fl4.saddr = info->key.u.ipv4.src;
> > +
> > +       tos = info->key.tos;
> > +       fl4.flowi4_tos = RT_TOS(tos);
> > +#ifdef CONFIG_DST_CACHE
> > +       dst_cache = (struct dst_cache *)&info->dst_cache;
> > +       if (use_cache) {
> > +               rt = dst_cache_get_ip4(dst_cache, saddr);
> > +               if (rt)
> > +                       return rt;
> > +       }
> > +#endif
> 
> This is the same in geneve, but no need to initialize fl4 on a cache
> hit. Then can also be restructured to only have a single #ifdef block.
Yes, we need not initialize fl4 when the cache is used.
But I didn't get your point on restructuring to have a single #ifdef block.
Could you please give more details?

* Re: [PATCH net-next v7 1/2] net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc.
  2020-02-16 18:26   ` Willem de Bruijn
@ 2020-02-17  2:49     ` Martin Varghese
  2020-02-17  5:19       ` Willem de Bruijn
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Varghese @ 2020-02-17  2:49 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David Miller, Jonathan Corbet,
	Alexey Kuznetsov, Hideaki YOSHIFUJI, scott.drennan, Jiri Benc,
	martin.varghese

On Sun, Feb 16, 2020 at 12:26:18PM -0600, Willem de Bruijn wrote:
> On Sat, Feb 15, 2020 at 12:20 AM Martin Varghese
> <martinvarghesenokia@gmail.com> wrote:
> >
> > From: Martin Varghese <martin.varghese@nokia.com>
> >
> > The Bareudp tunnel module provides a generic L3 encapsulation
> > tunnelling module for tunnelling different protocols like MPLS,
> > IP,NSH etc inside a UDP tunnel.
> >
> > Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
> 
> > +struct net_device *bareudp_dev_create(struct net *net, const char *name,
> > +                                     u8 name_assign_type,
> > +                                     struct bareudp_conf *conf)
> > +{
> > +       struct nlattr *tb[IFLA_MAX + 1];
> > +       struct net_device *dev;
> > +       LIST_HEAD(list_kill);
> > +       int err;
> > +
> > +       memset(tb, 0, sizeof(tb));
> > +       dev = rtnl_create_link(net, name, name_assign_type,
> > +                              &bareudp_link_ops, tb, NULL);
> > +       if (IS_ERR(dev))
> > +               return dev;
> > +
> > +       err = bareudp_configure(net, dev, conf);
> > +       if (err) {
> > +               free_netdev(dev);
> > +               return ERR_PTR(err);
> > +       }
> > +       err = dev_set_mtu(dev, IP_MAX_MTU);
> 
> does this not exceed dev->max_mtu?
> 
Noted. Must consider the bareudp overhead.
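
As a rough arithmetic sketch of what "considering the overhead" could mean
for an IPv4 outer header: the largest inner packet is the maximum IP datagram
size minus the encapsulation headers. The DEMO_* names and the choice of
which headers to subtract are assumptions for illustration, not what the
driver necessarily ends up doing.

```c
/* Illustrative constants; the DEMO_* names are hypothetical. */
#define DEMO_IP_MAX_MTU  0xFFFFU  /* largest IPv4 datagram, 65535 bytes */
#define DEMO_IPV4_HLEN   20U      /* IPv4 header without options */
#define DEMO_UDP_HLEN    8U       /* UDP header */

/* Largest inner packet that still fits once the tunnel headers are added. */
static unsigned int demo_tunnel_max_mtu(unsigned int outer_ip_hlen)
{
	return DEMO_IP_MAX_MTU - outer_ip_hlen - DEMO_UDP_HLEN;
}
```

For an option-less IPv4 outer header this gives 65535 - 20 - 8 = 65507 bytes.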
> > diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> > index cec1a54..1bf8065 100644
> > --- a/include/net/ipv6.h
> > +++ b/include/net/ipv6.h
> > @@ -1027,6 +1027,12 @@ struct dst_entry *ip6_dst_lookup_flow(struct net *net, const struct sock *sk, st
> >  struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
> >                                          const struct in6_addr *final_dst,
> >                                          bool connected);
> > +struct dst_entry *ip6_dst_lookup_tunnel(struct sk_buff *skb,
> > +                                       struct net_device *dev,
> > +                                       struct net *net, struct socket *sock,
> > +                                       struct in6_addr *saddr,
> > +                                       const struct ip_tunnel_info *info,
> > +                                       u8 protocol, bool use_cache);
> >  struct dst_entry *ip6_blackhole_route(struct net *net,
> >                                       struct dst_entry *orig_dst);
> >
> > diff --git a/include/net/route.h b/include/net/route.h
> > index a9c60fc..81750ae 100644
> > --- a/include/net/route.h
> > +++ b/include/net/route.h
> > @@ -128,6 +128,12 @@ static inline struct rtable *__ip_route_output_key(struct net *net,
> >
> >  struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
> >                                     const struct sock *sk);
> > +struct rtable *ip_route_output_tunnel(struct sk_buff *skb,
> > +                                     struct net_device *dev,
> > +                                     struct net *net, __be32 *saddr,
> > +                                     const struct ip_tunnel_info *info,
> > +                                     u8 protocol, bool use_cache);
> > +
> >  struct dst_entry *ipv4_blackhole_route(struct net *net,
> >                                        struct dst_entry *dst_orig);
> >
> 
> Ah, I now see where the difference between net/ipv4/route.c and
> net/ipv6/ip6_output.c come from. It follows from existing locations of
>  ip6_sk_dst_lookup_flow and ip_route_output_flow.
> 
> Looking for the ipv6 analog of ip_route_output_flow, I see that, e.g.,
> ipvlan uses ip6_route_output from net/ipv6/route.c without a NULL sk.
> But ping calls ip6_sk_dst_lookup_flow.
> 
> It might be a better fit behind ip6_route_output_flags, but it's
> probably moot, really.

Actually I considered both files, but I felt this function
should naturally sit with ip6_sk_dst_lookup_flow.
If you don't have a strong objection, I would like to keep the
function in ip6_output.c.

* Re: [PATCH net-next v7 1/2] net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc.
  2020-02-17  2:43     ` Martin Varghese
@ 2020-02-17  5:16       ` Willem de Bruijn
  2020-02-23 16:14         ` Martin Varghese
  0 siblings, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2020-02-17  5:16 UTC (permalink / raw)
  To: Martin Varghese
  Cc: Willem de Bruijn, Network Development, David Miller,
	Jonathan Corbet, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	scott.drennan, Jiri Benc, martin.varghese

> > There are also a couple of reverse christmas tree violations.
> >
> In Bareudp.c correct?

Right. Like bareudp_udp_encap_recv.

> Wondering if there is any flag in checkpatch to catch them?

It has come up, but I don't believe anything is merged.
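
For reference, "reverse christmas tree" means local variable declarations are
ordered from the longest line to the shortest. A checker would essentially
compare consecutive line lengths; the sketch below shows that comparison in
userspace C. The function name is hypothetical and this is not an existing
checkpatch feature.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true when each declaration line is no longer than the one
 * before it, i.e. the block is in reverse christmas tree order.
 */
static bool demo_is_reverse_xmas_tree(const char *const decls[], size_t n)
{
	size_t i;

	for (i = 1; i < n; i++) {
		if (strlen(decls[i]) > strlen(decls[i - 1]))
			return false;
	}
	return true;
}
```

A block such as "struct net_device *dev;" followed by "int err;" passes,
while the same two lines in the opposite order do not.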

> > > +struct rtable *ip_route_output_tunnel(struct sk_buff *skb,
> > > +                                     struct net_device *dev,
> > > +                                     struct net *net, __be32 *saddr,
> > > +                                     const struct ip_tunnel_info *info,
> > > +                                     u8 protocol, bool use_cache)
> > > +{
> > > +#ifdef CONFIG_DST_CACHE
> > > +       struct dst_cache *dst_cache;
> > > +#endif
> > > +       struct rtable *rt = NULL;
> > > +       struct flowi4 fl4;
> > > +       __u8 tos;
> > > +
> > > +       memset(&fl4, 0, sizeof(fl4));
> > > +       fl4.flowi4_mark = skb->mark;
> > > +       fl4.flowi4_proto = protocol;
> > > +       fl4.daddr = info->key.u.ipv4.dst;
> > > +       fl4.saddr = info->key.u.ipv4.src;
> > > +
> > > +       tos = info->key.tos;
> > > +       fl4.flowi4_tos = RT_TOS(tos);
> > > +#ifdef CONFIG_DST_CACHE
> > > +       dst_cache = (struct dst_cache *)&info->dst_cache;
> > > +       if (use_cache) {
> > > +               rt = dst_cache_get_ip4(dst_cache, saddr);
> > > +               if (rt)
> > > +                       return rt;
> > > +       }
> > > +#endif
> >
> > This is the same in geneve, but no need to initialize fl4 on a cache
> > hit. Then can also be restructured to only have a single #ifdef block.
> Yes, we need not initialize fl4 when the cache is used.
> But I didn't get your point on restructuring to have a single #ifdef block.
> Could you please give more details?

Actually, I was mistaken, missing the third #ifdef block that calls
dst_cache_set_ip[46]. But the type of info->dst_cache is struct
dst_cache, so I don't think the explicit cast or additional pointer
variable (and with that the first #ifdef) is needed. But it's clearly
not terribly important.

* Re: [PATCH net-next v7 1/2] net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc.
  2020-02-17  2:49     ` Martin Varghese
@ 2020-02-17  5:19       ` Willem de Bruijn
  0 siblings, 0 replies; 11+ messages in thread
From: Willem de Bruijn @ 2020-02-17  5:19 UTC (permalink / raw)
  To: Martin Varghese
  Cc: Willem de Bruijn, Network Development, David Miller,
	Jonathan Corbet, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	scott.drennan, Jiri Benc, martin.varghese

> > > diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> > > index cec1a54..1bf8065 100644
> > > --- a/include/net/ipv6.h
> > > +++ b/include/net/ipv6.h
> > > @@ -1027,6 +1027,12 @@ struct dst_entry *ip6_dst_lookup_flow(struct net *net, const struct sock *sk, st
> > >  struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
> > >                                          const struct in6_addr *final_dst,
> > >                                          bool connected);
> > > +struct dst_entry *ip6_dst_lookup_tunnel(struct sk_buff *skb,
> > > +                                       struct net_device *dev,
> > > +                                       struct net *net, struct socket *sock,
> > > +                                       struct in6_addr *saddr,
> > > +                                       const struct ip_tunnel_info *info,
> > > +                                       u8 protocol, bool use_cache);
> > >  struct dst_entry *ip6_blackhole_route(struct net *net,
> > >                                       struct dst_entry *orig_dst);
> > >
> > > diff --git a/include/net/route.h b/include/net/route.h
> > > index a9c60fc..81750ae 100644
> > > --- a/include/net/route.h
> > > +++ b/include/net/route.h
> > > @@ -128,6 +128,12 @@ static inline struct rtable *__ip_route_output_key(struct net *net,
> > >
> > >  struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
> > >                                     const struct sock *sk);
> > > +struct rtable *ip_route_output_tunnel(struct sk_buff *skb,
> > > +                                     struct net_device *dev,
> > > +                                     struct net *net, __be32 *saddr,
> > > +                                     const struct ip_tunnel_info *info,
> > > +                                     u8 protocol, bool use_cache);
> > > +
> > >  struct dst_entry *ipv4_blackhole_route(struct net *net,
> > >                                        struct dst_entry *dst_orig);
> > >
> >
> > Ah, I now see where the difference between net/ipv4/route.c and
> > net/ipv6/ip6_output.c come from. It follows from existing locations of
> >  ip6_sk_dst_lookup_flow and ip_route_output_flow.
> >
> > Looking for the ipv6 analog of ip_route_output_flow, I see that, e.g.,
> > ipvlan uses ip6_route_output from net/ipv6/route.c without a NULL sk.
> > But ping calls ip6_sk_dst_lookup_flow.
> >
> > It might be a better fit behind ip6_route_output_flags, but it's
> > probably moot, really.
>
> Actually, I considered both files, but I felt this function
> should naturally sit with ip6_sk_dst_lookup_flow.
> If you don't have a strong objection, I would like to keep the
> function in ip6_output.c

Yes, sounds good, thanks. The difference stood out to me in an initial
git show --stat, but on closer reading both choices can be argued for.

* Re: [PATCH net-next v7 1/2] net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc.
  2020-02-17  5:16       ` Willem de Bruijn
@ 2020-02-23 16:14         ` Martin Varghese
  2020-02-24  2:19           ` Willem de Bruijn
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Varghese @ 2020-02-23 16:14 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David Miller, Jonathan Corbet,
	Alexey Kuznetsov, Hideaki YOSHIFUJI, scott.drennan, Jiri Benc,
	martin.varghese

On Sun, Feb 16, 2020 at 09:16:34PM -0800, Willem de Bruijn wrote:
> > > There are also a couple of reverse christmas tree violations.
> > >
> > In Bareudp.c correct?
> 
> Right. Like bareudp_udp_encap_recv.
> 
> > Wondering if there is any flag in checkpatch to catch them?
> 
> It has come up, but I don't believe anything is merged.
> 
> > > > +struct rtable *ip_route_output_tunnel(struct sk_buff *skb,
> > > > +                                     struct net_device *dev,
> > > > +                                     struct net *net, __be32 *saddr,
> > > > +                                     const struct ip_tunnel_info *info,
> > > > +                                     u8 protocol, bool use_cache)
> > > > +{
> > > > +#ifdef CONFIG_DST_CACHE
> > > > +       struct dst_cache *dst_cache;
> > > > +#endif
> > > > +       struct rtable *rt = NULL;
> > > > +       struct flowi4 fl4;
> > > > +       __u8 tos;
> > > > +
> > > > +       memset(&fl4, 0, sizeof(fl4));
> > > > +       fl4.flowi4_mark = skb->mark;
> > > > +       fl4.flowi4_proto = protocol;
> > > > +       fl4.daddr = info->key.u.ipv4.dst;
> > > > +       fl4.saddr = info->key.u.ipv4.src;
> > > > +
> > > > +       tos = info->key.tos;
> > > > +       fl4.flowi4_tos = RT_TOS(tos);
> > > > +#ifdef CONFIG_DST_CACHE
> > > > +       dst_cache = (struct dst_cache *)&info->dst_cache;
> > > > +       if (use_cache) {
> > > > +               rt = dst_cache_get_ip4(dst_cache, saddr);
> > > > +               if (rt)
> > > > +                       return rt;
> > > > +       }
> > > > +#endif
> > >
> > > This is the same in geneve, but no need to initialize fl4 on a cache
> > > hit. Then can also be restructured to only have a single #ifdef block.
> > Yes, we need not initialize fl4 when the cache is used.
> > But I didn't get your point on restructuring to have a single #ifdef block.
> > Could you please give more details?
> 
> Actually, I was mistaken, missing the third #ifdef block that calls
> dst_cache_set_ip[46]. But the type of info->dst_cache is struct
> dst_cache, so I don't think the explicit cast or additional pointer
> variable (and with that the first #ifdef) is needed. But it's clearly
> not terribly important.
I tried to remove the additional pointer variable and the explicit cast, but
the compiler warns because info is a pointer to const (same for geneve).

So shall we keep it as it is?
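
The warning described above can be reproduced outside the kernel. The following is a minimal sketch, not the patch code: struct cache, struct tunnel_info, cache_get() and lookup() are hypothetical stand-ins for struct dst_cache, struct ip_tunnel_info, dst_cache_get_ip4() and ip_route_output_tunnel(). It assumes, as the kernel helper appears to, that the cache getter takes a non-const pointer because a lookup may update the cache.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct dst_cache and struct ip_tunnel_info. */
struct cache {
	int hits;
};

struct tunnel_info {
	int tos;
	struct cache cache;
};

/* Takes a non-const pointer: a lookup is allowed to modify the cache. */
int cache_get(struct cache *c)
{
	return ++c->hits;
}

int lookup(const struct tunnel_info *info)
{
	/*
	 * info points to const, so &info->cache has type
	 * 'const struct cache *'. Passing that directly to cache_get()
	 * discards the qualifier and draws a compiler warning, hence the
	 * explicit cast through a local pointer.
	 */
	struct cache *c = (struct cache *)&info->cache;

	return cache_get(c);
}
```

Writing through the cast pointer is only well defined when the underlying object was not itself declared const, which is the assumption made here.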



* Re: [PATCH net-next v7 1/2] net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc.
  2020-02-23 16:14         ` Martin Varghese
@ 2020-02-24  2:19           ` Willem de Bruijn
  0 siblings, 0 replies; 11+ messages in thread
From: Willem de Bruijn @ 2020-02-24  2:19 UTC (permalink / raw)
  To: Martin Varghese
  Cc: Network Development, David Miller, Jonathan Corbet,
	Alexey Kuznetsov, Hideaki YOSHIFUJI, scott.drennan, Jiri Benc,
	martin.varghese

On Sun, Feb 23, 2020 at 11:15 AM Martin Varghese
<martinvarghesenokia@gmail.com> wrote:
>
> On Sun, Feb 16, 2020 at 09:16:34PM -0800, Willem de Bruijn wrote:
> > > > There are also a couple of reverse christmas tree violations.
> > > >
> > > In Bareudp.c correct?
> >
> > Right. Like bareudp_udp_encap_recv.
> >
> > > Wondering if there is any flag in checkpatch to catch them?
> >
> > It has come up, but I don't believe anything is merged.
> >
> > > > > +struct rtable *ip_route_output_tunnel(struct sk_buff *skb,
> > > > > +                                     struct net_device *dev,
> > > > > +                                     struct net *net, __be32 *saddr,
> > > > > +                                     const struct ip_tunnel_info *info,
> > > > > +                                     u8 protocol, bool use_cache)
> > > > > +{
> > > > > +#ifdef CONFIG_DST_CACHE
> > > > > +       struct dst_cache *dst_cache;
> > > > > +#endif
> > > > > +       struct rtable *rt = NULL;
> > > > > +       struct flowi4 fl4;
> > > > > +       __u8 tos;
> > > > > +
> > > > > +       memset(&fl4, 0, sizeof(fl4));
> > > > > +       fl4.flowi4_mark = skb->mark;
> > > > > +       fl4.flowi4_proto = protocol;
> > > > > +       fl4.daddr = info->key.u.ipv4.dst;
> > > > > +       fl4.saddr = info->key.u.ipv4.src;
> > > > > +
> > > > > +       tos = info->key.tos;
> > > > > +       fl4.flowi4_tos = RT_TOS(tos);
> > > > > +#ifdef CONFIG_DST_CACHE
> > > > > +       dst_cache = (struct dst_cache *)&info->dst_cache;
> > > > > +       if (use_cache) {
> > > > > +               rt = dst_cache_get_ip4(dst_cache, saddr);
> > > > > +               if (rt)
> > > > > +                       return rt;
> > > > > +       }
> > > > > +#endif
> > > >
> > > > This is the same in geneve, but no need to initialize fl4 on a cache
> > > > hit. Then can also be restructured to only have a single #ifdef block.
> > > Yes, we need not initialize fl4 when the cache is used.
> > > But I didn't get your point on restructuring to have a single #ifdef block.
> > > Could you please give more details?
> >
> > Actually, I was mistaken, missing the third #ifdef block that calls
> > dst_cache_set_ip[46]. But the type of info->dst_cache is struct
> > dst_cache, so I don't think the explicit cast or additional pointer
> > variable (and with that the first #ifdef) is needed. But it's clearly
> > not terribly important.
> I tried to remove the additional pointer variable and the explicit cast, but
> the compiler warns because info is a pointer to const (same for geneve).
>
> So shall we keep it as it is?

Thanks for giving it a try. Yes, that's fine then.
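
For reference, the earlier review note in this thread (no need to initialize fl4 on a cache hit) could be sketched as follows. This is an illustration under stated assumptions, not the code that was merged: struct route, struct flow_key, struct route_cache, slow_path_lookup() and route_output() are hypothetical simplifications of struct rtable, struct flowi4, struct dst_cache, the full route lookup and ip_route_output_tunnel().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for rtable, flowi4 and dst_cache. */
struct route {
	int id;
};

struct flow_key {
	unsigned int daddr;
	unsigned int saddr;
	unsigned char proto;
};

struct route_cache {
	struct route *cached;
};

/* Counts how often the expensive lookup actually runs. */
int slow_lookups;

static struct route slow_route = { .id = 42 };

struct route *slow_path_lookup(const struct flow_key *key)
{
	(void)key;
	slow_lookups++;
	return &slow_route;
}

struct route *route_output(struct route_cache *cache, unsigned int daddr,
			   unsigned int saddr, unsigned char proto,
			   bool use_cache)
{
	struct flow_key key;
	struct route *rt;

	/* Check the cache first: on a hit the flow key is never touched. */
	if (use_cache && cache->cached)
		return cache->cached;

	/* Only a miss pays for zeroing and filling in the key. */
	memset(&key, 0, sizeof(key));
	key.daddr = daddr;
	key.saddr = saddr;
	key.proto = proto;

	rt = slow_path_lookup(&key);
	if (use_cache)
		cache->cached = rt;
	return rt;
}
```

The local declarations are also ordered longest line first, which is the reverse christmas tree convention mentioned at the top of this message.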

end of thread, other threads:[~2020-02-24  2:20 UTC | newest]

Thread overview: 11+ messages
2020-02-15  6:19 [PATCH net-next v7 0/2] Bare UDP L3 Encapsulation Module Martin Varghese
2020-02-15  6:20 ` [PATCH net-next v7 1/2] net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc Martin Varghese
2020-02-16 16:58   ` Willem de Bruijn
2020-02-17  2:43     ` Martin Varghese
2020-02-17  5:16       ` Willem de Bruijn
2020-02-23 16:14         ` Martin Varghese
2020-02-24  2:19           ` Willem de Bruijn
2020-02-16 18:26   ` Willem de Bruijn
2020-02-17  2:49     ` Martin Varghese
2020-02-17  5:19       ` Willem de Bruijn
2020-02-15  6:20 ` [PATCH net-next v7 2/2] net: Special handling for IP & MPLS Martin Varghese