All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/9] Geneve: Add support for tunnel metadata mode
@ 2015-08-17 21:11 Pravin B Shelar
  2015-08-17 21:11 ` [PATCH net-next v2 1/9] geneve: Initialize ethernet address in device setup Pravin B Shelar
                   ` (8 more replies)
  0 siblings, 9 replies; 33+ messages in thread
From: Pravin B Shelar @ 2015-08-17 21:11 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar

Following patches adds spport for Geneve tunnel metadata
mode. OVS can make use of Geneve net-device with tunnel
metadata API from kernel.

This also allows us to consolidate Geneve implementation
from two kernel modules geneve_core and geneve to single
geneve module. geneve_core module was targeted to share
Geneve encap and decap code between Geneve netdevice and
OVS Geneve tunnel implementation, Since OVS no longer
needs these API, Geneve code can be consolidated into
single geneve module.

v1-v2:
- Replaced per hash table tunnel pointer (metadata enabled) with flag.
- Added support for changelink.
- Improve geneve device route lookup with more parameters.

Pravin B Shelar (9):
  geneve: Initialize ethernet address in device setup.
  geneve: Use skb mark and protocol to lookup route.
  tunnel: introduce udp_tun_rx_dst()
  geneve: Make dst-port configurable.
  geneve: Add support to collect tunnel metadata.
  openvswitch: Use Geneve device.
  geneve: Consolidate Geneve functionality in single module.
  geneve: Move device hash table to geneve socket.
  geneve: Implement rtnl changelink

 drivers/net/Kconfig            |   2 +-
 drivers/net/geneve.c           | 804 +++++++++++++++++++++++++++++++++++------
 drivers/net/vxlan.c            |  18 +-
 include/net/dst_metadata.h     |  27 ++
 include/net/geneve.h           |  35 +-
 include/net/udp_tunnel.h       |   3 +
 include/uapi/linux/if_link.h   |   2 +
 net/ipv4/Kconfig               |  14 -
 net/ipv4/Makefile              |   1 -
 net/ipv4/geneve_core.c         | 447 -----------------------
 net/ipv4/ip_gre.c              |  21 +-
 net/ipv4/udp_tunnel.c          |  24 +-
 net/openvswitch/Kconfig        |   2 +-
 net/openvswitch/vport-geneve.c | 179 ++-------
 14 files changed, 791 insertions(+), 788 deletions(-)
 delete mode 100644 net/ipv4/geneve_core.c

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH net-next v2 1/9] geneve: Initialize ethernet address in device setup.
  2015-08-17 21:11 [PATCH net-next v2 0/9] Geneve: Add support for tunnel metadata mode Pravin B Shelar
@ 2015-08-17 21:11 ` Pravin B Shelar
  2015-08-19  0:10   ` Jesse Gross
  2015-08-20 16:47   ` Thomas Graf
  2015-08-17 21:11 ` [PATCH net-next v2 2/9] geneve: Use skb mark and protocol to lookup route Pravin B Shelar
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 33+ messages in thread
From: Pravin B Shelar @ 2015-08-17 21:11 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 drivers/net/geneve.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 78d49d1..51f7f8b 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -298,6 +298,7 @@ static void geneve_setup(struct net_device *dev)
 
 	netif_keep_dst(dev);
 	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+	eth_hw_addr_random(dev);
 }
 
 static const struct nla_policy geneve_policy[IFLA_GENEVE_MAX + 1] = {
@@ -365,9 +366,6 @@ static int geneve_newlink(struct net *net, struct net_device *dev,
 			return -EBUSY;
 	}
 
-	if (tb[IFLA_ADDRESS] == NULL)
-		eth_hw_addr_random(dev);
-
 	err = register_netdevice(dev);
 	if (err)
 		return err;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next v2 2/9] geneve: Use skb mark and protocol to lookup route.
  2015-08-17 21:11 [PATCH net-next v2 0/9] Geneve: Add support for tunnel metadata mode Pravin B Shelar
  2015-08-17 21:11 ` [PATCH net-next v2 1/9] geneve: Initialize ethernet address in device setup Pravin B Shelar
@ 2015-08-17 21:11 ` Pravin B Shelar
  2015-08-19  0:12   ` Jesse Gross
  2015-08-20 16:47   ` Thomas Graf
  2015-08-17 21:11 ` [PATCH net-next v2 3/9] tunnel: introduce udp_tun_rx_dst() Pravin B Shelar
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 33+ messages in thread
From: Pravin B Shelar @ 2015-08-17 21:11 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar

On packet transmit path geneve need to lookup route. Following
patch improves route lookup using more parameters.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 drivers/net/geneve.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 51f7f8b..4f4d15e 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -202,6 +202,9 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
 	memset(&fl4, 0, sizeof(fl4));
 	fl4.flowi4_tos = RT_TOS(tos);
 	fl4.daddr = geneve->remote.sin_addr.s_addr;
+	fl4.flowi4_mark = skb->mark;
+	fl4.flowi4_proto = IPPROTO_UDP;
+
 	rt = ip_route_output_key(geneve->net, &fl4);
 	if (IS_ERR(rt)) {
 		netdev_dbg(dev, "no route to %pI4\n", &fl4.daddr);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next v2 3/9] tunnel: introduce udp_tun_rx_dst()
  2015-08-17 21:11 [PATCH net-next v2 0/9] Geneve: Add support for tunnel metadata mode Pravin B Shelar
  2015-08-17 21:11 ` [PATCH net-next v2 1/9] geneve: Initialize ethernet address in device setup Pravin B Shelar
  2015-08-17 21:11 ` [PATCH net-next v2 2/9] geneve: Use skb mark and protocol to lookup route Pravin B Shelar
@ 2015-08-17 21:11 ` Pravin B Shelar
  2015-08-20 16:56   ` Thomas Graf
  2015-08-17 21:11 ` [PATCH net-next v2 4/9] geneve: Make dst-port configurable Pravin B Shelar
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Pravin B Shelar @ 2015-08-17 21:11 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar

Introduce function udp_tun_rx_dst() to initialize tunnel dst on
receive path.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
---
 drivers/net/vxlan.c        | 18 ++----------------
 include/net/dst_metadata.h | 27 +++++++++++++++++++++++++++
 include/net/udp_tunnel.h   |  3 +++
 net/ipv4/ip_gre.c          | 21 +++++----------------
 net/ipv4/udp_tunnel.c      | 24 +++++++++++++++++++++++-
 5 files changed, 60 insertions(+), 33 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 06c0731..94a12ef 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1269,26 +1269,12 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	}
 
 	if (vxlan_collect_metadata(vs)) {
-		const struct iphdr *iph = ip_hdr(skb);
-
-		tun_dst = metadata_dst_alloc(sizeof(*md), GFP_ATOMIC);
+		tun_dst = udp_tun_rx_dst(skb, TUNNEL_KEY,
+					 cpu_to_be64(vni >> 8), sizeof(*md));
 		if (!tun_dst)
 			goto drop;
 
 		info = &tun_dst->u.tun_info;
-		info->key.ipv4_src = iph->saddr;
-		info->key.ipv4_dst = iph->daddr;
-		info->key.ipv4_tos = iph->tos;
-		info->key.ipv4_ttl = iph->ttl;
-		info->key.tp_src = udp_hdr(skb)->source;
-		info->key.tp_dst = udp_hdr(skb)->dest;
-
-		info->mode = IP_TUNNEL_INFO_RX;
-		info->key.tun_flags = TUNNEL_KEY;
-		info->key.tun_id = cpu_to_be64(vni >> 8);
-		if (udp_hdr(skb)->check != 0)
-			info->key.tun_flags |= TUNNEL_CSUM;
-
 		md = ip_tunnel_info_opts(info, sizeof(*md));
 	} else {
 		memset(md, 0, sizeof(*md));
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 075f523..c0934bd 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -53,4 +53,31 @@ static inline bool skb_valid_dst(const struct sk_buff *skb)
 struct metadata_dst *metadata_dst_alloc(u8 optslen, gfp_t flags);
 struct metadata_dst __percpu *metadata_dst_alloc_percpu(u8 optslen, gfp_t flags);
 
+static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
+						 __be16 flags,
+						 __be64 tunnel_id,
+						 int md_size)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	struct metadata_dst *tun_dst;
+	struct ip_tunnel_info *info;
+
+	tun_dst = metadata_dst_alloc(md_size, GFP_ATOMIC);
+	if (!tun_dst)
+		return NULL;
+
+	info = &tun_dst->u.tun_info;
+	info->key.ipv4_src = iph->saddr;
+	info->key.ipv4_dst = iph->daddr;
+	info->key.ipv4_tos = iph->tos;
+	info->key.ipv4_ttl = iph->ttl;
+
+	info->mode = IP_TUNNEL_INFO_RX;
+	info->key.tun_flags = flags;
+	info->key.tun_id = tunnel_id;
+	info->key.tp_src = 0;
+	info->key.tp_dst = 0;
+	return tun_dst;
+}
+
 #endif /* __NET_DST_METADATA_H */
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index c491c12..0ca17873 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -93,6 +93,9 @@ int udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sock *sk,
 
 void udp_tunnel_sock_release(struct socket *sock);
 
+struct metadata_dst *udp_tun_rx_dst(struct sk_buff *skb, __be16 flags,
+				 __be64 tunnel_id, int md_size);
+
 static inline struct sk_buff *udp_tunnel_handle_offloads(struct sk_buff *skb,
 							 bool udp_csum)
 {
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index fb44d69..51f722a 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -400,25 +400,14 @@ static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi)
 	if (tunnel) {
 		skb_pop_mac_header(skb);
 		if (tunnel->collect_md) {
-			struct ip_tunnel_info *info;
+			__be16 flags;
+			__be64 tun_id;
 
-			tun_dst = metadata_dst_alloc(0, GFP_ATOMIC);
+			flags = tpi->flags & (TUNNEL_CSUM | TUNNEL_KEY);
+			tun_id = key_to_tunnel_id(tpi->key);
+			tun_dst = ip_tun_rx_dst(skb, flags, tun_id, 0);
 			if (!tun_dst)
 				return PACKET_REJECT;
-
-			info = &tun_dst->u.tun_info;
-			info->key.ipv4_src = iph->saddr;
-			info->key.ipv4_dst = iph->daddr;
-			info->key.ipv4_tos = iph->tos;
-			info->key.ipv4_ttl = iph->ttl;
-
-			info->mode = IP_TUNNEL_INFO_RX;
-			info->key.tun_flags = tpi->flags &
-					      (TUNNEL_CSUM | TUNNEL_KEY);
-			info->key.tun_id = key_to_tunnel_id(tpi->key);
-
-			info->key.tp_src = 0;
-			info->key.tp_dst = 0;
 		}
 
 		ip_tunnel_rcv(tunnel, skb, tpi, tun_dst, log_ecn_error);
diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c
index 933ea90..b67fe04 100644
--- a/net/ipv4/udp_tunnel.c
+++ b/net/ipv4/udp_tunnel.c
@@ -4,9 +4,10 @@
 #include <linux/udp.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
+#include <net/dst_metadata.h>
+#include <net/net_namespace.h>
 #include <net/udp.h>
 #include <net/udp_tunnel.h>
-#include <net/net_namespace.h>
 
 int udp_sock_create4(struct net *net, struct udp_port_cfg *cfg,
 		     struct socket **sockp)
@@ -103,4 +104,25 @@ void udp_tunnel_sock_release(struct socket *sock)
 }
 EXPORT_SYMBOL_GPL(udp_tunnel_sock_release);
 
+struct metadata_dst *udp_tun_rx_dst(struct sk_buff *skb, __be16 flags,
+				    __be64 tunnel_id, int md_size)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	struct metadata_dst *tun_dst;
+	struct ip_tunnel_info *info;
+
+	tun_dst = ip_tun_rx_dst(skb, flags, tunnel_id, md_size);
+	if (!tun_dst)
+		return NULL;
+
+	info = &tun_dst->u.tun_info;
+	info->key.tp_src = udp_hdr(skb)->source;
+	info->key.tp_dst = udp_hdr(skb)->dest;
+	if (udp_hdr(skb)->check != 0)
+		info->key.tun_flags |= TUNNEL_CSUM;
+
+	return tun_dst;
+}
+EXPORT_SYMBOL_GPL(udp_tun_rx_dst);
+
 MODULE_LICENSE("GPL");
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next v2 4/9] geneve: Make dst-port configurable.
  2015-08-17 21:11 [PATCH net-next v2 0/9] Geneve: Add support for tunnel metadata mode Pravin B Shelar
                   ` (2 preceding siblings ...)
  2015-08-17 21:11 ` [PATCH net-next v2 3/9] tunnel: introduce udp_tun_rx_dst() Pravin B Shelar
@ 2015-08-17 21:11 ` Pravin B Shelar
  2015-08-19  0:27   ` Jesse Gross
  2015-08-20 16:58   ` Thomas Graf
  2015-08-17 21:11 ` [PATCH net-next v2 5/9] geneve: Add support to collect tunnel metadata Pravin B Shelar
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 33+ messages in thread
From: Pravin B Shelar @ 2015-08-17 21:11 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar

Add netlink interface to configure Geneve UDP port number.
So that user can configure it for a Gevene device.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 drivers/net/geneve.c         | 25 +++++++++++++++++++++----
 include/uapi/linux/if_link.h |  1 +
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 4f4d15e..546494d 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -49,6 +49,7 @@ struct geneve_dev {
 	u8                 tos;		/* TOS override */
 	struct sockaddr_in remote;	/* IPv4 address for link partner */
 	struct list_head   next;	/* geneve's per namespace list */
+	__be16		   dst_port;
 };
 
 static int geneve_net_id;
@@ -64,6 +65,7 @@ static inline __u32 geneve_net_vni_hash(u8 vni[3])
 /* geneve receive/decap routine */
 static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 {
+	struct inet_sock *sk = inet_sk(gs->sock->sk);
 	struct genevehdr *gnvh = geneve_hdr(skb);
 	struct geneve_dev *dummy, *geneve = NULL;
 	struct geneve_net *gn;
@@ -82,7 +84,8 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 	vni_list_head = &gn->vni_list[hash];
 	hlist_for_each_entry_rcu(dummy, vni_list_head, hlist) {
 		if (!memcmp(gnvh->vni, dummy->vni, sizeof(dummy->vni)) &&
-		    iph->saddr == dummy->remote.sin_addr.s_addr) {
+		    iph->saddr == dummy->remote.sin_addr.s_addr &&
+		    sk->inet_sport == dummy->dst_port) {
 			geneve = dummy;
 			break;
 		}
@@ -157,7 +160,7 @@ static int geneve_open(struct net_device *dev)
 	struct geneve_net *gn = net_generic(geneve->net, geneve_net_id);
 	struct geneve_sock *gs;
 
-	gs = geneve_sock_add(net, htons(GENEVE_UDP_PORT), geneve_rx, gn,
+	gs = geneve_sock_add(net, geneve->dst_port, geneve_rx, gn,
 	                     false, false);
 	if (IS_ERR(gs))
 		return PTR_ERR(gs);
@@ -228,7 +231,7 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* no need to handle local destination and encap bypass...yet... */
 
 	err = geneve_xmit_skb(gs, rt, skb, fl4.saddr, fl4.daddr,
-	                      tos, ttl, 0, sport, htons(GENEVE_UDP_PORT), 0,
+			      tos, ttl, 0, sport, geneve->dst_port, 0,
 	                      geneve->vni, 0, NULL, false,
 	                      !net_eq(geneve->net, dev_net(geneve->dev)));
 	if (err < 0)
@@ -309,6 +312,7 @@ static const struct nla_policy geneve_policy[IFLA_GENEVE_MAX + 1] = {
 	[IFLA_GENEVE_REMOTE]		= { .len = FIELD_SIZEOF(struct iphdr, daddr) },
 	[IFLA_GENEVE_TTL]		= { .type = NLA_U8 },
 	[IFLA_GENEVE_TOS]		= { .type = NLA_U8 },
+	[IFLA_GENEVE_PORT]		= { .type = NLA_U16 },
 };
 
 static int geneve_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -342,6 +346,7 @@ static int geneve_newlink(struct net *net, struct net_device *dev,
 	struct hlist_head *vni_list_head;
 	struct sockaddr_in remote;	/* IPv4 address for link partner */
 	__u32 vni, hash;
+	__be16 dst_port;
 	int err;
 
 	if (!data[IFLA_GENEVE_ID] || !data[IFLA_GENEVE_REMOTE])
@@ -360,13 +365,20 @@ static int geneve_newlink(struct net *net, struct net_device *dev,
 	if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr)))
 		return -EINVAL;
 
+	if (data[IFLA_GENEVE_PORT])
+		dst_port = htons(nla_get_u16(data[IFLA_GENEVE_PORT]));
+	else
+		dst_port = htons(GENEVE_UDP_PORT);
+
 	remote = geneve->remote;
 	hash = geneve_net_vni_hash(geneve->vni);
 	vni_list_head = &gn->vni_list[hash];
 	hlist_for_each_entry_rcu(dummy, vni_list_head, hlist) {
 		if (!memcmp(geneve->vni, dummy->vni, sizeof(dummy->vni)) &&
-		    !memcmp(&remote, &dummy->remote, sizeof(dummy->remote)))
+		    !memcmp(&remote, &dummy->remote, sizeof(dummy->remote)) &&
+		    dst_port == dummy->dst_port) {
 			return -EBUSY;
+		}
 	}
 
 	err = register_netdevice(dev);
@@ -379,6 +391,7 @@ static int geneve_newlink(struct net *net, struct net_device *dev,
 	if (data[IFLA_GENEVE_TOS])
 		geneve->tos = nla_get_u8(data[IFLA_GENEVE_TOS]);
 
+	geneve->dst_port = dst_port;
 	list_add(&geneve->next, &gn->geneve_list);
 
 	hlist_add_head_rcu(&geneve->hlist, &gn->vni_list[hash]);
@@ -403,6 +416,7 @@ static size_t geneve_get_size(const struct net_device *dev)
 		nla_total_size(sizeof(struct in_addr)) + /* IFLA_GENEVE_REMOTE */
 		nla_total_size(sizeof(__u8)) +  /* IFLA_GENEVE_TTL */
 		nla_total_size(sizeof(__u8)) +  /* IFLA_GENEVE_TOS */
+		nla_total_size(sizeof(__u16)) +  /* IFLA_GENEVE_PORT */
 		0;
 }
 
@@ -423,6 +437,9 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_u8(skb, IFLA_GENEVE_TOS, geneve->tos))
 		goto nla_put_failure;
 
+	if (nla_put_u32(skb, IFLA_GENEVE_PORT, ntohs(geneve->dst_port)))
+		goto nla_put_failure;
+
 	return 0;
 
 nla_put_failure:
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 313c305..aa0d0f1 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -409,6 +409,7 @@ enum {
 	IFLA_GENEVE_REMOTE,
 	IFLA_GENEVE_TTL,
 	IFLA_GENEVE_TOS,
+	IFLA_GENEVE_PORT,	/* destination port */
 	__IFLA_GENEVE_MAX
 };
 #define IFLA_GENEVE_MAX	(__IFLA_GENEVE_MAX - 1)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next v2 5/9] geneve: Add support to collect tunnel metadata.
  2015-08-17 21:11 [PATCH net-next v2 0/9] Geneve: Add support for tunnel metadata mode Pravin B Shelar
                   ` (3 preceding siblings ...)
  2015-08-17 21:11 ` [PATCH net-next v2 4/9] geneve: Make dst-port configurable Pravin B Shelar
@ 2015-08-17 21:11 ` Pravin B Shelar
  2015-08-18 21:12   ` David Miller
  2015-08-19  1:07   ` Jesse Gross
  2015-08-17 21:11 ` [PATCH net-next v2 6/9] openvswitch: Use Geneve device Pravin B Shelar
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 33+ messages in thread
From: Pravin B Shelar @ 2015-08-17 21:11 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar

Following patch create new tunnel flag which enable
tunnel metadata collection on given device. These devices
can be used by tunnel metadata based routing or by OVS.
Geneve Consolidation patch get rid of collect_md_tun to
simplify tunnel lookup further.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 drivers/net/geneve.c         | 348 ++++++++++++++++++++++++++++++++-----------
 include/net/geneve.h         |   3 +
 include/uapi/linux/if_link.h |   1 +
 3 files changed, 268 insertions(+), 84 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 546494d..cb2d874 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -15,6 +15,7 @@
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
 #include <linux/hash.h>
+#include <net/dst_metadata.h>
 #include <net/rtnetlink.h>
 #include <net/geneve.h>
 
@@ -36,6 +37,7 @@ MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 struct geneve_net {
 	struct list_head  geneve_list;
 	struct hlist_head vni_list[VNI_HASH_SIZE];
+	struct geneve_dev __rcu *collect_md_tun;
 };
 
 /* Pseudo network device */
@@ -50,6 +52,7 @@ struct geneve_dev {
 	struct sockaddr_in remote;	/* IPv4 address for link partner */
 	struct list_head   next;	/* geneve's per namespace list */
 	__be16		   dst_port;
+	bool		   collect_md;
 };
 
 static int geneve_net_id;
@@ -62,48 +65,95 @@ static inline __u32 geneve_net_vni_hash(u8 vni[3])
 	return hash_32(vnid, VNI_HASH_BITS);
 }
 
-/* geneve receive/decap routine */
-static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
+static __be64 vni_to_tunnel_id(const __u8 *vni)
+{
+#ifdef __BIG_ENDIAN
+	return (vni[0] << 16) | (vni[1] << 8) | vni[2];
+#else
+	return (__force __be64)(((__force u64)vni[0] << 40) |
+				((__force u64)vni[1] << 48) |
+				((__force u64)vni[2] << 56));
+#endif
+}
+
+static struct geneve_dev *geneve_lookup(struct geneve_net *gn,
+					struct geneve_sock *gs,
+					struct iphdr *iph,
+					struct genevehdr *gnvh)
 {
 	struct inet_sock *sk = inet_sk(gs->sock->sk);
-	struct genevehdr *gnvh = geneve_hdr(skb);
-	struct geneve_dev *dummy, *geneve = NULL;
-	struct geneve_net *gn;
-	struct iphdr *iph = NULL;
-	struct pcpu_sw_netstats *stats;
 	struct hlist_head *vni_list_head;
-	int err = 0;
+	struct geneve_dev *geneve;
 	__u32 hash;
 
-	iph = ip_hdr(skb); /* Still outer IP header... */
-
-	gn = gs->rcv_data;
+	geneve = rcu_dereference(gn->collect_md_tun);
+	if (geneve)
+		return geneve;
 
 	/* Find the device for this VNI */
 	hash = geneve_net_vni_hash(gnvh->vni);
 	vni_list_head = &gn->vni_list[hash];
-	hlist_for_each_entry_rcu(dummy, vni_list_head, hlist) {
-		if (!memcmp(gnvh->vni, dummy->vni, sizeof(dummy->vni)) &&
-		    iph->saddr == dummy->remote.sin_addr.s_addr &&
-		    sk->inet_sport == dummy->dst_port) {
-			geneve = dummy;
-			break;
+	hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) {
+		if (!memcmp(gnvh->vni, geneve->vni, sizeof(geneve->vni)) &&
+		    iph->saddr == geneve->remote.sin_addr.s_addr &&
+		    sk->inet_sport == geneve->dst_port) {
+			return geneve;
 		}
 	}
+	return NULL;
+}
+
+/* geneve receive/decap routine */
+static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
+{
+	struct genevehdr *gnvh = geneve_hdr(skb);
+	struct metadata_dst *tun_dst = NULL;
+	struct geneve_dev *geneve = NULL;
+	struct pcpu_sw_netstats *stats;
+	struct geneve_net *gn;
+	struct iphdr *iph;
+	int err;
+
+	iph = ip_hdr(skb); /* Still outer IP header... */
+	gn = gs->rcv_data;
+	geneve = geneve_lookup(gn, gs, iph, gnvh);
 	if (!geneve)
 		goto drop;
 
-	/* Drop packets w/ critical options,
-	 * since we don't support any...
-	 */
-	if (gnvh->critical)
-		goto drop;
+	if (ip_tunnel_collect_metadata() && geneve->collect_md) {
+		__be16 flags;
+		void *opts;
+
+		flags = TUNNEL_KEY | TUNNEL_GENEVE_OPT |
+			(gnvh->oam ? TUNNEL_OAM : 0) |
+			(gnvh->critical ? TUNNEL_CRIT_OPT : 0);
+
+		tun_dst = udp_tun_rx_dst(skb, flags,
+					 vni_to_tunnel_id(gnvh->vni),
+					 gnvh->opt_len * 4);
+		if (!tun_dst)
+			goto drop;
+
+		/* Update tunnel dst according to Geneve options. */
+		opts = ip_tunnel_info_opts(&tun_dst->u.tun_info,
+					   gnvh->opt_len * 4);
+		memcpy(opts, gnvh->options, gnvh->opt_len * 4);
+	} else {
+		/* Drop packets w/ critical options,
+		 * since we don't support any...
+		 */
+		if (gnvh->critical)
+			goto drop;
+	}
 
 	skb_reset_mac_header(skb);
 	skb_scrub_packet(skb, !net_eq(geneve->net, dev_net(geneve->dev)));
 	skb->protocol = eth_type_trans(skb, geneve->dev);
 	skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
 
+	if (tun_dst)
+		skb_dst_set(skb, (struct dst_entry *)tun_dst);
+
 	/* Ignore packet loops (and multicast echo) */
 	if (ether_addr_equal(eth_hdr(skb)->h_source, geneve->dev->dev_addr))
 		goto drop;
@@ -131,7 +181,6 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 	u64_stats_update_end(&stats->syncp);
 
 	netif_rx(skb);
-
 	return;
 drop:
 	/* Consume bad packet */
@@ -141,10 +190,15 @@ drop:
 /* Setup stats when device is created */
 static int geneve_init(struct net_device *dev)
 {
+	struct geneve_dev *geneve = netdev_priv(dev);
+
 	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
 	if (!dev->tstats)
 		return -ENOMEM;
 
+	if (geneve->collect_md)
+		dev->features |= NETIF_F_NETNS_LOCAL;
+
 	return 0;
 }
 
@@ -180,69 +234,137 @@ static int geneve_stop(struct net_device *dev)
 	return 0;
 }
 
+static struct rtable *geneve_get_rt(struct sk_buff *skb,
+				    struct net_device *dev,
+				    struct flowi4 *fl4,
+				    struct ip_tunnel_info *info)
+{
+	struct geneve_dev *geneve = netdev_priv(dev);
+	struct rtable *rt = NULL;
+	__u8 tos;
+
+	memset(fl4, 0, sizeof(*fl4));
+	fl4->flowi4_mark = skb->mark;
+	fl4->flowi4_proto = IPPROTO_UDP;
+
+	if (info) {
+		fl4->daddr = info->key.ipv4_dst;
+		fl4->saddr = info->key.ipv4_src;
+		fl4->flowi4_tos = RT_TOS(info->key.ipv4_tos);
+	} else {
+		tos = geneve->tos;
+		if (tos == 1) {
+			const struct iphdr *iip = ip_hdr(skb);
+
+			tos = ip_tunnel_get_dsfield(iip, skb);
+		}
+
+		fl4->flowi4_tos = RT_TOS(tos);
+		fl4->daddr = geneve->remote.sin_addr.s_addr;
+	}
+
+	rt = ip_route_output_key(geneve->net, fl4);
+	if (IS_ERR(rt)) {
+		netdev_dbg(dev, "no route to %pI4\n", &fl4->daddr);
+		dev->stats.tx_carrier_errors++;
+		return rt;
+	}
+	if (rt->dst.dev == dev) { /* is this necessary? */
+		netdev_dbg(dev, "circular route to %pI4\n", &fl4->daddr);
+		dev->stats.collisions++;
+		ip_rt_put(rt);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return rt;
+}
+
+/* Convert 64 bit tunnel ID to 24 bit VNI. */
+static void tunnel_id_to_vni(__be64 tun_id, __u8 *vni)
+{
+#ifdef __BIG_ENDIAN
+	vni[0] = (__force __u8)(tun_id >> 16);
+	vni[1] = (__force __u8)(tun_id >> 8);
+	vni[2] = (__force __u8)tun_id;
+#else
+	vni[0] = (__force __u8)((__force u64)tun_id >> 40);
+	vni[1] = (__force __u8)((__force u64)tun_id >> 48);
+	vni[2] = (__force __u8)((__force u64)tun_id >> 56);
+#endif
+}
+
 static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
 	struct geneve_sock *gs = geneve->sock;
+	struct ip_tunnel_info *info = NULL;
 	struct rtable *rt = NULL;
 	const struct iphdr *iip; /* interior IP header */
 	struct flowi4 fl4;
-	int err;
-	__be16 sport;
 	__u8 tos, ttl;
+	__be16 sport;
+	bool xnet;
+	int err;
 
-	iip = ip_hdr(skb);
-
-	skb_reset_mac_header(skb);
-
-	/* TODO: port min/max limits should be configurable */
-	sport = udp_flow_src_port(dev_net(dev), skb, 0, 0, true);
-
-	tos = geneve->tos;
-	if (tos == 1)
-		tos = ip_tunnel_get_dsfield(iip, skb);
+	sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
 
-	memset(&fl4, 0, sizeof(fl4));
-	fl4.flowi4_tos = RT_TOS(tos);
-	fl4.daddr = geneve->remote.sin_addr.s_addr;
-	fl4.flowi4_mark = skb->mark;
-	fl4.flowi4_proto = IPPROTO_UDP;
+	if (geneve->collect_md) {
+		info = skb_tunnel_info(skb, AF_INET);
+		if (unlikely(info && info->mode != IP_TUNNEL_INFO_TX)) {
+			netdev_dbg(dev, "no tunnel metadata\n");
+			goto tx_error;
+		}
+	}
 
-	rt = ip_route_output_key(geneve->net, &fl4);
+	rt = geneve_get_rt(skb, dev, &fl4, info);
 	if (IS_ERR(rt)) {
 		netdev_dbg(dev, "no route to %pI4\n", &fl4.daddr);
 		dev->stats.tx_carrier_errors++;
 		goto tx_error;
 	}
-	if (rt->dst.dev == dev) { /* is this necessary? */
-		netdev_dbg(dev, "circular route to %pI4\n", &fl4.daddr);
-		dev->stats.collisions++;
-		goto rt_tx_error;
+	skb_reset_mac_header(skb);
+	xnet = !net_eq(geneve->net, dev_net(geneve->dev));
+
+	if (info) {
+		const struct ip_tunnel_key *key = &info->key;
+		bool udp_csum;
+		u8 *opts = NULL;
+		u8 vni[3];
+		__be16 df;
+
+		tunnel_id_to_vni(key->tun_id, vni);
+		df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
+		udp_csum = !!(key->tun_flags & TUNNEL_CSUM);
+
+		if (key->tun_flags & TUNNEL_GENEVE_OPT)
+			opts = ip_tunnel_info_opts(info, info->options_len);
+
+		err = geneve_xmit_skb(gs, rt, skb, fl4.saddr, fl4.daddr,
+				      key->ipv4_tos, key->ipv4_ttl, df,
+				      sport, geneve->dst_port,
+				      key->tun_flags, vni,
+				      info->options_len, opts, udp_csum, xnet);
+	} else {
+		iip = ip_hdr(skb);
+		tos = ip_tunnel_ecn_encap(fl4.flowi4_tos, iip, skb);
+
+		ttl = geneve->ttl;
+		if (!ttl && IN_MULTICAST(ntohl(fl4.daddr)))
+			ttl = 1;
+
+		ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
+
+		/* no need to handle local destination and encap bypass...yet... */
+		err = geneve_xmit_skb(gs, rt, skb, fl4.saddr, fl4.daddr, tos,
+				      ttl, 0, sport, geneve->dst_port, 0,
+				      geneve->vni, 0, NULL, false, xnet);
 	}
-
-	tos = ip_tunnel_ecn_encap(tos, iip, skb);
-
-	ttl = geneve->ttl;
-	if (!ttl && IN_MULTICAST(ntohl(fl4.daddr)))
-		ttl = 1;
-
-	ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
-
-	/* no need to handle local destination and encap bypass...yet... */
-
-	err = geneve_xmit_skb(gs, rt, skb, fl4.saddr, fl4.daddr,
-			      tos, ttl, 0, sport, geneve->dst_port, 0,
-	                      geneve->vni, 0, NULL, false,
-	                      !net_eq(geneve->net, dev_net(geneve->dev)));
 	if (err < 0)
 		ip_rt_put(rt);
 
 	iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
-
 	return NETDEV_TX_OK;
 
-rt_tx_error:
-	ip_rt_put(rt);
 tx_error:
 	dev->stats.tx_errors++;
 	dev_kfree_skb(skb);
@@ -313,6 +435,7 @@ static const struct nla_policy geneve_policy[IFLA_GENEVE_MAX + 1] = {
 	[IFLA_GENEVE_TTL]		= { .type = NLA_U8 },
 	[IFLA_GENEVE_TOS]		= { .type = NLA_U8 },
 	[IFLA_GENEVE_PORT]		= { .type = NLA_U16 },
+	[IFLA_GENEVE_COLLECT_METADATA]	= { .type = NLA_FLAG },
 };
 
 static int geneve_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -338,71 +461,100 @@ static int geneve_validate(struct nlattr *tb[], struct nlattr *data[])
 	return 0;
 }
 
-static int geneve_newlink(struct net *net, struct net_device *dev,
-			 struct nlattr *tb[], struct nlattr *data[])
+static int geneve_configure(struct net *net, struct net_device *dev,
+			    __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
+			    __u16 dst_port, bool metadata)
 {
 	struct geneve_net *gn = net_generic(net, geneve_net_id);
 	struct geneve_dev *dummy, *geneve = netdev_priv(dev);
 	struct hlist_head *vni_list_head;
 	struct sockaddr_in remote;	/* IPv4 address for link partner */
-	__u32 vni, hash;
-	__be16 dst_port;
+	__u32 hash;
 	int err;
 
-	if (!data[IFLA_GENEVE_ID] || !data[IFLA_GENEVE_REMOTE])
-		return -EINVAL;
+	if (metadata && rtnl_dereference(gn->collect_md_tun))
+		return -EEXIST;
 
 	geneve->net = net;
 	geneve->dev = dev;
 
-	vni = nla_get_u32(data[IFLA_GENEVE_ID]);
 	geneve->vni[0] = (vni & 0x00ff0000) >> 16;
 	geneve->vni[1] = (vni & 0x0000ff00) >> 8;
 	geneve->vni[2] =  vni & 0x000000ff;
 
-	geneve->remote.sin_addr.s_addr =
-		nla_get_in_addr(data[IFLA_GENEVE_REMOTE]);
+	geneve->remote.sin_addr.s_addr = rem_addr;
 	if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr)))
 		return -EINVAL;
 
-	if (data[IFLA_GENEVE_PORT])
-		dst_port = htons(nla_get_u16(data[IFLA_GENEVE_PORT]));
-	else
-		dst_port = htons(GENEVE_UDP_PORT);
-
 	remote = geneve->remote;
 	hash = geneve_net_vni_hash(geneve->vni);
 	vni_list_head = &gn->vni_list[hash];
 	hlist_for_each_entry_rcu(dummy, vni_list_head, hlist) {
 		if (!memcmp(geneve->vni, dummy->vni, sizeof(dummy->vni)) &&
 		    !memcmp(&remote, &dummy->remote, sizeof(dummy->remote)) &&
-		    dst_port == dummy->dst_port) {
+		    htons(dst_port) == dummy->dst_port) {
 			return -EBUSY;
 		}
 	}
 
+	geneve->ttl = ttl;
+	geneve->tos = tos;
+	geneve->dst_port = htons(dst_port);
+	geneve->collect_md = metadata;
+
 	err = register_netdevice(dev);
 	if (err)
 		return err;
 
+	list_add(&geneve->next, &gn->geneve_list);
+	hlist_add_head_rcu(&geneve->hlist, &gn->vni_list[hash]);
+
+	if (geneve->collect_md)
+		rcu_assign_pointer(gn->collect_md_tun, geneve);
+	return 0;
+}
+
+static int geneve_newlink(struct net *net, struct net_device *dev,
+			  struct nlattr *tb[], struct nlattr *data[])
+{
+	__u16 dst_port = GENEVE_UDP_PORT;
+	__u8 ttl = 0, tos = 0;
+	bool metadata = false;
+	__be32 rem_addr;
+	__u32 vni;
+
+	if (!data[IFLA_GENEVE_ID] || !data[IFLA_GENEVE_REMOTE])
+		return -EINVAL;
+
+	vni = nla_get_u32(data[IFLA_GENEVE_ID]);
+	rem_addr = nla_get_in_addr(data[IFLA_GENEVE_REMOTE]);
+
 	if (data[IFLA_GENEVE_TTL])
-		geneve->ttl = nla_get_u8(data[IFLA_GENEVE_TTL]);
+		ttl = nla_get_u8(data[IFLA_GENEVE_TTL]);
 
 	if (data[IFLA_GENEVE_TOS])
-		geneve->tos = nla_get_u8(data[IFLA_GENEVE_TOS]);
+		tos = nla_get_u8(data[IFLA_GENEVE_TOS]);
 
-	geneve->dst_port = dst_port;
-	list_add(&geneve->next, &gn->geneve_list);
+	if (data[IFLA_GENEVE_PORT])
+		dst_port = nla_get_u16(data[IFLA_GENEVE_PORT]);
 
-	hlist_add_head_rcu(&geneve->hlist, &gn->vni_list[hash]);
+	if (data[IFLA_GENEVE_COLLECT_METADATA])
+		metadata = true;
 
-	return 0;
+	return geneve_configure(net, dev, rem_addr, vni,
+				ttl, tos, dst_port, metadata);
 }
 
 static void geneve_dellink(struct net_device *dev, struct list_head *head)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
 
+	if (geneve->collect_md) {
+		struct geneve_net *gn = net_generic(geneve->net, geneve_net_id);
+
+		rcu_assign_pointer(gn->collect_md_tun, NULL);
+	}
+
 	if (!hlist_unhashed(&geneve->hlist))
 		hlist_del_rcu(&geneve->hlist);
 
@@ -417,6 +569,7 @@ static size_t geneve_get_size(const struct net_device *dev)
 		nla_total_size(sizeof(__u8)) +  /* IFLA_GENEVE_TTL */
 		nla_total_size(sizeof(__u8)) +  /* IFLA_GENEVE_TOS */
 		nla_total_size(sizeof(__u16)) +  /* IFLA_GENEVE_PORT */
+		nla_total_size(0) +	 /* IFLA_GENEVE_COLLECT_METADATA */
 		0;
 }
 
@@ -440,6 +593,11 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	if (nla_put_u32(skb, IFLA_GENEVE_PORT, ntohs(geneve->dst_port)))
 		goto nla_put_failure;
 
+	if (geneve->collect_md) {
+		if (nla_put_flag(skb, IFLA_GENEVE_COLLECT_METADATA))
+			goto nla_put_failure;
+	}
+
 	return 0;
 
 nla_put_failure:
@@ -459,6 +617,28 @@ static struct rtnl_link_ops geneve_link_ops __read_mostly = {
 	.fill_info	= geneve_fill_info,
 };
 
+struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
+					u8 name_assign_type, u16 dst_port)
+{
+	struct nlattr *tb[IFLA_MAX + 1];
+	struct net_device *dev;
+	int err;
+
+	memset(tb, 0, sizeof(tb));
+	dev = rtnl_create_link(net, name, name_assign_type,
+			       &geneve_link_ops, tb);
+	if (IS_ERR(dev))
+		return dev;
+
+	err = geneve_configure(net, dev, 0, 0, 0, 0, dst_port, true);
+	if (err) {
+		free_netdev(dev);
+		return ERR_PTR(err);
+	}
+	return dev;
+}
+EXPORT_SYMBOL_GPL(geneve_dev_create_fb);
+
 static __net_init int geneve_init_net(struct net *net)
 {
 	struct geneve_net *gn = net_generic(net, geneve_net_id);
diff --git a/include/net/geneve.h b/include/net/geneve.h
index 2a0543a..4245e1d 100644
--- a/include/net/geneve.h
+++ b/include/net/geneve.h
@@ -96,6 +96,9 @@ int geneve_xmit_skb(struct geneve_sock *gs, struct rtable *rt,
 		    __u8 ttl, __be16 df, __be16 src_port, __be16 dst_port,
 		    __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
 		    bool csum, bool xnet);
+
+struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
+					u8 name_assign_type, u16 dst_port);
 #endif /*ifdef CONFIG_INET */
 
 #endif /*ifdef__NET_GENEVE_H */
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index aa0d0f1..2a8807c 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -410,6 +410,7 @@ enum {
 	IFLA_GENEVE_TTL,
 	IFLA_GENEVE_TOS,
 	IFLA_GENEVE_PORT,	/* destination port */
+	IFLA_GENEVE_COLLECT_METADATA,
 	__IFLA_GENEVE_MAX
 };
 #define IFLA_GENEVE_MAX	(__IFLA_GENEVE_MAX - 1)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next v2 6/9] openvswitch: Use Geneve device.
  2015-08-17 21:11 [PATCH net-next v2 0/9] Geneve: Add support for tunnel metadata mode Pravin B Shelar
                   ` (4 preceding siblings ...)
  2015-08-17 21:11 ` [PATCH net-next v2 5/9] geneve: Add support to collect tunnel metadata Pravin B Shelar
@ 2015-08-17 21:11 ` Pravin B Shelar
  2015-08-17 21:11 ` [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module Pravin B Shelar
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 33+ messages in thread
From: Pravin B Shelar @ 2015-08-17 21:11 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar

With help of tunnel metadata mode OVS can directly use
Geneve devices to implement Geneve tunnels.
This patch removes all of the OVS specific Geneve code
and make OVS use a Geneve net_device. Basic geneve vport
is still there to handle compatibility with current
userspace application.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
---
 net/openvswitch/Kconfig        |   2 +-
 net/openvswitch/vport-geneve.c | 179 ++++++++---------------------------------
 2 files changed, 33 insertions(+), 148 deletions(-)

diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 422dc05..87b98c01 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -59,7 +59,7 @@ config OPENVSWITCH_VXLAN
 config OPENVSWITCH_GENEVE
 	tristate "Open vSwitch Geneve tunneling support"
 	depends on OPENVSWITCH
-	depends on GENEVE_CORE
+	depends on GENEVE
 	default OPENVSWITCH
 	---help---
 	  If you say Y here, then the Open vSwitch will be able create geneve vport.
diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index 1da3a14..fa37c95 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -26,95 +26,44 @@
 
 #include "datapath.h"
 #include "vport.h"
+#include "vport-netdev.h"
 
 static struct vport_ops ovs_geneve_vport_ops;
-
 /**
  * struct geneve_port - Keeps track of open UDP ports
- * @gs: The socket created for this port number.
- * @name: vport name.
+ * @dst_port: destination port.
  */
 struct geneve_port {
-	struct geneve_sock *gs;
-	char name[IFNAMSIZ];
+	u16 port_no;
 };
 
-static LIST_HEAD(geneve_ports);
-
 static inline struct geneve_port *geneve_vport(const struct vport *vport)
 {
 	return vport_priv(vport);
 }
 
-/* Convert 64 bit tunnel ID to 24 bit VNI. */
-static void tunnel_id_to_vni(__be64 tun_id, __u8 *vni)
-{
-#ifdef __BIG_ENDIAN
-	vni[0] = (__force __u8)(tun_id >> 16);
-	vni[1] = (__force __u8)(tun_id >> 8);
-	vni[2] = (__force __u8)tun_id;
-#else
-	vni[0] = (__force __u8)((__force u64)tun_id >> 40);
-	vni[1] = (__force __u8)((__force u64)tun_id >> 48);
-	vni[2] = (__force __u8)((__force u64)tun_id >> 56);
-#endif
-}
-
-/* Convert 24 bit VNI to 64 bit tunnel ID. */
-static __be64 vni_to_tunnel_id(const __u8 *vni)
-{
-#ifdef __BIG_ENDIAN
-	return (vni[0] << 16) | (vni[1] << 8) | vni[2];
-#else
-	return (__force __be64)(((__force u64)vni[0] << 40) |
-				((__force u64)vni[1] << 48) |
-				((__force u64)vni[2] << 56));
-#endif
-}
-
-static void geneve_rcv(struct geneve_sock *gs, struct sk_buff *skb)
-{
-	struct vport *vport = gs->rcv_data;
-	struct genevehdr *geneveh = geneve_hdr(skb);
-	int opts_len;
-	struct ip_tunnel_info tun_info;
-	__be64 key;
-	__be16 flags;
-
-	opts_len = geneveh->opt_len * 4;
-
-	flags = TUNNEL_KEY | TUNNEL_GENEVE_OPT |
-		(udp_hdr(skb)->check != 0 ? TUNNEL_CSUM : 0) |
-		(geneveh->oam ? TUNNEL_OAM : 0) |
-		(geneveh->critical ? TUNNEL_CRIT_OPT : 0);
-
-	key = vni_to_tunnel_id(geneveh->vni);
-
-	ip_tunnel_info_init(&tun_info, ip_hdr(skb),
-			    udp_hdr(skb)->source, udp_hdr(skb)->dest,
-			    key, flags, geneveh->options, opts_len);
-
-	ovs_vport_receive(vport, skb, &tun_info);
-}
-
 static int geneve_get_options(const struct vport *vport,
 			      struct sk_buff *skb)
 {
 	struct geneve_port *geneve_port = geneve_vport(vport);
-	struct inet_sock *sk = inet_sk(geneve_port->gs->sock->sk);
 
-	if (nla_put_u16(skb, OVS_TUNNEL_ATTR_DST_PORT, ntohs(sk->inet_sport)))
+	if (nla_put_u16(skb, OVS_TUNNEL_ATTR_DST_PORT, geneve_port->port_no))
 		return -EMSGSIZE;
 	return 0;
 }
 
-static void geneve_tnl_destroy(struct vport *vport)
+static int geneve_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
+				      struct ip_tunnel_info *egress_tun_info)
 {
 	struct geneve_port *geneve_port = geneve_vport(vport);
+	struct net *net = ovs_dp_get_net(vport->dp);
+	__be16 dport = htons(geneve_port->port_no);
+	__be16 sport = udp_flow_src_port(net, skb, 1, USHRT_MAX, true);
 
-	geneve_sock_release(geneve_port->gs);
-
-	ovs_vport_deferred_free(vport);
+	return ovs_tunnel_get_egress_info(egress_tun_info,
+					  ovs_dp_get_net(vport->dp),
+					  OVS_CB(skb)->egress_tun_info,
+					  IPPROTO_UDP, skb->mark, sport, dport);
 }
 
 static struct vport *geneve_tnl_create(const struct vport_parms *parms)
@@ -122,11 +71,11 @@ static struct vport *geneve_tnl_create(const struct vport_parms *parms)
 	struct net *net = ovs_dp_get_net(parms->dp);
 	struct nlattr *options = parms->options;
 	struct geneve_port *geneve_port;
-	struct geneve_sock *gs;
+	struct net_device *dev;
 	struct vport *vport;
 	struct nlattr *a;
-	int err;
 	u16 dst_port;
+	int err;
 
 	if (!options) {
 		err = -EINVAL;
@@ -148,104 +97,40 @@ static struct vport *geneve_tnl_create(const struct vport_parms *parms)
 		return vport;
 
 	geneve_port = geneve_vport(vport);
-	strncpy(geneve_port->name, parms->name, IFNAMSIZ);
+	geneve_port->port_no = dst_port;
 
-	gs = geneve_sock_add(net, htons(dst_port), geneve_rcv, vport, true, 0);
-	if (IS_ERR(gs)) {
+	rtnl_lock();
+	dev = geneve_dev_create_fb(net, parms->name, NET_NAME_USER, dst_port);
+	if (IS_ERR(dev)) {
+		rtnl_unlock();
 		ovs_vport_free(vport);
-		return (void *)gs;
+		return ERR_CAST(dev);
 	}
-	geneve_port->gs = gs;
 
+	dev_change_flags(dev, dev->flags | IFF_UP);
+	rtnl_unlock();
 	return vport;
 error:
 	return ERR_PTR(err);
 }
 
-static int geneve_tnl_send(struct vport *vport, struct sk_buff *skb)
+static struct vport *geneve_create(const struct vport_parms *parms)
 {
-	const struct ip_tunnel_key *tun_key;
-	struct ip_tunnel_info *tun_info;
-	struct net *net = ovs_dp_get_net(vport->dp);
-	struct geneve_port *geneve_port = geneve_vport(vport);
-	__be16 dport = inet_sk(geneve_port->gs->sock->sk)->inet_sport;
-	__be16 sport;
-	struct rtable *rt;
-	struct flowi4 fl;
-	u8 vni[3], opts_len, *opts;
-	__be16 df;
-	int err;
-
-	tun_info = OVS_CB(skb)->egress_tun_info;
-	if (unlikely(!tun_info)) {
-		err = -EINVAL;
-		goto error;
-	}
-
-	tun_key = &tun_info->key;
-	rt = ovs_tunnel_route_lookup(net, tun_key, skb->mark, &fl, IPPROTO_UDP);
-	if (IS_ERR(rt)) {
-		err = PTR_ERR(rt);
-		goto error;
-	}
-
-	df = tun_key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
-	sport = udp_flow_src_port(net, skb, 1, USHRT_MAX, true);
-	tunnel_id_to_vni(tun_key->tun_id, vni);
-	skb->ignore_df = 1;
-
-	if (tun_key->tun_flags & TUNNEL_GENEVE_OPT) {
-		opts = (u8 *)tun_info->options;
-		opts_len = tun_info->options_len;
-	} else {
-		opts = NULL;
-		opts_len = 0;
-	}
-
-	err = geneve_xmit_skb(geneve_port->gs, rt, skb, fl.saddr,
-			      tun_key->ipv4_dst, tun_key->ipv4_tos,
-			      tun_key->ipv4_ttl, df, sport, dport,
-			      tun_key->tun_flags, vni, opts_len, opts,
-			      !!(tun_key->tun_flags & TUNNEL_CSUM), false);
-	if (err < 0)
-		ip_rt_put(rt);
-	return err;
-
-error:
-	kfree_skb(skb);
-	return err;
-}
-
-static const char *geneve_get_name(const struct vport *vport)
-{
-	struct geneve_port *geneve_port = geneve_vport(vport);
-
-	return geneve_port->name;
-}
+	struct vport *vport;
 
-static int geneve_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
-				      struct ip_tunnel_info *egress_tun_info)
-{
-	struct geneve_port *geneve_port = geneve_vport(vport);
-	struct net *net = ovs_dp_get_net(vport->dp);
-	__be16 dport = inet_sk(geneve_port->gs->sock->sk)->inet_sport;
-	__be16 sport = udp_flow_src_port(net, skb, 1, USHRT_MAX, true);
+	vport = geneve_tnl_create(parms);
+	if (IS_ERR(vport))
+		return vport;
 
-	/* Get tp_src and tp_dst, refert to geneve_build_header().
-	 */
-	return ovs_tunnel_get_egress_info(egress_tun_info,
-					  ovs_dp_get_net(vport->dp),
-					  OVS_CB(skb)->egress_tun_info,
-					  IPPROTO_UDP, skb->mark, sport, dport);
+	return ovs_netdev_link(vport, parms->name);
 }
 
 static struct vport_ops ovs_geneve_vport_ops = {
 	.type		= OVS_VPORT_TYPE_GENEVE,
-	.create		= geneve_tnl_create,
-	.destroy	= geneve_tnl_destroy,
-	.get_name	= geneve_get_name,
+	.create		= geneve_create,
+	.destroy	= ovs_netdev_tunnel_destroy,
 	.get_options	= geneve_get_options,
-	.send		= geneve_tnl_send,
+	.send		= ovs_netdev_send,
 	.owner          = THIS_MODULE,
 	.get_egress_tun_info	= geneve_get_egress_tun_info,
 };
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.
  2015-08-17 21:11 [PATCH net-next v2 0/9] Geneve: Add support for tunnel metadata mode Pravin B Shelar
                   ` (5 preceding siblings ...)
  2015-08-17 21:11 ` [PATCH net-next v2 6/9] openvswitch: Use Geneve device Pravin B Shelar
@ 2015-08-17 21:11 ` Pravin B Shelar
  2015-08-19 18:18   ` Jesse Gross
  2015-08-17 21:11 ` [PATCH net-next v2 8/9] geneve: Move device hash table to geneve socket Pravin B Shelar
  2015-08-17 21:11 ` [PATCH net-next v2 9/9] geneve: Implement rtnl changelink Pravin B Shelar
  8 siblings, 1 reply; 33+ messages in thread
From: Pravin B Shelar @ 2015-08-17 21:11 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar

geneve_core module handles send and receive functionality.
This way OVS could use the Geneve API. Now with use of
tunnel meatadata mode OVS can directly use Geneve netdevice.
So there is no need for separate module for Geneve. Following
patch consolidates Geneve protocol processing in single module.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 drivers/net/Kconfig    |   2 +-
 drivers/net/geneve.c   | 458 +++++++++++++++++++++++++++++++++++++++++--------
 include/net/geneve.h   |  34 ----
 net/ipv4/Kconfig       |  14 --
 net/ipv4/Makefile      |   1 -
 net/ipv4/geneve_core.c | 447 -----------------------------------------------
 6 files changed, 383 insertions(+), 573 deletions(-)
 delete mode 100644 net/ipv4/geneve_core.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index e58468b..18ff83b 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -181,7 +181,7 @@ config VXLAN
 
 config GENEVE
        tristate "Generic Network Virtualization Encapsulation netdev"
-       depends on INET && GENEVE_CORE
+       depends on INET
        select NET_IP_TUNNEL
        ---help---
 	  This allows one to create geneve virtual interfaces that provide
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 5b43382..eb298ff 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -18,6 +18,7 @@
 #include <net/dst_metadata.h>
 #include <net/rtnetlink.h>
 #include <net/geneve.h>
+#include <net/protocol.h>
 
 #define GENEVE_NETDEV_VER	"0.6"
 
@@ -33,13 +34,18 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
+#define GENEVE_VER 0
+#define GENEVE_BASE_HLEN (sizeof(struct udphdr) + sizeof(struct genevehdr))
+
 /* per-network namespace private data for this module */
 struct geneve_net {
-	struct list_head  geneve_list;
-	struct hlist_head vni_list[VNI_HASH_SIZE];
-	struct geneve_dev __rcu *collect_md_tun;
+	struct list_head	geneve_list;
+	struct hlist_head	vni_list[VNI_HASH_SIZE];
+	struct list_head	sock_list;
 };
 
+static int geneve_net_id;
+
 /* Pseudo network device */
 struct geneve_dev {
 	struct hlist_node  hlist;	/* vni hash table */
@@ -55,7 +61,15 @@ struct geneve_dev {
 	bool		   collect_md;
 };
 
-static int geneve_net_id;
+struct geneve_sock {
+	bool			collect_md;
+	struct geneve_net	*gn;
+	struct list_head	list;
+	struct socket		*sock;
+	struct rcu_head		rcu;
+	int			refcnt;
+	struct udp_offload	udp_offloads;
+};
 
 static inline __u32 geneve_net_vni_hash(u8 vni[3])
 {
@@ -76,51 +90,63 @@ static __be64 vni_to_tunnel_id(const __u8 *vni)
 #endif
 }
 
-static struct geneve_dev *geneve_lookup(struct geneve_net *gn,
-					struct geneve_sock *gs,
-					struct iphdr *iph,
-					struct genevehdr *gnvh)
+static struct geneve_dev *geneve_lookup(struct geneve_net *gn, __be16 port,
+					__be32 addr, u8 vni[])
 {
-	struct inet_sock *sk = inet_sk(gs->sock->sk);
 	struct hlist_head *vni_list_head;
 	struct geneve_dev *geneve;
 	__u32 hash;
 
-	geneve = rcu_dereference(gn->collect_md_tun);
-	if (geneve)
-		return geneve;
-
 	/* Find the device for this VNI */
-	hash = geneve_net_vni_hash(gnvh->vni);
+	hash = geneve_net_vni_hash(vni);
 	vni_list_head = &gn->vni_list[hash];
 	hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) {
-		if (!memcmp(gnvh->vni, geneve->vni, sizeof(geneve->vni)) &&
-		    iph->saddr == geneve->remote.sin_addr.s_addr &&
-		    sk->inet_sport == geneve->dst_port) {
+		if (!memcmp(vni, geneve->vni, sizeof(geneve->vni)) &&
+		    addr == geneve->remote.sin_addr.s_addr &&
+		    port == geneve->dst_port) {
 			return geneve;
 		}
 	}
 	return NULL;
 }
 
+static inline struct genevehdr *geneve_hdr(const struct sk_buff *skb)
+{
+	return (struct genevehdr *)(udp_hdr(skb) + 1);
+}
+
 /* geneve receive/decap routine */
 static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 {
+	struct inet_sock *sk = inet_sk(gs->sock->sk);
 	struct genevehdr *gnvh = geneve_hdr(skb);
+	struct geneve_net *gn = gs->gn;
 	struct metadata_dst *tun_dst = NULL;
 	struct geneve_dev *geneve = NULL;
 	struct pcpu_sw_netstats *stats;
-	struct geneve_net *gn;
 	struct iphdr *iph;
+	u8 *vni;
+	__be32 addr;
+	bool xnet;
 	int err;
 
 	iph = ip_hdr(skb); /* Still outer IP header... */
-	gn = gs->rcv_data;
-	geneve = geneve_lookup(gn, gs, iph, gnvh);
+
+	if (gs->collect_md) {
+		static u8 zero_vni[3];
+
+		vni = zero_vni;
+		addr = 0;
+	} else {
+		vni = gnvh->vni;
+		addr = iph->saddr;
+	}
+
+	geneve = geneve_lookup(gn, sk->inet_sport, addr, vni);
 	if (!geneve)
 		goto drop;
 
-	if (ip_tunnel_collect_metadata() && geneve->collect_md) {
+	if (ip_tunnel_collect_metadata() || gs->collect_md) {
 		__be16 flags;
 		void *opts;
 
@@ -138,16 +164,18 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 		opts = ip_tunnel_info_opts(&tun_dst->u.tun_info,
 					   gnvh->opt_len * 4);
 		memcpy(opts, gnvh->options, gnvh->opt_len * 4);
+		xnet = false;
 	} else {
 		/* Drop packets w/ critical options,
 		 * since we don't support any...
 		 */
 		if (gnvh->critical)
 			goto drop;
+		xnet = !net_eq(geneve->net, dev_net(geneve->dev));
 	}
 
 	skb_reset_mac_header(skb);
-	skb_scrub_packet(skb, !net_eq(geneve->net, dev_net(geneve->dev)));
+	skb_scrub_packet(skb, xnet);
 	skb->protocol = eth_type_trans(skb, geneve->dev);
 	skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
 
@@ -207,20 +235,265 @@ static void geneve_uninit(struct net_device *dev)
 	free_percpu(dev->tstats);
 }
 
+/* Callback from net/ipv4/udp.c to receive packets */
+static int geneve_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
+{
+	struct genevehdr *geneveh;
+	struct geneve_sock *gs;
+	int opts_len;
+
+	/* Need Geneve and inner Ethernet header to be present */
+	if (unlikely(!pskb_may_pull(skb, GENEVE_BASE_HLEN)))
+		goto error;
+
+	/* Return packets with reserved bits set */
+	geneveh = geneve_hdr(skb);
+	if (unlikely(geneveh->ver != GENEVE_VER))
+		goto error;
+
+	if (unlikely(geneveh->proto_type != htons(ETH_P_TEB)))
+		goto error;
+
+	opts_len = geneveh->opt_len * 4;
+	if (iptunnel_pull_header(skb, GENEVE_BASE_HLEN + opts_len,
+				 htons(ETH_P_TEB)))
+		goto drop;
+
+	gs = rcu_dereference_sk_user_data(sk);
+	if (!gs)
+		goto drop;
+
+	geneve_rx(gs, skb);
+	return 0;
+
+drop:
+	/* Consume bad packet */
+	kfree_skb(skb);
+	return 0;
+
+error:
+	/* Let the UDP layer deal with the skb */
+	return 1;
+}
+
+static struct socket *geneve_create_sock(struct net *net, bool ipv6,
+					 __be16 port)
+{
+	struct socket *sock;
+	struct udp_port_cfg udp_conf;
+	int err;
+
+	memset(&udp_conf, 0, sizeof(udp_conf));
+
+	if (ipv6) {
+		udp_conf.family = AF_INET6;
+	} else {
+		udp_conf.family = AF_INET;
+		udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
+	}
+
+	udp_conf.local_udp_port = port;
+
+	/* Open UDP socket */
+	err = udp_sock_create(net, &udp_conf, &sock);
+	if (err < 0)
+		return ERR_PTR(err);
+
+	return sock;
+}
+
+static void geneve_notify_add_rx_port(struct geneve_sock *gs)
+{
+	struct sock *sk = gs->sock->sk;
+	sa_family_t sa_family = sk->sk_family;
+	int err;
+
+	if (sa_family == AF_INET) {
+		err = udp_add_offload(&gs->udp_offloads);
+		if (err)
+			pr_warn("geneve: udp_add_offload failed with status %d\n",
+				err);
+	}
+}
+
+static int geneve_hlen(struct genevehdr *gh)
+{
+	return sizeof(*gh) + gh->opt_len * 4;
+}
+
+static struct sk_buff **geneve_gro_receive(struct sk_buff **head,
+					   struct sk_buff *skb,
+					   struct udp_offload *uoff)
+{
+	struct sk_buff *p, **pp = NULL;
+	struct genevehdr *gh, *gh2;
+	unsigned int hlen, gh_len, off_gnv;
+	const struct packet_offload *ptype;
+	__be16 type;
+	int flush = 1;
+
+	off_gnv = skb_gro_offset(skb);
+	hlen = off_gnv + sizeof(*gh);
+	gh = skb_gro_header_fast(skb, off_gnv);
+	if (skb_gro_header_hard(skb, hlen)) {
+		gh = skb_gro_header_slow(skb, hlen, off_gnv);
+		if (unlikely(!gh))
+			goto out;
+	}
+
+	if (gh->ver != GENEVE_VER || gh->oam)
+		goto out;
+	gh_len = geneve_hlen(gh);
+
+	hlen = off_gnv + gh_len;
+	if (skb_gro_header_hard(skb, hlen)) {
+		gh = skb_gro_header_slow(skb, hlen, off_gnv);
+		if (unlikely(!gh))
+			goto out;
+	}
+
+	flush = 0;
+
+	for (p = *head; p; p = p->next) {
+		if (!NAPI_GRO_CB(p)->same_flow)
+			continue;
+
+		gh2 = (struct genevehdr *)(p->data + off_gnv);
+		if (gh->opt_len != gh2->opt_len ||
+		    memcmp(gh, gh2, gh_len)) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+	}
+
+	type = gh->proto_type;
+
+	rcu_read_lock();
+	ptype = gro_find_receive_by_type(type);
+	if (!ptype) {
+		flush = 1;
+		goto out_unlock;
+	}
+
+	skb_gro_pull(skb, gh_len);
+	skb_gro_postpull_rcsum(skb, gh, gh_len);
+	pp = ptype->callbacks.gro_receive(head, skb);
+
+out_unlock:
+	rcu_read_unlock();
+out:
+	NAPI_GRO_CB(skb)->flush |= flush;
+
+	return pp;
+}
+
+static int geneve_gro_complete(struct sk_buff *skb, int nhoff,
+			       struct udp_offload *uoff)
+{
+	struct genevehdr *gh;
+	struct packet_offload *ptype;
+	__be16 type;
+	int gh_len;
+	int err = -ENOSYS;
+
+	udp_tunnel_gro_complete(skb, nhoff);
+
+	gh = (struct genevehdr *)(skb->data + nhoff);
+	gh_len = geneve_hlen(gh);
+	type = gh->proto_type;
+
+	rcu_read_lock();
+	ptype = gro_find_complete_by_type(type);
+	if (ptype)
+		err = ptype->callbacks.gro_complete(skb, nhoff + gh_len);
+
+	rcu_read_unlock();
+	return err;
+}
+
+/* Create new listen socket if needed */
+static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
+						bool ipv6)
+{
+	struct geneve_net *gn = net_generic(net, geneve_net_id);
+	struct geneve_sock *gs;
+	struct socket *sock;
+	struct udp_tunnel_sock_cfg tunnel_cfg;
+
+	gs = kzalloc(sizeof(*gs), GFP_KERNEL);
+	if (!gs)
+		return ERR_PTR(-ENOMEM);
+
+	sock = geneve_create_sock(net, ipv6, port);
+	if (IS_ERR(sock)) {
+		kfree(gs);
+		return ERR_CAST(sock);
+	}
+
+	gs->sock = sock;
+	gs->refcnt = 1;
+	gs->gn = gn;
+
+	/* Initialize the geneve udp offloads structure */
+	gs->udp_offloads.port = port;
+	gs->udp_offloads.callbacks.gro_receive  = geneve_gro_receive;
+	gs->udp_offloads.callbacks.gro_complete = geneve_gro_complete;
+	geneve_notify_add_rx_port(gs);
+
+	/* Mark socket as an encapsulation socket */
+	tunnel_cfg.sk_user_data = gs;
+	tunnel_cfg.encap_type = 1;
+	tunnel_cfg.encap_rcv = geneve_udp_encap_recv;
+	tunnel_cfg.encap_destroy = NULL;
+	setup_udp_tunnel_sock(net, sock, &tunnel_cfg);
+
+	list_add(&gs->list, &gn->sock_list);
+	return gs;
+}
+
+static void geneve_notify_del_rx_port(struct geneve_sock *gs)
+{
+	struct sock *sk = gs->sock->sk;
+	sa_family_t sa_family = sk->sk_family;
+
+	if (sa_family == AF_INET)
+		udp_del_offload(&gs->udp_offloads);
+}
+
+static void geneve_sock_release(struct geneve_sock *gs)
+{
+	if (--gs->refcnt)
+		return;
+
+	list_del(&gs->list);
+	geneve_notify_del_rx_port(gs);
+	udp_tunnel_sock_release(gs->sock);
+	kfree_rcu(gs, rcu);
+}
+
 static int geneve_open(struct net_device *dev)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
 	struct net *net = geneve->net;
-	struct geneve_net *gn = net_generic(geneve->net, geneve_net_id);
+	struct geneve_net *gn = net_generic(net, geneve_net_id);
 	struct geneve_sock *gs;
 
-	gs = geneve_sock_add(net, geneve->dst_port, geneve_rx, gn,
-	                     false, false);
+	list_for_each_entry(gs, &gn->sock_list, list) {
+		if (inet_sk(gs->sock->sk)->inet_sport == geneve->dst_port &&
+		    inet_sk(gs->sock->sk)->sk.sk_family == AF_INET) {
+			gs->refcnt++;
+			goto out;
+		}
+	}
+
+	gs = geneve_socket_create(net, geneve->dst_port, false);
 	if (IS_ERR(gs))
 		return PTR_ERR(gs);
 
+out:
+	if (geneve->collect_md)
+		gs->collect_md = true;
 	geneve->sock = gs;
-
 	return 0;
 }
 
@@ -229,9 +502,58 @@ static int geneve_stop(struct net_device *dev)
 	struct geneve_dev *geneve = netdev_priv(dev);
 	struct geneve_sock *gs = geneve->sock;
 
+	if (geneve->collect_md)
+		gs->collect_md = false;
+
 	geneve_sock_release(gs);
+	return 0;
+}
+
+static void geneve_build_header(struct genevehdr *geneveh,
+				__be16 tun_flags, u8 vni[3],
+				u8 options_len, u8 *options)
+{
+	geneveh->ver = GENEVE_VER;
+	geneveh->opt_len = options_len / 4;
+	geneveh->oam = !!(tun_flags & TUNNEL_OAM);
+	geneveh->critical = !!(tun_flags & TUNNEL_CRIT_OPT);
+	geneveh->rsvd1 = 0;
+	memcpy(geneveh->vni, vni, 3);
+	geneveh->proto_type = htons(ETH_P_TEB);
+	geneveh->rsvd2 = 0;
+
+	memcpy(geneveh->options, options, options_len);
+}
+
+static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
+			    __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
+			    bool csum)
+{
+	struct genevehdr *gnvh;
+	int min_headroom;
+	int err;
+
+	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
+			+ GENEVE_BASE_HLEN + opt_len + sizeof(struct iphdr);
+	err = skb_cow_head(skb, min_headroom);
+	if (unlikely(err))
+		goto free_skb;
 
+	skb = udp_tunnel_handle_offloads(skb, csum);
+	if (IS_ERR(skb))
+		goto free_rt;
+
+	gnvh = (struct genevehdr *)__skb_push(skb, sizeof(*gnvh) + opt_len);
+	geneve_build_header(gnvh, tun_flags, vni, opt_len, opt);
+
+	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 	return 0;
+
+free_skb:
+	ip_rt_put(rt);
+free_rt:
+	kfree_skb(skb);
+	return err;
 }
 
 static struct rtable *geneve_get_rt(struct sk_buff *skb,
@@ -275,7 +597,6 @@ static struct rtable *geneve_get_rt(struct sk_buff *skb,
 		ip_rt_put(rt);
 		return ERR_PTR(-EINVAL);
 	}
-
 	return rt;
 }
 
@@ -299,15 +620,13 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct geneve_sock *gs = geneve->sock;
 	struct ip_tunnel_info *info = NULL;
 	struct rtable *rt = NULL;
-	const struct iphdr *iip; /* interior IP header */
 	struct flowi4 fl4;
 	__u8 tos, ttl;
 	__be16 sport;
-	bool xnet;
+	bool xnet, udp_csum;
+	__be16 df;
 	int err;
 
-	sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
-
 	if (geneve->collect_md) {
 		info = skb_tunnel_info(skb, AF_INET);
 		if (unlikely(info && info->mode != IP_TUNNEL_INFO_TX)) {
@@ -323,45 +642,51 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
 		dev->stats.tx_carrier_errors++;
 		goto tx_error;
 	}
+
+	sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
 	skb_reset_mac_header(skb);
-	xnet = !net_eq(geneve->net, dev_net(geneve->dev));
 
 	if (info) {
 		const struct ip_tunnel_key *key = &info->key;
-		bool udp_csum;
 		u8 *opts = NULL;
 		u8 vni[3];
-		__be16 df;
 
 		tunnel_id_to_vni(key->tun_id, vni);
-		df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
-		udp_csum = !!(key->tun_flags & TUNNEL_CSUM);
-
 		if (key->tun_flags & TUNNEL_GENEVE_OPT)
 			opts = ip_tunnel_info_opts(info, info->options_len);
 
-		err = geneve_xmit_skb(gs, rt, skb, fl4.saddr, fl4.daddr,
-				      key->ipv4_tos, key->ipv4_ttl, df,
-				      sport, geneve->dst_port,
-				      key->tun_flags, vni,
-				      info->options_len, opts, udp_csum, xnet);
+		udp_csum = !!(key->tun_flags & TUNNEL_CSUM);
+		err = geneve_build_skb(rt, skb, key->tun_flags, vni,
+				       info->options_len, opts, udp_csum);
+		if (unlikely(err))
+			return NETDEV_TX_OK;
+
+		tos = key->ipv4_tos;
+		ttl = key->ipv4_ttl;
+		df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
+		xnet = false;
 	} else {
+		const struct iphdr *iip; /* interior IP header */
+
+		udp_csum = false;
+		err = geneve_build_skb(rt, skb, 0, geneve->vni,
+				       0, NULL, udp_csum);
+		if (unlikely(err))
+			return NETDEV_TX_OK;
+
 		iip = ip_hdr(skb);
 		tos = ip_tunnel_ecn_encap(fl4.flowi4_tos, iip, skb);
-
 		ttl = geneve->ttl;
 		if (!ttl && IN_MULTICAST(ntohl(fl4.daddr)))
 			ttl = 1;
-
 		ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
-
-		/* no need to handle local destination and encap bypass...yet... */
-		err = geneve_xmit_skb(gs, rt, skb, fl4.saddr, fl4.daddr, tos,
-				      ttl, 0, sport, geneve->dst_port, 0,
-				      geneve->vni, 0, NULL, false, xnet);
+		df = 0;
+		xnet = !net_eq(geneve->net, dev_net(geneve->dev));
 	}
-	if (err < 0)
-		ip_rt_put(rt);
+
+	err = udp_tunnel_xmit_skb(rt, gs->sock->sk, skb, fl4.saddr, fl4.daddr,
+				  tos, ttl, df, sport, geneve->dst_port,
+				  xnet, !udp_csum);
 
 	iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
 	return NETDEV_TX_OK;
@@ -467,15 +792,10 @@ static int geneve_configure(struct net *net, struct net_device *dev,
 			    __u16 dst_port, bool metadata)
 {
 	struct geneve_net *gn = net_generic(net, geneve_net_id);
-	struct geneve_dev *dummy, *geneve = netdev_priv(dev);
-	struct hlist_head *vni_list_head;
-	struct sockaddr_in remote;	/* IPv4 address for link partner */
+	struct geneve_dev *t, *geneve = netdev_priv(dev);
 	__u32 hash;
 	int err;
 
-	if (metadata && rtnl_dereference(gn->collect_md_tun))
-		return -EEXIST;
-
 	geneve->net = net;
 	geneve->dev = dev;
 
@@ -487,16 +807,9 @@ static int geneve_configure(struct net *net, struct net_device *dev,
 	if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr)))
 		return -EINVAL;
 
-	remote = geneve->remote;
-	hash = geneve_net_vni_hash(geneve->vni);
-	vni_list_head = &gn->vni_list[hash];
-	hlist_for_each_entry_rcu(dummy, vni_list_head, hlist) {
-		if (!memcmp(geneve->vni, dummy->vni, sizeof(dummy->vni)) &&
-		    !memcmp(&remote, &dummy->remote, sizeof(dummy->remote)) &&
-		    htons(dst_port) == dummy->dst_port) {
-			return -EBUSY;
-		}
-	}
+	t = geneve_lookup(gn, htons(dst_port), rem_addr, geneve->vni);
+	if (t)
+		return -EBUSY;
 
 	geneve->ttl = ttl;
 	geneve->tos = tos;
@@ -508,10 +821,8 @@ static int geneve_configure(struct net *net, struct net_device *dev,
 		return err;
 
 	list_add(&geneve->next, &gn->geneve_list);
+	hash = geneve_net_vni_hash(geneve->vni);
 	hlist_add_head_rcu(&geneve->hlist, &gn->vni_list[hash]);
-
-	if (geneve->collect_md)
-		rcu_assign_pointer(gn->collect_md_tun, geneve);
 	return 0;
 }
 
@@ -550,12 +861,6 @@ static void geneve_dellink(struct net_device *dev, struct list_head *head)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
 
-	if (geneve->collect_md) {
-		struct geneve_net *gn = net_generic(geneve->net, geneve_net_id);
-
-		rcu_assign_pointer(gn->collect_md_tun, NULL);
-	}
-
 	if (!hlist_unhashed(&geneve->hlist))
 		hlist_del_rcu(&geneve->hlist);
 
@@ -647,6 +952,7 @@ static __net_init int geneve_init_net(struct net *net)
 
 	INIT_LIST_HEAD(&gn->geneve_list);
 
+	INIT_LIST_HEAD(&gn->sock_list);
 	for (h = 0; h < VNI_HASH_SIZE; ++h)
 		INIT_HLIST_HEAD(&gn->vni_list[h]);
 
diff --git a/include/net/geneve.h b/include/net/geneve.h
index 4245e1d..3106ed6 100644
--- a/include/net/geneve.h
+++ b/include/net/geneve.h
@@ -62,41 +62,7 @@ struct genevehdr {
 	struct geneve_opt options[];
 };
 
-static inline struct genevehdr *geneve_hdr(const struct sk_buff *skb)
-{
-	return (struct genevehdr *)(udp_hdr(skb) + 1);
-}
-
 #ifdef CONFIG_INET
-struct geneve_sock;
-
-typedef void (geneve_rcv_t)(struct geneve_sock *gs, struct sk_buff *skb);
-
-struct geneve_sock {
-	struct list_head	list;
-	geneve_rcv_t		*rcv;
-	void			*rcv_data;
-	struct socket		*sock;
-	struct rcu_head		rcu;
-	int			refcnt;
-	struct udp_offload	udp_offloads;
-};
-
-#define GENEVE_VER 0
-#define GENEVE_BASE_HLEN (sizeof(struct udphdr) + sizeof(struct genevehdr))
-
-struct geneve_sock *geneve_sock_add(struct net *net, __be16 port,
-				    geneve_rcv_t *rcv, void *data,
-				    bool no_share, bool ipv6);
-
-void geneve_sock_release(struct geneve_sock *vs);
-
-int geneve_xmit_skb(struct geneve_sock *gs, struct rtable *rt,
-		    struct sk_buff *skb, __be32 src, __be32 dst, __u8 tos,
-		    __u8 ttl, __be16 df, __be16 src_port, __be16 dst_port,
-		    __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
-		    bool csum, bool xnet);
-
 struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
 					u8 name_assign_type, u16 dst_port);
 #endif /*ifdef CONFIG_INET */
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 6fb3c90..416dfa0 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -331,20 +331,6 @@ config NET_FOU_IP_TUNNELS
 	  When this option is enabled IP tunnels can be configured to use
 	  FOU or GUE encapsulation.
 
-config GENEVE_CORE
-	tristate "Generic Network Virtualization Encapsulation library"
-	depends on INET
-	select NET_UDP_TUNNEL
-	---help---
-	This allows one to create Geneve virtual interfaces that provide
-	Layer 2 Networks over Layer 3 Networks. Geneve is often used
-	to tunnel virtual network infrastructure in virtualized environments.
-	For more information see:
-	  http://tools.ietf.org/html/draft-gross-geneve-01
-
-	  To compile this driver as a module, choose M here: the module
-
-
 config INET_AH
 	tristate "IP: AH transformation"
 	select XFRM_ALGO
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index efc43f3..89aacb6 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -57,7 +57,6 @@ obj-$(CONFIG_TCP_CONG_YEAH) += tcp_yeah.o
 obj-$(CONFIG_TCP_CONG_ILLINOIS) += tcp_illinois.o
 obj-$(CONFIG_MEMCG_KMEM) += tcp_memcontrol.o
 obj-$(CONFIG_NETLABEL) += cipso_ipv4.o
-obj-$(CONFIG_GENEVE_CORE) += geneve_core.o
 
 obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \
 		      xfrm4_output.o xfrm4_protocol.o
diff --git a/net/ipv4/geneve_core.c b/net/ipv4/geneve_core.c
deleted file mode 100644
index 311a4ba..0000000
--- a/net/ipv4/geneve_core.c
+++ /dev/null
@@ -1,447 +0,0 @@
-/*
- * Geneve: Generic Network Virtualization Encapsulation
- *
- * Copyright (c) 2014 Nicira, Inc.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/kernel.h>
-#include <linux/types.h>
-#include <linux/module.h>
-#include <linux/errno.h>
-#include <linux/slab.h>
-#include <linux/skbuff.h>
-#include <linux/list.h>
-#include <linux/netdevice.h>
-#include <linux/in.h>
-#include <linux/ip.h>
-#include <linux/udp.h>
-#include <linux/igmp.h>
-#include <linux/etherdevice.h>
-#include <linux/if_ether.h>
-#include <linux/if_vlan.h>
-#include <linux/ethtool.h>
-#include <linux/mutex.h>
-#include <net/arp.h>
-#include <net/ndisc.h>
-#include <net/ip.h>
-#include <net/ip_tunnels.h>
-#include <net/icmp.h>
-#include <net/udp.h>
-#include <net/rtnetlink.h>
-#include <net/route.h>
-#include <net/dsfield.h>
-#include <net/inet_ecn.h>
-#include <net/net_namespace.h>
-#include <net/netns/generic.h>
-#include <net/geneve.h>
-#include <net/protocol.h>
-#include <net/udp_tunnel.h>
-#if IS_ENABLED(CONFIG_IPV6)
-#include <net/ipv6.h>
-#include <net/addrconf.h>
-#include <net/ip6_tunnel.h>
-#include <net/ip6_checksum.h>
-#endif
-
-/* Protects sock_list and refcounts. */
-static DEFINE_MUTEX(geneve_mutex);
-
-/* per-network namespace private data for this module */
-struct geneve_net {
-	struct list_head	sock_list;
-};
-
-static int geneve_net_id;
-
-static struct geneve_sock *geneve_find_sock(struct net *net,
-					    sa_family_t family, __be16 port)
-{
-	struct geneve_net *gn = net_generic(net, geneve_net_id);
-	struct geneve_sock *gs;
-
-	list_for_each_entry(gs, &gn->sock_list, list) {
-		if (inet_sk(gs->sock->sk)->inet_sport == port &&
-		    inet_sk(gs->sock->sk)->sk.sk_family == family)
-			return gs;
-	}
-
-	return NULL;
-}
-
-static void geneve_build_header(struct genevehdr *geneveh,
-				__be16 tun_flags, u8 vni[3],
-				u8 options_len, u8 *options)
-{
-	geneveh->ver = GENEVE_VER;
-	geneveh->opt_len = options_len / 4;
-	geneveh->oam = !!(tun_flags & TUNNEL_OAM);
-	geneveh->critical = !!(tun_flags & TUNNEL_CRIT_OPT);
-	geneveh->rsvd1 = 0;
-	memcpy(geneveh->vni, vni, 3);
-	geneveh->proto_type = htons(ETH_P_TEB);
-	geneveh->rsvd2 = 0;
-
-	memcpy(geneveh->options, options, options_len);
-}
-
-/* Transmit a fully formatted Geneve frame.
- *
- * When calling this function. The skb->data should point
- * to the geneve header which is fully formed.
- *
- * This function will add other UDP tunnel headers.
- */
-int geneve_xmit_skb(struct geneve_sock *gs, struct rtable *rt,
-		    struct sk_buff *skb, __be32 src, __be32 dst, __u8 tos,
-		    __u8 ttl, __be16 df, __be16 src_port, __be16 dst_port,
-		    __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
-		    bool csum, bool xnet)
-{
-	struct genevehdr *gnvh;
-	int min_headroom;
-	int err;
-
-	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
-			+ GENEVE_BASE_HLEN + opt_len + sizeof(struct iphdr)
-			+ (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
-
-	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err)) {
-		kfree_skb(skb);
-		return err;
-	}
-
-	skb = vlan_hwaccel_push_inside(skb);
-	if (unlikely(!skb))
-		return -ENOMEM;
-
-	skb = udp_tunnel_handle_offloads(skb, csum);
-	if (IS_ERR(skb))
-		return PTR_ERR(skb);
-
-	gnvh = (struct genevehdr *)__skb_push(skb, sizeof(*gnvh) + opt_len);
-	geneve_build_header(gnvh, tun_flags, vni, opt_len, opt);
-
-	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
-
-	return udp_tunnel_xmit_skb(rt, gs->sock->sk, skb, src, dst,
-				   tos, ttl, df, src_port, dst_port, xnet,
-				   !csum);
-}
-EXPORT_SYMBOL_GPL(geneve_xmit_skb);
-
-static int geneve_hlen(struct genevehdr *gh)
-{
-	return sizeof(*gh) + gh->opt_len * 4;
-}
-
-static struct sk_buff **geneve_gro_receive(struct sk_buff **head,
-					   struct sk_buff *skb,
-					   struct udp_offload *uoff)
-{
-	struct sk_buff *p, **pp = NULL;
-	struct genevehdr *gh, *gh2;
-	unsigned int hlen, gh_len, off_gnv;
-	const struct packet_offload *ptype;
-	__be16 type;
-	int flush = 1;
-
-	off_gnv = skb_gro_offset(skb);
-	hlen = off_gnv + sizeof(*gh);
-	gh = skb_gro_header_fast(skb, off_gnv);
-	if (skb_gro_header_hard(skb, hlen)) {
-		gh = skb_gro_header_slow(skb, hlen, off_gnv);
-		if (unlikely(!gh))
-			goto out;
-	}
-
-	if (gh->ver != GENEVE_VER || gh->oam)
-		goto out;
-	gh_len = geneve_hlen(gh);
-
-	hlen = off_gnv + gh_len;
-	if (skb_gro_header_hard(skb, hlen)) {
-		gh = skb_gro_header_slow(skb, hlen, off_gnv);
-		if (unlikely(!gh))
-			goto out;
-	}
-
-	flush = 0;
-
-	for (p = *head; p; p = p->next) {
-		if (!NAPI_GRO_CB(p)->same_flow)
-			continue;
-
-		gh2 = (struct genevehdr *)(p->data + off_gnv);
-		if (gh->opt_len != gh2->opt_len ||
-		    memcmp(gh, gh2, gh_len)) {
-			NAPI_GRO_CB(p)->same_flow = 0;
-			continue;
-		}
-	}
-
-	type = gh->proto_type;
-
-	rcu_read_lock();
-	ptype = gro_find_receive_by_type(type);
-	if (!ptype) {
-		flush = 1;
-		goto out_unlock;
-	}
-
-	skb_gro_pull(skb, gh_len);
-	skb_gro_postpull_rcsum(skb, gh, gh_len);
-	pp = ptype->callbacks.gro_receive(head, skb);
-
-out_unlock:
-	rcu_read_unlock();
-out:
-	NAPI_GRO_CB(skb)->flush |= flush;
-
-	return pp;
-}
-
-static int geneve_gro_complete(struct sk_buff *skb, int nhoff,
-			       struct udp_offload *uoff)
-{
-	struct genevehdr *gh;
-	struct packet_offload *ptype;
-	__be16 type;
-	int gh_len;
-	int err = -ENOSYS;
-
-	udp_tunnel_gro_complete(skb, nhoff);
-
-	gh = (struct genevehdr *)(skb->data + nhoff);
-	gh_len = geneve_hlen(gh);
-	type = gh->proto_type;
-
-	rcu_read_lock();
-	ptype = gro_find_complete_by_type(type);
-	if (ptype)
-		err = ptype->callbacks.gro_complete(skb, nhoff + gh_len);
-
-	rcu_read_unlock();
-	return err;
-}
-
-static void geneve_notify_add_rx_port(struct geneve_sock *gs)
-{
-	struct sock *sk = gs->sock->sk;
-	sa_family_t sa_family = sk->sk_family;
-	int err;
-
-	if (sa_family == AF_INET) {
-		err = udp_add_offload(&gs->udp_offloads);
-		if (err)
-			pr_warn("geneve: udp_add_offload failed with status %d\n",
-				err);
-	}
-}
-
-static void geneve_notify_del_rx_port(struct geneve_sock *gs)
-{
-	struct sock *sk = gs->sock->sk;
-	sa_family_t sa_family = sk->sk_family;
-
-	if (sa_family == AF_INET)
-		udp_del_offload(&gs->udp_offloads);
-}
-
-/* Callback from net/ipv4/udp.c to receive packets */
-static int geneve_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
-{
-	struct genevehdr *geneveh;
-	struct geneve_sock *gs;
-	int opts_len;
-
-	/* Need Geneve and inner Ethernet header to be present */
-	if (unlikely(!pskb_may_pull(skb, GENEVE_BASE_HLEN)))
-		goto error;
-
-	/* Return packets with reserved bits set */
-	geneveh = geneve_hdr(skb);
-
-	if (unlikely(geneveh->ver != GENEVE_VER))
-		goto error;
-
-	if (unlikely(geneveh->proto_type != htons(ETH_P_TEB)))
-		goto error;
-
-	opts_len = geneveh->opt_len * 4;
-	if (iptunnel_pull_header(skb, GENEVE_BASE_HLEN + opts_len,
-				 htons(ETH_P_TEB)))
-		goto drop;
-
-	gs = rcu_dereference_sk_user_data(sk);
-	if (!gs)
-		goto drop;
-
-	gs->rcv(gs, skb);
-	return 0;
-
-drop:
-	/* Consume bad packet */
-	kfree_skb(skb);
-	return 0;
-
-error:
-	/* Let the UDP layer deal with the skb */
-	return 1;
-}
-
-static struct socket *geneve_create_sock(struct net *net, bool ipv6,
-					 __be16 port)
-{
-	struct socket *sock;
-	struct udp_port_cfg udp_conf;
-	int err;
-
-	memset(&udp_conf, 0, sizeof(udp_conf));
-
-	if (ipv6) {
-		udp_conf.family = AF_INET6;
-	} else {
-		udp_conf.family = AF_INET;
-		udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
-	}
-
-	udp_conf.local_udp_port = port;
-
-	/* Open UDP socket */
-	err = udp_sock_create(net, &udp_conf, &sock);
-	if (err < 0)
-		return ERR_PTR(err);
-
-	return sock;
-}
-
-/* Create new listen socket if needed */
-static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
-						geneve_rcv_t *rcv, void *data,
-						bool ipv6)
-{
-	struct geneve_net *gn = net_generic(net, geneve_net_id);
-	struct geneve_sock *gs;
-	struct socket *sock;
-	struct udp_tunnel_sock_cfg tunnel_cfg;
-
-	gs = kzalloc(sizeof(*gs), GFP_KERNEL);
-	if (!gs)
-		return ERR_PTR(-ENOMEM);
-
-	sock = geneve_create_sock(net, ipv6, port);
-	if (IS_ERR(sock)) {
-		kfree(gs);
-		return ERR_CAST(sock);
-	}
-
-	gs->sock = sock;
-	gs->refcnt = 1;
-	gs->rcv = rcv;
-	gs->rcv_data = data;
-
-	/* Initialize the geneve udp offloads structure */
-	gs->udp_offloads.port = port;
-	gs->udp_offloads.callbacks.gro_receive  = geneve_gro_receive;
-	gs->udp_offloads.callbacks.gro_complete = geneve_gro_complete;
-	geneve_notify_add_rx_port(gs);
-
-	/* Mark socket as an encapsulation socket */
-	tunnel_cfg.sk_user_data = gs;
-	tunnel_cfg.encap_type = 1;
-	tunnel_cfg.encap_rcv = geneve_udp_encap_recv;
-	tunnel_cfg.encap_destroy = NULL;
-	setup_udp_tunnel_sock(net, sock, &tunnel_cfg);
-
-	list_add(&gs->list, &gn->sock_list);
-
-	return gs;
-}
-
-struct geneve_sock *geneve_sock_add(struct net *net, __be16 port,
-				    geneve_rcv_t *rcv, void *data,
-				    bool no_share, bool ipv6)
-{
-	struct geneve_sock *gs;
-
-	mutex_lock(&geneve_mutex);
-
-	gs = geneve_find_sock(net, ipv6 ? AF_INET6 : AF_INET, port);
-	if (gs) {
-		if (!no_share && gs->rcv == rcv)
-			gs->refcnt++;
-		else
-			gs = ERR_PTR(-EBUSY);
-	} else {
-		gs = geneve_socket_create(net, port, rcv, data, ipv6);
-	}
-
-	mutex_unlock(&geneve_mutex);
-
-	return gs;
-}
-EXPORT_SYMBOL_GPL(geneve_sock_add);
-
-void geneve_sock_release(struct geneve_sock *gs)
-{
-	mutex_lock(&geneve_mutex);
-
-	if (--gs->refcnt)
-		goto unlock;
-
-	list_del(&gs->list);
-	geneve_notify_del_rx_port(gs);
-	udp_tunnel_sock_release(gs->sock);
-	kfree_rcu(gs, rcu);
-
-unlock:
-	mutex_unlock(&geneve_mutex);
-}
-EXPORT_SYMBOL_GPL(geneve_sock_release);
-
-static __net_init int geneve_init_net(struct net *net)
-{
-	struct geneve_net *gn = net_generic(net, geneve_net_id);
-
-	INIT_LIST_HEAD(&gn->sock_list);
-
-	return 0;
-}
-
-static struct pernet_operations geneve_net_ops = {
-	.init = geneve_init_net,
-	.id   = &geneve_net_id,
-	.size = sizeof(struct geneve_net),
-};
-
-static int __init geneve_init_module(void)
-{
-	int rc;
-
-	rc = register_pernet_subsys(&geneve_net_ops);
-	if (rc)
-		return rc;
-
-	pr_info("Geneve core logic\n");
-
-	return 0;
-}
-module_init(geneve_init_module);
-
-static void __exit geneve_cleanup_module(void)
-{
-	unregister_pernet_subsys(&geneve_net_ops);
-}
-module_exit(geneve_cleanup_module);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Jesse Gross <jesse@nicira.com>");
-MODULE_DESCRIPTION("Driver library for GENEVE encapsulated traffic");
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next v2 8/9] geneve: Move device hash table to geneve socket.
  2015-08-17 21:11 [PATCH net-next v2 0/9] Geneve: Add support for tunnel metadata mode Pravin B Shelar
                   ` (6 preceding siblings ...)
  2015-08-17 21:11 ` [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module Pravin B Shelar
@ 2015-08-17 21:11 ` Pravin B Shelar
  2015-08-19 19:05   ` Jesse Gross
  2015-08-17 21:11 ` [PATCH net-next v2 9/9] geneve: Implement rtnl changelink Pravin B Shelar
  8 siblings, 1 reply; 33+ messages in thread
From: Pravin B Shelar @ 2015-08-17 21:11 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar

This change simplifies Geneve Tunnel hash table management.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 drivers/net/geneve.c | 76 ++++++++++++++++++++++++++++------------------------
 1 file changed, 41 insertions(+), 35 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index eb298ff..e47cdd9 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -40,7 +40,6 @@ MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 /* per-network namespace private data for this module */
 struct geneve_net {
 	struct list_head	geneve_list;
-	struct hlist_head	vni_list[VNI_HASH_SIZE];
 	struct list_head	sock_list;
 };
 
@@ -63,12 +62,12 @@ struct geneve_dev {
 
 struct geneve_sock {
 	bool			collect_md;
-	struct geneve_net	*gn;
 	struct list_head	list;
 	struct socket		*sock;
 	struct rcu_head		rcu;
 	int			refcnt;
 	struct udp_offload	udp_offloads;
+	struct hlist_head	vni_list[VNI_HASH_SIZE];
 };
 
 static inline __u32 geneve_net_vni_hash(u8 vni[3])
@@ -90,7 +89,7 @@ static __be64 vni_to_tunnel_id(const __u8 *vni)
 #endif
 }
 
-static struct geneve_dev *geneve_lookup(struct geneve_net *gn, __be16 port,
+static struct geneve_dev *geneve_lookup(struct geneve_sock *gs,
 					__be32 addr, u8 vni[])
 {
 	struct hlist_head *vni_list_head;
@@ -99,13 +98,11 @@ static struct geneve_dev *geneve_lookup(struct geneve_net *gn, __be16 port,
 
 	/* Find the device for this VNI */
 	hash = geneve_net_vni_hash(vni);
-	vni_list_head = &gn->vni_list[hash];
+	vni_list_head = &gs->vni_list[hash];
 	hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) {
 		if (!memcmp(vni, geneve->vni, sizeof(geneve->vni)) &&
-		    addr == geneve->remote.sin_addr.s_addr &&
-		    port == geneve->dst_port) {
+		    addr == geneve->remote.sin_addr.s_addr)
 			return geneve;
-		}
 	}
 	return NULL;
 }
@@ -118,9 +115,7 @@ static inline struct genevehdr *geneve_hdr(const struct sk_buff *skb)
 /* geneve receive/decap routine */
 static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 {
-	struct inet_sock *sk = inet_sk(gs->sock->sk);
 	struct genevehdr *gnvh = geneve_hdr(skb);
-	struct geneve_net *gn = gs->gn;
 	struct metadata_dst *tun_dst = NULL;
 	struct geneve_dev *geneve = NULL;
 	struct pcpu_sw_netstats *stats;
@@ -130,8 +125,6 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 	bool xnet;
 	int err;
 
-	iph = ip_hdr(skb); /* Still outer IP header... */
-
 	if (gs->collect_md) {
 		static u8 zero_vni[3];
 
@@ -139,10 +132,11 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 		addr = 0;
 	} else {
 		vni = gnvh->vni;
+		iph = ip_hdr(skb); /* Still outer IP header... */
 		addr = iph->saddr;
 	}
 
-	geneve = geneve_lookup(gn, sk->inet_sport, addr, vni);
+	geneve = geneve_lookup(gs, addr, vni);
 	if (!geneve)
 		goto drop;
 
@@ -419,6 +413,7 @@ static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
 	struct geneve_sock *gs;
 	struct socket *sock;
 	struct udp_tunnel_sock_cfg tunnel_cfg;
+	int h;
 
 	gs = kzalloc(sizeof(*gs), GFP_KERNEL);
 	if (!gs)
@@ -432,7 +427,8 @@ static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
 
 	gs->sock = sock;
 	gs->refcnt = 1;
-	gs->gn = gn;
+	for (h = 0; h < VNI_HASH_SIZE; ++h)
+		INIT_HLIST_HEAD(&gs->vni_list[h]);
 
 	/* Initialize the geneve udp offloads structure */
 	gs->udp_offloads.port = port;
@@ -446,7 +442,6 @@ static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
 	tunnel_cfg.encap_rcv = geneve_udp_encap_recv;
 	tunnel_cfg.encap_destroy = NULL;
 	setup_udp_tunnel_sock(net, sock, &tunnel_cfg);
-
 	list_add(&gs->list, &gn->sock_list);
 	return gs;
 }
@@ -471,19 +466,32 @@ static void geneve_sock_release(struct geneve_sock *gs)
 	kfree_rcu(gs, rcu);
 }
 
+static struct geneve_sock *geneve_find_sock(struct geneve_net *gn,
+					    __be16 dst_port)
+{
+	struct geneve_sock *gs;
+
+	list_for_each_entry(gs, &gn->sock_list, list) {
+		if (inet_sk(gs->sock->sk)->inet_sport == dst_port &&
+		    inet_sk(gs->sock->sk)->sk.sk_family == AF_INET) {
+			return gs;
+		}
+	}
+	return NULL;
+}
+
 static int geneve_open(struct net_device *dev)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
 	struct net *net = geneve->net;
 	struct geneve_net *gn = net_generic(net, geneve_net_id);
 	struct geneve_sock *gs;
+	__u32 hash;
 
-	list_for_each_entry(gs, &gn->sock_list, list) {
-		if (inet_sk(gs->sock->sk)->inet_sport == geneve->dst_port &&
-		    inet_sk(gs->sock->sk)->sk.sk_family == AF_INET) {
-			gs->refcnt++;
-			goto out;
-		}
+	gs = geneve_find_sock(gn, geneve->dst_port);
+	if (gs) {
+		gs->refcnt++;
+		goto out;
 	}
 
 	gs = geneve_socket_create(net, geneve->dst_port, false);
@@ -494,6 +502,9 @@ out:
 	if (geneve->collect_md)
 		gs->collect_md = true;
 	geneve->sock = gs;
+
+	hash = geneve_net_vni_hash(geneve->vni);
+	hlist_add_head_rcu(&geneve->hlist, &gs->vni_list[hash]);
 	return 0;
 }
 
@@ -505,6 +516,8 @@ static int geneve_stop(struct net_device *dev)
 	if (geneve->collect_md)
 		gs->collect_md = false;
 
+	if (!hlist_unhashed(&geneve->hlist))
+		hlist_del_rcu(&geneve->hlist);
 	geneve_sock_release(gs);
 	return 0;
 }
@@ -792,8 +805,8 @@ static int geneve_configure(struct net *net, struct net_device *dev,
 			    __u16 dst_port, bool metadata)
 {
 	struct geneve_net *gn = net_generic(net, geneve_net_id);
-	struct geneve_dev *t, *geneve = netdev_priv(dev);
-	__u32 hash;
+	struct geneve_dev *geneve = netdev_priv(dev);
+	struct geneve_dev *t;
 	int err;
 
 	geneve->net = net;
@@ -807,9 +820,12 @@ static int geneve_configure(struct net *net, struct net_device *dev,
 	if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr)))
 		return -EINVAL;
 
-	t = geneve_lookup(gn, htons(dst_port), rem_addr, geneve->vni);
-	if (t)
-		return -EBUSY;
+	list_for_each_entry(t, &gn->geneve_list, next) {
+		if (!memcmp(geneve->vni, t->vni, sizeof(t->vni)) &&
+		    rem_addr == t->remote.sin_addr.s_addr &&
+		    htons(dst_port) == geneve->dst_port)
+			return -EBUSY;
+	}
 
 	geneve->ttl = ttl;
 	geneve->tos = tos;
@@ -821,8 +837,6 @@ static int geneve_configure(struct net *net, struct net_device *dev,
 		return err;
 
 	list_add(&geneve->next, &gn->geneve_list);
-	hash = geneve_net_vni_hash(geneve->vni);
-	hlist_add_head_rcu(&geneve->hlist, &gn->vni_list[hash]);
 	return 0;
 }
 
@@ -861,9 +875,6 @@ static void geneve_dellink(struct net_device *dev, struct list_head *head)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
 
-	if (!hlist_unhashed(&geneve->hlist))
-		hlist_del_rcu(&geneve->hlist);
-
 	list_del(&geneve->next);
 	unregister_netdevice_queue(dev, head);
 }
@@ -948,14 +959,9 @@ EXPORT_SYMBOL_GPL(geneve_dev_create_fb);
 static __net_init int geneve_init_net(struct net *net)
 {
 	struct geneve_net *gn = net_generic(net, geneve_net_id);
-	unsigned int h;
 
 	INIT_LIST_HEAD(&gn->geneve_list);
-
 	INIT_LIST_HEAD(&gn->sock_list);
-	for (h = 0; h < VNI_HASH_SIZE; ++h)
-		INIT_HLIST_HEAD(&gn->vni_list[h]);
-
 	return 0;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next v2 9/9] geneve: Implement rtnl changelink
  2015-08-17 21:11 [PATCH net-next v2 0/9] Geneve: Add support for tunnel metadata mode Pravin B Shelar
                   ` (7 preceding siblings ...)
  2015-08-17 21:11 ` [PATCH net-next v2 8/9] geneve: Move device hash table to geneve socket Pravin B Shelar
@ 2015-08-17 21:11 ` Pravin B Shelar
  2015-08-19 19:40   ` Jesse Gross
  8 siblings, 1 reply; 33+ messages in thread
From: Pravin B Shelar @ 2015-08-17 21:11 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar

Allow run time changes to Geneve device configuration.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 drivers/net/geneve.c | 159 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 115 insertions(+), 44 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index e47cdd9..0d7fbef 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -70,12 +70,18 @@ struct geneve_sock {
 	struct hlist_head	vni_list[VNI_HASH_SIZE];
 };
 
-static inline __u32 geneve_net_vni_hash(u8 vni[3])
+/* Convert 64 bit tunnel ID to 24 bit VNI. */
+static void tunnel_id_to_vni(__be64 tun_id, __u8 *vni)
 {
-	__u32 vnid;
-
-	vnid = (vni[0] << 16) | (vni[1] << 8) | vni[2];
-	return hash_32(vnid, VNI_HASH_BITS);
+#ifdef __BIG_ENDIAN
+	vni[0] = (__force __u8)(tun_id >> 16);
+	vni[1] = (__force __u8)(tun_id >> 8);
+	vni[2] = (__force __u8)tun_id;
+#else
+	vni[0] = (__force __u8)((__force u64)tun_id >> 40);
+	vni[1] = (__force __u8)((__force u64)tun_id >> 48);
+	vni[2] = (__force __u8)((__force u64)tun_id >> 56);
+#endif
 }
 
 static __be64 vni_to_tunnel_id(const __u8 *vni)
@@ -89,8 +95,28 @@ static __be64 vni_to_tunnel_id(const __u8 *vni)
 #endif
 }
 
+static u32 vni_to_u32(const u8 vni[])
+{
+	return (vni[0] << 16) | (vni[1] << 8) | vni[2];
+}
+
+static void u32_to_vni(u32 id, u8 vni[])
+{
+	vni[0] = (id & 0x00ff0000) >> 16;
+	vni[1] = (id & 0x0000ff00) >> 8;
+	vni[2] =  id & 0x000000ff;
+}
+
+static inline __u32 geneve_net_vni_hash(const u8 vni[])
+{
+	__u32 vnid;
+
+	vnid = vni_to_u32(vni);
+	return hash_32(vnid, VNI_HASH_BITS);
+}
+
 static struct geneve_dev *geneve_lookup(struct geneve_sock *gs,
-					__be32 addr, u8 vni[])
+					__be32 addr, const u8 vni[])
 {
 	struct hlist_head *vni_list_head;
 	struct geneve_dev *geneve;
@@ -120,7 +146,7 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 	struct geneve_dev *geneve = NULL;
 	struct pcpu_sw_netstats *stats;
 	struct iphdr *iph;
-	u8 *vni;
+	const u8 *vni;
 	__be32 addr;
 	bool xnet;
 	int err;
@@ -519,6 +545,7 @@ static int geneve_stop(struct net_device *dev)
 	if (!hlist_unhashed(&geneve->hlist))
 		hlist_del_rcu(&geneve->hlist);
 	geneve_sock_release(gs);
+	geneve->sock = NULL;
 	return 0;
 }
 
@@ -613,20 +640,6 @@ static struct rtable *geneve_get_rt(struct sk_buff *skb,
 	return rt;
 }
 
-/* Convert 64 bit tunnel ID to 24 bit VNI. */
-static void tunnel_id_to_vni(__be64 tun_id, __u8 *vni)
-{
-#ifdef __BIG_ENDIAN
-	vni[0] = (__force __u8)(tun_id >> 16);
-	vni[1] = (__force __u8)(tun_id >> 8);
-	vni[2] = (__force __u8)tun_id;
-#else
-	vni[0] = (__force __u8)((__force u64)tun_id >> 40);
-	vni[1] = (__force __u8)((__force u64)tun_id >> 48);
-	vni[2] = (__force __u8)((__force u64)tun_id >> 56);
-#endif
-}
-
 static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
@@ -800,26 +813,22 @@ static int geneve_validate(struct nlattr *tb[], struct nlattr *data[])
 	return 0;
 }
 
-static int geneve_configure(struct net *net, struct net_device *dev,
-			    __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
-			    __u16 dst_port, bool metadata)
+static int __geneve_configure(struct net *net, struct net_device *dev,
+			      __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
+			      __u16 dst_port, bool metadata)
 {
 	struct geneve_net *gn = net_generic(net, geneve_net_id);
 	struct geneve_dev *geneve = netdev_priv(dev);
 	struct geneve_dev *t;
-	int err;
 
 	geneve->net = net;
 	geneve->dev = dev;
 
-	geneve->vni[0] = (vni & 0x00ff0000) >> 16;
-	geneve->vni[1] = (vni & 0x0000ff00) >> 8;
-	geneve->vni[2] =  vni & 0x000000ff;
-
 	geneve->remote.sin_addr.s_addr = rem_addr;
 	if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr)))
 		return -EINVAL;
 
+	u32_to_vni(vni, geneve->vni);
 	list_for_each_entry(t, &gn->geneve_list, next) {
 		if (!memcmp(geneve->vni, t->vni, sizeof(t->vni)) &&
 		    rem_addr == t->remote.sin_addr.s_addr &&
@@ -832,6 +841,22 @@ static int geneve_configure(struct net *net, struct net_device *dev,
 	geneve->dst_port = htons(dst_port);
 	geneve->collect_md = metadata;
 
+	return 0;
+}
+
+static int geneve_configure(struct net *net, struct net_device *dev,
+			    __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
+			    __u32 dst_port, bool metadata)
+{
+	struct geneve_net *gn = net_generic(net, geneve_net_id);
+	struct geneve_dev *geneve = netdev_priv(dev);
+	int err;
+
+	err = __geneve_configure(net, dev, rem_addr, vni, ttl, tos,
+				 dst_port, metadata);
+	if (err)
+		return err;
+
 	err = register_netdevice(dev);
 	if (err)
 		return err;
@@ -840,35 +865,80 @@ static int geneve_configure(struct net *net, struct net_device *dev,
 	return 0;
 }
 
+static void geneve_parse_conf(struct nlattr *data[], __be32 *rem_addr,
+			      __u32 *vni, __u8 *ttl, __u8 *tos,
+			      __u16 *dst_port, bool *metadata)
+{
+	if (data[IFLA_GENEVE_ID])
+		*vni = nla_get_u32(data[IFLA_GENEVE_ID]);
+
+	if (data[IFLA_GENEVE_REMOTE])
+		*rem_addr = nla_get_in_addr(data[IFLA_GENEVE_REMOTE]);
+
+	if (data[IFLA_GENEVE_TTL])
+		*ttl = nla_get_u8(data[IFLA_GENEVE_TTL]);
+
+	if (data[IFLA_GENEVE_TOS])
+		*tos = nla_get_u8(data[IFLA_GENEVE_TOS]);
+
+	if (data[IFLA_GENEVE_PORT])
+		*dst_port = nla_get_u16(data[IFLA_GENEVE_PORT]);
+
+	if (data[IFLA_GENEVE_COLLECT_METADATA])
+		*metadata = true;
+}
+
 static int geneve_newlink(struct net *net, struct net_device *dev,
 			  struct nlattr *tb[], struct nlattr *data[])
 {
 	__u16 dst_port = GENEVE_UDP_PORT;
 	__u8 ttl = 0, tos = 0;
 	bool metadata = false;
-	__be32 rem_addr;
-	__u32 vni;
+	__be32 rem_addr = 0;
+	__u32 vni = 0;
 
 	if (!data[IFLA_GENEVE_ID] || !data[IFLA_GENEVE_REMOTE])
 		return -EINVAL;
 
-	vni = nla_get_u32(data[IFLA_GENEVE_ID]);
-	rem_addr = nla_get_in_addr(data[IFLA_GENEVE_REMOTE]);
+	geneve_parse_conf(data, &rem_addr, &vni, &ttl, &tos, &dst_port,
+			  &metadata);
+	return geneve_configure(net, dev, rem_addr, vni,
+				ttl, tos, dst_port, metadata);
+}
 
-	if (data[IFLA_GENEVE_TTL])
-		ttl = nla_get_u8(data[IFLA_GENEVE_TTL]);
+static int geneve_changelink(struct net_device *dev,
+			     struct nlattr *tb[], struct nlattr *data[])
+{
+	struct geneve_dev *geneve = netdev_priv(dev);
+	__be32 rem_addr = geneve->remote.sin_addr.s_addr;
+	__u16 dst_port = ntohs(geneve->dst_port);
+	bool metadata = geneve->collect_md;
+	u32 vni = vni_to_u32(geneve->vni);
+	__u8 ttl = geneve->ttl;
+	__u8 tos = geneve->tos;
+	bool start_dev = false;
+	int err;
 
-	if (data[IFLA_GENEVE_TOS])
-		tos = nla_get_u8(data[IFLA_GENEVE_TOS]);
+	geneve_parse_conf(data, &rem_addr, &vni, &ttl, &tos, &dst_port,
+			  &metadata);
 
-	if (data[IFLA_GENEVE_PORT])
-		dst_port = nla_get_u16(data[IFLA_GENEVE_PORT]);
+	if (geneve->sock && (dst_port != ntohs(geneve->dst_port) ||
+			     metadata != geneve->collect_md)) {
+		/* Recreate socket due to socket related changes. */
+		err = geneve_stop(dev);
+		if (err)
+			return err;
+		start_dev = true;
+	}
+	err = __geneve_configure(geneve->net, dev, rem_addr, vni, ttl, tos,
+				 dst_port, metadata);
+	if (err)
+		return err;
 
-	if (data[IFLA_GENEVE_COLLECT_METADATA])
-		metadata = true;
+	if (start_dev)
+		err = geneve_open(dev);
 
-	return geneve_configure(net, dev, rem_addr, vni,
-				ttl, tos, dst_port, metadata);
+	return err;
 }
 
 static void geneve_dellink(struct net_device *dev, struct list_head *head)
@@ -895,7 +965,7 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	struct geneve_dev *geneve = netdev_priv(dev);
 	__u32 vni;
 
-	vni = (geneve->vni[0] << 16) | (geneve->vni[1] << 8) | geneve->vni[2];
+	vni = vni_to_u32(geneve->vni);
 	if (nla_put_u32(skb, IFLA_GENEVE_ID, vni))
 		goto nla_put_failure;
 
@@ -929,6 +999,7 @@ static struct rtnl_link_ops geneve_link_ops __read_mostly = {
 	.setup		= geneve_setup,
 	.validate	= geneve_validate,
 	.newlink	= geneve_newlink,
+	.changelink	= geneve_changelink,
 	.dellink	= geneve_dellink,
 	.get_size	= geneve_get_size,
 	.fill_info	= geneve_fill_info,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 5/9] geneve: Add support to collect tunnel metadata.
  2015-08-17 21:11 ` [PATCH net-next v2 5/9] geneve: Add support to collect tunnel metadata Pravin B Shelar
@ 2015-08-18 21:12   ` David Miller
  2015-08-20 20:33     ` Pravin Shelar
  2015-08-19  1:07   ` Jesse Gross
  1 sibling, 1 reply; 33+ messages in thread
From: David Miller @ 2015-08-18 21:12 UTC (permalink / raw)
  To: pshelar; +Cc: netdev

From: Pravin B Shelar <pshelar@nicira.com>
Date: Mon, 17 Aug 2015 14:11:41 -0700

> +	if (tun_dst)
> +		skb_dst_set(skb, (struct dst_entry *)tun_dst);
> +

Please don't cast things like this, it is completely unnecessary:

	skb_dst_set(skb, &tun_dst->dst);

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 1/9] geneve: Initialize ethernet address in device setup.
  2015-08-17 21:11 ` [PATCH net-next v2 1/9] geneve: Initialize ethernet address in device setup Pravin B Shelar
@ 2015-08-19  0:10   ` Jesse Gross
  2015-08-20 16:47   ` Thomas Graf
  1 sibling, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2015-08-19  0:10 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev

On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar <pshelar@nicira.com> wrote:
> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
> ---
>  drivers/net/geneve.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)

Reviewed-by: Jesse Gross <jesse@nicira.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 2/9] geneve: Use skb mark and protocol to lookup route.
  2015-08-17 21:11 ` [PATCH net-next v2 2/9] geneve: Use skb mark and protocol to lookup route Pravin B Shelar
@ 2015-08-19  0:12   ` Jesse Gross
  2015-08-20 16:47   ` Thomas Graf
  1 sibling, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2015-08-19  0:12 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev

On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar <pshelar@nicira.com> wrote:
> On packet transmit path geneve need to lookup route. Following
> patch improves route lookup using more parameters.
>
> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

Reviewed-by: Jesse Gross <jesse@nicira.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 4/9] geneve: Make dst-port configurable.
  2015-08-17 21:11 ` [PATCH net-next v2 4/9] geneve: Make dst-port configurable Pravin B Shelar
@ 2015-08-19  0:27   ` Jesse Gross
  2015-08-20 16:58   ` Thomas Graf
  1 sibling, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2015-08-19  0:27 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev

On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar <pshelar@nicira.com> wrote:
> Add netlink interface to configure Geneve UDP port number.
> So that user can configure it for a Gevene device.
>
> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

Reviewed-by: Jesse Gross <jesse@nicira.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 5/9] geneve: Add support to collect tunnel metadata.
  2015-08-17 21:11 ` [PATCH net-next v2 5/9] geneve: Add support to collect tunnel metadata Pravin B Shelar
  2015-08-18 21:12   ` David Miller
@ 2015-08-19  1:07   ` Jesse Gross
  2015-08-20 20:31     ` Pravin Shelar
  1 sibling, 1 reply; 33+ messages in thread
From: Jesse Gross @ 2015-08-19  1:07 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev

On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar <pshelar@nicira.com> wrote:
> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index 546494d..cb2d874 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> +static int geneve_configure(struct net *net, struct net_device *dev,
> +                           __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
> +                           __u16 dst_port, bool metadata)
[...]
> -       if (!data[IFLA_GENEVE_ID] || !data[IFLA_GENEVE_REMOTE])
> -               return -EINVAL;
> +       if (metadata && rtnl_dereference(gn->collect_md_tun))
> +               return -EEXIST;

This will allow you to configure a collect_md_tun devices and normal
devices on top of each other, even though the latter will never get
hit in that situation.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.
  2015-08-17 21:11 ` [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module Pravin B Shelar
@ 2015-08-19 18:18   ` Jesse Gross
  2015-08-19 18:29     ` Pravin Shelar
  0 siblings, 1 reply; 33+ messages in thread
From: Jesse Gross @ 2015-08-19 18:18 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev

On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar <pshelar@nicira.com> wrote:
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index e58468b..18ff83b 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -181,7 +181,7 @@ config VXLAN
>
>  config GENEVE
>         tristate "Generic Network Virtualization Encapsulation netdev"
> -       depends on INET && GENEVE_CORE
> +       depends on INET
>         select NET_IP_TUNNEL

I think my comments on v1 one this patch were overlooked (about the
UDP_TUNNEL dependency and the name).

> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index 5b43382..eb298ff 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> +static void geneve_build_header(struct genevehdr *geneveh,
> +                               __be16 tun_flags, u8 vni[3],
> +                               u8 options_len, u8 *options)
[...]
> +static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
> +                           __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
> +                           bool csum)

It seems like we could just merge these functions. I'm not sure that
the role is all that different.

In geneve_build_skb(), the error labels are somewhat confusing (for
example, free_rt doesn't free the rt). Also, is it right that we don't
free the rt if udp_tunnel_handle_offloads() fails()? It might be
cleaner if the caller retains ownership of rt.

My guess is that if the issue from the earlier patch about overlapping
collect_md tunnels is fixed then that might allow us to simplify
things a little further, since for those tunnels we can assume there
is a 1:1 mapping between collect_md tunnels and sockets.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.
  2015-08-19 18:18   ` Jesse Gross
@ 2015-08-19 18:29     ` Pravin Shelar
  2015-08-19 18:37       ` Jesse Gross
  0 siblings, 1 reply; 33+ messages in thread
From: Pravin Shelar @ 2015-08-19 18:29 UTC (permalink / raw)
  To: Jesse Gross; +Cc: netdev

On Wed, Aug 19, 2015 at 11:18 AM, Jesse Gross <jesse@nicira.com> wrote:
> On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar <pshelar@nicira.com> wrote:
>> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
>> index e58468b..18ff83b 100644
>> --- a/drivers/net/Kconfig
>> +++ b/drivers/net/Kconfig
>> @@ -181,7 +181,7 @@ config VXLAN
>>
>>  config GENEVE
>>         tristate "Generic Network Virtualization Encapsulation netdev"
>> -       depends on INET && GENEVE_CORE
>> +       depends on INET
>>         select NET_IP_TUNNEL
>
> I think my comments on v1 one this patch were overlooked (about the
> UDP_TUNNEL dependency and the name).
>
right, I missed it.

>> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
>> index 5b43382..eb298ff 100644
>> --- a/drivers/net/geneve.c
>> +++ b/drivers/net/geneve.c
>> +static void geneve_build_header(struct genevehdr *geneveh,
>> +                               __be16 tun_flags, u8 vni[3],
>> +                               u8 options_len, u8 *options)
> [...]
>> +static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
>> +                           __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
>> +                           bool csum)
>
> It seems like we could just merge these functions. I'm not sure that
> the role is all that different.
>
ok.

> In geneve_build_skb(), the error labels are somewhat confusing (for
> example, free_rt doesn't free the rt). Also, is it right that we don't
> free the rt if udp_tunnel_handle_offloads() fails()? It might be
> cleaner if the caller retains ownership of rt.
>
ok.

> My guess is that if the issue from the earlier patch about overlapping
> collect_md tunnels is fixed then that might allow us to simplify
> things a little further, since for those tunnels we can assume there
> is a 1:1 mapping between collect_md tunnels and sockets.

I dont see how it would be different. Can you elaborate on this ?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.
  2015-08-19 18:29     ` Pravin Shelar
@ 2015-08-19 18:37       ` Jesse Gross
  2015-08-19 18:49         ` Pravin Shelar
  0 siblings, 1 reply; 33+ messages in thread
From: Jesse Gross @ 2015-08-19 18:37 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: netdev

On Wed, Aug 19, 2015 at 11:29 AM, Pravin Shelar <pshelar@nicira.com> wrote:
> On Wed, Aug 19, 2015 at 11:18 AM, Jesse Gross <jesse@nicira.com> wrote:
>> My guess is that if the issue from the earlier patch about overlapping
>> collect_md tunnels is fixed then that might allow us to simplify
>> things a little further, since for those tunnels we can assume there
>> is a 1:1 mapping between collect_md tunnels and sockets.
>
> I dont see how it would be different. Can you elaborate on this ?

Mostly just conceptually simpler. Right now it looks like we are doing
some kind of refcounting between devices and tunnels in
geneve_open/stop (I know it's not really but it appears like that in
some ways.) We could just directly assign collect_md in geneve_open()
and do nothing at all in geneve_stop().

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.
  2015-08-19 18:37       ` Jesse Gross
@ 2015-08-19 18:49         ` Pravin Shelar
  2015-08-19 19:50           ` Jesse Gross
  0 siblings, 1 reply; 33+ messages in thread
From: Pravin Shelar @ 2015-08-19 18:49 UTC (permalink / raw)
  To: Jesse Gross; +Cc: netdev

On Wed, Aug 19, 2015 at 11:37 AM, Jesse Gross <jesse@nicira.com> wrote:
> On Wed, Aug 19, 2015 at 11:29 AM, Pravin Shelar <pshelar@nicira.com> wrote:
>> On Wed, Aug 19, 2015 at 11:18 AM, Jesse Gross <jesse@nicira.com> wrote:
>>> My guess is that if the issue from the earlier patch about overlapping
>>> collect_md tunnels is fixed then that might allow us to simplify
>>> things a little further, since for those tunnels we can assume there
>>> is a 1:1 mapping between collect_md tunnels and sockets.
>>
>> I dont see how it would be different. Can you elaborate on this ?
>
> Mostly just conceptually simpler. Right now it looks like we are doing
> some kind of refcounting between devices and tunnels in
> geneve_open/stop (I know it's not really but it appears like that in
> some ways.) We could just directly assign collect_md in geneve_open()
> and do nothing at all in geneve_stop().

If you look at next patch, I have changed geneve_open and stop
further. The change is geneve_open adds tunnel to hash table so that
only device which are open are in hash table. Since geneve_open and
stop is common for both type of tunnel I do not think there can be any
changes even after avoiding overlapping tunnel types in given socket.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 8/9] geneve: Move device hash table to geneve socket.
  2015-08-17 21:11 ` [PATCH net-next v2 8/9] geneve: Move device hash table to geneve socket Pravin B Shelar
@ 2015-08-19 19:05   ` Jesse Gross
  0 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2015-08-19 19:05 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev

On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar <pshelar@nicira.com> wrote:
> This change simplifies Geneve Tunnel hash table management.
>
> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

Reviewed-by: Jesse Gross <jesse@nicira.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 9/9] geneve: Implement rtnl changelink
  2015-08-17 21:11 ` [PATCH net-next v2 9/9] geneve: Implement rtnl changelink Pravin B Shelar
@ 2015-08-19 19:40   ` Jesse Gross
  2015-08-19 20:12     ` Pravin Shelar
  0 siblings, 1 reply; 33+ messages in thread
From: Jesse Gross @ 2015-08-19 19:40 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev

On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar <pshelar@nicira.com> wrote:
> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index e47cdd9..0d7fbef 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> -static int geneve_configure(struct net *net, struct net_device *dev,
> -                           __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
> -                           __u16 dst_port, bool metadata)
> +static int __geneve_configure(struct net *net, struct net_device *dev,
> +                             __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
> +                             __u16 dst_port, bool metadata)
>  {
[...]
>         geneve->net = net;
>         geneve->dev = dev;

I guess this stuff should really be in geneve_configure() - it seems a
bit odd to change it for a running device (even if it shouldn't
change).

>         geneve->remote.sin_addr.s_addr = rem_addr;
>         if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr)))
>                 return -EINVAL;
>
> +       u32_to_vni(vni, geneve->vni);
>         list_for_each_entry(t, &gn->geneve_list, next) {
>                 if (!memcmp(geneve->vni, t->vni, sizeof(t->vni)) &&
>                     rem_addr == t->remote.sin_addr.s_addr &&

I'm not sure that these types of operations are safe if the device is
already running. We first overwrite the remote value and then we do
error checking but that means that if there is an error, then the
device will be left in a broken state. Don't we also need to update
the hash table if some of these parameters change?

> +static int geneve_changelink(struct net_device *dev,
> +                            struct nlattr *tb[], struct nlattr *data[])
> +{
[...]
> -       if (data[IFLA_GENEVE_PORT])
> -               dst_port = nla_get_u16(data[IFLA_GENEVE_PORT]);
> +       if (geneve->sock && (dst_port != ntohs(geneve->dst_port) ||
> +                            metadata != geneve->collect_md)) {

It seems like in an ideal world, we wouldn't need to recreate the
socket if metadata collection changed (assuming that there are no new
conflicts).

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.
  2015-08-19 18:49         ` Pravin Shelar
@ 2015-08-19 19:50           ` Jesse Gross
  2015-08-19 20:14             ` Pravin Shelar
  0 siblings, 1 reply; 33+ messages in thread
From: Jesse Gross @ 2015-08-19 19:50 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: netdev

On Wed, Aug 19, 2015 at 11:49 AM, Pravin Shelar <pshelar@nicira.com> wrote:
> On Wed, Aug 19, 2015 at 11:37 AM, Jesse Gross <jesse@nicira.com> wrote:
>> On Wed, Aug 19, 2015 at 11:29 AM, Pravin Shelar <pshelar@nicira.com> wrote:
>>> On Wed, Aug 19, 2015 at 11:18 AM, Jesse Gross <jesse@nicira.com> wrote:
>>>> My guess is that if the issue from the earlier patch about overlapping
>>>> collect_md tunnels is fixed then that might allow us to simplify
>>>> things a little further, since for those tunnels we can assume there
>>>> is a 1:1 mapping between collect_md tunnels and sockets.
>>>
>>> I dont see how it would be different. Can you elaborate on this ?
>>
>> Mostly just conceptually simpler. Right now it looks like we are doing
>> some kind of refcounting between devices and tunnels in
>> geneve_open/stop (I know it's not really but it appears like that in
>> some ways.) We could just directly assign collect_md in geneve_open()
>> and do nothing at all in geneve_stop().
>
> If you look at next patch, I have changed geneve_open and stop
> further. The change is geneve_open adds tunnel to hash table so that
> only device which are open are in hash table. Since geneve_open and
> stop is common for both type of tunnel I do not think there can be any
> changes even after avoiding overlapping tunnel types in given socket.

I guess I'm not sure why with the later changes it would be
incompatible. All I'm talking about is something pretty small:

geneve_open:
        if (geneve->collect_md)
                gs->collect_md = true;
to
                gs->collect_md = geneve->collect_md;

geneve_close:
remove
        if (geneve->collect_md)
                gs->collect_md = false;
since the socket is about to be freed anyways.

It's not very different in practice but it looks less like refcounting
and more like a 1:1 mapping.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 9/9] geneve: Implement rtnl changelink
  2015-08-19 19:40   ` Jesse Gross
@ 2015-08-19 20:12     ` Pravin Shelar
  2015-08-19 22:39       ` Jesse Gross
  0 siblings, 1 reply; 33+ messages in thread
From: Pravin Shelar @ 2015-08-19 20:12 UTC (permalink / raw)
  To: Jesse Gross; +Cc: netdev

On Wed, Aug 19, 2015 at 12:40 PM, Jesse Gross <jesse@nicira.com> wrote:
> On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar <pshelar@nicira.com> wrote:
>> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
>> index e47cdd9..0d7fbef 100644
>> --- a/drivers/net/geneve.c
>> +++ b/drivers/net/geneve.c
>> -static int geneve_configure(struct net *net, struct net_device *dev,
>> -                           __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
>> -                           __u16 dst_port, bool metadata)
>> +static int __geneve_configure(struct net *net, struct net_device *dev,
>> +                             __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
>> +                             __u16 dst_port, bool metadata)
>>  {
> [...]
>>         geneve->net = net;
>>         geneve->dev = dev;
>
> I guess this stuff should really be in geneve_configure() - it seems a
> bit odd to change it for a running device (even if it shouldn't
> change).
>
ok.

>>         geneve->remote.sin_addr.s_addr = rem_addr;
>>         if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr)))
>>                 return -EINVAL;
>>
>> +       u32_to_vni(vni, geneve->vni);
>>         list_for_each_entry(t, &gn->geneve_list, next) {
>>                 if (!memcmp(geneve->vni, t->vni, sizeof(t->vni)) &&
>>                     rem_addr == t->remote.sin_addr.s_addr &&
>
> I'm not sure that these types of operations are safe if the device is
> already running. We first overwrite the remote value and then we do
> error checking but that means that if there is an error, then the
> device will be left in a broken state. Don't we also need to update
> the hash table if some of these parameters change?
>
ok, I will stop device before making changes. that way we can add it
to hash table.

>> +static int geneve_changelink(struct net_device *dev,
>> +                            struct nlattr *tb[], struct nlattr *data[])
>> +{
> [...]
>> -       if (data[IFLA_GENEVE_PORT])
>> -               dst_port = nla_get_u16(data[IFLA_GENEVE_PORT]);
>> +       if (geneve->sock && (dst_port != ntohs(geneve->dst_port) ||
>> +                            metadata != geneve->collect_md)) {
>
> It seems like in an ideal world, we wouldn't need to recreate the
> socket if metadata collection changed (assuming that there are no new
> conflicts).

To keep changelink simple I am thinking of disallowing metadata changes.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module.
  2015-08-19 19:50           ` Jesse Gross
@ 2015-08-19 20:14             ` Pravin Shelar
  0 siblings, 0 replies; 33+ messages in thread
From: Pravin Shelar @ 2015-08-19 20:14 UTC (permalink / raw)
  To: Jesse Gross; +Cc: netdev

On Wed, Aug 19, 2015 at 12:50 PM, Jesse Gross <jesse@nicira.com> wrote:
> On Wed, Aug 19, 2015 at 11:49 AM, Pravin Shelar <pshelar@nicira.com> wrote:
>> On Wed, Aug 19, 2015 at 11:37 AM, Jesse Gross <jesse@nicira.com> wrote:
>>> On Wed, Aug 19, 2015 at 11:29 AM, Pravin Shelar <pshelar@nicira.com> wrote:
>>>> On Wed, Aug 19, 2015 at 11:18 AM, Jesse Gross <jesse@nicira.com> wrote:
>>>>> My guess is that if the issue from the earlier patch about overlapping
>>>>> collect_md tunnels is fixed then that might allow us to simplify
>>>>> things a little further, since for those tunnels we can assume there
>>>>> is a 1:1 mapping between collect_md tunnels and sockets.
>>>>
>>>> I dont see how it would be different. Can you elaborate on this ?
>>>
>>> Mostly just conceptually simpler. Right now it looks like we are doing
>>> some kind of refcounting between devices and tunnels in
>>> geneve_open/stop (I know it's not really but it appears like that in
>>> some ways.) We could just directly assign collect_md in geneve_open()
>>> and do nothing at all in geneve_stop().
>>
>> If you look at next patch, I have changed geneve_open and stop
>> further. The change is geneve_open adds tunnel to hash table so that
>> only device which are open are in hash table. Since geneve_open and
>> stop is common for both type of tunnel I do not think there can be any
>> changes even after avoiding overlapping tunnel types in given socket.
>
> I guess I'm not sure why with the later changes it would be
> incompatible. All I'm talking about is something pretty small:
>
> geneve_open:
>         if (geneve->collect_md)
>                 gs->collect_md = true;
> to
>                 gs->collect_md = geneve->collect_md;
>
> geneve_close:
> remove
>         if (geneve->collect_md)
>                 gs->collect_md = false;
> since the socket is about to be freed anyways.
>
> It's not very different in practice but it looks less like refcounting
> and more like a 1:1 mapping.

ok, I thought you were talking about socket refcounting. I will make
the changes.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 9/9] geneve: Implement rtnl changelink
  2015-08-19 20:12     ` Pravin Shelar
@ 2015-08-19 22:39       ` Jesse Gross
  2015-08-20 20:33         ` Pravin Shelar
  0 siblings, 1 reply; 33+ messages in thread
From: Jesse Gross @ 2015-08-19 22:39 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: netdev

On Wed, Aug 19, 2015 at 1:12 PM, Pravin Shelar <pshelar@nicira.com> wrote:
> On Wed, Aug 19, 2015 at 12:40 PM, Jesse Gross <jesse@nicira.com> wrote:
>> On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar <pshelar@nicira.com> wrote:
>>> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
>>> index e47cdd9..0d7fbef 100644
>>> --- a/drivers/net/geneve.c
>>> +++ b/drivers/net/geneve.c
>>>         geneve->remote.sin_addr.s_addr = rem_addr;
>>>         if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr)))
>>>                 return -EINVAL;
>>>
>>> +       u32_to_vni(vni, geneve->vni);
>>>         list_for_each_entry(t, &gn->geneve_list, next) {
>>>                 if (!memcmp(geneve->vni, t->vni, sizeof(t->vni)) &&
>>>                     rem_addr == t->remote.sin_addr.s_addr &&
>>
>> I'm not sure that these types of operations are safe if the device is
>> already running. We first overwrite the remote value and then we do
>> error checking but that means that if there is an error, then the
>> device will be left in a broken state. Don't we also need to update
>> the hash table if some of these parameters change?
>>
> ok, I will stop device before making changes. that way we can add it
> to hash table.

I think we should still strive to make changes as minimally disruptive
as possible. At least some changes can still be done safely at runtime
so it would be nice to be able to handle those cleanly.

>>> +static int geneve_changelink(struct net_device *dev,
>>> +                            struct nlattr *tb[], struct nlattr *data[])
>>> +{
>> [...]
>>> -       if (data[IFLA_GENEVE_PORT])
>>> -               dst_port = nla_get_u16(data[IFLA_GENEVE_PORT]);
>>> +       if (geneve->sock && (dst_port != ntohs(geneve->dst_port) ||
>>> +                            metadata != geneve->collect_md)) {
>>
>> It seems like in an ideal world, we wouldn't need to recreate the
>> socket if metadata collection changed (assuming that there are no new
>> conflicts).
>
> To keep changelink simple I am thinking of disallowing metadata changes.

I guess I would rather allow it but make changes slower than disallow
it. That way it is easier to improve in the future if necessary.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 2/9] geneve: Use skb mark and protocol to lookup route.
  2015-08-17 21:11 ` [PATCH net-next v2 2/9] geneve: Use skb mark and protocol to lookup route Pravin B Shelar
  2015-08-19  0:12   ` Jesse Gross
@ 2015-08-20 16:47   ` Thomas Graf
  1 sibling, 0 replies; 33+ messages in thread
From: Thomas Graf @ 2015-08-20 16:47 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev

On 08/17/15 at 02:11pm, Pravin B Shelar wrote:
> On packet transmit path geneve need to lookup route. Following
> patch improves route lookup using more parameters.
> 
> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
> ---
>  drivers/net/geneve.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index 51f7f8b..4f4d15e 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> @@ -202,6 +202,9 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
>  	memset(&fl4, 0, sizeof(fl4));
>  	fl4.flowi4_tos = RT_TOS(tos);
>  	fl4.daddr = geneve->remote.sin_addr.s_addr;
> +	fl4.flowi4_mark = skb->mark;
> +	fl4.flowi4_proto = IPPROTO_UDP;

Looks like it's time to add a helper for this so we don't lag behind
in specifying flowi parameters in various lookup sites all the time.

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 1/9] geneve: Initialize ethernet address in device setup.
  2015-08-17 21:11 ` [PATCH net-next v2 1/9] geneve: Initialize ethernet address in device setup Pravin B Shelar
  2015-08-19  0:10   ` Jesse Gross
@ 2015-08-20 16:47   ` Thomas Graf
  1 sibling, 0 replies; 33+ messages in thread
From: Thomas Graf @ 2015-08-20 16:47 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev

On 08/17/15 at 02:11pm, Pravin B Shelar wrote:
> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 3/9] tunnel: introduce udp_tun_rx_dst()
  2015-08-17 21:11 ` [PATCH net-next v2 3/9] tunnel: introduce udp_tun_rx_dst() Pravin B Shelar
@ 2015-08-20 16:56   ` Thomas Graf
  2015-08-20 20:36     ` Pravin Shelar
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2015-08-20 16:56 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev

On 08/17/15 at 02:11pm, Pravin B Shelar wrote:
> Introduce function udp_tun_rx_dst() to initialize tunnel dst on
> receive path.
> 
> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
> Reviewed-by: Jesse Gross <jesse@nicira.com>

This looks great but conflicts with Jiri Benc's IPv6 series. Can we
rebase this on top of his work so we get IPv6 support in the new
helpers from the beginning?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 4/9] geneve: Make dst-port configurable.
  2015-08-17 21:11 ` [PATCH net-next v2 4/9] geneve: Make dst-port configurable Pravin B Shelar
  2015-08-19  0:27   ` Jesse Gross
@ 2015-08-20 16:58   ` Thomas Graf
  1 sibling, 0 replies; 33+ messages in thread
From: Thomas Graf @ 2015-08-20 16:58 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev

On 08/17/15 at 02:11pm, Pravin B Shelar wrote:
> @@ -403,6 +416,7 @@ static size_t geneve_get_size(const struct net_device *dev)
>  		nla_total_size(sizeof(struct in_addr)) + /* IFLA_GENEVE_REMOTE */
>  		nla_total_size(sizeof(__u8)) +  /* IFLA_GENEVE_TTL */
>  		nla_total_size(sizeof(__u8)) +  /* IFLA_GENEVE_TOS */
> +		nla_total_size(sizeof(__u16)) +  /* IFLA_GENEVE_PORT */
>  		0;
>  }
>  
> @@ -423,6 +437,9 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
>  	    nla_put_u8(skb, IFLA_GENEVE_TOS, geneve->tos))
>  		goto nla_put_failure;
>  
> +	if (nla_put_u32(skb, IFLA_GENEVE_PORT, ntohs(geneve->dst_port)))
> +		goto nla_put_failure;

nla_put_u16?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 5/9] geneve: Add support to collect tunnel metadata.
  2015-08-19  1:07   ` Jesse Gross
@ 2015-08-20 20:31     ` Pravin Shelar
  0 siblings, 0 replies; 33+ messages in thread
From: Pravin Shelar @ 2015-08-20 20:31 UTC (permalink / raw)
  To: Jesse Gross; +Cc: netdev

On Tue, Aug 18, 2015 at 6:07 PM, Jesse Gross <jesse@nicira.com> wrote:
> On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar <pshelar@nicira.com> wrote:
>> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
>> index 546494d..cb2d874 100644
>> --- a/drivers/net/geneve.c
>> +++ b/drivers/net/geneve.c
>> +static int geneve_configure(struct net *net, struct net_device *dev,
>> +                           __be32 rem_addr, __u32 vni, __u8 ttl, __u8 tos,
>> +                           __u16 dst_port, bool metadata)
> [...]
>> -       if (!data[IFLA_GENEVE_ID] || !data[IFLA_GENEVE_REMOTE])
>> -               return -EINVAL;
>> +       if (metadata && rtnl_dereference(gn->collect_md_tun))
>> +               return -EEXIST;
>
> This will allow you to configure a collect_md_tun devices and normal
> devices on top of each other, even though the latter will never get
> hit in that situation.

ok. I will make them mutually exclusive.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 9/9] geneve: Implement rtnl changelink
  2015-08-19 22:39       ` Jesse Gross
@ 2015-08-20 20:33         ` Pravin Shelar
  0 siblings, 0 replies; 33+ messages in thread
From: Pravin Shelar @ 2015-08-20 20:33 UTC (permalink / raw)
  To: Jesse Gross; +Cc: netdev

On Wed, Aug 19, 2015 at 3:39 PM, Jesse Gross <jesse@nicira.com> wrote:
> On Wed, Aug 19, 2015 at 1:12 PM, Pravin Shelar <pshelar@nicira.com> wrote:
>> On Wed, Aug 19, 2015 at 12:40 PM, Jesse Gross <jesse@nicira.com> wrote:
>>> On Mon, Aug 17, 2015 at 2:11 PM, Pravin B Shelar <pshelar@nicira.com> wrote:
>>>> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
>>>> index e47cdd9..0d7fbef 100644
>>>> --- a/drivers/net/geneve.c
>>>> +++ b/drivers/net/geneve.c
>>>>         geneve->remote.sin_addr.s_addr = rem_addr;
>>>>         if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr)))
>>>>                 return -EINVAL;
>>>>
>>>> +       u32_to_vni(vni, geneve->vni);
>>>>         list_for_each_entry(t, &gn->geneve_list, next) {
>>>>                 if (!memcmp(geneve->vni, t->vni, sizeof(t->vni)) &&
>>>>                     rem_addr == t->remote.sin_addr.s_addr &&
>>>
>>> I'm not sure that these types of operations are safe if the device is
>>> already running. We first overwrite the remote value and then we do
>>> error checking but that means that if there is an error, then the
>>> device will be left in a broken state. Don't we also need to update
>>> the hash table if some of these parameters change?
>>>
>> ok, I will stop device before making changes. that way we can add it
>> to hash table.
>
> I think we should still strive to make changes as minimally disruptive
> as possible. At least some changes can still be done safely at runtime
> so it would be nice to be able to handle those cleanly.
>
>>>> +static int geneve_changelink(struct net_device *dev,
>>>> +                            struct nlattr *tb[], struct nlattr *data[])
>>>> +{
>>> [...]
>>>> -       if (data[IFLA_GENEVE_PORT])
>>>> -               dst_port = nla_get_u16(data[IFLA_GENEVE_PORT]);
>>>> +       if (geneve->sock && (dst_port != ntohs(geneve->dst_port) ||
>>>> +                            metadata != geneve->collect_md)) {
>>>
>>> It seems like in an ideal world, we wouldn't need to recreate the
>>> socket if metadata collection changed (assuming that there are no new
>>> conflicts).
>>
>> To keep changelink simple I am thinking of disallowing metadata changes.
>
> I guess I would rather allow it but make changes slower than disallow
> it. That way it is easier to improve in the future if necessary.

These changes need more refactoring in Geneve code. I will post it as
separate patch series once this series is in.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 5/9] geneve: Add support to collect tunnel metadata.
  2015-08-18 21:12   ` David Miller
@ 2015-08-20 20:33     ` Pravin Shelar
  0 siblings, 0 replies; 33+ messages in thread
From: Pravin Shelar @ 2015-08-20 20:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Tue, Aug 18, 2015 at 2:12 PM, David Miller <davem@davemloft.net> wrote:
> From: Pravin B Shelar <pshelar@nicira.com>
> Date: Mon, 17 Aug 2015 14:11:41 -0700
>
>> +     if (tun_dst)
>> +             skb_dst_set(skb, (struct dst_entry *)tun_dst);
>> +
>
> Please don't cast things like this, it is completely unnecessary:
>
>         skb_dst_set(skb, &tun_dst->dst);

ok.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2 3/9] tunnel: introduce udp_tun_rx_dst()
  2015-08-20 16:56   ` Thomas Graf
@ 2015-08-20 20:36     ` Pravin Shelar
  0 siblings, 0 replies; 33+ messages in thread
From: Pravin Shelar @ 2015-08-20 20:36 UTC (permalink / raw)
  To: Thomas Graf; +Cc: netdev

On Thu, Aug 20, 2015 at 9:56 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 08/17/15 at 02:11pm, Pravin B Shelar wrote:
>> Introduce function udp_tun_rx_dst() to initialize tunnel dst on
>> receive path.
>>
>> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
>> Reviewed-by: Jesse Gross <jesse@nicira.com>
>
> This looks great but conflicts with Jiri Benc's IPv6 series. Can we
> rebase this on top of his work so we get IPv6 support in the new
> helpers from the beginning?

Yes, I will post rebased series after Jiri Benc's patch series is pushed.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2015-08-20 20:36 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-17 21:11 [PATCH net-next v2 0/9] Geneve: Add support for tunnel metadata mode Pravin B Shelar
2015-08-17 21:11 ` [PATCH net-next v2 1/9] geneve: Initialize ethernet address in device setup Pravin B Shelar
2015-08-19  0:10   ` Jesse Gross
2015-08-20 16:47   ` Thomas Graf
2015-08-17 21:11 ` [PATCH net-next v2 2/9] geneve: Use skb mark and protocol to lookup route Pravin B Shelar
2015-08-19  0:12   ` Jesse Gross
2015-08-20 16:47   ` Thomas Graf
2015-08-17 21:11 ` [PATCH net-next v2 3/9] tunnel: introduce udp_tun_rx_dst() Pravin B Shelar
2015-08-20 16:56   ` Thomas Graf
2015-08-20 20:36     ` Pravin Shelar
2015-08-17 21:11 ` [PATCH net-next v2 4/9] geneve: Make dst-port configurable Pravin B Shelar
2015-08-19  0:27   ` Jesse Gross
2015-08-20 16:58   ` Thomas Graf
2015-08-17 21:11 ` [PATCH net-next v2 5/9] geneve: Add support to collect tunnel metadata Pravin B Shelar
2015-08-18 21:12   ` David Miller
2015-08-20 20:33     ` Pravin Shelar
2015-08-19  1:07   ` Jesse Gross
2015-08-20 20:31     ` Pravin Shelar
2015-08-17 21:11 ` [PATCH net-next v2 6/9] openvswitch: Use Geneve device Pravin B Shelar
2015-08-17 21:11 ` [PATCH net-next v2 7/9] geneve: Consolidate Geneve functionality in single module Pravin B Shelar
2015-08-19 18:18   ` Jesse Gross
2015-08-19 18:29     ` Pravin Shelar
2015-08-19 18:37       ` Jesse Gross
2015-08-19 18:49         ` Pravin Shelar
2015-08-19 19:50           ` Jesse Gross
2015-08-19 20:14             ` Pravin Shelar
2015-08-17 21:11 ` [PATCH net-next v2 8/9] geneve: Move device hash table to geneve socket Pravin B Shelar
2015-08-19 19:05   ` Jesse Gross
2015-08-17 21:11 ` [PATCH net-next v2 9/9] geneve: Implement rtnl changelink Pravin B Shelar
2015-08-19 19:40   ` Jesse Gross
2015-08-19 20:12     ` Pravin Shelar
2015-08-19 22:39       ` Jesse Gross
2015-08-20 20:33         ` Pravin Shelar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.