All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v4 0/5] Add VRF support for VXLAN underlay
@ 2018-11-22  1:07 Alexis Bauvin
  2018-11-22  1:07 ` [RFC v4 1/5] udp_tunnel: add config option to bind to a device Alexis Bauvin
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Alexis Bauvin @ 2018-11-22  1:07 UTC (permalink / raw)
  To: dsa, roopa; +Cc: netdev, abauvin, akherbouche

v3 -> v4:
- rename vxlan_is_in_l3mdev_chain to netdev_is_upper master
- move it to net/core/dev.c
- make it return bool instead of int
- check if remote_ifindex is zero before resolving the l3mdev
- add testing script

v2 -> v3:
- fix build when CONFIG_NET_IPV6 is off
- fix build "unused l3mdev_master_upper_ifindex_by_index" build error with some
  configs

v1 -> v2:
- move vxlan_get_l3mdev from vxlan driver to l3mdev driver as
  l3mdev_master_upper_ifindex_by_index
- vxlan: rename variables named l3mdev_ifindex to ifindex

v0 -> v1:
- fix typos

We are trying to isolate the VXLAN traffic from different VMs with VRF as shown
in the schemas below:

+-------------------------+   +----------------------------+
| +----------+            |   |     +------------+         |
| |          |            |   |     |            |         |
| | tap-red  |            |   |     |  tap-blue  |         |
| |          |            |   |     |            |         |
| +----+-----+            |   |     +-----+------+         |
|      |                  |   |           |                |
|      |                  |   |           |                |
| +----+---+              |   |      +----+----+           |
| |        |              |   |      |         |           |
| | br-red |              |   |      | br-blue |           |
| |        |              |   |      |         |           |
| +----+---+              |   |      +----+----+           |
|      |                  |   |           |                |
|      |                  |   |           |                |
|      |                  |   |           |                |
| +----+--------+         |   |     +--------------+       |
| |             |         |   |     |              |       |
| |  vxlan-red  |         |   |     |  vxlan-blue  |       |
| |             |         |   |     |              |       |
| +------+------+         |   |     +-------+------+       |
|        |                |   |             |              |
|        |           VRF  |   |             |          VRF |
|        |           red  |   |             |         blue |
+-------------------------+   +----------------------------+
         |                                  |
         |                                  |
 +---------------------------------------------------------+
 |       |                                  |              |
 |       |                                  |              |
 |       |         +--------------+         |              |
 |       |         |              |         |              |
 |       +---------+  eth0.2030   +---------+              |
 |                 |  10.0.0.1/24 |                        |
 |                 +-----+--------+                    VRF |
 |                       |                            green|
 +---------------------------------------------------------+
                         |
                         |
                    +----+---+
                    |        |
                    |  eth0  |
                    |        |
                    +--------+


iproute2 commands to reproduce the setup:

ip link add green type vrf table 1
ip link set green up
ip link add eth0.2030 link eth0 type vlan id 2030
ip link set eth0.2030 master green
ip addr add 10.0.0.1/24 dev eth0.2030
ip link set eth0.2030 up

ip link add blue type vrf table 2
ip link set blue up
ip link add br-blue type bridge
ip link set br-blue master blue
ip link set br-blue up
ip link add vxlan-blue type vxlan id 2 local 10.0.0.1 dev eth0.2030 \
 port 4789
ip link set vxlan-blue master br-blue
ip link set vxlan-blue up
ip link set tap-blue master br-blue
ip link set tap-blue up

ip link add red type vrf table 3
ip link set red up
ip link add br-red type bridge
ip link set br-red master red
ip link set br-red up
ip link add vxlan-red type vxlan id 3 local 10.0.0.1 dev eth0.2030 \
 port 4789
ip link set vxlan-red master br-red
ip link set vxlan-red up
ip link set tap-red master br-red
ip link set tap-red up

We faced some issue in the datapath, here are the details:

* Egress traffic:
The vxlan packets are sent directly to the default VRF because it's where the
socket is bound, therefore the traffic has a default route via eth0. the
workarount is to force this traffic to VRF green with ip rules.

* Ingress traffic:
When receiving the traffic on eth0.2030 the vxlan socket is unreachable from
VRF green. The workaround is to enable *udp_l3mdev_accept* sysctl, but
this breaks isolation between overlay and underlay: packets sent from
blue or red by e.g. a guest VM will be accepted by the socket, allowing
injection of VXLAN packets from the overlay.

This patch serie fixes the issues describe above by allowing VXLAN socket to be
bound to a specific VRF device therefore looking up in the correct table.

Alexis Bauvin (5):
  udp_tunnel: add config option to bind to a device
  l3mdev: add function to retreive upper master
  vxlan: add support for underlay in non-default VRF
  netdev: add netdev_is_upper_master
  vxlan: handle underlay VRF changes

 drivers/net/vxlan.c                           | 114 ++++++++++++++++--
 include/linux/netdevice.h                     |   1 +
 include/net/l3mdev.h                          |  22 ++++
 include/net/udp_tunnel.h                      |   1 +
 net/core/dev.c                                |  17 +++
 net/ipv4/udp_tunnel.c                         |  10 ++
 net/ipv6/ip6_udp_tunnel.c                     |   9 ++
 net/l3mdev/l3mdev.c                           |  18 +++
 .../selftests/net/test_vxlan_under_vrf.sh     |  86 +++++++++++++
 9 files changed, 270 insertions(+), 8 deletions(-)
 create mode 100755 tools/testing/selftests/net/test_vxlan_under_vrf.sh

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC v4 1/5] udp_tunnel: add config option to bind to a device
  2018-11-22  1:07 [RFC v4 0/5] Add VRF support for VXLAN underlay Alexis Bauvin
@ 2018-11-22  1:07 ` Alexis Bauvin
  2018-11-22 17:10   ` David Ahern
  2018-11-22  1:07 ` [RFC v4 2/5] l3mdev: add function to retreive upper master Alexis Bauvin
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Alexis Bauvin @ 2018-11-22  1:07 UTC (permalink / raw)
  To: dsa, roopa; +Cc: netdev, abauvin, akherbouche

UDP tunnel sockets are always opened unbound to a specific device. This
patch allow the socket to be bound on a custom device, which
incidentally makes UDP tunnels VRF-aware if binding to an l3mdev.

Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
---
 include/net/udp_tunnel.h  |  1 +
 net/ipv4/udp_tunnel.c     | 10 ++++++++++
 net/ipv6/ip6_udp_tunnel.c |  9 +++++++++
 3 files changed, 20 insertions(+)

diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index fe680ab6b15a..9f7970d010f9 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -30,6 +30,7 @@ struct udp_port_cfg {
 
 	__be16			local_udp_port;
 	__be16			peer_udp_port;
+	int			bind_ifindex;
 	unsigned int		use_udp_checksums:1,
 				use_udp6_tx_checksums:1,
 				use_udp6_rx_checksums:1,
diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c
index 6539ff15e9a3..dc68e15a4f72 100644
--- a/net/ipv4/udp_tunnel.c
+++ b/net/ipv4/udp_tunnel.c
@@ -20,6 +20,16 @@ int udp_sock_create4(struct net *net, struct udp_port_cfg *cfg,
 	if (err < 0)
 		goto error;
 
+	if (cfg->bind_ifindex) {
+		struct net_device *dev;
+
+		dev = __dev_get_by_index(net, cfg->bind_ifindex);
+		err = kernel_setsockopt(sock, SOL_SOCKET, SO_BINDTODEVICE,
+					dev->name, strlen(dev->name) + 1);
+		if (err < 0)
+			goto error;
+	}
+
 	udp_addr.sin_family = AF_INET;
 	udp_addr.sin_addr = cfg->local_ip;
 	udp_addr.sin_port = cfg->local_udp_port;
diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
index b283f293ee4a..fc3811ef8787 100644
--- a/net/ipv6/ip6_udp_tunnel.c
+++ b/net/ipv6/ip6_udp_tunnel.c
@@ -31,6 +31,15 @@ int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
 		if (err < 0)
 			goto error;
 	}
+	if (cfg->bind_ifindex) {
+		struct net_device *dev;
+
+		dev = __dev_get_by_index(net, cfg->bind_ifindex);
+		err = kernel_setsockopt(sock, SOL_SOCKET, SO_BINDTODEVICE,
+					dev->name, strlen(dev->name) + 1);
+		if (err < 0)
+			goto error;
+	}
 
 	udp6_addr.sin6_family = AF_INET6;
 	memcpy(&udp6_addr.sin6_addr, &cfg->local_ip6,
-- 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC v4 2/5] l3mdev: add function to retreive upper master
  2018-11-22  1:07 [RFC v4 0/5] Add VRF support for VXLAN underlay Alexis Bauvin
  2018-11-22  1:07 ` [RFC v4 1/5] udp_tunnel: add config option to bind to a device Alexis Bauvin
@ 2018-11-22  1:07 ` Alexis Bauvin
  2018-11-22 17:11   ` David Ahern
  2018-11-22  1:07 ` [RFC v4 3/5] vxlan: add support for underlay in non-default VRF Alexis Bauvin
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Alexis Bauvin @ 2018-11-22  1:07 UTC (permalink / raw)
  To: dsa, roopa; +Cc: netdev, abauvin, akherbouche

Existing functions to retreive the l3mdev of a device did not walk the
master chain to find the upper master. This patch adds a function to
find the l3mdev, even indirect through e.g. a bridge:

+----------+
|          |
| vrf-blue |
|          |
+----+-----+
     |
     |
+----+-----+
|          |
| br-blue  |
|          |
+----+-----+
     |
     |
+----+-----+
|          |
|   eth0   |
|          |
+----------+

This will properly resolve the l3mdev of eth0 to vrf-blue.

Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
---
 include/net/l3mdev.h | 22 ++++++++++++++++++++++
 net/l3mdev/l3mdev.c  | 18 ++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index 3832099289c5..78fa0ac4613c 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -101,6 +101,17 @@ struct net_device *l3mdev_master_dev_rcu(const struct net_device *_dev)
 	return master;
 }
 
+int l3mdev_master_upper_ifindex_by_index_rcu(struct net *net, int ifindex);
+static inline
+int l3mdev_master_upper_ifindex_by_index(struct net *net, int ifindex)
+{
+	rcu_read_lock();
+	ifindex = l3mdev_master_upper_ifindex_by_index_rcu(net, ifindex);
+	rcu_read_unlock();
+
+	return ifindex;
+}
+
 u32 l3mdev_fib_table_rcu(const struct net_device *dev);
 u32 l3mdev_fib_table_by_index(struct net *net, int ifindex);
 static inline u32 l3mdev_fib_table(const struct net_device *dev)
@@ -207,6 +218,17 @@ static inline int l3mdev_master_ifindex_by_index(struct net *net, int ifindex)
 	return 0;
 }
 
+static inline
+int l3mdev_master_upper_ifindex_by_index_rcu(struct net *net, int ifindex)
+{
+	return 0;
+}
+static inline
+int l3mdev_master_upper_ifindex_by_index(struct net *net, int ifindex)
+{
+	return 0;
+}
+
 static inline
 struct net_device *l3mdev_master_dev_rcu(const struct net_device *dev)
 {
diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
index 8da86ceca33d..309dee76724e 100644
--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -46,6 +46,24 @@ int l3mdev_master_ifindex_rcu(const struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(l3mdev_master_ifindex_rcu);
 
+/**
+ *	l3mdev_master_upper_ifindex_by_index - get index of upper l3 master
+ *					       device
+ *	@net: network namespace for device index lookup
+ *	@ifindex: targeted interface
+ */
+int l3mdev_master_upper_ifindex_by_index_rcu(struct net *net, int ifindex)
+{
+	struct net_device *dev;
+
+	dev = dev_get_by_index_rcu(net, ifindex);
+	while (dev && !netif_is_l3_master(dev))
+		dev = netdev_master_upper_dev_get(dev);
+
+	return dev ? dev->ifindex : 0;
+}
+EXPORT_SYMBOL_GPL(l3mdev_master_upper_ifindex_by_index_rcu);
+
 /**
  *	l3mdev_fib_table - get FIB table id associated with an L3
  *                             master interface
-- 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC v4 3/5] vxlan: add support for underlay in non-default VRF
  2018-11-22  1:07 [RFC v4 0/5] Add VRF support for VXLAN underlay Alexis Bauvin
  2018-11-22  1:07 ` [RFC v4 1/5] udp_tunnel: add config option to bind to a device Alexis Bauvin
  2018-11-22  1:07 ` [RFC v4 2/5] l3mdev: add function to retreive upper master Alexis Bauvin
@ 2018-11-22  1:07 ` Alexis Bauvin
  2018-11-22 17:19   ` David Ahern
  2018-11-22  1:07 ` [RFC v4 4/5] netdev: add netdev_is_upper_master Alexis Bauvin
  2018-11-22  1:07 ` [RFC v4 5/5] vxlan: handle underlay VRF changes Alexis Bauvin
  4 siblings, 1 reply; 19+ messages in thread
From: Alexis Bauvin @ 2018-11-22  1:07 UTC (permalink / raw)
  To: dsa, roopa; +Cc: netdev, abauvin, akherbouche

Creating a VXLAN device with is underlay in the non-default VRF makes
egress route lookup fail or incorrect since it will resolve in the
default VRF, and ingress fail because the socket listens in the default
VRF.

This patch binds the underlying UDP tunnel socket to the l3mdev of the
lower device of the VXLAN device. This will listen in the proper VRF and
output traffic from said l3mdev, matching l3mdev routing rules and
looking up the correct routing table.

When the VXLAN device does not have a lower device, or the lower device
is in the default VRF, the socket will not be bound to any interface,
keeping the previous behaviour.

The underlay l3mdev is deduced from the VXLAN lower device
(IFLA_VXLAN_LINK).

+----------+                         +---------+
|          |                         |         |
| vrf-blue |                         | vrf-red |
|          |                         |         |
+----+-----+                         +----+----+
     |                                    |
     |                                    |
+----+-----+                         +----+----+
|          |                         |         |
| br-blue  |                         | br-red  |
|          |                         |         |
+----+-----+                         +---+-+---+
     |                                   | |
     |                             +-----+ +-----+
     |                             |             |
+----+-----+                +------+----+   +----+----+
|          |  lower device  |           |   |         |
|   eth0   | <- - - - - - - | vxlan-red |   | tap-red | (... more taps)
|          |                |           |   |         |
+----------+                +-----------+   +---------+

Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
---
 drivers/net/vxlan.c                           | 32 +++++--
 .../selftests/net/test_vxlan_under_vrf.sh     | 90 +++++++++++++++++++
 2 files changed, 114 insertions(+), 8 deletions(-)
 create mode 100755 tools/testing/selftests/net/test_vxlan_under_vrf.sh

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 27bd586b94b0..8ba0a57ff958 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -212,7 +212,7 @@ static inline struct vxlan_rdst *first_remote_rtnl(struct vxlan_fdb *fdb)
  * and enabled unshareable flags.
  */
 static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t family,
-					  __be16 port, u32 flags)
+					  __be16 port, u32 flags, int ifindex)
 {
 	struct vxlan_sock *vs;
 
@@ -221,7 +221,8 @@ static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t family,
 	hlist_for_each_entry_rcu(vs, vs_head(net, port), hlist) {
 		if (inet_sk(vs->sock->sk)->inet_sport == port &&
 		    vxlan_get_sk_family(vs) == family &&
-		    vs->flags == flags)
+		    vs->flags == flags &&
+		    vs->sock->sk->sk_bound_dev_if == ifindex)
 			return vs;
 	}
 	return NULL;
@@ -261,7 +262,7 @@ static struct vxlan_dev *vxlan_find_vni(struct net *net, int ifindex,
 {
 	struct vxlan_sock *vs;
 
-	vs = vxlan_find_sock(net, family, port, flags);
+	vs = vxlan_find_sock(net, family, port, flags, ifindex);
 	if (!vs)
 		return NULL;
 
@@ -2172,6 +2173,9 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		struct rtable *rt;
 		__be16 df = 0;
 
+		if (!ifindex)
+			ifindex = sock4->sock->sk->sk_bound_dev_if;
+
 		rt = vxlan_get_route(vxlan, dev, sock4, skb, ifindex, tos,
 				     dst->sin.sin_addr.s_addr,
 				     &local_ip.sin.sin_addr.s_addr,
@@ -2210,6 +2214,9 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	} else {
 		struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock);
 
+		if (!ifindex)
+			ifindex = sock6->sock->sk->sk_bound_dev_if;
+
 		ndst = vxlan6_get_route(vxlan, dev, sock6, skb, ifindex, tos,
 					label, &dst->sin6.sin6_addr,
 					&local_ip.sin6.sin6_addr,
@@ -2813,7 +2820,7 @@ static const struct ethtool_ops vxlan_ethtool_ops = {
 };
 
 static struct socket *vxlan_create_sock(struct net *net, bool ipv6,
-					__be16 port, u32 flags)
+					__be16 port, u32 flags, int ifindex)
 {
 	struct socket *sock;
 	struct udp_port_cfg udp_conf;
@@ -2831,6 +2838,7 @@ static struct socket *vxlan_create_sock(struct net *net, bool ipv6,
 	}
 
 	udp_conf.local_udp_port = port;
+	udp_conf.bind_ifindex = ifindex;
 
 	/* Open UDP socket */
 	err = udp_sock_create(net, &udp_conf, &sock);
@@ -2842,7 +2850,8 @@ static struct socket *vxlan_create_sock(struct net *net, bool ipv6,
 
 /* Create new listen socket if needed */
 static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,
-					      __be16 port, u32 flags)
+					      __be16 port, u32 flags,
+					      int ifindex)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
 	struct vxlan_sock *vs;
@@ -2857,7 +2866,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,
 	for (h = 0; h < VNI_HASH_SIZE; ++h)
 		INIT_HLIST_HEAD(&vs->vni_list[h]);
 
-	sock = vxlan_create_sock(net, ipv6, port, flags);
+	sock = vxlan_create_sock(net, ipv6, port, flags, ifindex);
 	if (IS_ERR(sock)) {
 		kfree(vs);
 		return ERR_CAST(sock);
@@ -2894,11 +2903,17 @@ static int __vxlan_sock_add(struct vxlan_dev *vxlan, bool ipv6)
 	struct vxlan_net *vn = net_generic(vxlan->net, vxlan_net_id);
 	struct vxlan_sock *vs = NULL;
 	struct vxlan_dev_node *node;
+	int l3mdev_index = 0;
+
+	if (vxlan->cfg.remote_ifindex)
+		l3mdev_index = l3mdev_master_upper_ifindex_by_index(
+			vxlan->net, vxlan->cfg.remote_ifindex);
 
 	if (!vxlan->cfg.no_share) {
 		spin_lock(&vn->sock_lock);
 		vs = vxlan_find_sock(vxlan->net, ipv6 ? AF_INET6 : AF_INET,
-				     vxlan->cfg.dst_port, vxlan->cfg.flags);
+				     vxlan->cfg.dst_port, vxlan->cfg.flags,
+				     l3mdev_index);
 		if (vs && !refcount_inc_not_zero(&vs->refcnt)) {
 			spin_unlock(&vn->sock_lock);
 			return -EBUSY;
@@ -2907,7 +2922,8 @@ static int __vxlan_sock_add(struct vxlan_dev *vxlan, bool ipv6)
 	}
 	if (!vs)
 		vs = vxlan_socket_create(vxlan->net, ipv6,
-					 vxlan->cfg.dst_port, vxlan->cfg.flags);
+					 vxlan->cfg.dst_port, vxlan->cfg.flags,
+					 l3mdev_index);
 	if (IS_ERR(vs))
 		return PTR_ERR(vs);
 #if IS_ENABLED(CONFIG_IPV6)
diff --git a/tools/testing/selftests/net/test_vxlan_under_vrf.sh b/tools/testing/selftests/net/test_vxlan_under_vrf.sh
new file mode 100755
index 000000000000..9ee906d5d333
--- /dev/null
+++ b/tools/testing/selftests/net/test_vxlan_under_vrf.sh
@@ -0,0 +1,90 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test is for checking VXLAN underlay in a non-default VRF.
+
+set -e
+
+cleanup() {
+    ip link del veth-hv-1 || true
+    ip link del veth-tap || true
+
+    for ns in hv-1 hv-2 vm-1 vm-2; do
+        ip netns del $ns || true
+    done
+}
+
+# Clean start
+cleanup &> /dev/null
+
+[[ $1 == "clean" ]] && exit 0
+
+trap cleanup EXIT
+
+# Setup "Hypervisors" simulated with netns
+ip link add veth-hv-1 type veth peer name veth-hv-2
+setup-hv-networking() {
+    hv=$1
+
+    ip netns add hv-$hv
+    ip link set veth-hv-$hv netns hv-$hv
+    ip netns exec hv-$hv ip link set veth-hv-$hv name veth0
+
+    ip netns exec hv-$hv ip link add vrf-underlay type vrf table 1
+    ip netns exec hv-$hv ip link set vrf-underlay up
+    ip netns exec hv-$hv ip addr add 172.16.0.$hv/24 dev veth0
+    ip netns exec hv-$hv ip link set veth0 up
+
+    ip netns exec hv-$hv ip link add br0 type bridge
+    ip netns exec hv-$hv ip link set br0 up
+
+    ip netns exec hv-$hv ip link add vxlan0 type vxlan id 10 local 172.16.0.$hv dev veth0 dstport 4789
+    ip netns exec hv-$hv ip link set vxlan0 master br0
+    ip netns exec hv-$hv ip link set vxlan0 up
+}
+setup-hv-networking 1
+setup-hv-networking 2
+
+# Check connectivity between HVs by pinging hv-2 from hv-1
+echo Checking HV connectivity
+ip netns exec hv-1 ping -c 1 -W 1 172.16.0.2 &> /dev/null || (echo FAIL; false)
+
+# Setups a "VM" simulated by a netns an a veth pair
+setup-vm() {
+    id=$1
+
+    ip netns add vm-$id
+    ip link add veth-tap type veth peer name veth-hv
+
+    ip link set veth-tap netns hv-$id
+    ip netns exec hv-$id ip link set veth-tap master br0
+    ip netns exec hv-$id ip link set veth-tap up
+
+    ip link set veth-hv netns vm-$id
+    ip netns exec vm-$id ip addr add 10.0.0.$id/24 dev veth-hv
+    ip netns exec vm-$id ip link set veth-hv up
+}
+setup-vm 1
+setup-vm 2
+
+# Setup VTEP routes to make ARP work
+ip netns exec hv-1 bridge fdb add 00:00:00:00:00:00 dev vxlan0 dst 172.16.0.2 self permanent
+ip netns exec hv-2 bridge fdb add 00:00:00:00:00:00 dev vxlan0 dst 172.16.0.1 self permanent
+
+echo "Check VM connectivity through VXLAN (underlay in the default VRF)"
+ip netns exec vm-1 ping -c 1 -W 1 10.0.0.2 &> /dev/null || (echo FAIL; false)
+
+# Move the underlay to a non-default VRF
+ip netns exec hv-1 ip link set veth0 vrf vrf-underlay
+ip netns exec hv-1 ip link set veth0 down
+ip netns exec hv-1 ip link set veth0 up
+ip netns exec hv-2 ip link set veth0 vrf vrf-underlay
+ip netns exec hv-2 ip link set veth0 down
+ip netns exec hv-2 ip link set veth0 up
+
+echo "Check VM connectivity through VXLAN (underlay in a VRF)"
+ip netns exec vm-1 ping -c 1 -W 1 10.0.0.2 &> /dev/null || (echo FAIL; false)
+
+echo SUCCESS
+
+cleanup &> /dev/null
-- 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC v4 4/5] netdev: add netdev_is_upper_master
  2018-11-22  1:07 [RFC v4 0/5] Add VRF support for VXLAN underlay Alexis Bauvin
                   ` (2 preceding siblings ...)
  2018-11-22  1:07 ` [RFC v4 3/5] vxlan: add support for underlay in non-default VRF Alexis Bauvin
@ 2018-11-22  1:07 ` Alexis Bauvin
  2018-11-22 17:14   ` David Ahern
  2018-11-22  1:07 ` [RFC v4 5/5] vxlan: handle underlay VRF changes Alexis Bauvin
  4 siblings, 1 reply; 19+ messages in thread
From: Alexis Bauvin @ 2018-11-22  1:07 UTC (permalink / raw)
  To: dsa, roopa; +Cc: netdev, abauvin, akherbouche

In preparation of next patch, this function allows to check if a device
is a master, be it direct or indirect, of another one. It walks up the
master chain until it finds the device, or there is no more master.

This allows to check e.g. if br-blue is a master of eth0:

   +----------+
   | vrf-blue |
   +----+-----+
        |
   +----+----+
   | br-blue |
   +----+----+
        |
    +---+---+
    | bond0 |
    +--+-+--+
       | |
    +--+ +--+
    |       |
+---+--+ +--+---+
| eth0 | | eth1 |
+------+ +------+

Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
---
 include/linux/netdevice.h |  1 +
 net/core/dev.c            | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d837dad24b4c..102f79337d7c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4212,6 +4212,7 @@ void *netdev_lower_dev_get_private(struct net_device *dev,
 				   struct net_device *lower_dev);
 void netdev_lower_state_changed(struct net_device *lower_dev,
 				void *lower_state_info);
+bool netdev_is_upper_master(struct net_device *dev, struct net_device *master);
 
 /* RSS keys are 40 or 52 bytes long */
 #define NETDEV_RSS_KEY_LEN 52
diff --git a/net/core/dev.c b/net/core/dev.c
index 93243479085f..12459036d0da 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7225,6 +7225,23 @@ void netdev_lower_state_changed(struct net_device *lower_dev,
 }
 EXPORT_SYMBOL(netdev_lower_state_changed);
 
+/**
+ * netdev_is_upper_master - Test if a device is a master, direct or indirect,
+ *                          of another one.
+ * @dev: device to start looking from
+ * @master: device to test if master of dev
+ */
+bool netdev_is_upper_master(struct net_device *dev, struct net_device *master)
+{
+	if (!dev)
+		return false;
+
+	if (dev->ifindex == master->ifindex)
+		return true;
+	return netdev_is_upper_master(netdev_master_upper_dev_get(dev), master);
+}
+EXPORT_SYMBOL(netdev_is_upper_master);
+
 static void dev_change_rx_flags(struct net_device *dev, int flags)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
-- 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC v4 5/5] vxlan: handle underlay VRF changes
  2018-11-22  1:07 [RFC v4 0/5] Add VRF support for VXLAN underlay Alexis Bauvin
                   ` (3 preceding siblings ...)
  2018-11-22  1:07 ` [RFC v4 4/5] netdev: add netdev_is_upper_master Alexis Bauvin
@ 2018-11-22  1:07 ` Alexis Bauvin
  4 siblings, 0 replies; 19+ messages in thread
From: Alexis Bauvin @ 2018-11-22  1:07 UTC (permalink / raw)
  To: dsa, roopa; +Cc: netdev, abauvin, akherbouche

When underlay VRF changes, either because the lower device itself changed,
or its VRF changed, this patch releases the current socket of the VXLAN
device and recreates another one in the right VRF. This allows for
on-the-fly change of the underlay VRF of a VXLAN device.

Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
---
 drivers/net/vxlan.c                           | 82 +++++++++++++++++++
 .../selftests/net/test_vxlan_under_vrf.sh     |  4 -
 2 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 8ba0a57ff958..131ee80a38f9 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3720,6 +3720,33 @@ struct net_device *vxlan_dev_create(struct net *net, const char *name,
 }
 EXPORT_SYMBOL_GPL(vxlan_dev_create);
 
+static int vxlan_reopen(struct vxlan_net *vn, struct vxlan_dev *vxlan)
+{
+	int ret = 0;
+
+	if (vxlan_addr_multicast(&vxlan->default_dst.remote_ip) &&
+	    !vxlan_group_used(vn, vxlan))
+		ret = vxlan_igmp_leave(vxlan);
+	vxlan_sock_release(vxlan);
+
+	if (ret < 0)
+		return ret;
+
+	ret = vxlan_sock_add(vxlan);
+	if (ret < 0)
+		return ret;
+
+	if (vxlan_addr_multicast(&vxlan->default_dst.remote_ip)) {
+		ret = vxlan_igmp_join(vxlan);
+		if (ret == -EADDRINUSE)
+			ret = 0;
+		if (ret)
+			vxlan_sock_release(vxlan);
+	}
+
+	return ret;
+}
+
 static void vxlan_handle_lowerdev_unregister(struct vxlan_net *vn,
 					     struct net_device *dev)
 {
@@ -3742,6 +3769,55 @@ static void vxlan_handle_lowerdev_unregister(struct vxlan_net *vn,
 	unregister_netdevice_many(&list_kill);
 }
 
+static void vxlan_handle_change_upper(struct vxlan_net *vn,
+				      struct net_device *dev)
+{
+	struct vxlan_dev *vxlan, *next;
+
+	list_for_each_entry_safe(vxlan, next, &vn->vxlan_list, next) {
+		struct net_device *lower;
+		int err;
+
+		lower = __dev_get_by_index(vxlan->net,
+					   vxlan->cfg.remote_ifindex);
+		if (!netdev_is_upper_master(lower, dev))
+			continue;
+
+		err = vxlan_reopen(vn, vxlan);
+		if (err < 0)
+			netdev_err(vxlan->dev, "Failed to reopen socket: %d\n",
+				   err);
+	}
+}
+
+static void vxlan_handle_change(struct vxlan_net *vn, struct net_device *dev)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct vxlan_sock *sock;
+	int l3mdev_index = 0;
+
+#if IS_ENABLED(CONFIG_IPV6)
+	bool metadata = vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA;
+	bool ipv6 = vxlan->cfg.flags & VXLAN_F_IPV6 || metadata;
+
+	sock = ipv6 ? rcu_dereference(vxlan->vn6_sock)
+		    : rcu_dereference(vxlan->vn4_sock);
+#else
+	sock = rcu_dereference(vxlan->vn4_sock);
+#endif
+
+	if (vxlan->cfg.remote_ifindex)
+		l3mdev_index = l3mdev_master_upper_ifindex_by_index(
+			vxlan->net, vxlan->cfg.remote_ifindex);
+	if (sock->sock->sk->sk_bound_dev_if != l3mdev_index) {
+		int ret = vxlan_reopen(vn, vxlan);
+
+		if (ret < 0)
+			netdev_err(vxlan->dev, "Failed to reopen socket: %d\n",
+				   ret);
+	}
+}
+
 static int vxlan_netdevice_event(struct notifier_block *unused,
 				 unsigned long event, void *ptr)
 {
@@ -3756,6 +3832,12 @@ static int vxlan_netdevice_event(struct notifier_block *unused,
 	} else if (event == NETDEV_UDP_TUNNEL_PUSH_INFO ||
 		   event == NETDEV_UDP_TUNNEL_DROP_INFO) {
 		vxlan_offload_rx_ports(dev, event == NETDEV_UDP_TUNNEL_PUSH_INFO);
+	} else if (event == NETDEV_CHANGEUPPER) {
+		vxlan_handle_change_upper(vn, dev);
+	} else if (event == NETDEV_CHANGE) {
+		if (dev->rtnl_link_ops &&
+		    !strcmp(dev->rtnl_link_ops->kind, vxlan_link_ops.kind))
+			vxlan_handle_change(vn, dev);
 	}
 
 	return NOTIFY_DONE;
diff --git a/tools/testing/selftests/net/test_vxlan_under_vrf.sh b/tools/testing/selftests/net/test_vxlan_under_vrf.sh
index 9ee906d5d333..fccbb00f4305 100755
--- a/tools/testing/selftests/net/test_vxlan_under_vrf.sh
+++ b/tools/testing/selftests/net/test_vxlan_under_vrf.sh
@@ -76,11 +76,7 @@ ip netns exec vm-1 ping -c 1 -W 1 10.0.0.2 &> /dev/null || (echo FAIL; false)
 
 # Move the underlay to a non-default VRF
 ip netns exec hv-1 ip link set veth0 vrf vrf-underlay
-ip netns exec hv-1 ip link set veth0 down
-ip netns exec hv-1 ip link set veth0 up
 ip netns exec hv-2 ip link set veth0 vrf vrf-underlay
-ip netns exec hv-2 ip link set veth0 down
-ip netns exec hv-2 ip link set veth0 up
 
 echo "Check VM connectivity through VXLAN (underlay in a VRF)"
 ip netns exec vm-1 ping -c 1 -W 1 10.0.0.2 &> /dev/null || (echo FAIL; false)
-- 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC v4 1/5] udp_tunnel: add config option to bind to a device
  2018-11-22  1:07 ` [RFC v4 1/5] udp_tunnel: add config option to bind to a device Alexis Bauvin
@ 2018-11-22 17:10   ` David Ahern
  0 siblings, 0 replies; 19+ messages in thread
From: David Ahern @ 2018-11-22 17:10 UTC (permalink / raw)
  To: Alexis Bauvin, roopa; +Cc: netdev, akherbouche

On 11/21/18 6:07 PM, Alexis Bauvin wrote:
> UDP tunnel sockets are always opened unbound to a specific device. This
> patch allow the socket to be bound on a custom device, which
> incidentally makes UDP tunnels VRF-aware if binding to an l3mdev.
> 
> Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
> Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
> ---
>  include/net/udp_tunnel.h  |  1 +
>  net/ipv4/udp_tunnel.c     | 10 ++++++++++
>  net/ipv6/ip6_udp_tunnel.c |  9 +++++++++
>  3 files changed, 20 insertions(+)

Reviewed-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v4 2/5] l3mdev: add function to retreive upper master
  2018-11-22  1:07 ` [RFC v4 2/5] l3mdev: add function to retreive upper master Alexis Bauvin
@ 2018-11-22 17:11   ` David Ahern
  0 siblings, 0 replies; 19+ messages in thread
From: David Ahern @ 2018-11-22 17:11 UTC (permalink / raw)
  To: Alexis Bauvin, roopa; +Cc: netdev, akherbouche

On 11/21/18 6:07 PM, Alexis Bauvin wrote:
> Existing functions to retreive the l3mdev of a device did not walk the
> master chain to find the upper master. This patch adds a function to
> find the l3mdev, even indirect through e.g. a bridge:
> 
>
...

> 
> This will properly resolve the l3mdev of eth0 to vrf-blue.
> 
> Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
> Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
> ---
>  include/net/l3mdev.h | 22 ++++++++++++++++++++++
>  net/l3mdev/l3mdev.c  | 18 ++++++++++++++++++
>  2 files changed, 40 insertions(+)
> 

Reviewed-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v4 4/5] netdev: add netdev_is_upper_master
  2018-11-22  1:07 ` [RFC v4 4/5] netdev: add netdev_is_upper_master Alexis Bauvin
@ 2018-11-22 17:14   ` David Ahern
  2018-11-22 18:10     ` Alexis Bauvin
  0 siblings, 1 reply; 19+ messages in thread
From: David Ahern @ 2018-11-22 17:14 UTC (permalink / raw)
  To: Alexis Bauvin, roopa; +Cc: netdev, akherbouche

On 11/21/18 6:07 PM, Alexis Bauvin wrote:
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 93243479085f..12459036d0da 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -7225,6 +7225,23 @@ void netdev_lower_state_changed(struct net_device *lower_dev,
>  }
>  EXPORT_SYMBOL(netdev_lower_state_changed);
>  
> +/**
> + * netdev_is_upper_master - Test if a device is a master, direct or indirect,
> + *                          of another one.
> + * @dev: device to start looking from
> + * @master: device to test if master of dev
> + */
> +bool netdev_is_upper_master(struct net_device *dev, struct net_device *master)
> +{
> +	if (!dev)
> +		return false;
> +
> +	if (dev->ifindex == master->ifindex)

dev == master should work as well without the dereference.

> +		return true;
> +	return netdev_is_upper_master(netdev_master_upper_dev_get(dev), master);
> +}
> +EXPORT_SYMBOL(netdev_is_upper_master);
> +
>  static void dev_change_rx_flags(struct net_device *dev, int flags)
>  {
>  	const struct net_device_ops *ops = dev->netdev_ops;
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v4 3/5] vxlan: add support for underlay in non-default VRF
  2018-11-22  1:07 ` [RFC v4 3/5] vxlan: add support for underlay in non-default VRF Alexis Bauvin
@ 2018-11-22 17:19   ` David Ahern
  2018-11-26 16:32     ` Alexis Bauvin
  0 siblings, 1 reply; 19+ messages in thread
From: David Ahern @ 2018-11-22 17:19 UTC (permalink / raw)
  To: Alexis Bauvin, roopa; +Cc: netdev, akherbouche

On 11/21/18 6:07 PM, Alexis Bauvin wrote:
> Creating a VXLAN device with is underlay in the non-default VRF makes
> egress route lookup fail or incorrect since it will resolve in the
> default VRF, and ingress fail because the socket listens in the default
> VRF.
> 
> This patch binds the underlying UDP tunnel socket to the l3mdev of the
> lower device of the VXLAN device. This will listen in the proper VRF and
> output traffic from said l3mdev, matching l3mdev routing rules and
> looking up the correct routing table.
> 
> When the VXLAN device does not have a lower device, or the lower device
> is in the default VRF, the socket will not be bound to any interface,
> keeping the previous behaviour.
> 
> The underlay l3mdev is deduced from the VXLAN lower device
> (IFLA_VXLAN_LINK).
> 
> +----------+                         +---------+
> |          |                         |         |
> | vrf-blue |                         | vrf-red |
> |          |                         |         |
> +----+-----+                         +----+----+
>      |                                    |
>      |                                    |
> +----+-----+                         +----+----+
> |          |                         |         |
> | br-blue  |                         | br-red  |
> |          |                         |         |
> +----+-----+                         +---+-+---+
>      |                                   | |
>      |                             +-----+ +-----+
>      |                             |             |
> +----+-----+                +------+----+   +----+----+
> |          |  lower device  |           |   |         |
> |   eth0   | <- - - - - - - | vxlan-red |   | tap-red | (... more taps)
> |          |                |           |   |         |
> +----------+                +-----------+   +---------+
> 
> Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
> Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
> ---
>  drivers/net/vxlan.c                           | 32 +++++--
>  .../selftests/net/test_vxlan_under_vrf.sh     | 90 +++++++++++++++++++
>  2 files changed, 114 insertions(+), 8 deletions(-)
>  create mode 100755 tools/testing/selftests/net/test_vxlan_under_vrf.sh
> 

Reviewed-by: David Ahern <dsahern@gmail.com>

Thanks for adding the test case; I'll try it out next week (after the
holidays).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v4 4/5] netdev: add netdev_is_upper_master
  2018-11-22 17:14   ` David Ahern
@ 2018-11-22 18:10     ` Alexis Bauvin
  0 siblings, 0 replies; 19+ messages in thread
From: Alexis Bauvin @ 2018-11-22 18:10 UTC (permalink / raw)
  To: David Ahern, roopa; +Cc: netdev, akherbouche, Alexis Bauvin

Le 22 nov. 2018 à 18:14, David Ahern <dsa@cumulusnetworks.com> a écrit :
> On 11/21/18 6:07 PM, Alexis Bauvin wrote:
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 93243479085f..12459036d0da 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -7225,6 +7225,23 @@ void netdev_lower_state_changed(struct net_device *lower_dev,
>> }
>> EXPORT_SYMBOL(netdev_lower_state_changed);
>> 
>> +/**
>> + * netdev_is_upper_master - Test if a device is a master, direct or indirect,
>> + *                          of another one.
>> + * @dev: device to start looking from
>> + * @master: device to test if master of dev
>> + */
>> +bool netdev_is_upper_master(struct net_device *dev, struct net_device *master)
>> +{
>> +	if (!dev)
>> +		return false;
>> +
>> +	if (dev->ifindex == master->ifindex)
> 
> dev == master should work as well without the dereference.

Ack, will add to next version.

>> +		return true;
>> +	return netdev_is_upper_master(netdev_master_upper_dev_get(dev), master);
>> +}
>> +EXPORT_SYMBOL(netdev_is_upper_master);
>> +
>> static void dev_change_rx_flags(struct net_device *dev, int flags)
>> {
>> 	const struct net_device_ops *ops = dev->netdev_ops;
>> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v4 3/5] vxlan: add support for underlay in non-default VRF
  2018-11-22 17:19   ` David Ahern
@ 2018-11-26 16:32     ` Alexis Bauvin
  2018-11-26 17:54       ` David Ahern
  0 siblings, 1 reply; 19+ messages in thread
From: Alexis Bauvin @ 2018-11-26 16:32 UTC (permalink / raw)
  To: David Ahern, roopa; +Cc: netdev, akherbouche

Le 22 nov. 2018 à 18:19, David Ahern <dsa@cumulusnetworks.com> a écrit :
> On 11/21/18 6:07 PM, Alexis Bauvin wrote:
>> Creating a VXLAN device with is underlay in the non-default VRF makes
>> egress route lookup fail or incorrect since it will resolve in the
>> default VRF, and ingress fail because the socket listens in the default
>> VRF.
>> 
>> This patch binds the underlying UDP tunnel socket to the l3mdev of the
>> lower device of the VXLAN device. This will listen in the proper VRF and
>> output traffic from said l3mdev, matching l3mdev routing rules and
>> looking up the correct routing table.
>> 
>> When the VXLAN device does not have a lower device, or the lower device
>> is in the default VRF, the socket will not be bound to any interface,
>> keeping the previous behaviour.
>> 
>> The underlay l3mdev is deduced from the VXLAN lower device
>> (IFLA_VXLAN_LINK).
>> 
>> +----------+                         +---------+
>> |          |                         |         |
>> | vrf-blue |                         | vrf-red |
>> |          |                         |         |
>> +----+-----+                         +----+----+
>>     |                                    |
>>     |                                    |
>> +----+-----+                         +----+----+
>> |          |                         |         |
>> | br-blue  |                         | br-red  |
>> |          |                         |         |
>> +----+-----+                         +---+-+---+
>>     |                                   | |
>>     |                             +-----+ +-----+
>>     |                             |             |
>> +----+-----+                +------+----+   +----+----+
>> |          |  lower device  |           |   |         |
>> |   eth0   | <- - - - - - - | vxlan-red |   | tap-red | (... more taps)
>> |          |                |           |   |         |
>> +----------+                +-----------+   +---------+
>> 
>> Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
>> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
>> Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
>> ---
>> drivers/net/vxlan.c                           | 32 +++++--
>> .../selftests/net/test_vxlan_under_vrf.sh     | 90 +++++++++++++++++++
>> 2 files changed, 114 insertions(+), 8 deletions(-)
>> create mode 100755 tools/testing/selftests/net/test_vxlan_under_vrf.sh
>> 
> 
> Reviewed-by: David Ahern <dsahern@gmail.com>
> 
> Thanks for adding the test case; I'll try it out next week (after the
> holidays).

Thanks for the review. I’ll send a v5 if you have no other comment on
this version!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v4 3/5] vxlan: add support for underlay in non-default VRF
  2018-11-26 16:32     ` Alexis Bauvin
@ 2018-11-26 17:54       ` David Ahern
  2018-11-26 18:26         ` Roopa Prabhu
  2018-11-27  0:41         ` Alexis Bauvin
  0 siblings, 2 replies; 19+ messages in thread
From: David Ahern @ 2018-11-26 17:54 UTC (permalink / raw)
  To: Alexis Bauvin, roopa; +Cc: netdev, akherbouche

[-- Attachment #1: Type: text/plain, Size: 1058 bytes --]

On 11/26/18 9:32 AM, Alexis Bauvin wrote:
> Thanks for the review. I’ll send a v5 if you have no other comment on
> this version!

A few comments on the test script; see attached which has the changes.

Mainly the cleanup does not need to be called at the end since you setup
the exit trap. The cleanup calls ip to delete veth-hv-1 and veth-tap but
those are moved to other namespaces. 'ip netns exec NAME ip ...' is more
efficiently done as 'ip -netns NAME ...'. The test results should align
like this:

Checking HV connectivity                                          [ OK ]
Check VM connectivity through VXLAN (underlay in the default VRF) [ OK ]
Check VM connectivity through VXLAN (underlay in a VRF)           [ OK ]

So it is easy for users to see the PASS/FAIL.

It would be good to copy the topology ascii art into the test script as
well for future users.

Also, add the test as a separate patch at the end and include it in
tools/testing/selftests/net/Makefile

Finally, I think you should drop the RFC and send it as a 'ready for
inclusion'.

[-- Attachment #2: test_vxlan_under_vrf.sh --]
[-- Type: application/x-sh, Size: 2564 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v4 3/5] vxlan: add support for underlay in non-default VRF
  2018-11-26 17:54       ` David Ahern
@ 2018-11-26 18:26         ` Roopa Prabhu
  2018-11-26 19:06           ` Alexis Bauvin
  2018-11-27  0:41         ` Alexis Bauvin
  1 sibling, 1 reply; 19+ messages in thread
From: Roopa Prabhu @ 2018-11-26 18:26 UTC (permalink / raw)
  To: David Ahern; +Cc: abauvin, netdev, akherbouche

On Mon, Nov 26, 2018 at 9:54 AM David Ahern <dsa@cumulusnetworks.com> wrote:
>
> On 11/26/18 9:32 AM, Alexis Bauvin wrote:
> > Thanks for the review. I’ll send a v5 if you have no other comment on
> > this version!
>
> A few comments on the test script; see attached which has the changes.
>
> Mainly the cleanup does not need to be called at the end since you setup
> the exit trap. The cleanup calls ip to delete veth-hv-1 and veth-tap but
> those are moved to other namespaces. 'ip netns exec NAME ip ...' is more
> efficiently done as 'ip -netns NAME ...'. The test results should align
> like this:
>
> Checking HV connectivity                                          [ OK ]
> Check VM connectivity through VXLAN (underlay in the default VRF) [ OK ]
> Check VM connectivity through VXLAN (underlay in a VRF)           [ OK ]
>
> So it is easy for users to see the PASS/FAIL.
>
> It would be good to copy the topology ascii art into the test script as
> well for future users.
>
> Also, add the test as a separate patch at the end and include it in
> tools/testing/selftests/net/Makefile
>
> Finally, I think you should drop the RFC and send it as a 'ready for
> inclusion'.

I cant seem to find patch 5 in my mail box... so commenting here
(Using reference to patch5 from here
https://marc.info/?l=linux-netdev&m=154284885815549&w=2)

Still not convinced that the auto reopen is justified here IMO because
it can be done from user-space and there are many cases where this is
already done from user-space. A few questions for alexis on that,
- What is the reason for handling NETDEV_CHANGE on the vxlan device
from the notifier handler. It can be really done in the changelink
handler, correct  ?
- Also, IIUC, patch5 blindly re-opens the vxlan device without
considering if the admin had set it to down last (ie the last state on
it was vxlan_close). is that correct ?

(Don't want to block the entire series for just patch5. Patch5 can be
done incrementally after we converge on it. The rest of the series
looks good as David has already reviewed.  And nice to see the test!).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v4 3/5] vxlan: add support for underlay in non-default VRF
  2018-11-26 18:26         ` Roopa Prabhu
@ 2018-11-26 19:06           ` Alexis Bauvin
  2018-11-26 19:11             ` David Ahern
  0 siblings, 1 reply; 19+ messages in thread
From: Alexis Bauvin @ 2018-11-26 19:06 UTC (permalink / raw)
  To: Roopa Prabhu, David Ahern; +Cc: netdev, akherbouche, Alexis Bauvin

Le 26 nov. 2018 à 19:26, Roopa Prabhu <roopa@cumulusnetworks.com> a écrit :
> 
> On Mon, Nov 26, 2018 at 9:54 AM David Ahern <dsa@cumulusnetworks.com> wrote:
>> 
>> On 11/26/18 9:32 AM, Alexis Bauvin wrote:
>>> Thanks for the review. I’ll send a v5 if you have no other comment on
>>> this version!
>> 
>> A few comments on the test script; see attached which has the changes.
>> 
>> Mainly the cleanup does not need to be called at the end since you setup
>> the exit trap. The cleanup calls ip to delete veth-hv-1 and veth-tap but
>> those are moved to other namespaces. 'ip netns exec NAME ip ...' is more
>> efficiently done as 'ip -netns NAME ...'. The test results should align
>> like this:
>> 
>> Checking HV connectivity                                          [ OK ]
>> Check VM connectivity through VXLAN (underlay in the default VRF) [ OK ]
>> Check VM connectivity through VXLAN (underlay in a VRF)           [ OK ]
>> 
>> So it is easy for users to see the PASS/FAIL.
>> 
>> It would be good to copy the topology ascii art into the test script as
>> well for future users.
>> 
>> Also, add the test as a separate patch at the end and include it in
>> tools/testing/selftests/net/Makefile
>> 
>> Finally, I think you should drop the RFC and send it as a 'ready for
>> inclusion'.
> 
> I cant seem to find patch 5 in my mail box... so commenting here
> (Using reference to patch5 from here
> https://marc.info/?l=linux-netdev&m=154284885815549&w=2)
> 
> Still not convinced that the auto reopen is justified here IMO because
> it can be done from user-space and there are many cases where this is
> already done from user-space. A few questions for alexis on that,

I do agree on this. The test shows that a simple down/up is enough, and
the patch was written as a mere convenience.

> - What is the reason for handling NETDEV_CHANGE on the vxlan device
> from the notifier handler. It can be really done in the changelink
> handler, correct  ?

Looks correct to me. The reason is nothing more than me not thinking
about the netlink handlers.

> - Also, IIUC, patch5 blindly re-opens the vxlan device without
> considering if the admin had set it to down last (ie the last state on
> it was vxlan_close). is that correct ?

It is correct. This is a big oversight from my side, that could have
led to crashes. Fortunately the underlying code will check if the
sockets are null (which they are if the interface is down) before
accessing them.

> (Don't want to block the entire series for just patch5. Patch5 can be
> done incrementally after we converge on it. The rest of the series
> looks good as David has already reviewed.  And nice to see the test!).

Thanks!

Given the aforementioned oversight when handling a down interface, it is
best to wait for a better solution for this patch.
Moreover, the issue of mixing default and non-default vrf needs to be
addressed. For now it is stale, as I don’t see any solution (except for
rewriting the whole thing as you suggested before) to address the
"Address already in use" made by a socket of the default vrf owning the
port across all vrfs.
I tested both Vyatta’s changes and SO_REUSEPORT, and neither of them seem
to work for this case.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v4 3/5] vxlan: add support for underlay in non-default VRF
  2018-11-26 19:06           ` Alexis Bauvin
@ 2018-11-26 19:11             ` David Ahern
  0 siblings, 0 replies; 19+ messages in thread
From: David Ahern @ 2018-11-26 19:11 UTC (permalink / raw)
  To: Alexis Bauvin, Roopa Prabhu; +Cc: netdev, akherbouche

On 11/26/18 12:06 PM, Alexis Bauvin wrote:
> Moreover, the issue of mixing default and non-default vrf needs to be
> addressed. For now it is stale, as I don’t see any solution (except for
> rewriting the whole thing as you suggested before) to address the
> "Address already in use" made by a socket of the default vrf owning the
> port across all vrfs.
> I tested both Vyatta’s changes and SO_REUSEPORT, and neither of them seem
> to work for this case.

That suggests to me the reopen should be done internally so that the
socket failure can cause the enslavement to fail with a message passed
back to the user via extack.

ie., If changing the vrf association breaks vxlan, we should detect that
and fail the change.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v4 3/5] vxlan: add support for underlay in non-default VRF
  2018-11-26 17:54       ` David Ahern
  2018-11-26 18:26         ` Roopa Prabhu
@ 2018-11-27  0:41         ` Alexis Bauvin
  2018-11-27  0:46           ` David Ahern
  1 sibling, 1 reply; 19+ messages in thread
From: Alexis Bauvin @ 2018-11-27  0:41 UTC (permalink / raw)
  To: David Ahern, roopa; +Cc: netdev, akherbouche, Alexis Bauvin

Le 26 nov. 2018 à 18:54, David Ahern <dsa@cumulusnetworks.com> a écrit :
> On 11/26/18 9:32 AM, Alexis Bauvin wrote:
>> Thanks for the review. I’ll send a v5 if you have no other comment on
>> this version!
> 
> A few comments on the test script; see attached which has the changes.
> 
> Mainly the cleanup does not need to be called at the end since you setup
> the exit trap. The cleanup calls ip to delete veth-hv-1 and veth-tap but
> those are moved to other namespaces.

This was on purpose to be sure to cleanup the interfaces in case the
script crashes for some reason and left interfaces outside of the
namespace.

> 'ip netns exec NAME ip ...' is more
> efficiently done as 'ip -netns NAME ...'. The test results should align
> like this:
> 
> Checking HV connectivity                                          [ OK ]
> Check VM connectivity through VXLAN (underlay in the default VRF) [ OK ]
> Check VM connectivity through VXLAN (underlay in a VRF)           [ OK ]
> 
> So it is easy for users to see the PASS/FAIL.

Awesome, thanks!

> It would be good to copy the topology ascii art into the test script as
> well for future users.

Will include this:

+-------------------+                                      +-------------------+
|                   |                                      |                   |
|    vm-1 netns     |                                      |    vm-2 netns     |
|                   |                                      |                   |
|  +-------------+  |                                      |  +-------------+  |
|  |   veth-hv   |  |                                      |  |   veth-hv   |  |
|  | 10.0.0.1/24 |  |                                      |  | 10.0.0.2/24 |  |
|  +-------------+  |                                      |  +-------------+  |
|         .         |                                      |         .         |
+-------------------+                                      +-------------------+
          .                                                          .
          .                                                          .
          .                                                          .
+------------------------------------+   +-------------------------------------+
|         .                          |   |                           .         |
|   +----------+                     |   |                     +----------+    |
|   | veth-tap |                     |   |                     | veth-tap |    |
|   +----+-----+                     |   |                     +----+-----+    |
|        |                           |   |                          |          |
|     +--+--+      +--------------+  |   |  +--------------+     +--+--+       |
|     | br0 |      | vrf-underlay |  |   |  | vrf-underlay |     | br0 |       |
|     +--+--+      +-------+------+  |   |  +------+-------+     +--+--+       |
|        |                 |         |   |         |                |          |
|    +---+----+    +-------+-------+ |   | +-------+-------+    +---+----+     |
|    | vxlan0 |....|     veth0     |.|...|.|     veth0     |....| vxlan0 |     |
|    +--------+    | 172.16.0.1/24 | |   | | 172.16.0.2/24 |    +--------+     |
|                  +---------------+ |   | +---------------+                   |
|                                    |   |                                     |
|              hv-1 netns            |   |           hv-2 netns                |
|                                    |   |                                     |
+------------------------------------+   +-------------------------------------+

> Also, add the test as a separate patch at the end and include it in
> tools/testing/selftests/net/Makefile

Regarding the discussion on patch 5, it should be better to send it first
after patch 3, and remove the down/up from it after current patch 5,
right?

> Finally, I think you should drop the RFC and send it as a 'ready for
> inclusion’.

Great thanks!

> <test_vxlan_under_vrf.sh>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v4 3/5] vxlan: add support for underlay in non-default VRF
  2018-11-27  0:41         ` Alexis Bauvin
@ 2018-11-27  0:46           ` David Ahern
  2018-11-27  0:57             ` Alexis Bauvin
  0 siblings, 1 reply; 19+ messages in thread
From: David Ahern @ 2018-11-27  0:46 UTC (permalink / raw)
  To: Alexis Bauvin, roopa; +Cc: netdev, akherbouche

On 11/26/18 5:41 PM, Alexis Bauvin wrote:
> Le 26 nov. 2018 à 18:54, David Ahern <dsa@cumulusnetworks.com> a écrit :
>> On 11/26/18 9:32 AM, Alexis Bauvin wrote:
>>> Thanks for the review. I’ll send a v5 if you have no other comment on
>>> this version!
>>
>> A few comments on the test script; see attached which has the changes.
>>
>> Mainly the cleanup does not need to be called at the end since you setup
>> the exit trap. The cleanup calls ip to delete veth-hv-1 and veth-tap but
>> those are moved to other namespaces.
> 
> This was on purpose to be sure to cleanup the interfaces in case the
> script crashes for some reason and left interfaces outside of the
> namespace.

ok.

>> It would be good to copy the topology ascii art into the test script as
>> well for future users.
> 
> Will include this:

<cool ascii diagram>

Thanks for adding.

> 
>> Also, add the test as a separate patch at the end and include it in
>> tools/testing/selftests/net/Makefile
> 
> Regarding the discussion on patch 5, it should be better to send it first
> after patch 3, and remove the down/up from it after current patch 5,
> right?

Typically the test case is added at the end verifying the end goal of
the patch set as opposed to being part of a patch (3 in your case) and
then amended by a later patch.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v4 3/5] vxlan: add support for underlay in non-default VRF
  2018-11-27  0:46           ` David Ahern
@ 2018-11-27  0:57             ` Alexis Bauvin
  0 siblings, 0 replies; 19+ messages in thread
From: Alexis Bauvin @ 2018-11-27  0:57 UTC (permalink / raw)
  To: David Ahern, roopa; +Cc: netdev, akherbouche

Le 27 nov. 2018 à 01:46, David Ahern <dsa@cumulusnetworks.com> a écrit :
> On 11/26/18 5:41 PM, Alexis Bauvin wrote:
>> Le 26 nov. 2018 à 18:54, David Ahern <dsa@cumulusnetworks.com> a écrit :
>>> On 11/26/18 9:32 AM, Alexis Bauvin wrote:
>>>> Thanks for the review. I’ll send a v5 if you have no other comment on
>>>> this version!
>>> 
>>> A few comments on the test script; see attached which has the changes.
>>> 
>>> Mainly the cleanup does not need to be called at the end since you setup
>>> the exit trap. The cleanup calls ip to delete veth-hv-1 and veth-tap but
>>> those are moved to other namespaces.
>> 
>> This was on purpose to be sure to cleanup the interfaces in case the
>> script crashes for some reason and left interfaces outside of the
>> namespace.
> 
> ok.
> 
>>> It would be good to copy the topology ascii art into the test script as
>>> well for future users.
>> 
>> Will include this:
> 
> <cool ascii diagram>
> 
> Thanks for adding.
> 
>> 
>>> Also, add the test as a separate patch at the end and include it in
>>> tools/testing/selftests/net/Makefile
>> 
>> Regarding the discussion on patch 5, it should be better to send it first
>> after patch 3, and remove the down/up from it after current patch 5,
>> right?
> 
> Typically the test case is added at the end verifying the end goal of
> the patch set as opposed to being part of a patch (3 in your case) and
> then amended by a later patch.

Ok. Will make the test script the very last of the series. Thanks for the
pointers.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-11-27 11:53 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-22  1:07 [RFC v4 0/5] Add VRF support for VXLAN underlay Alexis Bauvin
2018-11-22  1:07 ` [RFC v4 1/5] udp_tunnel: add config option to bind to a device Alexis Bauvin
2018-11-22 17:10   ` David Ahern
2018-11-22  1:07 ` [RFC v4 2/5] l3mdev: add function to retreive upper master Alexis Bauvin
2018-11-22 17:11   ` David Ahern
2018-11-22  1:07 ` [RFC v4 3/5] vxlan: add support for underlay in non-default VRF Alexis Bauvin
2018-11-22 17:19   ` David Ahern
2018-11-26 16:32     ` Alexis Bauvin
2018-11-26 17:54       ` David Ahern
2018-11-26 18:26         ` Roopa Prabhu
2018-11-26 19:06           ` Alexis Bauvin
2018-11-26 19:11             ` David Ahern
2018-11-27  0:41         ` Alexis Bauvin
2018-11-27  0:46           ` David Ahern
2018-11-27  0:57             ` Alexis Bauvin
2018-11-22  1:07 ` [RFC v4 4/5] netdev: add netdev_is_upper_master Alexis Bauvin
2018-11-22 17:14   ` David Ahern
2018-11-22 18:10     ` Alexis Bauvin
2018-11-22  1:07 ` [RFC v4 5/5] vxlan: handle underlay VRF changes Alexis Bauvin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.