All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/5] vrf: allow simultaneous service instances in default and other VRFs
@ 2018-09-20  8:58 Mike Manning
  2018-09-20  8:58 ` [PATCH net-next 1/5] net: allow binding socket in a VRF when there's an unbound socket Mike Manning
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Mike Manning @ 2018-09-20  8:58 UTC (permalink / raw)
  To: netdev

Services currently have to be VRF-aware if they are using an unbound
socket. One cannot have multiple service instances running in the
default and other VRFs for services that are not VRF-aware and listen
on an unbound socket. This is because there is no way of isolating
packets received in the default VRF from those arriving in other VRFs.

This series provides this isolation subject to the existing kernel
parameter net.ipv4.tcp_l3mdev_accept not being set, given that this is
documented as allowing a single service instance to work across all
VRF domains. The functionality applies to UDP & TCP services, for IPv4
and IPv6, in particular adding VRF table handling for IPv6 multicast.

Example of running ssh instances in default and blue VRF:

$ /usr/sbin/sshd -D
$ ip vrf exec vrf-blue /usr/sbin/sshd
$ ss -ta | egrep 'State|ssh'
State   Recv-Q   Send-Q           Local Address:Port       Peer Address:Port
LISTEN  0        128           0.0.0.0%vrf-blue:ssh             0.0.0.0:*
LISTEN  0        128                    0.0.0.0:ssh             0.0.0.0:*
ESTAB   0        0              192.168.122.220:ssh       192.168.122.1:50282
LISTEN  0        128              [::]%vrf-blue:ssh                [::]:*
LISTEN  0        128                       [::]:ssh                [::]:*
ESTAB   0        0           [3000::2]%vrf-blue:ssh           [3000::9]:45896
ESTAB   0        0                    [2000::2]:ssh           [2000::9]:46398

Dewi Morgan (1):
  ipv6: do not drop vrf udp multicast packets

Mike Manning (1):
  ipv6: allow link-local and multicast packets inside vrf

Patrick Ruddy (1):
  ipv6: add vrf table handling code for ipv6 mcast

Robert Shearman (2):
  net: allow binding socket in a VRF when there's an unbound socket
  ipv4: Allow sending multicast packets on specific i/f using VRF socket

 Documentation/networking/vrf.txt |  9 ++++----
 drivers/net/vrf.c                | 30 ++++++++++++++++--------
 include/net/inet6_hashtables.h   |  5 ++--
 include/net/inet_hashtables.h    | 21 +++++++++++------
 include/net/inet_sock.h          | 13 +++++++++++
 net/core/sock.c                  |  2 ++
 net/ipv4/datagram.c              |  2 +-
 net/ipv4/inet_connection_sock.c  | 13 ++++++++---
 net/ipv4/inet_hashtables.c       | 34 +++++++++++++++++-----------
 net/ipv4/ip_sockglue.c           |  3 +++
 net/ipv4/ping.c                  |  2 +-
 net/ipv4/raw.c                   |  6 ++---
 net/ipv4/udp.c                   | 17 ++++++--------
 net/ipv6/datagram.c              |  5 +++-
 net/ipv6/inet6_hashtables.c      | 14 +++++-------
 net/ipv6/ip6_input.c             | 46 +++++++++++++++++++++++++++++++++----
 net/ipv6/ip6mr.c                 | 49 ++++++++++++++++++++++++++++++----------
 net/ipv6/ipv6_sockglue.c         |  5 +++-
 net/ipv6/raw.c                   |  6 ++---
 net/ipv6/udp.c                   | 22 ++++++++----------
 20 files changed, 208 insertions(+), 96 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next 1/5] net: allow binding socket in a VRF when there's an unbound socket
  2018-09-20  8:58 [PATCH net-next 0/5] vrf: allow simultaneous service instances in default and other VRFs Mike Manning
@ 2018-09-20  8:58 ` Mike Manning
  2018-09-23  8:47   ` kbuild test robot
  2018-09-23  9:58   ` kbuild test robot
  2018-09-20  8:58 ` [PATCH net-next 2/5] ipv6: allow link-local and multicast packets inside vrf Mike Manning
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 12+ messages in thread
From: Mike Manning @ 2018-09-20  8:58 UTC (permalink / raw)
  To: netdev; +Cc: Robert Shearman

From: Robert Shearman <rshearma@vyatta.att-mail.com>

There is no easy way currently for applications that want to receive
packets in the default VRF to be isolated from packets arriving in
VRFs, which makes using VRF-unaware applications in a VRF-aware system
a potential security risk.

So change the inet socket lookup to avoid packets arriving on a device
enslaved to an l3mdev from matching unbound sockets by removing the
wildcard for non sk_bound_dev_if and instead relying on check against
the secondary device index, which will be 0 when the input device is
not enslaved to an l3mdev and so match against an unbound socket and
not match when the input device is enslaved.

The existing net.ipv4.tcp_l3mdev_accept & net.ipv4.udp_l3mdev_accept
sysctls, which are documented as allowing the working across all VRF
domains, can be used to also work in the default VRF by causing
unbound sockets to match against packets arriving on a device
enslaved to an l3mdev.

Change the socket binding to take the l3mdev into account to allow an
unbound socket to not conflict sockets bound to an l3mdev given the
datapath isolation now guaranteed.

Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
---
 Documentation/networking/vrf.txt |  9 +++++----
 include/net/inet6_hashtables.h   |  5 ++---
 include/net/inet_hashtables.h    | 21 ++++++++++++++-------
 include/net/inet_sock.h          | 13 +++++++++++++
 net/core/sock.c                  |  2 ++
 net/ipv4/inet_connection_sock.c  | 13 ++++++++++---
 net/ipv4/inet_hashtables.c       | 34 +++++++++++++++++++++-------------
 net/ipv4/ip_sockglue.c           |  3 +++
 net/ipv4/raw.c                   |  4 ++--
 net/ipv4/udp.c                   | 15 ++++++---------
 net/ipv6/datagram.c              |  5 ++++-
 net/ipv6/inet6_hashtables.c      | 14 ++++++--------
 net/ipv6/ipv6_sockglue.c         |  3 +++
 net/ipv6/raw.c                   |  6 +++---
 net/ipv6/udp.c                   | 14 +++++---------
 15 files changed, 99 insertions(+), 62 deletions(-)

diff --git a/Documentation/networking/vrf.txt b/Documentation/networking/vrf.txt
index 8ff7b4c8f91b..d4b129402d57 100644
--- a/Documentation/networking/vrf.txt
+++ b/Documentation/networking/vrf.txt
@@ -103,6 +103,11 @@ VRF device:
 
 or to specify the output device using cmsg and IP_PKTINFO.
 
+By default the scope of the port bindings for unbound sockets is
+limited to the default VRF. That is, it will not be matched by packets
+arriving on interfaces enslaved to an l3mdev and processes may bind to
+the same port if they bind to an l3mdev.
+
 TCP & UDP services running in the default VRF context (ie., not bound
 to any VRF device) can work across all VRF domains by enabling the
 tcp_l3mdev_accept and udp_l3mdev_accept sysctl options:
@@ -112,10 +117,6 @@ tcp_l3mdev_accept and udp_l3mdev_accept sysctl options:
 netfilter rules on the VRF device can be used to limit access to services
 running in the default VRF context as well.
 
-The default VRF does not have limited scope with respect to port bindings.
-That is, if a process does a wildcard bind to a port in the default VRF it
-owns the port across all VRF domains within the network namespace.
-
 ################################################################################
 
 Using iproute2 for VRFs
diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index 6e91e38a31da..9db98af46985 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -115,9 +115,8 @@ int inet6_hash(struct sock *sk);
 	 ((__sk)->sk_family == AF_INET6)			&&	\
 	 ipv6_addr_equal(&(__sk)->sk_v6_daddr, (__saddr))		&&	\
 	 ipv6_addr_equal(&(__sk)->sk_v6_rcv_saddr, (__daddr))	&&	\
-	 (!(__sk)->sk_bound_dev_if	||				\
-	   ((__sk)->sk_bound_dev_if == (__dif))	||			\
-	   ((__sk)->sk_bound_dev_if == (__sdif)))		&&	\
+	 (((__sk)->sk_bound_dev_if == (__dif))	||			\
+	  ((__sk)->sk_bound_dev_if == (__sdif)))		&&	\
 	 net_eq(sock_net(__sk), (__net)))
 
 #endif /* _INET6_HASHTABLES_H */
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 9141e95529e7..ec279bcd0958 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -79,6 +79,7 @@ struct inet_ehash_bucket {
 
 struct inet_bind_bucket {
 	possible_net_t		ib_net;
+	int			l3mdev;
 	unsigned short		port;
 	signed char		fastreuse;
 	signed char		fastreuseport;
@@ -188,10 +189,18 @@ static inline void inet_ehash_locks_free(struct inet_hashinfo *hashinfo)
 	hashinfo->ehash_locks = NULL;
 }
 
+static inline bool inet_sk_bound_dev_eq(struct net *net, int bound_dev_if,
+					int dif, int sdif)
+{
+	if (!bound_dev_if)
+		return !sdif || net->ipv4.sysctl_tcp_l3mdev_accept;
+	return bound_dev_if == dif || bound_dev_if == sdif;
+}
+
 struct inet_bind_bucket *
 inet_bind_bucket_create(struct kmem_cache *cachep, struct net *net,
 			struct inet_bind_hashbucket *head,
-			const unsigned short snum);
+			const unsigned short snum, int l3mdev);
 void inet_bind_bucket_destroy(struct kmem_cache *cachep,
 			      struct inet_bind_bucket *tb);
 
@@ -282,9 +291,8 @@ static inline struct sock *inet_lookup_listener(struct net *net,
 #define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif, __sdif) \
 	(((__sk)->sk_portpair == (__ports))			&&	\
 	 ((__sk)->sk_addrpair == (__cookie))			&&	\
-	 (!(__sk)->sk_bound_dev_if	||				\
-	   ((__sk)->sk_bound_dev_if == (__dif))			||	\
-	   ((__sk)->sk_bound_dev_if == (__sdif)))		&&	\
+	 (((__sk)->sk_bound_dev_if == (__dif))			||	\
+	  ((__sk)->sk_bound_dev_if == (__sdif)))		&&	\
 	 net_eq(sock_net(__sk), (__net)))
 #else /* 32-bit arch */
 #define INET_ADDR_COOKIE(__name, __saddr, __daddr) \
@@ -294,9 +302,8 @@ static inline struct sock *inet_lookup_listener(struct net *net,
 	(((__sk)->sk_portpair == (__ports))		&&		\
 	 ((__sk)->sk_daddr	== (__saddr))		&&		\
 	 ((__sk)->sk_rcv_saddr	== (__daddr))		&&		\
-	 (!(__sk)->sk_bound_dev_if	||				\
-	   ((__sk)->sk_bound_dev_if == (__dif))		||		\
-	   ((__sk)->sk_bound_dev_if == (__sdif)))	&&		\
+	 (((__sk)->sk_bound_dev_if == (__dif))		||		\
+	  ((__sk)->sk_bound_dev_if == (__sdif)))	&&		\
 	 net_eq(sock_net(__sk), (__net)))
 #endif /* 64-bit arch */
 
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index e03b93360f33..92e0aa3958f6 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -130,6 +130,19 @@ static inline int inet_request_bound_dev_if(const struct sock *sk,
 	return sk->sk_bound_dev_if;
 }
 
+static inline int inet_sk_bound_l3mdev(const struct sock *sk)
+{
+#ifdef CONFIG_NET_L3_MASTER_DEV
+	struct net *net = sock_net(sk);
+
+	if (!net->ipv4.sysctl_tcp_l3mdev_accept)
+		return l3mdev_master_ifindex_by_index(net,
+						      sk->sk_bound_dev_if);
+#endif
+
+	return 0;
+}
+
 static inline struct ip_options_rcu *ireq_opt_deref(const struct inet_request_sock *ireq)
 {
 	return rcu_dereference_check(ireq->ireq_opt,
diff --git a/net/core/sock.c b/net/core/sock.c
index 3730eb855095..da1cbb88a6bf 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -567,6 +567,8 @@ static int sock_setbindtodevice(struct sock *sk, char __user *optval,
 
 	lock_sock(sk);
 	sk->sk_bound_dev_if = index;
+	if (sk->sk_prot->rehash)
+		sk->sk_prot->rehash(sk);
 	sk_dst_reset(sk);
 	release_sock(sk);
 
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index dfd5009f96ef..97bba5b3d69f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -183,7 +183,9 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
 	int i, low, high, attempt_half;
 	struct inet_bind_bucket *tb;
 	u32 remaining, offset;
+	int l3mdev;
 
+	l3mdev = inet_sk_bound_l3mdev(sk);
 	attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
 	inet_get_local_port_range(net, &low, &high);
@@ -219,7 +221,8 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
 						  hinfo->bhash_size)];
 		spin_lock_bh(&head->lock);
 		inet_bind_bucket_for_each(tb, &head->chain)
-			if (net_eq(ib_net(tb), net) && tb->port == port) {
+			if (net_eq(ib_net(tb), net) && tb->l3mdev == l3mdev &&
+			    tb->port == port) {
 				if (!inet_csk_bind_conflict(sk, tb, false, false))
 					goto success;
 				goto next_port;
@@ -293,6 +296,9 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 	struct net *net = sock_net(sk);
 	struct inet_bind_bucket *tb = NULL;
 	kuid_t uid = sock_i_uid(sk);
+	int l3mdev;
+
+	l3mdev = inet_sk_bound_l3mdev(sk);
 
 	if (!port) {
 		head = inet_csk_find_open_port(sk, &tb, &port);
@@ -306,11 +312,12 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 					  hinfo->bhash_size)];
 	spin_lock_bh(&head->lock);
 	inet_bind_bucket_for_each(tb, &head->chain)
-		if (net_eq(ib_net(tb), net) && tb->port == port)
+		if (net_eq(ib_net(tb), net) && tb->l3mdev == l3mdev &&
+		    tb->port == port)
 			goto tb_found;
 tb_not_found:
 	tb = inet_bind_bucket_create(hinfo->bind_bucket_cachep,
-				     net, head, port);
+				     net, head, port, l3mdev);
 	if (!tb)
 		goto fail_unlock;
 tb_found:
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index f5c9ef2586de..2ec684057ebd 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -65,12 +65,14 @@ static u32 sk_ehashfn(const struct sock *sk)
 struct inet_bind_bucket *inet_bind_bucket_create(struct kmem_cache *cachep,
 						 struct net *net,
 						 struct inet_bind_hashbucket *head,
-						 const unsigned short snum)
+						 const unsigned short snum,
+						 int l3mdev)
 {
 	struct inet_bind_bucket *tb = kmem_cache_alloc(cachep, GFP_ATOMIC);
 
 	if (tb) {
 		write_pnet(&tb->ib_net, net);
+		tb->l3mdev    = l3mdev;
 		tb->port      = snum;
 		tb->fastreuse = 0;
 		tb->fastreuseport = 0;
@@ -135,6 +137,7 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child)
 			table->bhash_size);
 	struct inet_bind_hashbucket *head = &table->bhash[bhash];
 	struct inet_bind_bucket *tb;
+	int l3mdev;
 
 	spin_lock(&head->lock);
 	tb = inet_csk(sk)->icsk_bind_hash;
@@ -143,6 +146,8 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child)
 		return -ENOENT;
 	}
 	if (tb->port != port) {
+		l3mdev = inet_sk_bound_l3mdev(sk);
+
 		/* NOTE: using tproxy and redirecting skbs to a proxy
 		 * on a different listener port breaks the assumption
 		 * that the listener socket's icsk_bind_hash is the same
@@ -150,12 +155,13 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child)
 		 * create a new bind bucket for the child here. */
 		inet_bind_bucket_for_each(tb, &head->chain) {
 			if (net_eq(ib_net(tb), sock_net(sk)) &&
-			    tb->port == port)
+			    tb->l3mdev == l3mdev && tb->port == port)
 				break;
 		}
 		if (!tb) {
 			tb = inet_bind_bucket_create(table->bind_bucket_cachep,
-						     sock_net(sk), head, port);
+						     sock_net(sk), head, port,
+						     l3mdev);
 			if (!tb) {
 				spin_unlock(&head->lock);
 				return -ENOMEM;
@@ -229,6 +235,7 @@ static inline int compute_score(struct sock *sk, struct net *net,
 {
 	int score = -1;
 	struct inet_sock *inet = inet_sk(sk);
+	bool dev_match;
 
 	if (net_eq(sock_net(sk), net) && inet->inet_num == hnum &&
 			!ipv6_only_sock(sk)) {
@@ -239,15 +246,12 @@ static inline int compute_score(struct sock *sk, struct net *net,
 				return -1;
 			score += 4;
 		}
-		if (sk->sk_bound_dev_if || exact_dif) {
-			bool dev_match = (sk->sk_bound_dev_if == dif ||
-					  sk->sk_bound_dev_if == sdif);
+		dev_match = inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if,
+						 dif, sdif);
+		if (!dev_match)
+			return -1;
+		score += 4;
 
-			if (!dev_match)
-				return -1;
-			if (sk->sk_bound_dev_if)
-				score += 4;
-		}
 		if (sk->sk_incoming_cpu == raw_smp_processor_id())
 			score++;
 	}
@@ -675,6 +679,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	u32 remaining, offset;
 	int ret, i, low, high;
 	static u32 hint;
+	int l3mdev;
 
 	if (port) {
 		head = &hinfo->bhash[inet_bhashfn(net, port,
@@ -693,6 +698,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 		return ret;
 	}
 
+	l3mdev = inet_sk_bound_l3mdev(sk);
+
 	inet_get_local_port_range(net, &low, &high);
 	high++; /* [32768, 60999] -> [32768, 61000[ */
 	remaining = high - low;
@@ -719,7 +726,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 		 * the established check is already unique enough.
 		 */
 		inet_bind_bucket_for_each(tb, &head->chain) {
-			if (net_eq(ib_net(tb), net) && tb->port == port) {
+			if (net_eq(ib_net(tb), net) && tb->l3mdev == l3mdev &&
+			    tb->port == port) {
 				if (tb->fastreuse >= 0 ||
 				    tb->fastreuseport >= 0)
 					goto next_port;
@@ -732,7 +740,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 		}
 
 		tb = inet_bind_bucket_create(hinfo->bind_bucket_cachep,
-					     net, head, port);
+					     net, head, port, l3mdev);
 		if (!tb) {
 			spin_unlock_bh(&head->lock);
 			return -ENOMEM;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index c0fe5ad996f2..026971314c43 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -892,6 +892,9 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 		dev_put(dev);
 
 		err = -EINVAL;
+		if (!sk->sk_bound_dev_if && midx)
+			break;
+
 		if (sk->sk_bound_dev_if &&
 		    mreq.imr_ifindex != sk->sk_bound_dev_if &&
 		    (!midx || midx != sk->sk_bound_dev_if))
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 33df4d76db2d..8a0d568d7aec 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -70,6 +70,7 @@
 #include <net/snmp.h>
 #include <net/tcp_states.h>
 #include <net/inet_common.h>
+#include <net/inet_hashtables.h>
 #include <net/checksum.h>
 #include <net/xfrm.h>
 #include <linux/rtnetlink.h>
@@ -131,8 +132,7 @@ struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
 		if (net_eq(sock_net(sk), net) && inet->inet_num == num	&&
 		    !(inet->inet_daddr && inet->inet_daddr != raddr) 	&&
 		    !(inet->inet_rcv_saddr && inet->inet_rcv_saddr != laddr) &&
-		    !(sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif &&
-		      sk->sk_bound_dev_if != sdif))
+		    inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if, dif, sdif))
 			goto found; /* gotcha */
 	}
 	sk = NULL;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f4e35b2ff8b8..3d59ab47a85d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -371,6 +371,7 @@ static int compute_score(struct sock *sk, struct net *net,
 {
 	int score;
 	struct inet_sock *inet;
+	bool dev_match;
 
 	if (!net_eq(sock_net(sk), net) ||
 	    udp_sk(sk)->udp_port_hash != hnum ||
@@ -398,15 +399,11 @@ static int compute_score(struct sock *sk, struct net *net,
 		score += 4;
 	}
 
-	if (sk->sk_bound_dev_if || exact_dif) {
-		bool dev_match = (sk->sk_bound_dev_if == dif ||
-				  sk->sk_bound_dev_if == sdif);
-
-		if (!dev_match)
-			return -1;
-		if (sk->sk_bound_dev_if)
-			score += 4;
-	}
+	dev_match = inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if,
+					 dif, sdif);
+	if (!dev_match)
+		return -1;
+	score += 4;
 
 	if (sk->sk_incoming_cpu == raw_smp_processor_id())
 		score++;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 1ede7a16a0be..4813293d4fad 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -782,7 +782,10 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 
 			if (src_info->ipi6_ifindex) {
 				if (fl6->flowi6_oif &&
-				    src_info->ipi6_ifindex != fl6->flowi6_oif)
+				    src_info->ipi6_ifindex != fl6->flowi6_oif &&
+				    (sk->sk_bound_dev_if != fl6->flowi6_oif ||
+				     !sk_dev_equal_l3scope(
+					     sk, src_info->ipi6_ifindex)))
 					return -EINVAL;
 				fl6->flowi6_oif = src_info->ipi6_ifindex;
 			}
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 3d7c7460a0c5..5eeeba7181a1 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -99,6 +99,7 @@ static inline int compute_score(struct sock *sk, struct net *net,
 				const int dif, const int sdif, bool exact_dif)
 {
 	int score = -1;
+	bool dev_match;
 
 	if (net_eq(sock_net(sk), net) && inet_sk(sk)->inet_num == hnum &&
 	    sk->sk_family == PF_INET6) {
@@ -109,15 +110,12 @@ static inline int compute_score(struct sock *sk, struct net *net,
 				return -1;
 			score++;
 		}
-		if (sk->sk_bound_dev_if || exact_dif) {
-			bool dev_match = (sk->sk_bound_dev_if == dif ||
-					  sk->sk_bound_dev_if == sdif);
+		dev_match = inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if,
+						 dif, sdif);
+		if (!dev_match)
+			return -1;
+		score++;
 
-			if (!dev_match)
-				return -1;
-			if (sk->sk_bound_dev_if)
-				score++;
-		}
 		if (sk->sk_incoming_cpu == raw_smp_processor_id())
 			score++;
 	}
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index c0cac9cc3a28..7dfbc797b130 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -626,6 +626,9 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 
 			rcu_read_unlock();
 
+			if (!sk->sk_bound_dev_if && midx)
+				goto e_inval;
+
 			if (sk->sk_bound_dev_if &&
 			    sk->sk_bound_dev_if != val &&
 			    (!midx || midx != sk->sk_bound_dev_if))
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 413d98bf24f4..2f61a9f1a2b2 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -49,6 +49,7 @@
 #include <net/transp_v6.h>
 #include <net/udp.h>
 #include <net/inet_common.h>
+#include <net/inet_hashtables.h>
 #include <net/tcp_states.h>
 #if IS_ENABLED(CONFIG_IPV6_MIP6)
 #include <net/mip6.h>
@@ -86,9 +87,8 @@ struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
 			    !ipv6_addr_equal(&sk->sk_v6_daddr, rmt_addr))
 				continue;
 
-			if (sk->sk_bound_dev_if &&
-			    sk->sk_bound_dev_if != dif &&
-			    sk->sk_bound_dev_if != sdif)
+			if (!inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if,
+						  dif, sdif))
 				continue;
 
 			if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr)) {
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 83f4c77c79d8..e22b7dd78c9b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -117,6 +117,7 @@ static int compute_score(struct sock *sk, struct net *net,
 {
 	int score;
 	struct inet_sock *inet;
+	bool dev_match;
 
 	if (!net_eq(sock_net(sk), net) ||
 	    udp_sk(sk)->udp_port_hash != hnum ||
@@ -144,15 +145,10 @@ static int compute_score(struct sock *sk, struct net *net,
 		score++;
 	}
 
-	if (sk->sk_bound_dev_if || exact_dif) {
-		bool dev_match = (sk->sk_bound_dev_if == dif ||
-				  sk->sk_bound_dev_if == sdif);
-
-		if (!dev_match)
-			return -1;
-		if (sk->sk_bound_dev_if)
-			score++;
-	}
+	dev_match = inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if, dif, sdif);
+	if (!dev_match)
+		return -1;
+	score++;
 
 	if (sk->sk_incoming_cpu == raw_smp_processor_id())
 		score++;
-- 
2.11.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next 2/5] ipv6: allow link-local and multicast packets inside vrf
  2018-09-20  8:58 [PATCH net-next 0/5] vrf: allow simultaneous service instances in default and other VRFs Mike Manning
  2018-09-20  8:58 ` [PATCH net-next 1/5] net: allow binding socket in a VRF when there's an unbound socket Mike Manning
@ 2018-09-20  8:58 ` Mike Manning
  2018-09-20  8:58 ` [PATCH net-next 3/5] ipv4: Allow sending multicast packets on specific i/f using VRF socket Mike Manning
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Mike Manning @ 2018-09-20  8:58 UTC (permalink / raw)
  To: netdev

Packets that are multicast or to link-local addresses are not enslaved
to the vrf of the socket that they are received on. This is needed for
NDISC, but breaks applications that rely on receiving such packets when
in a VRF. Also to make IPv6 consistent with IPv4 which does handle
multicast packets as being enslaved, modify the VRF driver to do the
same for IPv6. As a result, the multicast address check needs to verify
the address against the enslaved rather than the l3mdev device.

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
---
 drivers/net/vrf.c        | 19 +++++++++----------
 net/ipv6/ip6_input.c     | 19 ++++++++++++++++++-
 net/ipv6/ipv6_sockglue.c |  2 +-
 3 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index f93547f257fb..9d817c19f3b4 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -981,24 +981,23 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
 				   struct sk_buff *skb)
 {
 	int orig_iif = skb->skb_iif;
-	bool need_strict;
+	bool need_strict = rt6_need_strict(&ipv6_hdr(skb)->daddr);
+	bool is_ndisc = ipv6_ndisc_frame(skb);
 
-	/* loopback traffic; do not push through packet taps again.
-	 * Reset pkt_type for upper layers to process skb
+	/* loopback, multicast & non-ND link-local traffic; do not push through
+	 * packet taps again. Reset pkt_type for upper layers to process skb
 	 */
-	if (skb->pkt_type == PACKET_LOOPBACK) {
+	if (skb->pkt_type == PACKET_LOOPBACK || (need_strict && !is_ndisc)) {
 		skb->dev = vrf_dev;
 		skb->skb_iif = vrf_dev->ifindex;
 		IP6CB(skb)->flags |= IP6SKB_L3SLAVE;
-		skb->pkt_type = PACKET_HOST;
+		if (skb->pkt_type == PACKET_LOOPBACK)
+			skb->pkt_type = PACKET_HOST;
 		goto out;
 	}
 
-	/* if packet is NDISC or addressed to multicast or link-local
-	 * then keep the ingress interface
-	 */
-	need_strict = rt6_need_strict(&ipv6_hdr(skb)->daddr);
-	if (!ipv6_ndisc_frame(skb) && !need_strict) {
+	/* if packet is NDISC then keep the ingress interface */
+	if (!is_ndisc) {
 		vrf_rx_stats(vrf_dev, skb->len);
 		skb->dev = vrf_dev;
 		skb->skb_iif = vrf_dev->ifindex;
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 96577e742afd..108f5f88ec98 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -432,15 +432,32 @@ EXPORT_SYMBOL_GPL(ip6_input);
 
 int ip6_mc_input(struct sk_buff *skb)
 {
+	int sdif = inet6_sdif(skb);
 	const struct ipv6hdr *hdr;
+	struct net_device *dev;
 	bool deliver;
 
 	__IP6_UPD_PO_STATS(dev_net(skb_dst(skb)->dev),
 			 __in6_dev_get_safely(skb->dev), IPSTATS_MIB_INMCAST,
 			 skb->len);
 
+	/* skb->dev passed may be master dev for vrfs. */
+	if (sdif) {
+		rcu_read_lock();
+		dev = dev_get_by_index_rcu(dev_net(skb->dev), sdif);
+		if (!dev) {
+			rcu_read_unlock();
+			kfree_skb(skb);
+			return -ENODEV;
+		}
+	} else {
+		dev = skb->dev;
+	}
+
 	hdr = ipv6_hdr(skb);
-	deliver = ipv6_chk_mcast_addr(skb->dev, &hdr->daddr, NULL);
+	deliver = ipv6_chk_mcast_addr(dev, &hdr->daddr, NULL);
+	if (sdif)
+		rcu_read_unlock();
 
 #ifdef CONFIG_IPV6_MROUTE
 	/*
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 7dfbc797b130..4ebd395dd3df 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -486,7 +486,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 				retv = -EFAULT;
 				break;
 		}
-		if (sk->sk_bound_dev_if && pkt.ipi6_ifindex != sk->sk_bound_dev_if)
+		if (!sk_dev_equal_l3scope(sk, pkt.ipi6_ifindex))
 			goto e_inval;
 
 		np->sticky_pktinfo.ipi6_ifindex = pkt.ipi6_ifindex;
-- 
2.11.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next 3/5] ipv4: Allow sending multicast packets on specific i/f using VRF socket
  2018-09-20  8:58 [PATCH net-next 0/5] vrf: allow simultaneous service instances in default and other VRFs Mike Manning
  2018-09-20  8:58 ` [PATCH net-next 1/5] net: allow binding socket in a VRF when there's an unbound socket Mike Manning
  2018-09-20  8:58 ` [PATCH net-next 2/5] ipv6: allow link-local and multicast packets inside vrf Mike Manning
@ 2018-09-20  8:58 ` Mike Manning
  2018-09-20  8:58 ` [PATCH net-next 4/5] ipv6: do not drop vrf udp multicast packets Mike Manning
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Mike Manning @ 2018-09-20  8:58 UTC (permalink / raw)
  To: netdev; +Cc: Robert Shearman

From: Robert Shearman <rshearma@vyatta.att-mail.com>

It is useful to be able to use the same socket for listening in a
specific VRF, as for sending multicast packets out of a specific
interface. However, the bound device on the socket currently takes
precedence and results in the packets not being sent.

Relax the condition on overriding the output interface to use for
sending packets out of UDP, raw and ping sockets to allow multicast
packets to be sent using the specified multicast interface.

Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
---
 net/ipv4/datagram.c | 2 +-
 net/ipv4/ping.c     | 2 +-
 net/ipv4/raw.c      | 2 +-
 net/ipv4/udp.c      | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index f915abff1350..300921417f89 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -42,7 +42,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	oif = sk->sk_bound_dev_if;
 	saddr = inet->inet_saddr;
 	if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
-		if (!oif)
+		if (!oif || netif_index_is_l3_master(sock_net(sk), oif))
 			oif = inet->mc_index;
 		if (!saddr)
 			saddr = inet->mc_addr;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 8d7aaf118a30..7ccb5f87f70b 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -779,7 +779,7 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	}
 
 	if (ipv4_is_multicast(daddr)) {
-		if (!ipc.oif)
+		if (!ipc.oif || netif_index_is_l3_master(sock_net(sk), ipc.oif))
 			ipc.oif = inet->mc_index;
 		if (!saddr)
 			saddr = inet->mc_addr;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 8a0d568d7aec..c55ef53d87a8 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -608,7 +608,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		tos |= RTO_ONLINK;
 
 	if (ipv4_is_multicast(daddr)) {
-		if (!ipc.oif)
+		if (!ipc.oif || netif_index_is_l3_master(sock_net(sk), ipc.oif))
 			ipc.oif = inet->mc_index;
 		if (!saddr)
 			saddr = inet->mc_addr;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 3d59ab47a85d..f81097843031 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1039,7 +1039,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	}
 
 	if (ipv4_is_multicast(daddr)) {
-		if (!ipc.oif)
+		if (!ipc.oif || netif_index_is_l3_master(sock_net(sk), ipc.oif))
 			ipc.oif = inet->mc_index;
 		if (!saddr)
 			saddr = inet->mc_addr;
-- 
2.11.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next 4/5] ipv6: do not drop vrf udp multicast packets
  2018-09-20  8:58 [PATCH net-next 0/5] vrf: allow simultaneous service instances in default and other VRFs Mike Manning
                   ` (2 preceding siblings ...)
  2018-09-20  8:58 ` [PATCH net-next 3/5] ipv4: Allow sending multicast packets on specific i/f using VRF socket Mike Manning
@ 2018-09-20  8:58 ` Mike Manning
  2018-09-20 13:02   ` Paolo Abeni
  2018-09-20  8:58 ` [PATCH net-next 5/5] ipv6: add vrf table handling code for ipv6 mcast Mike Manning
  2018-09-21  4:28 ` [PATCH net-next 0/5] vrf: allow simultaneous service instances in default and other VRFs David Ahern
  5 siblings, 1 reply; 12+ messages in thread
From: Mike Manning @ 2018-09-20  8:58 UTC (permalink / raw)
  To: netdev; +Cc: Dewi Morgan

From: Dewi Morgan <morgand@vyatta.att-mail.com>

For bound udp sockets in a vrf, also check the sdif to get the index
for ingress devices enslaved to an l3mdev. Verify the multicast address
against the enslaved rather than the l3mdev device.

Signed-off-by: Dewi Morgan <morgand@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
---
 net/ipv6/ip6_input.c | 27 ++++++++++++++++++++++++---
 net/ipv6/udp.c       |  8 +++++---
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 108f5f88ec98..fc60f297d95b 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -325,9 +325,12 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
 {
 	const struct inet6_protocol *ipprot;
 	struct inet6_dev *idev;
+	struct net_device *dev;
 	unsigned int nhoff;
+	int sdif = inet6_sdif(skb);
 	int nexthdr;
 	bool raw;
+	bool deliver;
 	bool have_final = false;
 
 	/*
@@ -371,9 +374,27 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
 			skb_postpull_rcsum(skb, skb_network_header(skb),
 					   skb_network_header_len(skb));
 			hdr = ipv6_hdr(skb);
-			if (ipv6_addr_is_multicast(&hdr->daddr) &&
-			    !ipv6_chk_mcast_addr(skb->dev, &hdr->daddr,
-			    &hdr->saddr) &&
+
+			/* skb->dev passed may be master dev for vrfs. */
+			if (sdif) {
+				rcu_read_lock();
+				dev = dev_get_by_index_rcu(dev_net(skb->dev),
+							   sdif);
+				if (!dev) {
+					rcu_read_unlock();
+					kfree_skb(skb);
+					return -ENODEV;
+				}
+			} else {
+				dev = skb->dev;
+			}
+
+			deliver = ipv6_chk_mcast_addr(dev, &hdr->daddr,
+						      &hdr->saddr);
+			if (sdif)
+				rcu_read_unlock();
+
+			if (ipv6_addr_is_multicast(&hdr->daddr) && !deliver &&
 			    !ipv6_is_mld(skb, nexthdr, skb_network_header_len(skb)))
 				goto discard;
 		}
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index e22b7dd78c9b..35f71b7a1070 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -637,7 +637,7 @@ static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 static bool __udp_v6_is_mcast_sock(struct net *net, struct sock *sk,
 				   __be16 loc_port, const struct in6_addr *loc_addr,
 				   __be16 rmt_port, const struct in6_addr *rmt_addr,
-				   int dif, unsigned short hnum)
+				   int dif, int sdif, unsigned short hnum)
 {
 	struct inet_sock *inet = inet_sk(sk);
 
@@ -649,7 +649,7 @@ static bool __udp_v6_is_mcast_sock(struct net *net, struct sock *sk,
 	    (inet->inet_dport && inet->inet_dport != rmt_port) ||
 	    (!ipv6_addr_any(&sk->sk_v6_daddr) &&
 		    !ipv6_addr_equal(&sk->sk_v6_daddr, rmt_addr)) ||
-	    (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif) ||
+	    !inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if, dif, sdif) ||
 	    (!ipv6_addr_any(&sk->sk_v6_rcv_saddr) &&
 		    !ipv6_addr_equal(&sk->sk_v6_rcv_saddr, loc_addr)))
 		return false;
@@ -683,6 +683,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	unsigned int offset = offsetof(typeof(*sk), sk_node);
 	unsigned int hash2 = 0, hash2_any = 0, use_hash2 = (hslot->count > 10);
 	int dif = inet6_iif(skb);
+	int sdif = inet6_sdif(skb);
 	struct hlist_node *node;
 	struct sk_buff *nskb;
 
@@ -697,7 +698,8 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 
 	sk_for_each_entry_offset_rcu(sk, node, &hslot->head, offset) {
 		if (!__udp_v6_is_mcast_sock(net, sk, uh->dest, daddr,
-					    uh->source, saddr, dif, hnum))
+					    uh->source, saddr, dif, sdif,
+					    hnum))
 			continue;
 		/* If zero checksum and no_check is not on for
 		 * the socket then skip it.
-- 
2.11.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next 5/5] ipv6: add vrf table handling code for ipv6 mcast
  2018-09-20  8:58 [PATCH net-next 0/5] vrf: allow simultaneous service instances in default and other VRFs Mike Manning
                   ` (3 preceding siblings ...)
  2018-09-20  8:58 ` [PATCH net-next 4/5] ipv6: do not drop vrf udp multicast packets Mike Manning
@ 2018-09-20  8:58 ` Mike Manning
  2018-09-21  4:28 ` [PATCH net-next 0/5] vrf: allow simultaneous service instances in default and other VRFs David Ahern
  5 siblings, 0 replies; 12+ messages in thread
From: Mike Manning @ 2018-09-20  8:58 UTC (permalink / raw)
  To: netdev; +Cc: Patrick Ruddy

From: Patrick Ruddy <pruddy@vyatta.att-mail.com>

The code to obtain the correct table for the incoming interface was
missing for IPv6. This has been added along with the table creation
notification to fib rules for the RTNL_FAMILY_IP6MR address family.

Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
---
 drivers/net/vrf.c | 11 +++++++++++
 net/ipv6/ip6mr.c  | 49 +++++++++++++++++++++++++++++++++++++------------
 2 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 9d817c19f3b4..21ad4b1d7f03 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1214,8 +1214,19 @@ static int vrf_add_fib_rules(const struct net_device *dev)
 		goto ipmr_err;
 #endif
 
+#if IS_ENABLED(CONFIG_IPV6_MROUTE_MULTIPLE_TABLES)
+	err = vrf_fib_rule(dev, RTNL_FAMILY_IP6MR, true);
+	if (err < 0)
+		goto ip6mr_err;
+#endif
+
 	return 0;
 
+#if IS_ENABLED(CONFIG_IPV6_MROUTE_MULTIPLE_TABLES)
+ip6mr_err:
+	vrf_fib_rule(dev, RTNL_FAMILY_IPMR,  false);
+#endif
+
 #if IS_ENABLED(CONFIG_IP_MROUTE_MULTIPLE_TABLES)
 ipmr_err:
 	vrf_fib_rule(dev, AF_INET6,  false);
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index d0b7e0249c13..1ecc88456dc5 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -85,7 +85,8 @@ static struct mr_table *ip6mr_new_table(struct net *net, u32 id);
 static void ip6mr_free_table(struct mr_table *mrt);
 
 static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
-			   struct sk_buff *skb, struct mfc6_cache *cache);
+			   struct net_device *dev, struct sk_buff *skb,
+			   struct mfc6_cache *cache);
 static int ip6mr_cache_report(struct mr_table *mrt, struct sk_buff *pkt,
 			      mifi_t mifi, int assert);
 static void mr6_netlink_event(struct mr_table *mrt, struct mfc6_cache *mfc,
@@ -138,6 +139,9 @@ static int ip6mr_fib_lookup(struct net *net, struct flowi6 *flp6,
 		.flags = FIB_LOOKUP_NOREF,
 	};
 
+	/* update flow if oif or iif point to device enslaved to l3mdev */
+	l3mdev_update_flow(net, flowi6_to_flowi(flp6));
+
 	err = fib_rules_lookup(net->ipv6.mr6_rules_ops,
 			       flowi6_to_flowi(flp6), 0, &arg);
 	if (err < 0)
@@ -164,7 +168,9 @@ static int ip6mr_rule_action(struct fib_rule *rule, struct flowi *flp,
 		return -EINVAL;
 	}
 
-	mrt = ip6mr_get_table(rule->fr_net, rule->table);
+	arg->table = fib_rule_get_table(rule, arg);
+
+	mrt = ip6mr_get_table(rule->fr_net, arg->table);
 	if (!mrt)
 		return -EAGAIN;
 	res->mrt = mrt;
@@ -1014,7 +1020,7 @@ static void ip6mr_cache_resolve(struct net *net, struct mr_table *mrt,
 			}
 			rtnl_unicast(skb, net, NETLINK_CB(skb).portid);
 		} else
-			ip6_mr_forward(net, mrt, skb, c);
+			ip6_mr_forward(net, mrt, skb->dev, skb, c);
 	}
 }
 
@@ -1120,7 +1126,7 @@ static int ip6mr_cache_report(struct mr_table *mrt, struct sk_buff *pkt,
 
 /* Queue a packet for resolution. It gets locked cache entry! */
 static int ip6mr_cache_unresolved(struct mr_table *mrt, mifi_t mifi,
-				  struct sk_buff *skb)
+				  struct sk_buff *skb, struct net_device *dev)
 {
 	struct mfc6_cache *c;
 	bool found = false;
@@ -1180,6 +1186,10 @@ static int ip6mr_cache_unresolved(struct mr_table *mrt, mifi_t mifi,
 		kfree_skb(skb);
 		err = -ENOBUFS;
 	} else {
+		if (dev) {
+			skb->dev = dev;
+			skb->skb_iif = dev->ifindex;
+		}
 		skb_queue_tail(&c->_c.mfc_un.unres.unresolved, skb);
 		err = 0;
 	}
@@ -2043,11 +2053,12 @@ static int ip6mr_find_vif(struct mr_table *mrt, struct net_device *dev)
 }
 
 static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
-			   struct sk_buff *skb, struct mfc6_cache *c)
+			   struct net_device *dev, struct sk_buff *skb,
+			   struct mfc6_cache *c)
 {
 	int psend = -1;
 	int vif, ct;
-	int true_vifi = ip6mr_find_vif(mrt, skb->dev);
+	int true_vifi = ip6mr_find_vif(mrt, dev);
 
 	vif = c->_c.mfc_parent;
 	c->_c.mfc_un.res.pkt++;
@@ -2073,7 +2084,7 @@ static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
 	/*
 	 * Wrong interface: drop packet and (maybe) send PIM assert.
 	 */
-	if (mrt->vif_table[vif].dev != skb->dev) {
+	if (mrt->vif_table[vif].dev != dev) {
 		c->_c.mfc_un.res.wrong_if++;
 
 		if (true_vifi >= 0 && mrt->mroute_do_assert &&
@@ -2146,6 +2157,7 @@ static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
 
 int ip6_mr_input(struct sk_buff *skb)
 {
+	struct rtable *rt = skb_rtable(skb);
 	struct mfc6_cache *cache;
 	struct net *net = dev_net(skb->dev);
 	struct mr_table *mrt;
@@ -2154,6 +2166,19 @@ int ip6_mr_input(struct sk_buff *skb)
 		.flowi6_mark	= skb->mark,
 	};
 	int err;
+	struct net_device *dev;
+
+	/* skb->dev passed in is the master dev for vrfs.
+	 * Get the proper interface that does have a vif associated with it.
+	 */
+	dev = skb->dev;
+	if (netif_is_l3_master(skb->dev)) {
+		dev = dev_get_by_index_rcu(net, IPCB(skb)->iif);
+		if (!dev) {
+			kfree_skb(skb);
+			return -ENODEV;
+		}
+	}
 
 	err = ip6mr_fib_lookup(net, &fl6, &mrt);
 	if (err < 0) {
@@ -2165,7 +2190,7 @@ int ip6_mr_input(struct sk_buff *skb)
 	cache = ip6mr_cache_find(mrt,
 				 &ipv6_hdr(skb)->saddr, &ipv6_hdr(skb)->daddr);
 	if (!cache) {
-		int vif = ip6mr_find_vif(mrt, skb->dev);
+		int vif = ip6mr_find_vif(mrt, dev);
 
 		if (vif >= 0)
 			cache = ip6mr_cache_find_any(mrt,
@@ -2179,9 +2204,9 @@ int ip6_mr_input(struct sk_buff *skb)
 	if (!cache) {
 		int vif;
 
-		vif = ip6mr_find_vif(mrt, skb->dev);
+		vif = ip6mr_find_vif(mrt, dev);
 		if (vif >= 0) {
-			int err = ip6mr_cache_unresolved(mrt, vif, skb);
+			int err = ip6mr_cache_unresolved(mrt, vif, skb, dev);
 			read_unlock(&mrt_lock);
 
 			return err;
@@ -2191,7 +2216,7 @@ int ip6_mr_input(struct sk_buff *skb)
 		return -ENODEV;
 	}
 
-	ip6_mr_forward(net, mrt, skb, cache);
+	ip6_mr_forward(net, mrt, dev, skb, cache);
 
 	read_unlock(&mrt_lock);
 
@@ -2257,7 +2282,7 @@ int ip6mr_get_route(struct net *net, struct sk_buff *skb, struct rtmsg *rtm,
 		iph->saddr = rt->rt6i_src.addr;
 		iph->daddr = rt->rt6i_dst.addr;
 
-		err = ip6mr_cache_unresolved(mrt, vif, skb2);
+		err = ip6mr_cache_unresolved(mrt, vif, skb2, dev);
 		read_unlock(&mrt_lock);
 
 		return err;
-- 
2.11.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 4/5] ipv6: do not drop vrf udp multicast packets
  2018-09-20  8:58 ` [PATCH net-next 4/5] ipv6: do not drop vrf udp multicast packets Mike Manning
@ 2018-09-20 13:02   ` Paolo Abeni
  2018-09-20 16:50     ` Mike Manning
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Abeni @ 2018-09-20 13:02 UTC (permalink / raw)
  To: Mike Manning, netdev; +Cc: Dewi Morgan

Hi,

On Thu, 2018-09-20 at 09:58 +0100, Mike Manning wrote:
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 108f5f88ec98..fc60f297d95b 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -325,9 +325,12 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
>  {
>  	const struct inet6_protocol *ipprot;
>  	struct inet6_dev *idev;
> +	struct net_device *dev;
>  	unsigned int nhoff;
> +	int sdif = inet6_sdif(skb);
>  	int nexthdr;
>  	bool raw;
> +	bool deliver;
>  	bool have_final = false;

Please, try instead to sort the variable in reverse x-mas tree order.
>  
>  	/*
> @@ -371,9 +374,27 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
>  			skb_postpull_rcsum(skb, skb_network_header(skb),
>  					   skb_network_header_len(skb));
>  			hdr = ipv6_hdr(skb);
> -			if (ipv6_addr_is_multicast(&hdr->daddr) &&
> -			    !ipv6_chk_mcast_addr(skb->dev, &hdr->daddr,
> -			    &hdr->saddr) &&
> +
> +			/* skb->dev passed may be master dev for vrfs. */
> +			if (sdif) {
> +				rcu_read_lock();

AFAICS, the rcu lock is already acquired at the beginning of
ip6_input_finish(), not need to acquire it here again.
+				dev = dev_get_by_index_rcu(dev_net(skb->dev),
> +							   sdif);
> +				if (!dev) {
> +					rcu_read_unlock();
> +					kfree_skb(skb);
> +					return -ENODEV;
> +				}
> +			} else {
> +				dev = skb->dev;

The above fragment of code is a recurring pattern in this series,
perhaps adding an helper for it would reduce code duplication ?

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 4/5] ipv6: do not drop vrf udp multicast packets
  2018-09-20 13:02   ` Paolo Abeni
@ 2018-09-20 16:50     ` Mike Manning
  0 siblings, 0 replies; 12+ messages in thread
From: Mike Manning @ 2018-09-20 16:50 UTC (permalink / raw)
  To: Paolo Abeni, netdev; +Cc: Dewi Morgan

On 20/09/2018 14:02, Paolo Abeni wrote:
> Hi,
>
> On Thu, 2018-09-20 at 09:58 +0100, Mike Manning wrote:
>> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
>> index 108f5f88ec98..fc60f297d95b 100644
>> --- a/net/ipv6/ip6_input.c
>> +++ b/net/ipv6/ip6_input.c
>> @@ -325,9 +325,12 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
>>  {
>>  	const struct inet6_protocol *ipprot;
>>  	struct inet6_dev *idev;
>> +	struct net_device *dev;
>>  	unsigned int nhoff;
>> +	int sdif = inet6_sdif(skb);
>>  	int nexthdr;
>>  	bool raw;
>> +	bool deliver;
>>  	bool have_final = false;
> Please, try instead to sort the variable in reverse x-mas tree order.
Will do.
>>  
>>  	/*
>> @@ -371,9 +374,27 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
>>  			skb_postpull_rcsum(skb, skb_network_header(skb),
>>  					   skb_network_header_len(skb));
>>  			hdr = ipv6_hdr(skb);
>> -			if (ipv6_addr_is_multicast(&hdr->daddr) &&
>> -			    !ipv6_chk_mcast_addr(skb->dev, &hdr->daddr,
>> -			    &hdr->saddr) &&
>> +
>> +			/* skb->dev passed may be master dev for vrfs. */
>> +			if (sdif) {
>> +				rcu_read_lock();
> AFAICS, the rcu lock is already acquired at the beginning of
> ip6_input_finish(), not need to acquire it here again.
Nice catch, I will remove this.
> +				dev = dev_get_by_index_rcu(dev_net(skb->dev),
>> +							   sdif);
>> +				if (!dev) {
>> +					rcu_read_unlock();
>> +					kfree_skb(skb);
>> +					return -ENODEV;
>> +				}
>> +			} else {
>> +				dev = skb->dev;
> The above fragment of code is a recurring pattern in this series,
> perhaps adding an helper for it would reduce code duplication ?

This pattern of checking the secondary device index is used only twice, both in this file.

But with now one instance having the rcu lock handling, and the other not, I cannot refactor this.
>
> Cheers,
>
> Paolo
>
Thanks for the review! I will wait for further comments before producing a v1 of the series.

Regards, Mike

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/5] vrf: allow simultaneous service instances in default and other VRFs
  2018-09-20  8:58 [PATCH net-next 0/5] vrf: allow simultaneous service instances in default and other VRFs Mike Manning
                   ` (4 preceding siblings ...)
  2018-09-20  8:58 ` [PATCH net-next 5/5] ipv6: add vrf table handling code for ipv6 mcast Mike Manning
@ 2018-09-21  4:28 ` David Ahern
  2018-09-21 14:41   ` David Miller
  5 siblings, 1 reply; 12+ messages in thread
From: David Ahern @ 2018-09-21  4:28 UTC (permalink / raw)
  To: Mike Manning, netdev, David Miller

On 9/20/18 1:58 AM, Mike Manning wrote:
> Services currently have to be VRF-aware if they are using an unbound
> socket. One cannot have multiple service instances running in the
> default and other VRFs for services that are not VRF-aware and listen
> on an unbound socket. This is because there is no way of isolating
> packets received in the default VRF from those arriving in other VRFs.
> 
> This series provides this isolation subject to the existing kernel
> parameter net.ipv4.tcp_l3mdev_accept not being set, given that this is
> documented as allowing a single service instance to work across all
> VRF domains. The functionality applies to UDP & TCP services, for IPv4
> and IPv6, in particular adding VRF table handling for IPv6 multicast.
> 
> Example of running ssh instances in default and blue VRF:
> 
> $ /usr/sbin/sshd -D
> $ ip vrf exec vrf-blue /usr/sbin/sshd
> $ ss -ta | egrep 'State|ssh'
> State   Recv-Q   Send-Q           Local Address:Port       Peer Address:Port
> LISTEN  0        128           0.0.0.0%vrf-blue:ssh             0.0.0.0:*
> LISTEN  0        128                    0.0.0.0:ssh             0.0.0.0:*
> ESTAB   0        0              192.168.122.220:ssh       192.168.122.1:50282
> LISTEN  0        128              [::]%vrf-blue:ssh                [::]:*
> LISTEN  0        128                       [::]:ssh                [::]:*
> ESTAB   0        0           [3000::2]%vrf-blue:ssh           [3000::9]:45896
> ESTAB   0        0                    [2000::2]:ssh           [2000::9]:46398
> 

Hi Dave:

I need some time to review and more importantly test this patch set
before it is committed. I am traveling tomorrow afternoon through Sunday
evening, so I need a few days into next week to get to this.

Thanks,
David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/5] vrf: allow simultaneous service instances in default and other VRFs
  2018-09-21  4:28 ` [PATCH net-next 0/5] vrf: allow simultaneous service instances in default and other VRFs David Ahern
@ 2018-09-21 14:41   ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2018-09-21 14:41 UTC (permalink / raw)
  To: dsahern; +Cc: mmanning, netdev

From: David Ahern <dsahern@gmail.com>
Date: Thu, 20 Sep 2018 21:28:43 -0700

> I need some time to review and more importantly test this patch set
> before it is committed. I am traveling tomorrow afternoon through Sunday
> evening, so I need a few days into next week to get to this.

Sure, no problem.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 1/5] net: allow binding socket in a VRF when there's an unbound socket
  2018-09-20  8:58 ` [PATCH net-next 1/5] net: allow binding socket in a VRF when there's an unbound socket Mike Manning
@ 2018-09-23  8:47   ` kbuild test robot
  2018-09-23  9:58   ` kbuild test robot
  1 sibling, 0 replies; 12+ messages in thread
From: kbuild test robot @ 2018-09-23  8:47 UTC (permalink / raw)
  To: Mike Manning; +Cc: kbuild-all, netdev, Robert Shearman

[-- Attachment #1: Type: text/plain, Size: 1476 bytes --]

Hi Robert,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Mike-Manning/vrf-allow-simultaneous-service-instances-in-default-and-other-VRFs/20180923-162308
config: x86_64-randconfig-x004-201838 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from include/net/tcp.h:36:0,
                    from net/core/sock.c:143:
   include/net/inet_hashtables.h: In function 'inet_sk_bound_dev_eq':
>> include/net/inet_hashtables.h:196:29: error: 'struct netns_ipv4' has no member named 'sysctl_tcp_l3mdev_accept'; did you mean 'sysctl_tcp_fwmark_accept'?
      return !sdif || net->ipv4.sysctl_tcp_l3mdev_accept;
                                ^~~~~~~~~~~~~~~~~~~~~~~~
                                sysctl_tcp_fwmark_accept

vim +196 include/net/inet_hashtables.h

   191	
   192	static inline bool inet_sk_bound_dev_eq(struct net *net, int bound_dev_if,
   193						int dif, int sdif)
   194	{
   195		if (!bound_dev_if)
 > 196			return !sdif || net->ipv4.sysctl_tcp_l3mdev_accept;
   197		return bound_dev_if == dif || bound_dev_if == sdif;
   198	}
   199	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32260 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 1/5] net: allow binding socket in a VRF when there's an unbound socket
  2018-09-20  8:58 ` [PATCH net-next 1/5] net: allow binding socket in a VRF when there's an unbound socket Mike Manning
  2018-09-23  8:47   ` kbuild test robot
@ 2018-09-23  9:58   ` kbuild test robot
  1 sibling, 0 replies; 12+ messages in thread
From: kbuild test robot @ 2018-09-23  9:58 UTC (permalink / raw)
  To: Mike Manning; +Cc: kbuild-all, netdev, Robert Shearman

[-- Attachment #1: Type: text/plain, Size: 1361 bytes --]

Hi Robert,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Mike-Manning/vrf-allow-simultaneous-service-instances-in-default-and-other-VRFs/20180923-162308
config: i386-randconfig-x0-09231642 (attached as .config)
compiler: gcc-5 (Debian 5.5.0-3) 5.4.1 20171010
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from include/net/tcp.h:36:0,
                    from net/ipv6/af_inet6.c:51:
   include/net/inet_hashtables.h: In function 'inet_sk_bound_dev_eq':
>> include/net/inet_hashtables.h:196:28: error: 'struct netns_ipv4' has no member named 'sysctl_tcp_l3mdev_accept'
      return !sdif || net->ipv4.sysctl_tcp_l3mdev_accept;
                               ^

vim +196 include/net/inet_hashtables.h

   191	
   192	static inline bool inet_sk_bound_dev_eq(struct net *net, int bound_dev_if,
   193						int dif, int sdif)
   194	{
   195		if (!bound_dev_if)
 > 196			return !sdif || net->ipv4.sysctl_tcp_l3mdev_accept;
   197		return bound_dev_if == dif || bound_dev_if == sdif;
   198	}
   199	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30848 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-09-23 15:56 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-20  8:58 [PATCH net-next 0/5] vrf: allow simultaneous service instances in default and other VRFs Mike Manning
2018-09-20  8:58 ` [PATCH net-next 1/5] net: allow binding socket in a VRF when there's an unbound socket Mike Manning
2018-09-23  8:47   ` kbuild test robot
2018-09-23  9:58   ` kbuild test robot
2018-09-20  8:58 ` [PATCH net-next 2/5] ipv6: allow link-local and multicast packets inside vrf Mike Manning
2018-09-20  8:58 ` [PATCH net-next 3/5] ipv4: Allow sending multicast packets on specific i/f using VRF socket Mike Manning
2018-09-20  8:58 ` [PATCH net-next 4/5] ipv6: do not drop vrf udp multicast packets Mike Manning
2018-09-20 13:02   ` Paolo Abeni
2018-09-20 16:50     ` Mike Manning
2018-09-20  8:58 ` [PATCH net-next 5/5] ipv6: add vrf table handling code for ipv6 mcast Mike Manning
2018-09-21  4:28 ` [PATCH net-next 0/5] vrf: allow simultaneous service instances in default and other VRFs David Ahern
2018-09-21 14:41   ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.