* [PATCH net-next 00/13] net: Various VRF patches
@ 2016-05-05  3:33 David Ahern
  2016-05-05  3:33 ` [PATCH net-next 01/13] net: vrf: Create FIB tables on link create David Ahern
                   ` (13 more replies)
  0 siblings, 14 replies; 21+ messages in thread
From: David Ahern @ 2016-05-05  3:33 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

Various fixes and features for VRF over the past few months.

Patch 1 creates the FIB tables when a VRF device is created.

Patch 2 is prep for patch 3, which allows sends via cmsg/IP_PKTINFO on
interfaces enslaved to a VRF device.

Patch 4 fixes missing TCP reset / ECONNREFUSED packets in response to
packets to unused ports.

Patch 5 moves the packet hook to L3. This reduces the overhead of VRFs,
removes a lot of packet path code from the VRF driver, and is the
foundation for the patches after it.

Patch 6 fixes IP{6}_PKTINFO to return the index of the enslaved
interface rather than the VRF device. This is required for proper
VRF support.

Patches 7 and 8 provide support for locally originated traffic to local
addresses.

Patch 9 accommodates a change to the IPv6 route lookups: flags are now
passed to l3mdev/vrf route lookups.

Patch 10 provides support for IPv6 multicast and link local addresses.

Patch 11 protects access to the cached vrf dst entries, which can be
deleted on one CPU while packets are being processed on another.

Patches 12 and 13 fix up IPv6 source address selections.

David Ahern (13):
  net: vrf: Create FIB tables on link create
  net: l3mdev: Move get_saddr and rt6_dst
  net: l3mdev: Allow send on enslaved interface
  net: ipv6: tcp reset, icmp need to consider L3 domain
  net: l3mdev: Add hook in ip and ipv6
  net: original ingress device index in PKTINFO
  net: vrf: ipv4 support for local traffic to local addresses
  net: vrf: ipv6 support for local traffic to local addresses
  net: l3mdev: Propagate route lookup flags for IPv6
  net: vrf: Handle ipv6 multicast and link-local addresses
  net: vrf: rcu protect changes to private data
  net: vrf: Implement get_saddr for IPv6
  net: ipv6: address selection should only consider devices in L3 domain

 drivers/net/vrf.c         | 632 ++++++++++++++++++++++++++++++++++------------
 include/linux/ipv6.h      |   2 +-
 include/linux/netdevice.h |   2 +
 include/net/ip.h          |   1 +
 include/net/ip6_route.h   |  24 +-
 include/net/l3mdev.h      |  85 ++++---
 include/net/tcp.h         |   2 +-
 net/core/dev.c            |   3 +-
 net/ipv4/fib_frontend.c   |   1 +
 net/ipv4/ip_input.c       |   8 +
 net/ipv4/ip_sockglue.c    |   9 +-
 net/ipv4/route.c          |   4 +
 net/ipv6/addrconf.c       |   9 +-
 net/ipv6/icmp.c           |   7 +-
 net/ipv6/ip6_fib.c        |   1 +
 net/ipv6/ip6_input.c      |   8 +
 net/ipv6/ip6_output.c     |  12 +-
 net/ipv6/route.c          |  24 +-
 net/ipv6/tcp_ipv6.c       |   7 +-
 net/l3mdev/l3mdev.c       |  92 +++++++
 20 files changed, 701 insertions(+), 232 deletions(-)

-- 
2.1.4


* [PATCH net-next 01/13] net: vrf: Create FIB tables on link create
  2016-05-05  3:33 [PATCH net-next 00/13] net: Various VRF patches David Ahern
@ 2016-05-05  3:33 ` David Ahern
  2016-05-05  3:33 ` [PATCH net-next 02/13] net: l3mdev: Move get_saddr and rt6_dst David Ahern
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: David Ahern @ 2016-05-05  3:33 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

Tables have to exist for VRFs to function. Ensure they exist
when a VRF device is created.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 drivers/net/vrf.c       | 11 +++++++++--
 net/ipv4/fib_frontend.c |  1 +
 net/ipv6/ip6_fib.c      |  1 +
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 8a8f1e58b415..2f2aac1b598f 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -372,9 +372,13 @@ static int vrf_rt6_create(struct net_device *dev)
 	if (!rt6)
 		goto out;
 
-	rt6->dst.output	= vrf_output6;
-	rt6->rt6i_table = fib6_get_table(net, vrf->tb_id);
 	dst_hold(&rt6->dst);
+
+	rt6->rt6i_table = fib6_new_table(net, vrf->tb_id);
+	if (!rt6->rt6i_table)
+		goto out;
+
+	rt6->dst.output	= vrf_output6;
 	vrf->rt6 = rt6;
 	rc = 0;
 out:
@@ -462,6 +466,9 @@ static struct rtable *vrf_rtable_create(struct net_device *dev)
 	struct net_vrf *vrf = netdev_priv(dev);
 	struct rtable *rth;
 
+	if (!fib_new_table(dev_net(dev), vrf->tb_id))
+		return NULL;
+
 	rth = rt_dst_alloc(dev, 0, RTN_UNICAST, 1, 1, 0);
 	if (rth) {
 		rth->dst.output	= vrf_output;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 63566ec54794..ef2ebeb89d0f 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -110,6 +110,7 @@ struct fib_table *fib_new_table(struct net *net, u32 id)
 	hlist_add_head_rcu(&tb->tb_hlist, &net->ipv4.fib_table_hash[h]);
 	return tb;
 }
+EXPORT_SYMBOL_GPL(fib_new_table);
 
 /* caller must hold either rtnl or rcu read lock */
 struct fib_table *fib_get_table(struct net *net, u32 id)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index ea071fad67a0..1bcef2369d64 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -240,6 +240,7 @@ struct fib6_table *fib6_new_table(struct net *net, u32 id)
 
 	return tb;
 }
+EXPORT_SYMBOL_GPL(fib6_new_table);
 
 struct fib6_table *fib6_get_table(struct net *net, u32 id)
 {
-- 
2.1.4


* [PATCH net-next 02/13] net: l3mdev: Move get_saddr and rt6_dst
  2016-05-05  3:33 [PATCH net-next 00/13] net: Various VRF patches David Ahern
  2016-05-05  3:33 ` [PATCH net-next 01/13] net: vrf: Create FIB tables on link create David Ahern
@ 2016-05-05  3:33 ` David Ahern
  2016-05-05  3:33 ` [PATCH net-next 03/13] net: l3mdev: Allow send on enslaved interface David Ahern
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: David Ahern @ 2016-05-05  3:33 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

Move l3mdev_rt6_dst_by_oif and l3mdev_get_saddr to l3mdev.c. Collapse
l3mdev_get_rt6_dst into l3mdev_rt6_dst_by_oif since it is the only
user and keep the l3mdev_get_rt6_dst name for consistency with other
hooks.

A follow-on patch adds more code to these functions, making them too
long to remain inline functions.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/net/l3mdev.h | 56 +++-------------------------------------------------
 net/ipv6/route.c     |  2 +-
 net/l3mdev/l3mdev.c  | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 54 deletions(-)

diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index c43a9c73de5e..78872bd1dc2c 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -130,52 +130,9 @@ static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
 	return rc;
 }
 
-static inline int l3mdev_get_saddr(struct net *net, int ifindex,
-				   struct flowi4 *fl4)
-{
-	struct net_device *dev;
-	int rc = 0;
-
-	if (ifindex) {
-
-		rcu_read_lock();
-
-		dev = dev_get_by_index_rcu(net, ifindex);
-		if (dev && netif_is_l3_master(dev) &&
-		    dev->l3mdev_ops->l3mdev_get_saddr) {
-			rc = dev->l3mdev_ops->l3mdev_get_saddr(dev, fl4);
-		}
-
-		rcu_read_unlock();
-	}
-
-	return rc;
-}
+int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4);
 
-static inline struct dst_entry *l3mdev_get_rt6_dst(const struct net_device *dev,
-						   const struct flowi6 *fl6)
-{
-	if (netif_is_l3_master(dev) && dev->l3mdev_ops->l3mdev_get_rt6_dst)
-		return dev->l3mdev_ops->l3mdev_get_rt6_dst(dev, fl6);
-
-	return NULL;
-}
-
-static inline
-struct dst_entry *l3mdev_rt6_dst_by_oif(struct net *net,
-					const struct flowi6 *fl6)
-{
-	struct dst_entry *dst = NULL;
-	struct net_device *dev;
-
-	dev = dev_get_by_index(net, fl6->flowi6_oif);
-	if (dev) {
-		dst = l3mdev_get_rt6_dst(dev, fl6);
-		dev_put(dev);
-	}
-
-	return dst;
-}
+struct dst_entry *l3mdev_get_rt6_dst(struct net *net, const struct flowi6 *fl6);
 
 #else
 
@@ -233,14 +190,7 @@ static inline int l3mdev_get_saddr(struct net *net, int ifindex,
 }
 
 static inline
-struct dst_entry *l3mdev_get_rt6_dst(const struct net_device *dev,
-				     const struct flowi6 *fl6)
-{
-	return NULL;
-}
-static inline
-struct dst_entry *l3mdev_rt6_dst_by_oif(struct net *net,
-					const struct flowi6 *fl6)
+struct dst_entry *l3mdev_get_rt6_dst(struct net *net, const struct flowi6 *fl6)
 {
 	return NULL;
 }
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index af46e19205f5..c42fa1deb152 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1190,7 +1190,7 @@ struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock *sk,
 	struct dst_entry *dst;
 	bool any_src;
 
-	dst = l3mdev_rt6_dst_by_oif(net, fl6);
+	dst = l3mdev_get_rt6_dst(net, fl6);
 	if (dst)
 		return dst;
 
diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
index e925037fa0df..0fe4211e646f 100644
--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -97,3 +97,58 @@ u32 l3mdev_fib_table_by_index(struct net *net, int ifindex)
 	return tb_id;
 }
 EXPORT_SYMBOL_GPL(l3mdev_fib_table_by_index);
+
+/**
+ *	l3mdev_get_rt6_dst - IPv6 route lookup based on flow. Returns
+ *			     cached route for L3 master device if relevant
+ *			     to flow
+ *	@net: network namespace for device index lookup
+ *	@fl6: IPv6 flow struct for lookup
+ */
+
+struct dst_entry *l3mdev_get_rt6_dst(struct net *net,
+				     const struct flowi6 *fl6)
+{
+	struct dst_entry *dst = NULL;
+	struct net_device *dev;
+
+	dev = dev_get_by_index(net, fl6->flowi6_oif);
+	if (dev) {
+		if (netif_is_l3_master(dev) &&
+		    dev->l3mdev_ops->l3mdev_get_rt6_dst)
+			dst = dev->l3mdev_ops->l3mdev_get_rt6_dst(dev, fl6);
+		dev_put(dev);
+	}
+
+	return dst;
+}
+EXPORT_SYMBOL_GPL(l3mdev_get_rt6_dst);
+
+/**
+ *	l3mdev_get_saddr - get source address for a flow based on an interface
+ *			   enslaved to an L3 master device
+ *	@net: network namespace for device index lookup
+ *	@ifindex: Interface index
+ *	@fl4: IPv4 flow struct
+ */
+
+int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4)
+{
+	struct net_device *dev;
+	int rc = 0;
+
+	if (ifindex) {
+		rcu_read_lock();
+
+		dev = dev_get_by_index_rcu(net, ifindex);
+		if (dev && netif_is_l3_master(dev) &&
+		    dev->l3mdev_ops->l3mdev_get_saddr) {
+			rc = dev->l3mdev_ops->l3mdev_get_saddr(dev, fl4);
+		}
+
+		rcu_read_unlock();
+	}
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(l3mdev_get_saddr);
-- 
2.1.4


* [PATCH net-next 03/13] net: l3mdev: Allow send on enslaved interface
  2016-05-05  3:33 [PATCH net-next 00/13] net: Various VRF patches David Ahern
  2016-05-05  3:33 ` [PATCH net-next 01/13] net: vrf: Create FIB tables on link create David Ahern
  2016-05-05  3:33 ` [PATCH net-next 02/13] net: l3mdev: Move get_saddr and rt6_dst David Ahern
@ 2016-05-05  3:33 ` David Ahern
  2016-05-05  7:40   ` Julian Anastasov
  2016-05-05  3:33 ` [PATCH net-next 04/13] net: ipv6: tcp reset, icmp need to consider L3 domain David Ahern
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 21+ messages in thread
From: David Ahern @ 2016-05-05  3:33 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

Allow udp and raw sockets to send by oif that is an enslaved interface
versus the l3mdev/VRF device. For example, this allows BFD to use ifindex
from IP_PKTINFO on a receive to send a response without the need to
convert to the VRF index. It also allows ping and ping6 to work when
specifying an enslaved interface (e.g., ping -I swp1 <ip>) which is
a natural use case.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 drivers/net/vrf.c   |  2 ++
 net/ipv4/route.c    |  4 ++++
 net/l3mdev/l3mdev.c | 20 +++++++++++++++-----
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 2f2aac1b598f..3a04b8cac757 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -646,6 +646,8 @@ static int vrf_get_saddr(struct net_device *dev, struct flowi4 *fl4)
 
 	fl4->flowi4_flags |= FLOWI_FLAG_SKIP_NH_OIF;
 	fl4->flowi4_iif = LOOPBACK_IFINDEX;
+	/* make sure oif is set to VRF device for lookup */
+	fl4->flowi4_oif = dev->ifindex;
 	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
 	fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
 			     RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 8c8c655bb2c4..a1f2830d8110 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2146,6 +2146,7 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
 	unsigned int flags = 0;
 	struct fib_result res;
 	struct rtable *rth;
+	int master_idx;
 	int orig_oif;
 	int err = -ENETUNREACH;
 
@@ -2155,6 +2156,9 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
 
 	orig_oif = fl4->flowi4_oif;
 
+	master_idx = l3mdev_master_ifindex_by_index(net, fl4->flowi4_oif);
+	if (master_idx)
+		fl4->flowi4_oif = master_idx;
 	fl4->flowi4_iif = LOOPBACK_IFINDEX;
 	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
 	fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
index 0fe4211e646f..0fd8cc1417cd 100644
--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -112,12 +112,19 @@ struct dst_entry *l3mdev_get_rt6_dst(struct net *net,
 	struct dst_entry *dst = NULL;
 	struct net_device *dev;
 
-	dev = dev_get_by_index(net, fl6->flowi6_oif);
-	if (dev) {
-		if (netif_is_l3_master(dev) &&
-		    dev->l3mdev_ops->l3mdev_get_rt6_dst)
+	if (fl6->flowi6_oif) {
+		rcu_read_lock();
+
+		dev = dev_get_by_index_rcu(net, fl6->flowi6_oif);
+		if (dev && netif_is_l3_slave(dev))
+			dev = netdev_master_upper_dev_get_rcu(dev);
+
+		if (dev && netif_is_l3_master(dev) &&
+		    dev->l3mdev_ops->l3mdev_get_rt6_dst) {
 			dst = dev->l3mdev_ops->l3mdev_get_rt6_dst(dev, fl6);
-		dev_put(dev);
+		}
+
+		rcu_read_unlock();
 	}
 
 	return dst;
@@ -141,6 +148,9 @@ int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4)
 		rcu_read_lock();
 
 		dev = dev_get_by_index_rcu(net, ifindex);
+		if (dev && netif_is_l3_slave(dev))
+			dev = netdev_master_upper_dev_get_rcu(dev);
+
 		if (dev && netif_is_l3_master(dev) &&
 		    dev->l3mdev_ops->l3mdev_get_saddr) {
 			rc = dev->l3mdev_ops->l3mdev_get_saddr(dev, fl4);
-- 
2.1.4
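For illustration only, the send side described in this patch could be
exercised from user space along these lines. This is a minimal sketch,
not part of the series: the helper name pktinfo_set_oif() is
hypothetical, and it only builds the IP_PKTINFO control message. Whether
the kernel accepts the index of an enslaved interface depends on this
patch being applied.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* struct in_pktinfo on glibc */
#endif
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: attach an IP_PKTINFO control message to msg so
 * that sendmsg() asks for ifindex as the egress interface. cbuf must be
 * suitably aligned and at least CMSG_SPACE(sizeof(struct in_pktinfo))
 * bytes. Returns 0 on success, -1 if the buffer is too small.
 */
static int pktinfo_set_oif(struct msghdr *msg, void *cbuf, size_t cbuflen,
			   int ifindex)
{
	struct in_pktinfo pi;
	struct cmsghdr *cmsg;

	assert(msg != NULL && cbuf != NULL);

	if (cbuflen < CMSG_SPACE(sizeof(pi)))
		return -1;

	memset(cbuf, 0, cbuflen);
	msg->msg_control = cbuf;
	msg->msg_controllen = CMSG_SPACE(sizeof(pi));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = IPPROTO_IP;
	cmsg->cmsg_type = IP_PKTINFO;
	cmsg->cmsg_len = CMSG_LEN(sizeof(pi));

	/* e.g. the ifindex reported by IP_PKTINFO on a prior receive */
	memset(&pi, 0, sizeof(pi));
	pi.ipi_ifindex = ifindex;
	memcpy(CMSG_DATA(cmsg), &pi, sizeof(pi));

	return 0;
}
```

A caller would zero a struct msghdr, fill in msg_name and msg_iov as
usual, call the helper, and then pass the result to sendmsg() on a UDP
or raw socket.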


* [PATCH net-next 04/13] net: ipv6: tcp reset, icmp need to consider L3 domain
  2016-05-05  3:33 [PATCH net-next 00/13] net: Various VRF patches David Ahern
                   ` (2 preceding siblings ...)
  2016-05-05  3:33 ` [PATCH net-next 03/13] net: l3mdev: Allow send on enslaved interface David Ahern
@ 2016-05-05  3:33 ` David Ahern
  2016-05-05  3:33 ` [PATCH net-next 05/13] net: l3mdev: Add hook in ip and ipv6 David Ahern
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: David Ahern @ 2016-05-05  3:33 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

Responses for packets to unused ports are getting lost with L3 domains.

IPv4 has ip_send_unicast_reply for sending TCP responses which accounts
for L3 domains; update the IPv6 counterpart tcp_v6_send_response.
For icmp the L3 master check needs to be moved up in icmp6_send
to properly respond to UDP packets to a port with no listener.

Fixes: ca254490c8df ("net: Add VRF support to IPv6 stack")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 net/ipv6/icmp.c     | 5 ++---
 net/ipv6/tcp_ipv6.c | 7 ++++++-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 9554b99a8508..4527285fcaa2 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -446,6 +446,8 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)
 
 	if (__ipv6_addr_needs_scope_id(addr_type))
 		iif = skb->dev->ifindex;
+	else
+		iif = l3mdev_master_ifindex(skb->dev);
 
 	/*
 	 *	Must not send error if the source does not uniquely
@@ -500,9 +502,6 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)
 	else if (!fl6.flowi6_oif)
 		fl6.flowi6_oif = np->ucast_oif;
 
-	if (!fl6.flowi6_oif)
-		fl6.flowi6_oif = l3mdev_master_ifindex(skb->dev);
-
 	dst = icmpv6_route_lookup(net, skb, sk, &fl6);
 	if (IS_ERR(dst))
 		goto out;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 7bdc9c9c231b..c4efaa97280c 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -810,8 +810,13 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
 	fl6.flowi6_proto = IPPROTO_TCP;
 	if (rt6_need_strict(&fl6.daddr) && !oif)
 		fl6.flowi6_oif = tcp_v6_iif(skb);
-	else
+	else {
+		if (!oif && netif_index_is_l3_master(net, skb->skb_iif))
+			oif = skb->skb_iif;
+
 		fl6.flowi6_oif = oif;
+	}
+
 	fl6.flowi6_mark = IP6_REPLY_MARK(net, skb->mark);
 	fl6.fl6_dport = t1->dest;
 	fl6.fl6_sport = t1->source;
-- 
2.1.4


* [PATCH net-next 05/13] net: l3mdev: Add hook in ip and ipv6
  2016-05-05  3:33 [PATCH net-next 00/13] net: Various VRF patches David Ahern
                   ` (3 preceding siblings ...)
  2016-05-05  3:33 ` [PATCH net-next 04/13] net: ipv6: tcp reset, icmp need to consider L3 domain David Ahern
@ 2016-05-05  3:33 ` David Ahern
  2016-05-05  3:33 ` [PATCH net-next 06/13] net: original ingress device index in PKTINFO David Ahern
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: David Ahern @ 2016-05-05  3:33 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

Currently the VRF driver uses the rx_handler to switch the skb device
to the VRF device. Switching the dev prior to the ip / ipv6 layer
means the VRF driver has to duplicate IP/IPv6 processing which is just
overhead and makes features such as retaining the ingress device index
more complicated than necessary.

This patch moves the hook to the L3 layer just after the first NF_HOOK
for PRE_ROUTING. This location makes exposing the original ingress device
fairly trivial (next patch) and allows adding other NF_HOOKs to the VRF
driver in the future.

dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb
with the switched device through the packet taps to maintain current
behavior (tcpdump can be used on either the vrf device or the enslaved
devices).

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 drivers/net/vrf.c         | 186 ++++++++++++++++++++++------------------------
 include/linux/ipv6.h      |   2 +-
 include/linux/netdevice.h |   2 +
 include/net/l3mdev.h      |  43 +++++++++++
 include/net/tcp.h         |   2 +-
 net/core/dev.c            |   3 +-
 net/ipv4/ip_input.c       |   7 ++
 net/ipv6/ip6_input.c      |   7 ++
 8 files changed, 151 insertions(+), 101 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 3a04b8cac757..39bef1dc41fa 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -42,9 +42,6 @@
 #define DRV_NAME	"vrf"
 #define DRV_VERSION	"1.0"
 
-#define vrf_master_get_rcu(dev) \
-	((struct net_device *)rcu_dereference(dev->rx_handler_data))
-
 struct net_vrf {
 	struct rtable           *rth;
 	struct rt6_info		*rt6;
@@ -60,90 +57,12 @@ struct pcpu_dstats {
 	struct u64_stats_sync	syncp;
 };
 
-/* neighbor handling is done with actual device; do not want
- * to flip skb->dev for those ndisc packets. This really fails
- * for multiple next protocols (e.g., NEXTHDR_HOP). But it is
- * a start.
- */
-#if IS_ENABLED(CONFIG_IPV6)
-static bool check_ipv6_frame(const struct sk_buff *skb)
-{
-	const struct ipv6hdr *ipv6h;
-	struct ipv6hdr _ipv6h;
-	bool rc = true;
-
-	ipv6h = skb_header_pointer(skb, 0, sizeof(_ipv6h), &_ipv6h);
-	if (!ipv6h)
-		goto out;
-
-	if (ipv6h->nexthdr == NEXTHDR_ICMP) {
-		const struct icmp6hdr *icmph;
-		struct icmp6hdr _icmph;
-
-		icmph = skb_header_pointer(skb, sizeof(_ipv6h),
-					   sizeof(_icmph), &_icmph);
-		if (!icmph)
-			goto out;
-
-		switch (icmph->icmp6_type) {
-		case NDISC_ROUTER_SOLICITATION:
-		case NDISC_ROUTER_ADVERTISEMENT:
-		case NDISC_NEIGHBOUR_SOLICITATION:
-		case NDISC_NEIGHBOUR_ADVERTISEMENT:
-		case NDISC_REDIRECT:
-			rc = false;
-			break;
-		}
-	}
-
-out:
-	return rc;
-}
-#else
-static bool check_ipv6_frame(const struct sk_buff *skb)
-{
-	return false;
-}
-#endif
-
-static bool is_ip_rx_frame(struct sk_buff *skb)
-{
-	switch (skb->protocol) {
-	case htons(ETH_P_IP):
-		return true;
-	case htons(ETH_P_IPV6):
-		return check_ipv6_frame(skb);
-	}
-	return false;
-}
-
 static void vrf_tx_error(struct net_device *vrf_dev, struct sk_buff *skb)
 {
 	vrf_dev->stats.tx_errors++;
 	kfree_skb(skb);
 }
 
-/* note: already called with rcu_read_lock */
-static rx_handler_result_t vrf_handle_frame(struct sk_buff **pskb)
-{
-	struct sk_buff *skb = *pskb;
-
-	if (is_ip_rx_frame(skb)) {
-		struct net_device *dev = vrf_master_get_rcu(skb->dev);
-		struct pcpu_dstats *dstats = this_cpu_ptr(dev->dstats);
-
-		u64_stats_update_begin(&dstats->syncp);
-		dstats->rx_pkts++;
-		dstats->rx_bytes += skb->len;
-		u64_stats_update_end(&dstats->syncp);
-
-		skb->dev = dev;
-
-		return RX_HANDLER_ANOTHER;
-	}
-	return RX_HANDLER_PASS;
-}
-
 static struct rtnl_link_stats64 *vrf_get_stats64(struct net_device *dev,
 						 struct rtnl_link_stats64 *stats)
 {
@@ -504,28 +423,14 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
 {
 	int ret;
 
-	/* register the packet handler for slave ports */
-	ret = netdev_rx_handler_register(port_dev, vrf_handle_frame, dev);
-	if (ret) {
-		netdev_err(port_dev,
-			   "Device %s failed to register rx_handler\n",
-			   port_dev->name);
-		goto out_fail;
-	}
-
 	ret = netdev_master_upper_dev_link(port_dev, dev, NULL, NULL);
 	if (ret < 0)
-		goto out_unregister;
+		return ret;
 
 	port_dev->priv_flags |= IFF_L3MDEV_SLAVE;
 	cycle_netdev(port_dev);
 
 	return 0;
-
-out_unregister:
-	netdev_rx_handler_unregister(port_dev);
-out_fail:
-	return ret;
 }
 
 static int vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
@@ -542,8 +447,6 @@ static int do_vrf_del_slave(struct net_device *dev, struct net_device *port_dev)
 	netdev_upper_dev_unlink(port_dev, dev);
 	port_dev->priv_flags &= ~IFF_L3MDEV_SLAVE;
 
-	netdev_rx_handler_unregister(port_dev);
-
 	cycle_netdev(port_dev);
 
 	return 0;
@@ -668,6 +571,92 @@ static int vrf_get_saddr(struct net_device *dev, struct flowi4 *fl4)
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
+/* neighbor handling is done with actual device; do not want
+ * to flip skb->dev for those ndisc packets. This really fails
+ * for multiple next protocols (e.g., NEXTHDR_HOP). But it is
+ * a start.
+ */
+static bool ipv6_ndisc_frame(const struct sk_buff *skb)
+{
+	const struct ipv6hdr *ipv6h = (struct ipv6hdr *)skb->data;
+	size_t hlen = sizeof(*ipv6h);
+	bool rc = false;
+
+	if (ipv6h->nexthdr == NEXTHDR_ICMP) {
+		const struct icmp6hdr *icmph;
+
+		if (skb->len < hlen + sizeof(*icmph))
+			goto out;
+
+		icmph = (struct icmp6hdr *)(skb->data + sizeof(*ipv6h));
+		switch (icmph->icmp6_type) {
+		case NDISC_ROUTER_SOLICITATION:
+		case NDISC_ROUTER_ADVERTISEMENT:
+		case NDISC_NEIGHBOUR_SOLICITATION:
+		case NDISC_NEIGHBOUR_ADVERTISEMENT:
+		case NDISC_REDIRECT:
+			rc = true;
+			break;
+		}
+	}
+
+out:
+	return rc;
+}
+
+static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
+				   struct sk_buff *skb)
+{
+	/* if packet is NDISC keep the ingress interface */
+	if (!ipv6_ndisc_frame(skb)) {
+		skb->dev = vrf_dev;
+		skb->skb_iif = vrf_dev->ifindex;
+
+		skb_push(skb, skb->mac_len);
+		dev_queue_xmit_nit(skb, vrf_dev);
+		skb_pull(skb, skb->mac_len);
+	}
+
+	return skb;
+}
+
+#else
+static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
+				   struct sk_buff *skb)
+{
+	return skb;
+}
+#endif
+
+static struct sk_buff *vrf_ip_rcv(struct net_device *vrf_dev,
+				  struct sk_buff *skb)
+{
+	skb->dev = vrf_dev;
+	skb->skb_iif = vrf_dev->ifindex;
+
+	skb_push(skb, skb->mac_len);
+	dev_queue_xmit_nit(skb, vrf_dev);
+	skb_pull(skb, skb->mac_len);
+
+	return skb;
+}
+
+/* called with rcu lock held */
+static struct sk_buff *vrf_l3_rcv(struct net_device *vrf_dev,
+				  struct sk_buff *skb,
+				  u16 proto)
+{
+	switch (proto) {
+	case AF_INET:
+		return vrf_ip_rcv(vrf_dev, skb);
+	case AF_INET6:
+		return vrf_ip6_rcv(vrf_dev, skb);
+	}
+
+	return skb;
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
 static struct dst_entry *vrf_get_rt6_dst(const struct net_device *dev,
 					 const struct flowi6 *fl6)
 {
@@ -688,6 +677,7 @@ static const struct l3mdev_ops vrf_l3mdev_ops = {
 	.l3mdev_fib_table	= vrf_fib_table,
 	.l3mdev_get_rtable	= vrf_get_rtable,
 	.l3mdev_get_saddr	= vrf_get_saddr,
+	.l3mdev_l3_rcv		= vrf_l3_rcv,
 #if IS_ENABLED(CONFIG_IPV6)
 	.l3mdev_get_rt6_dst	= vrf_get_rt6_dst,
 #endif
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 58d6e158755f..2c460121498b 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -125,7 +125,7 @@ struct inet6_skb_parm {
 
 static inline int inet6_iif(const struct sk_buff *skb)
 {
-	return IP6CB(skb)->iif;
+	return skb->skb_iif > 1 ? skb->skb_iif : IP6CB(skb)->iif;
 }
 
 struct tcp6_request_sock {
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index bcf012637d10..3dce605d5273 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3266,6 +3266,8 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
 bool is_skb_forwardable(const struct net_device *dev,
 			const struct sk_buff *skb);
 
+void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev);
+
 extern int		netdev_budget;
 
 /* Called by rtnetlink.c:rtnl_unlock() */
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index 78872bd1dc2c..46ac4c69c155 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -25,6 +25,8 @@
 
 struct l3mdev_ops {
 	u32		(*l3mdev_fib_table)(const struct net_device *dev);
+	struct sk_buff * (*l3mdev_l3_rcv)(struct net_device *dev,
+					  struct sk_buff *skb, u16 proto);
 
 	/* IPv4 ops */
 	struct rtable *	(*l3mdev_get_rtable)(const struct net_device *dev,
@@ -134,6 +136,35 @@ int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4);
 
 struct dst_entry *l3mdev_get_rt6_dst(struct net *net, const struct flowi6 *fl6);
 
+static inline
+struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto)
+{
+	struct net_device *master = NULL;
+
+	if (netif_is_l3_slave(skb->dev))
+		master = netdev_master_upper_dev_get_rcu(skb->dev);
+
+	else if (netif_is_l3_master(skb->dev))
+		master = skb->dev;
+
+	if (master && master->l3mdev_ops->l3mdev_l3_rcv)
+		skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto);
+
+	return skb;
+}
+
+static inline
+struct sk_buff *l3mdev_ip_rcv(struct sk_buff *skb)
+{
+	return l3mdev_l3_rcv(skb, AF_INET);
+}
+
+static inline
+struct sk_buff *l3mdev_ip6_rcv(struct sk_buff *skb)
+{
+	return l3mdev_l3_rcv(skb, AF_INET6);
+}
+
 #else
 
 static inline int l3mdev_master_ifindex_rcu(const struct net_device *dev)
@@ -194,6 +225,18 @@ struct dst_entry *l3mdev_get_rt6_dst(struct net *net, const struct flowi6 *fl6)
 {
 	return NULL;
 }
+
+static inline
+struct sk_buff *l3mdev_ip_rcv(struct sk_buff *skb)
+{
+	return skb;
+}
+
+static inline
+struct sk_buff *l3mdev_ip6_rcv(struct sk_buff *skb)
+{
+	return skb;
+}
 #endif
 
 #endif /* _NET_L3MDEV_H_ */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 24ec80483805..bf08eb370c96 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -781,7 +781,7 @@ struct tcp_skb_cb {
  */
 static inline int tcp_v6_iif(const struct sk_buff *skb)
 {
-	return TCP_SKB_CB(skb)->header.h6.iif;
+	return skb->skb_iif > 1 ? skb->skb_iif : TCP_SKB_CB(skb)->header.h6.iif;
 }
 #endif
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 673d1f118bfb..845acb9f75d2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1850,7 +1850,7 @@ static inline bool skb_loop_sk(struct packet_type *ptype, struct sk_buff *skb)
  *	taps currently in use.
  */
 
-static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
+void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct packet_type *ptype;
 	struct sk_buff *skb2 = NULL;
@@ -1907,6 +1907,7 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 		pt_prev->func(skb2, skb->dev, pt_prev, skb->dev);
 	rcu_read_unlock();
 }
+EXPORT_SYMBOL_GPL(dev_queue_xmit_nit);
 
 /**
  * netif_setup_tc - Handle tc mappings on real_num_tx_queues change
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 751c0658e194..37375eedeef9 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -313,6 +313,13 @@ static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 	const struct iphdr *iph = ip_hdr(skb);
 	struct rtable *rt;
 
+	/* if ingress device is enslaved to an L3 master device pass the
+	 * skb to its handler for processing
+	 */
+	skb = l3mdev_ip_rcv(skb);
+	if (!skb)
+		return NET_RX_SUCCESS;
+
 	if (net->ipv4.sysctl_ip_early_demux &&
 	    !skb_dst(skb) &&
 	    !skb->sk &&
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 6ed56012005d..f185cbcda114 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -49,6 +49,13 @@
 
 int ip6_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
+	/* if ingress device is enslaved to an L3 master device pass the
+	 * skb to its handler for processing
+	 */
+	skb = l3mdev_ip6_rcv(skb);
+	if (!skb)
+		return NET_RX_SUCCESS;
+
 	if (net->ipv4.sysctl_ip_early_demux && !skb_dst(skb) && skb->sk == NULL) {
 		const struct inet6_protocol *ipprot;
 
-- 
2.1.4
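For illustration only, the neighbor discovery check that
ipv6_ndisc_frame() performs in this patch can be mirrored in user space
as below. This is a sketch under assumptions, not kernel code: the
function name is_ndisc_packet() is hypothetical, the buffer is assumed
to start at the IPv6 header, and the glibc constants (ND_ROUTER_SOLICIT
and friends) stand in for the kernel's NDISC_* values. Like the patch,
it only looks at the header directly after the fixed IPv6 header, so
packets with extension headers are not recognized.

```c
#include <assert.h>
#include <netinet/icmp6.h>
#include <netinet/in.h>
#include <netinet/ip6.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space mirror of the ipv6_ndisc_frame() check:
 * returns true if pkt carries an ICMPv6 neighbor discovery message
 * immediately after the fixed IPv6 header.
 */
static bool is_ndisc_packet(const uint8_t *pkt, size_t len)
{
	struct ip6_hdr ip6;
	struct icmp6_hdr icmp6;

	assert(pkt != NULL);

	/* need the fixed IPv6 header plus the ICMPv6 header */
	if (len < sizeof(ip6) + sizeof(icmp6))
		return false;

	memcpy(&ip6, pkt, sizeof(ip6));
	if (ip6.ip6_nxt != IPPROTO_ICMPV6)
		return false;

	memcpy(&icmp6, pkt + sizeof(ip6), sizeof(icmp6));
	switch (icmp6.icmp6_type) {
	case ND_ROUTER_SOLICIT:
	case ND_ROUTER_ADVERT:
	case ND_NEIGHBOR_SOLICIT:
	case ND_NEIGHBOR_ADVERT:
	case ND_REDIRECT:
		return true;
	}

	return false;
}
```

In the patch, packets for which this check is true keep their ingress
device; all other IPv6 packets have skb->dev switched to the VRF device.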


* [PATCH net-next 06/13] net: original ingress device index in PKTINFO
  2016-05-05  3:33 [PATCH net-next 00/13] net: Various VRF patches David Ahern
                   ` (4 preceding siblings ...)
  2016-05-05  3:33 ` [PATCH net-next 05/13] net: l3mdev: Add hook in ip and ipv6 David Ahern
@ 2016-05-05  3:33 ` David Ahern
  2016-05-05  8:41   ` Julian Anastasov
  2016-05-05  3:33 ` [PATCH net-next 07/13] net: vrf: ipv4 support for local traffic to local addresses David Ahern
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 21+ messages in thread
From: David Ahern @ 2016-05-05  3:33 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

Applications such as OSPF and BFD need the original ingress device not
the VRF device; the latter can be derived from the former. To that end
add the skb_iif to inet_skb_parm and set it in ipv4 code after clearing
the skb control buffer similar to IPv6. From there the pktinfo can just
pull it from cb with the PKTINFO_SKB_CB cast.

The previous patch moving the skb->dev change to L3 means nothing else
is needed for IPv6; it just works.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/net/ip.h       | 1 +
 net/ipv4/ip_input.c    | 1 +
 net/ipv4/ip_sockglue.c | 9 +++++++--
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 247ac82e9cf2..37165fba3741 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -36,6 +36,7 @@
 struct sock;
 
 struct inet_skb_parm {
+	int			iif;
 	struct ip_options	opt;		/* Compiled IP options		*/
 	unsigned char		flags;
 
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 37375eedeef9..4b351af3e67b 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -478,6 +478,7 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 
 	/* Remove any debris in the socket control block */
 	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+	IPCB(skb)->iif = skb->skb_iif;
 
 	/* Must drop socket now because of tproxy. */
 	skb_orphan(skb);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index bdb222c0c6a2..dbcd027c38e7 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -476,9 +476,9 @@ static bool ipv4_datagram_support_cmsg(const struct sock *sk,
 	    (!skb->dev))
 		return false;
 
+	/* see comment in ipv4_pktinfo_prepare about CB re-use */
 	info = PKTINFO_SKB_CB(skb);
 	info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
-	info->ipi_ifindex = skb->dev->ifindex;
 	return true;
 }
 
@@ -1193,7 +1193,12 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)
 		       ipv6_sk_rxinfo(sk);
 
 	if (prepare && skb_rtable(skb)) {
-		pktinfo->ipi_ifindex = inet_iif(skb);
+		/* skb->cb is overloaded: prior to this point it is IP{6}CB
+		 * which has interface index (iif) as the first member of the
+		 * underlying inet{6}_skb_parm struct. This code then overlays
+		 * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
+		 * element so the iif is picked up from the prior IPCB
+		 */
 		pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
 	} else {
 		pktinfo->ipi_ifindex = 0;
-- 
2.1.4

* [PATCH net-next 07/13] net: vrf: ipv4 support for local traffic to local addresses
  2016-05-05  3:33 [PATCH net-next 00/13] net: Various VRF patches David Ahern
                   ` (5 preceding siblings ...)
  2016-05-05  3:33 ` [PATCH net-next 06/13] net: original ingress device index in PKTINFO David Ahern
@ 2016-05-05  3:33 ` David Ahern
  2016-05-05  3:33 ` [PATCH net-next 08/13] net: vrf: ipv6 " David Ahern
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: David Ahern @ 2016-05-05  3:33 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

Add support for locally originated traffic to VRF-local addresses.
This patch handles IPv4 support; a follow-on patch handles IPv6.

With this patch, ping, tcp and udp packets to a local IPv4 address are
successfully routed:

    $ ping -c1 -I red 10.100.1.1
    ping: Warning: source address might be selected on device other than red.
    PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
    64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms

This patch also enables use of IPv4 loopback address on the VRF device:
    $ ip addr add dev red 127.0.0.1/8

    $ ping -I red -c1 127.0.0.1
    PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms

which comes in handy for example when running ntpd in a VRF context and
then using ntpq to query status.

The l3mdev change also passes packets to the VRF driver if the ingress
device is an L3 master. This is needed to reset the packet type to HOST.
(It is set to LOOPBACK to avoid hitting network taps a second time on
Rx.)
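
The pkt_type handling can be modelled outside the kernel. The sketch
below is a simplified user-space illustration, not driver code:
pkt_model, model_tap, model_local_xmit and model_rx are hypothetical
names, and the tap is assumed to ignore anything marked LOOPBACK.

```c
#include <assert.h>
#include <stdbool.h>

enum pkt_type_model { MODEL_PACKET_HOST, MODEL_PACKET_LOOPBACK };

struct pkt_model {
	enum pkt_type_model	pkt_type;
	int			tap_hits;	/* times a tap saw the packet */
};

/* Models a packet tap: count the packet unless it is marked LOOPBACK. */
void model_tap(struct pkt_model *pkt)
{
	if (pkt->pkt_type != MODEL_PACKET_LOOPBACK)
		pkt->tap_hits++;
}

/* Tx side: the tap sees the packet, then it is marked and re-injected. */
void model_local_xmit(struct pkt_model *pkt)
{
	model_tap(pkt);
	pkt->pkt_type = MODEL_PACKET_LOOPBACK;
}

/* Rx side: the tap runs first and skips the marked packet; the l3 hook
 * then restores HOST so upper layers accept it.
 */
bool model_rx(struct pkt_model *pkt)
{
	model_tap(pkt);
	if (pkt->pkt_type == MODEL_PACKET_LOOPBACK)
		pkt->pkt_type = MODEL_PACKET_HOST;

	return pkt->pkt_type == MODEL_PACKET_HOST;
}

/* Small self-check: one tap hit across the Tx and Rx passes. */
void model_pkt_selfcheck(void)
{
	struct pkt_model pkt = { MODEL_PACKET_HOST, 0 };

	model_local_xmit(&pkt);
	assert(model_rx(&pkt));
	assert(pkt.tap_hits == 1);
}
```

Without the reset on the Rx side the packet would stay marked LOOPBACK,
which is the case the l3mdev change is meant to cover.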

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 drivers/net/vrf.c | 138 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 101 insertions(+), 37 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 39bef1dc41fa..b6e8b1e9b4fd 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -44,6 +44,7 @@
 
 struct net_vrf {
 	struct rtable           *rth;
+	struct rtable           *rth_local;
 	struct rt6_info		*rt6;
 	u32                     tb_id;
 };
@@ -54,6 +55,7 @@ struct pcpu_dstats {
 	u64			tx_drps;
 	u64			rx_pkts;
 	u64			rx_bytes;
+	u64			rx_drps;
 	struct u64_stats_sync	syncp;
 };
 
@@ -91,6 +93,40 @@ static struct rtnl_link_stats64 *vrf_get_stats64(struct net_device *dev,
 	return stats;
 }
 
+/* Local traffic destined to local address. Reinsert the packet to rx
+ * path, similar to loopback handling. Based on loopback_xmit
+ */
+static int vrf_local_xmit(struct sk_buff *skb, struct dst_entry *dst)
+{
+	struct net_device *dev = skb->dev;
+	struct pcpu_dstats *dstats = this_cpu_ptr(dev->dstats);
+	int len = skb->len;
+
+	skb_orphan(skb);
+
+	dst_hold(dst);
+	skb_dst_set(skb, dst);
+	skb_dst_force(skb);
+
+	/* set pkt_type to avoid skb hitting packet taps twice -
+	 * once Tx and again in Rx processing
+	 */
+	skb->pkt_type = PACKET_LOOPBACK;
+
+	skb->protocol = eth_type_trans(skb, skb->dev);
+
+	if (likely(netif_rx(skb) == NET_RX_SUCCESS)) {
+		u64_stats_update_begin(&dstats->syncp);
+		dstats->rx_pkts++;
+		dstats->rx_bytes += len;
+		u64_stats_update_end(&dstats->syncp);
+	} else {
+		this_cpu_inc(dev->dstats->rx_drps);
+	}
+
+	return NETDEV_TX_OK;
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 					   struct net_device *dev)
@@ -112,6 +148,9 @@ static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 	struct dst_entry *dst;
 	struct dst_entry *dst_null = &net->ipv6.ip6_null_entry->dst;
 
+	/* strip the ethernet header added for pass through VRF device */
+	__skb_pull(skb, skb_network_offset(skb));
+
 	dst = ip6_route_output(net, NULL, &fl6);
 	if (dst == dst_null)
 		goto err;
@@ -139,29 +178,6 @@ static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 }
 #endif
 
-static int vrf_send_v4_prep(struct sk_buff *skb, struct flowi4 *fl4,
-			    struct net_device *vrf_dev)
-{
-	struct rtable *rt;
-	int err = 1;
-
-	rt = ip_route_output_flow(dev_net(vrf_dev), fl4, NULL);
-	if (IS_ERR(rt))
-		goto out;
-
-	/* TO-DO: what about broadcast ? */
-	if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
-		ip_rt_put(rt);
-		goto out;
-	}
-
-	skb_dst_drop(skb);
-	skb_dst_set(skb, &rt->dst);
-	err = 0;
-out:
-	return err;
-}
-
 static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
 					   struct net_device *vrf_dev)
 {
@@ -176,9 +192,35 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
 				FLOWI_FLAG_SKIP_NH_OIF,
 		.daddr = ip4h->daddr,
 	};
+	struct net *net = dev_net(vrf_dev);
+	struct rtable *rt;
 
-	if (vrf_send_v4_prep(skb, &fl4, vrf_dev))
+	rt = ip_route_output_flow(net, &fl4, NULL);
+	if (IS_ERR(rt))
+		goto err;
+
+	if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
+		ip_rt_put(rt);
 		goto err;
+	}
+
+	skb_dst_drop(skb);
+
+	/* if dst.dev is loopback or the VRF device again this is locally
+	 * originated traffic destined to a local address. Short circuit
+	 * to Rx path using our local dst
+	 */
+	if (rt->dst.dev == net->loopback_dev || rt->dst.dev == vrf_dev) {
+		struct net_vrf *vrf = netdev_priv(vrf_dev);
+
+		ip_rt_put(rt);
+		return vrf_local_xmit(skb, &vrf->rth_local->dst);
+	}
+
+	skb_dst_set(skb, &rt->dst);
+
+	/* strip the ethernet header added for pass through VRF device */
+	__skb_pull(skb, skb_network_offset(skb));
 
 	if (!ip4h->saddr) {
 		ip4h->saddr = inet_select_addr(skb_dst(skb)->dev, 0,
@@ -200,9 +242,6 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
 
 static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev)
 {
-	/* strip the ethernet header added for pass through VRF device */
-	__skb_pull(skb, skb_network_offset(skb));
-
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
 		return vrf_process_v4_outbound(skb, dev);
@@ -374,27 +413,45 @@ static int vrf_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 
 static void vrf_rtable_release(struct net_vrf *vrf)
 {
-	struct dst_entry *dst = (struct dst_entry *)vrf->rth;
+	dst_release(&vrf->rth->dst);
+	dst_release(&vrf->rth_local->dst);
 
-	dst_release(dst);
 	vrf->rth = NULL;
+	vrf->rth_local = NULL;
 }
 
-static struct rtable *vrf_rtable_create(struct net_device *dev)
+static int vrf_rtable_create(struct net_device *dev)
 {
 	struct net_vrf *vrf = netdev_priv(dev);
 	struct rtable *rth;
 
 	if (!fib_new_table(dev_net(dev), vrf->tb_id))
-		return NULL;
+		return -ENOMEM;
 
+	/* create a dst for local ingress routing - packets sent locally
+	 * to local address via the VRF device as a loopback
+	 */
+	rth = rt_dst_alloc(dev, RTCF_LOCAL, RTN_LOCAL, 1, 1, 0);
+	if (!rth)
+		return -ENOMEM;
+
+	rth->dst.dev = dev;
+	rth->rt_table_id = vrf->tb_id;
+	vrf->rth_local = rth;
+
+	/* create a dst for routing packets out through a VRF device */
 	rth = rt_dst_alloc(dev, 0, RTN_UNICAST, 1, 1, 0);
-	if (rth) {
-		rth->dst.output	= vrf_output;
-		rth->rt_table_id = vrf->tb_id;
+	if (!rth) {
+		dst_release(&vrf->rth_local->dst);
+		return -ENOMEM;
 	}
 
-	return rth;
+	rth->dst.output = vrf_output;
+	rth->dst.dev = dev;
+	rth->rt_table_id = vrf->tb_id;
+	vrf->rth = rth;
+
+	return 0;
 }
 
 /**************************** device handling ********************/
@@ -482,8 +539,7 @@ static int vrf_dev_init(struct net_device *dev)
 		goto out_nomem;
 
 	/* create the default dst which points back to us */
-	vrf->rth = vrf_rtable_create(dev);
-	if (!vrf->rth)
+	if (vrf_rtable_create(dev))
 		goto out_stats;
 
 	if (vrf_rt6_create(dev) != 0)
@@ -646,6 +702,14 @@ static struct sk_buff *vrf_l3_rcv(struct net_device *vrf_dev,
 				  struct sk_buff *skb,
 				  u16 proto)
 {
+	/* loopback based traffic. Need to reset pkt_type for upper
+	 * layers to process skb
+	 */
+	if (skb->pkt_type == PACKET_LOOPBACK) {
+		skb->pkt_type = PACKET_HOST;
+		return skb;
+	}
+
 	switch (proto) {
 	case AF_INET:
 		return vrf_ip_rcv(vrf_dev, skb);
-- 
2.1.4

* [PATCH net-next 08/13] net: vrf: ipv6 support for local traffic to local addresses
  2016-05-05  3:33 [PATCH net-next 00/13] net: Various VRF patches David Ahern
                   ` (6 preceding siblings ...)
  2016-05-05  3:33 ` [PATCH net-next 07/13] net: vrf: ipv4 support for local traffic to local addresses David Ahern
@ 2016-05-05  3:33 ` David Ahern
  2016-05-05  3:33 ` [PATCH net-next 09/13] net: l3mdev: Propagate route lookup flags for IPv6 David Ahern
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: David Ahern @ 2016-05-05  3:33 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

Add support for locally originated traffic to VRF-local addresses.
This patch handles IPv6 support. With this patch, ping, tcp and udp
packets to a local IPv6 address are successfully routed:

    $ ping6 -c1 -I red 2100:1::1
    ping6: Warning: source address might be selected on device other than red.
    PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
    64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.098 ms

ip6_input is exported so the VRF driver can use it for the dst input
function. IPv4 defaults to setting the input and output functions; IPv6
does not. VRF does not need to reinvent the Rx path so just export the
function.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 drivers/net/vrf.c    | 79 ++++++++++++++++++++++++++++++++++++++++++++--------
 net/ipv6/ip6_input.c |  1 +
 2 files changed, 69 insertions(+), 11 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index b6e8b1e9b4fd..7a533607a08c 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -46,6 +46,7 @@ struct net_vrf {
 	struct rtable           *rth;
 	struct rtable           *rth_local;
 	struct rt6_info		*rt6;
+	struct rt6_info		*rt6_local;
 	u32                     tb_id;
 };
 
@@ -148,14 +149,39 @@ static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 	struct dst_entry *dst;
 	struct dst_entry *dst_null = &net->ipv6.ip6_null_entry->dst;
 
-	/* strip the ethernet header added for pass through VRF device */
-	__skb_pull(skb, skb_network_offset(skb));
-
 	dst = ip6_route_output(net, NULL, &fl6);
 	if (dst == dst_null)
 		goto err;
 
 	skb_dst_drop(skb);
+
+	/* if dst.dev is loopback or the VRF device again this is locally
+	 * originated traffic destined to a local address. Short circuit
+	 * to Rx path using our local dst
+	 */
+	if (dst->dev == net->loopback_dev || dst->dev == dev) {
+		struct net_vrf *vrf = netdev_priv(dev);
+		struct rt6_info *rt6_local = vrf->rt6_local;
+
+		/* release looked up dst and use cached local dst */
+		dst_release(dst);
+
+		/* Ordering issue: cached local dst is created on newlink
+		 * before the IPv6 initialization. Using the local dst
+		 * requires rt6i_idev to be set so make sure it is.
+		 */
+		if (!rt6_local->rt6i_idev) {
+			rt6_local->rt6i_idev = in6_dev_get(dev);
+			if (!rt6_local->rt6i_idev)
+				goto err;
+		}
+
+		return vrf_local_xmit(skb, &rt6_local->dst);
+	}
+
+	/* strip the ethernet header added for pass through VRF device */
+	__skb_pull(skb, skb_network_offset(skb));
+
 	skb_dst_set(skb, dst);
 
 	ret = ip6_local_out(net, skb->sk, skb);
@@ -314,30 +340,61 @@ static int vrf_output6(struct net *net, struct sock *sk, struct sk_buff *skb)
 
 static void vrf_rt6_release(struct net_vrf *vrf)
 {
-	dst_release(&vrf->rt6->dst);
+	struct rt6_info *rt6;
+
+	rt6 = vrf->rt6;
+	dst_release(&rt6->dst);
 	vrf->rt6 = NULL;
+
+	rt6 = vrf->rt6_local;
+	if (rt6->rt6i_idev)
+		in6_dev_put(rt6->rt6i_idev);
+
+	dst_release(&rt6->dst);
+	vrf->rt6_local = NULL;
 }
 
 static int vrf_rt6_create(struct net_device *dev)
 {
+	int flags = DST_HOST | DST_NOPOLICY | DST_NOXFRM | DST_NOCACHE;
 	struct net_vrf *vrf = netdev_priv(dev);
 	struct net *net = dev_net(dev);
+	struct fib6_table *rt6i_table;
 	struct rt6_info *rt6;
 	int rc = -ENOMEM;
 
-	rt6 = ip6_dst_alloc(net, dev,
-			    DST_HOST | DST_NOPOLICY | DST_NOXFRM | DST_NOCACHE);
+	rt6i_table = fib6_new_table(net, vrf->tb_id);
+	if (!rt6i_table)
+		goto out;
+
+	/* create a dst for routing packets out a VRF device */
+	rt6 = ip6_dst_alloc(net, dev, flags);
 	if (!rt6)
 		goto out;
 
 	dst_hold(&rt6->dst);
+	rt6->rt6i_table = rt6i_table;
+	rt6->dst.output = vrf_output6;
+	vrf->rt6 = rt6;
 
-	rt6->rt6i_table = fib6_new_table(net, vrf->tb_id);
-	if (!rt6->rt6i_table)
+	/* create a dst for local routing - packets sent locally
+	 * to local address via the VRF device as a loopback
+	 */
+	rt6 = ip6_dst_alloc(net, dev, flags);
+	if (!rt6) {
+		dst_release(&vrf->rt6->dst);
 		goto out;
+	}
+
+	dst_hold(&rt6->dst);
+
+	rt6->dst.flags |= DST_HOST;
+	rt6->rt6i_idev = in6_dev_get(dev);
+	rt6->rt6i_flags = RTF_UP | RTF_NONEXTHOP | RTF_LOCAL;
+	rt6->rt6i_table = rt6i_table;
+	rt6->dst.input = ip6_input;
+	vrf->rt6_local = rt6;
 
-	rt6->dst.output	= vrf_output6;
-	vrf->rt6 = rt6;
 	rc = 0;
 out:
 	return rc;
@@ -733,7 +790,7 @@ static struct dst_entry *vrf_get_rt6_dst(const struct net_device *dev,
 		dst_hold(&rt->dst);
 	}
 
-	return (struct dst_entry *)rt;
+	return &rt->dst;
 }
 #endif
 
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index f185cbcda114..d896a08be0fc 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -298,6 +298,7 @@ int ip6_input(struct sk_buff *skb)
 		       dev_net(skb->dev), NULL, skb, skb->dev, NULL,
 		       ip6_input_finish);
 }
+EXPORT_SYMBOL_GPL(ip6_input);
 
 int ip6_mc_input(struct sk_buff *skb)
 {
-- 
2.1.4

* [PATCH net-next 09/13] net: l3mdev: Propagate route lookup flags for IPv6
  2016-05-05  3:33 [PATCH net-next 00/13] net: Various VRF patches David Ahern
                   ` (7 preceding siblings ...)
  2016-05-05  3:33 ` [PATCH net-next 08/13] net: vrf: ipv6 " David Ahern
@ 2016-05-05  3:33 ` David Ahern
  2016-05-05  3:33 ` [PATCH net-next 10/13] net: vrf: Handle ipv6 multicast and link-local addresses David Ahern
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: David Ahern @ 2016-05-05  3:33 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

Commit 6f21c96a78b8 ("ipv6: enforce flowi6_oif usage in
ip6_dst_lookup_tail") converted ip6_route_output to ip6_route_output_flags
which takes a flags input parameter. That argument should be passed to
l3mdev_get_rt6_dst for it to use in lookups as well. Needed by the
next patch.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 drivers/net/vrf.c    | 2 +-
 include/net/l3mdev.h | 9 ++++++---
 net/ipv6/route.c     | 2 +-
 net/l3mdev/l3mdev.c  | 6 ++++--
 4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 7a533607a08c..1389cd6008f7 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -779,7 +779,7 @@ static struct sk_buff *vrf_l3_rcv(struct net_device *vrf_dev,
 
 #if IS_ENABLED(CONFIG_IPV6)
 static struct dst_entry *vrf_get_rt6_dst(const struct net_device *dev,
-					 const struct flowi6 *fl6)
+					 const struct flowi6 *fl6, int flags)
 {
 	struct rt6_info *rt = NULL;
 
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index 46ac4c69c155..0b38f58b6798 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -36,7 +36,8 @@ struct l3mdev_ops {
 
 	/* IPv6 ops */
 	struct dst_entry * (*l3mdev_get_rt6_dst)(const struct net_device *dev,
-						 const struct flowi6 *fl6);
+						 const struct flowi6 *fl6,
+						 int flags);
 };
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
@@ -134,7 +135,8 @@ static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
 
 int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4);
 
-struct dst_entry *l3mdev_get_rt6_dst(struct net *net, const struct flowi6 *fl6);
+struct dst_entry *l3mdev_get_rt6_dst(struct net *net, const struct flowi6 *fl6,
+				     int flags);
 
 static inline
 struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto)
@@ -221,7 +223,8 @@ static inline int l3mdev_get_saddr(struct net *net, int ifindex,
 }
 
 static inline
-struct dst_entry *l3mdev_get_rt6_dst(struct net *net, const struct flowi6 *fl6)
+struct dst_entry *l3mdev_get_rt6_dst(struct net *net, const struct flowi6 *fl6,
+				     int flags)
 {
 	return NULL;
 }
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c42fa1deb152..c585323503f1 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1190,7 +1190,7 @@ struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock *sk,
 	struct dst_entry *dst;
 	bool any_src;
 
-	dst = l3mdev_get_rt6_dst(net, fl6);
+	dst = l3mdev_get_rt6_dst(net, fl6, flags);
 	if (dst)
 		return dst;
 
diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
index 0fd8cc1417cd..fbf16c487d8b 100644
--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -107,7 +107,8 @@ EXPORT_SYMBOL_GPL(l3mdev_fib_table_by_index);
  */
 
 struct dst_entry *l3mdev_get_rt6_dst(struct net *net,
-				     const struct flowi6 *fl6)
+				     const struct flowi6 *fl6,
+				     int flags)
 {
 	struct dst_entry *dst = NULL;
 	struct net_device *dev;
@@ -121,7 +122,8 @@ struct dst_entry *l3mdev_get_rt6_dst(struct net *net,
 
 		if (dev && netif_is_l3_master(dev) &&
 		    dev->l3mdev_ops->l3mdev_get_rt6_dst) {
-			dst = dev->l3mdev_ops->l3mdev_get_rt6_dst(dev, fl6);
+			dst = dev->l3mdev_ops->l3mdev_get_rt6_dst(dev, fl6,
+								  flags);
 		}
 
 		rcu_read_unlock();
-- 
2.1.4

* [PATCH net-next 10/13] net: vrf: Handle ipv6 multicast and link-local addresses
  2016-05-05  3:33 [PATCH net-next 00/13] net: Various VRF patches David Ahern
                   ` (8 preceding siblings ...)
  2016-05-05  3:33 ` [PATCH net-next 09/13] net: l3mdev: Propagate route lookup flags for IPv6 David Ahern
@ 2016-05-05  3:33 ` David Ahern
  2016-05-05  3:33 ` [PATCH net-next 11/13] net: vrf: rcu protect changes to private data David Ahern
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: David Ahern @ 2016-05-05  3:33 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

IPv6 multicast and link-local addresses require special handling by the
VRF driver. Rather than using the VRF device index and a full FIB lookup,
packets to/from these addresses should use direct FIB lookups.

Multicast routes do not make sense for L3 master devices. So, do not
add mcast routes for that device and fail attempts to send packets
to IPv6 mcast addresses on the device.
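
To make the address classes concrete, the following user-space sketch
shows which destinations would take the strict (interface-bound) path.
It is only an approximation under stated assumptions:
model_need_strict is a hypothetical helper built on the POSIX
IN6_IS_ADDR_* macros, and the kernel's rt6_need_strict may cover
additional address types.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Returns true when the textual IPv6 address is multicast or link-local,
 * i.e. only meaningful relative to a specific interface. Unparsable
 * input is treated as not strict.
 */
bool model_need_strict(const char *text)
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, text, &addr) != 1)
		return false;

	return IN6_IS_ADDR_MULTICAST(&addr) || IN6_IS_ADDR_LINKLOCAL(&addr);
}

/* Small self-check using addresses from this patch series. */
void model_strict_selfcheck(void)
{
	assert(model_need_strict("ff02::1"));
	assert(model_need_strict("fe80::e0:f9ff:fe1c:b974"));
	assert(!model_need_strict("2100:1::1"));
}
```

Global addresses such as 2100:1::1 keep going through the VRF device
index, while the first two need the enslaved interface to be known.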

With this change connections into and out of a VRF enslaved device work:

1. packets into VM with VRF config:
    ping6 -c3 fe80::e0:f9ff:fe1c:b974%br1
    ping6 -c3 ff02::1%br1

    ssh -6 fe80::e0:f9ff:fe1c:b974%br1

2. packets going out a VRF enslaved device:
    ping6 -c3 fe80::18f8:83ff:fe4b:7a2e%eth1
    ping6 -c3 ff02::1%eth1
    ssh -6 root@fe80::18f8:83ff:fe4b:7a2e%eth1

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 drivers/net/vrf.c       | 78 ++++++++++++++++++++++++++++++++++++++++++++-----
 include/net/ip6_route.h |  3 ++
 include/net/l3mdev.h    |  6 ++--
 net/ipv6/addrconf.c     |  2 +-
 net/ipv6/icmp.c         |  2 +-
 net/ipv6/route.c        |  5 ++--
 net/l3mdev/l3mdev.c     |  2 +-
 7 files changed, 83 insertions(+), 15 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 1389cd6008f7..f4b44e23e6c2 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -717,11 +717,46 @@ static bool ipv6_ndisc_frame(const struct sk_buff *skb)
 	return rc;
 }
 
+static void vrf_ip6_input_dst(struct sk_buff *skb, struct net_device *vrf_dev,
+			      int ifindex)
+{
+	const struct ipv6hdr *iph = ipv6_hdr(skb);
+	struct flowi6 fl6 = {
+		.daddr		= iph->daddr,
+		.saddr		= iph->saddr,
+		.flowlabel	= ip6_flowinfo(iph),
+		.flowi6_mark	= skb->mark,
+		.flowi6_proto	= iph->nexthdr,
+		.flowi6_iif	= ifindex,
+	};
+	struct net_vrf *vrf = netdev_priv(vrf_dev);
+	struct net *net = dev_net(vrf_dev);
+	struct fib6_table *table;
+	struct rt6_info *rt6;
+
+	table = vrf->rt6->rt6i_table;
+	if (!table)
+		return;
+
+	rt6 = ip6_pol_route(net, table, ifindex, &fl6,
+			    RT6_LOOKUP_F_HAS_SADDR | RT6_LOOKUP_F_IFACE);
+
+	if (unlikely(&rt6->dst == &net->ipv6.ip6_null_entry->dst))
+		return;
+
+	skb_dst_set(skb, &rt6->dst);
+}
+
 static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
 				   struct sk_buff *skb)
 {
-	/* if packet is NDISC keep the ingress interface */
-	if (!ipv6_ndisc_frame(skb)) {
+	int orig_iif = skb->skb_iif;
+	bool need_strict = rt6_need_strict(&ipv6_hdr(skb)->daddr);
+
+	/* if packet is NDISC or addressed to multicast or link-local
+	 * then keep the ingress interface
+	 */
+	if (!ipv6_ndisc_frame(skb) && !need_strict) {
 		skb->dev = vrf_dev;
 		skb->skb_iif = vrf_dev->ifindex;
 
@@ -730,6 +765,9 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
 		skb_pull(skb, skb->mac_len);
 	}
 
+	if (need_strict)
+		vrf_ip6_input_dst(skb, vrf_dev, orig_iif);
+
 	return skb;
 }
 
@@ -779,15 +817,41 @@ static struct sk_buff *vrf_l3_rcv(struct net_device *vrf_dev,
 
 #if IS_ENABLED(CONFIG_IPV6)
 static struct dst_entry *vrf_get_rt6_dst(const struct net_device *dev,
-					 const struct flowi6 *fl6, int flags)
+					 struct flowi6 *fl6, int flags)
 {
+	struct net_vrf *vrf = netdev_priv(dev);
+	struct net *net = dev_net(dev);
 	struct rt6_info *rt = NULL;
 
-	if (!(fl6->flowi6_flags & FLOWI_FLAG_L3MDEV_SRC)) {
-		struct net_vrf *vrf = netdev_priv(dev);
+	/* send to link-local or multicast address */
+	if (rt6_need_strict(&fl6->daddr)) {
+		struct fib6_table *table;
+
+		/* VRF device does not have a link-local address and
+		 * sending packets to link-local or mcast addresses over
+		 * a VRF device does not make sense
+		 */
+		if (fl6->flowi6_oif == dev->ifindex) {
+			struct dst_entry *dst = &net->ipv6.ip6_null_entry->dst;
+
+			dst_hold(dst);
+			return dst;
+		}
+
+		table = vrf->rt6->rt6i_table;
+		if (!table)
+			return NULL;
 
-		rt = vrf->rt6;
-		dst_hold(&rt->dst);
+		flags |= RT6_LOOKUP_F_IFACE;
+		if (!ipv6_addr_any(&fl6->saddr))
+			flags |= RT6_LOOKUP_F_HAS_SADDR;
+
+		rt = ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags);
+	} else {
+		if (!(fl6->flowi6_flags & FLOWI_FLAG_L3MDEV_SRC)) {
+			rt = vrf->rt6;
+			dst_hold(&rt->dst);
+		}
 	}
 
 	return &rt->dst;
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 54c779416eec..f73a65e97597 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -77,6 +77,9 @@ static inline struct dst_entry *ip6_route_output(struct net *net,
 struct dst_entry *ip6_route_lookup(struct net *net, struct flowi6 *fl6,
 				   int flags);
 
+struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
+			       int oif, struct flowi6 *fl6, int flags);
+
 int ip6_route_init(void);
 void ip6_route_cleanup(void);
 
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index 0b38f58b6798..d575185600a5 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -36,7 +36,7 @@ struct l3mdev_ops {
 
 	/* IPv6 ops */
 	struct dst_entry * (*l3mdev_get_rt6_dst)(const struct net_device *dev,
-						 const struct flowi6 *fl6,
+						 struct flowi6 *fl6,
 						 int flags);
 };
 
@@ -135,7 +135,7 @@ static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
 
 int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4);
 
-struct dst_entry *l3mdev_get_rt6_dst(struct net *net, const struct flowi6 *fl6,
+struct dst_entry *l3mdev_get_rt6_dst(struct net *net, struct flowi6 *fl6,
 				     int flags);
 
 static inline
@@ -223,7 +223,7 @@ static inline int l3mdev_get_saddr(struct net *net, int ifindex,
 }
 
 static inline
-struct dst_entry *l3mdev_get_rt6_dst(struct net *net, const struct flowi6 *fl6,
+struct dst_entry *l3mdev_get_rt6_dst(struct net *net, struct flowi6 *fl6,
 				     int flags)
 {
 	return NULL;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 47f837a58e0a..b12553905e42 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2254,7 +2254,7 @@ static struct inet6_dev *addrconf_add_dev(struct net_device *dev)
 		return ERR_PTR(-EACCES);
 
 	/* Add default multicast route */
-	if (!(dev->flags & IFF_LOOPBACK))
+	if (!(dev->flags & IFF_LOOPBACK) && !netif_is_l3_master(dev))
 		addrconf_add_mroute(dev);
 
 	return idev;
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 4527285fcaa2..a69a7e553adb 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -585,7 +585,7 @@ static void icmpv6_echo_reply(struct sk_buff *skb)
 	fl6.daddr = ipv6_hdr(skb)->saddr;
 	if (saddr)
 		fl6.saddr = *saddr;
-	fl6.flowi6_oif = l3mdev_fib_oif(skb->dev);
+	fl6.flowi6_oif = skb->dev->ifindex;
 	fl6.fl6_icmp_type = ICMPV6_ECHO_REPLY;
 	fl6.flowi6_mark = mark;
 	security_skb_classify_flow(skb, flowi6_to_flowi(&fl6));
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c585323503f1..a87e66d2284f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1042,8 +1042,8 @@ static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt)
 	return pcpu_rt;
 }
 
-static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, int oif,
-				      struct flowi6 *fl6, int flags)
+struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
+			       int oif, struct flowi6 *fl6, int flags)
 {
 	struct fib6_node *fn, *saved_fn;
 	struct rt6_info *rt;
@@ -1139,6 +1139,7 @@ static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 
 	}
 }
+EXPORT_SYMBOL_GPL(ip6_pol_route);
 
 static struct rt6_info *ip6_pol_route_input(struct net *net, struct fib6_table *table,
 					    struct flowi6 *fl6, int flags)
diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
index fbf16c487d8b..dceac272b8c4 100644
--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -107,7 +107,7 @@ EXPORT_SYMBOL_GPL(l3mdev_fib_table_by_index);
  */
 
 struct dst_entry *l3mdev_get_rt6_dst(struct net *net,
-				     const struct flowi6 *fl6,
+				     struct flowi6 *fl6,
 				     int flags)
 {
 	struct dst_entry *dst = NULL;
-- 
2.1.4

* [PATCH net-next 11/13] net: vrf: rcu protect changes to private data
  2016-05-05  3:33 [PATCH net-next 00/13] net: Various VRF patches David Ahern
                   ` (9 preceding siblings ...)
  2016-05-05  3:33 ` [PATCH net-next 10/13] net: vrf: Handle ipv6 multicast and link-local addresses David Ahern
@ 2016-05-05  3:33 ` David Ahern
  2016-05-05  3:33 ` [PATCH net-next 12/13] net: vrf: Implement get_saddr for IPv6 David Ahern
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: David Ahern @ 2016-05-05  3:33 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

One cpu can be processing packets, which includes using the cached
route entries in the vrf device's private data, while on another cpu
the device is being deleted, which releases the routes and sets the
pointers to NULL. Fix by RCU-protecting the changes.
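
The reader/writer split can be sketched in user space with C11 atomics.
This is a single-threaded model of the ordering only, under loud
assumptions: dst_model, vrf_model and the model_* functions are
hypothetical, and the grace period provided by synchronize_rcu() in the
kernel is NOT modelled, so the sketch is not safe for real concurrent
use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct dst_model {
	atomic_int refcnt;
};

struct vrf_model {
	_Atomic(struct dst_model *) rth_local;	/* published pointer */
};

/* Reader: load the published pointer once, bail out if the device is
 * going away, otherwise take a reference before using it.
 */
struct dst_model *model_get_local_dst(struct vrf_model *vrf)
{
	struct dst_model *dst = atomic_load(&vrf->rth_local);

	if (dst)
		atomic_fetch_add(&dst->refcnt, 1);

	return dst;
}

/* Writer: unpublish first so new readers see NULL. The caller would
 * wait for a grace period (not modelled here) before dropping the
 * device's own reference on the returned entry.
 */
struct dst_model *model_unpublish(struct vrf_model *vrf)
{
	return atomic_exchange(&vrf->rth_local, (struct dst_model *)NULL);
}

/* Small self-check of the ordering. */
void model_rcu_selfcheck(void)
{
	struct dst_model dst;
	struct vrf_model vrf;

	atomic_init(&dst.refcnt, 1);
	atomic_init(&vrf.rth_local, &dst);

	assert(model_get_local_dst(&vrf) == &dst);
	assert(model_unpublish(&vrf) == &dst);
	assert(model_get_local_dst(&vrf) == NULL);
}
```

The key property is that a reader either gets a referenced entry or
NULL; it never dereferences a pointer the writer has already cleared.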

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 drivers/net/vrf.c | 202 ++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 142 insertions(+), 60 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index f4b44e23e6c2..fb2d0b2052ea 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -43,10 +43,10 @@
 #define DRV_VERSION	"1.0"
 
 struct net_vrf {
-	struct rtable           *rth;
-	struct rtable           *rth_local;
-	struct rt6_info		*rt6;
-	struct rt6_info		*rt6_local;
+	struct rtable __rcu	*rth;
+	struct rtable __rcu	*rth_local;
+	struct rt6_info	__rcu	*rt6;
+	struct rt6_info	__rcu	*rt6_local;
 	u32                     tb_id;
 };
 
@@ -104,8 +104,6 @@ static int vrf_local_xmit(struct sk_buff *skb, struct dst_entry *dst)
 	int len = skb->len;
 
 	skb_orphan(skb);
-
-	dst_hold(dst);
 	skb_dst_set(skb, dst);
 	skb_dst_force(skb);
 
@@ -149,11 +147,13 @@ static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 	struct dst_entry *dst;
 	struct dst_entry *dst_null = &net->ipv6.ip6_null_entry->dst;
 
+	skb_dst_drop(skb);
+
 	dst = ip6_route_output(net, NULL, &fl6);
-	if (dst == dst_null)
+	if (dst == dst_null) {
+		dst_release(dst);
 		goto err;
-
-	skb_dst_drop(skb);
+	}
 
 	/* if dst.dev is loopback or the VRF device again this is locally
 	 * originated traffic destined to a local address. Short circuit
@@ -161,22 +161,37 @@ static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 	 */
 	if (dst->dev == net->loopback_dev || dst->dev == dev) {
 		struct net_vrf *vrf = netdev_priv(dev);
-		struct rt6_info *rt6_local = vrf->rt6_local;
+		struct rt6_info *rt6_local;
 
 		/* release looked up dst and use cached local dst */
 		dst_release(dst);
 
+		rcu_read_lock();
+
+		rt6_local = rcu_dereference(vrf->rt6_local);
+		if (unlikely(!rt6_local)) {
+			rcu_read_unlock();
+			goto err;
+		}
+
 		/* Ordering issue: cached local dst is created on newlink
 		 * before the IPv6 initialization. Using the local dst
 		 * requires rt6i_idev to be set so make sure it is.
 		 */
-		if (!rt6_local->rt6i_idev) {
+		if (unlikely(!rt6_local->rt6i_idev)) {
 			rt6_local->rt6i_idev = in6_dev_get(dev);
-			if (!rt6_local->rt6i_idev)
+			if (!rt6_local->rt6i_idev) {
+				rcu_read_unlock();
 				goto err;
+			}
 		}
 
-		return vrf_local_xmit(skb, &rt6_local->dst);
+		dst = &rt6_local->dst;
+		dst_hold(dst);
+
+		rcu_read_unlock();
+
+		return vrf_local_xmit(skb, dst);
 	}
 
 	/* strip the ethernet header added for pass through VRF device */
@@ -238,9 +253,25 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
 	 */
 	if (rt->dst.dev == net->loopback_dev || rt->dst.dev == vrf_dev) {
 		struct net_vrf *vrf = netdev_priv(vrf_dev);
+		struct rtable *rth_local;
+		struct dst_entry *dst = NULL;
 
 		ip_rt_put(rt);
-		return vrf_local_xmit(skb, &vrf->rth_local->dst);
+
+		rcu_read_lock();
+
+		rth_local = rcu_dereference(vrf->rth_local);
+		if (likely(rth_local)) {
+			dst = &rth_local->dst;
+			dst_hold(dst);
+		}
+
+		rcu_read_unlock();
+
+		if (unlikely(!dst))
+			goto err;
+
+		return vrf_local_xmit(skb, dst);
 	}
 
 	skb_dst_set(skb, &rt->dst);
@@ -338,20 +369,28 @@ static int vrf_output6(struct net *net, struct sock *sk, struct sk_buff *skb)
 			    !(IP6CB(skb)->flags & IP6SKB_REROUTED));
 }
 
+/* holding rtnl */
 static void vrf_rt6_release(struct net_vrf *vrf)
 {
-	struct rt6_info *rt6;
+	struct rt6_info *rt6, *rt6_local;
+
+	rt6       = rtnl_dereference(vrf->rt6);
+	rt6_local = rtnl_dereference(vrf->rt6_local);
 
-	rt6 = vrf->rt6;
-	dst_release(&rt6->dst);
-	vrf->rt6 = NULL;
+	RCU_INIT_POINTER(vrf->rt6,       NULL);
+	RCU_INIT_POINTER(vrf->rt6_local, NULL);
 
-	rt6 = vrf->rt6_local;
-	if (rt6->rt6i_idev)
-		in6_dev_put(rt6->rt6i_idev);
+	synchronize_rcu();
 
-	dst_release(&rt6->dst);
-	vrf->rt6_local = NULL;
+	if (rt6)
+		dst_release(&rt6->dst);
+
+	if (rt6_local) {
+		if (rt6_local->rt6i_idev)
+			in6_dev_put(rt6_local->rt6i_idev);
+
+		dst_release(&rt6_local->dst);
+	}
 }
 
 static int vrf_rt6_create(struct net_device *dev)
@@ -360,7 +399,7 @@ static int vrf_rt6_create(struct net_device *dev)
 	struct net_vrf *vrf = netdev_priv(dev);
 	struct net *net = dev_net(dev);
 	struct fib6_table *rt6i_table;
-	struct rt6_info *rt6;
+	struct rt6_info *rt6, *rt6_local;
 	int rc = -ENOMEM;
 
 	rt6i_table = fib6_new_table(net, vrf->tb_id);
@@ -375,25 +414,26 @@ static int vrf_rt6_create(struct net_device *dev)
 	dst_hold(&rt6->dst);
 	rt6->rt6i_table = rt6i_table;
 	rt6->dst.output = vrf_output6;
-	vrf->rt6 = rt6;
 
 	/* create a dst for local routing - packets sent locally
 	 * to local address via the VRF device as a loopback
 	 */
-	rt6 = ip6_dst_alloc(net, dev, flags);
-	if (!rt6) {
-		dst_release(&vrf->rt6->dst);
+	rt6_local = ip6_dst_alloc(net, dev, flags);
+	if (!rt6_local) {
+		dst_release(&rt6->dst);
 		goto out;
 	}
 
-	dst_hold(&rt6->dst);
+	dst_hold(&rt6_local->dst);
 
-	rt6->dst.flags |= DST_HOST;
-	rt6->rt6i_idev = in6_dev_get(dev);
-	rt6->rt6i_flags = RTF_UP | RTF_NONEXTHOP | RTF_LOCAL;
-	rt6->rt6i_table = rt6i_table;
-	rt6->dst.input = ip6_input;
-	vrf->rt6_local = rt6;
+	rt6_local->dst.flags |= DST_HOST;
+	rt6_local->rt6i_idev = in6_dev_get(dev);
+	rt6_local->rt6i_flags = RTF_UP | RTF_NONEXTHOP | RTF_LOCAL;
+	rt6_local->rt6i_table = rt6i_table;
+	rt6_local->dst.input = ip6_input;
+
+	rcu_assign_pointer(vrf->rt6, rt6);
+	rcu_assign_pointer(vrf->rt6_local, rt6_local);
 
 	rc = 0;
 out:
@@ -468,19 +508,30 @@ static int vrf_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 			    !(IPCB(skb)->flags & IPSKB_REROUTED));
 }
 
+/* holding rtnl */
 static void vrf_rtable_release(struct net_vrf *vrf)
 {
-	dst_release(&vrf->rth->dst);
-	dst_release(&vrf->rth_local->dst);
+	struct rtable *rth, *rth_local;
+
+	rth       = rtnl_dereference(vrf->rth);
+	rth_local = rtnl_dereference(vrf->rth_local);
+
+	RCU_INIT_POINTER(vrf->rth,       NULL);
+	RCU_INIT_POINTER(vrf->rth_local, NULL);
 
-	vrf->rth = NULL;
-	vrf->rth_local = NULL;
+	synchronize_rcu();
+
+	if (rth)
+		dst_release(&rth->dst);
+
+	if (rth_local)
+		dst_release(&rth_local->dst);
 }
 
 static int vrf_rtable_create(struct net_device *dev)
 {
 	struct net_vrf *vrf = netdev_priv(dev);
-	struct rtable *rth;
+	struct rtable *rth, *rth_local;
 
 	if (!fib_new_table(dev_net(dev), vrf->tb_id))
 		return -ENOMEM;
@@ -488,25 +539,26 @@ static int vrf_rtable_create(struct net_device *dev)
 	/* create a dst for local ingress routing - packets sent locally
 	 * to local address via the VRF device as a loopback
 	 */
-	rth = rt_dst_alloc(dev, RTCF_LOCAL, RTN_LOCAL, 1, 1, 0);
-	if (!rth)
+	rth_local = rt_dst_alloc(dev, RTCF_LOCAL, RTN_LOCAL, 1, 1, 0);
+	if (!rth_local)
 		return -ENOMEM;
 
-	rth->dst.dev = dev;
-	rth->rt_table_id = vrf->tb_id;
-	vrf->rth_local = rth;
+	rth_local->dst.dev = dev;
+	rth_local->rt_table_id = vrf->tb_id;
 
 	/* create a dst for routing packets out through a VRF device */
 	rth = rt_dst_alloc(dev, 0, RTN_UNICAST, 1, 1, 0);
 	if (!rth) {
-		dst_release(&vrf->rth_local->dst);
+		dst_release(&rth_local->dst);
 		return -ENOMEM;
 	}
 
 	rth->dst.output = vrf_output;
 	rth->dst.dev = dev;
 	rth->rt_table_id = vrf->tb_id;
-	vrf->rth = rth;
+
+	rcu_assign_pointer(vrf->rth, rth);
+	rcu_assign_pointer(vrf->rth_local, rth_local);
 
 	return 0;
 }
@@ -639,8 +691,13 @@ static struct rtable *vrf_get_rtable(const struct net_device *dev,
 	if (!(fl4->flowi4_flags & FLOWI_FLAG_L3MDEV_SRC)) {
 		struct net_vrf *vrf = netdev_priv(dev);
 
-		rth = vrf->rth;
-		dst_hold(&rth->dst);
+		rcu_read_lock();
+
+		rth = rcu_dereference(vrf->rth);
+		if (rth)
+			dst_hold(&rth->dst);
+
+		rcu_read_unlock();
 	}
 
 	return rth;
@@ -731,10 +788,18 @@ static void vrf_ip6_input_dst(struct sk_buff *skb, struct net_device *vrf_dev,
 	};
 	struct net_vrf *vrf = netdev_priv(vrf_dev);
 	struct net *net = dev_net(vrf_dev);
-	struct fib6_table *table;
+	struct fib6_table *table = NULL;
 	struct rt6_info *rt6;
 
-	table = vrf->rt6->rt6i_table;
+	rcu_read_lock();
+
+	/* fib6_table does not have a refcnt and can not be freed */
+	rt6 = rcu_dereference(vrf->rt6);
+	if (likely(rt6))
+		table = rt6->rt6i_table;
+
+	rcu_read_unlock();
+
 	if (!table)
 		return;
 
@@ -821,11 +886,12 @@ static struct dst_entry *vrf_get_rt6_dst(const struct net_device *dev,
 {
 	struct net_vrf *vrf = netdev_priv(dev);
 	struct net *net = dev_net(dev);
-	struct rt6_info *rt = NULL;
+	struct dst_entry *dst = NULL;
+	struct rt6_info *rt6;
 
 	/* send to link-local or multicast address */
 	if (rt6_need_strict(&fl6->daddr)) {
-		struct fib6_table *table;
+		struct fib6_table *table = NULL;
 
 		/* VRF device does not have a link-local address and
 		 * sending packets to link-local or mcast addresses over
@@ -838,7 +904,15 @@ static struct dst_entry *vrf_get_rt6_dst(const struct net_device *dev,
 			return dst;
 		}
 
-		table = vrf->rt6->rt6i_table;
+		rcu_read_lock();
+
+		/* fib6_table does not have a refcnt and can not be freed */
+		rt6 = rcu_dereference(vrf->rt6);
+		if (likely(rt6))
+			table = rt6->rt6i_table;
+
+		rcu_read_unlock();
+
 		if (!table)
 			return NULL;
 
@@ -846,15 +920,23 @@ static struct dst_entry *vrf_get_rt6_dst(const struct net_device *dev,
 		if (!ipv6_addr_any(&fl6->saddr))
 			flags |= RT6_LOOKUP_F_HAS_SADDR;
 
-		rt = ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags);
-	} else {
-		if (!(fl6->flowi6_flags & FLOWI_FLAG_L3MDEV_SRC)) {
-			rt = vrf->rt6;
-			dst_hold(&rt->dst);
+		rt6 = ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags);
+		if (rt6)
+			dst = &rt6->dst;
+
+	} else if (!(fl6->flowi6_flags & FLOWI_FLAG_L3MDEV_SRC)) {
+		rcu_read_lock();
+
+		rt6 = rcu_dereference(vrf->rt6);
+		if (likely(rt6)) {
+			dst = &rt6->dst;
+			dst_hold(dst);
 		}
+
+		rcu_read_unlock();
 	}
 
-	return &rt->dst;
+	return dst;
 }
 #endif
 
-- 
2.1.4


* [PATCH net-next 12/13] net: vrf: Implement get_saddr for IPv6
  2016-05-05  3:33 [PATCH net-next 00/13] net: Various VRF patches David Ahern
                   ` (10 preceding siblings ...)
  2016-05-05  3:33 ` [PATCH net-next 11/13] net: vrf: rcu protect changes to private data David Ahern
@ 2016-05-05  3:33 ` David Ahern
  2016-05-05  3:33 ` [PATCH net-next 13/13] net: ipv6: address selection should only consider devices in L3 domain David Ahern
  2016-05-05  3:59 ` [PATCH net-next 00/13] net: Various VRF patches David Miller
  13 siblings, 0 replies; 21+ messages in thread
From: David Ahern @ 2016-05-05  3:33 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

Source address selection is currently broken for a number of use cases:
it does not properly take into account VRF-centric addresses or even
valid routes for a VRF. Fix this by implementing a get_saddr method
similar to what was done for IPv4. The get_saddr6 method does a full
lookup, which means pulling a route from the VRF FIB table. Lookup
failures (e.g., unreachable) then cause source address selection to
fail, and the error is propagated back to the caller.

Since ipv6_dev_get_saddr is already exported, move ip6_route_get_saddr
to the header as an inline; it only checks for a preferred source
address before calling ipv6_dev_get_saddr.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
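Not part of the patch: a minimal user-space sketch of the order of checks
described in the commit message, assuming simplified stand-ins for the
kernel structures. All toy_* names are hypothetical; error models
dst->error, prefsrc_len models rt6i_prefsrc.plen, and dev_addr stands in
for whatever ipv6_dev_get_saddr would pick on the egress device.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct in6_addr. */
struct toy_addr {
	unsigned char b[16];
};

/* Hypothetical stand-in for the result of the VRF FIB lookup. */
struct toy_route {
	int error;			 /* lookup error, e.g. -ENETUNREACH */
	int prefsrc_len;		 /* nonzero if a preferred source is set */
	struct toy_addr prefsrc;	 /* preferred source address */
	const struct toy_addr *dev_addr; /* best device address, or NULL */
};

/* A failed lookup fails the selection; otherwise the preferred source
 * wins, and the device address is only the fallback.
 */
int toy_get_saddr6(const struct toy_route *rt, struct toy_addr *saddr)
{
	assert(rt != NULL && saddr != NULL);

	if (rt->error)
		return rt->error;	/* propagated back to the caller */

	if (rt->prefsrc_len) {
		*saddr = rt->prefsrc;
		return 0;
	}

	if (!rt->dev_addr)
		return -EADDRNOTAVAIL;

	*saddr = *rt->dev_addr;
	return 0;
}
```

The point of the sketch is only the ordering: an unreachable route never
reaches address selection, which is what lets the caller see the lookup
error instead of a source address from the wrong domain.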
 drivers/net/vrf.c       | 86 +++++++++++++++++++++++++++++++++++++++----------
 include/net/ip6_route.h | 21 ++++++++++--
 include/net/l3mdev.h    | 11 +++++++
 net/ipv6/ip6_output.c   | 12 +++++--
 net/ipv6/route.c        | 17 ----------
 net/l3mdev/l3mdev.c     | 25 ++++++++++++++
 6 files changed, 133 insertions(+), 39 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index fb2d0b2052ea..d83d903dc674 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -774,20 +774,11 @@ static bool ipv6_ndisc_frame(const struct sk_buff *skb)
 	return rc;
 }
 
-static void vrf_ip6_input_dst(struct sk_buff *skb, struct net_device *vrf_dev,
-			      int ifindex)
+static struct rt6_info *vrf_ip6_route_lookup(struct net_device *dev,
+					     struct flowi6 *fl6, int ifindex)
 {
-	const struct ipv6hdr *iph = ipv6_hdr(skb);
-	struct flowi6 fl6 = {
-		.daddr		= iph->daddr,
-		.saddr		= iph->saddr,
-		.flowlabel	= ip6_flowinfo(iph),
-		.flowi6_mark	= skb->mark,
-		.flowi6_proto	= iph->nexthdr,
-		.flowi6_iif	= ifindex,
-	};
-	struct net_vrf *vrf = netdev_priv(vrf_dev);
-	struct net *net = dev_net(vrf_dev);
+	struct net_vrf *vrf = netdev_priv(dev);
+	struct net *net = dev_net(dev);
 	struct fib6_table *table = NULL;
 	struct rt6_info *rt6;
 
@@ -801,14 +792,36 @@ static void vrf_ip6_input_dst(struct sk_buff *skb, struct net_device *vrf_dev,
 	rcu_read_unlock();
 
 	if (!table)
-		return;
+		return NULL;
 
-	rt6 = ip6_pol_route(net, table, ifindex, &fl6,
-			    RT6_LOOKUP_F_HAS_SADDR | RT6_LOOKUP_F_IFACE);
+	return ip6_pol_route(net, table, ifindex, fl6,
+			     RT6_LOOKUP_F_HAS_SADDR | RT6_LOOKUP_F_IFACE);
+}
 
-	if (unlikely(&rt6->dst == &net->ipv6.ip6_null_entry->dst))
+static void vrf_ip6_input_dst(struct sk_buff *skb, struct net_device *dev,
+			      int ifindex)
+{
+	const struct ipv6hdr *iph = ipv6_hdr(skb);
+	struct flowi6 fl6 = {
+		.daddr		= iph->daddr,
+		.saddr		= iph->saddr,
+		.flowlabel	= ip6_flowinfo(iph),
+		.flowi6_mark	= skb->mark,
+		.flowi6_proto	= iph->nexthdr,
+		.flowi6_iif	= ifindex,
+	};
+	struct net *net = dev_net(dev);
+	struct rt6_info *rt6;
+
+	rt6 = vrf_ip6_route_lookup(dev, &fl6, ifindex);
+	if (unlikely(!rt6))
 		return;
 
+	if (unlikely(&rt6->dst == &net->ipv6.ip6_null_entry->dst)) {
+		dst_release(&rt6->dst);
+		return;
+	}
+
 	skb_dst_set(skb, &rt6->dst);
 }
 
@@ -836,6 +849,44 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
 	return skb;
 }
 
+/* called under rcu_read_lock */
+static int vrf_get_saddr6(struct net_device *dev, const struct sock *sk,
+			  struct flowi6 *fl6)
+{
+	struct net *net = dev_net(dev);
+	struct dst_entry *dst;
+	struct rt6_info *rt;
+	int err;
+
+	if (rt6_need_strict(&fl6->daddr)) {
+		rt = vrf_ip6_route_lookup(dev, fl6, fl6->flowi6_oif);
+		if (unlikely(!rt))
+			return 0;
+
+		dst = &rt->dst;
+	} else {
+		__u8 flags = fl6->flowi6_flags;
+
+		fl6->flowi6_flags |= FLOWI_FLAG_L3MDEV_SRC;
+		fl6->flowi6_flags |= FLOWI_FLAG_SKIP_NH_OIF;
+
+		dst = ip6_route_output(net, sk, fl6);
+		rt = (struct rt6_info *)dst;
+
+		fl6->flowi6_flags = flags;
+	}
+
+	err = dst->error;
+	if (!err) {
+		err = ip6_route_get_saddr(net, rt, &fl6->daddr,
+					  sk ? inet6_sk(sk)->srcprefs : 0,
+					  &fl6->saddr);
+	}
+
+	dst_release(dst);
+
+	return err;
+}
 #else
 static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
 				   struct sk_buff *skb)
@@ -947,6 +998,7 @@ static const struct l3mdev_ops vrf_l3mdev_ops = {
 	.l3mdev_l3_rcv		= vrf_l3_rcv,
 #if IS_ENABLED(CONFIG_IPV6)
 	.l3mdev_get_rt6_dst	= vrf_get_rt6_dst,
+	.l3mdev_get_saddr6	= vrf_get_saddr6,
 #endif
 };
 
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index f73a65e97597..6886deb45679 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -18,6 +18,7 @@ struct route_info {
 	__u8			prefix[0];	/* 0,8 or 16 */
 };
 
+#include <net/addrconf.h>
 #include <net/flow.h>
 #include <net/ip6_fib.h>
 #include <net/sock.h>
@@ -89,9 +90,23 @@ int ip6_route_add(struct fib6_config *cfg);
 int ip6_ins_rt(struct rt6_info *);
 int ip6_del_rt(struct rt6_info *);
 
-int ip6_route_get_saddr(struct net *net, struct rt6_info *rt,
-			const struct in6_addr *daddr, unsigned int prefs,
-			struct in6_addr *saddr);
+static inline int ip6_route_get_saddr(struct net *net, struct rt6_info *rt,
+				      const struct in6_addr *daddr,
+				      unsigned int prefs,
+				      struct in6_addr *saddr)
+{
+	struct inet6_dev *idev =
+			rt ? ip6_dst_idev((struct dst_entry *)rt) : NULL;
+	int err = 0;
+
+	if (rt && rt->rt6i_prefsrc.plen)
+		*saddr = rt->rt6i_prefsrc.addr;
+	else
+		err = ipv6_dev_get_saddr(net, idev ? idev->dev : NULL,
+					 daddr, prefs, saddr);
+
+	return err;
+}
 
 struct rt6_info *rt6_lookup(struct net *net, const struct in6_addr *daddr,
 			    const struct in6_addr *saddr, int oif, int flags);
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index d575185600a5..6ba0a206db45 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -38,6 +38,9 @@ struct l3mdev_ops {
 	struct dst_entry * (*l3mdev_get_rt6_dst)(const struct net_device *dev,
 						 struct flowi6 *fl6,
 						 int flags);
+	int		   (*l3mdev_get_saddr6)(struct net_device *dev,
+						const struct sock *sk,
+						struct flowi6 *fl6);
 };
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
@@ -137,6 +140,8 @@ int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4);
 
 struct dst_entry *l3mdev_get_rt6_dst(struct net *net, struct flowi6 *fl6,
 				     int flags);
+int l3mdev_get_saddr6(struct net *net, const struct sock *sk,
+		      struct flowi6 *fl6);
 
 static inline
 struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto)
@@ -229,6 +234,12 @@ struct dst_entry *l3mdev_get_rt6_dst(struct net *net, struct flowi6 *fl6,
 	return NULL;
 }
 
+static inline int l3mdev_get_saddr6(struct net *net, const struct sock *sk,
+				    struct flowi6 *fl6)
+{
+	return 0;
+}
+
 static inline
 struct sk_buff *l3mdev_ip_rcv(struct sk_buff *skb)
 {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index cbf127ae7c67..cfd01782a621 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -910,6 +910,13 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
 	int err;
 	int flags = 0;
 
+	if (ipv6_addr_any(&fl6->saddr) && fl6->flowi6_oif &&
+	    (!*dst || !(*dst)->error)) {
+		err = l3mdev_get_saddr6(net, sk, fl6);
+		if (err)
+			goto out_err;
+	}
+
 	/* The correct way to handle this would be to do
 	 * ip6_route_get_saddr, and then ip6_route_output; however,
 	 * the route-specific preferred source forces the
@@ -999,10 +1006,11 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
 	return 0;
 
 out_err_release:
-	if (err == -ENETUNREACH)
-		IP6_INC_STATS(net, NULL, IPSTATS_MIB_OUTNOROUTES);
 	dst_release(*dst);
 	*dst = NULL;
+out_err:
+	if (err == -ENETUNREACH)
+		IP6_INC_STATS(net, NULL, IPSTATS_MIB_OUTNOROUTES);
 	return err;
 }
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index a87e66d2284f..67ec5594be9c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2584,23 +2584,6 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
 	return rt;
 }
 
-int ip6_route_get_saddr(struct net *net,
-			struct rt6_info *rt,
-			const struct in6_addr *daddr,
-			unsigned int prefs,
-			struct in6_addr *saddr)
-{
-	struct inet6_dev *idev =
-		rt ? ip6_dst_idev((struct dst_entry *)rt) : NULL;
-	int err = 0;
-	if (rt && rt->rt6i_prefsrc.plen)
-		*saddr = rt->rt6i_prefsrc.addr;
-	else
-		err = ipv6_dev_get_saddr(net, idev ? idev->dev : NULL,
-					 daddr, prefs, saddr);
-	return err;
-}
-
 /* remove deleted ip from prefsrc entries */
 struct arg_dev_net_ip {
 	struct net_device *dev;
diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
index dceac272b8c4..3e08d3e27a8a 100644
--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -164,3 +164,28 @@ int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4)
 	return rc;
 }
 EXPORT_SYMBOL_GPL(l3mdev_get_saddr);
+
+int l3mdev_get_saddr6(struct net *net, const struct sock *sk,
+		      struct flowi6 *fl6)
+{
+	struct net_device *dev;
+	int rc = 0;
+
+	if (fl6->flowi6_oif) {
+		rcu_read_lock();
+
+		dev = dev_get_by_index_rcu(net, fl6->flowi6_oif);
+		if (dev && netif_is_l3_slave(dev))
+			dev = netdev_master_upper_dev_get_rcu(dev);
+
+		if (dev && netif_is_l3_master(dev) &&
+		    dev->l3mdev_ops->l3mdev_get_saddr6) {
+			rc = dev->l3mdev_ops->l3mdev_get_saddr6(dev, sk, fl6);
+		}
+
+		rcu_read_unlock();
+	}
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(l3mdev_get_saddr6);
-- 
2.1.4


* [PATCH net-next 13/13] net: ipv6: address selection should only consider devices in L3 domain
  2016-05-05  3:33 [PATCH net-next 00/13] net: Various VRF patches David Ahern
                   ` (11 preceding siblings ...)
  2016-05-05  3:33 ` [PATCH net-next 12/13] net: vrf: Implement get_saddr for IPv6 David Ahern
@ 2016-05-05  3:33 ` David Ahern
  2016-05-05  3:59 ` [PATCH net-next 00/13] net: Various VRF patches David Miller
  13 siblings, 0 replies; 21+ messages in thread
From: David Ahern @ 2016-05-05  3:33 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

IPv6 version of 3f2fb9a834cb ("net: l3mdev: address selection should only
consider devices in L3 domain"). IPv4's follow-up commit, a17b693cdd876
("net: l3mdev: prefer VRF master for source address selection"), is not
relevant. For IPv6 the VRF device should not be preferred over the dst_dev
as it leads to unnecessary forwarding versus a direct hop.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
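Not part of the patch: a small user-space sketch of the filter added to
the device walk, assuming a simplified device model. The toy_* names and
the addr_score field are hypothetical; toy_master_ifindex only
approximates what l3mdev_master_ifindex_rcu is understood to return (the
device's own index for an L3 master, the master's index for a slave, 0
otherwise).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device. */
struct toy_dev {
	int ifindex;
	int master_ifindex;	/* 0 when not enslaved to an L3 master */
	int is_l3_master;	/* nonzero for the VRF device itself */
	int addr_score;		/* score of best address on device, 0 if none */
};

int toy_master_ifindex(const struct toy_dev *dev)
{
	if (!dev)
		return 0;
	if (dev->is_l3_master)
		return dev->ifindex;
	return dev->master_ifindex;
}

/* Return the ifindex of the device holding the best-scoring address
 * within the L3 domain of dst_dev, or 0 if no device qualifies.
 */
int toy_pick_saddr_dev(const struct toy_dev *devs, size_t n,
		       const struct toy_dev *dst_dev)
{
	int master_idx = toy_master_ifindex(dst_dev);
	int best_ifindex = 0;
	int best_score = 0;
	size_t i;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* only consider devices in the same L3 domain */
		if (toy_master_ifindex(&devs[i]) != master_idx)
			continue;

		if (devs[i].addr_score > best_score) {
			best_score = devs[i].addr_score;
			best_ifindex = devs[i].ifindex;
		}
	}

	return best_ifindex;
}
```

Devices outside any VRF share master index 0, so in this model the
default domain is filtered the same way as a VRF domain.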
 net/ipv6/addrconf.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index b12553905e42..d13813867460 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1577,7 +1577,14 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
 		if (idev)
 			hiscore_idx = __ipv6_dev_get_saddr(net, &dst, idev, scores, hiscore_idx);
 	} else {
+		int master_idx = l3mdev_master_ifindex_rcu(dst_dev);
+
 		for_each_netdev_rcu(net, dev) {
+			/* only consider addresses on devices in the
+			 * same L3 domain
+			 */
+			if (l3mdev_master_ifindex_rcu(dev) != master_idx)
+				continue;
 			idev = __in6_dev_get(dev);
 			if (!idev)
 				continue;
-- 
2.1.4


* Re: [PATCH net-next 00/13] net: Various VRF patches
  2016-05-05  3:33 [PATCH net-next 00/13] net: Various VRF patches David Ahern
                   ` (12 preceding siblings ...)
  2016-05-05  3:33 ` [PATCH net-next 13/13] net: ipv6: address selection should only consider devices in L3 domain David Ahern
@ 2016-05-05  3:59 ` David Miller
  2016-05-05  4:13   ` David Ahern
  13 siblings, 1 reply; 21+ messages in thread
From: David Miller @ 2016-05-05  3:59 UTC (permalink / raw)
  To: dsa; +Cc: netdev

From: David Ahern <dsa@cumulusnetworks.com>
Date: Wed,  4 May 2016 20:33:17 -0700

> Various fixes and features for VRF over the past few months.

I really dislike a patch series that is simply a hodge podge of
unrelated things.

Please group your changes into logical bunches, and submit them as
groups one at a time.

Thank you.


* Re: [PATCH net-next 00/13] net: Various VRF patches
  2016-05-05  3:59 ` [PATCH net-next 00/13] net: Various VRF patches David Miller
@ 2016-05-05  4:13   ` David Ahern
  0 siblings, 0 replies; 21+ messages in thread
From: David Ahern @ 2016-05-05  4:13 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On 5/4/16 9:59 PM, David Miller wrote:
> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Wed,  4 May 2016 20:33:17 -0700
>
>> Various fixes and features for VRF over the past few months.
>
> I really dislike a patch series that is simply a hodge podge of
> unrelated things.
>
> Please group your changes into logical bunches, and submit them as
> groups one at a time.

I can send them out based on the grouping mentioned in the cover letter,
but they have to be applied in order. I believe patch 4 is the only one
that is not order dependent.

After several days of wrestling with the order of patches 7-13, I found
that they are all inherently dependent on patch 5. If you look at the
diff summary, most of the code changes are to l3mdev.{c,h} and vrf.c.

I will break out the first 4 as separate patches and re-send.


* Re: [PATCH net-next 03/13] net: l3mdev: Allow send on enslaved interface
  2016-05-05  3:33 ` [PATCH net-next 03/13] net: l3mdev: Allow send on enslaved interface David Ahern
@ 2016-05-05  7:40   ` Julian Anastasov
  2016-05-05 14:50     ` David Ahern
  0 siblings, 1 reply; 21+ messages in thread
From: Julian Anastasov @ 2016-05-05  7:40 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev


	Hello,

On Wed, 4 May 2016, David Ahern wrote:

> Allow udp and raw sockets to send by oif that is an enslaved interface
> versus the l3mdev/VRF device. For example, this allows BFD to use ifindex
> from IP_PKTINFO on a receive to send a response without the need to
> convert to the VRF index. It also allows ping and ping6 to work when
> specifying an enslaved interface (e.g., ping -I swp1 <ip>) which is
> a natural use case.
> 
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
>  drivers/net/vrf.c   |  2 ++
>  net/ipv4/route.c    |  4 ++++
>  net/l3mdev/l3mdev.c | 20 +++++++++++++++-----
>  3 files changed, 21 insertions(+), 5 deletions(-)
> 

> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 8c8c655bb2c4..a1f2830d8110 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2146,6 +2146,7 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
>  	unsigned int flags = 0;
>  	struct fib_result res;
>  	struct rtable *rth;
> +	int master_idx;
>  	int orig_oif;
>  	int err = -ENETUNREACH;
>  
> @@ -2155,6 +2156,9 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
>  
>  	orig_oif = fl4->flowi4_oif;
>  
> +	master_idx = l3mdev_master_ifindex_by_index(net, fl4->flowi4_oif);
> +	if (master_idx)
> +		fl4->flowi4_oif = master_idx;

	Changing the flowi4_oif at this point can have
bad effects. I remember a recent commit for __mkroute_output
where route caching is disabled if traffic is redirected
to loopback. I think such a change can affect route
caching; for example, we now use the nexthop on master_idx to
cache routes for orig_oif. In the past, such caching problems
always caused lookups to return a wrong cached result
for other users. But these are only my fears; I don't know
the actual result of this change. Maybe you are trying to
change flowi4_oif in one place instead of in every caller.

Regards


* Re: [PATCH net-next 06/13] net: original ingress device index in PKTINFO
  2016-05-05  3:33 ` [PATCH net-next 06/13] net: original ingress device index in PKTINFO David Ahern
@ 2016-05-05  8:41   ` Julian Anastasov
  2016-05-05 15:00     ` David Ahern
  0 siblings, 1 reply; 21+ messages in thread
From: Julian Anastasov @ 2016-05-05  8:41 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev


	Hello,

On Wed, 4 May 2016, David Ahern wrote:

> Applications such as OSPF and BFD need the original ingress device not
> the VRF device; the latter can be derived from the former. To that end
> add the skb_iif to inet_skb_parm and set it in ipv4 code after clearing
> the skb control buffer similar to IPv6. From there the pktinfo can just
> pull it from cb with the PKTINFO_SKB_CB cast.
> 
> The previous patch moving the skb->dev change to L3 means nothing else
> is needed for IPv6; it just works.
> 
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
>  include/net/ip.h       | 1 +
>  net/ipv4/ip_input.c    | 1 +
>  net/ipv4/ip_sockglue.c | 9 +++++++--
>  3 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 247ac82e9cf2..37165fba3741 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -36,6 +36,7 @@
>  struct sock;
>  
>  struct inet_skb_parm {
> +	int			iif;
>  	struct ip_options	opt;		/* Compiled IP options		*/
>  	unsigned char		flags;
>  
> diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
> index 37375eedeef9..4b351af3e67b 100644
> --- a/net/ipv4/ip_input.c
> +++ b/net/ipv4/ip_input.c
> @@ -478,6 +478,7 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
>  
>  	/* Remove any debris in the socket control block */
>  	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
> +	IPCB(skb)->iif = skb->skb_iif;

	For loopback traffic (including looped back multicast)
this is now zero :( Can inet_iif be moved to ip_rcv_finish
instead? Still, we spend cycles in the fast path even when nobody
listens for such info.

>  	/* Must drop socket now because of tproxy. */
>  	skb_orphan(skb);
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index bdb222c0c6a2..dbcd027c38e7 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -476,9 +476,9 @@ static bool ipv4_datagram_support_cmsg(const struct sock *sk,
>  	    (!skb->dev))
>  		return false;
>  
> +	/* see comment in ipv4_pktinfo_prepare about CB re-use */
>  	info = PKTINFO_SKB_CB(skb);
>  	info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
> -	info->ipi_ifindex = skb->dev->ifindex;

	This code is only for SOF_TIMESTAMPING_OPT_CMSG.
I'm not sure skb passes ip_rcv in all cases. So, we can not
easily remove it.

Regards


* Re: [PATCH net-next 03/13] net: l3mdev: Allow send on enslaved interface
  2016-05-05  7:40   ` Julian Anastasov
@ 2016-05-05 14:50     ` David Ahern
  0 siblings, 0 replies; 21+ messages in thread
From: David Ahern @ 2016-05-05 14:50 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev

On 5/5/16 1:40 AM, Julian Anastasov wrote:
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>> index 8c8c655bb2c4..a1f2830d8110 100644
>> --- a/net/ipv4/route.c
>> +++ b/net/ipv4/route.c
>> @@ -2146,6 +2146,7 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
>>   	unsigned int flags = 0;
>>   	struct fib_result res;
>>   	struct rtable *rth;
>> +	int master_idx;
>>   	int orig_oif;
>>   	int err = -ENETUNREACH;
>>
>> @@ -2155,6 +2156,9 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
>>
>>   	orig_oif = fl4->flowi4_oif;
>>
>> +	master_idx = l3mdev_master_ifindex_by_index(net, fl4->flowi4_oif);
>> +	if (master_idx)
>> +		fl4->flowi4_oif = master_idx;
>
> 	Changing the flowi4_oif at this point can have
> bad effects. I remember a recent commit for __mkroute_output
> where route caching is disabled if traffic is redirected
> to loopback. I think such a change can affect route
> caching; for example, we now use the nexthop on master_idx to
> cache routes for orig_oif. In the past, such caching problems
> always caused lookups to return a wrong cached result
> for other users. But these are only my fears; I don't know
> the actual result of this change. Maybe you are trying to
> change flowi4_oif in one place instead of in every caller.

Yes. VRFs require the oif to be the master index so that the FIB rules 
direct the lookup to the proper table. Without it we get the wrong result.
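
To illustrate the point, here is a minimal user-space sketch, assuming a
simplified model in which a FIB rule selects the VRF table only when the
oif is the VRF device itself. The toy_* names, the link layout and the
table numbers used with it are hypothetical; only the remapping step
mirrors the hunk quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TABLE_MAIN 254u	/* same value as RT_TABLE_MAIN */

/* Hypothetical link model: vrf_table is nonzero only for VRF devices. */
struct toy_link {
	int ifindex;
	int master_ifindex;	/* 0 when not enslaved */
	unsigned int vrf_table;
};

const struct toy_link *toy_find(const struct toy_link *links, size_t n,
				int ifindex)
{
	size_t i;

	assert(links != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (links[i].ifindex == ifindex)
			return &links[i];
	return NULL;
}

/* Master index for a VRF device or one of its slaves, else 0. */
int toy_master_ifindex_by_index(const struct toy_link *links, size_t n,
				int ifindex)
{
	const struct toy_link *l = toy_find(links, n, ifindex);

	if (!l)
		return 0;
	return l->vrf_table ? l->ifindex : l->master_ifindex;
}

/* Rule match on oif: only the VRF device itself selects its table. */
unsigned int toy_rule_table(const struct toy_link *links, size_t n, int oif)
{
	const struct toy_link *l = toy_find(links, n, oif);

	return (l && l->vrf_table) ? l->vrf_table : TOY_TABLE_MAIN;
}

/* Output lookup with the oif remapped to the master index first. */
unsigned int toy_output_table(const struct toy_link *links, size_t n,
			      int oif)
{
	int master_idx = toy_master_ifindex_by_index(links, n, oif);

	if (master_idx)
		oif = master_idx;
	return toy_rule_table(links, n, oif);
}
```

In this model, a lookup by the enslaved index alone lands in the main
table, which is the wrong result mentioned above; remapping first sends
it to the VRF table.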


* Re: [PATCH net-next 06/13] net: original ingress device index in PKTINFO
  2016-05-05  8:41   ` Julian Anastasov
@ 2016-05-05 15:00     ` David Ahern
  2016-05-05 20:00       ` Julian Anastasov
  0 siblings, 1 reply; 21+ messages in thread
From: David Ahern @ 2016-05-05 15:00 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev

On 5/5/16 2:41 AM, Julian Anastasov wrote:
>
> 	Hello,
>
> On Wed, 4 May 2016, David Ahern wrote:
>
>> Applications such as OSPF and BFD need the original ingress device not
>> the VRF device; the latter can be derived from the former. To that end
>> add the skb_iif to inet_skb_parm and set it in ipv4 code after clearing
>> the skb control buffer similar to IPv6. From there the pktinfo can just
>> pull it from cb with the PKTINFO_SKB_CB cast.
>>
>> The previous patch moving the skb->dev change to L3 means nothing else
>> is needed for IPv6; it just works.
>>
>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
>> ---
>>   include/net/ip.h       | 1 +
>>   net/ipv4/ip_input.c    | 1 +
>>   net/ipv4/ip_sockglue.c | 9 +++++++--
>>   3 files changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/net/ip.h b/include/net/ip.h
>> index 247ac82e9cf2..37165fba3741 100644
>> --- a/include/net/ip.h
>> +++ b/include/net/ip.h
>> @@ -36,6 +36,7 @@
>>   struct sock;
>>
>>   struct inet_skb_parm {
>> +	int			iif;
>>   	struct ip_options	opt;		/* Compiled IP options		*/
>>   	unsigned char		flags;
>>
>> diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
>> index 37375eedeef9..4b351af3e67b 100644
>> --- a/net/ipv4/ip_input.c
>> +++ b/net/ipv4/ip_input.c
>> @@ -478,6 +478,7 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
>>
>>   	/* Remove any debris in the socket control block */
>>   	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
>> +	IPCB(skb)->iif = skb->skb_iif;
>
> 	For loopback traffic (including looped back multicast)
> this is now zero :( Can inet_iif be moved to ip_rcv_finish
> instead? Still, we spend cycles in the fast path even when nobody
> listens for such info.

Why is that? skb_iif is set to skb->dev->ifindex in
__netif_receive_skb_core, and ip_rcv is called from it. Is there another
path to it?


>
>>   	/* Must drop socket now because of tproxy. */
>>   	skb_orphan(skb);
>> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
>> index bdb222c0c6a2..dbcd027c38e7 100644
>> --- a/net/ipv4/ip_sockglue.c
>> +++ b/net/ipv4/ip_sockglue.c
>> @@ -476,9 +476,9 @@ static bool ipv4_datagram_support_cmsg(const struct sock *sk,
>>   	    (!skb->dev))
>>   		return false;
>>
>> +	/* see comment in ipv4_pktinfo_prepare about CB re-use */
>>   	info = PKTINFO_SKB_CB(skb);
>>   	info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
>> -	info->ipi_ifindex = skb->dev->ifindex;
>
> 	This code is only for SOF_TIMESTAMPING_OPT_CMSG.
> I'm not sure skb passes ip_rcv in all cases. So, we can not
> easily remove it.

ok.


* Re: [PATCH net-next 06/13] net: original ingress device index in PKTINFO
  2016-05-05 15:00     ` David Ahern
@ 2016-05-05 20:00       ` Julian Anastasov
  0 siblings, 0 replies; 21+ messages in thread
From: Julian Anastasov @ 2016-05-05 20:00 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev


	Hello,

On Thu, 5 May 2016, David Ahern wrote:

> On 5/5/16 2:41 AM, Julian Anastasov wrote:

> > > +	IPCB(skb)->iif = skb->skb_iif;
> >
> > 	For loopback traffic (including looped back multicast)
> > this is now zero :( Can inet_iif be moved to ip_rcv_finish
> > instead? Still, we spend cycles in the fast path even when nobody
> > listens for such info.
> 
> Why is that? skb_iif is set to skb->dev->ifindex in __netif_receive_skb_core,
> and ip_rcv is called from it. Is there another path to it?

	You are right, it is 0 only for the output path.
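
A rough user-space model of the two paths, assuming heavily simplified
stand-ins for the skb and its control block (all toy_* names are
hypothetical), shows why the value is only zero for an skb that has not
gone through the receive path:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct inet_skb_parm. */
struct toy_cb {
	int iif;
	unsigned char flags;
};

/* Hypothetical stand-in for struct sk_buff. */
struct toy_skb {
	int dev_ifindex;	/* models skb->dev->ifindex */
	int skb_iif;		/* models skb->skb_iif */
	struct toy_cb cb;	/* models IPCB(skb) */
};

/* Receive core: records the ingress device before L3 handlers run. */
void toy_netif_receive_skb_core(struct toy_skb *skb)
{
	assert(skb != NULL);
	skb->skb_iif = skb->dev_ifindex;
}

/* ip_rcv: clears the control block, then saves the ingress index. */
void toy_ip_rcv(struct toy_skb *skb)
{
	assert(skb != NULL);
	memset(&skb->cb, 0, sizeof(skb->cb));
	skb->cb.iif = skb->skb_iif;
}
```

A locally generated skb that never passes through the receive core keeps
skb_iif at 0, while anything delivered through it, loopback included,
carries the index of the receiving device into the control block.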

Regards


