netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/7] net: ipv4: return matching route for GETROUTE request
@ 2017-01-09 21:32 David Ahern
  2017-01-09 21:32 ` [PATCH net-next 1/7] net: ipv4: refactor __ip_route_output_key_hash David Ahern
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: David Ahern @ 2017-01-09 21:32 UTC
  To: netdev; +Cc: David Ahern

For complicated and highly populated route tables, matching an RTM_GETROUTE
response to the route entry that was actually hit is an eye-chart exercise.
This series solves that problem by returning the RIB entry that was matched
for a GETROUTE request as a new nested attribute, RTA_ROUTE_GET, which
contains the usual RTA_* attributes for a route spec.

Example:
    $ ip ro get 10.10.10.10
    10.10.10.10 via 172.16.20.21 dev virt01 src 172.16.20.20 uid 0
        cache

    Matching route:
    10.10.10.10  encap mpls  100 via 172.16.20.21 dev virt01
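
To make the nesting concrete, here is a minimal userspace sketch of how a tool might locate RTA_DST inside RTA_ROUTE_GET. It hand-rolls a struct mirroring the struct rtattr layout (16-bit length, 16-bit type, 4-byte alignment) so it builds without kernel headers; a real parser would use RTA_OK/RTA_NEXT/RTA_DATA from <linux/rtnetlink.h>. The numeric value used for RTA_ROUTE_GET is a placeholder, since its enum placement was still under review in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_RTA_DST        1   /* RTA_DST in <linux/rtnetlink.h> */
#define DEMO_RTA_ROUTE_GET 26   /* placeholder value, not a real ABI number */

/* Same layout as struct rtattr: total length (header + payload), then type. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

#define ATTR_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)
#define ATTR_HDRLEN   ATTR_ALIGN(sizeof(struct attr_hdr))

/* Append one attribute at offset off (caller guarantees room in buf);
 * returns the next aligned offset. */
static size_t put_attr(uint8_t *buf, size_t off, uint16_t type,
		       const void *data, uint16_t dlen)
{
	struct attr_hdr h = { (uint16_t)(ATTR_HDRLEN + dlen), type };

	memset(buf + off, 0, ATTR_ALIGN(h.len));   /* clear padding too */
	memcpy(buf + off, &h, sizeof(h));
	if (dlen)
		memcpy(buf + off + ATTR_HDRLEN, data, dlen);
	return off + ATTR_ALIGN(h.len);
}

/* Find the first attribute of a given type in a flat run of attributes.
 * On success, *poff and *plen give the payload offset and length in buf. */
static int find_attr(const uint8_t *buf, size_t len, uint16_t type,
		     size_t *poff, size_t *plen)
{
	size_t off = 0;

	while (off + ATTR_HDRLEN <= len) {
		struct attr_hdr h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.len < ATTR_HDRLEN || off + h.len > len)
			return 0;                  /* malformed: stop walking */
		if (h.type == type) {
			*poff = off + ATTR_HDRLEN;
			*plen = h.len - ATTR_HDRLEN;
			return 1;
		}
		off += ATTR_ALIGN(h.len);
	}
	return 0;
}
```

Usage: find RTA_ROUTE_GET at the top level, then call find_attr() again on its payload to reach the nested RTA_DST; a top-level search for RTA_DST does not see attributes inside the nest.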

Patches 1-3 refactor the existing input and output route lookups, moving
the rcu read lock protected sections into standalone functions that take
the fib_result as an input argument. inet_rtm_getroute is then converted
to use the new functions while holding the rcu read lock, which gives
inet_rtm_getroute access to the matching fib_info.

Patch 4 refactors fib_dump_info, moving the code that adds route
attributes to a response into a separate function.

Patch 5 adds the prefix for the matching trie entry to fib_result.

Patch 6 then adds the prefix and matching fib_info to the GETROUTE
response using fib_dump_add_attrs_rcu from patch 4.

Patch 7 removes the event arg from rt_fill_info, simplifying its
argument list.

IPv6 will be converted to return the same in a follow-on patch set.

David Ahern (7):
  net: ipv4: refactor __ip_route_output_key_hash
  net: ipv4: refactor ip_route_input_noref
  net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup
  net: ipv4: refactor fib_dump_info
  net: ipv4: Save trie prefix to fib lookup result
  net: ipv4: return route match in GETROUTE request
  net: ipv4: Remove event arg to rt_fill_info

 include/net/ip_fib.h           |   1 +
 include/net/route.h            |  12 ++-
 include/uapi/linux/rtnetlink.h |   2 +
 net/ipv4/fib_lookup.h          |   2 +
 net/ipv4/fib_semantics.c       |  17 +++-
 net/ipv4/fib_trie.c            |   1 +
 net/ipv4/icmp.c                |   4 +-
 net/ipv4/route.c               | 177 +++++++++++++++++++++++++++--------------
 8 files changed, 149 insertions(+), 67 deletions(-)

-- 
2.1.4


* [PATCH net-next 1/7] net: ipv4: refactor __ip_route_output_key_hash
  2017-01-09 21:32 [PATCH net-next 0/7] net: ipv4: return matching route for GETROUTE request David Ahern
@ 2017-01-09 21:32 ` David Ahern
  2017-01-09 21:32 ` [PATCH net-next 2/7] net: ipv4: refactor ip_route_input_noref David Ahern
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: David Ahern @ 2017-01-09 21:32 UTC
  To: netdev; +Cc: David Ahern

A later patch wants access to the fib result on an output route lookup
with the rcu lock held. Refactor __ip_route_output_key_hash, pushing
the logic between rcu_read_lock ... rcu_read_unlock into a new helper
that takes the fib_result as an input arg.

To keep the name length under control, remove the leading underscores
from the name. _rcu is added to the name of the new helper to indicate
that it is called with the rcu read lock held.
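
A minimal sketch of the wrapper/_rcu split described above, using stand-in types and a counter in place of rcu_read_lock(). All names here are hypothetical, and the real helper returns a struct rtable * rather than an int. The point is that the caller supplies the result storage, so a caller that takes the lock itself can keep using the result after the lookup returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fib_result. */
struct lookup_result {
	int type;               /* 1 plays the role of RTN_UNICAST */
	const void *fi;         /* only valid while the read lock is held */
};

static int lock_depth;      /* counter standing in for rcu_read_lock() */

static void demo_read_lock(void)   { lock_depth++; }
static void demo_read_unlock(void) { lock_depth--; }

/* _rcu variant: caller holds the lock and provides the result storage. */
static int route_lookup_rcu(unsigned int daddr, struct lookup_result *res)
{
	static const int matched_entry = 0;   /* pretend FIB entry */

	assert(lock_depth > 0);               /* the "_rcu" calling contract */
	if (daddr == 0)
		return -1;                    /* toy "no route" */
	res->type = 1;
	res->fi = &matched_entry;
	return 0;
}

/* Existing entry point keeps its behavior: lock, look up, unlock. */
static int route_lookup(unsigned int daddr)
{
	struct lookup_result res = { 0, NULL };
	int err;

	demo_read_lock();
	err = route_lookup_rcu(daddr, &res);
	demo_read_unlock();
	return err;
}

/* A caller that needs the match (like inet_rtm_getroute) holds the lock
 * across both the lookup and its use of res.fi. */
static int route_lookup_report(unsigned int daddr, int *matched_type)
{
	struct lookup_result res = { 0, NULL };
	int err;

	demo_read_lock();
	err = route_lookup_rcu(daddr, &res);
	if (!err && res.fi)
		*matched_type = res.type;
	demo_read_unlock();
	return err;
}
```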

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/net/route.h |  9 ++++++---
 net/ipv4/icmp.c     |  4 ++--
 net/ipv4/route.c    | 50 +++++++++++++++++++++++++++++---------------------
 3 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index c0874c87c173..bf4f7a98f753 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -113,13 +113,16 @@ struct in_device;
 int ip_rt_init(void);
 void rt_cache_flush(struct net *net);
 void rt_flush_dev(struct net_device *dev);
-struct rtable *__ip_route_output_key_hash(struct net *, struct flowi4 *flp,
-					  int mp_hash);
+struct rtable *ip_route_output_key_hash(struct net *, struct flowi4 *flp,
+					int mp_hash);
+struct rtable *ip_route_output_key_hash_rcu(struct net *, struct flowi4 *flp,
+					    struct fib_result *res,
+					    int mp_hash);
 
 static inline struct rtable *__ip_route_output_key(struct net *net,
 						   struct flowi4 *flp)
 {
-	return __ip_route_output_key_hash(net, flp, -1);
+	return ip_route_output_key_hash(net, flp, -1);
 }
 
 struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 0777ea949223..67ed57365f80 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -482,8 +482,8 @@ static struct rtable *icmp_route_lookup(struct net *net,
 	fl4->flowi4_oif = l3mdev_master_ifindex(skb_dst(skb_in)->dev);
 
 	security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
-	rt = __ip_route_output_key_hash(net, fl4,
-					icmp_multipath_hash_skb(skb_in));
+	rt = ip_route_output_key_hash(net, fl4,
+				      icmp_multipath_hash_skb(skb_in));
 	if (IS_ERR(rt))
 		return rt;
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 7144288371cf..effd7f8e31f9 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2181,29 +2181,39 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
  * Major route resolver routine.
  */
 
-struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
-					  int mp_hash)
+struct rtable *ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
+					int mp_hash)
 {
-	struct net_device *dev_out = NULL;
 	__u8 tos = RT_FL_TOS(fl4);
-	unsigned int flags = 0;
 	struct fib_result res;
 	struct rtable *rth;
-	int orig_oif;
-	int err = -ENETUNREACH;
 
 	res.tclassid	= 0;
 	res.fi		= NULL;
 	res.table	= NULL;
 
-	orig_oif = fl4->flowi4_oif;
-
 	fl4->flowi4_iif = LOOPBACK_IFINDEX;
 	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
 	fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
 			 RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
 
 	rcu_read_lock();
+	rth = ip_route_output_key_hash_rcu(net, fl4, &res, mp_hash);
+	rcu_read_unlock();
+
+	return rth;
+}
+EXPORT_SYMBOL_GPL(ip_route_output_key_hash);
+
+struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 *fl4,
+					    struct fib_result *res, int mp_hash)
+{
+	struct net_device *dev_out = NULL;
+	int orig_oif = fl4->flowi4_oif;
+	unsigned int flags = 0;
+	struct rtable *rth;
+	int err = -ENETUNREACH;
+
 	if (fl4->saddr) {
 		rth = ERR_PTR(-EINVAL);
 		if (ipv4_is_multicast(fl4->saddr) ||
@@ -2289,15 +2299,15 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
 			fl4->daddr = fl4->saddr = htonl(INADDR_LOOPBACK);
 		dev_out = net->loopback_dev;
 		fl4->flowi4_oif = LOOPBACK_IFINDEX;
-		res.type = RTN_LOCAL;
+		res->type = RTN_LOCAL;
 		flags |= RTCF_LOCAL;
 		goto make_route;
 	}
 
-	err = fib_lookup(net, fl4, &res, 0);
+	err = fib_lookup(net, fl4, res, 0);
 	if (err) {
-		res.fi = NULL;
-		res.table = NULL;
+		res->fi = NULL;
+		res->table = NULL;
 		if (fl4->flowi4_oif &&
 		    (ipv4_is_multicast(fl4->daddr) ||
 		    !netif_index_is_l3_master(net, fl4->flowi4_oif))) {
@@ -2322,17 +2332,17 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
 			if (fl4->saddr == 0)
 				fl4->saddr = inet_select_addr(dev_out, 0,
 							      RT_SCOPE_LINK);
-			res.type = RTN_UNICAST;
+			res->type = RTN_UNICAST;
 			goto make_route;
 		}
 		rth = ERR_PTR(err);
 		goto out;
 	}
 
-	if (res.type == RTN_LOCAL) {
+	if (res->type == RTN_LOCAL) {
 		if (!fl4->saddr) {
-			if (res.fi->fib_prefsrc)
-				fl4->saddr = res.fi->fib_prefsrc;
+			if (res->fi->fib_prefsrc)
+				fl4->saddr = res->fi->fib_prefsrc;
 			else
 				fl4->saddr = fl4->daddr;
 		}
@@ -2344,20 +2354,18 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
 		goto make_route;
 	}
 
-	fib_select_path(net, &res, fl4, mp_hash);
+	fib_select_path(net, res, fl4, mp_hash);
 
-	dev_out = FIB_RES_DEV(res);
+	dev_out = FIB_RES_DEV(*res);
 	fl4->flowi4_oif = dev_out->ifindex;
 
 
 make_route:
-	rth = __mkroute_output(&res, fl4, orig_oif, dev_out, flags);
+	rth = __mkroute_output(res, fl4, orig_oif, dev_out, flags);
 
 out:
-	rcu_read_unlock();
 	return rth;
 }
-EXPORT_SYMBOL_GPL(__ip_route_output_key_hash);
 
 static struct dst_entry *ipv4_blackhole_dst_check(struct dst_entry *dst, u32 cookie)
 {
-- 
2.1.4


* [PATCH net-next 2/7] net: ipv4: refactor ip_route_input_noref
  2017-01-09 21:32 [PATCH net-next 0/7] net: ipv4: return matching route for GETROUTE request David Ahern
  2017-01-09 21:32 ` [PATCH net-next 1/7] net: ipv4: refactor __ip_route_output_key_hash David Ahern
@ 2017-01-09 21:32 ` David Ahern
  2017-01-09 21:32 ` [PATCH net-next 3/7] net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup David Ahern
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: David Ahern @ 2017-01-09 21:32 UTC
  To: netdev; +Cc: David Ahern

A later patch wants access to the fib result on an input route lookup
with the rcu lock held. Refactor ip_route_input_noref, pushing the logic
between rcu_read_lock ... rcu_read_unlock into a new helper that takes
the fib_result as an input arg.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/net/route.h |  3 +++
 net/ipv4/route.c    | 66 ++++++++++++++++++++++++++++++-----------------------
 2 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index bf4f7a98f753..4f3502a67203 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -178,6 +178,9 @@ static inline struct rtable *ip_route_output_gre(struct net *net, struct flowi4
 
 int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
 			 u8 tos, struct net_device *devin);
+int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
+		       u8 tos, struct net_device *devin,
+		       struct fib_result *res);
 
 static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
 				 u8 tos, struct net_device *devin)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index effd7f8e31f9..3142cd802e79 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1789,9 +1789,9 @@ static int ip_mkroute_input(struct sk_buff *skb,
  */
 
 static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
-			       u8 tos, struct net_device *dev)
+			       u8 tos, struct net_device *dev,
+			       struct fib_result *res)
 {
-	struct fib_result res;
 	struct in_device *in_dev = __in_dev_get_rcu(dev);
 	struct ip_tunnel_info *tun_info;
 	struct flowi4	fl4;
@@ -1821,8 +1821,8 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
 		goto martian_source;
 
-	res.fi = NULL;
-	res.table = NULL;
+	res->fi = NULL;
+	res->table = NULL;
 	if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0))
 		goto brd_input;
 
@@ -1857,17 +1857,17 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	fl4.flowi4_flags = 0;
 	fl4.daddr = daddr;
 	fl4.saddr = saddr;
-	err = fib_lookup(net, &fl4, &res, 0);
+	err = fib_lookup(net, &fl4, res, 0);
 	if (err != 0) {
 		if (!IN_DEV_FORWARD(in_dev))
 			err = -EHOSTUNREACH;
 		goto no_route;
 	}
 
-	if (res.type == RTN_BROADCAST)
+	if (res->type == RTN_BROADCAST)
 		goto brd_input;
 
-	if (res.type == RTN_LOCAL) {
+	if (res->type == RTN_LOCAL) {
 		err = fib_validate_source(skb, saddr, daddr, tos,
 					  0, dev, in_dev, &itag);
 		if (err < 0)
@@ -1879,10 +1879,10 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		err = -EHOSTUNREACH;
 		goto no_route;
 	}
-	if (res.type != RTN_UNICAST)
+	if (res->type != RTN_UNICAST)
 		goto martian_destination;
 
-	err = ip_mkroute_input(skb, &res, in_dev, daddr, saddr, tos);
+	err = ip_mkroute_input(skb, res, in_dev, daddr, saddr, tos);
 out:	return err;
 
 brd_input:
@@ -1896,14 +1896,14 @@ out:	return err;
 			goto martian_source;
 	}
 	flags |= RTCF_BROADCAST;
-	res.type = RTN_BROADCAST;
+	res->type = RTN_BROADCAST;
 	RT_CACHE_STAT_INC(in_brd);
 
 local_input:
 	do_cache = false;
-	if (res.fi) {
+	if (res->fi) {
 		if (!itag) {
-			rth = rcu_dereference(FIB_RES_NH(res).nh_rth_input);
+			rth = rcu_dereference(FIB_RES_NH(*res).nh_rth_input);
 			if (rt_cache_valid(rth)) {
 				skb_dst_set_noref(skb, &rth->dst);
 				err = 0;
@@ -1914,7 +1914,7 @@ out:	return err;
 	}
 
 	rth = rt_dst_alloc(l3mdev_master_dev_rcu(dev) ? : net->loopback_dev,
-			   flags | RTCF_LOCAL, res.type,
+			   flags | RTCF_LOCAL, res->type,
 			   IN_DEV_CONF_GET(in_dev, NOPOLICY), false, do_cache);
 	if (!rth)
 		goto e_nobufs;
@@ -1924,18 +1924,18 @@ out:	return err;
 	rth->dst.tclassid = itag;
 #endif
 	rth->rt_is_input = 1;
-	if (res.table)
-		rth->rt_table_id = res.table->tb_id;
+	if (res->table)
+		rth->rt_table_id = res->table->tb_id;
 
 	RT_CACHE_STAT_INC(in_slow_tot);
-	if (res.type == RTN_UNREACHABLE) {
+	if (res->type == RTN_UNREACHABLE) {
 		rth->dst.input= ip_error;
 		rth->dst.error= -err;
 		rth->rt_flags 	&= ~RTCF_LOCAL;
 	}
 
 	if (do_cache) {
-		struct fib_nh *nh = &FIB_RES_NH(res);
+		struct fib_nh *nh = &FIB_RES_NH(*res);
 
 		rth->dst.lwtstate = lwtstate_get(nh->nh_lwtstate);
 		if (lwtunnel_input_redirect(rth->dst.lwtstate)) {
@@ -1955,9 +1955,9 @@ out:	return err;
 
 no_route:
 	RT_CACHE_STAT_INC(in_no_route);
-	res.type = RTN_UNREACHABLE;
-	res.fi = NULL;
-	res.table = NULL;
+	res->type = RTN_UNREACHABLE;
+	res->fi = NULL;
+	res->table = NULL;
 	goto local_input;
 
 	/*
@@ -1987,10 +1987,21 @@ out:	return err;
 int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 			 u8 tos, struct net_device *dev)
 {
-	int res;
+	struct fib_result res;
+	int err;
 
 	rcu_read_lock();
+	err = ip_route_input_rcu(skb, daddr, saddr, tos, dev, &res);
+	rcu_read_unlock();
 
+	return err;
+}
+EXPORT_SYMBOL(ip_route_input_noref);
+
+/* called with rcu_read_lock held */
+int ip_route_input_rcu(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+		       u8 tos, struct net_device *dev, struct fib_result *res)
+{
 	/* Multicast recognition logic is moved from route cache to here.
 	   The problem was that too many Ethernet cards have broken/missing
 	   hardware multicast filters :-( As result the host on multicasting
@@ -2005,6 +2016,7 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	if (ipv4_is_multicast(daddr)) {
 		struct in_device *in_dev = __in_dev_get_rcu(dev);
 		int our = 0;
+		int err = -EINVAL;
 
 		if (in_dev)
 			our = ip_check_mc_rcu(in_dev, daddr, saddr,
@@ -2020,7 +2032,6 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 						      ip_hdr(skb)->protocol);
 		}
 
-		res = -EINVAL;
 		if (our
 #ifdef CONFIG_IP_MROUTE
 			||
@@ -2028,17 +2039,14 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		     IN_DEV_MFORWARD(in_dev))
 #endif
 		   ) {
-			res = ip_route_input_mc(skb, daddr, saddr,
+			err = ip_route_input_mc(skb, daddr, saddr,
 						tos, dev, our);
 		}
-		rcu_read_unlock();
-		return res;
+		return err;
 	}
-	res = ip_route_input_slow(skb, daddr, saddr, tos, dev);
-	rcu_read_unlock();
-	return res;
+
+	return ip_route_input_slow(skb, daddr, saddr, tos, dev, res);
 }
-EXPORT_SYMBOL(ip_route_input_noref);
 
 /* called with rcu_read_lock() */
 static struct rtable *__mkroute_output(const struct fib_result *res,
-- 
2.1.4


* [PATCH net-next 3/7] net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup
  2017-01-09 21:32 [PATCH net-next 0/7] net: ipv4: return matching route for GETROUTE request David Ahern
  2017-01-09 21:32 ` [PATCH net-next 1/7] net: ipv4: refactor __ip_route_output_key_hash David Ahern
  2017-01-09 21:32 ` [PATCH net-next 2/7] net: ipv4: refactor ip_route_input_noref David Ahern
@ 2017-01-09 21:32 ` David Ahern
  2017-01-09 21:32 ` [PATCH net-next 4/7] net: ipv4: refactor fib_dump_info David Ahern
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: David Ahern @ 2017-01-09 21:32 UTC
  To: netdev; +Cc: David Ahern

Convert inet_rtm_getroute to use ip_route_input_rcu and
ip_route_output_key_hash_rcu passing the fib_result arg to both.
The rcu lock is held through the creation of the response, so the
rtable/dst does not need to be attached to the skb and is passed
to rt_fill_info directly.

In converting from ip_route_output_key to ip_route_output_key_hash_rcu,
the xfrm_lookup_route call in ip_route_output_flow is dropped, since
flowi4_proto is not set for a route get request. Also, the flow struct
adjustments from ip_route_output_key_hash (tos masking, scope selection
and setting flowi4_iif to LOOPBACK_IFINDEX) are added to make sure the
route request logic is not altered by the conversion.
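
The tos/scope part of those flow adjustments reduces to the following. This sketch redefines the relevant constants locally (values as in the kernel headers) and uses a hypothetical two-field struct in place of struct flowi4.

```c
#include <assert.h>
#include <stdint.h>

/* Values as in uapi/linux/ip.h, net/route.h and uapi/linux/rtnetlink.h,
 * redefined here so the sketch is standalone. */
#define IPTOS_TOS_MASK     0x1E
#define IPTOS_RT_MASK      (IPTOS_TOS_MASK & ~3)   /* 0x1C */
#define RTO_ONLINK         0x01
#define RT_SCOPE_UNIVERSE  0
#define RT_SCOPE_LINK      253

/* Hypothetical subset of struct flowi4. */
struct flow_sketch {
	uint8_t tos;
	uint8_t scope;
};

/* Keep only the routing TOS bits plus RTO_ONLINK, then split them:
 * the TOS bits go into the flow, RTO_ONLINK selects link scope. */
static void set_flow_tos_scope(struct flow_sketch *fl, uint8_t rtm_tos)
{
	uint8_t tos = rtm_tos & (IPTOS_RT_MASK | RTO_ONLINK);

	fl->tos = tos & IPTOS_RT_MASK;
	fl->scope = (tos & RTO_ONLINK) ? RT_SCOPE_LINK : RT_SCOPE_UNIVERSE;
}
```

For example, an rtm_tos of 0x11 yields a flow tos of 0x10 with link scope, while 0x10 alone keeps universe scope.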

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 net/ipv4/route.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 3142cd802e79..03ddc03c185a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2467,11 +2467,11 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
 }
 EXPORT_SYMBOL_GPL(ip_route_output_flow);
 
+/* called with rcu_read_lock held */
 static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
 			struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-			u32 seq, int event)
+			u32 seq, int event, struct rtable *rt)
 {
-	struct rtable *rt = skb_rtable(skb);
 	struct rtmsg *r;
 	struct nlmsghdr *nlh;
 	unsigned long expires = 0;
@@ -2585,10 +2585,12 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 	struct net *net = sock_net(in_skb->sk);
 	struct rtmsg *rtm;
 	struct nlattr *tb[RTA_MAX+1];
+	struct fib_result res = {};
 	struct rtable *rt = NULL;
 	struct flowi4 fl4;
 	__be32 dst = 0;
 	__be32 src = 0;
+	__u8 tos;
 	u32 iif;
 	int err;
 	int mark;
@@ -2630,15 +2632,20 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 	memset(&fl4, 0, sizeof(fl4));
 	fl4.daddr = dst;
 	fl4.saddr = src;
-	fl4.flowi4_tos = rtm->rtm_tos;
+	tos = rtm->rtm_tos & (IPTOS_RT_MASK | RTO_ONLINK);
+	fl4.flowi4_tos = tos & IPTOS_RT_MASK;
+	fl4.flowi4_scope = ((tos & RTO_ONLINK) ?
+				RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
 	fl4.flowi4_oif = tb[RTA_OIF] ? nla_get_u32(tb[RTA_OIF]) : 0;
 	fl4.flowi4_mark = mark;
 	fl4.flowi4_uid = uid;
 
+	rcu_read_lock();
+
 	if (iif) {
 		struct net_device *dev;
 
-		dev = __dev_get_by_index(net, iif);
+		dev = dev_get_by_index_rcu(net, iif);
 		if (!dev) {
 			err = -ENODEV;
 			goto errout_free;
@@ -2647,14 +2654,16 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 		skb->protocol	= htons(ETH_P_IP);
 		skb->dev	= dev;
 		skb->mark	= mark;
-		err = ip_route_input(skb, dst, src, rtm->rtm_tos, dev);
+		err = ip_route_input_rcu(skb, dst, src, rtm->rtm_tos,
+					 dev, &res);
 
 		rt = skb_rtable(skb);
 		if (err == 0 && rt->dst.error)
 			err = -rt->dst.error;
 	} else {
-		rt = ip_route_output_key(net, &fl4);
+		fl4.flowi4_iif = LOOPBACK_IFINDEX;
 
+		rt = ip_route_output_key_hash_rcu(net, &fl4, &res, -1);
 		err = 0;
 		if (IS_ERR(rt))
 			err = PTR_ERR(rt);
@@ -2663,7 +2672,6 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 	if (err)
 		goto errout_free;
 
-	skb_dst_set(skb, &rt->dst);
 	if (rtm->rtm_flags & RTM_F_NOTIFY)
 		rt->rt_flags |= RTCF_NOTIFY;
 
@@ -2672,15 +2680,18 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 
 	err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
 			   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq,
-			   RTM_NEWROUTE);
+			   RTM_NEWROUTE, rt);
 	if (err < 0)
 		goto errout_free;
 
+	rcu_read_unlock();
+
 	err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
 errout:
 	return err;
 
 errout_free:
+	rcu_read_unlock();
 	kfree_skb(skb);
 	goto errout;
 }
-- 
2.1.4


* [PATCH net-next 4/7] net: ipv4: refactor fib_dump_info
  2017-01-09 21:32 [PATCH net-next 0/7] net: ipv4: return matching route for GETROUTE request David Ahern
                   ` (2 preceding siblings ...)
  2017-01-09 21:32 ` [PATCH net-next 3/7] net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup David Ahern
@ 2017-01-09 21:32 ` David Ahern
  2017-01-09 21:32 ` [PATCH net-next 5/7] net: ipv4: Save trie prefix to fib lookup result David Ahern
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: David Ahern @ 2017-01-09 21:32 UTC
  To: netdev; +Cc: David Ahern

Pull code that adds attributes to the response from fib_dump_info into
a separate, stand-alone function. That function is used by a later patch
to add the matching route to a get route request.
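
A sketch of the resulting division of labor, with a toy byte buffer standing in for the skb (all names hypothetical): the helper only appends attributes and reports -EMSGSIZE, while the caller owns the message header and rolls back the partial message. Keeping the rollback out of the helper is what allows it to be reused inside a different message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an skb: a fixed buffer with a fill level. */
struct msgbuf {
	unsigned char data[32];
	size_t len;
};

/* All-or-nothing append. */
static int put_bytes(struct msgbuf *m, const void *p, size_t n)
{
	if (m->len + n > sizeof(m->data))
		return -EMSGSIZE;
	memcpy(m->data + m->len, p, n);
	m->len += n;
	return 0;
}

/* In the style of fib_dump_add_attrs_rcu: append attributes only and
 * report failure; undoing the message is not this function's job. */
static int add_attrs(struct msgbuf *m, const char *attrs)
{
	return put_bytes(m, attrs, strlen(attrs));
}

/* In the style of fib_dump_info: owns the header and the rollback. */
static int dump_one(struct msgbuf *m, const char *hdr, const char *attrs)
{
	size_t start = m->len;          /* like nlmsg_put(): remember start */

	if (put_bytes(m, hdr, strlen(hdr)))
		return -EMSGSIZE;
	if (add_attrs(m, attrs)) {
		m->len = start;         /* like nlmsg_cancel(): drop partial msg */
		return -EMSGSIZE;
	}
	return 0;                       /* like nlmsg_end() */
}
```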

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 net/ipv4/fib_lookup.h    |  2 ++
 net/ipv4/fib_semantics.c | 17 +++++++++++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index 9c02920725db..7d2c019c5257 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -33,6 +33,8 @@ int fib_nh_match(struct fib_config *cfg, struct fib_info *fi);
 int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event, u32 tb_id,
 		  u8 type, __be32 dst, int dst_len, u8 tos, struct fib_info *fi,
 		  unsigned int);
+int fib_dump_add_attrs_rcu(struct sk_buff *skb, __be32 dst, struct rtmsg *rtm,
+			   struct fib_info *fi);
 void rtmsg_fib(int event, __be32 key, struct fib_alias *fa, int dst_len,
 	       u32 tb_id, const struct nl_info *info, unsigned int nlm_flags);
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 05c911d21782..c6f7223fcd08 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1247,6 +1247,21 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
 	rtm->rtm_scope = fi->fib_scope;
 	rtm->rtm_protocol = fi->fib_protocol;
 
+	if (fib_dump_add_attrs_rcu(skb, dst, rtm, fi))
+		goto nla_put_failure;
+
+	nlmsg_end(skb, nlh);
+	return 0;
+
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
+/* called with rcu_read_lock held */
+int fib_dump_add_attrs_rcu(struct sk_buff *skb, __be32 dst, struct rtmsg *rtm,
+			   struct fib_info *fi)
+{
 	if (rtm->rtm_dst_len &&
 	    nla_put_in_addr(skb, RTA_DST, dst))
 		goto nla_put_failure;
@@ -1325,11 +1340,9 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
 		nla_nest_end(skb, mp);
 	}
 #endif
-	nlmsg_end(skb, nlh);
 	return 0;
 
 nla_put_failure:
-	nlmsg_cancel(skb, nlh);
 	return -EMSGSIZE;
 }
 
-- 
2.1.4


* [PATCH net-next 5/7] net: ipv4: Save trie prefix to fib lookup result
  2017-01-09 21:32 [PATCH net-next 0/7] net: ipv4: return matching route for GETROUTE request David Ahern
                   ` (3 preceding siblings ...)
  2017-01-09 21:32 ` [PATCH net-next 4/7] net: ipv4: refactor fib_dump_info David Ahern
@ 2017-01-09 21:32 ` David Ahern
  2017-01-09 21:32 ` [PATCH net-next 6/7] net: ipv4: return route match in GETROUTE request David Ahern
  2017-01-09 21:32 ` [PATCH net-next 7/7] net: ipv4: Remove event arg to rt_fill_info David Ahern
  6 siblings, 0 replies; 10+ messages in thread
From: David Ahern @ 2017-01-09 21:32 UTC
  To: netdev; +Cc: David Ahern

The prefix is needed to return the matching route spec for a get route request.
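
Roughly, the two saved fields correspond to this computation. It is a sketch: KEYLENGTH is 32 for IPv4 as in fib_trie.c, the struct is a hypothetical subset of struct fib_result, and the key is the matching leaf key in host byte order, hence the htonl().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

#define KEYLENGTH 32   /* bits in an IPv4 trie key */

/* Hypothetical subset of struct fib_result. */
struct prefix_sketch {
	uint32_t prefix;            /* network byte order, like __be32 */
	unsigned char prefixlen;
};

/* key: leaf key in host byte order; slen: suffix length of the alias. */
static struct prefix_sketch save_prefix(uint32_t key, unsigned char slen)
{
	struct prefix_sketch r;

	r.prefix = htonl(key);
	r.prefixlen = KEYLENGTH - slen;
	return r;
}

/* Render the saved fields as "a.b.c.d/len". */
static void format_prefix(const struct prefix_sketch *r, char *out, size_t n)
{
	char addr[INET_ADDRSTRLEN];
	struct in_addr a = { .s_addr = r->prefix };

	inet_ntop(AF_INET, &a, addr, sizeof(addr));
	snprintf(out, n, "%s/%u", addr, (unsigned int)r->prefixlen);
}
```

For example, a leaf key of 0x0A0A0A00 with a suffix length of 8 formats as 10.10.10.0/24.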

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/net/ip_fib.h | 1 +
 net/ipv4/fib_trie.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 57c2a863d0b2..f2cc345852d7 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -136,6 +136,7 @@ struct fib_rule;
 
 struct fib_table;
 struct fib_result {
+	__be32		prefix;
 	unsigned char	prefixlen;
 	unsigned char	nh_sel;
 	unsigned char	type;
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 2919d1a10cfd..2fc5793cce36 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1544,6 +1544,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 			if (!(fib_flags & FIB_LOOKUP_NOREF))
 				atomic_inc(&fi->fib_clntref);
 
+			res->prefix = htonl(n->key);
 			res->prefixlen = KEYLENGTH - fa->fa_slen;
 			res->nh_sel = nhsel;
 			res->type = fa->fa_type;
-- 
2.1.4


* [PATCH net-next 6/7] net: ipv4: return route match in GETROUTE request
  2017-01-09 21:32 [PATCH net-next 0/7] net: ipv4: return matching route for GETROUTE request David Ahern
                   ` (4 preceding siblings ...)
  2017-01-09 21:32 ` [PATCH net-next 5/7] net: ipv4: Save trie prefix to fib lookup result David Ahern
@ 2017-01-09 21:32 ` David Ahern
  2017-01-11  1:40   ` David Miller
  2017-01-09 21:32 ` [PATCH net-next 7/7] net: ipv4: Remove event arg to rt_fill_info David Ahern
  6 siblings, 1 reply; 10+ messages in thread
From: David Ahern @ 2017-01-09 21:32 UTC
  To: netdev; +Cc: David Ahern

Add the matching route returned in fib_result as a new, nested attribute,
RTA_ROUTE_GET, to the GETROUTE response. The rtmsg struct for the matched
route is carried inside the nest in another new attribute,
RTA_ROUTE_GET_RTM. These attributes allow userspace to show which route
was matched for a GETROUTE request.
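
For readers less familiar with netlink nesting, here is a userspace sketch of what nla_nest_start()/nla_nest_end() accomplish: write a header with a placeholder length, append the inner attributes, then patch the length. It uses a hand-rolled buffer and returns an offset where the kernel API returns a struct nlattr pointer; the RTA_ROUTE_GET value is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_RTA_DST        1   /* RTA_DST */
#define DEMO_RTA_TABLE     15   /* RTA_TABLE */
#define DEMO_RTA_ROUTE_GET 26   /* placeholder value */

struct attr_hdr { uint16_t len; uint16_t type; };   /* rtattr layout */

#define ALIGN4(n) (((size_t)(n) + 3) & ~(size_t)3)

struct buf {
	uint8_t data[128];
	size_t len;
};

/* Append one attribute (header + payload + zeroed padding). */
static int put_attr(struct buf *b, uint16_t type, const void *p, uint16_t plen)
{
	struct attr_hdr h = { (uint16_t)(sizeof(struct attr_hdr) + plen), type };
	size_t total = ALIGN4(h.len);

	if (b->len + total > sizeof(b->data))
		return -1;
	memset(b->data + b->len, 0, total);
	memcpy(b->data + b->len, &h, sizeof(h));
	if (plen)
		memcpy(b->data + b->len + sizeof(h), p, plen);
	b->len += total;
	return 0;
}

/* Like nla_nest_start(): emit an empty header, remember where it is. */
static long nest_start(struct buf *b, uint16_t type)
{
	size_t start = b->len;

	if (put_attr(b, type, NULL, 0))
		return -1;
	return (long)start;
}

/* Like nla_nest_end(): the nest length covers everything written since. */
static void nest_end(struct buf *b, long start)
{
	struct attr_hdr h;

	memcpy(&h, b->data + start, sizeof(h));
	h.len = (uint16_t)(b->len - (size_t)start);
	memcpy(b->data + start, &h, sizeof(h));
}
```

In the patch, a failure after the nest is opened jumps to nla_put_failure, which cancels the whole message, so no separate nest cancel is needed there.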

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/uapi/linux/rtnetlink.h |  2 ++
 net/ipv4/route.c               | 36 ++++++++++++++++++++++++++++++++++--
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 8c93ad1ef9ab..471384b72cea 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -319,6 +319,8 @@ enum rtattr_type_t {
 	RTA_EXPIRES,
 	RTA_PAD,
 	RTA_UID,
+	RTA_ROUTE_GET,  /* nested attribute; route spec for RTM_GETROUTE */
+	RTA_ROUTE_GET_RTM, /* struct rtmsg for nested spec */
 	__RTA_MAX
 };
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 03ddc03c185a..9f44b869b8a6 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -113,6 +113,7 @@
 #include <net/secure_seq.h>
 #include <net/ip_tunnels.h>
 #include <net/l3mdev.h>
+#include "fib_lookup.h"
 
 #define RT_FL_TOS(oldflp4) \
 	((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
@@ -2470,7 +2471,8 @@ EXPORT_SYMBOL_GPL(ip_route_output_flow);
 /* called with rcu_read_lock held */
 static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
 			struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-			u32 seq, int event, struct rtable *rt)
+			u32 seq, int event, struct rtable *rt,
+			struct fib_result *res)
 {
 	struct rtmsg *r;
 	struct nlmsghdr *nlh;
@@ -2572,6 +2574,36 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
 	if (rtnl_put_cacheinfo(skb, &rt->dst, 0, expires, error) < 0)
 		goto nla_put_failure;
 
+	if (res->fi) {
+		struct nlattr *get_rt;
+		struct rtmsg r_match;
+
+		/* Add data for matching route */
+		get_rt = nla_nest_start(skb, RTA_ROUTE_GET);
+		if (!get_rt)
+			goto nla_put_failure;
+
+		r_match.rtm_family = AF_INET;
+		r_match.rtm_dst_len = res->prefixlen;
+		r_match.rtm_src_len = 0;
+		r_match.rtm_tos  = fl4->flowi4_tos;
+		r_match.rtm_type = rt->rt_type;
+		r_match.rtm_flags = res->fi->fib_flags;
+		r_match.rtm_scope = res->fi->fib_scope;
+		r_match.rtm_protocol = res->fi->fib_protocol;
+		r_match.rtm_table = table_id;
+		if (nla_put_u32(skb, RTA_TABLE, table_id))
+			goto nla_put_failure;
+
+		if (fib_dump_add_attrs_rcu(skb, res->prefix, &r_match, res->fi))
+			goto nla_put_failure;
+
+		if (nla_put(skb, RTA_ROUTE_GET_RTM, sizeof(r_match), &r_match))
+			goto nla_put_failure;
+
+		nla_nest_end(skb, get_rt);
+	}
+
 	nlmsg_end(skb, nlh);
 	return 0;
 
@@ -2680,7 +2712,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 
 	err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
 			   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq,
-			   RTM_NEWROUTE, rt);
+			   RTM_NEWROUTE, rt, &res);
 	if (err < 0)
 		goto errout_free;
 
-- 
2.1.4


* [PATCH net-next 7/7] net: ipv4: Remove event arg to rt_fill_info
  2017-01-09 21:32 [PATCH net-next 0/7] net: ipv4: return matching route for GETROUTE request David Ahern
                   ` (5 preceding siblings ...)
  2017-01-09 21:32 ` [PATCH net-next 6/7] net: ipv4: return route match in GETROUTE request David Ahern
@ 2017-01-09 21:32 ` David Ahern
  6 siblings, 0 replies; 10+ messages in thread
From: David Ahern @ 2017-01-09 21:32 UTC
  To: netdev; +Cc: David Ahern

rt_fill_info has a single caller, and it sets event to RTM_NEWROUTE.
Given that, remove the arg and use RTM_NEWROUTE directly in rt_fill_info.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 net/ipv4/route.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 9f44b869b8a6..b34f79ffb11d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2471,8 +2471,7 @@ EXPORT_SYMBOL_GPL(ip_route_output_flow);
 /* called with rcu_read_lock held */
 static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
 			struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-			u32 seq, int event, struct rtable *rt,
-			struct fib_result *res)
+			u32 seq, struct rtable *rt, struct fib_result *res)
 {
 	struct rtmsg *r;
 	struct nlmsghdr *nlh;
@@ -2480,7 +2479,7 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
 	u32 error;
 	u32 metrics[RTAX_MAX];
 
-	nlh = nlmsg_put(skb, portid, seq, event, sizeof(*r), 0);
+	nlh = nlmsg_put(skb, portid, seq, RTM_NEWROUTE, sizeof(*r), 0);
 	if (!nlh)
 		return -EMSGSIZE;
 
@@ -2711,8 +2710,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 		table_id = rt->rt_table_id;
 
 	err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
-			   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq,
-			   RTM_NEWROUTE, rt, &res);
+			   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, rt, &res);
 	if (err < 0)
 		goto errout_free;
 
-- 
2.1.4


* Re: [PATCH net-next 6/7] net: ipv4: return route match in GETROUTE request
  2017-01-09 21:32 ` [PATCH net-next 6/7] net: ipv4: return route match in GETROUTE request David Ahern
@ 2017-01-11  1:40   ` David Miller
  2017-01-11 17:24     ` David Ahern
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2017-01-11  1:40 UTC
  To: dsa; +Cc: netdev

From: David Ahern <dsa@cumulusnetworks.com>
Date: Mon,  9 Jan 2017 13:32:50 -0800

> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index 8c93ad1ef9ab..471384b72cea 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -319,6 +319,8 @@ enum rtattr_type_t {
>  	RTA_EXPIRES,
>  	RTA_PAD,
>  	RTA_UID,
> +	RTA_ROUTE_GET,  /* nested attribute; route spec for RTM_GETROUTE */
> +	RTA_ROUTE_GET_RTM, /* struct rtmsg for nested spec */
>  	__RTA_MAX
>  };

The nested attribute and the attributes within that nested attribute
live in two different attribute number namespaces.

So usually we allocate the nested attribute at the top level in the
main enumeration.  Then the elements within the nested attribute
get allocated with a new enumeration created specifically for items
inside that nested attribute.

For example, RTA_METRICS --> RTAX_*

So please arrange things this way.

Thanks.
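
A small sketch of the convention being requested, with hypothetical enum names: the nest occupies one slot in the top-level enumeration, and its members are numbered in their own enumeration, so the same numeric type decodes differently depending on the level at which it appears (as RTAX_* values do inside RTA_METRICS).

```c
#include <assert.h>
#include <string.h>

/* Top-level namespace: the nest itself gets a single slot. */
enum demo_rta {
	DEMO_RTA_UNSPEC,
	DEMO_RTA_DST,
	DEMO_RTA_METRICS,
	DEMO_RTA_ROUTE_GET,      /* nested; members use enum demo_rtg */
	__DEMO_RTA_MAX
};

/* Nested namespace: numbering restarts, independent of demo_rta. */
enum demo_rtg {
	DEMO_RTG_UNSPEC,
	DEMO_RTG_RTM,            /* struct rtmsg for the matched route */
	DEMO_RTG_DST,
	__DEMO_RTG_MAX
};

/* Decode a type number according to the namespace it was found in. */
static const char *attr_name(int nested, unsigned int type)
{
	static const char *const top[] = { "unspec", "dst", "metrics", "route_get" };
	static const char *const inner[] = { "unspec", "rtm", "dst" };

	if (nested)
		return type < __DEMO_RTG_MAX ? inner[type] : "unknown";
	return type < __DEMO_RTA_MAX ? top[type] : "unknown";
}
```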


* Re: [PATCH net-next 6/7] net: ipv4: return route match in GETROUTE request
  2017-01-11  1:40   ` David Miller
@ 2017-01-11 17:24     ` David Ahern
  0 siblings, 0 replies; 10+ messages in thread
From: David Ahern @ 2017-01-11 17:24 UTC
  To: David Miller; +Cc: netdev

On 1/10/17 6:40 PM, David Miller wrote:
> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Mon,  9 Jan 2017 13:32:50 -0800
> 
>> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>> index 8c93ad1ef9ab..471384b72cea 100644
>> --- a/include/uapi/linux/rtnetlink.h
>> +++ b/include/uapi/linux/rtnetlink.h
>> @@ -319,6 +319,8 @@ enum rtattr_type_t {
>>  	RTA_EXPIRES,
>>  	RTA_PAD,
>>  	RTA_UID,
>> +	RTA_ROUTE_GET,  /* nested attribute; route spec for RTM_GETROUTE */
>> +	RTA_ROUTE_GET_RTM, /* struct rtmsg for nested spec */
>>  	__RTA_MAX
>>  };
> 
> The nested attribute and the attributes within that nested attribute
> live in two different attribute number namespaces.
> 
> So usually we allocate the nested attribute at the top level in the
> main enumeration.  Then the elements within the nested attribute
> get allocated with a new enumeration created specifically for items
> inside that nested attribute.
> 
> For example, RTA_METRICS --> RTAX_*
> 
> So please arrange things this way.

OK. I did it this way for code reuse, since the nested attribute is a route spec. If separate attributes for the nest are desired, I'll do that.


Thread overview: 10+ messages
2017-01-09 21:32 [PATCH net-next 0/7] net: ipv4: return matching route for GETROUTE request David Ahern
2017-01-09 21:32 ` [PATCH net-next 1/7] net: ipv4: refactor __ip_route_output_key_hash David Ahern
2017-01-09 21:32 ` [PATCH net-next 2/7] net: ipv4: refactor ip_route_input_noref David Ahern
2017-01-09 21:32 ` [PATCH net-next 3/7] net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup David Ahern
2017-01-09 21:32 ` [PATCH net-next 4/7] net: ipv4: refactor fib_dump_info David Ahern
2017-01-09 21:32 ` [PATCH net-next 5/7] net: ipv4: Save trie prefix to fib lookup result David Ahern
2017-01-09 21:32 ` [PATCH net-next 6/7] net: ipv4: return route match in GETROUTE request David Ahern
2017-01-11  1:40   ` David Miller
2017-01-11 17:24     ` David Ahern
2017-01-09 21:32 ` [PATCH net-next 7/7] net: ipv4: Remove event arg to rt_fill_info David Ahern
