All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result
@ 2017-05-25 17:42 Roopa Prabhu
  2017-05-25 17:42 ` [PATCH net-next v2 1/8] net: ipv4: refactor __ip_route_output_key_hash Roopa Prabhu
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Roopa Prabhu @ 2017-05-25 17:42 UTC (permalink / raw)
  To: davem, dsahern, rami.rosen; +Cc: netdev, nikolay

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This series adds a new RTM_F_FIB_MATCH flag to return matched fib result
with RTM_GETROUTE. This is useful for applications and protocols in
userspace wanting to query the selected route.

examples (with patched iproute2):
ipv4:
----
$ip route show
default via 192.168.0.2 dev eth0
10.0.14.0/24
        nexthop via 172.16.0.3  dev dummy0 weight 1
        nexthop via 172.16.1.3  dev dummy1 weight 1

$ip route get 10.0.14.2
10.0.14.2 via 172.16.1.3 dev dummy1  src 172.16.1.1 
    cache 

$ip route get fibmatch 10.0.14.2
10.0.14.0/24
        nexthop via 172.16.0.3  dev dummy0 weight 1
        nexthop via 172.16.1.3  dev dummy1 weight 1

ipv6:
----
$ip -6 route show
2001:db9:100::/120  metric 1024 
        nexthop via 2001:db8:2::2  dev dummy0 weight 1
        nexthop via 2001:db8:12::2  dev dummy1 weight 1

$ip -6 route get 2001:db9:100::1
2001:db9:100::1 from :: via 2001:db8:12::2 dev dummy1  src 2001:db8:12::1  metric 1024  pref medium

$ip -6 route get fibmatch 2001:db9:100::1
2001:db9:100::/120  metric 1024 
        nexthop via 2001:db8:12::2  dev dummy1 weight 1
        nexthop via 2001:db8:2::2  dev dummy0 weight 1

v2:
        - pick up new forward port of patch-01 from david
        - inet6_rtm_getroute: use container_of for rt6_info to
          dst conversion

David Ahern (5):
  net: ipv4: refactor __ip_route_output_key_hash
  net: ipv4: refactor ip_route_input_noref
  net: ipv4: Remove event arg to rt_fill_info
  net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup
  net: ipv4: Save trie prefix to fib lookup result

Roopa Prabhu (3):
  net: ipv4: add new RTM_F_FIB_MATCH flag for use with RTM_GETROUTE
  net: ipv4: RTM_GETROUTE: return matched fib result when
    RTM_F_FIB_MATCH flag present
  net: ipv6: RTM_GETROUTE: return matched fib result when
    RTM_F_FIB_MATCH flag present

 include/net/ip_fib.h           |   1 +
 include/net/route.h            |  10 ++-
 include/uapi/linux/rtnetlink.h |   1 +
 net/ipv4/fib_trie.c            |   1 +
 net/ipv4/icmp.c                |   2 +-
 net/ipv4/route.c               | 160 ++++++++++++++++++++++++-----------------
 net/ipv6/route.c               |  11 ++-
 7 files changed, 116 insertions(+), 70 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 1/8] net: ipv4: refactor __ip_route_output_key_hash
  2017-05-25 17:42 [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result Roopa Prabhu
@ 2017-05-25 17:42 ` Roopa Prabhu
  2017-05-25 17:42 ` [PATCH net-next v2 2/8] net: ipv4: refactor ip_route_input_noref Roopa Prabhu
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Roopa Prabhu @ 2017-05-25 17:42 UTC (permalink / raw)
  To: davem, dsahern, rami.rosen; +Cc: netdev, nikolay

From: David Ahern <dsahern@gmail.com>

A later patch wants access to the fib result on an output route lookup
with the rcu lock held. Refactor __ip_route_output_key_hash, pushing
the logic between rcu_read_lock ... rcu_read_unlock into a new helper
with the fib_result as an input arg.

To keep the name length under control remove the leading underscores
from the name and add _rcu to the name of the new helper indicating it
is called with the rcu read lock held.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 include/net/route.h |  9 ++++++---
 net/ipv4/icmp.c     |  2 +-
 net/ipv4/route.c    | 53 +++++++++++++++++++++++++++++++----------------------
 3 files changed, 38 insertions(+), 26 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 2cc0e14..89e4028 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -113,13 +113,16 @@ struct rt_cache_stat {
 int ip_rt_init(void);
 void rt_cache_flush(struct net *net);
 void rt_flush_dev(struct net_device *dev);
-struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *flp,
-					  const struct sk_buff *skb);
+struct rtable *ip_route_output_key_hash(struct net *net, struct flowi4 *flp,
+					const struct sk_buff *skb);
+struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 *flp,
+					    struct fib_result *res,
+					    const struct sk_buff *skb);
 
 static inline struct rtable *__ip_route_output_key(struct net *net,
 						   struct flowi4 *flp)
 {
-	return __ip_route_output_key_hash(net, flp, NULL);
+	return ip_route_output_key_hash(net, flp, NULL);
 }
 
 struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 43318b5..5610971 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -489,7 +489,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
 	fl4->flowi4_oif = l3mdev_master_ifindex(skb_dst(skb_in)->dev);
 
 	security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
-	rt = __ip_route_output_key_hash(net, fl4, skb_in);
+	rt = ip_route_output_key_hash(net, fl4, skb_in);
 	if (IS_ERR(rt))
 		return rt;
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 655d9ee..c9b55cb 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2246,29 +2246,40 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
  * Major route resolver routine.
  */
 
-struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
-					  const struct sk_buff *skb)
+struct rtable *ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
+					const struct sk_buff *skb)
 {
-	struct net_device *dev_out = NULL;
 	__u8 tos = RT_FL_TOS(fl4);
-	unsigned int flags = 0;
 	struct fib_result res;
 	struct rtable *rth;
-	int orig_oif;
-	int err = -ENETUNREACH;
 
 	res.tclassid	= 0;
 	res.fi		= NULL;
 	res.table	= NULL;
 
-	orig_oif = fl4->flowi4_oif;
-
 	fl4->flowi4_iif = LOOPBACK_IFINDEX;
 	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
 	fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
 			 RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
 
 	rcu_read_lock();
+	rth = ip_route_output_key_hash_rcu(net, fl4, &res, skb);
+	rcu_read_unlock();
+
+	return rth;
+}
+EXPORT_SYMBOL_GPL(ip_route_output_key_hash);
+
+struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 *fl4,
+					    struct fib_result *res,
+					    const struct sk_buff *skb)
+{
+	struct net_device *dev_out = NULL;
+	int orig_oif = fl4->flowi4_oif;
+	unsigned int flags = 0;
+	struct rtable *rth;
+	int err = -ENETUNREACH;
+
 	if (fl4->saddr) {
 		rth = ERR_PTR(-EINVAL);
 		if (ipv4_is_multicast(fl4->saddr) ||
@@ -2354,15 +2365,15 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
 			fl4->daddr = fl4->saddr = htonl(INADDR_LOOPBACK);
 		dev_out = net->loopback_dev;
 		fl4->flowi4_oif = LOOPBACK_IFINDEX;
-		res.type = RTN_LOCAL;
+		res->type = RTN_LOCAL;
 		flags |= RTCF_LOCAL;
 		goto make_route;
 	}
 
-	err = fib_lookup(net, fl4, &res, 0);
+	err = fib_lookup(net, fl4, res, 0);
 	if (err) {
-		res.fi = NULL;
-		res.table = NULL;
+		res->fi = NULL;
+		res->table = NULL;
 		if (fl4->flowi4_oif &&
 		    (ipv4_is_multicast(fl4->daddr) ||
 		    !netif_index_is_l3_master(net, fl4->flowi4_oif))) {
@@ -2387,43 +2398,41 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
 			if (fl4->saddr == 0)
 				fl4->saddr = inet_select_addr(dev_out, 0,
 							      RT_SCOPE_LINK);
-			res.type = RTN_UNICAST;
+			res->type = RTN_UNICAST;
 			goto make_route;
 		}
 		rth = ERR_PTR(err);
 		goto out;
 	}
 
-	if (res.type == RTN_LOCAL) {
+	if (res->type == RTN_LOCAL) {
 		if (!fl4->saddr) {
-			if (res.fi->fib_prefsrc)
-				fl4->saddr = res.fi->fib_prefsrc;
+			if (res->fi->fib_prefsrc)
+				fl4->saddr = res->fi->fib_prefsrc;
 			else
 				fl4->saddr = fl4->daddr;
 		}
 
 		/* L3 master device is the loopback for that domain */
-		dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(res)) ? :
+		dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(*res)) ? :
 			net->loopback_dev;
 		fl4->flowi4_oif = dev_out->ifindex;
 		flags |= RTCF_LOCAL;
 		goto make_route;
 	}
 
-	fib_select_path(net, &res, fl4, skb);
+	fib_select_path(net, res, fl4, skb);
 
-	dev_out = FIB_RES_DEV(res);
+	dev_out = FIB_RES_DEV(*res);
 	fl4->flowi4_oif = dev_out->ifindex;
 
 
 make_route:
-	rth = __mkroute_output(&res, fl4, orig_oif, dev_out, flags);
+	rth = __mkroute_output(res, fl4, orig_oif, dev_out, flags);
 
 out:
-	rcu_read_unlock();
 	return rth;
 }
-EXPORT_SYMBOL_GPL(__ip_route_output_key_hash);
 
 static struct dst_entry *ipv4_blackhole_dst_check(struct dst_entry *dst, u32 cookie)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 2/8] net: ipv4: refactor ip_route_input_noref
  2017-05-25 17:42 [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result Roopa Prabhu
  2017-05-25 17:42 ` [PATCH net-next v2 1/8] net: ipv4: refactor __ip_route_output_key_hash Roopa Prabhu
@ 2017-05-25 17:42 ` Roopa Prabhu
  2017-05-25 17:42 ` [PATCH net-next v2 3/8] net: ipv4: Remove event arg to rt_fill_info Roopa Prabhu
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Roopa Prabhu @ 2017-05-25 17:42 UTC (permalink / raw)
  To: davem, dsahern, rami.rosen; +Cc: netdev, nikolay

From: David Ahern <dsahern@gmail.com>

A later patch wants access to the fib result on an input route lookup
with the rcu lock held. Refactor ip_route_input_noref pushing the logic
between rcu_read_lock ... rcu_read_unlock into a new helper that takes
the fib_result as an input arg.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 include/net/route.h |  3 +++
 net/ipv4/route.c    | 66 ++++++++++++++++++++++++++++++-----------------------
 2 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 89e4028..08e689f 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -178,6 +178,9 @@ static inline struct rtable *ip_route_output_gre(struct net *net, struct flowi4
 
 int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
 			 u8 tos, struct net_device *devin);
+int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
+		       u8 tos, struct net_device *devin,
+		       struct fib_result *res);
 
 static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
 				 u8 tos, struct net_device *devin)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c9b55cb..1dc8fd1 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1852,9 +1852,9 @@ static int ip_mkroute_input(struct sk_buff *skb,
  */
 
 static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
-			       u8 tos, struct net_device *dev)
+			       u8 tos, struct net_device *dev,
+			       struct fib_result *res)
 {
-	struct fib_result res;
 	struct in_device *in_dev = __in_dev_get_rcu(dev);
 	struct ip_tunnel_info *tun_info;
 	struct flowi4	fl4;
@@ -1884,8 +1884,8 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
 		goto martian_source;
 
-	res.fi = NULL;
-	res.table = NULL;
+	res->fi = NULL;
+	res->table = NULL;
 	if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0))
 		goto brd_input;
 
@@ -1921,17 +1921,17 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	fl4.daddr = daddr;
 	fl4.saddr = saddr;
 	fl4.flowi4_uid = sock_net_uid(net, NULL);
-	err = fib_lookup(net, &fl4, &res, 0);
+	err = fib_lookup(net, &fl4, res, 0);
 	if (err != 0) {
 		if (!IN_DEV_FORWARD(in_dev))
 			err = -EHOSTUNREACH;
 		goto no_route;
 	}
 
-	if (res.type == RTN_BROADCAST)
+	if (res->type == RTN_BROADCAST)
 		goto brd_input;
 
-	if (res.type == RTN_LOCAL) {
+	if (res->type == RTN_LOCAL) {
 		err = fib_validate_source(skb, saddr, daddr, tos,
 					  0, dev, in_dev, &itag);
 		if (err < 0)
@@ -1943,10 +1943,10 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		err = -EHOSTUNREACH;
 		goto no_route;
 	}
-	if (res.type != RTN_UNICAST)
+	if (res->type != RTN_UNICAST)
 		goto martian_destination;
 
-	err = ip_mkroute_input(skb, &res, in_dev, daddr, saddr, tos);
+	err = ip_mkroute_input(skb, res, in_dev, daddr, saddr, tos);
 out:	return err;
 
 brd_input:
@@ -1960,14 +1960,14 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 			goto martian_source;
 	}
 	flags |= RTCF_BROADCAST;
-	res.type = RTN_BROADCAST;
+	res->type = RTN_BROADCAST;
 	RT_CACHE_STAT_INC(in_brd);
 
 local_input:
 	do_cache = false;
-	if (res.fi) {
+	if (res->fi) {
 		if (!itag) {
-			rth = rcu_dereference(FIB_RES_NH(res).nh_rth_input);
+			rth = rcu_dereference(FIB_RES_NH(*res).nh_rth_input);
 			if (rt_cache_valid(rth)) {
 				skb_dst_set_noref(skb, &rth->dst);
 				err = 0;
@@ -1978,7 +1978,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	}
 
 	rth = rt_dst_alloc(l3mdev_master_dev_rcu(dev) ? : net->loopback_dev,
-			   flags | RTCF_LOCAL, res.type,
+			   flags | RTCF_LOCAL, res->type,
 			   IN_DEV_CONF_GET(in_dev, NOPOLICY), false, do_cache);
 	if (!rth)
 		goto e_nobufs;
@@ -1988,18 +1988,18 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	rth->dst.tclassid = itag;
 #endif
 	rth->rt_is_input = 1;
-	if (res.table)
-		rth->rt_table_id = res.table->tb_id;
+	if (res->table)
+		rth->rt_table_id = res->table->tb_id;
 
 	RT_CACHE_STAT_INC(in_slow_tot);
-	if (res.type == RTN_UNREACHABLE) {
+	if (res->type == RTN_UNREACHABLE) {
 		rth->dst.input= ip_error;
 		rth->dst.error= -err;
 		rth->rt_flags 	&= ~RTCF_LOCAL;
 	}
 
 	if (do_cache) {
-		struct fib_nh *nh = &FIB_RES_NH(res);
+		struct fib_nh *nh = &FIB_RES_NH(*res);
 
 		rth->dst.lwtstate = lwtstate_get(nh->nh_lwtstate);
 		if (lwtunnel_input_redirect(rth->dst.lwtstate)) {
@@ -2019,9 +2019,9 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 
 no_route:
 	RT_CACHE_STAT_INC(in_no_route);
-	res.type = RTN_UNREACHABLE;
-	res.fi = NULL;
-	res.table = NULL;
+	res->type = RTN_UNREACHABLE;
+	res->fi = NULL;
+	res->table = NULL;
 	goto local_input;
 
 	/*
@@ -2051,11 +2051,22 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 			 u8 tos, struct net_device *dev)
 {
-	int res;
+	struct fib_result res;
+	int err;
 
 	tos &= IPTOS_RT_MASK;
 	rcu_read_lock();
+	err = ip_route_input_rcu(skb, daddr, saddr, tos, dev, &res);
+	rcu_read_unlock();
 
+	return err;
+}
+EXPORT_SYMBOL(ip_route_input_noref);
+
+/* called with rcu_read_lock held */
+int ip_route_input_rcu(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+		       u8 tos, struct net_device *dev, struct fib_result *res)
+{
 	/* Multicast recognition logic is moved from route cache to here.
 	   The problem was that too many Ethernet cards have broken/missing
 	   hardware multicast filters :-( As result the host on multicasting
@@ -2070,6 +2081,7 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	if (ipv4_is_multicast(daddr)) {
 		struct in_device *in_dev = __in_dev_get_rcu(dev);
 		int our = 0;
+		int err = -EINVAL;
 
 		if (in_dev)
 			our = ip_check_mc_rcu(in_dev, daddr, saddr,
@@ -2085,7 +2097,6 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 						      ip_hdr(skb)->protocol);
 		}
 
-		res = -EINVAL;
 		if (our
 #ifdef CONFIG_IP_MROUTE
 			||
@@ -2093,17 +2104,14 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		     IN_DEV_MFORWARD(in_dev))
 #endif
 		   ) {
-			res = ip_route_input_mc(skb, daddr, saddr,
+			err = ip_route_input_mc(skb, daddr, saddr,
 						tos, dev, our);
 		}
-		rcu_read_unlock();
-		return res;
+		return err;
 	}
-	res = ip_route_input_slow(skb, daddr, saddr, tos, dev);
-	rcu_read_unlock();
-	return res;
+
+	return ip_route_input_slow(skb, daddr, saddr, tos, dev, res);
 }
-EXPORT_SYMBOL(ip_route_input_noref);
 
 /* called with rcu_read_lock() */
 static struct rtable *__mkroute_output(const struct fib_result *res,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 3/8] net: ipv4: Remove event arg to rt_fill_info
  2017-05-25 17:42 [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result Roopa Prabhu
  2017-05-25 17:42 ` [PATCH net-next v2 1/8] net: ipv4: refactor __ip_route_output_key_hash Roopa Prabhu
  2017-05-25 17:42 ` [PATCH net-next v2 2/8] net: ipv4: refactor ip_route_input_noref Roopa Prabhu
@ 2017-05-25 17:42 ` Roopa Prabhu
  2017-05-25 17:42 ` [PATCH net-next v2 4/8] net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup Roopa Prabhu
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Roopa Prabhu @ 2017-05-25 17:42 UTC (permalink / raw)
  To: davem, dsahern, rami.rosen; +Cc: netdev, nikolay

From: David Ahern <dsahern@gmail.com>

rt_fill_info has 1 caller with the event set to RTM_NEWROUTE. Given that
remove the arg and use RTM_NEWROUTE directly in rt_fill_info.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 net/ipv4/route.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 1dc8fd1..d8fcecc 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2536,7 +2536,7 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
 
 static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
 			struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-			u32 seq, int event)
+			u32 seq)
 {
 	struct rtable *rt = skb_rtable(skb);
 	struct rtmsg *r;
@@ -2545,7 +2545,7 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
 	u32 error;
 	u32 metrics[RTAX_MAX];
 
-	nlh = nlmsg_put(skb, portid, seq, event, sizeof(*r), 0);
+	nlh = nlmsg_put(skb, portid, seq, RTM_NEWROUTE, sizeof(*r), 0);
 	if (!nlh)
 		return -EMSGSIZE;
 
@@ -2745,8 +2745,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 		table_id = rt->rt_table_id;
 
 	err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
-			   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq,
-			   RTM_NEWROUTE);
+			   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq);
 	if (err < 0)
 		goto errout_free;
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 4/8] net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup
  2017-05-25 17:42 [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result Roopa Prabhu
                   ` (2 preceding siblings ...)
  2017-05-25 17:42 ` [PATCH net-next v2 3/8] net: ipv4: Remove event arg to rt_fill_info Roopa Prabhu
@ 2017-05-25 17:42 ` Roopa Prabhu
  2017-05-31 20:11   ` John Fastabend
  2017-05-25 17:42 ` [PATCH net-next v2 5/8] net: ipv4: Save trie prefix to fib lookup result Roopa Prabhu
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Roopa Prabhu @ 2017-05-25 17:42 UTC (permalink / raw)
  To: davem, dsahern, rami.rosen; +Cc: netdev, nikolay

From: David Ahern <dsahern@gmail.com>

Convert inet_rtm_getroute to use ip_route_input_rcu and
ip_route_output_key_hash_rcu passing the fib_result arg to both.
The rcu lock is held through the creation of the response, so the
rtable/dst does not need to be attached to the skb and is passed
to rt_fill_info directly.

In converting from ip_route_output_key to ip_route_output_key_hash_rcu
the xfrm_lookup_route in ip_route_output_flow is dropped since
flowi4_proto is not set for a route get request.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 net/ipv4/route.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d8fcecc..1fa9127 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2534,11 +2534,11 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
 }
 EXPORT_SYMBOL_GPL(ip_route_output_flow);
 
+/* called with rcu_read_lock held */
 static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
 			struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-			u32 seq)
+			u32 seq, struct rtable *rt)
 {
-	struct rtable *rt = skb_rtable(skb);
 	struct rtmsg *r;
 	struct nlmsghdr *nlh;
 	unsigned long expires = 0;
@@ -2653,6 +2653,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	struct net *net = sock_net(in_skb->sk);
 	struct rtmsg *rtm;
 	struct nlattr *tb[RTA_MAX+1];
+	struct fib_result res = {};
 	struct rtable *rt = NULL;
 	struct flowi4 fl4;
 	__be32 dst = 0;
@@ -2709,10 +2710,12 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	fl4.flowi4_mark = mark;
 	fl4.flowi4_uid = uid;
 
+	rcu_read_lock();
+
 	if (iif) {
 		struct net_device *dev;
 
-		dev = __dev_get_by_index(net, iif);
+		dev = dev_get_by_index_rcu(net, iif);
 		if (!dev) {
 			err = -ENODEV;
 			goto errout_free;
@@ -2721,14 +2724,14 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 		skb->protocol	= htons(ETH_P_IP);
 		skb->dev	= dev;
 		skb->mark	= mark;
-		err = ip_route_input(skb, dst, src, rtm->rtm_tos, dev);
+		err = ip_route_input_rcu(skb, dst, src, rtm->rtm_tos,
+					 dev, &res);
 
 		rt = skb_rtable(skb);
 		if (err == 0 && rt->dst.error)
 			err = -rt->dst.error;
 	} else {
-		rt = ip_route_output_key(net, &fl4);
-
+		rt = ip_route_output_key_hash_rcu(net, &fl4, &res, skb);
 		err = 0;
 		if (IS_ERR(rt))
 			err = PTR_ERR(rt);
@@ -2737,7 +2740,6 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	if (err)
 		goto errout_free;
 
-	skb_dst_set(skb, &rt->dst);
 	if (rtm->rtm_flags & RTM_F_NOTIFY)
 		rt->rt_flags |= RTCF_NOTIFY;
 
@@ -2745,15 +2747,18 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 		table_id = rt->rt_table_id;
 
 	err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
-			   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq);
+			   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, rt);
 	if (err < 0)
 		goto errout_free;
 
+	rcu_read_unlock();
+
 	err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
 errout:
 	return err;
 
 errout_free:
+	rcu_read_unlock();
 	kfree_skb(skb);
 	goto errout;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 5/8] net: ipv4: Save trie prefix to fib lookup result
  2017-05-25 17:42 [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result Roopa Prabhu
                   ` (3 preceding siblings ...)
  2017-05-25 17:42 ` [PATCH net-next v2 4/8] net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup Roopa Prabhu
@ 2017-05-25 17:42 ` Roopa Prabhu
  2017-05-25 17:42 ` [PATCH net-next v2 6/8] net: ipv4: add new RTM_F_FIB_MATCH flag for use with RTM_GETROUTE Roopa Prabhu
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Roopa Prabhu @ 2017-05-25 17:42 UTC (permalink / raw)
  To: davem, dsahern, rami.rosen; +Cc: netdev, nikolay

From: David Ahern <dsahern@gmail.com>

Prefix is needed for returning matching route spec on get route request.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 include/net/ip_fib.h | 1 +
 net/ipv4/fib_trie.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 42e8b8f..25f5c51 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -136,6 +136,7 @@ struct fib_info {
 
 struct fib_table;
 struct fib_result {
+	__be32		prefix;
 	unsigned char	prefixlen;
 	unsigned char	nh_sel;
 	unsigned char	type;
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 6d0f6c79..6e9df7d 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1452,6 +1452,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 			if (!(fib_flags & FIB_LOOKUP_NOREF))
 				atomic_inc(&fi->fib_clntref);
 
+			res->prefix = htonl(n->key);
 			res->prefixlen = KEYLENGTH - fa->fa_slen;
 			res->nh_sel = nhsel;
 			res->type = fa->fa_type;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 6/8] net: ipv4: add new RTM_F_FIB_MATCH flag for use with RTM_GETROUTE
  2017-05-25 17:42 [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result Roopa Prabhu
                   ` (4 preceding siblings ...)
  2017-05-25 17:42 ` [PATCH net-next v2 5/8] net: ipv4: Save trie prefix to fib lookup result Roopa Prabhu
@ 2017-05-25 17:42 ` Roopa Prabhu
  2017-05-25 17:42 ` [PATCH net-next v2 7/8] net: ipv4: RTM_GETROUTE: return matched fib result when requested Roopa Prabhu
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Roopa Prabhu @ 2017-05-25 17:42 UTC (permalink / raw)
  To: davem, dsahern, rami.rosen; +Cc: netdev, nikolay

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This flag when specified will return matched fib result in
response to a RTM_GETROUTE query.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 include/uapi/linux/rtnetlink.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 6487b21..564790e 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -278,6 +278,7 @@ enum rt_scope_t {
 #define RTM_F_EQUALIZE		0x400	/* Multipath equalizer: NI	*/
 #define RTM_F_PREFIX		0x800	/* Prefix addresses		*/
 #define RTM_F_LOOKUP_TABLE	0x1000	/* set rtm_table to FIB lookup result */
+#define RTM_F_FIB_MATCH	        0x2000	/* return full fib lookup match */
 
 /* Reserved table identifiers */
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 7/8] net: ipv4: RTM_GETROUTE: return matched fib result when requested
  2017-05-25 17:42 [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result Roopa Prabhu
                   ` (5 preceding siblings ...)
  2017-05-25 17:42 ` [PATCH net-next v2 6/8] net: ipv4: add new RTM_F_FIB_MATCH flag for use with RTM_GETROUTE Roopa Prabhu
@ 2017-05-25 17:42 ` Roopa Prabhu
  2017-05-25 17:42 ` [PATCH net-next v2 8/8] net: ipv6: " Roopa Prabhu
  2017-05-26 18:18 ` [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result David Miller
  8 siblings, 0 replies; 15+ messages in thread
From: Roopa Prabhu @ 2017-05-25 17:42 UTC (permalink / raw)
  To: davem, dsahern, rami.rosen; +Cc: netdev, nikolay

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This patch adds support to return matched fib result when RTM_F_FIB_MATCH
flag is specified in RTM_GETROUTE request. This is useful for user-space
applications/controllers wanting to query a matching route.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 net/ipv4/route.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 1fa9127..3a74256 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -114,6 +114,8 @@
 #include <net/ip_tunnels.h>
 #include <net/l3mdev.h>
 
+#include "fib_lookup.h"
+
 #define RT_FL_TOS(oldflp4) \
 	((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
 
@@ -2746,8 +2748,15 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	if (rtm->rtm_flags & RTM_F_LOOKUP_TABLE)
 		table_id = rt->rt_table_id;
 
-	err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
-			   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, rt);
+	if (rtm->rtm_flags & RTM_F_FIB_MATCH)
+		err = fib_dump_info(skb, NETLINK_CB(in_skb).portid,
+				    nlh->nlmsg_seq, RTM_NEWROUTE, table_id,
+				    rt->rt_type, res.prefix, res.prefixlen,
+				    fl4.flowi4_tos, res.fi, 0);
+	else
+		err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
+				   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq,
+				   rt);
 	if (err < 0)
 		goto errout_free;
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 8/8] net: ipv6: RTM_GETROUTE: return matched fib result when requested
  2017-05-25 17:42 [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result Roopa Prabhu
                   ` (6 preceding siblings ...)
  2017-05-25 17:42 ` [PATCH net-next v2 7/8] net: ipv4: RTM_GETROUTE: return matched fib result when requested Roopa Prabhu
@ 2017-05-25 17:42 ` Roopa Prabhu
  2017-05-26 18:18 ` [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result David Miller
  8 siblings, 0 replies; 15+ messages in thread
From: Roopa Prabhu @ 2017-05-25 17:42 UTC (permalink / raw)
  To: davem, dsahern, rami.rosen; +Cc: netdev, nikolay

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This patch adds support to return matched fib result when RTM_F_FIB_MATCH
flag is specified in RTM_GETROUTE request. This is useful for user-space
applications/controllers wanting to query a matching route.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 net/ipv6/route.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 80bda31..2fe84bd 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3607,11 +3607,13 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 {
 	struct net *net = sock_net(in_skb->sk);
 	struct nlattr *tb[RTA_MAX+1];
+	int err, iif = 0, oif = 0;
+	struct dst_entry *dst;
 	struct rt6_info *rt;
 	struct sk_buff *skb;
 	struct rtmsg *rtm;
 	struct flowi6 fl6;
-	int err, iif = 0, oif = 0;
+	bool fibmatch;
 
 	err = nlmsg_parse(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_ipv6_policy,
 			  extack);
@@ -3622,6 +3624,7 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	memset(&fl6, 0, sizeof(fl6));
 	rtm = nlmsg_data(nlh);
 	fl6.flowlabel = ip6_make_flowinfo(rtm->rtm_tos, 0);
+	fibmatch = !!(rtm->rtm_flags & RTM_F_FIB_MATCH);
 
 	if (tb[RTA_SRC]) {
 		if (nla_len(tb[RTA_SRC]) < sizeof(struct in6_addr))
@@ -3667,12 +3670,23 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 		if (!ipv6_addr_any(&fl6.saddr))
 			flags |= RT6_LOOKUP_F_HAS_SADDR;
 
-		rt = (struct rt6_info *)ip6_route_input_lookup(net, dev, &fl6,
-							       flags);
+		if (!fibmatch)
+			dst = ip6_route_input_lookup(net, dev, &fl6, flags);
 	} else {
 		fl6.flowi6_oif = oif;
 
-		rt = (struct rt6_info *)ip6_route_output(net, NULL, &fl6);
+		if (!fibmatch)
+			dst = ip6_route_output(net, NULL, &fl6);
+	}
+
+	if (fibmatch)
+		dst = ip6_route_lookup(net, &fl6, 0);
+
+	rt = container_of(dst, struct rt6_info, dst);
+	if (rt->dst.error) {
+		err = rt->dst.error;
+		ip6_rt_put(rt);
+		goto errout;
 	}
 
 	if (rt == net->ipv6.ip6_null_entry) {
@@ -3689,10 +3703,14 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	}
 
 	skb_dst_set(skb, &rt->dst);
-
-	err = rt6_fill_node(net, skb, rt, &fl6.daddr, &fl6.saddr, iif,
-			    RTM_NEWROUTE, NETLINK_CB(in_skb).portid,
-			    nlh->nlmsg_seq, 0);
+	if (fibmatch)
+		err = rt6_fill_node(net, skb, rt, NULL, NULL, iif,
+				    RTM_NEWROUTE, NETLINK_CB(in_skb).portid,
+				    nlh->nlmsg_seq, 0);
+	else
+		err = rt6_fill_node(net, skb, rt, &fl6.daddr, &fl6.saddr, iif,
+				    RTM_NEWROUTE, NETLINK_CB(in_skb).portid,
+				    nlh->nlmsg_seq, 0);
 	if (err < 0) {
 		kfree_skb(skb);
 		goto errout;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result
  2017-05-25 17:42 [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result Roopa Prabhu
                   ` (7 preceding siblings ...)
  2017-05-25 17:42 ` [PATCH net-next v2 8/8] net: ipv6: " Roopa Prabhu
@ 2017-05-26 18:18 ` David Miller
  2017-05-27  6:00   ` Roopa Prabhu
  8 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2017-05-26 18:18 UTC (permalink / raw)
  To: roopa; +Cc: dsahern, rami.rosen, netdev, nikolay

From: Roopa Prabhu <roopa@cumulusnetworks.com>
Date: Thu, 25 May 2017 10:42:32 -0700

> This series adds a new RTM_F_FIB_MATCH flag to return matched fib result
> with RTM_GETROUTE. This is useful for applications and protocols in
> userspace wanting to query the selected route.

Looks good, series applied, thanks.

Have you considered taking this further and allowing one to see which
nexthop a route lookup picked?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result
  2017-05-26 18:18 ` [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result David Miller
@ 2017-05-27  6:00   ` Roopa Prabhu
  2017-05-27 14:02     ` David Ahern
  0 siblings, 1 reply; 15+ messages in thread
From: Roopa Prabhu @ 2017-05-27  6:00 UTC (permalink / raw)
  To: David Miller; +Cc: David Ahern, Rami Rosen, netdev, Nikolay Aleksandrov

On Fri, May 26, 2017 at 11:18 AM, David Miller <davem@davemloft.net> wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> Date: Thu, 25 May 2017 10:42:32 -0700
>
>> This series adds a new RTM_F_FIB_MATCH flag to return matched fib result
>> with RTM_GETROUTE. This is useful for applications and protocols in
>> userspace wanting to query the selected route.
>
> Looks good, series applied, thanks.
>

thank you.

> Have you considered taking this further and allowing one to see which
> nexthop a route lookup picked?

since the default RTM_GETROUTE output gives most of the attributes
from the resolved dst,
have not considered adding more to it yet...but certainly can if
needed in the future.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result
  2017-05-27  6:00   ` Roopa Prabhu
@ 2017-05-27 14:02     ` David Ahern
  0 siblings, 0 replies; 15+ messages in thread
From: David Ahern @ 2017-05-27 14:02 UTC (permalink / raw)
  To: Roopa Prabhu, David Miller; +Cc: Rami Rosen, netdev, Nikolay Aleksandrov

On 5/27/17 12:00 AM, Roopa Prabhu wrote:
> On Fri, May 26, 2017 at 11:18 AM, David Miller <davem@davemloft.net> wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>> Date: Thu, 25 May 2017 10:42:32 -0700
>>
>>> This series adds a new RTM_F_FIB_MATCH flag to return matched fib result
>>> with RTM_GETROUTE. This is useful for applications and protocols in
>>> userspace wanting to query the selected route.
>>
>> Looks good, series applied, thanks.
>>
> 
> thank you.
> 
>> Have you considered taking this further and allowing one to see which
>> nexthop a route lookup picked?
> 
> since the default RTM_GETROUTE output gives most of the attributes
> from the resolved dst,
> have not considered adding more to it yet...but certainly can if
> needed in the future.
> 

One extension is to pass in prefix and length (plus any options to such
as metric to uniquely discriminate a route) and get back the route
entry. It is needed to retrieve the BPF code for routes with a bpf
encap. This patch set makes it easier.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 4/8] net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup
  2017-05-25 17:42 ` [PATCH net-next v2 4/8] net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup Roopa Prabhu
@ 2017-05-31 20:11   ` John Fastabend
  2017-05-31 21:48     ` David Ahern
  0 siblings, 1 reply; 15+ messages in thread
From: John Fastabend @ 2017-05-31 20:11 UTC (permalink / raw)
  To: Roopa Prabhu, davem, dsahern, rami.rosen; +Cc: netdev, nikolay

On 05/25/2017 10:42 AM, Roopa Prabhu wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> Convert inet_rtm_getroute to use ip_route_input_rcu and
> ip_route_output_key_hash_rcu passing the fib_result arg to both.
> The rcu lock is held through the creation of the response, so the
> rtable/dst does not need to be attached to the skb and is passed
> to rt_fill_info directly.
> 
> In converting from ip_route_output_key to ip_route_output_key_hash_rcu
> the xfrm_lookup_route in ip_route_output_flow is dropped since
> flowi4_proto is not set for a route get request.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> ---

Hi Roopa, David,

I'm getting a usage count bug with this patch,

unregister_netdevice: waiting for lo to become free. Usage count = 1

see below,

>  net/ipv4/route.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index d8fcecc..1fa9127 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2534,11 +2534,11 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
>  }
>  EXPORT_SYMBOL_GPL(ip_route_output_flow);
>  
> +/* called with rcu_read_lock held */
>  static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
>  			struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
> -			u32 seq)
> +			u32 seq, struct rtable *rt)
>  {
> -	struct rtable *rt = skb_rtable(skb);
>  	struct rtmsg *r;
>  	struct nlmsghdr *nlh;
>  	unsigned long expires = 0;
> @@ -2653,6 +2653,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>  	struct net *net = sock_net(in_skb->sk);
>  	struct rtmsg *rtm;
>  	struct nlattr *tb[RTA_MAX+1];
> +	struct fib_result res = {};
>  	struct rtable *rt = NULL;
>  	struct flowi4 fl4;
>  	__be32 dst = 0;
> @@ -2709,10 +2710,12 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>  	fl4.flowi4_mark = mark;
>  	fl4.flowi4_uid = uid;
>  
> +	rcu_read_lock();
> +
>  	if (iif) {
>  		struct net_device *dev;
>  
> -		dev = __dev_get_by_index(net, iif);
> +		dev = dev_get_by_index_rcu(net, iif);
>  		if (!dev) {
>  			err = -ENODEV;
>  			goto errout_free;
> @@ -2721,14 +2724,14 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>  		skb->protocol	= htons(ETH_P_IP);
>  		skb->dev	= dev;
>  		skb->mark	= mark;
> -		err = ip_route_input(skb, dst, src, rtm->rtm_tos, dev);
> +		err = ip_route_input_rcu(skb, dst, src, rtm->rtm_tos,
> +					 dev, &res);
>  
>  		rt = skb_rtable(skb);
>  		if (err == 0 && rt->dst.error)
>  			err = -rt->dst.error;
>  	} else {
> -		rt = ip_route_output_key(net, &fl4);
> -
> +		rt = ip_route_output_key_hash_rcu(net, &fl4, &res, skb);
>  		err = 0;
>  		if (IS_ERR(rt))
>  			err = PTR_ERR(rt);
> @@ -2737,7 +2740,6 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>  	if (err)
>  		goto errout_free;
>  
> -	skb_dst_set(skb, &rt->dst);


Why did you remove this? Neither ip_route_input() or ip_route_output_key()
seem to justify this with a quick scan on my side. Feel free to correct me
here.

The following fix resolves my issues,

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f1f2e5a..8f373bd 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2750,6 +2750,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
        if (err)
                goto errout_free;
 
+       skb_dst_set(skb, &rt->dst);
        if (rtm->rtm_flags & RTM_F_NOTIFY)
                rt->rt_flags |= RTCF_NOTIFY;
 

Thanks,
John 

>  	if (rtm->rtm_flags & RTM_F_NOTIFY)
>  		rt->rt_flags |= RTCF_NOTIFY;
>  
> @@ -2745,15 +2747,18 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>  		table_id = rt->rt_table_id;
>  
>  	err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
> -			   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq);
> +			   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, rt);
>  	if (err < 0)
>  		goto errout_free;
>  
> +	rcu_read_unlock();
> +
>  	err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
>  errout:
>  	return err;
>  
>  errout_free:
> +	rcu_read_unlock();
>  	kfree_skb(skb);
>  	goto errout;
>  }
> 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 4/8] net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup
  2017-05-31 20:11   ` John Fastabend
@ 2017-05-31 21:48     ` David Ahern
  2017-06-01  4:34       ` Roopa Prabhu
  0 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2017-05-31 21:48 UTC (permalink / raw)
  To: John Fastabend, Roopa Prabhu, davem, rami.rosen; +Cc: netdev, nikolay

On 5/31/17 2:11 PM, John Fastabend wrote:
>> @@ -2721,14 +2724,14 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>>  		skb->protocol	= htons(ETH_P_IP);
>>  		skb->dev	= dev;
>>  		skb->mark	= mark;
>> -		err = ip_route_input(skb, dst, src, rtm->rtm_tos, dev);
>> +		err = ip_route_input_rcu(skb, dst, src, rtm->rtm_tos,
>> +					 dev, &res);
>>  
>>  		rt = skb_rtable(skb);
>>  		if (err == 0 && rt->dst.error)
>>  			err = -rt->dst.error;
>>  	} else {
>> -		rt = ip_route_output_key(net, &fl4);
>> -
>> +		rt = ip_route_output_key_hash_rcu(net, &fl4, &res, skb);
>>  		err = 0;
>>  		if (IS_ERR(rt))
>>  			err = PTR_ERR(rt);
>> @@ -2737,7 +2740,6 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>>  	if (err)
>>  		goto errout_free;
>>  
>> -	skb_dst_set(skb, &rt->dst);
> 
> 
> Why did you remove this? Neither ip_route_input() or ip_route_output_key()
> seem to justify this with a quick scan on my side. Feel free to correct me
> here.
> 

original patch was done in January. I forget why I took it out. It is
clearly needed to release the dst. Might as well undo the argument
change to rt_fill_info since it is attached to the skb. Something like
this (whitespace damaged on paste - stupid Mac):

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f1f2e5aaa2d6..93cca12a8319 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2547,8 +2547,9 @@ EXPORT_SYMBOL_GPL(ip_route_output_flow);
 /* called with rcu_read_lock held */
 static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32
table_id,
                        struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-                       u32 seq, struct rtable *rt)
+                       u32 seq)
 {
+       struct rtable *rt = skb_rtable(skb);
        struct rtmsg *r;
        struct nlmsghdr *nlh;
        unsigned long expires = 0;
@@ -2750,6 +2751,8 @@ static int inet_rtm_getroute(struct sk_buff
*in_skb, struct nlmsghdr *nlh,
        if (err)
                goto errout_free;

+       skb_dst_set(skb, &rt->dst);
+
        if (rtm->rtm_flags & RTM_F_NOTIFY)
                rt->rt_flags |= RTCF_NOTIFY;

@@ -2763,8 +2766,7 @@ static int inet_rtm_getroute(struct sk_buff
*in_skb, struct nlmsghdr *nlh,
                                    fl4.flowi4_tos, res.fi, 0);
        else
                err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
-                                  NETLINK_CB(in_skb).portid,
nlh->nlmsg_seq,
-                                  rt);
+                                  NETLINK_CB(in_skb).portid,
nlh->nlmsg_seq);
        if (err < 0)
                goto errout_free;


Seems to work for me.

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 4/8] net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup
  2017-05-31 21:48     ` David Ahern
@ 2017-06-01  4:34       ` Roopa Prabhu
  0 siblings, 0 replies; 15+ messages in thread
From: Roopa Prabhu @ 2017-06-01  4:34 UTC (permalink / raw)
  To: David Ahern
  Cc: John Fastabend, davem, Rami Rosen, netdev, Nikolay Aleksandrov

On Wed, May 31, 2017 at 2:48 PM, David Ahern <dsahern@gmail.com> wrote:
> On 5/31/17 2:11 PM, John Fastabend wrote:
>>> @@ -2721,14 +2724,14 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>>>              skb->protocol   = htons(ETH_P_IP);
>>>              skb->dev        = dev;
>>>              skb->mark       = mark;
>>> -            err = ip_route_input(skb, dst, src, rtm->rtm_tos, dev);
>>> +            err = ip_route_input_rcu(skb, dst, src, rtm->rtm_tos,
>>> +                                     dev, &res);
>>>
>>>              rt = skb_rtable(skb);
>>>              if (err == 0 && rt->dst.error)
>>>                      err = -rt->dst.error;
>>>      } else {
>>> -            rt = ip_route_output_key(net, &fl4);
>>> -
>>> +            rt = ip_route_output_key_hash_rcu(net, &fl4, &res, skb);
>>>              err = 0;
>>>              if (IS_ERR(rt))
>>>                      err = PTR_ERR(rt);
>>> @@ -2737,7 +2740,6 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>>>      if (err)
>>>              goto errout_free;
>>>
>>> -    skb_dst_set(skb, &rt->dst);
>>
>>
>> Why did you remove this? Neither ip_route_input() or ip_route_output_key()
>> seem to justify this with a quick scan on my side. Feel free to correct me
>> here.
>>
>
> original patch was done in January. I forget why I took it out. It is
> clearly needed to release the dst.

yes, looking at it now the attach is needed to release the dst.  The original
commit msg said "The rcu lock is held through the creation of the
response, so the
 rtable/dst does not need to be attached to the skb and is passed to
rt_fill_info directly."

> Might as well undo the argument
> change to rt_fill_info since it is attached to the skb. Something like
> this (whitespace damaged on paste - stupid Mac):

agree, I can submit the patch

>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index f1f2e5aaa2d6..93cca12a8319 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2547,8 +2547,9 @@ EXPORT_SYMBOL_GPL(ip_route_output_flow);
>  /* called with rcu_read_lock held */
>  static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32
> table_id,
>                         struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
> -                       u32 seq, struct rtable *rt)
> +                       u32 seq)
>  {
> +       struct rtable *rt = skb_rtable(skb);
>         struct rtmsg *r;
>         struct nlmsghdr *nlh;
>         unsigned long expires = 0;
> @@ -2750,6 +2751,8 @@ static int inet_rtm_getroute(struct sk_buff
> *in_skb, struct nlmsghdr *nlh,
>         if (err)
>                 goto errout_free;
>
> +       skb_dst_set(skb, &rt->dst);
> +
>         if (rtm->rtm_flags & RTM_F_NOTIFY)
>                 rt->rt_flags |= RTCF_NOTIFY;
>
> @@ -2763,8 +2766,7 @@ static int inet_rtm_getroute(struct sk_buff
> *in_skb, struct nlmsghdr *nlh,
>                                     fl4.flowi4_tos, res.fi, 0);
>         else
>                 err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
> -                                  NETLINK_CB(in_skb).portid,
> nlh->nlmsg_seq,
> -                                  rt);
> +                                  NETLINK_CB(in_skb).portid,
> nlh->nlmsg_seq);
>         if (err < 0)
>                 goto errout_free;
>
>
> Seems to work for me.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-06-01  4:34 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-25 17:42 [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result Roopa Prabhu
2017-05-25 17:42 ` [PATCH net-next v2 1/8] net: ipv4: refactor __ip_route_output_key_hash Roopa Prabhu
2017-05-25 17:42 ` [PATCH net-next v2 2/8] net: ipv4: refactor ip_route_input_noref Roopa Prabhu
2017-05-25 17:42 ` [PATCH net-next v2 3/8] net: ipv4: Remove event arg to rt_fill_info Roopa Prabhu
2017-05-25 17:42 ` [PATCH net-next v2 4/8] net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup Roopa Prabhu
2017-05-31 20:11   ` John Fastabend
2017-05-31 21:48     ` David Ahern
2017-06-01  4:34       ` Roopa Prabhu
2017-05-25 17:42 ` [PATCH net-next v2 5/8] net: ipv4: Save trie prefix to fib lookup result Roopa Prabhu
2017-05-25 17:42 ` [PATCH net-next v2 6/8] net: ipv4: add new RTM_F_FIB_MATCH flag for use with RTM_GETROUTE Roopa Prabhu
2017-05-25 17:42 ` [PATCH net-next v2 7/8] net: ipv4: RTM_GETROUTE: return matched fib result when requested Roopa Prabhu
2017-05-25 17:42 ` [PATCH net-next v2 8/8] net: ipv6: " Roopa Prabhu
2017-05-26 18:18 ` [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result David Miller
2017-05-27  6:00   ` Roopa Prabhu
2017-05-27 14:02     ` David Ahern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.