linux-kernel.vger.kernel.org archive mirror
* [RFC] net: ipv6: return the first matched rt6_info for multicast packets in find_rr_leaf()
From: Rajasekar Kumar @ 2017-01-18 15:13 UTC
  To: davem, kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel

There is a performance issue when a large number of interfaces are
enabled with the VRRP protocol on two router nodes that are connected
to each other. When a VRRP hello is received (a multicast packet with
destination IP ff02::12), an rt6_info node is added to the fib6_node
of address ff02::12. This happens for each interface on which VRRP is
enabled, so with 2000 VRRP-enabled interfaces, 2000 rt6_info nodes are
added to the same fib6_node. Currently, find_rr_leaf() keeps looking
for a better match even after the first successful match on the
interface key. In this case, it walks 2000 nodes for every incoming or
outgoing packet, which is expensive and unnecessary. Matching the
rt6_info on the supplied interface should be sufficient: for multicast
packets, the first match is the interface match, and there cannot be
another match after it. So the first match should be returned for
multicast packets.

find_rr_leaf() tries to find the best available gateway, based mainly
on the interface match and the gateway's reachability. While this is
required for unicast packets, multicast packets need neither the
gateway's reachability status nor its Layer 2 address, since the
Layer 2 address is derived from the destination IP (the group address).
Matching the rt6_info on the supplied interface should be sufficient.

This fix helps in the scenario where multicast packets arrive on some
interfaces more frequently than on others, because the rt6_info
entries for those busier interfaces are added at the beginning of the
list. This case has been verified.

Signed-off-by: Rajasekar Kumar <sekraj@gmail.com>
---
	Changes from first mail:
		- Including netdev@vger.kernel.org, linux-kernel@vger.kernel.org to get review inputs/help
		- Changed the subject prefix to RFC from PATCH 
		- Amended commit message to include test scenario

 net/ipv6/route.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8417c41..609b543 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -703,6 +703,8 @@ static struct rt6_info *find_rr_leaf(struct fib6_node *fn,
 		}
 
 		match = find_match(rt, oif, strict, &mpri, match, do_rr);
+		if (match && ipv6_addr_is_multicast(&rt->rt6i_dst.addr))
+			return match;
 	}
 
 	for (rt = fn->leaf; rt && rt != rr_head; rt = rt->dst.rt6_next) {
@@ -712,13 +714,18 @@ static struct rt6_info *find_rr_leaf(struct fib6_node *fn,
 		}
 
 		match = find_match(rt, oif, strict, &mpri, match, do_rr);
+		if (match && ipv6_addr_is_multicast(&rt->rt6i_dst.addr))
+			return match;
 	}
 
 	if (match || !cont)
 		return match;
 
-	for (rt = cont; rt; rt = rt->dst.rt6_next)
+	for (rt = cont; rt; rt = rt->dst.rt6_next) {
 		match = find_match(rt, oif, strict, &mpri, match, do_rr);
+		if (match && ipv6_addr_is_multicast(&rt->rt6i_dst.addr))
+			return match;
+	}
 
 	return match;
 }
-- 
2.9.3

* Re: [RFC] net: ipv6: return the first matched rt6_info for multicast packets in find_rr_leaf()
From: David Miller @ 2017-01-20 16:58 UTC
  To: sekraj; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel

From: Rajasekar Kumar <sekraj@gmail.com>
Date: Wed, 18 Jan 2017 20:43:37 +0530

> There is a performance issue when large number of interfaces are
> enabled with VRRP protocol in 2 router nodes which are connected
> to each other. [...]

So the only thing different in each rt6_info in the list is the
interface, right?

Well, that's a part of the lookup key, multicast or not.  If the user
binds a socket to a specific interface, they want the route lookup to
return the rt6_info node with that device.

So I think your change introduces a regression, therefore another
solution will need to be found for your performance problem.

* Re: [RFC] net: ipv6: return the first matched rt6_info for multicast packets in find_rr_leaf()
From: Rajasekar Kumar @ 2017-01-24 16:10 UTC
  To: David Miller; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel

On Fri, Jan 20, 2017 at 11:58:04AM -0500, David Miller wrote:
> From: Rajasekar Kumar <sekraj@gmail.com>
> Date: Wed, 18 Jan 2017 20:43:37 +0530
> 
> > There is a performance issue when large number of interfaces are
> > enabled with VRRP protocol in 2 router nodes which are connected
> > to each other. [...]
> 
> So the only thing different in each rt6_info in the list is the
> interface, right?
> 
> Well, that's a part of the lookup key, multicast or not.  If the user
> binds a socket to a specific interface, they want the route lookup to
> return the rt6_info node with that device.
> 
> So I think your change introduces a regression, therefore another
> solution will need to be found for your performance problem.

Thanks for the review. Below are my thoughts.
The meaningful difference between the rt6_info entries is the
interface; that is true for multicast. For unicast routes, the
nexthop's reachability status and the route's preference values also
matter. For a unicast destination there will not be this many
rt6_info entries for the same prefix, even with multipath support,
unlike the multicast case described above. In general, for unicast
routes, finding a better route/nexthop match involves the gateway's
reachability status and the route's preference values in addition to
the interface match; that code is already present. Also, the IPv4
implementation does not appear to do any specific check for the
bind-to-device case in route lookup; it matches the interface as in
any other case, by walking all nexthops in fib_info within
check_leaf(). So I am not sure a fix is needed right now for
non-multicast.

Compared with the IPv4 implementation, multicast is handled specially
in several places on the IP packet reception/transmission paths. This
fix attempted to do something similar. If a specific problem is
identified for non-multicast, it can be addressed in the future.
Please share your thoughts.

Sorry, I did not understand your comment regarding a regression.
Does the fix already cause a regression?
