From: Hannes Frederic Sowa <hannes@stressinduktion.org>
To: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: netdev@vger.kernel.org, yoshfuji@linux-ipv6.org,
petrus.lt@gmail.com, davem@davemloft.net
Subject: Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF
Date: Wed, 10 Jul 2013 16:34:47 +0200
Message-ID: <20130710143447.GF15411@order.stressinduktion.org>
In-Reply-To: <51DD700B.4060504@6wind.com>
On Wed, Jul 10, 2013 at 04:30:35PM +0200, Nicolas Dichtel wrote:
> On 10/07/2013 15:49, Hannes Frederic Sowa wrote:
> >On Wed, Jul 10, 2013 at 03:17:41PM +0200, Hannes Frederic Sowa wrote:
> >>On Wed, Jul 10, 2013 at 02:08:42PM +0200, Nicolas Dichtel wrote:
> >>>On 10/07/2013 13:15, Hannes Frederic Sowa wrote:
> >>>>On Wed, Jul 10, 2013 at 09:54:58AM +0200, Nicolas Dichtel wrote:
> >>>>>On 09/07/2013 23:57, Hannes Frederic Sowa wrote:
> >>>>>>Are we sure we decrement all sibling's rt6i_nsiblings? Shouldn't we
> >>>>>>start iterating from fn->leaf? But this does not seem to cause it,
> >>>>>>because my trace does not report any calls to fib6_del_route.
> >>>>>Not sure I follow you, but all siblings are listed in rt6i_siblings,
> >>>>>so it must be enough.
> >>>>
> >>>>My hunch was to iterate over fn->leaf->rt_next and compare the metrics
> >>>>like we do when adding a new route. Then take that
> >>>>rt6_info->rt6i_siblings list_head to iterate over the remaining
> >>>>siblings. But I did not review that part carefully, need to check later.
> >>>>
> >>>>>>You could try reproduce it by having an interface autoconfigured with
> >>>>>>a default router with NUD_VALID neighbour. I then added an unused vlan
> >>>>>>interface (vid 100 in my case) and added the following ip addresses:
> >>>>>>
> >>>>>>ip -6 a a 2001:ffff::1/64 dev eth0.100
> >>>>>>ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31
> >>>>>>nexthop via 2001:ffff::32 nexthop via 2001:ffff::33
> >>>>>>
> >>>>>>(all nexthops should not be reachable)
> >>>>>>
> >>>>>>After starting a ping6 2000::1 the box should panic soon, after the
> >>>>>>first nexthop entry times out.
> >>>>>>
> >>>>>>Perhaps you could give me a hint?
> >>>>>I will run some tests with your patch. Will see.
> >>>>>
> >>>>>I assume you didn't reproduce this without your patch.
> >>>>
> >>>>Current kernel does not correctly select more specific routes, so these
> >>>>routes are not even tried and the logic should not be exercised.
> >>>>
> >>>>Ah, sorry, you should also compile your kernel without
> >>>>CONFIG_IPV6_ROUTER_PREF if you try to reproduce it.
> >>>I've done this.
> >>>
> >>>My conf (eth1 autoconfigured, I use net-next + your patch):
> >>>vconfig add eth1 100
> >>>ifconfig eth1.100 up
> >>>ip -6 a a 2001:ffff::1/64 dev eth1.100
> >>>ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31
> >>>nexthop via 2001:ffff::32 nexthop via 2001:ffff::33
> >>>ping6 2000::1
> >>
> >>Hm, I see. I suspect something with timing. I, too, use net-next and
> >>have one function, dump_route, added and sprinkled at some points.
> >>
> >>When I copied and pasted your calls I could not reproduce it. After a
> >>reboot, when just applying the commands from my history (which I did a
> >>lot faster), I got the panic again.
> >>
> >>I'll remove the dump_routes and recheck later.
> >
> >This patch on top
> >
> >--- a/net/ipv6/ip6_fib.c
> >+++ b/net/ipv6/ip6_fib.c
> >@@ -46,6 +46,16 @@
> > #define RT6_TRACE(x...) do { ; } while (0)
> > #endif
> >
> >+static void dump_route(struct rt6_info *rt, const char *prefix)
> >+{
> >+ u32 f = rt->rt6i_flags;
> >+ struct rt6key *k = &rt->rt6i_dst;
> >+ printk(KERN_INFO "%s: %p dst %pI6c plen %d gateway %pI6c, siblings %d, metric %d, expires %d gateway %d idev6 %p dev %p\n", prefix,
> >+ rt, &k->addr, k->plen, &rt->rt6i_gateway, rt->rt6i_nsiblings, rt->rt6i_metric, f&RTF_EXPIRES, f&RTF_GATEWAY, rt->rt6i_idev, rt->dst.dev);
> >+}
> >+
> >+
> >+
> > static struct kmem_cache * fib6_node_kmem __read_mostly;
> >
> > enum fib_walk_state_t
> >@@ -693,8 +703,11 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
> > */
> > if (rt->rt6i_flags & RTF_GATEWAY &&
> > !(rt->rt6i_flags & RTF_EXPIRES) &&
> >- !(iter->rt6i_flags & RTF_EXPIRES))
> >+ !(iter->rt6i_flags & RTF_EXPIRES)) {
> > rt->rt6i_nsiblings++;
> >+ dump_route(rt, "(rt)");
> >+ dump_route(iter, "(iter)");
> >+ }
> > }
> >
> > if (iter->rt6i_metric > rt->rt6i_metric)
> >@@ -718,6 +731,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
> > if (sibling->rt6i_metric == rt->rt6i_metric) {
> > list_add_tail(&rt->rt6i_siblings,
> > &sibling->rt6i_siblings);
> >+ dump_route(sibling, "(sibling)");
> > break;
> > }
> > sibling = sibling->dst.rt6_next;
> >@@ -730,6 +744,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
> > list_for_each_entry_safe(sibling, temp_sibling,
> > &rt->rt6i_siblings,
> > rt6i_siblings) {
> > sibling->rt6i_nsiblings++;
> >+ dump_route(sibling, "(sibling increment)");
> > BUG_ON(sibling->rt6i_nsiblings !=
> > rt->rt6i_nsiblings);
> > rt6i_nsiblings++;
> > }
> >
> >produces this panic:
> >
> >[ 59.234779] (rt): ffff880113242000 dst 2000::1 plen 128 gateway
> >2001:ffff::33, siblings 1, metric 0, expires 0 gateway 2 idev6
> >ffff8801131ab000 dev ffff88011816d000
> >[ 59.243794] (iter): ffff880117e7b680 dst 2000::1 plen 128 gateway
> >2001:ffff::31, siblings 2, metric 0, expires 0 gateway 2 idev6
> >ffff8801131ab000 dev ffff88011816d000
> >[ 59.261383] (rt): ffff880113242000 dst 2000::1 plen 128 gateway
> >2001:ffff::33, siblings 2, metric 0, expires 0 gateway 2 idev6
> >ffff8801131ab000 dev ffff88011816d000
> >[ 59.270030] (iter): ffff880117e7bb00 dst 2000::1 plen 128 gateway
> >2001:ffff::32, siblings 2, metric 0, expires 0 gateway 2 idev6
> >ffff8801131ab000 dev ffff88011816d000
> >[ 59.291933] (sibling): ffff880117e62480 dst 2000::1 plen 128 gateway
> >2001:ffff::30, siblings 2, metric 0, expires 4194304 gateway 2 idev6
> >ffff8801131ab000 dev ffff88011816d000
> >[ 59.306893] (sibling increment): ffff880117e62480 dst 2000::1 plen 128
> >gateway 2001:ffff::30, siblings 3, metric 0, expires 4194304 gateway 2
> >idev6 ffff8801131ab000 dev ffff88011816d000
> I don't have the same output:
> [ 97.945170] (rt): f1a02d80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::31, siblings 1, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.948117] (iter): f1a02e80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::30, siblings 0, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.951207] (sibling): f1a02e80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::30, siblings 0, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.954272] (sibling increment): f1a02e80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::30, siblings 1, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.957545] (rt): f1a02c80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::32, siblings 1, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.960376] (iter): f1a02e80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::30, siblings 1, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.961902] (rt): f1a02c80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::32, siblings 2, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.963095] (iter): f1a02d80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::31, siblings 1, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.964354] (sibling): f1a02e80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::30, siblings 1, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.965604] (sibling increment): f1a02e80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::30, siblings 2, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.966916] (sibling increment): f1a02d80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::31, siblings 2, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.968254] (rt): f1a02b80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::33, siblings 1, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.969467] (iter): f1a02e80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::30, siblings 2, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.970702] (rt): f1a02b80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::33, siblings 2, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.971895] (iter): f1a02d80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::31, siblings 2, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.973137] (rt): f1a02b80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::33, siblings 3, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.974331] (iter): f1a02c80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::32, siblings 2, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.975542] (sibling): f1a02e80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::30, siblings 2, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.976808] (sibling increment): f1a02e80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::30, siblings 3, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.978126] (sibling increment): f1a02d80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::31, siblings 3, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
> [ 97.979453] (sibling increment): f1a02c80 dst 2000:: plen 3 gateway
> 2001:660:1234:5678::32, siblings 3, metric 1024, expires 0 gateway 2 idev6
> f7507e00 dev f7793000
>
> Can you send me the output of:
> ip -6 r
> ip -6 a
>
Of course:

ip -6 r:

2001:700:1300:feed::134 via 2001:ffff::33 dev eth0.100 metric 0
    cache expires -7sec
2001:700:1300:feed::134 via fe80::5054:ff:fe82:e153 dev eth0 metric 0
    cache
2001:db8:ee8c:180::/64 dev eth0 proto kernel metric 256 expires 86353sec
2001:ffff::/64 dev eth0.100 proto kernel metric 256
2000::/3 via 2001:ffff::30 dev eth0.100 metric 1024
2000::/3 via 2001:ffff::31 dev eth0.100 metric 1024
2000::/3 via 2001:ffff::32 dev eth0.100 metric 1024
2000::/3 via 2001:ffff::33 dev eth0.100 metric 1024
fe80::/64 dev eth0 proto kernel metric 256
fe80::/64 dev eth0.100 proto kernel metric 256
default via fe80::5054:ff:fe82:e153 dev eth0 proto ra metric 1024 expires 1753sec

ip -6 a:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2001:db8:ee8c:180:5054:ff:fe53:c6c9/64 scope global dynamic
       valid_lft 86392sec preferred_lft 14392sec
    inet6 fe80::5054:ff:fe53:c6c9/64 scope link
       valid_lft forever preferred_lft forever
3: eth0.100@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet6 2001:ffff::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe53:c6c9/64 scope link
       valid_lft forever preferred_lft forever

Thanks,
Hannes
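
A note for reading the dump_route output quoted above: the debug patch
prints f&RTF_EXPIRES and f&RTF_GATEWAY with %d, so the "expires" and
"gateway" fields show the masked flag value rather than 0 or 1. The
sketch below is not part of the patch. It assumes the flag values from
include/uapi/linux/ipv6_route.h (RTF_GATEWAY 0x0002, RTF_EXPIRES
0x00400000), and the helper names are hypothetical, made up only for
illustration. Under that assumption, "expires 4194304" on the
2001:ffff::30 sibling would mean RTF_EXPIRES is set on that route, while
"expires 0" on the (rt) and (iter) entries would mean it is not.

```c
#include <stdint.h>

/* Assumed values, as in include/uapi/linux/ipv6_route.h. */
#define RTF_GATEWAY 0x0002u
#define RTF_EXPIRES 0x00400000u

/*
 * dump_route() prints (flags & FLAG) with %d, so a set flag appears in
 * the log as the flag's numeric value. These hypothetical helpers map a
 * printed field back to a plain 0/1 answer.
 */
int printed_expires_is_set(uint32_t field)
{
	return (field & RTF_EXPIRES) != 0;
}

int printed_gateway_is_set(uint32_t field)
{
	return (field & RTF_GATEWAY) != 0;
}
```

For example, passing the logged value 4194304 to printed_expires_is_set()
and the logged value 2 to printed_gateway_is_set() both yield 1.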