* [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF @ 2013-07-07 17:30 Hannes Frederic Sowa 2013-07-09 21:57 ` Hannes Frederic Sowa 0 siblings, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-07 17:30 UTC (permalink / raw) To: netdev; +Cc: yoshfuji, petrus.lt, davem Hello! This patch fixes the last (I currently know of) known problem with nexthop selection and rt->n removal for me. I did some preliminary testing and it looks good so far. I wanted to post it here early to gather further feedback, reviews or even testing. I am not sure if I overlooked something, yet. Things I have not checked, yet: a) switching a router to host (I don't know if we handle this correctly, currently) b) router disappearing (but should be fine) c) ... No signed-off by me, for now. Thanks, Hannes -- >8 -- This is a follow-up patch to 3630d40067a21d4dfbadc6002bb469ce26ac5d52 ("ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available"). Since the removal of rt->n in rt6_info we can end up with a dst == NULL in rt6_check_neigh. In case the kernel is not compiled with CONFIG_IPV6_ROUTER_PREF we should also select a route with unkown NUD state but we must not avoid doing round robin selection on routes with the same target. So introduce and pass down a boolean ``do_rr'' to indicate when we should update rt->rr_ptr. As soon as no route is valid we do backtracking and do a lookup on a higher level in the fib trie. To hold correct state on the NUD selection we need to create a neighbour entry as soon as we try to validate a nexthop. I changed the return value of rt6_check_neigh to: 1 in case of the dst entry validated -1 in case of we had no dst_entry and we need to do rr now -2 in case a we had a dst_entry and it did not validate In case of CONFIG_IPV6_ROUTER_PREF, rt6_probe does not allocate an neighbour entry (!CONFIG_IPV6_ROUTER_PREF: rt6_probe is a nop). Because of this, we have to create a neighbour entry on nexthop validation to track earlier validation errors. We recheck NUD state here to shortcurcuit NUD_NOARP neighbours. This seems to be the least complex fix for stable and net. I'll introduce a new route lookup flag 'idempotent' as soon as next opens to not let ip route get trigger active NUD validation if CONFIG_IPV6_ROUTER_PREF is enabled. It also seems advantageous to make CONFIG_IPV6_ROUTER_PREF a runtime switch and just select the default operation on compile-time. Reported-by: Pierre Emeriaud <petrus.lt@gmail.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: David Miller <davem@davemloft.net> --- net/ipv6/route.c | 67 +++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 44 insertions(+), 23 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index bd5fd70..6a87ced 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -531,28 +531,39 @@ static inline int rt6_check_dev(struct rt6_info *rt, int oif) return 0; } -static inline bool rt6_check_neigh(struct rt6_info *rt) +static inline int rt6_check_neigh(struct rt6_info *rt) { struct neighbour *neigh; - bool ret = false; + int ret = -2; if (rt->rt6i_flags & RTF_NONEXTHOP || !(rt->rt6i_flags & RTF_GATEWAY)) - return true; + return 1; rcu_read_lock_bh(); neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway); if (neigh) { read_lock(&neigh->lock); if (neigh->nud_state & NUD_VALID) - ret = true; + ret = 1; #ifdef CONFIG_IPV6_ROUTER_PREF else if (!(neigh->nud_state & NUD_FAILED)) - ret = true; + ret = 1; #endif read_unlock(&neigh->lock); - } else if (IS_ENABLED(CONFIG_IPV6_ROUTER_PREF)) { - ret = true; + } else { + /* track state for next check */ + neigh = __neigh_create(&nd_tbl, &rt->rt6i_gateway, rt->dst.dev, + false); + if (!IS_ERR(neigh)) { + if (neigh->nud_state & NUD_VALID) + ret = 1; + else + ret = IS_ENABLED(CONFIG_IPV6_ROUTER_PREF) ? + 1 : -1; + } else { + ret = -2; + } } rcu_read_unlock_bh(); @@ -566,43 +577,52 @@ static int rt6_score_route(struct rt6_info *rt, int oif, m = rt6_check_dev(rt, oif); if (!m && (strict & RT6_LOOKUP_F_IFACE)) - return -1; + return -2; #ifdef CONFIG_IPV6_ROUTER_PREF m |= IPV6_DECODE_PREF(IPV6_EXTRACT_PREF(rt->rt6i_flags)) << 2; #endif - if (!rt6_check_neigh(rt) && (strict & RT6_LOOKUP_F_REACHABLE)) - return -1; + if (strict & RT6_LOOKUP_F_REACHABLE) { + int n = rt6_check_neigh(rt); + if (n < 0) + return n; + } return m; } static struct rt6_info *find_match(struct rt6_info *rt, int oif, int strict, - int *mpri, struct rt6_info *match) + int *mpri, struct rt6_info *match, + bool *do_rr) { int m; + bool match_do_rr = false; if (rt6_check_expired(rt)) goto out; m = rt6_score_route(rt, oif, strict); - if (m < 0) + if (m == -1 && !IS_ENABLED(CONFIG_IPV6_ROUTER_PREF)) { + match_do_rr = true; + m = 0; /* lowest valid score */ + } else if (m < 0) { goto out; + } + + if (strict & RT6_LOOKUP_F_REACHABLE) + rt6_probe(rt); if (m > *mpri) { - if (strict & RT6_LOOKUP_F_REACHABLE) - rt6_probe(match); + *do_rr = match_do_rr; *mpri = m; match = rt; - } else if (strict & RT6_LOOKUP_F_REACHABLE) { - rt6_probe(rt); } - out: return match; } static struct rt6_info *find_rr_leaf(struct fib6_node *fn, struct rt6_info *rr_head, - u32 metric, int oif, int strict) + u32 metric, int oif, int strict, + bool *do_rr) { struct rt6_info *rt, *match; int mpri = -1; @@ -610,10 +630,10 @@ static struct rt6_info *find_rr_leaf(struct fib6_node *fn, match = NULL; for (rt = rr_head; rt && rt->rt6i_metric == metric; rt = rt->dst.rt6_next) - match = find_match(rt, oif, strict, &mpri, match); + match = find_match(rt, oif, strict, &mpri, match, do_rr); for (rt = fn->leaf; rt && rt != rr_head && rt->rt6i_metric == metric; rt = rt->dst.rt6_next) - match = find_match(rt, oif, strict, &mpri, match); + match = find_match(rt, oif, strict, &mpri, match, do_rr); return match; } @@ -622,15 +642,16 @@ static struct rt6_info *rt6_select(struct fib6_node *fn, int oif, int strict) { struct rt6_info *match, *rt0; struct net *net; + bool do_rr = false; rt0 = fn->rr_ptr; if (!rt0) fn->rr_ptr = rt0 = fn->leaf; - match = find_rr_leaf(fn, rt0, rt0->rt6i_metric, oif, strict); + match = find_rr_leaf(fn, rt0, rt0->rt6i_metric, oif, strict, + &do_rr); - if (!match && - (strict & RT6_LOOKUP_F_REACHABLE)) { + if (do_rr) { struct rt6_info *next = rt0->dst.rt6_next; /* no entries matched; do round-robin */ -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-07 17:30 [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF Hannes Frederic Sowa @ 2013-07-09 21:57 ` Hannes Frederic Sowa 2013-07-10 7:54 ` Nicolas Dichtel 0 siblings, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-09 21:57 UTC (permalink / raw) To: nicolas.dichtel; +Cc: netdev, yoshfuji, petrus.lt, davem Hello Nicolas! I am currently trying to fix the nexthop selection for kernels compiled without support for CONFIG_IPV6_ROUTER_PREF. I attached my current patch below. While testing I got the following kernel panic in ecmp code: [ 80.144667] ------------[ cut here ]------------ [ 80.145172] kernel BUG at net/ipv6/ip6_fib.c:733! [ 80.145172] invalid opcode: 0000 [#1] SMP [ 80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd soundcore i2c_piix4 i2c_core virtio_net virtio_blk [ 80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118 [ 80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: ffff880118770000 [ 80.145172] RIP: 0010:[<ffffffff815f3b5d>] [<ffffffff815f3b5d>] fib6_add+0x75d/0x830 [ 80.145172] RSP: 0018:ffff880118771798 EFLAGS: 00010202 [ 80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480 [ 80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738 [ 80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001 [ 80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680 [ 80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90 [ 80.145172] FS: 00007f02b5127740(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000 [ 80.145172] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0 [ 80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 80.145172] Stack: [ 80.145172] 0000000000000001 ffff880100000000 ffff880100000000 ffff880117903280 [ 80.145172] 0000000000000000 ffff880119a4cf00 0000000000000400 00000000000007fa [ 80.145172] 0000000000000000 0000000000000000 0000000000000000 ffff88011350f680 [ 80.145172] Call Trace: [ 80.145172] [<ffffffff815eeceb>] ? rt6_bind_peer+0x4b/0x90 [ 80.145172] [<ffffffff815ed985>] __ip6_ins_rt+0x45/0x70 [ 80.145172] [<ffffffff815eee35>] ip6_ins_rt+0x35/0x40 [ 80.145172] [<ffffffff815ef1e4>] ip6_pol_route.isra.44+0x3a4/0x4b0 [ 80.145172] [<ffffffff815ef34a>] ip6_pol_route_output+0x2a/0x30 [ 80.145172] [<ffffffff81616077>] fib6_rule_action+0xd7/0x210 [ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 [ 80.145172] [<ffffffff81553026>] fib_rules_lookup+0xc6/0x140 [ 80.145172] [<ffffffff81616374>] fib6_rule_lookup+0x44/0x80 [ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 [ 80.145172] [<ffffffff815edea3>] ip6_route_output+0x73/0xb0 [ 80.145172] [<ffffffff815dfdf3>] ip6_dst_lookup_tail+0x2c3/0x2e0 [ 80.145172] [<ffffffff813007b1>] ? list_del+0x11/0x40 [ 80.145172] [<ffffffff81082a4c>] ? remove_wait_queue+0x3c/0x50 [ 80.145172] [<ffffffff815dfe4d>] ip6_dst_lookup_flow+0x3d/0xa0 [ 80.145172] [<ffffffff815fda77>] rawv6_sendmsg+0x267/0xc20 [ 80.145172] [<ffffffff815a8a83>] inet_sendmsg+0x63/0xb0 [ 80.145172] [<ffffffff8128eb93>] ? selinux_socket_sendmsg+0x23/0x30 [ 80.145172] [<ffffffff815218d6>] sock_sendmsg+0xa6/0xd0 [ 80.145172] [<ffffffff81524a68>] SYSC_sendto+0x128/0x180 [ 80.145172] [<ffffffff8109825c>] ? update_curr+0xec/0x170 [ 80.145172] [<ffffffff81041d09>] ? kvm_clock_get_cycles+0x9/0x10 [ 80.145172] [<ffffffff810afd1e>] ? __getnstimeofday+0x3e/0xd0 [ 80.145172] [<ffffffff8152509e>] SyS_sendto+0xe/0x10 [ 80.145172] [<ffffffff8164efd9>] system_call_fastpath+0x16/0x1b [ 80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff <0f> 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53 [ 80.145172] RIP [<ffffffff815f3b5d>] fib6_add+0x75d/0x830 [ 80.145172] RSP <ffff880118771798> [ 80.387413] ---[ end trace 02f20b7a8b81ed95 ]--- [ 80.390154] Kernel panic - not syncing: Fatal exception in interrupt The relevant code is: 725 /* For each sibling in the list, increment the counter of 726 * siblings. BUG() if counters does not match, list of siblings 727 * is broken! 728 */ 729 rt6i_nsiblings = 0; 730 list_for_each_entry_safe(sibling, temp_sibling, 731 &rt->rt6i_siblings, rt6i_siblings) { 732 sibling->rt6i_nsiblings++; 733 BUG_ON(sibling->rt6i_nsiblings != rt->rt6i_nsiblings); 734 rt6i_nsiblings++; 735 } 736 BUG_ON(rt6i_nsiblings != rt->rt6i_nsiblings); 737 } Currently, I don't know if I my patch is to blame or something in the ecmp logic. Another thing I just noticed is: 1240 /* Remove this entry from other siblings */ 1241 if (rt->rt6i_nsiblings) { 1242 struct rt6_info *sibling, *next_sibling; 1243 1244 list_for_each_entry_safe(sibling, next_sibling, 1245 &rt->rt6i_siblings, rt6i_siblings) 1246 sibling->rt6i_nsiblings--; 1247 rt->rt6i_nsiblings = 0; 1248 list_del_init(&rt->rt6i_siblings); 1249 } Are we sure we decrement all sibling's rt6i_nsiblings? Shouldn't we start iterating from fn->leaf? But this does not seem to cause it, because my trace does not report any calls to fib6_del_route. You could try reproduce it by having an interface autoconfigured with a default router with NUD_VALID neighbour. I then added an unused vlan interface (vid 100 in my case) and added the following ip addresses: ip -6 a a 2001:ffff::1/64 dev eth0.100 ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31 nexthop via 2001:ffff::32 nexthop via 2001:ffff::33 (all nexthops should not be reachable) After starting a ping6 2000::1 the box should panic soon, after the first nexthop entry times out. Perhaps you could give me a hint? Thanks a lot, Hannes [PATCH RFC] ipv6: fix route selection if kernel not compiled with CONFIG_IPV6_ROUTER_PREF This is a follow-up patch to 3630d40067a21d4dfbadc6002bb469ce26ac5d52 ("ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available"). Since the removal of rt->n in rt6_info we can end up with a dst == NULL in rt6_check_neigh. In case the kernel is not compiled with CONFIG_IPV6_ROUTER_PREF we should also select a route with unkown NUD state but we must not avoid doing round robin selection on routes with the same target. So introduce and pass down a boolean ``do_rr'' to indicate when we should update rt->rr_ptr. As soon as no route is valid we do backtracking and do a lookup on a higher level in the fib trie. To hold correct state on the NUD selection we need to create a neighbour entry as soon as we tried to validate a nexthop. I changed the return value of rt6_check_neigh to: 1 in case of the dst entry validated -1 in case of we had no dst_entry and we need to do rr now -2 in case a we had a dst_entry and it did not validate In case of CONFIG_IPV6_ROUTER_PREF, rt6_probe does not allocate an neighbour entry (!CONFIG_IPV6_ROUTER_PREF: rt6_probe is a nop). Because of this, we have to create a neighbour entry on nexthop validation to track earlier validation errors. We recheck NUD state here to shortcurcuit NUD_NOARP neighbours. This seems to be the least complex fix for stable and net. I'll introduce a new route lookup flag 'idempotent' as soon as next opens to not let ip route get trigger active NUD validation if CONFIG_IPV6_ROUTER_PREF is enabled. Currently we trigger active NUD validation if compiled with CONFIG_IPV6_ROUTER_PREF. It also seems advantageous to make CONFIG_IPV6_ROUTER_PREF a runtime switch and just select the default operation on compile-time. v2: a) improved rt6_check_neigh logic and documented return values Reported-by: Pierre Emeriaud <petrus.lt@gmail.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> --- net/ipv6/route.c | 62 +++++++++++++++++++++++++++++++++++--------------------- 1 file changed, 39 insertions(+), 23 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index bd5fd70..c5d9e68 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -531,28 +531,34 @@ static inline int rt6_check_dev(struct rt6_info *rt, int oif) return 0; } -static inline bool rt6_check_neigh(struct rt6_info *rt) +/* This function checks if a neighbour is reachable for routing + * purposes. It returns -2 in case the neighbour should not get + * selected as a viable router, -1 in case it should get selected with + * lowest score and afterwards trying roundrobin. 1 indicates a + * successfull verification. + */ +static inline int rt6_check_neigh(struct rt6_info *rt) { struct neighbour *neigh; - bool ret = false; + int ret = -2; if (rt->rt6i_flags & RTF_NONEXTHOP || !(rt->rt6i_flags & RTF_GATEWAY)) - return true; + return 1; rcu_read_lock_bh(); neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway); if (neigh) { read_lock(&neigh->lock); if (neigh->nud_state & NUD_VALID) - ret = true; + ret = 1; #ifdef CONFIG_IPV6_ROUTER_PREF else if (!(neigh->nud_state & NUD_FAILED)) - ret = true; + ret = 1; #endif read_unlock(&neigh->lock); - } else if (IS_ENABLED(CONFIG_IPV6_ROUTER_PREF)) { - ret = true; + } else { + ret = IS_ENABLED(CONFIG_IPV6_ROUTER_PREF) ? 1 : -1; } rcu_read_unlock_bh(); @@ -566,43 +572,52 @@ static int rt6_score_route(struct rt6_info *rt, int oif, m = rt6_check_dev(rt, oif); if (!m && (strict & RT6_LOOKUP_F_IFACE)) - return -1; + return -2; #ifdef CONFIG_IPV6_ROUTER_PREF m |= IPV6_DECODE_PREF(IPV6_EXTRACT_PREF(rt->rt6i_flags)) << 2; #endif - if (!rt6_check_neigh(rt) && (strict & RT6_LOOKUP_F_REACHABLE)) - return -1; + if (strict & RT6_LOOKUP_F_REACHABLE) { + int n = rt6_check_neigh(rt); + if (n < 0) + return n; + } return m; } static struct rt6_info *find_match(struct rt6_info *rt, int oif, int strict, - int *mpri, struct rt6_info *match) + int *mpri, struct rt6_info *match, + bool *do_rr) { int m; + bool match_do_rr = false; if (rt6_check_expired(rt)) goto out; m = rt6_score_route(rt, oif, strict); - if (m < 0) + if (m == -1 && !IS_ENABLED(CONFIG_IPV6_ROUTER_PREF)) { + match_do_rr = true; + m = 0; /* lowest valid score */ + } else if (m < 0) { goto out; + } + + if (strict & RT6_LOOKUP_F_REACHABLE) + rt6_probe(rt); if (m > *mpri) { - if (strict & RT6_LOOKUP_F_REACHABLE) - rt6_probe(match); + *do_rr = match_do_rr; *mpri = m; match = rt; - } else if (strict & RT6_LOOKUP_F_REACHABLE) { - rt6_probe(rt); } - out: return match; } static struct rt6_info *find_rr_leaf(struct fib6_node *fn, struct rt6_info *rr_head, - u32 metric, int oif, int strict) + u32 metric, int oif, int strict, + bool *do_rr) { struct rt6_info *rt, *match; int mpri = -1; @@ -610,10 +625,10 @@ static struct rt6_info *find_rr_leaf(struct fib6_node *fn, match = NULL; for (rt = rr_head; rt && rt->rt6i_metric == metric; rt = rt->dst.rt6_next) - match = find_match(rt, oif, strict, &mpri, match); + match = find_match(rt, oif, strict, &mpri, match, do_rr); for (rt = fn->leaf; rt && rt != rr_head && rt->rt6i_metric == metric; rt = rt->dst.rt6_next) - match = find_match(rt, oif, strict, &mpri, match); + match = find_match(rt, oif, strict, &mpri, match, do_rr); return match; } @@ -622,15 +637,16 @@ static struct rt6_info *rt6_select(struct fib6_node *fn, int oif, int strict) { struct rt6_info *match, *rt0; struct net *net; + bool do_rr = false; rt0 = fn->rr_ptr; if (!rt0) fn->rr_ptr = rt0 = fn->leaf; - match = find_rr_leaf(fn, rt0, rt0->rt6i_metric, oif, strict); + match = find_rr_leaf(fn, rt0, rt0->rt6i_metric, oif, strict, + &do_rr); - if (!match && - (strict & RT6_LOOKUP_F_REACHABLE)) { + if (do_rr) { struct rt6_info *next = rt0->dst.rt6_next; /* no entries matched; do round-robin */ -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-09 21:57 ` Hannes Frederic Sowa @ 2013-07-10 7:54 ` Nicolas Dichtel 2013-07-10 9:28 ` Nicolas Dichtel 2013-07-10 11:15 ` Hannes Frederic Sowa 0 siblings, 2 replies; 30+ messages in thread From: Nicolas Dichtel @ 2013-07-10 7:54 UTC (permalink / raw) To: netdev, yoshfuji, petrus.lt, davem Le 09/07/2013 23:57, Hannes Frederic Sowa a écrit : > Hello Nicolas! > > I am currently trying to fix the nexthop selection for kernels compiled > without support for CONFIG_IPV6_ROUTER_PREF. I attached my current patch > below. While testing I got the following kernel panic in ecmp code: > > [ 80.144667] ------------[ cut here ]------------ > [ 80.145172] kernel BUG at net/ipv6/ip6_fib.c:733! > [ 80.145172] invalid opcode: 0000 [#1] SMP > [ 80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd soundcore i2c_piix4 i2c_core virtio_net virtio_blk > [ 80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118 > [ 80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > [ 80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: ffff880118770000 > [ 80.145172] RIP: 0010:[<ffffffff815f3b5d>] [<ffffffff815f3b5d>] fib6_add+0x75d/0x830 > [ 80.145172] RSP: 0018:ffff880118771798 EFLAGS: 00010202 > [ 80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480 > [ 80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738 > [ 80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001 > [ 80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680 > [ 80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90 > [ 80.145172] FS: 00007f02b5127740(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000 > [ 80.145172] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0 > [ 80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 80.145172] Stack: > [ 80.145172] 0000000000000001 ffff880100000000 ffff880100000000 ffff880117903280 > [ 80.145172] 0000000000000000 ffff880119a4cf00 0000000000000400 00000000000007fa > [ 80.145172] 0000000000000000 0000000000000000 0000000000000000 ffff88011350f680 > [ 80.145172] Call Trace: > [ 80.145172] [<ffffffff815eeceb>] ? rt6_bind_peer+0x4b/0x90 > [ 80.145172] [<ffffffff815ed985>] __ip6_ins_rt+0x45/0x70 > [ 80.145172] [<ffffffff815eee35>] ip6_ins_rt+0x35/0x40 > [ 80.145172] [<ffffffff815ef1e4>] ip6_pol_route.isra.44+0x3a4/0x4b0 > [ 80.145172] [<ffffffff815ef34a>] ip6_pol_route_output+0x2a/0x30 > [ 80.145172] [<ffffffff81616077>] fib6_rule_action+0xd7/0x210 > [ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 > [ 80.145172] [<ffffffff81553026>] fib_rules_lookup+0xc6/0x140 > [ 80.145172] [<ffffffff81616374>] fib6_rule_lookup+0x44/0x80 > [ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 > [ 80.145172] [<ffffffff815edea3>] ip6_route_output+0x73/0xb0 > [ 80.145172] [<ffffffff815dfdf3>] ip6_dst_lookup_tail+0x2c3/0x2e0 > [ 80.145172] [<ffffffff813007b1>] ? list_del+0x11/0x40 > [ 80.145172] [<ffffffff81082a4c>] ? remove_wait_queue+0x3c/0x50 > [ 80.145172] [<ffffffff815dfe4d>] ip6_dst_lookup_flow+0x3d/0xa0 > [ 80.145172] [<ffffffff815fda77>] rawv6_sendmsg+0x267/0xc20 > [ 80.145172] [<ffffffff815a8a83>] inet_sendmsg+0x63/0xb0 > [ 80.145172] [<ffffffff8128eb93>] ? selinux_socket_sendmsg+0x23/0x30 > [ 80.145172] [<ffffffff815218d6>] sock_sendmsg+0xa6/0xd0 > [ 80.145172] [<ffffffff81524a68>] SYSC_sendto+0x128/0x180 > [ 80.145172] [<ffffffff8109825c>] ? update_curr+0xec/0x170 > [ 80.145172] [<ffffffff81041d09>] ? kvm_clock_get_cycles+0x9/0x10 > [ 80.145172] [<ffffffff810afd1e>] ? __getnstimeofday+0x3e/0xd0 > [ 80.145172] [<ffffffff8152509e>] SyS_sendto+0xe/0x10 > [ 80.145172] [<ffffffff8164efd9>] system_call_fastpath+0x16/0x1b > [ 80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff <0f> 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53 > [ 80.145172] RIP [<ffffffff815f3b5d>] fib6_add+0x75d/0x830 > [ 80.145172] RSP <ffff880118771798> > [ 80.387413] ---[ end trace 02f20b7a8b81ed95 ]--- > [ 80.390154] Kernel panic - not syncing: Fatal exception in interrupt > > The relevant code is: > > 725 /* For each sibling in the list, increment the counter of > 726 * siblings. BUG() if counters does not match, list of siblings > 727 * is broken! > 728 */ > 729 rt6i_nsiblings = 0; > 730 list_for_each_entry_safe(sibling, temp_sibling, > 731 &rt->rt6i_siblings, rt6i_siblings) { > 732 sibling->rt6i_nsiblings++; > 733 BUG_ON(sibling->rt6i_nsiblings != rt->rt6i_nsiblings); > 734 rt6i_nsiblings++; > 735 } > 736 BUG_ON(rt6i_nsiblings != rt->rt6i_nsiblings); > 737 } > > Currently, I don't know if I my patch is to blame or something in the ecmp logic. > > Another thing I just noticed is: > > 1240 /* Remove this entry from other siblings */ > 1241 if (rt->rt6i_nsiblings) { > 1242 struct rt6_info *sibling, *next_sibling; > 1243 > 1244 list_for_each_entry_safe(sibling, next_sibling, > 1245 &rt->rt6i_siblings, rt6i_siblings) > 1246 sibling->rt6i_nsiblings--; > 1247 rt->rt6i_nsiblings = 0; > 1248 list_del_init(&rt->rt6i_siblings); > 1249 } > > Are we sure we decrement all sibling's rt6i_nsiblings? Shouldn't we > start iterating from fn->leaf? But this does not seem to cause it, > because my trace does not report any calls to fib6_del_route. Note sure to follow you, but all siblings are listed in rt6i_siblings, so it must be enough. > > You could try reproduce it by having an interface autoconfigured with > a default router with NUD_VALID neighbour. I then added an unused vlan > interface (vid 100 in my case) and added the following ip addresses: > > ip -6 a a 2001:ffff::1/64 dev eth0.100 > ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31 nexthop via 2001:ffff::32 nexthop via 2001:ffff::33 > > (all nexthops should not be reachable) > > After starting a ping6 2000::1 the box should panic soon, after the > first nexthop entry times out. > > Perhaps you could give me a hint? I will run some tests with your patch. Will see. I assume you didn't reproduce this without your patch. Regards, Nicolas > > Thanks a lot, > > Hannes > > > [PATCH RFC] ipv6: fix route selection if kernel not compiled with CONFIG_IPV6_ROUTER_PREF > > This is a follow-up patch to 3630d40067a21d4dfbadc6002bb469ce26ac5d52 > ("ipv6: rt6_check_neigh should successfully verify neigh if no NUD > information are available"). > > Since the removal of rt->n in rt6_info we can end up with a dst == > NULL in rt6_check_neigh. In case the kernel is not compiled with > CONFIG_IPV6_ROUTER_PREF we should also select a route with unkown > NUD state but we must not avoid doing round robin selection on routes > with the same target. So introduce and pass down a boolean ``do_rr'' to > indicate when we should update rt->rr_ptr. As soon as no route is valid > we do backtracking and do a lookup on a higher level in the fib trie. > > To hold correct state on the NUD selection we need to create a neighbour > entry as soon as we tried to validate a nexthop. > > I changed the return value of rt6_check_neigh to: > 1 in case of the dst entry validated > -1 in case of we had no dst_entry and we need to do rr now > -2 in case a we had a dst_entry and it did not validate > > In case of CONFIG_IPV6_ROUTER_PREF, rt6_probe does not allocate an > neighbour entry (!CONFIG_IPV6_ROUTER_PREF: rt6_probe is a nop). Because > of this, we have to create a neighbour entry on nexthop validation to > track earlier validation errors. We recheck NUD state here to shortcurcuit > NUD_NOARP neighbours. > > This seems to be the least complex fix for stable and net. I'll introduce > a new route lookup flag 'idempotent' as soon as next opens to not let > ip route get trigger active NUD validation if CONFIG_IPV6_ROUTER_PREF > is enabled. Currently we trigger active NUD validation if compiled with > CONFIG_IPV6_ROUTER_PREF. > > It also seems advantageous to make CONFIG_IPV6_ROUTER_PREF a runtime > switch and just select the default operation on compile-time. > > v2: > a) improved rt6_check_neigh logic and documented return values > > Reported-by: Pierre Emeriaud <petrus.lt@gmail.com> > Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> > --- > net/ipv6/route.c | 62 +++++++++++++++++++++++++++++++++++--------------------- > 1 file changed, 39 insertions(+), 23 deletions(-) > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > index bd5fd70..c5d9e68 100644 > --- a/net/ipv6/route.c > +++ b/net/ipv6/route.c > @@ -531,28 +531,34 @@ static inline int rt6_check_dev(struct rt6_info *rt, int oif) > return 0; > } > > -static inline bool rt6_check_neigh(struct rt6_info *rt) > +/* This function checks if a neighbour is reachable for routing > + * purposes. It returns -2 in case the neighbour should not get > + * selected as a viable router, -1 in case it should get selected with > + * lowest score and afterwards trying roundrobin. 1 indicates a > + * successfull verification. > + */ > +static inline int rt6_check_neigh(struct rt6_info *rt) > { > struct neighbour *neigh; > - bool ret = false; > + int ret = -2; > > if (rt->rt6i_flags & RTF_NONEXTHOP || > !(rt->rt6i_flags & RTF_GATEWAY)) > - return true; > + return 1; > > rcu_read_lock_bh(); > neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway); > if (neigh) { > read_lock(&neigh->lock); > if (neigh->nud_state & NUD_VALID) > - ret = true; > + ret = 1; > #ifdef CONFIG_IPV6_ROUTER_PREF > else if (!(neigh->nud_state & NUD_FAILED)) > - ret = true; > + ret = 1; > #endif > read_unlock(&neigh->lock); > - } else if (IS_ENABLED(CONFIG_IPV6_ROUTER_PREF)) { > - ret = true; > + } else { > + ret = IS_ENABLED(CONFIG_IPV6_ROUTER_PREF) ? 1 : -1; > } > rcu_read_unlock_bh(); > > @@ -566,43 +572,52 @@ static int rt6_score_route(struct rt6_info *rt, int oif, > > m = rt6_check_dev(rt, oif); > if (!m && (strict & RT6_LOOKUP_F_IFACE)) > - return -1; > + return -2; > #ifdef CONFIG_IPV6_ROUTER_PREF > m |= IPV6_DECODE_PREF(IPV6_EXTRACT_PREF(rt->rt6i_flags)) << 2; > #endif > - if (!rt6_check_neigh(rt) && (strict & RT6_LOOKUP_F_REACHABLE)) > - return -1; > + if (strict & RT6_LOOKUP_F_REACHABLE) { > + int n = rt6_check_neigh(rt); > + if (n < 0) > + return n; > + } > return m; > } > > static struct rt6_info *find_match(struct rt6_info *rt, int oif, int strict, > - int *mpri, struct rt6_info *match) > + int *mpri, struct rt6_info *match, > + bool *do_rr) > { > int m; > + bool match_do_rr = false; > > if (rt6_check_expired(rt)) > goto out; > > m = rt6_score_route(rt, oif, strict); > - if (m < 0) > + if (m == -1 && !IS_ENABLED(CONFIG_IPV6_ROUTER_PREF)) { > + match_do_rr = true; > + m = 0; /* lowest valid score */ > + } else if (m < 0) { > goto out; > + } > + > + if (strict & RT6_LOOKUP_F_REACHABLE) > + rt6_probe(rt); > > if (m > *mpri) { > - if (strict & RT6_LOOKUP_F_REACHABLE) > - rt6_probe(match); > + *do_rr = match_do_rr; > *mpri = m; > match = rt; > - } else if (strict & RT6_LOOKUP_F_REACHABLE) { > - rt6_probe(rt); > } > - > out: > return match; > } > > static struct rt6_info *find_rr_leaf(struct fib6_node *fn, > struct rt6_info *rr_head, > - u32 metric, int oif, int strict) > + u32 metric, int oif, int strict, > + bool *do_rr) > { > struct rt6_info *rt, *match; > int mpri = -1; > @@ -610,10 +625,10 @@ static struct rt6_info *find_rr_leaf(struct fib6_node *fn, > match = NULL; > for (rt = rr_head; rt && rt->rt6i_metric == metric; > rt = rt->dst.rt6_next) > - match = find_match(rt, oif, strict, &mpri, match); > + match = find_match(rt, oif, strict, &mpri, match, do_rr); > for (rt = fn->leaf; rt && rt != rr_head && rt->rt6i_metric == metric; > rt = rt->dst.rt6_next) > - match = find_match(rt, oif, strict, &mpri, match); > + match = find_match(rt, oif, strict, &mpri, match, do_rr); > > return match; > } > @@ -622,15 +637,16 @@ static struct rt6_info *rt6_select(struct fib6_node *fn, int oif, int strict) > { > struct rt6_info *match, *rt0; > struct net *net; > + bool do_rr = false; > > rt0 = fn->rr_ptr; > if (!rt0) > fn->rr_ptr = rt0 = fn->leaf; > > - match = find_rr_leaf(fn, rt0, rt0->rt6i_metric, oif, strict); > + match = find_rr_leaf(fn, rt0, rt0->rt6i_metric, oif, strict, > + &do_rr); > > - if (!match && > - (strict & RT6_LOOKUP_F_REACHABLE)) { > + if (do_rr) { > struct rt6_info *next = rt0->dst.rt6_next; > > /* no entries matched; do round-robin */ > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 7:54 ` Nicolas Dichtel @ 2013-07-10 9:28 ` Nicolas Dichtel 2013-07-10 10:53 ` Hannes Frederic Sowa 2013-07-10 11:15 ` Hannes Frederic Sowa 1 sibling, 1 reply; 30+ messages in thread From: Nicolas Dichtel @ 2013-07-10 9:28 UTC (permalink / raw) To: hannes; +Cc: netdev, yoshfuji, petrus.lt, davem Le 10/07/2013 09:54, Nicolas Dichtel a écrit : > Le 09/07/2013 23:57, Hannes Frederic Sowa a écrit : >> Hello Nicolas! >> >> I am currently trying to fix the nexthop selection for kernels compiled >> without support for CONFIG_IPV6_ROUTER_PREF. I attached my current patch >> below. While testing I got the following kernel panic in ecmp code: >> >> [ 80.144667] ------------[ cut here ]------------ >> [ 80.145172] kernel BUG at net/ipv6/ip6_fib.c:733! >> [ 80.145172] invalid opcode: 0000 [#1] SMP >> [ 80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns >> nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT >> nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle >> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter >> ebtables ip6table_filter ip6_tables snd_hda_intel snd_hda_codec snd_hwdep >> snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd >> soundcore i2c_piix4 i2c_core virtio_net virtio_blk >> [ 80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118 >> [ 80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 >> [ 80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: >> ffff880118770000 >> [ 80.145172] RIP: 0010:[<ffffffff815f3b5d>] [<ffffffff815f3b5d>] >> fib6_add+0x75d/0x830 >> [ 80.145172] RSP: 0018:ffff880118771798 EFLAGS: 00010202 >> [ 80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480 >> [ 80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738 >> [ 80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001 >> [ 80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680 >> [ 80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90 >> [ 80.145172] FS: 00007f02b5127740(0000) GS:ffff88011fd00000(0000) >> knlGS:0000000000000000 >> [ 80.145172] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> [ 80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0 >> [ 80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> [ 80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> [ 80.145172] Stack: >> [ 80.145172] 0000000000000001 ffff880100000000 ffff880100000000 >> ffff880117903280 >> [ 80.145172] 0000000000000000 ffff880119a4cf00 0000000000000400 >> 00000000000007fa >> [ 80.145172] 0000000000000000 0000000000000000 0000000000000000 >> ffff88011350f680 >> [ 80.145172] Call Trace: >> [ 80.145172] [<ffffffff815eeceb>] ? rt6_bind_peer+0x4b/0x90 >> [ 80.145172] [<ffffffff815ed985>] __ip6_ins_rt+0x45/0x70 >> [ 80.145172] [<ffffffff815eee35>] ip6_ins_rt+0x35/0x40 >> [ 80.145172] [<ffffffff815ef1e4>] ip6_pol_route.isra.44+0x3a4/0x4b0 >> [ 80.145172] [<ffffffff815ef34a>] ip6_pol_route_output+0x2a/0x30 >> [ 80.145172] [<ffffffff81616077>] fib6_rule_action+0xd7/0x210 >> [ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 >> [ 80.145172] [<ffffffff81553026>] fib_rules_lookup+0xc6/0x140 >> [ 80.145172] [<ffffffff81616374>] fib6_rule_lookup+0x44/0x80 >> [ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 >> [ 80.145172] [<ffffffff815edea3>] ip6_route_output+0x73/0xb0 >> [ 80.145172] [<ffffffff815dfdf3>] ip6_dst_lookup_tail+0x2c3/0x2e0 >> [ 80.145172] [<ffffffff813007b1>] ? list_del+0x11/0x40 >> [ 80.145172] [<ffffffff81082a4c>] ? remove_wait_queue+0x3c/0x50 >> [ 80.145172] [<ffffffff815dfe4d>] ip6_dst_lookup_flow+0x3d/0xa0 >> [ 80.145172] [<ffffffff815fda77>] rawv6_sendmsg+0x267/0xc20 >> [ 80.145172] [<ffffffff815a8a83>] inet_sendmsg+0x63/0xb0 >> [ 80.145172] [<ffffffff8128eb93>] ? selinux_socket_sendmsg+0x23/0x30 >> [ 80.145172] [<ffffffff815218d6>] sock_sendmsg+0xa6/0xd0 >> [ 80.145172] [<ffffffff81524a68>] SYSC_sendto+0x128/0x180 >> [ 80.145172] [<ffffffff8109825c>] ? update_curr+0xec/0x170 >> [ 80.145172] [<ffffffff81041d09>] ? kvm_clock_get_cycles+0x9/0x10 >> [ 80.145172] [<ffffffff810afd1e>] ? __getnstimeofday+0x3e/0xd0 >> [ 80.145172] [<ffffffff8152509e>] SyS_sendto+0xe/0x10 >> [ 80.145172] [<ffffffff8164efd9>] system_call_fastpath+0x16/0x1b >> [ 80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c >> 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff <0f> >> 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53 >> [ 80.145172] RIP [<ffffffff815f3b5d>] fib6_add+0x75d/0x830 >> [ 80.145172] RSP <ffff880118771798> >> [ 80.387413] ---[ end trace 02f20b7a8b81ed95 ]--- >> [ 80.390154] Kernel panic - not syncing: Fatal exception in interrupt >> >> The relevant code is: >> >> 725 /* For each sibling in the list, increment the >> counter of >> 726 * siblings. BUG() if counters does not match, list >> of siblings >> 727 * is broken! >> 728 */ >> 729 rt6i_nsiblings = 0; >> 730 list_for_each_entry_safe(sibling, temp_sibling, >> 731 &rt->rt6i_siblings, >> rt6i_siblings) { >> 732 sibling->rt6i_nsiblings++; >> 733 BUG_ON(sibling->rt6i_nsiblings != >> rt->rt6i_nsiblings); >> 734 rt6i_nsiblings++; >> 735 } >> 736 BUG_ON(rt6i_nsiblings != rt->rt6i_nsiblings); >> 737 } >> >> Currently, I don't know if I my patch is to blame or something in the ecmp logic. >> >> Another thing I just noticed is: >> >> 1240 /* Remove this entry from other siblings */ >> 1241 if (rt->rt6i_nsiblings) { >> 1242 struct rt6_info *sibling, *next_sibling; >> 1243 >> 1244 list_for_each_entry_safe(sibling, next_sibling, >> 1245 &rt->rt6i_siblings, >> rt6i_siblings) >> 1246 sibling->rt6i_nsiblings--; >> 1247 rt->rt6i_nsiblings = 0; >> 1248 list_del_init(&rt->rt6i_siblings); >> 1249 } >> >> Are we sure we decrement all sibling's rt6i_nsiblings? Shouldn't we >> start iterating from fn->leaf? But this does not seem to cause it, >> because my trace does not report any calls to fib6_del_route. > Note sure to follow you, but all siblings are listed in rt6i_siblings, so it > must be enough. > >> >> You could try reproduce it by having an interface autoconfigured with >> a default router with NUD_VALID neighbour. I then added an unused vlan >> interface (vid 100 in my case) and added the following ip addresses: >> >> ip -6 a a 2001:ffff::1/64 dev eth0.100 >> ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31 nexthop >> via 2001:ffff::32 nexthop via 2001:ffff::33 >> >> (all nexthops should not be reachable) >> >> After starting a ping6 2000::1 the box should panic soon, after the >> first nexthop entry times out. >> >> Perhaps you could give me a hint? > I will run some tests with your patch. Will see. I don't reproduce this panic. Maybe you can try to revert this patch: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/ipv6/route.c?id=52bd4c0c1551daa2efa7bb9e01a2f4ea6d1311bb Regards, Nicolas > > I assume you didn't reproduce this without your patch. > > > Regards, > Nicolas > >> >> Thanks a lot, >> >> Hannes >> >> >> [PATCH RFC] ipv6: fix route selection if kernel not compiled with >> CONFIG_IPV6_ROUTER_PREF >> >> This is a follow-up patch to 3630d40067a21d4dfbadc6002bb469ce26ac5d52 >> ("ipv6: rt6_check_neigh should successfully verify neigh if no NUD >> information are available"). >> >> Since the removal of rt->n in rt6_info we can end up with a dst == >> NULL in rt6_check_neigh. In case the kernel is not compiled with >> CONFIG_IPV6_ROUTER_PREF we should also select a route with unkown >> NUD state but we must not avoid doing round robin selection on routes >> with the same target. So introduce and pass down a boolean ``do_rr'' to >> indicate when we should update rt->rr_ptr. As soon as no route is valid >> we do backtracking and do a lookup on a higher level in the fib trie. >> >> To hold correct state on the NUD selection we need to create a neighbour >> entry as soon as we tried to validate a nexthop. >> >> I changed the return value of rt6_check_neigh to: >> 1 in case of the dst entry validated >> -1 in case of we had no dst_entry and we need to do rr now >> -2 in case a we had a dst_entry and it did not validate >> >> In case of CONFIG_IPV6_ROUTER_PREF, rt6_probe does not allocate an >> neighbour entry (!CONFIG_IPV6_ROUTER_PREF: rt6_probe is a nop). Because >> of this, we have to create a neighbour entry on nexthop validation to >> track earlier validation errors. We recheck NUD state here to shortcurcuit >> NUD_NOARP neighbours. >> >> This seems to be the least complex fix for stable and net. I'll introduce >> a new route lookup flag 'idempotent' as soon as next opens to not let >> ip route get trigger active NUD validation if CONFIG_IPV6_ROUTER_PREF >> is enabled. Currently we trigger active NUD validation if compiled with >> CONFIG_IPV6_ROUTER_PREF. >> >> It also seems advantageous to make CONFIG_IPV6_ROUTER_PREF a runtime >> switch and just select the default operation on compile-time. >> >> v2: >> a) improved rt6_check_neigh logic and documented return values >> >> Reported-by: Pierre Emeriaud <petrus.lt@gmail.com> >> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> >> --- >> net/ipv6/route.c | 62 +++++++++++++++++++++++++++++++++++--------------------- >> 1 file changed, 39 insertions(+), 23 deletions(-) >> >> diff --git a/net/ipv6/route.c b/net/ipv6/route.c >> index bd5fd70..c5d9e68 100644 >> --- a/net/ipv6/route.c >> +++ b/net/ipv6/route.c >> @@ -531,28 +531,34 @@ static inline int rt6_check_dev(struct rt6_info *rt, int >> oif) >> return 0; >> } >> >> -static inline bool rt6_check_neigh(struct rt6_info *rt) >> +/* This function checks if a neighbour is reachable for routing >> + * purposes. It returns -2 in case the neighbour should not get >> + * selected as a viable router, -1 in case it should get selected with >> + * lowest score and afterwards trying roundrobin. 1 indicates a >> + * successfull verification. >> + */ >> +static inline int rt6_check_neigh(struct rt6_info *rt) >> { >> struct neighbour *neigh; >> - bool ret = false; >> + int ret = -2; >> >> if (rt->rt6i_flags & RTF_NONEXTHOP || >> !(rt->rt6i_flags & RTF_GATEWAY)) >> - return true; >> + return 1; >> >> rcu_read_lock_bh(); >> neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway); >> if (neigh) { >> read_lock(&neigh->lock); >> if (neigh->nud_state & NUD_VALID) >> - ret = true; >> + ret = 1; >> #ifdef CONFIG_IPV6_ROUTER_PREF >> else if (!(neigh->nud_state & NUD_FAILED)) >> - ret = true; >> + ret = 1; >> #endif >> read_unlock(&neigh->lock); >> - } else if (IS_ENABLED(CONFIG_IPV6_ROUTER_PREF)) { >> - ret = true; >> + } else { >> + ret = IS_ENABLED(CONFIG_IPV6_ROUTER_PREF) ? 1 : -1; >> } >> rcu_read_unlock_bh(); >> >> @@ -566,43 +572,52 @@ static int rt6_score_route(struct rt6_info *rt, int oif, >> >> m = rt6_check_dev(rt, oif); >> if (!m && (strict & RT6_LOOKUP_F_IFACE)) >> - return -1; >> + return -2; >> #ifdef CONFIG_IPV6_ROUTER_PREF >> m |= IPV6_DECODE_PREF(IPV6_EXTRACT_PREF(rt->rt6i_flags)) << 2; >> #endif >> - if (!rt6_check_neigh(rt) && (strict & RT6_LOOKUP_F_REACHABLE)) >> - return -1; >> + if (strict & RT6_LOOKUP_F_REACHABLE) { >> + int n = rt6_check_neigh(rt); >> + if (n < 0) >> + return n; >> + } >> return m; >> } >> >> static struct rt6_info *find_match(struct rt6_info *rt, int oif, int strict, >> - int *mpri, struct rt6_info *match) >> + int *mpri, struct rt6_info *match, >> + bool *do_rr) >> { >> int m; >> + bool match_do_rr = false; >> >> if (rt6_check_expired(rt)) >> goto out; >> >> m = rt6_score_route(rt, oif, strict); >> - if (m < 0) >> + if (m == -1 && !IS_ENABLED(CONFIG_IPV6_ROUTER_PREF)) { >> + match_do_rr = true; >> + m = 0; /* lowest valid score */ >> + } else if (m < 0) { >> goto out; >> + } >> + >> + if (strict & RT6_LOOKUP_F_REACHABLE) >> + rt6_probe(rt); >> >> if (m > *mpri) { >> - if (strict & RT6_LOOKUP_F_REACHABLE) >> - rt6_probe(match); >> + *do_rr = match_do_rr; >> *mpri = m; >> match = rt; >> - } else if (strict & RT6_LOOKUP_F_REACHABLE) { >> - rt6_probe(rt); >> } >> - >> out: >> return match; >> } >> >> static struct rt6_info *find_rr_leaf(struct fib6_node *fn, >> struct rt6_info *rr_head, >> - u32 metric, int oif, int strict) >> + u32 metric, int oif, int strict, >> + bool *do_rr) >> { >> struct rt6_info *rt, *match; >> int mpri = -1; >> @@ -610,10 +625,10 @@ static struct rt6_info *find_rr_leaf(struct fib6_node *fn, >> match = NULL; >> for (rt = rr_head; rt && rt->rt6i_metric == metric; >> rt = rt->dst.rt6_next) >> - match = find_match(rt, oif, strict, &mpri, match); >> + match = find_match(rt, oif, strict, &mpri, match, do_rr); >> for (rt = fn->leaf; rt && rt != rr_head && rt->rt6i_metric == metric; >> rt = rt->dst.rt6_next) >> - match = find_match(rt, oif, strict, &mpri, match); >> + match = find_match(rt, oif, strict, &mpri, match, do_rr); >> >> return match; >> } >> @@ -622,15 +637,16 @@ static struct rt6_info *rt6_select(struct fib6_node *fn, >> int oif, int strict) >> { >> struct rt6_info *match, *rt0; >> struct net *net; >> + bool do_rr = false; >> >> rt0 = fn->rr_ptr; >> if (!rt0) >> fn->rr_ptr = rt0 = fn->leaf; >> >> - match = find_rr_leaf(fn, rt0, rt0->rt6i_metric, oif, strict); >> + match = find_rr_leaf(fn, rt0, rt0->rt6i_metric, oif, strict, >> + &do_rr); >> >> - if (!match && >> - (strict & RT6_LOOKUP_F_REACHABLE)) { >> + if (do_rr) { >> struct rt6_info *next = rt0->dst.rt6_next; >> >> /* no entries matched; do round-robin */ >> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 9:28 ` Nicolas Dichtel @ 2013-07-10 10:53 ` Hannes Frederic Sowa 2013-07-10 12:22 ` Nicolas Dichtel 0 siblings, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-10 10:53 UTC (permalink / raw) To: Nicolas Dichtel; +Cc: netdev, yoshfuji, petrus.lt, davem On Wed, Jul 10, 2013 at 11:28:57AM +0200, Nicolas Dichtel wrote: > Le 10/07/2013 09:54, Nicolas Dichtel a écrit : > >Le 09/07/2013 23:57, Hannes Frederic Sowa a écrit : > >>After starting a ping6 2000::1 the box should panic soon, after the > >>first nexthop entry times out. > >> > >>Perhaps you could give me a hint? > >I will run some tests with your patch. Will see. > I don't reproduce this panic. I just dumped the routes for which it does increase the rt6i_nsiblings counter in this condition: /* If we have the same destination and the same metric, * but not the same gateway, then the route we try to * add is sibling to this route, increment our counter * of siblings, and later we will add our route to the * list. * Only static routes (which don't have flag * RTF_EXPIRES) are used for ECMPv6. * * To avoid long list, we only had siblings if the * route have a gateway. */ if (rt->rt6i_flags & RTF_GATEWAY && !(rt->rt6i_flags & RTF_EXPIRES) && !(iter->rt6i_flags & RTF_EXPIRES)) rt->rt6i_nsiblings++; dump_route(iter, "(iter)"); dump_route(rt, "(rt)"); } Here: [ 42.497470] (iter): ffff88011796cc00 dst 2000::1 plen 128 gateway 2001:db8::32, siblings 2, metric 0, expires 0 gateway 2 idev6 ffff8801139ddc00 dev ffff880117e83000 [ 42.505912] (rt): ffff88011796d800 dst 2000::1 plen 128 gateway fe80::5054:ff:fe82:e153, siblings 1, metric 0, expires 0 gateway 2 idev6 ffff880117edc400 dev ffff8801185cb000 [ 42.527241] (iter): ffff88011796d380 dst 2000::1 plen 128 gateway 2001:db8::33, siblings 2, metric 0, expires 0 gateway 2 idev6 ffff8801139ddc00 dev ffff880117e83000 [ 42.536440] (rt): ffff88011796d800 dst 2000::1 plen 128 gateway fe80::5054:ff:fe82:e153, siblings 2, metric 0, expires 0 gateway 2 idev6 ffff880117edc400 dev ffff8801185cb000 From my understanding these two routes should not be aggregated in one ecmp route set. Am I seeing this correct? (My configuration is like in the mail before.) I wonder why the '(rt)' route does not have the expires flag, but it seems we have to special-case RTF_CACHE routes here which derive from different levels of the fib6_tree. Does that make sense? Greetings, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 10:53 ` Hannes Frederic Sowa @ 2013-07-10 12:22 ` Nicolas Dichtel 2013-07-10 13:21 ` Hannes Frederic Sowa 0 siblings, 1 reply; 30+ messages in thread From: Nicolas Dichtel @ 2013-07-10 12:22 UTC (permalink / raw) To: hannes; +Cc: netdev, yoshfuji, petrus.lt, davem Le 10/07/2013 12:53, Hannes Frederic Sowa a écrit : > On Wed, Jul 10, 2013 at 11:28:57AM +0200, Nicolas Dichtel wrote: >> Le 10/07/2013 09:54, Nicolas Dichtel a écrit : >>> Le 09/07/2013 23:57, Hannes Frederic Sowa a écrit : >>>> After starting a ping6 2000::1 the box should panic soon, after the >>>> first nexthop entry times out. >>>> >>>> Perhaps you could give me a hint? >>> I will run some tests with your patch. Will see. >> I don't reproduce this panic. > > I just dumped the routes for which it does increase the rt6i_nsiblings > counter in this condition: > > /* If we have the same destination and the same metric, > * but not the same gateway, then the route we try to > * add is sibling to this route, increment our counter > * of siblings, and later we will add our route to the > * list. > * Only static routes (which don't have flag > * RTF_EXPIRES) are used for ECMPv6. > * > * To avoid long list, we only had siblings if the > * route have a gateway. > */ > if (rt->rt6i_flags & RTF_GATEWAY && > !(rt->rt6i_flags & RTF_EXPIRES) && > !(iter->rt6i_flags & RTF_EXPIRES)) > rt->rt6i_nsiblings++; > dump_route(iter, "(iter)"); > dump_route(rt, "(rt)"); > } > > > > Here: > > [ 42.497470] (iter): ffff88011796cc00 dst 2000::1 plen 128 gateway 2001:db8::32, siblings 2, metric 0, expires 0 gateway 2 idev6 ffff8801139ddc00 dev ffff880117e83000 > [ 42.505912] (rt): ffff88011796d800 dst 2000::1 plen 128 gateway fe80::5054:ff:fe82:e153, siblings 1, metric 0, expires 0 gateway 2 idev6 ffff880117edc400 dev ffff8801185cb000 > [ 42.527241] (iter): ffff88011796d380 dst 2000::1 plen 128 gateway 2001:db8::33, siblings 2, metric 0, expires 0 gateway 2 idev6 ffff8801139ddc00 dev ffff880117e83000 > [ 42.536440] (rt): ffff88011796d800 dst 2000::1 plen 128 gateway fe80::5054:ff:fe82:e153, siblings 2, metric 0, expires 0 gateway 2 idev6 ffff880117edc400 dev ffff8801185cb000 > > From my understanding these two routes should not be aggregated in one ecmp > route set. Am I seeing this correct? (My configuration is like in the mail > before.) Hmm, why? Routes have the same destination, same metric, are static (expires == 0) and have a gateway. nsiblings counts the number of siblings and does not contains ourself, hence both iter should be 1, not 2. Nicolas ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 12:22 ` Nicolas Dichtel @ 2013-07-10 13:21 ` Hannes Frederic Sowa 2013-07-10 14:10 ` Nicolas Dichtel 0 siblings, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-10 13:21 UTC (permalink / raw) To: Nicolas Dichtel; +Cc: netdev, yoshfuji, petrus.lt, davem On Wed, Jul 10, 2013 at 02:22:55PM +0200, Nicolas Dichtel wrote: > Le 10/07/2013 12:53, Hannes Frederic Sowa a écrit : > >On Wed, Jul 10, 2013 at 11:28:57AM +0200, Nicolas Dichtel wrote: > >>Le 10/07/2013 09:54, Nicolas Dichtel a écrit : > >>>Le 09/07/2013 23:57, Hannes Frederic Sowa a écrit : > >>>>After starting a ping6 2000::1 the box should panic soon, after the > >>>>first nexthop entry times out. > >>>> > >>>>Perhaps you could give me a hint? > >>>I will run some tests with your patch. Will see. > >>I don't reproduce this panic. > > > >I just dumped the routes for which it does increase the rt6i_nsiblings > >counter in this condition: > > > > /* If we have the same destination and the same > > metric, > > * but not the same gateway, then the route we > > try to > > * add is sibling to this route, increment our > > counter > > * of siblings, and later we will add our route > > to the > > * list. > > * Only static routes (which don't have flag > > * RTF_EXPIRES) are used for ECMPv6. > > * > > * To avoid long list, we only had siblings if the > > * route have a gateway. > > */ > > if (rt->rt6i_flags & RTF_GATEWAY && > > !(rt->rt6i_flags & RTF_EXPIRES) && > > !(iter->rt6i_flags & RTF_EXPIRES)) > > rt->rt6i_nsiblings++; > > dump_route(iter, "(iter)"); > > dump_route(rt, "(rt)"); > > } > > > > > > > >Here: > > > >[ 42.497470] (iter): ffff88011796cc00 dst 2000::1 plen 128 gateway > >2001:db8::32, siblings 2, metric 0, expires 0 gateway 2 idev6 > >ffff8801139ddc00 dev ffff880117e83000 > >[ 42.505912] (rt): ffff88011796d800 dst 2000::1 plen 128 gateway > >fe80::5054:ff:fe82:e153, siblings 1, metric 0, expires 0 gateway 2 idev6 > >ffff880117edc400 dev ffff8801185cb000 > >[ 42.527241] (iter): ffff88011796d380 dst 2000::1 plen 128 gateway > >2001:db8::33, siblings 2, metric 0, expires 0 gateway 2 idev6 > >ffff8801139ddc00 dev ffff880117e83000 > >[ 42.536440] (rt): ffff88011796d800 dst 2000::1 plen 128 gateway > >fe80::5054:ff:fe82:e153, siblings 2, metric 0, expires 0 gateway 2 idev6 > >ffff880117edc400 dev ffff8801185cb000 > > > > From my understanding these two routes should not be aggregated in one > > ecmp > >route set. Am I seeing this correct? (My configuration is like in the mail > >before.) > Hmm, why? > Routes have the same destination, same metric, are static (expires == 0) > and have a gateway. The route with rt6i_gateway does actually expire because I got it from autoconf and ip -6 r l confirms this, too. It seems this is only the cached route (I will confirm shortly). Is this still ok? > nsiblings counts the number of siblings and does not contains ourself, > hence both iter should be 1, not 2. Ok. Thanks for helping, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 13:21 ` Hannes Frederic Sowa @ 2013-07-10 14:10 ` Nicolas Dichtel 2013-07-10 15:20 ` Hannes Frederic Sowa 2013-07-10 21:21 ` Hannes Frederic Sowa 0 siblings, 2 replies; 30+ messages in thread From: Nicolas Dichtel @ 2013-07-10 14:10 UTC (permalink / raw) Cc: netdev, yoshfuji, petrus.lt, davem, hannes Le 10/07/2013 15:21, Hannes Frederic Sowa a écrit : > On Wed, Jul 10, 2013 at 02:22:55PM +0200, Nicolas Dichtel wrote: >> Le 10/07/2013 12:53, Hannes Frederic Sowa a écrit : >>> On Wed, Jul 10, 2013 at 11:28:57AM +0200, Nicolas Dichtel wrote: >>>> Le 10/07/2013 09:54, Nicolas Dichtel a écrit : >>>>> Le 09/07/2013 23:57, Hannes Frederic Sowa a écrit : >>>>>> After starting a ping6 2000::1 the box should panic soon, after the >>>>>> first nexthop entry times out. >>>>>> >>>>>> Perhaps you could give me a hint? >>>>> I will run some tests with your patch. Will see. >>>> I don't reproduce this panic. >>> >>> I just dumped the routes for which it does increase the rt6i_nsiblings >>> counter in this condition: >>> >>> /* If we have the same destination and the same >>> metric, >>> * but not the same gateway, then the route we >>> try to >>> * add is sibling to this route, increment our >>> counter >>> * of siblings, and later we will add our route >>> to the >>> * list. >>> * Only static routes (which don't have flag >>> * RTF_EXPIRES) are used for ECMPv6. >>> * >>> * To avoid long list, we only had siblings if the >>> * route have a gateway. >>> */ >>> if (rt->rt6i_flags & RTF_GATEWAY && >>> !(rt->rt6i_flags & RTF_EXPIRES) && >>> !(iter->rt6i_flags & RTF_EXPIRES)) >>> rt->rt6i_nsiblings++; >>> dump_route(iter, "(iter)"); >>> dump_route(rt, "(rt)"); >>> } >>> >>> >>> >>> Here: >>> >>> [ 42.497470] (iter): ffff88011796cc00 dst 2000::1 plen 128 gateway >>> 2001:db8::32, siblings 2, metric 0, expires 0 gateway 2 idev6 >>> ffff8801139ddc00 dev ffff880117e83000 >>> [ 42.505912] (rt): ffff88011796d800 dst 2000::1 plen 128 gateway >>> fe80::5054:ff:fe82:e153, siblings 1, metric 0, expires 0 gateway 2 idev6 >>> ffff880117edc400 dev ffff8801185cb000 >>> [ 42.527241] (iter): ffff88011796d380 dst 2000::1 plen 128 gateway >>> 2001:db8::33, siblings 2, metric 0, expires 0 gateway 2 idev6 >>> ffff8801139ddc00 dev ffff880117e83000 >>> [ 42.536440] (rt): ffff88011796d800 dst 2000::1 plen 128 gateway >>> fe80::5054:ff:fe82:e153, siblings 2, metric 0, expires 0 gateway 2 idev6 >>> ffff880117edc400 dev ffff8801185cb000 >>> >>> From my understanding these two routes should not be aggregated in one >>> ecmp >>> route set. Am I seeing this correct? (My configuration is like in the mail >>> before.) >> Hmm, why? >> Routes have the same destination, same metric, are static (expires == 0) >> and have a gateway. > > The route with rt6i_gateway does actually expire because I got it from > autoconf and ip -6 r l confirms this, too. It seems this is only the cached > route (I will confirm shortly). Is this still ok? I wonder why expires is 0. Even if this route is cached, the flag RTF_EXPIRES should be set. Am I wrong? Nicolas ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 14:10 ` Nicolas Dichtel @ 2013-07-10 15:20 ` Hannes Frederic Sowa 2013-07-10 15:59 ` Hannes Frederic Sowa 2013-07-10 21:21 ` Hannes Frederic Sowa 1 sibling, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-10 15:20 UTC (permalink / raw) To: Nicolas Dichtel; +Cc: netdev, yoshfuji, petrus.lt, davem On Wed, Jul 10, 2013 at 04:10:58PM +0200, Nicolas Dichtel wrote: > Le 10/07/2013 15:21, Hannes Frederic Sowa a écrit : > >On Wed, Jul 10, 2013 at 02:22:55PM +0200, Nicolas Dichtel wrote: > >>Le 10/07/2013 12:53, Hannes Frederic Sowa a écrit : > >The route with rt6i_gateway does actually expire because I got it from > >autoconf and ip -6 r l confirms this, too. It seems this is only the cached > >route (I will confirm shortly). Is this still ok? > I wonder why expires is 0. Even if this route is cached, the flag > RTF_EXPIRES should be set. Am I wrong? It seems it is possible cached route gets its expiration updated. As such it is not counted in the iteration but it is found as a sibling and its nsiblings count is updated again. (sorry for the additional mail, forgot to send it to the list) Greetings, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 15:20 ` Hannes Frederic Sowa @ 2013-07-10 15:59 ` Hannes Frederic Sowa 2013-07-10 16:35 ` Hannes Frederic Sowa 0 siblings, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-10 15:59 UTC (permalink / raw) To: Nicolas Dichtel, netdev, yoshfuji, petrus.lt, davem On Wed, Jul 10, 2013 at 05:20:01PM +0200, Hannes Frederic Sowa wrote: > On Wed, Jul 10, 2013 at 04:10:58PM +0200, Nicolas Dichtel wrote: > > Le 10/07/2013 15:21, Hannes Frederic Sowa a écrit : > > >On Wed, Jul 10, 2013 at 02:22:55PM +0200, Nicolas Dichtel wrote: > > >>Le 10/07/2013 12:53, Hannes Frederic Sowa a écrit : > > >The route with rt6i_gateway does actually expire because I got it from > > >autoconf and ip -6 r l confirms this, too. It seems this is only the cached > > >route (I will confirm shortly). Is this still ok? > > I wonder why expires is 0. Even if this route is cached, the flag > > RTF_EXPIRES should be set. Am I wrong? > > It seems it is possible cached route gets its expiration updated. As > such it is not counted in the iteration but it is found as a sibling and its > nsiblings count is updated again. ip6_link_failure is the problem. We need to remove the route directly instead of calling rt6_update_expires: static void ip6_link_failure(struct sk_buff *skb) { struct rt6_info *rt; icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0); rt = (struct rt6_info *) skb_dst(skb); if (rt) { if (rt->rt6i_flags & RTF_CACHE) rt6_update_expires(rt, 0); else if (rt->rt6i_node && (rt->rt6i_flags & RTF_DEFAULT)) rt->rt6i_node->fn_sernum = -1; } } Thanks, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 15:59 ` Hannes Frederic Sowa @ 2013-07-10 16:35 ` Hannes Frederic Sowa 2013-07-11 8:07 ` Nicolas Dichtel 0 siblings, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-10 16:35 UTC (permalink / raw) To: Nicolas Dichtel, netdev, yoshfuji, petrus.lt, davem On Wed, Jul 10, 2013 at 05:59:04PM +0200, Hannes Frederic Sowa wrote: > On Wed, Jul 10, 2013 at 05:20:01PM +0200, Hannes Frederic Sowa wrote: > > On Wed, Jul 10, 2013 at 04:10:58PM +0200, Nicolas Dichtel wrote: > > > Le 10/07/2013 15:21, Hannes Frederic Sowa a écrit : > > > >On Wed, Jul 10, 2013 at 02:22:55PM +0200, Nicolas Dichtel wrote: > > > >>Le 10/07/2013 12:53, Hannes Frederic Sowa a écrit : > > > >The route with rt6i_gateway does actually expire because I got it from > > > >autoconf and ip -6 r l confirms this, too. It seems this is only the cached > > > >route (I will confirm shortly). Is this still ok? > > > I wonder why expires is 0. Even if this route is cached, the flag > > > RTF_EXPIRES should be set. Am I wrong? > > > > It seems it is possible cached route gets its expiration updated. As > > such it is not counted in the iteration but it is found as a sibling and its > > nsiblings count is updated again. > > ip6_link_failure is the problem. We need to remove the route directly instead > of calling rt6_update_expires: > > static void ip6_link_failure(struct sk_buff *skb) > { > struct rt6_info *rt; > > icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0); > > rt = (struct rt6_info *) skb_dst(skb); > if (rt) { > if (rt->rt6i_flags & RTF_CACHE) > rt6_update_expires(rt, 0); > else if (rt->rt6i_node && (rt->rt6i_flags & RTF_DEFAULT)) > rt->rt6i_node->fn_sernum = -1; > } > } With this patch I do not get the panic any more and the routing table stabilizes and discarded all NUD nonavailable routes. I'll need to review it myself, first. Thanks for your help! --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1096,10 +1096,12 @@ static void ip6_link_failure(struct sk_buff *skb) rt = (struct rt6_info *) skb_dst(skb); if (rt) { - if (rt->rt6i_flags & RTF_CACHE) - rt6_update_expires(rt, 0); - else if (rt->rt6i_node && (rt->rt6i_flags & RTF_DEFAULT)) + if (rt->rt6i_flags & RTF_CACHE) { + dst_hold(&rt->dst); + ip6_del_rt(rt); + } else if (rt->rt6i_node && (rt->rt6i_flags & RTF_DEFAULT)) { rt->rt6i_node->fn_sernum = -1; + } } } ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 16:35 ` Hannes Frederic Sowa @ 2013-07-11 8:07 ` Nicolas Dichtel 0 siblings, 0 replies; 30+ messages in thread From: Nicolas Dichtel @ 2013-07-11 8:07 UTC (permalink / raw) To: hannes; +Cc: netdev, yoshfuji, petrus.lt, davem Le 10/07/2013 18:35, Hannes Frederic Sowa a écrit : > On Wed, Jul 10, 2013 at 05:59:04PM +0200, Hannes Frederic Sowa wrote: >> On Wed, Jul 10, 2013 at 05:20:01PM +0200, Hannes Frederic Sowa wrote: >>> On Wed, Jul 10, 2013 at 04:10:58PM +0200, Nicolas Dichtel wrote: >>>> Le 10/07/2013 15:21, Hannes Frederic Sowa a écrit : >>>>> On Wed, Jul 10, 2013 at 02:22:55PM +0200, Nicolas Dichtel wrote: >>>>>> Le 10/07/2013 12:53, Hannes Frederic Sowa a écrit : >>>>> The route with rt6i_gateway does actually expire because I got it from >>>>> autoconf and ip -6 r l confirms this, too. It seems this is only the cached >>>>> route (I will confirm shortly). Is this still ok? >>>> I wonder why expires is 0. Even if this route is cached, the flag >>>> RTF_EXPIRES should be set. Am I wrong? >>> >>> It seems it is possible cached route gets its expiration updated. As >>> such it is not counted in the iteration but it is found as a sibling and its >>> nsiblings count is updated again. >> >> ip6_link_failure is the problem. We need to remove the route directly instead >> of calling rt6_update_expires: >> >> static void ip6_link_failure(struct sk_buff *skb) >> { >> struct rt6_info *rt; >> >> icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0); >> >> rt = (struct rt6_info *) skb_dst(skb); >> if (rt) { >> if (rt->rt6i_flags & RTF_CACHE) >> rt6_update_expires(rt, 0); >> else if (rt->rt6i_node && (rt->rt6i_flags & RTF_DEFAULT)) >> rt->rt6i_node->fn_sernum = -1; >> } >> } > > With this patch I do not get the panic any more and the routing table > stabilizes and discarded all NUD nonavailable routes. I'll need to review > it myself, first. Thanks for your help! Good catch! Thank you. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 14:10 ` Nicolas Dichtel 2013-07-10 15:20 ` Hannes Frederic Sowa @ 2013-07-10 21:21 ` Hannes Frederic Sowa 2013-07-11 8:04 ` Nicolas Dichtel 1 sibling, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-10 21:21 UTC (permalink / raw) To: Nicolas Dichtel; +Cc: netdev, yoshfuji, petrus.lt, davem On Wed, Jul 10, 2013 at 04:10:58PM +0200, Nicolas Dichtel wrote: > I wonder why expires is 0. Even if this route is cached, the flag > RTF_EXPIRES should be set. Am I wrong? rt6_set_from deliberately clears the RTF_EXPIRES when creating a cached copy of the route if the route is an autoconfigured default route. Maybe the criterion for exclusion of which routes can get into an ecmp route set should be revisited? This could result in strange effects for users working with two interfaces, both receiving a RA with default routes. Thanks, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 21:21 ` Hannes Frederic Sowa @ 2013-07-11 8:04 ` Nicolas Dichtel 2013-07-11 10:24 ` Hannes Frederic Sowa 0 siblings, 1 reply; 30+ messages in thread From: Nicolas Dichtel @ 2013-07-11 8:04 UTC (permalink / raw) To: hannes; +Cc: netdev, yoshfuji, petrus.lt, davem Le 10/07/2013 23:21, Hannes Frederic Sowa a écrit : > On Wed, Jul 10, 2013 at 04:10:58PM +0200, Nicolas Dichtel wrote: >> I wonder why expires is 0. Even if this route is cached, the flag >> RTF_EXPIRES should be set. Am I wrong? > > rt6_set_from deliberately clears the RTF_EXPIRES when creating a cached copy > of the route if the route is an autoconfigured default route. > > Maybe the criterion for exclusion of which routes can get into an ecmp route > set should be revisited? This could result in strange effects for users > working with two interfaces, both receiving a RA with default routes. Agreed. Here is a proposal, what do you think? [PATCH] ipv6: don't use autoconfigured route for ecmp The intention was already there by checking the flag RTF_EXPIRES, but this flag is removed from the default route by rt6_set_from() when this route is cached. Let's add a check against RTF_ADDRCONF. Spotted-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> --- net/ipv6/ip6_fib.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 192dd1a..87e31c6 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -686,14 +686,14 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt, * of siblings, and later we will add our route to the * list. * Only static routes (which don't have flag - * RTF_EXPIRES) are used for ECMPv6. + * RTF_EXPIRES nor RTF_ADDRCONF) are used for ECMPv6. * * To avoid long list, we only had siblings if the * route have a gateway. */ if (rt->rt6i_flags & RTF_GATEWAY && - !(rt->rt6i_flags & RTF_EXPIRES) && - !(iter->rt6i_flags & RTF_EXPIRES)) + !(rt->rt6i_flags & (RTF_EXPIRES|RTF_ADDRCONF)) && + !(iter->rt6i_flags & (RTF_EXPIRES|RTF_ADDRCONF))) rt->rt6i_nsiblings++; } -- 1.8.2.1 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-11 8:04 ` Nicolas Dichtel @ 2013-07-11 10:24 ` Hannes Frederic Sowa 2013-07-11 14:46 ` Hannes Frederic Sowa 0 siblings, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-11 10:24 UTC (permalink / raw) To: Nicolas Dichtel; +Cc: netdev, yoshfuji, petrus.lt, davem On Thu, Jul 11, 2013 at 10:04:47AM +0200, Nicolas Dichtel wrote: > Le 10/07/2013 23:21, Hannes Frederic Sowa a écrit : > >On Wed, Jul 10, 2013 at 04:10:58PM +0200, Nicolas Dichtel wrote: > >>I wonder why expires is 0. Even if this route is cached, the flag > >>RTF_EXPIRES should be set. Am I wrong? > > > >rt6_set_from deliberately clears the RTF_EXPIRES when creating a cached > >copy > >of the route if the route is an autoconfigured default route. > > > >Maybe the criterion for exclusion of which routes can get into an ecmp > >route > >set should be revisited? This could result in strange effects for users > >working with two interfaces, both receiving a RA with default routes. > Agreed. Here is a proposal, what do you think? > > [PATCH] ipv6: don't use autoconfigured route for ecmp > > The intention was already there by checking the flag RTF_EXPIRES, but this > flag > is removed from the default route by rt6_set_from() when this route is > cached. > > Let's add a check against RTF_ADDRCONF. > > Spotted-by: Hannes Frederic Sowa <hannes@stressinduktion.org> > Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> I do think the patch is ok. I just wanted to show you a solution I did on my laptop this night and did not have the time to send yesterday. It is only compile-tested. I did strengthen the RTF_EXPIRES check a bit. Also I am not sure what did stop the search for the first route with the same metric to find a RTF_EXPIRES route, so I also added the guard there, too. I fear, I'll need to do a bit more research. Thanks! [PATCH RFC] ipv6: routes only qualify for ecmp if their original routes do not expire Cloned routes get their RTF_EXPIRES flag reset if they are autoconfigured default routes. Because these routes should not end up in the ecmp route set, we check if the original route has the RTF_EXPIRES flag set. Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> --- net/ipv6/ip6_fib.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 192dd1a..3bd5fcd 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -632,6 +632,18 @@ insert_above: return ln; } +static bool rt6_qualify_for_ecmp(struct rt6_info *rt0) +{ + struct rt6_info *rt; + + if (!(rt0->rt6i_flags & RTF_GATEWAY)) + return false; + + for (rt = rt0; rt && !(rt->rt6i_flags & RTF_EXPIRES); + rt = (struct rt6_info *)rt->dst.from); + return !(rt && (rt->rt6i_flags & RTF_EXPIRES)); +} + /* * Insert routing information in a node. */ @@ -691,9 +703,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt, * To avoid long list, we only had siblings if the * route have a gateway. */ - if (rt->rt6i_flags & RTF_GATEWAY && - !(rt->rt6i_flags & RTF_EXPIRES) && - !(iter->rt6i_flags & RTF_EXPIRES)) + if (rt6_qualify_for_ecmp(rt) && + rt6_qualify_for_ecmp(iter)) rt->rt6i_nsiblings++; } @@ -715,7 +726,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt, /* Find the first route that have the same metric */ sibling = fn->leaf; while (sibling) { - if (sibling->rt6i_metric == rt->rt6i_metric) { + if (sibling->rt6i_metric == rt->rt6i_metric && + rt6_qualify_for_ecmp(sibling)) { list_add_tail(&rt->rt6i_siblings, &sibling->rt6i_siblings); break; -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-11 10:24 ` Hannes Frederic Sowa @ 2013-07-11 14:46 ` Hannes Frederic Sowa 2013-07-11 14:57 ` Nicolas Dichtel 0 siblings, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-11 14:46 UTC (permalink / raw) To: Nicolas Dichtel, netdev, yoshfuji, petrus.lt, davem On Thu, Jul 11, 2013 at 12:24:41PM +0200, Hannes Frederic Sowa wrote: > I fear, I'll need to do a bit more research. My proposal is to take my patch and check for RTF_ADDRCONF plus RTF_DYNAMIC, too. The RTF_DYNAMIC check would prevent routes created from icmpv6 redirects entering an ecmp route set. Do you agree? Thanks, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-11 14:46 ` Hannes Frederic Sowa @ 2013-07-11 14:57 ` Nicolas Dichtel 2013-07-12 8:51 ` Hannes Frederic Sowa 0 siblings, 1 reply; 30+ messages in thread From: Nicolas Dichtel @ 2013-07-11 14:57 UTC (permalink / raw) To: hannes; +Cc: netdev, yoshfuji, petrus.lt, davem Le 11/07/2013 16:46, Hannes Frederic Sowa a écrit : > On Thu, Jul 11, 2013 at 12:24:41PM +0200, Hannes Frederic Sowa wrote: >> I fear, I'll need to do a bit more research. > > My proposal is to take my patch and check for RTF_ADDRCONF plus RTF_DYNAMIC, > too. The RTF_DYNAMIC check would prevent routes created from icmpv6 redirects > entering an ecmp route set. > > Do you agree? Yes. Regards, Nicolas ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-11 14:57 ` Nicolas Dichtel @ 2013-07-12 8:51 ` Hannes Frederic Sowa 2013-07-12 12:04 ` Nicolas Dichtel 0 siblings, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-12 8:51 UTC (permalink / raw) To: Nicolas Dichtel; +Cc: netdev, yoshfuji, petrus.lt, davem Hello Nicolas, On Thu, Jul 11, 2013 at 04:57:42PM +0200, Nicolas Dichtel wrote: > Le 11/07/2013 16:46, Hannes Frederic Sowa a écrit : > >On Thu, Jul 11, 2013 at 12:24:41PM +0200, Hannes Frederic Sowa wrote: > >>I fear, I'll need to do a bit more research. > > > >My proposal is to take my patch and check for RTF_ADDRCONF plus > >RTF_DYNAMIC, > >too. The RTF_DYNAMIC check would prevent routes created from icmpv6 > >redirects > >entering an ecmp route set. > > > >Do you agree? > Yes. There is still some window where things go wrong now, I fear. If we have ecmp routes active and we update the pmtu of that rt6_info, we might end up with a route in the ecmp set, which might not get recountet if another ecmp route joins the set. I will have to think how to deal with this. Do you have an idea? Thanks, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-12 8:51 ` Hannes Frederic Sowa @ 2013-07-12 12:04 ` Nicolas Dichtel 2013-07-12 16:19 ` Hannes Frederic Sowa 0 siblings, 1 reply; 30+ messages in thread From: Nicolas Dichtel @ 2013-07-12 12:04 UTC (permalink / raw) To: hannes; +Cc: netdev, yoshfuji, petrus.lt, davem Le 12/07/2013 10:51, Hannes Frederic Sowa a écrit : > Hello Nicolas, > > On Thu, Jul 11, 2013 at 04:57:42PM +0200, Nicolas Dichtel wrote: >> Le 11/07/2013 16:46, Hannes Frederic Sowa a écrit : >>> On Thu, Jul 11, 2013 at 12:24:41PM +0200, Hannes Frederic Sowa wrote: >>>> I fear, I'll need to do a bit more research. >>> >>> My proposal is to take my patch and check for RTF_ADDRCONF plus >>> RTF_DYNAMIC, >>> too. The RTF_DYNAMIC check would prevent routes created from icmpv6 >>> redirects >>> entering an ecmp route set. >>> >>> Do you agree? >> Yes. > > There is still some window where things go wrong now, I fear. If we have ecmp > routes active and we update the pmtu of that rt6_info, we might end up with a > route in the ecmp set, which might not get recountet if another ecmp route > joins the set. I will have to think how to deal with this. Do you have an > idea? It's possible to add a glue to check this counter when we play with these flags, but it's ugly. Maybe the check against RTF_EXPIRES is fundamentally wrong. Checking RTF_ADDRCONF|RTF_DYNAMIC should be enough, what do you think? In another hand, we can discuss about the initial assumption, that was "only static routes are part of ECMP routes". I'm thinking of what are the consequence if we accept to accept all routes, without checking any flags. Regards, Nicolas ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-12 12:04 ` Nicolas Dichtel @ 2013-07-12 16:19 ` Hannes Frederic Sowa 2013-07-12 19:01 ` Nicolas Dichtel 0 siblings, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-12 16:19 UTC (permalink / raw) To: Nicolas Dichtel; +Cc: netdev, yoshfuji, petrus.lt, davem On Fri, Jul 12, 2013 at 02:04:45PM +0200, Nicolas Dichtel wrote: > It's possible to add a glue to check this counter when we play with these > flags, but it's ugly. > > Maybe the check against RTF_EXPIRES is fundamentally wrong. Checking > RTF_ADDRCONF|RTF_DYNAMIC should be enough, what do you think? Yes, this seems to be the best option now. I will audit the source if RTF_ADDRCONF and RTF_DYNAMIC are immutable after dst construction and if other errors could arise for that and would go with this solution then. What do you think about making ecmp routes explicit by adding RTF_ECMP flag? > In another hand, we can discuss about the initial assumption, that was > "only static routes are part of ECMP routes". I'm thinking of what are the > consequence if we accept to accept all routes, without checking any flags. I don't have a good feeling about that. But I may be wrong. Greetings, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-12 16:19 ` Hannes Frederic Sowa @ 2013-07-12 19:01 ` Nicolas Dichtel 2013-07-12 19:20 ` Hannes Frederic Sowa 0 siblings, 1 reply; 30+ messages in thread From: Nicolas Dichtel @ 2013-07-12 19:01 UTC (permalink / raw) To: Hannes Frederic Sowa; +Cc: netdev, yoshfuji, petrus.lt, davem Le 12/07/2013 18:19, Hannes Frederic Sowa a écrit : > On Fri, Jul 12, 2013 at 02:04:45PM +0200, Nicolas Dichtel wrote: >> It's possible to add a glue to check this counter when we play with these >> flags, but it's ugly. >> >> Maybe the check against RTF_EXPIRES is fundamentally wrong. Checking >> RTF_ADDRCONF|RTF_DYNAMIC should be enough, what do you think? > > Yes, this seems to be the best option now. I will audit the source if > RTF_ADDRCONF and RTF_DYNAMIC are immutable after dst construction and > if other errors could arise for that and would go with this solution then. > > What do you think about making ecmp routes explicit by adding RTF_ECMP > flag? Why not, but you will have to be careful on insertion and deletion. Next hop can be added one by one and deleted one by one. > >> In another hand, we can discuss about the initial assumption, that was >> "only static routes are part of ECMP routes". I'm thinking of what are the >> consequence if we accept to accept all routes, without checking any flags. > > I don't have a good feeling about that. But I may be wrong. Same for me, but for now, I don't have any argument ;-) The above solution is probably the better way for now. Nicolas ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-12 19:01 ` Nicolas Dichtel @ 2013-07-12 19:20 ` Hannes Frederic Sowa 2013-07-12 21:48 ` Hannes Frederic Sowa 0 siblings, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-12 19:20 UTC (permalink / raw) To: Nicolas Dichtel; +Cc: netdev, yoshfuji, petrus.lt, davem On Fri, Jul 12, 2013 at 09:01:17PM +0200, Nicolas Dichtel wrote: > Le 12/07/2013 18:19, Hannes Frederic Sowa a écrit : > >On Fri, Jul 12, 2013 at 02:04:45PM +0200, Nicolas Dichtel wrote: > >>It's possible to add a glue to check this counter when we play with these > >>flags, but it's ugly. > >> > >>Maybe the check against RTF_EXPIRES is fundamentally wrong. Checking > >>RTF_ADDRCONF|RTF_DYNAMIC should be enough, what do you think? > > > >Yes, this seems to be the best option now. I will audit the source if > >RTF_ADDRCONF and RTF_DYNAMIC are immutable after dst construction and > >if other errors could arise for that and would go with this solution then. > > > >What do you think about making ecmp routes explicit by adding RTF_ECMP > >flag? > Why not, but you will have to be careful on insertion and deletion. Next > hop can be added one by one and deleted one by one. Ok, we can have a look to do so in -next. > > > >>In another hand, we can discuss about the initial assumption, that was > >>"only static routes are part of ECMP routes". I'm thinking of what are the > >>consequence if we accept to accept all routes, without checking any flags. > > > >I don't have a good feeling about that. But I may be wrong. > Same for me, but for now, I don't have any argument ;-) The above solution > is probably the better way for now. To go without RTF_EXPIRES seems the way to go. I still am unsure if we need to propagate the RTF_DYNAMIC flag in case we update the expiration date on a route. --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -682,6 +682,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt, rt->rt6i_nsiblings = 0; if (!(iter->rt6i_flags & RTF_EXPIRES)) return -EEXIST; + iter->rt6i_flags |= rt->rt6i_flags & RTF_DYNAMIC; if (!(rt->rt6i_flags & RTF_EXPIRES)) rt6_clean_expires(iter); else I hope to have identified all possible site-effects later today. Thanks, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-12 19:20 ` Hannes Frederic Sowa @ 2013-07-12 21:48 ` Hannes Frederic Sowa 0 siblings, 0 replies; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-12 21:48 UTC (permalink / raw) To: Nicolas Dichtel, netdev, yoshfuji, petrus.lt, davem On Fri, Jul 12, 2013 at 09:20:23PM +0200, Hannes Frederic Sowa wrote: > --- a/net/ipv6/ip6_fib.c > +++ b/net/ipv6/ip6_fib.c > @@ -682,6 +682,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt, > rt->rt6i_nsiblings = 0; > if (!(iter->rt6i_flags & RTF_EXPIRES)) > return -EEXIST; > + iter->rt6i_flags |= rt->rt6i_flags & RTF_DYNAMIC; > if (!(rt->rt6i_flags & RTF_EXPIRES)) > rt6_clean_expires(iter); > else > This was not necessary and just complicated things because we would have to special-case the toggling of the RTF_DYNAMIC flag to also get removed from the ecmp route set. Thanks, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 7:54 ` Nicolas Dichtel 2013-07-10 9:28 ` Nicolas Dichtel @ 2013-07-10 11:15 ` Hannes Frederic Sowa 2013-07-10 11:40 ` Hannes Frederic Sowa 2013-07-10 12:08 ` Nicolas Dichtel 1 sibling, 2 replies; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-10 11:15 UTC (permalink / raw) To: Nicolas Dichtel; +Cc: netdev, yoshfuji, petrus.lt, davem On Wed, Jul 10, 2013 at 09:54:58AM +0200, Nicolas Dichtel wrote: > Le 09/07/2013 23:57, Hannes Frederic Sowa a écrit : > >Are we sure we decrement all sibling's rt6i_nsiblings? Shouldn't we > >start iterating from fn->leaf? But this does not seem to cause it, > >because my trace does not report any calls to fib6_del_route. > Note sure to follow you, but all siblings are listed in rt6i_siblings, so > it must be enough. My hunch was to iterate over fn->leaf->rt_next and compare the metrics like we do when adding a new route. Then take that rt6_info->rt6i_siblings list_head to iterate over the remaining siblings. But I did not review that part carefully, need to check later. > >You could try reproduce it by having an interface autoconfigured with > >a default router with NUD_VALID neighbour. I then added an unused vlan > >interface (vid 100 in my case) and added the following ip addresses: > > > >ip -6 a a 2001:ffff::1/64 dev eth0.100 > >ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31 > >nexthop via 2001:ffff::32 nexthop via 2001:ffff::33 > > > >(all nexthops should not be reachable) > > > >After starting a ping6 2000::1 the box should panic soon, after the > >first nexthop entry times out. > > > >Perhaps you could give me a hint? > I will run some tests with your patch. Will see. > > I assume you didn't reproduce this without your patch. Current kernel does not correctly select more specific routes, so these routes are not even tried and the logic should not be excercised. Ah, sorry, you should also compile your kernel without CONFIG_IPV6_ROUTER_PREF, too, if you try to reproduce it. Thanks for looking into this, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 11:15 ` Hannes Frederic Sowa @ 2013-07-10 11:40 ` Hannes Frederic Sowa 2013-07-10 12:08 ` Nicolas Dichtel 1 sibling, 0 replies; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-10 11:40 UTC (permalink / raw) To: Nicolas Dichtel, netdev, yoshfuji, petrus.lt, davem On Wed, Jul 10, 2013 at 01:15:04PM +0200, Hannes Frederic Sowa wrote: > On Wed, Jul 10, 2013 at 09:54:58AM +0200, Nicolas Dichtel wrote: > > Le 09/07/2013 23:57, Hannes Frederic Sowa a écrit : > > >Are we sure we decrement all sibling's rt6i_nsiblings? Shouldn't we > > >start iterating from fn->leaf? But this does not seem to cause it, > > >because my trace does not report any calls to fib6_del_route. > > Note sure to follow you, but all siblings are listed in rt6i_siblings, so > > it must be enough. > > My hunch was to iterate over fn->leaf->rt_next and compare the metrics like we > do when adding a new route. Then take that rt6_info->rt6i_siblings list_head > to iterate over the remaining siblings. But I did not review that part > carefully, need to check later. Just checked, it is fine. I was distracted by the way the initial rt6_sibling list was constructed. Thanks, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 11:15 ` Hannes Frederic Sowa 2013-07-10 11:40 ` Hannes Frederic Sowa @ 2013-07-10 12:08 ` Nicolas Dichtel 2013-07-10 13:17 ` Hannes Frederic Sowa 1 sibling, 1 reply; 30+ messages in thread From: Nicolas Dichtel @ 2013-07-10 12:08 UTC (permalink / raw) To: hannes; +Cc: netdev, yoshfuji, petrus.lt, davem Le 10/07/2013 13:15, Hannes Frederic Sowa a écrit : > On Wed, Jul 10, 2013 at 09:54:58AM +0200, Nicolas Dichtel wrote: >> Le 09/07/2013 23:57, Hannes Frederic Sowa a écrit : >>> Are we sure we decrement all sibling's rt6i_nsiblings? Shouldn't we >>> start iterating from fn->leaf? But this does not seem to cause it, >>> because my trace does not report any calls to fib6_del_route. >> Note sure to follow you, but all siblings are listed in rt6i_siblings, so >> it must be enough. > > My hunch was to iterate over fn->leaf->rt_next and compare the metrics like we > do when adding a new route. Then take that rt6_info->rt6i_siblings list_head > to iterate over the remaining siblings. But I did not review that part > carefully, need to check later. > >>> You could try reproduce it by having an interface autoconfigured with >>> a default router with NUD_VALID neighbour. I then added an unused vlan >>> interface (vid 100 in my case) and added the following ip addresses: >>> >>> ip -6 a a 2001:ffff::1/64 dev eth0.100 >>> ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31 >>> nexthop via 2001:ffff::32 nexthop via 2001:ffff::33 >>> >>> (all nexthops should not be reachable) >>> >>> After starting a ping6 2000::1 the box should panic soon, after the >>> first nexthop entry times out. >>> >>> Perhaps you could give me a hint? >> I will run some tests with your patch. Will see. >> >> I assume you didn't reproduce this without your patch. > > Current kernel does not correctly select more specific routes, so these routes > are not even tried and the logic should not be excercised. > > Ah, sorry, you should also compile your kernel without > CONFIG_IPV6_ROUTER_PREF, too, if you try to reproduce it. I've done this. My conf (eth1 autoconfigured, I use net-next + your patch): vconfig add eth1 100 ifconfig eth1.100 up ip -6 a a 2001:ffff::1/64 dev eth1.100 ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31 nexthop via 2001:ffff::32 nexthop via 2001:ffff::33 ping6 2000::1 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 12:08 ` Nicolas Dichtel @ 2013-07-10 13:17 ` Hannes Frederic Sowa 2013-07-10 13:49 ` Hannes Frederic Sowa 0 siblings, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-10 13:17 UTC (permalink / raw) To: Nicolas Dichtel; +Cc: netdev, yoshfuji, petrus.lt, davem On Wed, Jul 10, 2013 at 02:08:42PM +0200, Nicolas Dichtel wrote: > Le 10/07/2013 13:15, Hannes Frederic Sowa a écrit : > >On Wed, Jul 10, 2013 at 09:54:58AM +0200, Nicolas Dichtel wrote: > >>Le 09/07/2013 23:57, Hannes Frederic Sowa a écrit : > >>>Are we sure we decrement all sibling's rt6i_nsiblings? Shouldn't we > >>>start iterating from fn->leaf? But this does not seem to cause it, > >>>because my trace does not report any calls to fib6_del_route. > >>Note sure to follow you, but all siblings are listed in rt6i_siblings, so > >>it must be enough. > > > >My hunch was to iterate over fn->leaf->rt_next and compare the metrics > >like we > >do when adding a new route. Then take that rt6_info->rt6i_siblings > >list_head > >to iterate over the remaining siblings. But I did not review that part > >carefully, need to check later. > > > >>>You could try reproduce it by having an interface autoconfigured with > >>>a default router with NUD_VALID neighbour. I then added an unused vlan > >>>interface (vid 100 in my case) and added the following ip addresses: > >>> > >>>ip -6 a a 2001:ffff::1/64 dev eth0.100 > >>>ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31 > >>>nexthop via 2001:ffff::32 nexthop via 2001:ffff::33 > >>> > >>>(all nexthops should not be reachable) > >>> > >>>After starting a ping6 2000::1 the box should panic soon, after the > >>>first nexthop entry times out. > >>> > >>>Perhaps you could give me a hint? > >>I will run some tests with your patch. Will see. > >> > >>I assume you didn't reproduce this without your patch. > > > >Current kernel does not correctly select more specific routes, so these > >routes > >are not even tried and the logic should not be excercised. > > > >Ah, sorry, you should also compile your kernel without > >CONFIG_IPV6_ROUTER_PREF, too, if you try to reproduce it. > I've done this. > > My conf (eth1 autoconfigured, I use net-next + your patch): > vconfig add eth1 100 > ifconfig eth1.100 up > ip -6 a a 2001:ffff::1/64 dev eth1.100 > ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31 > nexthop via 2001:ffff::32 nexthop via 2001:ffff::33 > ping6 2000::1 Hm, I see. I suspect something with timing. I, too, use a net-next and have one function dump_route added and sprinkeld it at some points. When I copy&pasted your calls I could not reproduce it. After a reboot when just applying the commands from my history (which I did a lot faster), I got the panic again. I'll remove the dump_routes and recheck later. Thanks, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 13:17 ` Hannes Frederic Sowa @ 2013-07-10 13:49 ` Hannes Frederic Sowa 2013-07-10 14:30 ` Nicolas Dichtel 0 siblings, 1 reply; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-10 13:49 UTC (permalink / raw) To: Nicolas Dichtel, netdev, yoshfuji, petrus.lt, davem On Wed, Jul 10, 2013 at 03:17:41PM +0200, Hannes Frederic Sowa wrote: > On Wed, Jul 10, 2013 at 02:08:42PM +0200, Nicolas Dichtel wrote: > > Le 10/07/2013 13:15, Hannes Frederic Sowa a écrit : > > >On Wed, Jul 10, 2013 at 09:54:58AM +0200, Nicolas Dichtel wrote: > > >>Le 09/07/2013 23:57, Hannes Frederic Sowa a écrit : > > >>>Are we sure we decrement all sibling's rt6i_nsiblings? Shouldn't we > > >>>start iterating from fn->leaf? But this does not seem to cause it, > > >>>because my trace does not report any calls to fib6_del_route. > > >>Note sure to follow you, but all siblings are listed in rt6i_siblings, so > > >>it must be enough. > > > > > >My hunch was to iterate over fn->leaf->rt_next and compare the metrics > > >like we > > >do when adding a new route. Then take that rt6_info->rt6i_siblings > > >list_head > > >to iterate over the remaining siblings. But I did not review that part > > >carefully, need to check later. > > > > > >>>You could try reproduce it by having an interface autoconfigured with > > >>>a default router with NUD_VALID neighbour. I then added an unused vlan > > >>>interface (vid 100 in my case) and added the following ip addresses: > > >>> > > >>>ip -6 a a 2001:ffff::1/64 dev eth0.100 > > >>>ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31 > > >>>nexthop via 2001:ffff::32 nexthop via 2001:ffff::33 > > >>> > > >>>(all nexthops should not be reachable) > > >>> > > >>>After starting a ping6 2000::1 the box should panic soon, after the > > >>>first nexthop entry times out. > > >>> > > >>>Perhaps you could give me a hint? > > >>I will run some tests with your patch. Will see. > > >> > > >>I assume you didn't reproduce this without your patch. > > > > > >Current kernel does not correctly select more specific routes, so these > > >routes > > >are not even tried and the logic should not be excercised. > > > > > >Ah, sorry, you should also compile your kernel without > > >CONFIG_IPV6_ROUTER_PREF, too, if you try to reproduce it. > > I've done this. > > > > My conf (eth1 autoconfigured, I use net-next + your patch): > > vconfig add eth1 100 > > ifconfig eth1.100 up > > ip -6 a a 2001:ffff::1/64 dev eth1.100 > > ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31 > > nexthop via 2001:ffff::32 nexthop via 2001:ffff::33 > > ping6 2000::1 > > Hm, I see. I suspect something with timing. I, too, use a net-next and have > one function dump_route added and sprinkeld it at some points. > > When I copy&pasted your calls I could not reproduce it. After a reboot when > just applying the commands from my history (which I did a lot faster), I got > the panic again. > > I'll remove the dump_routes and recheck later. This patch ontop --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -46,6 +46,16 @@ #define RT6_TRACE(x...) do { ; } while (0) #endif +static void dump_route(struct rt6_info *rt, const char *prefix) +{ + u32 f = rt->rt6i_flags; + struct rt6key *k = &rt->rt6i_dst; + printk(KERN_INFO "%s: %p dst %pI6c plen %d gateway %pI6c, siblings %d, metric %d, expires %d gateway %d idev6 %p dev %p\n", prefix, + rt, &k->addr, k->plen, &rt->rt6i_gateway, rt->rt6i_nsiblings, rt->rt6i_metric, f&RTF_EXPIRES, f&RTF_GATEWAY, rt->rt6i_idev, rt->dst.dev); +} + + + static struct kmem_cache * fib6_node_kmem __read_mostly; enum fib_walk_state_t @@ -693,8 +703,11 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt, */ if (rt->rt6i_flags & RTF_GATEWAY && !(rt->rt6i_flags & RTF_EXPIRES) && - !(iter->rt6i_flags & RTF_EXPIRES)) + !(iter->rt6i_flags & RTF_EXPIRES)) { rt->rt6i_nsiblings++; + dump_route(rt, "(rt)"); + dump_route(iter, "(iter)"); + } } if (iter->rt6i_metric > rt->rt6i_metric) @@ -718,6 +731,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt, if (sibling->rt6i_metric == rt->rt6i_metric) { list_add_tail(&rt->rt6i_siblings, &sibling->rt6i_siblings); + dump_route(sibling, "(sibling)"); break; } sibling = sibling->dst.rt6_next; @@ -730,6 +744,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt, list_for_each_entry_safe(sibling, temp_sibling, &rt->rt6i_siblings, rt6i_siblings) { sibling->rt6i_nsiblings++; + dump_route(sibling, "(sibling increment)"); BUG_ON(sibling->rt6i_nsiblings != rt->rt6i_nsiblings); rt6i_nsiblings++; } produces this panic: [ 59.234779] (rt): ffff880113242000 dst 2000::1 plen 128 gateway 2001:ffff::33, siblings 1, metric 0, expires 0 gateway 2 idev6 ffff8801131ab000 dev ffff88011816d000 [ 59.243794] (iter): ffff880117e7b680 dst 2000::1 plen 128 gateway 2001:ffff::31, siblings 2, metric 0, expires 0 gateway 2 idev6 ffff8801131ab000 dev ffff88011816d000 [ 59.261383] (rt): ffff880113242000 dst 2000::1 plen 128 gateway 2001:ffff::33, siblings 2, metric 0, expires 0 gateway 2 idev6 ffff8801131ab000 dev ffff88011816d000 [ 59.270030] (iter): ffff880117e7bb00 dst 2000::1 plen 128 gateway 2001:ffff::32, siblings 2, metric 0, expires 0 gateway 2 idev6 ffff8801131ab000 dev ffff88011816d000 [ 59.291933] (sibling): ffff880117e62480 dst 2000::1 plen 128 gateway 2001:ffff::30, siblings 2, metric 0, expires 4194304 gateway 2 idev6 ffff8801131ab000 dev ffff88011816d000 [ 59.306893] (sibling increment): ffff880117e62480 dst 2000::1 plen 128 gateway 2001:ffff::30, siblings 3, metric 0, expires 4194304 gateway 2 idev6 ffff8801131ab000 dev ffff88011816d000 [ 59.318840] ------------[ cut here ]------------ [ 59.319780] kernel BUG at net/ipv6/ip6_fib.c:748! [ 59.319780] invalid opcode: 0000 [#1] SMP [ 59.319780] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc virtio_balloon snd_timer virtio_net snd i2c_piix4 soundcore i2c_core virtio_blk [ 59.319780] CPU: 0 PID: 784 Comm: ping6 Not tainted 3.10.0+ #154 [ 59.319780] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 59.319780] task: ffff880117e98000 ti: ffff8801131e4000 task.ti: ffff8801131e4000 [ 59.319780] RIP: 0010:[<ffffffff815f3ca6>] [<ffffffff815f3ca6>] fib6_add+0x826/0x900 [ 59.319780] RSP: 0018:ffff8801131e5788 EFLAGS: 00010202 [ 59.319780] RAX: 00000000000000ad RBX: ffff880117e7b680 RCX: ffff88011fc0faa8 [ 59.319780] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000246 [ 59.319780] RBP: ffff8801131e5848 R08: ffffffff81ce7f00 R09: 0000000000000244 [ 59.319780] R10: 0000000000000000 R11: 0000000000000243 R12: ffff880117e62480 [ 59.319780] R13: ffff880117e7bb90 R14: 0000000000000000 R15: ffff880113242000 [ 59.319780] FS: 00007f83b6958740(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000 [ 59.319780] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 59.319780] CR2: 00007f83b59e5580 CR3: 00000001187d1000 CR4: 00000000000006f0 [ 59.319780] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 59.319780] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 59.319780] Stack: [ 59.319780] 0000000000000000 0000000100000000 ffff880113035840 ffff880113035840 [ 59.319780] ffff8801131e5890 0000000000000000 ffff880119b80a00 0000000000000000 [ 59.319780] ffff8801131e5818 ffffffff81180ca3 ffff8801131e58e8 ffffffff8109bf04 [ 59.319780] Call Trace: [ 59.319780] [<ffffffff81180ca3>] ? kmem_cache_alloc+0x1a3/0x1f0 [ 59.319780] [<ffffffff8109bf04>] ? load_balance+0xf4/0x7f0 [ 59.319780] [<ffffffff815eeceb>] ? rt6_bind_peer+0x4b/0x90 [ 59.319780] [<ffffffff815ed985>] __ip6_ins_rt+0x45/0x70 [ 59.319780] [<ffffffff815eee35>] ip6_ins_rt+0x35/0x40 [ 59.319780] [<ffffffff815ef1e4>] ip6_pol_route.isra.44+0x3a4/0x4b0 [ 59.319780] [<ffffffff815ef34a>] ip6_pol_route_output+0x2a/0x30 [ 59.319780] [<ffffffff816161c7>] fib6_rule_action+0xd7/0x210 [ 59.319780] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 [ 59.319780] [<ffffffff8106cd8f>] ? try_to_del_timer_sync+0x4f/0x70 [ 59.319780] [<ffffffff81553026>] fib_rules_lookup+0xc6/0x140 [ 59.319780] [<ffffffff816164c4>] fib6_rule_lookup+0x44/0x80 [ 59.319780] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 [ 59.319780] [<ffffffff815edea3>] ip6_route_output+0x73/0xb0 [ 59.319780] [<ffffffff815dfdf3>] ip6_dst_lookup_tail+0x2c3/0x2e0 [ 59.319780] [<ffffffff81095838>] ? __enqueue_entity+0x78/0x80 [ 59.319780] [<ffffffff815dfe4d>] ip6_dst_lookup_flow+0x3d/0xa0 [ 59.319780] [<ffffffff815fdbc7>] rawv6_sendmsg+0x267/0xc20 [ 59.319780] [<ffffffff815a8a83>] inet_sendmsg+0x63/0xb0 [ 59.319780] [<ffffffff8128eb93>] ? selinux_socket_sendmsg+0x23/0x30 [ 59.319780] [<ffffffff815218d6>] sock_sendmsg+0xa6/0xd0 [ 59.319780] [<ffffffff81524a68>] SYSC_sendto+0x128/0x180 [ 59.319780] [<ffffffff8109825c>] ? update_curr+0xec/0x170 [ 59.319780] [<ffffffff81041d09>] ? kvm_clock_get_cycles+0x9/0x10 [ 59.319780] [<ffffffff810afd1e>] ? __getnstimeofday+0x3e/0xd0 [ 59.319780] [<ffffffff8152509e>] SyS_sendto+0xe/0x10 [ 59.319780] [<ffffffff8164f159>] system_call_fastpath+0x16/0x1b [ 59.319780] Code: 06 0f 85 c6 fe ff ff 48 8b 95 60 ff ff ff 48 89 c6 48 8b 7a 08 e8 cb ee ff ff e9 ae fe ff ff 48 8b 82 28 05 00 00 e9 fc fe ff ff <0f> 0b 49 8b 57 30 0d 00 00 40 00 bb ef ff ff ff 41 89 86 14 01 [ 59.319780] RIP [<ffffffff815f3ca6>] fib6_add+0x826/0x900 [ 59.319780] RSP <ffff8801131e5788> [ 59.506210] ---[ end trace 3ade307f40880be9 ]--- [ 59.507503] Kernel panic - not syncing: Fatal exception in interrupt I can also reproduce it without this debugging diff: git log --oneline HEAD^^^.. 155b81f ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF c7e8e8a bridge: fix some kernel warning in multicast timer 734d4e1 sfc: Fix memory leak when discarding scattered packets (net-next a few days old) Thanks, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 13:49 ` Hannes Frederic Sowa @ 2013-07-10 14:30 ` Nicolas Dichtel 2013-07-10 14:34 ` Hannes Frederic Sowa 0 siblings, 1 reply; 30+ messages in thread From: Nicolas Dichtel @ 2013-07-10 14:30 UTC (permalink / raw) To: hannes; +Cc: netdev, yoshfuji, petrus.lt, davem Le 10/07/2013 15:49, Hannes Frederic Sowa a écrit : > On Wed, Jul 10, 2013 at 03:17:41PM +0200, Hannes Frederic Sowa wrote: >> On Wed, Jul 10, 2013 at 02:08:42PM +0200, Nicolas Dichtel wrote: >>> Le 10/07/2013 13:15, Hannes Frederic Sowa a écrit : >>>> On Wed, Jul 10, 2013 at 09:54:58AM +0200, Nicolas Dichtel wrote: >>>>> Le 09/07/2013 23:57, Hannes Frederic Sowa a écrit : >>>>>> Are we sure we decrement all sibling's rt6i_nsiblings? Shouldn't we >>>>>> start iterating from fn->leaf? But this does not seem to cause it, >>>>>> because my trace does not report any calls to fib6_del_route. >>>>> Note sure to follow you, but all siblings are listed in rt6i_siblings, so >>>>> it must be enough. >>>> >>>> My hunch was to iterate over fn->leaf->rt_next and compare the metrics >>>> like we >>>> do when adding a new route. Then take that rt6_info->rt6i_siblings >>>> list_head >>>> to iterate over the remaining siblings. But I did not review that part >>>> carefully, need to check later. >>>> >>>>>> You could try reproduce it by having an interface autoconfigured with >>>>>> a default router with NUD_VALID neighbour. I then added an unused vlan >>>>>> interface (vid 100 in my case) and added the following ip addresses: >>>>>> >>>>>> ip -6 a a 2001:ffff::1/64 dev eth0.100 >>>>>> ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31 >>>>>> nexthop via 2001:ffff::32 nexthop via 2001:ffff::33 >>>>>> >>>>>> (all nexthops should not be reachable) >>>>>> >>>>>> After starting a ping6 2000::1 the box should panic soon, after the >>>>>> first nexthop entry times out. >>>>>> >>>>>> Perhaps you could give me a hint? >>>>> I will run some tests with your patch. Will see. >>>>> >>>>> I assume you didn't reproduce this without your patch. >>>> >>>> Current kernel does not correctly select more specific routes, so these >>>> routes >>>> are not even tried and the logic should not be excercised. >>>> >>>> Ah, sorry, you should also compile your kernel without >>>> CONFIG_IPV6_ROUTER_PREF, too, if you try to reproduce it. >>> I've done this. >>> >>> My conf (eth1 autoconfigured, I use net-next + your patch): >>> vconfig add eth1 100 >>> ifconfig eth1.100 up >>> ip -6 a a 2001:ffff::1/64 dev eth1.100 >>> ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31 >>> nexthop via 2001:ffff::32 nexthop via 2001:ffff::33 >>> ping6 2000::1 >> >> Hm, I see. I suspect something with timing. I, too, use a net-next and have >> one function dump_route added and sprinkeld it at some points. >> >> When I copy&pasted your calls I could not reproduce it. After a reboot when >> just applying the commands from my history (which I did a lot faster), I got >> the panic again. >> >> I'll remove the dump_routes and recheck later. > > This patch ontop > > --- a/net/ipv6/ip6_fib.c > +++ b/net/ipv6/ip6_fib.c > @@ -46,6 +46,16 @@ > #define RT6_TRACE(x...) do { ; } while (0) > #endif > > +static void dump_route(struct rt6_info *rt, const char *prefix) > +{ > + u32 f = rt->rt6i_flags; > + struct rt6key *k = &rt->rt6i_dst; > + printk(KERN_INFO "%s: %p dst %pI6c plen %d gateway %pI6c, siblings %d, metric %d, expires %d gateway %d idev6 %p dev %p\n", prefix, > + rt, &k->addr, k->plen, &rt->rt6i_gateway, rt->rt6i_nsiblings, rt->rt6i_metric, f&RTF_EXPIRES, f&RTF_GATEWAY, rt->rt6i_idev, rt->dst.dev); > +} > + > + > + > static struct kmem_cache * fib6_node_kmem __read_mostly; > > enum fib_walk_state_t > @@ -693,8 +703,11 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt, > */ > if (rt->rt6i_flags & RTF_GATEWAY && > !(rt->rt6i_flags & RTF_EXPIRES) && > - !(iter->rt6i_flags & RTF_EXPIRES)) > + !(iter->rt6i_flags & RTF_EXPIRES)) { > rt->rt6i_nsiblings++; > + dump_route(rt, "(rt)"); > + dump_route(iter, "(iter)"); > + } > } > > if (iter->rt6i_metric > rt->rt6i_metric) > @@ -718,6 +731,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt, > if (sibling->rt6i_metric == rt->rt6i_metric) { > list_add_tail(&rt->rt6i_siblings, > &sibling->rt6i_siblings); > + dump_route(sibling, "(sibling)"); > break; > } > sibling = sibling->dst.rt6_next; > @@ -730,6 +744,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt, > list_for_each_entry_safe(sibling, temp_sibling, > &rt->rt6i_siblings, rt6i_siblings) { > sibling->rt6i_nsiblings++; > + dump_route(sibling, "(sibling increment)"); > BUG_ON(sibling->rt6i_nsiblings != rt->rt6i_nsiblings); > rt6i_nsiblings++; > } > > produces this panic: > > [ 59.234779] (rt): ffff880113242000 dst 2000::1 plen 128 gateway 2001:ffff::33, siblings 1, metric 0, expires 0 gateway 2 idev6 ffff8801131ab000 dev ffff88011816d000 > [ 59.243794] (iter): ffff880117e7b680 dst 2000::1 plen 128 gateway 2001:ffff::31, siblings 2, metric 0, expires 0 gateway 2 idev6 ffff8801131ab000 dev ffff88011816d000 > [ 59.261383] (rt): ffff880113242000 dst 2000::1 plen 128 gateway 2001:ffff::33, siblings 2, metric 0, expires 0 gateway 2 idev6 ffff8801131ab000 dev ffff88011816d000 > [ 59.270030] (iter): ffff880117e7bb00 dst 2000::1 plen 128 gateway 2001:ffff::32, siblings 2, metric 0, expires 0 gateway 2 idev6 ffff8801131ab000 dev ffff88011816d000 > [ 59.291933] (sibling): ffff880117e62480 dst 2000::1 plen 128 gateway 2001:ffff::30, siblings 2, metric 0, expires 4194304 gateway 2 idev6 ffff8801131ab000 dev ffff88011816d000 > [ 59.306893] (sibling increment): ffff880117e62480 dst 2000::1 plen 128 gateway 2001:ffff::30, siblings 3, metric 0, expires 4194304 gateway 2 idev6 ffff8801131ab000 dev ffff88011816d000 I don't have the same output: [ 97.945170] (rt): f1a02d80 dst 2000:: plen 3 gateway 2001:660:1234:5678::31, siblings 1, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.948117] (iter): f1a02e80 dst 2000:: plen 3 gateway 2001:660:1234:5678::30, siblings 0, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.951207] (sibling): f1a02e80 dst 2000:: plen 3 gateway 2001:660:1234:5678::30, siblings 0, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.954272] (sibling increment): f1a02e80 dst 2000:: plen 3 gateway 2001:660:1234:5678::30, siblings 1, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.957545] (rt): f1a02c80 dst 2000:: plen 3 gateway 2001:660:1234:5678::32, siblings 1, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.960376] (iter): f1a02e80 dst 2000:: plen 3 gateway 2001:660:1234:5678::30, siblings 1, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.961902] (rt): f1a02c80 dst 2000:: plen 3 gateway 2001:660:1234:5678::32, siblings 2, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.963095] (iter): f1a02d80 dst 2000:: plen 3 gateway 2001:660:1234:5678::31, siblings 1, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.964354] (sibling): f1a02e80 dst 2000:: plen 3 gateway 2001:660:1234:5678::30, siblings 1, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.965604] (sibling increment): f1a02e80 dst 2000:: plen 3 gateway 2001:660:1234:5678::30, siblings 2, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.966916] (sibling increment): f1a02d80 dst 2000:: plen 3 gateway 2001:660:1234:5678::31, siblings 2, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.968254] (rt): f1a02b80 dst 2000:: plen 3 gateway 2001:660:1234:5678::33, siblings 1, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.969467] (iter): f1a02e80 dst 2000:: plen 3 gateway 2001:660:1234:5678::30, siblings 2, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.970702] (rt): f1a02b80 dst 2000:: plen 3 gateway 2001:660:1234:5678::33, siblings 2, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.971895] (iter): f1a02d80 dst 2000:: plen 3 gateway 2001:660:1234:5678::31, siblings 2, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.973137] (rt): f1a02b80 dst 2000:: plen 3 gateway 2001:660:1234:5678::33, siblings 3, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.974331] (iter): f1a02c80 dst 2000:: plen 3 gateway 2001:660:1234:5678::32, siblings 2, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.975542] (sibling): f1a02e80 dst 2000:: plen 3 gateway 2001:660:1234:5678::30, siblings 2, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.976808] (sibling increment): f1a02e80 dst 2000:: plen 3 gateway 2001:660:1234:5678::30, siblings 3, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.978126] (sibling increment): f1a02d80 dst 2000:: plen 3 gateway 2001:660:1234:5678::31, siblings 3, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 [ 97.979453] (sibling increment): f1a02c80 dst 2000:: plen 3 gateway 2001:660:1234:5678::32, siblings 3, metric 1024, expires 0 gateway 2 idev6 f7507e00 dev f7793000 Can you send me the output of: ip -6 r ip -6 a ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF 2013-07-10 14:30 ` Nicolas Dichtel @ 2013-07-10 14:34 ` Hannes Frederic Sowa 0 siblings, 0 replies; 30+ messages in thread From: Hannes Frederic Sowa @ 2013-07-10 14:34 UTC (permalink / raw) To: Nicolas Dichtel; +Cc: netdev, yoshfuji, petrus.lt, davem On Wed, Jul 10, 2013 at 04:30:35PM +0200, Nicolas Dichtel wrote: > Le 10/07/2013 15:49, Hannes Frederic Sowa a écrit : > >On Wed, Jul 10, 2013 at 03:17:41PM +0200, Hannes Frederic Sowa wrote: > >>On Wed, Jul 10, 2013 at 02:08:42PM +0200, Nicolas Dichtel wrote: > >>>Le 10/07/2013 13:15, Hannes Frederic Sowa a écrit : > >>>>On Wed, Jul 10, 2013 at 09:54:58AM +0200, Nicolas Dichtel wrote: > >>>>>Le 09/07/2013 23:57, Hannes Frederic Sowa a écrit : > >>>>>>Are we sure we decrement all sibling's rt6i_nsiblings? Shouldn't we > >>>>>>start iterating from fn->leaf? But this does not seem to cause it, > >>>>>>because my trace does not report any calls to fib6_del_route. > >>>>>Note sure to follow you, but all siblings are listed in rt6i_siblings, > >>>>>so > >>>>>it must be enough. > >>>> > >>>>My hunch was to iterate over fn->leaf->rt_next and compare the metrics > >>>>like we > >>>>do when adding a new route. Then take that rt6_info->rt6i_siblings > >>>>list_head > >>>>to iterate over the remaining siblings. But I did not review that part > >>>>carefully, need to check later. > >>>> > >>>>>>You could try reproduce it by having an interface autoconfigured with > >>>>>>a default router with NUD_VALID neighbour. I then added an unused vlan > >>>>>>interface (vid 100 in my case) and added the following ip addresses: > >>>>>> > >>>>>>ip -6 a a 2001:ffff::1/64 dev eth0.100 > >>>>>>ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31 > >>>>>>nexthop via 2001:ffff::32 nexthop via 2001:ffff::33 > >>>>>> > >>>>>>(all nexthops should not be reachable) > >>>>>> > >>>>>>After starting a ping6 2000::1 the box should panic soon, after the > >>>>>>first nexthop entry times out. > >>>>>> > >>>>>>Perhaps you could give me a hint? > >>>>>I will run some tests with your patch. Will see. > >>>>> > >>>>>I assume you didn't reproduce this without your patch. > >>>> > >>>>Current kernel does not correctly select more specific routes, so these > >>>>routes > >>>>are not even tried and the logic should not be excercised. > >>>> > >>>>Ah, sorry, you should also compile your kernel without > >>>>CONFIG_IPV6_ROUTER_PREF, too, if you try to reproduce it. > >>>I've done this. > >>> > >>>My conf (eth1 autoconfigured, I use net-next + your patch): > >>>vconfig add eth1 100 > >>>ifconfig eth1.100 up > >>>ip -6 a a 2001:ffff::1/64 dev eth1.100 > >>>ip -6 r a 2000::/3 nexthop via 2001:ffff::30 nexthop via 2001:ffff::31 > >>>nexthop via 2001:ffff::32 nexthop via 2001:ffff::33 > >>>ping6 2000::1 > >> > >>Hm, I see. I suspect something with timing. I, too, use a net-next and > >>have > >>one function dump_route added and sprinkeld it at some points. > >> > >>When I copy&pasted your calls I could not reproduce it. After a reboot > >>when > >>just applying the commands from my history (which I did a lot faster), I > >>got > >>the panic again. > >> > >>I'll remove the dump_routes and recheck later. > > > >This patch ontop > > > >--- a/net/ipv6/ip6_fib.c > >+++ b/net/ipv6/ip6_fib.c > >@@ -46,6 +46,16 @@ > > #define RT6_TRACE(x...) do { ; } while (0) > > #endif > > > >+static void dump_route(struct rt6_info *rt, const char *prefix) > >+{ > >+ u32 f = rt->rt6i_flags; > >+ struct rt6key *k = &rt->rt6i_dst; > >+ printk(KERN_INFO "%s: %p dst %pI6c plen %d gateway %pI6c, siblings > >%d, metric %d, expires %d gateway %d idev6 %p dev %p\n", prefix, > >+ rt, &k->addr, k->plen, &rt->rt6i_gateway, > >rt->rt6i_nsiblings, rt->rt6i_metric, f&RTF_EXPIRES, f&RTF_GATEWAY, > >rt->rt6i_idev, rt->dst.dev); > >+} > >+ > >+ > >+ > > static struct kmem_cache * fib6_node_kmem __read_mostly; > > > > enum fib_walk_state_t > >@@ -693,8 +703,11 @@ static int fib6_add_rt2node(struct fib6_node *fn, > >struct rt6_info *rt, > > */ > > if (rt->rt6i_flags & RTF_GATEWAY && > > !(rt->rt6i_flags & RTF_EXPIRES) && > >- !(iter->rt6i_flags & RTF_EXPIRES)) > >+ !(iter->rt6i_flags & RTF_EXPIRES)) { > > rt->rt6i_nsiblings++; > >+ dump_route(rt, "(rt)"); > >+ dump_route(iter, "(iter)"); > >+ } > > } > > > > if (iter->rt6i_metric > rt->rt6i_metric) > >@@ -718,6 +731,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, > >struct rt6_info *rt, > > if (sibling->rt6i_metric == rt->rt6i_metric) { > > list_add_tail(&rt->rt6i_siblings, > > &sibling->rt6i_siblings); > >+ dump_route(sibling, "(sibling)"); > > break; > > } > > sibling = sibling->dst.rt6_next; > >@@ -730,6 +744,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, > >struct rt6_info *rt, > > list_for_each_entry_safe(sibling, temp_sibling, > > &rt->rt6i_siblings, > > rt6i_siblings) { > > sibling->rt6i_nsiblings++; > >+ dump_route(sibling, "(sibling increment)"); > > BUG_ON(sibling->rt6i_nsiblings != > > rt->rt6i_nsiblings); > > rt6i_nsiblings++; > > } > > > >produces this panic: > > > >[ 59.234779] (rt): ffff880113242000 dst 2000::1 plen 128 gateway > >2001:ffff::33, siblings 1, metric 0, expires 0 gateway 2 idev6 > >ffff8801131ab000 dev ffff88011816d000 > >[ 59.243794] (iter): ffff880117e7b680 dst 2000::1 plen 128 gateway > >2001:ffff::31, siblings 2, metric 0, expires 0 gateway 2 idev6 > >ffff8801131ab000 dev ffff88011816d000 > >[ 59.261383] (rt): ffff880113242000 dst 2000::1 plen 128 gateway > >2001:ffff::33, siblings 2, metric 0, expires 0 gateway 2 idev6 > >ffff8801131ab000 dev ffff88011816d000 > >[ 59.270030] (iter): ffff880117e7bb00 dst 2000::1 plen 128 gateway > >2001:ffff::32, siblings 2, metric 0, expires 0 gateway 2 idev6 > >ffff8801131ab000 dev ffff88011816d000 > >[ 59.291933] (sibling): ffff880117e62480 dst 2000::1 plen 128 gateway > >2001:ffff::30, siblings 2, metric 0, expires 4194304 gateway 2 idev6 > >ffff8801131ab000 dev ffff88011816d000 > >[ 59.306893] (sibling increment): ffff880117e62480 dst 2000::1 plen 128 > >gateway 2001:ffff::30, siblings 3, metric 0, expires 4194304 gateway 2 > >idev6 ffff8801131ab000 dev ffff88011816d000 > I don't have the same output: > [ 97.945170] (rt): f1a02d80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::31, siblings 1, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.948117] (iter): f1a02e80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::30, siblings 0, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.951207] (sibling): f1a02e80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::30, siblings 0, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.954272] (sibling increment): f1a02e80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::30, siblings 1, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.957545] (rt): f1a02c80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::32, siblings 1, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.960376] (iter): f1a02e80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::30, siblings 1, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.961902] (rt): f1a02c80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::32, siblings 2, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.963095] (iter): f1a02d80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::31, siblings 1, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.964354] (sibling): f1a02e80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::30, siblings 1, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.965604] (sibling increment): f1a02e80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::30, siblings 2, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.966916] (sibling increment): f1a02d80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::31, siblings 2, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.968254] (rt): f1a02b80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::33, siblings 1, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.969467] (iter): f1a02e80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::30, siblings 2, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.970702] (rt): f1a02b80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::33, siblings 2, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.971895] (iter): f1a02d80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::31, siblings 2, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.973137] (rt): f1a02b80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::33, siblings 3, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.974331] (iter): f1a02c80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::32, siblings 2, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.975542] (sibling): f1a02e80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::30, siblings 2, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.976808] (sibling increment): f1a02e80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::30, siblings 3, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.978126] (sibling increment): f1a02d80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::31, siblings 3, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > [ 97.979453] (sibling increment): f1a02c80 dst 2000:: plen 3 gateway > 2001:660:1234:5678::32, siblings 3, metric 1024, expires 0 gateway 2 idev6 > f7507e00 dev f7793000 > > Can you send me the output of: > ip -6 r > ip -6 a > Of course: ip -6 r: 2001:700:1300:feed::134 via 2001:ffff::33 dev eth0.100 metric 0 cache expires -7sec 2001:700:1300:feed::134 via fe80::5054:ff:fe82:e153 dev eth0 metric 0 cache 2001:db8:ee8c:180::/64 dev eth0 proto kernel metric 256 expires 86353sec 2001:ffff::/64 dev eth0.100 proto kernel metric 256 2000::/3 via 2001:ffff::30 dev eth0.100 metric 1024 2000::/3 via 2001:ffff::31 dev eth0.100 metric 1024 2000::/3 via 2001:ffff::32 dev eth0.100 metric 1024 2000::/3 via 2001:ffff::33 dev eth0.100 metric 1024 fe80::/64 dev eth0 proto kernel metric 256 fe80::/64 dev eth0.100 proto kernel metric 256 default via fe80::5054:ff:fe82:e153 dev eth0 proto ra metric 1024 expires 1753sec ip -6 a: 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000 inet6 2001:db8:ee8c:180:5054:ff:fe53:c6c9/64 scope global dynamic valid_lft 86392sec preferred_lft 14392sec inet6 fe80::5054:ff:fe53:c6c9/64 scope link valid_lft forever preferred_lft forever 3: eth0.100@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 inet6 2001:ffff::1/64 scope global valid_lft forever preferred_lft forever inet6 fe80::5054:ff:fe53:c6c9/64 scope link valid_lft forever preferred_lft forever Thanks, Hannes ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2013-07-12 21:48 UTC | newest] Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-07-07 17:30 [PATCH RFC] ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF Hannes Frederic Sowa 2013-07-09 21:57 ` Hannes Frederic Sowa 2013-07-10 7:54 ` Nicolas Dichtel 2013-07-10 9:28 ` Nicolas Dichtel 2013-07-10 10:53 ` Hannes Frederic Sowa 2013-07-10 12:22 ` Nicolas Dichtel 2013-07-10 13:21 ` Hannes Frederic Sowa 2013-07-10 14:10 ` Nicolas Dichtel 2013-07-10 15:20 ` Hannes Frederic Sowa 2013-07-10 15:59 ` Hannes Frederic Sowa 2013-07-10 16:35 ` Hannes Frederic Sowa 2013-07-11 8:07 ` Nicolas Dichtel 2013-07-10 21:21 ` Hannes Frederic Sowa 2013-07-11 8:04 ` Nicolas Dichtel 2013-07-11 10:24 ` Hannes Frederic Sowa 2013-07-11 14:46 ` Hannes Frederic Sowa 2013-07-11 14:57 ` Nicolas Dichtel 2013-07-12 8:51 ` Hannes Frederic Sowa 2013-07-12 12:04 ` Nicolas Dichtel 2013-07-12 16:19 ` Hannes Frederic Sowa 2013-07-12 19:01 ` Nicolas Dichtel 2013-07-12 19:20 ` Hannes Frederic Sowa 2013-07-12 21:48 ` Hannes Frederic Sowa 2013-07-10 11:15 ` Hannes Frederic Sowa 2013-07-10 11:40 ` Hannes Frederic Sowa 2013-07-10 12:08 ` Nicolas Dichtel 2013-07-10 13:17 ` Hannes Frederic Sowa 2013-07-10 13:49 ` Hannes Frederic Sowa 2013-07-10 14:30 ` Nicolas Dichtel 2013-07-10 14:34 ` Hannes Frederic Sowa
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).