* icmpv6: issue with routing table entries from link local addresses
From: Andreas Hübner @ 2016-09-12 14:27 UTC
  To: netdev

Hi,

I'm currently debugging a potential issue with the icmpv6 stack and
hopefully this is the correct place to ask. (Was actually looking for a
more specific list, but didn't find anything. Please point me to a more
appropriate list if this is out of place here.)

I have the following setup:
  - 2 directly connected hosts (A+B), both have only link local addresses
    configured (interface on both hosts is eth0)
  - host B is also connected to another host C (via interface eth1)
  - main routing table (relevant part) on host B looks like this:

      fe80::/64 dev eth1  proto kernel  metric 256
      fe80::/64 dev eth0  proto kernel  metric 256

  - host A is trying to ICMPv6 ping the link local address of host B

The issue I currently have is, that the echo reply that host B should
generate is never sent back to host A. If I change the order of the
routing table entries on host B, everything works fine.
(host A is connected on eth0)

I'm wondering, if this is how it is supposed to work. Do we need to do a
routing table lookup when generating an ICMPv6 echo reply for link local
addresses?  (From my understanding, this is not done in the neighbour
discovery stack, so why here?)

Actually, I'm convinced I must be doing something wrong here. The setup
for the issue is quite trivial, someone would have tripped over it
already. The only condition is that one host has multiple interfaces
with ipv6 enabled.

Any help in shedding some light onto this issue would be appreciated.


Thanks,
Andreas

* Re: icmpv6: issue with routing table entries from link local addresses
From: Hannes Frederic Sowa @ 2016-09-12 17:26 UTC
  To: Andreas Hübner, netdev, d. caratti

Hello,

On 12.09.2016 16:27, Andreas Hübner wrote:
> Hi,
> 
> I'm currently debugging a potential issue with the icmpv6 stack and
> hopefully this is the correct place to ask. (Was actually looking for a
> more specific list, but didn't find anything. Please point me to a more
> appropriate list if this is out of place here.)
> 
> I have the following setup:
>   - 2 directly connected hosts (A+B), both have only link local addresses
>     configured (interface on both hosts is eth0)
>   - host B is also connected to another host C (via interface eth1)
>   - main routing table (relevant part) on host B looks like this:
> 
>       fe80::/64 dev eth1  proto kernel  metric 256
>       fe80::/64 dev eth0  proto kernel  metric 256
> 
>   - host A is trying to ICMPv6 ping the link local address of host B
> 
> The issue I currently have is, that the echo reply that host B should
> generate is never sent back to host A. If I change the order of the
> routing table entries on host B, everything works fine.
> (host A is connected on eth0)
> 
> I'm wondering, if this is how it is supposed to work. Do we need to do a
> routing table lookup when generating an ICMPv6 echo reply for link local
> addresses?  (From my understanding, this is not done in the neighbour
> discovery stack, so why here?)

For global addresses this is necessary as asymmetric routing could be
involved and we don't want to treat ping echoes in any way special.

> Actually, I'm convinced I must be doing something wrong here. The setup
> for the issue is quite trivial, someone would have tripped over it
> already. The only condition is that one host has multiple interfaces
> with ipv6 enabled.
> 
> Any help in shedding some light onto this issue would be appreciated.

This shouldn't be the case. We certainly carry over the ifindex of the
received packet into the routing lookup of the outgoing packet, thus the
appropriate rule, with outgoing ifindex should be selected.

I also couldn't reproduce your problem here with my system. Can you
verify with tcpdump that the packet is leaving on another interface?

Thanks,
Hannes

* Re: icmpv6: issue with routing table entries from link local addresses
From: David Ahern @ 2016-09-12 19:17 UTC
  To: Hannes Frederic Sowa, Andreas Hübner, netdev, d. caratti

On 9/12/16 11:26 AM, Hannes Frederic Sowa wrote:
> Hello,
> 
> On 12.09.2016 16:27, Andreas Hübner wrote:
>> Hi,
>>
>> I'm currently debugging a potential issue with the icmpv6 stack and
>> hopefully this is the correct place to ask. (Was actually looking for a
>> more specific list, but didn't find anything. Please point me to a more
>> appropriate list if this is out of place here.)
>>
>> I have the following setup:
>>   - 2 directly connected hosts (A+B), both have only link local addresses
>>     configured (interface on both hosts is eth0)
>>   - host B is also connected to another host C (via interface eth1)
>>   - main routing table (relevant part) on host B looks like this:
>>
>>       fe80::/64 dev eth1  proto kernel  metric 256
>>       fe80::/64 dev eth0  proto kernel  metric 256
>>
>>   - host A is trying to ICMPv6 ping the link local address of host B
>>
>> The issue I currently have is, that the echo reply that host B should
>> generate is never sent back to host A. If I change the order of the
>> routing table entries on host B, everything works fine.
>> (host A is connected on eth0)
>>
>> I'm wondering, if this is how it is supposed to work. Do we need to do a
>> routing table lookup when generating an ICMPv6 echo reply for link local
>> addresses?  (From my understanding, this is not done in the neighbour
>> discovery stack, so why here?)
> 
> For global addresses this is necessary as asymmetric routing could be
> involved and we don't want to treat ping echoes in any way special.
> 
>> Actually, I'm convinced I must be doing something wrong here. The setup
>> for the issue is quite trivial, someone would have tripped over it
>> already. The only condition is that one host has multiple interfaces
>> with ipv6 enabled.
>>
>> Any help in shedding some light onto this issue would be appreciated.
> 
> This shouldn't be the case. We certainly carry over the ifindex of the
> received packet into the routing lookup of the outgoing packet, thus the
> appropriate rule, with outgoing ifindex should be selected.
> 
> I also couldn't reproduce your problem here with my system. Can you
> verify with tcpdump that the packet is leaving on another interface?

v4.4 and on there are fib6 tracepoints that show the lookup result. May provide some insights.

perf record -a -e fib6:* 
perf script

* Re: icmpv6: issue with routing table entries from link local addresses
From: Sowmini Varadhan @ 2016-09-13  2:03 UTC
  To: Hannes Frederic Sowa
  Cc: Andreas Hübner, netdev, d. caratti, Sowmini Varadhan

On Mon, Sep 12, 2016 at 1:26 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hello,
>
> On 12.09.2016 16:27, Andreas Hübner wrote:

>>
>> I have the following setup:
>>   - 2 directly connected hosts (A+B), both have only link local addresses
>>     configured (interface on both hosts is eth0)
           :
>>       fe80::/64 dev eth1  proto kernel  metric 256
>>       fe80::/64 dev eth0  proto kernel  metric 256
           :
>> The issue I currently have is, that the echo reply that host B should
>> generate is never sent back to host A. If I change the order of the
>> routing table entries on host B, everything works fine.
>> (host A is connected on eth0)
   :
> This shouldn't be the case. We certainly carry over the ifindex of the
> received packet into the routing lookup of the outgoing packet, thus the
> appropriate rule, with outgoing ifindex should be selected.

Like Hannes,  I too would first check "is B sending out the echo-resp? on
which interface?".

But a couple of unexpected things I noticed in linux: the link-local
prefix should have a prefixlen of /10 according to
http://www.iana.org/assignments/ipv6-address-space/ipv6-address-space.xhtml
but "ip -6 route show"  lists this as a /64..

moreover, even though I cannot use "ip [-6] route add.." to add the
same prefix multiple times (with different nexthop and/or interface)
unless I explicitly mark them as ECMP with /sbin/ip, it seems like you
can create the same onlink prefix on multiple interfaces, but the
kernel will not treat this as an ECMP group (and sometimes this
can produce inconsistent results depending on the order of
route addition, e.g., for ipv4 rp_filter checks). I don't know if some
variant of this (latter observation) may be the reason for the behavior
that Andreas reports.

--Sowmini

* Re: icmpv6: issue with routing table entries from link local addresses
From: Hannes Frederic Sowa @ 2016-09-13  2:42 UTC
  To: Sowmini Varadhan
  Cc: Andreas Hübner, netdev, d. caratti, Sowmini Varadhan

On 13.09.2016 04:03, Sowmini Varadhan wrote:
> On Mon, Sep 12, 2016 at 1:26 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> Hello,
>>
>> On 12.09.2016 16:27, Andreas Hübner wrote:
> 
>>>
>>> I have the following setup:
>>>   - 2 directly connected hosts (A+B), both have only link local addresses
>>>     configured (interface on both hosts is eth0)
>            :
>>>       fe80::/64 dev eth1  proto kernel  metric 256
>>>       fe80::/64 dev eth0  proto kernel  metric 256
>            :
>>> The issue I currently have is, that the echo reply that host B should
>>> generate is never sent back to host A. If I change the order of the
>>> routing table entries on host B, everything works fine.
>>> (host A is connected on eth0)
>    :
>> This shouldn't be the case. We certainly carry over the ifindex of the
>> received packet into the routing lookup of the outgoing packet, thus the
>> appropriate rule, with outgoing ifindex should be selected.
> 
> Like Hannes,  I too would first check "is B sending out the echo-resp? on
> which interface?".
> 
> But a couple of unexpected things I noticed in linux: the link-local
> prefix should have a prefixlen of /10 according to
> http://www.iana.org/assignments/ipv6-address-space/ipv6-address-space.xhtml
> but "ip -6 route show"  lists this as a /64..

The link local subnet is still specified to be a /64 as the other parts
of the address must be 0. Legally we probably could blackhole them.
https://tools.ietf.org/html/rfc4291#section-2.5.6

> moreover, even though I cannot use "ip [-6] route add.." to add the
> same prefix multiple times (with different nexthop and/or interface)
> unless I explicitly mark them as ECMP with /sbin/ip, it seems like you
> can create the same onlink prefix on multiple interfaces, but the
> kernel will not treat this as an ECMP group (and sometimes this
> can produce inconsistent results depending on the order of
> route addition, e.g., for ipv4 rp_filter checks). I don't know if some
> variant of this (latter observation) may be the reason for the behavior
> that Andreas reports.

iproute sets the NLM_F_EXCL flag. Use ip route prepend ...

We don't have urpf checks for ipv6, those are implemented in netfilter
only. This could very well be a firewall issue or something like that.

Bye,
Hannes

* Re: icmpv6: issue with routing table entries from link local addresses
From: YOSHIFUJI Hideaki @ 2016-09-13  3:01 UTC
  To: Sowmini Varadhan, Hannes Frederic Sowa
  Cc: hideaki.yoshifuji, Andreas Hübner, netdev, d. caratti,
	Sowmini Varadhan

Sowmini Varadhan wrote:
> On Mon, Sep 12, 2016 at 1:26 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> Hello,
>>
>> On 12.09.2016 16:27, Andreas Hübner wrote:
> 
>>>
>>> I have the following setup:
>>>   - 2 directly connected hosts (A+B), both have only link local addresses
>>>     configured (interface on both hosts is eth0)
>            :
>>>       fe80::/64 dev eth1  proto kernel  metric 256
>>>       fe80::/64 dev eth0  proto kernel  metric 256
>            :
>>> The issue I currently have is, that the echo reply that host B should
>>> generate is never sent back to host A. If I change the order of the
>>> routing table entries on host B, everything works fine.
>>> (host A is connected on eth0)
>    :
>> This shouldn't be the case. We certainly carry over the ifindex of the
>> received packet into the routing lookup of the outgoing packet, thus the
>> appropriate rule, with outgoing ifindex should be selected.
> 
> Like Hannes,  I too would first check "is B sending out the echo-resp? on
> which interface?".
> 
> But a couple of unexpected things I noticed in linux: the link-local
> prefix should have a prefixlen of /10 according to
> http://www.iana.org/assignments/ipv6-address-space/ipv6-address-space.xhtml
> but "ip -6 route show"  lists this as a /64..

Do not be confused; link-local address for ethernet is described by
IPv6 over FOO document (e.g., RFC2464 for Ethernet).  The address
(fe80::/64 for Ethernet, for example) is defined inside the link-local
scope unicast address space (/10).
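
To make the /10 versus /64 relationship concrete, here is a small Python
sketch using the standard ipaddress module. The classify() helper and the
sample addresses are made up for illustration; nothing here reflects how
the kernel itself stores or matches these prefixes.

```python
from ipaddress import IPv6Address, IPv6Network

# IANA-registered link-local unicast block.
LINK_LOCAL_BLOCK = IPv6Network("fe80::/10")
# On-link prefix as listed by "ip -6 route show" in this thread.
ONLINK_PREFIX = IPv6Network("fe80::/64")


def classify(addr: str) -> str:
    """Say which of the two prefixes (if any) covers addr.

    Hypothetical helper, for illustration only.
    """
    ip = IPv6Address(addr)
    if ip in ONLINK_PREFIX:
        return "fe80::/64"
    if ip in LINK_LOCAL_BLOCK:
        return "fe80::/10 only"
    return "not link-local"


if __name__ == "__main__":
    # The /64 sits entirely inside the /10.
    print(ONLINK_PREFIX.subnet_of(LINK_LOCAL_BLOCK))
    for sample in ("fe80::1", "fe80:1::1", "2001:db8::1"):
        print(sample, "->", classify(sample))
```

An address such as fe80:1::1 falls inside the /10 block but outside
fe80::/64, which is the gap the two readings of the spec disagree about.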

-- 
Hideaki Yoshifuji <hideaki.yoshifuji@miraclelinux.com>
Technical Division, MIRACLE LINUX CORPORATION

* Re: icmpv6: issue with routing table entries from link local addresses
From: Sowmini Varadhan @ 2016-09-13  3:05 UTC
  To: Hannes Frederic Sowa
  Cc: Sowmini Varadhan, Andreas Hübner, netdev, d. caratti

On (09/13/16 04:42), Hannes Frederic Sowa wrote:
> > But a couple of unexpected things I noticed in linux: the link-local
> > prefix should have a prefixlen of /10 according to
> > http://www.iana.org/assignments/ipv6-address-space/ipv6-address-space.xhtml
> > but "ip -6 route show"  lists this as a /64..
> 
> The link local subnet is still specified to be a /64 as the other parts
> of the address must be 0. Legally we probably could blackhole them.
> https://tools.ietf.org/html/rfc4291#section-2.5.6

A bit of a gray area. 4291 does not specify this as MBZ, and IANA
registration is a /10. Both Solaris and BSD use /10. And while fec0
is deprecated, I suppose some similar thing could come up in the
future. ymmv.

> We don't have urpf checks for ipv6, those are implemented in netfilter
> only. This could very well be a firewall issue or something like that.

yes, I know that (no rp_filter check for ipv6), and that's why I said it
may be some similar variant.  What tripped me up is that onlink prefixes 
(which are multipath routes in that they have the same dst, mask, metric)
are not treated as part of the typical IP_ROUTE_MULTIPATH in many places 
in the code because the fib_nhs data-structures do not get set up.
(thus, e.g., one ipoib config I was looking at recently, which 
had multiple ports connected to the same IB switch, and had the same
onlink prefix on these ports,  would not load-spread across all ports
until I explicitly did the 'ip route change' to tell the kernel to
ecmp that prefix).

Let's see what Andreas reports.

--Sowmini

* Re: icmpv6: issue with routing table entries from link local addresses
From: Andreas Hübner @ 2016-09-13  6:35 UTC
  To: Hannes Frederic Sowa; +Cc: netdev, d. caratti

On Mon, Sep 12, 2016 at 07:26:23PM +0200, Hannes Frederic Sowa wrote:
> > Actually, I'm convinced I must be doing something wrong here. The setup
> > for the issue is quite trivial, someone would have tripped over it
> > already. The only condition is that one host has multiple interfaces
> > with ipv6 enabled.
> > 
> > Any help in shedding some light onto this issue would be appreciated.
> 
> This shouldn't be the case. We certainly carry over the ifindex of the
> received packet into the routing lookup of the outgoing packet, thus the
> appropriate rule, with outgoing ifindex should be selected.

I saw this in the code and that's the reason why I wrote the initial
mail. Was trying to trace with ftrace, but got stuck somewhere around
the find_rr_leaf function. (If there is any good documentation on the
internal fib data structure, please point me to it.)

> I also couldn't reproduce your problem here with my system. Can you
> verify with tcpdump that the packet is leaving on another interface?

It is not leaving on another interface but simply discarded on the host.
The Ip6OutNoRoutes stat in /proc/net/snmp6 is increased.
From my understanding the routing subsystem finds the first matching entry
in the main routing table, checks the interface and bails out because it
does not match.
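
A minimal sketch for watching the Ip6OutNoRoutes counter mentioned above,
assuming the usual "name value" line layout of /proc/net/snmp6. The helper
names and the sample counter values are invented for illustration; they
are not measurements from the affected host.

```python
from typing import Dict


def parse_snmp6(text: str) -> Dict[str, int]:
    """Parse "name value" lines into a dict, skipping anything malformed."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        name, value = fields
        try:
            counters[name] = int(value)
        except ValueError:
            continue
    return counters


def counter_delta(before: Dict[str, int], after: Dict[str, int], name: str) -> int:
    """Difference of one counter between two snapshots (missing counts as 0)."""
    return after.get(name, 0) - before.get(name, 0)


# Made-up snapshots, standing in for two reads of /proc/net/snmp6.
SAMPLE_BEFORE = "Ip6InReceives                   120\nIp6OutNoRoutes                  3\n"
SAMPLE_AFTER = "Ip6InReceives                   124\nIp6OutNoRoutes                  7\n"

if __name__ == "__main__":
    before = parse_snmp6(SAMPLE_BEFORE)
    after = parse_snmp6(SAMPLE_AFTER)
    print("Ip6OutNoRoutes grew by", counter_delta(before, after, "Ip6OutNoRoutes"))
```

On a live Linux host the two snapshots would come from reading
/proc/net/snmp6 before and after sending the pings.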

I omitted a crucial piece of information in the last mail: I'm currently
stuck on an older distribution kernel (3.16).
I'll try to check if there have been any relevant changes to IPv6
route lookup in the last two years.
(Maybe I should try to reproduce it with the current kernel, sorry that
I didn't think of this before.)


Andreas

* Re: icmpv6: issue with routing table entries from link local addresses
From: Andreas Hübner @ 2016-09-13  9:22 UTC
  To: David Ahern; +Cc: Hannes Frederic Sowa, netdev, d. caratti

On Mon, Sep 12, 2016 at 01:17:24PM -0600, David Ahern wrote:
> v4.4 and on there are fib6 tracepoints that show the lookup result.
> May provide some insights.
>
> perf record -a -e fib6:*
> perf script

Thanks for the hint, didn't know that something like this exists.

Following up on my earlier mail, I wasn't able to reproduce the issue
with more recent kernel versions. (tried 4.7)

So I guess someone must have fixed it somewhere between 3.16 and 4.7. :)
Okay, will check the git and probably try to backport it.

Again, sorry that I did not check immediately with the more recent kernel
versions. I wasn't expecting that much had changed in this area.

But my request for information with regard to the FIB data structure
still remains, since I'm curious about how it actually works.
(And I already spent some time trying to understand it.)


Thanks for your help, everyone!

Andreas

* Re: icmpv6: issue with routing table entries from link local addresses
From: Andreas Hübner @ 2016-09-13 11:59 UTC
  To: David Ahern; +Cc: netdev

I think I found the relevant fixes:

First and foremost it's 741a11d9e410.
(net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set)

This seems to have already solved my problem, however there were two
followup fixes that I should probably also apply:

d46a9d678e4c net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set
6f21c96a78b8 ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
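
The behaviour discussed in this thread can be pictured with a toy model in
Python. This is only a sketch of the symptom as described (two identical
fe80::/64 routes, reply dropped depending on their order), not the kernel's
fib6 code and not a rendering of the commits above. The Route class, both
lookup functions and the address fe80::a are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv6Address, IPv6Network
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    prefix: IPv6Network
    dev: str
    metric: int = 256


def lookup_first_match(table: List[Route], dst: IPv6Address, oif: str) -> Optional[Route]:
    """Take the first route covering dst, then reject it if the device differs.

    Models the failure: the interface is checked only after selection.
    """
    for route in table:
        if dst in route.prefix:
            return route if route.dev == oif else None
    return None


def lookup_with_oif(table: List[Route], dst: IPv6Address, oif: str) -> Optional[Route]:
    """Treat the outgoing interface as part of the match itself."""
    for route in table:
        if dst in route.prefix and route.dev == oif:
            return route
    return None


# Host B's table in the order reported in the first mail.
TABLE = [
    Route(IPv6Network("fe80::/64"), "eth1"),
    Route(IPv6Network("fe80::/64"), "eth0"),
]
HOST_A = IPv6Address("fe80::a")  # hypothetical link-local address of host A

if __name__ == "__main__":
    print("first match :", lookup_first_match(TABLE, HOST_A, "eth0"))
    print("oif-aware   :", lookup_with_oif(TABLE, HOST_A, "eth0"))
    print("reordered   :", lookup_first_match(TABLE[::-1], HOST_A, "eth0"))
```

In this model the first-match lookup returns nothing for a reply that must
leave via eth0 unless the eth0 route happens to be listed first, while the
interface-aware lookup is independent of the route order.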


So again, sorry for the noise and thanks for your help!

Andreas
