* arp_filter and IPv6 ND
@ 2016-12-02 12:51 Saku Ytti
  2016-12-02 14:08 ` Hannes Frederic Sowa
  0 siblings, 1 reply; 8+ messages in thread
From: Saku Ytti @ 2016-12-02 12:51 UTC (permalink / raw)
  To: netdev

Hey,

net.ipv4.conf.all.arp_filter appears not to have an IPv6 counterpart.
Or am I missing something? That is, Linux answers ND queries for
addresses on unrelated interfaces by default, and I can't seem to find
a way to turn that off.

Is it proper maintainership to accept changes to a single protocol
without mandating support for the other protocol that has the same
behavioural characteristics?

It is good that some parts of ARP and ND share common code in Linux
(neighbour.c), unlike in BSD where everything seems to be
self-contained.
I wish even more of ARP/ND were common, because there is still a lot
of shared behavioural logic in the ARP and ND code themselves. It
requires double maintenance and is implemented by different people at
different times, which leads to different sets of bugs and different
behaviour for the same intended behaviour.

For example, this feature should be protocol agnostic: a developer
should only need to implement it once in the higher-level behavioural
code, without minding which IP AFI it is for. Obviously that does not
exclude the ability to configure it on/off per AFI via sysctl.

Thanks!
-- 
  ++ytti

* Re: arp_filter and IPv6 ND
  2016-12-02 12:51 arp_filter and IPv6 ND Saku Ytti
@ 2016-12-02 14:08 ` Hannes Frederic Sowa
  2016-12-02 15:42   ` Saku Ytti
  0 siblings, 1 reply; 8+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-02 14:08 UTC (permalink / raw)
  To: Saku Ytti, netdev

On 02.12.2016 13:51, Saku Ytti wrote:
> net.ipv4.conf.all.arp_filter appears not to have an IPv6 counterpart.
> Or am I missing something? That is Linux does answer to ND queries for
> unrelated interfaces by default, and I can't seem to find way to turn
> that off.

May I ask why you want to turn it off?

In IPv6 this depends on the scope. In IPv4 this concept doesn't really
exist.

Please note that in IPv4 arp_filter does not necessarily mean that the
system is operating in strong end system mode. You end up in a hybrid
where ARP acts strong but routing does not, and thus you also have to
add fib rules to simulate that.

> Is it proper maintainership to accept changes to single protocol,
> without mandating the support for other protocol having same
> behavioural characteristics?
> 
> It is good that some parts for ARP and ND have common code in linux
> (neighbour.c) unlike in BSD where everything seems to be
> self-contained.
>
> I wish even more of ARP/ND were common, because there are
> still lot of common behavioural code in ARP/ND code itself, which
> requires double maintenance and are implemented by different people at
> different times, so leads to different set of bugs and behaviour for
> same intended behaviour.
> 
> For example this feature should be protocol agnostic, developer should
> only need to develop it once for the higher level behavioural code,
> without minding which IP AFI it is for. Obviously that does not
> exclude ability to sysctl configure it on/off per AFI.

* Re: arp_filter and IPv6 ND
  2016-12-02 14:08 ` Hannes Frederic Sowa
@ 2016-12-02 15:42   ` Saku Ytti
  2016-12-02 16:45     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 8+ messages in thread
From: Saku Ytti @ 2016-12-02 15:42 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: netdev

On 2 December 2016 at 16:08, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:

Hey,

> May I ask why you want to turn it off?

Certainly. I don't want the device to answer with its link-layer address
for an L3 address it does not have on that link. In my case it triggers
this bug:
https://supportforums.cisco.com/document/12098096/cscse46790-cef-prefers-arp-adjacency-over-rib-next-hop

In this particular case, for one reason or another, my Cisco device
has an ND entry for the Linux loopback address pointing to an interface
on a completely different network. That by itself would just be weird,
but combined with the weird behaviour of Cisco it actually causes the
loopback route advertised by BGP not to be installed. If the ND entry
didn't exist, the BGP route would be installed.
I don't really even know why the ND entry exists. All I can think of
is that Linux must have sent a gratuitous reply, because I don't see why
Cisco would have tried to discover it.

Expected behaviour is that the loopback/128 BGP route resolves to an
on-link next hop, and the on-link next hop is then resolved with ND.
Observed behaviour is that the loopback/128 BGP route also appears in
the ND cache.
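
To make the expected resolution concrete, here is a minimal Python
sketch. It is only an illustration of recursive next-hop resolution,
not how any router implements it; the function name `resolve`, the
2001:db8::/32 documentation prefixes and the interface name eth0 are
all assumptions made up for the example.

```python
import ipaddress


def resolve(dest, routes, connected):
    """Return (interface, neighbour) that must be resolved with ND, or None.

    routes:    list of (prefix, next_hop) for learned routes, e.g. from BGP
    connected: list of (prefix, interface) for on-link prefixes
    All names and values here are hypothetical.
    """
    dest = ipaddress.ip_address(dest)

    # Longest-prefix match over the learned routes.
    best = None
    for prefix, next_hop in routes:
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, ipaddress.ip_address(next_hop))

    # With a learned route, ND is done for the next hop, not for dest.
    target = best[1] if best else dest

    # Only an address covered by a connected prefix is a valid ND target.
    for prefix, ifname in connected:
        if target in ipaddress.ip_network(prefix):
            return ifname, str(target)
    return None


if __name__ == "__main__":
    routes = [("2001:db8:ffff::1/128", "2001:db8:1::2")]
    connected = [("2001:db8:1::/64", "eth0")]
    print(resolve("2001:db8:ffff::1", routes, connected))
```

In this model the loopback /128 never becomes an ND target itself; only
the on-link next hop does, which is the behaviour I would expect.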

> In IPv6 this depends on the scope. In IPv4 this concept doesn't really
> exist.
>
> Please notice that in IPv4 arp_filter does not necessarily mean that the
> system is operating in strong end system mode but you end up in an
> hybrid clone where arp is acting strong but routing not and thus you
> also have to add fib rules to simulate that.

It's just very peculiar behaviour to have ARP or ND entries on an
interface where the given subnet does not exist. It causes
difficult-to-troubleshoot problems and is surprising/unexpected
behaviour.
Of course a well-behaved device wouldn't accept such replies, because
accepting them could itself be an attack vector (imagine me telling you
8.8.8.8 is on the link, or worse, your bank).

I'm curious, why does this behaviour exist? When is it desirable?
I've never seen any device other than Linux behave like this, and
whenever I've heard about the problem, I've only seen surprised faces
that it does behave like this.
-- 
  ++ytti

* Re: arp_filter and IPv6 ND
  2016-12-02 15:42   ` Saku Ytti
@ 2016-12-02 16:45     ` Hannes Frederic Sowa
  2016-12-02 17:51       ` Saku Ytti
  0 siblings, 1 reply; 8+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-02 16:45 UTC (permalink / raw)
  To: Saku Ytti; +Cc: netdev

Hello,

On 02.12.2016 16:42, Saku Ytti wrote:
> On 2 December 2016 at 16:08, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> 
> Hey,
> 
>> May I ask why you want to turn it off?
> 
> Certainly. I don't want device to answer with link address for L3
> address it does not have on the link. In my case it triggers this bug
> https://supportforums.cisco.com/document/12098096/cscse46790-cef-prefers-arp-adjacency-over-rib-next-hop

Okay, that should not happen.

Redirects and neighbor advertisements are the only ways to announce
prefixes as on-link. Unfortunately, for historical reasons we also
automatically add device routes for prefixes. We can't change this
anymore, but it is wrong.

> In this particular case, for one reason or another my Cisco device
> would have ND entry for Linux loopback pointing to an interface with
> completely different network.  Which itself would be just weird, but
> combined with weird behaviour of Cisco it actually causes the loopback
> route advertised by BGP not to be installed. If the ND entry didn't
> exist, the BGP route would be installed.

Hmmmm... Loopback route advertised by BGP? Do you use a filter to get rid
of that on your AS border? So you probably don't use an IGP? Do you use
the next-hop-self attribute on your neighbor in that direction? BGP in
general doesn't lead to ND entry installs; protocols like IS-IS, afair,
can short circuit here.

Hmm, I would keep the Loopback announcements out of the BGP.

> I don't really even know why the ND entry exists, all I can think of
> is that Linux must have sent a gratuitous reply, because I don't see why
> Cisco would have tried to discover it.
>
> Expected behaviour is that the loopback/128 BGP route resolves to
> on-link next-hop, and on-link next hop is then ND'd. Observed
> behaviour is that loopback/128 BGP route also appears in ND cache.

Yep, exactly.

>> In IPv6 this depends on the scope. In IPv4 this concept doesn't really
>> exist.
>>
>> Please notice that in IPv4 arp_filter does not necessarily mean that the
>> system is operating in strong end system mode but you end up in an
>> hybrid clone where arp is acting strong but routing not and thus you
>> also have to add fib rules to simulate that.
> 
> It's just very peculiar behaviour to have ARP or ND entries on a
> interface where given subnet does not exist, it rudimentarily causes
> difficult to troubleshoot problems and is surprising/unexpected
> behaviour.

For enterprise and cloud stuff it is certainly very surprising, as some
isolations don't work as expected. OTOH it is really easy to build up
home networks and things are more plug and play.

> Of course well behaving device wouldn't accept such replies, because
> it itself could be attack vector (imagine me telling you 8.8.8.8 is on
> the link, or worse, your bank).
> 
> I'm curious, why does this behaviour exist? When is this desirable?
> I've never seen any other device than Linux behave like this, and when
> ever I've heard about the problem, I've only seen surprised faces that
> it does behave like this.

I don't feel comfortable answering that, just some thoughts...

Some RFCs require that for some router implementations (CPE). On the
other hand, the weak end model in Linux was probably inherited from IPv4.
The addition of duplicate address detection (which of course only makes
sense in strong end systems) to IPv6 basically shows that IPv6 is more
or less designed to be a strong end system model.

Anyway, a patch to suppress ndisc requests on those interfaces will
probably be accepted.

For unicast reverse filtering e.g. there is actually no sysctl available
anymore, instead you are supposed to install a netfilter rule to handle
this, which automatically takes care of this.

Bye,
Hannes

* Re: arp_filter and IPv6 ND
  2016-12-02 16:45     ` Hannes Frederic Sowa
@ 2016-12-02 17:51       ` Saku Ytti
  2016-12-02 18:39         ` Hannes Frederic Sowa
  0 siblings, 1 reply; 8+ messages in thread
From: Saku Ytti @ 2016-12-02 17:51 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: netdev

On 2 December 2016 at 18:45, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:

> next-hop-self attribute on your neighbor in that direction? BGP in
> general doesn't lead to ND entry installs, protocols like IS-IS afair
> can short circuit here.

That's the whole problem: Linux does not think of ND or ARP as an
interface-specific thing, but as a global thing. ND and ARP will happily
answer a query on any interface if any other interface has said IP.
I'm not sure why the loopback ended up in the Cisco ND cache. The answer
is either that Cisco queried for it or that Linux sent a gratuitous
answer. I believe gratuitous.
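
As a toy model of the two behaviours I mean (purely illustrative
Python, not kernel code, and the ND side is only my understanding of
what I observe): the function name `answers_query`, the `strong` flag
and the addresses are assumptions made up for the example.

```python
def answers_query(target, ingress_if, addrs_by_if, strong=False):
    """Decide whether a host answers a neighbour query for `target`.

    addrs_by_if maps an interface name to the set of addresses on it.
    strong=False: "global" view, any local address is answered for,
                  whichever interface the query arrived on.
    strong=True:  interface-specific view, the address must be
                  configured on the interface the query arrived on.
    """
    if strong:
        return target in addrs_by_if.get(ingress_if, set())
    return any(target in addrs for addrs in addrs_by_if.values())


if __name__ == "__main__":
    # Hypothetical host: service address on lo, link address on eth0.
    addrs = {"lo": {"2001:db8:ffff::1"}, "eth0": {"2001:db8:1::2"}}
    for strong in (False, True):
        print(strong, answers_query("2001:db8:ffff::1", "eth0", addrs, strong))
```

What I am asking for is a knob that gives the strong=True branch for ND
as well.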

> Hmm, I would keep the Loopback announcements out of the BGP.

It's an extremely common way to do anycast, but not interesting for the
topic at hand.

> For enterprise and cloud stuff it is certainly very surprising, as some
> isolations don't work as expected. OTOH it is really easy to build up
> home networks and things are more plug and play.

Can you give me a practical example of when the behaviour is desirable?
My imagination is failing me. I'm not arguing, I just want to understand
it, as I've never had the need myself.
I've never run into a setup which needs it, but cursory googling shows
several people having broken networks because of the behaviour. If it
is needed, I'm sure it's an esoteric setup, and perhaps a saner default
would be that extra sysctl config is needed to get this
interface-agnostic ARP/ND behaviour.

> Some RFCs require that for some router implementations (CPE), on the
> other hand weak end model in Linux was probably inherited by IPv4. The
> addition of duplicate address detection (which of course only makes
> sense in strong end systems) to IPv6, basically shows that IPv6 is more
> or less designed to be a strong end system model.
>
> Anyway, a patch to suppress ndisc requests on those interfaces will
> probably be accepted.

Grand, not that I feel comfortable writing it. I'd rather see the
whole suppression functionality moved to neighbour.c from being AFI
specific.

-- 
  ++ytti

* Re: arp_filter and IPv6 ND
  2016-12-02 17:51       ` Saku Ytti
@ 2016-12-02 18:39         ` Hannes Frederic Sowa
  2016-12-03 14:21           ` Saku Ytti
  0 siblings, 1 reply; 8+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-02 18:39 UTC (permalink / raw)
  To: Saku Ytti; +Cc: netdev

Hi,

On 02.12.2016 18:51, Saku Ytti wrote:
> On 2 December 2016 at 18:45, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> 
>> next-hop-self attribute on your neighbor in that direction? BGP in
>> general doesn't lead to ND entry installs, protocols like IS-IS afair
>> can short circuit here.
> 
> That's the whole problem, Linux does not think of ND or ARP as
> interface specific thing, but as global thing. ND and ARP will happily
> answer to query from any interface if any other interface has said IP.
> I'm not sure why the Loopback ended up in Cisco ND Cache, answer is
> either Cisco queried for it or Linux did gratuitous answer. I believe
> gratuitous.
> 
>> Hmm, I would keep the Loopback announcements out of the BGP.
> 
> It's extremely common way to do anycast, but not interesting for the
> topic at hand.

Okay, sorry, I understood that you terminate the bgp sessions on those
IPs and use them for router-ids. But they are merely service IPs bound
to the loopback interface. I take it back and even for end system bgp
speakers/service announcement that is fine.

>> For enterprise and cloud stuff it is certainly very surprising, as some
>> isolations don't work as expected. OTOH it is really easy to build up
>> home networks and things are more plug and play.
> 
> Can you give me practical example when the behaviour is desirable, my
> imagination is failing me. I'm not arguing, I just want to understand
> it, as I've never had the need myself.

The major difference is that you e.g. keep connectivity in some
scenarios where strong end systems would fail.

E.g. you can use IP addresses bound to other interfaces to send replies
on another interface. This can be useful if you have a limited number of
IP addresses on the system but many more interfaces. Especially if they
are limited in scope, like in IPv6.

Basically Cisco's feature of "unnumbered interface" is always provided
in Linux. And there are certainly cases where you would want to use it,
e.g. to emulate a private-VLAN feature for network separation.

Also in the BGP setup, you might have it easier to establish loopback
neighbor contact by just using static on-link routes, without caring
about more complex numbering there (otherwise you pretty soon introduce
OSPF or some other routing protocol to do the recursive forward resolution).

> I've never ran into setup which needs it, but cursory googling shows
> several people having broken networks because of the behaviour. If it
> is needed, I'm sure it's esoteric setup and perhaps saner default
> would that extra sysctl config is needed to get this interface
> agnostic ARP/ND behaviour.

Yes, it is a very problematic situation at internet exchanges and weak
end behavior must be disabled there as it causes havoc.

As global IPv6 addresses are more or less global, such problems actually
shouldn't exist, as no conflicting IP addresses should show up. Link-local
addresses are handled in a strong end manner anyway. Thus if the
Cisco router had installed your routing entry, you would probably not
have noticed. :)

>> Some RFCs require that for some router implementations (CPE), on the
>> other hand weak end model in Linux was probably inherited by IPv4. The
>> addition of duplicate address detection (which of course only makes
>> sense in strong end systems) to IPv6, basically shows that IPv6 is more
>> or less designed to be a strong end system model.
>>
>> Anyway, a patch to suppress ndisc requests on those interfaces will
>> probably be accepted.
> 
> Grand, not that I feel comfortable writing it. I'd rather see the
> whole suppression functionality moved to neighbour.c from being AFI
> specific.

Yes sure, please provide a patch. A separate sysctl is necessary anyway
because the current one is within the ipv4 procfs directory hierarchy.

Bye,
Hannes

* Re: arp_filter and IPv6 ND
  2016-12-02 18:39         ` Hannes Frederic Sowa
@ 2016-12-03 14:21           ` Saku Ytti
  2016-12-06 12:07             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 8+ messages in thread
From: Saku Ytti @ 2016-12-03 14:21 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: netdev

On 2 December 2016 at 20:39, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:

Hey,

> E.g. you can use IP addresses bound to other interfaces to send replies
> on another interface. This can be useful if you have a limited amount of
> IP addresses on the system but much more interfaces. Especially if they
> are limited in scope, like in IPv6.
>
> Basically Cisco's feature of "unnumbered interface" is always provided
> in Linux. And there are certainly cases where you would want to use it,
> e.g. emulate private-vlan feature for network separation.

Got it, thanks, the explanation makes sense. And indeed it's a valid
case, but it is also the exception, not the rule. I think it would be
entirely reasonable to change the default; people who want 'unnumbered'
style behaviour (like some BRAS scenarios) will know how and why to
configure it.

> Also in the BGP setup, you might have it easier to establish loopback
> neighbor contact by just using static on-link routes, without caring
> about more complex numbering there (otherwise you pretty soon introduce
> OSPF or some other routing protocol to do the recursive forward resolution).

The BGP session is running on-link; it's just that BGP is advertising
the loopback of the Linux host. Why the loopback ends up in the ND
cache, I don't know.

>> Grand, not that I feel comfortable writing it. I'd rather see the
>> whole suppression functionality moved to neighbour.c from being AFI
>> specific.
>
> Yes sure, please provide a patch. A separate sysctl is necessary anyway
> because the current one is within the ipv4 procfs directory hierarchy.

Sorry, I'm not a comfortable C programmer. I'm pretty confident I could
get it working, but I'm more confident that the patch would be entirely
rejected and rewritten by someone who knows what they are doing.
I see no reason not to have an AFI-specific toggle; just the logic and
code should be AFI agnostic, like the GC (ARP/ND cache time) stuff in
neighbour.c, which is nicely done. Frankly, the whole ARP/ND code could do
with refactoring so that arp.c and ndisc.c hold more of the wire-format
stuff and neighbour.c more of the behavioural code.


-- 
  ++ytti

* Re: arp_filter and IPv6 ND
  2016-12-03 14:21           ` Saku Ytti
@ 2016-12-06 12:07             ` Hannes Frederic Sowa
  0 siblings, 0 replies; 8+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-06 12:07 UTC (permalink / raw)
  To: Saku Ytti; +Cc: netdev

On 03.12.2016 15:21, Saku Ytti wrote:
> On 2 December 2016 at 20:39, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> 
> Hey,
> 
>> E.g. you can use IP addresses bound to other interfaces to send replies
>> on another interface. This can be useful if you have a limited amount of
>> IP addresses on the system but much more interfaces. Especially if they
>> are limited in scope, like in IPv6.
>>
>> Basically Cisco's feature of "unnumbered interface" is always provided
>> in Linux. And there are certainly cases where you would want to use it,
>> e.g. emulate private-vlan feature for network separation.
> 
> Got it, thanks, the explanation makes sense. And indeed it's a valid
> case, but it is also the exception, not the rule. I think it would be
> entirely reasonable to change the default; people who want 'unnumbered'
> style behaviour (like some BRAS scenarios) will know how and why to
> configure it.

The limited IP address scenario is actually more common for normal
routers. ;)

In retrospect I don't know what would win if the decision were
made again. Almost all operating systems switched to the strong end host
model over time, leaving Linux alone in the weak end host camp. It is
probably also easier to go from a strong end to a weak end system by
policy than vice versa, so I would probably also pick strong end
system semantics by default today.

>> Also in the BGP setup, you might have it easier to establish loopback
>> neighbor contact by just using static on-link routes, without caring
>> about more complex numbering there (otherwise you pretty soon introduce
>> OSPF or some other routing protocol to do the recursive forward resolution).
> 
> The BGP is running on-link, it's just that the BGP is advertising loop
> of Linux. Why the loop ends up in ND cache, I don't know.

Did you check neighbor advertisements and solicitations with tcpdump?

Did you have force_tllao, ndisc_notify enabled? Which source address
does the BGP/TCP connection use?
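
If it helps, the two settings can be read from procfs. A minimal Python
sketch, assuming the usual /proc/sys/net/ipv6/conf/<iface>/ layout; the
helper name `read_nd_sysctls` and its `proc_root` parameter are made up
for this example, and a value of None just means the file could not be
read on that system.

```python
from pathlib import Path


def read_nd_sysctls(iface, names=("force_tllao", "ndisc_notify"),
                    proc_root="/proc/sys"):
    """Read per-interface IPv6 ND sysctls as integers.

    Returns {name: value}; value is None when the file is missing or
    unreadable, for example on a kernel without that sysctl.
    """
    base = Path(proc_root) / "net" / "ipv6" / "conf" / iface
    values = {}
    for name in names:
        try:
            values[name] = int((base / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    # "all" and "default" exist next to the real interface names.
    for iface in ("all", "default"):
        print(iface, read_nd_sysctls(iface))
```

A non-zero ndisc_notify would be one explanation for unsolicited
neighbor advertisements, so it is worth ruling out before anything else.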

>>> Grand, not that I feel comfortable writing it. I'd rather see the
>>> whole suppression functionality moved to neighbour.c from being AFI
>>> specific.
>>
>> Yes sure, please provide a patch. A separate sysctl is necessary anyway
>> because the current one is within the ipv4 procfs directory hierarchy.
> 
> Sorry, not a comfortable C programmer, I'm pretty confident I could
> get it working, but I'm more confident that patch would be entirely
> rejected and rewritten by someone who knows what they are doing.
> I see no reason not to have AFI specific toggle, just logic and code
> should be AFI agnostic, like GC (ARP/ND cache time) stuff in
> neighbour.c is nicely done. Frankly whole ARP/ND code could do with
> refactoring to make arp.c and ndisc.c more wire-format stuff and
> behavioural code more in neighbour.c.

Let's first see what the real problem is.
