* [patch net] ipv6: do not create neighbor entries for local delivery
@ 2013-01-30  8:26 Jiri Pirko
  2013-01-31  1:26 ` David Miller
  2013-08-08 18:45 ` Debabrata Banerjee
  0 siblings, 2 replies; 17+ messages in thread
From: Jiri Pirko @ 2013-01-30  8:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuznet, jmorris, yoshfuji, kaber, mleitner

From: Marcelo Ricardo Leitner <mleitner@redhat.com>

They will be created at output, if ever needed. This avoids creating
empty neighbor entries when TPROXYing/Forwarding packets for addresses
that are not even directly reachable.

Note that IPv4 already handles it this way. No neighbor entries are
created for local input.

Tested by myself and customer.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 net/ipv6/route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e229a3b..363d8b7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -928,7 +928,7 @@ restart:
 	dst_hold(&rt->dst);
 	read_unlock_bh(&table->tb6_lock);
 
-	if (!rt->n && !(rt->rt6i_flags & RTF_NONEXTHOP))
+	if (!rt->n && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
 		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
 	else if (!(rt->dst.flags & DST_HOST))
 		nrt = rt6_alloc_clone(rt, &fl6->daddr);
-- 
1.8.1
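
As a reading aid, the branch selection in this hunk can be sketched as a
standalone C function. This is a minimal sketch and not the kernel code:
pick_action(), enum rt_action and the SK_-prefixed constants are
hypothetical names, the flag values are assumed to match the kernel
headers, and the final "use the route as is" case is assumed because the
hunk does not show what follows the else-if.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copies of the flags; the values are assumed to match
 * include/uapi/linux/ipv6_route.h and include/net/dst.h. */
#define SK_RTF_NONEXTHOP 0x00200000u
#define SK_RTF_LOCAL     0x80000000u
#define SK_DST_HOST      0x0001u

enum rt_action { RT_USE_AS_IS, RT_ALLOC_COW, RT_ALLOC_CLONE };

/* Hypothetical helper mirroring the if / else-if chain in the hunk.
 * has_neigh stands in for rt->n being non-NULL; patched selects the
 * condition from the '+' line instead of the '-' line. */
static enum rt_action pick_action(bool has_neigh, uint32_t rt_flags,
                                  unsigned int dst_flags, bool patched)
{
	uint32_t skip_cow = SK_RTF_NONEXTHOP;

	if (patched)
		skip_cow |= SK_RTF_LOCAL;

	if (!has_neigh && !(rt_flags & skip_cow))
		return RT_ALLOC_COW;     /* rt6_alloc_cow() branch */
	if (!(dst_flags & SK_DST_HOST))
		return RT_ALLOC_CLONE;   /* rt6_alloc_clone() branch */
	return RT_USE_AS_IS;             /* assumed: neither branch taken */
}
```

With patched set, a route carrying RTF_LOCAL no longer takes the
rt6_alloc_cow() branch; it falls through to the DST_HOST check instead.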

* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-01-30  8:26 [patch net] ipv6: do not create neighbor entries for local delivery Jiri Pirko
@ 2013-01-31  1:26 ` David Miller
  2013-08-08 18:45 ` Debabrata Banerjee
  1 sibling, 0 replies; 17+ messages in thread
From: David Miller @ 2013-01-31  1:26 UTC (permalink / raw)
  To: jiri; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber, mleitner

From: Jiri Pirko <jiri@resnulli.us>
Date: Wed, 30 Jan 2013 09:26:08 +0100

> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
> 
> They will be created at output, if ever needed. This avoids creating
> empty neighbor entries when TPROXYing/Forwarding packets for addresses
> that are not even directly reachable.
> 
> Note that IPv4 already handles it this way. No neighbor entries are
> created for local input.
> 
> Tested by myself and customer.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>

Applied and queued up for -stable, thanks.

* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-01-30  8:26 [patch net] ipv6: do not create neighbor entries for local delivery Jiri Pirko
  2013-01-31  1:26 ` David Miller
@ 2013-08-08 18:45 ` Debabrata Banerjee
  2013-08-08 19:01   ` Hannes Frederic Sowa
  2013-08-08 19:47   ` Hannes Frederic Sowa
  1 sibling, 2 replies; 17+ messages in thread
From: Debabrata Banerjee @ 2013-08-08 18:45 UTC (permalink / raw)
  To: Jiri Pirko, mleitner, davem
  Cc: netdev, Alexey Kuznetsov, jmorris, yoshfuji, Patrick McHardy,
	Banerjee, Debabrata, Joshua Hunt

On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
>
> They will be created at output, if ever needed. This avoids creating
> empty neighbor entries when TPROXYing/Forwarding packets for addresses
> that are not even directly reachable.
>
> Note that IPv4 already handles it this way. No neighbor entries are
> created for local input.
>
> Tested by myself and customer.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> ---
>  net/ipv6/route.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index e229a3b..363d8b7 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -928,7 +928,7 @@ restart:
>         dst_hold(&rt->dst);
>         read_unlock_bh(&table->tb6_lock);
>
> -       if (!rt->n && !(rt->rt6i_flags & RTF_NONEXTHOP))
> +       if (!rt->n && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
>                 nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
>         else if (!(rt->dst.flags & DST_HOST))
>                 nrt = rt6_alloc_clone(rt, &fl6->daddr);



I'm not sure this patch is doing the right thing. It seems to break
IPv6 loopback functionality: the behavior is no longer equivalent to
IPv4, contrary to what is stated above. It doesn't just stop neighbor
creation; it also stops cached route creation. That seems like a scary
change for a stable tree. See below:

$ ip -4 route show local
local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1

This local route enables us to use the whole loopback network; any
address inside 127.0.0.0/8 will work.

$ ping -c1 127.0.0.9
PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms

--- 127.0.0.9 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms

This also used to work equivalently for IPv6 local loopback routes:

$ ip -6 route add local 2001::/64 dev lo
$ ping6 -c1 2001::9
PING 2001::9(2001::9) 56 data bytes
64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms

--- 2001::9 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms

However with this patch, this is very broken:

$ ip -6 route add local 2001::/64 dev lo
$ ping6 -c1 2001::9
PING 2001::9(2001::9) 56 data bytes
ping: sendmsg: Invalid argument

--- 2001::9 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

Thanks,
Debabrata
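
The reproduction relies on 2001::9 being covered by the 2001::/64 local
route. A minimal C sketch of that prefix check follows, using
inet_pton(); the helper names is_valid_in6() and in6_in_prefix() are
hypothetical, and the sketch only compares addresses in user space. It
does not touch the routing table.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if text parses as an IPv6 address. */
static bool is_valid_in6(const char *text)
{
	struct in6_addr tmp;

	return inet_pton(AF_INET6, text, &tmp) == 1;
}

/* Returns true if addr lies inside prefix/plen (plen in 0..128).
 * Unparsable input or an out-of-range plen yields false. */
static bool in6_in_prefix(const char *addr, const char *prefix,
                          unsigned int plen)
{
	struct in6_addr a, p;
	unsigned int full = plen / 8;   /* whole bytes to compare */
	unsigned int rest = plen % 8;   /* leftover bits in the next byte */
	unsigned char mask;

	if (plen > 128 ||
	    inet_pton(AF_INET6, addr, &a) != 1 ||
	    inet_pton(AF_INET6, prefix, &p) != 1)
		return false;

	if (memcmp(a.s6_addr, p.s6_addr, full) != 0)
		return false;
	if (rest == 0)
		return true;

	mask = (unsigned char)(0xffu << (8 - rest));
	return (a.s6_addr[full] & mask) == (p.s6_addr[full] & mask);
}
```

For example, in6_in_prefix("2001::9", "2001::", 64) is true, while a
prefix string written with three consecutive colons is rejected by
is_valid_in6().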

* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-08-08 18:45 ` Debabrata Banerjee
@ 2013-08-08 19:01   ` Hannes Frederic Sowa
  2013-08-08 19:02     ` Marcelo Ricardo Leitner
  2013-08-08 19:19     ` Debabrata Banerjee
  2013-08-08 19:47   ` Hannes Frederic Sowa
  1 sibling, 2 replies; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-08-08 19:01 UTC (permalink / raw)
  To: Debabrata Banerjee
  Cc: Jiri Pirko, mleitner, davem, netdev, Alexey Kuznetsov, jmorris,
	yoshfuji, Patrick McHardy, Banerjee, Debabrata, Joshua Hunt

On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote:
> On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> > From: Marcelo Ricardo Leitner <mleitner@redhat.com>
> >
> > They will be created at output, if ever needed. This avoids creating
> > empty neighbor entries when TPROXYing/Forwarding packets for addresses
> > that are not even directly reachable.
> >
> > Note that IPv4 already handles it this way. No neighbor entries are
> > created for local input.
> >
> > Tested by myself and customer.
> >
> > Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> > Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> >
> > [...]
> 
> I'm not sure this patch is doing the right thing. It seems to break
> IPv6 loopback functionality, it is no longer equivalent to IPv4, as
> stated above. It doesn't just stop neighbor creation but it stops
> cached route creation. Seems like a scary change for a stable tree.
> See below:
> 
> $ ip -4 route show local
> local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
> 
> This local route enables us to use the whole loopback network, any
> address inside 127.0.0.0/8 will work.
> 
> $ ping -c1 127.0.0.9
> PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
> 64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms
> 
> --- 127.0.0.9 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms
> 
> This also used to work equivalently for IPv6 local loopback routes:
> 
> $ ip -6 route add local 2001::/64 dev lo
> $ ping6 -c1 2001::9
> PING 2001::9(2001::9) 56 data bytes
> 64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms
> 
> --- 2001::9 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
> 
> However with this patch, this is very broken:
> 
> $ ip -6 route add local 2001::/64 dev lo
> $ ping6 -c1 2001::9
> PING 2001::9(2001::9) 56 data bytes
> ping: sendmsg: Invalid argument
> 
> --- 2001::9 ping statistics ---
> 1 packets transmitted, 0 received, 100% packet loss, time 0ms

Which kernel version are you using? Perhaps you are missing another fix? It
works for me. Also, I cannot find this patch in net-next.

Greetings,

  Hannes

* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-08-08 19:01   ` Hannes Frederic Sowa
@ 2013-08-08 19:02     ` Marcelo Ricardo Leitner
  2013-08-08 19:06       ` Hannes Frederic Sowa
  2013-08-08 19:19     ` Debabrata Banerjee
  1 sibling, 1 reply; 17+ messages in thread
From: Marcelo Ricardo Leitner @ 2013-08-08 19:02 UTC (permalink / raw)
  To: Debabrata Banerjee, Jiri Pirko, davem, netdev, Alexey Kuznetsov,
	jmorris, yoshfuji, Patrick McHardy, Banerjee, Debabrata,
	Joshua Hunt

Em 08-08-2013 16:01, Hannes Frederic Sowa escreveu:
> On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote:
>> On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>
>>> They will be created at output, if ever needed. This avoids creating
>>> empty neighbor entries when TPROXYing/Forwarding packets for addresses
>>> that are not even directly reachable.
>>>
>>> Note that IPv4 already handles it this way. No neighbor entries are
>>> created for local input.
>>>
>>> Tested by myself and customer.
>>>
>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>
>>> [...]
>>
>> I'm not sure this patch is doing the right thing. It seems to break
>> IPv6 loopback functionality, it is no longer equivalent to IPv4, as
>> stated above. It doesn't just stop neighbor creation but it stops
>> cached route creation. Seems like a scary change for a stable tree.
>> See below:
>>
>> $ ip -4 route show local
>> local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
>>
>> This local route enables us to use the whole loopback network, any
>> address inside 127.0.0.0/8 will work.
>>
>> $ ping -c1 127.0.0.9
>> PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
>> 64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms
>>
>> --- 127.0.0.9 ping statistics ---
>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>> rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms
>>
>> This also used to work equivalently for IPv6 local loopback routes:
>>
>> $ ip -6 route add local 2001::/64 dev lo
>> $ ping6 -c1 2001::9
>> PING 2001::9(2001::9) 56 data bytes
>> 64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms
>>
>> --- 2001::9 ping statistics ---
>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>> rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
>>
>> However with this patch, this is very broken:
>>
>> $ ip -6 route add local 2001::/64 dev lo
>> $ ping6 -c1 2001::9
>> PING 2001::9(2001::9) 56 data bytes
>> ping: sendmsg: Invalid argument
>>
>> --- 2001::9 ping statistics ---
>> 1 packets transmitted, 0 received, 100% packet loss, time 0ms
>
> Which kernel version are you using? Perhaps you miss another fix? It works for
> me. Also I cannot find this patch in net-next?

It wasn't needed/applied as the route cache was removed.

Regards,
Marcelo

* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-08-08 19:02     ` Marcelo Ricardo Leitner
@ 2013-08-08 19:06       ` Hannes Frederic Sowa
  2013-08-08 19:11         ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-08-08 19:06 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Debabrata Banerjee, Jiri Pirko, davem, netdev, Alexey Kuznetsov,
	jmorris, yoshfuji, Patrick McHardy, Banerjee, Debabrata,
	Joshua Hunt

On Thu, Aug 08, 2013 at 04:02:36PM -0300, Marcelo Ricardo Leitner wrote:
> Em 08-08-2013 16:01, Hannes Frederic Sowa escreveu:
> >On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote:
> >>On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> >>>From: Marcelo Ricardo Leitner <mleitner@redhat.com>
> >>I'm not sure this patch is doing the right thing. It seems to break
> >>IPv6 loopback functionality, it is no longer equivalent to IPv4, as
> >>stated above. It doesn't just stop neighbor creation but it stops
> >>cached route creation. Seems like a scary change for a stable tree.
> >>See below:
> >>
> >>$ ip -4 route show local
> >>local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
> >>
> >>This local route enables us to use the whole loopback network, any
> >>address inside 127.0.0.0/8 will work.
> >>
> >>$ ping -c1 127.0.0.9
> >>PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
> >>64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms
> >>
> >>--- 127.0.0.9 ping statistics ---
> >>1 packets transmitted, 1 received, 0% packet loss, time 0ms
> >>rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms
> >>
> >>This also used to work equivalently for IPv6 local loopback routes:
> >>
> >>$ ip -6 route add local 2001::/64 dev lo
> >>$ ping6 -c1 2001::9
> >>PING 2001::9(2001::9) 56 data bytes
> >>64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms
> >>
> >>--- 2001::9 ping statistics ---
> >>1 packets transmitted, 1 received, 0% packet loss, time 0ms
> >>rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
> >>
> >>However with this patch, this is very broken:
> >>
> >>$ ip -6 route add local 2001::/64 dev lo
> >>$ ping6 -c1 2001::9
> >>PING 2001::9(2001::9) 56 data bytes
> >>ping: sendmsg: Invalid argument
> >>
> >>--- 2001::9 ping statistics ---
> >>1 packets transmitted, 0 received, 100% packet loss, time 0ms
> >
> >Which kernel version are you using? Perhaps you miss another fix? It works 
> >for
> >me. Also I cannot find this patch in net-next?
> 
> It wasn't needed/applied as the route cache was removed.

Do you mean the rt->n(eighbour) removal? There was no removal of a route cache
in IPv6 land. The cache is merely in the routing table itself.

Greetings,

  Hannes

* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-08-08 19:06       ` Hannes Frederic Sowa
@ 2013-08-08 19:11         ` Marcelo Ricardo Leitner
  2013-08-08 19:16           ` Hannes Frederic Sowa
  0 siblings, 1 reply; 17+ messages in thread
From: Marcelo Ricardo Leitner @ 2013-08-08 19:11 UTC (permalink / raw)
  To: Debabrata Banerjee, Jiri Pirko, davem, netdev, Alexey Kuznetsov,
	jmorris, yoshfuji, Patrick McHardy, Banerjee, Debabrata,
	Joshua Hunt

Em 08-08-2013 16:06, Hannes Frederic Sowa escreveu:
> On Thu, Aug 08, 2013 at 04:02:36PM -0300, Marcelo Ricardo Leitner wrote:
>> Em 08-08-2013 16:01, Hannes Frederic Sowa escreveu:
>>> On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote:
>>>> On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>> I'm not sure this patch is doing the right thing. It seems to break
>>>> IPv6 loopback functionality, it is no longer equivalent to IPv4, as
>>>> stated above. It doesn't just stop neighbor creation but it stops
>>>> cached route creation. Seems like a scary change for a stable tree.
>>>> See below:
>>>>
>>>> $ ip -4 route show local
>>>> local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
>>>>
>>>> This local route enables us to use the whole loopback network, any
>>>> address inside 127.0.0.0/8 will work.
>>>>
>>>> $ ping -c1 127.0.0.9
>>>> PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
>>>> 64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms
>>>>
>>>> --- 127.0.0.9 ping statistics ---
>>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>>> rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms
>>>>
>>>> This also used to work equivalently for IPv6 local loopback routes:
>>>>
>>>> $ ip -6 route add local 2001::/64 dev lo
>>>> $ ping6 -c1 2001::9
>>>> PING 2001::9(2001::9) 56 data bytes
>>>> 64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms
>>>>
>>>> --- 2001::9 ping statistics ---
>>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>>> rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
>>>>
>>>> However with this patch, this is very broken:
>>>>
>>>> $ ip -6 route add local 2001::/64 dev lo
>>>> $ ping6 -c1 2001::9
>>>> PING 2001::9(2001::9) 56 data bytes
>>>> ping: sendmsg: Invalid argument
>>>>
>>>> --- 2001::9 ping statistics ---
>>>> 1 packets transmitted, 0 received, 100% packet loss, time 0ms
>>>
>>> Which kernel version are you using? Perhaps you miss another fix? It works
>>> for
>>> me. Also I cannot find this patch in net-next?
>>
>> It wasn't needed/applied as the route cache was removed.
>
> Do you mean the rt->n(eighbour) removal? There was no removal of a route cache
> in IPv6 land. The cache is merely in the routing table itself.

Yes, my bad, sorry. s/route/neighbour/. It was discussed on this thread:
http://article.gmane.org/gmane.linux.network/255318

"Note also that YOSHIFUJI Hideaki's patches to remove the cached neighbour
entirely from ipv6 routes will have the same effect, so your patch won't
be needed."

Thanks,
Marcelo

* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-08-08 19:11         ` Marcelo Ricardo Leitner
@ 2013-08-08 19:16           ` Hannes Frederic Sowa
  2013-08-08 19:23             ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-08-08 19:16 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Debabrata Banerjee, Jiri Pirko, davem, netdev, Alexey Kuznetsov,
	jmorris, yoshfuji, Patrick McHardy, Banerjee, Debabrata,
	Joshua Hunt

On Thu, Aug 08, 2013 at 04:11:28PM -0300, Marcelo Ricardo Leitner wrote:
> Em 08-08-2013 16:06, Hannes Frederic Sowa escreveu:
> >On Thu, Aug 08, 2013 at 04:02:36PM -0300, Marcelo Ricardo Leitner wrote:
> >>Em 08-08-2013 16:01, Hannes Frederic Sowa escreveu:
> >>>On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote:
> >>>>On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> >>>>>From: Marcelo Ricardo Leitner <mleitner@redhat.com>
> >>>>I'm not sure this patch is doing the right thing. It seems to break
> >>>>IPv6 loopback functionality, it is no longer equivalent to IPv4, as
> >>>>stated above. It doesn't just stop neighbor creation but it stops
> >>>>cached route creation. Seems like a scary change for a stable tree.
> >>>>See below:
> >>>>
> >>>>$ ip -4 route show local
> >>>>local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
> >>>>
> >>>>This local route enables us to use the whole loopback network, any
> >>>>address inside 127.0.0.0/8 will work.
> >>>>
> >>>>$ ping -c1 127.0.0.9
> >>>>PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
> >>>>64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms
> >>>>
> >>>>--- 127.0.0.9 ping statistics ---
> >>>>1 packets transmitted, 1 received, 0% packet loss, time 0ms
> >>>>rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms
> >>>>
> >>>>This also used to work equivalently for IPv6 local loopback routes:
> >>>>
> >>>>$ ip -6 route add local 2001::/64 dev lo
> >>>>$ ping6 -c1 2001::9
> >>>>PING 2001::9(2001::9) 56 data bytes
> >>>>64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms
> >>>>
> >>>>--- 2001::9 ping statistics ---
> >>>>1 packets transmitted, 1 received, 0% packet loss, time 0ms
> >>>>rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
> >>>>
> >>>>However with this patch, this is very broken:
> >>>>
> >>>>$ ip -6 route add local 2001::/64 dev lo
> >>>>$ ping6 -c1 2001::9
> >>>>PING 2001::9(2001::9) 56 data bytes
> >>>>ping: sendmsg: Invalid argument
> >>>>
> >>>>--- 2001::9 ping statistics ---
> >>>>1 packets transmitted, 0 received, 100% packet loss, time 0ms
> >>>
> >>>Which kernel version are you using? Perhaps you miss another fix? It 
> >>>works
> >>>for
> >>>me. Also I cannot find this patch in net-next?
> >>
> >>It wasn't needed/applied as the route cache was removed.
> >
> >Do you mean the rt->n(eighbour) removal? There was no removal of a route 
> >cache
> >in IPv6 land. The cache is merely in the routing table itself.
> 
> Yes, my bad, sorry. s/route/neighbour/. It was discussed on this thread:
> http://article.gmane.org/gmane.linux.network/255318
> 
> "Note also that YOSHIFUJI Hideaki's patches to remove the cached neighbour
> entirely from ipv6 routes will have the same effect, so your patch won't
> be needed."

Ok, thanks!

But it somehow managed to get into stable kernels, no? Kernels after rt->n
removal should not be affected. At least the example above works on my
net-next kernel correctly.

Greetings,

  Hannes

* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-08-08 19:01   ` Hannes Frederic Sowa
  2013-08-08 19:02     ` Marcelo Ricardo Leitner
@ 2013-08-08 19:19     ` Debabrata Banerjee
  1 sibling, 0 replies; 17+ messages in thread
From: Debabrata Banerjee @ 2013-08-08 19:19 UTC (permalink / raw)
  To: Debabrata Banerjee, Jiri Pirko, mleitner, davem, netdev,
	Alexey Kuznetsov, jmorris, yoshfuji, Patrick McHardy, Banerjee,
	Debabrata, Joshua Hunt

On Thu, Aug 8, 2013 at 3:01 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
>
> Which kernel version are you using? Perhaps you miss another fix? It works for
> me. Also I cannot find this patch in net-next?
>

Just pulled and tried longterm 3.2.50; the behavior is the same: broken.

-Debabrata

* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-08-08 19:16           ` Hannes Frederic Sowa
@ 2013-08-08 19:23             ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 17+ messages in thread
From: Marcelo Ricardo Leitner @ 2013-08-08 19:23 UTC (permalink / raw)
  To: Debabrata Banerjee, Jiri Pirko, davem, netdev, Alexey Kuznetsov,
	jmorris, yoshfuji, Patrick McHardy, Banerjee, Debabrata,
	Joshua Hunt

Em 08-08-2013 16:16, Hannes Frederic Sowa escreveu:
> On Thu, Aug 08, 2013 at 04:11:28PM -0300, Marcelo Ricardo Leitner wrote:
>> Em 08-08-2013 16:06, Hannes Frederic Sowa escreveu:
>>> On Thu, Aug 08, 2013 at 04:02:36PM -0300, Marcelo Ricardo Leitner wrote:
>>>> Em 08-08-2013 16:01, Hannes Frederic Sowa escreveu:
>>>>> On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote:
>>>>>> On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>>> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>>>> I'm not sure this patch is doing the right thing. It seems to break
>>>>>> IPv6 loopback functionality, it is no longer equivalent to IPv4, as
>>>>>> stated above. It doesn't just stop neighbor creation but it stops
>>>>>> cached route creation. Seems like a scary change for a stable tree.
>>>>>> See below:
>>>>>>
>>>>>> $ ip -4 route show local
>>>>>> local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
>>>>>>
>>>>>> This local route enables us to use the whole loopback network, any
>>>>>> address inside 127.0.0.0/8 will work.
>>>>>>
>>>>>> $ ping -c1 127.0.0.9
>>>>>> PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
>>>>>> 64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms
>>>>>>
>>>>>> --- 127.0.0.9 ping statistics ---
>>>>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>>>>> rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms
>>>>>>
>>>>>> This also used to work equivalently for IPv6 local loopback routes:
>>>>>>
>>>>>> $ ip -6 route add local 2001::/64 dev lo
>>>>>> $ ping6 -c1 2001::9
>>>>>> PING 2001::9(2001::9) 56 data bytes
>>>>>> 64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms
>>>>>>
>>>>>> --- 2001::9 ping statistics ---
>>>>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>>>>> rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
>>>>>>
>>>>>> However with this patch, this is very broken:
>>>>>>
>>>>>> $ ip -6 route add local 2001::/64 dev lo
>>>>>> $ ping6 -c1 2001::9
>>>>>> PING 2001::9(2001::9) 56 data bytes
>>>>>> ping: sendmsg: Invalid argument
>>>>>>
>>>>>> --- 2001::9 ping statistics ---
>>>>>> 1 packets transmitted, 0 received, 100% packet loss, time 0ms
>>>>>
>>>>> Which kernel version are you using? Perhaps you miss another fix? It
>>>>> works
>>>>> for
>>>>> me. Also I cannot find this patch in net-next?
>>>>
>>>> It wasn't needed/applied as the route cache was removed.
>>>
>>> Do you mean the rt->n(eighbour) removal? There was no removal of a route
>>> cache
>>> in IPv6 land. The cache is merely in the routing table itself.
>>
>> Yes, my bad, sorry. s/route/neighbour/. It was discussed on this thread:
>> http://article.gmane.org/gmane.linux.network/255318
>>
>> "Note also that YOSHIFUJI Hideaki's patches to remove the cached neighbour
>> entirely from ipv6 routes will have the same effect, so your patch won't
>> be needed."
>
> Ok, thanks!
>
> But it somehow managed to get into stable kernels, no? Kernels after rt->n
> removal should not be affected. At least the example above works on my
> net-next kernel correctly.

Yes, it did, as an intermediate fix, let's say. As we wouldn't remove the cache
for the -stable tree, this patch seems a reasonable way to avoid creating a
flood of unwanted entries. Without it, the kernel was creating neighbor entries
for IP addresses behind a gateway when TPROXY was in use.

In case it helps:
http://thread.gmane.org/gmane.linux.network/255234/focus=257293
http://article.gmane.org/gmane.linux.network/257433 (this thread, actually)

Thanks,
Marcelo

* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-08-08 18:45 ` Debabrata Banerjee
  2013-08-08 19:01   ` Hannes Frederic Sowa
@ 2013-08-08 19:47   ` Hannes Frederic Sowa
  2013-08-08 20:16     ` Hannes Frederic Sowa
  1 sibling, 1 reply; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-08-08 19:47 UTC (permalink / raw)
  To: Debabrata Banerjee
  Cc: Jiri Pirko, mleitner, davem, netdev, Alexey Kuznetsov, jmorris,
	yoshfuji, Patrick McHardy, Banerjee, Debabrata, Joshua Hunt

On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote:
> On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> > From: Marcelo Ricardo Leitner <mleitner@redhat.com>
> >
> > They will be created at output, if ever needed. This avoids creating
> > empty neighbor entries when TPROXYing/Forwarding packets for addresses
> > that are not even directly reachable.
> >
> > Note that IPv4 already handles it this way. No neighbor entries are
> > created for local input.
> >
> > Tested by myself and customer.
> >
> > Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> > Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> > ---
> >  net/ipv6/route.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index e229a3b..363d8b7 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -928,7 +928,7 @@ restart:
> >         dst_hold(&rt->dst);
> >         read_unlock_bh(&table->tb6_lock);
> >
> > -       if (!rt->n && !(rt->rt6i_flags & RTF_NONEXTHOP))
> > +       if (!rt->n && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
> >                 nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
> >         else if (!(rt->dst.flags & DST_HOST))
> >                 nrt = rt6_alloc_clone(rt, &fl6->daddr);
> 
> 
> 
> I'm not sure this patch is doing the right thing. It seems to break
> IPv6 loopback functionality, it is no longer equivalent to IPv4, as
> stated above. It doesn't just stop neighbor creation but it stops
> cached route creation. Seems like a scary change for a stable tree.
> See below:
> 
> $ ip -4 route show local
> local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
> 
> This local route enables us to use the whole loopback network, any
> address inside 127.0.0.0/8 will work.
> 
> $ ping -c1 127.0.0.9
> PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
> 64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms
> 
> --- 127.0.0.9 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms
> 
> This also used to work equivalently for IPv6 local loopback routes:
> 
> $ ip -6 route add local 2001::/64 dev lo
> $ ping6 -c1 2001::9
> PING 2001::9(2001::9) 56 data bytes
> 64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms
> 
> --- 2001::9 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
> 
> However with this patch, this is very broken:
> 
> $ ip -6 route add local 2001::/64 dev lo
> $ ping6 -c1 2001::9
> PING 2001::9(2001::9) 56 data bytes
> ping: sendmsg: Invalid argument

I do think that the patch above is fine. I wonder why you get a blackhole
route back here. Maybe backtracking in ip6_pol_route or in fib6_lookup_1 was
way too aggressive?

* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-08-08 19:47   ` Hannes Frederic Sowa
@ 2013-08-08 20:16     ` Hannes Frederic Sowa
  2013-08-08 20:45       ` Marcelo Ricardo Leitner
  2013-08-12 18:09       ` Marcelo Ricardo Leitner
  0 siblings, 2 replies; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-08-08 20:16 UTC (permalink / raw)
  To: Debabrata Banerjee, Jiri Pirko, mleitner, davem, netdev,
	Alexey Kuznetsov, jmorris, yoshfuji, Patrick McHardy, Banerjee,
	Debabrata, Joshua Hunt

On Thu, Aug 08, 2013 at 09:47:02PM +0200, Hannes Frederic Sowa wrote:
> On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote:
> > On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> > > From: Marcelo Ricardo Leitner <mleitner@redhat.com>
> > >
> > > They will be created at output, if ever needed. This avoids creating
> > > empty neighbor entries when TPROXYing/Forwarding packets for addresses
> > > that are not even directly reachable.
> > >
> > > Note that IPv4 already handles it this way. No neighbor entries are
> > > created for local input.
> > >
> > > Tested by myself and customer.
> > >
> > > Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> > > Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> > > ---
> > >  net/ipv6/route.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > > index e229a3b..363d8b7 100644
> > > --- a/net/ipv6/route.c
> > > +++ b/net/ipv6/route.c
> > > @@ -928,7 +928,7 @@ restart:
> > >         dst_hold(&rt->dst);
> > >         read_unlock_bh(&table->tb6_lock);
> > >
> > > -       if (!rt->n && !(rt->rt6i_flags & RTF_NONEXTHOP))
> > > +       if (!rt->n && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
> > >                 nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
> > >         else if (!(rt->dst.flags & DST_HOST))
> > >                 nrt = rt6_alloc_clone(rt, &fl6->daddr);
> > 
> > 
> > 
> > I'm not sure this patch is doing the right thing. It seems to break
> > IPv6 loopback functionality, it is no longer equivalent to IPv4, as
> > stated above. It doesn't just stop neighbor creation but it stops
> > cached route creation. Seems like a scary change for a stable tree.
> > See below:
> > 
> > $ ip -4 route show local
> > local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
> > 
> > This local route enables us to use the whole loopback network, any
> > address inside 127.0.0.0/8 will work.
> > 
> > $ ping -c1 127.0.0.9
> > PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
> > 64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms
> > 
> > --- 127.0.0.9 ping statistics ---
> > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms
> > 
> > This also used to work equivalently for IPv6 local loopback routes:
> > 
> > $ ip -6 route add local 2001::/64 dev lo
> > $ ping6 -c1 2001::9
> > PING 2001::9(2001::9) 56 data bytes
> > 64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms
> > 
> > --- 2001::9 ping statistics ---
> > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
> > 
> > However with this patch, this is very broken:
> > 
> > $ ip -6 route add local 2001::/64 dev lo
> > $ ping6 -c1 2001::9
> > PING 2001::9(2001::9) 56 data bytes
> > ping: sendmsg: Invalid argument
> 
> I do think that the patch above is fine. I wonder why you get a blackhole
> route back here. Maybe backtracking in ip6_pol_route or in fib6_lookup_1 was
> way too aggressive?

Ah sorry, before the rt->n removal everything worked a bit
differently. rt6_alloc_cow did fill rt->n back then. To fix both things
we would have to bind a neighbour towards the loopback interface into
the non-cloned rt6_info if it feeds packets towards lo. Pretty big change for
old stable kernels, I guess. :/

Marcelo, any idea how to deal with this? My guess would be a revert, but I
don't know the impact on the tproxy issue.

Greetings,

  Hannes


* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-08-08 20:16     ` Hannes Frederic Sowa
@ 2013-08-08 20:45       ` Marcelo Ricardo Leitner
  2013-08-08 20:46         ` Marcelo Ricardo Leitner
  2013-08-12 18:09       ` Marcelo Ricardo Leitner
  1 sibling, 1 reply; 17+ messages in thread
From: Marcelo Ricardo Leitner @ 2013-08-08 20:45 UTC (permalink / raw)
  To: Debabrata Banerjee, Jiri Pirko, davem, netdev, Alexey Kuznetsov,
	jmorris, yoshfuji, Patrick McHardy, Banerjee, Debabrata,
	Joshua Hunt

On 08-08-2013 17:16, Hannes Frederic Sowa wrote:
> On Thu, Aug 08, 2013 at 09:47:02PM +0200, Hannes Frederic Sowa wrote:
>> On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote:
>>> On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>>
>>>> They will be created at output, if ever needed. This avoids creating
>>>> empty neighbor entries when TPROXYing/Forwarding packets for addresses
>>>> that are not even directly reachable.
>>>>
>>>> Note that IPv4 already handles it this way. No neighbor entries are
>>>> created for local input.
>>>>
>>>> Tested by myself and customer.
>>>>
>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>> ---
>>>>   net/ipv6/route.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>>>> index e229a3b..363d8b7 100644
>>>> --- a/net/ipv6/route.c
>>>> +++ b/net/ipv6/route.c
>>>> @@ -928,7 +928,7 @@ restart:
>>>>          dst_hold(&rt->dst);
>>>>          read_unlock_bh(&table->tb6_lock);
>>>>
>>>> -       if (!rt->n && !(rt->rt6i_flags & RTF_NONEXTHOP))
>>>> +       if (!rt->n && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
>>>>                  nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
>>>>          else if (!(rt->dst.flags & DST_HOST))
>>>>                  nrt = rt6_alloc_clone(rt, &fl6->daddr);
>>>
>>>
>>>
>>> I'm not sure this patch is doing the right thing. It seems to break
>>> IPv6 loopback functionality, it is no longer equivalent to IPv4, as
>>> stated above. It doesn't just stop neighbor creation but it stops
>>> cached route creation. Seems like a scary change for a stable tree.
>>> See below:
>>>
>>> $ ip -4 route show local
>>> local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
>>>
>>> This local route enables us to use the whole loopback network, any
>>> address inside 127.0.0.0/8 will work.
>>>
>>> $ ping -c1 127.0.0.9
>>> PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
>>> 64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms
>>>
>>> --- 127.0.0.9 ping statistics ---
>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>> rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms
>>>
>>> This also used to work equivalently for IPv6 local loopback routes:
>>>
>>> $ ip -6 route add local 2001::/64 dev lo
>>> $ ping6 -c1 2001::9
>>> PING 2001::9(2001::9) 56 data bytes
>>> 64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms
>>>
>>> --- 2001::9 ping statistics ---
>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>> rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
>>>
>>> However with this patch, this is very broken:
>>>
>>> $ ip -6 route add local 2001::/64 dev lo
>>> $ ping6 -c1 2001::9
>>> PING 2001::9(2001::9) 56 data bytes
>>> ping: sendmsg: Invalid argument
>>
>> I do think that the patch above is fine. I wonder why you get a blackhole
>> route back here. Maybe backtracking in ip6_pol_route or in fib6_lookup_1 was
>> way too aggressive?
>
> Ah sorry, before the rt->n removal everything worked a bit
> differently. rt6_alloc_cow did fill rt->n back then. To fix both things
> we would have to bind a neighbour towards the loopback interface into
> the non-cloned rt6_info if it feeds packets towards lo. Pretty big change for
> old stable kernels, I guess. :/
>
> Marcelo, any idea how to deal with this? My guess would be a revert, but I
> don't know the impact on the tproxy issue.

Good question :) Nothing so far, sorry.

The impact would be a return to the previous state, in which a tproxy server is
limited by the neighbor cache size. Just making the cache larger is not a good
option, as it would introduce big latency spikes during cleanup.

I'll have to rebuild the tproxy environment I had in order to test this again;
it will take a while. I'll keep you posted.

Cheers,
Marcelo


* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-08-08 20:45       ` Marcelo Ricardo Leitner
@ 2013-08-08 20:46         ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 17+ messages in thread
From: Marcelo Ricardo Leitner @ 2013-08-08 20:46 UTC (permalink / raw)
  To: Debabrata Banerjee
  Cc: Jiri Pirko, davem, netdev, Alexey Kuznetsov, jmorris, yoshfuji,
	Patrick McHardy, Banerjee, Debabrata, Joshua Hunt

On 08-08-2013 17:45, Marcelo Ricardo Leitner wrote:
> On 08-08-2013 17:16, Hannes Frederic Sowa wrote:
>> On Thu, Aug 08, 2013 at 09:47:02PM +0200, Hannes Frederic Sowa wrote:
>>> On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote:
>>>> On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>>>
>>>>> They will be created at output, if ever needed. This avoids creating
>>>>> empty neighbor entries when TPROXYing/Forwarding packets for addresses
>>>>> that are not even directly reachable.
>>>>>
>>>>> Note that IPv4 already handles it this way. No neighbor entries are
>>>>> created for local input.
>>>>>
>>>>> Tested by myself and customer.
>>>>>
>>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>>> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>>> ---
>>>>>   net/ipv6/route.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>>>>> index e229a3b..363d8b7 100644
>>>>> --- a/net/ipv6/route.c
>>>>> +++ b/net/ipv6/route.c
>>>>> @@ -928,7 +928,7 @@ restart:
>>>>>          dst_hold(&rt->dst);
>>>>>          read_unlock_bh(&table->tb6_lock);
>>>>>
>>>>> -       if (!rt->n && !(rt->rt6i_flags & RTF_NONEXTHOP))
>>>>> +       if (!rt->n && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
>>>>>                  nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
>>>>>          else if (!(rt->dst.flags & DST_HOST))
>>>>>                  nrt = rt6_alloc_clone(rt, &fl6->daddr);
>>>>
>>>>
>>>>
>>>> I'm not sure this patch is doing the right thing. It seems to break
>>>> IPv6 loopback functionality, it is no longer equivalent to IPv4, as
>>>> stated above. It doesn't just stop neighbor creation but it stops
>>>> cached route creation. Seems like a scary change for a stable tree.
>>>> See below:
>>>>
>>>> $ ip -4 route show local
>>>> local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
>>>>
>>>> This local route enables us to use the whole loopback network, any
>>>> address inside 127.0.0.0/8 will work.
>>>>
>>>> $ ping -c1 127.0.0.9
>>>> PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
>>>> 64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms
>>>>
>>>> --- 127.0.0.9 ping statistics ---
>>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>>> rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms
>>>>
>>>> This also used to work equivalently for IPv6 local loopback routes:
>>>>
>>>> $ ip -6 route add local 2001::/64 dev lo
>>>> $ ping6 -c1 2001::9
>>>> PING 2001::9(2001::9) 56 data bytes
>>>> 64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms
>>>>
>>>> --- 2001::9 ping statistics ---
>>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>>> rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
>>>>
>>>> However with this patch, this is very broken:
>>>>
>>>> $ ip -6 route add local 2001::/64 dev lo
>>>> $ ping6 -c1 2001::9
>>>> PING 2001::9(2001::9) 56 data bytes
>>>> ping: sendmsg: Invalid argument
>>>
>>> I do think that the patch above is fine. I wonder why you get a blackhole
>>> route back here. Maybe backtracking in ip6_pol_route or in fib6_lookup_1 was
>>> way too aggressive?
>>
>> Ah sorry, before the rt->n removal everything worked a bit
>> differently. rt6_alloc_cow did fill rt->n back then. To fix both things
>> we would have to bind a neighbour towards the loopback interface into
>> the non-cloned rt6_info if it feeds packets towards lo. Pretty big change for
>> old stable kernels, I guess. :/
>>
>> Marcelo, any idea how to deal with this? My guess would be a revert, but I
>> don't know the impact on the tproxy issue.
>
> Good question :) Nothing so far, sorry.
>
> The impact would be a return to the previous state, in which a tproxy server is
> limited by the neighbor cache size. Just making the cache larger is not a good
> option, as it would introduce big latency spikes during cleanup.
>
> I'll have to rebuild the tproxy environment I had in order to test this again;
> it will take a while. I'll keep you posted.

Aye, and thanks for assisting on this, Hannes, appreciated.

Cheers,
Marcelo


* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-08-08 20:16     ` Hannes Frederic Sowa
  2013-08-08 20:45       ` Marcelo Ricardo Leitner
@ 2013-08-12 18:09       ` Marcelo Ricardo Leitner
  2013-08-12 22:26         ` Hannes Frederic Sowa
  1 sibling, 1 reply; 17+ messages in thread
From: Marcelo Ricardo Leitner @ 2013-08-12 18:09 UTC (permalink / raw)
  To: Debabrata Banerjee, Jiri Pirko, davem, netdev, Alexey Kuznetsov,
	jmorris, yoshfuji, Patrick McHardy, Banerjee, Debabrata,
	Joshua Hunt

On 08-08-2013 17:16, Hannes Frederic Sowa wrote:
> On Thu, Aug 08, 2013 at 09:47:02PM +0200, Hannes Frederic Sowa wrote:
>> On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote:
>>> On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>>
>>>> They will be created at output, if ever needed. This avoids creating
>>>> empty neighbor entries when TPROXYing/Forwarding packets for addresses
>>>> that are not even directly reachable.
>>>>
>>>> Note that IPv4 already handles it this way. No neighbor entries are
>>>> created for local input.
>>>>
>>>> Tested by myself and customer.
>>>>
>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>> ---
>>>>   net/ipv6/route.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>>>> index e229a3b..363d8b7 100644
>>>> --- a/net/ipv6/route.c
>>>> +++ b/net/ipv6/route.c
>>>> @@ -928,7 +928,7 @@ restart:
>>>>          dst_hold(&rt->dst);
>>>>          read_unlock_bh(&table->tb6_lock);
>>>>
>>>> -       if (!rt->n && !(rt->rt6i_flags & RTF_NONEXTHOP))
>>>> +       if (!rt->n && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
>>>>                  nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
>>>>          else if (!(rt->dst.flags & DST_HOST))
>>>>                  nrt = rt6_alloc_clone(rt, &fl6->daddr);
>>>
>>>
>>>
>>> I'm not sure this patch is doing the right thing. It seems to break
>>> IPv6 loopback functionality, it is no longer equivalent to IPv4, as
>>> stated above. It doesn't just stop neighbor creation but it stops
>>> cached route creation. Seems like a scary change for a stable tree.
>>> See below:
>>>
>>> $ ip -4 route show local
>>> local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
>>>
>>> This local route enables us to use the whole loopback network, any
>>> address inside 127.0.0.0/8 will work.
>>>
>>> $ ping -c1 127.0.0.9
>>> PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
>>> 64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms
>>>
>>> --- 127.0.0.9 ping statistics ---
>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>> rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms
>>>
>>> This also used to work equivalently for IPv6 local loopback routes:
>>>
>>> $ ip -6 route add local 2001::/64 dev lo
>>> $ ping6 -c1 2001::9
>>> PING 2001::9(2001::9) 56 data bytes
>>> 64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms
>>>
>>> --- 2001::9 ping statistics ---
>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>> rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
>>>
>>> However with this patch, this is very broken:
>>>
>>> $ ip -6 route add local 2001::/64 dev lo
>>> $ ping6 -c1 2001::9
>>> PING 2001::9(2001::9) 56 data bytes
>>> ping: sendmsg: Invalid argument
>>
>> I do think that the patch above is fine. I wonder why you get a blackhole
>> route back here. Maybe backtracking in ip6_pol_route or in fib6_lookup_1 was
>> way too aggressive?
>
> Ah sorry, before the rt->n removal everything worked a bit
> differently. rt6_alloc_cow did fill rt->n back then. To fix both things
> we would have to bind a neighbour towards the loopback interface into
> the non-cloned rt6_info if it feeds packets towards lo. Pretty big change for
> old stable kernels, I guess. :/
>
> Marcelo, any idea how to deal with this? My guess would be a revert, but I
> don't know the impact on the tproxy issue.

Hannes, would something like this be acceptable? I'm hoping it's not too
ugly/hacky... As far as I could track back, the input and output routines were
merged mainly due to code similarity.

The TPROXY scenario needs these neighbor entries not to be created on the INPUT
path, while Debabrata's ping test needs them on the OUTPUT path. This patch
therefore limits my previous patch to INPUT only.

Initial testing here seems good, TPROXY seems to be working as expected and 
also the ping6 test.

What do you think?

Regards,
Marcelo


[-- Attachment #2: ipv6-rt.patch --]
[-- Type: text/x-patch, Size: 1914 bytes --]

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 18ea73c..603f9d9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -791,7 +791,7 @@ static struct rt6_info *rt6_alloc_clone(struct rt6_info *ort,
 }
 
 static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, int oif,
-				      struct flowi6 *fl6, int flags)
+				      struct flowi6 *fl6, int flags, int output)
 {
 	struct fib6_node *fn;
 	struct rt6_info *rt, *nrt;
@@ -799,8 +799,11 @@ static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 	int attempts = 3;
 	int err;
 	int reachable = net->ipv6.devconf_all->forwarding ? 0 : RT6_LOOKUP_F_REACHABLE;
+	int local = RTF_NONEXTHOP;
 
 	strict |= flags & RT6_LOOKUP_F_IFACE;
+	if (!output)
+			local |= RTF_LOCAL;
 
 relookup:
 	read_lock_bh(&table->tb6_lock);
@@ -820,7 +823,7 @@ restart:
 	read_unlock_bh(&table->tb6_lock);
 
 	if (!dst_get_neighbour_raw(&rt->dst)
-	    && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
+	    && !(rt->rt6i_flags & local))
 		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
 	else if (!(rt->dst.flags & DST_HOST))
 		nrt = rt6_alloc_clone(rt, &fl6->daddr);
@@ -864,7 +867,7 @@ out2:
 static struct rt6_info *ip6_pol_route_input(struct net *net, struct fib6_table *table,
 					    struct flowi6 *fl6, int flags)
 {
-	return ip6_pol_route(net, table, fl6->flowi6_iif, fl6, flags);
+	return ip6_pol_route(net, table, fl6->flowi6_iif, fl6, flags, 0);
 }
 
 void ip6_route_input(struct sk_buff *skb)
@@ -890,7 +893,7 @@ void ip6_route_input(struct sk_buff *skb)
 static struct rt6_info *ip6_pol_route_output(struct net *net, struct fib6_table *table,
 					     struct flowi6 *fl6, int flags)
 {
-	return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags);
+	return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags, 1);
 }
 
 struct dst_entry * ip6_route_output(struct net *net, const struct sock *sk,
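
A minimal sketch of the branch selection in the patch above, written as a
standalone C function so the input/output and local/non-local combinations
can be checked by hand. The DEMO_* flag values, the enum and
demo_pick_action() are hypothetical stand-ins and not kernel definitions;
only the shape of the mask and of the if/else-if chain is taken from the diff.

```c
#include <stdbool.h>

/* Stand-in flag bits for illustration only; the kernel defines the real
 * RTF_* values in its own headers. */
#define DEMO_RTF_NONEXTHOP 0x1u
#define DEMO_RTF_LOCAL     0x2u

enum demo_action { DEMO_COW, DEMO_CLONE, DEMO_REUSE };

/*
 * Mirrors the if/else-if chain touched by the patch:
 *   rt_flags  - stand-in for rt->rt6i_flags
 *   has_neigh - true if the route already has a neighbour bound
 *   is_host   - stand-in for (rt->dst.flags & DST_HOST)
 *   output    - nonzero on the output path, as in the posted patch
 */
static enum demo_action demo_pick_action(unsigned int rt_flags, bool has_neigh,
					 bool is_host, int output)
{
	unsigned int local = DEMO_RTF_NONEXTHOP;

	if (!output)
		local |= DEMO_RTF_LOCAL; /* input path: local routes skip COW */

	if (!has_neigh && !(rt_flags & local))
		return DEMO_COW;   /* where the patch calls rt6_alloc_cow() */
	else if (!is_host)
		return DEMO_CLONE; /* where the patch calls rt6_alloc_clone() */
	return DEMO_REUSE;         /* neither branch taken */
}
```

With DEMO_RTF_LOCAL set and no neighbour bound, the input path (output == 0)
skips the COW branch while the output path (output == 1) still takes it. That
is the split between the TPROXY case and the ping6 case described in the
message.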


* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-08-12 18:09       ` Marcelo Ricardo Leitner
@ 2013-08-12 22:26         ` Hannes Frederic Sowa
  2013-08-13 12:48           ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-08-12 22:26 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Debabrata Banerjee, Jiri Pirko, davem, netdev, Alexey Kuznetsov,
	jmorris, yoshfuji, Patrick McHardy, Banerjee, Debabrata,
	Joshua Hunt

Hi Marcelo!

On Mon, Aug 12, 2013 at 03:09:19PM -0300, Marcelo Ricardo Leitner wrote:
> Hannes, would something like this be acceptable? I'm hoping it's not too
> ugly/hacky... As far as I could track back, the input and output routines were
> merged mainly due to code similarity.

Your idea seems sound and I don't think it is very ugly or hacky. It's
as minimal as a stable-only patch should be. But we could simplify the
logic a bit. ;) See below.

> The TPROXY scenario needs these neighbor entries not to be created on the INPUT
> path, while Debabrata's ping test needs them on the OUTPUT path. This patch
> therefore limits my previous patch to INPUT only.

Yes, agreed. I don't see anything which could break because of this patch.
So I would go with it.

> Initial testing here seems good, TPROXY seems to be working as expected and 
> also the ping6 test.
> 
> What do you think?


> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 18ea73c..603f9d9 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -791,7 +791,7 @@ static struct rt6_info *rt6_alloc_clone(struct rt6_info *ort,
>  }
>  
>  static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, int oif,
> -				      struct flowi6 *fl6, int flags)
> +				      struct flowi6 *fl6, int flags, int output)

								     bool input

>  {
>  	struct fib6_node *fn;
>  	struct rt6_info *rt, *nrt;
> @@ -799,8 +799,11 @@ static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
>  	int attempts = 3;
>  	int err;
>  	int reachable = net->ipv6.devconf_all->forwarding ? 0 : RT6_LOOKUP_F_REACHABLE;
> +	int local = RTF_NONEXTHOP;
>  
>  	strict |= flags & RT6_LOOKUP_F_IFACE;
> +	if (!output)
> +			local |= RTF_LOCAL;


	if (input)
			local |= RTF_LOCAL;

>  
>  relookup:
>  	read_lock_bh(&table->tb6_lock);
> @@ -820,7 +823,7 @@ restart:
>  	read_unlock_bh(&table->tb6_lock);
>  
>  	if (!dst_get_neighbour_raw(&rt->dst)
> -	    && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
> +	    && !(rt->rt6i_flags & local))
>  		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
>  	else if (!(rt->dst.flags & DST_HOST))
>  		nrt = rt6_alloc_clone(rt, &fl6->daddr);
> @@ -864,7 +867,7 @@ out2:
>  static struct rt6_info *ip6_pol_route_input(struct net *net, struct fib6_table *table,
>  					    struct flowi6 *fl6, int flags)
>  {
> -	return ip6_pol_route(net, table, fl6->flowi6_iif, fl6, flags);
> +	return ip6_pol_route(net, table, fl6->flowi6_iif, fl6, flags, 0);

								      true);


>  }
>  
>  void ip6_route_input(struct sk_buff *skb)
> @@ -890,7 +893,7 @@ void ip6_route_input(struct sk_buff *skb)
>  static struct rt6_info *ip6_pol_route_output(struct net *net, struct fib6_table *table,
>  					     struct flowi6 *fl6, int flags)
>  {
> -	return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags);
> +	return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags, 1);

								      false);

>  }
>  
>  struct dst_entry * ip6_route_output(struct net *net, const struct sock *sk,

Thanks,

  Hannes
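
The rename suggested in the review only flips the polarity of the new
argument. A small sketch, using hypothetical DEMO_* values and helper names
that do not exist in the kernel, of why the two spellings build the same mask
as long as the call sites are flipped as well:

```c
#include <stdbool.h>

/* Stand-in flag bits for illustration only, not the kernel's RTF_* values. */
#define DEMO_RTF_NONEXTHOP 0x1u
#define DEMO_RTF_LOCAL     0x2u

/* As posted: the caller says whether this is the output path. */
static unsigned int demo_mask_from_output(int output)
{
	unsigned int local = DEMO_RTF_NONEXTHOP;

	if (!output)
		local |= DEMO_RTF_LOCAL;
	return local;
}

/* As suggested in review: the caller says whether this is the input path. */
static unsigned int demo_mask_from_input(bool input)
{
	unsigned int local = DEMO_RTF_NONEXTHOP;

	if (input)
		local |= DEMO_RTF_LOCAL;
	return local;
}
```

The input wrapper would pass true where the posted patch passes 0, and the
output wrapper would pass false where it passes 1, so the condition reads
without a negation.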


* Re: [patch net] ipv6: do not create neighbor entries for local delivery
  2013-08-12 22:26         ` Hannes Frederic Sowa
@ 2013-08-13 12:48           ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 17+ messages in thread
From: Marcelo Ricardo Leitner @ 2013-08-13 12:48 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Debabrata Banerjee, Jiri Pirko, davem, netdev, Alexey Kuznetsov,
	jmorris, yoshfuji, Patrick McHardy, Banerjee, Debabrata,
	Joshua Hunt

On 12-08-2013 19:26, Hannes Frederic Sowa wrote:
> Hi Marcelo!
>
> On Mon, Aug 12, 2013 at 03:09:19PM -0300, Marcelo Ricardo Leitner wrote:
>> Hannes, would something like this be acceptable? I'm hoping it's not too
>> ugly/hacky... As far as I could track back, the input and output routines were
>> merged mainly due to code similarity.
>
> Your idea seems sound and I don't think it is very ugly or hacky. It's
> as minimal as a stable-only patch should be. But we could simplify the
> logic a bit. ;) See below.
>
>> The TPROXY scenario needs these neighbor entries not to be created on the INPUT
>> path, while Debabrata's ping test needs them on the OUTPUT path. This patch
>> therefore limits my previous patch to INPUT only.
>
> Yes, agreed. I don't see anything which could break because of this patch.
> So I would go with it.
>
>> Initial testing here seems good, TPROXY seems to be working as expected and
>> also the ping6 test.
>>
>> What do you think?

Aye Hannes, thanks! I'll rework the patch based on your points, do some more
testing here, and post it, probably not before tomorrow.

Thanks!
Marcelo

>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index 18ea73c..603f9d9 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -791,7 +791,7 @@ static struct rt6_info *rt6_alloc_clone(struct rt6_info *ort,
>>   }
>>
>>   static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, int oif,
>> -				      struct flowi6 *fl6, int flags)
>> +				      struct flowi6 *fl6, int flags, int output)
>
> 								     bool input
>
>>   {
>>   	struct fib6_node *fn;
>>   	struct rt6_info *rt, *nrt;
>> @@ -799,8 +799,11 @@ static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
>>   	int attempts = 3;
>>   	int err;
>>   	int reachable = net->ipv6.devconf_all->forwarding ? 0 : RT6_LOOKUP_F_REACHABLE;
>> +	int local = RTF_NONEXTHOP;
>>
>>   	strict |= flags & RT6_LOOKUP_F_IFACE;
>> +	if (!output)
>> +			local |= RTF_LOCAL;
>
>
> 	if (input)
> 			local |= RTF_LOCAL;
>
>>
>>   relookup:
>>   	read_lock_bh(&table->tb6_lock);
>> @@ -820,7 +823,7 @@ restart:
>>   	read_unlock_bh(&table->tb6_lock);
>>
>>   	if (!dst_get_neighbour_raw(&rt->dst)
>> -	    && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
>> +	    && !(rt->rt6i_flags & local))
>>   		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
>>   	else if (!(rt->dst.flags & DST_HOST))
>>   		nrt = rt6_alloc_clone(rt, &fl6->daddr);
>> @@ -864,7 +867,7 @@ out2:
>>   static struct rt6_info *ip6_pol_route_input(struct net *net, struct fib6_table *table,
>>   					    struct flowi6 *fl6, int flags)
>>   {
>> -	return ip6_pol_route(net, table, fl6->flowi6_iif, fl6, flags);
>> +	return ip6_pol_route(net, table, fl6->flowi6_iif, fl6, flags, 0);
>
> 								      true);
>
>
>>   }
>>
>>   void ip6_route_input(struct sk_buff *skb)
>> @@ -890,7 +893,7 @@ void ip6_route_input(struct sk_buff *skb)
>>   static struct rt6_info *ip6_pol_route_output(struct net *net, struct fib6_table *table,
>>   					     struct flowi6 *fl6, int flags)
>>   {
>> -	return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags);
>> +	return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags, 1);
>
> 								      false);
>
>>   }
>>
>>   struct dst_entry * ip6_route_output(struct net *net, const struct sock *sk,
>
> Thanks,
>
>    Hannes
>
>


Thread overview: 17+ messages
2013-01-30  8:26 [patch net] ipv6: do not create neighbor entries for local delivery Jiri Pirko
2013-01-31  1:26 ` David Miller
2013-08-08 18:45 ` Debabrata Banerjee
2013-08-08 19:01   ` Hannes Frederic Sowa
2013-08-08 19:02     ` Marcelo Ricardo Leitner
2013-08-08 19:06       ` Hannes Frederic Sowa
2013-08-08 19:11         ` Marcelo Ricardo Leitner
2013-08-08 19:16           ` Hannes Frederic Sowa
2013-08-08 19:23             ` Marcelo Ricardo Leitner
2013-08-08 19:19     ` Debabrata Banerjee
2013-08-08 19:47   ` Hannes Frederic Sowa
2013-08-08 20:16     ` Hannes Frederic Sowa
2013-08-08 20:45       ` Marcelo Ricardo Leitner
2013-08-08 20:46         ` Marcelo Ricardo Leitner
2013-08-12 18:09       ` Marcelo Ricardo Leitner
2013-08-12 22:26         ` Hannes Frederic Sowa
2013-08-13 12:48           ` Marcelo Ricardo Leitner
