linux-kernel.vger.kernel.org archive mirror
* [BUG] net/ipv4: inconsistent routing table
@ 2015-08-05  8:56 Zang MingJie
  2015-08-05  9:06 ` Daniel Borkmann
  0 siblings, 1 reply; 16+ messages in thread
From: Zang MingJie @ 2015-08-05  8:56 UTC (permalink / raw)
  To: linux-kernel

Hi:

I found a bug when removing an IP address that is referenced by a routing entry.

Steps to reproduce:

ip li add type dummy
ip li set dummy0 up
ip ad add 10.0.0.1/24 dev dummy0
ip ad add 10.0.0.2/24 dev dummy0
ip ro add default via 10.0.0.2
ip ad del 10.0.0.2/24 dev dummy0

After deleting the secondary IP address, the routing entry still
points to 10.0.0.2:

# ip ro
default via 10.0.0.2 dev dummy0
10.0.0.0/24 dev dummy0  proto kernel  scope link  src 10.0.0.1

But the kernel actually treats the default route as directly connected:

# ip ro get 1.1.1.1
1.1.1.1 dev dummy0  src 10.0.0.1
    cache


* Re: [BUG] net/ipv4: inconsistent routing table
  2015-08-05  8:56 [BUG] net/ipv4: inconsistent routing table Zang MingJie
@ 2015-08-05  9:06 ` Daniel Borkmann
  2015-08-05 17:45   ` Alexander Duyck
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Borkmann @ 2015-08-05  9:06 UTC (permalink / raw)
  To: Zang MingJie; +Cc: linux-kernel, netdev

[ please cc netdev ]

On 08/05/2015 10:56 AM, Zang MingJie wrote:
> Hi:
>
> I found a bug when remove an ip address which is referenced by a routing entry.
>
> step to reproduce:
>
> ip li add type dummy
> ip li set dummy0 up
> ip ad add 10.0.0.1/24 dev dummy0
> ip ad add 10.0.0.2/24 dev dummy0
> ip ro add default via 10.0.0.2
> ip ad del 10.0.0.2/24 dev dummy0
>
> after deleting the secondary ip address, the routing entry still
> pointing to 10.0.0.2
>
> # ip ro
> default via 10.0.0.2 dev dummy0
> 10.0.0.0/24 dev dummy0  proto kernel  scope link  src 10.0.0.1
>
> but actually, kernel considers the default route is directly connected.
>
> # ip ro get 1.1.1.1
> 1.1.1.1 dev dummy0  src 10.0.0.1
>      cache

* Re: [BUG] net/ipv4: inconsistent routing table
  2015-08-05  9:06 ` Daniel Borkmann
@ 2015-08-05 17:45   ` Alexander Duyck
  2015-08-06 10:13     ` Zang MingJie
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Duyck @ 2015-08-05 17:45 UTC (permalink / raw)
  To: Daniel Borkmann, Zang MingJie; +Cc: linux-kernel, netdev

On 08/05/2015 02:06 AM, Daniel Borkmann wrote:
> [ please cc netdev ]
>
> On 08/05/2015 10:56 AM, Zang MingJie wrote:
>> Hi:
>>
>> I found a bug when remove an ip address which is referenced by a 
>> routing entry.
>>
>> step to reproduce:
>>
>> ip li add type dummy
>> ip li set dummy0 up
>> ip ad add 10.0.0.1/24 dev dummy0
>> ip ad add 10.0.0.2/24 dev dummy0

Okay, so up to this point you have 2 addresses on the same subnet that 
are now on dummy0.

>> ip ro add default via 10.0.0.2

This makes the default route go through 10.0.0.2.

>> ip ad del 10.0.0.2/24 dev dummy0

Then you remove 10.0.0.2 from the local system. However, since 10.0.0.1
is on the same subnet, dummy0 would still be the correct interface to
access 10.0.0.2; it is just no longer local to the system.

>> after deleting the secondary ip address, the routing entry still
>> pointing to 10.0.0.2

You didn't delete the default routing entry so why would you expect it 
to change?  All you did is remove 10.0.0.2 from the local system.  I 
believe the assumption is that 10.0.0.2 is still out there somewhere, it 
just isn't on the local system anymore.

>> # ip ro
>> default via 10.0.0.2 dev dummy0
>> 10.0.0.0/24 dev dummy0  proto kernel  scope link  src 10.0.0.1

This matches up with what I would expect.  10.0.0.2 is the default 
gateway and it is accessible from dummy0 since 10.0.0.0/24 is accessible 
from dummy0.

>> but actually, kernel considers the default route is directly connected.
>>
>> # ip ro get 1.1.1.1
>> 1.1.1.1 dev dummy0  src 10.0.0.1
>>      cache

I'm not sure how you came to the "directly connected" conclusion. It is 
still routing things out through 10.0.0.2 from 10.0.0.1.

Maybe your example would work better if you used 10.0.0.1 and 10.0.1.1 
instead.  Then I think you might be able to better see that when you 
delete the second address the route would be broken.

- Alex


* Re: [BUG] net/ipv4: inconsistent routing table
  2015-08-05 17:45   ` Alexander Duyck
@ 2015-08-06 10:13     ` Zang MingJie
  2015-08-06 19:43       ` Alexander Duyck
  0 siblings, 1 reply; 16+ messages in thread
From: Zang MingJie @ 2015-08-06 10:13 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Daniel Borkmann, linux-kernel, netdev

On Thu, Aug 6, 2015 at 1:45 AM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On 08/05/2015 02:06 AM, Daniel Borkmann wrote:
>>
>> [ please cc netdev ]
>>
>> On 08/05/2015 10:56 AM, Zang MingJie wrote:
>>>
>>> Hi:
>>>
>>> I found a bug when remove an ip address which is referenced by a routing
>>> entry.
>>>
>>> step to reproduce:
>>>
>>> ip li add type dummy
>>> ip li set dummy0 up
>>> ip ad add 10.0.0.1/24 dev dummy0
>>> ip ad add 10.0.0.2/24 dev dummy0
>
>
> Okay, so up to this point you have 2 addresses on the same subnet that are
> now on dummy0.
>
>>> ip ro add default via 10.0.0.2
>
>
> This makes the default route go through 10.0.0.2.
>
>>> ip ad del 10.0.0.2/24 dev dummy0
>
>
> Then you remove 10.0.0.2 from the local system, however since 10.0.0.1 is on
> the same subnet dummy0 would still be the correct interface to access
> 10.0.0.2 it is just no longer local to the system.
>
>>> after deleting the secondary ip address, the routing entry still
>>> pointing to 10.0.0.2
>
>
> You didn't delete the default routing entry so why would you expect it to
> change?  All you did is remove 10.0.0.2 from the local system.  I believe
> the assumption is that 10.0.0.2 is still out there somewhere, it just isn't
> on the local system anymore.

Yes, 10.0.0.2 has migrated somewhere else.

>
>>> # ip ro
>>> default via 10.0.0.2 dev dummy0
>>> 10.0.0.0/24 dev dummy0  proto kernel  scope link  src 10.0.0.1
>
>
> This matches up with what I would expect.  10.0.0.2 is the default gateway
> and it is accessible from dummy0 since 10.0.0.0/24 is accessible from
> dummy0.

This means 0.0.0.0/0 is accessible via 10.0.0.2 on the network of dummy0

>
>>> but actually, kernel considers the default route is directly connected.
>>>
>>> # ip ro get 1.1.1.1
>>> 1.1.1.1 dev dummy0  src 10.0.0.1
>>>      cache
>
>
> I'm not sure how you came to the "directly connected" conclusion. It is
> still routing things out through 10.0.0.2 from 10.0.0.1.
>
> Maybe your example would work better if you used 10.0.0.1 and 10.0.1.1
> instead.  Then I think you might be able to better see that when you delete
> the second address the route would be broken.

No, it isn't. When pinging 1.1.1.1, the kernel sends a broadcast ARP
request for 1.1.1.1 directly, which is not what I expect. It should send
an ARP request for 10.0.0.2. The following would be the correct routing
entry:

# ip ro get 1.1.1.1
1.1.1.1 via 10.0.0.2 dev dummy0  src 10.0.0.1
    cache
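
The difference between the observed and the expected result is whether a
"via" token shows up in the `ip route get` output. A minimal Python
sketch of that check follows. The helper names `parse_route_get` and
`is_on_link` are made up for illustration, and the parser only
understands the simple one-line form shown in this thread:

```python
def parse_route_get(line):
    """Parse the first line of `ip route get` output into a dict.

    Only the destination and the keyword/value pairs 'via', 'dev' and
    'src' are recognised; everything else is ignored.
    """
    tokens = line.split()
    info = {"dst": tokens[0]}
    # Walk over adjacent token pairs and keep those led by a known keyword.
    for key, value in zip(tokens[1:], tokens[2:]):
        if key in ("via", "dev", "src"):
            info[key] = value
    return info


def is_on_link(line):
    """True if the route has no gateway, i.e. the destination is treated
    as directly connected."""
    return "via" not in parse_route_get(line)


observed = "1.1.1.1 dev dummy0  src 10.0.0.1"
expected = "1.1.1.1 via 10.0.0.2 dev dummy0  src 10.0.0.1"
print(is_on_link(observed), is_on_link(expected))
```

With the two lines from this report, the observed route is classified as
on-link and the expected one is not.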


>
> - Alex


* Re: [BUG] net/ipv4: inconsistent routing table
  2015-08-06 10:13     ` Zang MingJie
@ 2015-08-06 19:43       ` Alexander Duyck
  2015-08-07  8:23         ` Zang MingJie
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Duyck @ 2015-08-06 19:43 UTC (permalink / raw)
  To: Zang MingJie, Alexander Duyck
  Cc: Daniel Borkmann, linux-kernel, netdev, Stephen Hemminger, David Miller

On 08/06/2015 03:13 AM, Zang MingJie wrote:
> On Thu, Aug 6, 2015 at 1:45 AM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On 08/05/2015 02:06 AM, Daniel Borkmann wrote:
>>>
>>> [ please cc netdev ]
>>>
>>> On 08/05/2015 10:56 AM, Zang MingJie wrote:
>>>>
>>>> Hi:
>>>>
>>>> I found a bug when remove an ip address which is referenced by a routing
>>>> entry.
>>>>
>>>> step to reproduce:
>>>>
>>>> ip li add type dummy
>>>> ip li set dummy0 up
>>>> ip ad add 10.0.0.1/24 dev dummy0
>>>> ip ad add 10.0.0.2/24 dev dummy0
>>
>>
>> Okay, so up to this point you have 2 addresses on the same subnet that are
>> now on dummy0.
>>
>>>> ip ro add default via 10.0.0.2
>>
>>
>> This makes the default route go through 10.0.0.2.
>>
>>>> ip ad del 10.0.0.2/24 dev dummy0
>>
>>
>> Then you remove 10.0.0.2 from the local system, however since 10.0.0.1 is on
>> the same subnet dummy0 would still be the correct interface to access
>> 10.0.0.2 it is just no longer local to the system.
>>
>>>> after deleting the secondary ip address, the routing entry still
>>>> pointing to 10.0.0.2
>>
>>
>> You didn't delete the default routing entry so why would you expect it to
>> change?  All you did is remove 10.0.0.2 from the local system.  I believe
>> the assumption is that 10.0.0.2 is still out there somewhere, it just isn't
>> on the local system anymore.
>
> Yes, 10.0.0.2 has migrated somewhere else.

The address might have migrated, but the interface is still up and 
10.0.0.1 is still present on the same subnet.  Because you made a local 
address the default gateway the assumption is any routes not 
specifically called out on other interfaces are directly accessible to 
this interface.

The bug indicates that the kernel is doing something to make the table 
inconsistent, but a default route that is a local interface address does 
essentially the same thing.

>>
>>>> # ip ro
>>>> default via 10.0.0.2 dev dummy0
>>>> 10.0.0.0/24 dev dummy0  proto kernel  scope link  src 10.0.0.1
>>
>>
>> This matches up with what I would expect.  10.0.0.2 is the default gateway
>> and it is accessible from dummy0 since 10.0.0.0/24 is accessible from
>> dummy0.
>
> This means 0.0.0.0/0 is accessible via 10.0.0.2 on the network of dummy0

Yes, but at the time you specified it 10.0.0.2 was a local address which 
belonged to dummy0.  This means that dummy0 can access anything not 
specified elsewhere via pretty much any address it wants.  So it is 
perfectly valid if it wants to use a source address of 10.0.0.1 to send 
packets to 1.1.1.1 over dummy0.

>>
>>>> but actually, kernel considers the default route is directly connected.
>>>>
>>>> # ip ro get 1.1.1.1
>>>> 1.1.1.1 dev dummy0  src 10.0.0.1
>>>>       cache
>>
>>
>> I'm not sure how you came to the "directly connected" conclusion. It is
>> still routing things out through 10.0.0.2 from 10.0.0.1.
>>
>> Maybe your example would work better if you used 10.0.0.1 and 10.0.1.1
>> instead.  Then I think you might be able to better see that when you delete
>> the second address the route would be broken.
>
> No, it isn't. When pinging 1.1.1.1, the kernel sends a broadcast ARP
> request for 1.1.1.1 directly, which is not what I expect. It should send
> an ARP request for 10.0.0.2. The following would be the correct routing
> entry:
>
> # ip ro get 1.1.1.1
> 1.1.1.1 via 10.0.0.2 dev dummy0  src 10.0.0.1
>      cache

I see what you are trying to say, but the example provided is a bit 
lacking.  Assuming you could ping 1.1.1.1 via dummy0 before with 
10.0.0.2 as your default gateway, that shouldn't change if 10.0.0.2 is 
migrated to another address.  That is, unless there is an issue on the 
system 10.0.0.2 was migrated to.

Now if I move away from the dummy interface and use a real network
interface instead, things get a bit more interesting.  If we follow your
example and use 2 different subnets on the two systems, then pings
continue to work after we remove the addresses.  However, if we flip
things a bit and add the default route first, and then the local address
for the gateway, they don't.  So something like below:
	ip li set eth0 up
	ip ad add 10.0.0.1/24 dev eth0
	ip ro add default via 10.0.0.2
	ip ad add 10.0.0.2/24 dev eth0

What you end up with is eth0 sending arp requests looking for 10.0.0.2 
even though it is a local address on the system.

My question would be what is the correct behavior for this?  If a local 
address is removed or added that is being used as a gateway address 
should we delete the route, or update the scope of the next hop?

- Alex


* Re: [BUG] net/ipv4: inconsistent routing table
  2015-08-06 19:43       ` Alexander Duyck
@ 2015-08-07  8:23         ` Zang MingJie
  2015-08-07 16:08           ` Alexander Duyck
  0 siblings, 1 reply; 16+ messages in thread
From: Zang MingJie @ 2015-08-07  8:23 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Alexander Duyck, Daniel Borkmann, linux-kernel, netdev,
	Stephen Hemminger, David Miller

IMO, the routing decision should be deterministic: given a specific
routing table and local network, the result MUST be determined,
independent of how or in what order the routing entries were added.

Now there are two ways to configure the system that result in EXACTLY
the same routing table and local addresses, but the routing decision is
totally different.

SAME routing table, DIFFERENT routing decision: there MUST be a bug in
the kernel.
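
The point can be illustrated with a small toy model in Python. This is
only a sketch, not kernel code: the `Box` class and its methods are
hypothetical, and it assumes (as discussed in this thread) that the
next-hop scope is decided once, when the route is added, and is not
revisited when local addresses change later.

```python
from dataclasses import dataclass, field

HOST, LINK = "host", "link"  # stand-ins for RT_SCOPE_HOST / RT_SCOPE_LINK


@dataclass
class Box:
    local: set = field(default_factory=set)     # local IP addresses
    routes: list = field(default_factory=list)  # default routes, in order

    def add_addr(self, addr):
        self.local.add(addr)

    def del_addr(self, addr):
        self.local.discard(addr)

    def add_default(self, gw):
        # Assumption: the scope is resolved here, once, and then frozen.
        scope = HOST if gw in self.local else LINK
        self.routes.append({"gw": gw, "nh_scope": scope})

    def visible(self):
        # What a user can see: addresses and gateways, but not the scope.
        return sorted(self.local), [r["gw"] for r in self.routes]

    def next_hop(self, dst):
        # The gateway is only used when the frozen scope is LINK.
        route = self.routes[0]
        return route["gw"] if route["nh_scope"] == LINK else dst


# Ordering A: the gateway address is local when the route is added.
a = Box()
a.add_addr("10.0.0.1")
a.add_addr("10.0.0.2")
a.add_default("10.0.0.2")
a.del_addr("10.0.0.2")

# Ordering B: the gateway address was never local.
b = Box()
b.add_addr("10.0.0.1")
b.add_default("10.0.0.2")

print(a.visible() == b.visible())
print(a.next_hop("1.1.1.1"), b.next_hop("1.1.1.1"))
```

Both boxes end up with the same visible addresses and routes, yet the
model sends traffic for 1.1.1.1 to different next hops, because the
hidden scope differs.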



* Re: [BUG] net/ipv4: inconsistent routing table
  2015-08-07  8:23         ` Zang MingJie
@ 2015-08-07 16:08           ` Alexander Duyck
  2015-08-07 17:00             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Duyck @ 2015-08-07 16:08 UTC (permalink / raw)
  To: Zang MingJie
  Cc: Alexander Duyck, Daniel Borkmann, linux-kernel, netdev,
	Stephen Hemminger, David Miller

On 08/07/2015 01:23 AM, Zang MingJie wrote:
> IMO, the routing decision should be deterministic: given a specific
> routing table and local network, the result MUST be determined,
> independent of how or in what order the routing entries were added.
>
> Now there are two ways to configure the system that result in EXACTLY
> the same routing table and local addresses, but the routing decision is
> totally different.
>
> SAME routing table, DIFFERENT routing decision: there MUST be a bug in
> the kernel.

I wasn't arguing that the behavior is undesirable, but the likelihood of 
having a default route assigned to a local address should be pretty 
low.  If the system is the default route of others then it should have a 
different default gateway than itself.  For example an office router 
would end up pointing to the ISP as the gateway, and the ISP would 
either point to some other provider or run a BGP configuration.  So in 
the case of the default route transitioning to us we should end up 
having to delete and update the default route anyway.  This is likely
one of the reasons why no issues have been reported with this
behavior until now.

I'm just wondering if the work involved to fix it is going to be worth 
it.  We have to keep in mind that this will result in a change of 
behavior for existing users and we don't know if anyone might be 
expecting this type of behavior.

We basically are looking at one of three options.  The first one is to 
just delete the route if you add the gateway as a local address or 
remove it.  That would be consistent with what you might see if the 
address was the sole address on an interface of its own.  The second
option is to update the nh_scope, which I believe should be transitioned
between RT_SCOPE_HOST and RT_SCOPE_LINK if I am understanding things
correctly.  The third option is we don't change the behavior and just
document it.  This would then require manually deleting and restoring 
any routes that use a recently modified address as their gateway.
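
The three options can be sketched as policies that run whenever the set
of local addresses changes. This is a toy illustration in Python, not a
proposed patch: the function name `on_addr_change`, the policy labels
and the dictionary layout are invented for this sketch, and the strings
"host" and "link" merely stand in for RT_SCOPE_HOST and RT_SCOPE_LINK.

```python
def on_addr_change(routes, local_addrs, policy):
    """Return the route list after local addresses changed.

    routes      -- list of {"gw": str, "nh_scope": "host" | "link"}
    local_addrs -- set of addresses that are local *after* the change
    policy      -- "delete" (option 1), "rescope" (option 2) or
                   "document" (option 3: leave everything untouched)
    """
    if policy not in ("delete", "rescope", "document"):
        raise ValueError(policy)
    result = []
    for route in routes:
        # Scope the next hop would get if it were resolved right now.
        wanted = "host" if route["gw"] in local_addrs else "link"
        if policy == "document" or wanted == route["nh_scope"]:
            result.append(dict(route))                    # keep as is
        elif policy == "rescope":
            result.append({**route, "nh_scope": wanted})  # fix the scope
        # policy == "delete": the stale route is dropped
    return result


# State after the gateway address 10.0.0.2 was removed from the box:
stale = [{"gw": "10.0.0.2", "nh_scope": "host"}]
local_addrs = {"10.0.0.1"}

for policy in ("delete", "rescope", "document"):
    print(policy, on_addr_change(stale, local_addrs, policy))
```

Only the "rescope" policy leaves a route whose scope matches the current
set of local addresses; "delete" removes it and "document" keeps the
stale scope.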

Based on your feedback I'm assuming you would probably prefer the second 
option.  I'm just waiting to see if there are any other opinions on the 
matter before I act.

Thanks.

- Alex


* Re: [BUG] net/ipv4: inconsistent routing table
  2015-08-07 16:08           ` Alexander Duyck
@ 2015-08-07 17:00             ` Hannes Frederic Sowa
       [not found]               ` <CAOrge3qxOb_XrspuvYjV0pDDxUUoqGE3690KUQGoxZMxuD-NRQ@mail.gmail.com>
  0 siblings, 1 reply; 16+ messages in thread
From: Hannes Frederic Sowa @ 2015-08-07 17:00 UTC (permalink / raw)
  To: Alexander Duyck, Zang MingJie
  Cc: Alexander Duyck, Daniel Borkmann, linux-kernel, netdev,
	Stephen Hemminger, David Miller

Hello,

Alexander Duyck <alexander.h.duyck@redhat.com> writes:
> On 08/07/2015 01:23 AM, Zang MingJie wrote:
>> IMO, the routing decision should be deterministic: given a specific
>> routing table and local network, the result MUST be determined,
>> independent of how or in what order the routing entries were added.
>>
>> Now there are two ways to configure the system that result in EXACTLY
>> the same routing table and local addresses, but the routing decision is
>> totally different.
>>
>> SAME routing table, DIFFERENT routing decision: there MUST be a bug in
>> the kernel.
>
> I wasn't arguing that the behavior is undesirable, but the likelihood of 
> having a default route assigned to a local address should be pretty 
> low.  If the system is the default route of others then it should have a 
> different default gateway than itself.  For example an office router 
> would end up pointing to the ISP as the gateway, and the ISP would 
> either point to some other provider or run a BGP configuration.  So in 
> the case of the default route transitioning to us we should end up 
> having to delete and update the default route anyway.  This is likely
> one of the reasons why no issues have been reported with this
> behavior until now.
>
> I'm just wondering if the work involved to fix it is going to be worth 
> it.  We have to keep in mind that this will result in a change of 
> behavior for existing users and we don't know if anyone might be 
> expecting this type of behavior.
>
> We basically are looking at one of three options.  The first one is to 
> just delete the route if you add the gateway as a local address or 
> remove it.  That would be consistent with what you might see if the 
> address was the sole address on an interface of its own.  The second 
> option is to update the nh_scope which I believe should be transitioned 
> between RT_SCOPE_HOST and RT_SCOPE_LINK if I am understanding things
> correctly.  The third option is we don't change the behavior and just 
> document it.  This would then require manually deleting and restoring 
> any routes that use a recently modified address as their gateway.
>
> Based on your feedback I'm assuming you would probably prefer the second 
> option.  I'm just waiting to see if there are any other opinions on the 
> matter before I act.

The semantics behind this are not easy and the result might well break
other people's systems. I would leave the current resolution logic as-is
and merely change the way iproute presents that information.

Currently we resolve the nexthop during route setup time and install the
resulting information into the FIB. This is very common on other OS, too.

If we were to re-evaluate the nexthop part of a route during local
address changes on one of the interfaces, we could very well get the
system into a situation where it would have to remove its default route
because the network would no longer be reachable via IP subnetting,
even though neighboring information would still keep the machine
connected. And this could happen with setups where someone did not
configure their routes to their own addresses, which are much more
widespread.
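
A rough Python sketch of that hazard, under stated assumptions: the
function `reevaluate` models a hypothetical policy that re-checks every
gateway against the connected subnets after an address change. Neither
the function names nor the example addresses (taken from the
documentation ranges) come from the kernel or from this thread.

```python
import ipaddress


def gateway_on_subnet(gateway, if_addrs):
    """True if the gateway lies in the subnet of any interface address."""
    gw = ipaddress.ip_address(gateway)
    return any(gw in ipaddress.ip_interface(a).network for a in if_addrs)


def reevaluate(gateways, if_addrs):
    """Hypothetical policy: keep only routes whose gateway still resolves."""
    return [gw for gw in gateways if gateway_on_subnet(gw, if_addrs)]


default_gateways = ["192.0.2.1"]
before = ["192.0.2.10/24"]    # gateway is inside the connected subnet
after = ["198.51.100.10/24"]  # interface renumbered, gateway now off-subnet

print(reevaluate(default_gateways, before))
print(reevaluate(default_gateways, after))
```

Under such a policy the default route disappears as soon as the address
is renumbered, even if the gateway is still a reachable neighbor on the
wire. Resolving the nexthop only at route setup time avoids that.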

The change wouldn't be in contradiction with weak end system behavior,
but I very much don't want to make other people's machines unreachable
because of such a change.

If we could rewind time, we could make local nexthops -EINVAL.

Bye,
Hannes


* Re: [BUG] net/ipv4: inconsistent routing table
       [not found]               ` <CAOrge3qxOb_XrspuvYjV0pDDxUUoqGE3690KUQGoxZMxuD-NRQ@mail.gmail.com>
@ 2015-08-08 10:36                 ` Zang MingJie
  2015-08-10  9:16                   ` Hannes Frederic Sowa
  0 siblings, 1 reply; 16+ messages in thread
From: Zang MingJie @ 2015-08-08 10:36 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

A few days ago I mistakenly set the gateway address on my box and then
added the default route. After I deleted the address, my box couldn't
access the Internet even though everything looked fine. It took me
several hours to figure out that it is a kernel bug.

>On Sat, Aug 8, 2015, 1:00 AM Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
>If we could rewind time, we could make local nexthops -EINVAL.

I don't think this is the proper solution, as almost all network OSes
consider the routing table recursive, and its next hop can be any
unicast IP address.

When the next hop is unreachable, the entry won't be installed.

I suggest adding a new sysctl entry: when not set, the behavior stays the
same as now; when set, the FIB is recalculated when necessary.

BTW, is there any way to check the FIB table?


* Re: [BUG] net/ipv4: inconsistent routing table
  2015-08-08 10:36                 ` Zang MingJie
@ 2015-08-10  9:16                   ` Hannes Frederic Sowa
  2015-08-10 10:51                     ` Zang MingJie
  0 siblings, 1 reply; 16+ messages in thread
From: Hannes Frederic Sowa @ 2015-08-10  9:16 UTC (permalink / raw)
  To: Zang MingJie, netdev; +Cc: linux-kernel

Hello,

Zang MingJie <zealot0630@gmail.com> writes:
> A few days ago I mistakenly set the gateway address on my box and then
> added the default route. After I deleted the address, my box couldn't
> access the Internet even though everything looked fine. It took me
> several hours to figure out that it is a kernel bug.

I don't consider this a kernel bug.

>>On Sat, Aug 8, 2015, 1:00 AM Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
>>If we could rewind time, we could make local nexthops -EINVAL.
>
> I don't think this is the proper solution, as almost all network OSes
> consider the routing table recursive, and its next hop can be any
> unicast IP address.

You are talking about IOS, Junos, no?

Linux does not have any kind of recursive routing table. It only helps
by doing a first-hop lookup at insertion time, and that's about it. If
you want to compare Linux to a "network OS" you would have to install
quagga/bird/xorp/... on a box to get the same behavior.

Also note that the routing code does not look at addresses being added
to or removed from interfaces as such; what it considers are the routes
which get created because of those address changes (like the subnet
route added in IPv4 if you install an address with a subnet on an
interface). Thus we shouldn't make address changes special; we would
have to re-evaluate the complete FIB/routing table (I guess everyone is
talking about something different here) whenever just a single route
changes. And this is a no-go.

I don't see a problem with adding a "recursive routing table" to the
stack if people need that. I just don't see the need for that.

> When the next hop is unreachable the entry won't be installed.

In a recursive routing table, the entry could be installed but it will
only take effect when the nexthop becomes reachable.
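
As a minimal sketch of what "recursive" means here, assuming a table
that maps prefixes either to a device or to a gateway, the lookup can
resolve the gateway again through the same table at lookup time. The
names `lookup` and `resolve` and the table layout are invented for this
Python illustration; this is not how the Linux FIB works.

```python
import ipaddress


def lookup(table, dst):
    """Longest-prefix match. `table` maps a prefix string to
    ("dev", name) for connected routes or ("via", gateway)."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, target in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1] if best else None


def resolve(table, dst, depth=8):
    """Follow gateways until a connected route is hit.

    Returns (device, on-link next hop), or None while the route cannot
    be resolved (no match, or a gateway chain that never reaches a
    connected route within `depth` steps)."""
    hop = dst
    for _ in range(depth):
        target = lookup(table, hop)
        if target is None:
            return None
        kind, value = target
        if kind == "dev":
            return value, hop
        hop = value  # resolve the gateway itself on the next pass
    return None


table = {"0.0.0.0/0": ("via", "10.0.0.2")}
print(resolve(table, "1.1.1.1"))  # gateway not reachable yet

table["10.0.0.0/24"] = ("dev", "dummy0")
print(resolve(table, "1.1.1.1"))  # now resolves through the subnet route
```

The default route sits in the table the whole time, but it only yields a
usable next hop once the connected route for the gateway's subnet
appears.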

> I suggest adding a new sysctl entry: when not set, the behavior stays the
> same as now; when set, the FIB is recalculated when necessary.

A new sysctl would work, but I don't consider it necessary. I don't
think we need the additional code for that. Kernel does not run routing
protocols and those are normally the only ones which need to do that.

> BTW is there any way to check the fib table?

I don't understand the question. Do you mean

ip route get xx.yy.zz.aa ?

Bye,
Hannes


* Re: [BUG] net/ipv4: inconsistent routing table
  2015-08-10  9:16                   ` Hannes Frederic Sowa
@ 2015-08-10 10:51                     ` Zang MingJie
  2015-08-10 11:50                       ` Hannes Frederic Sowa
  0 siblings, 1 reply; 16+ messages in thread
From: Zang MingJie @ 2015-08-10 10:51 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: netdev, linux-kernel

Here are several options:

1. reject local next hop w/ EINVAL
2. delete route when local next hop removed
3. transition between RT_SCOPE_HOST and RT_SCOPE_LINK
4. document it

Which one should we choose?

1 will definitely cause compatibility problems
2 is the easiest solution
3 needs a bit of code, not sure if it's worth it



* Re: [BUG] net/ipv4: inconsistent routing table
  2015-08-10 10:51                     ` Zang MingJie
@ 2015-08-10 11:50                       ` Hannes Frederic Sowa
  2015-08-11 20:52                         ` Alexander Duyck
  0 siblings, 1 reply; 16+ messages in thread
From: Hannes Frederic Sowa @ 2015-08-10 11:50 UTC (permalink / raw)
  To: Zang MingJie; +Cc: netdev, linux-kernel

Hello,

Zang MingJie <zealot0630@gmail.com> writes:

> Here are several options:
>
> 1. reject local next hop w/ EINVAL
> 2. delete route when local next hop removed

Will also cause some people to complain.

> 3. transition between RT_SCOPE_HOST and RT_SCOPE_LINK

I don't understand the scope transition. I know Alex mentioned it for
the first time. Maybe he can explain?

> 4. document it

I prefer that one :)

> Which one should we choose?
>
> 1 will definitely cause compatibility problems

Agreed.

> 2 is the easiest solution

Will definitely cause some people to complain.

Bye,
Hannes

* Re: [BUG] net/ipv4: inconsistent routing table
  2015-08-10 11:50                       ` Hannes Frederic Sowa
@ 2015-08-11 20:52                         ` Alexander Duyck
  2015-08-11 21:15                           ` David Miller
  2015-08-12  8:14                           ` Zang MingJie
  0 siblings, 2 replies; 16+ messages in thread
From: Alexander Duyck @ 2015-08-11 20:52 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Zang MingJie; +Cc: netdev, linux-kernel

On 08/10/2015 04:50 AM, Hannes Frederic Sowa wrote:
> Hello,
>
> Zang MingJie <zealot0630@gmail.com> writes:
>
>> Here are several options:
>>
>> 1. reject local next hop w/ EINVAL
>> 2. delete route when local next hop removed
> Will also cause some people to complain.
>
>> 3. transition between RT_SCOPE_HOST and RT_SCOPE_LINK
> I don't understand the scope transition. I know Alex mentioned it for
> the first time. Maybe he can explain?

If I am not mistaken part of the issue in terms of the behaviour being 
seen is due to the fact that the nexthop scope is recorded only when the 
route is added, and there is code in place in rt_set_nexthop which will 
only use the gateway if the scope is RT_SCOPE_LINK.  So what we would 
probably need to do is go through and audit any routes on a given 
interface every time an address is added or removed and if the nh_gw is 
equal to the address added or removed we would need to transition
between RT_SCOPE_LINK and RT_SCOPE_HOST since the gateway is 
transitioning between the local system and somewhere on the other side 
of the link.

The problem is that this would still be a behaviour change and there may 
be somebody that has heartburn about it.
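
[Editorial sketch, not kernel code.] The behaviour described above can be modelled with a toy FIB in Python. It assumes only what the thread states: the nexthop scope is computed once when the route is added, and the gateway is used only if that scope is RT_SCOPE_LINK. The names `ToyFib`, `Route`, `effective_gateway`, `reproduce` and the `rescope` flag are hypothetical and exist only for illustration; `rescope=True` stands in for option 3 (re-auditing routes on address changes).

```python
# Scope values as defined in <linux/rtnetlink.h>.
RT_SCOPE_LINK = 253  # gateway sits on the other side of the link
RT_SCOPE_HOST = 254  # gateway is an address owned by this host


class Route:
    def __init__(self, dst, gw, dev, nh_scope):
        self.dst = dst
        self.gw = gw
        self.dev = dev
        self.nh_scope = nh_scope  # recorded once, at insertion time


class ToyFib:
    """Toy model: local addresses plus routes with a frozen nexthop scope."""

    def __init__(self):
        self.local_addrs = set()
        self.routes = []

    def add_addr(self, addr, rescope=False):
        self.local_addrs.add(addr)
        if rescope:  # hypothetical option 3: gateway became local
            self._rescope(addr, RT_SCOPE_HOST)

    def del_addr(self, addr, rescope=False):
        self.local_addrs.discard(addr)
        if rescope:  # hypothetical option 3: gateway moved off the host
            self._rescope(addr, RT_SCOPE_LINK)

    def _rescope(self, addr, scope):
        # Audit every route whose gateway equals the changed address.
        for route in self.routes:
            if route.gw == addr:
                route.nh_scope = scope

    def add_route(self, dst, gw, dev):
        # The scope depends on whether the gateway is local *right now*.
        scope = RT_SCOPE_HOST if gw in self.local_addrs else RT_SCOPE_LINK
        route = Route(dst, gw, dev, scope)
        self.routes.append(route)
        return route

    @staticmethod
    def effective_gateway(route):
        # The gateway is honoured only for link-scope nexthops;
        # otherwise the destination is treated as directly connected.
        return route.gw if route.nh_scope == RT_SCOPE_LINK else None


def reproduce(rescope):
    """Replay the reproducer from the original report on the toy model."""
    fib = ToyFib()
    fib.add_addr("10.0.0.1")
    fib.add_addr("10.0.0.2")
    route = fib.add_route("default", "10.0.0.2", "dummy0")
    fib.del_addr("10.0.0.2", rescope=rescope)
    return ToyFib.effective_gateway(route)
```

In this model `reproduce(False)` yields no gateway (the default route is treated as directly connected, as in the report), while `reproduce(True)` yields the gateway again once the address has been removed. That difference is exactly the behaviour change at issue.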

>> 4. document it
> I prefer that one :)

Yeah, me too.  The fact is things have worked this way up until now and 
I suspect the reason why this hasn't been reported until now is simply 
because in many cases it works since routes are usually updated if you 
are moving the gateway onto the local system.

- Alex

* Re: [BUG] net/ipv4: inconsistent routing table
  2015-08-11 20:52                         ` Alexander Duyck
@ 2015-08-11 21:15                           ` David Miller
  2015-08-12  8:14                           ` Zang MingJie
  1 sibling, 0 replies; 16+ messages in thread
From: David Miller @ 2015-08-11 21:15 UTC (permalink / raw)
  To: alexander.duyck; +Cc: hannes, zealot0630, netdev, linux-kernel

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Tue, 11 Aug 2015 13:52:27 -0700

> On 08/10/2015 04:50 AM, Hannes Frederic Sowa wrote:
>>> 4. document it
>> I prefer that one :)
> 
> Yeah, me too.  The fact is things have worked this way up until now
> and I suspect the reason why this hasn't been reported until now is
> simply because in many cases it works since routes are usually updated
> if you are moving the gateway onto the local system.

+1

* Re: [BUG] net/ipv4: inconsistent routing table
  2015-08-11 20:52                         ` Alexander Duyck
  2015-08-11 21:15                           ` David Miller
@ 2015-08-12  8:14                           ` Zang MingJie
  2015-08-12 15:23                             ` Stephen Hemminger
  1 sibling, 1 reply; 16+ messages in thread
From: Zang MingJie @ 2015-08-12  8:14 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Hannes Frederic Sowa, netdev, linux-kernel

On Wed, Aug 12, 2015 at 4:52 AM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On 08/10/2015 04:50 AM, Hannes Frederic Sowa wrote:
>>
>> Hello,
>>
>> Zang MingJie <zealot0630@gmail.com> writes:
>>
>>> Here are several options:
>>>
>>> 1. reject local next hop w/ EINVAL
>>> 2. delete route when local next hop removed
>>
>> Will also cause some people to complain.
>>
>>> 3. transition between RT_SCOPE_HOST and RT_SCOPE_LINK
>>
>> I don't understand the scope transition. I know Alex mentioned it for
>> the first time. Maybe he can explain?
>
>
> If I am not mistaken part of the issue in terms of the behaviour being seen
> is due to the fact that the nexthop scope is recorded only when the route is
> added, and there is code in place in rt_set_nexthop which will only use the
> gateway if the scope is RT_SCOPE_LINK.  So what we would probably need to do
> is go through and audit any routes on a given interface every time an
> address is added or removed and if the nh_gw is equal to the address added
> or removed we would need to transition between RT_SCOPE_LINK and
> RT_SCOPE_HOST since the gateway is transitioning between the local system
> and somewhere on the other side of the link.
>
> The problem is that this would still be a behaviour change and there may be
> somebody that has heartburn about it.

That's why I'm going to introduce a sysctl entry: with the entry
unset, keep compatibility; with the entry set, fix the bug.

>
>>> 4. document it
>>
>> I prefer that one :)
>
>
> Yeah, me too.  The fact is things have worked this way up until now and I
> suspect the reason why this hasn't been reported until now is simply because
> in many cases it works since routes are usually updated if you are moving
> the gateway onto the local system.
>
> - Alex

* Re: [BUG] net/ipv4: inconsistent routing table
  2015-08-12  8:14                           ` Zang MingJie
@ 2015-08-12 15:23                             ` Stephen Hemminger
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2015-08-12 15:23 UTC (permalink / raw)
  To: Zang MingJie; +Cc: Alexander Duyck, Hannes Frederic Sowa, netdev, linux-kernel

On Wed, 12 Aug 2015 16:14:33 +0800
Zang MingJie <zealot0630@gmail.com> wrote:

> On Wed, Aug 12, 2015 at 4:52 AM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> > On 08/10/2015 04:50 AM, Hannes Frederic Sowa wrote:
> >>
> >> Hello,
> >>
> >> Zang MingJie <zealot0630@gmail.com> writes:
> >>
> >>> Here are several options:
> >>>
> >>> 1. reject local next hop w/ EINVAL
> >>> 2. delete route when local next hop removed
> >>
> >> Will also cause some people to complain.
> >>
> >>> 3. transition between RT_SCOPE_HOST and RT_SCOPE_LINK
> >>
> >> I don't understand the scope transition. I know Alex mentioned it for
> >> the first time. Maybe he can explain?
> >
> >
> > If I am not mistaken part of the issue in terms of the behaviour being seen
> > is due to the fact that the nexthop scope is recorded only when the route is
> > added, and there is code in place in rt_set_nexthop which will only use the
> > gateway if the scope is RT_SCOPE_LINK.  So what we would probably need to do
> > is go through and audit any routes on a given interface every time an
> > address is added or removed and if the nh_gw is equal to the address added
> > or removed we would need to transition between RT_SCOPE_LINK and
> > RT_SCOPE_HOST since the gateway is transitioning between the local system
> > and somewhere on the other side of the link.
> >
> > The problem is that this would still be a behaviour change and there may be
> > somebody that has heartburn about it.
> 
> That's why I'm going to introduce a sysctl entry: with the entry
> unset, keep compatibility; with the entry set, fix the bug.
> 
> >
> >>> 4. document it
> >>
> >> I prefer that one :)
> >
> >
> > Yeah, me too.  The fact is things have worked this way up until now and I
> > suspect the reason why this hasn't been reported until now is simply because
> > in many cases it works since routes are usually updated if you are moving
> > the gateway onto the local system.

Most people doing any routing use routing protocol suites like Quagga
or BIRD, which have a routing management daemon. This is the kind of change
that the routing services portion manages. When a route or interface change
is detected, it updates the FIB based on the bigger RIB.
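
[Editorial sketch.] The RIB-to-FIB step can be illustrated with a small Python model. It is not how Quagga or BIRD are implemented; `resolve`, `build_fib` and `MAX_DEPTH` are hypothetical names, and the recursion rule (longest-prefix match over the RIB until a directly connected nexthop is found) is an assumption chosen to show the idea of a recursive table. A RIB entry whose nexthop cannot be resolved stays in the RIB but is left out of the FIB until a later recomputation succeeds.

```python
import ipaddress

MAX_DEPTH = 8  # assumed bound to stop resolution loops


def resolve(nexthop, rib, connected, depth=0):
    """Return a directly connected nexthop for `nexthop`, or None.

    rib:       list of (prefix, via) string pairs
    connected: list of ipaddress network objects for connected subnets
    """
    addr = ipaddress.ip_address(nexthop)
    if any(addr in net for net in connected):
        return nexthop  # reachable on a connected subnet
    if depth >= MAX_DEPTH:
        return None
    # Longest-prefix match among the RIB routes covering this nexthop.
    matches = [
        (ipaddress.ip_network(prefix), via)
        for prefix, via in rib
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    _, via = max(matches, key=lambda m: m[0].prefixlen)
    return resolve(via, rib, connected, depth + 1)


def build_fib(rib, connected_prefixes):
    """Recompute the FIB from scratch: prefix -> directly connected nexthop."""
    connected = [ipaddress.ip_network(p) for p in connected_prefixes]
    fib = {}
    for prefix, via in rib:
        direct = resolve(via, rib, connected)
        if direct is not None:  # unresolved entries stay in the RIB only
            fib[prefix] = direct
    return fib


# Example RIB: the default route points at a nexthop that is itself
# reached through another route.
rib = [
    ("0.0.0.0/0", "192.0.2.1"),
    ("192.0.2.0/24", "10.0.0.2"),
]
fib_up = build_fib(rib, ["10.0.0.0/24"])  # interface up
fib_down = build_fib(rib, [])             # connected subnet gone
```

A daemon following this pattern would call something like `build_fib` whenever it sees an address, link or route event, then push the difference between the old and new FIB to the kernel. That is the work the thread argues should stay in user space.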

end of thread, other threads:[~2015-08-12 15:23 UTC | newest]

Thread overview: 16+ messages
2015-08-05  8:56 [BUG] net/ipv4: inconsistent routing table Zang MingJie
2015-08-05  9:06 ` Daniel Borkmann
2015-08-05 17:45   ` Alexander Duyck
2015-08-06 10:13     ` Zang MingJie
2015-08-06 19:43       ` Alexander Duyck
2015-08-07  8:23         ` Zang MingJie
2015-08-07 16:08           ` Alexander Duyck
2015-08-07 17:00             ` Hannes Frederic Sowa
     [not found]               ` <CAOrge3qxOb_XrspuvYjV0pDDxUUoqGE3690KUQGoxZMxuD-NRQ@mail.gmail.com>
2015-08-08 10:36                 ` Zang MingJie
2015-08-10  9:16                   ` Hannes Frederic Sowa
2015-08-10 10:51                     ` Zang MingJie
2015-08-10 11:50                       ` Hannes Frederic Sowa
2015-08-11 20:52                         ` Alexander Duyck
2015-08-11 21:15                           ` David Miller
2015-08-12  8:14                           ` Zang MingJie
2015-08-12 15:23                             ` Stephen Hemminger