All of lore.kernel.org
 help / color / mirror / Atom feed
* Commit 0ddcf43d5d4a causes the local table to conflict with the main table
@ 2019-10-26  7:07 Ttttabcd
  2019-10-28 15:54   ` Alexander Duyck
  0 siblings, 1 reply; 6+ messages in thread
From: Ttttabcd @ 2019-10-26  7:07 UTC (permalink / raw)
  To: alexander.h.duyck, davem, idosch, Netdev, lartc

Recently I was implementing a transparent proxy project. Naturally, I started to research the Linux local routing table and the tproxy module.

But when I was studying the behavior of the local routing table, something strange happened, which caused me to spend a lot of time thinking about why.

ip route add local 0.0.0.0/0 dev lo

At the beginning, I tested a local routing table and added 0.0.0.0/0 to let the system think that the entire IPv4 space is a local address.

ip rule
0:	from all lookup local
32766:	from all lookup main
32767:	from all lookup default

I did not add any policy routing, everything stays normal

ip route show table main
default via 192.168.3.1 dev wlan0 proto dhcp metric 600
192.168.3.0/24 dev wlan0 proto kernel scope link src 192.168.3.62 metric 600

The above is my main routing table, I am now on a network of 192.168.3.0/24, the default route is 192.168.3.1.

ip route show table local
local default dev lo scope host
broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1
broadcast 192.168.3.0 dev wlan0 proto kernel scope link src 192.168.3.62
local 192.168.3.62 dev wlan0 proto kernel scope host src 192.168.3.62
broadcast 192.168.3.255 dev wlan0 proto kernel scope link src 192.168.3.62

The above is my current local routing table, you can see that 0.0.0.0/0 (default) has been added, and 127.0.0.1, 127.0.0.0/8, 192.168.3.62 is also my local address.

According to the correct theory, all IPv4 addresses are now my local address, so whatever address I want to connect to should be routed to the loopback interface and sent back to myself.

ip route get 1.1.1.1
local 1.1.1.1 dev lo src 1.1.1.1 uid 0
    cache <local>

ip route get 192.168.3.100
192.168.3.100 dev wlan0 src 192.168.3.62 uid 0
    cache

But I did some testing but found that it was not like this. For example, I tested that 1.1.1.1 will be routed to lo, but 192.168.3.100 will be routed to wlan0. 192.168.3.100 is not routed according to the contents of the local routing table! 192.168.3.100 is normally routed in the main table.

This is the wrong handling. The correct way is that according to the priority of the policy routing, the local table has higher priority than the main table. It should match the local table first. As long as there is a corresponding routing table entry in the local table, the main table will not be used.

This problem has been a headache for me for a long time because it is inconsistent with the description of the ip rule by the Manual page. Until I found an amazing fact in /proc/net/fib_trie.

/proc/net/fib_trie

Main:
  +-- 0.0.0.0/0 3 0 5
     |-- 0.0.0.0
        /0 host LOCAL
        /0 universe UNICAST
     +-- 127.0.0.0/8 2 0 2
        +-- 127.0.0.0/31 1 0 0
           |-- 127.0.0.0
              /32 link BROADCAST
              /8 host LOCAL
           |-- 127.0.0.1
              /32 host LOCAL
        |-- 127.255.255.255
           /32 link BROADCAST
     +-- 192.168.3.0/24 2 1 2
        +-- 192.168.3.0/26 2 0 2
           |-- 192.168.3.0
              /32 link BROADCAST
              /24 link UNICAST
           |-- 192.168.3.62
              /32 host LOCAL
        |-- 192.168.3.255
           /32 link BROADCAST
Local:
  +-- 0.0.0.0/0 3 0 5
     |-- 0.0.0.0
        /0 host LOCAL
        /0 universe UNICAST
     +-- 127.0.0.0/8 2 0 2
        +-- 127.0.0.0/31 1 0 0
           |-- 127.0.0.0
              /32 link BROADCAST
              /8 host LOCAL
           |-- 127.0.0.1
              /32 host LOCAL
        |-- 127.255.255.255
           /32 link BROADCAST
     +-- 192.168.3.0/24 2 1 2
        +-- 192.168.3.0/26 2 0 2
           |-- 192.168.3.0
              /32 link BROADCAST
              /24 link UNICAST
           |-- 192.168.3.62
              /32 host LOCAL
        |-- 192.168.3.255
           /32 link BROADCAST

The contents of the main table and the local table are actually the same! What is this strange phenomenon?

The strange thing is not over yet, then I accidentally added a policy route.

ip rule add table 100 priority 10

ip rule
0:	from all lookup local
10:	from all lookup 100
32766:	from all lookup main
32767:	from all lookup default

I added table 100 with a priority of 10.So the RPDB has become like the above.

The miracle happened, the local table and the main table restored what I thought it should be, they were separated!

/proc/net/fib_trie

Main:
  +-- 0.0.0.0/0 3 0 6
     |-- 0.0.0.0
        /0 universe UNICAST
     |-- 192.168.3.0
        /24 link UNICAST
Local:
  +-- 0.0.0.0/0 3 0 5
     |-- 0.0.0.0
        /0 host LOCAL
     +-- 127.0.0.0/8 2 0 2
        +-- 127.0.0.0/31 1 0 0
           |-- 127.0.0.0
              /32 link BROADCAST
              /8 host LOCAL
           |-- 127.0.0.1
              /32 host LOCAL
        |-- 127.255.255.255
           /32 link BROADCAST
     +-- 192.168.3.0/24 2 1 2
        +-- 192.168.3.0/26 2 0 2
           |-- 192.168.3.0
              /32 link BROADCAST
           |-- 192.168.3.62
              /32 host LOCAL
        |-- 192.168.3.255
           /32 link BROADCAST

So I immediately conducted the previous test.

ip route get 1.1.1.1
local 1.1.1.1 dev lo table local src 1.1.1.1 uid 0
    cache <local>

ip route get 192.168.3.100
local 192.168.3.100 dev lo table local src 192.168.3.100 uid 0
    cache <local>

The result is correct this time, both 1.1.1.1 and 192.168.3.100 are routed to the lo interface. And from the "table local" in the output, it seems that this time the local table is actually used.

Now I have a drink, which is really too ridiculous...

ip rule del table 100

Then I deleted the policy route I just added.

ip rule
0:	from all lookup local
32766:	from all lookup main
32767:	from all lookup default

The RPDB recovered from the beginning, but the magic is that the main and local tables are still separate and not merged as they did at the beginning.

Main:
  +-- 0.0.0.0/0 3 0 6
     |-- 0.0.0.0
        /0 universe UNICAST
     |-- 192.168.3.0
        /24 link UNICAST
Local:
  +-- 0.0.0.0/0 3 0 5
     |-- 0.0.0.0
        /0 host LOCAL
     +-- 127.0.0.0/8 2 0 2
        +-- 127.0.0.0/31 1 0 0
           |-- 127.0.0.0
              /32 link BROADCAST
              /8 host LOCAL
           |-- 127.0.0.1
              /32 host LOCAL
        |-- 127.255.255.255
           /32 link BROADCAST
     +-- 192.168.3.0/24 2 1 2
        +-- 192.168.3.0/26 2 0 2
           |-- 192.168.3.0
              /32 link BROADCAST
           |-- 192.168.3.62
              /32 host LOCAL
        |-- 192.168.3.255
           /32 link BROADCAST

And the previous routing test is still correct, 1.1.1.1 and 192.168.3.100 are routed to lo.

ip route get 1.1.1.1
local 1.1.1.1 dev lo table local src 1.1.1.1 uid 0
    cache <local>

ip route get 192.168.3.100
local 192.168.3.100 dev lo table local src 192.168.3.100 uid 0
    cache <local>

The same RPDB will produce different results! This is even more ridiculous, so I went to have a drink again...

I began to understand that this is a man-made implementation error, not a mistake in my use.

So, I have been at Google for a long time, and in the end I found Commit 0ddcf43d5d4a, which is the root cause of the above problems.

In this patch, in order to improve efficiency, the local table and the main table are merged, and the local table is re-separated from the main table when the custom FIB rule is enabled.

This is in full compliance with the situation I described above, yes, it is caused by this patch.

Then we can discuss it now.

Obviously, merging the main table and the local table breaks the rules of the routing table priority. The local table should be searched independently before the main table.

Perhaps as described in this patch, this can have some performance improvements, but is it worthwhile to have a little bit of performance improvement that undermines the entire routing rule? This is worth thinking about.

Suppose now that a software engineer wants to add 192.168.0.0/16 to the local address, so he naturally executed the following command.

ip route add local 192.168.0.0/16 dev lo

But he does not know that there is a trap in the main table, another overlapping route!

ip route
192.168.3.0/24 dev wlan0 proto kernel scope link src 192.168.3.62 metric 600

ip route get 192.168.1.100
local 192.168.1.100 dev lo src 192.168.1.100 uid 0
    cache <local>

ip route get 192.168.3.100
192.168.3.100 dev wlan0 src 192.168.3.62 uid 0
    cache

This will cause the entire network of 192.168.3.0/24 not to be routed to the local lo interface as he thought!

This will lead to bugs that are very difficult to find! If I am not a programmer who knows a little about the kernel implementation, I can't find out what caused the problem (I checked a lot of source code and read a lot of patches and kernel mail to solve this problem).

Of course, you can say that no one will modify the local routing table under normal circumstances. But is the linux system also designed for geeks? Is it also designed for programmers who want to exploit the full potential of the system?

If you provide a mechanism to modify the local table, you must ensure that the mechanism is working correctly.

And it's absolutely impossible to make this mechanism a significant difference in the execution process after triggering some incredible conditions (custom FIB rules are enabled, even if they are later disabled).

In summary, I don't think this Commit 0ddcf43d5d4a is a good idea.

Welcome to discuss.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Commit 0ddcf43d5d4a causes the local table to conflict with the main table
  2019-10-26  7:07 Commit 0ddcf43d5d4a causes the local table to conflict with the main table Ttttabcd
@ 2019-10-28 15:54   ` Alexander Duyck
  0 siblings, 0 replies; 6+ messages in thread
From: Alexander Duyck @ 2019-10-28 15:54 UTC (permalink / raw)
  To: Ttttabcd; +Cc: alexander.h.duyck, davem, idosch, Netdev, lartc

On Sat, Oct 26, 2019 at 12:10 AM Ttttabcd <ttttabcd@protonmail.com> wrote:

I am dropping all of the above. The fact is commit 0ddcf43d5d4a
("ipv4: FIB Local/MAIN table collapse") has been in the kernel for
over 4 and a half years so hopefully by now most of the network
developers are aware that the tables are merged, and become unmerged
when custom rules are added.

<snip>

> Suppose now that a software engineer wants to add 192.168.0.0/16 to the local address, so he naturally executed the following command.
>
> ip route add local 192.168.0.0/16 dev lo
>
> But he does not know that there is a trap in the main table, another overlapping route!

So the first question I would have is why is the developer
intentionally overlapping the routes and creating duplicate routes?
For most users any address other than the localhost route is always a
/32 when it comes to local routing. Why is there a need to create
another route that is looped back into the system?

> ip route
> 192.168.3.0/24 dev wlan0 proto kernel scope link src 192.168.3.62 metric 600
>
> ip route get 192.168.1.100
> local 192.168.1.100 dev lo src 192.168.1.100 uid 0
>     cache <local>
>
> ip route get 192.168.3.100
> 192.168.3.100 dev wlan0 src 192.168.3.62 uid 0
>     cache
>
> This will cause the entire network of 192.168.3.0/24 not to be routed to the local lo interface as he thought!

I agree this is the behavior. However it muddles the routing tables as
you have overlapping entries in the first place. As it stands doesn't
this effectively render 192.168.0.0/16 a blackhole since what you end
up with is it trying to loop back packets that will be recognized as
something from the local system anyway?

> This will lead to bugs that are very difficult to find! If I am not a programmer who knows a little about the kernel implementation, I can't find out what caused the problem (I checked a lot of source code and read a lot of patches and kernel mail to solve this problem).
>
> Of course, you can say that no one will modify the local routing table under normal circumstances. But is the linux system also designed for geeks? Is it also designed for programmers who want to exploit the full potential of the system?

I would argue that adding routes to the local table is a very "geek"
thing to do. Normally people aren't going to be adding routes there in
the first place. Normally routes are added to main as the assumption
is you are attempting to route traffic out to some external entity.
The local table normally only consists of the localhost route and a
set of /32 addresses as I mentioned earlier.

> If you provide a mechanism to modify the local table, you must ensure that the mechanism is working correctly.
>
> And it's absolutely impossible to make this mechanism a significant difference in the execution process after triggering some incredible conditions (custom FIB rules are enabled, even if they are later disabled).
>
> In summary, I don't think this Commit 0ddcf43d5d4a is a good idea.
>
> Welcome to discuss.

I would disagree. There is a significant performance gain to be had
from this commit. What it is basically doing is taking advantage of
the fact that normally your local trie is going to consist of /32
routes and localhost. In the vast majority of cases there will never
be a clash as you have pointed out.

If anything what I would suggest, assuming the priority inversion
issue you have pointed out needs to be addressed, would be to look at
adding a special case for the addition of non /32 non-localhost routes
to local so that we could unmerge the table on such an addition. The
instance of non-/32 routes being added to local has apparently been
low enough that this hasn't been an issue up until now, and if we are
wanting to focus on the correctness of the fib lookup then this would
be the easiest way to sort it out while maintaining both the
performance and correctness.

Thanks.

- Alex

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Commit 0ddcf43d5d4a causes the local table to conflict with the main table
@ 2019-10-28 15:54   ` Alexander Duyck
  0 siblings, 0 replies; 6+ messages in thread
From: Alexander Duyck @ 2019-10-28 15:54 UTC (permalink / raw)
  To: Ttttabcd; +Cc: alexander.h.duyck, davem, idosch, Netdev, lartc

On Sat, Oct 26, 2019 at 12:10 AM Ttttabcd <ttttabcd@protonmail.com> wrote:

I am dropping all of the above. The fact is commit 0ddcf43d5d4a
("ipv4: FIB Local/MAIN table collapse") has been in the kernel for
over 4 and a half years so hopefully by now most of the network
developers are aware that the tables are merged, and become unmerged
when custom rules are added.

<snip>

> Suppose now that a software engineer wants to add 192.168.0.0/16 to the local address, so he naturally executed the following command.
>
> ip route add local 192.168.0.0/16 dev lo
>
> But he does not know that there is a trap in the main table, another overlapping route!

So the first question I would have is why is the developer
intentionally overlapping the routes and creating duplicate routes?
For most users any address other than the localhost route is always a
/32 when it comes to local routing. Why is there a need to create
another route that is looped back into the system?

> ip route
> 192.168.3.0/24 dev wlan0 proto kernel scope link src 192.168.3.62 metric 600
>
> ip route get 192.168.1.100
> local 192.168.1.100 dev lo src 192.168.1.100 uid 0
>     cache <local>
>
> ip route get 192.168.3.100
> 192.168.3.100 dev wlan0 src 192.168.3.62 uid 0
>     cache
>
> This will cause the entire network of 192.168.3.0/24 not to be routed to the local lo interface as he thought!

I agree this is the behavior. However it muddles the routing tables as
you have overlapping entries in the first place. As it stands doesn't
this effectively render 192.168.0.0/16 a blackhole since what you end
up with is it trying to loop back packets that will be recognized as
something from the local system anyway?

> This will lead to bugs that are very difficult to find! If I am not a programmer who knows a little about the kernel implementation, I can't find out what caused the problem (I checked a lot of source code and read a lot of patches and kernel mail to solve this problem).
>
> Of course, you can say that no one will modify the local routing table under normal circumstances. But is the linux system also designed for geeks? Is it also designed for programmers who want to exploit the full potential of the system?

I would argue that adding routes to the local table is a very "geek"
thing to do. Normally people aren't going to be adding routes there in
the first place. Normally routes are added to main as the assumption
is you are attempting to route traffic out to some external entity.
The local table normally only consists of the localhost route and a
set of /32 addresses as I mentioned earlier.

> If you provide a mechanism to modify the local table, you must ensure that the mechanism is working correctly.
>
> And it's absolutely impossible to make this mechanism a significant difference in the execution process after triggering some incredible conditions (custom FIB rules are enabled, even if they are later disabled).
>
> In summary, I don't think this Commit 0ddcf43d5d4a is a good idea.
>
> Welcome to discuss.

I would disagree. There is a significant performance gain to be had
from this commit. What it is basically doing is taking advantage of
the fact that normally your local trie is going to consist of /32
routes and localhost. In the vast majority of cases there will never
be a clash as you have pointed out.

If anything what I would suggest, assuming the priority inversion
issue you have pointed out needs to be addressed, would be to look at
adding a special case for the addition of non /32 non-localhost routes
to local so that we could unmerge the table on such an addition. The
instance of non-/32 routes being added to local has apparently been
low enough that this hasn't been an issue up until now, and if we are
wanting to focus on the correctness of the fib lookup then this would
be the easiest way to sort it out while maintaining both the
performance and correctness.

Thanks.

- Alex

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Commit 0ddcf43d5d4a causes the local table to conflict with the main table
  2019-10-28 15:54   ` Alexander Duyck
  (?)
@ 2019-10-29  7:07   ` Ttttabcd
  2019-11-03 17:16       ` David Ahern
  -1 siblings, 1 reply; 6+ messages in thread
From: Ttttabcd @ 2019-10-29  7:07 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: davem, idosch, Netdev, lartc

> I am dropping all of the above. The fact is commit 0ddcf43d5d4a
> ("ipv4: FIB Local/MAIN table collapse") has been in the kernel for
> over 4 and a half years so hopefully by now most of the network
> developers are aware that the tables are merged, and become unmerged
> when custom rules are added.

First of all, we can't assume that all developers already know this, even if this feature has been added to the kernel for more than 4 years, because this does not appear in the "ip route" or "ip rule" man page.

> I would argue that adding routes to the local table is a very "geek"
> thing to do. Normally people aren't going to be adding routes there in
> the first place. Normally routes are added to main as the assumption
> is you are attempting to route traffic out to some external entity.
> The local table normally only consists of the localhost route and a
> set of /32 addresses as I mentioned earlier.

You are right, most people in the world will not do this, I am doing this to test the local route in the transparent proxy.

> I would disagree. There is a significant performance gain to be had
> from this commit. What it is basically doing is taking advantage of
> the fact that normally your local trie is going to consist of /32
> routes and localhost. In the vast majority of cases there will never
> be a clash as you have pointed out.

> If anything what I would suggest, assuming the priority inversion
> issue you have pointed out needs to be addressed, would be to look at
> adding a special case for the addition of non /32 non-localhost routes
> to local so that we could unmerge the table on such an addition. The
> instance of non-/32 routes being added to local has apparently been
> low enough that this hasn't been an issue up until now, and if we are
> wanting to focus on the correctness of the fib lookup then this would
> be the easiest way to sort it out while maintaining both the
> performance and correctness.

If you have done experiments that can bring a lot of performance improvements, then please keep the function of merging the main and local tables.

However, please add an explanation of this merge in the man pages of "ip route" and "ip rule", and explain that the route table will be re-segmented after adding the policy route.

As you have described, re-segmenting the local route table when a route other than /32 is added to the local route table is also a very good solution. This fully complies with the priority of the local table and the main table.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Commit 0ddcf43d5d4a causes the local table to conflict with the main table
  2019-10-29  7:07   ` Ttttabcd
@ 2019-11-03 17:16       ` David Ahern
  0 siblings, 0 replies; 6+ messages in thread
From: David Ahern @ 2019-11-03 17:16 UTC (permalink / raw)
  To: Ttttabcd, Alexander Duyck; +Cc: davem, idosch, Netdev, lartc

On 10/29/19 1:07 AM, Ttttabcd wrote:
> However, please add an explanation of this merge in the man pages of "ip route" and "ip rule", and explain that the route table will be re-segmented after adding the policy route.

iproute2 is not the place for documenting internal details of the kernel
implementation.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Commit 0ddcf43d5d4a causes the local table to conflict with the main table
@ 2019-11-03 17:16       ` David Ahern
  0 siblings, 0 replies; 6+ messages in thread
From: David Ahern @ 2019-11-03 17:16 UTC (permalink / raw)
  To: Ttttabcd, Alexander Duyck; +Cc: davem, idosch, Netdev, lartc

On 10/29/19 1:07 AM, Ttttabcd wrote:
> However, please add an explanation of this merge in the man pages of "ip route" and "ip rule", and explain that the route table will be re-segmented after adding the policy route.

iproute2 is not the place for documenting internal details of the kernel
implementation.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-11-03 17:17 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-26  7:07 Commit 0ddcf43d5d4a causes the local table to conflict with the main table Ttttabcd
2019-10-28 15:54 ` Alexander Duyck
2019-10-28 15:54   ` Alexander Duyck
2019-10-29  7:07   ` Ttttabcd
2019-11-03 17:16     ` David Ahern
2019-11-03 17:16       ` David Ahern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.