* IP routing sending local packet to gateway.
@ 2021-08-27 14:11 David Laight
  2021-08-27 16:39 ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2021-08-27 14:11 UTC
  To: netdev

I've an odd IP routing issue.
A packet that should be sent on the local subnet (to an ARPed address)
is being sent to the default gateway instead.

What seems to happen is:
A TCP connection is opened between A and B.
The only traffic to B is application level keepalives on the connection.
This state is completely stable.

A then makes another connection to B.
B sends the SYN-ACK packet to the default gateway G.
G ARPs for B and sends an ICMP host redirect packet to B.

G doesn't seem to forward the packet to A.
B also ignores the ICMP redirect.

Now B is sending all traffic addressed to A's IP address to G's MAC address.
So all the connections retry and then time out.

In this state arping will work while (icmp) ping fails!
Although one of the ping requests does 'fix' it.
Possibly when A actually ARPs B - but I'm not sure.

A is Ubuntu 20.0 (5.4.0-81) under VMware - but probably not relevant.
G is likely to be Linux with IP forwarding enabled.

B is an x86-64 kernel I've built from the 5.10.36 LTS sources.
Userspace buildroot/busybox (I need to add ftrace).

Running netstat -rn on B gives the expected 2 routes.
arp -an always seems to show a MAC address for A's IP.

Before I start digging through the code has anyone any ideas?
I don't remember seeing anything going through the mailing lists.

My 'gut feel' is that it has something to do with the arp table
entry timing out (10 minutes??).
The existing TCP connection has a reference to the ARP entry and
is probably using it even though it might be stale.
But the SYN-ACK transmit is trying to locate the entry so may
well have a different error action.

I've not seen any arp packets while the application keepalives
are going on - but those messages are every 5 seconds.
It might be that the arp request on the 10 minute timer
isn't actually being sent (or responded to) and the 'arp failed'
state is getting set so that the later request decides the
'local route' is broken and so uses the 'default route' instead.
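
One way to test this guess is to watch A's neighbour entry on B while the
keepalives are running. A minimal sketch, assuming the iproute2 'ip' tool is
available on B (the busybox applet may not support every option); check_peer
is a made-up helper name and the peer address is passed as an argument:

```shell
# Hypothetical helper: print the neighbour (ARP) entry and the route the
# kernel would choose for one peer address.
check_peer() {
    peer="$1"
    # NUD state (REACHABLE, STALE, FAILED, ...) plus usage statistics
    ip -s neigh show to "$peer"
    # The route, and hence the next hop, chosen for a locally generated packet
    ip route get "$peer"
}
```

Running it with A's address every few seconds around the time the second
connection is opened should show whether the entry really goes to FAILED
before the SYN-ACK is misrouted.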

B does have two interfaces set up as a 'bond' but only one IP
address on the single virtual interface.
That shouldn't be relevant since it looks like IP routing
rather than anything lower down.

I've not tried any other kernel versions.
I do need to start using the latest 5.10 one soon.
(Build is set to use kernels from kernel.org rather than git.)

Any ideas/suggestions?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* RE: IP routing sending local packet to gateway.
  2021-08-27 14:11 IP routing sending local packet to gateway David Laight
@ 2021-08-27 16:39 ` David Laight
  2021-08-27 16:50   ` David Ahern
  0 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2021-08-27 16:39 UTC
  To: David Laight, netdev

From: David Laight
> Sent: 27 August 2021 15:12
> 
> I've an odd IP routing issue.
> A packet that should be sent on the local subnet (to an ARPed address)
> is being sent to the default gateway instead.

I've done some tests on a different network where it all appears to work.

But running 'tcpdump -pen' shows that all the outbound packets for the
TCP connections are being sent to the default gateway.

5.10.30, 5.10.61 and 5.14.0-rc7 all behave the same way.

If I do a ping (in either direction) I get an ARP table entry.
But TCP connections (in or out) always use the default gateway.

I'm now getting more confused.
I noticed that the 'default route' was missing the 'metric 100' bit.
That might give the behaviour I'm seeing if the netmask width is ignored.

But if I delete the default route (neither netstat -r nor ip route shows
it) then packets are still being sent to the deleted gateway.
If I delete the arp/neigh entry for the deleted default gateway an
outward connection recreates the entry - leaving the one for the actual
address 'STALE'.
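
Both 'netstat -r' and a plain 'ip route show' list only the main routing
table, so a route that survives deletion there may live in another table
selected by a policy rule. A sketch of how that could be checked, assuming
iproute2 on B; show_all_routing is a made-up helper name and the two
addresses are arguments:

```shell
# Hypothetical helper: dump everything that can influence an IPv4 route
# lookup, not just the main table.
show_all_routing() {
    dst="$1"   # remote address (A)
    src="$2"   # local source address used by the connection (B)
    ip rule show                       # policy rules, in priority order
    ip route show table all            # routes in every table, not only 'main'
    ip route get "$dst" from "$src"    # lookup with the source address fixed
}
```

If the last command reports the gateway as next hop while 'ip route get'
without the 'from' part does not, a source-based rule is the likely cause.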

Something very odd is going on.

	David


* Re: IP routing sending local packet to gateway.
  2021-08-27 16:39 ` David Laight
@ 2021-08-27 16:50   ` David Ahern
  2021-08-31 16:24     ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: David Ahern @ 2021-08-27 16:50 UTC
  To: David Laight, netdev

On 8/27/21 9:39 AM, David Laight wrote:
> From: David Laight
>> Sent: 27 August 2021 15:12
>>
>> I've an odd IP routing issue.
>> A packet that should be sent on the local subnet (to an ARPed address)
>> is being sent to the default gateway instead.
> 
> I've done some tests on a different network where it all appears to work.
> 
> But running 'tcpdump -pen' shows that all the outbound packets for the
> TCP connections are being sent to the default gateway.
> 
> 5.10.30, 5.10.61 and 5.14.0-rc7 all behave the same way.
> 
> If I do a ping (in either direction) I get an ARP table entry.
> But TCP connections (in or out) always use the default gateway.
> 
> I'm now getting more confused.
> I noticed that the 'default route' was missing the 'metric 100' bit.
> That might give the behaviour I'm seeing if the netmask width is ignored.
> 
> But if I delete the default route (neither netstat -r nor ip route shows
> it) then packets are still being sent to the deleted gateway.
> If I delete the arp/neigh entry for the deleted default gateway an
> outward connection recreates the entry - leaving the one for the actual
> address 'STALE'.
> 
> Something very odd is going on.

perf record -e fib:* -a -g -- <run tests>
ctrl-c
perf script

It should tell you code paths and route lookup results. Should shed some
light on why the gw vs local.
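
Where perf is not available on the target, the same fib tracepoints can
usually be reached through ftrace's tracefs interface. A minimal sketch,
assuming a kernel built with event tracing and tracefs mounted at
/sys/kernel/tracing (older setups use /sys/kernel/debug/tracing); the
function names and the TRACEFS variable are made up for illustration:

```shell
# Assumption: tracefs mount point; override TRACEFS if it lives elsewhere.
TRACEFS="${TRACEFS:-/sys/kernel/tracing}"

# Enable every event in the 'fib' group (e.g. fib_table_lookup).
fib_trace_on() {
    echo 1 > "$TRACEFS/events/fib/enable"
    echo 1 > "$TRACEFS/tracing_on"
}

# Stop recording fib events.
fib_trace_off() {
    echo 0 > "$TRACEFS/events/fib/enable"
}

# Print what has been recorded so far.
fib_trace_dump() {
    cat "$TRACEFS/trace"
}
```

Typical use would be fib_trace_on, run the failing connection, then
fib_trace_off and fib_trace_dump; this needs root.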


* RE: IP routing sending local packet to gateway.
  2021-08-27 16:50   ` David Ahern
@ 2021-08-31 16:24     ` David Laight
  2021-09-01 16:24       ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2021-08-31 16:24 UTC
  To: 'David Ahern', netdev

From: David Ahern
> Sent: 27 August 2021 17:51
> 
> On 8/27/21 9:39 AM, David Laight wrote:
> > From: David Laight
> >> Sent: 27 August 2021 15:12
> >>
> >> I've an odd IP routing issue.
> >> A packet that should be sent on the local subnet (to an ARPed address)
> >> is being sent to the default gateway instead.
> >
> > I've done some tests on a different network where it all appears to work.
> >
> > But running 'tcpdump -pen' shows that all the outbound packets for the
> > TCP connections are being sent to the default gateway.
> >
> > 5.10.30, 5.10.61 and 5.14.0-rc7 all behave the same way.
> >
> > If I do a ping (in either direction) I get an ARP table entry.
> > But TCP connections (in or out) always use the default gateway.
> >
> > I'm now getting more confused.
> > I noticed that the 'default route' was missing the 'metric 100' bit.
> > That might give the behaviour I'm seeing if the netmask width is ignored.

Setting the metric/priority to 100 makes no difference.
I actually patched the kernel code that processes the netlink
socket request rather than the application that generated the request.
Note that the application hasn't really been changed for 10 years.

> > But if I delete the default route (neither netstat -r nor ip route shows
> > it) then packets are still being sent to the deleted gateway.
> > If I delete the arp/neigh entry for the deleted default gateway an
> > outward connection recreates the entry - leaving the one for the actual
> > address 'STALE'.
> >
> > Something very odd is going on.
> 
> perf record -e fib:* -a -g -- <run tests>
> ctrl-c
> perf script
> 
> It should tell you code paths and route lookup results. Should shed some
> light on why the gw vs local.

How do I cross-compile 'perf', there don't seem to be any obvious
hints in the Makefile.

But I'm not too sure that would help.
The response to an incoming TCP SYN seems to create a cached entry that
everything else then uses.
I've tried to untangle the code that caches a 'dst' entry on the socket
but it is all rather complicated.

I'm sure it has something to do with the 'fib_trie' data.
When it fails I get:
# cat /proc/net/fib_trie
Id 200:
  |-- 0.0.0.0
     /0 universe UNICAST
Main:
  +-- 0.0.0.0/0 3 0 6
     |-- 0.0.0.0
        /0 universe UNICAST
     |-- 192.168.1.0
        /24 link UNICAST
Local:
  +-- 0.0.0.0/0 2 0 2
     +-- 127.0.0.0/8 2 0 2
        +-- 127.0.0.0/31 1 0 0
           |-- 127.0.0.0
              /32 link BROADCAST
              /8 host LOCAL
           |-- 127.0.0.1
              /32 host LOCAL
        |-- 127.255.255.255
           /32 link BROADCAST
     +-- 192.168.1.0/24 2 0 1
        |-- 192.168.1.0
           /32 link BROADCAST
        |-- 192.168.1.99
           /32 host LOCAL
        |-- 192.168.1.255
           /32 link BROADCAST

192.168.1.99 is the local address, the gateway is 192.168.1.1 and the only
remote host is 192.168.1.53.
Apart from the 'Id 200' bit (which I assume is something
to do with my bonds) it looks much like a working system.
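
The dump is easier to compare against a working system when it is flattened
to one line per route. A small sketch that does this with awk, assuming the
/proc/net/fib_trie layout shown above; fib_summary is a made-up helper name:

```shell
# Hypothetical helper: print "<table>: <address>/<prefix> <scope> <type>"
# for every leaf in a fib_trie dump (default: the live /proc file).
fib_summary() {
    awk '
        # Table headings such as "Main:", "Local:" or "Id 200:"
        /^[A-Za-z].*:$/   { sub(/:$/, ""); table = $0; next }
        # Leaf nodes carry the address; internal "+--" nodes are skipped
        $1 == "|--"       { addr = $2; next }
        # Lines like "/24 link UNICAST" describe one route on that leaf
        $1 ~ /^\/[0-9]+$/ { print table ": " addr $1 " " $2 " " $3 }
    ' "${1:-/proc/net/fib_trie}"
}
```

In the failing dump above, table 'Id 200' holds only a 0.0.0.0/0 entry,
whereas Main has both 0.0.0.0/0 and 192.168.1.0/24.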

I can't find anything that lists the rt/dst entries
cached by the socket.

I remember from looking up the rawip send path that the initial
lookup for outbound messages just finds the 'route' entry and
a second lookup (ref-counting another structure) is done to
get the rt/dst to save on the socket.
(The rawip send ended up creating one for every packet and then
deleting them in massive batches from an rcu timeout.)

I'm guessing that something got broken when that change to the
routing code was made.
It was the change that broke rawip sends where the ip address
in the IP-header didn't match that in the destaddr field.
Was a long time ago.
I wonder if I can test the older kernel.

	David


* RE: IP routing sending local packet to gateway.
  2021-08-31 16:24     ` David Laight
@ 2021-09-01 16:24       ` David Laight
  2021-09-02  3:38         ` David Ahern
  0 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2021-09-01 16:24 UTC
  To: 'David Ahern', 'netdev@vger.kernel.org'

From: David Laight
> Sent: 31 August 2021 17:24
...
> I'm sure it has something to do with the 'fib_trie' data.
> When it fails I get:
> # cat /proc/net/fib_trie
> Id 200:
>   |-- 0.0.0.0
>      /0 universe UNICAST
> Main:
>   +-- 0.0.0.0/0 3 0 6
>      |-- 0.0.0.0
>         /0 universe UNICAST
>      |-- 192.168.1.0
>         /24 link UNICAST
> Local:
>   +-- 0.0.0.0/0 2 0 2
>      +-- 127.0.0.0/8 2 0 2
>         +-- 127.0.0.0/31 1 0 0
>            |-- 127.0.0.0
>               /32 link BROADCAST
>               /8 host LOCAL
>            |-- 127.0.0.1
>               /32 host LOCAL
>         |-- 127.255.255.255
>            /32 link BROADCAST
>      +-- 192.168.1.0/24 2 0 1
>         |-- 192.168.1.0
>            /32 link BROADCAST
>         |-- 192.168.1.99
>            /32 host LOCAL
>         |-- 192.168.1.255
>            /32 link BROADCAST

I've found a script that gets run after the IP address and default route
have been added that does:

	SOURCE=192.168.1.88
	GATEWAY=192.168.1.1

	ip rule add from "$SOURCE" lookup px0
	ip rule add to "$SOURCE" lookup px0

	ip route add default via ${GATEWAY} dev px0 src ${SOURCE} table px0

The 'ip rule' lines are probably not related (or needed).
I suspect they cause traffic to the local IP to be transmitted on px0.
(They may be from a strange setup we had where that might have been needed,
but why something from 10 years ago appeared is beyond me - and our source control.)

Am I right in thinking that the 'table px0' bit is what causes 'Id 200'
to be created and that it would really need the normal 'use arp' route
added as well?
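
A possible reading: a lookup that matches one of the 'ip rule' entries is
resolved in table px0, and since that table holds only a default route every
destination, on-link or not, gets the gateway as next hop. If the rules and
the table are to be kept, one option is to give px0 its own on-link route.
A sketch only, assuming the subnet is 192.168.1.0/24 (taken from the
fib_trie dump, not from the script); 'run' and 'add_link_route' are made-up
helper names, and nothing is executed unless APPLY=1 is set:

```shell
SOURCE=192.168.1.88     # as in the script quoted above
SUBNET=192.168.1.0/24   # assumption: on-link prefix, from the fib_trie dump

# Print each command; run it only when APPLY=1 (needs root).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = 1 ]; then "$@"; fi
}

add_link_route() {
    # Mirror the connected route from 'main' so that lookups resolved in
    # table px0 still ARP for on-link hosts instead of using the gateway.
    run ip route add "$SUBNET" dev px0 src "$SOURCE" table px0
}

add_link_route
```

Removing the two rules and the extra table altogether would be the other
way to get back to a single main-table lookup.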

There is an attempt at some 'clever routing' in the script.
A second interface can be configured that might have its own
'default route' - but all that traffic (all RTP) is sent using
rawip and can select the specific interface.
It has to be said that it should really just use a different network
namespace - and it would all be much simpler.
(As well as giving the RTP access to all 64k UDP port numbers.)
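
A rough sketch of the namespace approach, assuming iproute2 with 'ip netns'
support. The namespace name, interface name and addresses below are
placeholders rather than anything from the real setup, and nothing is
executed unless APPLY=1 is set:

```shell
NS=rtp                 # hypothetical namespace name
IFACE=eth1             # hypothetical second interface
ADDR=192.0.2.10/24     # placeholder address (documentation range)
GW=192.0.2.1           # placeholder gateway

# Print each command; run it only when APPLY=1 (needs root).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = 1 ]; then "$@"; fi
}

setup_rtp_netns() {
    run ip netns add "$NS"
    # Move the interface into the namespace; it gets its own routing tables
    run ip link set "$IFACE" netns "$NS"
    run ip netns exec "$NS" ip link set lo up
    run ip netns exec "$NS" ip addr add "$ADDR" dev "$IFACE"
    run ip netns exec "$NS" ip link set "$IFACE" up
    run ip netns exec "$NS" ip route add default via "$GW"
}

setup_rtp_netns
```

The RTP process would then be started with 'ip netns exec rtp <program>',
leaving the main namespace with a single default route and no policy rules.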

	David


* Re: IP routing sending local packet to gateway.
  2021-09-01 16:24       ` David Laight
@ 2021-09-02  3:38         ` David Ahern
  2021-09-02  8:27           ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: David Ahern @ 2021-09-02  3:38 UTC
  To: David Laight, 'netdev@vger.kernel.org'

On 9/1/21 9:24 AM, David Laight wrote:
> I've found a script that gets run after the IP address and default route
> have been added that does:
> 
> 	SOURCE=192.168.1.88
> 	GATEWAY=192.168.1.1
> 
> 	ip rule add from "$SOURCE" lookup px0
> 	ip rule add to "$SOURCE" lookup px0
> 
> 	ip route add default via ${GATEWAY} dev px0 src ${SOURCE} table px0
> 
> The 'ip rule' lines are probably not related (or needed).
> I suspect they cause traffic to the local IP to be transmitted on px0.
> (They may be from a strange setup we had where that might have been needed,
> but why something from 10 years ago appeared is beyond me - and our source control.)
> 
> Am I right in thinking that the 'table px0' bit is what causes 'Id 200'
> to be created and that it would really need the normal 'use arp' route
> added as well?
> 

this is why the fib tracepoint exists. It shows what is happening at the
time of the fib lookup - inputs and lookup results (gw, device) - which
give the clue as to why the packet went the direction it did.



* RE: IP routing sending local packet to gateway.
  2021-09-02  3:38         ` David Ahern
@ 2021-09-02  8:27           ` David Laight
  0 siblings, 0 replies; 7+ messages in thread
From: David Laight @ 2021-09-02  8:27 UTC
  To: 'David Ahern', 'netdev@vger.kernel.org'

From: David Ahern
> Sent: 02 September 2021 04:38
> 
> On 9/1/21 9:24 AM, David Laight wrote:
> > I've found a script that gets run after the IP address and default route
> > have been added that does:
> >
> > 	SOURCE=192.168.1.88
> > 	GATEWAY=192.168.1.1
> >
> > 	ip rule add from "$SOURCE" lookup px0
> > 	ip rule add to "$SOURCE" lookup px0
> >
> > 	ip route add default via ${GATEWAY} dev px0 src ${SOURCE} table px0
> >
> > The 'ip rule' lines are probably not related (or needed).
> > I suspect they cause traffic to the local IP to be transmitted on px0.
> > (They may be from a strange setup we had where that might have been needed,
> > but why something from 10 years ago appeared is beyond me - and our source control.)
> >
> > Am I right in thinking that the 'table px0' bit is what causes 'Id 200'
> > to be created and that it would really need the normal 'use arp' route
> > added as well?
> >
> 
> this is why the fib tracepoint exists. It shows what is happening at the
> time of the fib lookup - inputs and lookup results (gw, device) - which
> give the clue as to why the packet went the direction it did.

They mostly gave me a hint as to where to look.
There are definitely some code paths where a fib entry is
ignored (and it continues to search) that could do with tracing.

But I had to add extra traces to the 'route add' paths to
find what was adding the extra fib table.
Fortunately I've got a serial console set up (into PuTTY)
so I set up ftrace before the network config actually happens.

Anyway the script is trying to do something that would be
better done with a network namespace.

Thanks for the pointers.

	David

