Re: VRF Issue Since kernel 5

From: Gowen <gowen@potatocomputing.co.uk>
To: David Ahern <dsahern@gmail.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: VRF Issue Since kernel 5
Date: Thu, 12 Sep 2019 06:50:37 +0000	[thread overview]
Message-ID: <CWLP265MB15547011D9510DEA6475B469FDB00@CWLP265MB1554.GBRP265.PROD.OUTLOOK.COM> (raw)
In-Reply-To: <b300235a-00f8-0689-8544-9db07cbd1e21@gmail.com>

Hi David -thanks for getting back to me

The DNS servers are 10.24.65.203 or 10.24.64.203 which you want to go

out mgmt-vrf. correct? No - 10.24.65.203 10.25.65.203, so should hit the route leak rule as below (if I've put the 10.24.64.0/24 subnet anywhere it is a typo)

vmAdmin@NETM06:~$ ip ro get 10.24.65.203 fibmatch
10.24.65.0/24 via 10.24.12.1 dev eth0

I've added the 127/8 route - no difference.

The reason for what you might think is an odd design is that I wanted any non-VRF aware users to be able to come in and run all commands in default context without issue, while production and mgmt traffic was separated still

DNS is now working as long as /etc/resolv.conf is populated with my DNS servers - a lot of people would be using this on Azure which uses netplan, so they'll have the same issue, is there documentation I could update or raise a bug to check the systemd-resolve servers as well?

Gareth

From: David Ahern <dsahern@gmail.com>

Sent: 11 September 2019 18:02

To: Gowen <gowen@potatocomputing.co.uk>; netdev@vger.kernel.org <netdev@vger.kernel.org>

Subject: Re: VRF Issue Since kernel 5

At LPC this week and just now getting a chance to process the data you sent.

On 9/9/19 8:46 AM, Gowen wrote:

> the production traffic is all in the 10.0.0.0/8 network (eth1 global VRF) except for a few subnets (DNS) which are routed out eth0 (mgmt-vrf)

> 

> 

> Admin@NETM06:~$ ip route show

> default via 10.24.12.1 dev eth0

> 10.0.0.0/8 via 10.24.12.1 dev eth1

> 10.24.12.0/24 dev eth1 proto kernel scope link src 10.24.12.9

> 10.24.65.0/24 via 10.24.12.1 dev eth0

> 10.25.65.0/24 via 10.24.12.1 dev eth0

> 10.26.0.0/21 via 10.24.12.1 dev eth0

> 10.26.64.0/21 via 10.24.12.1 dev eth0

interesting route table. This is default VRF but you have route leaking

through eth0 which is in mgmt-vrf.

> 

> 

> Admin@NETM06:~$ ip route show vrf mgmt-vrf

> default via 10.24.12.1 dev eth0

> unreachable default metric 4278198272

> 10.24.12.0/24 dev eth0 proto kernel scope link src 10.24.12.10

> 10.24.65.0/24 via 10.24.12.1 dev eth0

> 10.25.65.0/24 via 10.24.12.1 dev eth0

> 10.26.0.0/21 via 10.24.12.1 dev eth0

> 10.26.64.0/21 via 10.24.12.1 dev eth0

The DNS servers are 10.24.65.203 or 10.24.64.203 which you want to go

out mgmt-vrf. correct?

10.24.65.203 should hit the route "10.24.65.0/24 via 10.24.12.1 dev

eth0" for both default VRF and mgmt-vrf.

10.24.64.203 will NOT hit a route leak entry so traverse the VRF

associated with the context of the command (mgmt-vrf or default). Is

that intentional? (verify with: `ip ro get 10.24.64.203 fibmatch` and

`ip ro get 10.24.64.203 vrf mgmt-vrf fibmatch`)

> 

> 

> 

> The strange activity occurs when I enter the command “sudo apt update” as I can resolve the DNS request (10.24.65.203 or 10.24.64.203, verified with tcpdump) out eth0 but for the actual update traffic there is no activity:

> 

> 

> sudo tcpdump -i eth0 '(host 10.24.65.203 or host 10.25.65.203) and port 53' -n

> <OUTPUT OMITTED FOR BREVITY>

> 10:06:05.268735 IP 10.24.12.10.39963 > 10.24.65.203.53: 48798+ [1au] A? security.ubuntu.com. (48)

> <OUTPUT OMITTED FOR BREVITY>

> 10:06:05.284403 IP 10.24.65.203.53 > 10.24.12.10.39963: 48798 13/0/1 A 91.189.91.23, A 91.189.88.24, A 91.189.91.26, A 91.189.88.162, A 91.189.88.149, A 91.189.91.24, A 91.189.88.173, A 91.189.88.177, A 91.189.88.31, A 91.189.91.14, A 91.189.88.176, A 91.189.88.175,
 A 91.189.88.174 (256)

> 

> 

> 

> You can see that the update traffic is returned but is not accepted by the stack and a RST is sent

> 

> 

> Admin@NETM06:~$ sudo tcpdump -i eth0 '(not host 168.63.129.16 and port 80)' -n

> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

> listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

> 10:17:12.690658 IP 10.24.12.10.40216 > 91.189.88.175.80: Flags [S], seq 2279624826, win 64240, options [mss 1460,sackOK,TS val 2029365856 ecr 0,nop,wscale 7], length 0

> 10:17:12.691929 IP 10.24.12.10.52362 > 91.189.95.83.80: Flags [S], seq 1465797256, win 64240, options [mss 1460,sackOK,TS val 3833463674 ecr 0,nop,wscale 7], length 0

> 10:17:12.696270 IP 91.189.88.175.80 > 10.24.12.10.40216: Flags [S.], seq 968450722, ack 2279624827, win 28960, options [mss 1418,sackOK,TS val 81957103 ecr 2029365856,nop,wscale 7], length 0                                                                                      

> 10:17:12.696301 IP 10.24.12.10.40216 > 91.189.88.175.80: Flags [R], seq 2279624827, win 0, length 0

> 10:17:12.697884 IP 91.189.95.83.80 > 10.24.12.10.52362: Flags [S.], seq 4148330738, ack 1465797257, win 28960, options [mss 1418,sackOK,TS val 2257624414 ecr 3833463674,nop,wscale 8], length 0                                                                                                                         

> 10:17:12.697909 IP 10.24.12.10.52362 > 91.189.95.83.80: Flags [R], seq 1465797257, win 0, length 0

> 

> 

> 

> 

> I can emulate the DNS lookup using netcat in the vrf:

> 

> 

> sudo ip vrf exec mgmt-vrf nc -u 10.24.65.203 53

> 

`ip vrf exec mgmt-vrf <COMMAND>` means that every IPv4 and IPv6 socket

opened by <COMMAND> is automatically bound to mgmt-vrf which causes

route lookups to hit the mgmt-vrf table.

Just running <COMMAND> (without binding to any vrf) means no socket is

bound to anything unless the command does a bind. In that case the

routing lookups determine which egress device is used.

Now the response comes back, if the ingress interface is a VRF then the

socket lookup wants to match on a device.

Now, a later response shows this for DNS lookups:

  isc-worker0000 20261 [000]  2215.013849: fib:fib_table_lookup: table

10 oif 0 iif 0 proto 0 0.0.0.0/0 -> 127.0.0.1/0 tos 0 scope 0 flags 0

==> dev eth0 gw 10.24.12.1 src 10.24.12.10 err 0

  isc-worker0000 20261 [000]  2215.013915: fib:fib_table_lookup: table

10 oif 4 iif 1 proto 17 0.0.0.0/52138 -> 127.0.0.53/53 tos 0 scope 0

flags 4 ==> dev eth0 gw 10.24.12.1 src 10.24.12.10 err 0

  isc-worker0000 20261 [000]  2220.014006: fib:fib_table_lookup: table

10 oif 4 iif 1 proto 17 0.0.0.0/52138 -> 127.0.0.53/53 tos 0 scope 0

flags 4 ==> dev eth0 gw 10.24.12.1 src 10.24.12.10 err 0

which suggests your process is passing off the DNS lookup to a local

process (isc-worker) and it hits the default route for mgmt-vrf when it

is trying to connect to a localhost address.

For mgmt-vrf I suggest always adding 127.0.0.1/8 to the mgmt vrf device

(and ::1/128 for IPv6 starting with 5.x kernels - I forget the exact

kernel version).

That might solve your problem; it might not.

(BTW: Cumulus uses fib rules for DNS servers to force DNS packets out

the mgmt-vrf interface.)