From: David Ahern <dsahern@gmail.com>
To: Maximilian Bosch <maximilian@mbosch.me>, netdev@vger.kernel.org
Subject: Re: VRF Issue Since kernel 5
Date: Wed, 11 Mar 2020 19:06:54 -0600
Message-ID: <2583bdb7-f9ea-3b7b-1c09-a273d3229b45@gmail.com>
In-Reply-To: <20200310204721.7jo23zgb7pjf5j33@topsnens>

On 3/10/20 2:47 PM, Maximilian Bosch wrote:
> Hi!
> 
> I suspect I hit the same issue which is why I decided to respond to this
> thread (if that's wrong please let me know).
> 
>> sudo sysctl -a | grep l3mdev
>>
>> If not,
>> sudo sysctl net.ipv4.raw_l3mdev_accept=1
>> sudo sysctl net.ipv4.udp_l3mdev_accept=1
>> sudo sysctl net.ipv4.tcp_l3mdev_accept=1
> 
> On my system (NixOS 20.03, Linux 5.5.8) those values are set to `1`, but
> I experience the same issue.
> 
>> Since Kernel 5 though I am no longer able to update – but the issue is quite a curious one as some traffic appears to be fine (DNS lookups use VRF correctly) but others don’t (updating/upgrading the packages)
> 
> I can reproduce this on 5.4.x and 5.5.x. To be more precise, I suspect
> that only TCP traffic hangs in the VRF. When I try to `ssh` through the
> VRF, I get a timeout, but UDP traffic e.g. from WireGuard works just fine.
> 
> However, TCP traffic through a VRF works fine as well on 4.x (just tested this on
> 4.19.108 and 4.14.172).

The functional test script under tools/testing/selftests/net covers VRF,
and it ran clean on 5.4 the last time I checked. A few changes went into
4.20 or 5.0 that might be tripping up this use case, but I need a lot
more information.

> 
> I use VRFs to enslave my physical uplink interfaces (enp0s31f6, wlp2s0).
> My main routing table has a default route via my WireGuard Gateway and I
> only route my WireGuard uplink through the VRF. With this approach I can
> make sure that all of my traffic goes through the VPN and only the
> UDP packets of WireGuard will be routed through the uplink network.

Are you saying WireGuard worked with VRF in the past but does not now?
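For reference, here is a minimal sketch of the setup described above, using iproute2. The interface names come from the report. The VRF name `vrf-uplink`, table 10 and the WireGuard device `wg0` are hypothetical. The report does not say how the WireGuard UDP socket itself is steered into the VRF, so that part is left out.

```shell
# Hypothetical names: VRF "vrf-uplink" backed by routing table 10,
# WireGuard device "wg0". Run as root.

# Create the VRF device and bring it up.
ip link add vrf-uplink type vrf table 10
ip link set dev vrf-uplink up

# Enslave the physical uplinks; their connected routes move into table 10.
ip link set dev enp0s31f6 master vrf-uplink
ip link set dev wlp2s0 master vrf-uplink

# Main table: send all other traffic through the WireGuard tunnel.
ip route replace default dev wg0

# Inspect what ended up in the VRF table.
ip route show vrf vrf-uplink
```

`ip route show vrf vrf-uplink` lists the table bound to that VRF (table 10 here), so it is a quick way to confirm the uplink routes moved out of the main table.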


> 
> As mentioned above, the WireGuard traffic works perfectly fine, but I
> can't access `<vpn-uplink>` via SSH:
> 
> ```
> $ ssh root@<vpn-uplink> -vvvv
> OpenSSH_8.2p1, OpenSSL 1.1.1d  10 Sep 2019
> debug1: Reading configuration data /home/ma27/.ssh/config
> debug1: /home/ma27/.ssh/config line 5: Applying options for *
> debug1: Reading configuration data /etc/ssh/ssh_config
> debug1: /etc/ssh/ssh_config line 5: Applying options for *
> debug2: resolve_canonicalize: hostname <vpn-uplink> is address
> debug1: Control socket "/home/ma27/.ssh/master-root@<vpn-uplink>:22" does not exist
> debug2: ssh_connect_direct
> debug1: Connecting to <vpn-uplink> [<vpn-uplink>] port 22.
> # Hangs here for a while
> ```
> 
> I get the following output when debugging this with `tcpdump`:
> 
> ```
> $ tcpdump -ni uplink tcp
> 20:06:40.409006 IP 10.214.40.237.58928 > <vpn-uplink>.22: Flags [S], seq 4123706560, win 65495, options [mss 65495,sackOK,TS val 3798273519 ecr 0,nop,wscale 7], length 0
> 20:06:40.439699 IP <vpn-uplink>.22 > 10.214.40.237.58928: Flags [S.], seq 3289740891, ack 4123706561, win 65160, options [mss 1460,sackOK,TS val 1100235016 ecr 3798273519,nop,wscale 7], length 0
> 20:06:40.439751 IP 10.214.40.237.58928 > <vpn-uplink>.22: Flags [R], seq 4123706561, win 0, length 0

That suggests the kernel is not finding a matching socket for the
SYN-ACK, so it sends a reset.

> 20:06:41.451871 IP 10.214.40.237.58928 > <vpn-uplink>.22: Flags [S], seq 4123706560, win 65495, options [mss 65495,sackOK,TS val 3798274562 ecr 0,nop,wscale 7], length 0
> 20:06:41.484498 IP <vpn-uplink>.22 > 10.214.40.237.58928: Flags [S.], seq 3306036877, ack 4123706561, win 65160, options [mss 1460,sackOK,TS val 1100236059 ecr 3798274562,nop,wscale 7], length 0
> 20:06:41.484528 IP 10.214.40.237.58928 > <vpn-uplink>.22: Flags [R], seq 4123706561, win 0, length 0
> ```
> 
> AFAICS every SYN will be terminated with an RST which is the reason why
> the connection hangs.
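To spot this pattern in a longer capture, a small awk filter can pair each SYN-ACK with a reset sent back by its recipient. This sketch assumes tcpdump's default one-line IPv4 output as shown above: timestamp, `IP`, source, `>`, then destination. The function name `flag_syn_ack_resets` is hypothetical.

```shell
# Hypothetical helper: reads tcpdump text output on stdin and prints each
# local endpoint that answered a SYN-ACK with a reset (RST or RST-ACK).
flag_syn_ack_resets() {
    awk '
        # Field 3 is the source endpoint and field 5 the destination,
        # which carries a trailing colon.
        / Flags \[S\.\]/ {
            dst = $5
            sub(/:$/, "", dst)
            pending[dst] = 1        # remember who received the SYN-ACK
            next
        }
        / Flags \[R\.?\]/ {
            src = $3
            if (src in pending) {
                print "RST after SYN-ACK from " src
                delete pending[src]
            }
        }
    '
}
```

For example, `tcpdump -ni uplink -l tcp | flag_syn_ack_resets`. The `-l` option makes tcpdump line-buffer its output, so matches show up immediately.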
> 
> I can work around the issue by using `ip vrf exec`. However I get the
> following error (unless I run `ulimit -l 2048`):
> 
> ```
> Failed to load BPF prog: 'Operation not permitted'
> ```

'ip vrf exec' loads a BPF program, and that requires locked memory,
so yes, you need to increase the limit.
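One way to do that without changing the limit for the whole session is to raise it in a subshell. In this sketch, `VPN_UPLINK` holds a placeholder for the address elided in the report, and `vrf-uplink` is a hypothetical VRF name. The 2048 value (in KiB) is the one mentioned above. Raising the limit beyond the hard limit requires root.

```shell
# Placeholder: replace with the real remote address.
VPN_UPLINK='<vpn-uplink>'

# Show the current locked-memory limit (KiB) for this shell.
ulimit -l

# Raise the limit only inside this subshell, then run ssh so that every
# socket it creates is bound to the (hypothetical) VRF "vrf-uplink".
(
    ulimit -l 2048 &&
    ip vrf exec vrf-uplink ssh "root@$VPN_UPLINK"
)
```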

Let's start with lookups:

perf record -e fib:* -a -g
<run test that fails, ctrl-c>
perf script

That shows the lookups (inputs, table id, result) along with the context
(stack trace), which might shed some light on what is going on.
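If pressing ctrl-c at the right moment is awkward, perf can bound the recording itself. This is a sketch that assumes the failing ssh is retried from another terminal within the 10 seconds. The output file name is arbitrary.

```shell
# Record all fib tracepoints system-wide (-a) with call graphs (-g) for as
# long as the "sleep 10" workload runs. Needs root. The event glob is
# quoted so the shell does not try to expand it.
perf record -e 'fib:*' -a -g -- sleep 10

# Dump the recorded events with their stack traces.
perf script > fib-lookups.txt

# Show only the lookup event lines, without the stack traces.
grep fib_table_lookup fib-lookups.txt
```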
