* VRF + ip xfrm, egress ESP packet looping when qdisc configured
@ 2020-01-02 23:11 Trev Larock
  2020-01-03  4:44 ` David Ahern
  0 siblings, 1 reply; 10+ messages in thread
From: Trev Larock @ 2020-01-02 23:11 UTC (permalink / raw)
  To: netdev

With a VRF configured and an xfrm policy, I see ESP packets looping, but only
when a qdisc is configured. Tried on:
fedora31 kernel 5.3.7-301.fc31.x86_64
fedora26 kernel 4.16.11

1. VRF case, host-host tunnel mode xfrm, no qdisc
          host1                                |  host2
         +---------------+                     |
         |     vrf0      |                     |
         +---------------+                     |
            |                                  |
            |                                  |
         +--------+                            |
         | enp0s8 | 192.168.56.114 --------------- 192.168.56.116
         +--------+                            |
                                               |
vrf config:
 sysctl net.ipv4.tcp_l3mdev_accept=1
 ip link add dev vrf0 type vrf table 300
 ip link set dev vrf0 up
 ip link set dev enp0s8  master vrf0

xfrm config:
 ip xfrm policy add src 192.168.56.114/32 dst 192.168.56.116/32 \
 dir out priority 367231 ptype main tmpl src 192.168.56.114 dst \
 192.168.56.116 proto esp spi 0x1234567 reqid 1 mode tunnel

 ip xfrm state add src 192.168.56.114 dst 192.168.56.116 proto esp \
 spi 0x1234567 reqid 1 mode tunnel aead rfc4106\(gcm\(aes\)\) \
 0x68db8eabd7f61557247f28f95e668f19855e086d02b21488fde4f5fcc9d42fcfbc9a2e35 \
 128 sel src 192.168.56.114/32 dst 192.168.56.116/32

(No namespace or virtual xfrm interface config involved).

ping -c 1 -w 1 -I vrf0 192.168.56.116
tcpdump -n -i enp0s8
05:01:27.085768 IP 192.168.56.114 > 192.168.56.116:
ESP(spi=0x01234567,seq=0x1), length 120
(ESP packet goes out ok)

2. VRF + qdisc
When a qdisc is activated, an ESP packet 'loops' with increasing size:
tc qdisc add dev vrf0 root netem delay 0ms

tcpdump -n -i enp0s8
(shows nothing)

tcpdump -n -i vrf0
05:08:22.583088 IP 192.168.56.114 > 192.168.56.116: ICMP echo request,
id 8873, seq 1, length 64
05:08:22.583155 IP 192.168.56.114 > 192.168.56.116:
ESP(spi=0x01234567,seq=0xe), length 120
05:08:22.583163 IP 192.168.56.114 > 192.168.56.116:
ESP(spi=0x01234567,seq=0xf), length 176
05:08:22.583168 IP 192.168.56.114 > 192.168.56.116:
ESP(spi=0x01234567,seq=0x10), length 232
05:08:22.583172 IP 192.168.56.114 > 192.168.56.116:
ESP(spi=0x01234567,seq=0x11), length 288
05:08:22.583177 IP 192.168.56.114 > 192.168.56.116:
ESP(spi=0x01234567,seq=0x12), length 344
05:08:22.583182 IP 192.168.56.114 > 192.168.56.116:
ESP(spi=0x01234567,seq=0x13), length 400
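The ESP lengths in the capture above grow by a constant 56 bytes per pass, which is consistent with the previous ESP packet being encapsulated again on every trip through vrf0. A minimal arithmetic sketch, assuming standard tunnel-mode overheads for rfc4106(gcm(aes)) with a 128-bit ICV (8-byte ESP header, 8-byte IV, 2-byte trailer, 4-byte payload alignment, 16-byte ICV, 20-byte outer IPv4 header) and that tcpdump's "length" excludes the outer IPv4 header; the helper name esp_payload_len is hypothetical:

```shell
# Hypothetical helper: ESP length as printed by tcpdump (outer IPv4 header
# excluded) for one tunnel-mode rfc4106(gcm(aes)) encapsulation.
# Assumed overheads: ESP header 8 (SPI + seq), IV 8, ICV 16,
# trailer 2 (pad length + next header), payload padded to a multiple of 4.
esp_payload_len() {
    inner=$1                                # full inner IPv4 packet length
    padded=$(( (inner + 2 + 3) / 4 * 4 ))   # inner + trailer, rounded up to 4
    echo $(( 8 + 8 + padded + 16 ))
}

len=84   # ICMP echo request: 64 bytes ICMP + 20 bytes IPv4 header
i=0
while [ "$i" -lt 6 ]; do
    esp=$(esp_payload_len "$len")
    echo "pass $i: ESP length $esp"
    # The whole ESP packet, outer IPv4 header included, becomes the next inner packet
    len=$(( esp + 20 ))
    i=$(( i + 1 ))
done
```

Under these assumptions the loop prints 120, 176, 232, 288, 344 and 400, matching the sequence in the capture.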

Transport mode shows the same behavior.  Anyone have a reference config for vrf + xfrm?
Adding "dev vrf0" to the xfrm policy/state yields cleartext pings, because the
oif passed to xfrm_lookup is enp0s8.

Thanks,
Trev


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VRF + ip xfrm, egress ESP packet looping when qdisc configured
From: David Ahern @ 2020-01-03  4:44 UTC (permalink / raw)
  To: Trev Larock, netdev, Ben Greear

On 1/2/20 4:11 PM, Trev Larock wrote:
> Transport mode is same behavior.  Anyone have reference config for vrf + xfrm?

Ben, cc-ed, has done some IPsec + VRF work.

I have not done much with xfrm + vrf. Can you re-create this with network
namespaces? If so, send the commands and I will take a look when I can.


* Re: VRF + ip xfrm, egress ESP packet looping when qdisc configured
From: Trev Larock @ 2020-01-04  5:56 UTC (permalink / raw)
  To: David Ahern; +Cc: Trev Larock, netdev, Ben Greear

On Thu, Jan 2, 2020 at 11:44 PM David Ahern <dsahern@gmail.com> wrote:
> Ben, cc-ed, has done some IPsec + VRF work.
>
> I have not done much with xfrm + vrf. Can you re-create this with network
> namespaces? If so, send the commands and I will take a look when I can.
>
Thanks for responding, David. The same behavior is seen under a network namespace.
Setup for host1 was fedora31 kernel 5.3.7-301.fc31.x86_64; host2 is optional.

          host1 netns ns0                      |  host2
         +---------------+                     |
         |     vrf0      |                     |
         +---------------+                     |
            |                                  |
            |                                  |
         +--------+                            |
         | enp0s8 | 192.168.56.116 --------------- 192.168.56.114
         +--------+                            |
                                               |
 ip netns add ns0
 ip netns exec ns0 ip link set lo up
 ip link set dev enp0s8 netns ns0
 sysctl net.ipv4.tcp_l3mdev_accept=1
 ip netns exec ns0 sysctl net.ipv4.tcp_l3mdev_accept=1
 ip netns exec ns0 ip addr add 192.168.56.116/24 dev enp0s8
 ip netns exec ns0 ip link set enp0s8 up
 ip netns exec ns0 ip link add dev vrf0 type vrf table 300
 ip netns exec ns0 ip link set dev vrf0 up
 ip netns exec ns0 ip link set dev enp0s8 master vrf0
 ip netns exec ns0 ip xfrm policy add src 192.168.56.116/32 dst 192.168.56.114/32 \
   dir out priority 367231 ptype main tmpl src 192.168.56.116 \
   dst 192.168.56.114 proto esp spi 0x1234567 reqid 1 mode tunnel
 ip netns exec ns0 ip xfrm state add src 192.168.56.116 dst 192.168.56.114 \
   proto esp spi 0x1234567 reqid 1 mode tunnel aead 'rfc4106(gcm(aes))' \
   0x68db8eabd7f61557247f28f95e668f19855e086d02b21488fde4f5fcc9d42fcfbc9a2e35 \
   128 sel src 192.168.56.116/32 dst 192.168.56.114/32

# With the qdisc configured, the ESP packet loops in vrf0
 ip netns exec ns0 tc qdisc add dev vrf0 root netem delay 0ms
# ping to trigger policy
 ip netns exec ns0 ping -c 1 -w 1 -I vrf0 192.168.56.114
# monitor with tcpdump
 ip netns exec ns0 tcpdump -i vrf0 host 192.168.56.114
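If the ESP packet really is looping, the SA and qdisc statistics should show far more traffic than the single ping that was sent. A sketch of checking this in ns0 (the exact output format depends on the iproute2 version); deleting the root qdisc should return the setup to the non-looping behavior of case 1 in the first message:

```shell
# SA statistics: the "lifetime current" bytes/packets counters grow with
# every encapsulation, so a loop should inflate them well past one packet
ip netns exec ns0 ip -s xfrm state
# Per-qdisc sent bytes/packets on the VRF device
ip netns exec ns0 tc -s qdisc show dev vrf0
# Remove the qdisc again (case 1 in the first message had no qdisc)
ip netns exec ns0 tc qdisc del dev vrf0 root
```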

Thanks
Trev



* Re: VRF + ip xfrm, egress ESP packet looping when qdisc configured
From: David Ahern @ 2020-01-06  4:27 UTC (permalink / raw)
  To: Trev Larock; +Cc: netdev, Ben Greear

On 1/3/20 10:56 PM, Trev Larock wrote:
> On Thu, Jan 2, 2020 at 11:44 PM David Ahern <dsahern@gmail.com> wrote:
>> Ben, cc-ed, has done some IPsec + VRF work.
>>
>> I have not done much with xfrm + vrf. Can you re-create this with network
>> namespaces? If so, send the commands and I will take a look when I can.
>>
> Thanks for responding David, under namespace the same behavior is seen.
> Setup for host1 was fedora31 kernel 5.3.7-301.fc31.x86_64, host2 optional
> [...]

Hi: I meant a series of commands using *only* network namespaces for
host1 and host2. e.g.,

ip link add veth1 type veth peer name veth2
ip link add dev vrf0 type vrf table 300
ip link set dev vrf0 up
ip link set dev veth1 master vrf0
ip addr add 192.168.56.116/24 dev veth1
ip link set dev veth1 up

ip netns add host2
ip netns exec host2 ip link set lo up
ip link set dev veth2 netns host2
ip netns exec host2 sysctl net.ipv4.tcp_l3mdev_accept=1
ip -netns host2 addr add 192.168.56.114/24 dev veth2
ip -netns host2 link set veth2 up


I was able to adapt your commands with the above and reproduced the
problem. I need to think about the proper solution.

Also, I looked at my commands from a few years ago (IPsec with VRF) and
noticed you are not adding a device context to the xfrm policy and
state. e.g.,

ip xfrm policy flush
ip xfrm policy add src 192.168.56.0/24 dst 192.168.56.0/24 \
  dev vrf0 ...

ip xfrm state flush
ip xfrm state add src 192.168.56.116 dst 192.168.56.114 \
...
   sel dev vrf0 src 192.168.56.116 dst 192.168.56.114
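For reference, this is roughly what the ns0 policy and state from the earlier reproducer look like with the vrf0 device added to the selector, combining the selector form shown here with Trev's addresses, SPI and key. It is a sketch, not a verified working config: per the later messages in this thread, on the kernels tested this variant sent the pings in cleartext because the oif seen by xfrm_lookup was the enslaved device rather than vrf0.

```shell
# Policy: selector (src/dst/dev) must be contiguous before "dir"
ip netns exec ns0 ip xfrm policy flush
ip netns exec ns0 ip xfrm policy add src 192.168.56.116/32 dst 192.168.56.114/32 \
    dev vrf0 dir out priority 367231 ptype main \
    tmpl src 192.168.56.116 dst 192.168.56.114 proto esp spi 0x1234567 reqid 1 mode tunnel

# State: same SA as before, with the device added to the "sel" selector
ip netns exec ns0 ip xfrm state flush
ip netns exec ns0 ip xfrm state add src 192.168.56.116 dst 192.168.56.114 \
    proto esp spi 0x1234567 reqid 1 mode tunnel aead 'rfc4106(gcm(aes))' \
    0x68db8eabd7f61557247f28f95e668f19855e086d02b21488fde4f5fcc9d42fcfbc9a2e35 128 \
    sel dev vrf0 src 192.168.56.116/32 dst 192.168.56.114/32
```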



* Re: VRF + ip xfrm, egress ESP packet looping when qdisc configured
From: Trev Larock @ 2020-01-06  5:58 UTC (permalink / raw)
  To: David Ahern; +Cc: Trev Larock, netdev, Ben Greear

On Sun, Jan 5, 2020 at 11:29 PM David Ahern <dsahern@gmail.com> wrote:
> I was able to adapt your commands with the above and reproduced the
> problem. I need to think about the proper solution.
>
Ok thanks for investigating.

> Also, I looked at my commands from a few years ago (IPsec with VRF) and
> noticed you are not adding a device context to the xfrm policy and
> state. e.g.,
>
Yes, that was part of my original query; it makes sense in order to have
multiple VRFs, each with their own xfrm policies.
I will investigate further.  The oif passed to xfrm_lookup seemed to be the
enp0s8 oif rather than the vrf0 oif, so I was observing only cleartext
pings going out; the policy wouldn't match.
Perhaps I'm missing something to get the vrf0 oif passed for the ping packet.

Thanks,
Trev



* Re: VRF + ip xfrm, egress ESP packet looping when qdisc configured
From: Ben Greear @ 2020-01-07 22:59 UTC (permalink / raw)
  To: Trev Larock, David Ahern; +Cc: netdev

On 1/5/20 9:58 PM, Trev Larock wrote:
> On Sun, Jan 5, 2020 at 11:29 PM David Ahern <dsahern@gmail.com> wrote:
>> I was able to adapt your commands with the above and reproduced the
>> problem. I need to think about the proper solution.
>>
> Ok thanks for investigating.
> 
>> Also, I looked at my commands from a few years ago (IPsec with VRF) and
>> noticed you are not adding a device context to the xfrm policy and
>> state. e.g.,
>>
> Yes was part of my original query, that makes sense in order to be able to have
> multiple vrf each with their own xfrm policies.
> I will investigate further on it.  The oif passed to xfrm_lookup seemed to be
> enp0s8 oif rather than vrf0 oif, so I was observing just cleartext
> pings go out / policy wouldn't match.
> Perhaps I'm missing something to get vrf0 oif passed for the ping packet.

As luck would have it, I am investigating problems that sound very similar
today.

In my case, I'm not using network namespaces.  For instance:

eth1 is the un-encrypted interface
x_eth1 is the xfrm network device on top of eth1
both belong to _vrf1

What I see is that encrypted packets coming in on eth1 from the VPN are
received on x_eth1.

But, UDP frames that I am trying very hard to send on x_eth1 (SO_BINDTODEVICE is called)
are not actually sent from there but instead go out of eth1 un-encrypted.

David:  I'll be happy to test patches, and if you think it will be a while
before you can write them, if you want to point me to the likely problem places,
I can make an attempt at fixing it.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: VRF + ip xfrm, egress ESP packet looping when qdisc configured
From: David Ahern @ 2020-01-13 16:48 UTC (permalink / raw)
  To: Ben Greear, Trev Larock; +Cc: netdev

On 1/7/20 3:59 PM, Ben Greear wrote:
> 
> As luck would have it, I am investigating problems that sound very similar
> today.

Trev's problem is looping due to the presence of the qdisc. The vrf
driver needs to detect that it has seen the packet and not redirect it
again.

> 
> In my case, I'm not using network name spaces.  For instance:

use of the namespaces is solely for a standalone (single node) test. It
has no bearing on the problem.

> 
> eth1 is the un-encrypted interface
> x_eth1 is the xfrm network device on top of eth1
> both belong to _vrf1
> 
> What I see is that packets coming in eth1 from the VPN are encrypted and
> received
> on x_eth1.
> 
> But, UDP frames that I am trying very hard to send on x_eth1
> (SO_BINDTODEVICE is called)
> are not actually sent from there but instead go out of eth1 un-encrypted.

Have you added debug statements to the UDP code to check that device binding?
What about the fib_table_lookup tracepoint?
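One way to check the routing decision for the bound UDP socket is to record the fib:fib_table_lookup tracepoint with perf while the sender runs. A sketch, assuming perf is installed and run as root; the 10-second window is arbitrary, and the fields printed per event (such as the table id and oif) vary by kernel version:

```shell
# Record every FIB lookup system-wide for 10 seconds, with call stacks,
# while the UDP sender bound to x_eth1 runs in another shell
perf record -e fib:fib_table_lookup -a -g -- sleep 10
# Dump the recorded events to see which table and oif each lookup used
perf script
```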


* Re: VRF + ip xfrm, egress ESP packet looping when qdisc configured
From: Trev Larock @ 2020-02-03  3:13 UTC (permalink / raw)
  To: David Ahern; +Cc: Ben Greear, netdev

On Mon, Jan 13, 2020 at 11:51 AM David Ahern <dsahern@gmail.com> wrote:
> Trev's problem is looping due to the presence of the qdisc. The vrf
> driver needs to detect that it has seen the packet and not redirect it
> again.
Yes; note that this was with no dev specified on the xfrm policy/state.
For the non-qdisc case, the policy was triggered from the __ip4_datagram_connect ->
xfrm_lookup call, and the vrf "direct" route sent the packet out without any
xfrm_lookup call in the vrf driver. It appears to work, but it's not really an
xfrm VRF-specific policy.

With the qdisc, the policy matched again on the vrf -> xfrm_lookup call.

When specifying "dev vrf0", I don't see the policy matched at all.
Should it be matched in the vrf.c -> xfrm_lookup call from
vrf_process_v4_outbound, or elsewhere?

(The qdisc case seems more like the older / pre dcdd43c41e commit flow.)

ftrace stack trace with the qdisc configured, sending a UDP packet with netcat:
   nc-4391  [001] .... 11663.551103: xfrm_lookup <-xfrm_lookup_route
   nc-4391  [001] .... 11663.551104: <stack trace>
 => xfrm_lookup
 => xfrm_lookup_route
 => vrf_xmit
 => dev_hard_start_xmit
 => sch_direct_xmit
 => __qdisc_run
 => __dev_queue_xmit
 => vrf_finish_output
 => vrf_output
 => ip_send_skb
 => udp_send_skb
 => udp_sendmsg
 => sock_sendmsg
 => SYSC_sendto
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe

Full flow from vrf_xmit:
vrf_xmit
 -->is_ip_tx_frame
   -->vrf_process_v4_outbound
     -->ip_route_output_flow
       -->xfrm_lookup_route
         --> xfrm_lookup
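A stack trace like the one above can be captured with the ftrace function tracer. A sketch, assuming tracefs is available at /sys/kernel/debug/tracing (some systems use /sys/kernel/tracing) and using the ping from the ns0 reproducer to trigger the policy instead of the nc UDP send used here:

```shell
cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo xfrm_lookup > set_ftrace_filter   # trace only xfrm_lookup
echo function > current_tracer
echo 1 > options/func_stack_trace      # record a stack for every hit
echo > trace                           # clear the old buffer
echo 1 > tracing_on
ip netns exec ns0 ping -c 1 -w 1 -I vrf0 192.168.56.114
echo 0 > tracing_on
cat trace
# Restore the defaults
echo 0 > options/func_stack_trace
echo nop > current_tracer
echo > set_ftrace_filter
```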

In vrf_process_v4_outbound the flow sets ".flowi4_oif = vrf_dev->ifindex";
should that be the VRF's ifindex or that of the network interface enslaved to the VRF?
I observe it is the enslaved network interface, so a policy with dev vrf0
won't match, but I'm not sure whether that is missing config or some other issue.
Are there any reference/test sample configs for the vrf+xfrm use case where
the policy matched as expected? (even on an older kernel).
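To interpret an oif value seen in a trace or debug print, it helps to compare it against each device's ifindex. A sketch for the setup in the first message (prefix each command with ip netns exec ns0 when using the namespace reproducer):

```shell
# ifindex of the VRF master and of the enslaved device
cat /sys/class/net/vrf0/ifindex
cat /sys/class/net/enp0s8/ifindex
# -d prints slave details; an enslaved device should show a vrf_slave entry with table 300
ip -d link show dev enp0s8
```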

Thanks,
Trev



* Re: VRF + ip xfrm, egress ESP packet looping when qdisc configured
From: David Ahern @ 2020-02-03  4:04 UTC (permalink / raw)
  To: Trev Larock; +Cc: Ben Greear, netdev

On 2/2/20 8:13 PM, Trev Larock wrote:
> On Mon, Jan 13, 2020 at 11:51 AM David Ahern <dsahern@gmail.com> wrote:
>> Trev's problem is looping due to the presence of the qdisc. The vrf
>> driver needs to detect that it has seen the packet and not redirect it
>> again.
> Yes note it was when specifying no dev on the xfrm policy/state.
> For the non-qdisc case the policy triggered from the __ip4_datagram_connect->
> xfrm_lookup and the vrf "direct" route sent it out without any xfrm_lookup call.
> It appears to work but it's not really a "xfrm vrf specific " policy.
> 
> For qdisc the policy matched again on the vrf->xfrm_lookup call.
> 

I understand the problem you are facing. It is limited to xfrm + qdisc
on a VRF device. I have a proposal for how to fix it:

    https://github.com/dsahern/linux vrf-qdisc-xfrm

Right now I am stuck on debugging related xfrm cases, like xfrm
devices, a vrf device in the selector, and vti devices. I feel like I need
to get all of them working before sending patches; I just lack the time.
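To try the proposed fix, the branch can be fetched into an existing kernel tree. A sketch, assuming the branch is still published under the name given above:

```shell
# Run inside an existing kernel git tree
git fetch https://github.com/dsahern/linux vrf-qdisc-xfrm
# List the most recent commits on the fetched branch
git log --oneline -10 FETCH_HEAD
```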


* Re: VRF + ip xfrm, egress ESP packet looping when qdisc configured
From: Trev Larock @ 2020-02-21  4:52 UTC (permalink / raw)
  To: David Ahern; +Cc: Trev Larock, Ben Greear, netdev

On Sun, Feb 2, 2020 at 11:04 PM David Ahern <dsahern@gmail.com> wrote:
> I understand the problem you are facing. It is limited to xfrm + qdisc
> on VRF device. I have a proposal for how to fix it:
>
>     https://github.com/dsahern/linux vrf-qdisc-xfrm

Thanks, I tried the fixes on fedora31 / kernel 5.3.8 and they resolved
the qdisc looping packet issue.

> Right now I am stuck on debugging related xfrm cases - like xfrm
> devices, vrf device in the selector, and vti device. I feel like I need
> to get all of them working before sending patches, I just lack enough time.
>

Yes, the vrf device in selector issue is still a puzzle.
Without the dev in the selector, the policy is triggered by the ip4_datagram_connect
call to xfrm_lookup, and there seems to be no xfrm_lookup call from vrf.c.

With a policy that has dev vrf0 in the selector, only plaintext packets go out.
For that to match properly, should vrf.c be calling xfrm_lookup with the
vrf0 oif, or should that happen elsewhere?

