* In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
From: Eugene Crosser @ 2021-10-01 15:19 UTC (permalink / raw)
To: netfilter-devel
When the interface you match against in "raw prerouting" is enslaved in a VRF,
matching behaves differently on kernel 5.4 and on kernels 5.10 and later (I have
no systems to check the kernels in between).
On 5.4, the veth interface is matched and the zone is set accordingly. Then the
vrf interface is matched as well: according to the trace its rule is executed,
but the zone, once set, does not change.
On 5.10 and later, the rule that should match the veth interface _does not appear
in the trace_, even though the trace shows the veth as the `iif` at that moment.
Then the rule that matches the vrf interface is executed, and the corresponding
zone is set.
The reproducer script creates a veth pair with one end enslaved in a vrf, and
sends a packet out of the unenslaved end of the veth, so that it arrives on the
enslaved end. In the prerouting chain, there are rules that set a different
conntrack zone depending on which iif matched: veth or vrf. As a result, entries
are created in different zones when the script runs on earlier and on later
kernels. Here are the results (observe the different zones); the script is below.
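To compare runs on different kernels, it can help to reduce the `conntrack -L`
listing to just the zone numbers. The following is a minimal sketch and not part
of the original reproducer: `ct_zones` is a hypothetical helper name, and it
assumes the listing carries a `zone=N` field as in the results in this message.

```shell
#!/bin/sh
# Sketch only: ct_zones is a hypothetical helper, not a conntrack-tools feature.
# It reads `conntrack -L` style output on stdin and prints each distinct
# zone number once, in ascending order. Entries that carry no zone= field
# contribute nothing to the output.
ct_zones() {
    grep -o 'zone=[0-9][0-9]*' | cut -d= -f2 | sort -un
}

# Possible use at the end of the reproducer (assumption, needs root):
#   conntrack -L 2>/dev/null | ct_zones
```

Printing only the zone numbers makes the difference between two kernels a
one-line comparison rather than a visual scan of the full listing.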
========
5.4.86-pserver
conntrack v1.4.5 (conntrack-tools): connection tracking table has been emptied.
PING 172.30.30.2 (172.30.30.2) from 172.30.30.1 vein: 56(84) bytes of data.
64 bytes from 172.30.30.2: icmp_seq=1 ttl=64 time=0.128 ms
--- 172.30.30.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.128/0.128/0.128/0.000 ms
icmp 1 30 src=172.30.30.1 dst=172.30.30.2 type=8 code=0 id=13818 [UNREPLIED]
src=172.30.30.2 dst=172.30.30.1 type=0 code=0 id=13818 mark=0 zone=1 use=1
conntrack v1.4.5 (conntrack-tools): 1 flow entries have been shown.
========
5.13.0-16-generic
conntrack v1.4.6 (conntrack-tools): connection tracking table has been emptied.
PING 172.30.30.2 (172.30.30.2) from 172.30.30.1 vein: 56(84) bytes of data.
64 bytes from 172.30.30.2: icmp_seq=1 ttl=64 time=0.117 ms
--- 172.30.30.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.117/0.117/0.117/0.000 ms
icmp 1 30 src=172.30.30.1 dst=172.30.30.2 type=8 code=0 id=104 [UNREPLIED]
src=172.30.30.2 dst=172.30.30.1 type=0 code=0 id=104 mark=0 zone=2 use=1
conntrack v1.4.6 (conntrack-tools): 1 flow entries have been shown.
========
#!/bin/sh
IPIN=172.30.30.1
IPOUT=172.30.30.2
PFXL=30
ip li sh vein >/dev/null 2>&1 && ip li del vein
ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf
nft list table testct >/dev/null 2>&1 && nft delete table testct
ip li add vein type veth peer veout
ip li add tvrf type vrf table 9876
ip li set veout master tvrf
ip li set vein up
ip li set veout up
ip li set tvrf up
sysctl -w net.ipv4.conf.veout.accept_local=1
ip addr add $IPIN/$PFXL dev vein
ip addr add $IPOUT/$PFXL dev veout
nft -f - <<__END__
table testct {
    chain rawpre {
        type filter hook prerouting priority raw;
        # iif { veout, tvrf } meta nftrace set 1
        iif veout ct zone set 1 return
        iif tvrf ct zone set 2 return
        notrack
    }
    chain rawout {
        type filter hook output priority raw;
        notrack
    }
}
__END__
uname -r
conntrack -F
ping -W 1 -c 1 -I vein $IPOUT
conntrack -L
========
Is this a known situation? Which behavior is "correct"?
Thank you,
Eugene
* Re: In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
From: Florian Westphal @ 2021-10-02 18:50 UTC (permalink / raw)
To: Eugene Crosser; +Cc: netfilter-devel
Eugene Crosser <crosser@average.org> wrote:
> Is this a known situation? Which behavior is "correct"?
No idea, your reproducer gives this on my laptop:
unshare -n bash repro.sh
net.ipv4.conf.veout.accept_local = 1
5.14.9-200.fc34.x86_64
conntrack v1.4.5 (conntrack-tools): connection tracking table has been emptied.
PING 172.30.30.2 (172.30.30.2) from 172.30.30.1 vein: 56(84) bytes of data.
--- 172.30.30.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
conntrack v1.4.5 (conntrack-tools): 0 flow entries have been shown.
A bisection is needed to figure out what introduced a change.
However, if this is already changed for a few releases then we can't
revert it again.
* Re: In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
From: Eugene Crosser @ 2021-10-06 12:11 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel
Hello Florian,
On 02/10/2021 20:50, Florian Westphal wrote:
> Eugene Crosser <crosser@average.org> wrote:
>> Is this a known situation? Which behavior is "correct"?
>
> No idea, your reproducer gives this on my laptop:
>
> unshare -n bash repro.sh
> net.ipv4.conf.veout.accept_local = 1
> 5.14.9-200.fc34.x86_64
> conntrack v1.4.5 (conntrack-tools): connection tracking table has been emptied.
> PING 172.30.30.2 (172.30.30.2) from 172.30.30.1 vein: 56(84) bytes of data.
>
> --- 172.30.30.2 ping statistics ---
> 1 packets transmitted, 0 received, 100% packet loss, time 0ms
>
> conntrack v1.4.5 (conntrack-tools): 0 flow entries have been shown.
It would seem that you have an existing filter that drops packets and
prevents creation of conntrack entries? I can reproduce the behaviour on
freshly installed Debian and Ubuntu VMs without any modifications, with
and without `unshare`.
>
> A bisection is needed to figure out what introduced a change.
>
> However, if this is already changed for a few releases then we can't
> revert it again.
I think that behaviour change is not benign though. If you have several
interfaces enslaved in one VRF (which is a normal configuration), you
can no longer create rules that depend on the specific interface from
which the packet arrived.
So far I was able to prove that it depends on the kernel version and
nothing else. I've installed Debian bullseye on a fresh VM, and upgraded
it to Debian sid. The VM now has two kernels: 5.10.0-8 and 5.14.0-2
(Debian builds). When booted with the older kernel, my reproducer shows
"correct" behaviour (the rule matches the original veth); when booted with
the newer kernel, the behaviour is altered (the rule matches the VRF instead).
I also updated the reproducer to write nftrace, and it looks
"interesting". I am including the new reproducer below, and I can send
nftrace files if needed.
Now I am trying to bisect upstream kernel.
Thanks.
==========
#!/bin/sh
IPIN=172.30.30.1
IPOUT=172.30.30.2
PFXL=30
ip li sh vein >/dev/null 2>&1 && ip li del vein
ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf
nft list table testct >/dev/null 2>&1 && nft delete table testct
ip li add vein type veth peer veout
ip li add tvrf type vrf table 9876
ip li set veout master tvrf
ip li set vein up
ip li set veout up
ip li set tvrf up
/sbin/sysctl -w net.ipv4.conf.veout.accept_local=1
ip addr add $IPIN/$PFXL dev vein
ip addr add $IPOUT/$PFXL dev veout
nft -f - <<__END__
table testct {
    chain rawpre {
        type filter hook prerouting priority raw;
        iif { veout, tvrf } meta nftrace set 1
        iif veout ct zone set 1 return
        iif tvrf ct zone set 2 return
        notrack
    }
    chain rawout {
        type filter hook output priority raw;
        notrack
    }
}
__END__
uname -rv
conntrack -F
stdbuf -o0 nft monitor trace >nftrace.`uname -r`.txt &
monpid=$!
ping -W 1 -c 1 -I vein $IPOUT
conntrack -L
sleep 1
kill -15 $monpid
wait
========
* Re: In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
From: Eugene Crosser @ 2021-10-06 14:48 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel, Jinpu Wang
> Now I am trying to bisect upstream kernel.
It looks like Jinpu Wang <jinpu.wang@ionos.com> has found the offending
commit, it's 09e856d54bda5f28 "vrf: Reset skb conntrack connection on
VRF rcv" from Aug 15 2021.
Regards,
Eugene
* Re: In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
From: Florian Westphal @ 2021-10-06 15:03 UTC (permalink / raw)
To: Eugene Crosser; +Cc: Florian Westphal, netfilter-devel, Jinpu Wang
Eugene Crosser <crosser@average.org> wrote:
> > Now I am trying to bisect upstream kernel.
>
> It looks like Jinpu Wang <jinpu.wang@ionos.com> has found the offending
> commit, it's 09e856d54bda5f28 "vrf: Reset skb conntrack connection on VRF
> rcv" from Aug 15 2021.
This change is very recent, you reported failure between 5.4 and 5.10, or was
that already backported?
This change doesn't influence matching either, but it does zap the ct
zone association afaics.
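One way to check whether a change has been backported is to search a stable
branch by commit subject, since a backport normally gets a new hash in the
stable tree. The following is a rough sketch under assumptions:
`find_commit_by_subject` is a hypothetical helper, and the repository path and
revision range in the usage comment are placeholders for whatever kernel tree
is at hand. Distribution kernels may carry extra patches that no such tree
shows.

```shell
#!/bin/sh
# Sketch only: find_commit_by_subject is a hypothetical helper name.
# Usage: find_commit_by_subject <repo-dir> <subject> [git-log revision args...]
# Lists commits whose message contains <subject> as a literal string.
find_commit_by_subject() {
    repo=$1
    subject=$2
    shift 2
    git -C "$repo" log --oneline --fixed-strings --grep="$subject" "$@"
}

# Possible use (assumed paths and branch names, adjust to the local checkout):
#   find_commit_by_subject ~/src/linux-stable \
#       'vrf: Reset skb conntrack connection on VRF rcv' v5.10..origin/linux-5.10.y
```

An empty result only means the subject was not found in the given range of
that tree; it says nothing about a vendor kernel built from other sources.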
* Re: In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
From: Eugene Crosser @ 2021-10-06 15:09 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel, Jinpu Wang
On 06/10/2021 17:03, Florian Westphal wrote:
>> It looks like Jinpu Wang <jinpu.wang@ionos.com> has found the offending
>> commit, it's 09e856d54bda5f28 "vrf: Reset skb conntrack connection on VRF
>> rcv" from Aug 15 2021.
>
> This change is very recent, you reported failure between 5.4 and 5.10, or was
> that already backported?
>
> This change doesn't influence matching either, but it does zap the ct
> zone association afaics.
Yes, it looks like it was backported to the Debian/Ubuntu kernels.
Jinpu reported that reverting the change restores the "old" behaviour.
But we have not yet checked how it affects SNAT.
* Re: In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
From: Florian Westphal @ 2021-10-07 9:29 UTC (permalink / raw)
To: Eugene Crosser; +Cc: Florian Westphal, netfilter-devel
Eugene Crosser <crosser@average.org> wrote:
> It would seem that you have an existing filter that drops packets and
> prevents creation of conntrack entries? I can reproduce the behaviour on
> freshly installed Debian and Ubuntu VMs without any modifications, with and
> without `unshare`.
FWIW, this was due to a different default setting of rp_filter.
Adding
sysctl net.ipv4.conf.all.rp_filter=0
sysctl net.ipv4.conf.default.rp_filter=0
to the start of the script makes it work on my side too.
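Before comparing results between machines, the rp_filter defaults can be
printed so that such a difference is visible up front. This is a minimal sketch,
not something from the thread: `show_rp_filter` is a hypothetical helper, and
its optional argument (an alternative root in place of /proc/sys) is an
assumption added only so that it can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch only: show_rp_filter is a hypothetical helper name.
# Prints the "all" and "default" rp_filter values in sysctl-like form.
# $1 (optional) is an alternative root directory; it defaults to /proc/sys.
show_rp_filter() {
    root=${1:-/proc/sys}
    for scope in all default; do
        f="$root/net/ipv4/conf/$scope/rp_filter"
        if [ -r "$f" ]; then
            printf 'net.ipv4.conf.%s.rp_filter = %s\n' "$scope" "$(cat "$f")"
        else
            printf 'net.ipv4.conf.%s.rp_filter = (unreadable)\n' "$scope"
        fi
    done
}

# Possible use at the top of the reproducer:
#   show_rp_filter
```

Reading the files under /proc/sys directly avoids depending on where the
`sysctl` binary lives, which already differs between the two reproducer
versions in this thread.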
* Re: In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
From: Florian Westphal @ 2021-10-07 9:31 UTC (permalink / raw)
To: Eugene Crosser; +Cc: Florian Westphal, netfilter-devel, Jinpu Wang
Eugene Crosser <crosser@average.org> wrote:
> On 06/10/2021 17:03, Florian Westphal wrote:
>
> > > It looks like Jinpu Wang <jinpu.wang@ionos.com> has found the offending
> > > commit, it's 09e856d54bda5f28 "vrf: Reset skb conntrack connection on VRF
> > > rcv" from Aug 15 2021.
> >
> > This change is very recent, you reported failure between 5.4 and 5.10, or was
> > that already backported?
> >
> > This change doesn't influence matching either, but it does zap the ct
> > zone association afaics.
>
> Yes, looks like it was backported to Debian/Ubuntu kernels
>
> Jinpu reported that reverting the change restores the "old" behaviour.
>
> But we have not yet checked how it affects SNAT.
Can you start a new thread on netdev and CC author of that commit
and l3m/vrf maintainers/authors?
I'm afraid you won't find anyone on the netfilter lists that can make
any statements on what the VRF expectations are.