* In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
@ 2021-10-01 15:19 Eugene Crosser
  2021-10-02 18:50 ` Florian Westphal
  0 siblings, 1 reply; 8+ messages in thread
From: Eugene Crosser @ 2021-10-01 15:19 UTC (permalink / raw)
  To: netfilter-devel


When the interface that you match on in the "raw prerouting" chain is enslaved
in a VRF, matching differs between kernel 5.4 and kernels 5.10 and later (I
have no systems to check the kernels in between).

On 5.4, the veth interface is matched and the zone is set accordingly. Then the
vrf interface is matched as well and, according to the trace, its rule is
executed, but the zone, once set, does not change.

On 5.10 and later, the rule that should match the veth interface _does not
appear in the trace_, even though the trace shows the veth as the `iif` at that
moment. Then the rule that matches the vrf interface is executed, and the
corresponding zone is set.

The reproducer script creates a veth pair with one end enslaved in a vrf, and
sends a packet to the unenslaved end of the veth. In the prerouting chain, there
are rules that set a different conntrack zone depending on which iif matched:
veth or vrf. As a result, entries are created in different zones when the
script runs on earlier and on later kernels. Here are the results (observe the
different zones); the script is below.

========
5.4.86-pserver
conntrack v1.4.5 (conntrack-tools): connection tracking table has been emptied.
PING 172.30.30.2 (172.30.30.2) from 172.30.30.1 vein: 56(84) bytes of data.
64 bytes from 172.30.30.2: icmp_seq=1 ttl=64 time=0.128 ms

--- 172.30.30.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.128/0.128/0.128/0.000 ms
icmp     1 30 src=172.30.30.1 dst=172.30.30.2 type=8 code=0 id=13818 [UNREPLIED]
src=172.30.30.2 dst=172.30.30.1 type=0 code=0 id=13818 mark=0 zone=1 use=1
conntrack v1.4.5 (conntrack-tools): 1 flow entries have been shown.

========
5.13.0-16-generic
conntrack v1.4.6 (conntrack-tools): connection tracking table has been emptied.
PING 172.30.30.2 (172.30.30.2) from 172.30.30.1 vein: 56(84) bytes of data.
64 bytes from 172.30.30.2: icmp_seq=1 ttl=64 time=0.117 ms

--- 172.30.30.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.117/0.117/0.117/0.000 ms
icmp     1 30 src=172.30.30.1 dst=172.30.30.2 type=8 code=0 id=104 [UNREPLIED]
src=172.30.30.2 dst=172.30.30.1 type=0 code=0 id=104 mark=0 zone=2 use=1
conntrack v1.4.6 (conntrack-tools): 1 flow entries have been shown.

========
#!/bin/sh

IPIN=172.30.30.1
IPOUT=172.30.30.2
PFXL=30

ip li sh vein >/dev/null 2>&1 && ip li del vein
ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf
nft list table testct >/dev/null 2>&1 && nft delete table testct

ip li add vein type veth peer veout
ip li add tvrf type vrf table 9876
ip li set veout master tvrf
ip li set vein up
ip li set veout up
ip li set tvrf up
sysctl -w net.ipv4.conf.veout.accept_local=1
ip addr add $IPIN/$PFXL dev vein
ip addr add $IPOUT/$PFXL dev veout

nft -f - <<__END__
table testct {
	chain rawpre {
		type filter hook prerouting priority raw;
	#	iif { veout, tvrf } meta nftrace set 1
		iif veout ct zone set 1 return
		iif tvrf ct zone set 2 return
		notrack
	}
	chain rawout {
		type filter hook output priority raw;
		notrack
	}
}
__END__

uname -r
conntrack -F
ping -W 1 -c 1 -I vein $IPOUT
conntrack -L

========
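
For comparing runs across kernels, the zone can be pulled out of the
`conntrack -L` output mechanically. The following is a minimal sketch, not
part of the reproducer above; the function name `extract_zones` is an
assumption made here for illustration. It relies only on the `zone=N` field
visible in the results above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): read `conntrack -L`
# style output on stdin and print the value of every zone=N field found,
# one per line. Lines without a zone= field produce no output.
extract_zones() {
	sed -n 's/.*zone=\([0-9][0-9]*\).*/\1/p'
}
```

With this function defined, `conntrack -L | extract_zones` in place of the
plain `conntrack -L` call would reduce each run to the zone number, which is
easier to compare between kernels than the full flow entry.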

Is this a known situation? Which behavior is "correct"?

Thank you,

Eugene


* Re: In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
  2021-10-01 15:19 In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf Eugene Crosser
@ 2021-10-02 18:50 ` Florian Westphal
  2021-10-06 12:11   ` Eugene Crosser
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2021-10-02 18:50 UTC (permalink / raw)
  To: Eugene Crosser; +Cc: netfilter-devel

Eugene Crosser <crosser@average.org> wrote:
> Is this a known situation? Which behavior is "correct"?

No idea, your reproducer gives this on my laptop:

 unshare -n bash repro.sh
net.ipv4.conf.veout.accept_local = 1
5.14.9-200.fc34.x86_64
conntrack v1.4.5 (conntrack-tools): connection tracking table has been emptied.
PING 172.30.30.2 (172.30.30.2) from 172.30.30.1 vein: 56(84) bytes of data.

--- 172.30.30.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

conntrack v1.4.5 (conntrack-tools): 0 flow entries have been shown.

A bisection is needed to figure out what introduced a change.

However, if this has already been changed for a few releases then we can't
revert it again.


* Re: In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
  2021-10-02 18:50 ` Florian Westphal
@ 2021-10-06 12:11   ` Eugene Crosser
  2021-10-06 14:48     ` Eugene Crosser
  2021-10-07  9:29     ` Florian Westphal
  0 siblings, 2 replies; 8+ messages in thread
From: Eugene Crosser @ 2021-10-06 12:11 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel


Hello Florian,

On 02/10/2021 20:50, Florian Westphal wrote:

> Eugene Crosser <crosser@average.org> wrote:
>> Is this a known situation? Which behavior is "correct"?
>
> No idea, your reproducer gives this on my laptop:
>
>  unshare -n bash repro.sh
> net.ipv4.conf.veout.accept_local = 1
> 5.14.9-200.fc34.x86_64
> conntrack v1.4.5 (conntrack-tools): connection tracking table has been emptied.
> PING 172.30.30.2 (172.30.30.2) from 172.30.30.1 vein: 56(84) bytes of data.
>
> --- 172.30.30.2 ping statistics ---
> 1 packets transmitted, 0 received, 100% packet loss, time 0ms
>
> conntrack v1.4.5 (conntrack-tools): 0 flow entries have been shown.

It would seem that you have an existing filter that drops packets and
prevents creation of conntrack entries? I can reproduce the behaviour on
freshly installed Debian and Ubuntu VMs without any modifications, with
and without `unshare`.

> A bisection is needed to figure out what introduced a change.
>
> However, if this has already been changed for a few releases then we can't
> revert it again.

I think that behaviour change is not benign though. If you have several
interfaces enslaved in one VRF (which is a normal configuration), you
can no longer create rules that depend on the specific interface from
which the packet arrived.

So far I was able to prove that it depends on the kernel version and
nothing else. I've installed Debian bullseye on a fresh VM, and upgraded
it to Debian sid. The VM now has two kernels: 5.10.0-8 and 5.14.0-2
(Debian builds). When booted with the older kernel, my reproducer shows
"correct" behaviour (rule matches the original veth); when booted with
the newer kernel, behaviour is altered (rule matches VRF instead).

I also updated the reproducer to write nftrace, and it looks
"interesting". I am including the new reproducer below, and I can send
nftrace files if needed.

Now I am trying to bisect upstream kernel.

Thanks.

==========
#!/bin/sh

IPIN=172.30.30.1
IPOUT=172.30.30.2
PFXL=30

ip li sh vein >/dev/null 2>&1 && ip li del vein
ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf
nft list table testct >/dev/null 2>&1 && nft delete table testct

ip li add vein type veth peer veout
ip li add tvrf type vrf table 9876
ip li set veout master tvrf
ip li set vein up
ip li set veout up
ip li set tvrf up
/sbin/sysctl -w net.ipv4.conf.veout.accept_local=1
ip addr add $IPIN/$PFXL dev vein
ip addr add $IPOUT/$PFXL dev veout

nft -f - <<__END__
table testct {
	chain rawpre {
		type filter hook prerouting priority raw;
		iif { veout, tvrf } meta nftrace set 1
		iif veout ct zone set 1 return
		iif tvrf ct zone set 2 return
		notrack
	}
	chain rawout {
		type filter hook output priority raw;
		notrack
	}
}
__END__

uname -rv
conntrack -F
stdbuf -o0 nft monitor trace >nftrace.`uname -r`.txt &
monpid=$!
ping -W 1 -c 1 -I vein $IPOUT
conntrack -L
sleep 1
kill -15 $monpid
wait
========



* Re: In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
  2021-10-06 12:11   ` Eugene Crosser
@ 2021-10-06 14:48     ` Eugene Crosser
  2021-10-06 15:03       ` Florian Westphal
  2021-10-07  9:29     ` Florian Westphal
  1 sibling, 1 reply; 8+ messages in thread
From: Eugene Crosser @ 2021-10-06 14:48 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, Jinpu Wang


> Now I am trying to bisect upstream kernel.

It looks like Jinpu Wang <jinpu.wang@ionos.com> has found the offending 
commit, it's 09e856d54bda5f28 "vrf: Reset skb conntrack connection on 
VRF rcv" from Aug 15 2021.

Regards,

Eugene


* Re: In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
  2021-10-06 14:48     ` Eugene Crosser
@ 2021-10-06 15:03       ` Florian Westphal
  2021-10-06 15:09         ` Eugene Crosser
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2021-10-06 15:03 UTC (permalink / raw)
  To: Eugene Crosser; +Cc: Florian Westphal, netfilter-devel, Jinpu Wang

Eugene Crosser <crosser@average.org> wrote:
> > Now I am trying to bisect upstream kernel.
> 
> It looks like Jinpu Wang <jinpu.wang@ionos.com> has found the offending
> commit, it's 09e856d54bda5f28 "vrf: Reset skb conntrack connection on VRF
> rcv" from Aug 15 2021.

This change is very recent, you reported failure between 5.4 and 5.10, or was
that already backported?

This change doesn't influence matching either, but it does zap the ct
zone association afaics.



* Re: In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
  2021-10-06 15:03       ` Florian Westphal
@ 2021-10-06 15:09         ` Eugene Crosser
  2021-10-07  9:31           ` Florian Westphal
  0 siblings, 1 reply; 8+ messages in thread
From: Eugene Crosser @ 2021-10-06 15:09 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, Jinpu Wang


On 06/10/2021 17:03, Florian Westphal wrote:

>> It looks like Jinpu Wang <jinpu.wang@ionos.com> has found the offending
>> commit, it's 09e856d54bda5f28 "vrf: Reset skb conntrack connection on VRF
>> rcv" from Aug 15 2021.
> 
> This change is very recent, you reported failure between 5.4 and 5.10, or was
> that already backported?
> 
> This change doesn't influence matching either, but it does zap the ct
> zone association afaics.

Yes, looks like it was backported to Debian/Ubuntu kernels

Jinpu reported that reverting the change restores the "old" behaviour.

But we have not yet checked how it affects SNAT.
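
Whether a given kernel tree carries the change can be checked by searching
its history for the commit subject, since a backport normally has a different
hash from the upstream commit. The following is a minimal sketch, assuming a
local git checkout of the kernel tree in question; the function name
`find_commit_by_subject` and its arguments are hypothetical, and a distro
kernel may also carry patches that are not visible in such a checkout.

```shell
#!/bin/sh
# Hypothetical helper: list commits reachable from REF (default HEAD) in the
# git repository REPO whose message contains SUBJECT as a fixed string.
# Prints "<short-hash> <subject>" per match, or nothing if none is found.
find_commit_by_subject() {
	repo=$1
	subject=$2
	ref=${3:-HEAD}
	git -C "$repo" log --format='%h %s' --fixed-strings --grep="$subject" "$ref"
}
```

For example, `find_commit_by_subject /path/to/linux 'vrf: Reset skb conntrack
connection on VRF rcv'` (the path is a placeholder) prints one line per
matching commit; empty output means the subject was not found in that history.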



* Re: In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
  2021-10-06 12:11   ` Eugene Crosser
  2021-10-06 14:48     ` Eugene Crosser
@ 2021-10-07  9:29     ` Florian Westphal
  1 sibling, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2021-10-07  9:29 UTC (permalink / raw)
  To: Eugene Crosser; +Cc: Florian Westphal, netfilter-devel

Eugene Crosser <crosser@average.org> wrote:
> It would seem that you have an existing filter that drops packets and
> prevents creation of conntrack entries? I can reproduce the behaviour on
> freshly installed Debian and Ubuntu VMs without any modifications, with and
> without `unshare`.

FWIW, this was due to different default setting of rp_filter.
Adding
sysctl net.ipv4.conf.all.rp_filter=0
sysctl net.ipv4.conf.default.rp_filter=0

to start of script makes it work on my side too.
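
Since the outcome depends on `rp_filter`, it can help to see where it is
enabled before running the reproducer. The following is a minimal sketch,
assuming the standard procfs layout; the function name `list_rp_filter` and
its optional directory argument are hypothetical, added only so the function
can be pointed at a test directory. Keep in mind that the kernel uses the
maximum of the conf/all and conf/<interface> values for source validation, so
a non-zero `all` entry affects every interface.

```shell
#!/bin/sh
# Hypothetical helper: print "name=value" for every entry under the given
# conf directory whose rp_filter setting is non-zero. The directory defaults
# to the real procfs location when no argument is given.
list_rp_filter() {
	conf_dir=${1:-/proc/sys/net/ipv4/conf}
	for f in "$conf_dir"/*/rp_filter; do
		[ -r "$f" ] || continue        # skip unreadable or unmatched glob
		val=$(cat "$f")
		[ "$val" != "0" ] || continue  # only report enabled entries
		name=${f%/rp_filter}
		printf '%s=%s\n' "${name##*/}" "$val"
	done
}
```

Calling `list_rp_filter` with no argument inspects the running system; empty
output means reverse path filtering is off everywhere.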


* Re: In raw prerouting, `iif` matches different interfaces in different kernels when enslaved in a vrf
  2021-10-06 15:09         ` Eugene Crosser
@ 2021-10-07  9:31           ` Florian Westphal
  0 siblings, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2021-10-07  9:31 UTC (permalink / raw)
  To: Eugene Crosser; +Cc: Florian Westphal, netfilter-devel, Jinpu Wang

Eugene Crosser <crosser@average.org> wrote:
> On 06/10/2021 17:03, Florian Westphal wrote:
> 
> > > It looks like Jinpu Wang <jinpu.wang@ionos.com> has found the offending
> > > commit, it's 09e856d54bda5f28 "vrf: Reset skb conntrack connection on VRF
> > > rcv" from Aug 15 2021.
> > 
> > This change is very recent, you reported failure between 5.4 and 5.10, or was
> > that already backported?
> > 
> > This change doesn't influence matching either, but it does zap the ct
> > zone association afaics.
> 
> Yes, looks like it was backported to Debian/Ubuntu kernels
> 
> Jinpu reported that reverting the change restores the "old" behaviour.
> 
> But we have not yet checked how it affects SNAT.

Can you start a new thread on netdev and CC author of that commit
and l3m/vrf maintainers/authors?

I'm afraid you won't find anyone on the netfilter lists that can make
any statements on what the VRF expectations are.
