All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
@ 2021-08-09  7:04 Hangbin Liu
  2021-08-09  8:35 ` Roi Dayan
  2021-08-09 10:00 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 10+ messages in thread
From: Hangbin Liu @ 2021-08-09  7:04 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Jakub Kicinski, Marcelo Ricardo Leitner, Alaa Hleihel,
	Davide Caratti, Aaron Conole, Hangbin Liu, Roi Dayan

When mirror/redirect a skb to a different port, the ct info should be reset
for reclassification. Or the pkts will match unexpected rules. For example,
with following topology and commands:

    -----------
              |
       veth0 -+-------
              |
       veth1 -+-------
              |
   ------------

 tc qdisc add dev veth0 clsact
 # The same with "action mirred egress mirror dev veth1" or "action mirred ingress redirect dev veth1"
 tc filter add dev veth0 egress chain 1 protocol ip flower ct_state +trk action mirred ingress mirror dev veth1
 tc filter add dev veth0 egress chain 0 protocol ip flower ct_state -inv action ct commit action goto chain 1
 tc qdisc add dev veth1 clsact
 tc filter add dev veth1 ingress chain 0 protocol ip flower ct_state +trk action drop

 ping <remove ip via veth0> &
 tc -s filter show dev veth1 ingress

With command 'tc -s filter show', we can find the pkts were dropped on
veth1.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/sched/act_mirred.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 7153c67f641e..2ef4cd2c848b 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -273,6 +273,9 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
 			goto out;
 	}
 
+	/* All mirred/redirected skbs should clear previous ct info */
+	nf_reset_ct(skb2);
+
 	want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
 
 	expects_nh = want_ingress || !m_mac_header_xmit;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
  2021-08-09  7:04 [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb Hangbin Liu
@ 2021-08-09  8:35 ` Roi Dayan
  2021-08-09 10:00 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 10+ messages in thread
From: Roi Dayan @ 2021-08-09  8:35 UTC (permalink / raw)
  To: Hangbin Liu, netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Jakub Kicinski, Marcelo Ricardo Leitner, Alaa Hleihel,
	Davide Caratti, Aaron Conole



On 2021-08-09 10:04 AM, Hangbin Liu wrote:
> When mirror/redirect a skb to a different port, the ct info should be reset
> for reclassification. Or the pkts will match unexpected rules. For example,
> with following topology and commands:
> 
>      -----------
>                |
>         veth0 -+-------
>                |
>         veth1 -+-------
>                |
>     ------------
> 
>   tc qdisc add dev veth0 clsact
>   # The same with "action mirred egress mirror dev veth1" or "action mirred ingress redirect dev veth1"
>   tc filter add dev veth0 egress chain 1 protocol ip flower ct_state +trk action mirred ingress mirror dev veth1
>   tc filter add dev veth0 egress chain 0 protocol ip flower ct_state -inv action ct commit action goto chain 1
>   tc qdisc add dev veth1 clsact
>   tc filter add dev veth1 ingress chain 0 protocol ip flower ct_state +trk action drop
> 
>   ping <remove ip via veth0> &
>   tc -s filter show dev veth1 ingress
> 
> With command 'tc -s filter show', we can find the pkts were dropped on
> veth1.
> 

We can add a fixes line on the commit added ct support
Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")

> Signed-off-by: Roi Dayan <roid@nvidia.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>   net/sched/act_mirred.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 7153c67f641e..2ef4cd2c848b 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -273,6 +273,9 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
>   			goto out;
>   	}
>   
> +	/* All mirred/redirected skbs should clear previous ct info */
> +	nf_reset_ct(skb2);
> +
>   	want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
>   
>   	expects_nh = want_ingress || !m_mac_header_xmit;
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
  2021-08-09  7:04 [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb Hangbin Liu
  2021-08-09  8:35 ` Roi Dayan
@ 2021-08-09 10:00 ` patchwork-bot+netdevbpf
  2022-04-19 16:50   ` Eyal Birger
  1 sibling, 1 reply; 10+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-08-09 10:00 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, jhs, xiyou.wangcong, jiri, davem, kuba, mleitner,
	ahleihel, dcaratti, aconole, roid

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Mon,  9 Aug 2021 15:04:55 +0800 you wrote:
> When mirror/redirect a skb to a different port, the ct info should be reset
> for reclassification. Or the pkts will match unexpected rules. For example,
> with following topology and commands:
> 
>     -----------
>               |
>        veth0 -+-------
>               |
>        veth1 -+-------
>               |
> 
> [...]

Here is the summary with links:
  - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
    https://git.kernel.org/netdev/net/c/d09c548dbf3b

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
  2021-08-09 10:00 ` patchwork-bot+netdevbpf
@ 2022-04-19 16:50   ` Eyal Birger
  2022-04-19 17:26     ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 10+ messages in thread
From: Eyal Birger @ 2022-04-19 16:50 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, jhs, xiyou.wangcong, jiri, davem, kuba, mleitner,
	ahleihel, dcaratti, aconole, roid, Shmulik Ladkani

Hi,

On Mon, Aug 9, 2021 at 1:29 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
>
> Hello:
>
> This patch was applied to netdev/net.git (refs/heads/master):
>
> On Mon,  9 Aug 2021 15:04:55 +0800 you wrote:
> > When mirror/redirect a skb to a different port, the ct info should be reset
> > for reclassification. Or the pkts will match unexpected rules. For example,
> > with following topology and commands:
> >
> >     -----------
> >               |
> >        veth0 -+-------
> >               |
> >        veth1 -+-------
> >               |
> >
> > [...]
>
> Here is the summary with links:
>   - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
>     https://git.kernel.org/netdev/net/c/d09c548dbf3b

Unfortunately this commit breaks DNAT when performed before going via mirred
egress->ingress.

The reason is that connection tracking is lost and therefore a new state
is created on ingress.

This breaks existing setups.

See below a simplified script reproducing this issue.

Therefore I suggest this commit be reverted and a knob is introduced to mirred
for clearing ct as needed.

Eyal.

Reproduction script:

#!/bin/bash

ip netns add a
ip netns add b

ip netns exec a sysctl -w net.ipv4.conf.all.forwarding=1
ip netns exec a sysctl -w net.ipv4.conf.all.accept_local=1

ip link add veth0 netns a type veth peer name veth0 netns b
ip -net a link set veth0 up
ip -net a addr add dev veth0 198.51.100.1/30

ip -net a link add dum0 type dummy
ip -net a link set dev dum0 up
ip -net a addr add dev dum0 198.51.100.2/32

ip netns exec a iptables -t nat -I OUTPUT -d 10.0.0.1 -j DNAT
--to-destination 10.0.0.2
ip -net a route add default dev dum0
ip -net a rule add pref 50 iif dum0 lookup 1000
ip -net a route add table 1000 default dev veth0

ip netns exec a tc qdisc add dev dum0 clsact
ip netns exec a tc filter add dev dum0 parent ffff:fff3 prio 50 basic
action mirred ingress redirect dev dum0

ip -net b link set veth0 up
ip  -net b addr add 10.0.0.2/32 dev veth0
ip  -net b addr add 198.51.100.3/30 dev veth0

ip netns exec a ping 10.0.0.1
>
> You are awesome, thank you!
> --
> Deet-doot-dot, I am a bot.
> https://korg.docs.kernel.org/patchwork/pwbot.html
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
  2022-04-19 16:50   ` Eyal Birger
@ 2022-04-19 17:26     ` Marcelo Ricardo Leitner
  2022-04-19 18:14       ` Eyal Birger
  0 siblings, 1 reply; 10+ messages in thread
From: Marcelo Ricardo Leitner @ 2022-04-19 17:26 UTC (permalink / raw)
  To: Eyal Birger
  Cc: Hangbin Liu, netdev, jhs, xiyou.wangcong, jiri, davem, kuba,
	ahleihel, dcaratti, aconole, roid, Shmulik Ladkani

Hi,

On Tue, Apr 19, 2022 at 07:50:38PM +0300, Eyal Birger wrote:
> Hi,
>
> On Mon, Aug 9, 2021 at 1:29 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
> >
> > Hello:
> >
> > This patch was applied to netdev/net.git (refs/heads/master):
> >
> > On Mon,  9 Aug 2021 15:04:55 +0800 you wrote:
> > > When mirror/redirect a skb to a different port, the ct info should be reset
> > > for reclassification. Or the pkts will match unexpected rules. For example,
> > > with following topology and commands:
> > >
> > >     -----------
> > >               |
> > >        veth0 -+-------
> > >               |
> > >        veth1 -+-------
> > >               |
> > >
> > > [...]
> >
> > Here is the summary with links:
> >   - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
> >     https://git.kernel.org/netdev/net/c/d09c548dbf3b
>
> Unfortunately this commit breaks DNAT when performed before going via mirred
> egress->ingress.
>
> The reason is that connection tracking is lost and therefore a new state
> is created on ingress.
>
> This breaks existing setups.
>
> See below a simplified script reproducing this issue.

I guess I can understand why the reproducer triggers it, but I fail to
see the actual use case you have behind it. Can you please elaborate
on it?

>
> Therefore I suggest this commit be reverted and a knob is introduced to mirred
> for clearing ct as needed.
>
> Eyal.
>
> Reproduction script:
>
> #!/bin/bash
>
> ip netns add a
> ip netns add b
>
> ip netns exec a sysctl -w net.ipv4.conf.all.forwarding=1
> ip netns exec a sysctl -w net.ipv4.conf.all.accept_local=1
>
> ip link add veth0 netns a type veth peer name veth0 netns b
> ip -net a link set veth0 up
> ip -net a addr add dev veth0 198.51.100.1/30
>
> ip -net a link add dum0 type dummy
> ip -net a link set dev dum0 up
> ip -net a addr add dev dum0 198.51.100.2/32
>
> ip netns exec a iptables -t nat -I OUTPUT -d 10.0.0.1 -j DNAT
> --to-destination 10.0.0.2
> ip -net a route add default dev dum0
> ip -net a rule add pref 50 iif dum0 lookup 1000
> ip -net a route add table 1000 default dev veth0
>
> ip netns exec a tc qdisc add dev dum0 clsact
> ip netns exec a tc filter add dev dum0 parent ffff:fff3 prio 50 basic
> action mirred ingress redirect dev dum0
>
> ip -net b link set veth0 up
> ip  -net b addr add 10.0.0.2/32 dev veth0
> ip  -net b addr add 198.51.100.3/30 dev veth0
>
> ip netns exec a ping 10.0.0.1
> >
> > You are awesome, thank you!
> > --
> > Deet-doot-dot, I am a bot.
> > https://korg.docs.kernel.org/patchwork/pwbot.html
> >
> >
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
  2022-04-19 17:26     ` Marcelo Ricardo Leitner
@ 2022-04-19 18:14       ` Eyal Birger
  2022-04-20 10:07         ` Hangbin Liu
  2022-04-21 11:00         ` Hangbin Liu
  0 siblings, 2 replies; 10+ messages in thread
From: Eyal Birger @ 2022-04-19 18:14 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Hangbin Liu, netdev, jhs, xiyou.wangcong, jiri, davem, kuba,
	ahleihel, dcaratti, aconole, roid, Shmulik Ladkani

Hi,

On Tue, Apr 19, 2022 at 8:26 PM Marcelo Ricardo Leitner
<mleitner@redhat.com> wrote:
>
> Hi,
>
> On Tue, Apr 19, 2022 at 07:50:38PM +0300, Eyal Birger wrote:
> > Hi,
> >
> > On Mon, Aug 9, 2021 at 1:29 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
> > >
> > > Hello:
> > >
> > > This patch was applied to netdev/net.git (refs/heads/master):
> > >
> > > On Mon,  9 Aug 2021 15:04:55 +0800 you wrote:
> > > > When mirror/redirect a skb to a different port, the ct info should be reset
> > > > for reclassification. Or the pkts will match unexpected rules. For example,
> > > > with following topology and commands:
> > > >
> > > >     -----------
> > > >               |
> > > >        veth0 -+-------
> > > >               |
> > > >        veth1 -+-------
> > > >               |
> > > >
> > > > [...]
> > >
> > > Here is the summary with links:
> > >   - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
> > >     https://git.kernel.org/netdev/net/c/d09c548dbf3b
> >
> > Unfortunately this commit breaks DNAT when performed before going via mirred
> > egress->ingress.
> >
> > The reason is that connection tracking is lost and therefore a new state
> > is created on ingress.
> >
> > This breaks existing setups.
> >
> > See below a simplified script reproducing this issue.
>
> I guess I can understand why the reproducer triggers it, but I fail to
> see the actual use case you have behind it. Can you please elaborate
> on it?

One use case we use mirred egress->ingress redirect for is when we want to
reroute a packet after applying some change to the packet which would affect
its routing. for example consider a bpf program running on tc ingress (after
mirred) setting the skb->mark based on some criteria.

So you have something like:

packet routed to dummy device based on some criteria ->
  mirred redirect to ingress ->
    classification by ebpf logic at tc ingress ->
       packet routed again

We have a setup where DNAT is performed before this flow in that case the
ebpf logic needs to see the packet after the NAT.

Eyal.

>
> >
> > Therefore I suggest this commit be reverted and a knob is introduced to mirred
> > for clearing ct as needed.
> >
> > Eyal.
> >
> > Reproduction script:
> >
> > #!/bin/bash
> >
> > ip netns add a
> > ip netns add b
> >
> > ip netns exec a sysctl -w net.ipv4.conf.all.forwarding=1
> > ip netns exec a sysctl -w net.ipv4.conf.all.accept_local=1
> >
> > ip link add veth0 netns a type veth peer name veth0 netns b
> > ip -net a link set veth0 up
> > ip -net a addr add dev veth0 198.51.100.1/30
> >
> > ip -net a link add dum0 type dummy
> > ip -net a link set dev dum0 up
> > ip -net a addr add dev dum0 198.51.100.2/32
> >
> > ip netns exec a iptables -t nat -I OUTPUT -d 10.0.0.1 -j DNAT
> > --to-destination 10.0.0.2
> > ip -net a route add default dev dum0
> > ip -net a rule add pref 50 iif dum0 lookup 1000
> > ip -net a route add table 1000 default dev veth0
> >
> > ip netns exec a tc qdisc add dev dum0 clsact
> > ip netns exec a tc filter add dev dum0 parent ffff:fff3 prio 50 basic
> > action mirred ingress redirect dev dum0
> >
> > ip -net b link set veth0 up
> > ip  -net b addr add 10.0.0.2/32 dev veth0
> > ip  -net b addr add 198.51.100.3/30 dev veth0
> >
> > ip netns exec a ping 10.0.0.1
> > >
> > > You are awesome, thank you!
> > > --
> > > Deet-doot-dot, I am a bot.
> > > https://korg.docs.kernel.org/patchwork/pwbot.html
> > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
  2022-04-19 18:14       ` Eyal Birger
@ 2022-04-20 10:07         ` Hangbin Liu
  2022-04-21 11:00         ` Hangbin Liu
  1 sibling, 0 replies; 10+ messages in thread
From: Hangbin Liu @ 2022-04-20 10:07 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Eyal Birger, netdev, jhs, xiyou.wangcong, jiri, davem, kuba,
	ahleihel, dcaratti, aconole, roid, Shmulik Ladkani

On Tue, Apr 19, 2022 at 09:14:38PM +0300, Eyal Birger wrote:
> >
> > I guess I can understand why the reproducer triggers it, but I fail to
> > see the actual use case you have behind it. Can you please elaborate
> > on it?
> 
> One use case we use mirred egress->ingress redirect for is when we want to
> reroute a packet after applying some change to the packet which would affect
> its routing. for example consider a bpf program running on tc ingress (after
> mirred) setting the skb->mark based on some criteria.
> 
> So you have something like:
> 
> packet routed to dummy device based on some criteria ->
>   mirred redirect to ingress ->
>     classification by ebpf logic at tc ingress ->
>        packet routed again
> 
> We have a setup where DNAT is performed before this flow in that case the
> ebpf logic needs to see the packet after the NAT.

Hi Marcelo,

Thanks for taking care of this. Would you help following up this issue as
you are more familiar with net sched?

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
  2022-04-19 18:14       ` Eyal Birger
  2022-04-20 10:07         ` Hangbin Liu
@ 2022-04-21 11:00         ` Hangbin Liu
  2022-04-22 13:41           ` Marcelo Ricardo Leitner
  1 sibling, 1 reply; 10+ messages in thread
From: Hangbin Liu @ 2022-04-21 11:00 UTC (permalink / raw)
  To: Eyal Birger
  Cc: Marcelo Ricardo Leitner, netdev, jhs, xiyou.wangcong, jiri,
	davem, kuba, ahleihel, dcaratti, aconole, roid, Shmulik Ladkani

Hi Eyal,
On Tue, Apr 19, 2022 at 09:14:38PM +0300, Eyal Birger wrote:
> > > > On Mon,  9 Aug 2021 15:04:55 +0800 you wrote:
> > > > > When mirror/redirect a skb to a different port, the ct info should be reset
> > > > > for reclassification. Or the pkts will match unexpected rules. For example,
> > > > > with following topology and commands:
> > > > >
> > > > >     -----------
> > > > >               |
> > > > >        veth0 -+-------
> > > > >               |
> > > > >        veth1 -+-------
> > > > >               |
> > > > >
> > > > > [...]
> > > >
> > > > Here is the summary with links:
> > > >   - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
> > > >     https://git.kernel.org/netdev/net/c/d09c548dbf3b
> > >
> > > Unfortunately this commit breaks DNAT when performed before going via mirred
> > > egress->ingress.
> > >
> > > The reason is that connection tracking is lost and therefore a new state
> > > is created on ingress.
> > >
> > > This breaks existing setups.
> > >
> > > See below a simplified script reproducing this issue.

I think we come in to a paradox state. Some user don't want to have previous
ct info after mirror, while others would like to keep. In my understanding,
when we receive a pkt from a interface, the skb should be clean and no ct info
at first. But I may wrong.

Jamal, Wang Cong, Jiri, do you have any comments?

> >
> > I guess I can understand why the reproducer triggers it, but I fail to
> > see the actual use case you have behind it. Can you please elaborate
> > on it?
> 
> One use case we use mirred egress->ingress redirect for is when we want to
> reroute a packet after applying some change to the packet which would affect
> its routing. for example consider a bpf program running on tc ingress (after
> mirred) setting the skb->mark based on some criteria.
> 
> So you have something like:
> 
> packet routed to dummy device based on some criteria ->
>   mirred redirect to ingress ->
>     classification by ebpf logic at tc ingress ->
>        packet routed again
> 
> We have a setup where DNAT is performed before this flow in that case the
> ebpf logic needs to see the packet after the NAT.

Is it possible to check whether it's need to set the skb->mark before DNAT?
So we can update it before egress and no need to re-route.

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
  2022-04-21 11:00         ` Hangbin Liu
@ 2022-04-22 13:41           ` Marcelo Ricardo Leitner
  2022-04-23 16:08             ` Eyal Birger
  0 siblings, 1 reply; 10+ messages in thread
From: Marcelo Ricardo Leitner @ 2022-04-22 13:41 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: Eyal Birger, netdev, jhs, xiyou.wangcong, jiri, davem, kuba,
	ahleihel, dcaratti, aconole, roid, Shmulik Ladkani

On Thu, Apr 21, 2022 at 07:00:07PM +0800, Hangbin Liu wrote:
> Hi Eyal,
> On Tue, Apr 19, 2022 at 09:14:38PM +0300, Eyal Birger wrote:
> > > > > On Mon,  9 Aug 2021 15:04:55 +0800 you wrote:
> > > > > > When mirror/redirect a skb to a different port, the ct info should be reset
> > > > > > for reclassification. Or the pkts will match unexpected rules. For example,
> > > > > > with following topology and commands:
> > > > > >
> > > > > >     -----------
> > > > > >               |
> > > > > >        veth0 -+-------
> > > > > >               |
> > > > > >        veth1 -+-------
> > > > > >               |
> > > > > >
> > > > > > [...]
> > > > >
> > > > > Here is the summary with links:
> > > > >   - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
> > > > >     https://git.kernel.org/netdev/net/c/d09c548dbf3b
> > > >
> > > > Unfortunately this commit breaks DNAT when performed before going via mirred
> > > > egress->ingress.
> > > >
> > > > The reason is that connection tracking is lost and therefore a new state
> > > > is created on ingress.
> > > >
> > > > This breaks existing setups.
> > > >
> > > > See below a simplified script reproducing this issue.
>
> I think we come in to a paradox state. Some user don't want to have previous
> ct info after mirror, while others would like to keep. In my understanding,
> when we receive a pkt from a interface, the skb should be clean and no ct info
> at first. But I may wrong.

Makes sense to me. Moreover, there were a couple of fixes on this on
mirred around that time frame/area (like f799ada6bf23 ("net: sched:
act_mirred: drop dst for the direction from egress to ingress")). That's
because we are seeing that mirred xmit action when switching to
ingress direction should be as close skb_scrub_packet. OVS needs this
scrubbing as well, btw. This ct information could be easily stale if
there were other packet changes after it.

Point being, if we really need the knob for backwards compatibility
here, it may have to be a broader one.

>
> Jamal, Wang Cong, Jiri, do you have any comments?
>
> > >
> > > I guess I can understand why the reproducer triggers it, but I fail to
> > > see the actual use case you have behind it. Can you please elaborate
> > > on it?
> >
> > One use case we use mirred egress->ingress redirect for is when we want to
> > reroute a packet after applying some change to the packet which would affect
> > its routing. for example consider a bpf program running on tc ingress (after
> > mirred) setting the skb->mark based on some criteria.
> >
> > So you have something like:
> >
> > packet routed to dummy device based on some criteria ->
> >   mirred redirect to ingress ->
> >     classification by ebpf logic at tc ingress ->
> >        packet routed again
> >
> > We have a setup where DNAT is performed before this flow in that case the
> > ebpf logic needs to see the packet after the NAT.
>
> Is it possible to check whether it's need to set the skb->mark before DNAT?
> So we can update it before egress and no need to re-route.
>
> Thanks
> Hangbin
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
  2022-04-22 13:41           ` Marcelo Ricardo Leitner
@ 2022-04-23 16:08             ` Eyal Birger
  0 siblings, 0 replies; 10+ messages in thread
From: Eyal Birger @ 2022-04-23 16:08 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Hangbin Liu, netdev, jhs, xiyou.wangcong, jiri, davem, kuba,
	ahleihel, dcaratti, aconole, roid, Shmulik Ladkani

On Fri, Apr 22, 2022 at 4:41 PM Marcelo Ricardo Leitner
<mleitner@redhat.com> wrote:
>
> On Thu, Apr 21, 2022 at 07:00:07PM +0800, Hangbin Liu wrote:
> > Hi Eyal,
> > On Tue, Apr 19, 2022 at 09:14:38PM +0300, Eyal Birger wrote:
> > > > > > On Mon,  9 Aug 2021 15:04:55 +0800 you wrote:
> > > > > > > When mirror/redirect a skb to a different port, the ct info should be reset
> > > > > > > for reclassification. Or the pkts will match unexpected rules. For example,
> > > > > > > with following topology and commands:
> > > > > > >
> > > > > > >     -----------
> > > > > > >               |
> > > > > > >        veth0 -+-------
> > > > > > >               |
> > > > > > >        veth1 -+-------
> > > > > > >               |
> > > > > > >
> > > > > > > [...]
> > > > > >
> > > > > > Here is the summary with links:
> > > > > >   - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
> > > > > >     https://git.kernel.org/netdev/net/c/d09c548dbf3b
> > > > >
> > > > > Unfortunately this commit breaks DNAT when performed before going via mirred
> > > > > egress->ingress.
> > > > >
> > > > > The reason is that connection tracking is lost and therefore a new state
> > > > > is created on ingress.
> > > > >
> > > > > This breaks existing setups.
> > > > >
> > > > > See below a simplified script reproducing this issue.
> >
> > I think we come in to a paradox state. Some user don't want to have previous
> > ct info after mirror, while others would like to keep. In my understanding,
> > when we receive a pkt from a interface, the skb should be clean and no ct info
> > at first. But I may wrong.
>
> Makes sense to me. Moreover, there were a couple of fixes on this on
> mirred around that time frame/area (like f799ada6bf23 ("net: sched:
> act_mirred: drop dst for the direction from egress to ingress")). That's
> because we are seeing that mirred xmit action when switching to
> ingress direction should be as close skb_scrub_packet. OVS needs this
> scrubbing as well, btw. This ct information could be easily stale if
> there were other packet changes after it.

Makes sense to me too. The main reason for bringing this up was that it's a
subtle change and wasn't trivial to figure out.

>
> Point being, if we really need the knob for backwards compatibility
> here, it may have to be a broader one.

FWIW the dst change was ok in our setups.

>
> >
> > Jamal, Wang Cong, Jiri, do you have any comments?
> >
> > > >
> > > > I guess I can understand why the reproducer triggers it, but I fail to
> > > > see the actual use case you have behind it. Can you please elaborate
> > > > on it?
> > >
> > > One use case we use mirred egress->ingress redirect for is when we want to
> > > reroute a packet after applying some change to the packet which would affect
> > > its routing. for example consider a bpf program running on tc ingress (after
> > > mirred) setting the skb->mark based on some criteria.
> > >
> > > So you have something like:
> > >
> > > packet routed to dummy device based on some criteria ->
> > >   mirred redirect to ingress ->
> > >     classification by ebpf logic at tc ingress ->
> > >        packet routed again
> > >
> > > We have a setup where DNAT is performed before this flow in that case the
> > > ebpf logic needs to see the packet after the NAT.
> >
> > Is it possible to check whether it's need to set the skb->mark before DNAT?
> > So we can update it before egress and no need to re-route.

For future reference, we worked around this issue by moving some of the
relevant ebpf functionality to the lwt output hook which allows classification
and rerouting.

Eyal.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-04-23 16:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-09  7:04 [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb Hangbin Liu
2021-08-09  8:35 ` Roi Dayan
2021-08-09 10:00 ` patchwork-bot+netdevbpf
2022-04-19 16:50   ` Eyal Birger
2022-04-19 17:26     ` Marcelo Ricardo Leitner
2022-04-19 18:14       ` Eyal Birger
2022-04-20 10:07         ` Hangbin Liu
2022-04-21 11:00         ` Hangbin Liu
2022-04-22 13:41           ` Marcelo Ricardo Leitner
2022-04-23 16:08             ` Eyal Birger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.