All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 1/1] net: openvswitch: Fix ct_state nat flags for conns arriving from tc
@ 2022-01-04  8:28 Paul Blakey
  2022-01-04 18:08 ` Jakub Kicinski
  2022-01-05 14:57 ` Jamal Hadi Salim
  0 siblings, 2 replies; 7+ messages in thread
From: Paul Blakey @ 2022-01-04  8:28 UTC (permalink / raw)
  To: Paul Blakey, dev, netdev, Cong Wang, Jamal Hadi Salim,
	Pravin B Shelar, davem, Jiri Pirko, Jakub Kicinski
  Cc: Saeed Mahameed, Oz Shlomo, Vlad Buslov, Roi Dayan

Netfilter conntrack maintains NAT flags per connection indicating
whether NAT was configured for the connection. Openvswitch maintains
NAT flags on the per packet flow key ct_state field, indicating
whether NAT was actually executed on the packet.

When a packet misses from tc to ovs the conntrack NAT flags are set.
However, NAT was not necessarily executed on the packet because the
connection's state might still be in NEW state. As such, openvswitch wrongly
assumes that NAT was executed and sets an incorrect flow key NAT flags.

Fix this, by flagging to openvswitch which NAT was actually done in
act_ct via tc_skb_ext and tc_skb_cb to the openvswitch module, so
the packet flow key NAT flags will be correctly set.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
---
 include/linux/skbuff.h  |  4 +++-
 include/net/pkt_sched.h |  4 +++-
 net/openvswitch/flow.c  | 16 +++++++++++++---
 net/sched/act_ct.c      |  6 ++++++
 net/sched/cls_api.c     |  2 ++
 5 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4507d77d6941..bab45a009310 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -287,7 +287,9 @@ struct tc_skb_ext {
 	__u32 chain;
 	__u16 mru;
 	__u16 zone;
-	bool post_ct;
+	bool post_ct:1;
+	bool post_ct_snat:1;
+	bool post_ct_dnat:1;
 };
 #endif
 
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 9e71691c491b..a171dfa91910 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -197,7 +197,9 @@ struct tc_skb_cb {
 	struct qdisc_skb_cb qdisc_cb;
 
 	u16 mru;
-	bool post_ct;
+	bool post_ct: 1;
+	bool post_ct_snat:1;
+	bool post_ct_dnat:1;
 	u16 zone; /* Only valid if post_ct = true */
 };
 
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 6d262d9aa10e..02096f2ec678 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -859,7 +859,7 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
 	struct tc_skb_ext *tc_ext;
 #endif
-	bool post_ct = false;
+	bool post_ct = false, post_ct_snat = false, post_ct_dnat = false;
 	int res, err;
 	u16 zone = 0;
 
@@ -900,6 +900,8 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 		key->recirc_id = tc_ext ? tc_ext->chain : 0;
 		OVS_CB(skb)->mru = tc_ext ? tc_ext->mru : 0;
 		post_ct = tc_ext ? tc_ext->post_ct : false;
+		post_ct_snat = post_ct ? tc_ext->post_ct_snat : false;
+		post_ct_dnat = post_ct ? tc_ext->post_ct_dnat : false;
 		zone = post_ct ? tc_ext->zone : 0;
 	} else {
 		key->recirc_id = 0;
@@ -911,8 +913,16 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 	err = key_extract(skb, key);
 	if (!err) {
 		ovs_ct_fill_key(skb, key, post_ct);   /* Must be after key_extract(). */
-		if (post_ct && !skb_get_nfct(skb))
-			key->ct_zone = zone;
+		if (post_ct) {
+			if (!skb_get_nfct(skb)) {
+				key->ct_zone = zone;
+			} else {
+				if (!post_ct_dnat)
+					key->ct_state &= ~OVS_CS_F_DST_NAT;
+				if (!post_ct_snat)
+					key->ct_state &= ~OVS_CS_F_SRC_NAT;
+			}
+		}
 	}
 	return err;
 }
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index ab3591408419..2a17eb77c904 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -839,6 +839,12 @@ static int ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
 	}
 
 	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
+	if (err == NF_ACCEPT) {
+		if (maniptype == NF_NAT_MANIP_SRC)
+			tc_skb_cb(skb)->post_ct_snat = 1;
+		if (maniptype == NF_NAT_MANIP_DST)
+			tc_skb_cb(skb)->post_ct_dnat = 1;
+	}
 out:
 	return err;
 }
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 35c74bdde848..cc9409aa755e 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1625,6 +1625,8 @@ int tcf_classify(struct sk_buff *skb,
 		ext->chain = last_executed_chain;
 		ext->mru = cb->mru;
 		ext->post_ct = cb->post_ct;
+		ext->post_ct_snat = cb->post_ct_snat;
+		ext->post_ct_dnat = cb->post_ct_dnat;
 		ext->zone = cb->zone;
 	}
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net 1/1] net: openvswitch: Fix ct_state nat flags for conns arriving from tc
  2022-01-04  8:28 [PATCH net 1/1] net: openvswitch: Fix ct_state nat flags for conns arriving from tc Paul Blakey
@ 2022-01-04 18:08 ` Jakub Kicinski
  2022-01-05  8:29   ` Paul Blakey
  2022-01-05 14:57 ` Jamal Hadi Salim
  1 sibling, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2022-01-04 18:08 UTC (permalink / raw)
  To: Paul Blakey
  Cc: dev, netdev, Cong Wang, Jamal Hadi Salim, Pravin B Shelar, davem,
	Jiri Pirko, Saeed Mahameed, Oz Shlomo, Vlad Buslov, Roi Dayan

On Tue, 4 Jan 2022 10:28:21 +0200 Paul Blakey wrote:
> Netfilter conntrack maintains NAT flags per connection indicating
> whether NAT was configured for the connection. Openvswitch maintains
> NAT flags on the per packet flow key ct_state field, indicating
> whether NAT was actually executed on the packet.
> 
> When a packet misses from tc to ovs the conntrack NAT flags are set.
> However, NAT was not necessarily executed on the packet because the
> connection's state might still be in NEW state. As such, openvswitch wrongly
> assumes that NAT was executed and sets an incorrect flow key NAT flags.
> 
> Fix this, by flagging to openvswitch which NAT was actually done in
> act_ct via tc_skb_ext and tc_skb_cb to the openvswitch module, so
> the packet flow key NAT flags will be correctly set.

Fixes ?

> Signed-off-by: Paul Blakey <paulb@nvidia.com>

> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 4507d77d6941..bab45a009310 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -287,7 +287,9 @@ struct tc_skb_ext {
>  	__u32 chain;
>  	__u16 mru;
>  	__u16 zone;
> -	bool post_ct;
> +	bool post_ct:1;
> +	bool post_ct_snat:1;
> +	bool post_ct_dnat:1;

single bit bool variables seem weird, use a unsigned int type, like u8.

>  };
>  #endif
>  
> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> index 9e71691c491b..a171dfa91910 100644
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -197,7 +197,9 @@ struct tc_skb_cb {
>  	struct qdisc_skb_cb qdisc_cb;
>  
>  	u16 mru;
> -	bool post_ct;
> +	bool post_ct: 1;

extra space

> +	bool post_ct_snat:1;
> +	bool post_ct_dnat:1;
>  	u16 zone; /* Only valid if post_ct = true */
>  };

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net 1/1] net: openvswitch: Fix ct_state nat flags for conns arriving from tc
  2022-01-04 18:08 ` Jakub Kicinski
@ 2022-01-05  8:29   ` Paul Blakey
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Blakey @ 2022-01-05  8:29 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: dev, netdev, Cong Wang, Jamal Hadi Salim, Pravin B Shelar, davem,
	Jiri Pirko, Saeed Mahameed, Oz Shlomo, Vlad Buslov, Roi Dayan




On Tue, 4 Jan 2022, Jakub Kicinski wrote:

> On Tue, 4 Jan 2022 10:28:21 +0200 Paul Blakey wrote:
> > Netfilter conntrack maintains NAT flags per connection indicating
> > whether NAT was configured for the connection. Openvswitch maintains
> > NAT flags on the per packet flow key ct_state field, indicating
> > whether NAT was actually executed on the packet.
> > 
> > When a packet misses from tc to ovs the conntrack NAT flags are set.
> > However, NAT was not necessarily executed on the packet because the
> > connection's state might still be in NEW state. As such, openvswitch wrongly
> > assumes that NAT was executed and sets an incorrect flow key NAT flags.
> > 
> > Fix this, by flagging to openvswitch which NAT was actually done in
> > act_ct via tc_skb_ext and tc_skb_cb to the openvswitch module, so
> > the packet flow key NAT flags will be correctly set.
> 
> Fixes ?

I wasn't sure which patches to blame, I guess the bug was there from the
introduction of action ct in tc, so I'll blame that. 

> 
> > Signed-off-by: Paul Blakey <paulb@nvidia.com>
> 
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 4507d77d6941..bab45a009310 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -287,7 +287,9 @@ struct tc_skb_ext {
> >  	__u32 chain;
> >  	__u16 mru;
> >  	__u16 zone;
> > -	bool post_ct;
> > +	bool post_ct:1;
> > +	bool post_ct_snat:1;
> > +	bool post_ct_dnat:1;
> 
> single bit bool variables seem weird, use a unsigned int type, like u8.
> 
> >  };
> >  #endif
> >  
> > diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> > index 9e71691c491b..a171dfa91910 100644
> > --- a/include/net/pkt_sched.h
> > +++ b/include/net/pkt_sched.h
> > @@ -197,7 +197,9 @@ struct tc_skb_cb {
> >  	struct qdisc_skb_cb qdisc_cb;
> >  
> >  	u16 mru;
> > -	bool post_ct;
> > +	bool post_ct: 1;
> 
> extra space

Will remove, and send v2.

> 
> > +	bool post_ct_snat:1;
> > +	bool post_ct_dnat:1;
> >  	u16 zone; /* Only valid if post_ct = true */
> >  };
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net 1/1] net: openvswitch: Fix ct_state nat flags for conns arriving from tc
  2022-01-04  8:28 [PATCH net 1/1] net: openvswitch: Fix ct_state nat flags for conns arriving from tc Paul Blakey
  2022-01-04 18:08 ` Jakub Kicinski
@ 2022-01-05 14:57 ` Jamal Hadi Salim
  2022-01-05 15:30   ` Daniel Borkmann
  1 sibling, 1 reply; 7+ messages in thread
From: Jamal Hadi Salim @ 2022-01-05 14:57 UTC (permalink / raw)
  To: Paul Blakey, dev, netdev, Cong Wang, Pravin B Shelar, davem,
	Jiri Pirko, Jakub Kicinski
  Cc: Saeed Mahameed, Oz Shlomo, Vlad Buslov, Roi Dayan

On 2022-01-04 03:28, Paul Blakey wrote:
[..]
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -287,7 +287,9 @@ struct tc_skb_ext {
>   	__u32 chain;
>   	__u16 mru;
>   	__u16 zone;
> -	bool post_ct;
> +	bool post_ct:1;
> +	bool post_ct_snat:1;
> +	bool post_ct_dnat:1;
>   };


is skb_ext intended only for ovs? If yes, why does it belong
in the core code? Ex: Looking at tcf_classify() which is such
a core function in the fast path any packet going via tc, it
is now encumbered with with checking presence of skb_ext.
I know passing around metadata is a paramount requirement
for programmability but this is getting messier with speacial
use cases for ovs and/or offload...

cheers,
jamal

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net 1/1] net: openvswitch: Fix ct_state nat flags for conns arriving from tc
  2022-01-05 14:57 ` Jamal Hadi Salim
@ 2022-01-05 15:30   ` Daniel Borkmann
  2022-01-05 16:18     ` Paul Blakey
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Borkmann @ 2022-01-05 15:30 UTC (permalink / raw)
  To: Jamal Hadi Salim, Paul Blakey, dev, netdev, Cong Wang,
	Pravin B Shelar, davem, Jiri Pirko, Jakub Kicinski
  Cc: Saeed Mahameed, Oz Shlomo, Vlad Buslov, Roi Dayan, john.fastabend

On 1/5/22 3:57 PM, Jamal Hadi Salim wrote:
> On 2022-01-04 03:28, Paul Blakey wrote:
> [..]
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -287,7 +287,9 @@ struct tc_skb_ext {
>>       __u32 chain;
>>       __u16 mru;
>>       __u16 zone;
>> -    bool post_ct;
>> +    bool post_ct:1;
>> +    bool post_ct_snat:1;
>> +    bool post_ct_dnat:1;
>>   };
> 
> is skb_ext intended only for ovs? If yes, why does it belong
> in the core code? Ex: Looking at tcf_classify() which is such
> a core function in the fast path any packet going via tc, it
> is now encumbered with with checking presence of skb_ext.
> I know passing around metadata is a paramount requirement
> for programmability but this is getting messier with speacial
> use cases for ovs and/or offload...

Full ack on the bloat for corner cases like ovs offload, especially
given distros just enable most stuff anyway and therefore no light
fast path as with !CONFIG_NET_TC_SKB_EXT. :(

Could this somehow be hidden behind static key or such if offloads
are not used, so we can shrink it back to just calling into plain
__tcf_classify() for sw-only use cases (like BPF)?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net 1/1] net: openvswitch: Fix ct_state nat flags for conns arriving from tc
  2022-01-05 15:30   ` Daniel Borkmann
@ 2022-01-05 16:18     ` Paul Blakey
  2022-01-06 12:54       ` Jamal Hadi Salim
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Blakey @ 2022-01-05 16:18 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jamal Hadi Salim, dev, netdev, Cong Wang, Pravin B Shelar, davem,
	Jiri Pirko, Jakub Kicinski, Saeed Mahameed, Oz Shlomo,
	Vlad Buslov, Roi Dayan, john.fastabend

[-- Attachment #1: Type: text/plain, Size: 1678 bytes --]



On Wed, 5 Jan 2022, Daniel Borkmann wrote:

> On 1/5/22 3:57 PM, Jamal Hadi Salim wrote:
> > On 2022-01-04 03:28, Paul Blakey wrote:
> > [..]
> >> --- a/include/linux/skbuff.h
> >> +++ b/include/linux/skbuff.h
> >> @@ -287,7 +287,9 @@ struct tc_skb_ext {
> >>       __u32 chain;
> >>       __u16 mru;
> >>       __u16 zone;
> >> -    bool post_ct;
> >> +    bool post_ct:1;
> >> +    bool post_ct_snat:1;
> >> +    bool post_ct_dnat:1;
> >>   };
> > 
> > is skb_ext intended only for ovs? If yes, why does it belong
> > in the core code? Ex: Looking at tcf_classify() which is such
> > a core function in the fast path any packet going via tc, it
> > is now encumbered with with checking presence of skb_ext.
> > I know passing around metadata is a paramount requirement
> > for programmability but this is getting messier with speacial
> > use cases for ovs and/or offload...
> 
> Full ack on the bloat for corner cases like ovs offload, especially
> given distros just enable most stuff anyway and therefore no light
> fast path as with !CONFIG_NET_TC_SKB_EXT. :(
> 
> Could this somehow be hidden behind static key or such if offloads
> are not used, so we can shrink it back to just calling into plain
> __tcf_classify() for sw-only use cases (like BPF)?
> 
> 

It is used for both tc -> ovs and driver -> tc path.

I think I can do what you suggest adn will work on something like
that,  but this specific patch  doesn't really change the ext 
allocation/derefences count (and probably  not the size as well).
So can  we take  this (not yet posted v2 after fixing what already 
mentioned) and I'll do a patch of what you suggest in net-next?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net 1/1] net: openvswitch: Fix ct_state nat flags for conns arriving from tc
  2022-01-05 16:18     ` Paul Blakey
@ 2022-01-06 12:54       ` Jamal Hadi Salim
  0 siblings, 0 replies; 7+ messages in thread
From: Jamal Hadi Salim @ 2022-01-06 12:54 UTC (permalink / raw)
  To: Paul Blakey, Daniel Borkmann
  Cc: dev, netdev, Cong Wang, Pravin B Shelar, davem, Jiri Pirko,
	Jakub Kicinski, Saeed Mahameed, Oz Shlomo, Vlad Buslov,
	Roi Dayan, john.fastabend

On 2022-01-05 11:18, Paul Blakey wrote:
> 
> 
> On Wed, 5 Jan 2022, Daniel Borkmann wrote:
> 

[..]

>> Full ack on the bloat for corner cases like ovs offload, especially
>> given distros just enable most stuff anyway and therefore no light
>> fast path as with !CONFIG_NET_TC_SKB_EXT. :(
>>
>> Could this somehow be hidden behind static key or such if offloads
>> are not used, so we can shrink it back to just calling into plain
>> __tcf_classify() for sw-only use cases (like BPF)?
>>
>>
> 
> It is used for both tc -> ovs and driver -> tc path.
> 
> I think I can do what you suggest adn will work on something like
> that,  but this specific patch  doesn't really change the ext
> allocation/derefences count (and probably  not the size as well).
> So can  we take  this (not yet posted v2 after fixing what already
> mentioned) and I'll do a patch of what you suggest in net-next?
> 

Sounds reasonable.

The main outstanding challenge is still going to be all these bit
declarations (and i am sure more to come) that are specific for
specific components in TC (act_ct in this case); if every component
started planting their flags(pun intended) then both backwards and
forwards compatibility are going to be needed for maintainance.

The _cb spaces are supposed to be opaque and meaning to whats in
that space is only sensible to the component that is storing and
retrieving. Something like chain id is ok in that namespaces
because it has global meaning to TC, the others are not. My
suggestion is going forward try to not add more component specific
variables.
For a proper solution:
I think we need some sort of "metadata bus" to resolve this in the
long run. Something which is a bigger surgery...

cheers,
jamal

cheers,
jamal



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-01-06 12:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-04  8:28 [PATCH net 1/1] net: openvswitch: Fix ct_state nat flags for conns arriving from tc Paul Blakey
2022-01-04 18:08 ` Jakub Kicinski
2022-01-05  8:29   ` Paul Blakey
2022-01-05 14:57 ` Jamal Hadi Salim
2022-01-05 15:30   ` Daniel Borkmann
2022-01-05 16:18     ` Paul Blakey
2022-01-06 12:54       ` Jamal Hadi Salim

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.