netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc
@ 2022-11-15 15:50 Xin Long
  2022-11-15 15:50 ` [PATCH net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute Xin Long
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Xin Long @ 2022-11-15 15:50 UTC (permalink / raw)
  To: network dev, dev
  Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso,
	Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti,
	Eelco Chaudron, Aaron Conole

The changes in the patchset:

  "net: add helper support in tc act_ct for ovs offloading"

had moved some common ct code used by both OVS and TC into netfilter.

There are still some big functions pretty similar defined and used in
each of OVS and TC. It is not good to maintain such similar code in 2
places. This patchset is to extract the functions for NAT processing
from OVS and TC to netfilter.

To make this change clear and safe, this patchset gets the common code
out of OVS and TC step by step: The patch 1-4 make some minor changes
in OVS and TC to make the NAT code of them completely the same, then
the patch 5 moves the common code to the netfilter and exports one
function called by each of OVS and TC.

Xin Long (5):
  openvswitch: delete the unncessary skb_pull_rcsum call in
    ovs_ct_nat_execute
  openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat
  net: sched: return NF_ACCEPT when fails to add nat ext in
    tcf_ct_act_nat
  net: sched: update the nat flag for icmp error packets in
    ct_nat_execute
  net: move the nat function to nf_nat_core for ovs and tc

 include/net/netfilter/nf_nat.h |   4 +
 net/netfilter/nf_nat_core.c    | 131 +++++++++++++++++++++++++++++
 net/openvswitch/conntrack.c    | 146 +++------------------------------
 net/sched/act_ct.c             | 136 +++---------------------------
 4 files changed, 159 insertions(+), 258 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute
  2022-11-15 15:50 [PATCH net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc Xin Long
@ 2022-11-15 15:50 ` Xin Long
  2022-11-16 20:55   ` Aaron Conole
  2022-11-15 15:50 ` [PATCH net-next 2/5] openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat Xin Long
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Xin Long @ 2022-11-15 15:50 UTC (permalink / raw)
  To: network dev, dev
  Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso,
	Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti,
	Eelco Chaudron, Aaron Conole

The calls to ovs_ct_nat_execute() are as below:

  ovs_ct_execute()
    ovs_ct_lookup()
      __ovs_ct_lookup()
        ovs_ct_nat()
          ovs_ct_nat_execute()
    ovs_ct_commit()
      __ovs_ct_lookup()
        ovs_ct_nat()
          ovs_ct_nat_execute()

and since skb_pull_rcsum() and skb_push_rcsum() are already
called in ovs_ct_execute(), there's no need to do it again
in ovs_ct_nat_execute().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/openvswitch/conntrack.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 4348321856af..4c5e5a6475af 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -735,10 +735,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
 			      const struct nf_nat_range2 *range,
 			      enum nf_nat_manip_type maniptype, struct sw_flow_key *key)
 {
-	int hooknum, nh_off, err = NF_ACCEPT;
-
-	nh_off = skb_network_offset(skb);
-	skb_pull_rcsum(skb, nh_off);
+	int hooknum, err = NF_ACCEPT;
 
 	/* See HOOK2MANIP(). */
 	if (maniptype == NF_NAT_MANIP_SRC)
@@ -755,7 +752,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
 			if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
 							   hooknum))
 				err = NF_DROP;
-			goto push;
+			goto out;
 		} else if (IS_ENABLED(CONFIG_IPV6) &&
 			   skb->protocol == htons(ETH_P_IPV6)) {
 			__be16 frag_off;
@@ -770,7 +767,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
 								     hooknum,
 								     hdrlen))
 					err = NF_DROP;
-				goto push;
+				goto out;
 			}
 		}
 		/* Non-ICMP, fall thru to initialize if needed. */
@@ -788,7 +785,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
 				? nf_nat_setup_info(ct, range, maniptype)
 				: nf_nat_alloc_null_binding(ct, hooknum);
 			if (err != NF_ACCEPT)
-				goto push;
+				goto out;
 		}
 		break;
 
@@ -798,13 +795,11 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
 
 	default:
 		err = NF_DROP;
-		goto push;
+		goto out;
 	}
 
 	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
-push:
-	skb_push_rcsum(skb, nh_off);
-
+out:
 	/* Update the flow key if NAT successful. */
 	if (err == NF_ACCEPT)
 		ovs_nat_update_key(key, skb, maniptype);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 2/5] openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat
  2022-11-15 15:50 [PATCH net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc Xin Long
  2022-11-15 15:50 ` [PATCH net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute Xin Long
@ 2022-11-15 15:50 ` Xin Long
  2022-11-16 20:56   ` Aaron Conole
  2022-11-15 15:50 ` [PATCH net-next 3/5] net: sched: return NF_ACCEPT when fails to add nat ext in tcf_ct_act_nat Xin Long
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Xin Long @ 2022-11-15 15:50 UTC (permalink / raw)
  To: network dev, dev
  Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso,
	Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti,
	Eelco Chaudron, Aaron Conole

Either OVS_CT_SRC_NAT or OVS_CT_DST_NAT is set, OVS_CT_NAT must be
set in info->nat. Thus, if OVS_CT_NAT is not set in info->nat, it
will definitely not do NAT but returns NF_ACCEPT in ovs_ct_nat().

This patch changes nothing funcational but only makes this return
earlier in ovs_ct_nat() to keep consistent with TC's processing
in tcf_ct_act_nat().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/openvswitch/conntrack.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 4c5e5a6475af..cc643a556ea1 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -816,6 +816,9 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key,
 	enum nf_nat_manip_type maniptype;
 	int err;
 
+	if (!(info->nat & OVS_CT_NAT))
+		return NF_ACCEPT;
+
 	/* Add NAT extension if not confirmed yet. */
 	if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
 		return NF_ACCEPT;   /* Can't NAT. */
@@ -825,8 +828,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key,
 	 * Make sure new expected connections (IP_CT_RELATED) are NATted only
 	 * when committing.
 	 */
-	if (info->nat & OVS_CT_NAT && ctinfo != IP_CT_NEW &&
-	    ct->status & IPS_NAT_MASK &&
+	if (ctinfo != IP_CT_NEW && ct->status & IPS_NAT_MASK &&
 	    (ctinfo != IP_CT_RELATED || info->commit)) {
 		/* NAT an established or related connection like before. */
 		if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 3/5] net: sched: return NF_ACCEPT when fails to add nat ext in tcf_ct_act_nat
  2022-11-15 15:50 [PATCH net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc Xin Long
  2022-11-15 15:50 ` [PATCH net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute Xin Long
  2022-11-15 15:50 ` [PATCH net-next 2/5] openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat Xin Long
@ 2022-11-15 15:50 ` Xin Long
  2022-11-15 15:50 ` [PATCH net-next 4/5] net: sched: update the nat flag for icmp error packets in ct_nat_execute Xin Long
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Xin Long @ 2022-11-15 15:50 UTC (permalink / raw)
  To: network dev, dev
  Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso,
	Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti,
	Eelco Chaudron, Aaron Conole

This patch changes to return NF_ACCEPT when fails to add nat
ext before doing NAT in tcf_ct_act_nat(), to keep consistent
with OVS' processing in ovs_ct_nat().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sched/act_ct.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index da0b7f665277..8869b3ef6642 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -994,7 +994,7 @@ static int tcf_ct_act_nat(struct sk_buff *skb,
 
 	/* Add NAT extension if not confirmed yet. */
 	if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
-		return NF_DROP;   /* Can't NAT. */
+		return NF_ACCEPT;   /* Can't NAT. */
 
 	if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
 	    (ctinfo != IP_CT_RELATED || commit)) {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 4/5] net: sched: update the nat flag for icmp error packets in ct_nat_execute
  2022-11-15 15:50 [PATCH net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc Xin Long
                   ` (2 preceding siblings ...)
  2022-11-15 15:50 ` [PATCH net-next 3/5] net: sched: return NF_ACCEPT when fails to add nat ext in tcf_ct_act_nat Xin Long
@ 2022-11-15 15:50 ` Xin Long
  2022-11-15 15:50 ` [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc Xin Long
  2022-11-15 19:42 ` [PATCH net-next 0/5] net: eliminate the duplicate code in the ct nat functions of " Saeed Mahameed
  5 siblings, 0 replies; 16+ messages in thread
From: Xin Long @ 2022-11-15 15:50 UTC (permalink / raw)
  To: network dev, dev
  Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso,
	Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti,
	Eelco Chaudron, Aaron Conole

In ovs_ct_nat_execute(), the packet flow key nat flags are updated
when it processes ICMP(v6) error packets translation successfully.

In ct_nat_execute() when processing ICMP(v6) error packets translation
successfully, it should have done the same in ct_nat_execute() to set
post_ct_s/dnat flag, which will be used to update flow key nat flags
in OVS module later.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sched/act_ct.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 8869b3ef6642..c7782c9a6ab6 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -936,13 +936,13 @@ static int ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
 	}
 
 	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
+out:
 	if (err == NF_ACCEPT) {
 		if (maniptype == NF_NAT_MANIP_SRC)
 			tc_skb_cb(skb)->post_ct_snat = 1;
 		if (maniptype == NF_NAT_MANIP_DST)
 			tc_skb_cb(skb)->post_ct_dnat = 1;
 	}
-out:
 	return err;
 }
 #endif /* CONFIG_NF_NAT */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc
  2022-11-15 15:50 [PATCH net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc Xin Long
                   ` (3 preceding siblings ...)
  2022-11-15 15:50 ` [PATCH net-next 4/5] net: sched: update the nat flag for icmp error packets in ct_nat_execute Xin Long
@ 2022-11-15 15:50 ` Xin Long
  2022-11-16 21:05   ` Aaron Conole
  2022-11-16 21:54   ` Pablo Neira Ayuso
  2022-11-15 19:42 ` [PATCH net-next 0/5] net: eliminate the duplicate code in the ct nat functions of " Saeed Mahameed
  5 siblings, 2 replies; 16+ messages in thread
From: Xin Long @ 2022-11-15 15:50 UTC (permalink / raw)
  To: network dev, dev
  Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso,
	Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti,
	Eelco Chaudron, Aaron Conole

There are two nat functions are nearly the same in both OVS and
TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().

This patch is to move them to netfilter nf_nat_core and export
nf_ct_nat() so that it can be shared by both OVS and TC, and
keep the nat (type) check and nat flag update in OVS and TC's
own place, as these parts are different between OVS and TC.

Note that in OVS nat function it was using skb->protocol to get
the proto as it already skips vlans in key_extract(), while it
doesn't in TC, and TC has to call skb_protocol() to get proto.
So in nf_ct_nat_execute(), we keep using skb_protocol() which
works for both OVS and TC.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/netfilter/nf_nat.h |   4 +
 net/netfilter/nf_nat_core.c    | 131 +++++++++++++++++++++++++++++++
 net/openvswitch/conntrack.c    | 137 +++------------------------------
 net/sched/act_ct.c             | 136 +++-----------------------------
 4 files changed, 156 insertions(+), 252 deletions(-)

diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
index e9eb01e99d2f..9877f064548a 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -104,6 +104,10 @@ unsigned int
 nf_nat_inet_fn(void *priv, struct sk_buff *skb,
 	       const struct nf_hook_state *state);
 
+int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
+	      enum ip_conntrack_info ctinfo, int *action,
+	      const struct nf_nat_range2 *range, bool commit);
+
 static inline int nf_nat_initialized(const struct nf_conn *ct,
 				     enum nf_nat_manip_type manip)
 {
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index e29e4ccb5c5a..1c72b8caa24e 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -784,6 +784,137 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
 }
 EXPORT_SYMBOL_GPL(nf_nat_inet_fn);
 
+/* Modelled after nf_nat_ipv[46]_fn().
+ * range is only used for new, uninitialized NAT state.
+ * Returns either NF_ACCEPT or NF_DROP.
+ */
+static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
+			     enum ip_conntrack_info ctinfo, int *action,
+			     const struct nf_nat_range2 *range,
+			     enum nf_nat_manip_type maniptype)
+{
+	__be16 proto = skb_protocol(skb, true);
+	int hooknum, err = NF_ACCEPT;
+
+	/* See HOOK2MANIP(). */
+	if (maniptype == NF_NAT_MANIP_SRC)
+		hooknum = NF_INET_LOCAL_IN; /* Source NAT */
+	else
+		hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
+
+	switch (ctinfo) {
+	case IP_CT_RELATED:
+	case IP_CT_RELATED_REPLY:
+		if (proto == htons(ETH_P_IP) &&
+		    ip_hdr(skb)->protocol == IPPROTO_ICMP) {
+			if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
+							   hooknum))
+				err = NF_DROP;
+			goto out;
+		} else if (IS_ENABLED(CONFIG_IPV6) && proto == htons(ETH_P_IPV6)) {
+			__be16 frag_off;
+			u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+			int hdrlen = ipv6_skip_exthdr(skb,
+						      sizeof(struct ipv6hdr),
+						      &nexthdr, &frag_off);
+
+			if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
+				if (!nf_nat_icmpv6_reply_translation(skb, ct,
+								     ctinfo,
+								     hooknum,
+								     hdrlen))
+					err = NF_DROP;
+				goto out;
+			}
+		}
+		/* Non-ICMP, fall thru to initialize if needed. */
+		fallthrough;
+	case IP_CT_NEW:
+		/* Seen it before?  This can happen for loopback, retrans,
+		 * or local packets.
+		 */
+		if (!nf_nat_initialized(ct, maniptype)) {
+			/* Initialize according to the NAT action. */
+			err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
+				/* Action is set up to establish a new
+				 * mapping.
+				 */
+				? nf_nat_setup_info(ct, range, maniptype)
+				: nf_nat_alloc_null_binding(ct, hooknum);
+			if (err != NF_ACCEPT)
+				goto out;
+		}
+		break;
+
+	case IP_CT_ESTABLISHED:
+	case IP_CT_ESTABLISHED_REPLY:
+		break;
+
+	default:
+		err = NF_DROP;
+		goto out;
+	}
+
+	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
+	if (err == NF_ACCEPT)
+		*action |= (1 << maniptype);
+out:
+	return err;
+}
+
+int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
+	      enum ip_conntrack_info ctinfo, int *action,
+	      const struct nf_nat_range2 *range, bool commit)
+{
+	enum nf_nat_manip_type maniptype;
+	int err, ct_action = *action;
+
+	*action = 0;
+
+	/* Add NAT extension if not confirmed yet. */
+	if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
+		return NF_ACCEPT;   /* Can't NAT. */
+
+	if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
+	    (ctinfo != IP_CT_RELATED || commit)) {
+		/* NAT an established or related connection like before. */
+		if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
+			/* This is the REPLY direction for a connection
+			 * for which NAT was applied in the forward
+			 * direction.  Do the reverse NAT.
+			 */
+			maniptype = ct->status & IPS_SRC_NAT
+				? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
+		else
+			maniptype = ct->status & IPS_SRC_NAT
+				? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
+	} else if (ct_action & (1 << NF_NAT_MANIP_SRC)) {
+		maniptype = NF_NAT_MANIP_SRC;
+	} else if (ct_action & (1 << NF_NAT_MANIP_DST)) {
+		maniptype = NF_NAT_MANIP_DST;
+	} else {
+		return NF_ACCEPT;
+	}
+
+	err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, maniptype);
+	if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
+		if (ct->status & IPS_SRC_NAT) {
+			if (maniptype == NF_NAT_MANIP_SRC)
+				maniptype = NF_NAT_MANIP_DST;
+			else
+				maniptype = NF_NAT_MANIP_SRC;
+
+			err = nf_ct_nat_execute(skb, ct, ctinfo, action, range,
+						maniptype);
+		} else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
+			err = nf_ct_nat_execute(skb, ct, ctinfo, action, NULL,
+						NF_NAT_MANIP_SRC);
+		}
+	}
+	return err;
+}
+EXPORT_SYMBOL_GPL(nf_ct_nat);
+
 struct nf_nat_proto_clean {
 	u8	l3proto;
 	u8	l4proto;
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index cc643a556ea1..d03c75165663 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -726,144 +726,27 @@ static void ovs_nat_update_key(struct sw_flow_key *key,
 	}
 }
 
-/* Modelled after nf_nat_ipv[46]_fn().
- * range is only used for new, uninitialized NAT state.
- * Returns either NF_ACCEPT or NF_DROP.
- */
-static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
-			      enum ip_conntrack_info ctinfo,
-			      const struct nf_nat_range2 *range,
-			      enum nf_nat_manip_type maniptype, struct sw_flow_key *key)
-{
-	int hooknum, err = NF_ACCEPT;
-
-	/* See HOOK2MANIP(). */
-	if (maniptype == NF_NAT_MANIP_SRC)
-		hooknum = NF_INET_LOCAL_IN; /* Source NAT */
-	else
-		hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
-
-	switch (ctinfo) {
-	case IP_CT_RELATED:
-	case IP_CT_RELATED_REPLY:
-		if (IS_ENABLED(CONFIG_NF_NAT) &&
-		    skb->protocol == htons(ETH_P_IP) &&
-		    ip_hdr(skb)->protocol == IPPROTO_ICMP) {
-			if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
-							   hooknum))
-				err = NF_DROP;
-			goto out;
-		} else if (IS_ENABLED(CONFIG_IPV6) &&
-			   skb->protocol == htons(ETH_P_IPV6)) {
-			__be16 frag_off;
-			u8 nexthdr = ipv6_hdr(skb)->nexthdr;
-			int hdrlen = ipv6_skip_exthdr(skb,
-						      sizeof(struct ipv6hdr),
-						      &nexthdr, &frag_off);
-
-			if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
-				if (!nf_nat_icmpv6_reply_translation(skb, ct,
-								     ctinfo,
-								     hooknum,
-								     hdrlen))
-					err = NF_DROP;
-				goto out;
-			}
-		}
-		/* Non-ICMP, fall thru to initialize if needed. */
-		fallthrough;
-	case IP_CT_NEW:
-		/* Seen it before?  This can happen for loopback, retrans,
-		 * or local packets.
-		 */
-		if (!nf_nat_initialized(ct, maniptype)) {
-			/* Initialize according to the NAT action. */
-			err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
-				/* Action is set up to establish a new
-				 * mapping.
-				 */
-				? nf_nat_setup_info(ct, range, maniptype)
-				: nf_nat_alloc_null_binding(ct, hooknum);
-			if (err != NF_ACCEPT)
-				goto out;
-		}
-		break;
-
-	case IP_CT_ESTABLISHED:
-	case IP_CT_ESTABLISHED_REPLY:
-		break;
-
-	default:
-		err = NF_DROP;
-		goto out;
-	}
-
-	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
-out:
-	/* Update the flow key if NAT successful. */
-	if (err == NF_ACCEPT)
-		ovs_nat_update_key(key, skb, maniptype);
-
-	return err;
-}
-
 /* Returns NF_DROP if the packet should be dropped, NF_ACCEPT otherwise. */
 static int ovs_ct_nat(struct net *net, struct sw_flow_key *key,
 		      const struct ovs_conntrack_info *info,
 		      struct sk_buff *skb, struct nf_conn *ct,
 		      enum ip_conntrack_info ctinfo)
 {
-	enum nf_nat_manip_type maniptype;
-	int err;
+	int err, action = 0;
 
 	if (!(info->nat & OVS_CT_NAT))
 		return NF_ACCEPT;
+	if (info->nat & OVS_CT_SRC_NAT)
+		action |= (1 << NF_NAT_MANIP_SRC);
+	if (info->nat & OVS_CT_DST_NAT)
+		action |= (1 << NF_NAT_MANIP_DST);
 
-	/* Add NAT extension if not confirmed yet. */
-	if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
-		return NF_ACCEPT;   /* Can't NAT. */
+	err = nf_ct_nat(skb, ct, ctinfo, &action, &info->range, info->commit);
 
-	/* Determine NAT type.
-	 * Check if the NAT type can be deduced from the tracked connection.
-	 * Make sure new expected connections (IP_CT_RELATED) are NATted only
-	 * when committing.
-	 */
-	if (ctinfo != IP_CT_NEW && ct->status & IPS_NAT_MASK &&
-	    (ctinfo != IP_CT_RELATED || info->commit)) {
-		/* NAT an established or related connection like before. */
-		if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
-			/* This is the REPLY direction for a connection
-			 * for which NAT was applied in the forward
-			 * direction.  Do the reverse NAT.
-			 */
-			maniptype = ct->status & IPS_SRC_NAT
-				? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
-		else
-			maniptype = ct->status & IPS_SRC_NAT
-				? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
-	} else if (info->nat & OVS_CT_SRC_NAT) {
-		maniptype = NF_NAT_MANIP_SRC;
-	} else if (info->nat & OVS_CT_DST_NAT) {
-		maniptype = NF_NAT_MANIP_DST;
-	} else {
-		return NF_ACCEPT; /* Connection is not NATed. */
-	}
-	err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range, maniptype, key);
-
-	if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
-		if (ct->status & IPS_SRC_NAT) {
-			if (maniptype == NF_NAT_MANIP_SRC)
-				maniptype = NF_NAT_MANIP_DST;
-			else
-				maniptype = NF_NAT_MANIP_SRC;
-
-			err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range,
-						 maniptype, key);
-		} else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
-			err = ovs_ct_nat_execute(skb, ct, ctinfo, NULL,
-						 NF_NAT_MANIP_SRC, key);
-		}
-	}
+	if (action & (1 << NF_NAT_MANIP_SRC))
+		ovs_nat_update_key(key, skb, NF_NAT_MANIP_SRC);
+	if (action & (1 << NF_NAT_MANIP_DST))
+		ovs_nat_update_key(key, skb, NF_NAT_MANIP_DST);
 
 	return err;
 }
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index c7782c9a6ab6..0c410220239f 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -863,90 +863,6 @@ static void tcf_ct_params_free_rcu(struct rcu_head *head)
 	tcf_ct_params_free(params);
 }
 
-#if IS_ENABLED(CONFIG_NF_NAT)
-/* Modelled after nf_nat_ipv[46]_fn().
- * range is only used for new, uninitialized NAT state.
- * Returns either NF_ACCEPT or NF_DROP.
- */
-static int ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
-			  enum ip_conntrack_info ctinfo,
-			  const struct nf_nat_range2 *range,
-			  enum nf_nat_manip_type maniptype)
-{
-	__be16 proto = skb_protocol(skb, true);
-	int hooknum, err = NF_ACCEPT;
-
-	/* See HOOK2MANIP(). */
-	if (maniptype == NF_NAT_MANIP_SRC)
-		hooknum = NF_INET_LOCAL_IN; /* Source NAT */
-	else
-		hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
-
-	switch (ctinfo) {
-	case IP_CT_RELATED:
-	case IP_CT_RELATED_REPLY:
-		if (proto == htons(ETH_P_IP) &&
-		    ip_hdr(skb)->protocol == IPPROTO_ICMP) {
-			if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
-							   hooknum))
-				err = NF_DROP;
-			goto out;
-		} else if (IS_ENABLED(CONFIG_IPV6) && proto == htons(ETH_P_IPV6)) {
-			__be16 frag_off;
-			u8 nexthdr = ipv6_hdr(skb)->nexthdr;
-			int hdrlen = ipv6_skip_exthdr(skb,
-						      sizeof(struct ipv6hdr),
-						      &nexthdr, &frag_off);
-
-			if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
-				if (!nf_nat_icmpv6_reply_translation(skb, ct,
-								     ctinfo,
-								     hooknum,
-								     hdrlen))
-					err = NF_DROP;
-				goto out;
-			}
-		}
-		/* Non-ICMP, fall thru to initialize if needed. */
-		fallthrough;
-	case IP_CT_NEW:
-		/* Seen it before?  This can happen for loopback, retrans,
-		 * or local packets.
-		 */
-		if (!nf_nat_initialized(ct, maniptype)) {
-			/* Initialize according to the NAT action. */
-			err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
-				/* Action is set up to establish a new
-				 * mapping.
-				 */
-				? nf_nat_setup_info(ct, range, maniptype)
-				: nf_nat_alloc_null_binding(ct, hooknum);
-			if (err != NF_ACCEPT)
-				goto out;
-		}
-		break;
-
-	case IP_CT_ESTABLISHED:
-	case IP_CT_ESTABLISHED_REPLY:
-		break;
-
-	default:
-		err = NF_DROP;
-		goto out;
-	}
-
-	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
-out:
-	if (err == NF_ACCEPT) {
-		if (maniptype == NF_NAT_MANIP_SRC)
-			tc_skb_cb(skb)->post_ct_snat = 1;
-		if (maniptype == NF_NAT_MANIP_DST)
-			tc_skb_cb(skb)->post_ct_dnat = 1;
-	}
-	return err;
-}
-#endif /* CONFIG_NF_NAT */
-
 static void tcf_ct_act_set_mark(struct nf_conn *ct, u32 mark, u32 mask)
 {
 #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
@@ -986,52 +902,22 @@ static int tcf_ct_act_nat(struct sk_buff *skb,
 			  bool commit)
 {
 #if IS_ENABLED(CONFIG_NF_NAT)
-	int err;
-	enum nf_nat_manip_type maniptype;
+	int err, action = 0;
 
 	if (!(ct_action & TCA_CT_ACT_NAT))
 		return NF_ACCEPT;
+	if (ct_action & TCA_CT_ACT_NAT_SRC)
+		action |= (1 << NF_NAT_MANIP_SRC);
+	if (ct_action & TCA_CT_ACT_NAT_DST)
+		action |= (1 << NF_NAT_MANIP_DST);
 
-	/* Add NAT extension if not confirmed yet. */
-	if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
-		return NF_ACCEPT;   /* Can't NAT. */
-
-	if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
-	    (ctinfo != IP_CT_RELATED || commit)) {
-		/* NAT an established or related connection like before. */
-		if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
-			/* This is the REPLY direction for a connection
-			 * for which NAT was applied in the forward
-			 * direction.  Do the reverse NAT.
-			 */
-			maniptype = ct->status & IPS_SRC_NAT
-				? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
-		else
-			maniptype = ct->status & IPS_SRC_NAT
-				? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
-	} else if (ct_action & TCA_CT_ACT_NAT_SRC) {
-		maniptype = NF_NAT_MANIP_SRC;
-	} else if (ct_action & TCA_CT_ACT_NAT_DST) {
-		maniptype = NF_NAT_MANIP_DST;
-	} else {
-		return NF_ACCEPT;
-	}
+	err = nf_ct_nat(skb, ct, ctinfo, &action, range, commit);
+
+	if (action & (1 << NF_NAT_MANIP_SRC))
+		tc_skb_cb(skb)->post_ct_snat = 1;
+	if (action & (1 << NF_NAT_MANIP_DST))
+		tc_skb_cb(skb)->post_ct_dnat = 1;
 
-	err = ct_nat_execute(skb, ct, ctinfo, range, maniptype);
-	if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
-		if (ct->status & IPS_SRC_NAT) {
-			if (maniptype == NF_NAT_MANIP_SRC)
-				maniptype = NF_NAT_MANIP_DST;
-			else
-				maniptype = NF_NAT_MANIP_SRC;
-
-			err = ct_nat_execute(skb, ct, ctinfo, range,
-					     maniptype);
-		} else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
-			err = ct_nat_execute(skb, ct, ctinfo, NULL,
-					     NF_NAT_MANIP_SRC);
-		}
-	}
 	return err;
 #else
 	return NF_ACCEPT;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc
  2022-11-15 15:50 [PATCH net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc Xin Long
                   ` (4 preceding siblings ...)
  2022-11-15 15:50 ` [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc Xin Long
@ 2022-11-15 19:42 ` Saeed Mahameed
  5 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2022-11-15 19:42 UTC (permalink / raw)
  To: Xin Long
  Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni,
	Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Pablo Neira Ayuso, Florian Westphal, Marcelo Ricardo Leitner,
	Davide Caratti, Eelco Chaudron, Aaron Conole

On 15 Nov 10:50, Xin Long wrote:
>The changes in the patchset:
>
>  "net: add helper support in tc act_ct for ovs offloading"
>
>had moved some common ct code used by both OVS and TC into netfilter.
>
>There are still some big functions pretty similar defined and used in
>each of OVS and TC. It is not good to maintain such similar code in 2
>places. This patchset is to extract the functions for NAT processing
>from OVS and TC to netfilter.
>
>To make this change clear and safe, this patchset gets the common code
>out of OVS and TC step by step: The patch 1-4 make some minor changes
>in OVS and TC to make the NAT code of them completely the same, then
>the patch 5 moves the common code to the netfilter and exports one
>function called by each of OVS and TC.
>

not super expert on TC or OVS, but LGTM.

Reviewed-by: Saeed Mahameed <saeed@kernel.org>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute
  2022-11-15 15:50 ` [PATCH net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute Xin Long
@ 2022-11-16 20:55   ` Aaron Conole
  0 siblings, 0 replies; 16+ messages in thread
From: Aaron Conole @ 2022-11-16 20:55 UTC (permalink / raw)
  To: Xin Long
  Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni,
	Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Pablo Neira Ayuso, Florian Westphal, Marcelo Ricardo Leitner,
	Davide Caratti, Eelco Chaudron

Xin Long <lucien.xin@gmail.com> writes:

> The calls to ovs_ct_nat_execute() are as below:
>
>   ovs_ct_execute()
>     ovs_ct_lookup()
>       __ovs_ct_lookup()
>         ovs_ct_nat()
>           ovs_ct_nat_execute()
>     ovs_ct_commit()
>       __ovs_ct_lookup()
>         ovs_ct_nat()
>           ovs_ct_nat_execute()
>
> and since skb_pull_rcsum() and skb_push_rcsum() are already
> called in ovs_ct_execute(), there's no need to do it again
> in ovs_ct_nat_execute().
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---

LGTM

Acked-by: Aaron Conole <aconole@redhat.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 2/5] openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat
  2022-11-15 15:50 ` [PATCH net-next 2/5] openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat Xin Long
@ 2022-11-16 20:56   ` Aaron Conole
  0 siblings, 0 replies; 16+ messages in thread
From: Aaron Conole @ 2022-11-16 20:56 UTC (permalink / raw)
  To: Xin Long
  Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni,
	Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Pablo Neira Ayuso, Florian Westphal, Marcelo Ricardo Leitner,
	Davide Caratti, Eelco Chaudron

Xin Long <lucien.xin@gmail.com> writes:

> Either OVS_CT_SRC_NAT or OVS_CT_DST_NAT is set, OVS_CT_NAT must be
> set in info->nat. Thus, if OVS_CT_NAT is not set in info->nat, it
> will definitely not do NAT but returns NF_ACCEPT in ovs_ct_nat().
>
> This patch changes nothing funcational but only makes this return
> earlier in ovs_ct_nat() to keep consistent with TC's processing
> in tcf_ct_act_nat().
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---

Acked-by: Aaron Conole <aconole@redhat.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc
  2022-11-15 15:50 ` [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc Xin Long
@ 2022-11-16 21:05   ` Aaron Conole
  2022-11-17  0:36     ` Xin Long
  2022-11-16 21:54   ` Pablo Neira Ayuso
  1 sibling, 1 reply; 16+ messages in thread
From: Aaron Conole @ 2022-11-16 21:05 UTC (permalink / raw)
  To: Xin Long
  Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni,
	Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Pablo Neira Ayuso, Florian Westphal, Marcelo Ricardo Leitner,
	Davide Caratti, Eelco Chaudron

Hi Xin,

Xin Long <lucien.xin@gmail.com> writes:

> There are two nat functions are nearly the same in both OVS and
> TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().
>
> This patch is to move them to netfilter nf_nat_core and export
> nf_ct_nat() so that it can be shared by both OVS and TC, and
> keep the nat (type) check and nat flag update in OVS and TC's
> own place, as these parts are different between OVS and TC.
>
> Note that in OVS nat function it was using skb->protocol to get
> the proto as it already skips vlans in key_extract(), while it
> doesn't in TC, and TC has to call skb_protocol() to get proto.
> So in nf_ct_nat_execute(), we keep using skb_protocol() which
> works for both OVS and TC.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  include/net/netfilter/nf_nat.h |   4 +
>  net/netfilter/nf_nat_core.c    | 131 +++++++++++++++++++++++++++++++
>  net/openvswitch/conntrack.c    | 137 +++------------------------------
>  net/sched/act_ct.c             | 136 +++-----------------------------
>  4 files changed, 156 insertions(+), 252 deletions(-)
>
> diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
> index e9eb01e99d2f..9877f064548a 100644
> --- a/include/net/netfilter/nf_nat.h
> +++ b/include/net/netfilter/nf_nat.h
> @@ -104,6 +104,10 @@ unsigned int
>  nf_nat_inet_fn(void *priv, struct sk_buff *skb,
>  	       const struct nf_hook_state *state);
>  
> +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
> +	      enum ip_conntrack_info ctinfo, int *action,
> +	      const struct nf_nat_range2 *range, bool commit);
> +
>  static inline int nf_nat_initialized(const struct nf_conn *ct,
>  				     enum nf_nat_manip_type manip)
>  {
> diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
> index e29e4ccb5c5a..1c72b8caa24e 100644
> --- a/net/netfilter/nf_nat_core.c
> +++ b/net/netfilter/nf_nat_core.c
> @@ -784,6 +784,137 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
>  }
>  EXPORT_SYMBOL_GPL(nf_nat_inet_fn);
>  
> +/* Modelled after nf_nat_ipv[46]_fn().
> + * range is only used for new, uninitialized NAT state.
> + * Returns either NF_ACCEPT or NF_DROP.
> + */
> +static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
> +			     enum ip_conntrack_info ctinfo, int *action,
> +			     const struct nf_nat_range2 *range,
> +			     enum nf_nat_manip_type maniptype)
> +{
> +	__be16 proto = skb_protocol(skb, true);
> +	int hooknum, err = NF_ACCEPT;
> +
> +	/* See HOOK2MANIP(). */
> +	if (maniptype == NF_NAT_MANIP_SRC)
> +		hooknum = NF_INET_LOCAL_IN; /* Source NAT */
> +	else
> +		hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
> +
> +	switch (ctinfo) {
> +	case IP_CT_RELATED:
> +	case IP_CT_RELATED_REPLY:
> +		if (proto == htons(ETH_P_IP) &&
> +		    ip_hdr(skb)->protocol == IPPROTO_ICMP) {
> +			if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
> +							   hooknum))
> +				err = NF_DROP;
> +			goto out;
> +		} else if (IS_ENABLED(CONFIG_IPV6) && proto == htons(ETH_P_IPV6)) {
> +			__be16 frag_off;
> +			u8 nexthdr = ipv6_hdr(skb)->nexthdr;
> +			int hdrlen = ipv6_skip_exthdr(skb,
> +						      sizeof(struct ipv6hdr),
> +						      &nexthdr, &frag_off);
> +
> +			if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
> +				if (!nf_nat_icmpv6_reply_translation(skb, ct,
> +								     ctinfo,
> +								     hooknum,
> +								     hdrlen))
> +					err = NF_DROP;
> +				goto out;
> +			}
> +		}
> +		/* Non-ICMP, fall thru to initialize if needed. */
> +		fallthrough;
> +	case IP_CT_NEW:
> +		/* Seen it before?  This can happen for loopback, retrans,
> +		 * or local packets.
> +		 */
> +		if (!nf_nat_initialized(ct, maniptype)) {
> +			/* Initialize according to the NAT action. */
> +			err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
> +				/* Action is set up to establish a new
> +				 * mapping.
> +				 */
> +				? nf_nat_setup_info(ct, range, maniptype)
> +				: nf_nat_alloc_null_binding(ct, hooknum);
> +			if (err != NF_ACCEPT)
> +				goto out;
> +		}
> +		break;
> +
> +	case IP_CT_ESTABLISHED:
> +	case IP_CT_ESTABLISHED_REPLY:
> +		break;
> +
> +	default:
> +		err = NF_DROP;
> +		goto out;
> +	}
> +
> +	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
> +	if (err == NF_ACCEPT)
> +		*action |= (1 << maniptype);
> +out:
> +	return err;
> +}
> +
> +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
> +	      enum ip_conntrack_info ctinfo, int *action,
> +	      const struct nf_nat_range2 *range, bool commit)
> +{
> +	enum nf_nat_manip_type maniptype;
> +	int err, ct_action = *action;
> +
> +	*action = 0;
> +
> +	/* Add NAT extension if not confirmed yet. */
> +	if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
> +		return NF_ACCEPT;   /* Can't NAT. */
> +
> +	if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
> +	    (ctinfo != IP_CT_RELATED || commit)) {
> +		/* NAT an established or related connection like before. */
> +		if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
> +			/* This is the REPLY direction for a connection
> +			 * for which NAT was applied in the forward
> +			 * direction.  Do the reverse NAT.
> +			 */
> +			maniptype = ct->status & IPS_SRC_NAT
> +				? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
> +		else
> +			maniptype = ct->status & IPS_SRC_NAT
> +				? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
> +	} else if (ct_action & (1 << NF_NAT_MANIP_SRC)) {
> +		maniptype = NF_NAT_MANIP_SRC;
> +	} else if (ct_action & (1 << NF_NAT_MANIP_DST)) {
> +		maniptype = NF_NAT_MANIP_DST;
> +	} else {
> +		return NF_ACCEPT;
> +	}
> +
> +	err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, maniptype);
> +	if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
> +		if (ct->status & IPS_SRC_NAT) {
> +			if (maniptype == NF_NAT_MANIP_SRC)
> +				maniptype = NF_NAT_MANIP_DST;
> +			else
> +				maniptype = NF_NAT_MANIP_SRC;
> +
> +			err = nf_ct_nat_execute(skb, ct, ctinfo, action, range,
> +						maniptype);
> +		} else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
> +			err = nf_ct_nat_execute(skb, ct, ctinfo, action, NULL,
> +						NF_NAT_MANIP_SRC);
> +		}
> +	}
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(nf_ct_nat);
> +
>  struct nf_nat_proto_clean {
>  	u8	l3proto;
>  	u8	l4proto;
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index cc643a556ea1..d03c75165663 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -726,144 +726,27 @@ static void ovs_nat_update_key(struct sw_flow_key *key,
>  	}
>  }
>  
> -/* Modelled after nf_nat_ipv[46]_fn().
> - * range is only used for new, uninitialized NAT state.
> - * Returns either NF_ACCEPT or NF_DROP.
> - */
> -static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
> -			      enum ip_conntrack_info ctinfo,
> -			      const struct nf_nat_range2 *range,
> -			      enum nf_nat_manip_type maniptype, struct sw_flow_key *key)
> -{
> -	int hooknum, err = NF_ACCEPT;
> -
> -	/* See HOOK2MANIP(). */
> -	if (maniptype == NF_NAT_MANIP_SRC)
> -		hooknum = NF_INET_LOCAL_IN; /* Source NAT */
> -	else
> -		hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
> -
> -	switch (ctinfo) {
> -	case IP_CT_RELATED:
> -	case IP_CT_RELATED_REPLY:
> -		if (IS_ENABLED(CONFIG_NF_NAT) &&
> -		    skb->protocol == htons(ETH_P_IP) &&
> -		    ip_hdr(skb)->protocol == IPPROTO_ICMP) {
> -			if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
> -							   hooknum))
> -				err = NF_DROP;
> -			goto out;
> -		} else if (IS_ENABLED(CONFIG_IPV6) &&
> -			   skb->protocol == htons(ETH_P_IPV6)) {
> -			__be16 frag_off;
> -			u8 nexthdr = ipv6_hdr(skb)->nexthdr;
> -			int hdrlen = ipv6_skip_exthdr(skb,
> -						      sizeof(struct ipv6hdr),
> -						      &nexthdr, &frag_off);
> -
> -			if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
> -				if (!nf_nat_icmpv6_reply_translation(skb, ct,
> -								     ctinfo,
> -								     hooknum,
> -								     hdrlen))
> -					err = NF_DROP;
> -				goto out;
> -			}
> -		}
> -		/* Non-ICMP, fall thru to initialize if needed. */
> -		fallthrough;
> -	case IP_CT_NEW:
> -		/* Seen it before?  This can happen for loopback, retrans,
> -		 * or local packets.
> -		 */
> -		if (!nf_nat_initialized(ct, maniptype)) {
> -			/* Initialize according to the NAT action. */
> -			err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
> -				/* Action is set up to establish a new
> -				 * mapping.
> -				 */
> -				? nf_nat_setup_info(ct, range, maniptype)
> -				: nf_nat_alloc_null_binding(ct, hooknum);
> -			if (err != NF_ACCEPT)
> -				goto out;
> -		}
> -		break;
> -
> -	case IP_CT_ESTABLISHED:
> -	case IP_CT_ESTABLISHED_REPLY:
> -		break;
> -
> -	default:
> -		err = NF_DROP;
> -		goto out;
> -	}
> -
> -	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
> -out:
> -	/* Update the flow key if NAT successful. */
> -	if (err == NF_ACCEPT)
> -		ovs_nat_update_key(key, skb, maniptype);
> -
> -	return err;
> -}
> -
>  /* Returns NF_DROP if the packet should be dropped, NF_ACCEPT otherwise. */
>  static int ovs_ct_nat(struct net *net, struct sw_flow_key *key,
>  		      const struct ovs_conntrack_info *info,
>  		      struct sk_buff *skb, struct nf_conn *ct,
>  		      enum ip_conntrack_info ctinfo)
>  {
> -	enum nf_nat_manip_type maniptype;
> -	int err;
> +	int err, action = 0;
>  
>  	if (!(info->nat & OVS_CT_NAT))
>  		return NF_ACCEPT;
> +	if (info->nat & OVS_CT_SRC_NAT)
> +		action |= (1 << NF_NAT_MANIP_SRC);
> +	if (info->nat & OVS_CT_DST_NAT)
> +		action |= (1 << NF_NAT_MANIP_DST);
>  
> -	/* Add NAT extension if not confirmed yet. */
> -	if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
> -		return NF_ACCEPT;   /* Can't NAT. */
> +	err = nf_ct_nat(skb, ct, ctinfo, &action, &info->range, info->commit);
>  
> -	/* Determine NAT type.
> -	 * Check if the NAT type can be deduced from the tracked connection.
> -	 * Make sure new expected connections (IP_CT_RELATED) are NATted only
> -	 * when committing.
> -	 */
> -	if (ctinfo != IP_CT_NEW && ct->status & IPS_NAT_MASK &&
> -	    (ctinfo != IP_CT_RELATED || info->commit)) {
> -		/* NAT an established or related connection like before. */
> -		if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
> -			/* This is the REPLY direction for a connection
> -			 * for which NAT was applied in the forward
> -			 * direction.  Do the reverse NAT.
> -			 */
> -			maniptype = ct->status & IPS_SRC_NAT
> -				? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
> -		else
> -			maniptype = ct->status & IPS_SRC_NAT
> -				? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
> -	} else if (info->nat & OVS_CT_SRC_NAT) {
> -		maniptype = NF_NAT_MANIP_SRC;
> -	} else if (info->nat & OVS_CT_DST_NAT) {
> -		maniptype = NF_NAT_MANIP_DST;
> -	} else {
> -		return NF_ACCEPT; /* Connection is not NATed. */
> -	}
> -	err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range, maniptype, key);
> -
> -	if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
> -		if (ct->status & IPS_SRC_NAT) {
> -			if (maniptype == NF_NAT_MANIP_SRC)
> -				maniptype = NF_NAT_MANIP_DST;
> -			else
> -				maniptype = NF_NAT_MANIP_SRC;
> -
> -			err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range,
> -						 maniptype, key);
> -		} else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
> -			err = ovs_ct_nat_execute(skb, ct, ctinfo, NULL,
> -						 NF_NAT_MANIP_SRC, key);
> -		}
> -	}
> +	if (action & (1 << NF_NAT_MANIP_SRC))
> +		ovs_nat_update_key(key, skb, NF_NAT_MANIP_SRC);
> +	if (action & (1 << NF_NAT_MANIP_DST))
> +		ovs_nat_update_key(key, skb, NF_NAT_MANIP_DST);

I haven't tested this yet with tuple collision, but if I'm reading the
code right, info->nat will only be from the parsed actions (which will
have a single src/dst manipulation).  But, this code was previously
always updated based on ct->status information, which isn't quite the
same.  Maybe I'm missing something.  I plan to run this testing
tomorrow, but maybe you have already tested for it?

>  	return err;
>  }
> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> index c7782c9a6ab6..0c410220239f 100644
> --- a/net/sched/act_ct.c
> +++ b/net/sched/act_ct.c
> @@ -863,90 +863,6 @@ static void tcf_ct_params_free_rcu(struct rcu_head *head)
>  	tcf_ct_params_free(params);
>  }
>  
> -#if IS_ENABLED(CONFIG_NF_NAT)
> -/* Modelled after nf_nat_ipv[46]_fn().
> - * range is only used for new, uninitialized NAT state.
> - * Returns either NF_ACCEPT or NF_DROP.
> - */
> -static int ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
> -			  enum ip_conntrack_info ctinfo,
> -			  const struct nf_nat_range2 *range,
> -			  enum nf_nat_manip_type maniptype)
> -{
> -	__be16 proto = skb_protocol(skb, true);
> -	int hooknum, err = NF_ACCEPT;
> -
> -	/* See HOOK2MANIP(). */
> -	if (maniptype == NF_NAT_MANIP_SRC)
> -		hooknum = NF_INET_LOCAL_IN; /* Source NAT */
> -	else
> -		hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
> -
> -	switch (ctinfo) {
> -	case IP_CT_RELATED:
> -	case IP_CT_RELATED_REPLY:
> -		if (proto == htons(ETH_P_IP) &&
> -		    ip_hdr(skb)->protocol == IPPROTO_ICMP) {
> -			if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
> -							   hooknum))
> -				err = NF_DROP;
> -			goto out;
> -		} else if (IS_ENABLED(CONFIG_IPV6) && proto == htons(ETH_P_IPV6)) {
> -			__be16 frag_off;
> -			u8 nexthdr = ipv6_hdr(skb)->nexthdr;
> -			int hdrlen = ipv6_skip_exthdr(skb,
> -						      sizeof(struct ipv6hdr),
> -						      &nexthdr, &frag_off);
> -
> -			if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
> -				if (!nf_nat_icmpv6_reply_translation(skb, ct,
> -								     ctinfo,
> -								     hooknum,
> -								     hdrlen))
> -					err = NF_DROP;
> -				goto out;
> -			}
> -		}
> -		/* Non-ICMP, fall thru to initialize if needed. */
> -		fallthrough;
> -	case IP_CT_NEW:
> -		/* Seen it before?  This can happen for loopback, retrans,
> -		 * or local packets.
> -		 */
> -		if (!nf_nat_initialized(ct, maniptype)) {
> -			/* Initialize according to the NAT action. */
> -			err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
> -				/* Action is set up to establish a new
> -				 * mapping.
> -				 */
> -				? nf_nat_setup_info(ct, range, maniptype)
> -				: nf_nat_alloc_null_binding(ct, hooknum);
> -			if (err != NF_ACCEPT)
> -				goto out;
> -		}
> -		break;
> -
> -	case IP_CT_ESTABLISHED:
> -	case IP_CT_ESTABLISHED_REPLY:
> -		break;
> -
> -	default:
> -		err = NF_DROP;
> -		goto out;
> -	}
> -
> -	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
> -out:
> -	if (err == NF_ACCEPT) {
> -		if (maniptype == NF_NAT_MANIP_SRC)
> -			tc_skb_cb(skb)->post_ct_snat = 1;
> -		if (maniptype == NF_NAT_MANIP_DST)
> -			tc_skb_cb(skb)->post_ct_dnat = 1;
> -	}
> -	return err;
> -}
> -#endif /* CONFIG_NF_NAT */
> -
>  static void tcf_ct_act_set_mark(struct nf_conn *ct, u32 mark, u32 mask)
>  {
>  #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
> @@ -986,52 +902,22 @@ static int tcf_ct_act_nat(struct sk_buff *skb,
>  			  bool commit)
>  {
>  #if IS_ENABLED(CONFIG_NF_NAT)
> -	int err;
> -	enum nf_nat_manip_type maniptype;
> +	int err, action = 0;
>  
>  	if (!(ct_action & TCA_CT_ACT_NAT))
>  		return NF_ACCEPT;
> +	if (ct_action & TCA_CT_ACT_NAT_SRC)
> +		action |= (1 << NF_NAT_MANIP_SRC);
> +	if (ct_action & TCA_CT_ACT_NAT_DST)
> +		action |= (1 << NF_NAT_MANIP_DST);
>  
> -	/* Add NAT extension if not confirmed yet. */
> -	if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
> -		return NF_ACCEPT;   /* Can't NAT. */
> -
> -	if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
> -	    (ctinfo != IP_CT_RELATED || commit)) {
> -		/* NAT an established or related connection like before. */
> -		if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
> -			/* This is the REPLY direction for a connection
> -			 * for which NAT was applied in the forward
> -			 * direction.  Do the reverse NAT.
> -			 */
> -			maniptype = ct->status & IPS_SRC_NAT
> -				? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
> -		else
> -			maniptype = ct->status & IPS_SRC_NAT
> -				? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
> -	} else if (ct_action & TCA_CT_ACT_NAT_SRC) {
> -		maniptype = NF_NAT_MANIP_SRC;
> -	} else if (ct_action & TCA_CT_ACT_NAT_DST) {
> -		maniptype = NF_NAT_MANIP_DST;
> -	} else {
> -		return NF_ACCEPT;
> -	}
> +	err = nf_ct_nat(skb, ct, ctinfo, &action, range, commit);
> +
> +	if (action & (1 << NF_NAT_MANIP_SRC))
> +		tc_skb_cb(skb)->post_ct_snat = 1;
> +	if (action & (1 << NF_NAT_MANIP_DST))
> +		tc_skb_cb(skb)->post_ct_dnat = 1;
>  
> -	err = ct_nat_execute(skb, ct, ctinfo, range, maniptype);
> -	if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
> -		if (ct->status & IPS_SRC_NAT) {
> -			if (maniptype == NF_NAT_MANIP_SRC)
> -				maniptype = NF_NAT_MANIP_DST;
> -			else
> -				maniptype = NF_NAT_MANIP_SRC;
> -
> -			err = ct_nat_execute(skb, ct, ctinfo, range,
> -					     maniptype);
> -		} else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
> -			err = ct_nat_execute(skb, ct, ctinfo, NULL,
> -					     NF_NAT_MANIP_SRC);
> -		}
> -	}
>  	return err;
>  #else
>  	return NF_ACCEPT;


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc
  2022-11-15 15:50 ` [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc Xin Long
  2022-11-16 21:05   ` Aaron Conole
@ 2022-11-16 21:54   ` Pablo Neira Ayuso
  2022-11-17  0:51     ` Xin Long
  1 sibling, 1 reply; 16+ messages in thread
From: Pablo Neira Ayuso @ 2022-11-16 21:54 UTC (permalink / raw)
  To: Xin Long
  Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni,
	Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti,
	Eelco Chaudron, Aaron Conole

On Tue, Nov 15, 2022 at 10:50:57AM -0500, Xin Long wrote:
> diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
> index e29e4ccb5c5a..1c72b8caa24e 100644
> --- a/net/netfilter/nf_nat_core.c
> +++ b/net/netfilter/nf_nat_core.c
> @@ -784,6 +784,137 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
>  }
>  EXPORT_SYMBOL_GPL(nf_nat_inet_fn);
>  
> +/* Modelled after nf_nat_ipv[46]_fn().
> + * range is only used for new, uninitialized NAT state.
> + * Returns either NF_ACCEPT or NF_DROP.
> + */
> +static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
> +			     enum ip_conntrack_info ctinfo, int *action,
> +			     const struct nf_nat_range2 *range,
> +			     enum nf_nat_manip_type maniptype)
> +{
> +	__be16 proto = skb_protocol(skb, true);
> +	int hooknum, err = NF_ACCEPT;
> +
> +	/* See HOOK2MANIP(). */
> +	if (maniptype == NF_NAT_MANIP_SRC)
> +		hooknum = NF_INET_LOCAL_IN; /* Source NAT */
> +	else
> +		hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
> +
> +	switch (ctinfo) {
> +	case IP_CT_RELATED:
> +	case IP_CT_RELATED_REPLY:
> +		if (proto == htons(ETH_P_IP) &&
> +		    ip_hdr(skb)->protocol == IPPROTO_ICMP) {
> +			if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
> +							   hooknum))
> +				err = NF_DROP;
> +			goto out;
> +		} else if (IS_ENABLED(CONFIG_IPV6) && proto == htons(ETH_P_IPV6)) {
> +			__be16 frag_off;
> +			u8 nexthdr = ipv6_hdr(skb)->nexthdr;
> +			int hdrlen = ipv6_skip_exthdr(skb,
> +						      sizeof(struct ipv6hdr),
> +						      &nexthdr, &frag_off);
> +
> +			if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
> +				if (!nf_nat_icmpv6_reply_translation(skb, ct,
> +								     ctinfo,
> +								     hooknum,
> +								     hdrlen))
> +					err = NF_DROP;
> +				goto out;
> +			}
> +		}
> +		/* Non-ICMP, fall thru to initialize if needed. */
> +		fallthrough;
> +	case IP_CT_NEW:
> +		/* Seen it before?  This can happen for loopback, retrans,
> +		 * or local packets.
> +		 */
> +		if (!nf_nat_initialized(ct, maniptype)) {
> +			/* Initialize according to the NAT action. */
> +			err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
> +				/* Action is set up to establish a new
> +				 * mapping.
> +				 */
> +				? nf_nat_setup_info(ct, range, maniptype)
> +				: nf_nat_alloc_null_binding(ct, hooknum);
> +			if (err != NF_ACCEPT)
> +				goto out;
> +		}
> +		break;
> +
> +	case IP_CT_ESTABLISHED:
> +	case IP_CT_ESTABLISHED_REPLY:
> +		break;
> +
> +	default:
> +		err = NF_DROP;
> +		goto out;
> +	}
> +
> +	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
> +	if (err == NF_ACCEPT)
> +		*action |= (1 << maniptype);
> +out:
> +	return err;
> +}
> +
> +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
> +	      enum ip_conntrack_info ctinfo, int *action,
> +	      const struct nf_nat_range2 *range, bool commit)
> +{
> +	enum nf_nat_manip_type maniptype;
> +	int err, ct_action = *action;
> +
> +	*action = 0;
> +
> +	/* Add NAT extension if not confirmed yet. */
> +	if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
> +		return NF_ACCEPT;   /* Can't NAT. */
> +
> +	if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
> +	    (ctinfo != IP_CT_RELATED || commit)) {
> +		/* NAT an established or related connection like before. */
> +		if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
> +			/* This is the REPLY direction for a connection
> +			 * for which NAT was applied in the forward
> +			 * direction.  Do the reverse NAT.
> +			 */
> +			maniptype = ct->status & IPS_SRC_NAT
> +				? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
> +		else
> +			maniptype = ct->status & IPS_SRC_NAT
> +				? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
> +	} else if (ct_action & (1 << NF_NAT_MANIP_SRC)) {
> +		maniptype = NF_NAT_MANIP_SRC;
> +	} else if (ct_action & (1 << NF_NAT_MANIP_DST)) {
> +		maniptype = NF_NAT_MANIP_DST;
> +	} else {
> +		return NF_ACCEPT;
> +	}
> +
> +	err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, maniptype);
> +	if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
> +		if (ct->status & IPS_SRC_NAT) {
> +			if (maniptype == NF_NAT_MANIP_SRC)
> +				maniptype = NF_NAT_MANIP_DST;
> +			else
> +				maniptype = NF_NAT_MANIP_SRC;
> +
> +			err = nf_ct_nat_execute(skb, ct, ctinfo, action, range,
> +						maniptype);
> +		} else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
> +			err = nf_ct_nat_execute(skb, ct, ctinfo, action, NULL,
> +						NF_NAT_MANIP_SRC);
> +		}
> +	}
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(nf_ct_nat);

I'd suggest you move this code to nf_nat_ovs.c or such so we remember
these symbols are used by act_ct.c and ovs.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc
  2022-11-16 21:05   ` Aaron Conole
@ 2022-11-17  0:36     ` Xin Long
  2022-11-19 16:22       ` Aaron Conole
  0 siblings, 1 reply; 16+ messages in thread
From: Xin Long @ 2022-11-17  0:36 UTC (permalink / raw)
  To: Aaron Conole
  Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni,
	Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Pablo Neira Ayuso, Florian Westphal, Marcelo Ricardo Leitner,
	Davide Caratti, Eelco Chaudron

On Wed, Nov 16, 2022 at 4:05 PM Aaron Conole <aconole@redhat.com> wrote:
>
> Hi Xin,
>
> Xin Long <lucien.xin@gmail.com> writes:
>
> > There are two nat functions are nearly the same in both OVS and
> > TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().
> >
> > This patch is to move them to netfilter nf_nat_core and export
> > nf_ct_nat() so that it can be shared by both OVS and TC, and
> > keep the nat (type) check and nat flag update in OVS and TC's
> > own place, as these parts are different between OVS and TC.
> >
> > Note that in OVS nat function it was using skb->protocol to get
> > the proto as it already skips vlans in key_extract(), while it
> > doesn't in TC, and TC has to call skb_protocol() to get proto.
> > So in nf_ct_nat_execute(), we keep using skb_protocol() which
> > works for both OVS and TC.
> >
> > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > ---
> >  include/net/netfilter/nf_nat.h |   4 +
> >  net/netfilter/nf_nat_core.c    | 131 +++++++++++++++++++++++++++++++
> >  net/openvswitch/conntrack.c    | 137 +++------------------------------
> >  net/sched/act_ct.c             | 136 +++-----------------------------
> >  4 files changed, 156 insertions(+), 252 deletions(-)
> >
> > diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
> > index e9eb01e99d2f..9877f064548a 100644
> > --- a/include/net/netfilter/nf_nat.h
> > +++ b/include/net/netfilter/nf_nat.h
> > @@ -104,6 +104,10 @@ unsigned int
> >  nf_nat_inet_fn(void *priv, struct sk_buff *skb,
> >              const struct nf_hook_state *state);
> >
> > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
> > +           enum ip_conntrack_info ctinfo, int *action,
> > +           const struct nf_nat_range2 *range, bool commit);
> > +
> >  static inline int nf_nat_initialized(const struct nf_conn *ct,
> >                                    enum nf_nat_manip_type manip)
> >  {
> > diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
> > index e29e4ccb5c5a..1c72b8caa24e 100644
> > --- a/net/netfilter/nf_nat_core.c
> > +++ b/net/netfilter/nf_nat_core.c
> > @@ -784,6 +784,137 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
> >  }
> >  EXPORT_SYMBOL_GPL(nf_nat_inet_fn);
> >
> > +/* Modelled after nf_nat_ipv[46]_fn().
> > + * range is only used for new, uninitialized NAT state.
> > + * Returns either NF_ACCEPT or NF_DROP.
> > + */
> > +static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
> > +                          enum ip_conntrack_info ctinfo, int *action,
> > +                          const struct nf_nat_range2 *range,
> > +                          enum nf_nat_manip_type maniptype)
> > +{
> > +     __be16 proto = skb_protocol(skb, true);
> > +     int hooknum, err = NF_ACCEPT;
> > +
> > +     /* See HOOK2MANIP(). */
> > +     if (maniptype == NF_NAT_MANIP_SRC)
> > +             hooknum = NF_INET_LOCAL_IN; /* Source NAT */
> > +     else
> > +             hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
> > +
> > +     switch (ctinfo) {
> > +     case IP_CT_RELATED:
> > +     case IP_CT_RELATED_REPLY:
> > +             if (proto == htons(ETH_P_IP) &&
> > +                 ip_hdr(skb)->protocol == IPPROTO_ICMP) {
> > +                     if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
> > +                                                        hooknum))
> > +                             err = NF_DROP;
> > +                     goto out;
> > +             } else if (IS_ENABLED(CONFIG_IPV6) && proto == htons(ETH_P_IPV6)) {
> > +                     __be16 frag_off;
> > +                     u8 nexthdr = ipv6_hdr(skb)->nexthdr;
> > +                     int hdrlen = ipv6_skip_exthdr(skb,
> > +                                                   sizeof(struct ipv6hdr),
> > +                                                   &nexthdr, &frag_off);
> > +
> > +                     if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
> > +                             if (!nf_nat_icmpv6_reply_translation(skb, ct,
> > +                                                                  ctinfo,
> > +                                                                  hooknum,
> > +                                                                  hdrlen))
> > +                                     err = NF_DROP;
> > +                             goto out;
> > +                     }
> > +             }
> > +             /* Non-ICMP, fall thru to initialize if needed. */
> > +             fallthrough;
> > +     case IP_CT_NEW:
> > +             /* Seen it before?  This can happen for loopback, retrans,
> > +              * or local packets.
> > +              */
> > +             if (!nf_nat_initialized(ct, maniptype)) {
> > +                     /* Initialize according to the NAT action. */
> > +                     err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
> > +                             /* Action is set up to establish a new
> > +                              * mapping.
> > +                              */
> > +                             ? nf_nat_setup_info(ct, range, maniptype)
> > +                             : nf_nat_alloc_null_binding(ct, hooknum);
> > +                     if (err != NF_ACCEPT)
> > +                             goto out;
> > +             }
> > +             break;
> > +
> > +     case IP_CT_ESTABLISHED:
> > +     case IP_CT_ESTABLISHED_REPLY:
> > +             break;
> > +
> > +     default:
> > +             err = NF_DROP;
> > +             goto out;
> > +     }
> > +
> > +     err = nf_nat_packet(ct, ctinfo, hooknum, skb);
> > +     if (err == NF_ACCEPT)
> > +             *action |= (1 << maniptype);
> > +out:
> > +     return err;
> > +}
> > +
> > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
> > +           enum ip_conntrack_info ctinfo, int *action,
> > +           const struct nf_nat_range2 *range, bool commit)
> > +{
> > +     enum nf_nat_manip_type maniptype;
> > +     int err, ct_action = *action;
> > +
> > +     *action = 0;
> > +
> > +     /* Add NAT extension if not confirmed yet. */
> > +     if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
> > +             return NF_ACCEPT;   /* Can't NAT. */
> > +
> > +     if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
> > +         (ctinfo != IP_CT_RELATED || commit)) {
> > +             /* NAT an established or related connection like before. */
> > +             if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
> > +                     /* This is the REPLY direction for a connection
> > +                      * for which NAT was applied in the forward
> > +                      * direction.  Do the reverse NAT.
> > +                      */
> > +                     maniptype = ct->status & IPS_SRC_NAT
> > +                             ? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
> > +             else
> > +                     maniptype = ct->status & IPS_SRC_NAT
> > +                             ? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
> > +     } else if (ct_action & (1 << NF_NAT_MANIP_SRC)) {
> > +             maniptype = NF_NAT_MANIP_SRC;
> > +     } else if (ct_action & (1 << NF_NAT_MANIP_DST)) {
> > +             maniptype = NF_NAT_MANIP_DST;
> > +     } else {
> > +             return NF_ACCEPT;
> > +     }
> > +
> > +     err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, maniptype);
> > +     if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
> > +             if (ct->status & IPS_SRC_NAT) {
> > +                     if (maniptype == NF_NAT_MANIP_SRC)
> > +                             maniptype = NF_NAT_MANIP_DST;
> > +                     else
> > +                             maniptype = NF_NAT_MANIP_SRC;
> > +
> > +                     err = nf_ct_nat_execute(skb, ct, ctinfo, action, range,
> > +                                             maniptype);
> > +             } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
> > +                     err = nf_ct_nat_execute(skb, ct, ctinfo, action, NULL,
> > +                                             NF_NAT_MANIP_SRC);
> > +             }
> > +     }
> > +     return err;
> > +}
> > +EXPORT_SYMBOL_GPL(nf_ct_nat);
> > +
> >  struct nf_nat_proto_clean {
> >       u8      l3proto;
> >       u8      l4proto;
> > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> > index cc643a556ea1..d03c75165663 100644
> > --- a/net/openvswitch/conntrack.c
> > +++ b/net/openvswitch/conntrack.c
> > @@ -726,144 +726,27 @@ static void ovs_nat_update_key(struct sw_flow_key *key,
> >       }
> >  }
> >
> > -/* Modelled after nf_nat_ipv[46]_fn().
> > - * range is only used for new, uninitialized NAT state.
> > - * Returns either NF_ACCEPT or NF_DROP.
> > - */
> > -static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
> > -                           enum ip_conntrack_info ctinfo,
> > -                           const struct nf_nat_range2 *range,
> > -                           enum nf_nat_manip_type maniptype, struct sw_flow_key *key)
> > -{
> > -     int hooknum, err = NF_ACCEPT;
> > -
> > -     /* See HOOK2MANIP(). */
> > -     if (maniptype == NF_NAT_MANIP_SRC)
> > -             hooknum = NF_INET_LOCAL_IN; /* Source NAT */
> > -     else
> > -             hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
> > -
> > -     switch (ctinfo) {
> > -     case IP_CT_RELATED:
> > -     case IP_CT_RELATED_REPLY:
> > -             if (IS_ENABLED(CONFIG_NF_NAT) &&
> > -                 skb->protocol == htons(ETH_P_IP) &&
> > -                 ip_hdr(skb)->protocol == IPPROTO_ICMP) {
> > -                     if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
> > -                                                        hooknum))
> > -                             err = NF_DROP;
> > -                     goto out;
> > -             } else if (IS_ENABLED(CONFIG_IPV6) &&
> > -                        skb->protocol == htons(ETH_P_IPV6)) {
> > -                     __be16 frag_off;
> > -                     u8 nexthdr = ipv6_hdr(skb)->nexthdr;
> > -                     int hdrlen = ipv6_skip_exthdr(skb,
> > -                                                   sizeof(struct ipv6hdr),
> > -                                                   &nexthdr, &frag_off);
> > -
> > -                     if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
> > -                             if (!nf_nat_icmpv6_reply_translation(skb, ct,
> > -                                                                  ctinfo,
> > -                                                                  hooknum,
> > -                                                                  hdrlen))
> > -                                     err = NF_DROP;
> > -                             goto out;
> > -                     }
> > -             }
> > -             /* Non-ICMP, fall thru to initialize if needed. */
> > -             fallthrough;
> > -     case IP_CT_NEW:
> > -             /* Seen it before?  This can happen for loopback, retrans,
> > -              * or local packets.
> > -              */
> > -             if (!nf_nat_initialized(ct, maniptype)) {
> > -                     /* Initialize according to the NAT action. */
> > -                     err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
> > -                             /* Action is set up to establish a new
> > -                              * mapping.
> > -                              */
> > -                             ? nf_nat_setup_info(ct, range, maniptype)
> > -                             : nf_nat_alloc_null_binding(ct, hooknum);
> > -                     if (err != NF_ACCEPT)
> > -                             goto out;
> > -             }
> > -             break;
> > -
> > -     case IP_CT_ESTABLISHED:
> > -     case IP_CT_ESTABLISHED_REPLY:
> > -             break;
> > -
> > -     default:
> > -             err = NF_DROP;
> > -             goto out;
> > -     }
> > -
> > -     err = nf_nat_packet(ct, ctinfo, hooknum, skb);
> > -out:
> > -     /* Update the flow key if NAT successful. */
> > -     if (err == NF_ACCEPT)
> > -             ovs_nat_update_key(key, skb, maniptype);
> > -
> > -     return err;
> > -}
> > -
> >  /* Returns NF_DROP if the packet should be dropped, NF_ACCEPT otherwise. */
> >  static int ovs_ct_nat(struct net *net, struct sw_flow_key *key,
> >                     const struct ovs_conntrack_info *info,
> >                     struct sk_buff *skb, struct nf_conn *ct,
> >                     enum ip_conntrack_info ctinfo)
> >  {
> > -     enum nf_nat_manip_type maniptype;
> > -     int err;
> > +     int err, action = 0;
> >
> >       if (!(info->nat & OVS_CT_NAT))
> >               return NF_ACCEPT;
> > +     if (info->nat & OVS_CT_SRC_NAT)
> > +             action |= (1 << NF_NAT_MANIP_SRC);
> > +     if (info->nat & OVS_CT_DST_NAT)
> > +             action |= (1 << NF_NAT_MANIP_DST);
> >
> > -     /* Add NAT extension if not confirmed yet. */
> > -     if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
> > -             return NF_ACCEPT;   /* Can't NAT. */
> > +     err = nf_ct_nat(skb, ct, ctinfo, &action, &info->range, info->commit);
> >
> > -     /* Determine NAT type.
> > -      * Check if the NAT type can be deduced from the tracked connection.
> > -      * Make sure new expected connections (IP_CT_RELATED) are NATted only
> > -      * when committing.
> > -      */
> > -     if (ctinfo != IP_CT_NEW && ct->status & IPS_NAT_MASK &&
> > -         (ctinfo != IP_CT_RELATED || info->commit)) {
> > -             /* NAT an established or related connection like before. */
> > -             if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
> > -                     /* This is the REPLY direction for a connection
> > -                      * for which NAT was applied in the forward
> > -                      * direction.  Do the reverse NAT.
> > -                      */
> > -                     maniptype = ct->status & IPS_SRC_NAT
> > -                             ? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
> > -             else
> > -                     maniptype = ct->status & IPS_SRC_NAT
> > -                             ? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
> > -     } else if (info->nat & OVS_CT_SRC_NAT) {
> > -             maniptype = NF_NAT_MANIP_SRC;
> > -     } else if (info->nat & OVS_CT_DST_NAT) {
> > -             maniptype = NF_NAT_MANIP_DST;
> > -     } else {
> > -             return NF_ACCEPT; /* Connection is not NATed. */
> > -     }
> > -     err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range, maniptype, key);
> > -
> > -     if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
> > -             if (ct->status & IPS_SRC_NAT) {
> > -                     if (maniptype == NF_NAT_MANIP_SRC)
> > -                             maniptype = NF_NAT_MANIP_DST;
> > -                     else
> > -                             maniptype = NF_NAT_MANIP_SRC;
> > -
> > -                     err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range,
> > -                                              maniptype, key);
> > -             } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
> > -                     err = ovs_ct_nat_execute(skb, ct, ctinfo, NULL,
> > -                                              NF_NAT_MANIP_SRC, key);
> > -             }
> > -     }
> > +     if (action & (1 << NF_NAT_MANIP_SRC))
> > +             ovs_nat_update_key(key, skb, NF_NAT_MANIP_SRC);
> > +     if (action & (1 << NF_NAT_MANIP_DST))
> > +             ovs_nat_update_key(key, skb, NF_NAT_MANIP_DST);
>
> I haven't tested this yet with tuple collision, but if I'm reading the
> code right, info->nat will only be from the parsed actions (which will
> have a single src/dst manipulation).  But, this code was previously
> always updated based on ct->status information, which isn't quite the
> same.  Maybe I'm missing something.  I plan to run this testing
> tomorrow, but maybe you have already tested for it?
Hi, Aaron,

Note "&action" is passed into nf_ct_nat() then nf_ct_nat_execute(), and
its value might be changed in nf_ct_nat_execute():

+       err = nf_nat_packet(ct, ctinfo, hooknum, skb);
+       if (err == NF_ACCEPT)
+               *action |= (1 << maniptype);

so it's possible to have multiple (src and dst) manipulations, and
the code is eventually the same as before.

I tested with:
ovs-ofctl add-flow br-ovs "...nw_dst=7.7.16.3 actions=ct(commit,
nat(dst=7.7.16.2), alg=ftp), normal"
ovs-ofctl add-flow br-ovs "...nw_dst=7.7.16.4 actions=ct(commit,
nat(dst=7.7.16.2), alg=ftp), normal"
and could see actions=3 (NF_NAT_MANIP_SRC|DST).

It will be good if you have more tests to be done.

Thanks.

>
> >       return err;
> >  }
> > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> > index c7782c9a6ab6..0c410220239f 100644
> > --- a/net/sched/act_ct.c
> > +++ b/net/sched/act_ct.c
> > @@ -863,90 +863,6 @@ static void tcf_ct_params_free_rcu(struct rcu_head *head)
> >       tcf_ct_params_free(params);
> >  }
> >
> > -#if IS_ENABLED(CONFIG_NF_NAT)
> > -/* Modelled after nf_nat_ipv[46]_fn().
> > - * range is only used for new, uninitialized NAT state.
> > - * Returns either NF_ACCEPT or NF_DROP.
> > - */
> > -static int ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
> > -                       enum ip_conntrack_info ctinfo,
> > -                       const struct nf_nat_range2 *range,
> > -                       enum nf_nat_manip_type maniptype)
> > -{
> > -     __be16 proto = skb_protocol(skb, true);
> > -     int hooknum, err = NF_ACCEPT;
> > -
> > -     /* See HOOK2MANIP(). */
> > -     if (maniptype == NF_NAT_MANIP_SRC)
> > -             hooknum = NF_INET_LOCAL_IN; /* Source NAT */
> > -     else
> > -             hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
> > -
> > -     switch (ctinfo) {
> > -     case IP_CT_RELATED:
> > -     case IP_CT_RELATED_REPLY:
> > -             if (proto == htons(ETH_P_IP) &&
> > -                 ip_hdr(skb)->protocol == IPPROTO_ICMP) {
> > -                     if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
> > -                                                        hooknum))
> > -                             err = NF_DROP;
> > -                     goto out;
> > -             } else if (IS_ENABLED(CONFIG_IPV6) && proto == htons(ETH_P_IPV6)) {
> > -                     __be16 frag_off;
> > -                     u8 nexthdr = ipv6_hdr(skb)->nexthdr;
> > -                     int hdrlen = ipv6_skip_exthdr(skb,
> > -                                                   sizeof(struct ipv6hdr),
> > -                                                   &nexthdr, &frag_off);
> > -
> > -                     if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
> > -                             if (!nf_nat_icmpv6_reply_translation(skb, ct,
> > -                                                                  ctinfo,
> > -                                                                  hooknum,
> > -                                                                  hdrlen))
> > -                                     err = NF_DROP;
> > -                             goto out;
> > -                     }
> > -             }
> > -             /* Non-ICMP, fall thru to initialize if needed. */
> > -             fallthrough;
> > -     case IP_CT_NEW:
> > -             /* Seen it before?  This can happen for loopback, retrans,
> > -              * or local packets.
> > -              */
> > -             if (!nf_nat_initialized(ct, maniptype)) {
> > -                     /* Initialize according to the NAT action. */
> > -                     err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
> > -                             /* Action is set up to establish a new
> > -                              * mapping.
> > -                              */
> > -                             ? nf_nat_setup_info(ct, range, maniptype)
> > -                             : nf_nat_alloc_null_binding(ct, hooknum);
> > -                     if (err != NF_ACCEPT)
> > -                             goto out;
> > -             }
> > -             break;
> > -
> > -     case IP_CT_ESTABLISHED:
> > -     case IP_CT_ESTABLISHED_REPLY:
> > -             break;
> > -
> > -     default:
> > -             err = NF_DROP;
> > -             goto out;
> > -     }
> > -
> > -     err = nf_nat_packet(ct, ctinfo, hooknum, skb);
> > -out:
> > -     if (err == NF_ACCEPT) {
> > -             if (maniptype == NF_NAT_MANIP_SRC)
> > -                     tc_skb_cb(skb)->post_ct_snat = 1;
> > -             if (maniptype == NF_NAT_MANIP_DST)
> > -                     tc_skb_cb(skb)->post_ct_dnat = 1;
> > -     }
> > -     return err;
> > -}
> > -#endif /* CONFIG_NF_NAT */
> > -
> >  static void tcf_ct_act_set_mark(struct nf_conn *ct, u32 mark, u32 mask)
> >  {
> >  #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
> > @@ -986,52 +902,22 @@ static int tcf_ct_act_nat(struct sk_buff *skb,
> >                         bool commit)
> >  {
> >  #if IS_ENABLED(CONFIG_NF_NAT)
> > -     int err;
> > -     enum nf_nat_manip_type maniptype;
> > +     int err, action = 0;
> >
> >       if (!(ct_action & TCA_CT_ACT_NAT))
> >               return NF_ACCEPT;
> > +     if (ct_action & TCA_CT_ACT_NAT_SRC)
> > +             action |= (1 << NF_NAT_MANIP_SRC);
> > +     if (ct_action & TCA_CT_ACT_NAT_DST)
> > +             action |= (1 << NF_NAT_MANIP_DST);
> >
> > -     /* Add NAT extension if not confirmed yet. */
> > -     if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
> > -             return NF_ACCEPT;   /* Can't NAT. */
> > -
> > -     if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
> > -         (ctinfo != IP_CT_RELATED || commit)) {
> > -             /* NAT an established or related connection like before. */
> > -             if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
> > -                     /* This is the REPLY direction for a connection
> > -                      * for which NAT was applied in the forward
> > -                      * direction.  Do the reverse NAT.
> > -                      */
> > -                     maniptype = ct->status & IPS_SRC_NAT
> > -                             ? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
> > -             else
> > -                     maniptype = ct->status & IPS_SRC_NAT
> > -                             ? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
> > -     } else if (ct_action & TCA_CT_ACT_NAT_SRC) {
> > -             maniptype = NF_NAT_MANIP_SRC;
> > -     } else if (ct_action & TCA_CT_ACT_NAT_DST) {
> > -             maniptype = NF_NAT_MANIP_DST;
> > -     } else {
> > -             return NF_ACCEPT;
> > -     }
> > +     err = nf_ct_nat(skb, ct, ctinfo, &action, range, commit);
> > +
> > +     if (action & (1 << NF_NAT_MANIP_SRC))
> > +             tc_skb_cb(skb)->post_ct_snat = 1;
> > +     if (action & (1 << NF_NAT_MANIP_DST))
> > +             tc_skb_cb(skb)->post_ct_dnat = 1;
> >
> > -     err = ct_nat_execute(skb, ct, ctinfo, range, maniptype);
> > -     if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
> > -             if (ct->status & IPS_SRC_NAT) {
> > -                     if (maniptype == NF_NAT_MANIP_SRC)
> > -                             maniptype = NF_NAT_MANIP_DST;
> > -                     else
> > -                             maniptype = NF_NAT_MANIP_SRC;
> > -
> > -                     err = ct_nat_execute(skb, ct, ctinfo, range,
> > -                                          maniptype);
> > -             } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
> > -                     err = ct_nat_execute(skb, ct, ctinfo, NULL,
> > -                                          NF_NAT_MANIP_SRC);
> > -             }
> > -     }
> >       return err;
> >  #else
> >       return NF_ACCEPT;
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc
  2022-11-16 21:54   ` Pablo Neira Ayuso
@ 2022-11-17  0:51     ` Xin Long
  2022-11-17 10:57       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 16+ messages in thread
From: Xin Long @ 2022-11-17  0:51 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni,
	Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti,
	Eelco Chaudron, Aaron Conole

On Wed, Nov 16, 2022 at 4:54 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> On Tue, Nov 15, 2022 at 10:50:57AM -0500, Xin Long wrote:
> > diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
> > index e29e4ccb5c5a..1c72b8caa24e 100644
> > --- a/net/netfilter/nf_nat_core.c
> > +++ b/net/netfilter/nf_nat_core.c
> > @@ -784,6 +784,137 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
> >  }
> >  EXPORT_SYMBOL_GPL(nf_nat_inet_fn);
> >
> > +/* Modelled after nf_nat_ipv[46]_fn().
> > + * range is only used for new, uninitialized NAT state.
> > + * Returns either NF_ACCEPT or NF_DROP.
> > + */
> > +static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
> > +                          enum ip_conntrack_info ctinfo, int *action,
> > +                          const struct nf_nat_range2 *range,
> > +                          enum nf_nat_manip_type maniptype)
> > +{
> > +     __be16 proto = skb_protocol(skb, true);
> > +     int hooknum, err = NF_ACCEPT;
> > +
> > +     /* See HOOK2MANIP(). */
> > +     if (maniptype == NF_NAT_MANIP_SRC)
> > +             hooknum = NF_INET_LOCAL_IN; /* Source NAT */
> > +     else
> > +             hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
> > +
> > +     switch (ctinfo) {
> > +     case IP_CT_RELATED:
> > +     case IP_CT_RELATED_REPLY:
> > +             if (proto == htons(ETH_P_IP) &&
> > +                 ip_hdr(skb)->protocol == IPPROTO_ICMP) {
> > +                     if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
> > +                                                        hooknum))
> > +                             err = NF_DROP;
> > +                     goto out;
> > +             } else if (IS_ENABLED(CONFIG_IPV6) && proto == htons(ETH_P_IPV6)) {
> > +                     __be16 frag_off;
> > +                     u8 nexthdr = ipv6_hdr(skb)->nexthdr;
> > +                     int hdrlen = ipv6_skip_exthdr(skb,
> > +                                                   sizeof(struct ipv6hdr),
> > +                                                   &nexthdr, &frag_off);
> > +
> > +                     if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
> > +                             if (!nf_nat_icmpv6_reply_translation(skb, ct,
> > +                                                                  ctinfo,
> > +                                                                  hooknum,
> > +                                                                  hdrlen))
> > +                                     err = NF_DROP;
> > +                             goto out;
> > +                     }
> > +             }
> > +             /* Non-ICMP, fall thru to initialize if needed. */
> > +             fallthrough;
> > +     case IP_CT_NEW:
> > +             /* Seen it before?  This can happen for loopback, retrans,
> > +              * or local packets.
> > +              */
> > +             if (!nf_nat_initialized(ct, maniptype)) {
> > +                     /* Initialize according to the NAT action. */
> > +                     err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
> > +                             /* Action is set up to establish a new
> > +                              * mapping.
> > +                              */
> > +                             ? nf_nat_setup_info(ct, range, maniptype)
> > +                             : nf_nat_alloc_null_binding(ct, hooknum);
> > +                     if (err != NF_ACCEPT)
> > +                             goto out;
> > +             }
> > +             break;
> > +
> > +     case IP_CT_ESTABLISHED:
> > +     case IP_CT_ESTABLISHED_REPLY:
> > +             break;
> > +
> > +     default:
> > +             err = NF_DROP;
> > +             goto out;
> > +     }
> > +
> > +     err = nf_nat_packet(ct, ctinfo, hooknum, skb);
> > +     if (err == NF_ACCEPT)
> > +             *action |= (1 << maniptype);
> > +out:
> > +     return err;
> > +}
> > +
> > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
> > +           enum ip_conntrack_info ctinfo, int *action,
> > +           const struct nf_nat_range2 *range, bool commit)
> > +{
> > +     enum nf_nat_manip_type maniptype;
> > +     int err, ct_action = *action;
> > +
> > +     *action = 0;
> > +
> > +     /* Add NAT extension if not confirmed yet. */
> > +     if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
> > +             return NF_ACCEPT;   /* Can't NAT. */
> > +
> > +     if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
> > +         (ctinfo != IP_CT_RELATED || commit)) {
> > +             /* NAT an established or related connection like before. */
> > +             if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
> > +                     /* This is the REPLY direction for a connection
> > +                      * for which NAT was applied in the forward
> > +                      * direction.  Do the reverse NAT.
> > +                      */
> > +                     maniptype = ct->status & IPS_SRC_NAT
> > +                             ? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
> > +             else
> > +                     maniptype = ct->status & IPS_SRC_NAT
> > +                             ? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
> > +     } else if (ct_action & (1 << NF_NAT_MANIP_SRC)) {
> > +             maniptype = NF_NAT_MANIP_SRC;
> > +     } else if (ct_action & (1 << NF_NAT_MANIP_DST)) {
> > +             maniptype = NF_NAT_MANIP_DST;
> > +     } else {
> > +             return NF_ACCEPT;
> > +     }
> > +
> > +     err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, maniptype);
> > +     if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
> > +             if (ct->status & IPS_SRC_NAT) {
> > +                     if (maniptype == NF_NAT_MANIP_SRC)
> > +                             maniptype = NF_NAT_MANIP_DST;
> > +                     else
> > +                             maniptype = NF_NAT_MANIP_SRC;
> > +
> > +                     err = nf_ct_nat_execute(skb, ct, ctinfo, action, range,
> > +                                             maniptype);
> > +             } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
> > +                     err = nf_ct_nat_execute(skb, ct, ctinfo, action, NULL,
> > +                                             NF_NAT_MANIP_SRC);
> > +             }
> > +     }
> > +     return err;
> > +}
> > +EXPORT_SYMBOL_GPL(nf_ct_nat);
>
> I'd suggest you move this code to nf_nat_ovs.c or such so we remember
> these symbols are used by act_ct.c and ovs.
Good idea, do you think we should also create nf_conntrack_ovs.c
to have nf_ct_helper() and nf_ct_add_helper()?
which were added by:

https://lore.kernel.org/netdev/20221101150031.a6rtrgzwfd7kzknn@t14s.localdomain/T/

thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc
  2022-11-17  0:51     ` Xin Long
@ 2022-11-17 10:57       ` Pablo Neira Ayuso
  2022-11-17 15:10         ` Xin Long
  0 siblings, 1 reply; 16+ messages in thread
From: Pablo Neira Ayuso @ 2022-11-17 10:57 UTC (permalink / raw)
  To: Xin Long
  Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni,
	Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti,
	Eelco Chaudron, Aaron Conole

On Wed, Nov 16, 2022 at 07:51:40PM -0500, Xin Long wrote:
> On Wed, Nov 16, 2022 at 4:54 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Tue, Nov 15, 2022 at 10:50:57AM -0500, Xin Long wrote:
[...]
> > I'd suggest you move this code to nf_nat_ovs.c or such so we remember
> > these symbols are used by act_ct.c and ovs.
>
> Good idea, do you think we should also create nf_conntrack_ovs.c
> to have nf_ct_helper() and nf_ct_add_helper()?
> which were added by:
> 
> https://lore.kernel.org/netdev/20221101150031.a6rtrgzwfd7kzknn@t14s.localdomain/T/

If it is used by ovs infra, I would suggest to move there too.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc
  2022-11-17 10:57       ` Pablo Neira Ayuso
@ 2022-11-17 15:10         ` Xin Long
  0 siblings, 0 replies; 16+ messages in thread
From: Xin Long @ 2022-11-17 15:10 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni,
	Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti,
	Eelco Chaudron, Aaron Conole

On Thu, Nov 17, 2022 at 5:57 AM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> On Wed, Nov 16, 2022 at 07:51:40PM -0500, Xin Long wrote:
> > On Wed, Nov 16, 2022 at 4:54 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > On Tue, Nov 15, 2022 at 10:50:57AM -0500, Xin Long wrote:
> [...]
> > > I'd suggest you move this code to nf_nat_ovs.c or such so we remember
> > > these symbols are used by act_ct.c and ovs.
> >
> > Good idea, do you think we should also create nf_conntrack_ovs.c
> > to have nf_ct_helper() and nf_ct_add_helper()?
> > which were added by:
> >
> > https://lore.kernel.org/netdev/20221101150031.a6rtrgzwfd7kzknn@t14s.localdomain/T/
>
> If it is used by ovs infra, I would suggest to move there too.
OK, I will create only "nf_conntrack_ovs.c" to have the nat functions
in this patch.
and post another patch to move nf_ct_helper() and nf_ct_add_helper() there, too.

I think one file "nf_conntrack_ovs.c" should be enough to include all
these functions
used by ovs conntrack infra.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc
  2022-11-17  0:36     ` Xin Long
@ 2022-11-19 16:22       ` Aaron Conole
  0 siblings, 0 replies; 16+ messages in thread
From: Aaron Conole @ 2022-11-19 16:22 UTC (permalink / raw)
  To: Xin Long
  Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni,
	Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Pablo Neira Ayuso, Florian Westphal, Marcelo Ricardo Leitner,
	Davide Caratti, Eelco Chaudron

Xin Long <lucien.xin@gmail.com> writes:

> On Wed, Nov 16, 2022 at 4:05 PM Aaron Conole <aconole@redhat.com> wrote:
>>
>> Hi Xin,
>>
>> Xin Long <lucien.xin@gmail.com> writes:
>>
>> > There are two nat functions are nearly the same in both OVS and
>> > TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().
>> >
>> > This patch is to move them to netfilter nf_nat_core and export
>> > nf_ct_nat() so that it can be shared by both OVS and TC, and
>> > keep the nat (type) check and nat flag update in OVS and TC's
>> > own place, as these parts are different between OVS and TC.
>> >
>> > Note that in OVS nat function it was using skb->protocol to get
>> > the proto as it already skips vlans in key_extract(), while it
>> > doesn't in TC, and TC has to call skb_protocol() to get proto.
>> > So in nf_ct_nat_execute(), we keep using skb_protocol() which
>> > works for both OVS and TC.
>> >
>> > Signed-off-by: Xin Long <lucien.xin@gmail.com>
>> > ---
>> >  include/net/netfilter/nf_nat.h |   4 +
>> >  net/netfilter/nf_nat_core.c    | 131 +++++++++++++++++++++++++++++++
>> >  net/openvswitch/conntrack.c    | 137 +++------------------------------
>> >  net/sched/act_ct.c             | 136 +++-----------------------------
>> >  4 files changed, 156 insertions(+), 252 deletions(-)
>> >
>> > diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
>> > index e9eb01e99d2f..9877f064548a 100644
>> > --- a/include/net/netfilter/nf_nat.h
>> > +++ b/include/net/netfilter/nf_nat.h
>> > @@ -104,6 +104,10 @@ unsigned int
>> >  nf_nat_inet_fn(void *priv, struct sk_buff *skb,
>> >              const struct nf_hook_state *state);
>> >
>> > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
>> > +           enum ip_conntrack_info ctinfo, int *action,
>> > +           const struct nf_nat_range2 *range, bool commit);
>> > +
>> >  static inline int nf_nat_initialized(const struct nf_conn *ct,
>> >                                    enum nf_nat_manip_type manip)
>> >  {
>> > diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
>> > index e29e4ccb5c5a..1c72b8caa24e 100644
>> > --- a/net/netfilter/nf_nat_core.c
>> > +++ b/net/netfilter/nf_nat_core.c
>> > @@ -784,6 +784,137 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
>> >  }
>> >  EXPORT_SYMBOL_GPL(nf_nat_inet_fn);
>> >
>> > +/* Modelled after nf_nat_ipv[46]_fn().
>> > + * range is only used for new, uninitialized NAT state.
>> > + * Returns either NF_ACCEPT or NF_DROP.
>> > + */
>> > +static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
>> > +                          enum ip_conntrack_info ctinfo, int *action,
>> > +                          const struct nf_nat_range2 *range,
>> > +                          enum nf_nat_manip_type maniptype)
>> > +{
>> > +     __be16 proto = skb_protocol(skb, true);
>> > +     int hooknum, err = NF_ACCEPT;
>> > +
>> > +     /* See HOOK2MANIP(). */
>> > +     if (maniptype == NF_NAT_MANIP_SRC)
>> > +             hooknum = NF_INET_LOCAL_IN; /* Source NAT */
>> > +     else
>> > +             hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
>> > +
>> > +     switch (ctinfo) {
>> > +     case IP_CT_RELATED:
>> > +     case IP_CT_RELATED_REPLY:
>> > +             if (proto == htons(ETH_P_IP) &&
>> > +                 ip_hdr(skb)->protocol == IPPROTO_ICMP) {
>> > +                     if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
>> > +                                                        hooknum))
>> > +                             err = NF_DROP;
>> > +                     goto out;
>> > +             } else if (IS_ENABLED(CONFIG_IPV6) && proto == htons(ETH_P_IPV6)) {
>> > +                     __be16 frag_off;
>> > +                     u8 nexthdr = ipv6_hdr(skb)->nexthdr;
>> > +                     int hdrlen = ipv6_skip_exthdr(skb,
>> > +                                                   sizeof(struct ipv6hdr),
>> > +                                                   &nexthdr, &frag_off);
>> > +
>> > +                     if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
>> > +                             if (!nf_nat_icmpv6_reply_translation(skb, ct,
>> > +                                                                  ctinfo,
>> > +                                                                  hooknum,
>> > +                                                                  hdrlen))
>> > +                                     err = NF_DROP;
>> > +                             goto out;
>> > +                     }
>> > +             }
>> > +             /* Non-ICMP, fall thru to initialize if needed. */
>> > +             fallthrough;
>> > +     case IP_CT_NEW:
>> > +             /* Seen it before?  This can happen for loopback, retrans,
>> > +              * or local packets.
>> > +              */
>> > +             if (!nf_nat_initialized(ct, maniptype)) {
>> > +                     /* Initialize according to the NAT action. */
>> > +                     err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
>> > +                             /* Action is set up to establish a new
>> > +                              * mapping.
>> > +                              */
>> > +                             ? nf_nat_setup_info(ct, range, maniptype)
>> > +                             : nf_nat_alloc_null_binding(ct, hooknum);
>> > +                     if (err != NF_ACCEPT)
>> > +                             goto out;
>> > +             }
>> > +             break;
>> > +
>> > +     case IP_CT_ESTABLISHED:
>> > +     case IP_CT_ESTABLISHED_REPLY:
>> > +             break;
>> > +
>> > +     default:
>> > +             err = NF_DROP;
>> > +             goto out;
>> > +     }
>> > +
>> > +     err = nf_nat_packet(ct, ctinfo, hooknum, skb);
>> > +     if (err == NF_ACCEPT)
>> > +             *action |= (1 << maniptype);
>> > +out:
>> > +     return err;
>> > +}
>> > +
>> > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
>> > +           enum ip_conntrack_info ctinfo, int *action,
>> > +           const struct nf_nat_range2 *range, bool commit)
>> > +{
>> > +     enum nf_nat_manip_type maniptype;
>> > +     int err, ct_action = *action;
>> > +
>> > +     *action = 0;
>> > +
>> > +     /* Add NAT extension if not confirmed yet. */
>> > +     if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
>> > +             return NF_ACCEPT;   /* Can't NAT. */
>> > +
>> > +     if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
>> > +         (ctinfo != IP_CT_RELATED || commit)) {
>> > +             /* NAT an established or related connection like before. */
>> > +             if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
>> > +                     /* This is the REPLY direction for a connection
>> > +                      * for which NAT was applied in the forward
>> > +                      * direction.  Do the reverse NAT.
>> > +                      */
>> > +                     maniptype = ct->status & IPS_SRC_NAT
>> > +                             ? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
>> > +             else
>> > +                     maniptype = ct->status & IPS_SRC_NAT
>> > +                             ? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
>> > +     } else if (ct_action & (1 << NF_NAT_MANIP_SRC)) {
>> > +             maniptype = NF_NAT_MANIP_SRC;
>> > +     } else if (ct_action & (1 << NF_NAT_MANIP_DST)) {
>> > +             maniptype = NF_NAT_MANIP_DST;
>> > +     } else {
>> > +             return NF_ACCEPT;
>> > +     }
>> > +
>> > +     err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, maniptype);
>> > +     if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
>> > +             if (ct->status & IPS_SRC_NAT) {
>> > +                     if (maniptype == NF_NAT_MANIP_SRC)
>> > +                             maniptype = NF_NAT_MANIP_DST;
>> > +                     else
>> > +                             maniptype = NF_NAT_MANIP_SRC;
>> > +
>> > +                     err = nf_ct_nat_execute(skb, ct, ctinfo, action, range,
>> > +                                             maniptype);
>> > +             } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
>> > +                     err = nf_ct_nat_execute(skb, ct, ctinfo, action, NULL,
>> > +                                             NF_NAT_MANIP_SRC);
>> > +             }
>> > +     }
>> > +     return err;
>> > +}
>> > +EXPORT_SYMBOL_GPL(nf_ct_nat);
>> > +
>> >  struct nf_nat_proto_clean {
>> >       u8      l3proto;
>> >       u8      l4proto;
>> > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>> > index cc643a556ea1..d03c75165663 100644
>> > --- a/net/openvswitch/conntrack.c
>> > +++ b/net/openvswitch/conntrack.c
>> > @@ -726,144 +726,27 @@ static void ovs_nat_update_key(struct sw_flow_key *key,
>> >       }
>> >  }
>> >
>> > -/* Modelled after nf_nat_ipv[46]_fn().
>> > - * range is only used for new, uninitialized NAT state.
>> > - * Returns either NF_ACCEPT or NF_DROP.
>> > - */
>> > -static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
>> > -                           enum ip_conntrack_info ctinfo,
>> > -                           const struct nf_nat_range2 *range,
>> > -                           enum nf_nat_manip_type maniptype, struct sw_flow_key *key)
>> > -{
>> > -     int hooknum, err = NF_ACCEPT;
>> > -
>> > -     /* See HOOK2MANIP(). */
>> > -     if (maniptype == NF_NAT_MANIP_SRC)
>> > -             hooknum = NF_INET_LOCAL_IN; /* Source NAT */
>> > -     else
>> > -             hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
>> > -
>> > -     switch (ctinfo) {
>> > -     case IP_CT_RELATED:
>> > -     case IP_CT_RELATED_REPLY:
>> > -             if (IS_ENABLED(CONFIG_NF_NAT) &&
>> > -                 skb->protocol == htons(ETH_P_IP) &&
>> > -                 ip_hdr(skb)->protocol == IPPROTO_ICMP) {
>> > -                     if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
>> > -                                                        hooknum))
>> > -                             err = NF_DROP;
>> > -                     goto out;
>> > -             } else if (IS_ENABLED(CONFIG_IPV6) &&
>> > -                        skb->protocol == htons(ETH_P_IPV6)) {
>> > -                     __be16 frag_off;
>> > -                     u8 nexthdr = ipv6_hdr(skb)->nexthdr;
>> > -                     int hdrlen = ipv6_skip_exthdr(skb,
>> > -                                                   sizeof(struct ipv6hdr),
>> > -                                                   &nexthdr, &frag_off);
>> > -
>> > -                     if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
>> > -                             if (!nf_nat_icmpv6_reply_translation(skb, ct,
>> > -                                                                  ctinfo,
>> > -                                                                  hooknum,
>> > -                                                                  hdrlen))
>> > -                                     err = NF_DROP;
>> > -                             goto out;
>> > -                     }
>> > -             }
>> > -             /* Non-ICMP, fall thru to initialize if needed. */
>> > -             fallthrough;
>> > -     case IP_CT_NEW:
>> > -             /* Seen it before?  This can happen for loopback, retrans,
>> > -              * or local packets.
>> > -              */
>> > -             if (!nf_nat_initialized(ct, maniptype)) {
>> > -                     /* Initialize according to the NAT action. */
>> > -                     err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
>> > -                             /* Action is set up to establish a new
>> > -                              * mapping.
>> > -                              */
>> > -                             ? nf_nat_setup_info(ct, range, maniptype)
>> > -                             : nf_nat_alloc_null_binding(ct, hooknum);
>> > -                     if (err != NF_ACCEPT)
>> > -                             goto out;
>> > -             }
>> > -             break;
>> > -
>> > -     case IP_CT_ESTABLISHED:
>> > -     case IP_CT_ESTABLISHED_REPLY:
>> > -             break;
>> > -
>> > -     default:
>> > -             err = NF_DROP;
>> > -             goto out;
>> > -     }
>> > -
>> > -     err = nf_nat_packet(ct, ctinfo, hooknum, skb);
>> > -out:
>> > -     /* Update the flow key if NAT successful. */
>> > -     if (err == NF_ACCEPT)
>> > -             ovs_nat_update_key(key, skb, maniptype);
>> > -
>> > -     return err;
>> > -}
>> > -
>> >  /* Returns NF_DROP if the packet should be dropped, NF_ACCEPT otherwise. */
>> >  static int ovs_ct_nat(struct net *net, struct sw_flow_key *key,
>> >                     const struct ovs_conntrack_info *info,
>> >                     struct sk_buff *skb, struct nf_conn *ct,
>> >                     enum ip_conntrack_info ctinfo)
>> >  {
>> > -     enum nf_nat_manip_type maniptype;
>> > -     int err;
>> > +     int err, action = 0;
>> >
>> >       if (!(info->nat & OVS_CT_NAT))
>> >               return NF_ACCEPT;
>> > +     if (info->nat & OVS_CT_SRC_NAT)
>> > +             action |= (1 << NF_NAT_MANIP_SRC);
>> > +     if (info->nat & OVS_CT_DST_NAT)
>> > +             action |= (1 << NF_NAT_MANIP_DST);
>> >
>> > -     /* Add NAT extension if not confirmed yet. */
>> > -     if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
>> > -             return NF_ACCEPT;   /* Can't NAT. */
>> > +     err = nf_ct_nat(skb, ct, ctinfo, &action, &info->range, info->commit);
>> >
>> > -     /* Determine NAT type.
>> > -      * Check if the NAT type can be deduced from the tracked connection.
>> > -      * Make sure new expected connections (IP_CT_RELATED) are NATted only
>> > -      * when committing.
>> > -      */
>> > -     if (ctinfo != IP_CT_NEW && ct->status & IPS_NAT_MASK &&
>> > -         (ctinfo != IP_CT_RELATED || info->commit)) {
>> > -             /* NAT an established or related connection like before. */
>> > -             if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
>> > -                     /* This is the REPLY direction for a connection
>> > -                      * for which NAT was applied in the forward
>> > -                      * direction.  Do the reverse NAT.
>> > -                      */
>> > -                     maniptype = ct->status & IPS_SRC_NAT
>> > -                             ? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
>> > -             else
>> > -                     maniptype = ct->status & IPS_SRC_NAT
>> > -                             ? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
>> > -     } else if (info->nat & OVS_CT_SRC_NAT) {
>> > -             maniptype = NF_NAT_MANIP_SRC;
>> > -     } else if (info->nat & OVS_CT_DST_NAT) {
>> > -             maniptype = NF_NAT_MANIP_DST;
>> > -     } else {
>> > -             return NF_ACCEPT; /* Connection is not NATed. */
>> > -     }
>> > -     err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range, maniptype, key);
>> > -
>> > -     if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
>> > -             if (ct->status & IPS_SRC_NAT) {
>> > -                     if (maniptype == NF_NAT_MANIP_SRC)
>> > -                             maniptype = NF_NAT_MANIP_DST;
>> > -                     else
>> > -                             maniptype = NF_NAT_MANIP_SRC;
>> > -
>> > -                     err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range,
>> > -                                              maniptype, key);
>> > -             } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
>> > -                     err = ovs_ct_nat_execute(skb, ct, ctinfo, NULL,
>> > -                                              NF_NAT_MANIP_SRC, key);
>> > -             }
>> > -     }
>> > +     if (action & (1 << NF_NAT_MANIP_SRC))
>> > +             ovs_nat_update_key(key, skb, NF_NAT_MANIP_SRC);
>> > +     if (action & (1 << NF_NAT_MANIP_DST))
>> > +             ovs_nat_update_key(key, skb, NF_NAT_MANIP_DST);
>>
>> I haven't tested this yet with tuple collision, but if I'm reading the
>> code right, info->nat will only be from the parsed actions (which will
>> have a single src/dst manipulation).  But, this code was previously
>> always updated based on ct->status information, which isn't quite the
>> same.  Maybe I'm missing something.  I plan to run this testing
>> tomorrow, but maybe you have already tested for it?
> Hi, Aaron,
>
> Note "&action" is passed into nf_ct_nat() then nf_ct_nat_execute(), and
> its value might be changed in nf_ct_nat_execute():
>
> +       err = nf_nat_packet(ct, ctinfo, hooknum, skb);
> +       if (err == NF_ACCEPT)
> +               *action |= (1 << maniptype);
>
> so it's possible to have multiple (src and dst) manipulations, and
> the code is eventually the same as before.

I missed the assignment from '*action' on the line above '*action = 0'.

> I tested with:
> ovs-ofctl add-flow br-ovs "...nw_dst=7.7.16.3 actions=ct(commit,
> nat(dst=7.7.16.2), alg=ftp), normal"
> ovs-ofctl add-flow br-ovs "...nw_dst=7.7.16.4 actions=ct(commit,
> nat(dst=7.7.16.2), alg=ftp), normal"
> and could see actions=3 (NF_NAT_MANIP_SRC|DST).
>
> It will be good if you have more tests to be done.

Will do - I will also work on adding a tuple collision test in the
selftests after the next round of updates I do.

> Thanks.
>
>>
>> >       return err;
>> >  }
>> > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
>> > index c7782c9a6ab6..0c410220239f 100644
>> > --- a/net/sched/act_ct.c
>> > +++ b/net/sched/act_ct.c
>> > @@ -863,90 +863,6 @@ static void tcf_ct_params_free_rcu(struct rcu_head *head)
>> >       tcf_ct_params_free(params);
>> >  }
>> >
>> > -#if IS_ENABLED(CONFIG_NF_NAT)
>> > -/* Modelled after nf_nat_ipv[46]_fn().
>> > - * range is only used for new, uninitialized NAT state.
>> > - * Returns either NF_ACCEPT or NF_DROP.
>> > - */
>> > -static int ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
>> > -                       enum ip_conntrack_info ctinfo,
>> > -                       const struct nf_nat_range2 *range,
>> > -                       enum nf_nat_manip_type maniptype)
>> > -{
>> > -     __be16 proto = skb_protocol(skb, true);
>> > -     int hooknum, err = NF_ACCEPT;
>> > -
>> > -     /* See HOOK2MANIP(). */
>> > -     if (maniptype == NF_NAT_MANIP_SRC)
>> > -             hooknum = NF_INET_LOCAL_IN; /* Source NAT */
>> > -     else
>> > -             hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
>> > -
>> > -     switch (ctinfo) {
>> > -     case IP_CT_RELATED:
>> > -     case IP_CT_RELATED_REPLY:
>> > -             if (proto == htons(ETH_P_IP) &&
>> > -                 ip_hdr(skb)->protocol == IPPROTO_ICMP) {
>> > -                     if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
>> > -                                                        hooknum))
>> > -                             err = NF_DROP;
>> > -                     goto out;
>> > -             } else if (IS_ENABLED(CONFIG_IPV6) && proto == htons(ETH_P_IPV6)) {
>> > -                     __be16 frag_off;
>> > -                     u8 nexthdr = ipv6_hdr(skb)->nexthdr;
>> > -                     int hdrlen = ipv6_skip_exthdr(skb,
>> > -                                                   sizeof(struct ipv6hdr),
>> > -                                                   &nexthdr, &frag_off);
>> > -
>> > -                     if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
>> > -                             if (!nf_nat_icmpv6_reply_translation(skb, ct,
>> > -                                                                  ctinfo,
>> > -                                                                  hooknum,
>> > -                                                                  hdrlen))
>> > -                                     err = NF_DROP;
>> > -                             goto out;
>> > -                     }
>> > -             }
>> > -             /* Non-ICMP, fall thru to initialize if needed. */
>> > -             fallthrough;
>> > -     case IP_CT_NEW:
>> > -             /* Seen it before?  This can happen for loopback, retrans,
>> > -              * or local packets.
>> > -              */
>> > -             if (!nf_nat_initialized(ct, maniptype)) {
>> > -                     /* Initialize according to the NAT action. */
>> > -                     err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
>> > -                             /* Action is set up to establish a new
>> > -                              * mapping.
>> > -                              */
>> > -                             ? nf_nat_setup_info(ct, range, maniptype)
>> > -                             : nf_nat_alloc_null_binding(ct, hooknum);
>> > -                     if (err != NF_ACCEPT)
>> > -                             goto out;
>> > -             }
>> > -             break;
>> > -
>> > -     case IP_CT_ESTABLISHED:
>> > -     case IP_CT_ESTABLISHED_REPLY:
>> > -             break;
>> > -
>> > -     default:
>> > -             err = NF_DROP;
>> > -             goto out;
>> > -     }
>> > -
>> > -     err = nf_nat_packet(ct, ctinfo, hooknum, skb);
>> > -out:
>> > -     if (err == NF_ACCEPT) {
>> > -             if (maniptype == NF_NAT_MANIP_SRC)
>> > -                     tc_skb_cb(skb)->post_ct_snat = 1;
>> > -             if (maniptype == NF_NAT_MANIP_DST)
>> > -                     tc_skb_cb(skb)->post_ct_dnat = 1;
>> > -     }
>> > -     return err;
>> > -}
>> > -#endif /* CONFIG_NF_NAT */
>> > -
>> >  static void tcf_ct_act_set_mark(struct nf_conn *ct, u32 mark, u32 mask)
>> >  {
>> >  #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
>> > @@ -986,52 +902,22 @@ static int tcf_ct_act_nat(struct sk_buff *skb,
>> >                         bool commit)
>> >  {
>> >  #if IS_ENABLED(CONFIG_NF_NAT)
>> > -     int err;
>> > -     enum nf_nat_manip_type maniptype;
>> > +     int err, action = 0;
>> >
>> >       if (!(ct_action & TCA_CT_ACT_NAT))
>> >               return NF_ACCEPT;
>> > +     if (ct_action & TCA_CT_ACT_NAT_SRC)
>> > +             action |= (1 << NF_NAT_MANIP_SRC);
>> > +     if (ct_action & TCA_CT_ACT_NAT_DST)
>> > +             action |= (1 << NF_NAT_MANIP_DST);
>> >
>> > -     /* Add NAT extension if not confirmed yet. */
>> > -     if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
>> > -             return NF_ACCEPT;   /* Can't NAT. */
>> > -
>> > -     if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
>> > -         (ctinfo != IP_CT_RELATED || commit)) {
>> > -             /* NAT an established or related connection like before. */
>> > -             if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
>> > -                     /* This is the REPLY direction for a connection
>> > -                      * for which NAT was applied in the forward
>> > -                      * direction.  Do the reverse NAT.
>> > -                      */
>> > -                     maniptype = ct->status & IPS_SRC_NAT
>> > -                             ? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
>> > -             else
>> > -                     maniptype = ct->status & IPS_SRC_NAT
>> > -                             ? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
>> > -     } else if (ct_action & TCA_CT_ACT_NAT_SRC) {
>> > -             maniptype = NF_NAT_MANIP_SRC;
>> > -     } else if (ct_action & TCA_CT_ACT_NAT_DST) {
>> > -             maniptype = NF_NAT_MANIP_DST;
>> > -     } else {
>> > -             return NF_ACCEPT;
>> > -     }
>> > +     err = nf_ct_nat(skb, ct, ctinfo, &action, range, commit);
>> > +
>> > +     if (action & (1 << NF_NAT_MANIP_SRC))
>> > +             tc_skb_cb(skb)->post_ct_snat = 1;
>> > +     if (action & (1 << NF_NAT_MANIP_DST))
>> > +             tc_skb_cb(skb)->post_ct_dnat = 1;
>> >
>> > -     err = ct_nat_execute(skb, ct, ctinfo, range, maniptype);
>> > -     if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
>> > -             if (ct->status & IPS_SRC_NAT) {
>> > -                     if (maniptype == NF_NAT_MANIP_SRC)
>> > -                             maniptype = NF_NAT_MANIP_DST;
>> > -                     else
>> > -                             maniptype = NF_NAT_MANIP_SRC;
>> > -
>> > -                     err = ct_nat_execute(skb, ct, ctinfo, range,
>> > -                                          maniptype);
>> > -             } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
>> > -                     err = ct_nat_execute(skb, ct, ctinfo, NULL,
>> > -                                          NF_NAT_MANIP_SRC);
>> > -             }
>> > -     }
>> >       return err;
>> >  #else
>> >       return NF_ACCEPT;
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-11-19 16:24 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-15 15:50 [PATCH net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc Xin Long
2022-11-15 15:50 ` [PATCH net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute Xin Long
2022-11-16 20:55   ` Aaron Conole
2022-11-15 15:50 ` [PATCH net-next 2/5] openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat Xin Long
2022-11-16 20:56   ` Aaron Conole
2022-11-15 15:50 ` [PATCH net-next 3/5] net: sched: return NF_ACCEPT when fails to add nat ext in tcf_ct_act_nat Xin Long
2022-11-15 15:50 ` [PATCH net-next 4/5] net: sched: update the nat flag for icmp error packets in ct_nat_execute Xin Long
2022-11-15 15:50 ` [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc Xin Long
2022-11-16 21:05   ` Aaron Conole
2022-11-17  0:36     ` Xin Long
2022-11-19 16:22       ` Aaron Conole
2022-11-16 21:54   ` Pablo Neira Ayuso
2022-11-17  0:51     ` Xin Long
2022-11-17 10:57       ` Pablo Neira Ayuso
2022-11-17 15:10         ` Xin Long
2022-11-15 19:42 ` [PATCH net-next 0/5] net: eliminate the duplicate code in the ct nat functions of " Saeed Mahameed

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).