All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/7] Allow offloading of UDP NEW connections via act_ct
@ 2023-01-13 16:55 Vlad Buslov
  2023-01-13 16:55 ` [PATCH net-next v2 1/7] net: flow_offload: provision conntrack info in ct_metadata Vlad Buslov
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Vlad Buslov @ 2023-01-13 16:55 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Currently, only bidirectional established connections can be offloaded
via act_ct. This approach allowed many assumptions to be hardcoded into
act_ct, flow_table and the flow_offload intermediate layer code. In
order to enable offloading of unidirectional UDP NEW connections, start
by incrementally changing the following assumptions:

- CT meta action metadata doesn't store ctinfo; either "established" or
  "established reply" is assumed depending on the direction. Explicitly
  provide ctinfo as a new structure field and modify act_ct to set it
  according to the current connection state.

- Fix the flow_table offload fixup algorithm to calculate the flow
  timeout according to the current connection state instead of the
  hardcoded "established" value.

- Add a new flow_table flow flag that designates bidirectional
  connections instead of assuming it and hardcoding hardware offload of
  every flow in both directions.

- Add a new flow_table flow flag that marks the flow for asynchronous
  update. The hardware offload state of such flows is updated by the gc
  task, leveraging the existing flow 'refresh' code.

With all the necessary infrastructure in place, modify act_ct to offload
UDP NEW as a unidirectional connection. Pass reply-direction traffic to
CT and promote the connection to bidirectional when the UDP connection
state changes to "assured". Rely on the refresh mechanism to propagate
the connection state change to supporting drivers.

Note that the early drop algorithm, which is designed to free up space
in the connection tracking table when it becomes full (by randomly
deleting up to 5% of non-established connections), currently ignores
connections marked as "offloaded". With UDP NEW connections now becoming
"offloaded", a malicious user could perform a DoS attack by filling the
table with non-droppable UDP NEW connections, sending just one packet in
a single direction. To prevent such a scenario, change the early drop
algorithm to also consider "offloaded" connections for deletion.

Vlad Buslov (7):
  net: flow_offload: provision conntrack info in ct_metadata
  netfilter: flowtable: fixup UDP timeout depending on ct state
  netfilter: flowtable: allow unidirectional rules
  netfilter: flowtable: allow updating offloaded rules asynchronously
  net/sched: act_ct: set ctinfo in meta action depending on ct state
  net/sched: act_ct: offload UDP NEW connections
  netfilter: nf_conntrack: allow early drop of offloaded UDP conns

 .../ethernet/mellanox/mlx5/core/en/tc_ct.c    |  2 +-
 .../ethernet/netronome/nfp/flower/conntrack.c | 20 +++++++
 include/net/flow_offload.h                    |  2 +
 include/net/netfilter/nf_flow_table.h         |  4 +-
 net/netfilter/nf_conntrack_core.c             | 11 ++--
 net/netfilter/nf_flow_table_core.c            | 25 +++++++--
 net/netfilter/nf_flow_table_offload.c         | 17 ++++--
 net/sched/act_ct.c                            | 56 ++++++++++++++-----
 8 files changed, 104 insertions(+), 33 deletions(-)

-- 
2.38.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH net-next v2 1/7] net: flow_offload: provision conntrack info in ct_metadata
  2023-01-13 16:55 [PATCH net-next v2 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
@ 2023-01-13 16:55 ` Vlad Buslov
  2023-01-17 14:42   ` Pablo Neira Ayuso
  2023-01-17 15:04   ` Marcelo Ricardo Leitner
  2023-01-13 16:55 ` [PATCH net-next v2 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state Vlad Buslov
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Vlad Buslov @ 2023-01-13 16:55 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

In order to offload connections in states other than "established", the
driver offload callbacks need access to the connection's conntrack info.
Extend the flow offload intermediate representation data structure
flow_action_entry->ct_metadata with a new enum ip_conntrack_info field
and fill it in the tcf_ct_flow_table_add_action_meta() callback.

Reject offloading of IP_CT_NEW connections for now by returning an
error in the relevant driver callbacks based on the value of ctinfo.
Support for offloading such connections will need to be added to the
drivers afterwards.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---

Notes:
    Changes V1 -> V2:
    
    - Add missing include that caused compilation errors on certain configs.
    
    - Change naming in nfp driver as suggested by Simon and Baowen.

 .../ethernet/mellanox/mlx5/core/en/tc_ct.c    |  2 +-
 .../ethernet/netronome/nfp/flower/conntrack.c | 20 +++++++++++++++++++
 include/net/flow_offload.h                    |  2 ++
 net/sched/act_ct.c                            |  1 +
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index 313df8232db7..8cad5cf3305d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -1077,7 +1077,7 @@ mlx5_tc_ct_block_flow_offload_add(struct mlx5_ct_ft *ft,
 	int err;
 
 	meta_action = mlx5_tc_ct_get_ct_metadata_action(flow_rule);
-	if (!meta_action)
+	if (!meta_action || meta_action->ct_metadata.ctinfo == IP_CT_NEW)
 		return -EOPNOTSUPP;
 
 	spin_lock_bh(&ct_priv->ht_lock);
diff --git a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
index f693119541d5..f7569584b9d8 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
@@ -1964,6 +1964,23 @@ int nfp_fl_ct_stats(struct flow_cls_offload *flow,
 	return 0;
 }
 
+static bool
+nfp_fl_ct_offload_nft_supported(struct flow_cls_offload *flow)
+{
+	struct flow_rule *flow_rule = flow->rule;
+	struct flow_action *flow_action =
+		&flow_rule->action;
+	struct flow_action_entry *act;
+	int i;
+
+	flow_action_for_each(i, act, flow_action) {
+		if (act->id == FLOW_ACTION_CT_METADATA)
+			return act->ct_metadata.ctinfo != IP_CT_NEW;
+	}
+
+	return false;
+}
+
 static int
 nfp_fl_ct_offload_nft_flow(struct nfp_fl_ct_zone_entry *zt, struct flow_cls_offload *flow)
 {
@@ -1976,6 +1993,9 @@ nfp_fl_ct_offload_nft_flow(struct nfp_fl_ct_zone_entry *zt, struct flow_cls_offl
 	extack = flow->common.extack;
 	switch (flow->command) {
 	case FLOW_CLS_REPLACE:
+		if (!nfp_fl_ct_offload_nft_supported(flow))
+			return -EOPNOTSUPP;
+
 		/* Netfilter can request offload multiple times for the same
 		 * flow - protect against adding duplicates.
 		 */
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 0400a0ac8a29..a6adaffb68fb 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -4,6 +4,7 @@
 #include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/netlink.h>
+#include <linux/netfilter/nf_conntrack_common.h>
 #include <net/flow_dissector.h>
 
 struct flow_match {
@@ -288,6 +289,7 @@ struct flow_action_entry {
 		} ct;
 		struct {
 			unsigned long cookie;
+			enum ip_conntrack_info ctinfo;
 			u32 mark;
 			u32 labels[4];
 			bool orig_dir;
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 0ca2bb8ed026..515577f913a3 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -187,6 +187,7 @@ static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
 	/* aligns with the CT reference on the SKB nf_ct_set */
 	entry->ct_metadata.cookie = (unsigned long)ct | ctinfo;
 	entry->ct_metadata.orig_dir = dir == IP_CT_DIR_ORIGINAL;
+	entry->ct_metadata.ctinfo = ctinfo;
 
 	act_ct_labels = entry->ct_metadata.labels;
 	ct_labels = nf_ct_labels_find(ct);
-- 
2.38.1



* [PATCH net-next v2 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state
  2023-01-13 16:55 [PATCH net-next v2 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
  2023-01-13 16:55 ` [PATCH net-next v2 1/7] net: flow_offload: provision conntrack info in ct_metadata Vlad Buslov
@ 2023-01-13 16:55 ` Vlad Buslov
  2023-01-13 16:55 ` [PATCH net-next v2 3/7] netfilter: flowtable: allow unidirectional rules Vlad Buslov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Vlad Buslov @ 2023-01-13 16:55 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Currently the flow_offload_fixup_ct() function assumes that only replied
UDP connections can be offloaded and hardcodes the UDP_CT_REPLIED
timeout value. To enable UDP NEW connection offload in the following
patches, extract the actual connection state from ct->status and set the
timeout according to it.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---
 net/netfilter/nf_flow_table_core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 81c26a96c30b..04bd0ed4d2ae 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -193,8 +193,11 @@ static void flow_offload_fixup_ct(struct nf_conn *ct)
 		timeout -= tn->offload_timeout;
 	} else if (l4num == IPPROTO_UDP) {
 		struct nf_udp_net *tn = nf_udp_pernet(net);
+		enum udp_conntrack state =
+			test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
+			UDP_CT_REPLIED : UDP_CT_UNREPLIED;
 
-		timeout = tn->timeouts[UDP_CT_REPLIED];
+		timeout = tn->timeouts[state];
 		timeout -= tn->offload_timeout;
 	} else {
 		return;
-- 
2.38.1



* [PATCH net-next v2 3/7] netfilter: flowtable: allow unidirectional rules
  2023-01-13 16:55 [PATCH net-next v2 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
  2023-01-13 16:55 ` [PATCH net-next v2 1/7] net: flow_offload: provision conntrack info in ct_metadata Vlad Buslov
  2023-01-13 16:55 ` [PATCH net-next v2 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state Vlad Buslov
@ 2023-01-13 16:55 ` Vlad Buslov
  2023-01-17 15:15   ` Marcelo Ricardo Leitner
  2023-01-13 16:55 ` [PATCH net-next v2 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously Vlad Buslov
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Vlad Buslov @ 2023-01-13 16:55 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Modify flow table offload to support unidirectional connections by
extending enum nf_flow_flags with a new NF_FLOW_HW_BIDIRECTIONAL flag.
Only offload the reply direction when the flag is set. This
infrastructure change is necessary to support offloading UDP NEW
connections in the original direction in the following patches in the
series.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---
 include/net/netfilter/nf_flow_table.h |  1 +
 net/netfilter/nf_flow_table_offload.c | 12 ++++++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index cd982f4a0f50..88ab98ab41d9 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -164,6 +164,7 @@ enum nf_flow_flags {
 	NF_FLOW_HW_DYING,
 	NF_FLOW_HW_DEAD,
 	NF_FLOW_HW_PENDING,
+	NF_FLOW_HW_BIDIRECTIONAL,
 };
 
 enum flow_offload_type {
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 4d9b99abe37d..8b852f10fab4 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -895,8 +895,9 @@ static int flow_offload_rule_add(struct flow_offload_work *offload,
 
 	ok_count += flow_offload_tuple_add(offload, flow_rule[0],
 					   FLOW_OFFLOAD_DIR_ORIGINAL);
-	ok_count += flow_offload_tuple_add(offload, flow_rule[1],
-					   FLOW_OFFLOAD_DIR_REPLY);
+	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
+		ok_count += flow_offload_tuple_add(offload, flow_rule[1],
+						   FLOW_OFFLOAD_DIR_REPLY);
 	if (ok_count == 0)
 		return -ENOENT;
 
@@ -926,7 +927,8 @@ static void flow_offload_work_del(struct flow_offload_work *offload)
 {
 	clear_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
 	flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_ORIGINAL);
-	flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY);
+	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
+		flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY);
 	set_bit(NF_FLOW_HW_DEAD, &offload->flow->flags);
 }
 
@@ -946,7 +948,9 @@ static void flow_offload_work_stats(struct flow_offload_work *offload)
 	u64 lastused;
 
 	flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_ORIGINAL, &stats[0]);
-	flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_REPLY, &stats[1]);
+	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
+		flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_REPLY,
+					 &stats[1]);
 
 	lastused = max_t(u64, stats[0].lastused, stats[1].lastused);
 	offload->flow->timeout = max_t(u64, offload->flow->timeout,
-- 
2.38.1



* [PATCH net-next v2 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously
  2023-01-13 16:55 [PATCH net-next v2 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
                   ` (2 preceding siblings ...)
  2023-01-13 16:55 ` [PATCH net-next v2 3/7] netfilter: flowtable: allow unidirectional rules Vlad Buslov
@ 2023-01-13 16:55 ` Vlad Buslov
  2023-01-17 15:28   ` Marcelo Ricardo Leitner
  2023-01-13 16:55 ` [PATCH net-next v2 5/7] net/sched: act_ct: set ctinfo in meta action depending on ct state Vlad Buslov
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Vlad Buslov @ 2023-01-13 16:55 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

The following patches in the series need to update a flowtable rule
several times during its lifetime in order to synchronize hardware
offload with the actual ct status. However, reusing the existing
'refresh' logic in act_ct would cause the data path to potentially
schedule a significant number of spurious tasks in the 'add' workqueue,
since it is executed per packet. Instead, introduce a new flow 'update'
flag and use it to schedule an async flow refresh in the flowtable gc,
which is only executed once per gc iteration.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---
 include/net/netfilter/nf_flow_table.h |  3 ++-
 net/netfilter/nf_flow_table_core.c    | 20 +++++++++++++++-----
 net/netfilter/nf_flow_table_offload.c |  5 +++--
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 88ab98ab41d9..e396424e2e68 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -165,6 +165,7 @@ enum nf_flow_flags {
 	NF_FLOW_HW_DEAD,
 	NF_FLOW_HW_PENDING,
 	NF_FLOW_HW_BIDIRECTIONAL,
+	NF_FLOW_HW_UPDATE,
 };
 
 enum flow_offload_type {
@@ -300,7 +301,7 @@ unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 #define MODULE_ALIAS_NF_FLOWTABLE(family)	\
 	MODULE_ALIAS("nf-flowtable-" __stringify(family))
 
-void nf_flow_offload_add(struct nf_flowtable *flowtable,
+bool nf_flow_offload_add(struct nf_flowtable *flowtable,
 			 struct flow_offload *flow);
 void nf_flow_offload_del(struct nf_flowtable *flowtable,
 			 struct flow_offload *flow);
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 04bd0ed4d2ae..5b495e768655 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -316,21 +316,28 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
 }
 EXPORT_SYMBOL_GPL(flow_offload_add);
 
+static bool __flow_offload_refresh(struct nf_flowtable *flow_table,
+				   struct flow_offload *flow)
+{
+	if (likely(!nf_flowtable_hw_offload(flow_table)))
+		return true;
+
+	return nf_flow_offload_add(flow_table, flow);
+}
+
 void flow_offload_refresh(struct nf_flowtable *flow_table,
 			  struct flow_offload *flow)
 {
 	u32 timeout;
 
 	timeout = nf_flowtable_time_stamp + flow_offload_get_timeout(flow);
-	if (timeout - READ_ONCE(flow->timeout) > HZ)
+	if (timeout - READ_ONCE(flow->timeout) > HZ &&
+	    !test_bit(NF_FLOW_HW_UPDATE, &flow->flags))
 		WRITE_ONCE(flow->timeout, timeout);
 	else
 		return;
 
-	if (likely(!nf_flowtable_hw_offload(flow_table)))
-		return;
-
-	nf_flow_offload_add(flow_table, flow);
+	__flow_offload_refresh(flow_table, flow);
 }
 EXPORT_SYMBOL_GPL(flow_offload_refresh);
 
@@ -435,6 +442,9 @@ static void nf_flow_offload_gc_step(struct nf_flowtable *flow_table,
 		} else {
 			flow_offload_del(flow_table, flow);
 		}
+	} else if (test_and_clear_bit(NF_FLOW_HW_UPDATE, &flow->flags)) {
+		if (!__flow_offload_refresh(flow_table, flow))
+			set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
 	} else if (test_bit(NF_FLOW_HW, &flow->flags)) {
 		nf_flow_offload_stats(flow_table, flow);
 	}
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 8b852f10fab4..103b2ca8d123 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -1036,16 +1036,17 @@ nf_flow_offload_work_alloc(struct nf_flowtable *flowtable,
 }
 
 
-void nf_flow_offload_add(struct nf_flowtable *flowtable,
+bool nf_flow_offload_add(struct nf_flowtable *flowtable,
 			 struct flow_offload *flow)
 {
 	struct flow_offload_work *offload;
 
 	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
 	if (!offload)
-		return;
+		return false;
 
 	flow_offload_queue_work(offload);
+	return true;
 }
 
 void nf_flow_offload_del(struct nf_flowtable *flowtable,
-- 
2.38.1



* [PATCH net-next v2 5/7] net/sched: act_ct: set ctinfo in meta action depending on ct state
  2023-01-13 16:55 [PATCH net-next v2 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
                   ` (3 preceding siblings ...)
  2023-01-13 16:55 ` [PATCH net-next v2 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously Vlad Buslov
@ 2023-01-13 16:55 ` Vlad Buslov
  2023-01-13 16:55 ` [PATCH net-next v2 6/7] net/sched: act_ct: offload UDP NEW connections Vlad Buslov
  2023-01-13 16:55 ` [PATCH net-next v2 7/7] netfilter: nf_conntrack: allow early drop of offloaded UDP conns Vlad Buslov
  6 siblings, 0 replies; 25+ messages in thread
From: Vlad Buslov @ 2023-01-13 16:55 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Currently the tcf_ct_flow_table_add_action_meta() function assumes that
only established connections can be offloaded and always sets ctinfo to
either IP_CT_ESTABLISHED or IP_CT_ESTABLISHED_REPLY strictly based on
direction, without checking the actual connection state. To enable UDP
NEW connection offload, set the ctinfo and metadata cookie based on the
ct->status value.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---
 net/sched/act_ct.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 515577f913a3..bfddb462d2bc 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -182,8 +182,11 @@ static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
 #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
 	entry->ct_metadata.mark = READ_ONCE(ct->mark);
 #endif
-	ctinfo = dir == IP_CT_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
-					     IP_CT_ESTABLISHED_REPLY;
+	if (dir == IP_CT_DIR_ORIGINAL)
+		ctinfo = test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
+			IP_CT_ESTABLISHED : IP_CT_NEW;
+	else
+		ctinfo = IP_CT_ESTABLISHED_REPLY;
 	/* aligns with the CT reference on the SKB nf_ct_set */
 	entry->ct_metadata.cookie = (unsigned long)ct | ctinfo;
 	entry->ct_metadata.orig_dir = dir == IP_CT_DIR_ORIGINAL;
-- 
2.38.1



* [PATCH net-next v2 6/7] net/sched: act_ct: offload UDP NEW connections
  2023-01-13 16:55 [PATCH net-next v2 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
                   ` (4 preceding siblings ...)
  2023-01-13 16:55 ` [PATCH net-next v2 5/7] net/sched: act_ct: set ctinfo in meta action depending on ct state Vlad Buslov
@ 2023-01-13 16:55 ` Vlad Buslov
  2023-01-17 15:41   ` Marcelo Ricardo Leitner
  2023-01-13 16:55 ` [PATCH net-next v2 7/7] netfilter: nf_conntrack: allow early drop of offloaded UDP conns Vlad Buslov
  6 siblings, 1 reply; 25+ messages in thread
From: Vlad Buslov @ 2023-01-13 16:55 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

When processing connections, allow offloading UDP connections that don't
have IPS_ASSURED_BIT set as unidirectional. When performing a table
lookup for reply packets, check the current connection status: if a
unidirectional UDP connection has become assured, also promote the
corresponding flow table entry to bidirectional and set the 'update'
bit; otherwise, just set the 'update' bit, since reply-direction traffic
will most likely cause the connection status to become 'established',
which requires updating the offload state.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---
 net/sched/act_ct.c | 48 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index bfddb462d2bc..563cbdd8341c 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -369,7 +369,7 @@ static void tcf_ct_flow_tc_ifidx(struct flow_offload *entry,
 
 static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
 				  struct nf_conn *ct,
-				  bool tcp)
+				  bool tcp, bool bidirectional)
 {
 	struct nf_conn_act_ct_ext *act_ct_ext;
 	struct flow_offload *entry;
@@ -388,6 +388,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
 		ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
 		ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
 	}
+	if (bidirectional)
+		__set_bit(NF_FLOW_HW_BIDIRECTIONAL, &entry->flags);
 
 	act_ct_ext = nf_conn_act_ct_ext_find(ct);
 	if (act_ct_ext) {
@@ -411,26 +413,34 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
 					   struct nf_conn *ct,
 					   enum ip_conntrack_info ctinfo)
 {
-	bool tcp = false;
-
-	if ((ctinfo != IP_CT_ESTABLISHED && ctinfo != IP_CT_ESTABLISHED_REPLY) ||
-	    !test_bit(IPS_ASSURED_BIT, &ct->status))
-		return;
+	bool tcp = false, bidirectional = true;
 
 	switch (nf_ct_protonum(ct)) {
 	case IPPROTO_TCP:
-		tcp = true;
-		if (ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
+		if ((ctinfo != IP_CT_ESTABLISHED &&
+		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
+		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
+		    ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
 			return;
+
+		tcp = true;
 		break;
 	case IPPROTO_UDP:
+		if (!nf_ct_is_confirmed(ct))
+			return;
+		if (!test_bit(IPS_ASSURED_BIT, &ct->status))
+			bidirectional = false;
 		break;
 #ifdef CONFIG_NF_CT_PROTO_GRE
 	case IPPROTO_GRE: {
 		struct nf_conntrack_tuple *tuple;
 
-		if (ct->status & IPS_NAT_MASK)
+		if ((ctinfo != IP_CT_ESTABLISHED &&
+		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
+		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
+		    ct->status & IPS_NAT_MASK)
 			return;
+
 		tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
 		/* No support for GRE v1 */
 		if (tuple->src.u.gre.key || tuple->dst.u.gre.key)
@@ -446,7 +456,7 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
 	    ct->status & IPS_SEQ_ADJUST)
 		return;
 
-	tcf_ct_flow_table_add(ct_ft, ct, tcp);
+	tcf_ct_flow_table_add(ct_ft, ct, tcp, bidirectional);
 }
 
 static bool
@@ -625,13 +635,27 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params *p,
 	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
 	ct = flow->ct;
 
+	if (dir == FLOW_OFFLOAD_DIR_REPLY &&
+	    !test_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags)) {
+		/* Only offload reply direction after connection became
+		 * assured.
+		 */
+		if (test_bit(IPS_ASSURED_BIT, &ct->status))
+			set_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags);
+		set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
+		return false;
+	}
+
 	if (tcph && (unlikely(tcph->fin || tcph->rst))) {
 		flow_offload_teardown(flow);
 		return false;
 	}
 
-	ctinfo = dir == FLOW_OFFLOAD_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
-						    IP_CT_ESTABLISHED_REPLY;
+	if (dir == FLOW_OFFLOAD_DIR_ORIGINAL)
+		ctinfo = test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
+			IP_CT_ESTABLISHED : IP_CT_NEW;
+	else
+		ctinfo = IP_CT_ESTABLISHED_REPLY;
 
 	flow_offload_refresh(nf_ft, flow);
 	nf_conntrack_get(&ct->ct_general);
-- 
2.38.1



* [PATCH net-next v2 7/7] netfilter: nf_conntrack: allow early drop of offloaded UDP conns
  2023-01-13 16:55 [PATCH net-next v2 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
                   ` (5 preceding siblings ...)
  2023-01-13 16:55 ` [PATCH net-next v2 6/7] net/sched: act_ct: offload UDP NEW connections Vlad Buslov
@ 2023-01-13 16:55 ` Vlad Buslov
  6 siblings, 0 replies; 25+ messages in thread
From: Vlad Buslov @ 2023-01-13 16:55 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Both the synchronous early drop algorithm and the asynchronous gc worker
completely ignore connections with the IPS_OFFLOAD_BIT status bit set.
With the new functionality that enables UDP NEW connection offload in
act_ct, a malicious user can flood the conntrack table with offloaded
UDP connections by sending just a single packet per 5-tuple, because
such connections can no longer be deleted by the early drop algorithm.

To mitigate the issue, allow both early drop and gc to consider
offloaded UDP connections for deletion.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---
 net/netfilter/nf_conntrack_core.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 496c4920505b..52b824a60176 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1374,9 +1374,6 @@ static unsigned int early_drop_list(struct net *net,
 	hlist_nulls_for_each_entry_rcu(h, n, head, hnnode) {
 		tmp = nf_ct_tuplehash_to_ctrack(h);
 
-		if (test_bit(IPS_OFFLOAD_BIT, &tmp->status))
-			continue;
-
 		if (nf_ct_is_expired(tmp)) {
 			nf_ct_gc_expired(tmp);
 			continue;
@@ -1446,11 +1443,14 @@ static bool gc_worker_skip_ct(const struct nf_conn *ct)
 static bool gc_worker_can_early_drop(const struct nf_conn *ct)
 {
 	const struct nf_conntrack_l4proto *l4proto;
+	u8 protonum = nf_ct_protonum(ct);
 
+	if (test_bit(IPS_OFFLOAD_BIT, &ct->status) && protonum != IPPROTO_UDP)
+		return false;
 	if (!test_bit(IPS_ASSURED_BIT, &ct->status))
 		return true;
 
-	l4proto = nf_ct_l4proto_find(nf_ct_protonum(ct));
+	l4proto = nf_ct_l4proto_find(protonum);
 	if (l4proto->can_early_drop && l4proto->can_early_drop(ct))
 		return true;
 
@@ -1507,7 +1507,8 @@ static void gc_worker(struct work_struct *work)
 
 			if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
 				nf_ct_offload_timeout(tmp);
-				continue;
+				if (!nf_conntrack_max95)
+					continue;
 			}
 
 			if (expired_count > GC_SCAN_EXPIRED_MAX) {
-- 
2.38.1



* Re: [PATCH net-next v2 1/7] net: flow_offload: provision conntrack info in ct_metadata
  2023-01-13 16:55 ` [PATCH net-next v2 1/7] net: flow_offload: provision conntrack info in ct_metadata Vlad Buslov
@ 2023-01-17 14:42   ` Pablo Neira Ayuso
  2023-01-17 17:43     ` Vlad Buslov
  2023-01-17 15:04   ` Marcelo Ricardo Leitner
  1 sibling, 1 reply; 25+ messages in thread
From: Pablo Neira Ayuso @ 2023-01-17 14:42 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, marcelo.leitner, simon.horman

Hi Vlad,

On Fri, Jan 13, 2023 at 05:55:42PM +0100, Vlad Buslov wrote:
> In order to offload connections in other states besides "established" the
> driver offload callbacks need to have access to connection conntrack info.
> Extend flow offload intermediate representation data structure
> flow_action_entry->ct_metadata with new enum ip_conntrack_info field and
> fill it in tcf_ct_flow_table_add_action_meta() callback.
> 
> Reject offloading IP_CT_NEW connections for now by returning an error in
> relevant driver callbacks based on value of ctinfo. Support for offloading
> such connections will need to be added to the drivers afterwards.
> 
> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
> ---
> 
> Notes:
>     Changes V1 -> V2:
>     
>     - Add missing include that caused compilation errors on certain configs.
>     
>     - Change naming in nfp driver as suggested by Simon and Baowen.
> 
>  .../ethernet/mellanox/mlx5/core/en/tc_ct.c    |  2 +-
>  .../ethernet/netronome/nfp/flower/conntrack.c | 20 +++++++++++++++++++
>  include/net/flow_offload.h                    |  2 ++
>  net/sched/act_ct.c                            |  1 +
>  4 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
> index 313df8232db7..8cad5cf3305d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
> @@ -1077,7 +1077,7 @@ mlx5_tc_ct_block_flow_offload_add(struct mlx5_ct_ft *ft,
>  	int err;
>  
>  	meta_action = mlx5_tc_ct_get_ct_metadata_action(flow_rule);
> -	if (!meta_action)
> +	if (!meta_action || meta_action->ct_metadata.ctinfo == IP_CT_NEW)
>  		return -EOPNOTSUPP;
>  
>  	spin_lock_bh(&ct_priv->ht_lock);
> diff --git a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
> index f693119541d5..f7569584b9d8 100644
> --- a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
> +++ b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
> @@ -1964,6 +1964,23 @@ int nfp_fl_ct_stats(struct flow_cls_offload *flow,
>  	return 0;
>  }
>  
> +static bool
> +nfp_fl_ct_offload_nft_supported(struct flow_cls_offload *flow)
> +{
> +	struct flow_rule *flow_rule = flow->rule;
> +	struct flow_action *flow_action =
> +		&flow_rule->action;
> +	struct flow_action_entry *act;
> +	int i;
> +
> +	flow_action_for_each(i, act, flow_action) {
> +		if (act->id == FLOW_ACTION_CT_METADATA)
> +			return act->ct_metadata.ctinfo != IP_CT_NEW;
> +	}
> +
> +	return false;
> +}
> +
>  static int
>  nfp_fl_ct_offload_nft_flow(struct nfp_fl_ct_zone_entry *zt, struct flow_cls_offload *flow)
>  {
> @@ -1976,6 +1993,9 @@ nfp_fl_ct_offload_nft_flow(struct nfp_fl_ct_zone_entry *zt, struct flow_cls_offl
>  	extack = flow->common.extack;
>  	switch (flow->command) {
>  	case FLOW_CLS_REPLACE:
> +		if (!nfp_fl_ct_offload_nft_supported(flow))
> +			return -EOPNOTSUPP;
> +
>  		/* Netfilter can request offload multiple times for the same
>  		 * flow - protect against adding duplicates.
>  		 */
> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index 0400a0ac8a29..a6adaffb68fb 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -4,6 +4,7 @@
>  #include <linux/kernel.h>
>  #include <linux/list.h>
>  #include <linux/netlink.h>
> +#include <linux/netfilter/nf_conntrack_common.h>
>  #include <net/flow_dissector.h>
>  
>  struct flow_match {
> @@ -288,6 +289,7 @@ struct flow_action_entry {
>  		} ct;
>  		struct {
>  			unsigned long cookie;
> +			enum ip_conntrack_info ctinfo;

Maybe you can use a bool here; the only possible states that make sense
are new and established.

>  			u32 mark;
>  			u32 labels[4];
>  			bool orig_dir;
> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> index 0ca2bb8ed026..515577f913a3 100644
> --- a/net/sched/act_ct.c
> +++ b/net/sched/act_ct.c
> @@ -187,6 +187,7 @@ static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
>  	/* aligns with the CT reference on the SKB nf_ct_set */
>  	entry->ct_metadata.cookie = (unsigned long)ct | ctinfo;
>  	entry->ct_metadata.orig_dir = dir == IP_CT_DIR_ORIGINAL;
> +	entry->ct_metadata.ctinfo = ctinfo;
>  
>  	act_ct_labels = entry->ct_metadata.labels;
>  	ct_labels = nf_ct_labels_find(ct);
> -- 
> 2.38.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread
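[Editorial note: the ctinfo gate added to the nfp and mlx5 hunks above boils down to a small predicate over the flow's action list. The following is a standalone userspace C sketch with the kernel types reduced to minimal stand-ins — the struct layouts and enum values below are illustrative, not the real kernel definitions.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types touched by this patch. */
enum ip_conntrack_info { IP_CT_ESTABLISHED = 0, IP_CT_RELATED, IP_CT_NEW,
			 IP_CT_ESTABLISHED_REPLY };
enum flow_action_id { FLOW_ACTION_MANGLE, FLOW_ACTION_CT_METADATA };

struct flow_action_entry {
	enum flow_action_id id;
	struct { enum ip_conntrack_info ctinfo; } ct_metadata;
};

/* Mirrors nfp_fl_ct_offload_nft_supported() above: walk the action
 * list and refuse offload when the CT metadata action carries
 * IP_CT_NEW (the driver cannot represent a NEW connection yet). */
static bool ct_offload_supported(const struct flow_action_entry *acts, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (acts[i].id == FLOW_ACTION_CT_METADATA)
			return acts[i].ct_metadata.ctinfo != IP_CT_NEW;
	return false; /* no CT metadata action at all */
}
```

A driver would call this from its FLOW_CLS_REPLACE path and return -EOPNOTSUPP on failure, as both hunks above do.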

* Re: [PATCH net-next v2 1/7] net: flow_offload: provision conntrack info in ct_metadata
  2023-01-13 16:55 ` [PATCH net-next v2 1/7] net: flow_offload: provision conntrack info in ct_metadata Vlad Buslov
  2023-01-17 14:42   ` Pablo Neira Ayuso
@ 2023-01-17 15:04   ` Marcelo Ricardo Leitner
  2023-01-17 15:09     ` Marcelo Ricardo Leitner
  2023-01-17 17:25     ` Vlad Buslov
  1 sibling, 2 replies; 25+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-01-17 15:04 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Fri, Jan 13, 2023 at 05:55:42PM +0100, Vlad Buslov wrote:
...
>  struct flow_match {
> @@ -288,6 +289,7 @@ struct flow_action_entry {
>  		} ct;
>  		struct {
>  			unsigned long cookie;
> +			enum ip_conntrack_info ctinfo;
>  			u32 mark;
>  			u32 labels[4];
>  			bool orig_dir;
> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> index 0ca2bb8ed026..515577f913a3 100644
> --- a/net/sched/act_ct.c
> +++ b/net/sched/act_ct.c
> @@ -187,6 +187,7 @@ static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
>  	/* aligns with the CT reference on the SKB nf_ct_set */
>  	entry->ct_metadata.cookie = (unsigned long)ct | ctinfo;
                                                   ^^^^^^^^^^^

>  	entry->ct_metadata.orig_dir = dir == IP_CT_DIR_ORIGINAL;
> +	entry->ct_metadata.ctinfo = ctinfo;

tcf_ct_flow_table_restore_skb() is doing:
        enum ip_conntrack_info ctinfo = cookie & NFCT_INFOMASK;

Not sure if it really needs this duplication then.

>  
>  	act_ct_labels = entry->ct_metadata.labels;
>  	ct_labels = nf_ct_labels_find(ct);
> -- 
> 2.38.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread
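[Editorial note: the duplication Marcelo points at can be seen in a standalone userspace C sketch of the cookie packing. The constant below follows the kernel's NFCT_INFOMASK (7UL on 64-bit, from include/linux/skbuff.h); the pointer value is a fake stand-in.]

```c
#include <assert.h>

/* Low bits of the nf_conn pointer reused to carry ctinfo. */
#define NFCT_INFOMASK 7UL

enum ip_conntrack_info { IP_CT_ESTABLISHED = 0, IP_CT_RELATED, IP_CT_NEW,
			 IP_CT_ESTABLISHED_REPLY };

/* What tcf_ct_flow_table_add_action_meta() does: fold ctinfo into the
 * pointer-sized cookie. The ct pointer must be at least 8-byte aligned
 * so its low bits are free. */
static unsigned long pack_cookie(unsigned long ct_ptr,
				 enum ip_conntrack_info ctinfo)
{
	return ct_ptr | (unsigned long)ctinfo;
}

/* What tcf_ct_flow_table_restore_skb() does on the way back:
 * ctinfo is already recoverable without a separate field. */
static enum ip_conntrack_info cookie_ctinfo(unsigned long cookie)
{
	return (enum ip_conntrack_info)(cookie & NFCT_INFOMASK);
}

static unsigned long cookie_ct(unsigned long cookie)
{
	return cookie & ~NFCT_INFOMASK;
}
```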

* Re: [PATCH net-next v2 1/7] net: flow_offload: provision conntrack info in ct_metadata
  2023-01-17 15:04   ` Marcelo Ricardo Leitner
@ 2023-01-17 15:09     ` Marcelo Ricardo Leitner
  2023-01-17 17:28       ` Vlad Buslov
  2023-01-17 17:25     ` Vlad Buslov
  1 sibling, 1 reply; 25+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-01-17 15:09 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Tue, Jan 17, 2023 at 12:04:32PM -0300, Marcelo Ricardo Leitner wrote:
> On Fri, Jan 13, 2023 at 05:55:42PM +0100, Vlad Buslov wrote:
> ...
> >  struct flow_match {
> > @@ -288,6 +289,7 @@ struct flow_action_entry {
> >  		} ct;
> >  		struct {
> >  			unsigned long cookie;
> > +			enum ip_conntrack_info ctinfo;
> >  			u32 mark;
> >  			u32 labels[4];
> >  			bool orig_dir;
> > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> > index 0ca2bb8ed026..515577f913a3 100644
> > --- a/net/sched/act_ct.c
> > +++ b/net/sched/act_ct.c
> > @@ -187,6 +187,7 @@ static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
> >  	/* aligns with the CT reference on the SKB nf_ct_set */
> >  	entry->ct_metadata.cookie = (unsigned long)ct | ctinfo;
>                                                    ^^^^^^^^^^^

Hmm. Thought that just came up and still need to dig into, but wanted
to share/ask already. Would it be a problem to update the cookie later
on then, to reflect the new ctinfo?

> 
> >  	entry->ct_metadata.orig_dir = dir == IP_CT_DIR_ORIGINAL;
> > +	entry->ct_metadata.ctinfo = ctinfo;
> 
> tcf_ct_flow_table_restore_skb() is doing:
>         enum ip_conntrack_info ctinfo = cookie & NFCT_INFOMASK;
> 
> Not sure if it really needs this duplication then.
> 
> >  
> >  	act_ct_labels = entry->ct_metadata.labels;
> >  	ct_labels = nf_ct_labels_find(ct);
> > -- 
> > 2.38.1
> > 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v2 3/7] netfilter: flowtable: allow unidirectional rules
  2023-01-13 16:55 ` [PATCH net-next v2 3/7] netfilter: flowtable: allow unidirectional rules Vlad Buslov
@ 2023-01-17 15:15   ` Marcelo Ricardo Leitner
  2023-01-17 17:31     ` Vlad Buslov
  0 siblings, 1 reply; 25+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-01-17 15:15 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Fri, Jan 13, 2023 at 05:55:44PM +0100, Vlad Buslov wrote:
> Modify flow table offload to support unidirectional connections by
> extending enum nf_flow_flags with new "NF_FLOW_HW_BIDIRECTIONAL" flag. Only
> offload reply direction when the flag is not set. This infrastructure
                                           ^^^ s///, I believe? :-)

...
> -	ok_count += flow_offload_tuple_add(offload, flow_rule[1],
> -					   FLOW_OFFLOAD_DIR_REPLY);
> +	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
> +		ok_count += flow_offload_tuple_add(offload, flow_rule[1],
> +						   FLOW_OFFLOAD_DIR_REPLY);
>  	if (ok_count == 0)

^ permalink raw reply	[flat|nested] 25+ messages in thread
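[Editorial note: the control flow of the hunk above — push the reply-direction rule only when the flow is marked bidirectional — can be sketched in standalone userspace C. Bit positions and helper names below are illustrative stand-ins for the kernel's flag machinery.]

```c
#include <assert.h>

#define NF_FLOW_HW_BIDIRECTIONAL 0 /* bit position; illustrative */

/* Simplified, non-atomic stand-in for the kernel's test_bit(). */
static int flag_test(int bit, const unsigned long *flags)
{
	return (*flags >> bit) & 1UL;
}

/* Stand-in for flow_offload_tuple_add(): returns 1 on success. */
static int tuple_add(int dir) { (void)dir; return 1; }

/* Mirrors flow_offload_work_add() after the patch: the original
 * direction is always offloaded; the reply direction only when the
 * NF_FLOW_HW_BIDIRECTIONAL flag is set on the flow. */
static int offload_rules(const unsigned long *flags)
{
	int ok_count = 0;

	ok_count += tuple_add(0);		/* original direction */
	if (flag_test(NF_FLOW_HW_BIDIRECTIONAL, flags))
		ok_count += tuple_add(1);	/* reply direction */
	return ok_count;
}
```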

* Re: [PATCH net-next v2 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously
  2023-01-13 16:55 ` [PATCH net-next v2 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously Vlad Buslov
@ 2023-01-17 15:28   ` Marcelo Ricardo Leitner
  2023-01-17 17:33     ` Vlad Buslov
  0 siblings, 1 reply; 25+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-01-17 15:28 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Fri, Jan 13, 2023 at 05:55:45PM +0100, Vlad Buslov wrote:
> Following patches in series need to update flowtable rule several times
> during its lifetime in order to synchronize hardware offload with actual ct
> status. However, reusing existing 'refresh' logic in act_ct would cause
> data path to potentially schedule significant amount of spurious tasks in
> 'add' workqueue since it is executed per-packet. Instead, introduce a new
> flow 'update' flag and use it to schedule async flow refresh in flowtable
> gc which will only be executed once per gc iteration.
> 
> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
> ---
>  include/net/netfilter/nf_flow_table.h |  3 ++-
>  net/netfilter/nf_flow_table_core.c    | 20 +++++++++++++++-----
>  net/netfilter/nf_flow_table_offload.c |  5 +++--
>  3 files changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
> index 88ab98ab41d9..e396424e2e68 100644
> --- a/include/net/netfilter/nf_flow_table.h
> +++ b/include/net/netfilter/nf_flow_table.h
> @@ -165,6 +165,7 @@ enum nf_flow_flags {
>  	NF_FLOW_HW_DEAD,
>  	NF_FLOW_HW_PENDING,
>  	NF_FLOW_HW_BIDIRECTIONAL,
> +	NF_FLOW_HW_UPDATE,
>  };
>  
>  enum flow_offload_type {
> @@ -300,7 +301,7 @@ unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
>  #define MODULE_ALIAS_NF_FLOWTABLE(family)	\
>  	MODULE_ALIAS("nf-flowtable-" __stringify(family))
>  
> -void nf_flow_offload_add(struct nf_flowtable *flowtable,
> +bool nf_flow_offload_add(struct nf_flowtable *flowtable,
>  			 struct flow_offload *flow);
>  void nf_flow_offload_del(struct nf_flowtable *flowtable,
>  			 struct flow_offload *flow);
> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> index 04bd0ed4d2ae..5b495e768655 100644
> --- a/net/netfilter/nf_flow_table_core.c
> +++ b/net/netfilter/nf_flow_table_core.c
> @@ -316,21 +316,28 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
>  }
>  EXPORT_SYMBOL_GPL(flow_offload_add);
>  
> +static bool __flow_offload_refresh(struct nf_flowtable *flow_table,
> +				   struct flow_offload *flow)
> +{
> +	if (likely(!nf_flowtable_hw_offload(flow_table)))
> +		return true;
> +
> +	return nf_flow_offload_add(flow_table, flow);
> +}
> +
>  void flow_offload_refresh(struct nf_flowtable *flow_table,
>  			  struct flow_offload *flow)
>  {
>  	u32 timeout;
>  
>  	timeout = nf_flowtable_time_stamp + flow_offload_get_timeout(flow);
> -	if (timeout - READ_ONCE(flow->timeout) > HZ)
> +	if (timeout - READ_ONCE(flow->timeout) > HZ &&
> +	    !test_bit(NF_FLOW_HW_UPDATE, &flow->flags))
>  		WRITE_ONCE(flow->timeout, timeout);
>  	else
>  		return;
>  
> -	if (likely(!nf_flowtable_hw_offload(flow_table)))
> -		return;
> -
> -	nf_flow_offload_add(flow_table, flow);
> +	__flow_offload_refresh(flow_table, flow);
>  }
>  EXPORT_SYMBOL_GPL(flow_offload_refresh);
>  
> @@ -435,6 +442,9 @@ static void nf_flow_offload_gc_step(struct nf_flowtable *flow_table,
>  		} else {
>  			flow_offload_del(flow_table, flow);
>  		}
> +	} else if (test_and_clear_bit(NF_FLOW_HW_UPDATE, &flow->flags)) {
> +		if (!__flow_offload_refresh(flow_table, flow))
> +			set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
>  	} else if (test_bit(NF_FLOW_HW, &flow->flags)) {
>  		nf_flow_offload_stats(flow_table, flow);

AFAICT even after this patchset it is possible to have both flags set
at the same time.
With that, this would cause the stats to skip a beat.
This would be better:

- 	} else if (test_bit(NF_FLOW_HW, &flow->flags)) {
- 		nf_flow_offload_stats(flow_table, flow);
+	} else {
+		if (test_and_clear_bit(NF_FLOW_HW_UPDATE, &flow->flags))
+			if (!__flow_offload_refresh(flow_table, flow))
+				set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
+	 	if (test_bit(NF_FLOW_HW, &flow->flags))
+ 			nf_flow_offload_stats(flow_table, flow);
 	}

But a flow cannot have 2 pending actions at a time.
Then maybe an update to nf_flow_offload_tuple() to make it handle the
stats implicitly?

>  	}
> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> index 8b852f10fab4..103b2ca8d123 100644
> --- a/net/netfilter/nf_flow_table_offload.c
> +++ b/net/netfilter/nf_flow_table_offload.c
> @@ -1036,16 +1036,17 @@ nf_flow_offload_work_alloc(struct nf_flowtable *flowtable,
>  }
>  
>  
> -void nf_flow_offload_add(struct nf_flowtable *flowtable,
> +bool nf_flow_offload_add(struct nf_flowtable *flowtable,
>  			 struct flow_offload *flow)
>  {
>  	struct flow_offload_work *offload;
>  
>  	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
>  	if (!offload)
> -		return;
> +		return false;
>  
>  	flow_offload_queue_work(offload);
> +	return true;
>  }
>  
>  void nf_flow_offload_del(struct nf_flowtable *flowtable,
> -- 
> 2.38.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread
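[Editorial note: Marcelo's suggested restructuring of the gc step — handle a pending async update and still collect stats in the same pass, with a failed refresh re-arming the update bit — can be sketched in standalone userspace C. The bit helpers below are simplified, non-atomic stand-ins for the kernel primitives.]

```c
#include <assert.h>
#include <stdbool.h>

#define NF_FLOW_HW        0
#define NF_FLOW_HW_UPDATE 1

/* Simplified stand-ins for the kernel's atomic bitops. */
static bool flag_test_and_clear(int bit, unsigned long *flags)
{
	bool was = (*flags >> bit) & 1UL;

	*flags &= ~(1UL << bit);
	return was;
}
static bool flag_test(int bit, const unsigned long *flags)
{
	return (*flags >> bit) & 1UL;
}
static void flag_set(int bit, unsigned long *flags)
{
	*flags |= 1UL << bit;
}

/* One gc pass over a flow, shaped as suggested above: a pending
 * update is refreshed first (re-armed on failure, e.g. workqueue
 * allocation failure), and stats still run in the same pass instead
 * of being skipped by an else-if chain. */
static void gc_step(unsigned long *flags, bool refresh_ok, int *stats_runs)
{
	if (flag_test_and_clear(NF_FLOW_HW_UPDATE, flags))
		if (!refresh_ok)
			flag_set(NF_FLOW_HW_UPDATE, flags);
	if (flag_test(NF_FLOW_HW, flags))
		(*stats_runs)++;
}
```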

* Re: [PATCH net-next v2 6/7] net/sched: act_ct: offload UDP NEW connections
  2023-01-13 16:55 ` [PATCH net-next v2 6/7] net/sched: act_ct: offload UDP NEW connections Vlad Buslov
@ 2023-01-17 15:41   ` Marcelo Ricardo Leitner
  2023-01-17 17:36     ` Vlad Buslov
  0 siblings, 1 reply; 25+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-01-17 15:41 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Fri, Jan 13, 2023 at 05:55:47PM +0100, Vlad Buslov wrote:
> When processing connections allow offloading of UDP connections that don't
> have IPS_ASSURED_BIT set as unidirectional. When performing table lookup

Hmm. Considering that this is now offloading one direction only
already, what about skipping this grace period:

In nf_conntrack_udp_packet(), it does:

        /* If we've seen traffic both ways, this is some kind of UDP
         * stream. Set Assured.
         */
        if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
		...
                /* Still active after two seconds? Extend timeout. */
                if (time_after(jiffies, ct->proto.udp.stream_ts)) {
                        extra = timeouts[UDP_CT_REPLIED];
                        stream = true;
                }
		...
                /* Also, more likely to be important, and not a probe */
                if (stream && !test_and_set_bit(IPS_ASSURED_BIT, &ct->status))
                        nf_conntrack_event_cache(IPCT_ASSURED, ct);

Maybe the patch should be relying on IPS_SEEN_REPLY_BIT instead of
ASSURED for UDP? Just a thought here, but I'm not seeing why not.

> for reply packets check the current connection status: If UDP
> unidirectional connection became assured also promote the corresponding
> flow table entry to bidirectional and set the 'update' bit, else just set
> the 'update' bit since reply directional traffic will most likely cause
> connection status to become 'established' which requires updating the
> offload state.
> 
> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
> ---
>  net/sched/act_ct.c | 48 ++++++++++++++++++++++++++++++++++------------
>  1 file changed, 36 insertions(+), 12 deletions(-)
> 
> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> index bfddb462d2bc..563cbdd8341c 100644
> --- a/net/sched/act_ct.c
> +++ b/net/sched/act_ct.c
> @@ -369,7 +369,7 @@ static void tcf_ct_flow_tc_ifidx(struct flow_offload *entry,
>  
>  static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
>  				  struct nf_conn *ct,
> -				  bool tcp)
> +				  bool tcp, bool bidirectional)
>  {
>  	struct nf_conn_act_ct_ext *act_ct_ext;
>  	struct flow_offload *entry;
> @@ -388,6 +388,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
>  		ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
>  		ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
>  	}
> +	if (bidirectional)
> +		__set_bit(NF_FLOW_HW_BIDIRECTIONAL, &entry->flags);
>  
>  	act_ct_ext = nf_conn_act_ct_ext_find(ct);
>  	if (act_ct_ext) {
> @@ -411,26 +413,34 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
>  					   struct nf_conn *ct,
>  					   enum ip_conntrack_info ctinfo)
>  {
> -	bool tcp = false;
> -
> -	if ((ctinfo != IP_CT_ESTABLISHED && ctinfo != IP_CT_ESTABLISHED_REPLY) ||
> -	    !test_bit(IPS_ASSURED_BIT, &ct->status))
> -		return;
> +	bool tcp = false, bidirectional = true;
>  
>  	switch (nf_ct_protonum(ct)) {
>  	case IPPROTO_TCP:
> -		tcp = true;
> -		if (ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
> +		if ((ctinfo != IP_CT_ESTABLISHED &&
> +		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
> +		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
> +		    ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
>  			return;
> +
> +		tcp = true;
>  		break;
>  	case IPPROTO_UDP:
> +		if (!nf_ct_is_confirmed(ct))
> +			return;
> +		if (!test_bit(IPS_ASSURED_BIT, &ct->status))
> +			bidirectional = false;
>  		break;
>  #ifdef CONFIG_NF_CT_PROTO_GRE
>  	case IPPROTO_GRE: {
>  		struct nf_conntrack_tuple *tuple;
>  
> -		if (ct->status & IPS_NAT_MASK)
> +		if ((ctinfo != IP_CT_ESTABLISHED &&
> +		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
> +		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
> +		    ct->status & IPS_NAT_MASK)
>  			return;
> +
>  		tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
>  		/* No support for GRE v1 */
>  		if (tuple->src.u.gre.key || tuple->dst.u.gre.key)
> @@ -446,7 +456,7 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
>  	    ct->status & IPS_SEQ_ADJUST)
>  		return;
>  
> -	tcf_ct_flow_table_add(ct_ft, ct, tcp);
> +	tcf_ct_flow_table_add(ct_ft, ct, tcp, bidirectional);
>  }
>  
>  static bool
> @@ -625,13 +635,27 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params *p,
>  	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
>  	ct = flow->ct;
>  
> +	if (dir == FLOW_OFFLOAD_DIR_REPLY &&
> +	    !test_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags)) {
> +		/* Only offload reply direction after connection became
> +		 * assured.
> +		 */
> +		if (test_bit(IPS_ASSURED_BIT, &ct->status))
> +			set_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags);
> +		set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
> +		return false;
> +	}
> +
>  	if (tcph && (unlikely(tcph->fin || tcph->rst))) {
>  		flow_offload_teardown(flow);
>  		return false;
>  	}
>  
> -	ctinfo = dir == FLOW_OFFLOAD_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
> -						    IP_CT_ESTABLISHED_REPLY;
> +	if (dir == FLOW_OFFLOAD_DIR_ORIGINAL)
> +		ctinfo = test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
> +			IP_CT_ESTABLISHED : IP_CT_NEW;
> +	else
> +		ctinfo = IP_CT_ESTABLISHED_REPLY;
>  
>  	flow_offload_refresh(nf_ft, flow);
>  	nf_conntrack_get(&ct->ct_general);
> -- 
> 2.38.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread
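[Editorial note: the ctinfo selection in the lookup hunk above reduces to a small pure function, sketched here in standalone userspace C. Enum values follow the kernel's ip_conntrack_info; everything else is a stand-in.]

```c
#include <assert.h>
#include <stdbool.h>

enum ip_conntrack_info { IP_CT_ESTABLISHED = 0, IP_CT_RELATED, IP_CT_NEW,
			 IP_CT_ESTABLISHED_REPLY = 3 };
enum flow_offload_dir { FLOW_OFFLOAD_DIR_ORIGINAL, FLOW_OFFLOAD_DIR_REPLY };

/* Mirrors tcf_ct_flow_table_lookup() after the patch:
 * original-direction packets of a connection that has not yet seen a
 * reply are restored as NEW instead of the previously hardcoded
 * ESTABLISHED; reply-direction packets stay ESTABLISHED_REPLY. */
static enum ip_conntrack_info
lookup_ctinfo(enum flow_offload_dir dir, bool seen_reply)
{
	if (dir == FLOW_OFFLOAD_DIR_ORIGINAL)
		return seen_reply ? IP_CT_ESTABLISHED : IP_CT_NEW;
	return IP_CT_ESTABLISHED_REPLY;
}
```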

* Re: [PATCH net-next v2 1/7] net: flow_offload: provision conntrack info in ct_metadata
  2023-01-17 15:04   ` Marcelo Ricardo Leitner
  2023-01-17 15:09     ` Marcelo Ricardo Leitner
@ 2023-01-17 17:25     ` Vlad Buslov
  1 sibling, 0 replies; 25+ messages in thread
From: Vlad Buslov @ 2023-01-17 17:25 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

Hi Marcelo,

Thanks for reviewing!

On Tue 17 Jan 2023 at 12:04, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> On Fri, Jan 13, 2023 at 05:55:42PM +0100, Vlad Buslov wrote:
> ...
>>  struct flow_match {
>> @@ -288,6 +289,7 @@ struct flow_action_entry {
>>  		} ct;
>>  		struct {
>>  			unsigned long cookie;
>> +			enum ip_conntrack_info ctinfo;
>>  			u32 mark;
>>  			u32 labels[4];
>>  			bool orig_dir;
>> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
>> index 0ca2bb8ed026..515577f913a3 100644
>> --- a/net/sched/act_ct.c
>> +++ b/net/sched/act_ct.c
>> @@ -187,6 +187,7 @@ static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
>>  	/* aligns with the CT reference on the SKB nf_ct_set */
>>  	entry->ct_metadata.cookie = (unsigned long)ct | ctinfo;
>                                                    ^^^^^^^^^^^
>
>>  	entry->ct_metadata.orig_dir = dir == IP_CT_DIR_ORIGINAL;
>> +	entry->ct_metadata.ctinfo = ctinfo;
>
> tcf_ct_flow_table_restore_skb() is doing:
>         enum ip_conntrack_info ctinfo = cookie & NFCT_INFOMASK;
>
> Not sure if it really needs this duplication then.

Indeed! I wanted to preserve the cookie as an opaque integer (similar to
the TC filter cookie), but since drivers are already aware of its contents
we can just reuse it for my case and avoid the duplication.

>
>>  
>>  	act_ct_labels = entry->ct_metadata.labels;
>>  	ct_labels = nf_ct_labels_find(ct);
>> -- 
>> 2.38.1
>> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v2 1/7] net: flow_offload: provision conntrack info in ct_metadata
  2023-01-17 15:09     ` Marcelo Ricardo Leitner
@ 2023-01-17 17:28       ` Vlad Buslov
  2023-01-19  4:15         ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 25+ messages in thread
From: Vlad Buslov @ 2023-01-17 17:28 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Tue 17 Jan 2023 at 12:09, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> On Tue, Jan 17, 2023 at 12:04:32PM -0300, Marcelo Ricardo Leitner wrote:
>> On Fri, Jan 13, 2023 at 05:55:42PM +0100, Vlad Buslov wrote:
>> ...
>> >  struct flow_match {
>> > @@ -288,6 +289,7 @@ struct flow_action_entry {
>> >  		} ct;
>> >  		struct {
>> >  			unsigned long cookie;
>> > +			enum ip_conntrack_info ctinfo;
>> >  			u32 mark;
>> >  			u32 labels[4];
>> >  			bool orig_dir;
>> > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
>> > index 0ca2bb8ed026..515577f913a3 100644
>> > --- a/net/sched/act_ct.c
>> > +++ b/net/sched/act_ct.c
>> > @@ -187,6 +187,7 @@ static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
>> >  	/* aligns with the CT reference on the SKB nf_ct_set */
>> >  	entry->ct_metadata.cookie = (unsigned long)ct | ctinfo;
>>                                                    ^^^^^^^^^^^
>
> Hmm. Thought that just came up and still need to dig into, but wanted
> to share/ask already. Would it be a problem to update the cookie later
> on then, to reflect the new ctinfo?

Not sure I'm following, but every time the flow changes state it is
updated in the driver since new metadata is generated by calling
tcf_ct_flow_table_fill_actions() from nf_flow_offload_rule_alloc().

>
>> 
>> >  	entry->ct_metadata.orig_dir = dir == IP_CT_DIR_ORIGINAL;
>> > +	entry->ct_metadata.ctinfo = ctinfo;
>> 
>> tcf_ct_flow_table_restore_skb() is doing:
>>         enum ip_conntrack_info ctinfo = cookie & NFCT_INFOMASK;
>> 
>> Not sure if it really needs this duplication then.
>> 
>> >  
>> >  	act_ct_labels = entry->ct_metadata.labels;
>> >  	ct_labels = nf_ct_labels_find(ct);
>> > -- 
>> > 2.38.1
>> > 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v2 3/7] netfilter: flowtable: allow unidirectional rules
  2023-01-17 15:15   ` Marcelo Ricardo Leitner
@ 2023-01-17 17:31     ` Vlad Buslov
  0 siblings, 0 replies; 25+ messages in thread
From: Vlad Buslov @ 2023-01-17 17:31 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Tue 17 Jan 2023 at 12:15, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> On Fri, Jan 13, 2023 at 05:55:44PM +0100, Vlad Buslov wrote:
>> Modify flow table offload to support unidirectional connections by
>> extending enum nf_flow_flags with new "NF_FLOW_HW_BIDIRECTIONAL" flag. Only
>> offload reply direction when the flag is not set. This infrastructure
>                                            ^^^ s///, I believe? :-)

Good catch!

>
> ...
>> -	ok_count += flow_offload_tuple_add(offload, flow_rule[1],
>> -					   FLOW_OFFLOAD_DIR_REPLY);
>> +	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
>> +		ok_count += flow_offload_tuple_add(offload, flow_rule[1],
>> +						   FLOW_OFFLOAD_DIR_REPLY);
>>  	if (ok_count == 0)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v2 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously
  2023-01-17 15:28   ` Marcelo Ricardo Leitner
@ 2023-01-17 17:33     ` Vlad Buslov
  2023-01-17 17:47       ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 25+ messages in thread
From: Vlad Buslov @ 2023-01-17 17:33 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman


On Tue 17 Jan 2023 at 12:28, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> On Fri, Jan 13, 2023 at 05:55:45PM +0100, Vlad Buslov wrote:
>> Following patches in series need to update flowtable rule several times
>> during its lifetime in order to synchronize hardware offload with actual ct
>> status. However, reusing existing 'refresh' logic in act_ct would cause
>> data path to potentially schedule significant amount of spurious tasks in
>> 'add' workqueue since it is executed per-packet. Instead, introduce a new
>> flow 'update' flag and use it to schedule async flow refresh in flowtable
>> gc which will only be executed once per gc iteration.
>> 
>> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
>> ---
>>  include/net/netfilter/nf_flow_table.h |  3 ++-
>>  net/netfilter/nf_flow_table_core.c    | 20 +++++++++++++++-----
>>  net/netfilter/nf_flow_table_offload.c |  5 +++--
>>  3 files changed, 20 insertions(+), 8 deletions(-)
>> 
>> diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
>> index 88ab98ab41d9..e396424e2e68 100644
>> --- a/include/net/netfilter/nf_flow_table.h
>> +++ b/include/net/netfilter/nf_flow_table.h
>> @@ -165,6 +165,7 @@ enum nf_flow_flags {
>>  	NF_FLOW_HW_DEAD,
>>  	NF_FLOW_HW_PENDING,
>>  	NF_FLOW_HW_BIDIRECTIONAL,
>> +	NF_FLOW_HW_UPDATE,
>>  };
>>  
>>  enum flow_offload_type {
>> @@ -300,7 +301,7 @@ unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
>>  #define MODULE_ALIAS_NF_FLOWTABLE(family)	\
>>  	MODULE_ALIAS("nf-flowtable-" __stringify(family))
>>  
>> -void nf_flow_offload_add(struct nf_flowtable *flowtable,
>> +bool nf_flow_offload_add(struct nf_flowtable *flowtable,
>>  			 struct flow_offload *flow);
>>  void nf_flow_offload_del(struct nf_flowtable *flowtable,
>>  			 struct flow_offload *flow);
>> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
>> index 04bd0ed4d2ae..5b495e768655 100644
>> --- a/net/netfilter/nf_flow_table_core.c
>> +++ b/net/netfilter/nf_flow_table_core.c
>> @@ -316,21 +316,28 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
>>  }
>>  EXPORT_SYMBOL_GPL(flow_offload_add);
>>  
>> +static bool __flow_offload_refresh(struct nf_flowtable *flow_table,
>> +				   struct flow_offload *flow)
>> +{
>> +	if (likely(!nf_flowtable_hw_offload(flow_table)))
>> +		return true;
>> +
>> +	return nf_flow_offload_add(flow_table, flow);
>> +}
>> +
>>  void flow_offload_refresh(struct nf_flowtable *flow_table,
>>  			  struct flow_offload *flow)
>>  {
>>  	u32 timeout;
>>  
>>  	timeout = nf_flowtable_time_stamp + flow_offload_get_timeout(flow);
>> -	if (timeout - READ_ONCE(flow->timeout) > HZ)
>> +	if (timeout - READ_ONCE(flow->timeout) > HZ &&
>> +	    !test_bit(NF_FLOW_HW_UPDATE, &flow->flags))
>>  		WRITE_ONCE(flow->timeout, timeout);
>>  	else
>>  		return;
>>  
>> -	if (likely(!nf_flowtable_hw_offload(flow_table)))
>> -		return;
>> -
>> -	nf_flow_offload_add(flow_table, flow);
>> +	__flow_offload_refresh(flow_table, flow);
>>  }
>>  EXPORT_SYMBOL_GPL(flow_offload_refresh);
>>  
>> @@ -435,6 +442,9 @@ static void nf_flow_offload_gc_step(struct nf_flowtable *flow_table,
>>  		} else {
>>  			flow_offload_del(flow_table, flow);
>>  		}
>> +	} else if (test_and_clear_bit(NF_FLOW_HW_UPDATE, &flow->flags)) {
>> +		if (!__flow_offload_refresh(flow_table, flow))
>> +			set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
>>  	} else if (test_bit(NF_FLOW_HW, &flow->flags)) {
>>  		nf_flow_offload_stats(flow_table, flow);
>
> AFAICT even after this patchset it is possible to have both flags set
> at the same time.
> With that, this would cause the stats to skip a beat.
> This would be better:
>
> - 	} else if (test_bit(NF_FLOW_HW, &flow->flags)) {
> - 		nf_flow_offload_stats(flow_table, flow);
> +	} else {
> +		if (test_and_clear_bit(NF_FLOW_HW_UPDATE, &flow->flags))
> +			if (!__flow_offload_refresh(flow_table, flow))
> +				set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
> +	 	if (test_bit(NF_FLOW_HW, &flow->flags))
> + 			nf_flow_offload_stats(flow_table, flow);
>  	}
>
> But a flow cannot have 2 pending actions at a time.

Yes. And the timeouts are quite generous, so there is IMO no problem in
skipping one iteration. It is not like this wq is high priority, and we
can't guarantee any exact update interval here anyway.

> Then maybe an update to nf_flow_offload_tuple() to make it handle the
> stats implicitly?

I considered this, but didn't want to over-complicate this series which
is tricky enough as it is.

>
>>  	}
>> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
>> index 8b852f10fab4..103b2ca8d123 100644
>> --- a/net/netfilter/nf_flow_table_offload.c
>> +++ b/net/netfilter/nf_flow_table_offload.c
>> @@ -1036,16 +1036,17 @@ nf_flow_offload_work_alloc(struct nf_flowtable *flowtable,
>>  }
>>  
>>  
>> -void nf_flow_offload_add(struct nf_flowtable *flowtable,
>> +bool nf_flow_offload_add(struct nf_flowtable *flowtable,
>>  			 struct flow_offload *flow)
>>  {
>>  	struct flow_offload_work *offload;
>>  
>>  	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
>>  	if (!offload)
>> -		return;
>> +		return false;
>>  
>>  	flow_offload_queue_work(offload);
>> +	return true;
>>  }
>>  
>>  void nf_flow_offload_del(struct nf_flowtable *flowtable,
>> -- 
>> 2.38.1
>> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v2 6/7] net/sched: act_ct: offload UDP NEW connections
  2023-01-17 15:41   ` Marcelo Ricardo Leitner
@ 2023-01-17 17:36     ` Vlad Buslov
  2023-01-17 17:56       ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 25+ messages in thread
From: Vlad Buslov @ 2023-01-17 17:36 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman


On Tue 17 Jan 2023 at 12:41, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> On Fri, Jan 13, 2023 at 05:55:47PM +0100, Vlad Buslov wrote:
>> When processing connections allow offloading of UDP connections that don't
>> have IPS_ASSURED_BIT set as unidirectional. When performing table lookup
>
> Hmm. Considering that this is now offloading one direction only
> already, what about skipping this grace period:
>
> In nf_conntrack_udp_packet(), it does:
>
>         /* If we've seen traffic both ways, this is some kind of UDP
>          * stream. Set Assured.
>          */
>         if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
> 		...
>                 /* Still active after two seconds? Extend timeout. */
>                 if (time_after(jiffies, ct->proto.udp.stream_ts)) {
>                         extra = timeouts[UDP_CT_REPLIED];
>                         stream = true;
>                 }
> 		...
>                 /* Also, more likely to be important, and not a probe */
>                 if (stream && !test_and_set_bit(IPS_ASSURED_BIT, &ct->status))
>                         nf_conntrack_event_cache(IPCT_ASSURED, ct);
>
> Maybe the patch should be relying on IPS_SEEN_REPLY_BIT instead of
> ASSURED for UDP? Just a thought here, but I'm not seeing why not.

The issue with this is that if we offload both directions early, then
the conntrack state machine will not receive any more packets and,
consequently, will never change the flow state to assured. I guess that
could be mitigated somehow by periodically checking the hw stats and
transitioning the flow to assured based on them, but as I said in the
previous email we don't want to over-complicate this series even more.
Also, offloading to hardware isn't free and costs both memory and CPU,
so it is not like offloading as early as possible is strictly beneficial
for all cases...

>
>> for reply packets check the current connection status: If UDP
>> unidirectional connection became assured also promote the corresponding
>> flow table entry to bidirectional and set the 'update' bit, else just set
>> the 'update' bit since reply directional traffic will most likely cause
>> connection status to become 'established' which requires updating the
>> offload state.
>> 
>> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
>> ---
>>  net/sched/act_ct.c | 48 ++++++++++++++++++++++++++++++++++------------
>>  1 file changed, 36 insertions(+), 12 deletions(-)
>> 
>> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
>> index bfddb462d2bc..563cbdd8341c 100644
>> --- a/net/sched/act_ct.c
>> +++ b/net/sched/act_ct.c
>> @@ -369,7 +369,7 @@ static void tcf_ct_flow_tc_ifidx(struct flow_offload *entry,
>>  
>>  static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
>>  				  struct nf_conn *ct,
>> -				  bool tcp)
>> +				  bool tcp, bool bidirectional)
>>  {
>>  	struct nf_conn_act_ct_ext *act_ct_ext;
>>  	struct flow_offload *entry;
>> @@ -388,6 +388,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
>>  		ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
>>  		ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
>>  	}
>> +	if (bidirectional)
>> +		__set_bit(NF_FLOW_HW_BIDIRECTIONAL, &entry->flags);
>>  
>>  	act_ct_ext = nf_conn_act_ct_ext_find(ct);
>>  	if (act_ct_ext) {
>> @@ -411,26 +413,34 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
>>  					   struct nf_conn *ct,
>>  					   enum ip_conntrack_info ctinfo)
>>  {
>> -	bool tcp = false;
>> -
>> -	if ((ctinfo != IP_CT_ESTABLISHED && ctinfo != IP_CT_ESTABLISHED_REPLY) ||
>> -	    !test_bit(IPS_ASSURED_BIT, &ct->status))
>> -		return;
>> +	bool tcp = false, bidirectional = true;
>>  
>>  	switch (nf_ct_protonum(ct)) {
>>  	case IPPROTO_TCP:
>> -		tcp = true;
>> -		if (ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
>> +		if ((ctinfo != IP_CT_ESTABLISHED &&
>> +		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
>> +		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
>> +		    ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
>>  			return;
>> +
>> +		tcp = true;
>>  		break;
>>  	case IPPROTO_UDP:
>> +		if (!nf_ct_is_confirmed(ct))
>> +			return;
>> +		if (!test_bit(IPS_ASSURED_BIT, &ct->status))
>> +			bidirectional = false;
>>  		break;
>>  #ifdef CONFIG_NF_CT_PROTO_GRE
>>  	case IPPROTO_GRE: {
>>  		struct nf_conntrack_tuple *tuple;
>>  
>> -		if (ct->status & IPS_NAT_MASK)
>> +		if ((ctinfo != IP_CT_ESTABLISHED &&
>> +		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
>> +		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
>> +		    ct->status & IPS_NAT_MASK)
>>  			return;
>> +
>>  		tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
>>  		/* No support for GRE v1 */
>>  		if (tuple->src.u.gre.key || tuple->dst.u.gre.key)
>> @@ -446,7 +456,7 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
>>  	    ct->status & IPS_SEQ_ADJUST)
>>  		return;
>>  
>> -	tcf_ct_flow_table_add(ct_ft, ct, tcp);
>> +	tcf_ct_flow_table_add(ct_ft, ct, tcp, bidirectional);
>>  }
>>  
>>  static bool
>> @@ -625,13 +635,27 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params *p,
>>  	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
>>  	ct = flow->ct;
>>  
>> +	if (dir == FLOW_OFFLOAD_DIR_REPLY &&
>> +	    !test_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags)) {
>> +		/* Only offload reply direction after connection became
>> +		 * assured.
>> +		 */
>> +		if (test_bit(IPS_ASSURED_BIT, &ct->status))
>> +			set_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags);
>> +		set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
>> +		return false;
>> +	}
>> +
>>  	if (tcph && (unlikely(tcph->fin || tcph->rst))) {
>>  		flow_offload_teardown(flow);
>>  		return false;
>>  	}
>>  
>> -	ctinfo = dir == FLOW_OFFLOAD_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
>> -						    IP_CT_ESTABLISHED_REPLY;
>> +	if (dir == FLOW_OFFLOAD_DIR_ORIGINAL)
>> +		ctinfo = test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
>> +			IP_CT_ESTABLISHED : IP_CT_NEW;
>> +	else
>> +		ctinfo = IP_CT_ESTABLISHED_REPLY;
>>  
>>  	flow_offload_refresh(nf_ft, flow);
>>  	nf_conntrack_get(&ct->ct_general);
>> -- 
>> 2.38.1
>> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v2 1/7] net: flow_offload: provision conntrack info in ct_metadata
  2023-01-17 14:42   ` Pablo Neira Ayuso
@ 2023-01-17 17:43     ` Vlad Buslov
  0 siblings, 0 replies; 25+ messages in thread
From: Vlad Buslov @ 2023-01-17 17:43 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: davem, kuba, pabeni, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, marcelo.leitner, simon.horman


On Tue 17 Jan 2023 at 15:42, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Vlad,
>
> On Fri, Jan 13, 2023 at 05:55:42PM +0100, Vlad Buslov wrote:
>> In order to offload connections in other states besides "established" the
>> driver offload callbacks need to have access to connection conntrack info.
>> Extend flow offload intermediate representation data structure
>> flow_action_entry->ct_metadata with new enum ip_conntrack_info field and
>> fill it in tcf_ct_flow_table_add_action_meta() callback.
>> 
>> Reject offloading IP_CT_NEW connections for now by returning an error in
>> relevant driver callbacks based on value of ctinfo. Support for offloading
>> such connections will need to be added to the drivers afterwards.
>> 
>> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
>> ---
>> 
>> Notes:
>>     Changes V1 -> V2:
>>     
>>     - Add missing include that caused compilation errors on certain configs.
>>     
>>     - Change naming in nfp driver as suggested by Simon and Baowen.
>> 
>>  .../ethernet/mellanox/mlx5/core/en/tc_ct.c    |  2 +-
>>  .../ethernet/netronome/nfp/flower/conntrack.c | 20 +++++++++++++++++++
>>  include/net/flow_offload.h                    |  2 ++
>>  net/sched/act_ct.c                            |  1 +
>>  4 files changed, 24 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
>> index 313df8232db7..8cad5cf3305d 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
>> @@ -1077,7 +1077,7 @@ mlx5_tc_ct_block_flow_offload_add(struct mlx5_ct_ft *ft,
>>  	int err;
>>  
>>  	meta_action = mlx5_tc_ct_get_ct_metadata_action(flow_rule);
>> -	if (!meta_action)
>> +	if (!meta_action || meta_action->ct_metadata.ctinfo == IP_CT_NEW)
>>  		return -EOPNOTSUPP;
>>  
>>  	spin_lock_bh(&ct_priv->ht_lock);
>> diff --git a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
>> index f693119541d5..f7569584b9d8 100644
>> --- a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
>> +++ b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
>> @@ -1964,6 +1964,23 @@ int nfp_fl_ct_stats(struct flow_cls_offload *flow,
>>  	return 0;
>>  }
>>  
>> +static bool
>> +nfp_fl_ct_offload_nft_supported(struct flow_cls_offload *flow)
>> +{
>> +	struct flow_rule *flow_rule = flow->rule;
>> +	struct flow_action *flow_action =
>> +		&flow_rule->action;
>> +	struct flow_action_entry *act;
>> +	int i;
>> +
>> +	flow_action_for_each(i, act, flow_action) {
>> +		if (act->id == FLOW_ACTION_CT_METADATA)
>> +			return act->ct_metadata.ctinfo != IP_CT_NEW;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>  static int
>>  nfp_fl_ct_offload_nft_flow(struct nfp_fl_ct_zone_entry *zt, struct flow_cls_offload *flow)
>>  {
>> @@ -1976,6 +1993,9 @@ nfp_fl_ct_offload_nft_flow(struct nfp_fl_ct_zone_entry *zt, struct flow_cls_offl
>>  	extack = flow->common.extack;
>>  	switch (flow->command) {
>>  	case FLOW_CLS_REPLACE:
>> +		if (!nfp_fl_ct_offload_nft_supported(flow))
>> +			return -EOPNOTSUPP;
>> +
>>  		/* Netfilter can request offload multiple times for the same
>>  		 * flow - protect against adding duplicates.
>>  		 */
>> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
>> index 0400a0ac8a29..a6adaffb68fb 100644
>> --- a/include/net/flow_offload.h
>> +++ b/include/net/flow_offload.h
>> @@ -4,6 +4,7 @@
>>  #include <linux/kernel.h>
>>  #include <linux/list.h>
>>  #include <linux/netlink.h>
>> +#include <linux/netfilter/nf_conntrack_common.h>
>>  #include <net/flow_dissector.h>
>>  
>>  struct flow_match {
>> @@ -288,6 +289,7 @@ struct flow_action_entry {
>>  		} ct;
>>  		struct {
>>  			unsigned long cookie;
>> +			enum ip_conntrack_info ctinfo;
>
> Maybe you can use a bool here, only possible states that make sense
> are new and established.

As Marcelo suggested, we can just obtain the same info from the cookie,
so there is no need to add either ctinfo or a boolean here.
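
A sketch of the cookie packing this refers to (the mask value matches the kernel's `NFCT_INFOMASK`, the helper names are illustrative): act_ct stores ctinfo in the low bits of the `nf_conn` pointer, relying on the structure's alignment, so ctinfo can later be recovered with the same mask that `tcf_ct_flow_table_restore_skb()` uses.

```c
#include <assert.h>

#define NFCT_INFOMASK 7UL	/* low pointer bits reserved for ctinfo */

/* Pack an aligned conntrack pointer together with a small ctinfo value. */
static unsigned long cookie_pack(const void *ct, unsigned long ctinfo)
{
	return (unsigned long)ct | ctinfo;
}

/* Recover the ctinfo from the cookie's low bits. */
static unsigned long cookie_ctinfo(unsigned long cookie)
{
	return cookie & NFCT_INFOMASK;
}

/* Recover the original pointer by masking the ctinfo bits off. */
static const void *cookie_ct(unsigned long cookie)
{
	return (const void *)(cookie & ~NFCT_INFOMASK);
}
```

This only works because `struct nf_conn` objects are aligned to at least 8 bytes, leaving the low three pointer bits free.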

>
>>  			u32 mark;
>>  			u32 labels[4];
>>  			bool orig_dir;
>> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
>> index 0ca2bb8ed026..515577f913a3 100644
>> --- a/net/sched/act_ct.c
>> +++ b/net/sched/act_ct.c
>> @@ -187,6 +187,7 @@ static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
>>  	/* aligns with the CT reference on the SKB nf_ct_set */
>>  	entry->ct_metadata.cookie = (unsigned long)ct | ctinfo;
>>  	entry->ct_metadata.orig_dir = dir == IP_CT_DIR_ORIGINAL;
>> +	entry->ct_metadata.ctinfo = ctinfo;
>>  
>>  	act_ct_labels = entry->ct_metadata.labels;
>>  	ct_labels = nf_ct_labels_find(ct);
>> -- 
>> 2.38.1
>> 



* Re: [PATCH net-next v2 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously
  2023-01-17 17:33     ` Vlad Buslov
@ 2023-01-17 17:47       ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 25+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-01-17 17:47 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Tue, Jan 17, 2023 at 07:33:31PM +0200, Vlad Buslov wrote:
> 
> On Tue 17 Jan 2023 at 12:28, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> > On Fri, Jan 13, 2023 at 05:55:45PM +0100, Vlad Buslov wrote:
> >> Following patches in series need to update flowtable rule several times
> >> during its lifetime in order to synchronize hardware offload with actual ct
> >> status. However, reusing existing 'refresh' logic in act_ct would cause
> >> data path to potentially schedule significant amount of spurious tasks in
> >> 'add' workqueue since it is executed per-packet. Instead, introduce a new
> >> flow 'update' flag and use it to schedule async flow refresh in flowtable
> >> gc which will only be executed once per gc iteration.
> >> 
> >> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
> >> ---
> >>  include/net/netfilter/nf_flow_table.h |  3 ++-
> >>  net/netfilter/nf_flow_table_core.c    | 20 +++++++++++++++-----
> >>  net/netfilter/nf_flow_table_offload.c |  5 +++--
> >>  3 files changed, 20 insertions(+), 8 deletions(-)
> >> 
> >> diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
> >> index 88ab98ab41d9..e396424e2e68 100644
> >> --- a/include/net/netfilter/nf_flow_table.h
> >> +++ b/include/net/netfilter/nf_flow_table.h
> >> @@ -165,6 +165,7 @@ enum nf_flow_flags {
> >>  	NF_FLOW_HW_DEAD,
> >>  	NF_FLOW_HW_PENDING,
> >>  	NF_FLOW_HW_BIDIRECTIONAL,
> >> +	NF_FLOW_HW_UPDATE,
> >>  };
> >>  
> >>  enum flow_offload_type {
> >> @@ -300,7 +301,7 @@ unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
> >>  #define MODULE_ALIAS_NF_FLOWTABLE(family)	\
> >>  	MODULE_ALIAS("nf-flowtable-" __stringify(family))
> >>  
> >> -void nf_flow_offload_add(struct nf_flowtable *flowtable,
> >> +bool nf_flow_offload_add(struct nf_flowtable *flowtable,
> >>  			 struct flow_offload *flow);
> >>  void nf_flow_offload_del(struct nf_flowtable *flowtable,
> >>  			 struct flow_offload *flow);
> >> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> >> index 04bd0ed4d2ae..5b495e768655 100644
> >> --- a/net/netfilter/nf_flow_table_core.c
> >> +++ b/net/netfilter/nf_flow_table_core.c
> >> @@ -316,21 +316,28 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
> >>  }
> >>  EXPORT_SYMBOL_GPL(flow_offload_add);
> >>  
> >> +static bool __flow_offload_refresh(struct nf_flowtable *flow_table,
> >> +				   struct flow_offload *flow)
> >> +{
> >> +	if (likely(!nf_flowtable_hw_offload(flow_table)))
> >> +		return true;
> >> +
> >> +	return nf_flow_offload_add(flow_table, flow);
> >> +}
> >> +
> >>  void flow_offload_refresh(struct nf_flowtable *flow_table,
> >>  			  struct flow_offload *flow)
> >>  {
> >>  	u32 timeout;
> >>  
> >>  	timeout = nf_flowtable_time_stamp + flow_offload_get_timeout(flow);
> >> -	if (timeout - READ_ONCE(flow->timeout) > HZ)
> >> +	if (timeout - READ_ONCE(flow->timeout) > HZ &&
> >> +	    !test_bit(NF_FLOW_HW_UPDATE, &flow->flags))
> >>  		WRITE_ONCE(flow->timeout, timeout);
> >>  	else
> >>  		return;
> >>  
> >> -	if (likely(!nf_flowtable_hw_offload(flow_table)))
> >> -		return;
> >> -
> >> -	nf_flow_offload_add(flow_table, flow);
> >> +	__flow_offload_refresh(flow_table, flow);
> >>  }
> >>  EXPORT_SYMBOL_GPL(flow_offload_refresh);
> >>  
> >> @@ -435,6 +442,9 @@ static void nf_flow_offload_gc_step(struct nf_flowtable *flow_table,
> >>  		} else {
> >>  			flow_offload_del(flow_table, flow);
> >>  		}
> >> +	} else if (test_and_clear_bit(NF_FLOW_HW_UPDATE, &flow->flags)) {
> >> +		if (!__flow_offload_refresh(flow_table, flow))
> >> +			set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
> >>  	} else if (test_bit(NF_FLOW_HW, &flow->flags)) {
> >>  		nf_flow_offload_stats(flow_table, flow);
> >
> > AFAICT even after this patchset it is possible to have both flags set
> > at the same time.
> > With that, this would cause the stats to skip a beat.
> > This would be better:
> >
> > - 	} else if (test_bit(NF_FLOW_HW, &flow->flags)) {
> > - 		nf_flow_offload_stats(flow_table, flow);
> > +	} else {
> > +		if (test_and_clear_bit(NF_FLOW_HW_UPDATE, &flow->flags))
> > +			if (!__flow_offload_refresh(flow_table, flow))
> > +				set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
> > +	 	if (test_bit(NF_FLOW_HW, &flow->flags))
> > + 			nf_flow_offload_stats(flow_table, flow);
> >  	}
> >
> > But a flow cannot have 2 pending actions at a time.
> 
> Yes. And timeouts are quite generous so there is IMO no problem in
> skipping one iteration. It is not like this wq is high priority and we
> can guarantee any exact update interval here anyway.
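
The "skip a beat" effect Marcelo describes can be illustrated with a toy model of the gc-step ordering (names are illustrative): with the else-if chain, a flow carrying both the update and HW flags gets its refresh in one gc pass but its stats read only on the next.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for NF_FLOW_HW and NF_FLOW_HW_UPDATE. */
enum { FLOW_HW = 1 << 0, FLOW_HW_UPDATE = 1 << 1 };

struct gc_result { bool refreshed; bool stats_read; };

/* One gc pass over a single flow, modeling the else-if chain in
 * nf_flow_offload_gc_step(): update handling and stats collection are
 * mutually exclusive within a pass.
 */
static struct gc_result gc_step_elseif(unsigned long *flags)
{
	struct gc_result r = { false, false };

	if (*flags & FLOW_HW_UPDATE) {
		*flags &= ~FLOW_HW_UPDATE;
		r.refreshed = true;	/* assume the refresh succeeded */
	} else if (*flags & FLOW_HW) {
		r.stats_read = true;
	}
	return r;
}
```

A flow with both flags set is refreshed on the first pass and only has its stats read on the second, which is the one-iteration delay deemed acceptable above.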

I cannot disagree, let's say :-)

Perhaps I'm just over-worried because of recent issues with ovs and
datapath flows, where it was evicting them because it saw no traffic in
5s or so.

For example,
Subject: [ovs-dev] [PATCH v3] ofproto-dpif-upcall: Wait for valid hw flow stats before applying min-revalidate-pps

And we're still chasing a stall in ovs revalidator that leads to
hiccups in datapath stats periodicity.

Yet, I'm not aware of such checks on top of CT entries.

> 
> > Then maybe an update to nf_flow_offload_tuple() to make it handle the
> > stats implicitly?
> 
> I considered this, but didn't want to over-complicate this series which
> is tricky enough as it is.

Makes sense.

> 
> >
> >>  	}
> >> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> >> index 8b852f10fab4..103b2ca8d123 100644
> >> --- a/net/netfilter/nf_flow_table_offload.c
> >> +++ b/net/netfilter/nf_flow_table_offload.c
> >> @@ -1036,16 +1036,17 @@ nf_flow_offload_work_alloc(struct nf_flowtable *flowtable,
> >>  }
> >>  
> >>  
> >> -void nf_flow_offload_add(struct nf_flowtable *flowtable,
> >> +bool nf_flow_offload_add(struct nf_flowtable *flowtable,
> >>  			 struct flow_offload *flow)
> >>  {
> >>  	struct flow_offload_work *offload;
> >>  
> >>  	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
> >>  	if (!offload)
> >> -		return;
> >> +		return false;
> >>  
> >>  	flow_offload_queue_work(offload);
> >> +	return true;
> >>  }
> >>  
> >>  void nf_flow_offload_del(struct nf_flowtable *flowtable,
> >> -- 
> >> 2.38.1
> >> 
> 


* Re: [PATCH net-next v2 6/7] net/sched: act_ct: offload UDP NEW connections
  2023-01-17 17:36     ` Vlad Buslov
@ 2023-01-17 17:56       ` Marcelo Ricardo Leitner
  2023-01-17 19:12         ` Vlad Buslov
  0 siblings, 1 reply; 25+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-01-17 17:56 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Tue, Jan 17, 2023 at 07:36:42PM +0200, Vlad Buslov wrote:
> 
> On Tue 17 Jan 2023 at 12:41, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> > On Fri, Jan 13, 2023 at 05:55:47PM +0100, Vlad Buslov wrote:
> >> When processing connections allow offloading of UDP connections that don't
> >> have IPS_ASSURED_BIT set as unidirectional. When performing table lookup
> >
> > Hmm. Considering that this is now offloading one direction only
> > already, what about skipping this grace period:
> >
> > In nf_conntrack_udp_packet(), it does:
> >
> >         /* If we've seen traffic both ways, this is some kind of UDP
> >          * stream. Set Assured.
> >          */
> >         if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
> > 		...
> >                 /* Still active after two seconds? Extend timeout. */
> >                 if (time_after(jiffies, ct->proto.udp.stream_ts)) {
> >                         extra = timeouts[UDP_CT_REPLIED];
> >                         stream = true;
> >                 }
> > 		...
> >                 /* Also, more likely to be important, and not a probe */
> >                 if (stream && !test_and_set_bit(IPS_ASSURED_BIT, &ct->status))
> >                         nf_conntrack_event_cache(IPCT_ASSURED, ct);
> >
> > Maybe the patch should be relying on IPS_SEEN_REPLY_BIT instead of
> > ASSURED for UDP? Just a thought here, but I'm not seeing why not.
> 
> The issue with this is that if we offload both directions early, then
> conntrack state machine will not receive any more packets and,
> consequently, will never change the flow state to assured. I guess that

I'm missing how it would offload each direction independently.
Wouldn't CT state machine see the 1st reply, because it is not
offloaded yet, and match it to the original direction?

> could be mitigated somehow by periodically checking the hw stats and
> transitioning the flow to assured based on them, but as I said in
> previous email we don't want to over-complicate this series even more.
> Also, offloading to hardware isn't free and costs both memory and CPU,
> so it is not like offloading as early as possible is strictly beneficial
> for all cases...

Yup.

> 
> >
> >> for reply packets check the current connection status: If UDP
> >> unidirectional connection became assured also promote the corresponding
> >> flow table entry to bidirectional and set the 'update' bit, else just set
> >> the 'update' bit since reply directional traffic will most likely cause
> >> connection status to become 'established' which requires updating the
> >> offload state.
> >> 
> >> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
> >> ---
> >>  net/sched/act_ct.c | 48 ++++++++++++++++++++++++++++++++++------------
> >>  1 file changed, 36 insertions(+), 12 deletions(-)
> >> 
> >> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> >> index bfddb462d2bc..563cbdd8341c 100644
> >> --- a/net/sched/act_ct.c
> >> +++ b/net/sched/act_ct.c
> >> @@ -369,7 +369,7 @@ static void tcf_ct_flow_tc_ifidx(struct flow_offload *entry,
> >>  
> >>  static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
> >>  				  struct nf_conn *ct,
> >> -				  bool tcp)
> >> +				  bool tcp, bool bidirectional)
> >>  {
> >>  	struct nf_conn_act_ct_ext *act_ct_ext;
> >>  	struct flow_offload *entry;
> >> @@ -388,6 +388,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
> >>  		ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
> >>  		ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
> >>  	}
> >> +	if (bidirectional)
> >> +		__set_bit(NF_FLOW_HW_BIDIRECTIONAL, &entry->flags);
> >>  
> >>  	act_ct_ext = nf_conn_act_ct_ext_find(ct);
> >>  	if (act_ct_ext) {
> >> @@ -411,26 +413,34 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
> >>  					   struct nf_conn *ct,
> >>  					   enum ip_conntrack_info ctinfo)
> >>  {
> >> -	bool tcp = false;
> >> -
> >> -	if ((ctinfo != IP_CT_ESTABLISHED && ctinfo != IP_CT_ESTABLISHED_REPLY) ||
> >> -	    !test_bit(IPS_ASSURED_BIT, &ct->status))
> >> -		return;
> >> +	bool tcp = false, bidirectional = true;
> >>  
> >>  	switch (nf_ct_protonum(ct)) {
> >>  	case IPPROTO_TCP:
> >> -		tcp = true;
> >> -		if (ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
> >> +		if ((ctinfo != IP_CT_ESTABLISHED &&
> >> +		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
> >> +		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
> >> +		    ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
> >>  			return;
> >> +
> >> +		tcp = true;
> >>  		break;
> >>  	case IPPROTO_UDP:
> >> +		if (!nf_ct_is_confirmed(ct))
> >> +			return;
> >> +		if (!test_bit(IPS_ASSURED_BIT, &ct->status))
> >> +			bidirectional = false;
> >>  		break;
> >>  #ifdef CONFIG_NF_CT_PROTO_GRE
> >>  	case IPPROTO_GRE: {
> >>  		struct nf_conntrack_tuple *tuple;
> >>  
> >> -		if (ct->status & IPS_NAT_MASK)
> >> +		if ((ctinfo != IP_CT_ESTABLISHED &&
> >> +		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
> >> +		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
> >> +		    ct->status & IPS_NAT_MASK)
> >>  			return;
> >> +
> >>  		tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
> >>  		/* No support for GRE v1 */
> >>  		if (tuple->src.u.gre.key || tuple->dst.u.gre.key)
> >> @@ -446,7 +456,7 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
> >>  	    ct->status & IPS_SEQ_ADJUST)
> >>  		return;
> >>  
> >> -	tcf_ct_flow_table_add(ct_ft, ct, tcp);
> >> +	tcf_ct_flow_table_add(ct_ft, ct, tcp, bidirectional);
> >>  }
> >>  
> >>  static bool
> >> @@ -625,13 +635,27 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params *p,
> >>  	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
> >>  	ct = flow->ct;
> >>  
> >> +	if (dir == FLOW_OFFLOAD_DIR_REPLY &&
> >> +	    !test_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags)) {
> >> +		/* Only offload reply direction after connection became
> >> +		 * assured.
> >> +		 */
> >> +		if (test_bit(IPS_ASSURED_BIT, &ct->status))
> >> +			set_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags);
> >> +		set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
> >> +		return false;
> >> +	}
> >> +
> >>  	if (tcph && (unlikely(tcph->fin || tcph->rst))) {
> >>  		flow_offload_teardown(flow);
> >>  		return false;
> >>  	}
> >>  
> >> -	ctinfo = dir == FLOW_OFFLOAD_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
> >> -						    IP_CT_ESTABLISHED_REPLY;
> >> +	if (dir == FLOW_OFFLOAD_DIR_ORIGINAL)
> >> +		ctinfo = test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
> >> +			IP_CT_ESTABLISHED : IP_CT_NEW;
> >> +	else
> >> +		ctinfo = IP_CT_ESTABLISHED_REPLY;
> >>  
> >>  	flow_offload_refresh(nf_ft, flow);
> >>  	nf_conntrack_get(&ct->ct_general);
> >> -- 
> >> 2.38.1
> >> 
> 


* Re: [PATCH net-next v2 6/7] net/sched: act_ct: offload UDP NEW connections
  2023-01-17 17:56       ` Marcelo Ricardo Leitner
@ 2023-01-17 19:12         ` Vlad Buslov
  2023-01-19  4:18           ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 25+ messages in thread
From: Vlad Buslov @ 2023-01-17 19:12 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Tue 17 Jan 2023 at 14:56, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> On Tue, Jan 17, 2023 at 07:36:42PM +0200, Vlad Buslov wrote:
>> 
>> On Tue 17 Jan 2023 at 12:41, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
>> > On Fri, Jan 13, 2023 at 05:55:47PM +0100, Vlad Buslov wrote:
>> >> When processing connections allow offloading of UDP connections that don't
>> >> have IPS_ASSURED_BIT set as unidirectional. When performing table lookup
>> >
>> > Hmm. Considering that this is now offloading one direction only
>> > already, what about skipping this grace period:
>> >
>> > In nf_conntrack_udp_packet(), it does:
>> >
>> >         /* If we've seen traffic both ways, this is some kind of UDP
>> >          * stream. Set Assured.
>> >          */
>> >         if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
>> > 		...
>> >                 /* Still active after two seconds? Extend timeout. */
>> >                 if (time_after(jiffies, ct->proto.udp.stream_ts)) {
>> >                         extra = timeouts[UDP_CT_REPLIED];
>> >                         stream = true;
>> >                 }
>> > 		...
>> >                 /* Also, more likely to be important, and not a probe */
>> >                 if (stream && !test_and_set_bit(IPS_ASSURED_BIT, &ct->status))
>> >                         nf_conntrack_event_cache(IPCT_ASSURED, ct);
>> >
>> > Maybe the patch should be relying on IPS_SEEN_REPLY_BIT instead of
>> > ASSURED for UDP? Just a thought here, but I'm not seeing why not.
>> 
>> The issue with this is that if we offload both directions early, then
>> conntrack state machine will not receive any more packets and,
>> consequently, will never change the flow state to assured. I guess that
>
> I'm missing how it would offload each direction independently.
> Wouldn't CT state machine see the 1st reply, because it is not
> offloaded yet, and match it to the original direction?

What I meant is that if both directions are offloaded as soon as
IPS_SEEN_REPLY_BIT is set, then nf_conntrack_udp_packet() will not be
called for that connection anymore and would never be able to transition
the connection to the assured state. But the main thing is, as I said in
the previous reply, that we don't need to offload such connections ATM.
It could be done in a follow-up if there is a use-case for it, maybe
even made somehow configurable (with BPF! :)), so it could be
dynamically controlled.

>
>> could be mitigated somehow by periodically checking the hw stats and
>> transitioning the flow to assured based on them, but as I said in
>> previous email we don't want to over-complicate this series even more.
>> Also, offloading to hardware isn't free and costs both memory and CPU,
>> so it is not like offloading as early as possible is strictly beneficial
>> for all cases...
>
> Yup.
>
>> 
>> >
>> >> for reply packets check the current connection status: If UDP
>> >> unidirectional connection became assured also promote the corresponding
>> >> flow table entry to bidirectional and set the 'update' bit, else just set
>> >> the 'update' bit since reply directional traffic will most likely cause
>> >> connection status to become 'established' which requires updating the
>> >> offload state.
>> >> 
>> >> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
>> >> ---
>> >>  net/sched/act_ct.c | 48 ++++++++++++++++++++++++++++++++++------------
>> >>  1 file changed, 36 insertions(+), 12 deletions(-)
>> >> 
>> >> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
>> >> index bfddb462d2bc..563cbdd8341c 100644
>> >> --- a/net/sched/act_ct.c
>> >> +++ b/net/sched/act_ct.c
>> >> @@ -369,7 +369,7 @@ static void tcf_ct_flow_tc_ifidx(struct flow_offload *entry,
>> >>  
>> >>  static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
>> >>  				  struct nf_conn *ct,
>> >> -				  bool tcp)
>> >> +				  bool tcp, bool bidirectional)
>> >>  {
>> >>  	struct nf_conn_act_ct_ext *act_ct_ext;
>> >>  	struct flow_offload *entry;
>> >> @@ -388,6 +388,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
>> >>  		ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
>> >>  		ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
>> >>  	}
>> >> +	if (bidirectional)
>> >> +		__set_bit(NF_FLOW_HW_BIDIRECTIONAL, &entry->flags);
>> >>  
>> >>  	act_ct_ext = nf_conn_act_ct_ext_find(ct);
>> >>  	if (act_ct_ext) {
>> >> @@ -411,26 +413,34 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
>> >>  					   struct nf_conn *ct,
>> >>  					   enum ip_conntrack_info ctinfo)
>> >>  {
>> >> -	bool tcp = false;
>> >> -
>> >> -	if ((ctinfo != IP_CT_ESTABLISHED && ctinfo != IP_CT_ESTABLISHED_REPLY) ||
>> >> -	    !test_bit(IPS_ASSURED_BIT, &ct->status))
>> >> -		return;
>> >> +	bool tcp = false, bidirectional = true;
>> >>  
>> >>  	switch (nf_ct_protonum(ct)) {
>> >>  	case IPPROTO_TCP:
>> >> -		tcp = true;
>> >> -		if (ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
>> >> +		if ((ctinfo != IP_CT_ESTABLISHED &&
>> >> +		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
>> >> +		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
>> >> +		    ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
>> >>  			return;
>> >> +
>> >> +		tcp = true;
>> >>  		break;
>> >>  	case IPPROTO_UDP:
>> >> +		if (!nf_ct_is_confirmed(ct))
>> >> +			return;
>> >> +		if (!test_bit(IPS_ASSURED_BIT, &ct->status))
>> >> +			bidirectional = false;
>> >>  		break;
>> >>  #ifdef CONFIG_NF_CT_PROTO_GRE
>> >>  	case IPPROTO_GRE: {
>> >>  		struct nf_conntrack_tuple *tuple;
>> >>  
>> >> -		if (ct->status & IPS_NAT_MASK)
>> >> +		if ((ctinfo != IP_CT_ESTABLISHED &&
>> >> +		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
>> >> +		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
>> >> +		    ct->status & IPS_NAT_MASK)
>> >>  			return;
>> >> +
>> >>  		tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
>> >>  		/* No support for GRE v1 */
>> >>  		if (tuple->src.u.gre.key || tuple->dst.u.gre.key)
>> >> @@ -446,7 +456,7 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
>> >>  	    ct->status & IPS_SEQ_ADJUST)
>> >>  		return;
>> >>  
>> >> -	tcf_ct_flow_table_add(ct_ft, ct, tcp);
>> >> +	tcf_ct_flow_table_add(ct_ft, ct, tcp, bidirectional);
>> >>  }
>> >>  
>> >>  static bool
>> >> @@ -625,13 +635,27 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params *p,
>> >>  	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
>> >>  	ct = flow->ct;
>> >>  
>> >> +	if (dir == FLOW_OFFLOAD_DIR_REPLY &&
>> >> +	    !test_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags)) {
>> >> +		/* Only offload reply direction after connection became
>> >> +		 * assured.
>> >> +		 */
>> >> +		if (test_bit(IPS_ASSURED_BIT, &ct->status))
>> >> +			set_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags);
>> >> +		set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
>> >> +		return false;
>> >> +	}
>> >> +
>> >>  	if (tcph && (unlikely(tcph->fin || tcph->rst))) {
>> >>  		flow_offload_teardown(flow);
>> >>  		return false;
>> >>  	}
>> >>  
>> >> -	ctinfo = dir == FLOW_OFFLOAD_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
>> >> -						    IP_CT_ESTABLISHED_REPLY;
>> >> +	if (dir == FLOW_OFFLOAD_DIR_ORIGINAL)
>> >> +		ctinfo = test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
>> >> +			IP_CT_ESTABLISHED : IP_CT_NEW;
>> >> +	else
>> >> +		ctinfo = IP_CT_ESTABLISHED_REPLY;
>> >>  
>> >>  	flow_offload_refresh(nf_ft, flow);
>> >>  	nf_conntrack_get(&ct->ct_general);
>> >> -- 
>> >> 2.38.1
>> >> 
>> 



* Re: [PATCH net-next v2 1/7] net: flow_offload: provision conntrack info in ct_metadata
  2023-01-17 17:28       ` Vlad Buslov
@ 2023-01-19  4:15         ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 25+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-01-19  4:15 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Tue, Jan 17, 2023 at 07:28:09PM +0200, Vlad Buslov wrote:
> On Tue 17 Jan 2023 at 12:09, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> > On Tue, Jan 17, 2023 at 12:04:32PM -0300, Marcelo Ricardo Leitner wrote:
> >> On Fri, Jan 13, 2023 at 05:55:42PM +0100, Vlad Buslov wrote:
> >> ...
> >> >  struct flow_match {
> >> > @@ -288,6 +289,7 @@ struct flow_action_entry {
> >> >  		} ct;
> >> >  		struct {
> >> >  			unsigned long cookie;
> >> > +			enum ip_conntrack_info ctinfo;
> >> >  			u32 mark;
> >> >  			u32 labels[4];
> >> >  			bool orig_dir;
> >> > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> >> > index 0ca2bb8ed026..515577f913a3 100644
> >> > --- a/net/sched/act_ct.c
> >> > +++ b/net/sched/act_ct.c
> >> > @@ -187,6 +187,7 @@ static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
> >> >  	/* aligns with the CT reference on the SKB nf_ct_set */
> >> >  	entry->ct_metadata.cookie = (unsigned long)ct | ctinfo;
> >>                                                    ^^^^^^^^^^^
> >
> > Hmm. A thought that just came up and that I still need to dig into,
> > but I wanted to share/ask already: would it be a problem to update
> > the cookie later on, then, to reflect the new ctinfo?
> 
> Not sure I'm following, but every time the flow changes state it is
> updated in the driver since new metadata is generated by calling
> tcf_ct_flow_table_fill_actions() from nf_flow_offload_rule_alloc().

Whoops... I missed replying to this one.

I worried that the cookie was perhaps being used as a hash index or
similar, and that updating it would then be changing the key under the
carpet. I checked now, and I don't see such an issue.

I guess that's it from my side then. :)

> 
> >
> >> 
> >> >  	entry->ct_metadata.orig_dir = dir == IP_CT_DIR_ORIGINAL;
> >> > +	entry->ct_metadata.ctinfo = ctinfo;
> >> 
> >> tcf_ct_flow_table_restore_skb() is doing:
> >>         enum ip_conntrack_info ctinfo = cookie & NFCT_INFOMASK;
> >> 
> >> Not sure if it really needs this duplication then.
> >> 
> >> >  
> >> >  	act_ct_labels = entry->ct_metadata.labels;
> >> >  	ct_labels = nf_ct_labels_find(ct);
> >> > -- 
> >> > 2.38.1
> >> > 
> 

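The cookie packing discussed in this message (`cookie = (unsigned long)ct | ctinfo`, restored via `cookie & NFCT_INFOMASK`) is a pointer-tagging scheme, which can be sketched standalone as follows (names mirror the kernel's `NFCT_INFOMASK`/`NFCT_PTRMASK`, but this is a simplified userspace model, not the kernel code):

```c
#include <stdint.h>

/* The nf_conn object is at least 8-byte aligned, so the low three bits
 * of its address are free to carry the ctinfo value (NFCT_INFOMASK is
 * 7UL in the kernel). */
#define NFCT_INFOMASK	7UL
#define NFCT_PTRMASK	(~NFCT_INFOMASK)

/* Pack a conntrack pointer and its ctinfo into one cookie word. */
static unsigned long cookie_pack(const void *ct, unsigned long ctinfo)
{
	return (unsigned long)(uintptr_t)ct | ctinfo;
}

/* Recover the conntrack pointer from the cookie. */
static const void *cookie_ct(unsigned long cookie)
{
	return (const void *)(uintptr_t)(cookie & NFCT_PTRMASK);
}

/* Recover the ctinfo tag from the cookie. */
static unsigned long cookie_ctinfo(unsigned long cookie)
{
	return cookie & NFCT_INFOMASK;
}
```

Because both halves are recoverable from the single word, the duplication question raised above is about convenience of the explicit `ct_metadata.ctinfo` field, not about information that the cookie cannot carry.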

* Re: [PATCH net-next v2 6/7] net/sched: act_ct: offload UDP NEW connections
  2023-01-17 19:12         ` Vlad Buslov
@ 2023-01-19  4:18           ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 25+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-01-19  4:18 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Tue, Jan 17, 2023 at 09:12:26PM +0200, Vlad Buslov wrote:
> On Tue 17 Jan 2023 at 14:56, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> > On Tue, Jan 17, 2023 at 07:36:42PM +0200, Vlad Buslov wrote:
> >> 
> >> On Tue 17 Jan 2023 at 12:41, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> >> > On Fri, Jan 13, 2023 at 05:55:47PM +0100, Vlad Buslov wrote:
> >> >> When processing connections allow offloading of UDP connections that don't
> >> >> have IPS_ASSURED_BIT set as unidirectional. When performing table lookup
> >> >
> >> > Hmm. Considering that this is now offloading one direction only
> >> > already, what about skipping this grace period:
> >> >
> >> > In nf_conntrack_udp_packet(), it does:
> >> >
> >> >         /* If we've seen traffic both ways, this is some kind of UDP
> >> >          * stream. Set Assured.
> >> >          */
> >> >         if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
> >> > 		...
> >> >                 /* Still active after two seconds? Extend timeout. */
> >> >                 if (time_after(jiffies, ct->proto.udp.stream_ts)) {
> >> >                         extra = timeouts[UDP_CT_REPLIED];
> >> >                         stream = true;
> >> >                 }
> >> > 		...
> >> >                 /* Also, more likely to be important, and not a probe */
> >> >                 if (stream && !test_and_set_bit(IPS_ASSURED_BIT, &ct->status))
> >> >                         nf_conntrack_event_cache(IPCT_ASSURED, ct);
> >> >
> >> > Maybe the patch should be relying on IPS_SEEN_REPLY_BIT instead of
> >> > ASSURED for UDP? Just a thought here, but I'm not seeing why not.
> >> 
> >> The issue with this is that if we offload both directions early, then
> >> the conntrack state machine will not receive any more packets and,
> >> consequently, will never change the flow state to assured. I guess that
> >
> > I'm missing how it would offload each direction independently.
> > Wouldn't the CT state machine see the 1st reply, because it is not
> > offloaded yet, and match it to the original direction?
> 

(ok, better reply this too instead of radio silence)

> What I meant is that if both directions are offloaded as soon as
> IPS_SEEN_REPLY_BIT is set, then nf_conntrack_udp_packet() will not be
> called for that connection anymore and will never be able to transition
> the connection to the assured state. But the main thing is, as I said in the

Oh, right!

> previous reply, that we don't need to offload such connections ATM. It
> could be done in a follow-up if there is a use-case for it, maybe even
> made somehow configurable (with BPF! :)), so it could be dynamically
> controlled.
> 
> >
> >> could be mitigated somehow by periodically checking the hw stats and
> >> transitioning the flow to assured based on them, but as I said in a
> >> previous email we don't want to over-complicate this series even more.
> >> Also, offloading to hardware isn't free and costs both memory and CPU,
> >> so it is not like offloading as early as possible is strictly beneficial
> >> for all cases...
> >
> > Yup.
> >
> >> 
> >> >
> >> >> for reply packets check the current connection status: If UDP
> >> >> unidirectional connection became assured also promote the corresponding
> >> >> flow table entry to bidirectional and set the 'update' bit, else just set
> >> >> the 'update' bit since reply directional traffic will most likely cause
> >> >> connection status to become 'established' which requires updating the
> >> >> offload state.
> >> >> 
> >> >> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
> >> >> ---
> >> >>  net/sched/act_ct.c | 48 ++++++++++++++++++++++++++++++++++------------
> >> >>  1 file changed, 36 insertions(+), 12 deletions(-)
> >> >> 
> >> >> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> >> >> index bfddb462d2bc..563cbdd8341c 100644
> >> >> --- a/net/sched/act_ct.c
> >> >> +++ b/net/sched/act_ct.c
> >> >> @@ -369,7 +369,7 @@ static void tcf_ct_flow_tc_ifidx(struct flow_offload *entry,
> >> >>  
> >> >>  static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
> >> >>  				  struct nf_conn *ct,
> >> >> -				  bool tcp)
> >> >> +				  bool tcp, bool bidirectional)
> >> >>  {
> >> >>  	struct nf_conn_act_ct_ext *act_ct_ext;
> >> >>  	struct flow_offload *entry;
> >> >> @@ -388,6 +388,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
> >> >>  		ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
> >> >>  		ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
> >> >>  	}
> >> >> +	if (bidirectional)
> >> >> +		__set_bit(NF_FLOW_HW_BIDIRECTIONAL, &entry->flags);
> >> >>  
> >> >>  	act_ct_ext = nf_conn_act_ct_ext_find(ct);
> >> >>  	if (act_ct_ext) {
> >> >> @@ -411,26 +413,34 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
> >> >>  					   struct nf_conn *ct,
> >> >>  					   enum ip_conntrack_info ctinfo)
> >> >>  {
> >> >> -	bool tcp = false;
> >> >> -
> >> >> -	if ((ctinfo != IP_CT_ESTABLISHED && ctinfo != IP_CT_ESTABLISHED_REPLY) ||
> >> >> -	    !test_bit(IPS_ASSURED_BIT, &ct->status))
> >> >> -		return;
> >> >> +	bool tcp = false, bidirectional = true;
> >> >>  
> >> >>  	switch (nf_ct_protonum(ct)) {
> >> >>  	case IPPROTO_TCP:
> >> >> -		tcp = true;
> >> >> -		if (ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
> >> >> +		if ((ctinfo != IP_CT_ESTABLISHED &&
> >> >> +		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
> >> >> +		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
> >> >> +		    ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
> >> >>  			return;
> >> >> +
> >> >> +		tcp = true;
> >> >>  		break;
> >> >>  	case IPPROTO_UDP:
> >> >> +		if (!nf_ct_is_confirmed(ct))
> >> >> +			return;
> >> >> +		if (!test_bit(IPS_ASSURED_BIT, &ct->status))
> >> >> +			bidirectional = false;
> >> >>  		break;
> >> >>  #ifdef CONFIG_NF_CT_PROTO_GRE
> >> >>  	case IPPROTO_GRE: {
> >> >>  		struct nf_conntrack_tuple *tuple;
> >> >>  
> >> >> -		if (ct->status & IPS_NAT_MASK)
> >> >> +		if ((ctinfo != IP_CT_ESTABLISHED &&
> >> >> +		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
> >> >> +		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
> >> >> +		    ct->status & IPS_NAT_MASK)
> >> >>  			return;
> >> >> +
> >> >>  		tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
> >> >>  		/* No support for GRE v1 */
> >> >>  		if (tuple->src.u.gre.key || tuple->dst.u.gre.key)
> >> >> @@ -446,7 +456,7 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
> >> >>  	    ct->status & IPS_SEQ_ADJUST)
> >> >>  		return;
> >> >>  
> >> >> -	tcf_ct_flow_table_add(ct_ft, ct, tcp);
> >> >> +	tcf_ct_flow_table_add(ct_ft, ct, tcp, bidirectional);
> >> >>  }
> >> >>  
> >> >>  static bool
> >> >> @@ -625,13 +635,27 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params *p,
> >> >>  	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
> >> >>  	ct = flow->ct;
> >> >>  
> >> >> +	if (dir == FLOW_OFFLOAD_DIR_REPLY &&
> >> >> +	    !test_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags)) {
> >> >> +		/* Only offload reply direction after connection became
> >> >> +		 * assured.
> >> >> +		 */
> >> >> +		if (test_bit(IPS_ASSURED_BIT, &ct->status))
> >> >> +			set_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags);
> >> >> +		set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
> >> >> +		return false;
> >> >> +	}
> >> >> +
> >> >>  	if (tcph && (unlikely(tcph->fin || tcph->rst))) {
> >> >>  		flow_offload_teardown(flow);
> >> >>  		return false;
> >> >>  	}
> >> >>  
> >> >> -	ctinfo = dir == FLOW_OFFLOAD_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
> >> >> -						    IP_CT_ESTABLISHED_REPLY;
> >> >> +	if (dir == FLOW_OFFLOAD_DIR_ORIGINAL)
> >> >> +		ctinfo = test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
> >> >> +			IP_CT_ESTABLISHED : IP_CT_NEW;
> >> >> +	else
> >> >> +		ctinfo = IP_CT_ESTABLISHED_REPLY;
> >> >>  
> >> >>  	flow_offload_refresh(nf_ft, flow);
> >> >>  	nf_conntrack_get(&ct->ct_general);
> >> >> -- 
> >> >> 2.38.1
> >> >> 
> >> 
> 

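The reply-direction gating in the final hunk of this message can be modeled as a small state sketch (hypothetical: plain bools stand in for the kernel's atomic bit operations on `flow->flags` and `ct->status`):

```c
#include <stdbool.h>

/* Simplified model of the reply-direction check added to
 * tcf_ct_flow_table_lookup(): reply traffic on a unidirectional flow is
 * not offload-matched; instead, once the conntrack entry turns assured,
 * the flow is promoted to bidirectional and flagged for async update. */
struct flow_state {
	bool hw_bidirectional;	/* NF_FLOW_HW_BIDIRECTIONAL */
	bool hw_update;		/* NF_FLOW_HW_UPDATE */
};

/* Returns true if a reply-direction packet may take the offloaded path. */
static bool lookup_reply_dir(struct flow_state *flow, bool ct_assured)
{
	if (!flow->hw_bidirectional) {
		/* Only offload the reply direction after the connection
		 * became assured; until then fall through to the CT
		 * path, requesting a hardware-state update either way. */
		if (ct_assured)
			flow->hw_bidirectional = true;
		flow->hw_update = true;
		return false;
	}
	return true;
}
```

Note that even the packet that triggers the promotion still goes through the CT path (`return false`); only subsequent reply packets match the now-bidirectional offloaded flow.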

end of thread, other threads:[~2023-01-19  4:48 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-13 16:55 [PATCH net-next v2 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
2023-01-13 16:55 ` [PATCH net-next v2 1/7] net: flow_offload: provision conntrack info in ct_metadata Vlad Buslov
2023-01-17 14:42   ` Pablo Neira Ayuso
2023-01-17 17:43     ` Vlad Buslov
2023-01-17 15:04   ` Marcelo Ricardo Leitner
2023-01-17 15:09     ` Marcelo Ricardo Leitner
2023-01-17 17:28       ` Vlad Buslov
2023-01-19  4:15         ` Marcelo Ricardo Leitner
2023-01-17 17:25     ` Vlad Buslov
2023-01-13 16:55 ` [PATCH net-next v2 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state Vlad Buslov
2023-01-13 16:55 ` [PATCH net-next v2 3/7] netfilter: flowtable: allow unidirectional rules Vlad Buslov
2023-01-17 15:15   ` Marcelo Ricardo Leitner
2023-01-17 17:31     ` Vlad Buslov
2023-01-13 16:55 ` [PATCH net-next v2 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously Vlad Buslov
2023-01-17 15:28   ` Marcelo Ricardo Leitner
2023-01-17 17:33     ` Vlad Buslov
2023-01-17 17:47       ` Marcelo Ricardo Leitner
2023-01-13 16:55 ` [PATCH net-next v2 5/7] net/sched: act_ct: set ctinfo in meta action depending on ct state Vlad Buslov
2023-01-13 16:55 ` [PATCH net-next v2 6/7] net/sched: act_ct: offload UDP NEW connections Vlad Buslov
2023-01-17 15:41   ` Marcelo Ricardo Leitner
2023-01-17 17:36     ` Vlad Buslov
2023-01-17 17:56       ` Marcelo Ricardo Leitner
2023-01-17 19:12         ` Vlad Buslov
2023-01-19  4:18           ` Marcelo Ricardo Leitner
2023-01-13 16:55 ` [PATCH net-next v2 7/7] netfilter: nf_conntrack: allow early drop of offloaded UDP conns Vlad Buslov
