* [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct
@ 2023-01-19 19:50 Vlad Buslov
  2023-01-19 19:50 ` [PATCH net-next v3 1/7] net: flow_offload: provision conntrack info in ct_metadata Vlad Buslov
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Vlad Buslov @ 2023-01-19 19:50 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Currently only bidirectional established connections can be offloaded
via act_ct. This approach hardcodes many assumptions into the act_ct,
flow_table and flow_offload intermediate layer code. In order to enable
offloading of unidirectional UDP NEW connections, start by
incrementally changing the following assumptions:

- Drivers assume that only established connections are offloaded and
  don't support updating existing connections. Extract ctinfo from meta
  action cookie and refuse offloading of new connections in the drivers.

- Fix flow_table offload fixup algorithm to calculate flow timeout
  according to current connection state instead of hardcoded
  "established" value.

- Add new flow_table flow flag that designates bidirectional connections
  instead of assuming it and hardcoding hardware offload of every flow
  in both directions.

- Add new flow_table flow flag that marks the flow for asynchronous
  update. Hardware offload state of such flows is updated by gc task by
  leveraging existing flow 'refresh' code.

With all the necessary infrastructure in place modify act_ct to offload
UDP NEW as unidirectional connection. Pass reply direction traffic to CT
and promote connection to bidirectional when UDP connection state
changes to "assured". Rely on refresh mechanism to propagate connection
state change to supporting drivers.

Note that the early drop algorithm, which is designed to free up space
in the connection tracking table when it becomes full (by randomly
deleting up to 5% of non-established connections), currently ignores
connections marked as "offloaded". Now, with UDP NEW connections
becoming "offloaded", a malicious user could mount a DoS attack by
filling the table with non-droppable UDP NEW connections, sending just
one packet in a single direction. To prevent this scenario, change the
early drop algorithm to also consider "offloaded" connections for
deletion.

Vlad Buslov (7):
  net: flow_offload: provision conntrack info in ct_metadata
  netfilter: flowtable: fixup UDP timeout depending on ct state
  netfilter: flowtable: allow unidirectional rules
  netfilter: flowtable: allow updating offloaded rules asynchronously
  net/sched: act_ct: set ctinfo in meta action depending on ct state
  net/sched: act_ct: offload UDP NEW connections
  netfilter: nf_conntrack: allow early drop of offloaded UDP conns

 .../ethernet/mellanox/mlx5/core/en/tc_ct.c    |  4 +-
 .../ethernet/netronome/nfp/flower/conntrack.c | 24 ++++++++
 include/net/netfilter/nf_flow_table.h         |  4 +-
 net/netfilter/nf_conntrack_core.c             | 11 ++--
 net/netfilter/nf_flow_table_core.c            | 25 +++++++--
 net/netfilter/nf_flow_table_offload.c         | 17 ++++--
 net/sched/act_ct.c                            | 55 ++++++++++++++-----
 7 files changed, 107 insertions(+), 33 deletions(-)

-- 
2.38.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net-next v3 1/7] net: flow_offload: provision conntrack info in ct_metadata
  2023-01-19 19:50 [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
@ 2023-01-19 19:50 ` Vlad Buslov
  2023-01-19 19:50 ` [PATCH net-next v3 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state Vlad Buslov
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2023-01-19 19:50 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

In order to offload connections in states other than "established", the
driver offload callbacks need access to the connection's conntrack info.
The flow offload intermediate representation data structure already
contains that data encoded in the 'cookie' field, so just reuse it in
the drivers.

Reject offloading of IP_CT_NEW connections for now by returning an
error in the relevant driver callbacks based on the value of ctinfo.
Support for offloading such connections will need to be added to the
drivers afterwards.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---

Notes:
    Changes V2 -> V3:
    
    - Reuse existing meta action 'cookie' field to obtain ctinfo instead of
    introducing a new field as suggested by Marcelo.
    
    Changes V1 -> V2:
    
    - Add missing include that caused compilation errors on certain configs.
    
    - Change naming in nfp driver as suggested by Simon and Baowen.

 .../ethernet/mellanox/mlx5/core/en/tc_ct.c    |  4 +++-
 .../ethernet/netronome/nfp/flower/conntrack.c | 24 +++++++++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index 313df8232db7..6774e441f490 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -1073,11 +1073,13 @@ mlx5_tc_ct_block_flow_offload_add(struct mlx5_ct_ft *ft,
 	struct mlx5_tc_ct_priv *ct_priv = ft->ct_priv;
 	struct flow_action_entry *meta_action;
 	unsigned long cookie = flow->cookie;
+	enum ip_conntrack_info ctinfo;
 	struct mlx5_ct_entry *entry;
 	int err;
 
 	meta_action = mlx5_tc_ct_get_ct_metadata_action(flow_rule);
-	if (!meta_action)
+	ctinfo = meta_action->ct_metadata.cookie & NFCT_INFOMASK;
+	if (!meta_action || ctinfo == IP_CT_NEW)
 		return -EOPNOTSUPP;
 
 	spin_lock_bh(&ct_priv->ht_lock);
diff --git a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
index f693119541d5..d23830b5bcb8 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
@@ -1964,6 +1964,27 @@ int nfp_fl_ct_stats(struct flow_cls_offload *flow,
 	return 0;
 }
 
+static bool
+nfp_fl_ct_offload_nft_supported(struct flow_cls_offload *flow)
+{
+	struct flow_rule *flow_rule = flow->rule;
+	struct flow_action *flow_action =
+		&flow_rule->action;
+	struct flow_action_entry *act;
+	int i;
+
+	flow_action_for_each(i, act, flow_action) {
+		if (act->id == FLOW_ACTION_CT_METADATA) {
+			enum ip_conntrack_info ctinfo =
+				act->ct_metadata.cookie & NFCT_INFOMASK;
+
+			return ctinfo != IP_CT_NEW;
+		}
+	}
+
+	return false;
+}
+
 static int
 nfp_fl_ct_offload_nft_flow(struct nfp_fl_ct_zone_entry *zt, struct flow_cls_offload *flow)
 {
@@ -1976,6 +1997,9 @@ nfp_fl_ct_offload_nft_flow(struct nfp_fl_ct_zone_entry *zt, struct flow_cls_offl
 	extack = flow->common.extack;
 	switch (flow->command) {
 	case FLOW_CLS_REPLACE:
+		if (!nfp_fl_ct_offload_nft_supported(flow))
+			return -EOPNOTSUPP;
+
 		/* Netfilter can request offload multiple times for the same
 		 * flow - protect against adding duplicates.
 		 */
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v3 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state
  2023-01-19 19:50 [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
  2023-01-19 19:50 ` [PATCH net-next v3 1/7] net: flow_offload: provision conntrack info in ct_metadata Vlad Buslov
@ 2023-01-19 19:50 ` Vlad Buslov
  2023-01-20 11:57   ` Pablo Neira Ayuso
  2023-01-19 19:51 ` [PATCH net-next v3 3/7] netfilter: flowtable: allow unidirectional rules Vlad Buslov
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2023-01-19 19:50 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Currently the flow_offload_fixup_ct() function assumes that only
replied UDP connections can be offloaded and hardcodes the
UDP_CT_REPLIED timeout value. To enable UDP NEW connection offload in
the following patches, extract the actual connection state from
ct->status and set the timeout according to it.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---
 net/netfilter/nf_flow_table_core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 81c26a96c30b..04bd0ed4d2ae 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -193,8 +193,11 @@ static void flow_offload_fixup_ct(struct nf_conn *ct)
 		timeout -= tn->offload_timeout;
 	} else if (l4num == IPPROTO_UDP) {
 		struct nf_udp_net *tn = nf_udp_pernet(net);
+		enum udp_conntrack state =
+			test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
+			UDP_CT_REPLIED : UDP_CT_UNREPLIED;
 
-		timeout = tn->timeouts[UDP_CT_REPLIED];
+		timeout = tn->timeouts[state];
 		timeout -= tn->offload_timeout;
 	} else {
 		return;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v3 3/7] netfilter: flowtable: allow unidirectional rules
  2023-01-19 19:50 [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
  2023-01-19 19:50 ` [PATCH net-next v3 1/7] net: flow_offload: provision conntrack info in ct_metadata Vlad Buslov
  2023-01-19 19:50 ` [PATCH net-next v3 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state Vlad Buslov
@ 2023-01-19 19:51 ` Vlad Buslov
  2023-01-19 19:51 ` [PATCH net-next v3 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously Vlad Buslov
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2023-01-19 19:51 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Modify flow table offload to support unidirectional connections by
extending enum nf_flow_flags with a new "NF_FLOW_HW_BIDIRECTIONAL"
flag. Only offload the reply direction when the flag is set. This
infrastructure change is necessary to support offloading UDP NEW
connections in the original direction in the following patches in the
series.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---

Notes:
    Changes V2 -> V3:
    
    - Fix error in commit message (spotted by Marcelo).

 include/net/netfilter/nf_flow_table.h |  1 +
 net/netfilter/nf_flow_table_offload.c | 12 ++++++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index cd982f4a0f50..88ab98ab41d9 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -164,6 +164,7 @@ enum nf_flow_flags {
 	NF_FLOW_HW_DYING,
 	NF_FLOW_HW_DEAD,
 	NF_FLOW_HW_PENDING,
+	NF_FLOW_HW_BIDIRECTIONAL,
 };
 
 enum flow_offload_type {
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 4d9b99abe37d..8b852f10fab4 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -895,8 +895,9 @@ static int flow_offload_rule_add(struct flow_offload_work *offload,
 
 	ok_count += flow_offload_tuple_add(offload, flow_rule[0],
 					   FLOW_OFFLOAD_DIR_ORIGINAL);
-	ok_count += flow_offload_tuple_add(offload, flow_rule[1],
-					   FLOW_OFFLOAD_DIR_REPLY);
+	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
+		ok_count += flow_offload_tuple_add(offload, flow_rule[1],
+						   FLOW_OFFLOAD_DIR_REPLY);
 	if (ok_count == 0)
 		return -ENOENT;
 
@@ -926,7 +927,8 @@ static void flow_offload_work_del(struct flow_offload_work *offload)
 {
 	clear_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
 	flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_ORIGINAL);
-	flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY);
+	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
+		flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY);
 	set_bit(NF_FLOW_HW_DEAD, &offload->flow->flags);
 }
 
@@ -946,7 +948,9 @@ static void flow_offload_work_stats(struct flow_offload_work *offload)
 	u64 lastused;
 
 	flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_ORIGINAL, &stats[0]);
-	flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_REPLY, &stats[1]);
+	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
+		flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_REPLY,
+					 &stats[1]);
 
 	lastused = max_t(u64, stats[0].lastused, stats[1].lastused);
 	offload->flow->timeout = max_t(u64, offload->flow->timeout,
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v3 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously
  2023-01-19 19:50 [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
                   ` (2 preceding siblings ...)
  2023-01-19 19:51 ` [PATCH net-next v3 3/7] netfilter: flowtable: allow unidirectional rules Vlad Buslov
@ 2023-01-19 19:51 ` Vlad Buslov
  2023-01-20 11:41   ` Pablo Neira Ayuso
  2023-01-19 19:51 ` [PATCH net-next v3 5/7] net/sched: act_ct: set ctinfo in meta action depending on ct state Vlad Buslov
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2023-01-19 19:51 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

The following patches in the series need to update a flowtable rule
several times during its lifetime in order to synchronize the hardware
offload with the actual ct status. However, reusing the existing
'refresh' logic in act_ct would cause the data path to potentially
schedule a significant number of spurious tasks on the 'add' workqueue,
since it is executed per packet. Instead, introduce a new flow 'update'
flag and use it to schedule an async flow refresh in the flowtable gc,
which is only executed once per gc iteration.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---
 include/net/netfilter/nf_flow_table.h |  3 ++-
 net/netfilter/nf_flow_table_core.c    | 20 +++++++++++++++-----
 net/netfilter/nf_flow_table_offload.c |  5 +++--
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 88ab98ab41d9..e396424e2e68 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -165,6 +165,7 @@ enum nf_flow_flags {
 	NF_FLOW_HW_DEAD,
 	NF_FLOW_HW_PENDING,
 	NF_FLOW_HW_BIDIRECTIONAL,
+	NF_FLOW_HW_UPDATE,
 };
 
 enum flow_offload_type {
@@ -300,7 +301,7 @@ unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 #define MODULE_ALIAS_NF_FLOWTABLE(family)	\
 	MODULE_ALIAS("nf-flowtable-" __stringify(family))
 
-void nf_flow_offload_add(struct nf_flowtable *flowtable,
+bool nf_flow_offload_add(struct nf_flowtable *flowtable,
 			 struct flow_offload *flow);
 void nf_flow_offload_del(struct nf_flowtable *flowtable,
 			 struct flow_offload *flow);
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 04bd0ed4d2ae..5b495e768655 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -316,21 +316,28 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
 }
 EXPORT_SYMBOL_GPL(flow_offload_add);
 
+static bool __flow_offload_refresh(struct nf_flowtable *flow_table,
+				   struct flow_offload *flow)
+{
+	if (likely(!nf_flowtable_hw_offload(flow_table)))
+		return true;
+
+	return nf_flow_offload_add(flow_table, flow);
+}
+
 void flow_offload_refresh(struct nf_flowtable *flow_table,
 			  struct flow_offload *flow)
 {
 	u32 timeout;
 
 	timeout = nf_flowtable_time_stamp + flow_offload_get_timeout(flow);
-	if (timeout - READ_ONCE(flow->timeout) > HZ)
+	if (timeout - READ_ONCE(flow->timeout) > HZ &&
+	    !test_bit(NF_FLOW_HW_UPDATE, &flow->flags))
 		WRITE_ONCE(flow->timeout, timeout);
 	else
 		return;
 
-	if (likely(!nf_flowtable_hw_offload(flow_table)))
-		return;
-
-	nf_flow_offload_add(flow_table, flow);
+	__flow_offload_refresh(flow_table, flow);
 }
 EXPORT_SYMBOL_GPL(flow_offload_refresh);
 
@@ -435,6 +442,9 @@ static void nf_flow_offload_gc_step(struct nf_flowtable *flow_table,
 		} else {
 			flow_offload_del(flow_table, flow);
 		}
+	} else if (test_and_clear_bit(NF_FLOW_HW_UPDATE, &flow->flags)) {
+		if (!__flow_offload_refresh(flow_table, flow))
+			set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
 	} else if (test_bit(NF_FLOW_HW, &flow->flags)) {
 		nf_flow_offload_stats(flow_table, flow);
 	}
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 8b852f10fab4..103b2ca8d123 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -1036,16 +1036,17 @@ nf_flow_offload_work_alloc(struct nf_flowtable *flowtable,
 }
 
 
-void nf_flow_offload_add(struct nf_flowtable *flowtable,
+bool nf_flow_offload_add(struct nf_flowtable *flowtable,
 			 struct flow_offload *flow)
 {
 	struct flow_offload_work *offload;
 
 	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
 	if (!offload)
-		return;
+		return false;
 
 	flow_offload_queue_work(offload);
+	return true;
 }
 
 void nf_flow_offload_del(struct nf_flowtable *flowtable,
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v3 5/7] net/sched: act_ct: set ctinfo in meta action depending on ct state
  2023-01-19 19:50 [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
                   ` (3 preceding siblings ...)
  2023-01-19 19:51 ` [PATCH net-next v3 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously Vlad Buslov
@ 2023-01-19 19:51 ` Vlad Buslov
  2023-01-19 19:51 ` [PATCH net-next v3 6/7] net/sched: act_ct: offload UDP NEW connections Vlad Buslov
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2023-01-19 19:51 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Currently the tcf_ct_flow_table_add_action_meta() function assumes that
only established connections can be offloaded and always sets ctinfo to
either IP_CT_ESTABLISHED or IP_CT_ESTABLISHED_REPLY strictly based on
direction, without checking the actual connection state. To enable UDP
NEW connection offload, set the ctinfo and metadata cookie based on the
ct->status value.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---
 net/sched/act_ct.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 0ca2bb8ed026..52e392de05a4 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -182,8 +182,11 @@ static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
 #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
 	entry->ct_metadata.mark = READ_ONCE(ct->mark);
 #endif
-	ctinfo = dir == IP_CT_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
-					     IP_CT_ESTABLISHED_REPLY;
+	if (dir == IP_CT_DIR_ORIGINAL)
+		ctinfo = test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
+			IP_CT_ESTABLISHED : IP_CT_NEW;
+	else
+		ctinfo = IP_CT_ESTABLISHED_REPLY;
 	/* aligns with the CT reference on the SKB nf_ct_set */
 	entry->ct_metadata.cookie = (unsigned long)ct | ctinfo;
 	entry->ct_metadata.orig_dir = dir == IP_CT_DIR_ORIGINAL;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v3 6/7] net/sched: act_ct: offload UDP NEW connections
  2023-01-19 19:50 [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
                   ` (4 preceding siblings ...)
  2023-01-19 19:51 ` [PATCH net-next v3 5/7] net/sched: act_ct: set ctinfo in meta action depending on ct state Vlad Buslov
@ 2023-01-19 19:51 ` Vlad Buslov
  2023-01-19 19:51 ` [PATCH net-next v3 7/7] netfilter: nf_conntrack: allow early drop of offloaded UDP conns Vlad Buslov
  2023-01-19 21:37 ` [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct Marcelo Ricardo Leitner
  7 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2023-01-19 19:51 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

When processing connections, allow offloading UDP connections that
don't have IPS_ASSURED_BIT set as unidirectional. When performing a
table lookup for reply packets, check the current connection status: if
the unidirectional UDP connection has become assured, promote the
corresponding flow table entry to bidirectional and set the 'update'
bit; otherwise just set the 'update' bit, since reply direction traffic
will most likely cause the connection status to become 'established',
which requires updating the offload state.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---
 net/sched/act_ct.c | 48 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 52e392de05a4..dca492eb0e22 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -368,7 +368,7 @@ static void tcf_ct_flow_tc_ifidx(struct flow_offload *entry,
 
 static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
 				  struct nf_conn *ct,
-				  bool tcp)
+				  bool tcp, bool bidirectional)
 {
 	struct nf_conn_act_ct_ext *act_ct_ext;
 	struct flow_offload *entry;
@@ -387,6 +387,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
 		ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
 		ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
 	}
+	if (bidirectional)
+		__set_bit(NF_FLOW_HW_BIDIRECTIONAL, &entry->flags);
 
 	act_ct_ext = nf_conn_act_ct_ext_find(ct);
 	if (act_ct_ext) {
@@ -410,26 +412,34 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
 					   struct nf_conn *ct,
 					   enum ip_conntrack_info ctinfo)
 {
-	bool tcp = false;
-
-	if ((ctinfo != IP_CT_ESTABLISHED && ctinfo != IP_CT_ESTABLISHED_REPLY) ||
-	    !test_bit(IPS_ASSURED_BIT, &ct->status))
-		return;
+	bool tcp = false, bidirectional = true;
 
 	switch (nf_ct_protonum(ct)) {
 	case IPPROTO_TCP:
-		tcp = true;
-		if (ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
+		if ((ctinfo != IP_CT_ESTABLISHED &&
+		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
+		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
+		    ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
 			return;
+
+		tcp = true;
 		break;
 	case IPPROTO_UDP:
+		if (!nf_ct_is_confirmed(ct))
+			return;
+		if (!test_bit(IPS_ASSURED_BIT, &ct->status))
+			bidirectional = false;
 		break;
 #ifdef CONFIG_NF_CT_PROTO_GRE
 	case IPPROTO_GRE: {
 		struct nf_conntrack_tuple *tuple;
 
-		if (ct->status & IPS_NAT_MASK)
+		if ((ctinfo != IP_CT_ESTABLISHED &&
+		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
+		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
+		    ct->status & IPS_NAT_MASK)
 			return;
+
 		tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
 		/* No support for GRE v1 */
 		if (tuple->src.u.gre.key || tuple->dst.u.gre.key)
@@ -445,7 +455,7 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
 	    ct->status & IPS_SEQ_ADJUST)
 		return;
 
-	tcf_ct_flow_table_add(ct_ft, ct, tcp);
+	tcf_ct_flow_table_add(ct_ft, ct, tcp, bidirectional);
 }
 
 static bool
@@ -624,13 +634,27 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params *p,
 	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
 	ct = flow->ct;
 
+	if (dir == FLOW_OFFLOAD_DIR_REPLY &&
+	    !test_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags)) {
+		/* Only offload reply direction after connection became
+		 * assured.
+		 */
+		if (test_bit(IPS_ASSURED_BIT, &ct->status))
+			set_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags);
+		set_bit(NF_FLOW_HW_UPDATE, &flow->flags);
+		return false;
+	}
+
 	if (tcph && (unlikely(tcph->fin || tcph->rst))) {
 		flow_offload_teardown(flow);
 		return false;
 	}
 
-	ctinfo = dir == FLOW_OFFLOAD_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
-						    IP_CT_ESTABLISHED_REPLY;
+	if (dir == FLOW_OFFLOAD_DIR_ORIGINAL)
+		ctinfo = test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
+			IP_CT_ESTABLISHED : IP_CT_NEW;
+	else
+		ctinfo = IP_CT_ESTABLISHED_REPLY;
 
 	flow_offload_refresh(nf_ft, flow);
 	nf_conntrack_get(&ct->ct_general);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v3 7/7] netfilter: nf_conntrack: allow early drop of offloaded UDP conns
  2023-01-19 19:50 [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
                   ` (5 preceding siblings ...)
  2023-01-19 19:51 ` [PATCH net-next v3 6/7] net/sched: act_ct: offload UDP NEW connections Vlad Buslov
@ 2023-01-19 19:51 ` Vlad Buslov
  2023-01-19 21:37 ` [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct Marcelo Ricardo Leitner
  7 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2023-01-19 19:51 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Both the synchronous early drop algorithm and the asynchronous gc
worker completely ignore connections with the IPS_OFFLOAD_BIT status
bit set. With the new functionality that enables UDP NEW connection
offload in act_ct, a malicious user can flood the conntrack table with
offloaded UDP connections by sending just a single packet per 5-tuple,
because such connections can no longer be deleted by the early drop
algorithm.

To mitigate the issue, allow both early drop and gc to consider
offloaded UDP connections for deletion.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---
 net/netfilter/nf_conntrack_core.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 496c4920505b..52b824a60176 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1374,9 +1374,6 @@ static unsigned int early_drop_list(struct net *net,
 	hlist_nulls_for_each_entry_rcu(h, n, head, hnnode) {
 		tmp = nf_ct_tuplehash_to_ctrack(h);
 
-		if (test_bit(IPS_OFFLOAD_BIT, &tmp->status))
-			continue;
-
 		if (nf_ct_is_expired(tmp)) {
 			nf_ct_gc_expired(tmp);
 			continue;
@@ -1446,11 +1443,14 @@ static bool gc_worker_skip_ct(const struct nf_conn *ct)
 static bool gc_worker_can_early_drop(const struct nf_conn *ct)
 {
 	const struct nf_conntrack_l4proto *l4proto;
+	u8 protonum = nf_ct_protonum(ct);
 
+	if (test_bit(IPS_OFFLOAD_BIT, &ct->status) && protonum != IPPROTO_UDP)
+		return false;
 	if (!test_bit(IPS_ASSURED_BIT, &ct->status))
 		return true;
 
-	l4proto = nf_ct_l4proto_find(nf_ct_protonum(ct));
+	l4proto = nf_ct_l4proto_find(protonum);
 	if (l4proto->can_early_drop && l4proto->can_early_drop(ct))
 		return true;
 
@@ -1507,7 +1507,8 @@ static void gc_worker(struct work_struct *work)
 
 			if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
 				nf_ct_offload_timeout(tmp);
-				continue;
+				if (!nf_conntrack_max95)
+					continue;
 			}
 
 			if (expired_count > GC_SCAN_EXPIRED_MAX) {
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct
  2023-01-19 19:50 [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
                   ` (6 preceding siblings ...)
  2023-01-19 19:51 ` [PATCH net-next v3 7/7] netfilter: nf_conntrack: allow early drop of offloaded UDP conns Vlad Buslov
@ 2023-01-19 21:37 ` Marcelo Ricardo Leitner
  2023-01-20  6:38   ` Vlad Buslov
  7 siblings, 1 reply; 19+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-01-19 21:37 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Thu, Jan 19, 2023 at 08:50:57PM +0100, Vlad Buslov wrote:
> Currently only bidirectional established connections can be offloaded
> via act_ct. This approach hardcodes many assumptions into the act_ct,
> flow_table and flow_offload intermediate layer code. In order to enable
> offloading of unidirectional UDP NEW connections, start by
> incrementally changing the following assumptions:
> 
> - Drivers assume that only established connections are offloaded and
>   don't support updating existing connections. Extract ctinfo from meta
>   action cookie and refuse offloading of new connections in the drivers.

Hi Vlad,

Regarding ct_seq_show(). When dumping the CT entries today, it will do
things like:

        if (!test_bit(IPS_OFFLOAD_BIT, &ct->status))
                seq_printf(s, "%ld ", nf_ct_expires(ct)  / HZ);

omit the timeout, which is okay with this new patchset, but then:

        if (test_bit(IPS_HW_OFFLOAD_BIT, &ct->status))
                seq_puts(s, "[HW_OFFLOAD] ");
        else if (test_bit(IPS_OFFLOAD_BIT, &ct->status))
                seq_puts(s, "[OFFLOAD] ");
        else if (test_bit(IPS_ASSURED_BIT, &ct->status))
                seq_puts(s, "[ASSURED] ");

Previously, in order to be offloaded, it had to be Assured. But not
anymore after this patchset. Thoughts?

  Marcelo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct
  2023-01-19 21:37 ` [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct Marcelo Ricardo Leitner
@ 2023-01-20  6:38   ` Vlad Buslov
  2023-01-20  6:57     ` Vlad Buslov
  0 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2023-01-20  6:38 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman


On Thu 19 Jan 2023 at 18:37, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> On Thu, Jan 19, 2023 at 08:50:57PM +0100, Vlad Buslov wrote:
>> Currently only bidirectional established connections can be offloaded
>> via act_ct. This approach hardcodes many assumptions into the act_ct,
>> flow_table and flow_offload intermediate layer code. In order to enable
>> offloading of unidirectional UDP NEW connections, start by
>> incrementally changing the following assumptions:
>> 
>> - Drivers assume that only established connections are offloaded and
>>   don't support updating existing connections. Extract ctinfo from meta
>>   action cookie and refuse offloading of new connections in the drivers.
>
> Hi Vlad,
>
> Regarding ct_seq_show(). When dumping the CT entries today, it will do
> things like:
>
>         if (!test_bit(IPS_OFFLOAD_BIT, &ct->status))
>                 seq_printf(s, "%ld ", nf_ct_expires(ct)  / HZ);
>
> omit the timeout, which is okay with this new patchset, but then:
>
>         if (test_bit(IPS_HW_OFFLOAD_BIT, &ct->status))
>                 seq_puts(s, "[HW_OFFLOAD] ");
>         else if (test_bit(IPS_OFFLOAD_BIT, &ct->status))
>                 seq_puts(s, "[OFFLOAD] ");
>         else if (test_bit(IPS_ASSURED_BIT, &ct->status))
>                 seq_puts(s, "[ASSURED] ");
>
> Previously, in order to be offloaded, it had to be Assured. But not
> anymore after this patchset. Thoughts?

Hi Marcelo,

I know that for some reason offloaded entries no longer display the
'assured' flag in the dump. This could be changed, but I don't have a
preference either way, and this patch set doesn't modify that behavior.
Up to you and the maintainers, I guess.
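The if/else chain in ct_seq_show() quoted above is first-match-only, which is why an offloaded entry never shows [ASSURED]. A small userspace sketch of that precedence (the bit positions mirror the uapi nf_conntrack_common.h header, but this is illustrative code, not the kernel function):

```c
#include <assert.h>
#include <string.h>

/* Status bits as in include/uapi/linux/netfilter/nf_conntrack_common.h. */
#define IPS_ASSURED    (1UL << 2)
#define IPS_OFFLOAD    (1UL << 14)
#define IPS_HW_OFFLOAD (1UL << 15)

/* First-match precedence, as in ct_seq_show(): an offloaded connection
 * masks its own 'assured' state in the dump. */
static const char *ct_status_tag(unsigned long status)
{
	if (status & IPS_HW_OFFLOAD)
		return "[HW_OFFLOAD] ";
	if (status & IPS_OFFLOAD)
		return "[OFFLOAD] ";
	if (status & IPS_ASSURED)
		return "[ASSURED] ";
	return "";
}
```

So an entry with status HW_OFFLOAD|ASSURED prints only "[HW_OFFLOAD] ", which is the behavior Marcelo is asking about.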


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct
  2023-01-20  6:38   ` Vlad Buslov
@ 2023-01-20  6:57     ` Vlad Buslov
  2023-01-20 11:30       ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2023-01-20  6:57 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Fri 20 Jan 2023 at 08:38, Vlad Buslov <vladbu@nvidia.com> wrote:
> On Thu 19 Jan 2023 at 18:37, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
>> On Thu, Jan 19, 2023 at 08:50:57PM +0100, Vlad Buslov wrote:
>>> Currently only bidirectional established connections can be offloaded
>>> via act_ct. Such approach allows to hardcode a lot of assumptions into
>>> act_ct, flow_table and flow_offload intermediate layer codes. In order
>>> to enabled offloading of unidirectional UDP NEW connections start with
>>> incrementally changing the following assumptions:
>>> 
>>> - Drivers assume that only established connections are offloaded and
>>>   don't support updating existing connections. Extract ctinfo from meta
>>>   action cookie and refuse offloading of new connections in the drivers.
>>
>> Hi Vlad,
>>
>> Regarding ct_seq_show(). When dumping the CT entries today, it will do
>> things like:
>>
>>         if (!test_bit(IPS_OFFLOAD_BIT, &ct->status))
>>                 seq_printf(s, "%ld ", nf_ct_expires(ct)  / HZ);
>>
>> omit the timeout, which is okay with this new patchset, but then:
>>
>>         if (test_bit(IPS_HW_OFFLOAD_BIT, &ct->status))
>>                 seq_puts(s, "[HW_OFFLOAD] ");
>>         else if (test_bit(IPS_OFFLOAD_BIT, &ct->status))
>>                 seq_puts(s, "[OFFLOAD] ");
>>         else if (test_bit(IPS_ASSURED_BIT, &ct->status))
>>                 seq_puts(s, "[ASSURED] ");
>>
>> Previously, in order to be offloaded, it had to be Assured. But not
>> anymore after this patchset. Thoughts?
>
> Hi Marcelo,
>
> I know that for some reason offloaded entries no longer display
> 'assured' flag in the dump. This could be changed, but I don't have a
> preference either way and this patch set doesn't modify the behavior.
> Up to you and maintainers I guess.

BTW, after checking the log I don't think the assumption that all
offloaded connections are always assured is true. As far as I understand,
act_ct originally offloaded established connections, and the change to
offload only assured connections was made relatively recently in commit
43332cf97425 ("net/sched: act_ct: Offload only ASSURED connections")
without modifying the prints you mentioned.



* Re: [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct
  2023-01-20  6:57     ` Vlad Buslov
@ 2023-01-20 11:30       ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 19+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-01-20 11:30 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, simon.horman

On Fri, Jan 20, 2023 at 08:57:16AM +0200, Vlad Buslov wrote:
> On Fri 20 Jan 2023 at 08:38, Vlad Buslov <vladbu@nvidia.com> wrote:
> > On Thu 19 Jan 2023 at 18:37, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> >> On Thu, Jan 19, 2023 at 08:50:57PM +0100, Vlad Buslov wrote:
> >>> Currently only bidirectional established connections can be offloaded
> >>> via act_ct. Such approach allows to hardcode a lot of assumptions into
> >>> act_ct, flow_table and flow_offload intermediate layer codes. In order
> >>> to enabled offloading of unidirectional UDP NEW connections start with
> >>> incrementally changing the following assumptions:
> >>> 
> >>> - Drivers assume that only established connections are offloaded and
> >>>   don't support updating existing connections. Extract ctinfo from meta
> >>>   action cookie and refuse offloading of new connections in the drivers.
> >>
> >> Hi Vlad,
> >>
> >> Regarding ct_seq_show(). When dumping the CT entries today, it will do
> >> things like:
> >>
> >>         if (!test_bit(IPS_OFFLOAD_BIT, &ct->status))
> >>                 seq_printf(s, "%ld ", nf_ct_expires(ct)  / HZ);
> >>
> >> omit the timeout, which is okay with this new patchset, but then:
> >>
> >>         if (test_bit(IPS_HW_OFFLOAD_BIT, &ct->status))
> >>                 seq_puts(s, "[HW_OFFLOAD] ");
> >>         else if (test_bit(IPS_OFFLOAD_BIT, &ct->status))
> >>                 seq_puts(s, "[OFFLOAD] ");
> >>         else if (test_bit(IPS_ASSURED_BIT, &ct->status))
> >>                 seq_puts(s, "[ASSURED] ");
> >>
> >> Previously, in order to be offloaded, it had to be Assured. But not
> >> anymore after this patchset. Thoughts?
> >
> > Hi Marcelo,
> >
> > I know that for some reason offloaded entries no longer display
> > 'assured' flag in the dump. This could be changed, but I don't have a
> > preference either way and this patch set doesn't modify the behavior.
> > Up to you and maintainers I guess.
> 
> BTW after checking the log I don't think the assumption that all
> offloaded connections are always assured is true. As far as I understand
> act_ct originally offloaded established connections and change to
> offload assured was made relatively recently in 43332cf97425
> ("net/sched: act_ct: Offload only ASSURED connections") without
> modifying the prints you mentioned.

Oh. Somehow this behavior glued to my mind as it was always there. Not
sure which glue was used, please don't ask :D
Thanks!

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>


* Re: [PATCH net-next v3 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously
  2023-01-19 19:51 ` [PATCH net-next v3 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously Vlad Buslov
@ 2023-01-20 11:41   ` Pablo Neira Ayuso
  2023-01-24  7:06     ` Vlad Buslov
  0 siblings, 1 reply; 19+ messages in thread
From: Pablo Neira Ayuso @ 2023-01-20 11:41 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, marcelo.leitner, simon.horman

Hi Vlad,

On Thu, Jan 19, 2023 at 08:51:01PM +0100, Vlad Buslov wrote:
> Following patches in series need to update flowtable rule several times
> during its lifetime in order to synchronize hardware offload with actual ct
> status. However, reusing existing 'refresh' logic in act_ct would cause
> data path to potentially schedule significant amount of spurious tasks in
> 'add' workqueue since it is executed per-packet. Instead, introduce a new
> flow 'update' flag and use it to schedule async flow refresh in flowtable
> gc which will only be executed once per gc iteration.

So the idea is to use a NF_FLOW_HW_UPDATE which triggers the update
from the garbage collector. I understand the motivation here is to
avoid adding more work to the workqueue, by simply letting the gc
thread pick up for the update.

I already proposed alternative approaches last year to improve the
workqueue logic, including cancellation of useless work. For example,
cancel an in-flight "add" work if a "delete" has just arrived and the
work is still sitting in the queue. The same approach could be used for
this update logic, i.e. cancel a unidirectional UDP "add", or upgrade it
to bidirectional if, by the time we see traffic in both directions, the
work is still sitting in the queue.

I am sorry to say, but it seems to me this approach based on flags is
pushing the existing design to the limit. The flag semantics are
already overloaded, and this just makes the state machine behind the
flag logic more complicated. I really think we should explore better
strategies for processing the offload work.

Thanks.
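The cancellation/upgrade idea above can be illustrated with a tiny command-merge function. This is plain userspace C with made-up names, not the kernel workqueue API, just to show the intended coalescing semantics:

```c
#include <assert.h>

/* Hypothetical sketch of the coalescing idea from this thread: a command
 * still sitting in the queue can be cancelled or upgraded instead of
 * queuing more work. All names here are illustrative. */
enum offload_cmd { CMD_NONE, CMD_ADD, CMD_ADD_BIDIR, CMD_DELETE };

/* Merge a newly requested command into a still-pending one.
 * Returns the command that should remain queued. */
static enum offload_cmd coalesce_cmd(enum offload_cmd pending,
				     enum offload_cmd requested)
{
	if (pending == CMD_ADD && requested == CMD_DELETE)
		return CMD_NONE;      /* short-lived flow: drop both works */
	if (pending == CMD_ADD && requested == CMD_ADD_BIDIR)
		return CMD_ADD_BIDIR; /* upgrade a unidirectional add */
	return requested == CMD_NONE ? pending : requested;
}
```

With this shape, a flow that is added and deleted before the worker runs costs no hardware round-trip at all, and a UDP flow that turns bidirectional while its add is queued is offloaded once, already bidirectional.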


* Re: [PATCH net-next v3 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state
  2023-01-19 19:50 ` [PATCH net-next v3 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state Vlad Buslov
@ 2023-01-20 11:57   ` Pablo Neira Ayuso
  2023-01-24  7:08     ` Vlad Buslov
  0 siblings, 1 reply; 19+ messages in thread
From: Pablo Neira Ayuso @ 2023-01-20 11:57 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, marcelo.leitner, simon.horman

On Thu, Jan 19, 2023 at 08:50:59PM +0100, Vlad Buslov wrote:
> Currently flow_offload_fixup_ct() function assumes that only replied UDP
> connections can be offloaded and hardcodes UDP_CT_REPLIED timeout value. To
> enable UDP NEW connection offload in following patches extract the actual
> connections state from ct->status and set the timeout according to it.
> 
> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
> ---
>  net/netfilter/nf_flow_table_core.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> index 81c26a96c30b..04bd0ed4d2ae 100644
> --- a/net/netfilter/nf_flow_table_core.c
> +++ b/net/netfilter/nf_flow_table_core.c
> @@ -193,8 +193,11 @@ static void flow_offload_fixup_ct(struct nf_conn *ct)
>  		timeout -= tn->offload_timeout;
>  	} else if (l4num == IPPROTO_UDP) {
>  		struct nf_udp_net *tn = nf_udp_pernet(net);
> +		enum udp_conntrack state =
> +			test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
> +			UDP_CT_REPLIED : UDP_CT_UNREPLIED;
>  
> -		timeout = tn->timeouts[UDP_CT_REPLIED];
> +		timeout = tn->timeouts[state];
>  		timeout -= tn->offload_timeout;

For netfilter's flowtable (not talking about act_ct), this is a
"problem" because the flowtable path does not update the ct->status
flags. In other words, for netfilter's flowtable the UDP_CT_UNREPLIED
timeout will always be used for UDP traffic if it is offloaded and no
traffic from the classic path was seen.

If packets go via hardware offload, the host does not see packets in
the reply direction (unless hardware counters are used to set on
IPS_SEEN_REPLY_BIT?).

Then, there is also IPS_ASSURED: Netfilter's flowtable assumes that
TCP flows are only offloaded to hardware if IPS_ASSURED.
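The timeout selection in the patch under discussion reduces to a single predicate on the SEEN_REPLY status bit. A userspace sketch (the constants mirror the kernel's definitions; the helper name is made up), which also makes Pablo's point visible: if the offload path never sets IPS_SEEN_REPLY, the UNREPLIED bucket is picked every time:

```c
#include <assert.h>

/* Values mirror enum udp_conntrack and the IPS_* bits from the kernel
 * uapi headers; this is an illustrative sketch, not the kernel code. */
enum udp_conntrack { UDP_CT_UNREPLIED, UDP_CT_REPLIED };

#define IPS_SEEN_REPLY_BIT 1
#define IPS_SEEN_REPLY     (1UL << IPS_SEEN_REPLY_BIT)

/* Pick the UDP timeout bucket from ct->status instead of hardcoding
 * UDP_CT_REPLIED, as flow_offload_fixup_ct() does after the patch. */
static enum udp_conntrack udp_fixup_state(unsigned long ct_status)
{
	return (ct_status & IPS_SEEN_REPLY) ? UDP_CT_REPLIED
					    : UDP_CT_UNREPLIED;
}
```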


* Re: [PATCH net-next v3 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously
  2023-01-20 11:41   ` Pablo Neira Ayuso
@ 2023-01-24  7:06     ` Vlad Buslov
  2023-01-24  8:41       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2023-01-24  7:06 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: davem, kuba, pabeni, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, marcelo.leitner, simon.horman


On Fri 20 Jan 2023 at 12:41, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Vlad,
>
> On Thu, Jan 19, 2023 at 08:51:01PM +0100, Vlad Buslov wrote:
>> Following patches in series need to update flowtable rule several times
>> during its lifetime in order to synchronize hardware offload with actual ct
>> status. However, reusing existing 'refresh' logic in act_ct would cause
>> data path to potentially schedule significant amount of spurious tasks in
>> 'add' workqueue since it is executed per-packet. Instead, introduce a new
>> flow 'update' flag and use it to schedule async flow refresh in flowtable
>> gc which will only be executed once per gc iteration.
>
> So the idea is to use a NF_FLOW_HW_UPDATE which triggers the update
> from the garbage collector. I understand the motivation here is to
> avoid adding more work to the workqueue, by simply letting the gc
> thread pick up for the update.
>
> I already proposed in the last year alternative approaches to improve
> the workqueue logic, including cancelation of useless work. For
> example, cancel a flying "add" work if "delete" just arrive and the
> work is still sitting in the queue. Same approach could be use for
> this update logic, ie. cancel an add UDP unidirectional or upgrade it
> to bidirectional if, by the time we see traffic in both directions,
> then work is still sitting in the queue.

Thanks for the suggestion. I'll try to make this work over regular
workqueues without further extending the flow flags and/or putting more
stuff into gc.

>
> I am sorry to say but it seems to me this approach based on flags is
> pushing the existing design to the limit. The flag semantics is
> already overloaded that this just makes the state machine behind the
> flag logic more complicated. I really think we should explore for
> better strategies for the offload work to be processed.

Got it.



* Re: [PATCH net-next v3 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state
  2023-01-20 11:57   ` Pablo Neira Ayuso
@ 2023-01-24  7:08     ` Vlad Buslov
  0 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2023-01-24  7:08 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: davem, kuba, pabeni, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, marcelo.leitner, simon.horman


On Fri 20 Jan 2023 at 12:57, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Thu, Jan 19, 2023 at 08:50:59PM +0100, Vlad Buslov wrote:
>> Currently flow_offload_fixup_ct() function assumes that only replied UDP
>> connections can be offloaded and hardcodes UDP_CT_REPLIED timeout value. To
>> enable UDP NEW connection offload in following patches extract the actual
>> connections state from ct->status and set the timeout according to it.
>> 
>> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
>> ---
>>  net/netfilter/nf_flow_table_core.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>> 
>> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
>> index 81c26a96c30b..04bd0ed4d2ae 100644
>> --- a/net/netfilter/nf_flow_table_core.c
>> +++ b/net/netfilter/nf_flow_table_core.c
>> @@ -193,8 +193,11 @@ static void flow_offload_fixup_ct(struct nf_conn *ct)
>>  		timeout -= tn->offload_timeout;
>>  	} else if (l4num == IPPROTO_UDP) {
>>  		struct nf_udp_net *tn = nf_udp_pernet(net);
>> +		enum udp_conntrack state =
>> +			test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
>> +			UDP_CT_REPLIED : UDP_CT_UNREPLIED;
>>  
>> -		timeout = tn->timeouts[UDP_CT_REPLIED];
>> +		timeout = tn->timeouts[state];
>>  		timeout -= tn->offload_timeout;
>
> For netfilter's flowtable (not talking about act_ct), this is a
> "problem" because the flowtable path update with ct->status flags.
> In other words, for netfilter's flowtable UDP_CT_UNREPLIED timeout
> will be always used for UDP traffic if it is offloaded and no traffic
> from the classic path was seen.

Hmm, I didn't consider that netfilter might not update the status. Will
try to decouple the teardown timeout calculation and allow flow_table
users to specify their own.

>
> If packets go via hardware offload, the host does not see packets in
> the reply direction (unless hardware counters are used to set on
> IPS_SEEN_REPLY_BIT?).
>
> Then, there is also IPS_ASSURED: Netfilter's flowtable assumes that
> TCP flows are only offloaded to hardware if IPS_ASSURED.



* Re: [PATCH net-next v3 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously
  2023-01-24  7:06     ` Vlad Buslov
@ 2023-01-24  8:41       ` Pablo Neira Ayuso
  2023-01-24  9:19         ` Vlad Buslov
  0 siblings, 1 reply; 19+ messages in thread
From: Pablo Neira Ayuso @ 2023-01-24  8:41 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, marcelo.leitner, simon.horman

Hi Vlad,

On Tue, Jan 24, 2023 at 09:06:13AM +0200, Vlad Buslov wrote:
> 
> On Fri 20 Jan 2023 at 12:41, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Hi Vlad,
> >
> > On Thu, Jan 19, 2023 at 08:51:01PM +0100, Vlad Buslov wrote:
> >> Following patches in series need to update flowtable rule several times
> >> during its lifetime in order to synchronize hardware offload with actual ct
> >> status. However, reusing existing 'refresh' logic in act_ct would cause
> >> data path to potentially schedule significant amount of spurious tasks in
> >> 'add' workqueue since it is executed per-packet. Instead, introduce a new
> >> flow 'update' flag and use it to schedule async flow refresh in flowtable
> >> gc which will only be executed once per gc iteration.
> >
> > So the idea is to use a NF_FLOW_HW_UPDATE which triggers the update
> > from the garbage collector. I understand the motivation here is to
> > avoid adding more work to the workqueue, by simply letting the gc
> > thread pick up for the update.
> >
> > I already proposed in the last year alternative approaches to improve
> > the workqueue logic, including cancelation of useless work. For
> > example, cancel a flying "add" work if "delete" just arrive and the
> > work is still sitting in the queue. Same approach could be use for
> > this update logic, ie. cancel an add UDP unidirectional or upgrade it
> > to bidirectional if, by the time we see traffic in both directions,
> > then work is still sitting in the queue.
> 
> Thanks for the suggestion. I'll try to make this work over regular
> workqueues without further extending the flow flags and/or putting more
> stuff into gc.

Let me make a second pass to sort out thoughts on this.

Either we use regular workqueues (without new flags) or we explore
fully consolidating this hardware offload workqueue infrastructure
around flags, ie. use flags not only for update events, but also for
new and delete.

This would go more in the direction of your _UPDATE flag idea:

- Update the hardware offload workqueue to iterate over the
  flowtable. The hardware offload workqueue would be "scanning" for
  entries in the flowtable that require some sort of update in the
  hardware. The flags would tell what kind of action is needed.

- Add these flags:

NF_FLOW_HW_NEW
NF_FLOW_HW_UPDATE
NF_FLOW_HW_DELETE

and remove the work object (flow_offload_work) and the existing list.
If the workqueue finds an entry with:

NEW|DELETE means this is a short-lived flow, not worth wasting cycles
to offload.
NEW|UPDATE means this is a UDP flow that is bidirectional.

Then, there will be no more work allocation + "flying" work objects to
the hardware offload workqueue. Instead, the hardware offload
workqueue will be iterating.

This approach would need _DONE flags to annotate if the offload
updates have been applied to hardware already (similar to the
conntrack _DONE flags).

(Oh well, this proposal is adding even more flags. But I think flags
are not the issue, but the mixture of the existing flow_offload_work
approach with this new _UPDATE flag and the gc changes).

If flow_offload_work is removed, we would also need to add a:

 struct nf_flowtable *flowtable;

field to the flow_offload entry; that pointer is currently passed via
flow_offload_work. So it is one extra field for each flow_offload
entry.

The other alternative is to use the existing nf_flow_offload_add_wq
with UPDATE command, which might result in more flying objects in
turn. I think this is what you are trying to avoid with the _UPDATE
flag approach.

Thanks.
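The flag semantics proposed above can be sketched as the decision the scanning workqueue would take per entry. Flag names follow the proposal; the bit values, the action enum, and the helper are all illustrative, not an existing kernel API:

```c
#include <assert.h>

/* Proposed per-flow flags; values chosen arbitrarily for this sketch. */
#define NF_FLOW_HW_NEW    (1U << 0)
#define NF_FLOW_HW_UPDATE (1U << 1)
#define NF_FLOW_HW_DELETE (1U << 2)

enum scan_action { SCAN_SKIP, SCAN_ADD, SCAN_ADD_BIDIR, SCAN_DEL };

/* What the iterating hardware-offload workqueue would do for one
 * flowtable entry, per the combinations described above. */
static enum scan_action scan_flow(unsigned int flags)
{
	if ((flags & NF_FLOW_HW_NEW) && (flags & NF_FLOW_HW_DELETE))
		return SCAN_SKIP;      /* short-lived flow, never offload */
	if ((flags & NF_FLOW_HW_NEW) && (flags & NF_FLOW_HW_UPDATE))
		return SCAN_ADD_BIDIR; /* UDP flow, now bidirectional */
	if (flags & NF_FLOW_HW_DELETE)
		return SCAN_DEL;
	if (flags & NF_FLOW_HW_NEW)
		return SCAN_ADD;
	return SCAN_SKIP;              /* nothing pending for this entry */
}
```

A real implementation would additionally need the _DONE flags mentioned above to record which actions have already reached the hardware.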


* Re: [PATCH net-next v3 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously
  2023-01-24  8:41       ` Pablo Neira Ayuso
@ 2023-01-24  9:19         ` Vlad Buslov
  2023-01-24 13:57           ` Vlad Buslov
  0 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2023-01-24  9:19 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: davem, kuba, pabeni, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, marcelo.leitner, simon.horman


On Tue 24 Jan 2023 at 09:41, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Vlad,
>
> On Tue, Jan 24, 2023 at 09:06:13AM +0200, Vlad Buslov wrote:
>> 
>> On Fri 20 Jan 2023 at 12:41, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> > Hi Vlad,
>> >
>> > On Thu, Jan 19, 2023 at 08:51:01PM +0100, Vlad Buslov wrote:
>> >> Following patches in series need to update flowtable rule several times
>> >> during its lifetime in order to synchronize hardware offload with actual ct
>> >> status. However, reusing existing 'refresh' logic in act_ct would cause
>> >> data path to potentially schedule significant amount of spurious tasks in
>> >> 'add' workqueue since it is executed per-packet. Instead, introduce a new
>> >> flow 'update' flag and use it to schedule async flow refresh in flowtable
>> >> gc which will only be executed once per gc iteration.
>> >
>> > So the idea is to use a NF_FLOW_HW_UPDATE which triggers the update
>> > from the garbage collector. I understand the motivation here is to
>> > avoid adding more work to the workqueue, by simply letting the gc
>> > thread pick up for the update.
>> >
>> > I already proposed in the last year alternative approaches to improve
>> > the workqueue logic, including cancelation of useless work. For
>> > example, cancel a flying "add" work if "delete" just arrive and the
>> > work is still sitting in the queue. Same approach could be use for
>> > this update logic, ie. cancel an add UDP unidirectional or upgrade it
>> > to bidirectional if, by the time we see traffic in both directions,
>> > then work is still sitting in the queue.
>> 
>> Thanks for the suggestion. I'll try to make this work over regular
>> workqueues without further extending the flow flags and/or putting more
>> stuff into gc.
>
> Let me make a second pass to sort out thoughts on this.
>
> Either we use regular workqueues (without new flags) or we explore
> fully consolidating this hardware offload workqueue infrastructure
> around flags, ie. use flags not only for update events, but also for
> new and delete.
>
> This would go more in the direction of your _UPDATE flag idea:
>
> - Update the hardware offload workqueue to iterate over the
>   flowtable. The hardware offload workqueue would be "scanning" for
>   entries in the flowtable that require some sort of update in the
>   hardware. The flags would tell what kind of action is needed.
>
> - Add these flags:
>
> NF_FLOW_HW_NEW
> NF_FLOW_HW_UPDATE
> NF_FLOW_HW_DELETE
>
> and remove the work object (flow_offload_work) and the existing list.
> If the workqueue finds an entry with:
>
> NEW|DELETE, this means this is short lived flow, not worth to waste
> cycles to offload it.
> NEW|UPDATE, this means this is an UDP flow that is bidirectional.
>
> Then, there will be no more work allocation + "flying" work objects to
> the hardware offload workqueue. Instead, the hardware offload
> workqueue will be iterating.
>
> This approach would need _DONE flags to annotate if the offload
> updates have been applied to hardware already (similar to the
> conntrack _DONE flags).
>
> (Oh well, this proposal is adding even more flags. But I think flags
> are not the issue, but the mixture of the existing flow_offload_work
> approach with this new _UPDATE flag and the gc changes).
>
> If flow_offload_work is removed, we would also need to add a:
>
>  struct nf_flowtable *flowtable;
>
> field to the flow_offload entry, which is an entry field that is
> passed via flow_offload_work. So it is one extra field for the each
> flow_offload entry.
>
> The other alternative is to use the existing nf_flow_offload_add_wq
> with UPDATE command, which might result in more flying objects in
> turn. I think this is what you are trying to avoid with the _UPDATE
> flag approach.

This looks interesting, but it is very ambitious and will probably be a
bigger change than this whole series. I have an idea of how we can
leverage the existing 'refresh' mechanism for updating flow state that
doesn't involve large-scale refactoring of the existing offload
infrastructure, which I would prefer to try first. WDYT?



* Re: [PATCH net-next v3 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously
  2023-01-24  9:19         ` Vlad Buslov
@ 2023-01-24 13:57           ` Vlad Buslov
  0 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2023-01-24 13:57 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: davem, kuba, pabeni, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, marcelo.leitner, simon.horman

On Tue 24 Jan 2023 at 11:19, Vlad Buslov <vladbu@nvidia.com> wrote:
> On Tue 24 Jan 2023 at 09:41, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> Hi Vlad,
>>
>> On Tue, Jan 24, 2023 at 09:06:13AM +0200, Vlad Buslov wrote:
>>> 
>>> On Fri 20 Jan 2023 at 12:41, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>>> > Hi Vlad,
>>> >
>>> > On Thu, Jan 19, 2023 at 08:51:01PM +0100, Vlad Buslov wrote:
>>> >> Following patches in series need to update flowtable rule several times
>>> >> during its lifetime in order to synchronize hardware offload with actual ct
>>> >> status. However, reusing existing 'refresh' logic in act_ct would cause
>>> >> data path to potentially schedule significant amount of spurious tasks in
>>> >> 'add' workqueue since it is executed per-packet. Instead, introduce a new
>>> >> flow 'update' flag and use it to schedule async flow refresh in flowtable
>>> >> gc which will only be executed once per gc iteration.
>>> >
>>> > So the idea is to use a NF_FLOW_HW_UPDATE which triggers the update
>>> > from the garbage collector. I understand the motivation here is to
>>> > avoid adding more work to the workqueue, by simply letting the gc
>>> > thread pick up for the update.
>>> >
>>> > I already proposed in the last year alternative approaches to improve
>>> > the workqueue logic, including cancelation of useless work. For
>>> > example, cancel a flying "add" work if "delete" just arrive and the
>>> > work is still sitting in the queue. Same approach could be use for
>>> > this update logic, ie. cancel an add UDP unidirectional or upgrade it
>>> > to bidirectional if, by the time we see traffic in both directions,
>>> > then work is still sitting in the queue.
>>> 
>>> Thanks for the suggestion. I'll try to make this work over regular
>>> workqueues without further extending the flow flags and/or putting more
>>> stuff into gc.
>>
>> Let me make a second pass to sort out thoughts on this.
>>
>> Either we use regular workqueues (without new flags) or we explore
>> fully consolidating this hardware offload workqueue infrastructure
>> around flags, ie. use flags not only for update events, but also for
>> new and delete.
>>
>> This would go more in the direction of your _UPDATE flag idea:
>>
>> - Update the hardware offload workqueue to iterate over the
>>   flowtable. The hardware offload workqueue would be "scanning" for
>>   entries in the flowtable that require some sort of update in the
>>   hardware. The flags would tell what kind of action is needed.
>>
>> - Add these flags:
>>
>> NF_FLOW_HW_NEW
>> NF_FLOW_HW_UPDATE
>> NF_FLOW_HW_DELETE
>>
>> and remove the work object (flow_offload_work) and the existing list.
>> If the workqueue finds an entry with:
>>
>> NEW|DELETE, this means this is short lived flow, not worth to waste
>> cycles to offload it.
>> NEW|UPDATE, this means this is an UDP flow that is bidirectional.
>>
>> Then, there will be no more work allocation + "flying" work objects to
>> the hardware offload workqueue. Instead, the hardware offload
>> workqueue will be iterating.
>>
>> This approach would need _DONE flags to annotate if the offload
>> updates have been applied to hardware already (similar to the
>> conntrack _DONE flags).
>>
>> (Oh well, this proposal is adding even more flags. But I think flags
>> are not the issue, but the mixture of the existing flow_offload_work
>> approach with this new _UPDATE flag and the gc changes).
>>
>> If flow_offload_work is removed, we would also need to add a:
>>
>>  struct nf_flowtable *flowtable;
>>
>> field to the flow_offload entry, which is an entry field that is
>> passed via flow_offload_work. So it is one extra field for the each
>> flow_offload entry.
>>
>> The other alternative is to use the existing nf_flow_offload_add_wq
>> with UPDATE command, which might result in more flying objects in
>> turn. I think this is what you are trying to avoid with the _UPDATE
>> flag approach.
>
> This looks interesting, but is very ambitious and will probably be a
> bigger change than this whole series. I have an idea how we can leverage
> existing 'refresh' mechanism for updating flow state that doesn't
> involve large-scale refactoring of existing offload infrastructure,
> which I would prefer to try first. WDYT?

Update: to illustrate this point I prepared a V4 that uses the regular
refresh mechanism to update the flow, and that also tries to prevent
excessive wq spam and refreshing the flow offload to a state that is
already outdated.



end of thread, other threads:[~2023-01-24 14:01 UTC | newest]

Thread overview: 19+ messages
2023-01-19 19:50 [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
2023-01-19 19:50 ` [PATCH net-next v3 1/7] net: flow_offload: provision conntrack info in ct_metadata Vlad Buslov
2023-01-19 19:50 ` [PATCH net-next v3 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state Vlad Buslov
2023-01-20 11:57   ` Pablo Neira Ayuso
2023-01-24  7:08     ` Vlad Buslov
2023-01-19 19:51 ` [PATCH net-next v3 3/7] netfilter: flowtable: allow unidirectional rules Vlad Buslov
2023-01-19 19:51 ` [PATCH net-next v3 4/7] netfilter: flowtable: allow updating offloaded rules asynchronously Vlad Buslov
2023-01-20 11:41   ` Pablo Neira Ayuso
2023-01-24  7:06     ` Vlad Buslov
2023-01-24  8:41       ` Pablo Neira Ayuso
2023-01-24  9:19         ` Vlad Buslov
2023-01-24 13:57           ` Vlad Buslov
2023-01-19 19:51 ` [PATCH net-next v3 5/7] net/sched: act_ct: set ctinfo in meta action depending on ct state Vlad Buslov
2023-01-19 19:51 ` [PATCH net-next v3 6/7] net/sched: act_ct: offload UDP NEW connections Vlad Buslov
2023-01-19 19:51 ` [PATCH net-next v3 7/7] netfilter: nf_conntrack: allow early drop of offloaded UDP conns Vlad Buslov
2023-01-19 21:37 ` [PATCH net-next v3 0/7] Allow offloading of UDP NEW connections via act_ct Marcelo Ricardo Leitner
2023-01-20  6:38   ` Vlad Buslov
2023-01-20  6:57     ` Vlad Buslov
2023-01-20 11:30       ` Marcelo Ricardo Leitner
