All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v6 0/7] Allow offloading of UDP NEW connections via act_ct
@ 2023-02-01 16:30 Vlad Buslov
  2023-02-01 16:30 ` [PATCH net-next v6 1/7] net: flow_offload: provision conntrack info in ct_metadata Vlad Buslov
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Vlad Buslov @ 2023-02-01 16:30 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Currently only bidirectional established connections can be offloaded
via act_ct. Such approach allows to hardcode a lot of assumptions into
act_ct, flow_table and flow_offload intermediate layer codes. In order
to enabled offloading of unidirectional UDP NEW connections start with
incrementally changing the following assumptions:

- Drivers assume that only established connections are offloaded and
  don't support updating existing connections. Extract ctinfo from meta
  action cookie and refuse offloading of new connections in the drivers.

- Fix flow_table offload fixup algorithm to calculate flow timeout
  according to current connection state instead of hardcoded
  "established" value.

- Add new flow_table flow flag that designates bidirectional connections
  instead of assuming it and hardcoding hardware offload of every flow
  in both directions.

- Add new flow_table flow flag that designates connections that are
  offloaded to hardware as "established" instead of assuming it. This
  allows some optimizations in act_ct and prevents spamming the
  flow_table workqueue with redundant tasks.

With all the necessary infrastructure in place modify act_ct to offload
UDP NEW as unidirectional connection. Pass reply direction traffic to CT
and promote connection to bidirectional when UDP connection state
changes to "assured". Rely on refresh mechanism to propagate connection
state change to supporting drivers.

Note that early drop algorithm that is designed to free up some space in
connection tracking table when it becomes full (by randomly deleting up
to 5% of non-established connections) currently ignores connections
marked as "offloaded". Now, with UDP NEW connections becoming
"offloaded" it could allow malicious user to perform DoS attack by
filling the table with non-droppable UDP NEW connections by sending just
one packet in single direction. To prevent such scenario change early
drop algorithm to also consider "offloaded" connections for deletion.

Vlad Buslov (7):
  net: flow_offload: provision conntrack info in ct_metadata
  netfilter: flowtable: fixup UDP timeout depending on ct state
  netfilter: flowtable: allow unidirectional rules
  netfilter: flowtable: cache info of last offload
  net/sched: act_ct: set ctinfo in meta action depending on ct state
  net/sched: act_ct: offload UDP NEW connections
  netfilter: nf_conntrack: allow early drop of offloaded UDP conns

 .../ethernet/mellanox/mlx5/core/en/tc_ct.c    |  4 ++
 .../ethernet/netronome/nfp/flower/conntrack.c | 24 +++++++
 include/net/netfilter/nf_flow_table.h         |  8 ++-
 net/netfilter/nf_conntrack_core.c             | 11 ++--
 net/netfilter/nf_flow_table_core.c            |  5 +-
 net/netfilter/nf_flow_table_inet.c            |  2 +-
 net/netfilter/nf_flow_table_offload.c         | 18 +++--
 net/sched/act_ct.c                            | 65 ++++++++++++++-----
 8 files changed, 103 insertions(+), 34 deletions(-)

-- 
2.38.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH net-next v6 1/7] net: flow_offload: provision conntrack info in ct_metadata
  2023-02-01 16:30 [PATCH net-next v6 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
@ 2023-02-01 16:30 ` Vlad Buslov
  2023-02-01 16:30 ` [PATCH net-next v6 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state Vlad Buslov
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Vlad Buslov @ 2023-02-01 16:30 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

In order to offload connections in other states besides "established" the
driver offload callbacks need to have access to connection conntrack info.
Flow offload intermediate representation data structure already contains
that data encoded in 'cookie' field, so just reuse it in the drivers.

Reject offloading IP_CT_NEW connections for now by returning an error in
relevant driver callbacks based on value of ctinfo. Support for offloading
such connections will need to be added to the drivers afterwards.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---

Notes:
    Changes V3 -> V4:
    
    - Only obtain ctinfo in mlx5 after checking the meta action pointer.
    
    Changes V2 -> V3:
    
    - Reuse existing meta action 'cookie' field to obtain ctinfo instead of
    introducing a new field as suggested by Marcelo.
    
    Changes V1 -> V2:
    
    - Add missing include that caused compilation errors on certain configs.
    
    - Change naming in nfp driver as suggested by Simon and Baowen.

 .../ethernet/mellanox/mlx5/core/en/tc_ct.c    |  4 ++++
 .../ethernet/netronome/nfp/flower/conntrack.c | 24 +++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index 313df8232db7..193562c14c44 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -1073,12 +1073,16 @@ mlx5_tc_ct_block_flow_offload_add(struct mlx5_ct_ft *ft,
 	struct mlx5_tc_ct_priv *ct_priv = ft->ct_priv;
 	struct flow_action_entry *meta_action;
 	unsigned long cookie = flow->cookie;
+	enum ip_conntrack_info ctinfo;
 	struct mlx5_ct_entry *entry;
 	int err;
 
 	meta_action = mlx5_tc_ct_get_ct_metadata_action(flow_rule);
 	if (!meta_action)
 		return -EOPNOTSUPP;
+	ctinfo = meta_action->ct_metadata.cookie & NFCT_INFOMASK;
+	if (ctinfo == IP_CT_NEW)
+		return -EOPNOTSUPP;
 
 	spin_lock_bh(&ct_priv->ht_lock);
 	entry = rhashtable_lookup_fast(&ft->ct_entries_ht, &cookie, cts_ht_params);
diff --git a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
index f693119541d5..d23830b5bcb8 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c
@@ -1964,6 +1964,27 @@ int nfp_fl_ct_stats(struct flow_cls_offload *flow,
 	return 0;
 }
 
+static bool
+nfp_fl_ct_offload_nft_supported(struct flow_cls_offload *flow)
+{
+	struct flow_rule *flow_rule = flow->rule;
+	struct flow_action *flow_action =
+		&flow_rule->action;
+	struct flow_action_entry *act;
+	int i;
+
+	flow_action_for_each(i, act, flow_action) {
+		if (act->id == FLOW_ACTION_CT_METADATA) {
+			enum ip_conntrack_info ctinfo =
+				act->ct_metadata.cookie & NFCT_INFOMASK;
+
+			return ctinfo != IP_CT_NEW;
+		}
+	}
+
+	return false;
+}
+
 static int
 nfp_fl_ct_offload_nft_flow(struct nfp_fl_ct_zone_entry *zt, struct flow_cls_offload *flow)
 {
@@ -1976,6 +1997,9 @@ nfp_fl_ct_offload_nft_flow(struct nfp_fl_ct_zone_entry *zt, struct flow_cls_offl
 	extack = flow->common.extack;
 	switch (flow->command) {
 	case FLOW_CLS_REPLACE:
+		if (!nfp_fl_ct_offload_nft_supported(flow))
+			return -EOPNOTSUPP;
+
 		/* Netfilter can request offload multiple times for the same
 		 * flow - protect against adding duplicates.
 		 */
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next v6 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state
  2023-02-01 16:30 [PATCH net-next v6 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
  2023-02-01 16:30 ` [PATCH net-next v6 1/7] net: flow_offload: provision conntrack info in ct_metadata Vlad Buslov
@ 2023-02-01 16:30 ` Vlad Buslov
  2023-02-01 16:30 ` [PATCH net-next v6 3/7] netfilter: flowtable: allow unidirectional rules Vlad Buslov
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Vlad Buslov @ 2023-02-01 16:30 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Currently flow_offload_fixup_ct() function assumes that only replied UDP
connections can be offloaded and hardcodes UDP_CT_REPLIED timeout value. To
enable UDP NEW connection offload in following patches extract the actual
connections state from ct->status and set the timeout according to it.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---

Notes:
    Changes V5 -> V6:
    
    - Revert the patch to V2 version. Pablo is going to fix the issue of
    netfilter's flow table not updating ct->status flags.
    
    Changes V3 -> V4:
    
    - Rework the patch to decouple netfilter and act_ct timeout fixup
    algorithms.

 net/netfilter/nf_flow_table_core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 81c26a96c30b..04bd0ed4d2ae 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -193,8 +193,11 @@ static void flow_offload_fixup_ct(struct nf_conn *ct)
 		timeout -= tn->offload_timeout;
 	} else if (l4num == IPPROTO_UDP) {
 		struct nf_udp_net *tn = nf_udp_pernet(net);
+		enum udp_conntrack state =
+			test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
+			UDP_CT_REPLIED : UDP_CT_UNREPLIED;
 
-		timeout = tn->timeouts[UDP_CT_REPLIED];
+		timeout = tn->timeouts[state];
 		timeout -= tn->offload_timeout;
 	} else {
 		return;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next v6 3/7] netfilter: flowtable: allow unidirectional rules
  2023-02-01 16:30 [PATCH net-next v6 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
  2023-02-01 16:30 ` [PATCH net-next v6 1/7] net: flow_offload: provision conntrack info in ct_metadata Vlad Buslov
  2023-02-01 16:30 ` [PATCH net-next v6 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state Vlad Buslov
@ 2023-02-01 16:30 ` Vlad Buslov
  2023-02-01 16:30 ` [PATCH net-next v6 4/7] netfilter: flowtable: cache info of last offload Vlad Buslov
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Vlad Buslov @ 2023-02-01 16:30 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Modify flow table offload to support unidirectional connections by
extending enum nf_flow_flags with new "NF_FLOW_HW_BIDIRECTIONAL" flag. Only
offload reply direction when the flag is set. This infrastructure change is
necessary to support offloading UDP NEW connections in original direction
in following patches in series.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---

Notes:
    Changes V2 -> V3:
    
    - Fix error in commit message (spotted by Marcelo).

 include/net/netfilter/nf_flow_table.h |  1 +
 net/netfilter/nf_flow_table_offload.c | 12 ++++++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index cd982f4a0f50..88ab98ab41d9 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -164,6 +164,7 @@ enum nf_flow_flags {
 	NF_FLOW_HW_DYING,
 	NF_FLOW_HW_DEAD,
 	NF_FLOW_HW_PENDING,
+	NF_FLOW_HW_BIDIRECTIONAL,
 };
 
 enum flow_offload_type {
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 4d9b99abe37d..8b852f10fab4 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -895,8 +895,9 @@ static int flow_offload_rule_add(struct flow_offload_work *offload,
 
 	ok_count += flow_offload_tuple_add(offload, flow_rule[0],
 					   FLOW_OFFLOAD_DIR_ORIGINAL);
-	ok_count += flow_offload_tuple_add(offload, flow_rule[1],
-					   FLOW_OFFLOAD_DIR_REPLY);
+	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
+		ok_count += flow_offload_tuple_add(offload, flow_rule[1],
+						   FLOW_OFFLOAD_DIR_REPLY);
 	if (ok_count == 0)
 		return -ENOENT;
 
@@ -926,7 +927,8 @@ static void flow_offload_work_del(struct flow_offload_work *offload)
 {
 	clear_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
 	flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_ORIGINAL);
-	flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY);
+	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
+		flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY);
 	set_bit(NF_FLOW_HW_DEAD, &offload->flow->flags);
 }
 
@@ -946,7 +948,9 @@ static void flow_offload_work_stats(struct flow_offload_work *offload)
 	u64 lastused;
 
 	flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_ORIGINAL, &stats[0]);
-	flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_REPLY, &stats[1]);
+	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
+		flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_REPLY,
+					 &stats[1]);
 
 	lastused = max_t(u64, stats[0].lastused, stats[1].lastused);
 	offload->flow->timeout = max_t(u64, offload->flow->timeout,
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next v6 4/7] netfilter: flowtable: cache info of last offload
  2023-02-01 16:30 [PATCH net-next v6 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
                   ` (2 preceding siblings ...)
  2023-02-01 16:30 ` [PATCH net-next v6 3/7] netfilter: flowtable: allow unidirectional rules Vlad Buslov
@ 2023-02-01 16:30 ` Vlad Buslov
  2023-02-01 16:30 ` [PATCH net-next v6 5/7] net/sched: act_ct: set ctinfo in meta action depending on ct state Vlad Buslov
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Vlad Buslov @ 2023-02-01 16:30 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Modify flow table offload to cache the last ct info status that was passed
to the driver offload callbacks by extending enum nf_flow_flags with new
"NF_FLOW_HW_ESTABLISHED" flag. Set the flag if ctinfo was 'established'
during last act_ct meta actions fill call. This infrastructure change is
necessary to optimize promoting of UDP connections from 'new' to
'established' in following patches in this series.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---

Notes:
    Changes V5 -> V6:
    
    - Rework the patch to only set flow_offload NF_FLOW_HW_ESTABLISHED flag
    instead of caching whole ctinfo in dedicated field.
    
    Changes V3 -> V4:
    
    - New patch replaces gc async update that is no longer needed after
    refactoring of following act_ct patches.

 include/net/netfilter/nf_flow_table.h |  7 ++++---
 net/netfilter/nf_flow_table_inet.c    |  2 +-
 net/netfilter/nf_flow_table_offload.c |  6 +++---
 net/sched/act_ct.c                    | 12 +++++++-----
 4 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 88ab98ab41d9..ebb28ec5b6fa 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -57,7 +57,7 @@ struct nf_flowtable_type {
 						 struct net_device *dev,
 						 enum flow_block_command cmd);
 	int				(*action)(struct net *net,
-						  const struct flow_offload *flow,
+						  struct flow_offload *flow,
 						  enum flow_offload_tuple_dir dir,
 						  struct nf_flow_rule *flow_rule);
 	void				(*free)(struct nf_flowtable *ft);
@@ -165,6 +165,7 @@ enum nf_flow_flags {
 	NF_FLOW_HW_DEAD,
 	NF_FLOW_HW_PENDING,
 	NF_FLOW_HW_BIDIRECTIONAL,
+	NF_FLOW_HW_ESTABLISHED,
 };
 
 enum flow_offload_type {
@@ -313,10 +314,10 @@ void nf_flow_table_offload_flush_cleanup(struct nf_flowtable *flowtable);
 int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
 				struct net_device *dev,
 				enum flow_block_command cmd);
-int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow,
+int nf_flow_rule_route_ipv4(struct net *net, struct flow_offload *flow,
 			    enum flow_offload_tuple_dir dir,
 			    struct nf_flow_rule *flow_rule);
-int nf_flow_rule_route_ipv6(struct net *net, const struct flow_offload *flow,
+int nf_flow_rule_route_ipv6(struct net *net, struct flow_offload *flow,
 			    enum flow_offload_tuple_dir dir,
 			    struct nf_flow_rule *flow_rule);
 
diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c
index 0ccabf3fa6aa..9505f9d188ff 100644
--- a/net/netfilter/nf_flow_table_inet.c
+++ b/net/netfilter/nf_flow_table_inet.c
@@ -39,7 +39,7 @@ nf_flow_offload_inet_hook(void *priv, struct sk_buff *skb,
 }
 
 static int nf_flow_rule_route_inet(struct net *net,
-				   const struct flow_offload *flow,
+				   struct flow_offload *flow,
 				   enum flow_offload_tuple_dir dir,
 				   struct nf_flow_rule *flow_rule)
 {
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 8b852f10fab4..1c26f03fc661 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -679,7 +679,7 @@ nf_flow_rule_route_common(struct net *net, const struct flow_offload *flow,
 	return 0;
 }
 
-int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow,
+int nf_flow_rule_route_ipv4(struct net *net, struct flow_offload *flow,
 			    enum flow_offload_tuple_dir dir,
 			    struct nf_flow_rule *flow_rule)
 {
@@ -704,7 +704,7 @@ int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow,
 }
 EXPORT_SYMBOL_GPL(nf_flow_rule_route_ipv4);
 
-int nf_flow_rule_route_ipv6(struct net *net, const struct flow_offload *flow,
+int nf_flow_rule_route_ipv6(struct net *net, struct flow_offload *flow,
 			    enum flow_offload_tuple_dir dir,
 			    struct nf_flow_rule *flow_rule)
 {
@@ -735,7 +735,7 @@ nf_flow_offload_rule_alloc(struct net *net,
 {
 	const struct nf_flowtable *flowtable = offload->flowtable;
 	const struct flow_offload_tuple *tuple, *other_tuple;
-	const struct flow_offload *flow = offload->flow;
+	struct flow_offload *flow = offload->flow;
 	struct dst_entry *other_dst = NULL;
 	struct nf_flow_rule *flow_rule;
 	int err = -ENOMEM;
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 0ca2bb8ed026..5837f6258b17 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -170,11 +170,11 @@ tcf_ct_flow_table_add_action_nat_udp(const struct nf_conntrack_tuple *tuple,
 
 static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
 					      enum ip_conntrack_dir dir,
+					      enum ip_conntrack_info ctinfo,
 					      struct flow_action *action)
 {
 	struct nf_conn_labels *ct_labels;
 	struct flow_action_entry *entry;
-	enum ip_conntrack_info ctinfo;
 	u32 *act_ct_labels;
 
 	entry = tcf_ct_flow_table_flow_action_get_next(action);
@@ -182,8 +182,6 @@ static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
 #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
 	entry->ct_metadata.mark = READ_ONCE(ct->mark);
 #endif
-	ctinfo = dir == IP_CT_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
-					     IP_CT_ESTABLISHED_REPLY;
 	/* aligns with the CT reference on the SKB nf_ct_set */
 	entry->ct_metadata.cookie = (unsigned long)ct | ctinfo;
 	entry->ct_metadata.orig_dir = dir == IP_CT_DIR_ORIGINAL;
@@ -237,22 +235,26 @@ static int tcf_ct_flow_table_add_action_nat(struct net *net,
 }
 
 static int tcf_ct_flow_table_fill_actions(struct net *net,
-					  const struct flow_offload *flow,
+					  struct flow_offload *flow,
 					  enum flow_offload_tuple_dir tdir,
 					  struct nf_flow_rule *flow_rule)
 {
 	struct flow_action *action = &flow_rule->rule->action;
 	int num_entries = action->num_entries;
 	struct nf_conn *ct = flow->ct;
+	enum ip_conntrack_info ctinfo;
 	enum ip_conntrack_dir dir;
 	int i, err;
 
 	switch (tdir) {
 	case FLOW_OFFLOAD_DIR_ORIGINAL:
 		dir = IP_CT_DIR_ORIGINAL;
+		ctinfo = IP_CT_ESTABLISHED;
+		set_bit(NF_FLOW_HW_ESTABLISHED, &flow->flags);
 		break;
 	case FLOW_OFFLOAD_DIR_REPLY:
 		dir = IP_CT_DIR_REPLY;
+		ctinfo = IP_CT_ESTABLISHED_REPLY;
 		break;
 	default:
 		return -EOPNOTSUPP;
@@ -262,7 +264,7 @@ static int tcf_ct_flow_table_fill_actions(struct net *net,
 	if (err)
 		goto err_nat;
 
-	tcf_ct_flow_table_add_action_meta(ct, dir, action);
+	tcf_ct_flow_table_add_action_meta(ct, dir, ctinfo, action);
 	return 0;
 
 err_nat:
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next v6 5/7] net/sched: act_ct: set ctinfo in meta action depending on ct state
  2023-02-01 16:30 [PATCH net-next v6 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
                   ` (3 preceding siblings ...)
  2023-02-01 16:30 ` [PATCH net-next v6 4/7] netfilter: flowtable: cache info of last offload Vlad Buslov
@ 2023-02-01 16:30 ` Vlad Buslov
  2023-02-01 16:30 ` [PATCH net-next v6 6/7] net/sched: act_ct: offload UDP NEW connections Vlad Buslov
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Vlad Buslov @ 2023-02-01 16:30 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Currently tcf_ct_flow_table_fill_actions() function assumes that only
established connections can be offloaded and always sets ctinfo to either
IP_CT_ESTABLISHED or IP_CT_ESTABLISHED_REPLY strictly based on direction
without checking actual connection state. To enable UDP NEW connection
offload set the ctinfo, metadata cookie and NF_FLOW_HW_ESTABLISHED
flow_offload flags bit based on ct->status value.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---

Notes:
    Changes V5 -> V6:
    
    - Update to use flow_offload NF_FLOW_HW_ESTABLISHED bit instead of
    ext_data pointer.

 net/sched/act_ct.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 5837f6258b17..4dad7bf64b14 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -249,8 +249,10 @@ static int tcf_ct_flow_table_fill_actions(struct net *net,
 	switch (tdir) {
 	case FLOW_OFFLOAD_DIR_ORIGINAL:
 		dir = IP_CT_DIR_ORIGINAL;
-		ctinfo = IP_CT_ESTABLISHED;
-		set_bit(NF_FLOW_HW_ESTABLISHED, &flow->flags);
+		ctinfo = test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
+			IP_CT_ESTABLISHED : IP_CT_NEW;
+		if (ctinfo == IP_CT_ESTABLISHED)
+			set_bit(NF_FLOW_HW_ESTABLISHED, &flow->flags);
 		break;
 	case FLOW_OFFLOAD_DIR_REPLY:
 		dir = IP_CT_DIR_REPLY;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next v6 6/7] net/sched: act_ct: offload UDP NEW connections
  2023-02-01 16:30 [PATCH net-next v6 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
                   ` (4 preceding siblings ...)
  2023-02-01 16:30 ` [PATCH net-next v6 5/7] net/sched: act_ct: set ctinfo in meta action depending on ct state Vlad Buslov
@ 2023-02-01 16:30 ` Vlad Buslov
  2023-02-01 16:31 ` [PATCH net-next v6 7/7] netfilter: nf_conntrack: allow early drop of offloaded UDP conns Vlad Buslov
  2023-02-03  9:40 ` [PATCH net-next v6 0/7] Allow offloading of UDP NEW connections via act_ct patchwork-bot+netdevbpf
  7 siblings, 0 replies; 9+ messages in thread
From: Vlad Buslov @ 2023-02-01 16:30 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Modify the offload algorithm of UDP connections to the following:

- Offload NEW connection as unidirectional.

- When connection state changes to ESTABLISHED also update the hardware
flow. However, in order to prevent act_ct from spamming offload add wq for
every packet coming in reply direction in this state verify whether
connection has already been updated to ESTABLISHED in the drivers. If that
it the case, then skip flow_table and let conntrack handle such packets
which will also allow conntrack to potentially promote the connection to
ASSURED.

- When connection state changes to ASSURED set the flow_table flow
NF_FLOW_HW_BIDIRECTIONAL flag which will cause refresh mechanism to offload
the reply direction.

All other protocols have their offload algorithm preserved and are always
offloaded as bidirectional.

Note that this change tries to minimize the load on flow_table add
workqueue. First, it tracks the last ctinfo that was offloaded by using new
flow 'NF_FLOW_HW_ESTABLISHED' flag and doesn't schedule the refresh for
reply direction packets when the offloads have already been updated with
current ctinfo. Second, when 'add' task executes on workqueue it always
update the offload with current flow state (by checking 'bidirectional'
flow flag and obtaining actual ctinfo/cookie through meta action instead of
caching any of these from the moment of scheduling the 'add' work)
preventing the need from scheduling more updates if state changed
concurrently while the 'add' work was pending on workqueue.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---

Notes:
    Changes V5 -> V6:
    
    - Use NF_FLOW_HW_ESTABLISHED bit instead of ext_data pointer to determine
    the ctinfo of last offload call.
    
    Changes V4 -> V5:
    
    - Make clang happy.
    
    Changes V3 -> V4:
    
    - Refactor the patch to leverage the refresh code and new flow 'ext_data'
    field in order to change the offload state instead of relying on async gc
    update.

 net/sched/act_ct.c | 51 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 4dad7bf64b14..38095524d98b 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -369,7 +369,7 @@ static void tcf_ct_flow_tc_ifidx(struct flow_offload *entry,
 
 static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
 				  struct nf_conn *ct,
-				  bool tcp)
+				  bool tcp, bool bidirectional)
 {
 	struct nf_conn_act_ct_ext *act_ct_ext;
 	struct flow_offload *entry;
@@ -388,6 +388,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
 		ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
 		ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
 	}
+	if (bidirectional)
+		__set_bit(NF_FLOW_HW_BIDIRECTIONAL, &entry->flags);
 
 	act_ct_ext = nf_conn_act_ct_ext_find(ct);
 	if (act_ct_ext) {
@@ -411,26 +413,34 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
 					   struct nf_conn *ct,
 					   enum ip_conntrack_info ctinfo)
 {
-	bool tcp = false;
-
-	if ((ctinfo != IP_CT_ESTABLISHED && ctinfo != IP_CT_ESTABLISHED_REPLY) ||
-	    !test_bit(IPS_ASSURED_BIT, &ct->status))
-		return;
+	bool tcp = false, bidirectional = true;
 
 	switch (nf_ct_protonum(ct)) {
 	case IPPROTO_TCP:
-		tcp = true;
-		if (ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
+		if ((ctinfo != IP_CT_ESTABLISHED &&
+		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
+		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
+		    ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED)
 			return;
+
+		tcp = true;
 		break;
 	case IPPROTO_UDP:
+		if (!nf_ct_is_confirmed(ct))
+			return;
+		if (!test_bit(IPS_ASSURED_BIT, &ct->status))
+			bidirectional = false;
 		break;
 #ifdef CONFIG_NF_CT_PROTO_GRE
 	case IPPROTO_GRE: {
 		struct nf_conntrack_tuple *tuple;
 
-		if (ct->status & IPS_NAT_MASK)
+		if ((ctinfo != IP_CT_ESTABLISHED &&
+		     ctinfo != IP_CT_ESTABLISHED_REPLY) ||
+		    !test_bit(IPS_ASSURED_BIT, &ct->status) ||
+		    ct->status & IPS_NAT_MASK)
 			return;
+
 		tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
 		/* No support for GRE v1 */
 		if (tuple->src.u.gre.key || tuple->dst.u.gre.key)
@@ -446,7 +456,7 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
 	    ct->status & IPS_SEQ_ADJUST)
 		return;
 
-	tcf_ct_flow_table_add(ct_ft, ct, tcp);
+	tcf_ct_flow_table_add(ct_ft, ct, tcp, bidirectional);
 }
 
 static bool
@@ -625,13 +635,30 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params *p,
 	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
 	ct = flow->ct;
 
+	if (dir == FLOW_OFFLOAD_DIR_REPLY &&
+	    !test_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags)) {
+		/* Only offload reply direction after connection became
+		 * assured.
+		 */
+		if (test_bit(IPS_ASSURED_BIT, &ct->status))
+			set_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags);
+		else if (test_bit(NF_FLOW_HW_ESTABLISHED, &flow->flags))
+			/* If flow_table flow has already been updated to the
+			 * established state, then don't refresh.
+			 */
+			return false;
+	}
+
 	if (tcph && (unlikely(tcph->fin || tcph->rst))) {
 		flow_offload_teardown(flow);
 		return false;
 	}
 
-	ctinfo = dir == FLOW_OFFLOAD_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
-						    IP_CT_ESTABLISHED_REPLY;
+	if (dir == FLOW_OFFLOAD_DIR_ORIGINAL)
+		ctinfo = test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ?
+			IP_CT_ESTABLISHED : IP_CT_NEW;
+	else
+		ctinfo = IP_CT_ESTABLISHED_REPLY;
 
 	flow_offload_refresh(nf_ft, flow);
 	nf_conntrack_get(&ct->ct_general);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next v6 7/7] netfilter: nf_conntrack: allow early drop of offloaded UDP conns
  2023-02-01 16:30 [PATCH net-next v6 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
                   ` (5 preceding siblings ...)
  2023-02-01 16:30 ` [PATCH net-next v6 6/7] net/sched: act_ct: offload UDP NEW connections Vlad Buslov
@ 2023-02-01 16:31 ` Vlad Buslov
  2023-02-03  9:40 ` [PATCH net-next v6 0/7] Allow offloading of UDP NEW connections via act_ct patchwork-bot+netdevbpf
  7 siblings, 0 replies; 9+ messages in thread
From: Vlad Buslov @ 2023-02-01 16:31 UTC (permalink / raw)
  To: davem, kuba, pabeni, pablo
  Cc: netdev, netfilter-devel, jhs, xiyou.wangcong, jiri, ozsh,
	marcelo.leitner, simon.horman, Vlad Buslov

Both synchronous early drop algorithm and asynchronous gc worker completely
ignore connections with IPS_OFFLOAD_BIT status bit set. With new
functionality that enabled UDP NEW connection offload in action CT
malicious user can flood the conntrack table with offloaded UDP connections
by just sending a single packet per 5tuple because such connections can no
longer be deleted by early drop algorithm.

To mitigate the issue allow both early drop and gc to consider offloaded
UDP connections for deletion.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---
 net/netfilter/nf_conntrack_core.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 496c4920505b..52b824a60176 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1374,9 +1374,6 @@ static unsigned int early_drop_list(struct net *net,
 	hlist_nulls_for_each_entry_rcu(h, n, head, hnnode) {
 		tmp = nf_ct_tuplehash_to_ctrack(h);
 
-		if (test_bit(IPS_OFFLOAD_BIT, &tmp->status))
-			continue;
-
 		if (nf_ct_is_expired(tmp)) {
 			nf_ct_gc_expired(tmp);
 			continue;
@@ -1446,11 +1443,14 @@ static bool gc_worker_skip_ct(const struct nf_conn *ct)
 static bool gc_worker_can_early_drop(const struct nf_conn *ct)
 {
 	const struct nf_conntrack_l4proto *l4proto;
+	u8 protonum = nf_ct_protonum(ct);
 
+	if (test_bit(IPS_OFFLOAD_BIT, &ct->status) && protonum != IPPROTO_UDP)
+		return false;
 	if (!test_bit(IPS_ASSURED_BIT, &ct->status))
 		return true;
 
-	l4proto = nf_ct_l4proto_find(nf_ct_protonum(ct));
+	l4proto = nf_ct_l4proto_find(protonum);
 	if (l4proto->can_early_drop && l4proto->can_early_drop(ct))
 		return true;
 
@@ -1507,7 +1507,8 @@ static void gc_worker(struct work_struct *work)
 
 			if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
 				nf_ct_offload_timeout(tmp);
-				continue;
+				if (!nf_conntrack_max95)
+					continue;
 			}
 
 			if (expired_count > GC_SCAN_EXPIRED_MAX) {
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v6 0/7] Allow offloading of UDP NEW connections via act_ct
  2023-02-01 16:30 [PATCH net-next v6 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
                   ` (6 preceding siblings ...)
  2023-02-01 16:31 ` [PATCH net-next v6 7/7] netfilter: nf_conntrack: allow early drop of offloaded UDP conns Vlad Buslov
@ 2023-02-03  9:40 ` patchwork-bot+netdevbpf
  7 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-02-03  9:40 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: davem, kuba, pabeni, pablo, netdev, netfilter-devel, jhs,
	xiyou.wangcong, jiri, ozsh, marcelo.leitner, simon.horman

Hello:

This series was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed, 1 Feb 2023 17:30:53 +0100 you wrote:
> Currently only bidirectional established connections can be offloaded
> via act_ct. Such approach allows to hardcode a lot of assumptions into
> act_ct, flow_table and flow_offload intermediate layer codes. In order
> to enabled offloading of unidirectional UDP NEW connections start with
> incrementally changing the following assumptions:
> 
> - Drivers assume that only established connections are offloaded and
>   don't support updating existing connections. Extract ctinfo from meta
>   action cookie and refuse offloading of new connections in the drivers.
> 
> [...]

Here is the summary with links:
  - [net-next,v6,1/7] net: flow_offload: provision conntrack info in ct_metadata
    https://git.kernel.org/netdev/net-next/c/29744a10c59e
  - [net-next,v6,2/7] netfilter: flowtable: fixup UDP timeout depending on ct state
    https://git.kernel.org/netdev/net-next/c/0eb5acb16418
  - [net-next,v6,3/7] netfilter: flowtable: allow unidirectional rules
    https://git.kernel.org/netdev/net-next/c/8f84780b84d6
  - [net-next,v6,4/7] netfilter: flowtable: cache info of last offload
    https://git.kernel.org/netdev/net-next/c/1a441a9b8be8
  - [net-next,v6,5/7] net/sched: act_ct: set ctinfo in meta action depending on ct state
    https://git.kernel.org/netdev/net-next/c/d5774cb6c55c
  - [net-next,v6,6/7] net/sched: act_ct: offload UDP NEW connections
    https://git.kernel.org/netdev/net-next/c/6a9bad0069cf
  - [net-next,v6,7/7] netfilter: nf_conntrack: allow early drop of offloaded UDP conns
    https://git.kernel.org/netdev/net-next/c/df25455e5a48

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-02-03  9:40 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-01 16:30 [PATCH net-next v6 0/7] Allow offloading of UDP NEW connections via act_ct Vlad Buslov
2023-02-01 16:30 ` [PATCH net-next v6 1/7] net: flow_offload: provision conntrack info in ct_metadata Vlad Buslov
2023-02-01 16:30 ` [PATCH net-next v6 2/7] netfilter: flowtable: fixup UDP timeout depending on ct state Vlad Buslov
2023-02-01 16:30 ` [PATCH net-next v6 3/7] netfilter: flowtable: allow unidirectional rules Vlad Buslov
2023-02-01 16:30 ` [PATCH net-next v6 4/7] netfilter: flowtable: cache info of last offload Vlad Buslov
2023-02-01 16:30 ` [PATCH net-next v6 5/7] net/sched: act_ct: set ctinfo in meta action depending on ct state Vlad Buslov
2023-02-01 16:30 ` [PATCH net-next v6 6/7] net/sched: act_ct: offload UDP NEW connections Vlad Buslov
2023-02-01 16:31 ` [PATCH net-next v6 7/7] netfilter: nf_conntrack: allow early drop of offloaded UDP conns Vlad Buslov
2023-02-03  9:40 ` [PATCH net-next v6 0/7] Allow offloading of UDP NEW connections via act_ct patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.