All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload
@ 2020-03-11 14:33 Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 01/15] net/mlx5: E-Switch, Enable reg c1 loopback when possible Paul Blakey
                   ` (15 more replies)
  0 siblings, 16 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Background
----------

The connection tracking action provides the ability to associate connection state to a packet.
The connection state may be used for stateful packet processing such as stateful firewalls
and NAT operations.

Connection tracking in TC SW
----------------------------

The CT state may be matched only after the CT action is performed.
As such, CT use cases are commonly implemented using multiple chains.
Consider the following TC filters, as an example:
1. tc filter add dev ens1f0_0 ingress prio 1 chain 0 proto ip flower \
    src_mac 24:8a:07:a5:28:01 ct_state -trk \
    action ct \
    pipe action goto chain 2
       
2. tc filter add dev ens1f0_0 ingress prio 1 chain 2 proto ip flower \
    ct_state +trk+new \
    action ct commit \
    pipe action tunnel_key set \
        src_ip 0.0.0.0 \
        dst_ip 7.7.7.8 \
        id 98 \
        dst_port 4789 \
    action mirred egress redirect dev vxlan0
       
3. tc filter add dev ens1f0_0 ingress prio 1 chain 2 proto ip flower \
    ct_state +trk+est \
    action tunnel_key set \
        src_ip 0.0.0.0 \
        dst_ip 7.7.7.8 \
        id 98 \
        dst_port 4789 \
    action mirred egress redirect dev vxlan0
       
Filter #1 (chain 0) decides, after initial packet classification, to send the packet to the
connection tracking module (ct action).
Once the ct_state is initialized by the CT action the packet processing continues on chain 2.

Chain 2 classifies the packet based on the ct_state.
Filter #2 matches on the +trk+new CT state while filter #3 matches on the +trk+est ct_state.

MLX5 Connection tracking HW offload - MLX5 driver patches
------------------------------

The MLX5 hardware model aligns with the software model by realizing a multi-table
architecture. In SW the TC CT action sets the CT state on the skb. Similarly,
HW sets the CT state on a HW register. Driver gets this CT state while offloading
a tuple with a new ct_metadata action that provides it.

Matches on ct_state are translated to HW register matches.
    
TC filter with CT action broken to two rules, a pre_ct rule, and a post_ct rule.
pre_ct rule:
   Inserted on the corrosponding tc chain table, matches on original tc match, with
   actions: any pre ct actions, set fte_id, set zone, and goto the ct table.
   The fte_id is a register mapping uniquely identifying this filter.
post_ct_rule:
   Inserted in a post_ct table, matches on the fte_id register mapping, with
   actions: counter + any post ct actions (this is usally 'goto chain X')

post_ct table is a table that all the tuples inserted to the ct table goto, so
if there is a tuple hit, packet will continue from ct table to post_ct table,
after being marked with the CT state (mark/label..)

This design ensures that the rule's actions and counters will be executed only after a CT hit.
HW misses will continue processing in SW from the last chain ID that was processed in hardware.

The following illustrates the HW model:

+-------------------+      +--------------------+    +--------------+
+ pre_ct (tc chain) +----->+ CT (nat or no nat) +--->+ post_ct      +----->
+ original match    +   |  + tuple + zone match + |  + fte_id match +  |
+-------------------+   |  +--------------------+ |  +--------------+  |
                        v                         v                    v
                     set chain miss mapping    set mark             original
                     set fte_id                set label            filter
                     set zone                  set established      actions
                     set tunnel_id             do nat (if needed)
                     do decap

To fill CT table, driver registers a CB for flow offload events, for each new
flow table that is passed to it from offloading ct actions. Once a flow offload
event is triggered on this CB, offload this flow to the hardware CT table.

Established events offload
--------------------------

Currently, act_ct maintains an FT instance per ct zone. Flow table entries
are created, per ct connection, when connections enter an established
state and deleted otherwise. Once an entry is created, the FT assumes
ownership of the entries, and manages their aging. FT is used for software
offload of conntrack. FT entries associate 5-tuples with an action list.

The act_ct changes in this patchset:
Populate the action list with a (new) ct_metadata action, providing the
connection's ct state (zone,mark and label), and mangle actions if NAT
is configured.

Pass the action's flow table instance as ct action entry parameter,
so  when the action is offloaded, the driver may register a callback on
it's block to receive FT flow offload add/del/stats events.

Netilter changes
--------------------------
The netfilter changes export the relevant bits, and add the relevant CBs
to support the above.

Applying this patchset
--------------------------

On top of current net-next ("r8169: simplify getting stats by using netdev_stats_to_stats64"),
pull Saeed's ct-offload branch, from git git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git
and fix the following non trivial conflict in fs_core.c as follows:
#define OFFLOADS_MAX_FT 2
#define OFFLOADS_NUM_PRIOS 2
#define OFFLOADS_MIN_LEVEL (ANCHOR_MIN_LEVEL + OFFLOADS_NUM_PRIOS)

Then apply this patchset.

Changelog:
  v2->v3:
    Added the first two patches needed after rebasing on net-next:
     "net/mlx5: E-Switch, Enable reg c1 loopback when possible"
     "net/mlx5e: en_rep: Create uplink rep root table after eswitch offloads table"

Paul Blakey (15):
  net/mlx5: E-Switch, Enable reg c1 loopback when possible
  net/mlx5e: en_rep: Create uplink rep root table after eswitch offloads table
  netfilter: flowtable: Add API for registering to flow table events
  net/sched: act_ct: Instantiate flow table entry actions
  net/sched: act_ct: Support restoring conntrack info on skbs
  net/sched: act_ct: Support refreshing the flow table entries
  net/sched: act_ct: Enable hardware offload of flow table entires
  net/mlx5: E-Switch, Introduce global tables
  net/mlx5: E-Switch, Add support for offloading rules with no in_port
  net/mlx5: E-Switch, Support getting chain mapping
  flow_offload: Add flow_match_ct to get rule ct match
  net/mlx5e: CT: Introduce connection tracking
  net/mlx5e: CT: Offload established flows
  net/mlx5e: CT: Handle misses after executing CT action
  net/mlx5e: CT: Support clear action

 drivers/net/ethernet/mellanox/mlx5/core/Kconfig    |   10 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 1347 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h |  171 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.h   |    3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  120 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h    |    9 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |    6 +
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |   66 +-
 .../mellanox/mlx5/core/eswitch_offloads_chains.c   |   43 +
 .../mellanox/mlx5/core/eswitch_offloads_chains.h   |   13 +
 include/linux/mlx5/eswitch.h                       |    7 +
 include/net/flow_offload.h                         |   13 +
 include/net/netfilter/nf_flow_table.h              |   32 +
 include/net/tc_act/tc_ct.h                         |   17 +
 net/core/flow_offload.c                            |    7 +
 net/netfilter/nf_flow_table_core.c                 |   60 +
 net/netfilter/nf_flow_table_ip.c                   |   15 +-
 net/netfilter/nf_flow_table_offload.c              |   27 +-
 net/sched/act_ct.c                                 |  225 ++++
 net/sched/cls_api.c                                |    1 +
 22 files changed, 2124 insertions(+), 70 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 01/15] net/mlx5: E-Switch, Enable reg c1 loopback when possible
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 02/15] net/mlx5e: en_rep: Create uplink rep root table after eswitch offloads table Paul Blakey
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Enable reg c1 loopback if firmware reports it's supported,
as this is needed for restoring packet metadata (e.g chain).

Also define helper to query if it is enabled.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  1 +
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 44 ++++++++++++++++------
 include/linux/mlx5/eswitch.h                       |  7 ++++
 3 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 9b5eaa8..ee36a8a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -236,6 +236,7 @@ struct mlx5_esw_functions {
 
 enum {
 	MLX5_ESWITCH_VPORT_MATCH_METADATA = BIT(0),
+	MLX5_ESWITCH_REG_C1_LOOPBACK_ENABLED = BIT(1),
 };
 
 struct mlx5_eswitch {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 25ae38d..c79a73b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -763,14 +763,21 @@ void mlx5_eswitch_del_send_to_vport_rule(struct mlx5_flow_handle *rule)
 	mlx5_del_flow_rules(rule);
 }
 
+static bool mlx5_eswitch_reg_c1_loopback_supported(struct mlx5_eswitch *esw)
+{
+	return MLX5_CAP_ESW_FLOWTABLE(esw->dev, fdb_to_vport_reg_c_id) &
+	       MLX5_FDB_TO_VPORT_REG_C_1;
+}
+
 static int esw_set_passing_vport_metadata(struct mlx5_eswitch *esw, bool enable)
 {
 	u32 out[MLX5_ST_SZ_DW(query_esw_vport_context_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(modify_esw_vport_context_in)] = {};
-	u8 fdb_to_vport_reg_c_id;
+	u8 curr, wanted;
 	int err;
 
-	if (!mlx5_eswitch_vport_match_metadata_enabled(esw))
+	if (!mlx5_eswitch_reg_c1_loopback_supported(esw) &&
+	    !mlx5_eswitch_vport_match_metadata_enabled(esw))
 		return 0;
 
 	err = mlx5_eswitch_query_esw_vport_context(esw->dev, 0, false,
@@ -778,24 +785,33 @@ static int esw_set_passing_vport_metadata(struct mlx5_eswitch *esw, bool enable)
 	if (err)
 		return err;
 
-	fdb_to_vport_reg_c_id = MLX5_GET(query_esw_vport_context_out, out,
-					 esw_vport_context.fdb_to_vport_reg_c_id);
+	curr = MLX5_GET(query_esw_vport_context_out, out,
+			esw_vport_context.fdb_to_vport_reg_c_id);
+	wanted = MLX5_FDB_TO_VPORT_REG_C_0;
+	if (mlx5_eswitch_reg_c1_loopback_supported(esw))
+		wanted |= MLX5_FDB_TO_VPORT_REG_C_1;
 
 	if (enable)
-		fdb_to_vport_reg_c_id |= MLX5_FDB_TO_VPORT_REG_C_0 |
-					 MLX5_FDB_TO_VPORT_REG_C_1;
+		curr |= wanted;
 	else
-		fdb_to_vport_reg_c_id &= ~(MLX5_FDB_TO_VPORT_REG_C_0 |
-					   MLX5_FDB_TO_VPORT_REG_C_1);
+		curr &= ~wanted;
 
 	MLX5_SET(modify_esw_vport_context_in, in,
-		 esw_vport_context.fdb_to_vport_reg_c_id, fdb_to_vport_reg_c_id);
+		 esw_vport_context.fdb_to_vport_reg_c_id, curr);
 
 	MLX5_SET(modify_esw_vport_context_in, in,
 		 field_select.fdb_to_vport_reg_c_id, 1);
 
-	return mlx5_eswitch_modify_esw_vport_context(esw->dev, 0, false,
-						     in, sizeof(in));
+	err = mlx5_eswitch_modify_esw_vport_context(esw->dev, 0, false, in,
+						    sizeof(in));
+	if (!err) {
+		if (enable && (curr & MLX5_FDB_TO_VPORT_REG_C_1))
+			esw->flags |= MLX5_ESWITCH_REG_C1_LOOPBACK_ENABLED;
+		else
+			esw->flags &= ~MLX5_ESWITCH_REG_C1_LOOPBACK_ENABLED;
+	}
+
+	return err;
 }
 
 static void peer_miss_rules_setup(struct mlx5_eswitch *esw,
@@ -2830,6 +2846,12 @@ bool mlx5_eswitch_is_vf_vport(const struct mlx5_eswitch *esw, u16 vport_num)
 	       vport_num <= esw->dev->priv.sriov.max_vfs;
 }
 
+bool mlx5_eswitch_reg_c1_loopback_enabled(const struct mlx5_eswitch *esw)
+{
+	return !!(esw->flags & MLX5_ESWITCH_REG_C1_LOOPBACK_ENABLED);
+}
+EXPORT_SYMBOL(mlx5_eswitch_reg_c1_loopback_enabled);
+
 bool mlx5_eswitch_vport_match_metadata_enabled(const struct mlx5_eswitch *esw)
 {
 	return !!(esw->flags & MLX5_ESWITCH_VPORT_MATCH_METADATA);
diff --git a/include/linux/mlx5/eswitch.h b/include/linux/mlx5/eswitch.h
index 61705e7..c16827e 100644
--- a/include/linux/mlx5/eswitch.h
+++ b/include/linux/mlx5/eswitch.h
@@ -70,6 +70,7 @@ struct mlx5_flow_handle *
 enum devlink_eswitch_encap_mode
 mlx5_eswitch_get_encap_mode(const struct mlx5_core_dev *dev);
 
+bool mlx5_eswitch_reg_c1_loopback_enabled(const struct mlx5_eswitch *esw);
 bool mlx5_eswitch_vport_match_metadata_enabled(const struct mlx5_eswitch *esw);
 
 /* Reg C0 usage:
@@ -109,6 +110,12 @@ static inline u8 mlx5_eswitch_mode(struct mlx5_eswitch *esw)
 }
 
 static inline bool
+mlx5_eswitch_reg_c1_loopback_enabled(const struct mlx5_eswitch *esw)
+{
+	return false;
+};
+
+static inline bool
 mlx5_eswitch_vport_match_metadata_enabled(const struct mlx5_eswitch *esw)
 {
 	return false;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 02/15] net/mlx5e: en_rep: Create uplink rep root table after eswitch offloads table
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 01/15] net/mlx5: E-Switch, Enable reg c1 loopback when possible Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 03/15] netfilter: flowtable: Add API for registering to flow table events Paul Blakey
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

The eswitch offloads table, which has the reps (vport) rx miss rules,
was moved from OFFLOADS namespace [0,0] (prio, level), to [1,0], so
the restore table (the new [0,0]) can come before it. The destinations
of these miss rules is the rep root ft (ttc for non uplink reps).

Uplink rep root ft is created as OFFLOADS namespace [0,1], and is used
as a hook to next RX prio (either ethtool or ttc), but this fails to
pass fs_core level's check.

Move uplink rep root ft to OFFLOADS prio 1, level 1 ([1,1]), so it
will keep the same relative position after the restore table
change.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 3aef7faa..a33d151 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1607,6 +1607,7 @@ static int mlx5e_create_rep_root_ft(struct mlx5e_priv *priv)
 	}
 
 	ft_attr.max_fte = 0; /* Empty table, miss rule will always point to next table */
+	ft_attr.prio = 1;
 	ft_attr.level = 1;
 
 	rpriv->root_ft = mlx5_create_flow_table(ns, &ft_attr);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 03/15] netfilter: flowtable: Add API for registering to flow table events
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 01/15] net/mlx5: E-Switch, Enable reg c1 loopback when possible Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 02/15] net/mlx5e: en_rep: Create uplink rep root table after eswitch offloads table Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 04/15] net/sched: act_ct: Instantiate flow table entry actions Paul Blakey
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Let drivers to add their cb allowing them to receive flow offload events
of type TC_SETUP_CLSFLOWER (REPLACE/DEL/STATS) for flows managed by the
flow table.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/netfilter/nf_flow_table.h |  6 +++++
 net/netfilter/nf_flow_table_core.c    | 47 +++++++++++++++++++++++++++++++++++
 net/netfilter/nf_flow_table_offload.c |  4 +++
 3 files changed, 57 insertions(+)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index e0f709d9..d9d0945 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -44,6 +44,7 @@ struct nf_flowtable {
 	struct delayed_work		gc_work;
 	unsigned int			flags;
 	struct flow_block		flow_block;
+	struct mutex			flow_block_lock; /* Guards flow_block */
 	possible_net_t			net;
 };
 
@@ -129,6 +130,11 @@ struct nf_flow_route {
 struct flow_offload *flow_offload_alloc(struct nf_conn *ct);
 void flow_offload_free(struct flow_offload *flow);
 
+int nf_flow_table_offload_add_cb(struct nf_flowtable *flow_table,
+				 flow_setup_cb_t *cb, void *cb_priv);
+void nf_flow_table_offload_del_cb(struct nf_flowtable *flow_table,
+				  flow_setup_cb_t *cb, void *cb_priv);
+
 int flow_offload_route_init(struct flow_offload *flow,
 			    const struct nf_flow_route *route);
 
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 8af28e1..4af0327 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -372,6 +372,50 @@ static void nf_flow_offload_work_gc(struct work_struct *work)
 	queue_delayed_work(system_power_efficient_wq, &flow_table->gc_work, HZ);
 }
 
+int nf_flow_table_offload_add_cb(struct nf_flowtable *flow_table,
+				 flow_setup_cb_t *cb, void *cb_priv)
+{
+	struct flow_block *block = &flow_table->flow_block;
+	struct flow_block_cb *block_cb;
+	int err = 0;
+
+	mutex_lock(&flow_table->flow_block_lock);
+	block_cb = flow_block_cb_lookup(block, cb, cb_priv);
+	if (block_cb) {
+		err = -EEXIST;
+		goto unlock;
+	}
+
+	block_cb = flow_block_cb_alloc(cb, cb_priv, cb_priv, NULL);
+	if (IS_ERR(block_cb)) {
+		err = PTR_ERR(block_cb);
+		goto unlock;
+	}
+
+	list_add_tail(&block_cb->list, &block->cb_list);
+
+unlock:
+	mutex_unlock(&flow_table->flow_block_lock);
+	return err;
+}
+EXPORT_SYMBOL_GPL(nf_flow_table_offload_add_cb);
+
+void nf_flow_table_offload_del_cb(struct nf_flowtable *flow_table,
+				  flow_setup_cb_t *cb, void *cb_priv)
+{
+	struct flow_block *block = &flow_table->flow_block;
+	struct flow_block_cb *block_cb;
+
+	mutex_lock(&flow_table->flow_block_lock);
+	block_cb = flow_block_cb_lookup(block, cb, cb_priv);
+	if (block_cb)
+		list_del(&block_cb->list);
+	else
+		WARN_ON(true);
+	mutex_unlock(&flow_table->flow_block_lock);
+}
+EXPORT_SYMBOL_GPL(nf_flow_table_offload_del_cb);
+
 static int nf_flow_nat_port_tcp(struct sk_buff *skb, unsigned int thoff,
 				__be16 port, __be16 new_port)
 {
@@ -494,6 +538,7 @@ int nf_flow_table_init(struct nf_flowtable *flowtable)
 
 	INIT_DEFERRABLE_WORK(&flowtable->gc_work, nf_flow_offload_work_gc);
 	flow_block_init(&flowtable->flow_block);
+	mutex_init(&flowtable->flow_block_lock);
 
 	err = rhashtable_init(&flowtable->rhashtable,
 			      &nf_flow_offload_rhash_params);
@@ -550,11 +595,13 @@ void nf_flow_table_free(struct nf_flowtable *flow_table)
 	mutex_lock(&flowtable_lock);
 	list_del(&flow_table->list);
 	mutex_unlock(&flowtable_lock);
+
 	cancel_delayed_work_sync(&flow_table->gc_work);
 	nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
 	nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, flow_table);
 	nf_flow_table_offload_flush(flow_table);
 	rhashtable_destroy(&flow_table->rhashtable);
+	mutex_destroy(&flow_table->flow_block_lock);
 }
 EXPORT_SYMBOL_GPL(nf_flow_table_free);
 
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 06f00cd..f5afdf0 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -610,6 +610,7 @@ static int nf_flow_offload_tuple(struct nf_flowtable *flowtable,
 	if (cmd == FLOW_CLS_REPLACE)
 		cls_flow.rule = flow_rule->rule;
 
+	mutex_lock(&flowtable->flow_block_lock);
 	list_for_each_entry(block_cb, block_cb_list, list) {
 		err = block_cb->cb(TC_SETUP_CLSFLOWER, &cls_flow,
 				   block_cb->cb_priv);
@@ -618,6 +619,7 @@ static int nf_flow_offload_tuple(struct nf_flowtable *flowtable,
 
 		i++;
 	}
+	mutex_unlock(&flowtable->flow_block_lock);
 
 	return i;
 }
@@ -692,8 +694,10 @@ static void flow_offload_tuple_stats(struct flow_offload_work *offload,
 			     FLOW_CLS_STATS,
 			     &offload->flow->tuplehash[dir].tuple, &extack);
 
+	mutex_lock(&flowtable->flow_block_lock);
 	list_for_each_entry(block_cb, &flowtable->flow_block.cb_list, list)
 		block_cb->cb(TC_SETUP_CLSFLOWER, &cls_flow, block_cb->cb_priv);
+	mutex_unlock(&flowtable->flow_block_lock);
 	memcpy(stats, &cls_flow.stats, sizeof(*stats));
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 04/15] net/sched: act_ct: Instantiate flow table entry actions
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
                   ` (2 preceding siblings ...)
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 03/15] netfilter: flowtable: Add API for registering to flow table events Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-11 17:41   ` Edward Cree
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 05/15] net/sched: act_ct: Support restoring conntrack info on skbs Paul Blakey
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

NF flow table API associate 5-tuple rule with an action list by calling
the flow table type action() CB to fill the rule's actions.

In action CB of act_ct, populate the ct offload entry actions with a new
ct_metadata action. Initialize the ct_metadata with the ct mark, label and
zone information. If ct nat was performed, then also append the relevant
packet mangle actions (e.g. ipv4/ipv6/tcp/udp header rewrites).

Drivers that offload the ft entries may match on the 5-tuple and perform
the action list.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
Changelog:
   v2->v3:
     Fix nat masks and conversions
     Moved comment above nat helpers
   v1->v2:
     Remove zone from metadata
     Add add mangle helper func (removes the unneccasry () and correct the mask there)
     Remove "abuse" of ? operator and use switch case
     Check protocol and ports in relevant function and return err
     On error restore action entries (on the topic, validaiting num of action isn't available)
     Add comment expalining nat
     Remove Inlinie from tcf_ct_flow_table_flow_action_get_next
     Refactor tcf_ct_flow_table_add_action_nat_ipv6 with helper
     On nats, allow both src and dst mangles

 include/net/flow_offload.h            |   5 +
 include/net/netfilter/nf_flow_table.h |  23 ++++
 net/netfilter/nf_flow_table_offload.c |  23 ----
 net/sched/act_ct.c                    | 207 ++++++++++++++++++++++++++++++++++
 4 files changed, 235 insertions(+), 23 deletions(-)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index d1b1e4a..ba43349 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -136,6 +136,7 @@ enum flow_action_id {
 	FLOW_ACTION_SAMPLE,
 	FLOW_ACTION_POLICE,
 	FLOW_ACTION_CT,
+	FLOW_ACTION_CT_METADATA,
 	FLOW_ACTION_MPLS_PUSH,
 	FLOW_ACTION_MPLS_POP,
 	FLOW_ACTION_MPLS_MANGLE,
@@ -225,6 +226,10 @@ struct flow_action_entry {
 			int action;
 			u16 zone;
 		} ct;
+		struct {
+			u32 mark;
+			u32 labels[4];
+		} ct_metadata;
 		struct {				/* FLOW_ACTION_MPLS_PUSH */
 			u32		label;
 			__be16		proto;
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index d9d0945..c2d5cdd 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -16,6 +16,29 @@
 struct flow_offload;
 enum flow_offload_tuple_dir;
 
+struct nf_flow_key {
+	struct flow_dissector_key_meta			meta;
+	struct flow_dissector_key_control		control;
+	struct flow_dissector_key_basic			basic;
+	union {
+		struct flow_dissector_key_ipv4_addrs	ipv4;
+		struct flow_dissector_key_ipv6_addrs	ipv6;
+	};
+	struct flow_dissector_key_tcp			tcp;
+	struct flow_dissector_key_ports			tp;
+} __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
+
+struct nf_flow_match {
+	struct flow_dissector	dissector;
+	struct nf_flow_key	key;
+	struct nf_flow_key	mask;
+};
+
+struct nf_flow_rule {
+	struct nf_flow_match	match;
+	struct flow_rule	*rule;
+};
+
 struct nf_flowtable_type {
 	struct list_head		list;
 	int				family;
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index f5afdf0..42b73a0 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -23,29 +23,6 @@ struct flow_offload_work {
 	struct flow_offload	*flow;
 };
 
-struct nf_flow_key {
-	struct flow_dissector_key_meta			meta;
-	struct flow_dissector_key_control		control;
-	struct flow_dissector_key_basic			basic;
-	union {
-		struct flow_dissector_key_ipv4_addrs	ipv4;
-		struct flow_dissector_key_ipv6_addrs	ipv6;
-	};
-	struct flow_dissector_key_tcp			tcp;
-	struct flow_dissector_key_ports			tp;
-} __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
-
-struct nf_flow_match {
-	struct flow_dissector	dissector;
-	struct nf_flow_key	key;
-	struct nf_flow_key	mask;
-};
-
-struct nf_flow_rule {
-	struct nf_flow_match	match;
-	struct flow_rule	*rule;
-};
-
 #define NF_FLOW_DISSECTOR(__match, __type, __field)	\
 	(__match)->dissector.offset[__type] =		\
 		offsetof(struct nf_flow_key, __field)
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 3d9e678..9c522bc 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -55,7 +55,214 @@ struct tcf_ct_flow_table {
 	.automatic_shrinking = true,
 };
 
+static struct flow_action_entry *
+tcf_ct_flow_table_flow_action_get_next(struct flow_action *flow_action)
+{
+	int i = flow_action->num_entries++;
+
+	return &flow_action->entries[i];
+}
+
+static void tcf_ct_add_mangle_action(struct flow_action *action,
+				     enum flow_action_mangle_base htype,
+				     u32 offset,
+				     u32 mask,
+				     u32 val)
+{
+	struct flow_action_entry *entry;
+
+	entry = tcf_ct_flow_table_flow_action_get_next(action);
+	entry->id = FLOW_ACTION_MANGLE;
+	entry->mangle.htype = htype;
+	entry->mangle.mask = ~mask;
+	entry->mangle.offset = offset;
+	entry->mangle.val = val;
+}
+
+/* The following nat helper functions check if the inverted reverse tuple
+ * (target) is different then the current dir tuple - meaning nat for ports
+ * and/or ip is needed, and add the relevant mangle actions.
+ */
+static void
+tcf_ct_flow_table_add_action_nat_ipv4(const struct nf_conntrack_tuple *tuple,
+				      struct nf_conntrack_tuple target,
+				      struct flow_action *action)
+{
+	if (memcmp(&target.src.u3, &tuple->src.u3, sizeof(target.src.u3)))
+		tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_IP4,
+					 offsetof(struct iphdr, saddr),
+					 0xFFFFFFFF,
+					 be32_to_cpu(target.src.u3.ip));
+	if (memcmp(&target.dst.u3, &tuple->dst.u3, sizeof(target.dst.u3)))
+		tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_IP4,
+					 offsetof(struct iphdr, daddr),
+					 0xFFFFFFFF,
+					 be32_to_cpu(target.dst.u3.ip));
+}
+
+static void
+tcf_ct_add_ipv6_addr_mangle_action(struct flow_action *action,
+				   union nf_inet_addr *addr,
+				   u32 offset)
+{
+	int i;
+
+	for (i = 0; i < sizeof(struct in6_addr) / sizeof(u32); i++)
+		tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_IP6,
+					 i * sizeof(u32) + offset,
+					 0xFFFFFFFF, be32_to_cpu(addr->ip6[i]));
+}
+
+static void
+tcf_ct_flow_table_add_action_nat_ipv6(const struct nf_conntrack_tuple *tuple,
+				      struct nf_conntrack_tuple target,
+				      struct flow_action *action)
+{
+	if (memcmp(&target.src.u3, &tuple->src.u3, sizeof(target.src.u3)))
+		tcf_ct_add_ipv6_addr_mangle_action(action, &target.src.u3,
+						   offsetof(struct ipv6hdr,
+							    saddr));
+	if (memcmp(&target.dst.u3, &tuple->dst.u3, sizeof(target.dst.u3)))
+		tcf_ct_add_ipv6_addr_mangle_action(action, &target.dst.u3,
+						   offsetof(struct ipv6hdr,
+							    daddr));
+}
+
+static void
+tcf_ct_flow_table_add_action_nat_tcp(const struct nf_conntrack_tuple *tuple,
+				     struct nf_conntrack_tuple target,
+				     struct flow_action *action)
+{
+	__be16 target_src = target.src.u.tcp.port;
+	__be16 target_dst = target.dst.u.tcp.port;
+
+	if (target_src != tuple->src.u.tcp.port)
+		tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_TCP,
+					 offsetof(struct tcphdr, source),
+					 0xFFFF, be16_to_cpu(target_src));
+	if (target_dst != tuple->dst.u.tcp.port)
+		tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_TCP,
+					 offsetof(struct tcphdr, dest),
+					 0xFFFF, be16_to_cpu(target_dst));
+}
+
+static void
+tcf_ct_flow_table_add_action_nat_udp(const struct nf_conntrack_tuple *tuple,
+				     struct nf_conntrack_tuple target,
+				     struct flow_action *action)
+{
+	__be16 target_src = target.src.u.udp.port;
+	__be16 target_dst = target.dst.u.udp.port;
+
+	if (target_src != tuple->src.u.udp.port)
+		tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_TCP,
+					 offsetof(struct udphdr, source),
+					 0xFFFF, be16_to_cpu(target_src));
+	if (target_dst != tuple->dst.u.udp.port)
+		tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_TCP,
+					 offsetof(struct udphdr, dest),
+					 0xFFFF, be16_to_cpu(target_dst));
+}
+
+static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
+					      enum ip_conntrack_dir dir,
+					      struct flow_action *action)
+{
+	struct nf_conn_labels *ct_labels;
+	struct flow_action_entry *entry;
+	u32 *act_ct_labels;
+
+	entry = tcf_ct_flow_table_flow_action_get_next(action);
+	entry->id = FLOW_ACTION_CT_METADATA;
+#if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
+	entry->ct_metadata.mark = ct->mark;
+#endif
+
+	act_ct_labels = entry->ct_metadata.labels;
+	ct_labels = nf_ct_labels_find(ct);
+	if (ct_labels)
+		memcpy(act_ct_labels, ct_labels->bits, NF_CT_LABELS_MAX_SIZE);
+	else
+		memset(act_ct_labels, 0, NF_CT_LABELS_MAX_SIZE);
+}
+
+static int tcf_ct_flow_table_add_action_nat(struct net *net,
+					    struct nf_conn *ct,
+					    enum ip_conntrack_dir dir,
+					    struct flow_action *action)
+{
+	const struct nf_conntrack_tuple *tuple = &ct->tuplehash[dir].tuple;
+	struct nf_conntrack_tuple target;
+
+	nf_ct_invert_tuple(&target, &ct->tuplehash[!dir].tuple);
+
+	switch (tuple->src.l3num) {
+	case NFPROTO_IPV4:
+		tcf_ct_flow_table_add_action_nat_ipv4(tuple, target,
+						      action);
+		break;
+	case NFPROTO_IPV6:
+		tcf_ct_flow_table_add_action_nat_ipv6(tuple, target,
+						      action);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	switch (nf_ct_protonum(ct)) {
+	case IPPROTO_TCP:
+		tcf_ct_flow_table_add_action_nat_tcp(tuple, target, action);
+		break;
+	case IPPROTO_UDP:
+		tcf_ct_flow_table_add_action_nat_udp(tuple, target, action);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static int tcf_ct_flow_table_fill_actions(struct net *net,
+					  const struct flow_offload *flow,
+					  enum flow_offload_tuple_dir tdir,
+					  struct nf_flow_rule *flow_rule)
+{
+	struct flow_action *action = &flow_rule->rule->action;
+	int num_entries = action->num_entries;
+	struct nf_conn *ct = flow->ct;
+	enum ip_conntrack_dir dir;
+	int i, err;
+
+	switch (tdir) {
+	case FLOW_OFFLOAD_DIR_ORIGINAL:
+		dir = IP_CT_DIR_ORIGINAL;
+		break;
+	case FLOW_OFFLOAD_DIR_REPLY:
+		dir = IP_CT_DIR_REPLY;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	err = tcf_ct_flow_table_add_action_nat(net, ct, dir, action);
+	if (err)
+		goto err_nat;
+
+	tcf_ct_flow_table_add_action_meta(ct, dir, action);
+	return 0;
+
+err_nat:
+	/* Clear filled actions */
+	for (i = num_entries; i < action->num_entries; i++)
+		memset(&action->entries[i], 0, sizeof(action->entries[i]));
+	action->num_entries = num_entries;
+
+	return err;
+}
+
 static struct nf_flowtable_type flowtable_ct = {
+	.action		= tcf_ct_flow_table_fill_actions,
 	.owner		= THIS_MODULE,
 };
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 05/15] net/sched: act_ct: Support restoring conntrack info on skbs
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
                   ` (3 preceding siblings ...)
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 04/15] net/sched: act_ct: Instantiate flow table entry actions Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-12  6:40   ` David Miller
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 06/15] net/sched: act_ct: Support refreshing the flow table entries Paul Blakey
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Provide an API to restore the ct state pointer.

This may be used by drivers to restore the ct state if they
miss in tc chain after they already did the hardware connection
tracking action (ct_metadata action).

For example, consider the following rule on chain 0 that is in_hw,
however chain 1 is not_in_hw:

$ tc filter add dev ... chain 0 ... \
  flower ... action ct pipe action goto chain 1

Packets of a flow offloaded (via nf flow table offload) by the driver
hit this rule in hardware, will be marked with the ct metadata action
(mark, label, zone) that does the equivalent of the software ct action,
and when the packet jumps to hardware chain 1, there would be a miss.

CT was already processed in hardware. Therefore, the driver's miss
handling should restore the ct state on the skb, using the provided API,
and continue the packet processing in chain 1.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/flow_offload.h |  1 +
 include/net/tc_act/tc_ct.h |  7 +++++++
 net/sched/act_ct.c         | 15 +++++++++++++++
 3 files changed, 23 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index ba43349..a039c90 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -227,6 +227,7 @@ struct flow_action_entry {
 			u16 zone;
 		} ct;
 		struct {
+			unsigned long cookie;
 			u32 mark;
 			u32 labels[4];
 		} ct_metadata;
diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h
index cf3492e..735da59 100644
--- a/include/net/tc_act/tc_ct.h
+++ b/include/net/tc_act/tc_ct.h
@@ -55,6 +55,13 @@ static inline int tcf_ct_action(const struct tc_action *a)
 static inline int tcf_ct_action(const struct tc_action *a) { return 0; }
 #endif /* CONFIG_NF_CONNTRACK */
 
+#if IS_ENABLED(CONFIG_NET_ACT_CT)
+void tcf_ct_flow_table_restore_skb(struct sk_buff *skb, unsigned long cookie);
+#else
+static inline void
+tcf_ct_flow_table_restore_skb(struct sk_buff *skb, unsigned long cookie) { }
+#endif
+
 static inline bool is_tcf_ct(const struct tc_action *a)
 {
 #if defined(CONFIG_NET_CLS_ACT) && IS_ENABLED(CONFIG_NF_CONNTRACK)
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 9c522bc..9d6eb145 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -170,6 +170,7 @@ static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
 {
 	struct nf_conn_labels *ct_labels;
 	struct flow_action_entry *entry;
+	enum ip_conntrack_info ctinfo;
 	u32 *act_ct_labels;
 
 	entry = tcf_ct_flow_table_flow_action_get_next(action);
@@ -177,6 +178,10 @@ static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
 #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
 	entry->ct_metadata.mark = ct->mark;
 #endif
+	ctinfo = dir == IP_CT_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
+					     IP_CT_ESTABLISHED_REPLY;
+	/* aligns with the CT reference on the SKB nf_ct_set */
+	entry->ct_metadata.cookie = (unsigned long)ct | ctinfo;
 
 	act_ct_labels = entry->ct_metadata.labels;
 	ct_labels = nf_ct_labels_find(ct);
@@ -1530,6 +1535,16 @@ static void __exit ct_cleanup_module(void)
 	destroy_workqueue(act_ct_wq);
 }
 
+void tcf_ct_flow_table_restore_skb(struct sk_buff *skb, unsigned long cookie)
+{
+	enum ip_conntrack_info ctinfo = cookie & NFCT_INFOMASK;
+	struct nf_conn *ct = (struct nf_conn *)(cookie & NFCT_PTRMASK);
+
+	nf_conntrack_get(&ct->ct_general);
+	nf_ct_set(skb, ct, ctinfo);
+}
+EXPORT_SYMBOL_GPL(tcf_ct_flow_table_restore_skb);
+
 module_init(ct_init_module);
 module_exit(ct_cleanup_module);
 MODULE_AUTHOR("Paul Blakey <paulb@mellanox.com>");
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 06/15] net/sched: act_ct: Support refreshing the flow table entries
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
                   ` (4 preceding siblings ...)
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 05/15] net/sched: act_ct: Support restoring conntrack info on skbs Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 07/15] net/sched: act_ct: Enable hardware offload of flow table entires Paul Blakey
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

If driver deleted an FT entry, a FT failed to offload, or registered to the
flow table after flows were already added, we still get packets in
software.

For those packets, while restoring the ct state from the flow table
entry, refresh it's hardware offload.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/netfilter/nf_flow_table.h |  3 +++
 net/netfilter/nf_flow_table_core.c    | 13 +++++++++++++
 net/netfilter/nf_flow_table_ip.c      | 15 ++-------------
 net/sched/act_ct.c                    |  1 +
 4 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index c2d5cdd..6890f1c 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -162,6 +162,9 @@ int flow_offload_route_init(struct flow_offload *flow,
 			    const struct nf_flow_route *route);
 
 int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow);
+void flow_offload_refresh(struct nf_flowtable *flow_table,
+			  struct flow_offload *flow);
+
 struct flow_offload_tuple_rhash *flow_offload_lookup(struct nf_flowtable *flow_table,
 						     struct flow_offload_tuple *tuple);
 void nf_flow_table_cleanup(struct net_device *dev);
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 4af0327..9a477bd 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -252,6 +252,19 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
 }
 EXPORT_SYMBOL_GPL(flow_offload_add);
 
+void flow_offload_refresh(struct nf_flowtable *flow_table,
+			  struct flow_offload *flow)
+{
+	flow->timeout = nf_flowtable_time_stamp + NF_FLOW_TIMEOUT;
+
+	if (likely(!nf_flowtable_hw_offload(flow_table) ||
+		   !test_and_clear_bit(NF_FLOW_HW_REFRESH, &flow->flags)))
+		return;
+
+	nf_flow_offload_add(flow_table, flow);
+}
+EXPORT_SYMBOL_GPL(flow_offload_refresh);
+
 static inline bool nf_flow_has_expired(const struct flow_offload *flow)
 {
 	return nf_flow_timeout_delta(flow->timeout) <= 0;
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 9e563fd..5272721 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -232,13 +232,6 @@ static unsigned int nf_flow_xmit_xfrm(struct sk_buff *skb,
 	return NF_STOLEN;
 }
 
-static bool nf_flow_offload_refresh(struct nf_flowtable *flow_table,
-				    struct flow_offload *flow)
-{
-	return nf_flowtable_hw_offload(flow_table) &&
-	       test_and_clear_bit(NF_FLOW_HW_REFRESH, &flow->flags);
-}
-
 unsigned int
 nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 			const struct nf_hook_state *state)
@@ -279,8 +272,7 @@ static bool nf_flow_offload_refresh(struct nf_flowtable *flow_table,
 	if (nf_flow_state_check(flow, ip_hdr(skb)->protocol, skb, thoff))
 		return NF_ACCEPT;
 
-	if (unlikely(nf_flow_offload_refresh(flow_table, flow)))
-		nf_flow_offload_add(flow_table, flow);
+	flow_offload_refresh(flow_table, flow);
 
 	if (nf_flow_offload_dst_check(&rt->dst)) {
 		flow_offload_teardown(flow);
@@ -290,7 +282,6 @@ static bool nf_flow_offload_refresh(struct nf_flowtable *flow_table,
 	if (nf_flow_nat_ip(flow, skb, thoff, dir) < 0)
 		return NF_DROP;
 
-	flow->timeout = nf_flowtable_time_stamp + NF_FLOW_TIMEOUT;
 	iph = ip_hdr(skb);
 	ip_decrease_ttl(iph);
 	skb->tstamp = 0;
@@ -508,8 +499,7 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
 				sizeof(*ip6h)))
 		return NF_ACCEPT;
 
-	if (unlikely(nf_flow_offload_refresh(flow_table, flow)))
-		nf_flow_offload_add(flow_table, flow);
+	flow_offload_refresh(flow_table, flow);
 
 	if (nf_flow_offload_dst_check(&rt->dst)) {
 		flow_offload_teardown(flow);
@@ -522,7 +512,6 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
 	if (nf_flow_nat_ipv6(flow, skb, dir) < 0)
 		return NF_DROP;
 
-	flow->timeout = nf_flowtable_time_stamp + NF_FLOW_TIMEOUT;
 	ip6h = ipv6_hdr(skb);
 	ip6h->hop_limit--;
 	skb->tstamp = 0;
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 9d6eb145..c04b156 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -531,6 +531,7 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params *p,
 	ctinfo = dir == FLOW_OFFLOAD_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
 						    IP_CT_ESTABLISHED_REPLY;
 
+	flow_offload_refresh(nf_ft, flow);
 	nf_conntrack_get(&ct->ct_general);
 	nf_ct_set(skb, ct, ctinfo);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 07/15] net/sched: act_ct: Enable hardware offload of flow table entires
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
                   ` (5 preceding siblings ...)
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 06/15] net/sched: act_ct: Support refreshing the flow table entries Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 08/15] net/mlx5: E-Switch, Introduce global tables Paul Blakey
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Pass the zone's flow table instance on the flow action to the drivers.
Thus, allowing drivers to register FT add/del/stats callbacks.

Finally, enable hardware offload on the flow table instance.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/flow_offload.h |  1 +
 include/net/tc_act/tc_ct.h | 10 ++++++++++
 net/sched/act_ct.c         |  2 ++
 net/sched/cls_api.c        |  1 +
 4 files changed, 14 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index a039c90..ceaa362 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -225,6 +225,7 @@ struct flow_action_entry {
 		struct {				/* FLOW_ACTION_CT */
 			int action;
 			u16 zone;
+			struct nf_flowtable *flow_table;
 		} ct;
 		struct {
 			unsigned long cookie;
diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h
index 735da59..79654bc 100644
--- a/include/net/tc_act/tc_ct.h
+++ b/include/net/tc_act/tc_ct.h
@@ -27,6 +27,7 @@ struct tcf_ct_params {
 	struct rcu_head rcu;
 
 	struct tcf_ct_flow_table *ct_ft;
+	struct nf_flowtable *nf_ft;
 };
 
 struct tcf_ct {
@@ -50,9 +51,18 @@ static inline int tcf_ct_action(const struct tc_action *a)
 	return to_ct_params(a)->ct_action;
 }
 
+static inline struct nf_flowtable *tcf_ct_ft(const struct tc_action *a)
+{
+	return to_ct_params(a)->nf_ft;
+}
+
 #else
 static inline uint16_t tcf_ct_zone(const struct tc_action *a) { return 0; }
 static inline int tcf_ct_action(const struct tc_action *a) { return 0; }
+static inline struct nf_flowtable *tcf_ct_ft(const struct tc_action *a)
+{
+	return NULL;
+}
 #endif /* CONFIG_NF_CONNTRACK */
 
 #if IS_ENABLED(CONFIG_NET_ACT_CT)
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index c04b156..cec08ff 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -292,6 +292,7 @@ static int tcf_ct_flow_table_get(struct tcf_ct_params *params)
 		goto err_insert;
 
 	ct_ft->nf_ft.type = &flowtable_ct;
+	ct_ft->nf_ft.flags |= NF_FLOWTABLE_HW_OFFLOAD;
 	err = nf_flow_table_init(&ct_ft->nf_ft);
 	if (err)
 		goto err_init;
@@ -299,6 +300,7 @@ static int tcf_ct_flow_table_get(struct tcf_ct_params *params)
 	__module_get(THIS_MODULE);
 out_unlock:
 	params->ct_ft = ct_ft;
+	params->nf_ft = &ct_ft->nf_ft;
 	mutex_unlock(&zones_mutex);
 
 	return 0;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2b5b4eb..2046102 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3636,6 +3636,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 			entry->id = FLOW_ACTION_CT;
 			entry->ct.action = tcf_ct_action(act);
 			entry->ct.zone = tcf_ct_zone(act);
+			entry->ct.flow_table = tcf_ct_ft(act);
 		} else if (is_tcf_mpls(act)) {
 			switch (tcf_mpls_action(act)) {
 			case TCA_MPLS_ACT_PUSH:
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 08/15] net/mlx5: E-Switch, Introduce global tables
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
                   ` (6 preceding siblings ...)
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 07/15] net/sched: act_ct: Enable hardware offload of flow table entires Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 09/15] net/mlx5: E-Switch, Add support for offloading rules with no in_port Paul Blakey
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Currently, flow tables are automatically connected according to their
<chain,prio,level> tuple.

Introduce global tables which are flow tables that are detached from the
eswitch chains processing, and will be connected by explicitly referencing
them from multiple chains.

Add this new table type, and allow connecting them by refenece.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  2 ++
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 18 +++++++++----
 .../mellanox/mlx5/core/eswitch_offloads_chains.c   | 30 ++++++++++++++++++++++
 .../mellanox/mlx5/core/eswitch_offloads_chains.h   |  6 +++++
 4 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index ee36a8a..dae0f3e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -421,6 +421,8 @@ struct mlx5_esw_flow_attr {
 	u16	prio;
 	u32	dest_chain;
 	u32	flags;
+	struct mlx5_flow_table *fdb;
+	struct mlx5_flow_table *dest_ft;
 	struct mlx5e_tc_flow_parse_attr *parse_attr;
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index c79a73b..8bfab5d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -324,7 +324,12 @@ struct mlx5_flow_handle *
 	if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
 		struct mlx5_flow_table *ft;
 
-		if (attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH) {
+		if (attr->dest_ft) {
+			flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
+			dest[i].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
+			dest[i].ft = attr->dest_ft;
+			i++;
+		} else if (attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH) {
 			flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
 			dest[i].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
 			dest[i].ft = mlx5_esw_chains_get_tc_end_ft(esw);
@@ -378,8 +383,11 @@ struct mlx5_flow_handle *
 	if (split) {
 		fdb = esw_vport_tbl_get(esw, attr);
 	} else {
-		fdb = mlx5_esw_chains_get_table(esw, attr->chain, attr->prio,
-						0);
+		if (attr->chain || attr->prio)
+			fdb = mlx5_esw_chains_get_table(esw, attr->chain,
+							attr->prio, 0);
+		else
+			fdb = attr->fdb;
 		mlx5_eswitch_set_rule_source_port(esw, spec, attr);
 	}
 	if (IS_ERR(fdb)) {
@@ -402,7 +410,7 @@ struct mlx5_flow_handle *
 err_add_rule:
 	if (split)
 		esw_vport_tbl_put(esw, attr);
-	else
+	else if (attr->chain || attr->prio)
 		mlx5_esw_chains_put_table(esw, attr->chain, attr->prio, 0);
 err_esw_get:
 	if (!(attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH) && attr->dest_chain)
@@ -499,7 +507,7 @@ struct mlx5_flow_handle *
 	} else {
 		if (split)
 			esw_vport_tbl_put(esw, attr);
-		else
+		else if (attr->chain || attr->prio)
 			mlx5_esw_chains_put_table(esw, attr->chain, attr->prio,
 						  0);
 		if (attr->dest_chain)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
index ca3bbf8..84e3313 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
@@ -722,6 +722,36 @@ struct mlx5_flow_table *
 	return tc_end_fdb(esw);
 }
 
+struct mlx5_flow_table *
+mlx5_esw_chains_create_global_table(struct mlx5_eswitch *esw)
+{
+	int chain, prio, level, err;
+
+	if (!fdb_ignore_flow_level_supported(esw)) {
+		err = -EOPNOTSUPP;
+
+		esw_warn(esw->dev,
+			 "Couldn't create global flow table, ignore_flow_level not supported.");
+		goto err_ignore;
+	}
+
+	chain = mlx5_esw_chains_get_chain_range(esw),
+	prio = mlx5_esw_chains_get_prio_range(esw);
+	level = mlx5_esw_chains_get_level_range(esw);
+
+	return mlx5_esw_chains_create_fdb_table(esw, chain, prio, level);
+
+err_ignore:
+	return ERR_PTR(err);
+}
+
+void
+mlx5_esw_chains_destroy_global_table(struct mlx5_eswitch *esw,
+				     struct mlx5_flow_table *ft)
+{
+	mlx5_esw_chains_destroy_fdb_table(esw, ft);
+}
+
 static int
 mlx5_esw_chains_init(struct mlx5_eswitch *esw)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
index e806d8d..c7bc609 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
@@ -25,6 +25,12 @@ struct mlx5_flow_table *
 struct mlx5_flow_table *
 mlx5_esw_chains_get_tc_end_ft(struct mlx5_eswitch *esw);
 
+struct mlx5_flow_table *
+mlx5_esw_chains_create_global_table(struct mlx5_eswitch *esw);
+void
+mlx5_esw_chains_destroy_global_table(struct mlx5_eswitch *esw,
+				     struct mlx5_flow_table *ft);
+
 int mlx5_esw_chains_create(struct mlx5_eswitch *esw);
 void mlx5_esw_chains_destroy(struct mlx5_eswitch *esw);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 09/15] net/mlx5: E-Switch, Add support for offloading rules with no in_port
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
                   ` (7 preceding siblings ...)
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 08/15] net/mlx5: E-Switch, Introduce global tables Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 10/15] net/mlx5: E-Switch, Support getting chain mapping Paul Blakey
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

FTEs in global tables may match on packets from multiple in_ports.
Provide the capability to omit the in_port match condition.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h          | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index dae0f3e..6254bb6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -391,6 +391,7 @@ enum {
 enum {
 	MLX5_ESW_ATTR_FLAG_VLAN_HANDLED  = BIT(0),
 	MLX5_ESW_ATTR_FLAG_SLOW_PATH     = BIT(1),
+	MLX5_ESW_ATTR_FLAG_NO_IN_PORT    = BIT(2),
 };
 
 struct mlx5_esw_flow_attr {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 8bfab5d..21ac813 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -388,7 +388,9 @@ struct mlx5_flow_handle *
 							attr->prio, 0);
 		else
 			fdb = attr->fdb;
-		mlx5_eswitch_set_rule_source_port(esw, spec, attr);
+
+		if (!(attr->flags & MLX5_ESW_ATTR_FLAG_NO_IN_PORT))
+			mlx5_eswitch_set_rule_source_port(esw, spec, attr);
 	}
 	if (IS_ERR(fdb)) {
 		rule = ERR_CAST(fdb);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 10/15] net/mlx5: E-Switch, Support getting chain mapping
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
                   ` (8 preceding siblings ...)
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 09/15] net/mlx5: E-Switch, Add support for offloading rules with no in_port Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 11/15] flow_offload: Add flow_match_ct to get rule ct match Paul Blakey
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Currently, we write chain register mapping on miss from the the last
prio of a chain. It is used to restore the chain in software.

To support re-using the chain register mapping from global tables (such
as CT tuple table) misses, export the chain mapping.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c   | 13 +++++++++++++
 .../ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h   |  7 +++++++
 2 files changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
index 84e3313..0702c21 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
@@ -900,6 +900,19 @@ struct mlx5_flow_table *
 	mlx5_esw_chains_cleanup(esw);
 }
 
+int
+mlx5_esw_chains_get_chain_mapping(struct mlx5_eswitch *esw, u32 chain,
+				  u32 *chain_mapping)
+{
+	return mapping_add(esw_chains_mapping(esw), &chain, chain_mapping);
+}
+
+int
+mlx5_esw_chains_put_chain_mapping(struct mlx5_eswitch *esw, u32 chain_mapping)
+{
+	return mapping_remove(esw_chains_mapping(esw), chain_mapping);
+}
+
 int mlx5_eswitch_get_chain_for_tag(struct mlx5_eswitch *esw, u32 tag,
 				   u32 *chain)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
index c7bc609..f3b9ae6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
@@ -31,6 +31,13 @@ struct mlx5_flow_table *
 mlx5_esw_chains_destroy_global_table(struct mlx5_eswitch *esw,
 				     struct mlx5_flow_table *ft);
 
+int
+mlx5_esw_chains_get_chain_mapping(struct mlx5_eswitch *esw, u32 chain,
+				  u32 *chain_mapping);
+int
+mlx5_esw_chains_put_chain_mapping(struct mlx5_eswitch *esw,
+				  u32 chain_mapping);
+
 int mlx5_esw_chains_create(struct mlx5_eswitch *esw);
 void mlx5_esw_chains_destroy(struct mlx5_eswitch *esw);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 11/15] flow_offload: Add flow_match_ct to get rule ct match
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
                   ` (9 preceding siblings ...)
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 10/15] net/mlx5: E-Switch, Support getting chain mapping Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 12/15] net/mlx5e: CT: Introduce connection tracking Paul Blakey
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Add relevant getter for ct info dissector.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/flow_offload.h | 6 ++++++
 net/core/flow_offload.c    | 7 +++++++
 2 files changed, 13 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index ceaa362..efd8d47 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -69,6 +69,10 @@ struct flow_match_enc_opts {
 	struct flow_dissector_key_enc_opts *key, *mask;
 };
 
+struct flow_match_ct {
+	struct flow_dissector_key_ct *key, *mask;
+};
+
 struct flow_rule;
 
 void flow_rule_match_meta(const struct flow_rule *rule,
@@ -111,6 +115,8 @@ void flow_rule_match_enc_keyid(const struct flow_rule *rule,
 			       struct flow_match_enc_keyid *out);
 void flow_rule_match_enc_opts(const struct flow_rule *rule,
 			      struct flow_match_enc_opts *out);
+void flow_rule_match_ct(const struct flow_rule *rule,
+			struct flow_match_ct *out);
 
 enum flow_action_id {
 	FLOW_ACTION_ACCEPT		= 0,
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index d213482..7440e61 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -188,6 +188,13 @@ void flow_action_cookie_destroy(struct flow_action_cookie *cookie)
 }
 EXPORT_SYMBOL(flow_action_cookie_destroy);
 
+void flow_rule_match_ct(const struct flow_rule *rule,
+			struct flow_match_ct *out)
+{
+	FLOW_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_CT, out);
+}
+EXPORT_SYMBOL(flow_rule_match_ct);
+
 struct flow_block_cb *flow_block_cb_alloc(flow_setup_cb_t *cb,
 					  void *cb_ident, void *cb_priv,
 					  void (*release)(void *cb_priv))
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 12/15] net/mlx5e: CT: Introduce connection tracking
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
                   ` (10 preceding siblings ...)
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 11/15] flow_offload: Add flow_match_ct to get rule ct match Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 13/15] net/mlx5e: CT: Offload established flows Paul Blakey
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Add support for offloading tc ct action and ct matches.
We translate the tc filter with CT action the following HW model:

+-------------------+      +--------------------+    +--------------+
+ pre_ct (tc chain) +----->+ CT (nat or no nat) +--->+ post_ct      +----->
+ original match    +  |   + tuple + zone match + |  + fte_id match +  |
+-------------------+  |   +--------------------+ |  +--------------+  |
                       v                          v                    v
                      set chain miss mapping  set mark             original
                      set fte_id              set label            filter
                      set zone                set established      actions
                      set tunnel_id           do nat (if needed)
                      do decap

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
---
Changelog:
   v2->v3:
      check reg loopback support instead of metadata support/enabled

 drivers/net/ethernet/mellanox/mlx5/core/Kconfig    |  10 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 541 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h | 140 ++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.h   |   3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 104 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h    |   8 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   2 +
 8 files changed, 793 insertions(+), 16 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index a1f20b2..312e0a1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -78,6 +78,16 @@ config MLX5_ESWITCH
 	        Legacy SRIOV mode (L2 mac vlan steering based).
 	        Switchdev mode (eswitch offloads).
 
+config MLX5_TC_CT
+	bool "MLX5 TC connection tracking offload support"
+	depends on MLX5_CORE_EN && NET_SWITCHDEV && NF_FLOW_TABLE && NET_ACT_CT && NET_TC_SKB_EXT
+	default y
+	help
+	  Say Y here if you want to support offloading connection tracking rules
+	  via tc ct action.
+
+	  If unsure, set to Y
+
 config MLX5_CORE_EN_DCB
 	bool "Data Center Bridging (DCB) Support"
 	default y
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index a62dc81..7408ae3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -37,6 +37,7 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH)     += en_rep.o en_tc.o en/tc_tun.o lib/port_tu
 					lib/geneve.o en/mapping.o en/tc_tun_vxlan.o en/tc_tun_gre.o \
 					en/tc_tun_geneve.o diag/en_tc_tracepoint.o
 mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += en/hv_vhca_stats.o
+mlx5_core-$(CONFIG_MLX5_TC_CT)	     += en/tc_ct.o
 
 #
 # Core extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
new file mode 100644
index 0000000..c113046
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -0,0 +1,541 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+#include <net/netfilter/nf_conntrack_labels.h>
+#include <net/netfilter/nf_conntrack_helper.h>
+#include <net/netfilter/nf_conntrack_acct.h>
+#include <uapi/linux/tc_act/tc_pedit.h>
+#include <net/tc_act/tc_ct.h>
+#include <net/flow_offload.h>
+#include <linux/workqueue.h>
+
+#include "en/tc_ct.h"
+#include "en.h"
+#include "en_tc.h"
+#include "en_rep.h"
+#include "eswitch_offloads_chains.h"
+
+#define MLX5_CT_ZONE_BITS (mlx5e_tc_attr_to_reg_mappings[ZONE_TO_REG].mlen * 8)
+#define MLX5_CT_ZONE_MASK GENMASK(MLX5_CT_ZONE_BITS - 1, 0)
+#define MLX5_CT_STATE_ESTABLISHED_BIT BIT(1)
+#define MLX5_CT_STATE_TRK_BIT BIT(2)
+
+#define MLX5_FTE_ID_BITS (mlx5e_tc_attr_to_reg_mappings[FTEID_TO_REG].mlen * 8)
+#define MLX5_FTE_ID_MAX GENMASK(MLX5_FTE_ID_BITS - 1, 0)
+#define MLX5_FTE_ID_MASK MLX5_FTE_ID_MAX
+
+#define ct_dbg(fmt, args...)\
+	netdev_dbg(ct_priv->netdev, "ct_debug: " fmt "\n", ##args)
+
+struct mlx5_tc_ct_priv {
+	struct mlx5_eswitch *esw;
+	const struct net_device *netdev;
+	struct idr fte_ids;
+	struct mlx5_flow_table *ct;
+	struct mlx5_flow_table *ct_nat;
+	struct mlx5_flow_table *post_ct;
+	struct mutex control_lock; /* guards parallel adds/dels */
+};
+
+struct mlx5_ct_flow {
+	struct mlx5_esw_flow_attr pre_ct_attr;
+	struct mlx5_esw_flow_attr post_ct_attr;
+	struct mlx5_flow_handle *pre_ct_rule;
+	struct mlx5_flow_handle *post_ct_rule;
+	u32 fte_id;
+	u32 chain_mapping;
+};
+
+static struct mlx5_tc_ct_priv *
+mlx5_tc_ct_get_ct_priv(struct mlx5e_priv *priv)
+{
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+	struct mlx5_rep_uplink_priv *uplink_priv;
+	struct mlx5e_rep_priv *uplink_rpriv;
+
+	uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
+	uplink_priv = &uplink_rpriv->uplink_priv;
+	return uplink_priv->ct_priv;
+}
+
+int
+mlx5_tc_ct_parse_match(struct mlx5e_priv *priv,
+		       struct mlx5_flow_spec *spec,
+		       struct flow_cls_offload *f,
+		       struct netlink_ext_ack *extack)
+{
+	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
+	struct flow_dissector_key_ct *mask, *key;
+	bool trk, est, untrk, unest, new, unnew;
+	u32 ctstate = 0, ctstate_mask = 0;
+	u16 ct_state_on, ct_state_off;
+	u16 ct_state, ct_state_mask;
+	struct flow_match_ct match;
+
+	if (!flow_rule_match_key(f->rule, FLOW_DISSECTOR_KEY_CT))
+		return 0;
+
+	if (!ct_priv) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "offload of ct matching isn't available");
+		return -EOPNOTSUPP;
+	}
+
+	flow_rule_match_ct(f->rule, &match);
+
+	key = match.key;
+	mask = match.mask;
+
+	ct_state = key->ct_state;
+	ct_state_mask = mask->ct_state;
+
+	if (ct_state_mask & ~(TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
+			      TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED |
+			      TCA_FLOWER_KEY_CT_FLAGS_NEW)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "only ct_state trk, est and new are supported for offload");
+		return -EOPNOTSUPP;
+	}
+
+	if (mask->ct_labels[1] || mask->ct_labels[2] || mask->ct_labels[3]) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "only lower 32bits of ct_labels are supported for offload");
+		return -EOPNOTSUPP;
+	}
+
+	ct_state_on = ct_state & ct_state_mask;
+	ct_state_off = (ct_state & ct_state_mask) ^ ct_state_mask;
+	trk = ct_state_on & TCA_FLOWER_KEY_CT_FLAGS_TRACKED;
+	new = ct_state_on & TCA_FLOWER_KEY_CT_FLAGS_NEW;
+	est = ct_state_on & TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED;
+	untrk = ct_state_off & TCA_FLOWER_KEY_CT_FLAGS_TRACKED;
+	unnew = ct_state_off & TCA_FLOWER_KEY_CT_FLAGS_NEW;
+	unest = ct_state_off & TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED;
+
+	ctstate |= trk ? MLX5_CT_STATE_TRK_BIT : 0;
+	ctstate |= est ? MLX5_CT_STATE_ESTABLISHED_BIT : 0;
+	ctstate_mask |= (untrk || trk) ? MLX5_CT_STATE_TRK_BIT : 0;
+	ctstate_mask |= (unest || est) ? MLX5_CT_STATE_ESTABLISHED_BIT : 0;
+
+	if (new) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "matching on ct_state +new isn't supported");
+		return -EOPNOTSUPP;
+	}
+
+	if (mask->ct_zone)
+		mlx5e_tc_match_to_reg_match(spec, ZONE_TO_REG,
+					    key->ct_zone, MLX5_CT_ZONE_MASK);
+	if (ctstate_mask)
+		mlx5e_tc_match_to_reg_match(spec, CTSTATE_TO_REG,
+					    ctstate, ctstate_mask);
+	if (mask->ct_mark)
+		mlx5e_tc_match_to_reg_match(spec, MARK_TO_REG,
+					    key->ct_mark, mask->ct_mark);
+	if (mask->ct_labels[0])
+		mlx5e_tc_match_to_reg_match(spec, LABELS_TO_REG,
+					    key->ct_labels[0],
+					    mask->ct_labels[0]);
+
+	return 0;
+}
+
+int
+mlx5_tc_ct_parse_action(struct mlx5e_priv *priv,
+			struct mlx5_esw_flow_attr *attr,
+			const struct flow_action_entry *act,
+			struct netlink_ext_ack *extack)
+{
+	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
+
+	if (!ct_priv) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "offload of ct action isn't available");
+		return -EOPNOTSUPP;
+	}
+
+	attr->ct_attr.zone = act->ct.zone;
+	attr->ct_attr.ct_action = act->ct.action;
+
+	return 0;
+}
+
+/* We translate the tc filter with CT action to the following HW model:
+ *
+ * +-------------------+      +--------------------+    +--------------+
+ * + pre_ct (tc chain) +----->+ CT (nat or no nat) +--->+ post_ct      +----->
+ * + original match    +  |   + tuple + zone match + |  + fte_id match +  |
+ * +-------------------+  |   +--------------------+ |  +--------------+  |
+ *                        v                          v                    v
+ *                       set chain miss mapping  set mark             original
+ *                       set fte_id              set label            filter
+ *                       set zone                set established      actions
+ *                       set tunnel_id           do nat (if needed)
+ *                       do decap
+ */
+static int
+__mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
+			  struct mlx5e_tc_flow *flow,
+			  struct mlx5_flow_spec *orig_spec,
+			  struct mlx5_esw_flow_attr *attr,
+			  struct mlx5_flow_handle **flow_rule)
+{
+	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
+	bool nat = attr->ct_attr.ct_action & TCA_CT_ACT_NAT;
+	struct mlx5e_tc_mod_hdr_acts pre_mod_acts = {};
+	struct mlx5_eswitch *esw = ct_priv->esw;
+	struct mlx5_flow_spec post_ct_spec = {};
+	struct mlx5_esw_flow_attr *pre_ct_attr;
+	struct  mlx5_modify_hdr *mod_hdr;
+	struct mlx5_flow_handle *rule;
+	struct mlx5_ct_flow *ct_flow;
+	int chain_mapping = 0, err;
+	u32 fte_id = 1;
+
+	ct_flow = kzalloc(sizeof(*ct_flow), GFP_KERNEL);
+	if (!ct_flow)
+		return -ENOMEM;
+
+	err = idr_alloc_u32(&ct_priv->fte_ids, ct_flow, &fte_id,
+			    MLX5_FTE_ID_MAX, GFP_KERNEL);
+	if (err) {
+		netdev_warn(priv->netdev,
+			    "Failed to allocate fte id, err: %d\n", err);
+		goto err_idr;
+	}
+	ct_flow->fte_id = fte_id;
+
+	/* Base esw attributes of both rules on original rule attribute */
+	pre_ct_attr = &ct_flow->pre_ct_attr;
+	memcpy(pre_ct_attr, attr, sizeof(*attr));
+	memcpy(&ct_flow->post_ct_attr, attr, sizeof(*attr));
+
+	/* Modify the original rule's action to fwd and modify, leave decap */
+	pre_ct_attr->action = attr->action & MLX5_FLOW_CONTEXT_ACTION_DECAP;
+	pre_ct_attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
+			       MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+
+	/* Write chain miss tag for miss in ct table as we
+	 * don't go though all prios of this chain as normal tc rules
+	 * miss.
+	 */
+	err = mlx5_esw_chains_get_chain_mapping(esw, attr->chain,
+						&chain_mapping);
+	if (err) {
+		ct_dbg("Failed to get chain register mapping for chain");
+		goto err_get_chain;
+	}
+	ct_flow->chain_mapping = chain_mapping;
+
+	err = mlx5e_tc_match_to_reg_set(esw->dev, &pre_mod_acts,
+					CHAIN_TO_REG, chain_mapping);
+	if (err) {
+		ct_dbg("Failed to set chain register mapping");
+		goto err_mapping;
+	}
+
+	err = mlx5e_tc_match_to_reg_set(esw->dev, &pre_mod_acts, ZONE_TO_REG,
+					attr->ct_attr.zone &
+					MLX5_CT_ZONE_MASK);
+	if (err) {
+		ct_dbg("Failed to set zone register mapping");
+		goto err_mapping;
+	}
+
+	err = mlx5e_tc_match_to_reg_set(esw->dev, &pre_mod_acts,
+					FTEID_TO_REG, fte_id);
+	if (err) {
+		ct_dbg("Failed to set fte_id register mapping");
+		goto err_mapping;
+	}
+
+	/* If original flow is decap, we do it before going into ct table
+	 * so add a rewrite for the tunnel match_id.
+	 */
+	if ((pre_ct_attr->action & MLX5_FLOW_CONTEXT_ACTION_DECAP) &&
+	    attr->chain == 0) {
+		u32 tun_id = mlx5e_tc_get_flow_tun_id(flow);
+
+		err = mlx5e_tc_match_to_reg_set(esw->dev, &pre_mod_acts,
+						TUNNEL_TO_REG,
+						tun_id);
+		if (err) {
+			ct_dbg("Failed to set tunnel register mapping");
+			goto err_mapping;
+		}
+	}
+
+	mod_hdr = mlx5_modify_header_alloc(esw->dev,
+					   MLX5_FLOW_NAMESPACE_FDB,
+					   pre_mod_acts.num_actions,
+					   pre_mod_acts.actions);
+	if (IS_ERR(mod_hdr)) {
+		err = PTR_ERR(mod_hdr);
+		ct_dbg("Failed to create pre ct mod hdr");
+		goto err_mapping;
+	}
+	pre_ct_attr->modify_hdr = mod_hdr;
+
+	/* Post ct rule matches on fte_id and executes original rule's
+	 * tc rule action
+	 */
+	mlx5e_tc_match_to_reg_match(&post_ct_spec, FTEID_TO_REG,
+				    fte_id, MLX5_FTE_ID_MASK);
+
+	/* Put post_ct rule on post_ct fdb */
+	ct_flow->post_ct_attr.chain = 0;
+	ct_flow->post_ct_attr.prio = 0;
+	ct_flow->post_ct_attr.fdb = ct_priv->post_ct;
+
+	ct_flow->post_ct_attr.inner_match_level = MLX5_MATCH_NONE;
+	ct_flow->post_ct_attr.outer_match_level = MLX5_MATCH_NONE;
+	ct_flow->post_ct_attr.action &= ~(MLX5_FLOW_CONTEXT_ACTION_DECAP);
+	rule = mlx5_eswitch_add_offloaded_rule(esw, &post_ct_spec,
+					       &ct_flow->post_ct_attr);
+	ct_flow->post_ct_rule = rule;
+	if (IS_ERR(ct_flow->post_ct_rule)) {
+		err = PTR_ERR(ct_flow->post_ct_rule);
+		ct_dbg("Failed to add post ct rule");
+		goto err_insert_post_ct;
+	}
+
+	/* Change original rule point to ct table */
+	pre_ct_attr->dest_chain = 0;
+	pre_ct_attr->dest_ft = nat ? ct_priv->ct_nat : ct_priv->ct;
+	ct_flow->pre_ct_rule = mlx5_eswitch_add_offloaded_rule(esw,
+							       orig_spec,
+							       pre_ct_attr);
+	if (IS_ERR(ct_flow->pre_ct_rule)) {
+		err = PTR_ERR(ct_flow->pre_ct_rule);
+		ct_dbg("Failed to add pre ct rule");
+		goto err_insert_orig;
+	}
+
+	attr->ct_attr.ct_flow = ct_flow;
+	*flow_rule = ct_flow->post_ct_rule;
+	dealloc_mod_hdr_actions(&pre_mod_acts);
+
+	return 0;
+
+err_insert_orig:
+	mlx5_eswitch_del_offloaded_rule(ct_priv->esw, ct_flow->post_ct_rule,
+					&ct_flow->post_ct_attr);
+err_insert_post_ct:
+	mlx5_modify_header_dealloc(priv->mdev, pre_ct_attr->modify_hdr);
+err_mapping:
+	dealloc_mod_hdr_actions(&pre_mod_acts);
+	mlx5_esw_chains_put_chain_mapping(esw, ct_flow->chain_mapping);
+err_get_chain:
+	idr_remove(&ct_priv->fte_ids, fte_id);
+err_idr:
+	kfree(ct_flow);
+	netdev_warn(priv->netdev, "Failed to offload ct flow, err %d\n", err);
+	return err;
+}
+
+struct mlx5_flow_handle *
+mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
+			struct mlx5e_tc_flow *flow,
+			struct mlx5_flow_spec *spec,
+			struct mlx5_esw_flow_attr *attr)
+{
+	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
+	struct mlx5_flow_handle *rule;
+	int err;
+
+	if (!ct_priv)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	mutex_lock(&ct_priv->control_lock);
+	err = __mlx5_tc_ct_flow_offload(priv, flow, spec, attr, &rule);
+	mutex_unlock(&ct_priv->control_lock);
+	if (err)
+		return ERR_PTR(err);
+
+	return rule;
+}
+
+static void
+__mlx5_tc_ct_delete_flow(struct mlx5_tc_ct_priv *ct_priv,
+			 struct mlx5_ct_flow *ct_flow)
+{
+	struct mlx5_esw_flow_attr *pre_ct_attr = &ct_flow->pre_ct_attr;
+	struct mlx5_eswitch *esw = ct_priv->esw;
+
+	mlx5_eswitch_del_offloaded_rule(esw, ct_flow->pre_ct_rule,
+					pre_ct_attr);
+	mlx5_modify_header_dealloc(esw->dev, pre_ct_attr->modify_hdr);
+	mlx5_eswitch_del_offloaded_rule(esw, ct_flow->post_ct_rule,
+					&ct_flow->post_ct_attr);
+	mlx5_esw_chains_put_chain_mapping(esw, ct_flow->chain_mapping);
+	idr_remove(&ct_priv->fte_ids, ct_flow->fte_id);
+	kfree(ct_flow);
+}
+
+void
+mlx5_tc_ct_delete_flow(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow,
+		       struct mlx5_esw_flow_attr *attr)
+{
+	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
+	struct mlx5_ct_flow *ct_flow = attr->ct_attr.ct_flow;
+
+	/* We are called on error to clean up stuff from parsing
+	 * but we don't have anything for now
+	 */
+	if (!ct_flow)
+		return;
+
+	mutex_lock(&ct_priv->control_lock);
+	__mlx5_tc_ct_delete_flow(ct_priv, ct_flow);
+	mutex_unlock(&ct_priv->control_lock);
+}
+
+static int
+mlx5_tc_ct_init_check_support(struct mlx5_eswitch *esw,
+			      const char **err_msg)
+{
+#if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
+	/* cannot restore chain ID on HW miss */
+
+	*err_msg = "tc skb extension missing";
+	return -EOPNOTSUPP;
+#endif
+
+	if (!MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, ignore_flow_level)) {
+		*err_msg = "firmware level support is missing";
+		return -EOPNOTSUPP;
+	}
+
+	if (!mlx5_eswitch_vlan_actions_supported(esw->dev, 1)) {
+		/* vlan workaround should be avoided for multi chain rules.
+		 * This is just a sanity check as pop vlan action should
+		 * be supported by any FW that supports ignore_flow_level
+		 */
+
+		*err_msg = "firmware vlan actions support is missing";
+		return -EOPNOTSUPP;
+	}
+
+	if (!MLX5_CAP_ESW_FLOWTABLE(esw->dev,
+				    fdb_modify_header_fwd_to_table)) {
+		/* CT always writes to registers which are mod header actions.
+		 * Therefore, mod header and goto is required
+		 */
+
+		*err_msg = "firmware fwd and modify support is missing";
+		return -EOPNOTSUPP;
+	}
+
+	if (!mlx5_eswitch_reg_c1_loopback_enabled(esw)) {
+		*err_msg = "register loopback isn't supported";
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static void
+mlx5_tc_ct_init_err(struct mlx5e_rep_priv *rpriv, const char *msg, int err)
+{
+	if (msg)
+		netdev_warn(rpriv->netdev,
+			    "tc ct offload not supported, %s, err: %d\n",
+			    msg, err);
+	else
+		netdev_warn(rpriv->netdev,
+			    "tc ct offload not supported, err: %d\n",
+			    err);
+}
+
+int
+mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
+{
+	struct mlx5_tc_ct_priv *ct_priv;
+	struct mlx5e_rep_priv *rpriv;
+	struct mlx5_eswitch *esw;
+	struct mlx5e_priv *priv;
+	const char *msg;
+	int err;
+
+	rpriv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv);
+	priv = netdev_priv(rpriv->netdev);
+	esw = priv->mdev->priv.eswitch;
+
+	err = mlx5_tc_ct_init_check_support(esw, &msg);
+	if (err) {
+		mlx5_tc_ct_init_err(rpriv, msg, err);
+		goto err_support;
+	}
+
+	ct_priv = kzalloc(sizeof(*ct_priv), GFP_KERNEL);
+	if (!ct_priv) {
+		mlx5_tc_ct_init_err(rpriv, NULL, -ENOMEM);
+		goto err_alloc;
+	}
+
+	ct_priv->esw = esw;
+	ct_priv->netdev = rpriv->netdev;
+	ct_priv->ct = mlx5_esw_chains_create_global_table(esw);
+	if (IS_ERR(ct_priv->ct)) {
+		err = PTR_ERR(ct_priv->ct);
+		mlx5_tc_ct_init_err(rpriv, "failed to create ct table", err);
+		goto err_ct_tbl;
+	}
+
+	ct_priv->ct_nat = mlx5_esw_chains_create_global_table(esw);
+	if (IS_ERR(ct_priv->ct_nat)) {
+		err = PTR_ERR(ct_priv->ct_nat);
+		mlx5_tc_ct_init_err(rpriv, "failed to create ct nat table",
+				    err);
+		goto err_ct_nat_tbl;
+	}
+
+	ct_priv->post_ct = mlx5_esw_chains_create_global_table(esw);
+	if (IS_ERR(ct_priv->post_ct)) {
+		err = PTR_ERR(ct_priv->post_ct);
+		mlx5_tc_ct_init_err(rpriv, "failed to create post ct table",
+				    err);
+		goto err_post_ct_tbl;
+	}
+
+	idr_init(&ct_priv->fte_ids);
+	mutex_init(&ct_priv->control_lock);
+
+	/* Done, set ct_priv to know it initializted */
+	uplink_priv->ct_priv = ct_priv;
+
+	return 0;
+
+err_post_ct_tbl:
+	mlx5_esw_chains_destroy_global_table(esw, ct_priv->ct_nat);
+err_ct_nat_tbl:
+	mlx5_esw_chains_destroy_global_table(esw, ct_priv->ct);
+err_ct_tbl:
+	kfree(ct_priv);
+err_alloc:
+err_support:
+
+	return 0;
+}
+
+void
+mlx5_tc_ct_clean(struct mlx5_rep_uplink_priv *uplink_priv)
+{
+	struct mlx5_tc_ct_priv *ct_priv = uplink_priv->ct_priv;
+
+	if (!ct_priv)
+		return;
+
+	mlx5_esw_chains_destroy_global_table(ct_priv->esw, ct_priv->post_ct);
+	mlx5_esw_chains_destroy_global_table(ct_priv->esw, ct_priv->ct_nat);
+	mlx5_esw_chains_destroy_global_table(ct_priv->esw, ct_priv->ct);
+
+	mutex_destroy(&ct_priv->control_lock);
+	idr_destroy(&ct_priv->fte_ids);
+	kfree(ct_priv);
+
+	uplink_priv->ct_priv = NULL;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
new file mode 100644
index 0000000..3a84216
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
@@ -0,0 +1,140 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2018 Mellanox Technologies. */
+
+#ifndef __MLX5_EN_TC_CT_H__
+#define __MLX5_EN_TC_CT_H__
+
+#include <net/pkt_cls.h>
+#include <linux/mlx5/fs.h>
+#include <net/tc_act/tc_ct.h>
+
+struct mlx5_esw_flow_attr;
+struct mlx5_rep_uplink_priv;
+struct mlx5e_tc_flow;
+struct mlx5e_priv;
+
+struct mlx5_ct_flow;
+
+struct mlx5_ct_attr {
+	u16 zone;
+	u16 ct_action;
+	struct mlx5_ct_flow *ct_flow;
+};
+
+#define zone_to_reg_ct {\
+	.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_2,\
+	.moffset = 0,\
+	.mlen = 2,\
+	.soffset = MLX5_BYTE_OFF(fte_match_param,\
+				 misc_parameters_2.metadata_reg_c_2) + 2,\
+}
+
+#define ctstate_to_reg_ct {\
+	.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_2,\
+	.moffset = 2,\
+	.mlen = 2,\
+	.soffset = MLX5_BYTE_OFF(fte_match_param,\
+				 misc_parameters_2.metadata_reg_c_2),\
+}
+
+#define mark_to_reg_ct {\
+	.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_3,\
+	.moffset = 0,\
+	.mlen = 4,\
+	.soffset = MLX5_BYTE_OFF(fte_match_param,\
+				 misc_parameters_2.metadata_reg_c_3),\
+}
+
+#define labels_to_reg_ct {\
+	.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_4,\
+	.moffset = 0,\
+	.mlen = 4,\
+	.soffset = MLX5_BYTE_OFF(fte_match_param,\
+				 misc_parameters_2.metadata_reg_c_4),\
+}
+
+#define fteid_to_reg_ct {\
+	.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_5,\
+	.moffset = 0,\
+	.mlen = 4,\
+	.soffset = MLX5_BYTE_OFF(fte_match_param,\
+				 misc_parameters_2.metadata_reg_c_5),\
+}
+
+#if IS_ENABLED(CONFIG_MLX5_TC_CT)
+
+int
+mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv);
+void
+mlx5_tc_ct_clean(struct mlx5_rep_uplink_priv *uplink_priv);
+
+int
+mlx5_tc_ct_parse_match(struct mlx5e_priv *priv,
+		       struct mlx5_flow_spec *spec,
+		       struct flow_cls_offload *f,
+		       struct netlink_ext_ack *extack);
+int
+mlx5_tc_ct_parse_action(struct mlx5e_priv *priv,
+			struct mlx5_esw_flow_attr *attr,
+			const struct flow_action_entry *act,
+			struct netlink_ext_ack *extack);
+
+struct mlx5_flow_handle *
+mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
+			struct mlx5e_tc_flow *flow,
+			struct mlx5_flow_spec *spec,
+			struct mlx5_esw_flow_attr *attr);
+void
+mlx5_tc_ct_delete_flow(struct mlx5e_priv *priv,
+		       struct mlx5e_tc_flow *flow,
+		       struct mlx5_esw_flow_attr *attr);
+
+#else /* CONFIG_MLX5_TC_CT */
+
+static inline int
+mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
+{
+	return 0;
+}
+
+static inline void
+mlx5_tc_ct_clean(struct mlx5_rep_uplink_priv *uplink_priv)
+{
+}
+
+static inline int
+mlx5_tc_ct_parse_match(struct mlx5e_priv *priv,
+		       struct mlx5_flow_spec *spec,
+		       struct flow_cls_offload *f,
+		       struct netlink_ext_ack *extack)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int
+mlx5_tc_ct_parse_action(struct mlx5e_priv *priv,
+			struct mlx5_esw_flow_attr *attr,
+			const struct flow_action_entry *act,
+			struct netlink_ext_ack *extack)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline struct mlx5_flow_handle *
+mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
+			struct mlx5e_tc_flow *flow,
+			struct mlx5_flow_spec *spec,
+			struct mlx5_esw_flow_attr *attr)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
+static inline void
+mlx5_tc_ct_delete_flow(struct mlx5e_priv *priv,
+		       struct mlx5e_tc_flow *flow,
+		       struct mlx5_esw_flow_attr *attr)
+{
+}
+
+#endif /* !IS_ENABLED(CONFIG_MLX5_TC_CT) */
+#endif /* __MLX5_EN_TC_CT_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
index 2bbdbdc..6a23379 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
@@ -55,6 +55,7 @@ struct mlx5e_neigh_update_table {
 	unsigned long           min_interval; /* jiffies */
 };
 
+struct mlx5_tc_ct_priv;
 struct mlx5_rep_uplink_priv {
 	/* Filters DB - instantiated by the uplink representor and shared by
 	 * the uplink's VFs
@@ -86,6 +87,8 @@ struct mlx5_rep_uplink_priv {
 	struct mapping_ctx *tunnel_mapping;
 	/* maps tun_enc_opts to a unique id*/
 	struct mapping_ctx *tunnel_enc_opts_mapping;
+
+	struct mlx5_tc_ct_priv *ct_priv;
 };
 
 struct mlx5e_rep_priv {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 4b04992..1f6a306 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -56,6 +56,7 @@
 #include "en/port.h"
 #include "en/tc_tun.h"
 #include "en/mapping.h"
+#include "en/tc_ct.h"
 #include "lib/devcom.h"
 #include "lib/geneve.h"
 #include "diag/en_tc_tracepoint.h"
@@ -87,6 +88,7 @@ enum {
 	MLX5E_TC_FLOW_FLAG_DUP		= MLX5E_TC_FLOW_BASE + 4,
 	MLX5E_TC_FLOW_FLAG_NOT_READY	= MLX5E_TC_FLOW_BASE + 5,
 	MLX5E_TC_FLOW_FLAG_DELETED	= MLX5E_TC_FLOW_BASE + 6,
+	MLX5E_TC_FLOW_FLAG_CT		= MLX5E_TC_FLOW_BASE + 7,
 };
 
 #define MLX5E_TC_MAX_SPLITS 1
@@ -193,6 +195,11 @@ struct mlx5e_tc_attr_to_reg_mapping mlx5e_tc_attr_to_reg_mappings[] = {
 		.soffset = MLX5_BYTE_OFF(fte_match_param,
 					 misc_parameters_2.metadata_reg_c_1),
 	},
+	[ZONE_TO_REG] = zone_to_reg_ct,
+	[CTSTATE_TO_REG] = ctstate_to_reg_ct,
+	[MARK_TO_REG] = mark_to_reg_ct,
+	[LABELS_TO_REG] = labels_to_reg_ct,
+	[FTEID_TO_REG] = fteid_to_reg_ct,
 };
 
 static void mlx5e_put_flow_tunnel_id(struct mlx5e_tc_flow *flow);
@@ -1144,6 +1151,10 @@ static int mlx5e_attach_encap(struct mlx5e_priv *priv,
 			   struct mlx5_esw_flow_attr *attr)
 {
 	struct mlx5_flow_handle *rule;
+	struct mlx5e_tc_mod_hdr_acts;
+
+	if (flow_flag_test(flow, CT))
+		return mlx5_tc_ct_flow_offload(flow->priv, flow, spec, attr);
 
 	rule = mlx5_eswitch_add_offloaded_rule(esw, spec, attr);
 	if (IS_ERR(rule))
@@ -1163,10 +1174,15 @@ static int mlx5e_attach_encap(struct mlx5e_priv *priv,
 static void
 mlx5e_tc_unoffload_fdb_rules(struct mlx5_eswitch *esw,
 			     struct mlx5e_tc_flow *flow,
-			   struct mlx5_esw_flow_attr *attr)
+			     struct mlx5_esw_flow_attr *attr)
 {
 	flow_flag_clear(flow, OFFLOADED);
 
+	if (flow_flag_test(flow, CT)) {
+		mlx5_tc_ct_delete_flow(flow->priv, flow, attr);
+		return;
+	}
+
 	if (attr->split_count)
 		mlx5_eswitch_del_fwd_rule(esw, flow->rule[1], attr);
 
@@ -1938,6 +1954,11 @@ static void mlx5e_put_flow_tunnel_id(struct mlx5e_tc_flow *flow)
 			       enc_opts_id);
 }
 
+u32 mlx5e_tc_get_flow_tun_id(struct mlx5e_tc_flow *flow)
+{
+	return flow->tunnel_id;
+}
+
 static int parse_tunnel_attr(struct mlx5e_priv *priv,
 			     struct mlx5e_tc_flow *flow,
 			     struct mlx5_flow_spec *spec,
@@ -2103,6 +2124,7 @@ static int __parse_cls_flower(struct mlx5e_priv *priv,
 	      BIT(FLOW_DISSECTOR_KEY_ENC_CONTROL) |
 	      BIT(FLOW_DISSECTOR_KEY_TCP) |
 	      BIT(FLOW_DISSECTOR_KEY_IP)  |
+	      BIT(FLOW_DISSECTOR_KEY_CT) |
 	      BIT(FLOW_DISSECTOR_KEY_ENC_IP) |
 	      BIT(FLOW_DISSECTOR_KEY_ENC_OPTS))) {
 		NL_SET_ERR_MSG_MOD(extack, "Unsupported key");
@@ -2913,7 +2935,9 @@ struct ipv6_hoplimit_word {
 	__u8	hop_limit;
 };
 
-static bool is_action_keys_supported(const struct flow_action_entry *act)
+static int is_action_keys_supported(const struct flow_action_entry *act,
+				    bool ct_flow, bool *modify_ip_header,
+				    struct netlink_ext_ack *extack)
 {
 	u32 mask, offset;
 	u8 htype;
@@ -2932,7 +2956,13 @@ static bool is_action_keys_supported(const struct flow_action_entry *act)
 		if (offset != offsetof(struct iphdr, ttl) ||
 		    ttl_word->protocol ||
 		    ttl_word->check) {
-			return true;
+			*modify_ip_header = true;
+		}
+
+		if (ct_flow && offset >= offsetof(struct iphdr, saddr)) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "can't offload re-write of ipv4 address with action ct");
+			return -EOPNOTSUPP;
 		}
 	} else if (htype == FLOW_ACT_MANGLE_HDR_TYPE_IP6) {
 		struct ipv6_hoplimit_word *hoplimit_word =
@@ -2941,15 +2971,27 @@ static bool is_action_keys_supported(const struct flow_action_entry *act)
 		if (offset != offsetof(struct ipv6hdr, payload_len) ||
 		    hoplimit_word->payload_len ||
 		    hoplimit_word->nexthdr) {
-			return true;
+			*modify_ip_header = true;
+		}
+
+		if (ct_flow && offset >= offsetof(struct ipv6hdr, saddr)) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "can't offload re-write of ipv6 address with action ct");
+			return -EOPNOTSUPP;
 		}
+	} else if (ct_flow && (htype == FLOW_ACT_MANGLE_HDR_TYPE_TCP ||
+			       htype == FLOW_ACT_MANGLE_HDR_TYPE_UDP)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "can't offload re-write of transport header ports with action ct");
+		return -EOPNOTSUPP;
 	}
-	return false;
+
+	return 0;
 }
 
 static bool modify_header_match_supported(struct mlx5_flow_spec *spec,
 					  struct flow_action *flow_action,
-					  u32 actions,
+					  u32 actions, bool ct_flow,
 					  struct netlink_ext_ack *extack)
 {
 	const struct flow_action_entry *act;
@@ -2957,7 +2999,7 @@ static bool modify_header_match_supported(struct mlx5_flow_spec *spec,
 	void *headers_v;
 	u16 ethertype;
 	u8 ip_proto;
-	int i;
+	int i, err;
 
 	headers_v = get_match_headers_value(actions, spec);
 	ethertype = MLX5_GET(fte_match_set_lyr_2_4, headers_v, ethertype);
@@ -2972,10 +3014,10 @@ static bool modify_header_match_supported(struct mlx5_flow_spec *spec,
 		    act->id != FLOW_ACTION_ADD)
 			continue;
 
-		if (is_action_keys_supported(act)) {
-			modify_ip_header = true;
-			break;
-		}
+		err = is_action_keys_supported(act, ct_flow,
+					       &modify_ip_header, extack);
+		if (err)
+			return err;
 	}
 
 	ip_proto = MLX5_GET(fte_match_set_lyr_2_4, headers_v, ip_protocol);
@@ -2998,13 +3040,24 @@ static bool actions_match_supported(struct mlx5e_priv *priv,
 				    struct netlink_ext_ack *extack)
 {
 	struct net_device *filter_dev = parse_attr->filter_dev;
-	bool drop_action, pop_action;
+	bool drop_action, pop_action, ct_flow;
 	u32 actions;
 
-	if (mlx5e_is_eswitch_flow(flow))
+	ct_flow = flow_flag_test(flow, CT);
+	if (mlx5e_is_eswitch_flow(flow)) {
 		actions = flow->esw_attr->action;
-	else
+
+		if (flow->esw_attr->split_count && ct_flow) {
+			/* All registers used by ct are cleared when using
+			 * split rules.
+			 */
+			NL_SET_ERR_MSG_MOD(extack,
+					   "Can't offload mirroring with action ct");
+			return -EOPNOTSUPP;
+		}
+	} else {
 		actions = flow->nic_attr->action;
+	}
 
 	drop_action = actions & MLX5_FLOW_CONTEXT_ACTION_DROP;
 	pop_action = actions & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP;
@@ -3021,7 +3074,7 @@ static bool actions_match_supported(struct mlx5e_priv *priv,
 	if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
 		return modify_header_match_supported(&parse_attr->spec,
 						     flow_action, actions,
-						     extack);
+						     ct_flow, extack);
 
 	return true;
 }
@@ -3826,6 +3879,13 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 			action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
 			attr->dest_chain = act->chain_index;
 			break;
+		case FLOW_ACTION_CT:
+			err = mlx5_tc_ct_parse_action(priv, attr, act, extack);
+			if (err)
+				return err;
+
+			flow_flag_set(flow, CT);
+			break;
 		default:
 			NL_SET_ERR_MSG_MOD(extack, "The offload action is not supported");
 			return -EOPNOTSUPP;
@@ -4066,6 +4126,10 @@ static bool is_peer_flow_needed(struct mlx5e_tc_flow *flow)
 	if (err)
 		goto err_free;
 
+	err = mlx5_tc_ct_parse_match(priv, &parse_attr->spec, f, extack);
+	if (err)
+		goto err_free;
+
 	err = mlx5e_tc_add_fdb_flow(priv, flow, extack);
 	complete_all(&flow->init_done);
 	if (err) {
@@ -4350,7 +4414,7 @@ int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
 		goto errout;
 	}
 
-	if (mlx5e_is_offloaded_flow(flow)) {
+	if (mlx5e_is_offloaded_flow(flow) || flow_flag_test(flow, CT)) {
 		counter = mlx5e_tc_get_counter(flow);
 		if (!counter)
 			goto errout;
@@ -4622,6 +4686,10 @@ int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
 	uplink_priv = container_of(tc_ht, struct mlx5_rep_uplink_priv, tc_ht);
 	priv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv);
 
+	err = mlx5_tc_ct_init(uplink_priv);
+	if (err)
+		goto err_ct;
+
 	mapping = mapping_create(sizeof(struct tunnel_match_key),
 				 TUNNEL_INFO_BITS_MASK, true);
 	if (IS_ERR(mapping)) {
@@ -4648,6 +4716,8 @@ int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
 err_enc_opts_mapping:
 	mapping_destroy(uplink_priv->tunnel_mapping);
 err_tun_mapping:
+	mlx5_tc_ct_clean(uplink_priv);
+err_ct:
 	netdev_warn(priv->netdev,
 		    "Failed to initialize tc (eswitch), err: %d", err);
 	return err;
@@ -4662,6 +4732,8 @@ void mlx5e_tc_esw_cleanup(struct rhashtable *tc_ht)
 	uplink_priv = container_of(tc_ht, struct mlx5_rep_uplink_priv, tc_ht);
 	mapping_destroy(uplink_priv->tunnel_enc_opts_mapping);
 	mapping_destroy(uplink_priv->tunnel_mapping);
+
+	mlx5_tc_ct_clean(uplink_priv);
 }
 
 int mlx5e_tc_num_filters(struct mlx5e_priv *priv, unsigned long flags)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index 21cbde4..31c9e81 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -94,6 +94,11 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 enum mlx5e_tc_attr_to_reg {
 	CHAIN_TO_REG,
 	TUNNEL_TO_REG,
+	CTSTATE_TO_REG,
+	ZONE_TO_REG,
+	MARK_TO_REG,
+	LABELS_TO_REG,
+	FTEID_TO_REG,
 };
 
 struct mlx5e_tc_attr_to_reg_mapping {
@@ -139,6 +144,9 @@ int alloc_mod_hdr_actions(struct mlx5_core_dev *mdev,
 			  struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts);
 void dealloc_mod_hdr_actions(struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts);
 
+struct mlx5e_tc_flow;
+u32 mlx5e_tc_get_flow_tun_id(struct mlx5e_tc_flow *flow);
+
 #else /* CONFIG_MLX5_ESWITCH */
 static inline int  mlx5e_tc_nic_init(struct mlx5e_priv *priv) { return 0; }
 static inline void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv) {}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 6254bb6..2e0417d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -42,6 +42,7 @@
 #include <linux/mlx5/vport.h>
 #include <linux/mlx5/fs.h>
 #include "lib/mpfs.h"
+#include "en/tc_ct.h"
 
 #define FDB_TC_MAX_CHAIN 3
 #define FDB_FT_CHAIN (FDB_TC_MAX_CHAIN + 1)
@@ -424,6 +425,7 @@ struct mlx5_esw_flow_attr {
 	u32	flags;
 	struct mlx5_flow_table *fdb;
 	struct mlx5_flow_table *dest_ft;
+	struct mlx5_ct_attr ct_attr;
 	struct mlx5e_tc_flow_parse_attr *parse_attr;
 };
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 13/15] net/mlx5e: CT: Offload established flows
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
                   ` (11 preceding siblings ...)
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 12/15] net/mlx5e: CT: Introduce connection tracking Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-11 17:45   ` Edward Cree
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 14/15] net/mlx5e: CT: Handle misses after executing CT action Paul Blakey
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Register driver callbacks with the nf flow table platform.
FT add/delete events will create/delete FTE in the CT/CT_NAT tables.

Restoring the CT state on miss will be added in the following patch.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
  Changelog:
    v2->v3:
       Args of mlx5_tc_ct_del_ft_cb in a single line
       Add flow table ct entries to a list, and flush it on last ct rule after deleting the cb
    v1->v2:
       Remove zone param from metadata

 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 679 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h |   3 +
 2 files changed, 682 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index c113046..124aca5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -10,6 +10,7 @@
 #include <uapi/linux/tc_act/tc_pedit.h>
 #include <net/tc_act/tc_ct.h>
 #include <net/flow_offload.h>
+#include <net/netfilter/nf_flow_table.h>
 #include <linux/workqueue.h>
 
 #include "en/tc_ct.h"
@@ -34,6 +35,7 @@ struct mlx5_tc_ct_priv {
 	struct mlx5_eswitch *esw;
 	const struct net_device *netdev;
 	struct idr fte_ids;
+	struct rhashtable zone_ht;
 	struct mlx5_flow_table *ct;
 	struct mlx5_flow_table *ct_nat;
 	struct mlx5_flow_table *post_ct;
@@ -45,10 +47,53 @@ struct mlx5_ct_flow {
 	struct mlx5_esw_flow_attr post_ct_attr;
 	struct mlx5_flow_handle *pre_ct_rule;
 	struct mlx5_flow_handle *post_ct_rule;
+	struct mlx5_ct_ft *ft;
 	u32 fte_id;
 	u32 chain_mapping;
 };
 
+struct mlx5_ct_zone_rule {
+	struct mlx5_flow_handle *rule;
+	struct mlx5_esw_flow_attr attr;
+	bool nat;
+};
+
+struct mlx5_ct_ft {
+	struct rhash_head node;
+	u16 zone;
+	refcount_t refcount;
+	struct nf_flowtable *nf_ft;
+	struct mlx5_tc_ct_priv *ct_priv;
+	struct rhashtable ct_entries_ht;
+	struct list_head ct_entries_list;
+};
+
+struct mlx5_ct_entry {
+	struct list_head list;
+	u16 zone;
+	struct rhash_head node;
+	struct flow_rule *flow_rule;
+	struct mlx5_fc *counter;
+	unsigned long lastuse;
+	unsigned long cookie;
+	struct mlx5_ct_zone_rule zone_rules[2];
+};
+
+static const struct rhashtable_params cts_ht_params = {
+	.head_offset = offsetof(struct mlx5_ct_entry, node),
+	.key_offset = offsetof(struct mlx5_ct_entry, cookie),
+	.key_len = sizeof(((struct mlx5_ct_entry *)0)->cookie),
+	.automatic_shrinking = true,
+	.min_size = 16 * 1024,
+};
+
+static const struct rhashtable_params zone_params = {
+	.head_offset = offsetof(struct mlx5_ct_ft, node),
+	.key_offset = offsetof(struct mlx5_ct_ft, zone),
+	.key_len = sizeof(((struct mlx5_ct_ft *)0)->zone),
+	.automatic_shrinking = true,
+};
+
 static struct mlx5_tc_ct_priv *
 mlx5_tc_ct_get_ct_priv(struct mlx5e_priv *priv)
 {
@@ -61,6 +106,552 @@ struct mlx5_ct_flow {
 	return uplink_priv->ct_priv;
 }
 
+static int
+mlx5_tc_ct_set_tuple_match(struct mlx5_flow_spec *spec,
+			   struct flow_rule *rule)
+{
+	void *headers_c = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
+				       outer_headers);
+	void *headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value,
+				       outer_headers);
+	u16 addr_type = 0;
+	u8 ip_proto = 0;
+
+	if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_BASIC)) {
+		struct flow_match_basic match;
+
+		flow_rule_match_basic(rule, &match);
+
+		MLX5_SET(fte_match_set_lyr_2_4, headers_c, ethertype,
+			 ntohs(match.mask->n_proto));
+		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ethertype,
+			 ntohs(match.key->n_proto));
+		MLX5_SET(fte_match_set_lyr_2_4, headers_c, ip_protocol,
+			 match.mask->ip_proto);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_protocol,
+			 match.key->ip_proto);
+
+		ip_proto = match.key->ip_proto;
+	}
+
+	if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_CONTROL)) {
+		struct flow_match_control match;
+
+		flow_rule_match_control(rule, &match);
+		addr_type = match.key->addr_type;
+	}
+
+	if (addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) {
+		struct flow_match_ipv4_addrs match;
+
+		flow_rule_match_ipv4_addrs(rule, &match);
+		memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
+				    src_ipv4_src_ipv6.ipv4_layout.ipv4),
+		       &match.mask->src, sizeof(match.mask->src));
+		memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
+				    src_ipv4_src_ipv6.ipv4_layout.ipv4),
+		       &match.key->src, sizeof(match.key->src));
+		memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
+				    dst_ipv4_dst_ipv6.ipv4_layout.ipv4),
+		       &match.mask->dst, sizeof(match.mask->dst));
+		memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
+				    dst_ipv4_dst_ipv6.ipv4_layout.ipv4),
+		       &match.key->dst, sizeof(match.key->dst));
+	}
+
+	if (addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) {
+		struct flow_match_ipv6_addrs match;
+
+		flow_rule_match_ipv6_addrs(rule, &match);
+		memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
+				    src_ipv4_src_ipv6.ipv6_layout.ipv6),
+		       &match.mask->src, sizeof(match.mask->src));
+		memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
+				    src_ipv4_src_ipv6.ipv6_layout.ipv6),
+		       &match.key->src, sizeof(match.key->src));
+
+		memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
+				    dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
+		       &match.mask->dst, sizeof(match.mask->dst));
+		memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
+				    dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
+		       &match.key->dst, sizeof(match.key->dst));
+	}
+
+	if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_PORTS)) {
+		struct flow_match_ports match;
+
+		flow_rule_match_ports(rule, &match);
+		switch (ip_proto) {
+		case IPPROTO_TCP:
+			MLX5_SET(fte_match_set_lyr_2_4, headers_c,
+				 tcp_sport, ntohs(match.mask->src));
+			MLX5_SET(fte_match_set_lyr_2_4, headers_v,
+				 tcp_sport, ntohs(match.key->src));
+
+			MLX5_SET(fte_match_set_lyr_2_4, headers_c,
+				 tcp_dport, ntohs(match.mask->dst));
+			MLX5_SET(fte_match_set_lyr_2_4, headers_v,
+				 tcp_dport, ntohs(match.key->dst));
+			break;
+
+		case IPPROTO_UDP:
+			MLX5_SET(fte_match_set_lyr_2_4, headers_c,
+				 udp_sport, ntohs(match.mask->src));
+			MLX5_SET(fte_match_set_lyr_2_4, headers_v,
+				 udp_sport, ntohs(match.key->src));
+
+			MLX5_SET(fte_match_set_lyr_2_4, headers_c,
+				 udp_dport, ntohs(match.mask->dst));
+			MLX5_SET(fte_match_set_lyr_2_4, headers_v,
+				 udp_dport, ntohs(match.key->dst));
+			break;
+		default:
+			break;
+		}
+	}
+
+	if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_TCP)) {
+		struct flow_match_tcp match;
+
+		flow_rule_match_tcp(rule, &match);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_c, tcp_flags,
+			 ntohs(match.mask->flags));
+		MLX5_SET(fte_match_set_lyr_2_4, headers_v, tcp_flags,
+			 ntohs(match.key->flags));
+	}
+
+	return 0;
+}
+
+static void
+mlx5_tc_ct_entry_del_rule(struct mlx5_tc_ct_priv *ct_priv,
+			  struct mlx5_ct_entry *entry,
+			  bool nat)
+{
+	struct mlx5_ct_zone_rule *zone_rule = &entry->zone_rules[nat];
+	struct mlx5_esw_flow_attr *attr = &zone_rule->attr;
+	struct mlx5_eswitch *esw = ct_priv->esw;
+
+	ct_dbg("Deleting ct entry rule in zone %d", entry->zone);
+
+	mlx5_eswitch_del_offloaded_rule(esw, zone_rule->rule, attr);
+	mlx5_modify_header_dealloc(esw->dev, attr->modify_hdr);
+}
+
+static void
+mlx5_tc_ct_entry_del_rules(struct mlx5_tc_ct_priv *ct_priv,
+			   struct mlx5_ct_entry *entry)
+{
+	mlx5_tc_ct_entry_del_rule(ct_priv, entry, true);
+	mlx5_tc_ct_entry_del_rule(ct_priv, entry, false);
+
+	mlx5_fc_destroy(ct_priv->esw->dev, entry->counter);
+}
+
+static struct flow_action_entry *
+mlx5_tc_ct_get_ct_metadata_action(struct flow_rule *flow_rule)
+{
+	struct flow_action *flow_action = &flow_rule->action;
+	struct flow_action_entry *act;
+	int i;
+
+	flow_action_for_each(i, act, flow_action) {
+		if (act->id == FLOW_ACTION_CT_METADATA)
+			return act;
+	}
+
+	return NULL;
+}
+
+static int
+mlx5_tc_ct_entry_set_registers(struct mlx5_tc_ct_priv *ct_priv,
+			       struct mlx5e_tc_mod_hdr_acts *mod_acts,
+			       u8 ct_state,
+			       u32 mark,
+			       u32 label)
+{
+	struct mlx5_eswitch *esw = ct_priv->esw;
+	int err;
+
+	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts,
+					CTSTATE_TO_REG, ct_state);
+	if (err)
+		return err;
+
+	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts,
+					MARK_TO_REG, mark);
+	if (err)
+		return err;
+
+	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts,
+					LABELS_TO_REG, label);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int
+mlx5_tc_ct_parse_mangle_to_mod_act(struct flow_action_entry *act,
+				   char *modact)
+{
+	u32 offset = act->mangle.offset, field;
+
+	switch (act->mangle.htype) {
+	case FLOW_ACT_MANGLE_HDR_TYPE_IP4:
+		MLX5_SET(set_action_in, modact, length, 0);
+		field = offset == offsetof(struct iphdr, saddr) ?
+			MLX5_ACTION_IN_FIELD_OUT_SIPV4 :
+			MLX5_ACTION_IN_FIELD_OUT_DIPV4;
+		break;
+
+	case FLOW_ACT_MANGLE_HDR_TYPE_IP6:
+		MLX5_SET(set_action_in, modact, length, 0);
+		if (offset == offsetof(struct ipv6hdr, saddr))
+			field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_31_0;
+		else if (offset == offsetof(struct ipv6hdr, saddr) + 4)
+			field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_63_32;
+		else if (offset == offsetof(struct ipv6hdr, saddr) + 8)
+			field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_95_64;
+		else if (offset == offsetof(struct ipv6hdr, saddr) + 12)
+			field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_127_96;
+		else if (offset == offsetof(struct ipv6hdr, daddr))
+			field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_31_0;
+		else if (offset == offsetof(struct ipv6hdr, daddr) + 4)
+			field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_63_32;
+		else if (offset == offsetof(struct ipv6hdr, daddr) + 8)
+			field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_95_64;
+		else if (offset == offsetof(struct ipv6hdr, daddr) + 12)
+			field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_127_96;
+		else
+			return -EOPNOTSUPP;
+		break;
+
+	case FLOW_ACT_MANGLE_HDR_TYPE_TCP:
+		MLX5_SET(set_action_in, modact, length, 16);
+		field = offset == offsetof(struct tcphdr, source) ?
+			MLX5_ACTION_IN_FIELD_OUT_TCP_SPORT :
+			MLX5_ACTION_IN_FIELD_OUT_TCP_DPORT;
+		break;
+
+	case FLOW_ACT_MANGLE_HDR_TYPE_UDP:
+		MLX5_SET(set_action_in, modact, length, 16);
+		field = offset == offsetof(struct udphdr, source) ?
+			MLX5_ACTION_IN_FIELD_OUT_UDP_SPORT :
+			MLX5_ACTION_IN_FIELD_OUT_TCP_DPORT;
+		break;
+
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	MLX5_SET(set_action_in, modact, action_type, MLX5_ACTION_TYPE_SET);
+	MLX5_SET(set_action_in, modact, offset, 0);
+	MLX5_SET(set_action_in, modact, field, field);
+	MLX5_SET(set_action_in, modact, data, act->mangle.val);
+
+	return 0;
+}
+
+static int
+mlx5_tc_ct_entry_create_nat(struct mlx5_tc_ct_priv *ct_priv,
+			    struct flow_rule *flow_rule,
+			    struct mlx5e_tc_mod_hdr_acts *mod_acts)
+{
+	struct flow_action *flow_action = &flow_rule->action;
+	struct mlx5_core_dev *mdev = ct_priv->esw->dev;
+	struct flow_action_entry *act;
+	size_t action_size;
+	char *modact;
+	int err, i;
+
+	action_size = MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto);
+
+	flow_action_for_each(i, act, flow_action) {
+		switch (act->id) {
+		case FLOW_ACTION_MANGLE: {
+			err = alloc_mod_hdr_actions(mdev,
+						    MLX5_FLOW_NAMESPACE_FDB,
+						    mod_acts);
+			if (err)
+				return err;
+
+			modact = mod_acts->actions +
+				 mod_acts->num_actions * action_size;
+
+			err = mlx5_tc_ct_parse_mangle_to_mod_act(act, modact);
+			if (err)
+				return err;
+
+			mod_acts->num_actions++;
+		}
+		break;
+
+		case FLOW_ACTION_CT_METADATA:
+			/* Handled earlier */
+			continue;
+		default:
+			return -EOPNOTSUPP;
+		}
+	}
+
+	return 0;
+}
+
+static int
+mlx5_tc_ct_entry_create_mod_hdr(struct mlx5_tc_ct_priv *ct_priv,
+				struct mlx5_esw_flow_attr *attr,
+				struct flow_rule *flow_rule,
+				bool nat)
+{
+	struct mlx5e_tc_mod_hdr_acts mod_acts = {};
+	struct mlx5_eswitch *esw = ct_priv->esw;
+	struct mlx5_modify_hdr *mod_hdr;
+	struct flow_action_entry *meta;
+	int err;
+
+	meta = mlx5_tc_ct_get_ct_metadata_action(flow_rule);
+	if (!meta)
+		return -EOPNOTSUPP;
+
+	if (meta->ct_metadata.labels[1] ||
+	    meta->ct_metadata.labels[2] ||
+	    meta->ct_metadata.labels[3]) {
+		ct_dbg("Failed to offload ct entry due to unsupported label");
+		return -EOPNOTSUPP;
+	}
+
+	if (nat) {
+		err = mlx5_tc_ct_entry_create_nat(ct_priv, flow_rule,
+						  &mod_acts);
+		if (err)
+			goto err_mapping;
+	}
+
+	err = mlx5_tc_ct_entry_set_registers(ct_priv, &mod_acts,
+					     (MLX5_CT_STATE_ESTABLISHED_BIT |
+					      MLX5_CT_STATE_TRK_BIT),
+					     meta->ct_metadata.mark,
+					     meta->ct_metadata.labels[0]);
+	if (err)
+		goto err_mapping;
+
+	mod_hdr = mlx5_modify_header_alloc(esw->dev, MLX5_FLOW_NAMESPACE_FDB,
+					   mod_acts.num_actions,
+					   mod_acts.actions);
+	if (IS_ERR(mod_hdr)) {
+		err = PTR_ERR(mod_hdr);
+		goto err_mapping;
+	}
+	attr->modify_hdr = mod_hdr;
+
+	dealloc_mod_hdr_actions(&mod_acts);
+	return 0;
+
+err_mapping:
+	dealloc_mod_hdr_actions(&mod_acts);
+	return err;
+}
+
+static int
+mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv,
+			  struct flow_rule *flow_rule,
+			  struct mlx5_ct_entry *entry,
+			  bool nat)
+{
+	struct mlx5_ct_zone_rule *zone_rule = &entry->zone_rules[nat];
+	struct mlx5_esw_flow_attr *attr = &zone_rule->attr;
+	struct mlx5_eswitch *esw = ct_priv->esw;
+	struct mlx5_flow_spec spec = {};
+	int err;
+
+	zone_rule->nat = nat;
+
+	err = mlx5_tc_ct_entry_create_mod_hdr(ct_priv, attr, flow_rule, nat);
+	if (err) {
+		ct_dbg("Failed to create ct entry mod hdr");
+		return err;
+	}
+
+	attr->action = MLX5_FLOW_CONTEXT_ACTION_MOD_HDR |
+		       MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
+		       MLX5_FLOW_CONTEXT_ACTION_COUNT;
+	attr->dest_chain = 0;
+	attr->dest_ft = ct_priv->post_ct;
+	attr->fdb = nat ? ct_priv->ct_nat : ct_priv->ct;
+	attr->outer_match_level = MLX5_MATCH_L4;
+	attr->counter = entry->counter;
+	attr->flags |= MLX5_ESW_ATTR_FLAG_NO_IN_PORT;
+
+	mlx5_tc_ct_set_tuple_match(&spec, flow_rule);
+	mlx5e_tc_match_to_reg_match(&spec, ZONE_TO_REG,
+				    entry->zone & MLX5_CT_ZONE_MASK,
+				    MLX5_CT_ZONE_MASK);
+
+	zone_rule->rule = mlx5_eswitch_add_offloaded_rule(esw, &spec, attr);
+	if (IS_ERR(zone_rule->rule)) {
+		err = PTR_ERR(zone_rule->rule);
+		ct_dbg("Failed to add ct entry rule, nat: %d", nat);
+		goto err_rule;
+	}
+
+	ct_dbg("Offloaded ct entry rule in zone %d", entry->zone);
+
+	return 0;
+
+err_rule:
+	mlx5_modify_header_dealloc(esw->dev, attr->modify_hdr);
+	return err;
+}
+
+static int
+mlx5_tc_ct_entry_add_rules(struct mlx5_tc_ct_priv *ct_priv,
+			   struct flow_rule *flow_rule,
+			   struct mlx5_ct_entry *entry)
+{
+	struct mlx5_eswitch *esw = ct_priv->esw;
+	int err;
+
+	entry->counter = mlx5_fc_create(esw->dev, true);
+	if (IS_ERR(entry->counter)) {
+		err = PTR_ERR(entry->counter);
+		ct_dbg("Failed to create counter for ct entry");
+		return err;
+	}
+
+	err = mlx5_tc_ct_entry_add_rule(ct_priv, flow_rule, entry, false);
+	if (err)
+		goto err_orig;
+
+	err = mlx5_tc_ct_entry_add_rule(ct_priv, flow_rule, entry, true);
+	if (err)
+		goto err_nat;
+
+	return 0;
+
+err_nat:
+	mlx5_tc_ct_entry_del_rule(ct_priv, entry, false);
+err_orig:
+	mlx5_fc_destroy(esw->dev, entry->counter);
+	return err;
+}
+
+static int
+mlx5_tc_ct_block_flow_offload_add(struct mlx5_ct_ft *ft,
+				  struct flow_cls_offload *flow)
+{
+	struct flow_rule *flow_rule = flow_cls_offload_flow_rule(flow);
+	struct mlx5_tc_ct_priv *ct_priv = ft->ct_priv;
+	struct flow_action_entry *meta_action;
+	unsigned long cookie = flow->cookie;
+	struct mlx5_ct_entry *entry;
+	int err;
+
+	meta_action = mlx5_tc_ct_get_ct_metadata_action(flow_rule);
+	if (!meta_action)
+		return -EOPNOTSUPP;
+
+	entry = rhashtable_lookup_fast(&ft->ct_entries_ht, &cookie,
+				       cts_ht_params);
+	if (entry)
+		return 0;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return -ENOMEM;
+
+	entry->zone = ft->zone;
+	entry->flow_rule = flow_rule;
+	entry->cookie = flow->cookie;
+
+	err = mlx5_tc_ct_entry_add_rules(ct_priv, flow_rule, entry);
+	if (err)
+		goto err_rules;
+
+	err = rhashtable_insert_fast(&ft->ct_entries_ht, &entry->node,
+				     cts_ht_params);
+	if (err)
+		goto err_insert;
+
+	list_add(&entry->list, &ft->ct_entries_list);
+
+	return 0;
+
+err_insert:
+	mlx5_tc_ct_entry_del_rules(ct_priv, entry);
+err_rules:
+	kfree(entry);
+	netdev_warn(ct_priv->netdev,
+		    "Failed to offload ct entry, err: %d\n", err);
+	return err;
+}
+
+static int
+mlx5_tc_ct_block_flow_offload_del(struct mlx5_ct_ft *ft,
+				  struct flow_cls_offload *flow)
+{
+	unsigned long cookie = flow->cookie;
+	struct mlx5_ct_entry *entry;
+
+	entry = rhashtable_lookup_fast(&ft->ct_entries_ht, &cookie,
+				       cts_ht_params);
+	if (!entry)
+		return -ENOENT;
+
+	mlx5_tc_ct_entry_del_rules(ft->ct_priv, entry);
+	WARN_ON(rhashtable_remove_fast(&ft->ct_entries_ht,
+				       &entry->node,
+				       cts_ht_params));
+	list_del(&entry->list);
+	kfree(entry);
+
+	return 0;
+}
+
+static int
+mlx5_tc_ct_block_flow_offload_stats(struct mlx5_ct_ft *ft,
+				    struct flow_cls_offload *f)
+{
+	unsigned long cookie = f->cookie;
+	struct mlx5_ct_entry *entry;
+	u64 lastuse, packets, bytes;
+
+	entry = rhashtable_lookup_fast(&ft->ct_entries_ht, &cookie,
+				       cts_ht_params);
+	if (!entry)
+		return -ENOENT;
+
+	mlx5_fc_query_cached(entry->counter, &bytes, &packets, &lastuse);
+	flow_stats_update(&f->stats, bytes, packets, lastuse);
+
+	return 0;
+}
+
+static int
+mlx5_tc_ct_block_flow_offload(enum tc_setup_type type, void *type_data,
+			      void *cb_priv)
+{
+	struct flow_cls_offload *f = type_data;
+	struct mlx5_ct_ft *ft = cb_priv;
+
+	if (type != TC_SETUP_CLSFLOWER)
+		return -EOPNOTSUPP;
+
+	switch (f->command) {
+	case FLOW_CLS_REPLACE:
+		return mlx5_tc_ct_block_flow_offload_add(ft, f);
+	case FLOW_CLS_DESTROY:
+		return mlx5_tc_ct_block_flow_offload_del(ft, f);
+	case FLOW_CLS_STATS:
+		return mlx5_tc_ct_block_flow_offload_stats(ft, f);
+	default:
+		break;
+	};
+
+	return -EOPNOTSUPP;
+}
+
 int
 mlx5_tc_ct_parse_match(struct mlx5e_priv *priv,
 		       struct mlx5_flow_spec *spec,
@@ -159,10 +750,82 @@ struct mlx5_ct_flow {
 
 	attr->ct_attr.zone = act->ct.zone;
 	attr->ct_attr.ct_action = act->ct.action;
+	attr->ct_attr.nf_ft = act->ct.flow_table;
 
 	return 0;
 }
 
+static struct mlx5_ct_ft *
+mlx5_tc_ct_add_ft_cb(struct mlx5_tc_ct_priv *ct_priv, u16 zone,
+		     struct nf_flowtable *nf_ft)
+{
+	struct mlx5_ct_ft *ft;
+	int err;
+
+	ft = rhashtable_lookup_fast(&ct_priv->zone_ht, &zone, zone_params);
+	if (ft) {
+		refcount_inc(&ft->refcount);
+		return ft;
+	}
+
+	ft = kzalloc(sizeof(*ft), GFP_KERNEL);
+	if (!ft)
+		return ERR_PTR(-ENOMEM);
+
+	ft->zone = zone;
+	ft->nf_ft = nf_ft;
+	ft->ct_priv = ct_priv;
+	INIT_LIST_HEAD(&ft->ct_entries_list);
+	refcount_set(&ft->refcount, 1);
+
+	err = rhashtable_init(&ft->ct_entries_ht, &cts_ht_params);
+	if (err)
+		goto err_init;
+
+	err = rhashtable_insert_fast(&ct_priv->zone_ht, &ft->node,
+				     zone_params);
+	if (err)
+		goto err_insert;
+
+	err = nf_flow_table_offload_add_cb(ft->nf_ft,
+					   mlx5_tc_ct_block_flow_offload, ft);
+	if (err)
+		goto err_add_cb;
+
+	return ft;
+
+err_add_cb:
+	rhashtable_remove_fast(&ct_priv->zone_ht, &ft->node, zone_params);
+err_insert:
+	rhashtable_destroy(&ft->ct_entries_ht);
+err_init:
+	kfree(ft);
+	return ERR_PTR(err);
+}
+
+static void
+mlx5_tc_ct_flush_ft(struct mlx5_tc_ct_priv *ct_priv, struct mlx5_ct_ft *ft)
+{
+	struct mlx5_ct_entry *entry;
+
+	list_for_each_entry(entry, &ft->ct_entries_list, list)
+		mlx5_tc_ct_entry_del_rules(ft->ct_priv, entry);
+}
+
+static void
+mlx5_tc_ct_del_ft_cb(struct mlx5_tc_ct_priv *ct_priv, struct mlx5_ct_ft *ft)
+{
+	if (!refcount_dec_and_test(&ft->refcount))
+		return;
+
+	nf_flow_table_offload_del_cb(ft->nf_ft,
+				     mlx5_tc_ct_block_flow_offload, ft);
+	mlx5_tc_ct_flush_ft(ct_priv, ft);
+	rhashtable_remove_fast(&ct_priv->zone_ht, &ft->node, zone_params);
+	rhashtable_destroy(&ft->ct_entries_ht);
+	kfree(ft);
+}
+
 /* We translate the tc filter with CT action to the following HW model:
  *
  * +-------------------+      +--------------------+    +--------------+
@@ -193,12 +856,23 @@ struct mlx5_ct_flow {
 	struct mlx5_flow_handle *rule;
 	struct mlx5_ct_flow *ct_flow;
 	int chain_mapping = 0, err;
+	struct mlx5_ct_ft *ft;
 	u32 fte_id = 1;
 
 	ct_flow = kzalloc(sizeof(*ct_flow), GFP_KERNEL);
 	if (!ct_flow)
 		return -ENOMEM;
 
+	/* Register for CT established events */
+	ft = mlx5_tc_ct_add_ft_cb(ct_priv, attr->ct_attr.zone,
+				  attr->ct_attr.nf_ft);
+	if (IS_ERR(ft)) {
+		err = PTR_ERR(ft);
+		ct_dbg("Failed to register to ft callback");
+		goto err_ft;
+	}
+	ct_flow->ft = ft;
+
 	err = idr_alloc_u32(&ct_priv->fte_ids, ct_flow, &fte_id,
 			    MLX5_FTE_ID_MAX, GFP_KERNEL);
 	if (err) {
@@ -331,6 +1005,8 @@ struct mlx5_ct_flow {
 err_get_chain:
 	idr_remove(&ct_priv->fte_ids, fte_id);
 err_idr:
+	mlx5_tc_ct_del_ft_cb(ct_priv, ft);
+err_ft:
 	kfree(ct_flow);
 	netdev_warn(priv->netdev, "Failed to offload ct flow, err %d\n", err);
 	return err;
@@ -372,6 +1048,7 @@ struct mlx5_flow_handle *
 					&ct_flow->post_ct_attr);
 	mlx5_esw_chains_put_chain_mapping(esw, ct_flow->chain_mapping);
 	idr_remove(&ct_priv->fte_ids, ct_flow->fte_id);
+	mlx5_tc_ct_del_ft_cb(ct_priv, ct_flow->ft);
 	kfree(ct_flow);
 }
 
@@ -503,6 +1180,7 @@ struct mlx5_flow_handle *
 
 	idr_init(&ct_priv->fte_ids);
 	mutex_init(&ct_priv->control_lock);
+	rhashtable_init(&ct_priv->zone_ht, &zone_params);
 
 	/* Done, set ct_priv to know it initializted */
 	uplink_priv->ct_priv = ct_priv;
@@ -533,6 +1211,7 @@ struct mlx5_flow_handle *
 	mlx5_esw_chains_destroy_global_table(ct_priv->esw, ct_priv->ct_nat);
 	mlx5_esw_chains_destroy_global_table(ct_priv->esw, ct_priv->ct);
 
+	rhashtable_destroy(&ct_priv->zone_ht);
 	mutex_destroy(&ct_priv->control_lock);
 	idr_destroy(&ct_priv->fte_ids);
 	kfree(ct_priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
index 3a84216..f4bfda7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
@@ -15,10 +15,13 @@
 
 struct mlx5_ct_flow;
 
+struct nf_flowtable;
+
 struct mlx5_ct_attr {
 	u16 zone;
 	u16 ct_action;
 	struct mlx5_ct_flow *ct_flow;
+	struct nf_flowtable *nf_ft;
 };
 
 #define zone_to_reg_ct {\
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 14/15] net/mlx5e: CT: Handle misses after executing CT action
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
                   ` (12 preceding siblings ...)
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 13/15] net/mlx5e: CT: Offload established flows Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 15/15] net/mlx5e: CT: Support clear action Paul Blakey
  2020-03-11 19:13 ` [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Marcelo Ricardo Leitner
  15 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Mark packets with a unique tupleid, and on miss use that id to get
the act ct restore_cookie. Using that restore cookie, we ask CT to
restore the relevant info on the SKB.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 59 ++++++++++++++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h | 25 +++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 12 ++++-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h    |  1 +
 4 files changed, 92 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index 124aca5..2bbf113 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -35,6 +35,7 @@ struct mlx5_tc_ct_priv {
 	struct mlx5_eswitch *esw;
 	const struct net_device *netdev;
 	struct idr fte_ids;
+	struct idr tuple_ids;
 	struct rhashtable zone_ht;
 	struct mlx5_flow_table *ct;
 	struct mlx5_flow_table *ct_nat;
@@ -55,6 +56,7 @@ struct mlx5_ct_flow {
 struct mlx5_ct_zone_rule {
 	struct mlx5_flow_handle *rule;
 	struct mlx5_esw_flow_attr attr;
+	int tupleid;
 	bool nat;
 };
 
@@ -76,6 +78,7 @@ struct mlx5_ct_entry {
 	struct mlx5_fc *counter;
 	unsigned long lastuse;
 	unsigned long cookie;
+	unsigned long restore_cookie;
 	struct mlx5_ct_zone_rule zone_rules[2];
 };
 
@@ -237,6 +240,7 @@ struct mlx5_ct_entry {
 
 	mlx5_eswitch_del_offloaded_rule(esw, zone_rule->rule, attr);
 	mlx5_modify_header_dealloc(esw->dev, attr->modify_hdr);
+	idr_remove(&ct_priv->tuple_ids, zone_rule->tupleid);
 }
 
 static void
@@ -269,7 +273,8 @@ struct mlx5_ct_entry {
 			       struct mlx5e_tc_mod_hdr_acts *mod_acts,
 			       u8 ct_state,
 			       u32 mark,
-			       u32 label)
+			       u32 label,
+			       u32 tupleid)
 {
 	struct mlx5_eswitch *esw = ct_priv->esw;
 	int err;
@@ -289,6 +294,11 @@ struct mlx5_ct_entry {
 	if (err)
 		return err;
 
+	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts,
+					TUPLEID_TO_REG, tupleid);
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -403,6 +413,7 @@ struct mlx5_ct_entry {
 mlx5_tc_ct_entry_create_mod_hdr(struct mlx5_tc_ct_priv *ct_priv,
 				struct mlx5_esw_flow_attr *attr,
 				struct flow_rule *flow_rule,
+				u32 tupleid,
 				bool nat)
 {
 	struct mlx5e_tc_mod_hdr_acts mod_acts = {};
@@ -433,7 +444,8 @@ struct mlx5_ct_entry {
 					     (MLX5_CT_STATE_ESTABLISHED_BIT |
 					      MLX5_CT_STATE_TRK_BIT),
 					     meta->ct_metadata.mark,
-					     meta->ct_metadata.labels[0]);
+					     meta->ct_metadata.labels[0],
+					     tupleid);
 	if (err)
 		goto err_mapping;
 
@@ -464,15 +476,27 @@ struct mlx5_ct_entry {
 	struct mlx5_esw_flow_attr *attr = &zone_rule->attr;
 	struct mlx5_eswitch *esw = ct_priv->esw;
 	struct mlx5_flow_spec spec = {};
+	u32 tupleid = 1;
 	int err;
 
 	zone_rule->nat = nat;
 
-	err = mlx5_tc_ct_entry_create_mod_hdr(ct_priv, attr, flow_rule, nat);
+	/* Get tuple unique id */
+	err = idr_alloc_u32(&ct_priv->tuple_ids, zone_rule, &tupleid,
+			    TUPLE_ID_MAX, GFP_KERNEL);
 	if (err) {
-		ct_dbg("Failed to create ct entry mod hdr");
+		netdev_warn(ct_priv->netdev,
+			    "Failed to allocate tuple id, err: %d\n", err);
 		return err;
 	}
+	zone_rule->tupleid = tupleid;
+
+	err = mlx5_tc_ct_entry_create_mod_hdr(ct_priv, attr, flow_rule,
+					      tupleid, nat);
+	if (err) {
+		ct_dbg("Failed to create ct entry mod hdr");
+		goto err_mod_hdr;
+	}
 
 	attr->action = MLX5_FLOW_CONTEXT_ACTION_MOD_HDR |
 		       MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
@@ -502,6 +526,8 @@ struct mlx5_ct_entry {
 
 err_rule:
 	mlx5_modify_header_dealloc(esw->dev, attr->modify_hdr);
+err_mod_hdr:
+	idr_remove(&ct_priv->tuple_ids, zone_rule->tupleid);
 	return err;
 }
 
@@ -564,6 +590,7 @@ struct mlx5_ct_entry {
 	entry->zone = ft->zone;
 	entry->flow_rule = flow_rule;
 	entry->cookie = flow->cookie;
+	entry->restore_cookie = meta_action->ct_metadata.cookie;
 
 	err = mlx5_tc_ct_entry_add_rules(ct_priv, flow_rule, entry);
 	if (err)
@@ -1179,6 +1206,7 @@ struct mlx5_flow_handle *
 	}
 
 	idr_init(&ct_priv->fte_ids);
+	idr_init(&ct_priv->tuple_ids);
 	mutex_init(&ct_priv->control_lock);
 	rhashtable_init(&ct_priv->zone_ht, &zone_params);
 
@@ -1213,8 +1241,31 @@ struct mlx5_flow_handle *
 
 	rhashtable_destroy(&ct_priv->zone_ht);
 	mutex_destroy(&ct_priv->control_lock);
+	idr_destroy(&ct_priv->tuple_ids);
 	idr_destroy(&ct_priv->fte_ids);
 	kfree(ct_priv);
 
 	uplink_priv->ct_priv = NULL;
 }
+
+bool
+mlx5e_tc_ct_restore_flow(struct mlx5_rep_uplink_priv *uplink_priv,
+			 struct sk_buff *skb, u32 tupleid)
+{
+	struct mlx5_tc_ct_priv *ct_priv = uplink_priv->ct_priv;
+	struct mlx5_ct_zone_rule *zone_rule;
+	struct mlx5_ct_entry *entry;
+
+	if (!ct_priv || !tupleid)
+		return true;
+
+	zone_rule = idr_find(&ct_priv->tuple_ids, tupleid);
+	if (!zone_rule)
+		return false;
+
+	entry = container_of(zone_rule, struct mlx5_ct_entry,
+			     zone_rules[zone_rule->nat]);
+	tcf_ct_flow_table_restore_skb(skb, entry->restore_cookie);
+
+	return true;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
index f4bfda7..464c865 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
@@ -64,6 +64,17 @@ struct mlx5_ct_attr {
 				 misc_parameters_2.metadata_reg_c_5),\
 }
 
+#define tupleid_to_reg_ct {\
+	.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_1,\
+	.moffset = 0,\
+	.mlen = 3,\
+	.soffset = MLX5_BYTE_OFF(fte_match_param,\
+				 misc_parameters_2.metadata_reg_c_1),\
+}
+
+#define TUPLE_ID_BITS (mlx5e_tc_attr_to_reg_mappings[TUPLEID_TO_REG].mlen * 8)
+#define TUPLE_ID_MAX GENMASK(TUPLE_ID_BITS - 1, 0)
+
 #if IS_ENABLED(CONFIG_MLX5_TC_CT)
 
 int
@@ -92,6 +103,10 @@ struct mlx5_flow_handle *
 		       struct mlx5e_tc_flow *flow,
 		       struct mlx5_esw_flow_attr *attr);
 
+bool
+mlx5e_tc_ct_restore_flow(struct mlx5_rep_uplink_priv *uplink_priv,
+			 struct sk_buff *skb, u32 tupleid);
+
 #else /* CONFIG_MLX5_TC_CT */
 
 static inline int
@@ -139,5 +154,15 @@ struct mlx5_flow_handle *
 {
 }
 
+static inline bool
+mlx5e_tc_ct_restore_flow(struct mlx5_rep_uplink_priv *uplink_priv,
+			 struct sk_buff *skb, u32 tupleid)
+{
+	if  (!tupleid)
+		return  true;
+
+	return false;
+}
+
 #endif /* !IS_ENABLED(CONFIG_MLX5_TC_CT) */
 #endif /* __MLX5_EN_TC_CT_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 1f6a306..5497d5e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -200,6 +200,7 @@ struct mlx5e_tc_attr_to_reg_mapping mlx5e_tc_attr_to_reg_mappings[] = {
 	[MARK_TO_REG] = mark_to_reg_ct,
 	[LABELS_TO_REG] = labels_to_reg_ct,
 	[FTEID_TO_REG] = fteid_to_reg_ct,
+	[TUPLEID_TO_REG] = tupleid_to_reg_ct,
 };
 
 static void mlx5e_put_flow_tunnel_id(struct mlx5e_tc_flow *flow);
@@ -4851,7 +4852,9 @@ bool mlx5e_tc_rep_update_skb(struct mlx5_cqe64 *cqe,
 			     struct mlx5e_tc_update_priv *tc_priv)
 {
 #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
-	u32 chain = 0, reg_c0, reg_c1, tunnel_id;
+	u32 chain = 0, reg_c0, reg_c1, tunnel_id, tuple_id;
+	struct mlx5_rep_uplink_priv *uplink_priv;
+	struct mlx5e_rep_priv *uplink_rpriv;
 	struct tc_skb_ext *tc_skb_ext;
 	struct mlx5_eswitch *esw;
 	struct mlx5e_priv *priv;
@@ -4885,6 +4888,13 @@ bool mlx5e_tc_rep_update_skb(struct mlx5_cqe64 *cqe,
 		}
 
 		tc_skb_ext->chain = chain;
+
+		tuple_id = reg_c1 & TUPLE_ID_MAX;
+
+		uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
+		uplink_priv = &uplink_rpriv->uplink_priv;
+		if (!mlx5e_tc_ct_restore_flow(uplink_priv, skb, tuple_id))
+			return false;
 	}
 
 	tunnel_moffset = mlx5e_tc_attr_to_reg_mappings[TUNNEL_TO_REG].moffset;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index 31c9e81..abdcfa4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -99,6 +99,7 @@ enum mlx5e_tc_attr_to_reg {
 	MARK_TO_REG,
 	LABELS_TO_REG,
 	FTEID_TO_REG,
+	TUPLEID_TO_REG,
 };
 
 struct mlx5e_tc_attr_to_reg_mapping {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next ct-offload v3 15/15] net/mlx5e: CT: Support clear action
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
                   ` (13 preceding siblings ...)
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 14/15] net/mlx5e: CT: Handle misses after executing CT action Paul Blakey
@ 2020-03-11 14:33 ` Paul Blakey
  2020-03-11 19:13 ` [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Marcelo Ricardo Leitner
  15 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 14:33 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Clear action, as with software, removes all ct metadata from
the packet.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 90 ++++++++++++++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h |  7 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 10 ++-
 3 files changed, 95 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index 2bbf113..7bf98d6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -1039,12 +1039,79 @@ struct mlx5_ct_entry {
 	return err;
 }
 
+static int
+__mlx5_tc_ct_flow_offload_clear(struct mlx5e_priv *priv,
+				struct mlx5e_tc_flow *flow,
+				struct mlx5_flow_spec *orig_spec,
+				struct mlx5_esw_flow_attr *attr,
+				struct mlx5e_tc_mod_hdr_acts *mod_acts,
+				struct mlx5_flow_handle **flow_rule)
+{
+	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
+	struct mlx5_eswitch *esw = ct_priv->esw;
+	struct mlx5_esw_flow_attr *pre_ct_attr;
+	struct mlx5_modify_hdr *mod_hdr;
+	struct mlx5_flow_handle *rule;
+	struct mlx5_ct_flow *ct_flow;
+	int err;
+
+	ct_flow = kzalloc(sizeof(*ct_flow), GFP_KERNEL);
+	if (!ct_flow)
+		return -ENOMEM;
+
+	/* Base esw attributes on original rule attribute */
+	pre_ct_attr = &ct_flow->pre_ct_attr;
+	memcpy(pre_ct_attr, attr, sizeof(*attr));
+
+	err = mlx5_tc_ct_entry_set_registers(ct_priv, mod_acts, 0, 0, 0, 0);
+	if (err) {
+		ct_dbg("Failed to set register for ct clear");
+		goto err_set_registers;
+	}
+
+	mod_hdr = mlx5_modify_header_alloc(esw->dev,
+					   MLX5_FLOW_NAMESPACE_FDB,
+					   mod_acts->num_actions,
+					   mod_acts->actions);
+	if (IS_ERR(mod_hdr)) {
+		err = PTR_ERR(mod_hdr);
+		ct_dbg("Failed to add create ct clear mod hdr");
+		goto err_set_registers;
+	}
+
+	dealloc_mod_hdr_actions(mod_acts);
+	pre_ct_attr->modify_hdr = mod_hdr;
+	pre_ct_attr->action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+
+	rule = mlx5_eswitch_add_offloaded_rule(esw, orig_spec, pre_ct_attr);
+	if (IS_ERR(rule)) {
+		err = PTR_ERR(rule);
+		ct_dbg("Failed to add ct clear rule");
+		goto err_insert;
+	}
+
+	attr->ct_attr.ct_flow = ct_flow;
+	ct_flow->pre_ct_rule = rule;
+	*flow_rule = rule;
+
+	return 0;
+
+err_insert:
+	mlx5_modify_header_dealloc(priv->mdev, mod_hdr);
+err_set_registers:
+	netdev_warn(priv->netdev,
+		    "Failed to offload ct clear flow, err %d\n", err);
+	return err;
+}
+
 struct mlx5_flow_handle *
 mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 			struct mlx5e_tc_flow *flow,
 			struct mlx5_flow_spec *spec,
-			struct mlx5_esw_flow_attr *attr)
+			struct mlx5_esw_flow_attr *attr,
+			struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts)
 {
+	bool clear_action = attr->ct_attr.ct_action & TCA_CT_ACT_CLEAR;
 	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
 	struct mlx5_flow_handle *rule;
 	int err;
@@ -1053,7 +1120,12 @@ struct mlx5_flow_handle *
 		return ERR_PTR(-EOPNOTSUPP);
 
 	mutex_lock(&ct_priv->control_lock);
-	err = __mlx5_tc_ct_flow_offload(priv, flow, spec, attr, &rule);
+	if (clear_action)
+		err = __mlx5_tc_ct_flow_offload_clear(priv, flow, spec, attr,
+						      mod_hdr_acts, &rule);
+	else
+		err = __mlx5_tc_ct_flow_offload(priv, flow, spec, attr,
+						&rule);
 	mutex_unlock(&ct_priv->control_lock);
 	if (err)
 		return ERR_PTR(err);
@@ -1071,11 +1143,15 @@ struct mlx5_flow_handle *
 	mlx5_eswitch_del_offloaded_rule(esw, ct_flow->pre_ct_rule,
 					pre_ct_attr);
 	mlx5_modify_header_dealloc(esw->dev, pre_ct_attr->modify_hdr);
-	mlx5_eswitch_del_offloaded_rule(esw, ct_flow->post_ct_rule,
-					&ct_flow->post_ct_attr);
-	mlx5_esw_chains_put_chain_mapping(esw, ct_flow->chain_mapping);
-	idr_remove(&ct_priv->fte_ids, ct_flow->fte_id);
-	mlx5_tc_ct_del_ft_cb(ct_priv, ct_flow->ft);
+
+	if (ct_flow->post_ct_rule) {
+		mlx5_eswitch_del_offloaded_rule(esw, ct_flow->post_ct_rule,
+						&ct_flow->post_ct_attr);
+		mlx5_esw_chains_put_chain_mapping(esw, ct_flow->chain_mapping);
+		idr_remove(&ct_priv->fte_ids, ct_flow->fte_id);
+		mlx5_tc_ct_del_ft_cb(ct_priv, ct_flow->ft);
+	}
+
 	kfree(ct_flow);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
index 464c865..6b2c893 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
@@ -9,6 +9,7 @@
 #include <net/tc_act/tc_ct.h>
 
 struct mlx5_esw_flow_attr;
+struct mlx5e_tc_mod_hdr_acts;
 struct mlx5_rep_uplink_priv;
 struct mlx5e_tc_flow;
 struct mlx5e_priv;
@@ -97,7 +98,8 @@ struct mlx5_flow_handle *
 mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 			struct mlx5e_tc_flow *flow,
 			struct mlx5_flow_spec *spec,
-			struct mlx5_esw_flow_attr *attr);
+			struct mlx5_esw_flow_attr *attr,
+			struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts);
 void
 mlx5_tc_ct_delete_flow(struct mlx5e_priv *priv,
 		       struct mlx5e_tc_flow *flow,
@@ -142,7 +144,8 @@ struct mlx5_flow_handle *
 mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 			struct mlx5e_tc_flow *flow,
 			struct mlx5_flow_spec *spec,
-			struct mlx5_esw_flow_attr *attr)
+			struct mlx5_esw_flow_attr *attr,
+			struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts)
 {
 	return ERR_PTR(-EOPNOTSUPP);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 5497d5e..044891a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1151,11 +1151,15 @@ static int mlx5e_attach_encap(struct mlx5e_priv *priv,
 			   struct mlx5_flow_spec *spec,
 			   struct mlx5_esw_flow_attr *attr)
 {
+	struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts;
 	struct mlx5_flow_handle *rule;
-	struct mlx5e_tc_mod_hdr_acts;
 
-	if (flow_flag_test(flow, CT))
-		return mlx5_tc_ct_flow_offload(flow->priv, flow, spec, attr);
+	if (flow_flag_test(flow, CT)) {
+		mod_hdr_acts = &attr->parse_attr->mod_hdr_acts;
+
+		return mlx5_tc_ct_flow_offload(flow->priv, flow, spec, attr,
+					       mod_hdr_acts);
+	}
 
 	rule = mlx5_eswitch_add_offloaded_rule(esw, spec, attr);
 	if (IS_ERR(rule))
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next ct-offload v3 04/15] net/sched: act_ct: Instantiate flow table entry actions
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 04/15] net/sched: act_ct: Instantiate flow table entry actions Paul Blakey
@ 2020-03-11 17:41   ` Edward Cree
  2020-03-11 22:27     ` Paul Blakey
  0 siblings, 1 reply; 27+ messages in thread
From: Edward Cree @ 2020-03-11 17:41 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Vlad Buslov,
	David Miller, netdev, Jiri Pirko, Roi Dayan

On 11/03/2020 14:33, Paul Blakey wrote:
> NF flow table API associate 5-tuple rule with an action list by calling
> the flow table type action() CB to fill the rule's actions.
>
> In action CB of act_ct, populate the ct offload entry actions with a new
> ct_metadata action. Initialize the ct_metadata with the ct mark, label and
> zone information. If ct nat was performed, then also append the relevant
> packet mangle actions (e.g. ipv4/ipv6/tcp/udp header rewrites).
>
> Drivers that offload the ft entries may match on the 5-tuple and perform
> the action list.
>
> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Edward Cree <ecree@solarflare.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next ct-offload v3 13/15] net/mlx5e: CT: Offload established flows
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 13/15] net/mlx5e: CT: Offload established flows Paul Blakey
@ 2020-03-11 17:45   ` Edward Cree
  2020-03-11 22:29     ` Paul Blakey
  0 siblings, 1 reply; 27+ messages in thread
From: Edward Cree @ 2020-03-11 17:45 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

On 11/03/2020 14:33, Paul Blakey wrote:
> Register driver callbacks with the nf flow table platform.
> FT add/delete events will create/delete FTE in the CT/CT_NAT tables.
>
> Restoring the CT state on miss will be added in the following patch.
>
> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
> Reviewed-by: Roi Dayan <roid@mellanox.com>
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
> ---
<snip>
> +static int
> +mlx5_tc_ct_parse_mangle_to_mod_act(struct flow_action_entry *act,
> +				   char *modact)
> +{
> +	u32 offset = act->mangle.offset, field;
> +
> +	switch (act->mangle.htype) {
> +	case FLOW_ACT_MANGLE_HDR_TYPE_IP4:
> +		MLX5_SET(set_action_in, modact, length, 0);
> +		field = offset == offsetof(struct iphdr, saddr) ?
> +			MLX5_ACTION_IN_FIELD_OUT_SIPV4 :
> +			MLX5_ACTION_IN_FIELD_OUT_DIPV4;
Won't this mishandle any mangle of a field other than src/dst addr?

> +		break;
> +
> +	case FLOW_ACT_MANGLE_HDR_TYPE_IP6:
> +		MLX5_SET(set_action_in, modact, length, 0);
> +		if (offset == offsetof(struct ipv6hdr, saddr))
> +			field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_31_0;
> +		else if (offset == offsetof(struct ipv6hdr, saddr) + 4)
> +			field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_63_32;
> +		else if (offset == offsetof(struct ipv6hdr, saddr) + 8)
> +			field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_95_64;
> +		else if (offset == offsetof(struct ipv6hdr, saddr) + 12)
> +			field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_127_96;
> +		else if (offset == offsetof(struct ipv6hdr, daddr))
> +			field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_31_0;
> +		else if (offset == offsetof(struct ipv6hdr, daddr) + 4)
> +			field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_63_32;
> +		else if (offset == offsetof(struct ipv6hdr, daddr) + 8)
> +			field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_95_64;
> +		else if (offset == offsetof(struct ipv6hdr, daddr) + 12)
> +			field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_127_96;
> +		else
> +			return -EOPNOTSUPP;
> +		break;
> +
> +	case FLOW_ACT_MANGLE_HDR_TYPE_TCP:
> +		MLX5_SET(set_action_in, modact, length, 16);
> +		field = offset == offsetof(struct tcphdr, source) ?
> +			MLX5_ACTION_IN_FIELD_OUT_TCP_SPORT :
> +			MLX5_ACTION_IN_FIELD_OUT_TCP_DPORT;
Ditto.

> +		break;
> +
> +	case FLOW_ACT_MANGLE_HDR_TYPE_UDP:
> +		MLX5_SET(set_action_in, modact, length, 16);
> +		field = offset == offsetof(struct udphdr, source) ?
> +			MLX5_ACTION_IN_FIELD_OUT_UDP_SPORT :
> +			MLX5_ACTION_IN_FIELD_OUT_TCP_DPORT;
s/TCP/UDP/?

-ed

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload
  2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
                   ` (14 preceding siblings ...)
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 15/15] net/mlx5e: CT: Support clear action Paul Blakey
@ 2020-03-11 19:13 ` Marcelo Ricardo Leitner
  2020-03-11 22:27   ` Paul Blakey
  15 siblings, 1 reply; 27+ messages in thread
From: Marcelo Ricardo Leitner @ 2020-03-11 19:13 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Saeed Mahameed, Oz Shlomo, Vlad Buslov, David Miller, netdev,
	Jiri Pirko, Roi Dayan

On Wed, Mar 11, 2020 at 04:33:43PM +0200, Paul Blakey wrote:
> Applying this patchset
> --------------------------
> 
> On top of current net-next ("r8169: simplify getting stats by using netdev_stats_to_stats64"),
> pull Saeed's ct-offload branch, from git git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git
> and fix the following non trivial conflict in fs_core.c as follows:
> #define OFFLOADS_MAX_FT 2
> #define OFFLOADS_NUM_PRIOS 2
> #define OFFLOADS_MIN_LEVEL (ANCHOR_MIN_LEVEL + OFFLOADS_NUM_PRIOS)
> 
> Then apply this patchset.

I did this and I couldn't get tc offloading (not ct) to work anymore.
Then I moved to current net-next (the commit you mentioned above), and
got the same thing.

What I can tell so far is that 
perf script | head
       handler11  4415 [009]  1263.438424: probe:mlx5e_configure_flower__return: (ffffffffc094fa80 <- ffffffff93dc510a) arg1=0xffffffffffffffa1

and that's EOPNOTSUPP. Not sure yet where that is coming from.


Btw, it's prooving to be a nice exercise to find out why it is failing
to offload. Perhaps some netdev_err_once() is welcomed.

  Marcelo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload
  2020-03-11 19:13 ` [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Marcelo Ricardo Leitner
@ 2020-03-11 22:27   ` Paul Blakey
  2020-03-11 22:44     ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 22:27 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Saeed Mahameed, Oz Shlomo, Vlad Buslov, David Miller, netdev,
	Jiri Pirko, Roi Dayan


On 11/03/2020 21:13, Marcelo Ricardo Leitner wrote:
> On Wed, Mar 11, 2020 at 04:33:43PM +0200, Paul Blakey wrote:
>> Applying this patchset
>> --------------------------
>>
>> On top of current net-next ("r8169: simplify getting stats by using netdev_stats_to_stats64"),
>> pull Saeed's ct-offload branch, from git git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git
>> and fix the following non trivial conflict in fs_core.c as follows:
>> #define OFFLOADS_MAX_FT 2
>> #define OFFLOADS_NUM_PRIOS 2
>> #define OFFLOADS_MIN_LEVEL (ANCHOR_MIN_LEVEL + OFFLOADS_NUM_PRIOS)
>>
>> Then apply this patchset.
> I did this and I couldn't get tc offloading (not ct) to work anymore.
> Then I moved to current net-next (the commit you mentioned above), and
> got the same thing.
>
> What I can tell so far is that 
> perf script | head
>        handler11  4415 [009]  1263.438424: probe:mlx5e_configure_flower__return: (ffffffffc094fa80 <- ffffffff93dc510a) arg1=0xffffffffffffffa1
>
> and that's EOPNOTSUPP. Not sure yet where that is coming from.

I guess you fail at what this patch "flow_offload: fix allowed types check" is suppose to fix

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=a393daa8993fd7d6c9c33110d5dac08bc0dc2696

That was the last bug that caused what you decribe - all tc rules failed.

It should be before the patch I detailed as seen in git log here:

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/log/


Can you check you have it? im not sure if compiling just the driver is enough.

if it is this, it should have fixed it.

if not try skipping flow_action_hw_stats_types_check() calls in mlx5 driver.

there is a netlink error msg most of the cases, try skip_sw or verbose to see.


>
> Btw, it's prooving to be a nice exercise to find out why it is failing
> to offload. Perhaps some netdev_err_once() is welcomed.
>
>   Marcelo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next ct-offload v3 04/15] net/sched: act_ct: Instantiate flow table entry actions
  2020-03-11 17:41   ` Edward Cree
@ 2020-03-11 22:27     ` Paul Blakey
  0 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 22:27 UTC (permalink / raw)
  To: Edward Cree, Saeed Mahameed, Oz Shlomo, Vlad Buslov,
	David Miller, netdev, Jiri Pirko, Roi Dayan


On 11/03/2020 19:41, Edward Cree wrote:
> On 11/03/2020 14:33, Paul Blakey wrote:
>> NF flow table API associate 5-tuple rule with an action list by calling
>> the flow table type action() CB to fill the rule's actions.
>>
>> In action CB of act_ct, populate the ct offload entry actions with a new
>> ct_metadata action. Initialize the ct_metadata with the ct mark, label and
>> zone information. If ct nat was performed, then also append the relevant
>> packet mangle actions (e.g. ipv4/ipv6/tcp/udp header rewrites).
>>
>> Drivers that offload the ft entries may match on the 5-tuple and perform
>> the action list.
>>
>> Signed-off-by: Paul Blakey <paulb@mellanox.com>
>> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
> Reviewed-by: Edward Cree <ecree@solarflare.com>
thanks!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next ct-offload v3 13/15] net/mlx5e: CT: Offload established flows
  2020-03-11 17:45   ` Edward Cree
@ 2020-03-11 22:29     ` Paul Blakey
  0 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-11 22:29 UTC (permalink / raw)
  To: Edward Cree, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan


On 11/03/2020 19:45, Edward Cree wrote:
> On 11/03/2020 14:33, Paul Blakey wrote:
>> Register driver callbacks with the nf flow table platform.
>> FT add/delete events will create/delete FTE in the CT/CT_NAT tables.
>>
>> Restoring the CT state on miss will be added in the following patch.
>>
>> Signed-off-by: Paul Blakey <paulb@mellanox.com>
>> Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
>> Reviewed-by: Roi Dayan <roid@mellanox.com>
>> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
>> ---
> <snip>
>> +static int
>> +mlx5_tc_ct_parse_mangle_to_mod_act(struct flow_action_entry *act,
>> +				   char *modact)
>> +{
>> +	u32 offset = act->mangle.offset, field;
>> +
>> +	switch (act->mangle.htype) {
>> +	case FLOW_ACT_MANGLE_HDR_TYPE_IP4:
>> +		MLX5_SET(set_action_in, modact, length, 0);
>> +		field = offset == offsetof(struct iphdr, saddr) ?
>> +			MLX5_ACTION_IN_FIELD_OUT_SIPV4 :
>> +			MLX5_ACTION_IN_FIELD_OUT_DIPV4;
> Won't this mishandle any mangle of a field other than src/dst addr?

Yes we didn't expect any other types, it was just the api of mangle.

ill add a else case.

>> +		break;
>> +
>> +	case FLOW_ACT_MANGLE_HDR_TYPE_IP6:
>> +		MLX5_SET(set_action_in, modact, length, 0);
>> +		if (offset == offsetof(struct ipv6hdr, saddr))
>> +			field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_31_0;
>> +		else if (offset == offsetof(struct ipv6hdr, saddr) + 4)
>> +			field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_63_32;
>> +		else if (offset == offsetof(struct ipv6hdr, saddr) + 8)
>> +			field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_95_64;
>> +		else if (offset == offsetof(struct ipv6hdr, saddr) + 12)
>> +			field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_127_96;
>> +		else if (offset == offsetof(struct ipv6hdr, daddr))
>> +			field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_31_0;
>> +		else if (offset == offsetof(struct ipv6hdr, daddr) + 4)
>> +			field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_63_32;
>> +		else if (offset == offsetof(struct ipv6hdr, daddr) + 8)
>> +			field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_95_64;
>> +		else if (offset == offsetof(struct ipv6hdr, daddr) + 12)
>> +			field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_127_96;
>> +		else
>> +			return -EOPNOTSUPP;
>> +		break;
>> +
>> +	case FLOW_ACT_MANGLE_HDR_TYPE_TCP:
>> +		MLX5_SET(set_action_in, modact, length, 16);
>> +		field = offset == offsetof(struct tcphdr, source) ?
>> +			MLX5_ACTION_IN_FIELD_OUT_TCP_SPORT :
>> +			MLX5_ACTION_IN_FIELD_OUT_TCP_DPORT;
> Ditto.
>
>> +		break;
>> +
>> +	case FLOW_ACT_MANGLE_HDR_TYPE_UDP:
>> +		MLX5_SET(set_action_in, modact, length, 16);
>> +		field = offset == offsetof(struct udphdr, source) ?
>> +			MLX5_ACTION_IN_FIELD_OUT_UDP_SPORT :
>> +			MLX5_ACTION_IN_FIELD_OUT_TCP_DPORT;
> s/TCP/UDP/?

yes

thanks will fix

>
> -ed

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload
  2020-03-11 22:27   ` Paul Blakey
@ 2020-03-11 22:44     ` Marcelo Ricardo Leitner
  2020-03-12  0:01       ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Ricardo Leitner @ 2020-03-11 22:44 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Saeed Mahameed, Oz Shlomo, Vlad Buslov, David Miller, netdev,
	Jiri Pirko, Roi Dayan

On Thu, Mar 12, 2020 at 12:27:37AM +0200, Paul Blakey wrote:
> 
> On 11/03/2020 21:13, Marcelo Ricardo Leitner wrote:
> > On Wed, Mar 11, 2020 at 04:33:43PM +0200, Paul Blakey wrote:
> >> Applying this patchset
> >> --------------------------
> >>
> >> On top of current net-next ("r8169: simplify getting stats by using netdev_stats_to_stats64"),
> >> pull Saeed's ct-offload branch, from git git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git
> >> and fix the following non trivial conflict in fs_core.c as follows:
> >> #define OFFLOADS_MAX_FT 2
> >> #define OFFLOADS_NUM_PRIOS 2
> >> #define OFFLOADS_MIN_LEVEL (ANCHOR_MIN_LEVEL + OFFLOADS_NUM_PRIOS)
> >>
> >> Then apply this patchset.
> > I did this and I couldn't get tc offloading (not ct) to work anymore.
> > Then I moved to current net-next (the commit you mentioned above), and
> > got the same thing.
> >
> > What I can tell so far is that 
> > perf script | head
> >        handler11  4415 [009]  1263.438424: probe:mlx5e_configure_flower__return: (ffffffffc094fa80 <- ffffffff93dc510a) arg1=0xffffffffffffffa1
> >
> > and that's EOPNOTSUPP. Not sure yet where that is coming from.
> 
> I guess you fail at what this patch "flow_offload: fix allowed types check" is suppose to fix
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=a393daa8993fd7d6c9c33110d5dac08bc0dc2696
> 
> That was the last bug that caused what you decribe - all tc rules failed.
> 
> It should be before the patch I detailed as seen in git log here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/log/
> 
> 
> Can you check you have it? im not sure if compiling just the driver is enough.

I had this one, in both tests.

> 
> if it is this, it should have fixed it.
> 
> if not try skipping flow_action_hw_stats_types_check() calls in mlx5 driver.

Ok. This one was my main suspect now after some extra printks. I could
confirm it parse_tc_fdb_actions is returning the error, but not sure
why yet.

> 
> there is a netlink error msg most of the cases, try skip_sw or verbose to see.

Yeah.. that probably would be easier, thanks. I'm using it under OvS
and without skip_sw, got just no logs for it.

> 
> >
> > Btw, it's prooving to be a nice exercise to find out why it is failing
> > to offload. Perhaps some netdev_err_once() is welcomed.
> >
> >   Marcelo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload
  2020-03-11 22:44     ` Marcelo Ricardo Leitner
@ 2020-03-12  0:01       ` Marcelo Ricardo Leitner
  2020-03-12  9:33         ` Paul Blakey
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Ricardo Leitner @ 2020-03-12  0:01 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Saeed Mahameed, Oz Shlomo, Vlad Buslov, David Miller, netdev,
	Jiri Pirko, Roi Dayan

On Wed, Mar 11, 2020 at 07:44:15PM -0300, Marcelo Ricardo Leitner wrote:
> On Thu, Mar 12, 2020 at 12:27:37AM +0200, Paul Blakey wrote:
...
> > if not try skipping flow_action_hw_stats_types_check() calls in mlx5 driver.
> 
> Ok. This one was my main suspect now after some extra printks. I could
> confirm it parse_tc_fdb_actions is returning the error, but not sure
> why yet.

Copied the extack in flow_action_mixed_hw_stats_types_check() onto a
printk and logged the if parameters:

@@ -284,6 +284,8 @@ flow_action_mixed_hw_stats_types_check(const struct flow_action *action,

        flow_action_for_each(i, action_entry, action) {
                if (i && action_entry->hw_stats_type != last_hw_stats_type) {
+                       printk("Mixing HW stats types for actions is not supported, %d %d %d\n",
+                              i, action_entry->hw_stats_type, last_hw_stats_type);
                        NL_SET_ERR_MSG_MOD(extack, "Mixing HW stats types for actions is not supported");

[  188.554636] Mixing HW stats types for actions is not supported, 2 0 3

with iproute-next from today loaded with
'iproute2/net-next v2] tc: m_action: introduce support for hw stats
type', I get a dump like:

# ./tc -s filter show dev vxlan_sys_4789 ingress
filter protocol ip pref 2 flower chain 0
filter protocol ip pref 2 flower chain 0 handle 0x1
  dst_mac 6a:66:2d:48:92:c2
  src_mac 00:00:00:00:0e:b7
  eth_type ipv4
  ip_proto udp
  ip_ttl 64
  src_port 100
  enc_dst_ip 1.1.1.1
  enc_src_ip 1.1.1.2
  enc_key_id 0
  enc_dst_port 4789
  enc_tos 0
  ip_flags nofrag
  not_in_hw
        action order 1: tunnel_key  unset pipe
         index 4 ref 1 bind 1 installed 2432 sec used 0 sec
        Action statistics:
        Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        no_percpu

        action order 2:  pedit action pipe keys 2
         index 4 ref 1 bind 1 installed 2432 sec used 0 sec
         key #0  at ipv4+8: val 3f000000 mask 00ffffff
         key #1  at udp+0: val 13890000 mask 0000ffff
        Action statistics:
        Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 3: csum (iph, udp) action pipe
        index 4 ref 1 bind 1 installed 2432 sec used 0 sec
        Action statistics:
        Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        no_percpu

        action order 4: mirred (Egress Redirect to device eth0) stolen
        index 1 ref 1 bind 1 installed 2432 sec used 0 sec
        Action statistics:
        Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        cookie 9c7d6b3aa84a32498181d98da0af80d2
        no_percpu

Which is quite interesting as it is not listing any hwstats fields at all.
Considering that due to tcf_action_hw_stats_type_get() and
hw_stats_type initialization in tcf_action_init_1(), the default is
for it to be 3 if not specified (which is then not dumped over netlink
by the kernel).

I'm wondering how it was 0 for i==0,1 and is 3 for i==2.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next ct-offload v3 05/15] net/sched: act_ct: Support restoring conntrack info on skbs
  2020-03-11 14:33 ` [PATCH net-next ct-offload v3 05/15] net/sched: act_ct: Support restoring conntrack info on skbs Paul Blakey
@ 2020-03-12  6:40   ` David Miller
  2020-03-12  9:33     ` Paul Blakey
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2020-03-12  6:40 UTC (permalink / raw)
  To: paulb; +Cc: saeedm, ozsh, jakub.kicinski, vladbu, netdev, jiri, roid

From: Paul Blakey <paulb@mellanox.com>
Date: Wed, 11 Mar 2020 16:33:48 +0200

> +void tcf_ct_flow_table_restore_skb(struct sk_buff *skb, unsigned long cookie)
> +{
> +	enum ip_conntrack_info ctinfo = cookie & NFCT_INFOMASK;
> +	struct nf_conn *ct = (struct nf_conn *)(cookie & NFCT_PTRMASK);

Reverse christmas tree here please, thank you.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload
  2020-03-12  0:01       ` Marcelo Ricardo Leitner
@ 2020-03-12  9:33         ` Paul Blakey
  0 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-12  9:33 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Saeed Mahameed, Oz Shlomo, Vlad Buslov, David Miller, netdev,
	Jiri Pirko, Roi Dayan



On 3/12/2020 2:01 AM, Marcelo Ricardo Leitner wrote:
> On Wed, Mar 11, 2020 at 07:44:15PM -0300, Marcelo Ricardo Leitner wrote:
>> On Thu, Mar 12, 2020 at 12:27:37AM +0200, Paul Blakey wrote:
> ...
>>> if not try skipping flow_action_hw_stats_types_check() calls in mlx5 driver.
>> Ok. This one was my main suspect now after some extra printks. I could
>> confirm it parse_tc_fdb_actions is returning the error, but not sure
>> why yet.
> Copied the extack in flow_action_mixed_hw_stats_types_check() onto a
> printk and logged the if parameters:
>
> @@ -284,6 +284,8 @@ flow_action_mixed_hw_stats_types_check(const struct flow_action *action,
>
>         flow_action_for_each(i, action_entry, action) {
>                 if (i && action_entry->hw_stats_type != last_hw_stats_type) {
> +                       printk("Mixing HW stats types for actions is not supported, %d %d %d\n",
> +                              i, action_entry->hw_stats_type, last_hw_stats_type);
>                         NL_SET_ERR_MSG_MOD(extack, "Mixing HW stats types for actions is not supported");
>
> [  188.554636] Mixing HW stats types for actions is not supported, 2 0 3
>
> with iproute-next from today loaded with
> 'iproute2/net-next v2] tc: m_action: introduce support for hw stats
> type', I get a dump like:
>
> # ./tc -s filter show dev vxlan_sys_4789 ingress
> filter protocol ip pref 2 flower chain 0
> filter protocol ip pref 2 flower chain 0 handle 0x1
>   dst_mac 6a:66:2d:48:92:c2
>   src_mac 00:00:00:00:0e:b7
>   eth_type ipv4
>   ip_proto udp
>   ip_ttl 64
>   src_port 100
>   enc_dst_ip 1.1.1.1
>   enc_src_ip 1.1.1.2
>   enc_key_id 0
>   enc_dst_port 4789
>   enc_tos 0
>   ip_flags nofrag
>   not_in_hw
>         action order 1: tunnel_key  unset pipe
>          index 4 ref 1 bind 1 installed 2432 sec used 0 sec
>         Action statistics:
>         Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0)
>         backlog 0b 0p requeues 0
>         no_percpu
>
>         action order 2:  pedit action pipe keys 2
>          index 4 ref 1 bind 1 installed 2432 sec used 0 sec
>          key #0  at ipv4+8: val 3f000000 mask 00ffffff
>          key #1  at udp+0: val 13890000 mask 0000ffff
>         Action statistics:
>         Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0)
>         backlog 0b 0p requeues 0
>
>         action order 3: csum (iph, udp) action pipe
>         index 4 ref 1 bind 1 installed 2432 sec used 0 sec
>         Action statistics:
>         Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0)
>         backlog 0b 0p requeues 0
>         no_percpu
>
>         action order 4: mirred (Egress Redirect to device eth0) stolen
>         index 1 ref 1 bind 1 installed 2432 sec used 0 sec
>         Action statistics:
>         Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0)
>         backlog 0b 0p requeues 0
>         cookie 9c7d6b3aa84a32498181d98da0af80d2
>         no_percpu
>
> Which is quite interesting as it is not listing any hwstats fields at all.
> Considering that due to tcf_action_hw_stats_type_get() and
> hw_stats_type initialization in tcf_action_init_1(), the default is
> for it to be 3 if not specified (which is then not dumped over netlink
> by the kernel).
>
> I'm wondering how it was 0 for i==0,1 and is 3 for i==2.
There is a current bug with pedit, as it is split to muliple mangle action entries, and the action entry hw_stats field is not
init for all of them. Then we fail on checking all hw_stats are the same ...




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next ct-offload v3 05/15] net/sched: act_ct: Support restoring conntrack info on skbs
  2020-03-12  6:40   ` David Miller
@ 2020-03-12  9:33     ` Paul Blakey
  0 siblings, 0 replies; 27+ messages in thread
From: Paul Blakey @ 2020-03-12  9:33 UTC (permalink / raw)
  To: David Miller; +Cc: saeedm, ozsh, jakub.kicinski, vladbu, netdev, jiri, roid



On 3/12/2020 8:40 AM, David Miller wrote:
> From: Paul Blakey <paulb@mellanox.com>
> Date: Wed, 11 Mar 2020 16:33:48 +0200
>
>> +void tcf_ct_flow_table_restore_skb(struct sk_buff *skb, unsigned long cookie)
>> +{
>> +	enum ip_conntrack_info ctinfo = cookie & NFCT_INFOMASK;
>> +	struct nf_conn *ct = (struct nf_conn *)(cookie & NFCT_PTRMASK);
> Reverse christmas tree here please, thank you.
sure thanks.


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-03-12  9:34 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-11 14:33 [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 01/15] net/mlx5: E-Switch, Enable reg c1 loopback when possible Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 02/15] net/mlx5e: en_rep: Create uplink rep root table after eswitch offloads table Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 03/15] netfilter: flowtable: Add API for registering to flow table events Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 04/15] net/sched: act_ct: Instantiate flow table entry actions Paul Blakey
2020-03-11 17:41   ` Edward Cree
2020-03-11 22:27     ` Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 05/15] net/sched: act_ct: Support restoring conntrack info on skbs Paul Blakey
2020-03-12  6:40   ` David Miller
2020-03-12  9:33     ` Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 06/15] net/sched: act_ct: Support refreshing the flow table entries Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 07/15] net/sched: act_ct: Enable hardware offload of flow table entires Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 08/15] net/mlx5: E-Switch, Introduce global tables Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 09/15] net/mlx5: E-Switch, Add support for offloading rules with no in_port Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 10/15] net/mlx5: E-Switch, Support getting chain mapping Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 11/15] flow_offload: Add flow_match_ct to get rule ct match Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 12/15] net/mlx5e: CT: Introduce connection tracking Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 13/15] net/mlx5e: CT: Offload established flows Paul Blakey
2020-03-11 17:45   ` Edward Cree
2020-03-11 22:29     ` Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 14/15] net/mlx5e: CT: Handle misses after executing CT action Paul Blakey
2020-03-11 14:33 ` [PATCH net-next ct-offload v3 15/15] net/mlx5e: CT: Support clear action Paul Blakey
2020-03-11 19:13 ` [PATCH net-next ct-offload v3 00/15] Introduce connection tracking offload Marcelo Ricardo Leitner
2020-03-11 22:27   ` Paul Blakey
2020-03-11 22:44     ` Marcelo Ricardo Leitner
2020-03-12  0:01       ` Marcelo Ricardo Leitner
2020-03-12  9:33         ` Paul Blakey

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.