From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3336AC3F2D1 for ; Thu, 5 Mar 2020 15:34:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 15956208CD for ; Thu, 5 Mar 2020 15:34:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726954AbgCEPem (ORCPT ); Thu, 5 Mar 2020 10:34:42 -0500 Received: from mail-il-dmz.mellanox.com ([193.47.165.129]:48313 "EHLO mellanox.co.il" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726131AbgCEPej (ORCPT ); Thu, 5 Mar 2020 10:34:39 -0500 Received: from Internal Mail-Server by MTLPINE2 (envelope-from paulb@mellanox.com) with ESMTPS (AES256-SHA encrypted); 5 Mar 2020 17:34:34 +0200 Received: from reg-r-vrt-019-120.mtr.labs.mlnx (reg-r-vrt-019-120.mtr.labs.mlnx [10.213.19.120]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 025FYYsO010824; Thu, 5 Mar 2020 17:34:34 +0200 From: Paul Blakey To: Paul Blakey , Saeed Mahameed , Oz Shlomo , Jakub Kicinski , Vlad Buslov , David Miller , "netdev@vger.kernel.org" , Jiri Pirko , Roi Dayan Subject: [PATCH net-next ct-offload 00/13] Introduce connection tracking offload Date: Thu, 5 Mar 2020 17:34:15 +0200 Message-Id: <1583422468-8456-1-git-send-email-paulb@mellanox.com> X-Mailer: git-send-email 1.8.4.3 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Background ---------- The connection tracking action provides the ability to associate connection state to a packet. The connection state may be used for stateful packet processing such as stateful firewalls and NAT operations. Connection tracking in TC SW ---------------------------- The CT state may be matched only after the CT action is performed. As such, CT use cases are commonly implemented using multiple chains. Consider the following TC filters, as an example: 1. tc filter add dev ens1f0_0 ingress prio 1 chain 0 proto ip flower \ src_mac 24:8a:07:a5:28:01 ct_state -trk \ action ct \ pipe action goto chain 2 2. tc filter add dev ens1f0_0 ingress prio 1 chain 2 proto ip flower \ ct_state +trk+new \ action ct commit \ pipe action tunnel_key set \ src_ip 0.0.0.0 \ dst_ip 7.7.7.8 \ id 98 \ dst_port 4789 \ action mirred egress redirect dev vxlan0 3. tc filter add dev ens1f0_0 ingress prio 1 chain 2 proto ip flower \ ct_state +trk+est \ action tunnel_key set \ src_ip 0.0.0.0 \ dst_ip 7.7.7.8 \ id 98 \ dst_port 4789 \ action mirred egress redirect dev vxlan0 Filter #1 (chain 0) decides, after initial packet classification, to send the packet to the connection tracking module (ct action). Once the ct_state is initialized by the CT action the packet processing continues on chain 2. Chain 2 classifies the packet based on the ct_state. Filter #2 matches on the +trk+new CT state while filter #3 matches on the +trk+est ct_state. MLX5 Connection tracking HW offload - MLX5 driver patches ------------------------------ The MLX5 hardware model aligns with the software model by realizing a multi-table architecture. In SW the TC CT action sets the CT state on the skb. Similarly, HW sets the CT state on a HW register. Driver gets this CT state while offloading a tuple with a new ct_metadata action that provides it. Matches on ct_state are translated to HW register matches. TC filter with CT action broken to two rules, a pre_ct rule, and a post_ct rule. pre_ct rule: Inserted on the corrosponding tc chain table, matches on original tc match, with actions: any pre ct actions, set fte_id, set zone, and goto the ct table. The fte_id is a register mapping uniquely identifying this filter. post_ct_rule: Inserted in a post_ct table, matches on the fte_id register mapping, with actions: counter + any post ct actions (this is usally 'goto chain X') post_ct table is a table that all the tuples inserted to the ct table goto, so if there is a tuple hit, packet will continue from ct table to post_ct table, after being marked with the CT state (mark/label..) This design ensures that the rule's actions and counters will be executed only after a CT hit. HW misses will continue processing in SW from the last chain ID that was processed in hardware. The following illustrates the HW model: +-------------------+ +--------------------+ +--------------+ + pre_ct (tc chain) +----->+ CT (nat or no nat) +--->+ post_ct +-----> + original match + | + tuple + zone match + | + fte_id match + | +-------------------+ | +--------------------+ | +--------------+ | v v v set chain miss mapping set mark original set fte_id set label filter set zone set established actions set tunnel_id do nat (if needed) do decap To fill CT table, driver registers a CB for flow offload events, for each new flow table that is passed to it from offloading ct actions. Once a flow offload event is triggered on this CB, offload this flow to the hardware CT table. Established events offload -------------------------- Currently, act_ct maintains an FT instance per ct zone. Flow table entries are created, per ct connection, when connections enter an established state and deleted otherwise. Once an entry is created, the FT assumes ownership of the entries, and manages their aging. FT is used for software offload of conntrack. FT entries associate 5-tuples with an action list. The act_ct changes in this patchset: Populate the action list with a (new) ct_metadata action, providing the connection's ct state (zone,mark and label), and mangle actions if NAT is configured. Pass the action's flow table instance as ct action entry parameter, so when the action is offloaded, the driver may register a callback on it's block to receive FT flow offload add/del/stats events. Netilter changes -------------------------- The netfilter changes export the relevant bits, and add the relevant CBs to support the above. Paul Blakey (13): netfilter: flowtable: Add API for registering to flow table events net/sched: act_ct: Instantiate flow table entry actions net/sched: act_ct: Support restoring conntrack info on skbs net/sched: act_ct: Support refreshing the flow table entries net/sched: act_ct: Enable hardware offload of flow table entires net/mlx5: E-Switch, Introduce global tables net/mlx5: E-Switch, Add support for offloading rules with no in_port net/mlx5: E-Switch, Support getting chain mapping flow_offload: Add flow_match_ct to get rule ct match net/mlx5e: CT: Introduce connection tracking net/mlx5e: CT: Offload established flows net/mlx5e: CT: Handle misses after executing CT action net/mlx5e: CT: Support clear action drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 10 + drivers/net/ethernet/mellanox/mlx5/core/Makefile | 1 + drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 1333 ++++++++++++++++++++ drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h | 171 +++ drivers/net/ethernet/mellanox/mlx5/core/en_rep.h | 3 + drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 121 +- drivers/net/ethernet/mellanox/mlx5/core/en_tc.h | 9 + drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 5 + .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 26 +- .../mellanox/mlx5/core/eswitch_offloads_chains.c | 43 + .../mellanox/mlx5/core/eswitch_offloads_chains.h | 13 + include/net/flow_offload.h | 14 + include/net/netfilter/nf_flow_table.h | 32 + include/net/tc_act/tc_ct.h | 17 + net/core/flow_offload.c | 7 + net/netfilter/nf_flow_table_core.c | 60 + net/netfilter/nf_flow_table_ip.c | 15 +- net/netfilter/nf_flow_table_offload.c | 27 +- net/sched/act_ct.c | 210 +++ net/sched/cls_api.c | 1 + 20 files changed, 2058 insertions(+), 60 deletions(-) create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h -- 1.8.3.1