From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A3C5C27C76 for ; Wed, 25 Jan 2023 15:32:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234565AbjAYPcs (ORCPT ); Wed, 25 Jan 2023 10:32:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55158 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235319AbjAYPcm (ORCPT ); Wed, 25 Jan 2023 10:32:42 -0500 Received: from NAM04-DM6-obe.outbound.protection.outlook.com (mail-dm6nam04on2068.outbound.protection.outlook.com [40.107.102.68]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5546B37B76 for ; Wed, 25 Jan 2023 07:32:41 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=OnonNiqNi3kuhauEefUNgUydN50rWko5gMHbiCngUcfV9QCpOoLGeZqWNR+7NyyIUI8IXhCIBJLfvixDwgBJjHZ6ZTfXZQNeckxC4ZB8CNoRkXAP1GSP1nhdcWIMm8A63K4rdOvv4IH6hDE+nKeAR1W7b6mBVETTDzQK1iXmPoIHW/TdjdY+U7m1XnN536M2S2K4Hlk2iUAIErSPrh/iML2GkKlx+ud2Z/XCY/VxZgkYL8LwRQcr0Th5mdCnCcYRbqgZQXmDXZp5PefiFsAObNseO7LnaWpSwce652F9NmVS9Hq7/ZGXLPeEN0WkzF/FyUk9fl2PCEjjlRr5nHHO3g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=ZqYOn+xkPYCXmqll1Dj9wTrhdU+4Ta8ZDK3wY5upLGU=; b=cNxYkeI5pK2s0zPzwxyP7eBpqZfYaqYAR/T13nEe5Rh7crWO6znDmfpeARMZJyDqFCslWfdH7eIxYgt2kcLP28eEmw4BXCWjdREOV2qQRC5bFwVAiWFWT5WoI3yb2DeKj82CfINcA55XmtgVQ3NS1N9Zby+ICtozz/CWnYw7qb2U0WpjDlNtSDVsjdzhLQcYwELN7qDD2YFUqZI/K8LaY3Xv+O/G9IqaYkKYvv2Ni6QJ6baJvtAkSJfzQ115VignMqTFaTHrw0xJU2qdViC2Ozu/kAZCAX6RVDK1M+7QKJHXWARgBxVOKsU2UGMVtOa4+SGIitX0CU6VNxADLdsXbg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.161) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=ZqYOn+xkPYCXmqll1Dj9wTrhdU+4Ta8ZDK3wY5upLGU=; b=uMBPv47i/ry8mHBeGUqI9+Sy8bs1cAGpNDWbPOMQwg110gB6uZJ2IIeyiDtTKETunV0bIdsEk18Iu1m8aJWCv54Bwssw1FXPfPYNI+TYWq3HZm74Z+6UWKtGbVQWevz98n8oOUfQF9QLS7MKl7nmh5Q2r/MIqba8T0pmQFmuOOaYppTP2E1CHd583nlbYckeIo1SGnS7ywDXFHX2/URHC4TqsFpnOT5oZ5CKR2pxaF4nDNytmv85uCJHZm4wMJqG1Am/lZCRFOrQ9dUma5AU1+S7UzuU1pg5ZxO7L0Jd0MBjzLVHDDA/Qq6vlwHtebU5G9xOGaD5j91VPvv86TqkJQ== Received: from MW4PR04CA0111.namprd04.prod.outlook.com (2603:10b6:303:83::26) by MW4PR12MB7438.namprd12.prod.outlook.com (2603:10b6:303:219::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6043.21; Wed, 25 Jan 2023 15:32:39 +0000 Received: from CO1NAM11FT022.eop-nam11.prod.protection.outlook.com (2603:10b6:303:83:cafe::af) by MW4PR04CA0111.outlook.office365.com (2603:10b6:303:83::26) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6002.33 via Frontend Transport; Wed, 25 Jan 2023 15:32:39 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.161) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.161 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.161; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.161) by CO1NAM11FT022.mail.protection.outlook.com (10.13.175.199) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6043.21 via Frontend Transport; Wed, 25 Jan 2023 15:32:39 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by mail.nvidia.com (10.129.200.67) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Wed, 25 Jan 2023 07:32:26 -0800 Received: from rnnvmail202.nvidia.com (10.129.68.7) by rnnvmail201.nvidia.com (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Wed, 25 Jan 2023 07:32:25 -0800 Received: from reg-r-vrt-019-180.mtr.labs.mlnx (10.127.8.11) by mail.nvidia.com (10.129.68.7) with Microsoft SMTP Server id 15.2.986.36 via Frontend Transport; Wed, 25 Jan 2023 07:32:22 -0800 From: Paul Blakey To: Paul Blakey , , Saeed Mahameed , Paolo Abeni , Jakub Kicinski , Eric Dumazet , Jamal Hadi Salim , Cong Wang , "David S. Miller" CC: Oz Shlomo , Jiri Pirko , Roi Dayan , Vlad Buslov Subject: [PATCH net-next v5 0/6] net/sched: cls_api: Support hardware miss to tc action Date: Wed, 25 Jan 2023 17:32:12 +0200 Message-ID: <20230125153218.7230-1-paulb@nvidia.com> X-Mailer: git-send-email 2.30.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CO1NAM11FT022:EE_|MW4PR12MB7438:EE_ X-MS-Office365-Filtering-Correlation-Id: 963bb230-7513-42d0-e06d-08dafee96485 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: cNWeC5IZtmcUrxYDUQF16ktl5vuUzO3DFq107lDHtWIiaOiQdUZNNAfmnty3tWmuTjVNeOvZBdkYX8BUGVmR/h/4lyNOyAIzvLB57uol2PJj2V8UmX5Y9TTVe6SVU09FkwgjU9NFyL10xWamQ5SDZvVOwrKfZXqJ2El5Uvav8MRSzMA0jdYboElVdIqyrHIBvyDCmiSfJhCeUVYLhn1rsdTswd5YQN8s9M+y/wbkImBXrN2NM2J1zgYYEpsIwRjLao5HYSajmfYV9chXbyQPg98a7fDXHDI58Myc0pjyiZjP9T37t0bd4MFewccIqYQkbr6Qe/pxPg8PkB93M0iS+odaTVwZ9k6nPMAeQIZeUnisYWIl0YMEQXujAIs+r+5TUepin4HkkqU++NciwpCURsVzuPBvkE3AeTYRqmEXYUao+tqmrSibsFuDj5Lsds0GhUePzplJl00VAtcQInEKazMZxb3PuUuRFN/qCmDSnuNOPBiLDBRgoNsQBwNSbGw1h6cQMODld+ev/dbTe9s1nllt52aWTNw8wBHnN17P3rn+VgqF1QejQm50KqCqgbHOqFxsUMjy0GN5dqBRizzYwIQmh60wKwVLxRC/PWbDMeLCSI0LBMVtjX4min7uYWOmyDJyWkp3I1lgDeknJfHvl6vNTBUN1yXn3VIT56hvcbauvLaUQ1n6geU2p/iSFmTePac2npk5fGZK4bGnwcRqlw== X-Forefront-Antispam-Report: CIP:216.228.117.161;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc6edge2.nvidia.com;CAT:NONE;SFS:(13230025)(4636009)(396003)(136003)(376002)(346002)(39860400002)(451199018)(40470700004)(46966006)(36840700001)(47076005)(82310400005)(110136005)(86362001)(36756003)(40460700003)(40480700001)(356005)(36860700001)(7636003)(82740400003)(54906003)(316002)(2906002)(41300700001)(4326008)(70206006)(70586007)(8936002)(8676002)(5660300002)(186003)(26005)(1076003)(2616005)(83380400001)(426003)(6666004)(107886003)(478600001)(336012);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 25 Jan 2023 15:32:39.2114 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 963bb230-7513-42d0-e06d-08dafee96485 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.161];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT022.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: MW4PR12MB7438 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Hi, This series adds support for hardware miss to instruct tc to continue execution in a specific tc action instance on a filter's action list. The mlx5 driver patch (besides the refactors) shows its usage instead of using just chain restore. Currently a filter's action list must be executed all together or not at all as driver are only able to tell tc to continue executing from a specific tc chain, and not a specific filter/action. This is troublesome with regards to action CT, where new connections should be sent to software (via tc chain restore), and established connections can be handled in hardware. Checking for new connections is done when executing the ct action in hardware (by checking the packet's tuple against known established tuples). But if there is a packet modification (pedit) action before action CT and the checked tuple is a new connection, hardware will need to revert the previous packet modifications before sending it back to software so it can re-match the same tc filter in software and re-execute its CT action. The following is an example configuration of stateless nat on mlx5 driver that isn't supported before this patchet: #Setup corrosponding mlx5 VFs in namespaces $ ip netns add ns0 $ ip netns add ns1 $ ip link set dev enp8s0f0v0 netns ns0 $ ip netns exec ns0 ifconfig enp8s0f0v0 1.1.1.1/24 up $ ip link set dev enp8s0f0v1 netns ns1 $ ip netns exec ns1 ifconfig enp8s0f0v1 1.1.1.2/24 up #Setup tc arp and ct rules on mxl5 VF representors $ tc qdisc add dev enp8s0f0_0 ingress $ tc qdisc add dev enp8s0f0_1 ingress $ ifconfig enp8s0f0_0 up $ ifconfig enp8s0f0_1 up #Original side $ tc filter add dev enp8s0f0_0 ingress chain 0 proto ip flower \ ct_state -trk ip_proto tcp dst_port 8888 \ action pedit ex munge tcp dport set 5001 pipe \ action csum ip tcp pipe \ action ct pipe \ action goto chain 1 $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \ ct_state +trk+est \ action mirred egress redirect dev enp8s0f0_1 $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \ ct_state +trk+new \ action ct commit pipe \ action mirred egress redirect dev enp8s0f0_1 $ tc filter add dev enp8s0f0_0 ingress chain 0 proto arp flower \ action mirred egress redirect dev enp8s0f0_1 #Reply side $ tc filter add dev enp8s0f0_1 ingress chain 0 proto arp flower \ action mirred egress redirect dev enp8s0f0_0 $ tc filter add dev enp8s0f0_1 ingress chain 0 proto ip flower \ ct_state -trk ip_proto tcp \ action ct pipe \ action pedit ex munge tcp sport set 8888 pipe \ action csum ip tcp pipe \ action mirred egress redirect dev enp8s0f0_0 #Run traffic $ ip netns exec ns1 iperf -s -p 5001& $ sleep 2 #wait for iperf to fully open $ ip netns exec ns0 iperf -c 1.1.1.2 -p 8888 #dump tc filter stats on enp8s0f0_0 chain 0 rule and see hardware packets: $ tc -s filter show dev enp8s0f0_0 ingress chain 0 proto ip | grep "hardware.*pkt" Sent hardware 9310116832 bytes 6149672 pkt Sent hardware 9310116832 bytes 6149672 pkt Sent hardware 9310116832 bytes 6149672 pkt A new connection executing the first filter in hardware will first rewrite the dst port to the new port, and then the ct action is executed, because this is a new connection, hardware will need to be send this back to software, on chain 0, to execute the first filter again in software. The dst port needs to be reverted otherwise it won't re-match the old dst port in the first filter. Because of that, currently mlx5 driver will reject offloading the above action ct rule. This series adds supports partial offload of a filter's action list, and letting tc software continue processing in the specific action instance where hardware left off (in the above case after the "action pedit ex munge tcp dport... of the first rule") allowing support for scenarios such as the above. Changelog: v1->v2: Fixed compilation without CONFIG_NET_CLS Cover letter re-write v2->v3: Unlock spin_lock on error in cls flower filter handle refactor Cover letter v3->v4: Silence warning by clang v4->v5: Cover letter example Removed ifdef as much as possible by using inline stubs Paul Blakey (6): net/sched: cls_api: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: flower: Support hardware miss to tc action net/mlx5: Refactor tc miss handling to a single function net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5e: TC, Set CT miss to the specific ct action instance .../ethernet/mellanox/mlx5/core/en/rep/tc.c | 225 ++------------ .../mellanox/mlx5/core/en/tc/sample.c | 2 +- .../ethernet/mellanox/mlx5/core/en/tc_ct.c | 32 +- .../ethernet/mellanox/mlx5/core/en/tc_ct.h | 2 + .../net/ethernet/mellanox/mlx5/core/en_rx.c | 4 +- .../net/ethernet/mellanox/mlx5/core/en_tc.c | 276 ++++++++++++++++-- .../net/ethernet/mellanox/mlx5/core/en_tc.h | 21 +- .../net/ethernet/mellanox/mlx5/core/eswitch.h | 2 + .../mellanox/mlx5/core/lib/fs_chains.c | 14 +- include/linux/skbuff.h | 6 +- include/net/flow_offload.h | 1 + include/net/pkt_cls.h | 34 ++- include/net/sch_generic.h | 2 + net/openvswitch/flow.c | 2 +- net/sched/act_api.c | 2 +- net/sched/cls_api.c | 214 +++++++++++++- net/sched/cls_flower.c | 73 +++-- 17 files changed, 598 insertions(+), 314 deletions(-) -- 2.30.1