From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Kalderon
Cc: Tomer Tayar
Subject: [PATCH net-next 3/3] qede: Error recovery process
Date: Sun, 20 Jan 2019 11:36:39 +0200
Message-ID: <20190120093639.11781-4-michal.kalderon@cavium.com>
X-Mailer: git-send-email 2.14.4
In-Reply-To: <20190120093639.11781-1-michal.kalderon@cavium.com>
References: <20190120093639.11781-1-michal.kalderon@cavium.com>
MIME-Version: 1.0
Content-Type: text/plain
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Tomer Tayar

This patch adds the error recovery process to the qede driver. The process
includes a partial/customized driver unload and load, which makes it look
like a short suspend period to the kernel while preserving the net device's
state.

Signed-off-by: Tomer Tayar
Signed-off-by: Ariel Elior
Signed-off-by: Michal Kalderon
---
 drivers/net/ethernet/qlogic/qede/qede.h      |   3 +
 drivers/net/ethernet/qlogic/qede/qede_main.c | 300 ++++++++++++++++++++++-----
 drivers/net/ethernet/qlogic/qede/qede_rdma.c |  64 ++++--
 include/linux/qed/qede_rdma.h                |  21 +-
 4 files changed, 314 insertions(+), 74 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 613249d..8434164 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -162,6 +162,7 @@ struct qede_rdma_dev {
 	struct list_head entry;
 	struct list_head rdma_event_list;
 	struct workqueue_struct *rdma_wq;
+	bool exp_recovery;
 };
 
 struct qede_ptp;
@@ -264,6 +265,7 @@ struct qede_dev {
 enum QEDE_STATE {
 	QEDE_STATE_CLOSED,
 	QEDE_STATE_OPEN,
+	QEDE_STATE_RECOVERY,
 };
 
 #define HILO_U64(hi, lo)		((((u64)(hi)) << 32) + (lo))
@@ -462,6 +464,7 @@ struct qede_fastpath {
 #define QEDE_CSUM_UNNECESSARY	BIT(1)
 #define QEDE_TUNN_CSUM_UNNECESSARY	BIT(2)
 
+#define QEDE_SP_RECOVERY		0
 #define QEDE_SP_RX_MODE		1
 
 #ifdef CONFIG_RFS_ACCEL
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 5a74fcb..de955f2 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -133,23 +133,12 @@ enum qede_pci_private {
 static void qede_remove(struct pci_dev *pdev);
 static void qede_shutdown(struct pci_dev *pdev);
 static void qede_link_update(void *dev, struct qed_link_output *link);
+static void qede_schedule_recovery_handler(void *dev);
+static void qede_recovery_handler(struct qede_dev *edev);
 static void qede_get_eth_tlv_data(void *edev, void *data);
 static void qede_get_generic_tlv_data(void *edev,
				      struct qed_generic_tlvs *data);
 
-/* The qede lock is used to protect driver state change and driver flows that
- * are not reentrant.
- */
-void __qede_lock(struct qede_dev *edev)
-{
-	mutex_lock(&edev->qede_lock);
-}
-
-void __qede_unlock(struct qede_dev *edev)
-{
-	mutex_unlock(&edev->qede_lock);
-}
-
 #ifdef CONFIG_QED_SRIOV
 static int qede_set_vf_vlan(struct net_device *ndev, int vf, u16 vlan, u8 qos,
			    __be16 vlan_proto)
@@ -231,6 +220,7 @@ static int qede_sriov_configure(struct pci_dev *pdev, int num_vfs_param)
		.arfs_filter_op = qede_arfs_filter_op,
 #endif
		.link_update = qede_link_update,
+		.schedule_recovery_handler = qede_schedule_recovery_handler,
		.get_generic_tlv_data = qede_get_generic_tlv_data,
		.get_protocol_tlv_data = qede_get_eth_tlv_data,
	},
@@ -950,11 +940,57 @@ static int qede_alloc_fp_array(struct qede_dev *edev)
 	return -ENOMEM;
 }
 
+/* The qede lock is used to protect driver state change and driver flows that
+ * are not reentrant.
+ */
+void __qede_lock(struct qede_dev *edev)
+{
+	mutex_lock(&edev->qede_lock);
+}
+
+void __qede_unlock(struct qede_dev *edev)
+{
+	mutex_unlock(&edev->qede_lock);
+}
+
+/* This version of the lock should be used when acquiring the RTNL lock is also
+ * needed in addition to the internal qede lock.
+ */
+void qede_lock(struct qede_dev *edev)
+{
+	rtnl_lock();
+	__qede_lock(edev);
+}
+
+void qede_unlock(struct qede_dev *edev)
+{
+	__qede_unlock(edev);
+	rtnl_unlock();
+}
+
 static void qede_sp_task(struct work_struct *work)
 {
 	struct qede_dev *edev = container_of(work, struct qede_dev,
					     sp_task.work);
 
+	/* The locking scheme depends on the specific flag:
+	 * In case of QEDE_SP_RECOVERY, acquiring the RTNL lock is required to
+	 * ensure that ongoing flows are ended and new ones are not started.
+	 * In other cases - only the internal qede lock should be acquired.
+	 */
+
+	if (test_and_clear_bit(QEDE_SP_RECOVERY, &edev->sp_flags)) {
+#ifdef CONFIG_QED_SRIOV
+		/* SRIOV must be disabled outside the lock to avoid a deadlock.
+		 * The recovery of the active VFs is currently not supported.
+		 */
+		qede_sriov_configure(edev->pdev, 0);
+#endif
+		qede_lock(edev);
+		qede_recovery_handler(edev);
+		qede_unlock(edev);
+	}
+
 	__qede_lock(edev);
 
 	if (test_and_clear_bit(QEDE_SP_RX_MODE, &edev->sp_flags))
@@ -1031,8 +1067,13 @@ static void qede_log_probe(struct qede_dev *edev)
 
 enum qede_probe_mode {
 	QEDE_PROBE_NORMAL,
+	QEDE_PROBE_RECOVERY,
 };
 
+#define QEDE_RDMA_PROBE_MODE(mode) \
+	((mode) == QEDE_PROBE_NORMAL ? QEDE_RDMA_PROBE_NORMAL \
+				     : QEDE_RDMA_PROBE_RECOVERY)
+
 static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
			bool is_vf, enum qede_probe_mode mode)
 {
@@ -1051,6 +1092,7 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
 	probe_params.dp_module = dp_module;
 	probe_params.dp_level = dp_level;
 	probe_params.is_vf = is_vf;
+	probe_params.recov_in_prog = (mode == QEDE_PROBE_RECOVERY);
 	cdev = qed_ops->common->probe(pdev, &probe_params);
 	if (!cdev) {
 		rc = -ENODEV;
@@ -1078,11 +1120,20 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
 	if (rc)
 		goto err2;
 
-	edev = qede_alloc_etherdev(cdev, pdev, &dev_info, dp_module,
-				   dp_level);
-	if (!edev) {
-		rc = -ENOMEM;
-		goto err2;
+	if (mode != QEDE_PROBE_RECOVERY) {
+		edev = qede_alloc_etherdev(cdev, pdev, &dev_info, dp_module,
+					   dp_level);
+		if (!edev) {
+			rc = -ENOMEM;
+			goto err2;
+		}
+	} else {
+		struct net_device *ndev = pci_get_drvdata(pdev);
+
+		edev = netdev_priv(ndev);
+		edev->cdev = cdev;
+		memset(&edev->stats, 0, sizeof(edev->stats));
+		memcpy(&edev->dev_info, &dev_info, sizeof(dev_info));
 	}
 
 	if (is_vf)
@@ -1090,28 +1141,31 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
 
 	qede_init_ndev(edev);
 
-	rc = qede_rdma_dev_add(edev);
+	rc = qede_rdma_dev_add(edev, QEDE_RDMA_PROBE_MODE(mode));
 	if (rc)
 		goto err3;
 
-	/* Prepare the lock prior to the registration of the netdev,
-	 * as once it's registered we might reach flows requiring it
-	 * [it's even possible to reach a flow needing it directly
-	 * from there, although it's unlikely].
-	 */
-	INIT_DELAYED_WORK(&edev->sp_task, qede_sp_task);
-	mutex_init(&edev->qede_lock);
-	rc = register_netdev(edev->ndev);
-	if (rc) {
-		DP_NOTICE(edev, "Cannot register net-device\n");
-		goto err4;
+	if (mode != QEDE_PROBE_RECOVERY) {
+		/* Prepare the lock prior to the registration of the netdev,
+		 * as once it's registered we might reach flows requiring it
+		 * [it's even possible to reach a flow needing it directly
+		 * from there, although it's unlikely].
+		 */
+		INIT_DELAYED_WORK(&edev->sp_task, qede_sp_task);
+		mutex_init(&edev->qede_lock);
+
+		rc = register_netdev(edev->ndev);
+		if (rc) {
+			DP_NOTICE(edev, "Cannot register net-device\n");
+			goto err4;
+		}
 	}
 
 	edev->ops->common->set_name(cdev, edev->ndev->name);
 
 	/* PTP not supported on VFs */
 	if (!is_vf)
-		qede_ptp_enable(edev, true);
+		qede_ptp_enable(edev, (mode == QEDE_PROBE_NORMAL));
 
 	edev->ops->register_ops(cdev, &qede_ll_ops, edev);
 
@@ -1126,7 +1180,7 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
 	return 0;
 
 err4:
-	qede_rdma_dev_remove(edev);
+	qede_rdma_dev_remove(edev, QEDE_RDMA_PROBE_MODE(mode));
 err3:
 	free_netdev(edev->ndev);
 err2:
@@ -1162,8 +1216,13 @@ static int qede_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 enum qede_remove_mode {
 	QEDE_REMOVE_NORMAL,
+	QEDE_REMOVE_RECOVERY,
 };
 
+#define QEDE_RDMA_REMOVE_MODE(mode) \
+	((mode) == QEDE_REMOVE_NORMAL ? QEDE_RDMA_REMOVE_NORMAL \
+				      : QEDE_RDMA_REMOVE_RECOVERY)
+
 static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode)
 {
 	struct net_device *ndev = pci_get_drvdata(pdev);
@@ -1172,15 +1231,19 @@ static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode)
 
 	DP_INFO(edev, "Starting qede_remove\n");
 
-	qede_rdma_dev_remove(edev);
-	unregister_netdev(ndev);
-	cancel_delayed_work_sync(&edev->sp_task);
+	qede_rdma_dev_remove(edev, QEDE_RDMA_REMOVE_MODE(mode));
 
-	qede_ptp_disable(edev);
+	if (mode != QEDE_REMOVE_RECOVERY) {
+		unregister_netdev(ndev);
 
-	edev->ops->common->set_power_state(cdev, PCI_D0);
+		cancel_delayed_work_sync(&edev->sp_task);
 
-	pci_set_drvdata(pdev, NULL);
+		edev->ops->common->set_power_state(cdev, PCI_D0);
+
+		pci_set_drvdata(pdev, NULL);
+	}
+
+	qede_ptp_disable(edev);
 
 	/* Use global ops since we've freed edev */
 	qed_ops->common->slowpath_stop(cdev);
@@ -1194,7 +1257,8 @@ static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode)
	 * [e.g., QED register callbacks] won't break anything when
	 * accessing the netdevice.
	 */
-	free_netdev(ndev);
+	if (mode != QEDE_REMOVE_RECOVERY)
+		free_netdev(ndev);
 
 	dev_info(&pdev->dev, "Ending qede_remove successfully\n");
 }
@@ -1539,6 +1603,58 @@ static int qede_alloc_mem_load(struct qede_dev *edev)
 	return 0;
 }
 
+static void qede_empty_tx_queue(struct qede_dev *edev,
+				struct qede_tx_queue *txq)
+{
+	unsigned int pkts_compl = 0, bytes_compl = 0;
+	struct netdev_queue *netdev_txq;
+	int rc, len = 0;
+
+	netdev_txq = netdev_get_tx_queue(edev->ndev, txq->ndev_txq_id);
+
+	while (qed_chain_get_cons_idx(&txq->tx_pbl) !=
+	       qed_chain_get_prod_idx(&txq->tx_pbl)) {
+		DP_VERBOSE(edev, NETIF_MSG_IFDOWN,
+			   "Freeing a packet on tx queue[%d]: chain_cons 0x%x, chain_prod 0x%x\n",
+			   txq->index, qed_chain_get_cons_idx(&txq->tx_pbl),
+			   qed_chain_get_prod_idx(&txq->tx_pbl));
+
+		rc = qede_free_tx_pkt(edev, txq, &len);
+		if (rc) {
+			DP_NOTICE(edev,
+				  "Failed to free a packet on tx queue[%d]: chain_cons 0x%x, chain_prod 0x%x\n",
+				  txq->index,
+				  qed_chain_get_cons_idx(&txq->tx_pbl),
+				  qed_chain_get_prod_idx(&txq->tx_pbl));
+			break;
+		}
+
+		bytes_compl += len;
+		pkts_compl++;
+		txq->sw_tx_cons++;
+	}
+
+	netdev_tx_completed_queue(netdev_txq, pkts_compl, bytes_compl);
+}
+
+static void qede_empty_tx_queues(struct qede_dev *edev)
+{
+	int i;
+
+	for_each_queue(i)
+		if (edev->fp_array[i].type & QEDE_FASTPATH_TX) {
+			int cos;
+
+			for_each_cos_in_txq(edev, cos) {
+				struct qede_fastpath *fp;
+
+				fp = &edev->fp_array[i];
+				qede_empty_tx_queue(edev,
+						    &fp->txq[cos]);
+			}
+		}
+}
+
 /* This function inits fp content and resets the SB, RXQ and TXQ structures */
 static void qede_init_fp(struct qede_dev *edev)
 {
@@ -2053,6 +2169,7 @@ static int qede_start_queues(struct qede_dev *edev, bool clear_stats)
 
 enum qede_unload_mode {
 	QEDE_UNLOAD_NORMAL,
+	QEDE_UNLOAD_RECOVERY,
 };
 
 static void qede_unload(struct qede_dev *edev, enum qede_unload_mode mode,
@@ -2068,7 +2185,8 @@ static void qede_unload(struct qede_dev *edev, enum qede_unload_mode mode,
 
 	clear_bit(QEDE_FLAGS_LINK_REQUESTED, &edev->flags);
 
-	edev->state = QEDE_STATE_CLOSED;
+	if (mode != QEDE_UNLOAD_RECOVERY)
+		edev->state = QEDE_STATE_CLOSED;
 
 	qede_rdma_dev_event_close(edev);
 
@@ -2076,17 +2194,20 @@ static void qede_unload(struct qede_dev *edev, enum qede_unload_mode mode,
 	netif_tx_disable(edev->ndev);
 	netif_carrier_off(edev->ndev);
 
-	/* Reset the link */
-	memset(&link_params, 0, sizeof(link_params));
-	link_params.link_up = false;
-	edev->ops->common->set_link(edev->cdev, &link_params);
-	rc = qede_stop_queues(edev);
-	if (rc) {
-		qede_sync_free_irqs(edev);
-		goto out;
-	}
+	if (mode != QEDE_UNLOAD_RECOVERY) {
+		/* Reset the link */
+		memset(&link_params, 0, sizeof(link_params));
+		link_params.link_up = false;
+		edev->ops->common->set_link(edev->cdev, &link_params);
 
-	DP_INFO(edev, "Stopped Queues\n");
+		rc = qede_stop_queues(edev);
+		if (rc) {
+			qede_sync_free_irqs(edev);
+			goto out;
+		}
+
+		DP_INFO(edev, "Stopped Queues\n");
+	}
 
 	qede_vlan_mark_nonconfigured(edev);
 	edev->ops->fastpath_stop(edev->cdev);
@@ -2102,18 +2223,26 @@ static void qede_unload(struct qede_dev *edev, enum qede_unload_mode mode,
 
 	qede_napi_disable_remove(edev);
 
+	if (mode == QEDE_UNLOAD_RECOVERY)
+		qede_empty_tx_queues(edev);
+
 	qede_free_mem_load(edev);
 	qede_free_fp_array(edev);
 
 out:
 	if (!is_locked)
 		__qede_unlock(edev);
+
+	if (mode != QEDE_UNLOAD_RECOVERY)
+		DP_NOTICE(edev, "Link is down\n");
+
 	DP_INFO(edev, "Ending qede unload\n");
 }
 
 enum qede_load_mode {
 	QEDE_LOAD_NORMAL,
 	QEDE_LOAD_RELOAD,
+	QEDE_LOAD_RECOVERY,
 };
 
 static int qede_load(struct qede_dev *edev, enum qede_load_mode mode,
@@ -2293,6 +2422,77 @@ static void qede_link_update(void *dev, struct qed_link_output *link)
 	}
 }
 
+static void qede_schedule_recovery_handler(void *dev)
+{
+	struct qede_dev *edev = dev;
+
+	if (edev->state == QEDE_STATE_RECOVERY) {
+		DP_NOTICE(edev,
+			  "Avoid scheduling a recovery handling since already in recovery state\n");
+		return;
+	}
+
+	set_bit(QEDE_SP_RECOVERY, &edev->sp_flags);
+	schedule_delayed_work(&edev->sp_task, 0);
+
+	DP_INFO(edev, "Scheduled a recovery handler\n");
+}
+
+static void qede_recovery_failed(struct qede_dev *edev)
+{
+	netdev_err(edev->ndev, "Recovery handling has failed. Power cycle is needed.\n");
+
+	netif_device_detach(edev->ndev);
+
+	if (edev->cdev)
+		edev->ops->common->set_power_state(edev->cdev, PCI_D3hot);
+}
+
+static void qede_recovery_handler(struct qede_dev *edev)
+{
+	u32 curr_state = edev->state;
+	int rc;
+
+	DP_NOTICE(edev, "Starting a recovery process\n");
+
+	/* No need to acquire first the qede_lock since is done by qede_sp_task
+	 * before calling this function.
+	 */
+	edev->state = QEDE_STATE_RECOVERY;
+
+	edev->ops->common->recovery_prolog(edev->cdev);
+
+	if (curr_state == QEDE_STATE_OPEN)
+		qede_unload(edev, QEDE_UNLOAD_RECOVERY, true);
+
+	__qede_remove(edev->pdev, QEDE_REMOVE_RECOVERY);
+
+	rc = __qede_probe(edev->pdev, edev->dp_module, edev->dp_level,
+			  IS_VF(edev), QEDE_PROBE_RECOVERY);
+	if (rc) {
+		edev->cdev = NULL;
+		goto err;
+	}
+
+	if (curr_state == QEDE_STATE_OPEN) {
+		rc = qede_load(edev, QEDE_LOAD_RECOVERY, true);
+		if (rc)
+			goto err;
+
+		qede_config_rx_mode(edev->ndev);
+		udp_tunnel_get_rx_info(edev->ndev);
+	}
+
+	edev->state = curr_state;
+
+	DP_NOTICE(edev, "Recovery handling is done\n");
+
+	return;
+
+err:
+	qede_recovery_failed(edev);
+}
+
 static bool qede_is_txq_full(struct qede_dev *edev, struct qede_tx_queue *txq)
 {
 	struct netdev_queue *netdev_txq;
diff --git a/drivers/net/ethernet/qlogic/qede/qede_rdma.c b/drivers/net/ethernet/qlogic/qede/qede_rdma.c
index 1900bf7..9668e5e 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_rdma.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_rdma.c
@@ -50,6 +50,8 @@ static void _qede_rdma_dev_add(struct qede_dev *edev)
 	if (!qedr_drv)
 		return;
 
+	/* Leftovers from previous error recovery */
+	edev->rdma_info.exp_recovery = false;
 	edev->rdma_info.qedr_dev = qedr_drv->add(edev->cdev, edev->pdev,
						 edev->ndev);
 }
@@ -87,21 +89,26 @@ static void qede_rdma_destroy_wq(struct qede_dev *edev)
 	destroy_workqueue(edev->rdma_info.rdma_wq);
 }
 
-int qede_rdma_dev_add(struct qede_dev *edev)
+int qede_rdma_dev_add(struct qede_dev *edev, enum qede_rdma_probe_mode mode)
 {
-	int rc = 0;
+	int rc;
 
-	if (qede_rdma_supported(edev)) {
-		rc = qede_rdma_create_wq(edev);
-		if (rc)
-			return rc;
+	if (!qede_rdma_supported(edev))
+		return 0;
 
-		INIT_LIST_HEAD(&edev->rdma_info.entry);
-		mutex_lock(&qedr_dev_list_lock);
-		list_add_tail(&edev->rdma_info.entry, &qedr_dev_list);
-		_qede_rdma_dev_add(edev);
-		mutex_unlock(&qedr_dev_list_lock);
-	}
+	/* Cannot start qedr while recovering since it wasn't fully stopped */
+	if (mode == QEDE_RDMA_PROBE_RECOVERY)
+		return 0;
+
+	rc = qede_rdma_create_wq(edev);
+	if (rc)
+		return rc;
+
+	INIT_LIST_HEAD(&edev->rdma_info.entry);
+	mutex_lock(&qedr_dev_list_lock);
+	list_add_tail(&edev->rdma_info.entry, &qedr_dev_list);
+	_qede_rdma_dev_add(edev);
+	mutex_unlock(&qedr_dev_list_lock);
 
 	return rc;
 }
@@ -110,19 +117,31 @@ static void _qede_rdma_dev_remove(struct qede_dev *edev)
 {
 	if (qedr_drv && qedr_drv->remove && edev->rdma_info.qedr_dev)
 		qedr_drv->remove(edev->rdma_info.qedr_dev);
-	edev->rdma_info.qedr_dev = NULL;
 }
 
-void qede_rdma_dev_remove(struct qede_dev *edev)
+void qede_rdma_dev_remove(struct qede_dev *edev,
+			  enum qede_rdma_remove_mode mode)
 {
 	if (!qede_rdma_supported(edev))
 		return;
 
-	qede_rdma_destroy_wq(edev);
-	mutex_lock(&qedr_dev_list_lock);
-	_qede_rdma_dev_remove(edev);
-	list_del(&edev->rdma_info.entry);
-	mutex_unlock(&qedr_dev_list_lock);
+	/* Cannot remove qedr while recovering since it wasn't fully stopped */
+	if (mode == QEDE_RDMA_REMOVE_NORMAL) {
+		qede_rdma_destroy_wq(edev);
+		mutex_lock(&qedr_dev_list_lock);
+		if (!edev->rdma_info.exp_recovery)
+			_qede_rdma_dev_remove(edev);
+		edev->rdma_info.qedr_dev = NULL;
+		list_del(&edev->rdma_info.entry);
+		mutex_unlock(&qedr_dev_list_lock);
+	} else {
+		if (!edev->rdma_info.exp_recovery) {
+			mutex_lock(&qedr_dev_list_lock);
+			_qede_rdma_dev_remove(edev);
+			mutex_unlock(&qedr_dev_list_lock);
+		}
+		edev->rdma_info.exp_recovery = true;
+	}
 }
 
 static void _qede_rdma_dev_open(struct qede_dev *edev)
@@ -204,7 +223,8 @@ void qede_rdma_unregister_driver(struct qedr_driver *drv)
 
 	mutex_lock(&qedr_dev_list_lock);
 	list_for_each_entry(edev, &qedr_dev_list, rdma_info.entry) {
-		if (edev->rdma_info.qedr_dev)
+		/* If device has experienced recovery it was already removed */
+		if (edev->rdma_info.qedr_dev && !edev->rdma_info.exp_recovery)
 			_qede_rdma_dev_remove(edev);
 	}
 	qedr_drv = NULL;
@@ -284,6 +304,10 @@ static void qede_rdma_add_event(struct qede_dev *edev,
 {
 	struct qede_rdma_event_work *event_node;
 
+	/* If a recovery was experienced avoid adding the event */
+	if (edev->rdma_info.exp_recovery)
+		return;
+
 	if (!edev->rdma_info.qedr_dev)
 		return;
diff --git a/include/linux/qed/qede_rdma.h b/include/linux/qed/qede_rdma.h
index 9904617..e29d719 100644
--- a/include/linux/qed/qede_rdma.h
+++ b/include/linux/qed/qede_rdma.h
@@ -55,6 +55,16 @@ struct qede_rdma_event_work {
 	enum qede_rdma_event event;
 };
 
+enum qede_rdma_probe_mode {
+	QEDE_RDMA_PROBE_NORMAL,
+	QEDE_RDMA_PROBE_RECOVERY,
+};
+
+enum qede_rdma_remove_mode {
+	QEDE_RDMA_REMOVE_NORMAL,
+	QEDE_RDMA_REMOVE_RECOVERY,
+};
+
 struct qedr_driver {
 	unsigned char name[32];
 
@@ -74,21 +84,24 @@ struct qedr_driver {
 bool qede_rdma_supported(struct qede_dev *dev);
 
 #if IS_ENABLED(CONFIG_QED_RDMA)
-int qede_rdma_dev_add(struct qede_dev *dev);
+int qede_rdma_dev_add(struct qede_dev *dev, enum qede_rdma_probe_mode mode);
 void qede_rdma_dev_event_open(struct qede_dev *dev);
 void qede_rdma_dev_event_close(struct qede_dev *dev);
-void qede_rdma_dev_remove(struct qede_dev *dev);
+void qede_rdma_dev_remove(struct qede_dev *dev,
+			  enum qede_rdma_remove_mode mode);
 void qede_rdma_event_changeaddr(struct qede_dev *edr);
 #else
-static inline int qede_rdma_dev_add(struct qede_dev *dev)
+static inline int qede_rdma_dev_add(struct qede_dev *dev,
+				    enum qede_rdma_probe_mode mode)
 {
 	return 0;
 }
 
 static inline void qede_rdma_dev_event_open(struct qede_dev *dev) {}
 static inline void qede_rdma_dev_event_close(struct qede_dev *dev) {}
-static inline void qede_rdma_dev_remove(struct qede_dev *dev) {}
+static inline void qede_rdma_dev_remove(struct qede_dev *dev,
+					enum qede_rdma_remove_mode mode) {}
 static inline void qede_rdma_event_changeaddr(struct qede_dev *edr) {}
 #endif
 #endif
-- 
1.8.3.1