From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D221C41537 for ; Mon, 2 Aug 2021 14:14:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DF99A60555 for ; Mon, 2 Aug 2021 14:14:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237072AbhHBOOD (ORCPT ); Mon, 2 Aug 2021 10:14:03 -0400 Received: from mail.kernel.org ([198.145.29.99]:49164 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236591AbhHBODc (ORCPT ); Mon, 2 Aug 2021 10:03:32 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id C0A0461211; Mon, 2 Aug 2021 13:57:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1627912640; bh=PpLBshF79vpConzbkrHRkjI2D8Qk/JOWna+XLDYDzTU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=oobBpshVb5dTyYmbBOsXfVLAXjoi8bckrV09eKckOE9F+esWZ5j+aFwHgED/EcDmG hzBKYOtWnWv85OewuE8QDiroJwiR/JhjDeqyOkn4W8djuyayOEvYNBVxLkn2d+isae PwMl4xMUMmuMkzk73imc/Pfb7Gf0yksoH3yZ9XHs= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Aya Levin , Moshe Shemesh , Saeed Mahameed , Sasha Levin Subject: [PATCH 5.13 083/104] net/mlx5: Unload device upon firmware fatal error Date: Mon, 2 Aug 2021 15:45:20 +0200 Message-Id: <20210802134346.743470990@linuxfoundation.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210802134344.028226640@linuxfoundation.org> References: <20210802134344.028226640@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Aya Levin [ Upstream commit 7f331bf0f060c2727e36d64f9b098b4ee5f3dfad ] When fw_fatal reporter reports an error, the firmware in not responding. Unload the device to ensure that the driver closes all its resources, even if recovery is not due (user disabled auto-recovery or reporter is in grace period). On successful recovery the device is loaded back up. Fixes: b3bd076f7501 ("net/mlx5: Report devlink health on FW fatal issues") Signed-off-by: Aya Levin Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin --- drivers/net/ethernet/mellanox/mlx5/core/health.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index 9ff163c5bcde..9abeb80ffa31 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -626,8 +626,16 @@ static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work) } fw_reporter_ctx.err_synd = health->synd; fw_reporter_ctx.miss_counter = health->miss_counter; - devlink_health_report(health->fw_fatal_reporter, - "FW fatal error reported", &fw_reporter_ctx); + if (devlink_health_report(health->fw_fatal_reporter, + "FW fatal error reported", &fw_reporter_ctx) == -ECANCELED) { + /* If recovery wasn't performed, due to grace period, + * unload the driver. This ensures that the driver + * closes all its resources and it is not subjected to + * requests from the kernel. + */ + mlx5_core_err(dev, "Driver is in error state. Unloading\n"); + mlx5_unload_one(dev); + } } static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = { -- 2.30.2