From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Stefan Assmann <sassmann@redhat.com>,
	Jiri Benc <jbenc@redhat.com>,
	Jesse Brandeburg <jesse.brandeburg@intel.com>,
	Dave Switzer <david.switzer@intel.com>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.19 24/28] i40e: fix endless loop under rtnl
Date: Mon, 11 Oct 2021 15:47:14 +0200
Message-ID: <20211011134641.498616131@linuxfoundation.org>
In-Reply-To: <20211011134640.711218469@linuxfoundation.org>

From: Jiri Benc <jbenc@redhat.com>

[ Upstream commit 857b6c6f665cca9828396d9743faf37fd09e9ac3 ]

The loop in i40e_get_capabilities can never end. The problem is that
although i40e_aq_discover_capabilities returns with an error if there's
a firmware problem, the returned error is not checked. There is a check for
pf->hw.aq.asq_last_status but that value is set to I40E_AQ_RC_OK on most
firmware problems.

When i40e_aq_discover_capabilities encounters a firmware problem, it will
encounter the same problem on its next invocation. As a result, the loop
becomes endless. We hit this with I40E_ERR_ADMIN_QUEUE_TIMEOUT but, looking
at the code, it can happen with a range of other firmware errors.

I don't know what the correct behavior should be: whether the firmware
should be retried a few times, or whether pf->hw.aq.asq_last_status should
always be set to the encountered firmware error (but then it would be
pointless and could simply be replaced by the i40e_aq_discover_capabilities
return value). However, the current behavior with an endless loop under the
rtnl mutex(!) is unacceptable and Intel has not submitted a fix, although we
explained the bug to them 7 months ago.

This may not be the best possible fix but it's better than hanging the whole
system on a firmware bug.

Fixes: 56a62fc86895 ("i40e: init code and hardware support")
Tested-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 246734be5177..8f7d3af75ed6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9068,7 +9068,7 @@ static int i40e_get_capabilities(struct i40e_pf *pf,
 		if (pf->hw.aq.asq_last_status == I40E_AQ_RC_ENOMEM) {
 			/* retry with a larger buffer */
 			buf_len = data_size;
-		} else if (pf->hw.aq.asq_last_status != I40E_AQ_RC_OK) {
+		} else if (pf->hw.aq.asq_last_status != I40E_AQ_RC_OK || err) {
 			dev_info(&pf->pdev->dev,
 				 "capability discovery failed, err %s aq_err %s\n",
 				 i40e_stat_str(&pf->hw, err),
-- 
2.33.0
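
For readers who want to see the control flow outside the driver, here is a
minimal user-space sketch of the retry loop described in the changelog. It is
not the driver code: fake_hw, fake_discover, run_loop, the FAKE_* values and
the iteration cap are all hypothetical stand-ins. Only the shape of the
if/else chain follows the hunk above, and the assumption that the loop
repeats while err is non-zero is taken from the changelog's description of
the loop, not from driver source shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for I40E_AQ_RC_OK / I40E_AQ_RC_ENOMEM. */
enum fake_aq_rc { FAKE_AQ_RC_OK = 0, FAKE_AQ_RC_ENOMEM = 1 };

/* Hypothetical error value; the real I40E_ERR_* numbers are not modeled. */
#define FAKE_ERR (-1)

enum loop_result { LOOP_DONE, LOOP_FAILED, LOOP_STUCK };

struct fake_hw {
	enum fake_aq_rc asq_last_status;
	bool fw_broken;	/* models a persistent firmware problem */
	int needed_len;	/* buffer size the "firmware" wants */
};

/* Models one discovery call: returns 0 on success, non-zero on error. */
static int fake_discover(struct fake_hw *hw, int buf_len, int *data_size)
{
	if (hw->fw_broken) {
		/* The call fails, yet the last status still reads OK. */
		hw->asq_last_status = FAKE_AQ_RC_OK;
		return FAKE_ERR;
	}
	if (buf_len < hw->needed_len) {
		hw->asq_last_status = FAKE_AQ_RC_ENOMEM;
		*data_size = hw->needed_len;
		return FAKE_ERR;
	}
	hw->asq_last_status = FAKE_AQ_RC_OK;
	return 0;
}

/*
 * check_err == false models the condition before the patch,
 * check_err == true models the patched condition ("... || err").
 * max_iters exists only so that the sketch terminates; LOOP_STUCK
 * stands for "would have looped forever".
 */
static enum loop_result run_loop(struct fake_hw *hw, bool check_err,
				 int max_iters)
{
	int buf_len = 4, data_size = 0, iters = 0, err;

	assert(max_iters > 0);
	do {
		if (iters++ >= max_iters)
			return LOOP_STUCK;

		err = fake_discover(hw, buf_len, &data_size);

		if (hw->asq_last_status == FAKE_AQ_RC_ENOMEM) {
			/* retry with a larger buffer */
			buf_len = data_size;
		} else if (hw->asq_last_status != FAKE_AQ_RC_OK ||
			   (check_err && err)) {
			return LOOP_FAILED;
		}
	} while (err);

	return LOOP_DONE;
}
```

Calling run_loop() with fw_broken set and check_err false runs into the
iteration cap, which is the endless loop from the changelog. With check_err
true the same input returns LOOP_FAILED on the first pass. A healthy fake_hw
with a needed_len larger than the initial buffer still takes the ENOMEM retry
path once and then finishes, so the extra check does not disturb the normal
resize-and-retry case in this model.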
