linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Abdul Haleem <abdhalee@linux.vnet.ibm.com>,
	"Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com>,
	Shahed Shaikh <Shahed.Shaikh@cavium.com>,
	"David S. Miller" <davem@davemloft.net>,
	Sasha Levin <alexander.levin@microsoft.com>
Subject: [PATCH 4.4 15/34] bnx2x: Improve reliability in case of nested PCI errors
Date: Fri,  2 Mar 2018 09:51:11 +0100	[thread overview]
Message-ID: <20180302084436.942538303@linuxfoundation.org> (raw)
In-Reply-To: <20180302084435.842679610@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com>


[ Upstream commit f7084059a9cb9e56a186e1677b1dcffd76c2cd24 ]

While in recovery process of PCI error (called EEH on PowerPC arch),
another PCI transaction could be corrupted causing a situation of
nested PCI errors. Also, this scenario could be reproduced with
error injection mechanisms (for debug purposes).

We observe that in case of nested PCI errors, bnx2x might attempt to
initialize its shmem and cause a kernel crash due to bad addresses
read from MCP. Multiple different stack traces were observed depending
on the point the second PCI error happens.

This patch avoids the crashes by:

 * failing PCI recovery in case of nested errors (since multiple
 PCI errors in a row are not expected to lead to a functional
 adapter anyway), and by,

 * preventing access to adapter FW when MCP is failed (we mark it as
 failed when shmem cannot get initialized properly).

Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Acked-by: Shahed Shaikh <Shahed.Shaikh@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |    4 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |   14 +++++++++++++-
 2 files changed, 15 insertions(+), 3 deletions(-)

--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -3052,7 +3052,7 @@ int bnx2x_nic_unload(struct bnx2x *bp, i
 
 	del_timer_sync(&bp->timer);
 
-	if (IS_PF(bp)) {
+	if (IS_PF(bp) && !BP_NOMCP(bp)) {
 		/* Set ALWAYS_ALIVE bit in shmem */
 		bp->fw_drv_pulse_wr_seq |= DRV_PULSE_ALWAYS_ALIVE;
 		bnx2x_drv_pulse(bp);
@@ -3134,7 +3134,7 @@ int bnx2x_nic_unload(struct bnx2x *bp, i
 	bp->cnic_loaded = false;
 
 	/* Clear driver version indication in shmem */
-	if (IS_PF(bp))
+	if (IS_PF(bp) && !BP_NOMCP(bp))
 		bnx2x_update_mng_version(bp);
 
 	/* Check if there are pending parity attentions. If there are - set
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9570,6 +9570,15 @@ static int bnx2x_init_shmem(struct bnx2x
 
 	do {
 		bp->common.shmem_base = REG_RD(bp, MISC_REG_SHARED_MEM_ADDR);
+
+		/* If we read all 0xFFs, means we are in PCI error state and
+		 * should bail out to avoid crashes on adapter's FW reads.
+		 */
+		if (bp->common.shmem_base == 0xFFFFFFFF) {
+			bp->flags |= NO_MCP_FLAG;
+			return -ENODEV;
+		}
+
 		if (bp->common.shmem_base) {
 			val = SHMEM_RD(bp, validity_map[BP_PORT(bp)]);
 			if (val & SHR_MEM_VALIDITY_MB)
@@ -14214,7 +14223,10 @@ static pci_ers_result_t bnx2x_io_slot_re
 		BNX2X_ERR("IO slot reset --> driver unload\n");
 
 		/* MCP should have been reset; Need to wait for validity */
-		bnx2x_init_shmem(bp);
+		if (bnx2x_init_shmem(bp)) {
+			rtnl_unlock();
+			return PCI_ERS_RESULT_DISCONNECT;
+		}
 
 		if (IS_PF(bp) && SHMEM2_HAS(bp, drv_capabilities_flag)) {
 			u32 v;

  parent reply	other threads:[~2018-03-02  8:54 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-02  8:50 [PATCH 4.4 00/34] 4.4.120-stable review Greg Kroah-Hartman
2018-03-02  8:50 ` [PATCH 4.4 01/34] hrtimer: Ensure POSIX compliance (relative CLOCK_REALTIME hrtimers) Greg Kroah-Hartman
2018-03-02  8:50 ` [PATCH 4.4 02/34] f2fs: fix a bug caused by NULL extent tree Greg Kroah-Hartman
2018-03-02  8:50 ` [PATCH 4.4 03/34] mtd: nand: gpmi: Fix failure when a erased page has a bitflip at BBM Greg Kroah-Hartman
2018-03-06 21:22   ` Ben Hutchings
2018-03-07  8:12     ` Boris Brezillon
2018-03-07  8:19       ` Richard Weinberger
2018-03-07 14:58       ` Sasha Levin
2018-03-02  8:51 ` [PATCH 4.4 04/34] ipv6: icmp6: Allow icmp messages to be looped back Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 05/34] ARM: 8731/1: Fix csum_partial_copy_from_user() stack mismatch Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 06/34] sget(): handle failures of register_shrinker() Greg Kroah-Hartman
2018-03-07  2:16   ` Ben Hutchings
2018-03-02  8:51 ` [PATCH 4.4 07/34] drm/nouveau/pci: do a msi rearm on init Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 08/34] spi: atmel: fixed spin_lock usage inside atmel_spi_remove Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 09/34] net: arc_emac: fix arc_emac_rx() error paths Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 10/34] scsi: storvsc: Fix scsi_cmd error assignments in storvsc_handle_error Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 11/34] ARM: dts: ls1021a: fix incorrect clock references Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 12/34] lib/mpi: Fix umul_ppmm() for MIPS64r6 Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 13/34] tg3: Add workaround to restrict 5762 MRRS to 2048 Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 14/34] tg3: Enable PHY reset in MTU change path for 5720 Greg Kroah-Hartman
2018-03-02  8:51 ` Greg Kroah-Hartman [this message]
2018-03-02  8:51 ` [PATCH 4.4 16/34] led: core: Fix brightness setting when setting delay_off=0 Greg Kroah-Hartman
2018-03-07 15:30   ` Ben Hutchings
2018-03-07 15:32   ` Ben Hutchings
2018-03-07 20:39     ` Jacek Anaszewski
2018-03-08 17:24       ` Greg Kroah-Hartman
2018-03-08 18:04         ` Pavel Machek
2018-03-08 20:48         ` Jacek Anaszewski
2018-03-09  1:09           ` Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 17/34] s390/dasd: fix wrongly assigned configuration data Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 18/34] IB/mlx4: Fix mlx4_ib_alloc_mr error flow Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 19/34] IB/ipoib: Fix race condition in neigh creation Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 20/34] xfs: quota: fix missed destroy of qi_tree_lock Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 21/34] xfs: quota: check result of register_shrinker() Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 22/34] e1000: fix disabling already-disabled warning Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 23/34] drm/ttm: check the return value of kzalloc Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 24/34] mac80211: mesh: drop frames appearing to be from us Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 25/34] can: flex_can: Correct the checking for frame length in flexcan_start_xmit() Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 26/34] bnxt_en: Fix the Invalid VF id check in bnxt_vf_ndo_prep routine Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 27/34] xen-netfront: enable device after manual module load Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 28/34] mdio-sun4i: Fix a memory leak Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 29/34] SolutionEngine771x: fix Ether platform data Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 30/34] xen/gntdev: Fix off-by-one error when unmapping with holes Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 31/34] xen/gntdev: Fix partial gntdev_mmap() cleanup Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 32/34] sctp: make use of pre-calculated len Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 33/34] net: gianfar_ptp: move set_fipers() to spinlock protecting area Greg Kroah-Hartman
2018-03-02  8:51 ` [PATCH 4.4 34/34] MIPS: Implement __multi3 for GCC7 MIPS64r6 builds Greg Kroah-Hartman
2018-03-02 12:22 ` [PATCH 4.4 00/34] 4.4.120-stable review Nathan Chancellor
2018-03-02 18:53   ` Greg Kroah-Hartman
2018-03-02 13:58 ` kernelci.org bot
2018-03-02 16:59 ` Naresh Kamboju
2018-03-02 17:16 ` Guenter Roeck
2018-03-02 21:30 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180302084436.942538303@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=Shahed.Shaikh@cavium.com \
    --cc=abdhalee@linux.vnet.ibm.com \
    --cc=alexander.levin@microsoft.com \
    --cc=davem@davemloft.net \
    --cc=gpiccoli@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).