linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Gabriele Paoloni <gabriele.paoloni@huawei.com>,
	Dongdong Liu <liudongdong3@huawei.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Sasha Levin <alexander.levin@verizon.com>
Subject: [PATCH 3.18 35/38] PCI/AER: Report non-fatal errors only to the affected endpoint
Date: Fri, 22 Dec 2017 09:46:48 +0100	[thread overview]
Message-ID: <20171222084453.567376756@linuxfoundation.org> (raw)
In-Reply-To: <20171222084451.623495387@linuxfoundation.org>

3.18-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Gabriele Paoloni <gabriele.paoloni@huawei.com>


[ Upstream commit 86acc790717fb60fb51ea3095084e331d8711c74 ]

Previously, if an non-fatal error was reported by an endpoint, we
called report_error_detected() for the endpoint, every sibling on the
bus, and their descendents.  If any of them did not implement the
.error_detected() method, do_recovery() failed, leaving all these
devices unrecovered.

For example, the system described in the bugzilla below has two devices:

  0000:74:02.0 [19e5:a230] SAS controller, driver has .error_detected()
  0000:74:03.0 [19e5:a235] SATA controller, driver lacks .error_detected()

When a device such as 74:02.0 reported a non-fatal error, do_recovery()
failed because 74:03.0 lacked an .error_detected() method.  But per PCIe
r3.1, sec 6.2.2.2.2, such an error does not compromise the Link and
does not affect 74:03.0:

  Non-fatal errors are uncorrectable errors which cause a particular
  transaction to be unreliable but the Link is otherwise fully functional.
  Isolating Non-fatal from Fatal errors provides Requester/Receiver logic
  in a device or system management software the opportunity to recover from
  the error without resetting the components on the Link and disturbing
  other transactions in progress.  Devices not associated with the
  transaction in error are not impacted by the error.

Report non-fatal errors only to the endpoint that reported them.  We really
want to check for AER_NONFATAL here, but the current code structure doesn't
allow that.  Looking for pci_channel_io_normal is the best we can do now.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=197055
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com>
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/pci/pcie/aer/aerdrv_core.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -360,7 +360,14 @@ static pci_ers_result_t broadcast_error_
 		 * If the error is reported by an end point, we think this
 		 * error is related to the upstream link of the end point.
 		 */
-		pci_walk_bus(dev->bus, cb, &result_data);
+		if (state == pci_channel_io_normal)
+			/*
+			 * the error is non fatal so the bus is ok, just invoke
+			 * the callback for the function that logged the error.
+			 */
+			cb(dev, &result_data);
+		else
+			pci_walk_bus(dev->bus, cb, &result_data);
 	}
 
 	return result_data.result;

  parent reply	other threads:[~2017-12-22  8:49 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-22  8:46 [PATCH 3.18 00/38] 3.18.90-stable review Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 01/38] arm64: Initialise high_memory global variable earlier Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 02/38] ALSA: hda - add support for docking station for HP 820 G2 Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 03/38] cpuidle: Validate cpu_dev in cpuidle_add_sysfs() Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 04/38] r8152: fix the list rx_done may be used without initialization Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 05/38] crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 07/38] usb: gadget: f_uvc: Sanity check wMaxPacketSize for SuperSpeed Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 08/38] usb: gadget: udc: remove pointer dereference after free Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 09/38] netfilter: nfnl_cthelper: fix runtime expectation policy updates Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 10/38] netfilter: nfnl_cthelper: Fix memory leak Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 11/38] scsi: lpfc: Fix PT2PT PRLI reject Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 14/38] hwmon: (asus_atk0110) fix uninitialized data access Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 15/38] i2c: mux: pca954x: Add missing pca9546 definition to chip_desc Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 16/38] HID: xinmo: fix for out of range for THT 2P arcade controller Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 17/38] s390/qeth: no ETH header for outbound AF_IUCV Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 18/38] net: Do not allow negative values for busy_read and busy_poll sysctl interfaces Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 19/38] i40e: Do not enable NAPI on q_vectors that have no rings Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 20/38] irda: vlsi_ir: fix check for DMA mapping errors Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 21/38] netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash table Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 22/38] netfilter: nf_nat_snmp: Fix panic when snmp_trap_helper fails to register Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 23/38] ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 24/38] isdn: kcapi: avoid uninitialized data Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 25/38] xhci: plat: Register shutdown for xhci_plat Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 26/38] ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 27/38] cpuidle: powernv: Pass correct drv->cpumask for registration Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 28/38] backlight: pwm_bl: Fix overflow condition Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 29/38] crypto: crypto4xx - increase context and scatter ring buffer elements Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 30/38] net: phy: at803x: Change error to EINVAL for invalid MAC Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 31/38] PCI: Avoid bus reset if bridge itself is broken Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 32/38] scsi: cxgb4i: fix Tx skb leak Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 33/38] PCI: Create SR-IOV virtfn/physfn links before attaching driver Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 34/38] igb: check memory allocation failure Greg Kroah-Hartman
2017-12-22  8:46 ` Greg Kroah-Hartman [this message]
2017-12-22  8:46 ` [PATCH 3.18 36/38] scsi: lpfc: Fix secure firmware updates Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 37/38] scsi: lpfc: PLOGI failures during NPIV testing Greg Kroah-Hartman
2017-12-22  8:46 ` [PATCH 3.18 38/38] fm10k: ensure we process SM mbx when processing VF mbx Greg Kroah-Hartman
2017-12-22 17:53 ` [PATCH 3.18 00/38] 3.18.90-stable review Guenter Roeck
2017-12-23  9:18   ` Greg Kroah-Hartman
2017-12-22 21:12 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171222084453.567376756@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=alexander.levin@verizon.com \
    --cc=bhelgaas@google.com \
    --cc=gabriele.paoloni@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liudongdong3@huawei.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).