linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Pali Rohár" <pali@kernel.org>
To: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
	Rob Herring <robh@kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>
Cc: "Russell King" <rmk+kernel@armlinux.org.uk>,
	"Marek Behún" <kabel@kernel.org>,
	"Remi Pommarel" <repk@triplefau.lt>, Xogium <contact@xogium.me>,
	"Tomasz Maciej Nowak" <tmn505@gmail.com>,
	"Marc Zyngier" <maz@kernel.org>,
	linux-pci@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 01/42] PCI: aardvark: Fix kernel panic during PIO transfer
Date: Thu,  6 May 2021 17:31:12 +0200	[thread overview]
Message-ID: <20210506153153.30454-2-pali@kernel.org> (raw)
In-Reply-To: <20210506153153.30454-1-pali@kernel.org>

Trying to start a new PIO transfer by writing value 0 in PIO_START register
when previous transfer has not yet completed (which is indicated by value 1
in PIO_START) causes an External Abort on CPU, which results in kernel
panic:

    SError Interrupt on CPU0, code 0xbf000002 -- SError
    Kernel panic - not syncing: Asynchronous SError Interrupt

To prevent kernel panic, it is required to reject a new PIO transfer when
previous one has not finished yet.

If previous PIO transfer is not finished yet, the kernel may issue a new
PIO request only if the previous PIO transfer timed out.

In the past the root cause of this issue was incorrectly identified (as it
often happens during link retraining or after link down event) and special
hack was implemented in Trusted Firmware to catch all SError events in EL3,
to ignore errors with code 0xbf000002 and not forwarding any other errors
to kernel and instead throw panic from EL3 Trusted Firmware handler.

Links to discussion and patches about this issue:
https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50
https://lore.kernel.org/linux-pci/20190316161243.29517-1-repk@triplefau.lt/
https://lore.kernel.org/linux-pci/971be151d24312cc533989a64bd454b4@www.loen.fr/
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541

But the real cause was the fact that during link retraning or after link
down event the PIO transfer may take longer time, up to the 1.44s until it
times out. This increased probability that a new PIO transfer would be
issued by kernel while previous one has not finished yet.

After applying this change into the kernel, it is possible to revert the
mentioned TF-A hack and SError events do not have to be caught in TF-A EL3.

Signed-off-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Marek Behún <kabel@kernel.org>
Cc: stable@vger.kernel.org # 7fbcb5da811b ("PCI: aardvark: Don't rely on jiffies while holding spinlock")
---
 drivers/pci/controller/pci-aardvark.c | 49 ++++++++++++++++++++++-----
 1 file changed, 40 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 051b48bd7985..e3f5e7ab7606 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -514,7 +514,7 @@ static int advk_pcie_wait_pio(struct advk_pcie *pcie)
 		udelay(PIO_RETRY_DELAY);
 	}
 
-	dev_err(dev, "config read/write timed out\n");
+	dev_err(dev, "PIO read/write transfer time out\n");
 	return -ETIMEDOUT;
 }
 
@@ -657,6 +657,35 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
 	return true;
 }
 
+static bool advk_pcie_pio_is_running(struct advk_pcie *pcie)
+{
+	struct device *dev = &pcie->pdev->dev;
+
+	/*
+	 * Trying to start a new PIO transfer when previous has not completed
+	 * cause External Abort on CPU which results in kernel panic:
+	 *
+	 *     SError Interrupt on CPU0, code 0xbf000002 -- SError
+	 *     Kernel panic - not syncing: Asynchronous SError Interrupt
+	 *
+	 * Functions advk_pcie_rd_conf() and advk_pcie_wr_conf() are protected
+	 * by raw_spin_lock_irqsave() at pci_lock_config() level to prevent
+	 * concurrent calls at the same time. But because PIO transfer may take
+	 * about 1.5s when link is down or card is disconnected, it means that
+	 * advk_pcie_wait_pio() does not always have to wait for completion.
+	 *
+	 * Some versions of ARM Trusted Firmware handles this External Abort at
+	 * EL3 level and mask it to prevent kernel panic. Relevant TF-A commit:
+	 * https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50
+	 */
+	if (advk_readl(pcie, PIO_START)) {
+		dev_err(dev, "Previous PIO read/write transfer is still running\n");
+		return true;
+	}
+
+	return false;
+}
+
 static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
 			     int where, int size, u32 *val)
 {
@@ -673,9 +702,10 @@ static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
 		return pci_bridge_emul_conf_read(&pcie->bridge, where,
 						 size, val);
 
-	/* Start PIO */
-	advk_writel(pcie, 0, PIO_START);
-	advk_writel(pcie, 1, PIO_ISR);
+	if (advk_pcie_pio_is_running(pcie)) {
+		*val = 0xffffffff;
+		return PCIBIOS_SET_FAILED;
+	}
 
 	/* Program the control register */
 	reg = advk_readl(pcie, PIO_CTRL);
@@ -694,7 +724,8 @@ static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
 	/* Program the data strobe */
 	advk_writel(pcie, 0xf, PIO_WR_DATA_STRB);
 
-	/* Start the transfer */
+	/* Clear PIO DONE ISR and start the transfer */
+	advk_writel(pcie, 1, PIO_ISR);
 	advk_writel(pcie, 1, PIO_START);
 
 	ret = advk_pcie_wait_pio(pcie);
@@ -734,9 +765,8 @@ static int advk_pcie_wr_conf(struct pci_bus *bus, u32 devfn,
 	if (where % size)
 		return PCIBIOS_SET_FAILED;
 
-	/* Start PIO */
-	advk_writel(pcie, 0, PIO_START);
-	advk_writel(pcie, 1, PIO_ISR);
+	if (advk_pcie_pio_is_running(pcie))
+		return PCIBIOS_SET_FAILED;
 
 	/* Program the control register */
 	reg = advk_readl(pcie, PIO_CTRL);
@@ -763,7 +793,8 @@ static int advk_pcie_wr_conf(struct pci_bus *bus, u32 devfn,
 	/* Program the data strobe */
 	advk_writel(pcie, data_strobe, PIO_WR_DATA_STRB);
 
-	/* Start the transfer */
+	/* Clear PIO DONE ISR and start the transfer */
+	advk_writel(pcie, 1, PIO_ISR);
 	advk_writel(pcie, 1, PIO_START);
 
 	ret = advk_pcie_wait_pio(pcie);
-- 
2.20.1


  reply	other threads:[~2021-05-06 15:32 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-06 15:31 [PATCH 00/42] PCI: aardvark: Various driver fixes Pali Rohár
2021-05-06 15:31 ` Pali Rohár [this message]
2021-05-19  8:06   ` [PATCH 01/42] PCI: aardvark: Fix kernel panic during PIO transfer Pali Rohár
2021-05-06 15:31 ` [PATCH 02/42] PCI: aardvark: Fix checking for PIO Non-posted Request Pali Rohár
2021-05-06 15:31 ` [PATCH 03/42] PCI: aardvark: Fix checking for PIO status Pali Rohár
2021-05-06 15:31 ` [PATCH 04/42] PCI: aardvark: Increase polling delay to 1.5s while waiting for PIO response Pali Rohár
2021-05-06 15:31 ` [PATCH 05/42] PCI: pci-bridge-emul: Add PCIe Root Capabilities Register Pali Rohár
2021-05-06 23:10   ` Bjorn Helgaas
2021-05-07 14:40     ` Pali Rohár
2021-05-07 16:41       ` Bjorn Helgaas
2021-05-06 15:31 ` [PATCH 06/42] PCI: aardvark: Fix reporting CRS Software Visibility on emulated bridge Pali Rohár
2021-05-07 13:03   ` Bjorn Helgaas
2021-05-07 15:25     ` Pali Rohár
2021-05-07 15:33     ` Pali Rohár
2021-05-06 15:31 ` [PATCH 07/42] PCI: aardvark: Fix link training Pali Rohár
2021-05-06 15:31 ` [PATCH 08/42] PCI: Add PCI_EXP_DEVCTL_PAYLOAD_* macros Pali Rohár
2021-05-06 15:31 ` [PATCH 09/42] PCI: aardvark: Fix PCIe Max Payload Size setting Pali Rohár
2021-05-06 15:31 ` [PATCH 10/42] PCI: aardvark: Implement workaround for the readback value of VEND_ID Pali Rohár
2021-05-06 15:31 ` [PATCH 11/42] PCI: aardvark: Do not touch status bits of masked interrupts in interrupt handler Pali Rohár
2021-05-06 15:31 ` [PATCH 12/42] PCI: aardvark: Check for virq mapping when processing INTx IRQ Pali Rohár
2021-05-07  9:15   ` Marc Zyngier
2021-06-04 16:24     ` Pali Rohár
2021-06-04 16:29       ` Marc Zyngier
2021-05-06 15:31 ` [PATCH 13/42] PCI: aardvark: Remove irq_mask_ack callback for INTx interrupts Pali Rohár
2021-05-07  9:16   ` Marc Zyngier
2021-05-06 15:31 ` [PATCH 14/42] PCI: aardvark: Don't mask irq when mapping Pali Rohár
2021-05-07  9:20   ` Marc Zyngier
2021-05-07  9:27     ` Pali Rohár
2021-05-06 15:31 ` [PATCH 15/42] PCI: aardvark: Change name of INTx irq_chip to advk-INT Pali Rohár
2021-05-07  9:08   ` Marc Zyngier
2021-05-24 14:36     ` Marek Behún
2021-05-24 15:14       ` Marc Zyngier
2021-05-06 15:31 ` [PATCH 16/42] PCI: aardvark: Remove unneeded goto Pali Rohár
2021-05-06 15:31 ` [PATCH 17/42] PCI: aardvark: Fix support for MSI interrupts Pali Rohár
2021-05-07 10:16   ` Marc Zyngier
2021-05-07 14:44     ` Pali Rohár
2021-05-07 16:24       ` Marc Zyngier
2021-06-04 16:02         ` Pali Rohár
2021-06-04 16:22           ` Marc Zyngier
2021-05-06 15:31 ` [PATCH 18/42] PCI: aardvark: Correctly clear and unmask all " Pali Rohár
2021-05-07 10:19   ` Marc Zyngier
2021-05-07 10:21     ` Pali Rohár
2021-05-06 15:31 ` [PATCH 19/42] PCI: aardvark: Fix setting MSI address Pali Rohár
2021-05-07 10:25   ` Marc Zyngier
2021-05-06 15:31 ` [PATCH 20/42] PCI: aardvark: Add support for more than 32 MSI interrupts Pali Rohár
2021-07-02 21:35   ` Pali Rohár
2021-05-06 15:31 ` [PATCH 21/42] PCI: aardvark: Add support for masking " Pali Rohár
2021-05-06 15:31 ` [PATCH 22/42] PCI: aardvark: Enable MSI-X support Pali Rohár
2021-05-06 15:31 ` [PATCH 23/42] PCI: aardvark: Fix support for ERR interrupt on emulated bridge Pali Rohár
2021-05-06 15:31 ` [PATCH 24/42] PCI: aardvark: Fix support for PME " Pali Rohár
2021-05-06 15:31 ` [PATCH 25/42] PCI: aardvark: Fix support for PME requester " Pali Rohár
2021-05-06 15:31 ` [PATCH 26/42] PCI: aardvark: Fix support for bus mastering and PCI_COMMAND " Pali Rohár
2021-05-06 15:31 ` [PATCH 27/42] PCI: aardvark: Disable bus mastering and mask all interrupts when unbinding driver Pali Rohár
2021-05-06 15:31 ` [PATCH 28/42] PCI: aardvark: Free config space for emulated root bridge when unbinding driver to fix memory leak Pali Rohár
2021-05-06 15:31 ` [PATCH 29/42] PCI: aardvark: Reset PCIe card and disable PHY when unbinding driver Pali Rohár
2021-05-06 15:31 ` [PATCH 30/42] PCI: aardvark: Rewrite irq code to chained irq handler Pali Rohár
2021-05-06 15:31 ` [PATCH 31/42] PCI: aardvark: Use separate INTA interrupt for emulated root bridge Pali Rohár
2021-05-06 15:31 ` [PATCH 32/42] PCI: pci-bridge-emul: Add description for class_revision field Pali Rohár
2021-05-06 15:31 ` [PATCH 33/42] PCI: pci-bridge-emul: Add definitions for missing capabilities registers Pali Rohár
2021-05-06 15:31 ` [PATCH 34/42] PCI: aardvark: Add support for DEVCAP2, DEVCTL2, LNKCAP2 and LNKCTL2 registers on emulated bridge Pali Rohár
2021-05-06 15:31 ` [PATCH 35/42] PCI: aardvark: Add support for PCI_BRIDGE_CTL_BUS_RESET " Pali Rohár
2021-05-06 15:31 ` [PATCH 36/42] PCI: aardvark: Replace custom PCIE_CORE_ERR_CAPCTL_* macros by linux/pci_regs.h macros Pali Rohár
2021-05-06 15:31 ` [PATCH 37/42] PCI: aardvark: Replace custom PCIE_CORE_INT_* macros by linux PCI_INTERRUPT_* values Pali Rohár
2021-05-06 15:31 ` [PATCH 38/42] PCI: aardvark: Cleanup some register macros Pali Rohár
2021-05-06 15:31 ` [PATCH 39/42] PCI: aardvark: Add comments for OB_WIN_ENABLE and ADDR_WIN_DISABLE Pali Rohár
2021-05-06 15:31 ` [PATCH 40/42] PCI: pci-bridge-emul: re-arrange register tests Pali Rohár
2021-05-06 15:31 ` [PATCH 41/42] PCI: pci-bridge-emul: add support for PCIe extended capabilities Pali Rohár
2021-05-06 15:31 ` [PATCH 42/42] PCI: aardvark: Add support for Advanced Error Reporting registers on emulated bridge Pali Rohár
2021-06-03 15:16 ` [PATCH 00/42] PCI: aardvark: Various driver fixes Lorenzo Pieralisi
2021-06-03 17:02   ` Pali Rohár
2021-06-03 18:02   ` Simon Glass
2021-06-03 18:18     ` Pali Rohár
2021-06-04 14:05     ` Lorenzo Pieralisi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210506153153.30454-2-pali@kernel.org \
    --to=pali@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=contact@xogium.me \
    --cc=kabel@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=maz@kernel.org \
    --cc=repk@triplefau.lt \
    --cc=rmk+kernel@armlinux.org.uk \
    --cc=robh@kernel.org \
    --cc=thomas.petazzoni@bootlin.com \
    --cc=tmn505@gmail.com \
    --subject='Re: [PATCH 01/42] PCI: aardvark: Fix kernel panic during PIO transfer' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).