linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Pali Rohár" <pali@kernel.org>
To: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
	Rob Herring <robh@kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>
Cc: "Russell King" <rmk+kernel@armlinux.org.uk>,
	"Marek Behún" <kabel@kernel.org>,
	"Remi Pommarel" <repk@triplefau.lt>, Xogium <contact@xogium.me>,
	"Tomasz Maciej Nowak" <tmn505@gmail.com>,
	"Marc Zyngier" <maz@kernel.org>,
	linux-pci@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 01/42] PCI: aardvark: Fix kernel panic during PIO transfer
Date: Wed, 19 May 2021 10:06:27 +0200	[thread overview]
Message-ID: <20210519080627.sfs74d2zxopkarfl@pali> (raw)
In-Reply-To: <20210506153153.30454-2-pali@kernel.org>

On Thursday 06 May 2021 17:31:12 Pali Rohár wrote:
> Trying to start a new PIO transfer by writing value 0 in PIO_START register
> when previous transfer has not yet completed (which is indicated by value 1
> in PIO_START) causes an External Abort on CPU, which results in kernel
> panic:
> 
>     SError Interrupt on CPU0, code 0xbf000002 -- SError
>     Kernel panic - not syncing: Asynchronous SError Interrupt
> 
> To prevent kernel panic, it is required to reject a new PIO transfer when
> previous one has not finished yet.
> 
> If previous PIO transfer is not finished yet, the kernel may issue a new
> PIO request only if the previous PIO transfer timed out.
> 
> In the past the root cause of this issue was incorrectly identified (as it
> often happens during link retraining or after link down event) and special
> hack was implemented in Trusted Firmware to catch all SError events in EL3,
> to ignore errors with code 0xbf000002 and not forwarding any other errors
> to kernel and instead throw panic from EL3 Trusted Firmware handler.
> 
> Links to discussion and patches about this issue:
> https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50
> https://lore.kernel.org/linux-pci/20190316161243.29517-1-repk@triplefau.lt/
> https://lore.kernel.org/linux-pci/971be151d24312cc533989a64bd454b4@www.loen.fr/
> https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541
> 
> But the real cause was the fact that during link retraning or after link
> down event the PIO transfer may take longer time, up to the 1.44s until it
> times out. This increased probability that a new PIO transfer would be
> issued by kernel while previous one has not finished yet.
> 
> After applying this change into the kernel, it is possible to revert the
> mentioned TF-A hack and SError events do not have to be caught in TF-A EL3.
> 
> Signed-off-by: Pali Rohár <pali@kernel.org>
> Reviewed-by: Marek Behún <kabel@kernel.org>
> Cc: stable@vger.kernel.org # 7fbcb5da811b ("PCI: aardvark: Don't rely on jiffies while holding spinlock")

Hello! Could you please review at least this patch? It is fixing kernel
panic and to prevent future kernel crashes I would really suggest to
merge this one patch into 5.13 queue.

> ---
>  drivers/pci/controller/pci-aardvark.c | 49 ++++++++++++++++++++++-----
>  1 file changed, 40 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> index 051b48bd7985..e3f5e7ab7606 100644
> --- a/drivers/pci/controller/pci-aardvark.c
> +++ b/drivers/pci/controller/pci-aardvark.c
> @@ -514,7 +514,7 @@ static int advk_pcie_wait_pio(struct advk_pcie *pcie)
>  		udelay(PIO_RETRY_DELAY);
>  	}
>  
> -	dev_err(dev, "config read/write timed out\n");
> +	dev_err(dev, "PIO read/write transfer time out\n");
>  	return -ETIMEDOUT;
>  }
>  
> @@ -657,6 +657,35 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
>  	return true;
>  }
>  
> +static bool advk_pcie_pio_is_running(struct advk_pcie *pcie)
> +{
> +	struct device *dev = &pcie->pdev->dev;
> +
> +	/*
> +	 * Trying to start a new PIO transfer when previous has not completed
> +	 * cause External Abort on CPU which results in kernel panic:
> +	 *
> +	 *     SError Interrupt on CPU0, code 0xbf000002 -- SError
> +	 *     Kernel panic - not syncing: Asynchronous SError Interrupt
> +	 *
> +	 * Functions advk_pcie_rd_conf() and advk_pcie_wr_conf() are protected
> +	 * by raw_spin_lock_irqsave() at pci_lock_config() level to prevent
> +	 * concurrent calls at the same time. But because PIO transfer may take
> +	 * about 1.5s when link is down or card is disconnected, it means that
> +	 * advk_pcie_wait_pio() does not always have to wait for completion.
> +	 *
> +	 * Some versions of ARM Trusted Firmware handles this External Abort at
> +	 * EL3 level and mask it to prevent kernel panic. Relevant TF-A commit:
> +	 * https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50
> +	 */
> +	if (advk_readl(pcie, PIO_START)) {
> +		dev_err(dev, "Previous PIO read/write transfer is still running\n");
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>  static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
>  			     int where, int size, u32 *val)
>  {
> @@ -673,9 +702,10 @@ static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
>  		return pci_bridge_emul_conf_read(&pcie->bridge, where,
>  						 size, val);
>  
> -	/* Start PIO */
> -	advk_writel(pcie, 0, PIO_START);
> -	advk_writel(pcie, 1, PIO_ISR);
> +	if (advk_pcie_pio_is_running(pcie)) {
> +		*val = 0xffffffff;
> +		return PCIBIOS_SET_FAILED;
> +	}
>  
>  	/* Program the control register */
>  	reg = advk_readl(pcie, PIO_CTRL);
> @@ -694,7 +724,8 @@ static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
>  	/* Program the data strobe */
>  	advk_writel(pcie, 0xf, PIO_WR_DATA_STRB);
>  
> -	/* Start the transfer */
> +	/* Clear PIO DONE ISR and start the transfer */
> +	advk_writel(pcie, 1, PIO_ISR);
>  	advk_writel(pcie, 1, PIO_START);
>  
>  	ret = advk_pcie_wait_pio(pcie);
> @@ -734,9 +765,8 @@ static int advk_pcie_wr_conf(struct pci_bus *bus, u32 devfn,
>  	if (where % size)
>  		return PCIBIOS_SET_FAILED;
>  
> -	/* Start PIO */
> -	advk_writel(pcie, 0, PIO_START);
> -	advk_writel(pcie, 1, PIO_ISR);
> +	if (advk_pcie_pio_is_running(pcie))
> +		return PCIBIOS_SET_FAILED;
>  
>  	/* Program the control register */
>  	reg = advk_readl(pcie, PIO_CTRL);
> @@ -763,7 +793,8 @@ static int advk_pcie_wr_conf(struct pci_bus *bus, u32 devfn,
>  	/* Program the data strobe */
>  	advk_writel(pcie, data_strobe, PIO_WR_DATA_STRB);
>  
> -	/* Start the transfer */
> +	/* Clear PIO DONE ISR and start the transfer */
> +	advk_writel(pcie, 1, PIO_ISR);
>  	advk_writel(pcie, 1, PIO_START);
>  
>  	ret = advk_pcie_wait_pio(pcie);
> -- 
> 2.20.1
> 

  reply	other threads:[~2021-05-19  8:06 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-06 15:31 [PATCH 00/42] PCI: aardvark: Various driver fixes Pali Rohár
2021-05-06 15:31 ` [PATCH 01/42] PCI: aardvark: Fix kernel panic during PIO transfer Pali Rohár
2021-05-19  8:06   ` Pali Rohár [this message]
2021-05-06 15:31 ` [PATCH 02/42] PCI: aardvark: Fix checking for PIO Non-posted Request Pali Rohár
2021-05-06 15:31 ` [PATCH 03/42] PCI: aardvark: Fix checking for PIO status Pali Rohár
2021-05-06 15:31 ` [PATCH 04/42] PCI: aardvark: Increase polling delay to 1.5s while waiting for PIO response Pali Rohár
2021-05-06 15:31 ` [PATCH 05/42] PCI: pci-bridge-emul: Add PCIe Root Capabilities Register Pali Rohár
2021-05-06 23:10   ` Bjorn Helgaas
2021-05-07 14:40     ` Pali Rohár
2021-05-07 16:41       ` Bjorn Helgaas
2021-05-06 15:31 ` [PATCH 06/42] PCI: aardvark: Fix reporting CRS Software Visibility on emulated bridge Pali Rohár
2021-05-07 13:03   ` Bjorn Helgaas
2021-05-07 15:25     ` Pali Rohár
2021-05-07 15:33     ` Pali Rohár
2021-05-06 15:31 ` [PATCH 07/42] PCI: aardvark: Fix link training Pali Rohár
2021-05-06 15:31 ` [PATCH 08/42] PCI: Add PCI_EXP_DEVCTL_PAYLOAD_* macros Pali Rohár
2021-05-06 15:31 ` [PATCH 09/42] PCI: aardvark: Fix PCIe Max Payload Size setting Pali Rohár
2021-05-06 15:31 ` [PATCH 10/42] PCI: aardvark: Implement workaround for the readback value of VEND_ID Pali Rohár
2021-05-06 15:31 ` [PATCH 11/42] PCI: aardvark: Do not touch status bits of masked interrupts in interrupt handler Pali Rohár
2021-05-06 15:31 ` [PATCH 12/42] PCI: aardvark: Check for virq mapping when processing INTx IRQ Pali Rohár
2021-05-07  9:15   ` Marc Zyngier
2021-06-04 16:24     ` Pali Rohár
2021-06-04 16:29       ` Marc Zyngier
2021-05-06 15:31 ` [PATCH 13/42] PCI: aardvark: Remove irq_mask_ack callback for INTx interrupts Pali Rohár
2021-05-07  9:16   ` Marc Zyngier
2021-05-06 15:31 ` [PATCH 14/42] PCI: aardvark: Don't mask irq when mapping Pali Rohár
2021-05-07  9:20   ` Marc Zyngier
2021-05-07  9:27     ` Pali Rohár
2021-05-06 15:31 ` [PATCH 15/42] PCI: aardvark: Change name of INTx irq_chip to advk-INT Pali Rohár
2021-05-07  9:08   ` Marc Zyngier
2021-05-24 14:36     ` Marek Behún
2021-05-24 15:14       ` Marc Zyngier
2021-05-06 15:31 ` [PATCH 16/42] PCI: aardvark: Remove unneeded goto Pali Rohár
2021-05-06 15:31 ` [PATCH 17/42] PCI: aardvark: Fix support for MSI interrupts Pali Rohár
2021-05-07 10:16   ` Marc Zyngier
2021-05-07 14:44     ` Pali Rohár
2021-05-07 16:24       ` Marc Zyngier
2021-06-04 16:02         ` Pali Rohár
2021-06-04 16:22           ` Marc Zyngier
2021-05-06 15:31 ` [PATCH 18/42] PCI: aardvark: Correctly clear and unmask all " Pali Rohár
2021-05-07 10:19   ` Marc Zyngier
2021-05-07 10:21     ` Pali Rohár
2021-05-06 15:31 ` [PATCH 19/42] PCI: aardvark: Fix setting MSI address Pali Rohár
2021-05-07 10:25   ` Marc Zyngier
2021-05-06 15:31 ` [PATCH 20/42] PCI: aardvark: Add support for more than 32 MSI interrupts Pali Rohár
2021-07-02 21:35   ` Pali Rohár
2021-05-06 15:31 ` [PATCH 21/42] PCI: aardvark: Add support for masking " Pali Rohár
2021-05-06 15:31 ` [PATCH 22/42] PCI: aardvark: Enable MSI-X support Pali Rohár
2021-05-06 15:31 ` [PATCH 23/42] PCI: aardvark: Fix support for ERR interrupt on emulated bridge Pali Rohár
2021-05-06 15:31 ` [PATCH 24/42] PCI: aardvark: Fix support for PME " Pali Rohár
2021-05-06 15:31 ` [PATCH 25/42] PCI: aardvark: Fix support for PME requester " Pali Rohár
2021-05-06 15:31 ` [PATCH 26/42] PCI: aardvark: Fix support for bus mastering and PCI_COMMAND " Pali Rohár
2021-05-06 15:31 ` [PATCH 27/42] PCI: aardvark: Disable bus mastering and mask all interrupts when unbinding driver Pali Rohár
2021-05-06 15:31 ` [PATCH 28/42] PCI: aardvark: Free config space for emulated root bridge when unbinding driver to fix memory leak Pali Rohár
2021-05-06 15:31 ` [PATCH 29/42] PCI: aardvark: Reset PCIe card and disable PHY when unbinding driver Pali Rohár
2021-05-06 15:31 ` [PATCH 30/42] PCI: aardvark: Rewrite irq code to chained irq handler Pali Rohár
2021-05-06 15:31 ` [PATCH 31/42] PCI: aardvark: Use separate INTA interrupt for emulated root bridge Pali Rohár
2021-05-06 15:31 ` [PATCH 32/42] PCI: pci-bridge-emul: Add description for class_revision field Pali Rohár
2021-05-06 15:31 ` [PATCH 33/42] PCI: pci-bridge-emul: Add definitions for missing capabilities registers Pali Rohár
2021-05-06 15:31 ` [PATCH 34/42] PCI: aardvark: Add support for DEVCAP2, DEVCTL2, LNKCAP2 and LNKCTL2 registers on emulated bridge Pali Rohár
2021-05-06 15:31 ` [PATCH 35/42] PCI: aardvark: Add support for PCI_BRIDGE_CTL_BUS_RESET " Pali Rohár
2021-05-06 15:31 ` [PATCH 36/42] PCI: aardvark: Replace custom PCIE_CORE_ERR_CAPCTL_* macros by linux/pci_regs.h macros Pali Rohár
2021-05-06 15:31 ` [PATCH 37/42] PCI: aardvark: Replace custom PCIE_CORE_INT_* macros by linux PCI_INTERRUPT_* values Pali Rohár
2021-05-06 15:31 ` [PATCH 38/42] PCI: aardvark: Cleanup some register macros Pali Rohár
2021-05-06 15:31 ` [PATCH 39/42] PCI: aardvark: Add comments for OB_WIN_ENABLE and ADDR_WIN_DISABLE Pali Rohár
2021-05-06 15:31 ` [PATCH 40/42] PCI: pci-bridge-emul: re-arrange register tests Pali Rohár
2021-05-06 15:31 ` [PATCH 41/42] PCI: pci-bridge-emul: add support for PCIe extended capabilities Pali Rohár
2021-05-06 15:31 ` [PATCH 42/42] PCI: aardvark: Add support for Advanced Error Reporting registers on emulated bridge Pali Rohár
2021-06-03 15:16 ` [PATCH 00/42] PCI: aardvark: Various driver fixes Lorenzo Pieralisi
2021-06-03 17:02   ` Pali Rohár
2021-06-03 18:02   ` Simon Glass
2021-06-03 18:18     ` Pali Rohár
2021-06-04 14:05     ` Lorenzo Pieralisi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210519080627.sfs74d2zxopkarfl@pali \
    --to=pali@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=contact@xogium.me \
    --cc=kabel@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=maz@kernel.org \
    --cc=repk@triplefau.lt \
    --cc=rmk+kernel@armlinux.org.uk \
    --cc=robh@kernel.org \
    --cc=thomas.petazzoni@bootlin.com \
    --cc=tmn505@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).