Date: Sat, 17 Jul 2021 12:33:34 -0500
From: Bjorn Helgaas
To: marek.vasut@gmail.com
Cc: linux-pci@vger.kernel.org, Marek Vasut, Bjorn Helgaas,
    Geert Uytterhoeven, Lorenzo Pieralisi, Wolfram Sang,
    Yoshihiro Shimoda, linux-renesas-soc@vger.kernel.org
Subject: Re: [PATCH V6] PCI: rcar: Add L1 link state fix into data abort hook
Message-ID: <20210717173334.GA2232818@bjorn-Precision-5520>
In-Reply-To: <20210514200549.431275-1-marek.vasut@gmail.com>

On Fri, May 14, 2021 at 10:05:49PM +0200, marek.vasut@gmail.com wrote:
> From: Marek Vasut
>
> The R-Car PCIe controller is capable of handling L0s/L1 link states.
> While the controller can enter and exit L0s link state, and exit L1
> link state, without any additional action from the driver, to enter
> L1 link state, the driver must complete the link state transition by
> issuing additional commands to the controller.
>
> The problem is, this transition is not atomic. The controller sets
> PMEL1RX bit in PMSR register upon reception of PM_ENTER_L1 DLLP from
> the PCIe card, but then the controller enters some sort of inbetween
> state. The driver must detect this condition and complete the link
> state transition, by setting L1IATN bit in PMCTLR and waiting for
> the link state transition to complete.
>
> If a PCIe access happens inside this window, where the controller
> is between L0 and L1 link states, the access generates a fault and
> the ARM 'imprecise external abort' handler is invoked.
>
> Just like other PCI controller drivers, here we hook the fault handler,
> perform the fixup to help the controller enter L1 link state, and then
> restart the instruction which triggered the fault. Since the controller
> is in L1 link state now, the link can exit from L1 link state to L0 and
> successfully complete the access.
>
> While it was suggested to disable L1 link state support completely on
> the controller level, this would not prevent the L1 link state entry
> initiated by the link partner. This happens e.g.
> in case a PCIe card enters D3Hot state, which could be initiated from
> pci_set_power_state() if the card indicates D3Hot support, which in
> turn means link must enter L1 state. So instead, fix up the L1 link
> state after all.
>
> Note that this fixup is applicable only to Aarch32 R-Car controllers,
> the Aarch64 R-Car perform the same fixup in TFA, see TFA commit [1]
> 0969397f2 ("rcar_gen3: plat: Prevent PCIe hang during L1X config access")
> [1] https://github.com/ARM-software/arm-trusted-firmware/commit/0969397f295621aa26b3d14b76dd397d22be58bf

This patch is horribly ugly, but it's working around a horrible hardware
problem, and I don't have any better suggestions, so I guess we don't
really have much choice.

I do think the commit log is a bit glib:

  - "The R-Car PCIe controller is capable of handling L0s/L1 link
    states." AFAICT every PCIe device is required to handle L0 and L1
    without software assistance, so saying R-Car is "capable" puts a
    better face on this than seems warranted.

    L0s doesn't seem relevant at all; at least it doesn't seem to play
    a role in the patch. There's no such thing as "returning to L0s" as
    mentioned in the comment below; L0s is only reachable from L0, and
    returns from L1 only go to L0 (PCIe r5.0, fig 5-1).

  - "The problem is, this transition is not atomic." I think the
    *problem* is that the hardware is broken in the first place. This
    transition is supposed to be invisible to software.

  - "Just like other PCI controller drivers ..." suggests that this is
    an ordinary situation that we shouldn't be concerned about. This
    patch may be the best we can do to work around a bad hardware
    defect, but it's definitely not ordinary. I think the other
    hook_fault_code() users handle legitimate PCIe errors (which most
    controllers log and turn into ~0 data responses without generating
    an abort or machine check), not aborts caused by hardware defects,
    so they're not really comparable.

Has Renesas documented this as an erratum?
Will future devices require additions to
rcar_pcie_abort_handler_of_match[]?

It'd be nice if the commit log mentioned the user-visible effect of
this problem. I guess it does mention external aborts -- I assume you
see those when downstream devices go to D3hot or when ASPM puts the
link in L1? And the abort results in a reboot?

To be clear, I'm not objecting to the patch. It's a hardware problem
and we should work around it as best we can.

> Signed-off-by: Marek Vasut
> Cc: Bjorn Helgaas
> Cc: Geert Uytterhoeven
> Cc: Lorenzo Pieralisi
> Cc: Wolfram Sang
> Cc: Yoshihiro Shimoda
> Cc: linux-renesas-soc@vger.kernel.org
> ---
> V2: - Update commit message, add link to TFA repository commit
>     - Handle the LPAE case as in ARM fault.c and fsr-{2,3}level.c
>     - Cache clock and check whether they are enabled before register
>       access
> V3: - Fix commit message according to spellchecker
>     - Use of_find_matching_node() to apply hook only on Gen1 and Gen2 RCar
>       (in case the kernel is multiplatform)
> V4: - Mark rcar_pcie_abort_handler_of_match with __initconst
> V5: - Add mutex around rcar_pcie_aarch32_abort_handler()
>     - Update commit message again to point out issues with L1/D3Hot states
> V6: - Return 1 only if condition cannot be fixed
> ---
>  drivers/pci/controller/pcie-rcar-host.c | 84 +++++++++++++++++++++++++
>  drivers/pci/controller/pcie-rcar.h      |  7 +++
>  2 files changed, 91 insertions(+)
>
> diff --git a/drivers/pci/controller/pcie-rcar-host.c b/drivers/pci/controller/pcie-rcar-host.c
> index 765cf2b45e24..0d3f8dc5ff8a 100644
> --- a/drivers/pci/controller/pcie-rcar-host.c
> +++ b/drivers/pci/controller/pcie-rcar-host.c
> @@ -13,6 +13,7 @@
>
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -41,6 +42,21 @@ struct rcar_msi {
>  	int irq2;
>  };
>
> +#ifdef CONFIG_ARM
> +/*
> + * Here we keep a static copy of the remapped PCIe controller address.
> + * This is only used on aarch32 systems, all of which have one single
> + * PCIe controller, to provide quick access to the PCIe controller in
> + * the L1 link state fixup function, called from the ARM fault handler.
> + */
> +static void __iomem *pcie_base;
> +/*
> + * Static copy of bus clock pointer, so we can check whether the clock
> + * is enabled or not.
> + */
> +static struct clk *pcie_bus_clk;
> +#endif
> +
>  /* Structure representing the PCIe interface */
>  struct rcar_pcie_host {
>  	struct rcar_pcie pcie;
> @@ -776,6 +792,12 @@ static int rcar_pcie_get_resources(struct rcar_pcie_host *host)
>  	}
>  	host->msi.irq2 = i;
>
> +#ifdef CONFIG_ARM
> +	/* Cache static copy for L1 link state fixup hook on aarch32 */
> +	pcie_base = pcie->base;
> +	pcie_bus_clk = host->bus_clk;
> +#endif
> +
>  	return 0;
>
>  err_irq2:
> @@ -1031,4 +1053,66 @@ static struct platform_driver rcar_pcie_driver = {
>  	},
>  	.probe = rcar_pcie_probe,
>  };
> +
> +#ifdef CONFIG_ARM
> +static DEFINE_SPINLOCK(pmsr_lock);
> +static int rcar_pcie_aarch32_abort_handler(unsigned long addr,
> +		unsigned int fsr, struct pt_regs *regs)
> +{
> +	unsigned long flags;
> +	int ret = 0;
> +	u32 pmsr;
> +
> +	spin_lock_irqsave(&pmsr_lock, flags);
> +
> +	if (!pcie_base || !__clk_is_enabled(pcie_bus_clk)) {
> +		ret = 1;
> +		goto unlock_exit;
> +	}
> +
> +	pmsr = readl(pcie_base + PMSR);
> +
> +	/*
> +	 * Test if the PCIe controller received PM_ENTER_L1 DLLP and
> +	 * the PCIe controller is not in L1 link state. If true, apply
> +	 * fix, which will put the controller into L1 link state, from
> +	 * which it can return to L0s/L0 on its own.
> +	 */
> +	if ((pmsr & PMEL1RX) && ((pmsr & PMSTATE) != PMSTATE_L1)) {
> +		writel(L1IATN, pcie_base + PMCTLR);
> +		while (!(readl(pcie_base + PMSR) & L1FAEG))
> +			;
> +		writel(L1FAEG | PMEL1RX, pcie_base + PMSR);
> +	}
> +
> +unlock_exit:
> +	spin_unlock_irqrestore(&pmsr_lock, flags);
> +	return ret;
> +}
> +
> +static const struct of_device_id rcar_pcie_abort_handler_of_match[] __initconst = {
> +	{ .compatible = "renesas,pcie-r8a7779" },
> +	{ .compatible = "renesas,pcie-r8a7790" },
> +	{ .compatible = "renesas,pcie-r8a7791" },
> +	{ .compatible = "renesas,pcie-rcar-gen2" },
> +	{},
> +};
> +
> +static int __init rcar_pcie_init(void)
> +{
> +	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
> +#ifdef CONFIG_ARM_LPAE
> +		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> +				"asynchronous external abort");
> +#else
> +		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> +				"imprecise external abort");
> +#endif
> +	}
> +
> +	return platform_driver_register(&rcar_pcie_driver);
> +}
> +device_initcall(rcar_pcie_init);
> +#else
>  builtin_platform_driver(rcar_pcie_driver);
> +#endif
> diff --git a/drivers/pci/controller/pcie-rcar.h b/drivers/pci/controller/pcie-rcar.h
> index d4c698b5f821..9bb125db85c6 100644
> --- a/drivers/pci/controller/pcie-rcar.h
> +++ b/drivers/pci/controller/pcie-rcar.h
> @@ -85,6 +85,13 @@
>  #define LTSMDIS			BIT(31)
>  #define MACCTLR_INIT_VAL	(LTSMDIS | MACCTLR_NFTS_MASK)
>  #define PMSR			0x01105c
> +#define L1FAEG			BIT(31)
> +#define PMEL1RX			BIT(23)
> +#define PMSTATE			GENMASK(18, 16)
> +#define PMSTATE_L1		(3 << 16)
> +#define PMCTLR			0x011060
> +#define L1IATN			BIT(31)
> +
>  #define MACS2R			0x011078
>  #define MACCGSPSETR		0x011084
>  #define SPCNGRSN		BIT(31)
> -- 
> 2.30.2
>