From: Marek Vasut <marek.vasut@gmail.com>
To: Bjorn Helgaas <helgaas@kernel.org>,
Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-pci <linux-pci@vger.kernel.org>,
Bjorn Helgaas <bhelgaas@google.com>,
Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
Wolfram Sang <wsa@the-dreams.de>,
Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>,
Linux-Renesas <linux-renesas-soc@vger.kernel.org>
Subject: Re: [PATCH V6] PCI: rcar: Add L1 link state fix into data abort hook
Date: Tue, 27 Jul 2021 19:08:17 +0200
Message-ID: <88b82ef7-3e6e-fd3c-4d18-d497f7c1998c@gmail.com>
In-Reply-To: <20210726174925.GA624246@bjorn-Precision-5520>
On 7/26/21 7:49 PM, Bjorn Helgaas wrote:
> On Mon, Jul 26, 2021 at 04:47:54PM +0200, Geert Uytterhoeven wrote:
[...]
>>>> The R-Car PCIe controller is capable of handling L0s/L1 link states.
>>>> The controller can enter and exit the L0s link state, and exit the L1
>>>> link state, without any additional action from the driver. To enter
>>>> the L1 link state, however, the driver must complete the link state
>>>> transition by issuing additional commands to the controller.
>>>>
>>>> The problem is, this transition is not atomic. The controller sets
>>>> the PMEL1RX bit in the PMSR register upon receiving a PM_ENTER_L1 DLLP
>>>> from the PCIe card, but then enters some sort of in-between state.
>>>> The driver must detect this condition and complete the link state
>>>> transition by setting the L1IATN bit in PMCTLR and waiting for the
>>>> transition to complete.
>>>>
>>>> If a PCIe access happens inside this window, where the controller
>>>> is between L0 and L1 link states, the access generates a fault and
>>>> the ARM 'imprecise external abort' handler is invoked.
>>>>
>>>> Just like other PCI controller drivers, here we hook the fault handler,
>>>> perform the fixup to help the controller enter L1 link state, and then
>>>> restart the instruction which triggered the fault. Since the controller
>>>> is now in the L1 link state, the link can exit L1 to L0 and
>>>> successfully complete the access.
>>>>
>>>> While it was suggested to disable L1 link state support completely at
>>>> the controller level, this would not prevent L1 link state entry
>>>> initiated by the link partner. This happens, e.g., when a PCIe card
>>>> enters the D3hot state, which pci_set_power_state() may initiate if
>>>> the card indicates D3hot support; a card in D3hot in turn requires
>>>> the link to enter the L1 state. So instead, fix up the L1 link state
>>>> entry after all.
>>>>
>>>> Note that this fixup is applicable only to AArch32 R-Car controllers;
>>>> the AArch64 R-Car controllers perform the same fixup in TF-A, see TF-A
>>>> commit [1] 0969397f2 ("rcar_gen3: plat: Prevent PCIe hang during L1X config access")
>>>> [1] https://github.com/ARM-software/arm-trusted-firmware/commit/0969397f295621aa26b3d14b76dd397d22be58bf
>>>
>>> This patch is horribly ugly but it's working around a horrible
>>> hardware problem, and I don't have any better suggestions, so I guess
>>> we don't really have much choice.
>>>
>>> I do think the commit log is a bit glib:
>>>
>>> - "The R-Car PCIe controller is capable of handling L0s/L1 link
>>> states." AFAICT every PCIe device is required to handle L0 and L1
>>> without software assistance. So saying R-Car is "capable" puts a
>>> better face on this than seems warranted.
>>>
>>> L0s doesn't seem relevant at all; at least it doesn't seem to play
>>> a role in the patch. There's no such thing as "returning to L0s"
>>> as mentioned in the comment below; L0s is only reachable from L0.
>>> Returns from L1 only go to L0 (PCIe r5.0, fig 5-1).
>>>
>>> - "The problem is, this transition is not atomic." I think the
>>> *problem* is the hardware is broken in the first place. This
>>> transition is supposed to be invisible to software.
>>>
>>> - "Just like other PCI controller drivers ..." suggests that this is
>>> an ordinary situation that we shouldn't be concerned about. This
>>> patch may be the best we can do to work around a bad hardware
>>> defect, but it's definitely not ordinary.
>>>
>>> I think the other hook_fault_code() uses are for reporting
>>> legitimate PCIe errors, which most controllers log and turn
>>> into ~0 data responses without generating an abort or machine
>>> check, not things caused by hardware defects, so they're not
>>> really comparable.
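As a minimal sketch of the link state rules cited above (PCIe r5.0, fig 5-1), restricted to L0, L0s and L1, the C function below encodes which direct transitions are permitted: L0s is entered only from L0 and returns only to L0, and L1 likewise returns only to L0. The enum and function names are hypothetical, and L2/L3 and the details of the L1 exit path are deliberately not modelled.

```c
#include <stdbool.h>

/* Hypothetical names; only the L0/L0s/L1 subset of fig 5-1 is modelled. */
enum link_state { LINK_L0, LINK_L0S, LINK_L1 };

/*
 * Returns true if the link may move directly from 'from' to 'to'.
 * Staying in the same state is not treated as a transition.
 */
static bool link_transition_allowed(enum link_state from, enum link_state to)
{
	switch (from) {
	case LINK_L0:
		/* L0 is the only state from which L0s or L1 can be entered. */
		return to == LINK_L0S || to == LINK_L1;
	case LINK_L0S:
		return to == LINK_L0;	/* L0s exits only to L0 */
	case LINK_L1:
		return to == LINK_L0;	/* L1 exits only to L0, never to L0s */
	}
	return false;
}
```

Under this table, a "return to L0s" from L1 is rejected, which is the wording objected to in the driver comment.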
>>>
>>> Has Renesas documented this as an erratum? Will future devices
>>> require additions to rcar_pcie_abort_handler_of_match[]?
>>>
>>> It'd be nice if the commit log mentioned the user-visible effect of
>>> this problem. I guess it does mention external aborts -- I assume you
>>> see those when downstream devices go to D3hot or when ASPM puts the
>>> link in L1? And the abort results in a reboot?
>>>
>>> To be clear, I'm not objecting to the patch. It's a hardware problem
>>> and we should work around it as best we can.
>>
>> Cool! So what's missing for this patch, which we have been polishing
>> for almost one year, to be applied, so innocent people can no longer
lock up an R-Car system just by inserting a ubiquitous Intel Ethernet
>> card, and suspending the system?
>
> Nothing missing from my point of view, so if Lorenzo is OK with it,
> he'll apply it. If I were applying it, I would make the commit log
> something like this:
>
> When the link is in L1, hardware should return it to L0
> automatically whenever a transaction targets a component on the
> other end of the link (PCIe r5.0, sec 5.2).
>
> The R-Car PCIe controller doesn't handle this transition correctly.
> If the link is not in L0, an MMIO transaction targeting a downstream
> device fails, and the controller reports an ARM imprecise external
> abort.
>
> Work around this by hooking the abort handler so the driver can
> detect this situation and help the hardware complete the link state
> transition.
>
> When the R-Car controller receives a PM_ENTER_L1 DLLP from the
> downstream component, it sets the PMEL1RX bit in the PMSR register, but then
> the controller enters some sort of in-between state. A subsequent
> MMIO transaction will fail, resulting in the external abort. The
> abort handler detects this condition and completes the link state
> transition by setting the L1IATN bit in PMCTLR and waiting for the
> link state transition to complete.
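The fixup described above can be sketched in C against a mock register file. This is a hypothetical illustration, not the driver code: the bit positions, the PMSR_L1_DONE completion flag, the mock_regs structure, the simulated latency and the return convention (0 = fixup applied, retry the faulting access; 1 = not this condition, let normal abort handling run; -1 = timeout) are all assumptions. Only the PMSR, PMCTLR, PMEL1RX and L1IATN names come from the thread.

```c
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define PMSR_PMEL1RX     (1u << 23)  /* PM_ENTER_L1 DLLP received */
#define PMSR_PMSTATE     (7u << 16)  /* current link PM state field */
#define PMSR_PMSTATE_L1  (3u << 16)  /* field value meaning "in L1" */
#define PMSR_L1_DONE     (1u << 31)  /* hypothetical: L1 entry completed */
#define PMCTLR_L1IATN    (1u << 31)  /* ask the controller to finish L1 entry */

/* Mock register file standing in for MMIO accesses. */
struct mock_regs {
	uint32_t pmsr;
	uint32_t pmctlr;
	int polls_until_l1;	/* simulated latency; 0 means never completes */
};

/* Simulated PMSR read: L1 entry completes a few polls after L1IATN. */
static uint32_t read_pmsr(struct mock_regs *r)
{
	if ((r->pmctlr & PMCTLR_L1IATN) && r->polls_until_l1 > 0 &&
	    --r->polls_until_l1 == 0)
		r->pmsr = (r->pmsr & ~PMSR_PMSTATE) | PMSR_PMSTATE_L1 |
			  PMSR_L1_DONE;
	return r->pmsr;
}

static int l1_fixup(struct mock_regs *r, int max_polls)
{
	uint32_t pmsr = read_pmsr(r);

	/* Act only in the in-between state: DLLP seen, link not yet in L1. */
	if (!(pmsr & PMSR_PMEL1RX) ||
	    (pmsr & PMSR_PMSTATE) == PMSR_PMSTATE_L1)
		return 1;

	/* Help the controller complete L1 entry, then poll for completion. */
	r->pmctlr |= PMCTLR_L1IATN;
	for (int i = 0; i < max_polls; i++) {
		if (read_pmsr(r) & PMSR_L1_DONE)
			return 0;	/* link in L1; the retried access can exit to L0 */
	}
	return -1;
}
```

In kernel code the polling loop would more likely use readl_poll_timeout_atomic() from <linux/iopoll.h>, since an abort handler must not sleep.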
OK, should I submit V7 and just copy-paste this commit message in, or
wait for Lorenzo to provide clear direction?
> I assume that on the PCIe side, there must be an error like
> Unsupported Request or Malformed TLP, and the R-Car controller is
> logging that and turning it into the ARM external abort?
>
> I didn't see a clear response to Pali's question about what happens if
> there's no MMIO access, e.g., what if the downstream device initiates
> a DMA or MSI transaction?
If the link is in this state, the packet won't reach the root complex,
so nothing happens. And I don't see a good way to fix that case.
Thread overview: 18+ messages
2021-05-14 20:05 [PATCH V6] PCI: rcar: Add L1 link state fix into data abort hook marek.vasut
2021-05-17 7:39 ` Geert Uytterhoeven
2021-07-17 17:33 ` Bjorn Helgaas
2021-07-17 18:14 ` Marek Vasut
2021-07-19 8:59 ` Lorenzo Pieralisi
2021-07-19 15:38 ` Marek Vasut
2021-07-19 17:23 ` Pali Rohár
2021-07-19 18:39 ` Marek Vasut
2021-07-22 20:31 ` Pali Rohár
2021-07-19 22:06 ` Bjorn Helgaas
2021-07-27 16:11 ` Lorenzo Pieralisi
2021-07-27 16:16 ` Geert Uytterhoeven
2021-07-26 14:47 ` Geert Uytterhoeven
2021-07-26 17:49 ` Bjorn Helgaas
2021-07-27 16:32 ` Lorenzo Pieralisi
2021-08-05 18:30 ` Pali Rohár
2021-07-27 17:08 ` Marek Vasut [this message]
2021-08-04 11:06 ` Lorenzo Pieralisi