From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from phobos.denx.de (phobos.denx.de [85.214.62.61]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 8185FC433F5 for ; Thu, 5 May 2022 11:46:15 +0000 (UTC) Received: from h2850616.stratoserver.net (localhost [IPv6:::1]) by phobos.denx.de (Postfix) with ESMTP id 2244583D7F; Thu, 5 May 2022 13:46:13 +0200 (CEST) Authentication-Results: phobos.denx.de; dmarc=none (p=none dis=none) header.from=orcam.me.uk Authentication-Results: phobos.denx.de; spf=pass smtp.mailfrom=u-boot-bounces@lists.denx.de Received: by phobos.denx.de (Postfix, from userid 109) id 5EA6C83D7F; Thu, 5 May 2022 13:46:11 +0200 (CEST) Received: from angie.orcam.me.uk (angie.orcam.me.uk [78.133.224.34]) by phobos.denx.de (Postfix) with ESMTP id 4761F83B19 for ; Thu, 5 May 2022 13:46:08 +0200 (CEST) Authentication-Results: phobos.denx.de; dmarc=none (p=none dis=none) header.from=orcam.me.uk Authentication-Results: phobos.denx.de; spf=none smtp.mailfrom=macro@orcam.me.uk Received: by angie.orcam.me.uk (Postfix, from userid 500) id E85E292009C; Thu, 5 May 2022 13:46:07 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by angie.orcam.me.uk (Postfix) with ESMTP id E235692009B; Thu, 5 May 2022 12:46:07 +0100 (BST) Date: Thu, 5 May 2022 12:46:07 +0100 (BST) From: "Maciej W. Rozycki" To: =?UTF-8?Q?Pali_Roh=C3=A1r?= cc: Stefan Roese , Bin Meng , Simon Glass , u-boot@lists.denx.de Subject: Re: [PATCH] pci: Do not enable PCIe ASMedia ASM2824 workaround by default In-Reply-To: <20220501151804.3hqoevejieictnox@pali> Message-ID: References: <20220406150911.23927-1-pali@kernel.org> <0e7c3779-fb2e-4ef9-bcf0-08be2ce29bcb@denx.de> <20220501151804.3hqoevejieictnox@pali> User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT X-BeenThere: u-boot@lists.denx.de X-Mailman-Version: 2.1.39 Precedence: list List-Id: U-Boot discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: u-boot-bounces@lists.denx.de Sender: "U-Boot" X-Virus-Scanned: clamav-milter 0.103.5 at phobos.denx.de X-Virus-Status: Clean On Sun, 1 May 2022, Pali Rohár wrote: > > By now I have become somewhat tired arguing and explaining matters over > > and over again as things have been moving as slow as molasses in this > > area, but one point I want to raise here is while it is indeed the ASM2824 > > device that seems problematic, it may actually be downstream, so you won't > > know it's there until you go through the workaround, as observed with the > > root port of the SiFive FU740-C000 SOC (which has a separate workaround in > > U-boot, clearly for the same issue; cf. `pcie_sifive_force_gen1'). So it > > looks like the erratum is going to show up with some device combinations > > in which the device enumerator may not have a way to know an ASM2824 is > > there until the workaround applied to an upstream device has let the link > > work. > > I see that Linux patch was not not merged yet and there are already > comments that this issue is probably board or arch specific and maybe > should be in arch/riscv linux dir: > https://lore.kernel.org/linux-pci/20220421202711.GA1415244@bhelgaas/ Maybe, maybe not, and given the lack of cooperation shown by ASMedia I am not going to spend any money on their hardware to get this figured out, such as previously mentioned (plus the necessary M.2 to PCIe slot adapters). Main concerns with my change so far appear to be some coding style issues which are due to my lack of following of the development of newer internal interfaces. That'll be fixed with the next iteration when I get to it (hopefully this coming weekend). I have a lot of stuff in progress (and a job and personal life too) and need to prioritise. > >From that comment I have feeling that the issue could be really specific > to board or combination of connected devices (ASM2824+PI7C9X2G304) as > there is really no other report about this issue. > > In any case it is weird. As I say, from the findings so far this seems to me like a protocol rather than signalling (electrical) issue, i.e. a design problem with either or both chips. A board design fault (such as improper decoupling, crosstalk, or whatever) could only result in a signalling issue. And if it was a signalling issue, then the 5GT/s rate couldn't be negotiated, whether automatically or by hand as with this workaround, and then work reliably over days to weeks. > > And as I previously already mentioned the Linux version of the workaround > > is only activated by the vendor:device ID because you cannot busy-loop > > polling on the Link Training bit in Linux (while you can do it in U-Boot, > > because U-Boot is not an OS). > > Is is not _only_ because of this. For a longer time there is a direction It is exactly and only for this. I think I know why I wrote it this way. > to specify exact list of _affected_ hw which needs workaround. And not > usage of wildcard which match all hardware, even unaffected. See for > example comments which are adding workaround for broken GIC HW: > https://lore.kernel.org/lkml/87ilsutb6w.wl-maz@kernel.org/ > > And this makes sense, workarounds should be targeted. Your observation only makes sense for a problem contained within a device itself and not for a problem with establishing a link between two devices. If the problem is caused by a downstream device, then you cannot match on it to activate a workaround because you won't know it's there until you have activated the workaround. It hasn't been determined here whether the problem is the upstream device, the downstream device or a combination of the two. And I don't think it is appropriate to put the burden onto the users to determine what the case is, by deliberately releasing a limited fix in a hope that something will break for someone and they will report it. It is an engineer's responsibility to ensure the quality of the solution made is of the highest quality feasible. If we knew what the exact actual cause is, then we could keep the workaround limited to the cases identified by the cause. Also FWIW the next iteration of my Linux change will no longer match the quirk on a device ID and will instead use the PCI bridge class ID with a further qualification within. I had a concern about the safety of the change, but I have got it mentally resolved now and only the potentially unsafe part will be further qualified by a device ID. Sadly I still need to limit the workaround to data link layer link active reporting capable devices, which will undoubtedly exclude some up to 5GT/s only hardware, due to the need to avoid busy-looping on the Link Training bit, which is a no-no in an OS. So the U-Boot code will remain the only workaround for some configurations. Maciej