From: Kevin Xie <kevin.xie@starfivetech.com>
To: Bo Gan <ganboing@gmail.com>, Keith Busch <kbusch@kernel.org>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Minda Chen <minda.chen@starfivetech.com>,
	Conor Dooley <conor@kernel.org>, "kw@linux.com" <kw@linux.com>,
	"robh+dt@kernel.org" <robh+dt@kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"daire.mcnamara@microchip.com" <daire.mcnamara@microchip.com>,
	"emil.renner.berthing@canonical.com"
	<emil.renner.berthing@canonical.com>,
	"krzysztof.kozlowski+dt@linaro.org"
	<krzysztof.kozlowski+dt@linaro.org>,
	"devicetree@vger.kernel.org" <devicetree@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	"aou@eecs.berkeley.edu" <aou@eecs.berkeley.edu>,
	"p.zabel@pengutronix.de" <p.zabel@pengutronix.de>,
	Mason Huo <mason.huo@starfivetech.com>,
	Leyfoon Tan <leyfoon.tan@starfivetech.com>
Subject: Re: [PATCH v15,RESEND 22/23] PCI: starfive: Offload the NVMe timeout workaround to host drivers.
Date: Wed, 20 Mar 2024 08:42:43 +0000	[thread overview]
Message-ID: <ZQ0PR01MB0981DB4002BAECB38DAF0AE18233A@ZQ0PR01MB0981.CHNPR01.prod.partner.outlook.cn> (raw)
In-Reply-To: <16a1e6c6-2e2f-08e5-8da0-1462cec57e1f@gmail.com>

> On 3/13/24 7:51 PM, Keith Busch wrote:
> > On Thu, Mar 14, 2024 at 02:18:38AM +0000, Kevin Xie wrote:
> >>> Re: [PATCH v15,RESEND 22/23] PCI: starfive: Offload the NVMe timeout
> >>> workaround to host drivers.
> >>>
> >>> On Mon, Mar 04, 2024 at 10:08:06AM -0800, Palmer Dabbelt wrote:
> >>>> On Thu, 29 Feb 2024 07:08:43 PST (-0800), lpieralisi@kernel.org wrote:
> >>>>> On Tue, Feb 27, 2024 at 06:35:21PM +0800, Minda Chen wrote:
> >>>>>> From: Kevin Xie <kevin.xie@starfivetech.com>
> >>>>>>
> >>>>>> As the Starfive JH7110 hardware can't keep two inbound post write
> >>>>>> in order all the time, such as MSI messages and NVMe completions.
> >>>>>> If the NVMe completion update later than the MSI, an NVMe IRQ
> >>>>>> handle will miss.
> >>>>>
> >>>>> Please explain what the problem is and what "NVMe completions"
> >>>>> means given that you are talking about posted writes.
> 
> Echoing Keith here. Why are you treating NVMe completions + MSI as a special
> case?
> What's special about this combination other than two posted writes? I own
> JH7110 visionfive 2 boards myself, and if I'm not mistaken, there are two
> identical PCIe controllers in JH7110. The first one connects the onboard USB
> controller of vf2, which also enables MSI interrupts. How come this exact
> problem not affecting the USB controller? The commit message from Minda
> strongly suggests it does, and also for R8169 NIC. Thus, why would you suggest
> the problem is confined to NVMe?
> 
> Bo
> 

Hi Bo,
Yes, there are two PCIe controllers in the JH7110 SoC, and the USB hub and NIC
attached over PCIe work fine whether or not we apply this patch.

Let me explain the origin of this patch in detail:
As the subject says, we originally fixed the timeout issue with a workaround in
the NVMe driver, which would affect all other platforms. Thus, we wanted to
offload the workaround from the NVMe driver into our PCIe controller platform
driver.
After finishing the offload patch, we wanted to make sure it does no harm to
other PCIe devices, which is why the description mentions testing with an
R8169 NIC.

The MSI and the NVMe completion are two inbound posted writes from the NVMe
device to the JH7110.
We made the mistake of generalizing them as "two inbound posted writes",
because we had been investigating the issue from the angle of inbound ordering.

Actually, the observed behavior is:
"The JH7110 has a small probability of not seeing the updated NVMe completion
pending status in the NVMe MSI handler, but the update becomes visible after a
1 to 3 us delay."
So this may be related to ordering, cache coherency, or something else; the
issue is still under investigation.
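For clarity, here is a minimal sketch of the kind of bounded re-check that the
1 to 3 us observation suggests. It is not the actual workaround patch; the
helper name and the retry bound are made up:

#include <linux/kernel.h>
#include <linux/compiler.h>
#include <linux/delay.h>
#include <linux/types.h>

#define CQE_RETRY_US	5	/* a little above the observed 1-3 us window */

/* Re-check the CQE phase bit a few times before declaring "not pending". */
static bool cqe_pending_with_retry(const __le16 *status, u16 phase)
{
	unsigned int i;

	for (i = 0; i < CQE_RETRY_US; i++) {
		if ((le16_to_cpu(READ_ONCE(*status)) & 1) == phase)
			return true;
		udelay(1);	/* give the delayed inbound write time to land */
	}
	return false;
}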
 
> >>
> >> Sorry, we made a casual conclusion here.
> >> Not any two of inbound post requests can`t be kept in order in JH7110
> >> SoC, the only one case we found is NVMe completions with MSI interrupts.
> >> To be more precise, they are the pending status in nvme_completion
> >> struct and nvme_irq handler in nvme/host/pci.c.
> >>
> >> We have shown the original workaround patch before:
> >>
> >> https://lore.kernel.org/lkml/CAJM55Z9HtBSyCq7rDEDFdw644pOWCKJfPqhmi3SD1x6p3g2SLQ@mail.gmail.com/
> >> We put it in our github branch and works fine for a long time.
> >> Looking forward to better advices from someone familiar with NVMe drivers.
> >
> > So this platform treats strictly ordered writes the same as if relaxed
> > ordering was enabled? I am not sure if we could reasonably work around
> > such behavior. An arbitrary delay is likely too long for most cases,
> > and too short for the worst case.
> >
> > I suppose we could quirk a non-posted transaction in the interrupt
> > handler to force flush pending memory updates, but that will
> > noticeably harm your nvme performance. Maybe if you constrain such
> > behavior to the spurious IRQ_NONE condition, then it might be okay? I don't know.
> >
> 
> Also copied Keith's latest reply below, and I also have the same doubt.
> 

Hi Keith, sorry for the late reply.
We have tried adding a dummy non-posted request (a config read) in the handler,
but it does not help.
We also tried adding an mb() before checking the NVMe completion, and that
does not help either.
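For reference, a rough sketch of where these two experiments sat; the handler
and the completion-check stub are made-up names, not code from this series:

#include <linux/interrupt.h>
#include <linux/pci.h>
#include <asm/barrier.h>

/* Stand-in for the real "is a completion pending?" check. */
static bool completion_pending(void *cq)
{
	return true;
}

static irqreturn_t example_msi_handler(int irq, void *data)
{
	struct pci_dev *pdev = data;
	u32 dummy;

	/*
	 * Experiment 1: a non-posted config read; per PCIe ordering its
	 * completion should push earlier posted writes from the device.
	 */
	pci_read_config_dword(pdev, PCI_VENDOR_ID, &dummy);

	/* Experiment 2: a full memory barrier before looking at the CQ. */
	mb();

	if (!completion_pending(data))
		return IRQ_NONE;	/* completion still not visible */

	/* Normal completion processing would go here. */
	return IRQ_HANDLED;
}

Neither experiment helped on the JH7110.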

> > Hm, that may not be good enough: if nvme completions can be reordered
> > with their msi's, then I assume data may reorder with their completion.
> > Your application will inevitably see stale and corrupted data, so it
> > sounds like you need some kind of barrier per completion. Ouch!

If we do not apply the patch, we only get the timeout warnings and waste some
time; the problem seems less serious than you described.
With the workaround applied, the NVMe SSD handles normal tasks such as booting,
hibernation, and saving data without problems.

