linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: <Kelvin.Cao@microchip.com>
To: <helgaas@kernel.org>, <logang@deltatee.com>
Cc: <kurt.schwemmer@microsemi.com>, <bhelgaas@google.com>,
	<linux-pci@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<kelvincao@outlook.com>
Subject: Re: [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access
Date: Fri, 1 Oct 2021 23:49:18 +0000	[thread overview]
Message-ID: <ed856f361ef3ca80e34c4565daffe6e566a8baa3.camel@microchip.com> (raw)
In-Reply-To: <21326395-6d4b-07aa-f445-ecc5dc189d17@deltatee.com>

On Fri, 2021-10-01 at 14:29 -0600, Logan Gunthorpe wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you
> know the content is safe
> 
> On 2021-10-01 2:18 p.m., Bjorn Helgaas wrote:
> > On Fri, Sep 24, 2021 at 11:08:38AM +0000, kelvin.cao@microchip.com
> > wrote:
> > > From: Kelvin Cao <kelvin.cao@microchip.com>
> > > 
> > > After a firmware hard reset, MRPC command executions, which are
> > > based
> > > on the PCI BAR (which Microchip refers to as GAS) read/write,
> > > will hang
> > > indefinitely. This is because after a reset, the host will fail
> > > all GAS
> > > reads (get all 1s), in which case the driver won't get a valid
> > > MRPC
> > > status.
> > 
> > Trying to write a merge commit log for this, but having a hard time
> > summarizing it.  It sounds like it covers both Switchtec-specific
> > (firmware and MRPC commands) and generic PCIe behavior (MMIO read
> > failures).
> > 
> > This has something to do with a firmware hard reset.  What is that?
> > Is that like a firmware reboot?  A device reset, e.g., FLR or
> > secondary bus reset, that causes a firmware reboot?  A device reset
> > initiated by firmware?
A firmware reset can be triggered by a reset command issued to the
firmware to reboot it.
> > Anyway, apparently when that happens, MMIO reads to the switch fail
> > (timeout or error completion on PCIe) for a while.  If a device
> > reset
> > is involved, that much is standard PCIe behavior.  And the driver
> > sees
> > ~0 data from those failed reads.  That's not part of the PCIe spec,
> > but is typical root complex behavior.
> > 
> > But you said the MRPC commands hang indefinitely.  Presumably MMIO
> > reads would start succeeding eventually when the device becomes
> > ready,
> > so I don't know how that translates to "indefinitely."
> 
> I suspect Kelvin can expand on this and fix the issue below. But in
> my
> experience, the MMIO will read ~0 forever after a firmware reset,
> until
> the system is rebooted. Presumably on systems that have good hot plug
> support they are supposed to recover. Though I've never seen that.

This is also my observation, all MMIO read will fail (~0 returned)
until the system is rebooted or a PCI rescan is performed.

> The MMIO read that signals the MRPC status always returns ~0 and the
> userspace request will eventually time out.

The problem in this case is that, in DMA MRPC mode, the status (in host
memory) is always initialized to 'in progress', and it's up to the
firmware to update it to 'done' after the command is executed in the
firmware. After a firmware reset is performed, the firmware cannot be
triggered to start a MRPC command, therefore the status in host memory
remains 'in progress' in the driver, which prevents a MRPC from timing
out. I should have included this in the message.
> 
> > Weird to refer to a PCI BAR as "GAS".  Maybe expanding the acronym
> > would help it make sense.
> GAS is the term used by the firmware developers and is in all their
> documentation. It stands for Global Address Space.
> 
> > What does "host" refer to?  I guess it's the switch (the
> > switchtec_dev), since you say it fails MMIO reads?
> 
> Yes, a bit confusing. The firmware is dead or not setup right so MMIO
> reads are not succeeding and the root complex is returning ~0 to the
> driver on reads.
Ditto. Will update in v2.
> 
> Logan

  reply	other threads:[~2021-10-01 23:49 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-24 11:08 [PATCH 0/5] Switchtec Fixes and Improvements kelvin.cao
2021-09-24 11:08 ` [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access kelvin.cao
2021-10-01 20:18   ` Bjorn Helgaas
2021-10-01 20:29     ` Logan Gunthorpe
2021-10-01 23:49       ` Kelvin.Cao [this message]
2021-10-02 15:11         ` Bjorn Helgaas
2021-10-04 20:51           ` Kelvin.Cao
2021-10-05 20:11             ` Bjorn Helgaas
2021-10-06  0:37               ` Kelvin.Cao
2021-10-06  2:33                 ` Bjorn Helgaas
2021-10-06  5:49                   ` Kelvin.Cao
2021-10-06 14:19                     ` Bjorn Helgaas
2021-10-06 19:00                       ` Kelvin.Cao
2021-10-06 20:20                         ` Bjorn Helgaas
2021-10-06 21:27                           ` Kelvin.Cao
2021-10-07 21:23                             ` Bjorn Helgaas
2021-10-08  0:06                               ` Kelvin.Cao
2021-10-08 11:03                                 ` Bjorn Helgaas
2021-10-01 22:58     ` Kelvin.Cao
2021-10-01 23:52       ` Logan Gunthorpe
2021-10-02  0:05         ` Kelvin.Cao
2021-09-24 11:08 ` [PATCH 2/5] PCI/switchtec: Fix a MRPC error status handling issue kelvin.cao
2021-09-24 11:08 ` [PATCH 3/5] PCI/switchtec: Update the way of getting management VEP instance ID kelvin.cao
2021-09-24 11:08 ` [PATCH 4/5] PCI/switchtec: Replace ENOTSUPP with EOPNOTSUPP kelvin.cao
2021-09-24 11:08 ` [PATCH 5/5] PCI/switchtec: Add check of event support kelvin.cao
2021-09-24 15:53 ` [PATCH 0/5] Switchtec Fixes and Improvements Logan Gunthorpe
2021-09-25  5:27   ` Kelvin.Cao
2021-09-27 16:39 ` Bjorn Helgaas
2021-09-27 18:25   ` Kelvin.Cao
2021-10-08 17:05 ` Bjorn Helgaas
2021-10-08 17:23   ` Logan Gunthorpe
2021-10-08 18:25     ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ed856f361ef3ca80e34c4565daffe6e566a8baa3.camel@microchip.com \
    --to=kelvin.cao@microchip.com \
    --cc=bhelgaas@google.com \
    --cc=helgaas@kernel.org \
    --cc=kelvincao@outlook.com \
    --cc=kurt.schwemmer@microsemi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=logang@deltatee.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).