linux-kernel.vger.kernel.org archive mirror
From: Logan Gunthorpe <logang@deltatee.com>
To: Bjorn Helgaas <helgaas@kernel.org>, kelvin.cao@microchip.com
Cc: kurt.schwemmer@microsemi.com, bhelgaas@google.com,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	kelvincao@outlook.com
Subject: Re: [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access
Date: Fri, 1 Oct 2021 14:29:08 -0600
Message-ID: <21326395-6d4b-07aa-f445-ecc5dc189d17@deltatee.com>
In-Reply-To: <20211001201822.GA962472@bhelgaas>



On 2021-10-01 2:18 p.m., Bjorn Helgaas wrote:
> On Fri, Sep 24, 2021 at 11:08:38AM +0000, kelvin.cao@microchip.com wrote:
>> From: Kelvin Cao <kelvin.cao@microchip.com>
>>
>> After a firmware hard reset, MRPC command executions, which are based
>> on the PCI BAR (which Microchip refers to as GAS) read/write, will hang
>> indefinitely. This is because after a reset, the host will fail all GAS
>> reads (get all 1s), in which case the driver won't get a valid MRPC
>> status.
> 
> Trying to write a merge commit log for this, but having a hard time
> summarizing it.  It sounds like it covers both Switchtec-specific
> (firmware and MRPC commands) and generic PCIe behavior (MMIO read
> failures).
> 
> This has something to do with a firmware hard reset.  What is that?
> Is that like a firmware reboot?  A device reset, e.g., FLR or
> secondary bus reset, that causes a firmware reboot?  A device reset
> initiated by firmware?
> 
> Anyway, apparently when that happens, MMIO reads to the switch fail
> (timeout or error completion on PCIe) for a while.  If a device reset
> is involved, that much is standard PCIe behavior.  And the driver sees
> ~0 data from those failed reads.  That's not part of the PCIe spec,
> but is typical root complex behavior.
> 
> But you said the MRPC commands hang indefinitely.  Presumably MMIO
> reads would start succeeding eventually when the device becomes ready,
> so I don't know how that translates to "indefinitely."

I suspect Kelvin can expand on this and fix the issue below. But in my
experience, the MMIO will read ~0 forever after a firmware reset, until
the system is rebooted. Presumably systems with good hotplug support are
supposed to recover, though I've never seen that happen.

The MMIO read that signals the MRPC status always returns ~0 and the
userspace request will eventually time out.
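
To illustrate the failure mode described above, here is a minimal
userspace sketch of treating an all-ones status read as "no GAS access"
and failing fast instead of waiting on a status that never arrives. It is
not the code from the patch: the status values, the function names, and
the choice of -ENXIO are all assumptions made for illustration, and it
assumes ~0 is never a valid MRPC status.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status encodings, chosen only for this sketch. */
#define MRPC_STATUS_INPROGRESS 1u
#define MRPC_STATUS_DONE       2u

/* What the root complex typically hands back for a failed MMIO read. */
#define MMIO_ALL_ONES 0xFFFFFFFFu

/* Stand-in for an MMIO read of the MRPC status register. */
typedef uint32_t (*read_status_fn)(void);

/* Simulated register reads for the three cases of interest. */
uint32_t read_dead(void) { return MMIO_ALL_ONES; }
uint32_t read_busy(void) { return MRPC_STATUS_INPROGRESS; }
uint32_t read_done(void) { return MRPC_STATUS_DONE; }

/*
 * Returns 0 when the command is done, -EAGAIN while it is still in
 * progress, and -ENXIO (an assumed error code) when the read came back
 * as all 1s, i.e. the device is not answering MMIO reads.
 */
int mrpc_check_status(read_status_fn read_status)
{
	uint32_t status;

	assert(read_status != NULL);
	status = read_status();

	if (status == MMIO_ALL_ONES)
		return -ENXIO;	/* no GAS access: error out, do not wait */
	if (status == MRPC_STATUS_DONE)
		return 0;
	return -EAGAIN;		/* still running, caller may poll again */
}
```

With this shape, a caller polling for completion would see -ENXIO on the
first failed read rather than spinning until the userspace timeout.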

> Weird to refer to a PCI BAR as "GAS".  Maybe expanding the acronym
> would help it make sense.

GAS is the term used by the firmware developers and is in all their
documentation. It stands for Global Address Space.

> What does "host" refer to?  I guess it's the switch (the
> switchtec_dev), since you say it fails MMIO reads?

Yes, a bit confusing. The firmware is dead or not set up right, so MMIO
reads are not succeeding and the root complex is returning ~0 to the
driver on reads.

Logan

Thread overview: 32+ messages
2021-09-24 11:08 [PATCH 0/5] Switchtec Fixes and Improvements kelvin.cao
2021-09-24 11:08 ` [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access kelvin.cao
2021-10-01 20:18   ` Bjorn Helgaas
2021-10-01 20:29     ` Logan Gunthorpe [this message]
2021-10-01 23:49       ` Kelvin.Cao
2021-10-02 15:11         ` Bjorn Helgaas
2021-10-04 20:51           ` Kelvin.Cao
2021-10-05 20:11             ` Bjorn Helgaas
2021-10-06  0:37               ` Kelvin.Cao
2021-10-06  2:33                 ` Bjorn Helgaas
2021-10-06  5:49                   ` Kelvin.Cao
2021-10-06 14:19                     ` Bjorn Helgaas
2021-10-06 19:00                       ` Kelvin.Cao
2021-10-06 20:20                         ` Bjorn Helgaas
2021-10-06 21:27                           ` Kelvin.Cao
2021-10-07 21:23                             ` Bjorn Helgaas
2021-10-08  0:06                               ` Kelvin.Cao
2021-10-08 11:03                                 ` Bjorn Helgaas
2021-10-01 22:58     ` Kelvin.Cao
2021-10-01 23:52       ` Logan Gunthorpe
2021-10-02  0:05         ` Kelvin.Cao
2021-09-24 11:08 ` [PATCH 2/5] PCI/switchtec: Fix a MRPC error status handling issue kelvin.cao
2021-09-24 11:08 ` [PATCH 3/5] PCI/switchtec: Update the way of getting management VEP instance ID kelvin.cao
2021-09-24 11:08 ` [PATCH 4/5] PCI/switchtec: Replace ENOTSUPP with EOPNOTSUPP kelvin.cao
2021-09-24 11:08 ` [PATCH 5/5] PCI/switchtec: Add check of event support kelvin.cao
2021-09-24 15:53 ` [PATCH 0/5] Switchtec Fixes and Improvements Logan Gunthorpe
2021-09-25  5:27   ` Kelvin.Cao
2021-09-27 16:39 ` Bjorn Helgaas
2021-09-27 18:25   ` Kelvin.Cao
2021-10-08 17:05 ` Bjorn Helgaas
2021-10-08 17:23   ` Logan Gunthorpe
2021-10-08 18:25     ` Bjorn Helgaas
