All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Kelvin.Cao@microchip.com
Cc: logang@deltatee.com, kurt.schwemmer@microsemi.com,
	bhelgaas@google.com, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, kelvincao@outlook.com
Subject: Re: [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access
Date: Sat, 2 Oct 2021 10:11:53 -0500	[thread overview]
Message-ID: <20211002151153.GA967141@bhelgaas> (raw)
In-Reply-To: <ed856f361ef3ca80e34c4565daffe6e566a8baa3.camel@microchip.com>

On Fri, Oct 01, 2021 at 11:49:18PM +0000, Kelvin.Cao@microchip.com wrote:
> On Fri, 2021-10-01 at 14:29 -0600, Logan Gunthorpe wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you
> > know the content is safe
> > 
> > On 2021-10-01 2:18 p.m., Bjorn Helgaas wrote:
> > > On Fri, Sep 24, 2021 at 11:08:38AM +0000, kelvin.cao@microchip.com
> > > wrote:
> > > > From: Kelvin Cao <kelvin.cao@microchip.com>
> > > > 
> > > > After a firmware hard reset, MRPC command executions, which
> > > > are based on the PCI BAR (which Microchip refers to as GAS)
> > > > read/write, will hang indefinitely. This is because after a
> > > > reset, the host will fail all GAS reads (get all 1s), in which
> > > > case the driver won't get a valid MRPC status.
> > > 
> > > Trying to write a merge commit log for this, but having a hard
> > > time summarizing it.  It sounds like it covers both
> > > Switchtec-specific (firmware and MRPC commands) and generic PCIe
> > > behavior (MMIO read failures).
> > > 
> > > This has something to do with a firmware hard reset.  What is
> > > that?  Is that like a firmware reboot?  A device reset, e.g.,
> > > FLR or secondary bus reset, that causes a firmware reboot?  A
> > > device reset initiated by firmware?
>
> A firmware reset can be triggered by a reset command issued to the
> firmware to reboot it.

So I guess this reset command was issued by the driver?

> > > Anyway, apparently when that happens, MMIO reads to the switch
> > > fail (timeout or error completion on PCIe) for a while.  If a
> > > device reset is involved, that much is standard PCIe behavior.
> > > And the driver sees ~0 data from those failed reads.  That's not
> > > part of the PCIe spec, but is typical root complex behavior.
> > > 
> > > But you said the MRPC commands hang indefinitely.  Presumably
> > > MMIO reads would start succeeding eventually when the device
> > > becomes ready, so I don't know how that translates to
> > > "indefinitely."
> > 
> > I suspect Kelvin can expand on this and fix the issue below. But
> > in my experience, the MMIO will read ~0 forever after a firmware
> > reset, until the system is rebooted. Presumably on systems that
> > have good hot plug support they are supposed to recover. Though
> > I've never seen that.
> 
> This is also my observation, all MMIO read will fail (~0 returned)
> until the system is rebooted or a PCI rescan is performed.

This made sense until you said MMIO reads would start working after a
PCI rescan.  A rescan doesn't really do anything special other than
doing config accesses to the device.  Two things come to mind:

1) Rescan does a config read of the Vendor ID, and devices may
respond with "Configuration Request Retry Status" if they are not
ready.  In this event, Linux retries this for a while.  This scenario
doesn't quite fit because it sounds like this is a device-specific
reset initiated by the driver, and CRS is not permited in this case.
PCIe r5.0, sec 2.3.1, says:

  A device Function is explicitly not permitted to return CRS
  following a software-initiated reset (other than an FLR) of the
  device, e.g., by the device's software driver writing to a
  device-specific reset bit.

2) The device may lose its bus and device number configuration after a
reset, so it must capture bus and device numbers from config writes.
I don't think Linux does this explicitly, but a rescan does do config
writes, which could accidentally fix something (PCIe r5.0, sec 2.2.9).

> > The MMIO read that signals the MRPC status always returns ~0 and the
> > userspace request will eventually time out.
> 
> The problem in this case is that, in DMA MRPC mode, the status (in host
> memory) is always initialized to 'in progress', and it's up to the
> firmware to update it to 'done' after the command is executed in the
> firmware. After a firmware reset is performed, the firmware cannot be
> triggered to start a MRPC command, therefore the status in host memory
> remains 'in progress' in the driver, which prevents a MRPC from timing
> out. I should have included this in the message.

I *thought* the problem was that the PCIe Memory Read failed and the
Root Complex fabricated ~0 data to complete the CPU read.  But now I'm
not sure, because it sounds like it might be that the PCIe transaction
succeeds, but it reads data that hasn't been updated by the firmware,
i.e., it reads 'in progress' because firmware hasn't updated it to
'done'.

Bjorn

  reply	other threads:[~2021-10-02 15:12 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-24 11:08 [PATCH 0/5] Switchtec Fixes and Improvements kelvin.cao
2021-09-24 11:08 ` [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access kelvin.cao
2021-10-01 20:18   ` Bjorn Helgaas
2021-10-01 20:29     ` Logan Gunthorpe
2021-10-01 23:49       ` Kelvin.Cao
2021-10-02 15:11         ` Bjorn Helgaas [this message]
2021-10-04 20:51           ` Kelvin.Cao
2021-10-05 20:11             ` Bjorn Helgaas
2021-10-06  0:37               ` Kelvin.Cao
2021-10-06  2:33                 ` Bjorn Helgaas
2021-10-06  5:49                   ` Kelvin.Cao
2021-10-06 14:19                     ` Bjorn Helgaas
2021-10-06 19:00                       ` Kelvin.Cao
2021-10-06 20:20                         ` Bjorn Helgaas
2021-10-06 21:27                           ` Kelvin.Cao
2021-10-07 21:23                             ` Bjorn Helgaas
2021-10-08  0:06                               ` Kelvin.Cao
2021-10-08 11:03                                 ` Bjorn Helgaas
2021-10-01 22:58     ` Kelvin.Cao
2021-10-01 23:52       ` Logan Gunthorpe
2021-10-02  0:05         ` Kelvin.Cao
2021-09-24 11:08 ` [PATCH 2/5] PCI/switchtec: Fix a MRPC error status handling issue kelvin.cao
2021-09-24 11:08 ` [PATCH 3/5] PCI/switchtec: Update the way of getting management VEP instance ID kelvin.cao
2021-09-24 11:08 ` [PATCH 4/5] PCI/switchtec: Replace ENOTSUPP with EOPNOTSUPP kelvin.cao
2021-09-24 11:08 ` [PATCH 5/5] PCI/switchtec: Add check of event support kelvin.cao
2021-09-24 15:53 ` [PATCH 0/5] Switchtec Fixes and Improvements Logan Gunthorpe
2021-09-25  5:27   ` Kelvin.Cao
2021-09-27 16:39 ` Bjorn Helgaas
2021-09-27 18:25   ` Kelvin.Cao
2021-10-08 17:05 ` Bjorn Helgaas
2021-10-08 17:23   ` Logan Gunthorpe
2021-10-08 18:25     ` Bjorn Helgaas
  -- strict thread matches above, loose matches on Subject: below --
2021-09-24 10:24 [PATCH 0/5] Switchtec Miscellaneous " kelvin.cao
2021-09-24 10:24 ` [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access kelvin.cao
2021-09-24 10:22 [PATCH 0/5] Switchtec Miscellaneous Fixes and Improvements kelvin.cao
2021-09-24 10:22 ` [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access kelvin.cao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211002151153.GA967141@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=Kelvin.Cao@microchip.com \
    --cc=bhelgaas@google.com \
    --cc=kelvincao@outlook.com \
    --cc=kurt.schwemmer@microsemi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=logang@deltatee.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.