All of lore.kernel.org
 help / color / mirror / Atom feed
* PCIe resets/restore and lack of CRS wait
@ 2018-03-22  4:03 Benjamin Herrenschmidt
  2018-03-22  4:36 ` okaya
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2018-03-22  4:03 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas; +Cc: Michael Neuling

Hi Folks !

So while chasing some issues in our EEH error handling, we noticed that
the generic code has about a bazillion of "reset" path for devices,
most of them seemingly missing a wait for CRS after the reset.

That includes PM based resets or wakeups (can a D3->D0 transition cause
CRS to be returned ? Unclear but we should try to be safe), but mostly
it includes anything that resets the pcie port (PERST) or the secondary
bridge reset (hot resets).

For example take __pci_reset_function_locked(...), it can call
pci_parent_bus_reset() which will perform a hot reset but will *not*
wait for CRS.

There are a plethora of reset path in there that are similar, it's
actually hard to figure out which is what, but they all have in common
that they don't wait for CRS with the notable exception of the FLR
case.

I'm keen on doing a rather "blanket" fix by adding a CRS wait inside
pci_dev_restore(). Would you guys agree ?

Also why does pci_flr_wait() not use vendor/device ID but instead waits
on the COMMAND register being all 1's ? It's not clear to me ...
VID/DID will give a very specific signature for CRS which is ffff0001
while COMMAND could return all 1's for other reasons (device unplugged
for example).

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PCIe resets/restore and lack of CRS wait
  2018-03-22  4:03 PCIe resets/restore and lack of CRS wait Benjamin Herrenschmidt
@ 2018-03-22  4:36 ` okaya
  2018-03-22  5:12   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 10+ messages in thread
From: okaya @ 2018-03-22  4:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linux-pci, Bjorn Helgaas, Michael Neuling, linux-pci-owner

On 2018-03-22 00:03, Benjamin Herrenschmidt wrote:
> Hi Folks !
> 
> So while chasing some issues in our EEH error handling, we noticed that
> the generic code has about a bazillion of "reset" path for devices,
> most of them seemingly missing a wait for CRS after the reset.
> 
> That includes PM based resets or wakeups (can a D3->D0 transition cause
> CRS to be returned ? Unclear but we should try to be safe), but mostly
> it includes anything that resets the pcie port (PERST) or the secondary
> bridge reset (hot resets).
> 
> For example take __pci_reset_function_locked(...), it can call
> pci_parent_bus_reset() which will perform a hot reset but will *not*
> wait for CRS.
> 
> There are a plethora of reset path in there that are similar, it's
> actually hard to figure out which is what, but they all have in common
> that they don't wait for CRS with the notable exception of the FLR
> case.

Bjorn merged my change for 4.17. Kernel should handle these now.

> 
> I'm keen on doing a rather "blanket" fix by adding a CRS wait inside
> pci_dev_restore(). Would you guys agree ?
> 
> Also why does pci_flr_wait() not use vendor/device ID but instead waits
> on the COMMAND register being all 1's ? It's not clear to me ...
> VID/DID will give a very specific signature for CRS which is ffff0001
> while COMMAND could return all 1's for other reasons (device unplugged
> for example).
> 

Because if you read vendor id of a virtual function, you get 0xffffffff


> Cheers,
> Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PCIe resets/restore and lack of CRS wait
  2018-03-22  4:36 ` okaya
@ 2018-03-22  5:12   ` Benjamin Herrenschmidt
  2018-03-22  6:24     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2018-03-22  5:12 UTC (permalink / raw)
  To: okaya; +Cc: linux-pci, Bjorn Helgaas, Michael Neuling, linux-pci-owner

On Thu, 2018-03-22 at 00:36 -0400, okaya@codeaurora.org wrote:
> On 2018-03-22 00:03, Benjamin Herrenschmidt wrote:
> > Hi Folks !
> > 
> > So while chasing some issues in our EEH error handling, we noticed that
> > the generic code has about a bazillion of "reset" path for devices,
> > most of them seemingly missing a wait for CRS after the reset.
> > 
> > That includes PM based resets or wakeups (can a D3->D0 transition cause
> > CRS to be returned ? Unclear but we should try to be safe), but mostly
> > it includes anything that resets the pcie port (PERST) or the secondary
> > bridge reset (hot resets).
> > 
> > For example take __pci_reset_function_locked(...), it can call
> > pci_parent_bus_reset() which will perform a hot reset but will *not*
> > wait for CRS.
> > 
> > There are a plethora of reset path in there that are similar, it's
> > actually hard to figure out which is what, but they all have in common
> > that they don't wait for CRS with the notable exception of the FLR
> > case.
> 
> Bjorn merged my change for 4.17. Kernel should handle these now.

Ah nice ! I'll check that out, I was checking 4.16-rc6 !

> > I'm keen on doing a rather "blanket" fix by adding a CRS wait inside
> > pci_dev_restore(). Would you guys agree ?
> > 
> > Also why does pci_flr_wait() not use vendor/device ID but instead waits
> > on the COMMAND register being all 1's ? It's not clear to me ...
> > VID/DID will give a very specific signature for CRS which is ffff0001
> > while COMMAND could return all 1's for other reasons (device unplugged
> > for example).
> > 
> 
> Because if you read vendor id of a virtual function, you get 0xffffffff

Ah indeed, I forgot about that... 

Thanks !

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PCIe resets/restore and lack of CRS wait
  2018-03-22  5:12   ` Benjamin Herrenschmidt
@ 2018-03-22  6:24     ` Benjamin Herrenschmidt
  2018-03-22 11:25       ` okaya
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2018-03-22  6:24 UTC (permalink / raw)
  To: okaya; +Cc: linux-pci, Bjorn Helgaas, Michael Neuling, linux-pci-owner

On Thu, 2018-03-22 at 16:12 +1100, Benjamin Herrenschmidt wrote:
> 
> > > I'm keen on doing a rather "blanket" fix by adding a CRS wait inside
> > > pci_dev_restore(). Would you guys agree ?
> > > 
> > > Also why does pci_flr_wait() not use vendor/device ID but instead waits
> > > on the COMMAND register being all 1's ? It's not clear to me ...
> > > VID/DID will give a very specific signature for CRS which is ffff0001
> > > while COMMAND could return all 1's for other reasons (device unplugged
> > > for example).
> > > 
> > 
> > Because if you read vendor id of a virtual function, you get 0xffffffff
> 
> Ah indeed, I forgot about that... 

Actually, that makes me a bit nervous, I wonder if we should limit
that "trick" to VFs and otherwise do the right thing. As per PCIe
spec 3.1a (I haven't looked at 4.0 yet)

<<
stalled while the device completes its self-initialization. Software
that intends to take advantage of
this mechanism must ensure that the first access made to a device
following a valid reset condition is
a Configuration Read Request accessing both bytes of the Vendor ID
field in the device’s
Configuration Space header. For this case only, the Root Complex, if
enabled, will synthesize a
special read-data value for the Vendor ID field to indicate to software
that CRS Completion Status
has been returned by the device. For other Configuration Requests, or
when CRS Software
Visibility is not enabled, the Root Complex will generally re-issue the
Configuration Request until it
completes with a status other than CRS as described in Section 2.3.2.
>>

That tells me that there is no guarantee by spec that we'll get
ffff's, instead we might get HW stalls, or other really nasty
effects when probing a register other than 0 (VID/DID) for CRS.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PCIe resets/restore and lack of CRS wait
  2018-03-22  6:24     ` Benjamin Herrenschmidt
@ 2018-03-22 11:25       ` okaya
  2018-03-22 13:46         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 10+ messages in thread
From: okaya @ 2018-03-22 11:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linux-pci, Bjorn Helgaas, Michael Neuling, linux-pci-owner

On 2018-03-22 02:24, Benjamin Herrenschmidt wrote:
> On Thu, 2018-03-22 at 16:12 +1100, Benjamin Herrenschmidt wrote:
>> 
>> > > I'm keen on doing a rather "blanket" fix by adding a CRS wait inside
>> > > pci_dev_restore(). Would you guys agree ?
>> > >
>> > > Also why does pci_flr_wait() not use vendor/device ID but instead waits
>> > > on the COMMAND register being all 1's ? It's not clear to me ...
>> > > VID/DID will give a very specific signature for CRS which is ffff0001
>> > > while COMMAND could return all 1's for other reasons (device unplugged
>> > > for example).
>> > >
>> >
>> > Because if you read vendor id of a virtual function, you get 0xffffffff
>> 
>> Ah indeed, I forgot about that...
> 
> Actually, that makes me a bit nervous, I wonder if we should limit
> that "trick" to VFs and otherwise do the right thing. As per PCIe
> spec 3.1a (I haven't looked at 4.0 yet)
> 
> <<
> stalled while the device completes its self-initialization. Software
> that intends to take advantage of
> this mechanism must ensure that the first access made to a device
> following a valid reset condition is
> a Configuration Read Request accessing both bytes of the Vendor ID
> field in the device’s
> Configuration Space header. For this case only, the Root Complex, if
> enabled, will synthesize a
> special read-data value for the Vendor ID field to indicate to software
> that CRS Completion Status
> has been returned by the device. For other Configuration Requests, or
> when CRS Software
> Visibility is not enabled, the Root Complex will generally re-issue the
> Configuration Request until it
> completes with a status other than CRS as described in Section 2.3.2.
>>> 
> 
> That tells me that there is no guarantee by spec that we'll get
> ffff's, instead we might get HW stalls, or other really nasty
> effects when probing a register other than 0 (VID/DID) for CRS.

AFAIK, spec also mentions that sw needs to observe 0xffffffff for all 
other registers other than vendor id during CRS period.

> 
> Cheers,
> Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PCIe resets/restore and lack of CRS wait
  2018-03-22 11:25       ` okaya
@ 2018-03-22 13:46         ` Benjamin Herrenschmidt
  2018-03-22 13:58           ` Sinan Kaya
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2018-03-22 13:46 UTC (permalink / raw)
  To: okaya; +Cc: linux-pci, Bjorn Helgaas, Michael Neuling, linux-pci-owner

On Thu, 2018-03-22 at 07:25 -0400, okaya@codeaurora.org wrote:
> > > > 
> > 
> > That tells me that there is no guarantee by spec that we'll get
> > ffff's, instead we might get HW stalls, or other really nasty
> > effects when probing a register other than 0 (VID/DID) for CRS.
> 
> AFAIK, spec also mentions that sw needs to observe 0xffffffff for all 
> other registers other than vendor id during CRS period.

This isnt what's in the 3.1a spec at least ... section 2.3.2 explains
the specified behaviour which is, for any register other than 0
(VID/DID), to re-issue the request...

Now, it can have a timeout, and thus might be completed as a failed
transaction after a while, but it's a sub-optimal way (ie, we'll end up
hogging the CPU on loads) and it's not 100% clear that a failed
transaction returns as all 1's (it should but ...).

I can make sure it happens the way the code expects on powerpc, I'm not
too worried about that, but I think for such a generic function, it
would make sense to stick a bit closer to the spec.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PCIe resets/restore and lack of CRS wait
  2018-03-22 13:46         ` Benjamin Herrenschmidt
@ 2018-03-22 13:58           ` Sinan Kaya
  2018-03-22 14:14             ` Bjorn Helgaas
  0 siblings, 1 reply; 10+ messages in thread
From: Sinan Kaya @ 2018-03-22 13:58 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linux-pci, Bjorn Helgaas, Michael Neuling, linux-pci-owner

On 3/22/2018 8:46 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2018-03-22 at 07:25 -0400, okaya@codeaurora.org wrote:
>>>>>
>>>
>>> That tells me that there is no guarantee by spec that we'll get
>>> ffff's, instead we might get HW stalls, or other really nasty
>>> effects when probing a register other than 0 (VID/DID) for CRS.
>>
>> AFAIK, spec also mentions that sw needs to observe 0xffffffff for all 
>> other registers other than vendor id during CRS period.
> 
> This isnt what's in the 3.1a spec at least ... section 2.3.2 explains
> the specified behaviour which is, for any register other than 0
> (VID/DID), to re-issue the request...
> 

I don't have any hard preference on this. Bjorn wanted code to work for
systems with and without CRS capability. That was the reason we stayed
away from 0xffff0001.

CRS just gives you HW implementation defined retries for non vendor-id
register like you mentioned. If device does not reply in this period
of polling time, you should get all 1s eventually back.

All 1s is the spec way of saying device doesn't exist for config
transactions.

Some HW can poll indefinitely or other HW can return immediately.

If CRS visibility is supported & enabled, polling indefinitely and
hogging the CPU wouldn't be the best HW implementation.

> Now, it can have a timeout, and thus might be completed as a failed
> transaction after a while, but it's a sub-optimal way (ie, we'll end up
> hogging the CPU on loads) and it's not 100% clear that a failed
> transaction returns as all 1's (it should but ...).
> 
> I can make sure it happens the way the code expects on powerpc, I'm not
> too worried about that, but I think for such a generic function, it
> would make sense to stick a bit closer to the spec. 
> Cheers,
> Ben.
> 
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PCIe resets/restore and lack of CRS wait
  2018-03-22 13:58           ` Sinan Kaya
@ 2018-03-22 14:14             ` Bjorn Helgaas
  2018-03-22 17:54               ` Sinan Kaya
  0 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2018-03-22 14:14 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Benjamin Herrenschmidt, linux-pci, Bjorn Helgaas,
	Michael Neuling, linux-pci-owner

On Thu, Mar 22, 2018 at 08:58:06AM -0500, Sinan Kaya wrote:
> On 3/22/2018 8:46 AM, Benjamin Herrenschmidt wrote:
> > On Thu, 2018-03-22 at 07:25 -0400, okaya@codeaurora.org wrote:
> >>> That tells me that there is no guarantee by spec that we'll get
> >>> ffff's, instead we might get HW stalls, or other really nasty
> >>> effects when probing a register other than 0 (VID/DID) for CRS.
> >>
> >> AFAIK, spec also mentions that sw needs to observe 0xffffffff for all 
> >> other registers other than vendor id during CRS period.
> > 
> > This isnt what's in the 3.1a spec at least ... section 2.3.2 explains
> > the specified behaviour which is, for any register other than 0
> > (VID/DID), to re-issue the request...
> 
> I don't have any hard preference on this. Bjorn wanted code to work for
> systems with and without CRS capability. That was the reason we stayed
> away from 0xffff0001.

CRS SV is optional, so the code has to work when it's absent.  But we
can tell whether it's supported, so if we need to, we can do something
different when it's absent.

> CRS just gives you HW implementation defined retries for non vendor-id
> register like you mentioned. If device does not reply in this period
> of polling time, you should get all 1s eventually back.
> 
> All 1s is the spec way of saying device doesn't exist for config
> transactions.

I remember implementation notes mentioning all 1's data returns, e.g.,
PCIe r4.0, sec 2.3.2, but I don't *think* there's actually a spec
requirement that CRS or other errors be reported that way.  So if
there's a way to avoid relying on all 1's data, I think that would be
a good thing.

Bjorn

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PCIe resets/restore and lack of CRS wait
  2018-03-22 14:14             ` Bjorn Helgaas
@ 2018-03-22 17:54               ` Sinan Kaya
  2018-03-22 22:02                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 10+ messages in thread
From: Sinan Kaya @ 2018-03-22 17:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, linux-pci, Bjorn Helgaas,
	Michael Neuling, linux-pci-owner

Ben,

On 3/22/2018 9:14 AM, Bjorn Helgaas wrote:
> CRS SV is optional, so the code has to work when it's absent.  But we
> can tell whether it's supported, so if we need to, we can do something
> different when it's absent.

Let us know if you want to work on this or not. Otherwise, I'll add
to the list of things I'm doing.

Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PCIe resets/restore and lack of CRS wait
  2018-03-22 17:54               ` Sinan Kaya
@ 2018-03-22 22:02                 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2018-03-22 22:02 UTC (permalink / raw)
  To: Sinan Kaya, Bjorn Helgaas
  Cc: linux-pci, Bjorn Helgaas, Michael Neuling, linux-pci-owner

On Thu, 2018-03-22 at 12:54 -0500, Sinan Kaya wrote:
> Ben,
> 
> On 3/22/2018 9:14 AM, Bjorn Helgaas wrote:
> > CRS SV is optional, so the code has to work when it's absent.  But we
> > can tell whether it's supported, so if we need to, we can do something
> > different when it's absent.
> 
> Let us know if you want to work on this or not. Otherwise, I'll add
> to the list of things I'm doing.

On my side (powerpc) I'll make sure we do get ffff's and that our retry
timeouts aren't too high. I just need to makes sure we don't end up
triggering an EEH failure on the timeout though.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-03-22 22:02 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-22  4:03 PCIe resets/restore and lack of CRS wait Benjamin Herrenschmidt
2018-03-22  4:36 ` okaya
2018-03-22  5:12   ` Benjamin Herrenschmidt
2018-03-22  6:24     ` Benjamin Herrenschmidt
2018-03-22 11:25       ` okaya
2018-03-22 13:46         ` Benjamin Herrenschmidt
2018-03-22 13:58           ` Sinan Kaya
2018-03-22 14:14             ` Bjorn Helgaas
2018-03-22 17:54               ` Sinan Kaya
2018-03-22 22:02                 ` Benjamin Herrenschmidt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.