All of lore.kernel.org
 help / color / mirror / Atom feed
* Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
@ 2018-07-26 22:38 Alex G.
  2018-07-26 23:00 ` Rajat Jain
  2018-07-27  7:18 ` Lukas Wunner
  0 siblings, 2 replies; 22+ messages in thread
From: Alex G. @ 2018-07-26 22:38 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-pci, Bolen, Austin, alex_gagniuc@dellteam.com

Hi,

I was under the impression that a DLLSC or PDSC would trigger the 
PCI_DEV_DISCONNECTED bit to be set, blocking any further config access.

Although I'm not seeing that happen after a  "Slot(xxx): Link Down" 
event. I suspect my understanding is then wrong. Or maybe 
PCI_DEV_DISCONNECTED means something else.

In the latter case, does it not make sense to have a separate bit to say 
"Don't touch this device"?

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-26 22:38 Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit? Alex G.
@ 2018-07-26 23:00 ` Rajat Jain
  2018-07-27  0:04   ` Alex_Gagniuc
  2018-07-27  7:18 ` Lukas Wunner
  1 sibling, 1 reply; 22+ messages in thread
From: Rajat Jain @ 2018-07-26 23:00 UTC (permalink / raw)
  To: Alexandru Gagniuc; +Cc: Busch, Keith, linux-pci, austin_bolen, Alex_Gagniuc

On Thu, Jul 26, 2018 at 3:38 PM Alex G. <mr.nuke.me@gmail.com> wrote:
>
> Hi,
>
> I was under the impression that a DLLSC or PDSC would trigger the
> PCI_DEV_DISCONNECTED bit to be set, blocking any further config access.

This sounds like an good idea to me. Caveat below.

>
> Although I'm not seeing that happen after a  "Slot(xxx): Link Down"
> event.

It should eventually happen when the pciehp takes a note and finally
processes the event in pciehp_unconfigure_device(). I suspect that the
reason this was put towards the end of the event processing was to be
absolutely be sure that the device removal won't be cancelled (Since
the bit is not being cleared anywhere, if somehow the device removal
is not processed, it may result in ghost devices that are permanently
"unreadable").

Thanks,

Rajat

> I suspect my understanding is then wrong. Or maybe
> PCI_DEV_DISCONNECTED means something else.
>
> In the latter case, does it not make sense to have a separate bit to say
> "Don't touch this device"?
>
> Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-26 23:00 ` Rajat Jain
@ 2018-07-27  0:04   ` Alex_Gagniuc
  0 siblings, 0 replies; 22+ messages in thread
From: Alex_Gagniuc @ 2018-07-27  0:04 UTC (permalink / raw)
  To: rajatja, mr.nuke.me; +Cc: keith.busch, linux-pci, Austin.Bolen

On 07/26/2018 06:01 PM, Rajat Jain wrote:=0A=
> On Thu, Jul 26, 2018 at 3:38 PM Alex G. <mr.nuke.me@gmail.com> wrote:=0A=
>>=0A=
>> Hi,=0A=
>>=0A=
>> I was under the impression that a DLLSC or PDSC would trigger the=0A=
>> PCI_DEV_DISCONNECTED bit to be set, blocking any further config access.=
=0A=
> =0A=
> This sounds like an good idea to me. Caveat below.=0A=
=0A=
=0A=
>>=0A=
>> Although I'm not seeing that happen after a  "Slot(xxx): Link Down"=0A=
>> event.=0A=
> =0A=
> It should eventually happen when the pciehp takes a note and finally=0A=
> processes the event in pciehp_unconfigure_device(). I suspect that the=0A=
> reason this was put towards the end of the event processing was to be=0A=
> absolutely be sure that the device removal won't be cancelled (Since=0A=
> the bit is not being cleared anywhere, if somehow the device removal=0A=
> is not processed, it may result in ghost devices that are permanently=0A=
> "unreadable").=0A=
=0A=
I think that's something that could be fixed by clearing the =0A=
DISCONNECTED bit on a link up.=0A=
What I'm confused about here is, I've put a dump_stack() in =0A=
pci_dev_set_disconnected(), and I'm not hitting it. I see the "Link =0A=
Down" message, from which I would expect the subordinate to get =0A=
disconnected eventually. I would expect this to get hit when removing a =0A=
device:=0A=
=0A=
[ 1256.290283] pciehp 0000:b0:05.0:pcie204: Slot(179): Link Down=0A=
[ 1256.300043] pciehp 0000:b0:05.0:pcie204: Slot(179): Card not present=0A=
[no other messages related to b0:05.0 or its children]=0A=
=0A=
=0A=
Alex=0A=
=0A=
> Thanks,=0A=
> =0A=
> Rajat=0A=
> =0A=
>> I suspect my understanding is then wrong. Or maybe=0A=
>> PCI_DEV_DISCONNECTED means something else.=0A=
>>=0A=
>> In the latter case, does it not make sense to have a separate bit to say=
=0A=
>> "Don't touch this device"?=0A=
>>=0A=
>> Alex=0A=
> =0A=
=0A=

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-26 22:38 Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit? Alex G.
  2018-07-26 23:00 ` Rajat Jain
@ 2018-07-27  7:18 ` Lukas Wunner
  2018-07-27 15:52   ` Alex_Gagniuc
  1 sibling, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2018-07-27  7:18 UTC (permalink / raw)
  To: Alex G.; +Cc: Keith Busch, linux-pci, Bolen, Austin, alex_gagniuc@dellteam.com

On Thu, Jul 26, 2018 at 05:38:50PM -0500, Alex G. wrote:
> I was under the impression that a DLLSC or PDSC would trigger the
> PCI_DEV_DISCONNECTED bit to be set, blocking any further config access.

Only if the Presence Detect State bit in the Slot Status register is
not set.

I think the idea was that if the card is still in the slot, its driver can
be unbound orderly because the device is still accessible.  So there's no
need to set PCI_DEV_DISCONNECTED.

However if there is a already a new card in the slot, PCI_DEV_DISCONNECTED
will erroneously not be set.  Also, if card removal was triggered by the
link going down but the card is still in the slot, PCI_DEV_DISCONNECTED
will also erroneously not be set even though it's inaccessible.

The only situations when PCI_DEV_DISCONNECTED should not be set is if the
card is being removed by sysfs or by the user pressing the Attention Button.
Anything else is a surprise removal.  What we need to do is pass down a
flag to pciehp_unconfigure_device() to indicate whether one of those two
situations is at hand or not, and PCI_DEV_DISCONNECTED should be set
depending on that flag.

Are you testing Bjorn's pci/hotplug branch or something based on 4.18 or
earlier?  There is a major event handling rework queued on that branch,
so you may want to test with that.


> In the latter case, does it not make sense to have a separate bit to say
> "Don't touch this device"?

Hm, what would you need it for?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-27  7:18 ` Lukas Wunner
@ 2018-07-27 15:52   ` Alex_Gagniuc
  2018-07-27 17:05     ` Lukas Wunner
  0 siblings, 1 reply; 22+ messages in thread
From: Alex_Gagniuc @ 2018-07-27 15:52 UTC (permalink / raw)
  To: lukas, mr.nuke.me
  Cc: keith.busch, linux-pci, Austin.Bolen, Stuart.Hayes, Narendra.K,
	Christopher.Arzola, David.Chalfant

Hi Lukas,=0A=
=0A=
+CC some more Dell folks.=0A=
=0A=
On 07/27/2018 02:18 AM, Lukas Wunner wrote:=0A=
> On Thu, Jul 26, 2018 at 05:38:50PM -0500, Alex G. wrote:=0A=
>> I was under the impression that a DLLSC or PDSC would trigger the=0A=
>> PCI_DEV_DISCONNECTED bit to be set, blocking any further config access.=
=0A=
> =0A=
> Only if the Presence Detect State bit in the Slot Status register is=0A=
> not set.=0A=
> =0A=
> I think the idea was that if the card is still in the slot, its driver ca=
n=0A=
> be unbound orderly because the device is still accessible.  So there's no=
=0A=
> need to set PCI_DEV_DISCONNECTED.=0A=
=0A=
This sounds counter to what I had intuited. I mentally associate =0A=
DISCONNECTED with "the link is down". I don't understand the idea of a =0A=
driver doing an orderly removal when the link is down. Device registers =0A=
wouldn't be accessible in that case.=0A=
=0A=
> However if there is a already a new card in the slot, PCI_DEV_DISCONNECTE=
D=0A=
> will erroneously not be set.  Also, if card removal was triggered by the=
=0A=
> link going down but the card is still in the slot, PCI_DEV_DISCONNECTED=
=0A=
> will also erroneously not be set even though it's inaccessible.=0A=
> =0A=
> The only situations when PCI_DEV_DISCONNECTED should not be set is if the=
=0A=
> card is being removed by sysfs or by the user pressing the Attention Butt=
on.=0A=
> Anything else is a surprise removal.  What we need to do is pass down a=
=0A=
> flag to pciehp_unconfigure_device() to indicate whether one of those two=
=0A=
> situations is at hand or not, and PCI_DEV_DISCONNECTED should be set=0A=
> depending on that flag.=0A=
=0A=
That makes a little more sense. Is someone working on this?=0A=
=0A=
> Are you testing Bjorn's pci/hotplug branch or something based on 4.18 or=
=0A=
> earlier?  There is a major event handling rework queued on that branch,=
=0A=
> so you may want to test with that.=0A=
=0A=
Been using 4.18-rc6. I'll take a look at Bjorn's latest and greatest. =0A=
Thank you for the heads-up.=0A=
=0A=
>> In the latter case, does it not make sense to have a separate bit to say=
=0A=
>> "Don't touch this device"?=0A=
> =0A=
> Hm, what would you need it for?=0A=
=0A=
Idiotic firmware bugs and unrefined device removal mechanisms. For =0A=
example, on removal, we shouldn't be touching MSI registers on a device =0A=
whose link is down. Firmware-first (FFS) systems tend to blow up, crash, =
=0A=
and burn when that happens.=0A=
=0A=
Another example is NVME subsystem resets (NSSR). NSSR resets the PCIe =0A=
link, and the PD state stays the same. Once teardown gets to masking the =
=0A=
MSI-X bits, it sends an MMIO to the downed link.=0A=
=0A=
I want to resolve such crashes by guarding teardown paths against =0A=
touching registers on the disconnected device -- yes, I know it's a =0A=
band-aid and not a cure.=0A=
=0A=
Alex=0A=
=0A=
> Thanks,=0A=
> =0A=
> Lukas=0A=
> =0A=

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-27 15:52   ` Alex_Gagniuc
@ 2018-07-27 17:05     ` Lukas Wunner
  2018-07-27 17:51       ` Alex_Gagniuc
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2018-07-27 17:05 UTC (permalink / raw)
  To: Alex_Gagniuc
  Cc: mr.nuke.me, keith.busch, linux-pci, Austin.Bolen, Stuart.Hayes,
	Narendra.K, Christopher.Arzola, David.Chalfant

On Fri, Jul 27, 2018 at 03:52:04PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> On 07/27/2018 02:18 AM, Lukas Wunner wrote:
> > On Thu, Jul 26, 2018 at 05:38:50PM -0500, Alex G. wrote:
> >> I was under the impression that a DLLSC or PDSC would trigger the
> >> PCI_DEV_DISCONNECTED bit to be set, blocking any further config access.
> > 
> > Only if the Presence Detect State bit in the Slot Status register is
> > not set.
> > 
> > I think the idea was that if the card is still in the slot, its driver can
> > be unbound orderly because the device is still accessible.  So there's no
> > need to set PCI_DEV_DISCONNECTED.
> 
> This sounds counter to what I had intuited. I mentally associate 
> DISCONNECTED with "the link is down". I don't understand the idea of a 
> driver doing an orderly removal when the link is down. Device registers 
> wouldn't be accessible in that case.

That's basically what I meant, sorry for not being clear.

> > However if there is a already a new card in the slot, PCI_DEV_DISCONNECTED
> > will erroneously not be set.  Also, if card removal was triggered by the
> > link going down but the card is still in the slot, PCI_DEV_DISCONNECTED
> > will also erroneously not be set even though it's inaccessible.
> > 
> > The only situations when PCI_DEV_DISCONNECTED should not be set is if the
> > card is being removed by sysfs or by the user pressing the Attention Button.
> > Anything else is a surprise removal.  What we need to do is pass down a
> > flag to pciehp_unconfigure_device() to indicate whether one of those two
> > situations is at hand or not, and PCI_DEV_DISCONNECTED should be set
> > depending on that flag.
> 
> That makes a little more sense. Is someone working on this?

Me, but not for 4.19, we're too late in the cycle, I'm going to post
a small number of fixup patches for Bjorn's pci/hotplug branch shortly,
to be included in 4.19, and that's it.

I posted a patch as part of the series that's now on Bjorn's pci/hotplug
branch which touched PCI_DEV_DISCONNECTED code in pciehp_pci.c, but had
to withdraw that particular patch:
https://patchwork.ozlabs.org/patch/930403/

The first problem with that patch was that I hadn't fully understood
yet when to set PCI_DEV_DISCONNECTED and when not to set it.

The second problem was that it fixed a deadlock on unplug in a way
that wasn't generic enough.  The same deadlock can occur in other
situations.  The real fix is to unbind the driver lockless in
pci_stop_dev().  That's non-trivial to achieve, but doable.
I don't think there's a way around that to fix the problem:
https://www.spinics.net/lists/linux-pci/msg73652.html

There's (still) a lot that can be improved in pciehp, I hope to
keep working on that as time permits.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-27 17:05     ` Lukas Wunner
@ 2018-07-27 17:51       ` Alex_Gagniuc
  2018-07-27 18:17         ` Sinan Kaya
  2018-07-28 18:31         ` Lukas Wunner
  0 siblings, 2 replies; 22+ messages in thread
From: Alex_Gagniuc @ 2018-07-27 17:51 UTC (permalink / raw)
  To: lukas
  Cc: mr.nuke.me, keith.busch, linux-pci, Austin.Bolen, Stuart.Hayes,
	Narendra.K, Christopher.Arzola, David.Chalfant

On 07/27/2018 12:05 PM, Lukas Wunner wrote:=0A=
> On Fri, Jul 27, 2018 at 03:52:04PM +0000, Alex_Gagniuc@Dellteam.com wrote=
:=0A=
>> On 07/27/2018 02:18 AM, Lukas Wunner wrote:=0A=
>>> On Thu, Jul 26, 2018 at 05:38:50PM -0500, Alex G. wrote:=0A=
>>>> I was under the impression that a DLLSC or PDSC would trigger the=0A=
>>>> PCI_DEV_DISCONNECTED bit to be set, blocking any further config access=
.=0A=
>>>=0A=
>>> Only if the Presence Detect State bit in the Slot Status register is=0A=
>>> not set.=0A=
>>>=0A=
>>> I think the idea was that if the card is still in the slot, its driver =
can=0A=
>>> be unbound orderly because the device is still accessible.  So there's =
no=0A=
>>> need to set PCI_DEV_DISCONNECTED.=0A=
>>=0A=
>> This sounds counter to what I had intuited. I mentally associate=0A=
>> DISCONNECTED with "the link is down". I don't understand the idea of a=
=0A=
>> driver doing an orderly removal when the link is down. Device registers=
=0A=
>> wouldn't be accessible in that case.=0A=
> =0A=
> That's basically what I meant, sorry for not being clear.=0A=
> =0A=
>>> However if there is a already a new card in the slot, PCI_DEV_DISCONNEC=
TED=0A=
>>> will erroneously not be set.  Also, if card removal was triggered by th=
e=0A=
>>> link going down but the card is still in the slot, PCI_DEV_DISCONNECTED=
=0A=
>>> will also erroneously not be set even though it's inaccessible.=0A=
>>>=0A=
>>> The only situations when PCI_DEV_DISCONNECTED should not be set is if t=
he=0A=
>>> card is being removed by sysfs or by the user pressing the Attention Bu=
tton.=0A=
>>> Anything else is a surprise removal.  What we need to do is pass down a=
=0A=
>>> flag to pciehp_unconfigure_device() to indicate whether one of those tw=
o=0A=
>>> situations is at hand or not, and PCI_DEV_DISCONNECTED should be set=0A=
>>> depending on that flag.=0A=
>>=0A=
>> That makes a little more sense. Is someone working on this?=0A=
> =0A=
> Me, but not for 4.19, we're too late in the cycle, I'm going to post=0A=
> a small number of fixup patches for Bjorn's pci/hotplug branch shortly,=
=0A=
> to be included in 4.19, and that's it.=0A=
> =0A=
> I posted a patch as part of the series that's now on Bjorn's pci/hotplug=
=0A=
> branch which touched PCI_DEV_DISCONNECTED code in pciehp_pci.c, but had=
=0A=
> to withdraw that particular patch:=0A=
> https://patchwork.ozlabs.org/patch/930403/=0A=
> =0A=
> The first problem with that patch was that I hadn't fully understood=0A=
> yet when to set PCI_DEV_DISCONNECTED and when not to set it.=0A=
=0A=
I think PCI_DEV_DISCONNECTED is a documentation issue above all else. =0A=
The history I was given is that drivers would take a very long time to =0A=
tear down a device. Config space IO to an nonexistent device took a long =
=0A=
while to time out. Performance was one motivation -- and was not documented=
.=0A=
=0A=
> The second problem was that it fixed a deadlock on unplug in a way=0A=
> that wasn't generic enough.  The same deadlock can occur in other=0A=
> situations.  The real fix is to unbind the driver lockless in=0A=
> pci_stop_dev().  That's non-trivial to achieve, but doable.=0A=
> I don't think there's a way around that to fix the problem:=0A=
> https://www.spinics.net/lists/linux-pci/msg73652.html=0A=
> =0A=
> There's (still) a lot that can be improved in pciehp, I hope to=0A=
> keep working on that as time permits.=0A=
=0A=
Thanks for all the info. The fix that I was settling on is (pasted) =0A=
below. Though that seems to conflict a bit with what you are trying to =0A=
do. Now I'm a little conflicted If I should try to submit the below or not.=
=0A=
=0A=
Alex=0A=
=0A=
diff --git a/drivers/pci/hotplug/pciehp_pci.c =0A=
b/drivers/pci/hotplug/pciehp_pci.c=0A=
index 3f518dea856d..5cf02639fa0b 100644=0A=
--- a/drivers/pci/hotplug/pciehp_pci.c=0A=
+++ b/drivers/pci/hotplug/pciehp_pci.c=0A=
@@ -74,6 +74,7 @@ int pciehp_unconfigure_device(struct slot *p_slot)=0A=
         ctrl_dbg(ctrl, "%s: domain:bus:dev =3D %04x:%02x:00\n",=0A=
                  __func__, pci_domain_nr(parent), parent->number);=0A=
         pciehp_get_adapter_status(p_slot, &presence);=0A=
+       presence =3D presence && pciehp_check_link_active(ctrl);=0A=
=0A=
         pci_lock_rescan_remove();=0A=
=0A=
=0A=
=0A=
> Thanks,=0A=
> =0A=
> Lukas=0A=
> =0A=
=0A=

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-27 17:51       ` Alex_Gagniuc
@ 2018-07-27 18:17         ` Sinan Kaya
  2018-07-27 18:23           ` Alex_Gagniuc
  2018-07-28 18:31         ` Lukas Wunner
  1 sibling, 1 reply; 22+ messages in thread
From: Sinan Kaya @ 2018-07-27 18:17 UTC (permalink / raw)
  To: Alex_Gagniuc, lukas
  Cc: mr.nuke.me, keith.busch, linux-pci, Austin.Bolen, Stuart.Hayes,
	Narendra.K, Christopher.Arzola, David.Chalfant

On 7/27/2018 10:51 AM, Alex_Gagniuc@Dellteam.com wrote:
>> The first problem with that patch was that I hadn't fully understood
>> yet when to set PCI_DEV_DISCONNECTED and when not to set it.
> I think PCI_DEV_DISCONNECTED is a documentation issue above all else.
> The history I was given is that drivers would take a very long time to
> tear down a device. Config space IO to an nonexistent device took a long
> while to time out. Performance was one motivation -- and was not documented.
> 

Completion timeouts are typically in the orders of 50ms. If you have a
lot of outstanding non-posted requests, it will take time for SW to
flush all requests.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-27 18:17         ` Sinan Kaya
@ 2018-07-27 18:23           ` Alex_Gagniuc
  2018-07-27 18:34             ` Sinan Kaya
  0 siblings, 1 reply; 22+ messages in thread
From: Alex_Gagniuc @ 2018-07-27 18:23 UTC (permalink / raw)
  To: okaya, lukas
  Cc: mr.nuke.me, keith.busch, linux-pci, Austin.Bolen, Stuart.Hayes,
	Narendra.K, Christopher.Arzola, David.Chalfant

On 07/27/2018 01:18 PM, Sinan Kaya wrote:=0A=
> On 7/27/2018 10:51 AM, Alex_Gagniuc@Dellteam.com wrote:=0A=
>>> The first problem with that patch was that I hadn't fully understood=0A=
>>> yet when to set PCI_DEV_DISCONNECTED and when not to set it.=0A=
>> I think PCI_DEV_DISCONNECTED is a documentation issue above all else.=0A=
>> The history I was given is that drivers would take a very long time to=
=0A=
>> tear down a device. Config space IO to an nonexistent device took a long=
=0A=
>> while to time out. Performance was one motivation -- and was not documen=
ted.=0A=
>>=0A=
> =0A=
> Completion timeouts are typically in the orders of 50ms. If you have a=0A=
> lot of outstanding non-posted requests, it will take time for SW to=0A=
> flush all requests.=0A=
=0A=
That's why we check pci_dev_is_disconnected() in pci/access.c=0A=
But if the DISCONNECTED bit doesn't get set, it makes me wonder how =0A=
solid things currently are.=0A=
=0A=
Alex=0A=

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-27 18:23           ` Alex_Gagniuc
@ 2018-07-27 18:34             ` Sinan Kaya
  0 siblings, 0 replies; 22+ messages in thread
From: Sinan Kaya @ 2018-07-27 18:34 UTC (permalink / raw)
  To: Alex_Gagniuc, lukas
  Cc: mr.nuke.me, keith.busch, linux-pci, Austin.Bolen, Stuart.Hayes,
	Narendra.K, Christopher.Arzola, David.Chalfant

On 7/27/2018 11:23 AM, Alex_Gagniuc@Dellteam.com wrote:
>> Completion timeouts are typically in the orders of 50ms. If you have a
>> lot of outstanding non-posted requests, it will take time for SW to
>> flush all requests.
> That's why we check pci_dev_is_disconnected() in pci/access.c
> But if the DISCONNECTED bit doesn't get set, it makes me wonder how
> solid things currently are.

I don't see any problem by setting this bit as long as the link is
down regardless of the drive presence state and clearing it on link 
recovery.

Is there a problem with this approach? I'm catching up with this
thread.

pcie_do_fatal_recovery() doesn't query presence as an example.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-27 17:51       ` Alex_Gagniuc
  2018-07-27 18:17         ` Sinan Kaya
@ 2018-07-28 18:31         ` Lukas Wunner
  2018-07-29  0:26           ` Sinan Kaya
                             ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Lukas Wunner @ 2018-07-28 18:31 UTC (permalink / raw)
  To: Alex_Gagniuc
  Cc: mr.nuke.me, keith.busch, linux-pci, Austin.Bolen, Stuart.Hayes,
	Narendra.K, Christopher.Arzola, David.Chalfant, Sinan Kaya

On Fri, Jul 27, 2018 at 05:51:04PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> I think PCI_DEV_DISCONNECTED is a documentation issue above all else. 
> The history I was given is that drivers would take a very long time to 
> tear down a device. Config space IO to an nonexistent device took a long 
> while to time out. Performance was one motivation -- and was not documented.

Often it is possible for the driver to detect surprise removal by
checking if mmio reads return "all ones".  But in some cases that's
a valid value to read from mmio and then this approach won't work.
Also, checking every mmio read may negatively impact performance.

Finally, if the card was quickly swapped and the link to the new
card is already up, you may be accessing that new card.  (mmio
accesses may then still return all ones if the BARs are blank,
but at least config space accesses should work.)

Once it has been determined that the device has been surprise removed,
that fact should be cached somewhere to short-circuit any further device
accesses.  PCI_DEV_DISCONNECTED can act as such a cache.


> Thanks for all the info. The fix that I was settling on is (pasted) 
> below. Though that seems to conflict a bit with what you are trying to 
> do. Now I'm a little conflicted If I should try to submit the below or not.
> 
> --- a/drivers/pci/hotplug/pciehp_pci.c
> +++ b/drivers/pci/hotplug/pciehp_pci.c
> @@ -74,6 +74,7 @@ int pciehp_unconfigure_device(struct slot *p_slot)
>          ctrl_dbg(ctrl, "%s: domain:bus:dev = %04x:%02x:00\n",
>                   __func__, pci_domain_nr(parent), parent->number);
>          pciehp_get_adapter_status(p_slot, &presence);
> +       presence = presence && pciehp_check_link_active(ctrl);

That approach won't work if the card was quickly swapped and the link
to the new card is already up when pciehp_unconfigure_device() runs.

FWIW, the below is what I had in mind (on top of Bjorn's pci/hotplug
branch).  Does this work for you?

-- >8 --
Subject: [PATCH] PCI: pciehp: Differentiate between surprise and safe removal

When removing PCI devices below a hotplug bridge, pciehp marks them as
disconnected if the card is no longer present in the slot or it quiesces
them if the card is still present (by disabling INTx interrupts, bus
mastering and SERR# reporting).

To detect whether the card is still present, pciehp checks the Presence
Detect State bit in the Slot Status register.  The problem with this
approach is that even if the card is present, the link to it may be
down, and it that case it would be better to mark the devices as
disconnected instead of trying to quiesce them.  Moreover, if the card
in the slot was quickly replaced by another one, the Presence Detect
State bit would be set, yet trying to quiesce the new card's devices
would be wrong and the correct thing to do is to mark the previous
card's devices as disconnected.

Instead of looking at the Presence Detect State bit, it is better to
differentiate whether the card was surprise removed versus safely
removed (via sysfs or an Attention Button press).  On surprise removal,
the devices should be marked as disconnected, whereas on safe removal it
is correct to quiesce the devices.

The knowledge whether a surprise removal or a safe removal is at hand
does exist further up in the call stack:  A surprise removal is
initiated by pciehp_handle_presence_or_link_change(), a safe removal by
pciehp_handle_disable_request().

Pass that information down to pciehp_unconfigure_device() and use it in
lieu of the Presence Detect State bit.  While there, add kernel-doc to
pciehp_unconfigure_device() and pciehp_configure_device().

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/pci/hotplug/pciehp.h      |  2 +-
 drivers/pci/hotplug/pciehp_ctrl.c | 22 +++++++++++++---------
 drivers/pci/hotplug/pciehp_pci.c  | 23 ++++++++++++++++++++---
 3 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 652c46d9b215..bce29ae769dd 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -195,7 +195,7 @@ void pciehp_handle_button_press(struct slot *slot);
 void pciehp_handle_disable_request(struct slot *slot);
 void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events);
 int pciehp_configure_device(struct slot *p_slot);
-void pciehp_unconfigure_device(struct slot *p_slot);
+void pciehp_unconfigure_device(struct slot *p_slot, bool presence);
 void pciehp_queue_pushbutton_work(struct work_struct *work);
 struct controller *pcie_init(struct pcie_device *dev);
 int pcie_init_notification(struct controller *ctrl);
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index d7d55160b5f8..8836648e145f 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -40,6 +40,9 @@
    hotplug controller logic
  */
 
+#define SAFE_REMOVAL	 true
+#define SURPRISE_REMOVAL false
+
 static void set_slot_off(struct controller *ctrl, struct slot *pslot)
 {
 	/* turn off slot, turn on Amber LED, turn off Green LED if supported*/
@@ -115,12 +118,13 @@ static int board_added(struct slot *p_slot)
 /**
  * remove_board - Turns off slot and LEDs
  * @p_slot: slot where board is being removed
+ * @safe_removal: whether the board is safely removed (versus surprise removed)
  */
-static void remove_board(struct slot *p_slot)
+static void remove_board(struct slot *p_slot, bool safe_removal)
 {
 	struct controller *ctrl = p_slot->ctrl;
 
-	pciehp_unconfigure_device(p_slot);
+	pciehp_unconfigure_device(p_slot, safe_removal);
 
 	if (POWER_CTRL(ctrl)) {
 		pciehp_power_off_slot(p_slot);
@@ -138,7 +142,7 @@ static void remove_board(struct slot *p_slot)
 }
 
 static int pciehp_enable_slot(struct slot *slot);
-static int pciehp_disable_slot(struct slot *slot);
+static int pciehp_disable_slot(struct slot *slot, bool safe_removal);
 
 void pciehp_request(struct controller *ctrl, int action)
 {
@@ -230,7 +234,7 @@ void pciehp_handle_disable_request(struct slot *slot)
 	slot->state = POWEROFF_STATE;
 	mutex_unlock(&slot->lock);
 
-	ctrl->request_result = pciehp_disable_slot(slot);
+	ctrl->request_result = pciehp_disable_slot(slot, SAFE_REMOVAL);
 }
 
 void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events)
@@ -257,7 +261,7 @@ void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events)
 		if (events & PCI_EXP_SLTSTA_PDC)
 			ctrl_info(ctrl, "Slot(%s): Card not present\n",
 				  slot_name(slot));
-		pciehp_disable_slot(slot);
+		pciehp_disable_slot(slot, SURPRISE_REMOVAL);
 		break;
 	default:
 		mutex_unlock(&slot->lock);
@@ -343,7 +347,7 @@ static int pciehp_enable_slot(struct slot *slot)
 	return ret;
 }
 
-static int __pciehp_disable_slot(struct slot *p_slot)
+static int __pciehp_disable_slot(struct slot *p_slot, bool safe_removal)
 {
 	u8 getstatus = 0;
 	struct controller *ctrl = p_slot->ctrl;
@@ -357,17 +361,17 @@ static int __pciehp_disable_slot(struct slot *p_slot)
 		}
 	}
 
-	remove_board(p_slot);
+	remove_board(p_slot, safe_removal);
 	return 0;
 }
 
-static int pciehp_disable_slot(struct slot *slot)
+static int pciehp_disable_slot(struct slot *slot, bool safe_removal)
 {
 	struct controller *ctrl = slot->ctrl;
 	int ret;
 
 	pm_runtime_get_sync(&ctrl->pcie->port->dev);
-	ret = __pciehp_disable_slot(slot);
+	ret = __pciehp_disable_slot(slot, safe_removal);
 	pm_runtime_put(&ctrl->pcie->port->dev);
 
 	mutex_lock(&slot->lock);
diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index ec3f065bb1c0..079aac163484 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -34,6 +34,14 @@
 #include "../pci.h"
 #include "pciehp.h"
 
+/**
+ * pciehp_configure_device() - enumerate PCI devices below a hotplug bridge
+ * @p_slot: PCIe hotplug slot
+ *
+ * Enumerate PCI devices below a hotplug bridge and add them to the system.
+ * Return 0 on success, %-EEXIST if the devices are already enumerated or
+ * %-ENODEV if enumeration failed.
+ */
 int pciehp_configure_device(struct slot *p_slot)
 {
 	struct pci_dev *dev;
@@ -76,9 +84,19 @@ int pciehp_configure_device(struct slot *p_slot)
 	return ret;
 }
 
-void pciehp_unconfigure_device(struct slot *p_slot)
+/**
+ * pciehp_unconfigure_device() - remove PCI devices below a hotplug bridge
+ * @p_slot: PCIe hotplug slot
+ * @presence: whether the card is still present in the slot;
+ *	true for safe removal via sysfs or an Attention Button press,
+ *	false for surprise removal
+ *
+ * Unbind PCI devices below a hotplug bridge from their drivers and remove
+ * them from the system.  Safely removed devices are quiesced.  Surprise
+ * removed devices are marked as such to prevent further accesses.
+ */
+void pciehp_unconfigure_device(struct slot *p_slot, bool presence)
 {
-	u8 presence = 0;
 	struct pci_dev *dev, *temp;
 	struct pci_bus *parent = p_slot->ctrl->pcie->port->subordinate;
 	u16 command;
@@ -86,7 +104,6 @@ void pciehp_unconfigure_device(struct slot *p_slot)
 
 	ctrl_dbg(ctrl, "%s: domain:bus:dev = %04x:%02x:00\n",
 		 __func__, pci_domain_nr(parent), parent->number);
-	pciehp_get_adapter_status(p_slot, &presence);
 
 	pci_lock_rescan_remove();
 
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-28 18:31         ` Lukas Wunner
@ 2018-07-29  0:26           ` Sinan Kaya
  2018-07-29 12:09             ` Lukas Wunner
  2018-07-30 13:28           ` David Laight
  2018-07-30 21:38           ` Alex_Gagniuc
  2 siblings, 1 reply; 22+ messages in thread
From: Sinan Kaya @ 2018-07-29  0:26 UTC (permalink / raw)
  To: Lukas Wunner, Alex_Gagniuc
  Cc: mr.nuke.me, keith.busch, linux-pci, Austin.Bolen, Stuart.Hayes,
	Narendra.K, Christopher.Arzola, David.Chalfant

On 7/28/2018 11:31 AM, Lukas Wunner wrote:
> The knowledge whether a surprise removal or a safe removal is at hand
> does exist further up in the call stack:  A surprise removal is
> initiated by pciehp_handle_presence_or_link_change(), a safe removal by
> pciehp_handle_disable_request().

Can you also check if platform supports surprise link down error
reporting (Link Capabilities Register) and reports a surprise link
down event in AER Uncorrectable Error Status Register for the
hotplug code to make it more reliable?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-29  0:26           ` Sinan Kaya
@ 2018-07-29 12:09             ` Lukas Wunner
  2018-07-29 16:59               ` Sinan Kaya
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2018-07-29 12:09 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Alex_Gagniuc, mr.nuke.me, keith.busch, linux-pci, Austin.Bolen,
	Stuart.Hayes, Narendra.K, Christopher.Arzola, David.Chalfant

On Sat, Jul 28, 2018 at 05:26:57PM -0700, Sinan Kaya wrote:
> On 7/28/2018 11:31 AM, Lukas Wunner wrote:
> >The knowledge whether a surprise removal or a safe removal is at hand
> >does exist further up in the call stack:  A surprise removal is
> >initiated by pciehp_handle_presence_or_link_change(), a safe removal by
> >pciehp_handle_disable_request().
> 
> Can you also check if platform supports surprise link down error
> reporting (Link Capabilities Register) and reports a surprise link
> down event in AER Uncorrectable Error Status Register for the
> hotplug code to make it more reliable?

We read the Link Capabilities register in pcie_init() to determine if
Data Link Layer Link Active Reporting is supported.  (That's a feature
added in the PCIe r1.1 Base Spec.  Old devices that strictly adhere to
PCIe r1.0 don't support it.)

We could likewise cache the Surprise Down Error Reporting Capable bit
in struct controller.  But I don't quite understand yet how and when
you want it to be used by pciehp?  If the link goes down, pciehp doesn't
care whether that's caused by a fatal error or removal by the user.
It seems correct to me to also remove devices on a fatal error, after all
they're no longer accessible until the error is cleared (IIUC).
Do you agree or disagree?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-29 12:09             ` Lukas Wunner
@ 2018-07-29 16:59               ` Sinan Kaya
  0 siblings, 0 replies; 22+ messages in thread
From: Sinan Kaya @ 2018-07-29 16:59 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Alex_Gagniuc, mr.nuke.me, keith.busch, linux-pci, Austin.Bolen,
	Stuart.Hayes, Narendra.K, Christopher.Arzola, David.Chalfant

On 7/29/18, Lukas Wunner <lukas@wunner.de> wrote:
> On Sat, Jul 28, 2018 at 05:26:57PM -0700, Sinan Kaya wrote:
>> On 7/28/2018 11:31 AM, Lukas Wunner wrote:
>> >The knowledge whether a surprise removal or a safe removal is at hand
>> >does exist further up in the call stack:  A surprise removal is
>> >initiated by pciehp_handle_presence_or_link_change(), a safe removal by
>> >pciehp_handle_disable_request().
>>
>> Can you also check if platform supports surprise link down error
>> reporting (Link Capabilities Register) and reports a surprise link
>> down event in AER Uncorrectable Error Status Register for the
>> hotplug code to make it more reliable?
>
> We read the Link Capabilities register in pcie_init() to determine if
> Data Link Layer Link Active Reporting is supported.  (That's a feature
> added in the PCIe r1.1 Base Spec.  Old devices that strictly adhere to
> PCIe r1.0 don't support it.)
>
> We could likewise cache the Surprise Down Error Reporting Capable bit
> in struct controller.  But I don't quite understand yet how and when
> you want it to be used by pciehp?  If the link goes down, pciehp doesn't
> care whether that's caused by a fatal error or removal by the user.
> It seems correct to me to also remove devices on a fatal error, after all
> they're no longer accessible until the error is cleared (IIUC).
> Do you agree or disagree?

Yes, we have to remove the devices for both. However, I don't think
pciehp is the right place for fatal error link events. I am trying to
separate these two execution paths.

If we fail, someone will have to take the challenge and unify link
code for both.
Right now, two threads are trying to do the same thing in parallel.

Why surprise link down check...

Data link layer can change due to signal integrity issues.
Spec defines surprise link down bit to separate device removal
>From other link quality issues.

That's why, i was suggesting to check it if a pcie device supports
It.

>
> Thanks,
>
> Lukas
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-28 18:31         ` Lukas Wunner
  2018-07-29  0:26           ` Sinan Kaya
@ 2018-07-30 13:28           ` David Laight
  2018-07-30 13:54             ` Lukas Wunner
  2018-07-30 21:38           ` Alex_Gagniuc
  2 siblings, 1 reply; 22+ messages in thread
From: David Laight @ 2018-07-30 13:28 UTC (permalink / raw)
  To: 'Lukas Wunner', Alex_Gagniuc
  Cc: mr.nuke.me, keith.busch, linux-pci, Austin.Bolen, Stuart.Hayes,
	Narendra.K, Christopher.Arzola, David.Chalfant, Sinan Kaya

From: Lukas Wunner
> Sent: 28 July 2018 19:32
...
> Finally, if the card was quickly swapped and the link to the new
> card is already up, you may be accessing that new card.  (mmio
> accesses may then still return all ones if the BARs are blank,
> but at least config space accesses should work.)

On my i7-7700 system that no longer works (at least with some cards).
If I take the PCIe link down completely (reset the FPGA on the card)
it doesn't recover (loops through detect active/quiet and a third
state I can't quite remember).

ISTR that it recovers from the link going down when I short out
the PCIe data lines.

It worked fine on a XEON E5-2609 system - I did it a lot when
updating the fpga image.

Can anyone else verify whether this works on other systems?
Or whether the kernel (or BIOS) needs to (re-)initialise
some register to make link recovery work.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-30 13:28           ` David Laight
@ 2018-07-30 13:54             ` Lukas Wunner
  2018-07-30 16:06               ` David Laight
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2018-07-30 13:54 UTC (permalink / raw)
  To: David Laight
  Cc: Alex_Gagniuc, mr.nuke.me, keith.busch, linux-pci, Austin.Bolen,
	Stuart.Hayes, Narendra.K, Christopher.Arzola, David.Chalfant,
	Sinan Kaya

On Mon, Jul 30, 2018 at 01:28:14PM +0000, David Laight wrote:
> From: Lukas Wunner
> > Sent: 28 July 2018 19:32
> ...
> > Finally, if the card was quickly swapped and the link to the new
> > card is already up, you may be accessing that new card.  (mmio
> > accesses may then still return all ones if the BARs are blank,
> > but at least config space accesses should work.)
> 
> On my i7-7700 system that no longer works (at least with some cards).
> If I take the PCIe link down completely (reset the FPGA on the card)
> it doesn't recover (loops through detect active/quiet and a third
> state I can't quite remember).
> 
> ISTR that it recovers from the link going down when I short out
> the PCIe data lines.
> 
> It worked fine on a XEON E5-2609 system - I did it a lot when
> updating the fpga image.
> 
> Can anyone else verify whether this works on other systems?
> Or whether the kernel (or BIOS) needs to (re-)initialise
> some register to make link recovery work.

Huh?  Can you be a bit more specific what exactly no longer works
and which branch or kernel version introduced the regression?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-30 13:54             ` Lukas Wunner
@ 2018-07-30 16:06               ` David Laight
  0 siblings, 0 replies; 22+ messages in thread
From: David Laight @ 2018-07-30 16:06 UTC (permalink / raw)
  To: 'Lukas Wunner'
  Cc: Alex_Gagniuc, mr.nuke.me, keith.busch, linux-pci, Austin.Bolen,
	Stuart.Hayes, Narendra.K, Christopher.Arzola, David.Chalfant,
	Sinan Kaya

From: Lukas Wunner
> Sent: 30 July 2018 14:54
> 
> On Mon, Jul 30, 2018 at 01:28:14PM +0000, David Laight wrote:
> > From: Lukas Wunner
> > > Sent: 28 July 2018 19:32
> > ...
> > > Finally, if the card was quickly swapped and the link to the new
> > > card is already up, you may be accessing that new card.  (mmio
> > > accesses may then still return all ones if the BARs are blank,
> > > but at least config space accesses should work.)
> >
> > On my i7-7700 system that no longer works (at least with some cards).
> > If I take the PCIe link down completely (reset the FPGA on the card)
> > it doesn't recover (loops through detect active/quiet and a third
> > state I can't quite remember).
> >
> > ISTR that it recovers from the link going down when I short out
> > the PCIe data lines.
> >
> > It worked fine on a XEON E5-2609 system - I did it a lot when
> > updating the fpga image.
> >
> > Can anyone else verify whether this works on other systems?
> > Or whether the kernel (or BIOS) needs to (re-)initialise
> > some register to make link recovery work.
> 
> Huh?  Can you be a bit more specific what exactly no longer works
> and which branch or kernel version introduced the regression?

I've just rerun the test on the failing system.
I believe it is related to the CPU/BIOS version, not the kernel.
What I'm actually doing is:

1) Boot the system and load the PCIe drivers for a card we make.
2) echo 1 >/sys/devices/pci..../remove
3) Completely reset the Altera(Intel) fpga at the far end of the PCIe link.

I now expect the link to recover, on the XEON E5-2609 it does (with a 4.15-rc6
kernel) but on the i7-7700 it does not (and hasn't for much older kernels).

I also don't think it makes any difference whether the PCIe slot is
directly connected to the cpu or off the companion chip.

We don't have a PCIe analyser, but the fpga traces ltssm state transitions
to an internal memory buffer which we can read using a serial link when the
PCIe link is down.

After the fpga reset I get:

     clocks: abs delta
status:        2 +4G2 Detect Quiet, set l2_exit, set hotrst_exit, set dlup_exit
status:        3   +1 Polling Compliance, clear l2_exit, clear hotrst_exit, clear dlup_exit
status:        6   +3 Polling Compliance, link speed 1, set link2 de-emphasis level
status:       75 +111 Polling Compliance, set link data link active
status:       76   +1 Polling Active
status:      289 +531 Polling Active, set pld_clk_inuse
status:   16e3da +1M4 Detect Quiet
status:   22558b +M75 Detect Active
status:   22b1a3 +23k Polling Active

Repeats forever at the same rate.

status: 2dd8a0c7 +23k Polling Active
status: 2def8429 +1M5 Detect Quiet
status: 2dfaf5da +M75 Detect Active
status: 2dfb51f3 +23k Polling Active
status: 2e123555 +1M5 Detect Quiet
status: 2e1da706 +M75 Detect Active
status: 2e1e031f +23k Polling Active
status: 2e34e681 +1M5 Detect Quiet
status: 2e405832 +M75 Detect Active
status: 2e40b44b +23k Polling Active

Until I do a 'reboot' when it all recovers

status: 2e48c9d9 +M52 Polling Active, set avalon bus reset
status: 2e48c9dd   +4 Polling Active, clear avalon bus reset
status: 2e48c9de   +1 Detect Quiet
status: 2e48c9df   +1 Detect Quiet, clear pld_clk_inuse
status: 2e48c9e5   +6 Detect Quiet, set pld_clk_inuse
status: 2e48c9e6   +1 Detect Quiet, clear pld_clk_inuse
status: 2e48c9e8   +2 Detect Quiet, set pld_clk_inuse
status: 2e48c9e9   +1 Detect Active
status: 2e48c9eb   +2 Detect Active, clear link data link active
status: 2e48c9f3   +8 Detect Active, set link data link active

The trace suppresses repeated 'Detect active' 'Detect quiet' traces
because they happen for a considerable period during a reboot'.

time: Thu Jan  1 01:10:33 1970
status: 32fd4f1f +78M Detect Quiet, active<=>quiet bounces 102
status: 3308c162 +M75 Detect Active
status: 33091d54 +23k Polling Active
status: 33170b7f +M91 Polling Configuration
status: 33170bc7  +72 Config Link width start, link width 8
status: 33170be7  +32 Config Link accept
status: 33171c1d +4k1 Config Lane num wait
status: 33171c2d  +16 Config Lane num accept
status: 33171c61  +52 Config Complete
status: 33171c67   +6 Config Complete, link width 1
status: 33171caa  +67 Config Idle
status: 33171cba  +16 L0

I'm not sure what 'Polling active' means.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-28 18:31         ` Lukas Wunner
  2018-07-29  0:26           ` Sinan Kaya
  2018-07-30 13:28           ` David Laight
@ 2018-07-30 21:38           ` Alex_Gagniuc
  2018-07-31  9:28             ` Lukas Wunner
  2 siblings, 1 reply; 22+ messages in thread
From: Alex_Gagniuc @ 2018-07-30 21:38 UTC (permalink / raw)
  To: lukas
  Cc: mr.nuke.me, keith.busch, linux-pci, Austin.Bolen, Stuart.Hayes,
	Narendra.K, Christopher.Arzola, David.Chalfant, okaya

On 07/28/2018 01:31 PM, Lukas Wunner wrote:=0A=
> On Fri, Jul 27, 2018 at 05:51:04PM +0000, Alex_Gagniuc@Dellteam.com wrote=
:=0A=
>> I think PCI_DEV_DISCONNECTED is a documentation issue above all else.=0A=
>> The history I was given is that drivers would take a very long time to=
=0A=
>> tear down a device. Config space IO to an nonexistent device took a long=
=0A=
>> while to time out. Performance was one motivation -- and was not documen=
ted.=0A=
> =0A=
> Often it is possible for the driver to detect surprise removal by=0A=
> checking if mmio reads return "all ones".  But in some cases that's=0A=
> a valid value to read from mmio and then this approach won't work.=0A=
> Also, checking every mmio read may negatively impact performance.=0A=
=0A=
A colleague and me beat that dead horse to the afterdeath. Consensus was =
=0A=
that the return value is less reliable than a coin toss (of a two-heads =0A=
coin).=0A=
=0A=
> Finally, if the card was quickly swapped and the link to the new=0A=
> card is already up, you may be accessing that new card.  (mmio=0A=
> accesses may then still return all ones if the BARs are blank,=0A=
> but at least config space accesses should work.)=0A=
=0A=
I'm really not worried about this cases. Assuming that someone goes to =0A=
the effort to swap thing this fast, the IRQ would print an "already =0A=
getting powered down" message, or something of the sorts. It would be =0A=
plainly obvious to even the most casual dmesg observer.=0A=
=0A=
 From what I can tell, the hotplug code is well armed to handle such =0A=
races and resolve them automatically.=0A=
=0A=
> Once it has been determined that the device has been surprise removed,=0A=
> that fact should be cached somewhere to short-circuit any further device=
=0A=
> accesses.  PCI_DEV_DISCONNECTED can act as such a cache.=0A=
=0A=
I do insist that you spell SURPRISE!!! this way. :)=0A=
=0A=
>> Thanks for all the info. The fix that I was settling on is (pasted)=0A=
>> below. Though that seems to conflict a bit with what you are trying to=
=0A=
>> do. Now I'm a little conflicted If I should try to submit the below or n=
ot.=0A=
>>=0A=
>> --- a/drivers/pci/hotplug/pciehp_pci.c=0A=
>> +++ b/drivers/pci/hotplug/pciehp_pci.c=0A=
>> @@ -74,6 +74,7 @@ int pciehp_unconfigure_device(struct slot *p_slot)=0A=
>>           ctrl_dbg(ctrl, "%s: domain:bus:dev =3D %04x:%02x:00\n",=0A=
>>                    __func__, pci_domain_nr(parent), parent->number);=0A=
>>           pciehp_get_adapter_status(p_slot, &presence);=0A=
>> +       presence =3D presence && pciehp_check_link_active(ctrl);=0A=
> =0A=
> That approach won't work if the card was quickly swapped and the link=0A=
> to the new card is already up when pciehp_unconfigure_device() runs.=0A=
=0A=
You're right. We should use the PD/link status gathered at the time of =0A=
the interrupt.=0A=
=0A=
> FWIW, the below is what I had in mind (on top of Bjorn's pci/hotplug=0A=
> branch).  Does this work for you?=0A=
=0A=
This, and another patch (you have been CC'd) solve my problem of =0A=
crashing during surprise removal. Thanks!=0A=
=0A=
Alex=0A=
=0A=
> -- >8 --=0A=
> Subject: [PATCH] PCI: pciehp: Differentiate between surprise and safe rem=
oval=0A=
> =0A=
> When removing PCI devices below a hotplug bridge, pciehp marks them as=0A=
> disconnected if the card is no longer present in the slot or it quiesces=
=0A=
> them if the card is still present (by disabling INTx interrupts, bus=0A=
> mastering and SERR# reporting).=0A=
> =0A=
> To detect whether the card is still present, pciehp checks the Presence=
=0A=
> Detect State bit in the Slot Status register.  The problem with this=0A=
> approach is that even if the card is present, the link to it may be=0A=
> down, and it that case it would be better to mark the devices as=0A=
> disconnected instead of trying to quiesce them.  Moreover, if the card=0A=
> in the slot was quickly replaced by another one, the Presence Detect=0A=
> State bit would be set, yet trying to quiesce the new card's devices=0A=
> would be wrong and the correct thing to do is to mark the previous=0A=
> card's devices as disconnected.=0A=
> =0A=
> Instead of looking at the Presence Detect State bit, it is better to=0A=
> differentiate whether the card was surprise removed versus safely=0A=
> removed (via sysfs or an Attention Button press).  On surprise removal,=
=0A=
> the devices should be marked as disconnected, whereas on safe removal it=
=0A=
> is correct to quiesce the devices.=0A=
> =0A=
> The knowledge whether a surprise removal or a safe removal is at hand=0A=
> does exist further up in the call stack:  A surprise removal is=0A=
> initiated by pciehp_handle_presence_or_link_change(), a safe removal by=
=0A=
> pciehp_handle_disable_request().=0A=
> =0A=
> Pass that information down to pciehp_unconfigure_device() and use it in=
=0A=
> lieu of the Presence Detect State bit.  While there, add kernel-doc to=0A=
> pciehp_unconfigure_device() and pciehp_configure_device().=0A=
> =0A=
> Signed-off-by: Lukas Wunner <lukas@wunner.de>=0A=
> Cc: Alexandru Gagniuc <mr.nuke.me@gmail.com>=0A=
> ---=0A=
>   drivers/pci/hotplug/pciehp.h      |  2 +-=0A=
>   drivers/pci/hotplug/pciehp_ctrl.c | 22 +++++++++++++---------=0A=
>   drivers/pci/hotplug/pciehp_pci.c  | 23 ++++++++++++++++++++---=0A=
>   3 files changed, 34 insertions(+), 13 deletions(-)=0A=
> =0A=
> diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h=
=0A=
> index 652c46d9b215..bce29ae769dd 100644=0A=
> --- a/drivers/pci/hotplug/pciehp.h=0A=
> +++ b/drivers/pci/hotplug/pciehp.h=0A=
> @@ -195,7 +195,7 @@ void pciehp_handle_button_press(struct slot *slot);=
=0A=
>   void pciehp_handle_disable_request(struct slot *slot);=0A=
>   void pciehp_handle_presence_or_link_change(struct slot *slot, u32 event=
s);=0A=
>   int pciehp_configure_device(struct slot *p_slot);=0A=
> -void pciehp_unconfigure_device(struct slot *p_slot);=0A=
> +void pciehp_unconfigure_device(struct slot *p_slot, bool presence);=0A=
>   void pciehp_queue_pushbutton_work(struct work_struct *work);=0A=
>   struct controller *pcie_init(struct pcie_device *dev);=0A=
>   int pcie_init_notification(struct controller *ctrl);=0A=
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pcie=
hp_ctrl.c=0A=
> index d7d55160b5f8..8836648e145f 100644=0A=
> --- a/drivers/pci/hotplug/pciehp_ctrl.c=0A=
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c=0A=
> @@ -40,6 +40,9 @@=0A=
>      hotplug controller logic=0A=
>    */=0A=
>   =0A=
> +#define SAFE_REMOVAL	 true=0A=
> +#define SURPRISE_REMOVAL false=0A=
> +=0A=
>   static void set_slot_off(struct controller *ctrl, struct slot *pslot)=
=0A=
>   {=0A=
>   	/* turn off slot, turn on Amber LED, turn off Green LED if supported*/=
=0A=
> @@ -115,12 +118,13 @@ static int board_added(struct slot *p_slot)=0A=
>   /**=0A=
>    * remove_board - Turns off slot and LEDs=0A=
>    * @p_slot: slot where board is being removed=0A=
> + * @safe_removal: whether the board is safely removed (versus surprise r=
emoved)=0A=
>    */=0A=
> -static void remove_board(struct slot *p_slot)=0A=
> +static void remove_board(struct slot *p_slot, bool safe_removal)=0A=
>   {=0A=
>   	struct controller *ctrl =3D p_slot->ctrl;=0A=
>   =0A=
> -	pciehp_unconfigure_device(p_slot);=0A=
> +	pciehp_unconfigure_device(p_slot, safe_removal);=0A=
>   =0A=
>   	if (POWER_CTRL(ctrl)) {=0A=
>   		pciehp_power_off_slot(p_slot);=0A=
> @@ -138,7 +142,7 @@ static void remove_board(struct slot *p_slot)=0A=
>   }=0A=
>   =0A=
>   static int pciehp_enable_slot(struct slot *slot);=0A=
> -static int pciehp_disable_slot(struct slot *slot);=0A=
> +static int pciehp_disable_slot(struct slot *slot, bool safe_removal);=0A=
>   =0A=
>   void pciehp_request(struct controller *ctrl, int action)=0A=
>   {=0A=
> @@ -230,7 +234,7 @@ void pciehp_handle_disable_request(struct slot *slot)=
=0A=
>   	slot->state =3D POWEROFF_STATE;=0A=
>   	mutex_unlock(&slot->lock);=0A=
>   =0A=
> -	ctrl->request_result =3D pciehp_disable_slot(slot);=0A=
> +	ctrl->request_result =3D pciehp_disable_slot(slot, SAFE_REMOVAL);=0A=
>   }=0A=
>   =0A=
>   void pciehp_handle_presence_or_link_change(struct slot *slot, u32 event=
s)=0A=
> @@ -257,7 +261,7 @@ void pciehp_handle_presence_or_link_change(struct slo=
t *slot, u32 events)=0A=
>   		if (events & PCI_EXP_SLTSTA_PDC)=0A=
>   			ctrl_info(ctrl, "Slot(%s): Card not present\n",=0A=
>   				  slot_name(slot));=0A=
> -		pciehp_disable_slot(slot);=0A=
> +		pciehp_disable_slot(slot, SURPRISE_REMOVAL);=0A=
>   		break;=0A=
>   	default:=0A=
>   		mutex_unlock(&slot->lock);=0A=
> @@ -343,7 +347,7 @@ static int pciehp_enable_slot(struct slot *slot)=0A=
>   	return ret;=0A=
>   }=0A=
>   =0A=
> -static int __pciehp_disable_slot(struct slot *p_slot)=0A=
> +static int __pciehp_disable_slot(struct slot *p_slot, bool safe_removal)=
=0A=
>   {=0A=
>   	u8 getstatus =3D 0;=0A=
>   	struct controller *ctrl =3D p_slot->ctrl;=0A=
> @@ -357,17 +361,17 @@ static int __pciehp_disable_slot(struct slot *p_slo=
t)=0A=
>   		}=0A=
>   	}=0A=
>   =0A=
> -	remove_board(p_slot);=0A=
> +	remove_board(p_slot, safe_removal);=0A=
>   	return 0;=0A=
>   }=0A=
>   =0A=
> -static int pciehp_disable_slot(struct slot *slot)=0A=
> +static int pciehp_disable_slot(struct slot *slot, bool safe_removal)=0A=
>   {=0A=
>   	struct controller *ctrl =3D slot->ctrl;=0A=
>   	int ret;=0A=
>   =0A=
>   	pm_runtime_get_sync(&ctrl->pcie->port->dev);=0A=
> -	ret =3D __pciehp_disable_slot(slot);=0A=
> +	ret =3D __pciehp_disable_slot(slot, safe_removal);=0A=
>   	pm_runtime_put(&ctrl->pcie->port->dev);=0A=
>   =0A=
>   	mutex_lock(&slot->lock);=0A=
> diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pcieh=
p_pci.c=0A=
> index ec3f065bb1c0..079aac163484 100644=0A=
> --- a/drivers/pci/hotplug/pciehp_pci.c=0A=
> +++ b/drivers/pci/hotplug/pciehp_pci.c=0A=
> @@ -34,6 +34,14 @@=0A=
>   #include "../pci.h"=0A=
>   #include "pciehp.h"=0A=
>   =0A=
> +/**=0A=
> + * pciehp_configure_device() - enumerate PCI devices below a hotplug bri=
dge=0A=
> + * @p_slot: PCIe hotplug slot=0A=
> + *=0A=
> + * Enumerate PCI devices below a hotplug bridge and add them to the syst=
em.=0A=
> + * Return 0 on success, %-EEXIST if the devices are already enumerated o=
r=0A=
> + * %-ENODEV if enumeration failed.=0A=
> + */=0A=
>   int pciehp_configure_device(struct slot *p_slot)=0A=
>   {=0A=
>   	struct pci_dev *dev;=0A=
> @@ -76,9 +84,19 @@ int pciehp_configure_device(struct slot *p_slot)=0A=
>   	return ret;=0A=
>   }=0A=
>   =0A=
> -void pciehp_unconfigure_device(struct slot *p_slot)=0A=
> +/**=0A=
> + * pciehp_unconfigure_device() - remove PCI devices below a hotplug brid=
ge=0A=
> + * @p_slot: PCIe hotplug slot=0A=
> + * @presence: whether the card is still present in the slot;=0A=
> + *	true for safe removal via sysfs or an Attention Button press,=0A=
> + *	false for surprise removal=0A=
> + *=0A=
> + * Unbind PCI devices below a hotplug bridge from their drivers and remo=
ve=0A=
> + * them from the system.  Safely removed devices are quiesced.  Surprise=
=0A=
> + * removed devices are marked as such to prevent further accesses.=0A=
> + */=0A=
> +void pciehp_unconfigure_device(struct slot *p_slot, bool presence)=0A=
>   {=0A=
> -	u8 presence =3D 0;=0A=
>   	struct pci_dev *dev, *temp;=0A=
>   	struct pci_bus *parent =3D p_slot->ctrl->pcie->port->subordinate;=0A=
>   	u16 command;=0A=
> @@ -86,7 +104,6 @@ void pciehp_unconfigure_device(struct slot *p_slot)=0A=
>   =0A=
>   	ctrl_dbg(ctrl, "%s: domain:bus:dev =3D %04x:%02x:00\n",=0A=
>   		 __func__, pci_domain_nr(parent), parent->number);=0A=
> -	pciehp_get_adapter_status(p_slot, &presence);=0A=
>   =0A=
>   	pci_lock_rescan_remove();=0A=
>   =0A=
> =0A=
=0A=

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-30 21:38           ` Alex_Gagniuc
@ 2018-07-31  9:28             ` Lukas Wunner
  2018-07-31 16:35               ` Alex_Gagniuc
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2018-07-31  9:28 UTC (permalink / raw)
  To: Alex_Gagniuc
  Cc: mr.nuke.me, keith.busch, linux-pci, Austin.Bolen, Stuart.Hayes,
	Narendra.K, Christopher.Arzola, David.Chalfant, okaya

On Mon, Jul 30, 2018 at 09:38:04PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> On 07/28/2018 01:31 PM, Lukas Wunner wrote:
> > On Fri, Jul 27, 2018 at 05:51:04PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> > > I think PCI_DEV_DISCONNECTED is a documentation issue above all else.
> > > The history I was given is that drivers would take a very long time to
> > > tear down a device. Config space IO to an nonexistent device took a long
> > > while to time out. Performance was one motivation -- and was not
> > > documented.
> > 
> > Often it is possible for the driver to detect surprise removal by
> > checking if mmio reads return "all ones".  But in some cases that's
> > a valid value to read from mmio and then this approach won't work.
> > Also, checking every mmio read may negatively impact performance.
> 
> A colleague and me beat that dead horse to the afterdeath. Consensus was 
> that the return value is less reliable than a coin toss (of a two-heads 
> coin).

Can you elaborate why?  Because the "official" stance is that checking
every read where "all ones" is an invalid value is the proper way to
detect unplugged devices.  (Official as in, voiced by Greg KH and Bjorn.)
In that sense, PCI_DEV_DISCONNECTED is sort of an unloved child.

See this thread:
https://www.spinics.net/lists/linux-acpi/msg81445.html


> > FWIW, the below is what I had in mind (on top of Bjorn's pci/hotplug
> > branch).  Does this work for you?
> 
> This, and another patch (you have been CC'd) solve my problem of 
> crashing during surprise removal. Thanks!

Ok thanks, I submitted the patch this morning with your Tested-by.
Unfortunately I forgot to cc all your Dell colleagues, sorry.

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-31  9:28             ` Lukas Wunner
@ 2018-07-31 16:35               ` Alex_Gagniuc
  2018-08-01  8:58                 ` David Laight
  0 siblings, 1 reply; 22+ messages in thread
From: Alex_Gagniuc @ 2018-07-31 16:35 UTC (permalink / raw)
  To: lukas
  Cc: mr.nuke.me, keith.busch, linux-pci, Austin.Bolen, Stuart.Hayes,
	Narendra.K, Christopher.Arzola, David.Chalfant, okaya

On 07/31/2018 04:29 AM, Lukas Wunner wrote:=0A=
> On Mon, Jul 30, 2018 at 09:38:04PM +0000, Alex_Gagniuc@Dellteam.com wrote=
:=0A=
>> On 07/28/2018 01:31 PM, Lukas Wunner wrote:=0A=
>>> On Fri, Jul 27, 2018 at 05:51:04PM +0000, Alex_Gagniuc@Dellteam.com wro=
te:=0A=
>>>> I think PCI_DEV_DISCONNECTED is a documentation issue above all else.=
=0A=
>>>> The history I was given is that drivers would take a very long time to=
=0A=
>>>> tear down a device. Config space IO to an nonexistent device took a lo=
ng=0A=
>>>> while to time out. Performance was one motivation -- and was not=0A=
>>>> documented.=0A=
>>>=0A=
>>> Often it is possible for the driver to detect surprise removal by=0A=
>>> checking if mmio reads return "all ones".  But in some cases that's=0A=
>>> a valid value to read from mmio and then this approach won't work.=0A=
>>> Also, checking every mmio read may negatively impact performance.=0A=
>>=0A=
>> A colleague and me beat that dead horse to the afterdeath. Consensus was=
=0A=
>> that the return value is less reliable than a coin toss (of a two-heads=
=0A=
>> coin).=0A=
> =0A=
> Can you elaborate why?  Because the "official" stance is that checking=0A=
> every read where "all ones" is an invalid value is the proper way to=0A=
> detect unplugged devices.  (Official as in, voiced by Greg KH and Bjorn.)=
=0A=
> In that sense, PCI_DEV_DISCONNECTED is sort of an unloved child.=0A=
=0A=
All ones is not necessarily invalid. The bug surface is every single =0A=
config read. This approach doesn't even cover config writes -- config =0A=
writes are non-posted requests too in PCIe.=0A=
"Build it, and they will come". That means that drivers would expect =0A=
-ENODEV when a device is gone. If we have that infrastructure, more =0A=
drivers will start using it over time, and it's something that can also =0A=
be used by generic parts of the PCI code. That also means you need a =0A=
generic mechanism to determine a device is bye-bye, and that's what =0A=
PCI_DEV_DISCONNECTED gives you.=0A=
=0A=
> See this thread:=0A=
> https://www.spinics.net/lists/linux-acpi/msg81445.html=0A=
=0A=
The discussion is based on the wrong assumptions that reads are =0A=
symmetrical wrt to a device being present or not. However, completion =0A=
timeouts and unsupported requests blow that assumption right out of the =0A=
water. Best case scenario, you just waste a little more time waiting for =
=0A=
hardware IO. More common is to end up crashing the machine.=0A=
=0A=
Greg's ideas work in a perfect world where PCI and PCIe are equivalent =0A=
at every level. And in that case, PCI_DEV_DISCONNECTED would have been =0A=
pure, 100% genuine Redmond bloatware. We wouldn't have gotten complaints =
=0A=
from Facebook and other industry players that it takes too damn long to =0A=
remove a device. We probably also wouldn't be seeing machines crash on =0A=
PCIe removal.=0A=
=0A=
Fun fact: Before PCI_DEV_DISCONNECTED, you could physically swap a =0A=
device before the the teardown path was done with the previous device. =0A=
Figuring out what problems that caused is left as an exercise to the reader=
.=0A=
=0A=
>>> FWIW, the below is what I had in mind (on top of Bjorn's pci/hotplug=0A=
>>> branch).  Does this work for you?=0A=
>>=0A=
>> This, and another patch (you have been CC'd) solve my problem of=0A=
>> crashing during surprise removal. Thanks!=0A=
> =0A=
> Ok thanks, I submitted the patch this morning with your Tested-by.=0A=
=0A=
Sweet. Thanks!=0A=
=0A=
> Unfortunately I forgot to cc all your Dell colleagues, sorry.=0A=
=0A=
They'll live. They already noticed it and sent me emails about it.=0A=
=0A=
Alex=0A=
=0A=
> Lukas=0A=
> =0A=
=0A=

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-07-31 16:35               ` Alex_Gagniuc
@ 2018-08-01  8:58                 ` David Laight
  2018-08-01 19:06                   ` Alex_Gagniuc
  0 siblings, 1 reply; 22+ messages in thread
From: David Laight @ 2018-08-01  8:58 UTC (permalink / raw)
  To: 'Alex_Gagniuc@Dellteam.com', lukas
  Cc: mr.nuke.me, keith.busch, linux-pci, Austin.Bolen, Stuart.Hayes,
	Narendra.K, Christopher.Arzola, David.Chalfant, okaya

From: Alex_Gagniuc@Dellteam.com
> Sent: 31 July 2018 17:36
> 
> On 07/31/2018 04:29 AM, Lukas Wunner wrote:
> > On Mon, Jul 30, 2018 at 09:38:04PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> >> On 07/28/2018 01:31 PM, Lukas Wunner wrote:
> >>> On Fri, Jul 27, 2018 at 05:51:04PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> >>>> I think PCI_DEV_DISCONNECTED is a documentation issue above all else.
> >>>> The history I was given is that drivers would take a very long time to
> >>>> tear down a device. Config space IO to an nonexistent device took a long
> >>>> while to time out. Performance was one motivation -- and was not
> >>>> documented.
> >>>
> >>> Often it is possible for the driver to detect surprise removal by
> >>> checking if mmio reads return "all ones".  But in some cases that's
> >>> a valid value to read from mmio and then this approach won't work.
> >>> Also, checking every mmio read may negatively impact performance.
> >>
> >> A colleague and me beat that dead horse to the afterdeath. Consensus was
> >> that the return value is less reliable than a coin toss (of a two-heads
> >> coin).

Something cheap-ish to find out whether a -1 was caused by a card
removal might be sensible - Especially if it can be done without
a config space read.
Clearly you can't check anything BEFORE doing the read.
And reading the pci-id from config space isn't entirely useful.
If the card has reset itself (and the link recovered) then you
need to read a BAR register and check it is setup.

More interestingly a read request that is inside the bridge's address
window but outside any BAR (fairly easy to setup if the target has
a large BAR and a small one) will also timeout (and return -1) even
though there is no failure of the link.

If the target supports AER the information about the failed cycle
ends up in the target's AER registers - even if the host bridge
doesn't support AER (or it is being ignored).
So it might be useful being able to read the AER registers even when
no AER interrupt (or other notification) actually happens.

I've not managed to get linux to pick up AER interrupts even on
systems where the hardware clearly supports them (at least on
some slots).  I suspect the BIOS is carefully disabling them
because of reports of message logs being spammed with AER errors.

We also have one system (possibly a Dell 740) where any failure
of a PCIe link leads to an NMI and a kernel crash!
Not entirely useful in a server model that is supposed to have
resilience against various errors.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit?
  2018-08-01  8:58                 ` David Laight
@ 2018-08-01 19:06                   ` Alex_Gagniuc
  0 siblings, 0 replies; 22+ messages in thread
From: Alex_Gagniuc @ 2018-08-01 19:06 UTC (permalink / raw)
  To: David.Laight, lukas
  Cc: mr.nuke.me, keith.busch, linux-pci, Austin.Bolen, Stuart.Hayes,
	Narendra.K, Christopher.Arzola, David.Chalfant, okaya

On 08/01/2018 03:57 AM, David Laight wrote:=0A=
> From: Alex_Gagniuc@Dellteam.com=0A=
>> Sent: 31 July 2018 17:36=0A=
>>=0A=
>> On 07/31/2018 04:29 AM, Lukas Wunner wrote:=0A=
>>> On Mon, Jul 30, 2018 at 09:38:04PM +0000, Alex_Gagniuc@Dellteam.com wro=
te:=0A=
>>>> On 07/28/2018 01:31 PM, Lukas Wunner wrote:=0A=
>>>>> On Fri, Jul 27, 2018 at 05:51:04PM +0000, Alex_Gagniuc@Dellteam.com w=
rote:=0A=
>>>>>> I think PCI_DEV_DISCONNECTED is a documentation issue above all else=
.=0A=
>>>>>> The history I was given is that drivers would take a very long time =
to=0A=
>>>>>> tear down a device. Config space IO to an nonexistent device took a =
long=0A=
>>>>>> while to time out. Performance was one motivation -- and was not=0A=
>>>>>> documented.=0A=
>>>>>=0A=
>>>>> Often it is possible for the driver to detect surprise removal by=0A=
>>>>> checking if mmio reads return "all ones".  But in some cases that's=
=0A=
>>>>> a valid value to read from mmio and then this approach won't work.=0A=
>>>>> Also, checking every mmio read may negatively impact performance.=0A=
>>>>=0A=
>>>> A colleague and me beat that dead horse to the afterdeath. Consensus w=
as=0A=
>>>> that the return value is less reliable than a coin toss (of a two-head=
s=0A=
>>>> coin).=0A=
> =0A=
> Something cheap-ish to find out whether a -1 was caused by a card=0A=
> removal might be sensible - Especially if it can be done without=0A=
> a config space read.=0A=
> Clearly you can't check anything BEFORE doing the read.=0A=
> And reading the pci-id from config space isn't entirely useful.=0A=
> If the card has reset itself (and the link recovered) then you=0A=
> need to read a BAR register and check it is setup.=0A=
> =0A=
> More interestingly a read request that is inside the bridge's address=0A=
> window but outside any BAR (fairly easy to setup if the target has=0A=
> a large BAR and a small one) will also timeout (and return -1) even=0A=
> though there is no failure of the link.=0A=
> =0A=
> If the target supports AER the information about the failed cycle=0A=
> ends up in the target's AER registers - even if the host bridge=0A=
> doesn't support AER (or it is being ignored).=0A=
> So it might be useful being able to read the AER registers even when=0A=
> no AER interrupt (or other notification) actually happens.=0A=
=0A=
There are a number of ways to know a device is kaput. Information from =0A=
AER and DPC has proven to be the most reliable. So much, that for the =0A=
problems I am trying to solve, this information is necessary and sufficient=
.=0A=
=0A=
> I've not managed to get linux to pick up AER interrupts even on=0A=
> systems where the hardware clearly supports them (at least on=0A=
> some slots).  I suspect the BIOS is carefully disabling them=0A=
> because of reports of message logs being spammed with AER errors.=0A=
=0A=
I suspect you've hit an FFS bug.=0A=
=0A=
> We also have one system (possibly a Dell 740)=0A=
=0A=
Not sure we make a "possibly 740" model. Let me ask around.=0A=
=0A=
> where any failure of a PCIe link leads to an NMI and a kernel crash!=0A=
=0A=
The kernel crash is a linux bug. I've worked on that extensively in the =0A=
past. We tried to fix it [1]. Unfortunately, due to an unprofessional =0A=
maintainer and months of spinning in circles, word came that our =0A=
resources are better spent elsewhere. Feel free to pick up where we left =
=0A=
off.=0A=
=0A=
> Not entirely useful in a server model that is supposed to have=0A=
> resilience against various errors.=0A=
=0A=
You're preaching to the choir. The architecture and features are driven =0A=
by customer demand. A lot of those "features" -- I haven't asked what =0A=
they are -- are easily implemented with FFS. If you have a problem with =0A=
FFS in particular -- and I do realize a lot of the specs around FFS are =0A=
poorly written and not well thought out -- then it's marketing, sales =0A=
and corporate that should know.=0A=
=0A=
Here's the thing. I think the FW's job is to do the absolute minimal =0A=
initialization to pass on control to the OS. uboot/linux stacks execute =0A=
this beautifully. But customers want features, that very often, OS =0A=
vendors are hesitant or outright refusing to implement. The only =0A=
remaining place to implement them is the platform.=0A=
=0A=
It sucks, but it's how things are. Anyway, the patches at [1] should =0A=
solve your system crashing issue.=0A=
=0A=
Alex=0A=
=0A=
=0A=
[1] https://lore.kernel.org/patchwork/patch/908811/=0A=
=0A=

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-08-01 20:53 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-26 22:38 Should a PCIe Link Down event set the PCI_DEV_DISCONNECTED bit? Alex G.
2018-07-26 23:00 ` Rajat Jain
2018-07-27  0:04   ` Alex_Gagniuc
2018-07-27  7:18 ` Lukas Wunner
2018-07-27 15:52   ` Alex_Gagniuc
2018-07-27 17:05     ` Lukas Wunner
2018-07-27 17:51       ` Alex_Gagniuc
2018-07-27 18:17         ` Sinan Kaya
2018-07-27 18:23           ` Alex_Gagniuc
2018-07-27 18:34             ` Sinan Kaya
2018-07-28 18:31         ` Lukas Wunner
2018-07-29  0:26           ` Sinan Kaya
2018-07-29 12:09             ` Lukas Wunner
2018-07-29 16:59               ` Sinan Kaya
2018-07-30 13:28           ` David Laight
2018-07-30 13:54             ` Lukas Wunner
2018-07-30 16:06               ` David Laight
2018-07-30 21:38           ` Alex_Gagniuc
2018-07-31  9:28             ` Lukas Wunner
2018-07-31 16:35               ` Alex_Gagniuc
2018-08-01  8:58                 ` David Laight
2018-08-01 19:06                   ` Alex_Gagniuc

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.