linux-pci.vger.kernel.org archive mirror
* PCI: hotplug: Erroneous removal of hotplug PCI devices
@ 2019-01-23 18:20 Alex_Gagniuc
  2019-01-23 18:44 ` Keith Busch
  2019-01-23 18:54 ` Lukas Wunner
  0 siblings, 2 replies; 23+ messages in thread
From: Alex_Gagniuc @ 2019-01-23 18:20 UTC (permalink / raw)
  To: linux-pci; +Cc: bhelgaas, lukas, keith.busch, Austin.Bolen

Hi all,

This may be a mind-twisting explanation, so please bear with me.

In PCIe, the presence detect bit (PD) in the slot status register should 
be a logical OR of in-band and out-of-band presence. In-band presence is 
the data link layer status. So one would expect that a link up event 
would be accompanied by a PD changed event with PD set. Not everyone 
follows that.

I have a system here with the following order of events:
  *   0 ms : Link up
  * 400 ms : Presence detect up
On the first event, the device is probed as expected, and on the second 
event, the device is removed as a SURPRISE!!!_REMOVAL. This is a bug.

The logic is that on every change of presence detect:
/* Even if [the slot]'s occupied again, we cannot assume the card is the 
same. */
Reasonable, but the resulting behavior is a bug.

Solution 1 is to say it's a spec violation, so ignore it. They'll change 
the "logical OR" thing in the next PCIe spec, so we still will have to 
worry about this.

It's obvious that just relying on presence detect state is prone to race 
conditions. However, if a device is replaced, we'd expect the data link 
layer state to change as well. So I think the best way to proceed is to 
skip the SURPRISE!!!_REMOVAL if the following are true:
  * presence detect is set
  * DLL changed is not set
  * presence detect was not previously set
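
Something like this, as a rough and untested sketch (the helper and the 
"slot was previously empty" bookkeeping below are made up for 
illustration; they are not existing pciehp code):

/*
 * Sketch only: detect a belated "presence detect up" that follows an
 * earlier link-up, so the caller can skip the surprise removal.
 * ctrl->slot_was_empty is hypothetical bookkeeping, not a real field.
 */
static bool pdc_is_belated_presence_up(struct controller *ctrl, u32 events)
{
	u16 slot_status;

	if (!(events & PCI_EXP_SLTSTA_PDC) || (events & PCI_EXP_SLTSTA_DLLSC))
		return false;

	pcie_capability_read_word(ctrl_dev(ctrl), PCI_EXP_SLTSTA, &slot_status);

	/* presence detect is set now, but was not set before */
	return (slot_status & PCI_EXP_SLTSTA_PDS) && ctrl->slot_was_empty;
}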

Thoughts?

Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 18:20 PCI: hotplug: Erroneous removal of hotplug PCI devices Alex_Gagniuc
@ 2019-01-23 18:44 ` Keith Busch
  2019-01-23 19:02   ` Lukas Wunner
       [not found]   ` <b32e6ca62ae2494f98450df81ca1ee14@AUSX13MPC131.AMER.DELL.COM>
  2019-01-23 18:54 ` Lukas Wunner
  1 sibling, 2 replies; 23+ messages in thread
From: Keith Busch @ 2019-01-23 18:44 UTC (permalink / raw)
  To: Alex_Gagniuc; +Cc: linux-pci, bhelgaas, lukas, Austin.Bolen

On Wed, Jan 23, 2019 at 06:20:57PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> Hi all,
> 
> This may be a mind-twisting explanation, so please bear with me.
> 
> In PCIe, the presence detect bit (PD) in the slot status register should 
> be a logical OR of in-band and out-of-band presence. In-band presence is 
> the data link layer status. So one would expect that a link up event 
> would be accompanied by a PD changed event with PD set. Not everyone 
> follows that.
>
> I have a system here with the following order of events:
>   *   0 ms : Link up
>   * 400 ms : Presence detect up
> On the first event, the device is probed as expected, and on the second 
> event, the device is removed as a SURPRISE!!!_REMOVAL. This is a bug.
> 
> The logic is that on every change of presence detect:
> /* Even if [the slot]'s occupied again, we cannot assume the card is the 
> same. */
> Reasonable, but the resulting behavior is a bug.
> 
> Solution 1 is to say it's a spec violation, so ignore it. They'll change 
> the "logical OR" thing in the next PCIe spec, so we still will have to 
> worry about this.

When's that changing? 5.0 is the next spec, and it still says:

  Presence Detect State - This bit indicates the presence of an adapter
  in the slot, reflected by the logical “OR” of the Physical Layer
  in-band presence detect mechanism and, if present, any out-of-band
  presence detect mechanism defined for the slot’s corresponding
  form factor.

> It's obvious that just relying on presence detect state is prone to race 
> conditions. However, if a device is replaced, we'd expect the data link 
> layer state to change as well. So I think the best way to proceed is to 
> skip the SURPRISE!!!_REMOVAL if the following are true:
>   * presence detect is set
>   * DLL changed is not set
>   * presence detect was not previously set
> 
> Thoughts?

What is the value of PDS on the Link up event? If it's still "Slot
Empty", could we just ignore the Link event instead and wait for the PDC
event?
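
Just to make the question concrete, roughly (a sketch only, not a real
patch):

	/* sketch: in pciehp_ist(), before acting on a bare DLLSC event */
	if ((events & PCI_EXP_SLTSTA_DLLSC) && !(events & PCI_EXP_SLTSTA_PDC)) {
		u16 slot_status;

		pcie_capability_read_word(ctrl_dev(ctrl), PCI_EXP_SLTSTA,
					  &slot_status);
		/* slot still reads empty: wait for the PDC event instead */
		if (!(slot_status & PCI_EXP_SLTSTA_PDS))
			return IRQ_HANDLED;
	}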

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 18:20 PCI: hotplug: Erroneous removal of hotplug PCI devices Alex_Gagniuc
  2019-01-23 18:44 ` Keith Busch
@ 2019-01-23 18:54 ` Lukas Wunner
  2019-01-23 19:07   ` Lukas Wunner
  2019-01-24 22:33   ` Austin.Bolen
  1 sibling, 2 replies; 23+ messages in thread
From: Lukas Wunner @ 2019-01-23 18:54 UTC (permalink / raw)
  To: Alex_Gagniuc; +Cc: linux-pci, bhelgaas, keith.busch, Austin.Bolen

On Wed, Jan 23, 2019 at 06:20:57PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> In PCIe, the presence detect bit (PD) in the slot status register should 
> be a logical OR of in-band and out-of-band presence. In-band presence is 
> the data link layer status. So one would expect that a link up event 
> would be accompanied by a PD changed event with PD set. Not everyone 
> follows that.
> 
> I have a system here with the following order of events:
>   *   0 ms : Link up
>   * 400 ms : Presence detect up
> On the first event, the device is probed as expected, and on the second 
> event, the device is removed as a SURPRISE!!!_REMOVAL. This is a bug.

It's normal that there's a lag between presence and link changes.

There's even hardware which flaps the link and presence bits a
couple of times before they settle.  Since commit 6c35a1ac3da63a7
("PCI: pciehp: Tolerate initially unstable link") we're quite lenient
towards such devices and tolerate link and presence flaps for up to
100 ms.  That's basically also the maximum delay we allow between link up
and presence detect up.

Theoretically you could say:  Let's wait after the link goes up
until presence also goes up.  But guess what, some vendors managed
to hardwire the presence detect flag to zero, so you'd wait forever
on those devices.  We have a workaround specifically for such hardware
since commit 80696f991424 ("PCI: pciehp: Tolerate Presence Detect
hardwired to zero").


> It's obvious that just relying on presence detect state is prone to race 
> conditions. However, if a device is replaced, we'd expect the data link 
> layer state to change as well. So I think the best way to proceed is to 
> skip the SURPRISE!!!_REMOVAL if the following are true:
>   * presence detect is set
>   * DLL changed is not set
>   * presence detect was not previously set

Unfortunately it's not that simple.  All we know is that presence has
changed and is currently up and the link is also up.  This could mean
that the device was swapped and it may just take a little longer until
the link flag also flaps.

A possible solution for your hardware would be to amend
pciehp_check_link_status() such that after the call to pci_bus_check_dev()
we wait until presence also goes up.  However it would have to be quirked
to your particular hardware, otherwise it would break devices which
hardwire presence detect to zero.  Or we'd have to define a limit of,
say, 1 second and if presence hasn't gone up within that period,
we'd just silently carry on.  This wouldn't break hardware which hardwires
presence detect to zero, but introduce a 1 sec delay for that hardware
to bring up the slot.
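
Roughly like this (untested sketch; the helper name and the 1 second
cap are arbitrary):

	/* sketch: call from pciehp_check_link_status() after pci_bus_check_dev() */
	static void pciehp_wait_for_presence(struct controller *ctrl)
	{
		int timeout = 1000;	/* ms, arbitrary upper bound */
		u16 slot_status;

		do {
			pcie_capability_read_word(ctrl_dev(ctrl), PCI_EXP_SLTSTA,
						  &slot_status);
			if (slot_status & PCI_EXP_SLTSTA_PDS)
				return;
			msleep(10);
			timeout -= 10;
		} while (timeout > 0);

		/* silently carry on: some slots hardwire Presence Detect to zero */
	}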

So I don't see a perfect solution.  What device are we talking about
anyway?  400 ms is a *long* time.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 18:44 ` Keith Busch
@ 2019-01-23 19:02   ` Lukas Wunner
  2019-01-23 19:07     ` Keith Busch
  2019-01-24 22:43     ` Austin.Bolen
       [not found]   ` <b32e6ca62ae2494f98450df81ca1ee14@AUSX13MPC131.AMER.DELL.COM>
  1 sibling, 2 replies; 23+ messages in thread
From: Lukas Wunner @ 2019-01-23 19:02 UTC (permalink / raw)
  To: Keith Busch; +Cc: Alex_Gagniuc, linux-pci, bhelgaas, Austin.Bolen

On Wed, Jan 23, 2019 at 11:44:53AM -0700, Keith Busch wrote:
> On Wed, Jan 23, 2019 at 06:20:57PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> > It's obvious that just relying on presence detect state is prone to race 
> > conditions. However, if a device is replaced, we'd expect the data link 
> > layer state to change as well. So I think the best way to proceed is to 
> > skip the SURPRISE!!!_REMOVAL if the following are true:
> >   * presence detect is set
> >   * DLL changed is not set
> >   * presence detect was not previously set
> > 
> > Thoughts?
> 
> What is the value of PDS on the Link up event? If it's still "Slot
> Empty", could we just ignore the Link event instead and wait for the PDC
> event?

Well, usually it's desirable to bring up the slot as quickly as possible,
so once we get any kind of link or presence event, we immediately try to
bring up the slot.

We do allow a 20 + 100 ms delay in pcie_wait_for_link() between link up
and presence detect up, just not 400 ms.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 18:54 ` Lukas Wunner
@ 2019-01-23 19:07   ` Lukas Wunner
  2019-01-23 19:09     ` Keith Busch
  2019-01-23 23:50     ` Alex_Gagniuc
  2019-01-24 22:33   ` Austin.Bolen
  1 sibling, 2 replies; 23+ messages in thread
From: Lukas Wunner @ 2019-01-23 19:07 UTC (permalink / raw)
  To: Alex_Gagniuc; +Cc: linux-pci, bhelgaas, keith.busch, Austin.Bolen

On Wed, Jan 23, 2019 at 07:54:20PM +0100, Lukas Wunner wrote:
> So I don't see a perfect solution.  What device are we talking about
> anyway?  400 ms is a *long* time.

Also, how exactly does this issue manifest itself:  Is it just an
annoyance that the slot is brought up/down/up or does it not work
at all?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 19:02   ` Lukas Wunner
@ 2019-01-23 19:07     ` Keith Busch
  2019-01-23 19:15       ` Lukas Wunner
  2019-01-24 22:43     ` Austin.Bolen
  1 sibling, 1 reply; 23+ messages in thread
From: Keith Busch @ 2019-01-23 19:07 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Alex_Gagniuc, linux-pci, bhelgaas, Austin.Bolen

On Wed, Jan 23, 2019 at 08:02:04PM +0100, Lukas Wunner wrote:
> On Wed, Jan 23, 2019 at 11:44:53AM -0700, Keith Busch wrote:
> > On Wed, Jan 23, 2019 at 06:20:57PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> > > It's obvious that just relying on presence detect state is prone to race 
> > > conditions. However, if a device is replaced, we'd expect the data link 
> > > layer state to change as well. So I think the best way to proceed is to 
> > > skip the SURPRISE!!!_REMOVAL if the following are true:
> > >   * presence detect is set
> > >   * DLL changed is not set
> > >   * presence detect was not previously set
> > > 
> > > Thoughts?
> > 
> > What is the value of PDS on the Link up event? If it's still "Slot
> > Empty", could we just ignore the Link event instead and wait for the PDC
> > event?
> 
> Well, usually it's desirable to bring up the slot as quickly as possible,
> so once we get any kind of link or presence event, we immediately try to
> bring up the slot.
> 
> We do allow a 20 + 100 ms delay in pcie_wait_for_link() between link up
> and presence detect up, just not 400 ms.

Right, so in Alex's case, it looks like we are observing
pcie_wait_for_link() returning true before the PDC event.

I'm wondering about PDS because if the link is up but Presence reports an
empty slot, does that matter for any implementations? Or is it perfectly
fine to enumerate an active link on an empty slot? An empty slot and
active link doesn't make a lot of sense, but that observation appears to
be what is reported here.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 19:07   ` Lukas Wunner
@ 2019-01-23 19:09     ` Keith Busch
  2019-01-23 19:28       ` Lukas Wunner
  2019-01-23 23:50     ` Alex_Gagniuc
  1 sibling, 1 reply; 23+ messages in thread
From: Keith Busch @ 2019-01-23 19:09 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Alex_Gagniuc, linux-pci, bhelgaas, Austin.Bolen

On Wed, Jan 23, 2019 at 08:07:23PM +0100, Lukas Wunner wrote:
> On Wed, Jan 23, 2019 at 07:54:20PM +0100, Lukas Wunner wrote:
> > So I don't see a perfect solution.  What device are we talking about
> > anyway?  400 ms is a *long* time.
> 
> Also, how exactly does this issue manifest itself:  Is it just an
> annoyance that the slot is brought up/down/up or does it not work
> at all?

Yeah, there is an nvme driver bug that hits a deadlock if you get a
very quick add-remove sequence. The nvme remove tries to delete IO
resources before the async probe side has set them up, so the driver
doesn't actually see that they're invalid. I have a proposed fix, but am
waiting to hear if it is successful.

bz: https://bugzilla.kernel.org/show_bug.cgi?id=202081

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 19:07     ` Keith Busch
@ 2019-01-23 19:15       ` Lukas Wunner
  2019-01-23 19:33         ` Keith Busch
  0 siblings, 1 reply; 23+ messages in thread
From: Lukas Wunner @ 2019-01-23 19:15 UTC (permalink / raw)
  To: Keith Busch; +Cc: Alex_Gagniuc, linux-pci, bhelgaas, Austin.Bolen

On Wed, Jan 23, 2019 at 12:07:38PM -0700, Keith Busch wrote:
> Right, so in Alex's case, it looks like we are observing
> pcie_wait_for_link() returning true before the PDC event.
> 
> I'm wondering about PDS because if the link is up but Presence reports an
> empty slot, does that matter for any implementations? Or is it perfectly
> fine to enumerate an active link on an empty slot? An empty slot and
> active link doesn't make a lot of sense, but that observation appears to
> be what is reported here.

We allow enumeration with an active link of an allegedly empty slot
because some hardware hardwires PDS to zero, see commit 80696f991424
("PCI: pciehp: Tolerate Presence Detect hardwired to zero").

The brokenness of PCIe hotplug implementations found in the wild is
astonishing.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 19:09     ` Keith Busch
@ 2019-01-23 19:28       ` Lukas Wunner
  2019-01-23 19:47         ` Keith Busch
  0 siblings, 1 reply; 23+ messages in thread
From: Lukas Wunner @ 2019-01-23 19:28 UTC (permalink / raw)
  To: Keith Busch; +Cc: Alex_Gagniuc, linux-pci, bhelgaas, Austin.Bolen

On Wed, Jan 23, 2019 at 12:09:46PM -0700, Keith Busch wrote:
> On Wed, Jan 23, 2019 at 08:07:23PM +0100, Lukas Wunner wrote:
> > On Wed, Jan 23, 2019 at 07:54:20PM +0100, Lukas Wunner wrote:
> > > So I don't see a perfect solution.  What device are we talking about
> > > anyway?  400 ms is a *long* time.
> > 
> > Also, how exactly does this issue manifest itself:  Is it just an
> > annoyance that the slot is brought up/down/up or does it not work
> > at all?
> 
> Yeah, there is an nvme driver bug that hits a deadlock if you get a
> very quick add-remove sequence. The nvme remove tries to delete IO
> resources before the async probe side has set them up, so the driver
> doesn't actually see that they're invalid. I have a proposed fix, but am
> waiting to hear if it is successful.
> 
> bz: https://bugzilla.kernel.org/show_bug.cgi?id=202081

Hm, there's no full dmesg output attached, so it's not possible to
tell what the topology looks like and what the vendor/device ID of
0000:b0:04.0 is.

Also, there's only a card present / link up sequence visible in the
abridged dmesg output which has a 4 usec delay, but no link up / card
present sequence with a 400 msec delay?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 19:15       ` Lukas Wunner
@ 2019-01-23 19:33         ` Keith Busch
  0 siblings, 0 replies; 23+ messages in thread
From: Keith Busch @ 2019-01-23 19:33 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Alex_Gagniuc, linux-pci, bhelgaas, Austin.Bolen

On Wed, Jan 23, 2019 at 08:15:27PM +0100, Lukas Wunner wrote:
> On Wed, Jan 23, 2019 at 12:07:38PM -0700, Keith Busch wrote:
> > Right, so in Alex's case, it looks like we are observing
> > pcie_wait_for_link() returning true before the PDC event.
> > 
> > I'm wondering about PDS because if the link is up but Presence reports an
> > empty slot, does that matter for any implementations? Or is it perfectly
> > fine to enumerate an active link on an empty slot? An empty slot and
> > active link doesn't make a lot of sense, but that observation appears to
> > be what is reported here.
> 
> We allow enumeration with an active link of an allegedly empty slot
> because some hardware hardwires PDS to zero, see commit 80696f991424
> ("PCI: pciehp: Tolerate Presence Detect hardwired to zero").
> 
> The brokenness of PCIe hotplug implementations found in the wild is
> astonishing.

Heh, sounds like Alex's proposal may work until we find a slot
implementation that doesn't do DLLSC as expected. :)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 19:28       ` Lukas Wunner
@ 2019-01-23 19:47         ` Keith Busch
  2019-01-23 20:10           ` Alex_Gagniuc
  0 siblings, 1 reply; 23+ messages in thread
From: Keith Busch @ 2019-01-23 19:47 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Alex_Gagniuc, linux-pci, bhelgaas, Austin.Bolen

On Wed, Jan 23, 2019 at 08:28:29PM +0100, Lukas Wunner wrote:
> On Wed, Jan 23, 2019 at 12:09:46PM -0700, Keith Busch wrote:
> > On Wed, Jan 23, 2019 at 08:07:23PM +0100, Lukas Wunner wrote:
> > > On Wed, Jan 23, 2019 at 07:54:20PM +0100, Lukas Wunner wrote:
> > > > So I don't see a perfect solution.  What device are we talking about
> > > > anyway?  400 ms is a *long* time.
> > > 
> > > Also, how exactly does this issue manifest itself:  Is it just an
> > > annoyance that the slot is brought up/down/up or does it not work
> > > at all?
> > 
> > Yeah, there is an nvme driver bug that hits a deadlock if you get a
> > very quick add-remove sequence. The nvme remove tries to delete IO
> > resources before the async probe side has set them up, so the driver
> > doesn't actually see that they're invalid. I have a proposed fix, but am
> > waiting to hear if it is successful.
> > 
> > bz: https://bugzilla.kernel.org/show_bug.cgi?id=202081
> 
> Hm, there's no full dmesg output attached, so it's not possible to
> tell what the topology looks like and what the vendor/device ID of
> 0000:b0:04.0 is.
> 
> Also, there's only a card present / link up sequence visible in the
> abridged dmesg output which has a 4 usec delay, but no link up / card
> present sequence with a 400 msec delay?

Yeah, not easy to follow, and some discussion was off the bz.

Link Change:

  [  838.784541] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Up

Presence Detect Change +400 msec:

  [  839.183506] pciehp 0000:b0:04.0:pcie204: Slot(178): Card not present

In between these two entries, nvme starts setting up its controller
detected on the link up. The "not present" side tries to remove the same
nvme device, but fails to invalidate the IO resources because it's racing
with probe before probe has even set them up, leaving probe unable to
complete IO a moment later because its IRQ resources were disabled.

Meanwhile, the blk-mq timeout handler can't do anything because the
device state is disconnected and believes the removal side is handling
things. What a mess...

We can fix it, just want to hear if Alex can confirm the proposal is
successful.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 19:47         ` Keith Busch
@ 2019-01-23 20:10           ` Alex_Gagniuc
  0 siblings, 0 replies; 23+ messages in thread
From: Alex_Gagniuc @ 2019-01-23 20:10 UTC (permalink / raw)
  To: keith.busch, lukas; +Cc: linux-pci, bhelgaas, Austin.Bolen

On 1/23/19 1:53 PM, Keith Busch wrote:
> On Wed, Jan 23, 2019 at 08:28:29PM +0100, Lukas Wunner wrote:
>> On Wed, Jan 23, 2019 at 12:09:46PM -0700, Keith Busch wrote:
>>> On Wed, Jan 23, 2019 at 08:07:23PM +0100, Lukas Wunner wrote:
>>>> On Wed, Jan 23, 2019 at 07:54:20PM +0100, Lukas Wunner wrote:
>>>>> So I don't see a perfect solution.  What device are we talking about
>>>>> anyway?  400 ms is a *long* time.
>>>>
>>>> Also, how exactly does this issue manifest itself:  Is it just an
>>>> annoyance that the slot is brought up/down/up or does it not work
>>>> at all?
>>>
>>> Yeah, there is an nvme driver bug that hits a deadlock if you get a
>>> very quick add-remove sequence. The nvme remove tries to delete IO
>>> resources before the async probe side has set them up, so the driver
>>> doesn't actually see that they're invalid. I have a proposed fix, but am
>>> waiting to hear if it is successful.
>>>
>>> bz: https://bugzilla.kernel.org/show_bug.cgi?id=202081
>>
>> Hm, there's no full dmesg output attached, so it's not possible to
>> tell what the topology looks like and what the vendor/device ID of
>> 0000:b0:04.0 is.
>>
>> Also, there's only a card present / link up sequence visible in the
>> abridged dmesg output which has a 4 usec delay, but no link up / card
>> present sequence with a 400 msec delay?
> 
> Yeah, not easy to follow, and some discussion was off the bz.
> 
> Link Change:
> 
>    [  838.784541] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Up
> 
> Presence Detect Change +400 msec:
> 
>    [  839.183506] pciehp 0000:b0:04.0:pcie204: Slot(178): Card not present
> 
> In between these two entries, nvme starts setting up its controller
> detected on the link up. The "not present" side tries to remove the same
> nvme device, but fails to invalidate the IO resources because it's racing
> with probe before probe has even set them up, leaving probe unable to
> complete IO a moment later because its IRQ resources were disabled.
> 
> Meanwhile, the blk-mq timeout handler can't do anything because the
> device state is disconnected and believes the removal side is handling
> things. What a mess...
> 
> We can fix it, just want to hear if Alex can confirm the proposal is
> successful.

OOPS! Totally missed there was a patch on bz. Will update bz once 
testing is done.

Alex


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 19:07   ` Lukas Wunner
  2019-01-23 19:09     ` Keith Busch
@ 2019-01-23 23:50     ` Alex_Gagniuc
  2019-01-24  9:25       ` Lukas Wunner
  1 sibling, 1 reply; 23+ messages in thread
From: Alex_Gagniuc @ 2019-01-23 23:50 UTC (permalink / raw)
  To: lukas; +Cc: linux-pci, bhelgaas, keith.busch, Austin.Bolen

On 1/23/19 1:07 PM, Lukas Wunner wrote:
> On Wed, Jan 23, 2019 at 07:54:20PM +0100, Lukas Wunner wrote:
>> So I don't see a perfect solution.  What device are we talking about
>> anyway?  400 ms is a *long* time.
> 
> Also, how exactly does this issue manifest itself:  Is it just an
> annoyance that the slot is brought up/down/up or does it not work
> at all?

nvme issue aside, the device is removed. Then, a few seconds later, we 
see Link Up and Card Present events, but no pciehp_isr() invocation. I 
think it all happens in one pciehp_handle_presence_or_link_change(), so 
pciehp_ist() is held up for a few seconds -- but that's a matter for a 
different thread.

Alex.


[  785.007640] nvme 0000:b1:00.0: enabling device (0000 -> 0002)
[  785.099567] pcieport 0000:b0:04.0: ZOPA: Status(0x00000040): 
presence-detected
[  785.099574] pcieport 0000:b0:04.0: ZOPA: Events(0x00000008): 
presence-detect-changed
[  785.099580] pcieport 0000:b0:04.0: ZOPA: Queued up(0x00000008): 
presence-detect-changed
[  785.099582] pcieport 0000:b0:04.0: ZOPA: pciehp_isr: exiting
[  785.099611] pcieport 0000:b0:04.0: ZOPA: pciehp_ist: Entered with 
events(0x00000008): presence-detect-changed
[  785.099615] pciehp 0000:b0:04.0:pcie204: ZOPA: 
pciehp_handle_presence_or_link_change: locked &ctrl->state_lock
[  785.099618] pciehp 0000:b0:04.0:pcie204: ZOPA: 
pciehp_handle_presence_or_link_change unlocked (POWEROFF)
[  785.099621] pciehp 0000:b0:04.0:pcie204: Slot(178): Card not present
[  788.756953] nvme nvme4: failed to mark controller CONNECTING
[  788.756958] nvme nvme4: Removing after probe failure status: 0
[  788.763317] pciehp 0000:b0:04.0:pcie204: ZOPA: 
pciehp_handle_presence_or_link_change unlocked (POWERON)
[  788.763322] pciehp 0000:b0:04.0:pcie204: Slot(178): Card present
[  788.763349] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Up
[  788.892432] pci 0000:b1:00.0: [8086:0a55] type 00 class 0x010802
[  788.892474] pci 0000:b1:00.0: reg 0x10: [mem 0xe1500000-0xe1503fff 64bit]


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 23:50     ` Alex_Gagniuc
@ 2019-01-24  9:25       ` Lukas Wunner
  0 siblings, 0 replies; 23+ messages in thread
From: Lukas Wunner @ 2019-01-24  9:25 UTC (permalink / raw)
  To: Alex_Gagniuc; +Cc: linux-pci, bhelgaas, keith.busch, Austin.Bolen

On Wed, Jan 23, 2019 at 11:50:30PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> nvme issue aside, the device is removed. Then, a few seconds later, we 
> see Link Up and Card Present events, but no pciehp_isr() invocation. I 
> think it all happens in one pciehp_handle_presence_or_link_change(), so 
> pciehp_ist() is held up for a few seconds -- but that's a matter for a 
> different thread.

Yes, that's normal and expected.  The rationale is that we only know that
link or presence has changed.  It could even have changed multiple times.
Even if the link is now back up, we don't know if it's still the same card.
So the slot is brought down and if it's currently occupied or the link is
up, the slot is brought up again.
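
Condensed, the flow in pciehp_handle_presence_or_link_change() is roughly
this (abridged paraphrase, not verbatim kernel code):

	/* slot was up and PDC and/or DLLSC fired: bring it down */
	if (ctrl->state == ON_STATE &&
	    (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC)))
		pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);

	/* if the slot is now occupied or the link is up, bring it up again */
	if (pciehp_card_present(ctrl) || pciehp_check_link_active(ctrl))
		pciehp_enable_slot(ctrl);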

It would of course be possible to come up with more complicated schemes to
try to handle odd behavior such as a belated presence change after a link
change, but then it also becomes more complicated to reason about the
scheme's correctness and ensure that other types of odd behavior (such as
presence hardwired to zero) are still handled correctly.

Thanks,

Lukas

> [  785.007640] nvme 0000:b1:00.0: enabling device (0000 -> 0002)
> [  785.099567] pcieport 0000:b0:04.0: ZOPA: Status(0x00000040): 
> presence-detected
> [  785.099574] pcieport 0000:b0:04.0: ZOPA: Events(0x00000008): 
> presence-detect-changed
> [  785.099580] pcieport 0000:b0:04.0: ZOPA: Queued up(0x00000008): 
> presence-detect-changed
> [  785.099582] pcieport 0000:b0:04.0: ZOPA: pciehp_isr: exiting
> [  785.099611] pcieport 0000:b0:04.0: ZOPA: pciehp_ist: Entered with 
> events(0x00000008): presence-detect-changed
> [  785.099615] pciehp 0000:b0:04.0:pcie204: ZOPA: 
> pciehp_handle_presence_or_link_change: locked &ctrl->state_lock
> [  785.099618] pciehp 0000:b0:04.0:pcie204: ZOPA: 
> pciehp_handle_presence_or_link_change unlocked (POWEROFF)
> [  785.099621] pciehp 0000:b0:04.0:pcie204: Slot(178): Card not present
> [  788.756953] nvme nvme4: failed to mark controller CONNECTING
> [  788.756958] nvme nvme4: Removing after probe failure status: 0
> [  788.763317] pciehp 0000:b0:04.0:pcie204: ZOPA: 
> pciehp_handle_presence_or_link_change unlocked (POWERON)
> [  788.763322] pciehp 0000:b0:04.0:pcie204: Slot(178): Card present
> [  788.763349] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Up
> [  788.892432] pci 0000:b1:00.0: [8086:0a55] type 00 class 0x010802
> [  788.892474] pci 0000:b1:00.0: reg 0x10: [mem 0xe1500000-0xe1503fff 64bit]
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
       [not found]   ` <b32e6ca62ae2494f98450df81ca1ee14@AUSX13MPC131.AMER.DELL.COM>
@ 2019-01-24 20:20     ` Keith Busch
  2019-01-24 22:00       ` Austin.Bolen
  0 siblings, 1 reply; 23+ messages in thread
From: Keith Busch @ 2019-01-24 20:20 UTC (permalink / raw)
  To: Austin.Bolen; +Cc: Alex_Gagniuc, linux-pci, bhelgaas, lukas

[in plain text for the mailing lists]

On Thu, Jan 24, 2019 at 08:10:47PM +0000, Austin.Bolen@dell.com wrote:
> On 1/23/2019 12:46 PM, Keith Busch wrote:
> 
> > On Wed, Jan 23, 2019 at 06:20:57PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> 
> <snip>
> 
> > > Solution 1 is to say it's a spec violation, so ignore it. They'll change
> > > the "logical OR" thing in the next PCIe spec, so we still will have to
> > > worry about this.
> 
> > When's that changing? 5.0 is the next spec, and it still says:
> > 
> >   Presence Detect State - This bit indicates the presence of an adapter
> >   in the slot, reflected by the logical "OR" of the Physical Layer
> >   in-band presence detect mechanism and, if present, any out-of-band
> >   presence detect mechanism defined for the slot's corresponding
> >   form factor.
> 
> The ECN that changes this behavior (Async Hot-Plug Updates ECN) is
> ratified and published to the PCI-SIG website.  It modifies PCIe Base
> Spec 4.0 and will be incorporated into PCIe Base Spec 5.0.  Check out
> the ECN section titled "In-band Presence Detect Mechanism Deprecated
> for Async Hot-Plug".

My copy of the 5.0-0.9 spec missed out on this ECN. The new changes look
like they more clearly define expected software and hardware behavior,
so I'm looking forward to seeing the hardware implementations. Thanks
for driving that.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-24 20:20     ` Keith Busch
@ 2019-01-24 22:00       ` Austin.Bolen
  2019-01-25  8:22         ` Lukas Wunner
  0 siblings, 1 reply; 23+ messages in thread
From: Austin.Bolen @ 2019-01-24 22:00 UTC (permalink / raw)
  To: keith.busch, Austin.Bolen; +Cc: Alex_Gagniuc, linux-pci, bhelgaas, lukas

On 1/24/2019 2:21 PM, Keith Busch wrote:
> 

<snip>


> My copy of the 5.0-0.9 spec missed out on this ECN. The new changes look
> like they more clearly define expected software and hardware behavior,
> so I'm looking forward to seeing the hardware implementations. Thanks
> for driving that.
> 

You're welcome!  And yes, this ECN is not yet in 5.0 draft but should be 
there soon.  Much of what is in the ECN can be implemented with existing 
DPC hardware.  There are associated ECNs in ACPI and PCI Firmware 
Specifications as well.  The ACPI spec changes will not be public until 
the 6.3 spec is released (soon).  The PCI Firmware Specification changes 
should be ratified in a few weeks as well.  In the meantime, UEFI org 
(which ACPI is a part of) has a git repo accessible to UEFI forum 
members where the Linux code to support these new async hot-plug 
mechanisms is being developed.  Will be made public once the specs are 
public.

Thanks,
Austin

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 18:54 ` Lukas Wunner
  2019-01-23 19:07   ` Lukas Wunner
@ 2019-01-24 22:33   ` Austin.Bolen
  1 sibling, 0 replies; 23+ messages in thread
From: Austin.Bolen @ 2019-01-24 22:33 UTC (permalink / raw)
  To: lukas, Alex_Gagniuc; +Cc: linux-pci, bhelgaas, keith.busch, Austin.Bolen

On 1/23/2019 12:54 PM, Lukas Wunner wrote:
> 

<snip>

> 
> So I don't see a perfect solution.  What device are we talking about
> anyway?  400 ms is a *long* time.

Some people have slow hands :-).  This is a U.2 device.  There are 2 
pins (Pin 4 = IfDet#/Pin 10 = PRSNT#) that determine presence and each 
pin is a different length (On insert, IfDet# mates first, PRSNT# mates 
second, PCIe data lanes mate 3rd).

SAS drive:  IfDet# = 0, PRSNT# = 0
NVMe drive: IfDet# = 0, PRSNT# = 1.

These 2 pins determine if it is a SAS or NVMe drive.  We have to wait 
until both pins mate in order to know for sure it's an NVMe drive and 
therefore generate PRSNT# to DSP above the drive.  But there is no way 
to know when both pins have mated due to the way the U.2 connector 
signals were defined so we have to dead wait.

The time delta between these 2 pins mating on insert can vary 
substantially with different users.  For some users we measured a delta 
of > 100 ms (others were less than 10 ms), and that was for a "normal" 
insert.  It takes much longer if the user intentionally tries to insert 
slowly.

We chose 500 ms as the delay after P4 (IfDet#) mates before sampling the 
P10 (PRSNT#).  About 100 ms of that is chewed up by the time delta 
between the presence pins mating and the PCIe data lanes mating, plus 
the time to train the link, leaving around 400 ms between the DLLSC 
interrupt and the PDSC interrupt.

In the future, we will hide the U.2 device from the OS for 1 second after 
insert to allow the device to be fully inserted (in most cases) and reach 
steady state.

New form factors will (hopefully) have presence pins that are 
last-to-mate so platforms/OSes do not have to rely on such guesswork to 
know when devices are fully inserted.

> 
> Thanks,
> 
> Lukas
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-23 19:02   ` Lukas Wunner
  2019-01-23 19:07     ` Keith Busch
@ 2019-01-24 22:43     ` Austin.Bolen
  2019-01-24 22:52       ` Austin.Bolen
  1 sibling, 1 reply; 23+ messages in thread
From: Austin.Bolen @ 2019-01-24 22:43 UTC (permalink / raw)
  To: lukas, keith.busch; +Cc: Alex_Gagniuc, linux-pci, bhelgaas, Austin.Bolen

On 1/23/2019 1:02 PM, Lukas Wunner wrote:
> 

<snip>

> 
> Well, usually it's desirable to bring up the slot as quickly as possible,
> so once we get any kind of link or presence event, we immediately try to
> bring up the slot.
> 
> We do allow a 20 + 100 ms delay in pcie_wait_for_link() between link up
> and presence detect up, just not 400 ms.

Is the hot-inserted device's config space accessed immediately after 
waiting this 20 + 100 ms delay?  Per PCIe spec, in Gen 3 mode, software 
should wait at least (1s - CTO value) after Data Link Layer Link Active 
before accessing config space. The exception is if OS enables CRS 
Software Visibility in which case config space can be accessed 100 ms 
after Data Link Layer Link Active.  Is CRS Software Visibility being used?

Cheers,
Austin

> 
> Thanks,
> 
> Lukas
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-24 22:43     ` Austin.Bolen
@ 2019-01-24 22:52       ` Austin.Bolen
  0 siblings, 0 replies; 23+ messages in thread
From: Austin.Bolen @ 2019-01-24 22:52 UTC (permalink / raw)
  To: Austin.Bolen, lukas, keith.busch; +Cc: Alex_Gagniuc, linux-pci, bhelgaas

On 1/24/2019 4:43 PM, Bolen, Austin wrote:
> On 1/23/2019 1:02 PM, Lukas Wunner wrote:
>>
> 
> <snip>
> 
>>
>> Well, usually it's desirable to bring up the slot as quickly as possible,
>> so once we get any kind of link or presence event, we immediately try to
>> bring up the slot.
>>
>> We do allow a 20 + 100 ms delay in pcie_wait_for_link() between link up
>> and presence detect up, just not 400 ms.
> 
> Is the hot-inserted device's config space accessed immediately after
> waiting this 20 + 100 ms delay?  Per PCIe spec, in Gen 3 mode, software
> should wait at least (1s - CTO value) after Data Link Layer Link Active
> before accessing config space. The exception is if OS enables CRS
> Software Visibility in which case config space can be accessed 100 ms
> after Data Link Layer Link Active.  Is CRS Software Visibility being used?
> 

I forgot... Readiness Notifications can also let software bypass these 
first access times but I've not seen anybody implement RN so assume it 
is not being used here.

> Cheers,
> Austin
> 
>>
>> Thanks,
>>
>> Lukas
>>
> 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-24 22:00       ` Austin.Bolen
@ 2019-01-25  8:22         ` Lukas Wunner
  2019-01-25 22:39           ` Austin.Bolen
  0 siblings, 1 reply; 23+ messages in thread
From: Lukas Wunner @ 2019-01-25  8:22 UTC (permalink / raw)
  To: Austin.Bolen; +Cc: keith.busch, Alex_Gagniuc, linux-pci, bhelgaas

On Thu, Jan 24, 2019 at 10:00:49PM +0000, Austin.Bolen@dell.com wrote:
> You're welcome!  And yes, this ECN is not yet in 5.0 draft but should be 
> there soon.  Much of what is in the ECN can be implemented with existing 
> DPC hardware.  There are associated ECNs in ACPI and PCI Firmware 
> Specifications as well.  The ACPI spec changes will not be public until 
> the 6.3 spec is released (soon).  The PCI Firmware Specification changes 
> should be ratified in a few weeks as well.  In the meantime, UEFI org 
> (which ACPI is a part of) has a git repo accessible to UEFI forum 
> members where the Linux code to support these new async hot-plug 
> mechanisms is being developed.  Will be made public once the specs are 
> public.

Unfortunately I don't have access to those ECNs or the non-public
git repo.


> Some people have slow hands :-).  This is a U.2 device.  There are 2 
> pins (Pin 4 = IfDet#/Pin 10 = PRSNT#) that determine presence and each 
> pin is a different length (On insert, IfDet# mates first, PRSNT# mates 
> second, PCIe data lanes mate 3rd).

But I've found a public spec for that:
http://www.ssdformfactor.org/docs/SSD_Form_Factor_Version1_00.pdf


> These 2 pins determine if it is a SAS or NVMe drive.  We have to wait 
> until both pins mate in order to know for sure it's an NVMe drive and 
> therefore generate PRSNT# to DSP above the drive.  But there is no way 
> to know when both pins have mated due to the way the U.2 connector 
> signals were defined so we have to dead wait.

What I don't get is, a lot of the pins are used either for PCIe or SAS:
If the host controller doesn't know yet whether it's PCIe or SAS, why does
it already start driving them in PCIe mode and bring up the link?

Also, those PCIe pins mate 3rd, so by the time the host controller starts
driving those pins in PCIe mode, IfDet# and PRSNT# pins are known to be
connected (because they mate earlier), so the host controller can stop
waiting and start sampling the two pins?  Or is it mechanically possible
to insert the drive in such a way that the pins do not mate before the
PCIe pins, contrary to the spec?

Link up means in-band presence, so I assume this host controller already
adheres to the ECN discussed earlier?  What is the rationale of the ECN
given that it causes these issues?  Is this product being shipped or
can it still be changed?


> > We do allow a 20 + 100 ms delay in pcie_wait_for_link() between link up
> > and presence detect up, just not 400 ms.
> 
> Is the hot-inserted device's config space accessed immediately after 
> waiting this 20 + 100 ms delay?

Yes.

You can follow the code via this link, click your way down the call stack:
https://elixir.bootlin.com/linux/v5.0-rc3/source/drivers/pci/hotplug/pciehp_hpc.c#L602

pciehp_ist()
  pciehp_handle_presence_or_link_change() /* called on PDC or DLLSC event */
    pciehp_enable_slot() /* called if PDS or DLLLA is 1 */
      __pciehp_enable_slot()
        board_added()
	  pciehp_check_link_status()
	    pcie_wait_for_link()
	      /*
	       * wait 20 ms per PCIe r4.0 sec 6.6.1,
	       * then wait for up to 1 s until DLLLA is 1,
	       * then wait 100 ms per PCIe r4.0 sec 6.7.3.3
	       */
	    pci_bus_check_dev()
	      /*
	       * first access to hotplugged device:
	       * wait for up to 1 s until vendor ID is accessible
	       */
	    /*
	     * any additionl PDC or DLLSC event that occurred up to this point
	     * is ignored because the link is up and config space accessible;
	     * read Link Status register and verify link is trained
	     */
	  pciehp_configure_device()
	    /* enumerate hotplugged device */

The 20 ms delay in pcie_wait_for_link() was added by Keith with commit
f0157160b359 ("PCI: Make link active reporting detection generic")
so that the function can also be used by the DPC driver.  I think that
additional delay wouldn't normally be necessary for hotplug but it
doesn't hurt much either.


> Per PCIe spec, in Gen 3 mode, software 
> should wait at least (1s - CTO value) after Data Link Layer Link Active 
> before accessing config space.

Hm, do you have a spec reference for that?  PCIe r4.0 sec 6.7.3.3 only says:

   "Software must wait for 100 ms after the Data Link Layer Link Active
    bit reads 1b before initiating a configuration access to the hot
    added device (see Section 6.6)."


> The exception is if OS enables CRS 
> Software Visibility in which case config space can be accessed 100 ms 
> after Data Link Layer Link Active.  Is CRS Software Visibility being used?

Not that I'm aware of, no.


> I forgot... Readiness Notifications can also let software bypass these 
> first access times but I've not seen anybody implement RN so assume it 
> is not being used here.

Right, I think this isn't used either so far.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-25  8:22         ` Lukas Wunner
@ 2019-01-25 22:39           ` Austin.Bolen
  2019-01-26 12:12             ` Lukas Wunner
  0 siblings, 1 reply; 23+ messages in thread
From: Austin.Bolen @ 2019-01-25 22:39 UTC (permalink / raw)
  To: lukas, Austin.Bolen; +Cc: keith.busch, Alex_Gagniuc, linux-pci, bhelgaas

On 1/25/2019 2:22 AM, Lukas Wunner wrote:
> On Thu, Jan 24, 2019 at 10:00:49PM +0000, Austin.Bolen@dell.com wrote:
>> You're welcome!  And yes, this ECN is not yet in 5.0 draft but should be
>> there soon.  Much of what is in the ECN can be implemented with existing
>> DPC hardware.  There are associated ECNs in ACPI and PCI Firmware
>> Specifications as well.  The ACPI spec changes will not be public until
>> the 6.3 spec is released (soon).  The PCI Firmware Specification changes
>> should be ratified in a few weeks as well.  In the meantime, UEFI org
>> (which ACPI is a part of) has a git repo accessible to UEFI forum
>> members where the Linux code to support these new async hot-plug
>> mechanisms is being developed.  Will be made public once the specs are
>> public.
> 
> Unfortunately I don't have access to those ECNs or the non-public
> git repo.
> 

The PCIe Async Hot-Plug ECN has most of the new stuff. It has been 
released and has the same level of access as the PCIe Base Spec so if 
you have access to the PCIe base spec then you should have access to 
this ECN.

> 
>> Some people have slow hands :-).  This is a U.2 device.  There are 2
>> pins (Pin 4 = IfDet#/Pin 10 = PRSNT#) that determine presence and each
>> pin is a different length (On insert, IfDet# mates first, PRSNT# mates
>> second, PCIe data lanes mate 3rd).
> 
> But I've found a public spec for that:
> http://www.ssdformfactor.org/docs/SSD_Form_Factor_Version1_00.pdf
> 
>> These 2 pins determine if it is a SAS or NVMe drive.  We have to wait
>> until both pins mate in order to know for sure it's an NVMe drive and
>> therefore generate PRSNT# to DSP above the drive.  But there is no way
>> to know when both pins have mated due to the way the U.2 connector
>> signals were defined so we have to dead wait.
> 
> What I don't get is, a lot of the pins are used either for PCIe or SAS:
> If the host controller doesn't know yet whether it's PCIe or SAS, why does
> it already start driving them in PCIe mode and bring up the link?
> 
> Also, those PCIe pins mate 3rd, so by the time the host controller starts
> driving those pins in PCIe mode, IfDet# and PRSNT# pins are known to be
> connected (because they mate earlier), so the host controller can stop
> waiting and start sampling the two pins?  Or is it mechanically possible
> to insert the drive in such a way that the pins do not mate before the
> PCIe pins, contrary to the spec?

In this case, there's not a single host controller.  There are two 
things at play.  The sideband signals (IfDet#, PRSNT#) are routed to 
logic on the platform motherboard or drive backplane.  The PCIe lanes 
are routed to a downstream port (Root Port or Switch Downstream port). 
The platform can see when the sideband signals connect but can't see 
when the link trains.  When the link trains and Data Link Layer goes to 
Active state, the DSP will notify OS via DLLSC interrupt but platform 
has no visibility into this (because we grant control of hot-plug to OS 
natively) and there's no way for platform to get notified when the link 
becomes active.  The PCIe ECN referenced above provides ability to 
notify the platform in this scenario but requires hardware change.

Likewise the DSP doesn't know or care if it's possible to plug a SAS 
drive into the slot.  As soon as it starts seeing training sets on the 
PCIe link it's going to train regardless of the state of IfDet#/PRSNT#.

Also, the DSP doesn't know the state of IfDet#/PRSNT# directly.  There's 
a pin called PRSNT# in PCIe DSPs (different from the U.2 P10 PRSNT#). 
The platform will assert PCIe PRSNT# to the DSP when U.2 PRSNT#/IfDet# 
indicate it's an NVMe drive (and platform doesn't know this until around 
400 ms after IfDet# mates because of the reasons described earlier). 
But PCIe Base spec does not require DSPs to see PCIe PRSNT# asserted in 
order to start the training sequence.  So that's why the PCIe link comes 
up before the platform knows it's an NVMe drive.

> 
> Link up means in-band presence, 

Link Up and In-band Presence are slightly different states (see Table 
4.15 in PCIe Base Spec).  In-band presence is a lower-level detection 
mechanism and can be set even when link is not up.

> so I assume this host controller already
> adheres to the ECN discussed earlier?  

Yes, this platform disables in-band presence.

> What is the rationale of the ECN
> given that it causes these issues?

In certain link states like L1, DSPs can't detect in-band presence going 
away.  If you remove a device while the link is in one of these states, 
the OS will not get notified.  The OS will not be able to detect 
subsequent insertions either.  The only way out of this is for the user 
to go into the OS and manually get the link out of L1.

The PCIe spec doesn't have any strict requirements on how long the time 
between DLLSC and PDSC can be, but from our measurements 100 ms is not 
long enough to cover the time delta from the first pins mating until the 
device is fully inserted.  At a platform level, we use 1 second to allow 
for insertion and removal events and this seems adequate for most cases, 
but it can be defeated if someone makes an effort to slowly insert or 
remove a device.

> Is this product being shipped or
> can it still be changed?
> 

It's shipping.  It could be changed with a firmware change to re-enable 
in-band presence but then we'd have the issue where hot-plug breaks when 
removing devices when link is in certain states like L1 so that is not a 
viable solution.

> 
>>> We do allow a 20 + 100 ms delay in pcie_wait_for_link() between link up
>>> and presence detect up, just not 400 ms.
>>
>> Is the hot-inserted device's config space accessed immediately after
>> waiting this 20 + 100 ms delay?
> 
> Yes.
> 
> You can follow the code via this link, click your way down the call stack:
> https://elixir.bootlin.com/linux/v5.0-rc3/source/drivers/pci/hotplug/pciehp_hpc.c#L602
> 
> pciehp_ist()
>    pciehp_handle_presence_or_link_change() /* called on PDC or DLLSC event */
>      pciehp_enable_slot() /* called if PDS or DLLLA is 1 */
>        __pciehp_enable_slot()
>          board_added()
> 	  pciehp_check_link_status()
> 	    pcie_wait_for_link()
> 	      /*
> 	       * wait 20 ms per PCIe r4.0 sec 6.6.1,
> 	       * then wait for up to 1 s until DLLLA is 1,
> 	       * then wait 100 ms per PCIe r4.0 sec 6.7.3.3
> 	       */
> 	    pci_bus_check_dev()
> 	      /*
> 	       * first access to hotplugged device:
> 	       * wait for up to 1 s until vendor ID is accessible
> 	       */
> 	    /*
> 	     * any additionl PDC or DLLSC event that occurred up to this point
> 	     * is ignored because the link is up and config space accessible;
> 	     * read Link Status register and verify link is trained
> 	     */
> 	  pciehp_configure_device()
> 	    /* enumerate hotplugged device */
> 
> The 20 ms delay in pcie_wait_for_link() was added by Keith with commit
> f0157160b359 ("PCI: Make link active reporting detection generic")
> so that the function can also be used by the DPC driver.  I think that
> additional delay wouldn't normally be necessary for hotplug but it
> doesn't hurt much either.
> 
> 
>> Per PCIe spec, in Gen 3 mode, software
>> should wait at least (1s - CTO value) after Data Link Layer Link Active
>> before accessing config space.
> 
> Hm, do you have a spec reference for that?  PCIe r4.0 sec 6.7.3.3 only says:
> 
>     "Software must wait for 100 ms after the Data Link Layer Link Active
>      bit reads 1b before initiating a configuration access to the hot
>      added device (see Section 6.6)."
> 

Next sentence says "Software must allow 1 second after the Data Link 
Layer Link Active bit reads 1b before it is permitted to determine that 
a hot plugged device which fails to return a Successful Completion for a 
Valid Configuration Request is a broken device".

And Section 6.6.1 on Conventional Reset says "Software should use 100 ms 
wait periods only if software enables CRS Software Visibility. 
Otherwise, Completion timeouts, platform timeouts, or lengthy processor 
instruction stalls may result."

The "Completion timeouts" reference here is the key. Since CRS Software 
Visibility is not used, the device still has 900 ms before it has to 
return a Successful Completion if you issue a config read 100 ms after 
DLLSC.

CTO timers are typically set around 50 to 65 ms on x86 systems.  So if 
you issue a config read after 100 ms you could see a CTO around 50 to 65 
ms after it's issued, which would crash the system.  In order to avoid 
this CTO, software has to wait (1 s - CTO value) after DLLSC before 
issuing a config read at Gen 3 (timing is slightly different for Gen 1/2 
speeds).  This timing constraint needs to be observed any time a device 
is reset as well.
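
To put numbers on it (illustrative values): with a 65 ms CTO and no CRS 
Software Visibility, a config read issued 100 ms after DLLLA to a device 
that isn't ready yet times out around 165 ms after DLLLA, well inside the 
1 second the device is allowed to take.  Waiting 1000 ms - 65 ms = 935 ms 
before the first read keeps the timeout from firing inside that window.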

We probably won't see an issue most of the time because most devices can 
come up and respond to config accesses way faster than 100 ms, but I've 
seen devices take longer than that in certain corner cases.  Regardless 
of most devices' ability to be ready faster than required by PCIe, it's 
probably a good idea to wait longer for hot-insert anyway: insertion 
events can be slow, so everything needs time to reach steady state 
(fully inserted) before we start configuring things, and 100 ms is not 
enough time for that based on our lab measurements.  Likewise, DLLSC 
and PDSC interrupts can be spread pretty far apart on removal as well, 
depending on how slowly the user removes the device.

>> The exception is if OS enables CRS
>> Software Visibility in which case config space can be accessed 100 ms
>> after Data Link Layer Link Active.  Is CRS Software Visibility being used?
> 
> Not that I'm aware of, no.
> 
> 
>> I forgot... Readiness Notifications can also let software bypass these
>> first access times but I've not seen anybody implement RN so assume it
>> is not being used here.
> 
> Right, I think this isn't used either so far.
> 
> Thanks,
> 
> Lukas
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-25 22:39           ` Austin.Bolen
@ 2019-01-26 12:12             ` Lukas Wunner
  2019-01-30 14:28               ` Austin.Bolen
  0 siblings, 1 reply; 23+ messages in thread
From: Lukas Wunner @ 2019-01-26 12:12 UTC (permalink / raw)
  To: Austin.Bolen; +Cc: keith.busch, Alex_Gagniuc, linux-pci, bhelgaas

On Fri, Jan 25, 2019 at 10:39:04PM +0000, Austin.Bolen@dell.com wrote:
> On 1/25/2019 2:22 AM, Lukas Wunner wrote:
> > 	    pcie_wait_for_link()
> > 	      /*
> > 	       * wait 20 ms per PCIe r4.0 sec 6.6.1,
> > 	       * then wait for up to 1 s until DLLLA is 1,
> > 	       * then wait 100 ms per PCIe r4.0 sec 6.7.3.3
> > 	       */
> > 	    pci_bus_check_dev()
> > 	      /*
> > 	       * first access to hotplugged device:
> > 	       * wait for up to 1 s until vendor ID is accessible
> > 	       */
[...]
> >  PCIe r4.0 sec 6.7.3.3 only says:
> > 
> >     "Software must wait for 100 ms after the Data Link Layer Link Active
> >      bit reads 1b before initiating a configuration access to the hot
> >      added device (see Section 6.6)."
> > 
> 
> Next sentence says "Software must allow 1 second after the Data Link 
> Layer Link Active bit reads 1b before it is permitted to determine that 
> a hot plugged device which fails to return a Successful Completion for a 
> Valid Configuration Request is a broken device".

Yes, that's why pci_bus_check_dev() polls the vendor ID register for
up to 1 sec before it declares failure.  If that approach is not correct
or if we deviate from the spec somewhere else then that should of course
be fixed, so please let me know if so.
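
(For reference, the polling boils down to roughly this, paraphrased
rather than verbatim:)

	/* body of pci_bus_check_dev(): poll the Vendor ID for up to 1 s */
	int delay = 1000;
	u32 id;

	while (!pci_bus_read_dev_vendor_id(bus, devfn, &id, 0)) {
		if (delay <= 0)
			return false;	/* declare the device broken */
		msleep(20);
		delay -= 20;
	}
	return true;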


> And Section 6.6.1 on Conventional Reset says "Software should use 100 ms 
> wait periods only if software enables CRS Software Visibility. 
> Otherwise, Completion timeouts, platform timeouts, or lengthy processor 
> instruction stalls may result."
> 
> The "Completion timeouts" reference here is the key. Since CRS Software 
> Visibility is not used, the device still has 900 ms before it has to 
> return a Successful Completion if you issue a config read 100 ms after 
> DLLSC.

I'm truly sorry, I double-checked the code and it turns out Linux does
enable CRS Software Visibility unconditionally for all bridges that
support it:

pci_scan_bridge_extend()
  pci_enable_crs()
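
For reference, pci_enable_crs() boils down to setting the CRS Software
Visibility Enable bit in the Root Control register whenever the Root
Capabilities register advertises support (paraphrased):

	static void pci_enable_crs(struct pci_dev *pdev)
	{
		u16 root_cap = 0;

		/* Enable CRS Software Visibility if supported */
		pcie_capability_read_word(pdev, PCI_EXP_RTCAP, &root_cap);
		if (root_cap & PCI_EXP_RTCAP_CRSVIS)
			pcie_capability_set_word(pdev, PCI_EXP_RTCTL,
						 PCI_EXP_RTCTL_CRSSVE);
	}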

My apologies for the misinformation in my previous e-mail.  Since your
explanation hinges on CRS not being used, I'm not sure in how far it
applies to the problem at hand.  (I do thank you though for the
comprehensive explanation, I have a much better understanding now
of the issue you're facing.)


> > Is this product being shipped or can it still be changed?
> 
> It's shipping.  It could be changed with a firmware change to re-enable 
> in-band presence but then we'd have the issue where hot-plug breaks when 
> removing devices when link is in certain states like L1 so that is not a 
> viable solution.

An alternative idea would be to change firmware to always drive PCIe PRSNT#
high as soon as U.2 PRSNT#/IfDet# is sensed, regardless whether it's NVMe
or not.  Linux will try to bring up the slot and if it can't be brought up
because it's a SAS drive, all you'll get is an error message in dmesg,
but you shouldn't see other negative effects.

I think the proper solution would be to add a microcontroller to each
U.2 slot on your motherboard which snoops on the PCIe pins to determine
whether communication is taking place and sample the PRSNT#/IfDet# pins
if so.  This leads to higher BOM cost and development cost of course.

Arguably your platform firmware-based solution tries to avoid that cost
by simulating what a microcontroller would do but can't really achieve
the same result because it lacks access to the PCIe pins.  Vendors such
as Apple do go to the lengths of adding microcontrollers in similar cases,
e.g. Apple used one on their motherboards with first generation Thunderbolt
controllers:

Because the physical port can be used either to attach a Thunderbolt
or a DisplayPort device, they added an NXP LP1112A ARM controller
which snoops on the pins and drives a multiplexer to route signals
either to the Thunderbolt controller or the GPU.  See page 75 of
these schematics ("DisplayPort/T29 A MUXing"):

https://forums.macrumors.com/attachments/a1278-j30-820-3115-b-051-9058-pdf.451409/

Newer Thunderbolt controllers integrate that functionality so the
BOM cost and development cost is no longer necessary today.


> > Unfortunately I don't have access to those ECNs or the non-public
> > git repo.
> 
> The PCIe Async Hot-Plug ECN has most of the new stuff. It has been 
> released and has the same level of access as the PCIe Base Spec so if 
> you have access to the PCIe base spec then you should have access to 
> this ECN.

Unfortunately I'm not associated with a member of the PCISIG,
so my level of access to the PCIe Base Spec is constrained
to leaked documents found on Ukrainian websites. ;-)

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
  2019-01-26 12:12             ` Lukas Wunner
@ 2019-01-30 14:28               ` Austin.Bolen
  0 siblings, 0 replies; 23+ messages in thread
From: Austin.Bolen @ 2019-01-30 14:28 UTC (permalink / raw)
  To: lukas, Austin.Bolen; +Cc: keith.busch, Alex_Gagniuc, linux-pci, bhelgaas

On 1/26/2019 6:12 AM, Lukas Wunner wrote:
> 
<snip>

> 
> Yes, that's why pci_bus_check_dev() polls the vendor ID register for
> up to 1 sec before it declares failure.  If that approach is not correct
> or if we deviate from the spec somewhere else then that should of course
> be fixed, so please let me know if so.
> 
> 
>> And Section 6.6.1 on Conventional Reset says "Software should use 100 ms
>> wait periods only if software enables CRS Software Visibility.
>> Otherwise, Completion timeouts, platform timeouts, or lengthy processor
>> instruction stalls may result."
>>
>> The "Completion timeouts" reference here is the key. Since CRS Software
>> Visibility is not used, the device still has 900 ms before it has to
>> return a Successful Completion if you issue a config read 100 ms after
>> DLLSC.
> 
> I'm truly sorry, I double-checked the code and it turns out Linux does
> enable CRS Software Visibility unconditionally for all bridges that
> support it:
> 
> pci_scan_bridge_extend()
>    pci_enable_crs()
> 
> My apologies for the misinformation in my previous e-mail.  Since your
> explanation hinges on CRS not being used, I'm not sure to what extent it
> applies to the problem at hand.  (I do thank you though for the
> comprehensive explanation, I have a much better understanding now
> of the issue you're facing.)
> 
> 

Yes, great: with the CRS Software Visibility mechanism in place it sounds 
good.  Thanks for double-checking on it!  There is a weird corner case, 
though: as you insert a device, the Data Link Layer becomes active (the 
gold fingers on the connector just barely make contact), but as you 
continue to insert the device the link goes back down and then comes back 
up once it's fully inserted (we've captured this and other similar 
oddities in the lab).  That link down/up would reset the device, so in 
this case it looks like the logic could do a config read before the 
device is ready.  From the platform side we usually wait 1 second around 
hot-plug events for everything to reach steady state, since you don't 
otherwise know when the mechanical action around hot-insert/hot-remove 
has ended.
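
For what it's worth, one way the OS side could approximate that settle 
time is to require DLL Link Active to stay set for some period before 
touching config space.  The sketch below is purely hypothetical (made-up 
helper name and 100 ms / 1 s numbers), not something pciehp does today:

  /* Hypothetical sketch: wait for the link to stay up for 100 ms,
   * giving up after 1 s, before doing the first config read.
   * Needs <linux/pci.h>, <linux/delay.h> and <linux/jiffies.h>.
   */
  static bool wait_for_stable_link(struct pci_dev *port)
  {
          unsigned long deadline = jiffies + msecs_to_jiffies(1000);
          int stable_ms = 0;
          u16 lnksta;

          while (time_before(jiffies, deadline)) {
                  pcie_capability_read_word(port, PCI_EXP_LNKSTA, &lnksta);

                  if (lnksta & PCI_EXP_LNKSTA_DLLLA)
                          stable_ms += 10;
                  else
                          stable_ms = 0;  /* link flapped, start over */

                  if (stable_ms >= 100)
                          return true;    /* link held up for 100 ms */

                  msleep(10);
          }

          return false;
  }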

The other way to handle this is via DPC, since it keeps the link from 
coming back up on its own, but more work is needed to support DPC on some 
systems.  The upcoming ACPI additions should hopefully allow DPC to be 
enabled on such systems.
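
For completeness, once firmware does grant the OS control of DPC, arming 
the trigger on a port is only a couple of register writes.  A rough 
sketch (hypothetical helper; real support also needs an interrupt handler 
and the _OSC negotiation mentioned above) would be:

  static int enable_dpc_trigger(struct pci_dev *port)
  {
          int dpc = pci_find_ext_capability(port, PCI_EXT_CAP_ID_DPC);
          u16 ctl;

          if (!dpc)
                  return -ENODEV;         /* port has no DPC capability */

          /* Trigger containment on ERR_FATAL and enable the DPC interrupt */
          pci_read_config_word(port, dpc + PCI_EXP_DPC_CTL, &ctl);
          ctl |= PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN;
          pci_write_config_word(port, dpc + PCI_EXP_DPC_CTL, ctl);

          return 0;
  }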

>>> Is this product being shipped or can it still be changed?
>>
>> It's shipping.  It could be changed with a firmware change to re-enable
>> in-band presence but then we'd have the issue where hot-plug breaks when
>> removing devices when link is in certain states like L1 so that is not a
>> viable solution.
> 
> An alternative idea would be to change firmware to always drive PCIe PRSNT#
> high as soon as U.2 PRSNT#/IfDet# is sensed, regardless of whether it's NVMe
> or not.  Linux will try to bring up the slot and if it can't be brought up
> because it's a SAS drive, all you'll get is an error message in dmesg,
> but you shouldn't see other negative effects.

We used to do this, but this error and similar ones in other OSes are 
what prompted the 500 ms delay circuitry.

> 
> I think the proper solution would be to add a microcontroller to each
> U.2 slot on your motherboard which snoops on the PCIe pins to determine
> whether communication is taking place and sample the PRSNT#/IfDet# pins
> if so.  This leads to higher BOM cost and development cost of course.
> 
> Arguably your platform firmware-based solution tries to avoid that cost
> by simulating what a microcontroller would do but can't really achieve
> the same result because it lacks access to the PCIe pins.  Vendors such
> as Apple do go to the lengths of adding microcontrollers in similar cases,
> e.g. Apple used one on their motherboards with first generation Thunderbolt
> controllers:
> 
> Because the physical port can be used either to attach a Thunderbolt
> or a DisplayPort device, they added an NXP LP1112A ARM controller
> which snoops on the pins and drives a multiplexer to route signals
> either to the Thunderbolt controller or the GPU.  See page 75 of
> these schematics ("DisplayPort/T29 A MUXing"):
> 
> https://forums.macrumors.com/attachments/a1278-j30-820-3115-b-051-9058-pdf.451409/
> 
> Newer Thunderbolt controllers integrate that functionality, so that BOM
> cost and development cost are no longer necessary today.
> 
> 

Thanks for the links.  And yes, we looked at things like this too, but 
they're prohibitively expensive.  It's one thing if you have a single 
Thunderbolt connector, but we have 24 or more drives.

We do have other, less expensive ways to detect when a link has trained, 
but those mechanisms, like the microcontroller idea above, all require 
hardware changes, so they won't be possible on the systems already in 
the field.  Future platforms + DPC should handle it better.

>>> Unfortunately I don't have access to those ECNs or the non-public
>>> git repo.
>>
>> The PCIe Async Hot-Plug ECN has most of the new stuff. It has been
>> released and has the same level of access as the PCIe Base Spec so if
>> you have access to the PCIe base spec then you should have access to
>> this ECN.
> 
> Unfortunately I'm not associated with a member of the PCISIG,
> so my level of access to the PCIe Base Spec is constrained
> to leaked documents found on Ukrainian websites. ;-)
>

Maybe you'll get lucky and they'll leak the ECN too :-)

> Thanks,
> 
> Lukas
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-01-30 14:28 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-23 18:20 PCI: hotplug: Erroneous removal of hotplug PCI devices Alex_Gagniuc
2019-01-23 18:44 ` Keith Busch
2019-01-23 19:02   ` Lukas Wunner
2019-01-23 19:07     ` Keith Busch
2019-01-23 19:15       ` Lukas Wunner
2019-01-23 19:33         ` Keith Busch
2019-01-24 22:43     ` Austin.Bolen
2019-01-24 22:52       ` Austin.Bolen
     [not found]   ` <b32e6ca62ae2494f98450df81ca1ee14@AUSX13MPC131.AMER.DELL.COM>
2019-01-24 20:20     ` Keith Busch
2019-01-24 22:00       ` Austin.Bolen
2019-01-25  8:22         ` Lukas Wunner
2019-01-25 22:39           ` Austin.Bolen
2019-01-26 12:12             ` Lukas Wunner
2019-01-30 14:28               ` Austin.Bolen
2019-01-23 18:54 ` Lukas Wunner
2019-01-23 19:07   ` Lukas Wunner
2019-01-23 19:09     ` Keith Busch
2019-01-23 19:28       ` Lukas Wunner
2019-01-23 19:47         ` Keith Busch
2019-01-23 20:10           ` Alex_Gagniuc
2019-01-23 23:50     ` Alex_Gagniuc
2019-01-24  9:25       ` Lukas Wunner
2019-01-24 22:33   ` Austin.Bolen
