linux-pci.vger.kernel.org archive mirror
* system generating an NMI due to 80696f991424d ("PCI: pciehp: Tolerate Presence Detect hardwired to zero")
From: Oliver Neukum @ 2020-01-15 10:26 UTC
  To: Lukas Wunner; +Cc: David Yang, Rajat Jain, Ashok Raj, linux-pci

Hi,

I got a bug report about some systems generating an NMI and
subsequently crashing. The problem was bisected down to 80696f991424d.
Apparently these systems do not react well to __pciehp_enable_slot()
being called while no card is present. Restoring the check in
__pciehp_enable_slot() that was removed by 80696f991424d makes current
kernels work.

What is to be done? Do you want a special case for the affected
systems based on DMI, or should I revert 80696f991424d?

	Regards
		Oliver



* Re: system generating an NMI due to 80696f991424d ("PCI: pciehp: Tolerate Presence Detect hardwired to zero")
From: Lukas Wunner @ 2020-01-15 11:24 UTC
  To: Oliver Neukum; +Cc: David Yang, Rajat Jain, Ashok Raj, linux-pci

On Wed, Jan 15, 2020 at 11:26:26AM +0100, Oliver Neukum wrote:
> I got a bug report about some systems generating an NMI and
> subsequently crashing bisected down to 80696f991424d.
> Apparently these systems do not react well to __pciehp_enable_slot
> while no card is present. Restoring the check to __pciehp_enable_slot()
> removed in 80696f991424d makes the current kernels work.

That's odd, these systems must be setting the Data Link Layer Link Active
bit in the Link Status Register even though no card is present.
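
To make the two bits concrete, here is a minimal sketch in plain C of the
condition described above: Data Link Layer Link Active set in the Link
Status Register while Presence Detect State is clear in the Slot Status
Register. The mask values match the kernel's include/uapi/linux/pci_regs.h;
the helper names are hypothetical and this is not pciehp code. The raw
register values would have to come from the port's PCIe capability, e.g.
as decoded in the LnkSta and SltSta lines of lspci -vv output.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */
#define PCI_EXP_SLTSTA_PDS   0x0040 /* Presence Detect State */

/* Hypothetical helper: true if the Link Status value reports an active link. */
bool link_active(uint16_t lnksta)
{
	return (lnksta & PCI_EXP_LNKSTA_DLLLA) != 0;
}

/* Hypothetical helper: true if the Slot Status value reports a card present. */
bool card_present(uint16_t sltsta)
{
	return (sltsta & PCI_EXP_SLTSTA_PDS) != 0;
}

/*
 * Hypothetical helper: true for the odd combination discussed here,
 * i.e. the link is reported active although no card is reported present.
 */
bool link_up_without_presence(uint16_t lnksta, uint16_t sltsta)
{
	return link_active(lnksta) && !card_present(sltsta);
}
```

Feeding the Link Status and Slot Status values of an affected port into
link_up_without_presence() would confirm or rule out this explanation.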


> What is to be done? Do you want a special case for the affected
> systems based on DMI, or should I revert 80696f991424d?

It would be good if we could get a better idea what's going on before
deciding what action to take.  What systems are we talking about exactly?
Can you provide dmesg and lspci -vvvv output including the NMI, e.g. by
attaching it to a new bugzilla?

Thanks,

Lukas


* Re: system generating an NMI due to 80696f991424d ("PCI: pciehp: Tolerate Presence Detect hardwired to zero")
From: Lukas Wunner @ 2020-01-16  5:35 UTC
  To: Oliver Neukum; +Cc: David Yang, Rajat Jain, Ashok Raj, linux-pci, Stuart Hayes

[cc += Stuart]

On Wed, Jan 15, 2020 at 12:24:29PM +0100, Lukas Wunner wrote:
> On Wed, Jan 15, 2020 at 11:26:26AM +0100, Oliver Neukum wrote:
> > I got a bug report about some systems generating an NMI and
> > subsequently crashing bisected down to 80696f991424d.
> > Apparently these systems do not react well to __pciehp_enable_slot
> > while no card is present. Restoring the check to __pciehp_enable_slot()
> > removed in 80696f991424d makes the current kernels work.
> 
> That's odd, these systems must be setting the Data Link Layer Link Active
> bit in the Link Status Register even though no card is present.

Recent PCIe versions allow turning off in-band presence detect, in which
case the DLLLA bit can be set even though Presence Detect is not set.
You may be dealing with one of those systems but without full dmesg
and lspci output this is just an educated guess.
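
A rough model of why such a system could behave differently before and
after 80696f991424d, assuming (as the commit title and this thread
suggest) that the older code required Presence Detect before enabling
the slot while the newer code also accepts an active link. The enum and
function names are hypothetical; this is an illustration under that
assumption, not the kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical policies for deciding whether a hotplug slot is occupied. */
enum occupancy_policy {
	REQUIRE_PRESENCE,  /* assumed older behavior: Presence Detect must be set */
	PRESENCE_OR_LINK,  /* assumed newer behavior: Presence Detect or DLLLA */
};

/*
 * pds:   Presence Detect State bit from the Slot Status Register
 * dllla: Data Link Layer Link Active bit from the Link Status Register
 */
bool slot_occupied(enum occupancy_policy policy, bool pds, bool dllla)
{
	switch (policy) {
	case REQUIRE_PRESENCE:
		return pds;
	case PRESENCE_OR_LINK:
		return pds || dllla;
	}
	return false; /* not reached for valid policies */
}
```

With pds clear and dllla set, the first policy leaves the slot alone
while the second one goes ahead and tries to enable it, which is the
difference the bisection points at.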

A series was submitted by Dell last year to support disabling in-band
presence detect, but it hasn't been merged yet by Bjorn:

https://lore.kernel.org/linux-pci/20191025190047.38130-1-stuart.w.hayes@gmail.com/

You may want to check whether that series helps.

Thanks,

Lukas


> > What is to be done? Do you want a special case for the affected
> > systems based on DMI, or should I revert 80696f991424d?
> 
> It would be good if we could get a better idea what's going on before
> deciding what action to take.  What systems are we talking about exactly?
> Can you provide dmesg and lspci -vvvv output including the NMI, e.g. by
> attaching it to a new bugzilla?
> 
> Thanks,
> 
> Lukas


* Re: system generating an NMI due to 80696f991424d ("PCI: pciehp: Tolerate Presence Detect hardwired to zero")
From: Oliver Neukum @ 2020-02-05 12:15 UTC
  To: Lukas Wunner; +Cc: David Yang, Rajat Jain, Ashok Raj, linux-pci, Stuart Hayes

On Thursday, 16.01.2020 at 06:35 +0100, Lukas Wunner wrote:
> [cc += Stuart]
> 
> On Wed, Jan 15, 2020 at 12:24:29PM +0100, Lukas Wunner wrote:
> > On Wed, Jan 15, 2020 at 11:26:26AM +0100, Oliver Neukum wrote:
> > > I got a bug report about some systems generating an NMI and
> > > subsequently crashing bisected down to 80696f991424d.
> > > Apparently these systems do not react well to __pciehp_enable_slot
> > > while no card is present. Restoring the check to __pciehp_enable_slot()
> > > removed in 80696f991424d makes the current kernels work.
> > 
> > That's odd, these systems must be setting the Data Link Layer Link Active
> > bit in the Link Status Register even though no card is present.
> 
> Recent PCIe versions allow turning off in-band presence detect, in which
> case the DLLLA bit can be set even though Presence Detect is not set.
> You may be dealing with one of those systems but without full dmesg
> and lspci output this is just an educated guess.
> 
> A series was submitted by Dell last year to support disabling in-band
> presence detect, but it hasn't been merged yet by Bjorn:
> 
> https://lore.kernel.org/linux-pci/20191025190047.38130-1-stuart.w.hayes@gmail.com/
> 
> You may want to try if that series helps.

Hi,

it has been tested and it does the job. May I ask whether you could
ack it or propose necessary changes, so that we can proceed?

	Regards
		Oliver



* Re: system generating an NMI due to 80696f991424d ("PCI: pciehp: Tolerate Presence Detect hardwired to zero")
From: Lukas Wunner @ 2020-02-08 20:31 UTC
  To: Oliver Neukum, Bjorn Helgaas
  Cc: David Yang, Rajat Jain, Ashok Raj, linux-pci, Stuart Hayes,
	Libor Pechacek

On Wed, Feb 05, 2020 at 01:15:00PM +0100, Oliver Neukum wrote:
> On Thursday, 16.01.2020 at 06:35 +0100, Lukas Wunner wrote:
> > On Wed, Jan 15, 2020 at 11:26:26AM +0100, Oliver Neukum wrote:
> > > I got a bug report about some systems generating an NMI and
> > > subsequently crashing bisected down to 80696f991424d.
> > > Apparently these systems do not react well to __pciehp_enable_slot
> > > while no card is present. Restoring the check to __pciehp_enable_slot()
> > > removed in 80696f991424d makes the current kernels work.
> > 
> > Recent PCIe versions allow turning off in-band presence detect, in which
> > case the DLLLA bit can be set even though Presence Detect is not set.
> > You may be dealing with one of those systems but without full dmesg
> > and lspci output this is just an educated guess.
> > 
> > A series was submitted by Dell last year to support disabling in-band
> > presence detect, but it hasn't been merged yet by Bjorn:
> > 
> > https://lore.kernel.org/linux-pci/20191025190047.38130-1-stuart.w.hayes@gmail.com/
> > 
> > You may want to try if that series helps.
> 
> it has been tested and it does the job. May I ask whether you could
> ack it or propose necessary changes, so that we can proceed?

Thanks for testing, so I assume that's a

Tested-by: Oliver Neukum <oneukum@suse.com>

The series has already been reviewed by Mika Westerberg, additionally
Andy Shevchenko has provided a Reviewed-by for each individual patch.
Nevertheless I've just also reviewed it once more and provided my
opinion in a separate e-mail.

The patches are not forgotten, they're still marked "New" in patchwork:
https://patchwork.kernel.org/cover/11212969/

So I assume Bjorn will get to them after the merge window closes
(and after taking a breather).

Thanks,

Lukas


* Re: system generating an NMI due to 80696f991424d ("PCI: pciehp: Tolerate Presence Detect hardwired to zero")
From: Oliver Neukum @ 2020-02-19 10:25 UTC
  To: Lukas Wunner, Bjorn Helgaas
  Cc: David Yang, Rajat Jain, Ashok Raj, linux-pci, Stuart Hayes,
	Libor Pechacek

On Saturday, 08.02.2020 at 21:31 +0100, Lukas Wunner wrote:
> Thanks for testing, so I assume that's a
> 
> Tested-by: Oliver Neukum <oneukum@suse.com>


Yes, this is tested.

	HTH
		Oliver


