linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* sb_edac.c lacks PCI domain support?
@ 2018-08-08 20:07 Bjorn Helgaas
  2018-08-08 20:18 ` Luck, Tony
  0 siblings, 1 reply; 3+ messages in thread
From: Bjorn Helgaas @ 2018-08-08 20:07 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Borislav Petkov
  Cc: linux-edac, linux-kernel, Lee, Chun-Yi, Tony Luck

I think sb_edac.c (and probably other EDAC stuff) lacks PCI domain
support.  I notice messages like this:

  [   14.370256] pci 0000:ff:13.5: [8086:6fad] type 00 class 0x088000
  [   14.980481] pci 0000:bf:13.5: [8086:6fad] type 00 class 0x088000
  [   15.590646] pci 0000:7f:13.5: [8086:6fad] type 00 class 0x088000
  [   16.200498] pci 0000:3f:13.5: [8086:6fad] type 00 class 0x088000
  [   17.928243] pci 0001:ff:13.5: [8086:6fad] type 00 class 0x088000
  [   18.538876] pci 0001:bf:13.5: [8086:6fad] type 00 class 0x088000
  [   19.149211] pci 0001:7f:13.5: [8086:6fad] type 00 class 0x088000
  [   19.759431] pci 0001:3f:13.5: [8086:6fad] type 00 class 0x088000
  ...
  [   54.298058] EDAC sbridge: Duplicated device for 8086:6fad
  [   54.298062] EDAC sbridge: Failed to register device with error -19.

on a large system (see [1]).  It looks like sbridge_get_onedevice()
looks up things based on the PCI bus number, but it ignores the PCI
domain (aka segment) number, and I suspect it thinks 0000:ff:13.5 and
0001:ff:13.5 are duplicates.

  sbridge_get_all_devices
    while (...)
      do
	sbridge_get_onedevice
	  pdev = pci_get_device(...)
	  sbridge_dev = get_sbridge_dev(pdev->bus->number, ...)
	  if (sbridge_dev->pdev[sbridge_dev->i_devs])
	    printk("Duplicated device ...")
	    return -ENODEV                   # -19
      while (pdev ...)

It looks like 88ae80aa609c ("EDAC, skx_edac: Handle systems with
segmented PCI busses") fixes a similar problem; maybe that should
be applied elsewhere in EDAC as well?

Why doesn't EDAC use the standard pci_register_driver() interface?
That would avoid issues like this.  It would also avoid the potential
conflict of another driver operating on the device at the same time.

[1] https://bugzilla.kernel.org/attachment.cgi?id=277759 (attachment
to unrelated bug https://bugzilla.kernel.org/show_bug.cgi?id=200765)

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-08-14 13:58 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-08 20:07 sb_edac.c lacks PCI domain support? Bjorn Helgaas
2018-08-08 20:18 ` Luck, Tony
2018-08-14 13:58   ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).