linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* sb_edac.c lacks PCI domain support?
@ 2018-08-08 20:07 Bjorn Helgaas
  2018-08-08 20:18 ` Luck, Tony
  0 siblings, 1 reply; 3+ messages in thread
From: Bjorn Helgaas @ 2018-08-08 20:07 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Borislav Petkov
  Cc: linux-edac, linux-kernel, Lee, Chun-Yi, Tony Luck

I think sb_edac.c (and probably other EDAC stuff) lacks PCI domain
support.  I notice messages like this:

  [   14.370256] pci 0000:ff:13.5: [8086:6fad] type 00 class 0x088000
  [   14.980481] pci 0000:bf:13.5: [8086:6fad] type 00 class 0x088000
  [   15.590646] pci 0000:7f:13.5: [8086:6fad] type 00 class 0x088000
  [   16.200498] pci 0000:3f:13.5: [8086:6fad] type 00 class 0x088000
  [   17.928243] pci 0001:ff:13.5: [8086:6fad] type 00 class 0x088000
  [   18.538876] pci 0001:bf:13.5: [8086:6fad] type 00 class 0x088000
  [   19.149211] pci 0001:7f:13.5: [8086:6fad] type 00 class 0x088000
  [   19.759431] pci 0001:3f:13.5: [8086:6fad] type 00 class 0x088000
  ...
  [   54.298058] EDAC sbridge: Duplicated device for 8086:6fad
  [   54.298062] EDAC sbridge: Failed to register device with error -19.

on a large system (see [1]).  It looks like sbridge_get_onedevice()
looks up things based on the PCI bus number, but it ignores the PCI
domain (aka segment) number, and I suspect it thinks 0000:ff:13.5 and
0001:ff:13.5 are duplicates.

  sbridge_get_all_devices
    while (...)
      do
	sbridge_get_onedevice
	  pdev = pci_get_device(...)
	  sbridge_dev = get_sbridge_dev(pdev->bus->number, ...)
	  if (sbridge_dev->pdev[sbridge_dev->i_devs])
	    printk("Duplicated device ...")
	    return -ENODEV                   # -19
      while (pdev ...)

It looks like 88ae80aa609c ("EDAC, skx_edac: Handle systems with
segmented PCI busses") fixes a similar problem; maybe that should
be applied elsewhere in EDAC as well?

Why doesn't EDAC use the standard pci_register_driver() interface?
That would avoid issues like this.  It would also avoid the potential
conflict of another driver operating on the device at the same time.

[1] https://bugzilla.kernel.org/attachment.cgi?id=277759 (attachment
to unrelated bug https://bugzilla.kernel.org/show_bug.cgi?id=200765)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* RE: sb_edac.c lacks PCI domain support?
  2018-08-08 20:07 sb_edac.c lacks PCI domain support? Bjorn Helgaas
@ 2018-08-08 20:18 ` Luck, Tony
  2018-08-14 13:58   ` Bjorn Helgaas
  0 siblings, 1 reply; 3+ messages in thread
From: Luck, Tony @ 2018-08-08 20:18 UTC (permalink / raw)
  To: Bjorn Helgaas, Mauro Carvalho Chehab, Borislav Petkov
  Cc: linux-edac, linux-kernel, Lee, Chun-Yi

> I think sb_edac.c (and probably other EDAC stuff) lacks PCI domain
> support

There's a patch queued to fix this.

  https://marc.info/?l=linux-edac&m=153256485215534&w=2


> It looks like 88ae80aa609c ("EDAC, skx_edac: Handle systems with
> segmented PCI busses") fixes a similar problem; maybe that should
> be applied elsewhere in EDAC as well?
>
> Why doesn't EDAC use the standard pci_register_driver() interface?
> That would avoid issues like this.  It would also avoid the potential
> conflict of another driver operating on the device at the same time.

EDAC drivers get information to translate system physical addresses
to DIMM addresses from a whole bundle of different PCIe devices config
space.  Connections between different parts of the translation algorithm
depend on finding the devices that belong to the same memory controller
by matching up bus numbers (actually <segment,busnumber> tuples as
we are now finding).

Can pci_register_driver() cope with this odd usage?

-Tony

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: sb_edac.c lacks PCI domain support?
  2018-08-08 20:18 ` Luck, Tony
@ 2018-08-14 13:58   ` Bjorn Helgaas
  0 siblings, 0 replies; 3+ messages in thread
From: Bjorn Helgaas @ 2018-08-14 13:58 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Mauro Carvalho Chehab, Borislav Petkov, linux-edac, linux-kernel,
	Lee, Chun-Yi, linux-pci, Masayoshi Mizuma

[+cc Mizuma-san, linux-pci]

On Wed, Aug 08, 2018 at 08:18:03PM +0000, Luck, Tony wrote:
> > I think sb_edac.c (and probably other EDAC stuff) lacks PCI domain
> > support
> 
> There's a patch queued to fix this.
> 
>   https://marc.info/?l=linux-edac&m=153256485215534&w=2

That's excellent, thank you!

i7core_edac.c also looks suspicious because it manages things based on
"socket", which is computed from a PCI bus number, and doesn't
reference the PCI domain at all:

  bus = pdev->bus->number;
  socket = last_bus - bus;
  alloc_i7core_dev(socket, ...);

> > It looks like 88ae80aa609c ("EDAC, skx_edac: Handle systems with
> > segmented PCI busses") fixes a similar problem; maybe that should
> > be applied elsewhere in EDAC as well?
> >
> > Why doesn't EDAC use the standard pci_register_driver() interface?
> > That would avoid issues like this.  It would also avoid the potential
> > conflict of another driver operating on the device at the same time.
> 
> EDAC drivers get information to translate system physical addresses
> to DIMM addresses from a whole bundle of different PCIe devices config
> space.  Connections between different parts of the translation algorithm
> depend on finding the devices that belong to the same memory controller
> by matching up bus numbers (actually <segment,busnumber> tuples as
> we are now finding).
> 
> Can pci_register_driver() cope with this odd usage?

pci_register_driver() can bind a driver to as many devices as desired.
It would be up to the driver to keep track of all its devices and
match them up by <segment,busnumber>.  But I think the drivers keep
track of all that already.

The biggest problem would probably be the fact that it couldn't depend
on being bound to devices in any given order.  It might have to check
to see whether it has found all the devices it expects, or maybe
become a little resilient to having errors reported when it doesn't
have all the information it needs to decode them.

Bjorn

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-08-14 13:58 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-08 20:07 sb_edac.c lacks PCI domain support? Bjorn Helgaas
2018-08-08 20:18 ` Luck, Tony
2018-08-14 13:58   ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).