* PCI: aer: AER correctable status register may be cleared before the aer_isr workqueue inspects it
@ 2016-11-12  2:49 Duc Dang
From: Duc Dang @ 2016-11-12  2:49 UTC
  To: Bjorn Helgaas, Bjorn Helgaas, linux-pci; +Cc: patches

[resend in text mode]

Hi Bjorn and All,

I ran into these error messages when testing on my X-Gene Mustang
board (kernel 4.9-rc1):
pcieport 0002:00:00.0: AER: Corrected error received: id=0000
pcieport 0002:00:00.0: can't find device of ID0000

Looking at the aer_isr() code: when handling a correctable AER event,
handle_error_source() is called and clears the Correctable Error
Status Register of the device that reported the event. This write may
also clear the status bit of a new correctable AER event of the same
type that occurs while this aer_isr worker thread is still running.
The next aer_isr worker pass then finds no error status bit set, so
it prints the warning messages above.
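
To make the ordering concrete, here is a minimal user-space sketch of
the race described above. It is a model, not kernel code: struct model,
raise_error(), worker_read(), worker_clear() and simulate_race() are
hypothetical names invented for illustration, and the only hardware
behaviour assumed is that the Correctable Error Status register is
write-1-to-clear and that bit 0 is Receiver Error.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model; none of these names are kernel symbols. */
#define COR_RCVR_ERR 0x1u  /* bit 0 of Correctable Error Status */
#define MAX_PENDING  4

struct model {
    uint32_t cor_status;   /* device status register (write-1-to-clear) */
    int pending;           /* error sources queued by the IRQ handler */
};

/* Hardware latches an error; the IRQ handler queues one unit of work. */
static void raise_error(struct model *m, uint32_t bit)
{
    assert(m->pending < MAX_PENDING);
    m->cor_status |= bit;
    m->pending++;
}

/* Worker step 1: read the status register. */
static uint32_t worker_read(const struct model *m)
{
    return m->cor_status;
}

/* Worker step 2: write back the bits it saw, which clears them. */
static void worker_clear(struct model *m, uint32_t seen)
{
    m->cor_status &= ~seen;
    m->pending--;
}

/* Returns the number of worker passes that found no status bit set. */
static int simulate_race(void)
{
    struct model m = {0};
    int empty_passes = 0;
    uint32_t seen;

    raise_error(&m, COR_RCVR_ERR);   /* event 1 */
    seen = worker_read(&m);          /* pass 1 reads the status */
    raise_error(&m, COR_RCVR_ERR);   /* event 2, same type, bit already set */
    worker_clear(&m, seen);          /* pass 1 clears the bit for both events */

    while (m.pending > 0) {          /* pass 2 runs for event 2 */
        seen = worker_read(&m);
        if (seen == 0) {
            empty_passes++;
            puts("model: can't find device (status already cleared)");
        }
        worker_clear(&m, seen);
    }
    return empty_passes;
}
```

Calling simulate_race() from a test program returns 1 in this model:
the second pass was queued for a real event but finds the register
already empty, which matches the symptom in the log above.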

One possible solution is to cache all AER status registers of both
the root port and all of its endpoint devices inside the interrupt
handler (aer_irq) and pass this information to the aer_isr worker
thread. But that seems an expensive operation to perform in interrupt
context, and I am not sure whether you have already encountered and
thought about this issue.
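
As a rough illustration of the caching idea, the sketch below extends
the same kind of user-space model: the interrupt path snapshots and
clears the status and hands the snapshot to the worker through a small
ring. Everything here is an assumption made for illustration (struct
snap_model, raise_and_snapshot(), worker_next(), simulate_snapshot(),
the ring size), and it models a single device only, whereas the
proposal would have to read the root port and every endpoint below it,
which is where the cost in interrupt context comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model; none of these names are kernel symbols. */
#define COR_RCVR_ERR 0x1u  /* bit 0 of Correctable Error Status */
#define RING_SIZE    4

struct snap_model {
    uint32_t cor_status;       /* device status register (write-1-to-clear) */
    uint32_t ring[RING_SIZE];  /* snapshots handed from IRQ to worker */
    unsigned head, tail;       /* producer and consumer counters */
};

/* Hardware latches an error, then the IRQ handler snapshots and clears it. */
static void raise_and_snapshot(struct snap_model *m, uint32_t bit)
{
    uint32_t seen;

    m->cor_status |= bit;                   /* hardware sets the bit */
    assert(m->head - m->tail < RING_SIZE);  /* ring must not overflow */
    seen = m->cor_status;                   /* IRQ context: read status */
    m->cor_status &= ~seen;                 /* IRQ context: clear what was read */
    m->ring[m->head++ % RING_SIZE] = seen;  /* pass the snapshot to the worker */
}

/* Worker: fetch the next cached snapshot; returns 0 when the ring is empty. */
static int worker_next(struct snap_model *m, uint32_t *status)
{
    if (m->tail == m->head)
        return 0;
    *status = m->ring[m->tail++ % RING_SIZE];
    return 1;
}

/* Returns the number of worker passes that saw an empty status. */
static int simulate_snapshot(void)
{
    struct snap_model m = {0};
    uint32_t status;
    int empty_passes = 0;

    raise_and_snapshot(&m, COR_RCVR_ERR);   /* event 1 */
    raise_and_snapshot(&m, COR_RCVR_ERR);   /* event 2, same type */

    while (worker_next(&m, &status))
        if (status == 0)
            empty_passes++;
    return empty_passes;
}
```

In this model simulate_snapshot() returns 0, because each queued pass
works from its own cached status instead of re-reading a register that
an earlier pass may already have cleared.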

Regards,
Duc Dang.


