* RFC: Counters for PCI Express AERs
@ 2018-05-17 21:05 Rajat Jain
  2018-05-17 21:25 ` okaya
  0 siblings, 1 reply; 7+ messages in thread
From: Rajat Jain @ 2018-05-17 21:05 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas

Hello,

I have been thinking about adding counters for the different kinds of
AERs and exposing them via sysfs. IMHO this would help by giving some
sense of "link quality" for PCIe links (a lot of correctable AERs may
mean the system is still workable, but may also point to signal
integrity issues etc). Currently we do log correctable AERs, but having
counters in sysfs would allow userspace tools to poll them periodically
and raise an appropriate warning in case of too many errors. I know
that for my purposes, getting some idea of PCIe link quality, or a way
to quantify it, would help.

Do you think such counters make sense or would be helpful generically?
Also, please let me know if something like this already exists?
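
[Editorial sketch, not part of the original mail.] The userspace polling
described above could look roughly like the following. It assumes a
per-device sysfs file containing one "NAME COUNT" pair per line; the file
format, the counter names, and the helper names (parse_aer_counters,
counter_deltas, check_device) are all assumptions made for illustration,
since no such interface is defined in this message.

```python
from pathlib import Path


def parse_aer_counters(text):
    """Parse 'NAME COUNT' lines into a dict, skipping malformed lines.

    The 'NAME COUNT' layout is an assumed format, not one defined in
    this thread.
    """
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def counter_deltas(previous, current):
    """Return how much each counter grew between two snapshots.

    Counters that stayed flat or went down (for example after a device
    was removed and re-added) are left out.
    """
    return {
        name: value - previous.get(name, 0)
        for name, value in current.items()
        if value > previous.get(name, 0)
    }


def check_device(path, previous, threshold):
    """Read one counter file and return (snapshot, warnings).

    'path' is a hypothetical sysfs counter file; 'threshold' is the
    per-poll growth at which a warning is produced.
    """
    current = parse_aer_counters(Path(path).read_text())
    deltas = counter_deltas(previous, current)
    warnings = [
        f"{path}: {name} grew by {delta}"
        for name, delta in sorted(deltas.items())
        if delta >= threshold
    ]
    return current, warnings
```

A monitoring tool would call check_device() once per polling interval
for each device, feeding the returned snapshot back in as 'previous' on
the next round. Warning on growth per interval rather than on the
absolute count keeps a long-running but healthy link from tripping the
alarm.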

Thanks,

Rajat

* Re: RFC: Counters for PCI Express AERs
  2018-05-17 21:05 RFC: Counters for PCI Express AERs Rajat Jain
@ 2018-05-17 21:25 ` okaya
  2018-05-17 21:48   ` Rajat Jain
  0 siblings, 1 reply; 7+ messages in thread
From: okaya @ 2018-05-17 21:25 UTC (permalink / raw)
  To: Rajat Jain; +Cc: linux-pci, Bjorn Helgaas, linux-pci-owner

On 2018-05-17 17:05, Rajat Jain wrote:
> Hello,
> 
> I have been thinking about adding counters for the different kinds of
> AERs and exposing them via sysfs. IMHO this would help by giving some
> sense of "link quality" for PCIe links (a lot of correctable AERs may
> mean the system is still workable, but may also point to signal
> integrity issues etc). Currently we do log correctable AERs, but having
> counters in sysfs would allow userspace tools to poll them periodically
> and raise an appropriate warning in case of too many errors. I know
> that for my purposes, getting some idea of PCIe link quality, or a way
> to quantify it, would help.
> 
> Do you think such counters make sense or would be helpful generically?
> Also, please let me know if something like this already exists?

This question came from the Facebook folks last year. They were told
to use perf events for counting.

Honestly, I don't have a strong opinion.

> 
> Thanks,
> 
> Rajat

* Re: RFC: Counters for PCI Express AERs
  2018-05-17 21:25 ` okaya
@ 2018-05-17 21:48   ` Rajat Jain
  2018-05-17 21:52     ` Rajat Jain
  0 siblings, 1 reply; 7+ messages in thread
From: Rajat Jain @ 2018-05-17 21:48 UTC (permalink / raw)
  To: Sinan Kaya, Jes Sorensen; +Cc: linux-pci, Bjorn Helgaas, linux-pci-owner

[+Jes Sorensen]

On Thu, May 17, 2018 at 2:25 PM,  <okaya@codeaurora.org> wrote:
> On 2018-05-17 17:05, Rajat Jain wrote:
>>
>> Hello,
>>
>> I have been thinking about adding counters for the different kinds of
>> AERs and exposing them via sysfs. IMHO this would help by giving some
>> sense of "link quality" for PCIe links (a lot of correctable AERs may
>> mean the system is still workable, but may also point to signal
>> integrity issues etc). Currently we do log correctable AERs, but having
>> counters in sysfs would allow userspace tools to poll them periodically
>> and raise an appropriate warning in case of too many errors. I know
>> that for my purposes, getting some idea of PCIe link quality, or a way
>> to quantify it, would help.
>>
>> Do you think such counters make sense or would be helpful generically?
>> Also, please let me know if something like this already exists?
>
>
> This question came from FB folks last year. They were told to use the perf
> events for counting.

Thanks for the info. I think you are referring to this:
https://linuxplumbersconf.org/2017/ocw/proposals/4803.html

Jes: did anything come out of the proposal? I'm wondering if you have
any work-in-progress patch that I could use, maybe as a starting
point?

Thanks,

Rajat

>
> I don't honestly have a strong opinion.

Thanks! I'd like to work on this if not already done.

>
>>
>> Thanks,
>>
>> Rajat

* Re: RFC: Counters for PCI Express AERs
  2018-05-17 21:48   ` Rajat Jain
@ 2018-05-17 21:52     ` Rajat Jain
  2018-05-18 14:24       ` Jes Sorensen
  0 siblings, 1 reply; 7+ messages in thread
From: Rajat Jain @ 2018-05-17 21:52 UTC (permalink / raw)
  To: Sinan Kaya, jsorensen; +Cc: linux-pci, Bjorn Helgaas, linux-pci-owner

[Fixing the new email address for Jes Sorensen now]


* Re: RFC: Counters for PCI Express AERs
  2018-05-17 21:52     ` Rajat Jain
@ 2018-05-18 14:24       ` Jes Sorensen
  2018-05-18 16:31         ` Rajat Jain
  0 siblings, 1 reply; 7+ messages in thread
From: Jes Sorensen @ 2018-05-18 14:24 UTC (permalink / raw)
  To: Rajat Jain, Sinan Kaya
  Cc: linux-pci, Bjorn Helgaas, linux-pci-owner, Kyle McMartin

On 05/17/2018 05:52 PM, Rajat Jain wrote:
>> Jes: did anything come out of the proposal? I'm wondering if you have
>> any work-in-progress patch that I could use, maybe as a starting
>> point?

Kyle McMartin was working on this, I don't know the current status.

Jes

* Re: RFC: Counters for PCI Express AERs
  2018-05-18 14:24       ` Jes Sorensen
@ 2018-05-18 16:31         ` Rajat Jain
  2018-05-23 22:20           ` Kyle McMartin
  0 siblings, 1 reply; 7+ messages in thread
From: Rajat Jain @ 2018-05-18 16:31 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Sinan Kaya, linux-pci, Bjorn Helgaas, linux-pci-owner, Kyle McMartin

On Fri, May 18, 2018 at 7:24 AM, Jes Sorensen <jsorensen@fb.com> wrote:
>
> Kyle McMartin was working on this, I don't know the current status.


Never mind, I think I'm more than halfway there and will be sending a
patch in a day or two.

>
> Jes
>
>

* Re: RFC: Counters for PCI Express AERs
  2018-05-18 16:31         ` Rajat Jain
@ 2018-05-23 22:20           ` Kyle McMartin
  0 siblings, 0 replies; 7+ messages in thread
From: Kyle McMartin @ 2018-05-23 22:20 UTC (permalink / raw)
  To: Rajat Jain; +Cc: Jes Sorensen, Sinan Kaya, linux-pci, Bjorn Helgaas

On Fri, May 18, 2018 at 09:31:11AM -0700, Rajat Jain wrote:
> >>> Jes: did anything come out of the proposal? I'm wondering if you have
> >>> any work-in-progress patch that I could use, maybe as a starting
> >>> point?
> >
> > Kyle McMartin was working on this, I don't know the current status.
> 
> 
> Never mind, I think I'm more than halfway there and will be sending a
> patch in a day or two.
> 

The patch set looks good to me; it's pretty analogous to what I came up
with in the fall. I ended up using the tracepoints instead of patching in
sysfs support, so that I could enable it across all of our kernel versions
in production.

Really glad you did this work, hopefully it gets merged.
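
[Editorial sketch, not part of the original mail.] The tracepoint-based
counting mentioned above can be approximated in userspace by tallying AER
trace lines per device and severity. The line shape matched below
("<device> PCIe Bus Error: severity=<severity>, ...") and the three
severity strings are assumptions about the trace output, and
count_aer_events is a hypothetical helper; check the format printed by
your kernel before relying on it.

```python
import re
from collections import Counter

# Assumed line shape: "<dev> PCIe Bus Error: severity=<severity>, <details>".
# The severity strings listed here are assumptions, not taken from this thread.
AER_LINE = re.compile(
    r"(?P<dev>\S+) PCIe Bus Error: "
    r"severity=(?P<sev>Corrected|Fatal|Uncorrected, non-fatal)"
)


def count_aer_events(lines):
    """Tally AER trace lines by (device, severity), ignoring other lines."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            counts[(match.group("dev"), match.group("sev"))] += 1
    return counts
```

Because this only reads trace output, it needs no kernel patch, which is
the property that makes the tracepoint approach usable across several
kernel versions at once. The trade-off is that events are only counted
while the reader is running.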

cheers, Kyle
