netdev.vger.kernel.org archive mirror
* devlink interface for asynchronous event/messages from firmware?
@ 2020-05-21  0:03 Jacob Keller
  2020-05-21  0:16 ` Jakub Kicinski
  0 siblings, 1 reply; 13+ messages in thread
From: Jacob Keller @ 2020-05-21  0:03 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski, netdev

Hi Jiri, Jakub,

I've been asked to investigate using devlink as a mechanism for
reporting asynchronous events/messages from firmware including
diagnostic messages, etc.

Essentially, the ice firmware can report various status or diagnostic
messages which are useful for debugging internal behavior. We want to be
able to get these messages (and relevant data associated with them) in a
format beyond just "dump it to the dmesg buffer and recover it later".

It seems like this would be an appropriate use of devlink. I thought
maybe this would work with devlink health:

i.e. we create a devlink health reporter, and then when firmware sends a
message, we use devlink_health_report.

But when I dug into this, it doesn't seem like a natural fit. The health
reporters expect to see an "error" state, and don't really fit the
notion of "log a message from firmware".

One of the issues is that the health reporter only keeps one dump, when
what we really want is a way to have a monitoring application get the
dump and then store its contents.

Thoughts on what might make sense for this? It feels like a stretch of
the health interface...

Basically, what I have in mind is using the devlink_fmsg interface to
build a netlink message that gets sent over the devlink monitor socket
and dumped immediately.

Thanks,
Jake


* Re: devlink interface for asynchronous event/messages from firmware?
  2020-05-21  0:03 devlink interface for asynchronous event/messages from firmware? Jacob Keller
@ 2020-05-21  0:16 ` Jakub Kicinski
  2020-05-21 20:22   ` Jacob Keller
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2020-05-21  0:16 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Jiri Pirko, netdev

On Wed, 20 May 2020 17:03:02 -0700 Jacob Keller wrote:
> Hi Jiri, Jakub,
> 
> I've been asked to investigate using devlink as a mechanism for
> reporting asynchronous events/messages from firmware including
> diagnostic messages, etc.
> 
> Essentially, the ice firmware can report various status or diagnostic
> messages which are useful for debugging internal behavior. We want to be
> able to get these messages (and relevant data associated with them) in a
> format beyond just "dump it to the dmesg buffer and recover it later".
> 
> It seems like this would be an appropriate use of devlink. I thought
> maybe this would work with devlink health:
> 
> i.e. we create a devlink health reporter, and then when firmware sends a
> message, we use devlink_health_report.
> 
> But when I dug into this, it doesn't seem like a natural fit. The health
> reporters expect to see an "error" state, and don't seem to really fit
> the notion of "log a message from firmware" notion.
> 
> One of the issues is that the health reporter only keeps one dump, when
> what we really want is a way to have a monitoring application get the
> dump and then store its contents.
> 
> Thoughts on what might make sense for this? It feels like a stretch of
> the health interface...
> 
> I mean basically what I am thinking of having is using the devlink_fmsg
> interface to just send a netlink message that then gets sent over the
> devlink monitor socket and gets dumped immediately.

Why does user space need a raw firmware interface in the first place?

Examples?


* Re: devlink interface for asynchronous event/messages from firmware?
  2020-05-21  0:16 ` Jakub Kicinski
@ 2020-05-21 20:22   ` Jacob Keller
  2020-05-21 20:52     ` Ido Schimmel
  0 siblings, 1 reply; 13+ messages in thread
From: Jacob Keller @ 2020-05-21 20:22 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, netdev



On 5/20/2020 5:16 PM, Jakub Kicinski wrote:
> On Wed, 20 May 2020 17:03:02 -0700 Jacob Keller wrote:
>> Hi Jiri, Jakub,
>>
>> I've been asked to investigate using devlink as a mechanism for
>> reporting asynchronous events/messages from firmware including
>> diagnostic messages, etc.
>>
>> Essentially, the ice firmware can report various status or diagnostic
>> messages which are useful for debugging internal behavior. We want to be
>> able to get these messages (and relevant data associated with them) in a
>> format beyond just "dump it to the dmesg buffer and recover it later".
>>
>> It seems like this would be an appropriate use of devlink. I thought
>> maybe this would work with devlink health:
>>
>> i.e. we create a devlink health reporter, and then when firmware sends a
>> message, we use devlink_health_report.
>>
>> But when I dug into this, it doesn't seem like a natural fit. The health
>> reporters expect to see an "error" state, and don't seem to really fit
>> the notion of "log a message from firmware" notion.
>>
>> One of the issues is that the health reporter only keeps one dump, when
>> what we really want is a way to have a monitoring application get the
>> dump and then store its contents.
>>
>> Thoughts on what might make sense for this? It feels like a stretch of
>> the health interface...
>>
>> I mean basically what I am thinking of having is using the devlink_fmsg
>> interface to just send a netlink message that then gets sent over the
>> devlink monitor socket and gets dumped immediately.
> 
> Why does user space need a raw firmware interface in the first place?
> 
> Examples?
> 

So the ice firmware can optionally send diagnostic debug messages via
its control queue. The current solutions we've used internally
essentially hex-dump the binary contents to the kernel log, and then
these get scraped and converted into a useful format for human consumption.

I'm not 100% sure of the format, but I know it's based on a decoding file
that is specific to a given firmware image, and thus attempting to tie
this into the driver is problematic.

There is also a plan to provide a simpler interface for some of the
diagnostic messages, with a simple one-to-one mapping from code to
message for a handful of events, such as when the link engine can detect
a known reason why it wasn't able to get link. I suppose these could be
translated and immediately printed by the driver without a special
interface.
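
As a rough illustration only, a code-to-message lookup of that kind
might look something like the sketch below. Everything in it is an
assumption: the event codes, the names (fw_link_event, fw_event_to_str,
fw_event_format) and the strings are made up for the example and are not
taken from the ice firmware or driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical event codes: the real firmware values are not given here. */
enum fw_link_event {
	FW_LINK_EVT_NO_CABLE = 1,
	FW_LINK_EVT_UNSUPPORTED_MODULE = 2,
	FW_LINK_EVT_AUTONEG_FAILED = 3,
};

struct fw_event_desc {
	uint16_t code;
	const char *msg;
};

/* One entry per code, so each code maps to exactly one message. */
static const struct fw_event_desc fw_event_table[] = {
	{ FW_LINK_EVT_NO_CABLE,           "no cable detected" },
	{ FW_LINK_EVT_UNSUPPORTED_MODULE, "unsupported module type" },
	{ FW_LINK_EVT_AUTONEG_FAILED,     "auto-negotiation failed" },
};

/* Return the message for a code, or NULL if the code is not in the table. */
const char *fw_event_to_str(uint16_t code)
{
	size_t i;

	for (i = 0; i < sizeof(fw_event_table) / sizeof(fw_event_table[0]); i++)
		if (fw_event_table[i].code == code)
			return fw_event_table[i].msg;
	return NULL;
}

/* Format a log line; unknown codes fall back to printing the raw number. */
int fw_event_format(uint16_t code, char *buf, size_t len)
{
	const char *msg = fw_event_to_str(code);

	assert(buf != NULL && len > 0);
	if (msg)
		return snprintf(buf, len, "link down: %s", msg);
	return snprintf(buf, len, "link down: unknown reason 0x%04x",
			(unsigned int)code);
}
```

Because the table is small and fixed, the driver could carry it directly
and would not need a firmware-specific decoding file for these events.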

-Jake


* Re: devlink interface for asynchronous event/messages from firmware?
  2020-05-21 20:22   ` Jacob Keller
@ 2020-05-21 20:52     ` Ido Schimmel
  2020-05-21 20:59       ` Jacob Keller
  0 siblings, 1 reply; 13+ messages in thread
From: Ido Schimmel @ 2020-05-21 20:52 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Jakub Kicinski, Jiri Pirko, netdev, petrm, amitc

On Thu, May 21, 2020 at 01:22:34PM -0700, Jacob Keller wrote:
> On 5/20/2020 5:16 PM, Jakub Kicinski wrote:
> > On Wed, 20 May 2020 17:03:02 -0700 Jacob Keller wrote:
> >> Hi Jiri, Jakub,
> >>
> >> I've been asked to investigate using devlink as a mechanism for
> >> reporting asynchronous events/messages from firmware including
> >> diagnostic messages, etc.
> >>
> >> Essentially, the ice firmware can report various status or diagnostic
> >> messages which are useful for debugging internal behavior. We want to be
> >> able to get these messages (and relevant data associated with them) in a
> >> format beyond just "dump it to the dmesg buffer and recover it later".
> >>
> >> It seems like this would be an appropriate use of devlink. I thought
> >> maybe this would work with devlink health:
> >>
> >> i.e. we create a devlink health reporter, and then when firmware sends a
> >> message, we use devlink_health_report.
> >>
> >> But when I dug into this, it doesn't seem like a natural fit. The health
> >> reporters expect to see an "error" state, and don't seem to really fit
> >> the notion of "log a message from firmware" notion.
> >>
> >> One of the issues is that the health reporter only keeps one dump, when
> >> what we really want is a way to have a monitoring application get the
> >> dump and then store its contents.
> >>
> >> Thoughts on what might make sense for this? It feels like a stretch of
> >> the health interface...
> >>
> >> I mean basically what I am thinking of having is using the devlink_fmsg
> >> interface to just send a netlink message that then gets sent over the
> >> devlink monitor socket and gets dumped immediately.
> > 
> > Why does user space need a raw firmware interface in the first place?
> > 
> > Examples?
> > 
> 
> So the ice firmware can optionally send diagnostic debug messages via
> its control queue. The current solutions we've used internally
> essentially hex-dump the binary contents to the kernel log, and then
> these get scraped and converted into a useful format for human consumption.
> 
> I'm not 100% of the format, but I know it's based on a decoding file
> that is specific to a given firmware image, and thus attempting to tie
> this into the driver is problematic.

You explained how it works, but not why it's needed :)

> There is also a plan to provide a simpler interface for some of the
> diagnostic messages where a simple bijection between one code to one
> message for a handful of events, like if the link engine can detect a
> known reason why it wasn't able to get link. I suppose these could be
> translated and immediately printed by the driver without a special
> interface.

Petr worked on something similar last year:
https://lore.kernel.org/netdev/cover.1552672441.git.petrm@mellanox.com/

Amit is currently working on a new version based on ethtool (netlink).


* Re: devlink interface for asynchronous event/messages from firmware?
  2020-05-21 20:52     ` Ido Schimmel
@ 2020-05-21 20:59       ` Jacob Keller
  2020-05-21 21:51         ` Jakub Kicinski
  2020-05-22 11:03         ` Jiri Pirko
  0 siblings, 2 replies; 13+ messages in thread
From: Jacob Keller @ 2020-05-21 20:59 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: Jakub Kicinski, Jiri Pirko, netdev, petrm, amitc



On 5/21/2020 1:52 PM, Ido Schimmel wrote:
> On Thu, May 21, 2020 at 01:22:34PM -0700, Jacob Keller wrote:
>> On 5/20/2020 5:16 PM, Jakub Kicinski wrote:
>>> On Wed, 20 May 2020 17:03:02 -0700 Jacob Keller wrote:
>>>> Hi Jiri, Jakub,
>>>>
>>>> I've been asked to investigate using devlink as a mechanism for
>>>> reporting asynchronous events/messages from firmware including
>>>> diagnostic messages, etc.
>>>>
>>>> Essentially, the ice firmware can report various status or diagnostic
>>>> messages which are useful for debugging internal behavior. We want to be
>>>> able to get these messages (and relevant data associated with them) in a
>>>> format beyond just "dump it to the dmesg buffer and recover it later".
>>>>
>>>> It seems like this would be an appropriate use of devlink. I thought
>>>> maybe this would work with devlink health:
>>>>
>>>> i.e. we create a devlink health reporter, and then when firmware sends a
>>>> message, we use devlink_health_report.
>>>>
>>>> But when I dug into this, it doesn't seem like a natural fit. The health
>>>> reporters expect to see an "error" state, and don't seem to really fit
>>>> the notion of "log a message from firmware" notion.
>>>>
>>>> One of the issues is that the health reporter only keeps one dump, when
>>>> what we really want is a way to have a monitoring application get the
>>>> dump and then store its contents.
>>>>
>>>> Thoughts on what might make sense for this? It feels like a stretch of
>>>> the health interface...
>>>>
>>>> I mean basically what I am thinking of having is using the devlink_fmsg
>>>> interface to just send a netlink message that then gets sent over the
>>>> devlink monitor socket and gets dumped immediately.
>>>
>>> Why does user space need a raw firmware interface in the first place?
>>>
>>> Examples?
>>>
>>
>> So the ice firmware can optionally send diagnostic debug messages via
>> its control queue. The current solutions we've used internally
>> essentially hex-dump the binary contents to the kernel log, and then
>> these get scraped and converted into a useful format for human consumption.
>>
>> I'm not 100% of the format, but I know it's based on a decoding file
>> that is specific to a given firmware image, and thus attempting to tie
>> this into the driver is problematic.
> 
> You explained how it works, but not why it's needed :)

Well, the reason we want it is to be able to read the debug/diagnostics
data in order to debug issues that might be related to firmware or
software mis-use of firmware interfaces.

By having it be a separate interface rather than trying to scrape from
the kernel message buffer, it becomes something we can have as a
possibility for debugging in the field.

> 
>> There is also a plan to provide a simpler interface for some of the
>> diagnostic messages where a simple bijection between one code to one
>> message for a handful of events, like if the link engine can detect a
>> known reason why it wasn't able to get link. I suppose these could be
>> translated and immediately printed by the driver without a special
>> interface.
> 
> Petr worked on something similar last year:
> https://lore.kernel.org/netdev/cover.1552672441.git.petrm@mellanox.com/
> 
> Amit is currently working on a new version based on ethtool (netlink).
> 

I'll take a look, thanks!

-Jake


* Re: devlink interface for asynchronous event/messages from firmware?
  2020-05-21 20:59       ` Jacob Keller
@ 2020-05-21 21:51         ` Jakub Kicinski
  2020-05-21 22:09           ` Jacob Keller
  2020-05-22 11:00           ` Jiri Pirko
  2020-05-22 11:03         ` Jiri Pirko
  1 sibling, 2 replies; 13+ messages in thread
From: Jakub Kicinski @ 2020-05-21 21:51 UTC (permalink / raw)
  To: Jacob Keller, Ido Schimmel; +Cc: Jiri Pirko, netdev, petrm, amitc

On Thu, 21 May 2020 13:59:32 -0700 Jacob Keller wrote:
> >> So the ice firmware can optionally send diagnostic debug messages via
> >> its control queue. The current solutions we've used internally
> >> essentially hex-dump the binary contents to the kernel log, and then
> >> these get scraped and converted into a useful format for human consumption.
> >>
> >> I'm not 100% of the format, but I know it's based on a decoding file
> >> that is specific to a given firmware image, and thus attempting to tie
> >> this into the driver is problematic.  
> > 
> > You explained how it works, but not why it's needed :)  
> 
> Well, the reason we want it is to be able to read the debug/diagnostics
> data in order to debug issues that might be related to firmware or
> software mis-use of firmware interfaces.
> 
> By having it be a separate interface rather than trying to scrape from
> the kernel message buffer, it becomes something we can have as a
> possibility for debugging in the field.

For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit?

Right Ido?


* Re: devlink interface for asynchronous event/messages from firmware?
  2020-05-21 21:51         ` Jakub Kicinski
@ 2020-05-21 22:09           ` Jacob Keller
  2020-05-21 22:32             ` Ido Schimmel
  2020-05-22 11:00           ` Jiri Pirko
  1 sibling, 1 reply; 13+ messages in thread
From: Jacob Keller @ 2020-05-21 22:09 UTC (permalink / raw)
  To: Jakub Kicinski, Ido Schimmel; +Cc: Jiri Pirko, netdev, petrm, amitc



On 5/21/2020 2:51 PM, Jakub Kicinski wrote:
> On Thu, 21 May 2020 13:59:32 -0700 Jacob Keller wrote:
>>>> So the ice firmware can optionally send diagnostic debug messages via
>>>> its control queue. The current solutions we've used internally
>>>> essentially hex-dump the binary contents to the kernel log, and then
>>>> these get scraped and converted into a useful format for human consumption.
>>>>
>>>> I'm not 100% of the format, but I know it's based on a decoding file
>>>> that is specific to a given firmware image, and thus attempting to tie
>>>> this into the driver is problematic.  
>>>
>>> You explained how it works, but not why it's needed :)  
>>
>> Well, the reason we want it is to be able to read the debug/diagnostics
>> data in order to debug issues that might be related to firmware or
>> software mis-use of firmware interfaces.
>>
>> By having it be a separate interface rather than trying to scrape from
>> the kernel message buffer, it becomes something we can have as a
>> possibility for debugging in the field.
> 
> For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit?
> 
> Right Ido?
> 

Hm, yes that might be more suitable for this purpose. I'll take a look
at it!

Thanks,
Jake


* Re: devlink interface for asynchronous event/messages from firmware?
  2020-05-21 22:09           ` Jacob Keller
@ 2020-05-21 22:32             ` Ido Schimmel
  0 siblings, 0 replies; 13+ messages in thread
From: Ido Schimmel @ 2020-05-21 22:32 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Jakub Kicinski, Jiri Pirko, netdev, petrm, amitc

On Thu, May 21, 2020 at 03:09:57PM -0700, Jacob Keller wrote:
> 
> 
> On 5/21/2020 2:51 PM, Jakub Kicinski wrote:
> > On Thu, 21 May 2020 13:59:32 -0700 Jacob Keller wrote:
> >>>> So the ice firmware can optionally send diagnostic debug messages via
> >>>> its control queue. The current solutions we've used internally
> >>>> essentially hex-dump the binary contents to the kernel log, and then
> >>>> these get scraped and converted into a useful format for human consumption.
> >>>>
> >>>> I'm not 100% of the format, but I know it's based on a decoding file
> >>>> that is specific to a given firmware image, and thus attempting to tie
> >>>> this into the driver is problematic.  
> >>>
> >>> You explained how it works, but not why it's needed :)  
> >>
> >> Well, the reason we want it is to be able to read the debug/diagnostics
> >> data in order to debug issues that might be related to firmware or
> >> software mis-use of firmware interfaces.
> >>
> >> By having it be a separate interface rather than trying to scrape from
> >> the kernel message buffer, it becomes something we can have as a
> >> possibility for debugging in the field.
> > 
> > For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit?
> > 
> > Right Ido?
> > 
> 
> Hm, yes that might be more suitable for this purpose. I'll take a look
> at it!

Jacob, here is more context that might help:

https://lore.kernel.org/netdev/20191103083554.6317-1-idosch@idosch.org/
https://lore.kernel.org/netdev/20191112064830.27002-1-idosch@idosch.org/

> 
> Thanks,
> Jake


* Re: devlink interface for asynchronous event/messages from firmware?
  2020-05-21 21:51         ` Jakub Kicinski
  2020-05-21 22:09           ` Jacob Keller
@ 2020-05-22 11:00           ` Jiri Pirko
  2020-05-22 17:46             ` Jakub Kicinski
  2020-05-26 21:00             ` Jacob Keller
  1 sibling, 2 replies; 13+ messages in thread
From: Jiri Pirko @ 2020-05-22 11:00 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jacob Keller, Ido Schimmel, netdev, petrm, amitc

Thu, May 21, 2020 at 11:51:13PM CEST, kuba@kernel.org wrote:
>On Thu, 21 May 2020 13:59:32 -0700 Jacob Keller wrote:
>> >> So the ice firmware can optionally send diagnostic debug messages via
>> >> its control queue. The current solutions we've used internally
>> >> essentially hex-dump the binary contents to the kernel log, and then
>> >> these get scraped and converted into a useful format for human consumption.
>> >>
>> >> I'm not 100% of the format, but I know it's based on a decoding file
>> >> that is specific to a given firmware image, and thus attempting to tie
>> >> this into the driver is problematic.  
>> > 
>> > You explained how it works, but not why it's needed :)  
>> 
>> Well, the reason we want it is to be able to read the debug/diagnostics
>> data in order to debug issues that might be related to firmware or
>> software mis-use of firmware interfaces.
>> 
>> By having it be a separate interface rather than trying to scrape from
>> the kernel message buffer, it becomes something we can have as a
>> possibility for debugging in the field.
>
>For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit?

Well, trace_devlink_hwerr() is for simple errors that are mapped 1:1
with some string. From what I got, Jacob needs to pass some data
structures to the user. Something more similar to health reporter dumps
and their fmsg.



* Re: devlink interface for asynchronous event/messages from firmware?
  2020-05-21 20:59       ` Jacob Keller
  2020-05-21 21:51         ` Jakub Kicinski
@ 2020-05-22 11:03         ` Jiri Pirko
  1 sibling, 0 replies; 13+ messages in thread
From: Jiri Pirko @ 2020-05-22 11:03 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Ido Schimmel, Jakub Kicinski, netdev, petrm, amitc

Thu, May 21, 2020 at 10:59:32PM CEST, jacob.e.keller@intel.com wrote:
>
>
>On 5/21/2020 1:52 PM, Ido Schimmel wrote:
>> On Thu, May 21, 2020 at 01:22:34PM -0700, Jacob Keller wrote:
>>> On 5/20/2020 5:16 PM, Jakub Kicinski wrote:
>>>> On Wed, 20 May 2020 17:03:02 -0700 Jacob Keller wrote:
>>>>> Hi Jiri, Jakub,
>>>>>
>>>>> I've been asked to investigate using devlink as a mechanism for
>>>>> reporting asynchronous events/messages from firmware including
>>>>> diagnostic messages, etc.
>>>>>
>>>>> Essentially, the ice firmware can report various status or diagnostic
>>>>> messages which are useful for debugging internal behavior. We want to be
>>>>> able to get these messages (and relevant data associated with them) in a
>>>>> format beyond just "dump it to the dmesg buffer and recover it later".
>>>>>
>>>>> It seems like this would be an appropriate use of devlink. I thought
>>>>> maybe this would work with devlink health:
>>>>>
>>>>> i.e. we create a devlink health reporter, and then when firmware sends a
>>>>> message, we use devlink_health_report.
>>>>>
>>>>> But when I dug into this, it doesn't seem like a natural fit. The health
>>>>> reporters expect to see an "error" state, and don't seem to really fit
>>>>> the notion of "log a message from firmware" notion.
>>>>>
>>>>> One of the issues is that the health reporter only keeps one dump, when
>>>>> what we really want is a way to have a monitoring application get the
>>>>> dump and then store its contents.
>>>>>
>>>>> Thoughts on what might make sense for this? It feels like a stretch of
>>>>> the health interface...
>>>>>
>>>>> I mean basically what I am thinking of having is using the devlink_fmsg
>>>>> interface to just send a netlink message that then gets sent over the
>>>>> devlink monitor socket and gets dumped immediately.
>>>>
>>>> Why does user space need a raw firmware interface in the first place?
>>>>
>>>> Examples?
>>>>
>>>
>>> So the ice firmware can optionally send diagnostic debug messages via
>>> its control queue. The current solutions we've used internally
>>> essentially hex-dump the binary contents to the kernel log, and then
>>> these get scraped and converted into a useful format for human consumption.
>>>
>>> I'm not 100% of the format, but I know it's based on a decoding file
>>> that is specific to a given firmware image, and thus attempting to tie
>>> this into the driver is problematic.
>> 
>> You explained how it works, but not why it's needed :)
>
>Well, the reason we want it is to be able to read the debug/diagnostics
>data in order to debug issues that might be related to firmware or
>software mis-use of firmware interfaces.

I think that the health reporter would be able to serve this purpose.
There is an event in firmware -> the event is propagated to the user.

The limitation we have in devlink health right now is that we only store
the last event. So perhaps we need to extend it to optionally hold a
list/ring-buffer of events?
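
To make the ring-buffer idea concrete, here is a minimal user-space
sketch of a fixed-size ring that keeps the last few events and
overwrites the oldest one when full. It is not devlink code: the
structure names, the depth of 4 and the 32-byte payload limit are all
assumptions chosen for the example, and a real implementation would
also need locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FW_EVT_RING_SIZE 4	/* hypothetical depth */
#define FW_EVT_DATA_MAX  32	/* hypothetical payload limit in bytes */

struct fw_event {
	uint32_t seq;			/* running sequence number */
	size_t len;			/* valid bytes in data[] */
	uint8_t data[FW_EVT_DATA_MAX];
};

struct fw_event_ring {
	struct fw_event slots[FW_EVT_RING_SIZE];
	unsigned int head;		/* next slot to write */
	unsigned int count;		/* stored events, at most FW_EVT_RING_SIZE */
	uint32_t next_seq;
};

void fw_ring_init(struct fw_event_ring *r)
{
	memset(r, 0, sizeof(*r));
}

/* Store an event; when the ring is full the oldest entry is overwritten. */
void fw_ring_push(struct fw_event_ring *r, const void *data, size_t len)
{
	struct fw_event *e = &r->slots[r->head];

	assert(len <= FW_EVT_DATA_MAX);
	e->seq = r->next_seq++;
	e->len = len;
	memcpy(e->data, data, len);
	r->head = (r->head + 1) % FW_EVT_RING_SIZE;
	if (r->count < FW_EVT_RING_SIZE)
		r->count++;
}

/* Copy out the oldest stored event; returns 0 on success, -1 if empty. */
int fw_ring_pop(struct fw_event_ring *r, struct fw_event *out)
{
	unsigned int tail;

	if (r->count == 0)
		return -1;
	tail = (r->head + FW_EVT_RING_SIZE - r->count) % FW_EVT_RING_SIZE;
	*out = r->slots[tail];
	r->count--;
	return 0;
}
```

With a depth of 4, pushing six events and then draining the ring yields
sequence numbers 2 through 5; a gap in the sequence numbers tells the
reader that older events were dropped.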


>
>By having it be a separate interface rather than trying to scrape from
>the kernel message buffer, it becomes something we can have as a
>possibility for debugging in the field.
>
>> 
>>> There is also a plan to provide a simpler interface for some of the
>>> diagnostic messages where a simple bijection between one code to one
>>> message for a handful of events, like if the link engine can detect a
>>> known reason why it wasn't able to get link. I suppose these could be
>>> translated and immediately printed by the driver without a special
>>> interface.
>> 
>> Petr worked on something similar last year:
>> https://lore.kernel.org/netdev/cover.1552672441.git.petrm@mellanox.com/
>> 
>> Amit is currently working on a new version based on ethtool (netlink).
>> 
>
>I'll take a look, thanks!
>
>-Jake


* Re: devlink interface for asynchronous event/messages from firmware?
  2020-05-22 11:00           ` Jiri Pirko
@ 2020-05-22 17:46             ` Jakub Kicinski
  2020-05-26 21:13               ` Jacob Keller
  2020-05-26 21:00             ` Jacob Keller
  1 sibling, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2020-05-22 17:46 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Jacob Keller, Ido Schimmel, netdev, petrm, amitc

On Fri, 22 May 2020 13:00:28 +0200 Jiri Pirko wrote:
> Thu, May 21, 2020 at 11:51:13PM CEST, kuba@kernel.org wrote:
> >On Thu, 21 May 2020 13:59:32 -0700 Jacob Keller wrote:  
> >> >> So the ice firmware can optionally send diagnostic debug messages via
> >> >> its control queue. The current solutions we've used internally
> >> >> essentially hex-dump the binary contents to the kernel log, and then
> >> >> these get scraped and converted into a useful format for human consumption.
> >> >>
> >> >> I'm not 100% of the format, but I know it's based on a decoding file
> >> >> that is specific to a given firmware image, and thus attempting to tie
> >> >> this into the driver is problematic.    
> >> > 
> >> > You explained how it works, but not why it's needed :)    
> >> 
> >> Well, the reason we want it is to be able to read the debug/diagnostics
> >> data in order to debug issues that might be related to firmware or
> >> software mis-use of firmware interfaces.
> >> 
> >> By having it be a separate interface rather than trying to scrape from
> >> the kernel message buffer, it becomes something we can have as a
> >> possibility for debugging in the field.  
> >
> >For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit?  
> 
> Well, trace_devlink_hwerr() is for simple errors that are mapped 1:1
> with some string.

Ah, damn, I missed it takes char :/

> From what I got, Jacob needs to pass some data structures to the
> user. Something more similar to health reporter dumps and their fmsg.

For health reporters AFAIU right now every health reporter event
indicates something bad has happened, so it should be logged and
potentially reported to the vendor.

My understanding is that Jake needs more of a tracing infra, for
debug messages. Is that true? Do you need an on/off switch for 
those as well?


* Re: devlink interface for asynchronous event/messages from firmware?
  2020-05-22 11:00           ` Jiri Pirko
  2020-05-22 17:46             ` Jakub Kicinski
@ 2020-05-26 21:00             ` Jacob Keller
  1 sibling, 0 replies; 13+ messages in thread
From: Jacob Keller @ 2020-05-26 21:00 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski; +Cc: Ido Schimmel, netdev, petrm, amitc



On 5/22/2020 4:00 AM, Jiri Pirko wrote:
> Thu, May 21, 2020 at 11:51:13PM CEST, kuba@kernel.org wrote:
>> On Thu, 21 May 2020 13:59:32 -0700 Jacob Keller wrote:
>>>>> So the ice firmware can optionally send diagnostic debug messages via
>>>>> its control queue. The current solutions we've used internally
>>>>> essentially hex-dump the binary contents to the kernel log, and then
>>>>> these get scraped and converted into a useful format for human consumption.
>>>>>
>>>>> I'm not 100% of the format, but I know it's based on a decoding file
>>>>> that is specific to a given firmware image, and thus attempting to tie
>>>>> this into the driver is problematic.  
>>>>
>>>> You explained how it works, but not why it's needed :)  
>>>
>>> Well, the reason we want it is to be able to read the debug/diagnostics
>>> data in order to debug issues that might be related to firmware or
>>> software mis-use of firmware interfaces.
>>>
>>> By having it be a separate interface rather than trying to scrape from
>>> the kernel message buffer, it becomes something we can have as a
>>> possibility for debugging in the field.
>>
>> For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit?
> 
> Well, trace_devlink_hwerr() is for simple errors that are mapped 1:1
> with some string. From what I got, Jacob needs to pass some data
> structures to the user. Something more similar to health reporter dumps
> and their fmsg.
> 

Right. From my understanding the messages for debugging are not in a
format that can be immediately turned into a text string.

The reasoning behind this is that the set of messages changes
(especially during early firmware bringup), and thus sending actual ASCII
messages doesn't work well. It goes back to the "firmware is a black box".

The problem is that in practice, we need ways to help debug this black
box, and this was one method that doesn't require hooking up a more
expensive device to intercept and debug with a step-through debugger. It
also enables capturing more verbose information about what the firmware
is doing.

But from how I understand it, the messages can't really be immediately
interpreted into a usable format by the kernel. I suppose in theory they
could, but that would require carrying the full translation table.

Today, this is done by using a custom driver which logs the messages
directly to the kernel log buffer, which we know isn't the best solution.

Using a trace point is less bad, since it will be disabled by default
and goes into the tracefs system instead of going into the default
print buffer...

The pain is the fact that we have to request loading a custom driver
that enables these prints, meaning that it is harder to obtain the data
than if we can just say "enable firmware logs, reproduce the issue, and
grab this data".

Thanks,
Jake


* Re: devlink interface for asynchronous event/messages from firmware?
  2020-05-22 17:46             ` Jakub Kicinski
@ 2020-05-26 21:13               ` Jacob Keller
  0 siblings, 0 replies; 13+ messages in thread
From: Jacob Keller @ 2020-05-26 21:13 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko; +Cc: Ido Schimmel, netdev, petrm, amitc



On 5/22/2020 10:46 AM, Jakub Kicinski wrote:
> On Fri, 22 May 2020 13:00:28 +0200 Jiri Pirko wrote:
>> Thu, May 21, 2020 at 11:51:13PM CEST, kuba@kernel.org wrote:
>>> For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit?  
>>
>> Well, trace_devlink_hwerr() is for simple errors that are mapped 1:1
>> with some string.
> 
> Ah, damn, I missed it takes char :/

Using trace_devlink_hwerr is better than what we *have* been doing at
least. :) I think if we instead made our own driver trace point it might
work well enough.

> 
>> From what I got, Jacob needs to pass some data structures to the
>> user. Something more similar to health reporter dumps and their fmsg.
> 
> For health reporters AFAIU right now every health reporter event
> indicates something bad has happened, so it should be logged and
> potentially reported to the vendor.
> 

Right, that's why I don't think it's a great fit.


> My understanding is that Jake needs more of a tracing infra, for
> debug messages. Is that true? Do you need an on/off switch for 
> those as well?
> 

The messages come from different "modules" of the firmware. I think we
have ~16-20 or so modules, so ideally we'd have an on/off switch for
each module. There's also a message level range, which is sort of
like the dbg, info, err messaging.
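
As a sketch of what such a per-module switch plus level could look like,
something along these lines might do. The module count of 20, the level
names and the fw_log_* helpers are assumptions made for the example; the
actual firmware module list and level encoding are not described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FW_LOG_NUM_MODULES 20	/* assumed; the real count is "~16-20 or so" */

/* Hypothetical levels, ordered from least to most verbose. */
enum fw_log_level {
	FW_LOG_OFF = 0,		/* module disabled */
	FW_LOG_ERR,
	FW_LOG_INFO,
	FW_LOG_DBG,
};

struct fw_log_cfg {
	uint8_t level[FW_LOG_NUM_MODULES];	/* max level enabled per module */
};

/* Set the most verbose level a module may emit; FW_LOG_OFF turns it off. */
void fw_log_set_level(struct fw_log_cfg *cfg, unsigned int module,
		      enum fw_log_level lvl)
{
	assert(module < FW_LOG_NUM_MODULES);
	cfg->level[module] = (uint8_t)lvl;
}

/* Should a message from this module at this level be passed on? */
bool fw_log_enabled(const struct fw_log_cfg *cfg, unsigned int module,
		    enum fw_log_level lvl)
{
	if (module >= FW_LOG_NUM_MODULES || lvl == FW_LOG_OFF)
		return false;
	return lvl <= cfg->level[module];
}
```

A zero-initialized configuration leaves every module off, which matches
the goal of something that is safe to leave in the driver and is off by
default.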

The current solution relies on a custom driver build that enables the
logging and the messaging, and uses some module parameters to configure
this stuff. The big downside is that we don't feel the current
implementation can be left in, certainly not upstream. This means,
anytime a firmware engineer says "please get us firmware logs" we have
to reproduce whatever issue with the custom build of the driver.

The value of having this information is a significant increase in
productivity when debugging issues that might be occurring in the
firmware, or in misuse of fw<->driver interfaces, or missed expectations
between developers, etc.

Our goal is to find something that we can safely leave in the driver
that will be off by default, but enabled if necessary to capture the
logging data.

From the sounds of it, maybe the best solution is to implement this as a
trace event. Possibly we could just implement it as a driver-specific
trace event so it'd show up in tracing/events/<driver>/fwlogs, or
something like that. That still leaves open the question of the best way
to configure which modules and levels are enabled...

This debug logging is separate from a similar-sounding system that is
intended to report non-debug messages such as link-failure reason. I do
agree that something like that ought to instead be handled by the driver
determining "oh this is a link failure indication, so I'll report it
over the ethtool netlink interface, and convert it to the value expected
by that interface".

I'm not sure what other data besides link-failure reporting is
intended to be sent in this simpler format, as I haven't gotten any
other examples yet. The intent was to have these messages displayed by
doing a simple lookup from code to message, as there would be
significantly fewer of these and they are intended to help guide system
administrators. But given that the only example I've seen so far is the
link messages, it's unclear to me what else they would be used for.

And just to clarify, in either case the intention is that these are
one-way and read-only interfaces.

Thanks,
Jake


