* devlink interface for asynchronous event/messages from firmware?
@ 2020-05-21 0:03 Jacob Keller
  2020-05-21 0:16 ` Jakub Kicinski
  0 siblings, 1 reply; 13+ messages in thread
From: Jacob Keller @ 2020-05-21 0:03 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski, netdev

Hi Jiri, Jakub,

I've been asked to investigate using devlink as a mechanism for
reporting asynchronous events/messages from firmware including
diagnostic messages, etc.

Essentially, the ice firmware can report various status or diagnostic
messages which are useful for debugging internal behavior. We want to be
able to get these messages (and relevant data associated with them) in a
format beyond just "dump it to the dmesg buffer and recover it later".

It seems like this would be an appropriate use of devlink. I thought
maybe this would work with devlink health:

i.e. we create a devlink health reporter, and then when firmware sends a
message, we use devlink_health_report.

But when I dug into this, it doesn't seem like a natural fit. The health
reporters expect to see an "error" state, and don't really seem to fit
the notion of "log a message from firmware".

One of the issues is that the health reporter only keeps one dump, when
what we really want is a way to have a monitoring application get the
dump and then store its contents.

Thoughts on what might make sense for this? It feels like a stretch of
the health interface...

I mean basically what I am thinking of is using the devlink_fmsg
interface to just send a netlink message that then gets sent over the
devlink monitor socket and gets dumped immediately.

Thanks,
Jake

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: devlink interface for asynchronous event/messages from firmware? 2020-05-21 0:03 devlink interface for asynchronous event/messages from firmware? Jacob Keller @ 2020-05-21 0:16 ` Jakub Kicinski 2020-05-21 20:22 ` Jacob Keller 0 siblings, 1 reply; 13+ messages in thread From: Jakub Kicinski @ 2020-05-21 0:16 UTC (permalink / raw) To: Jacob Keller; +Cc: Jiri Pirko, netdev On Wed, 20 May 2020 17:03:02 -0700 Jacob Keller wrote: > Hi Jiri, Jakub, > > I've been asked to investigate using devlink as a mechanism for > reporting asynchronous events/messages from firmware including > diagnostic messages, etc. > > Essentially, the ice firmware can report various status or diagnostic > messages which are useful for debugging internal behavior. We want to be > able to get these messages (and relevant data associated with them) in a > format beyond just "dump it to the dmesg buffer and recover it later". > > It seems like this would be an appropriate use of devlink. I thought > maybe this would work with devlink health: > > i.e. we create a devlink health reporter, and then when firmware sends a > message, we use devlink_health_report. > > But when I dug into this, it doesn't seem like a natural fit. The health > reporters expect to see an "error" state, and don't seem to really fit > the notion of "log a message from firmware" notion. > > One of the issues is that the health reporter only keeps one dump, when > what we really want is a way to have a monitoring application get the > dump and then store its contents. > > Thoughts on what might make sense for this? It feels like a stretch of > the health interface... > > I mean basically what I am thinking of having is using the devlink_fmsg > interface to just send a netlink message that then gets sent over the > devlink monitor socket and gets dumped immediately. Why does user space need a raw firmware interface in the first place? Examples? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: devlink interface for asynchronous event/messages from firmware? 2020-05-21 0:16 ` Jakub Kicinski @ 2020-05-21 20:22 ` Jacob Keller 2020-05-21 20:52 ` Ido Schimmel 0 siblings, 1 reply; 13+ messages in thread From: Jacob Keller @ 2020-05-21 20:22 UTC (permalink / raw) To: Jakub Kicinski; +Cc: Jiri Pirko, netdev On 5/20/2020 5:16 PM, Jakub Kicinski wrote: > On Wed, 20 May 2020 17:03:02 -0700 Jacob Keller wrote: >> Hi Jiri, Jakub, >> >> I've been asked to investigate using devlink as a mechanism for >> reporting asynchronous events/messages from firmware including >> diagnostic messages, etc. >> >> Essentially, the ice firmware can report various status or diagnostic >> messages which are useful for debugging internal behavior. We want to be >> able to get these messages (and relevant data associated with them) in a >> format beyond just "dump it to the dmesg buffer and recover it later". >> >> It seems like this would be an appropriate use of devlink. I thought >> maybe this would work with devlink health: >> >> i.e. we create a devlink health reporter, and then when firmware sends a >> message, we use devlink_health_report. >> >> But when I dug into this, it doesn't seem like a natural fit. The health >> reporters expect to see an "error" state, and don't seem to really fit >> the notion of "log a message from firmware" notion. >> >> One of the issues is that the health reporter only keeps one dump, when >> what we really want is a way to have a monitoring application get the >> dump and then store its contents. >> >> Thoughts on what might make sense for this? It feels like a stretch of >> the health interface... >> >> I mean basically what I am thinking of having is using the devlink_fmsg >> interface to just send a netlink message that then gets sent over the >> devlink monitor socket and gets dumped immediately. > > Why does user space need a raw firmware interface in the first place? > > Examples? 
>

So the ice firmware can optionally send diagnostic debug messages via
its control queue. The current solutions we've used internally
essentially hex-dump the binary contents to the kernel log, and then
these get scraped and converted into a useful format for human consumption.

I'm not 100% sure of the format, but I know it's based on a decoding file
that is specific to a given firmware image, and thus attempting to tie
this into the driver is problematic.

There is also a plan to provide a simpler interface for some of the
diagnostic messages, where there is a simple bijection between codes and
messages for a handful of events, such as when the link engine can detect
a known reason why it wasn't able to get link. I suppose these could be
translated and immediately printed by the driver without a special
interface.

-Jake

^ permalink raw reply [flat|nested] 13+ messages in thread
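As a rough illustration of the simpler interface described above, where each event code maps to exactly one message, the translation could be little more than a table lookup. The sketch below is in Python for brevity. The event codes and message strings are hypothetical placeholders, not actual ice firmware values, and `describe_event` is an illustrative name.

```python
# Hypothetical event codes and messages, for illustration only.
# None of these values come from the ice firmware.
LINK_EVENT_MESSAGES = {
    0x01: "no cable detected",
    0x02: "unsupported module type",
    0x03: "autonegotiation failed",
}


def is_bijection(table: dict) -> bool:
    """Return True if no two codes share the same message text."""
    return len(set(table.values())) == len(table)


def describe_event(code: int) -> str:
    """Translate an event code into a human-readable message.

    Unknown codes are reported with their raw value so nothing is lost.
    """
    return LINK_EVENT_MESSAGES.get(code, f"unknown event code 0x{code:02x}")
```

Falling back to the raw code for unknown values means a driver that is older than the firmware still reports something usable.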
* Re: devlink interface for asynchronous event/messages from firmware? 2020-05-21 20:22 ` Jacob Keller @ 2020-05-21 20:52 ` Ido Schimmel 2020-05-21 20:59 ` Jacob Keller 0 siblings, 1 reply; 13+ messages in thread From: Ido Schimmel @ 2020-05-21 20:52 UTC (permalink / raw) To: Jacob Keller; +Cc: Jakub Kicinski, Jiri Pirko, netdev, petrm, amitc On Thu, May 21, 2020 at 01:22:34PM -0700, Jacob Keller wrote: > On 5/20/2020 5:16 PM, Jakub Kicinski wrote: > > On Wed, 20 May 2020 17:03:02 -0700 Jacob Keller wrote: > >> Hi Jiri, Jakub, > >> > >> I've been asked to investigate using devlink as a mechanism for > >> reporting asynchronous events/messages from firmware including > >> diagnostic messages, etc. > >> > >> Essentially, the ice firmware can report various status or diagnostic > >> messages which are useful for debugging internal behavior. We want to be > >> able to get these messages (and relevant data associated with them) in a > >> format beyond just "dump it to the dmesg buffer and recover it later". > >> > >> It seems like this would be an appropriate use of devlink. I thought > >> maybe this would work with devlink health: > >> > >> i.e. we create a devlink health reporter, and then when firmware sends a > >> message, we use devlink_health_report. > >> > >> But when I dug into this, it doesn't seem like a natural fit. The health > >> reporters expect to see an "error" state, and don't seem to really fit > >> the notion of "log a message from firmware" notion. > >> > >> One of the issues is that the health reporter only keeps one dump, when > >> what we really want is a way to have a monitoring application get the > >> dump and then store its contents. > >> > >> Thoughts on what might make sense for this? It feels like a stretch of > >> the health interface... 
> >> > >> I mean basically what I am thinking of having is using the devlink_fmsg > >> interface to just send a netlink message that then gets sent over the > >> devlink monitor socket and gets dumped immediately. > > > > Why does user space need a raw firmware interface in the first place? > > > > Examples? > > > > So the ice firmware can optionally send diagnostic debug messages via > its control queue. The current solutions we've used internally > essentially hex-dump the binary contents to the kernel log, and then > these get scraped and converted into a useful format for human consumption. > > I'm not 100% of the format, but I know it's based on a decoding file > that is specific to a given firmware image, and thus attempting to tie > this into the driver is problematic. You explained how it works, but not why it's needed :) > There is also a plan to provide a simpler interface for some of the > diagnostic messages where a simple bijection between one code to one > message for a handful of events, like if the link engine can detect a > known reason why it wasn't able to get link. I suppose these could be > translated and immediately printed by the driver without a special > interface. Petr worked on something similar last year: https://lore.kernel.org/netdev/cover.1552672441.git.petrm@mellanox.com/ Amit is currently working on a new version based on ethtool (netlink). ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: devlink interface for asynchronous event/messages from firmware? 2020-05-21 20:52 ` Ido Schimmel @ 2020-05-21 20:59 ` Jacob Keller 2020-05-21 21:51 ` Jakub Kicinski 2020-05-22 11:03 ` Jiri Pirko 0 siblings, 2 replies; 13+ messages in thread From: Jacob Keller @ 2020-05-21 20:59 UTC (permalink / raw) To: Ido Schimmel; +Cc: Jakub Kicinski, Jiri Pirko, netdev, petrm, amitc On 5/21/2020 1:52 PM, Ido Schimmel wrote: > On Thu, May 21, 2020 at 01:22:34PM -0700, Jacob Keller wrote: >> On 5/20/2020 5:16 PM, Jakub Kicinski wrote: >>> On Wed, 20 May 2020 17:03:02 -0700 Jacob Keller wrote: >>>> Hi Jiri, Jakub, >>>> >>>> I've been asked to investigate using devlink as a mechanism for >>>> reporting asynchronous events/messages from firmware including >>>> diagnostic messages, etc. >>>> >>>> Essentially, the ice firmware can report various status or diagnostic >>>> messages which are useful for debugging internal behavior. We want to be >>>> able to get these messages (and relevant data associated with them) in a >>>> format beyond just "dump it to the dmesg buffer and recover it later". >>>> >>>> It seems like this would be an appropriate use of devlink. I thought >>>> maybe this would work with devlink health: >>>> >>>> i.e. we create a devlink health reporter, and then when firmware sends a >>>> message, we use devlink_health_report. >>>> >>>> But when I dug into this, it doesn't seem like a natural fit. The health >>>> reporters expect to see an "error" state, and don't seem to really fit >>>> the notion of "log a message from firmware" notion. >>>> >>>> One of the issues is that the health reporter only keeps one dump, when >>>> what we really want is a way to have a monitoring application get the >>>> dump and then store its contents. >>>> >>>> Thoughts on what might make sense for this? It feels like a stretch of >>>> the health interface... 
>>>> >>>> I mean basically what I am thinking of having is using the devlink_fmsg >>>> interface to just send a netlink message that then gets sent over the >>>> devlink monitor socket and gets dumped immediately. >>> >>> Why does user space need a raw firmware interface in the first place? >>> >>> Examples? >>> >> >> So the ice firmware can optionally send diagnostic debug messages via >> its control queue. The current solutions we've used internally >> essentially hex-dump the binary contents to the kernel log, and then >> these get scraped and converted into a useful format for human consumption. >> >> I'm not 100% of the format, but I know it's based on a decoding file >> that is specific to a given firmware image, and thus attempting to tie >> this into the driver is problematic. > > You explained how it works, but not why it's needed :) Well, the reason we want it is to be able to read the debug/diagnostics data in order to debug issues that might be related to firmware or software mis-use of firmware interfaces. By having it be a separate interface rather than trying to scrape from the kernel message buffer, it becomes something we can have as a possibility for debugging in the field. > >> There is also a plan to provide a simpler interface for some of the >> diagnostic messages where a simple bijection between one code to one >> message for a handful of events, like if the link engine can detect a >> known reason why it wasn't able to get link. I suppose these could be >> translated and immediately printed by the driver without a special >> interface. > > Petr worked on something similar last year: > https://lore.kernel.org/netdev/cover.1552672441.git.petrm@mellanox.com/ > > Amit is currently working on a new version based on ethtool (netlink). > I'll take a look, thanks! -Jake ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: devlink interface for asynchronous event/messages from firmware? 2020-05-21 20:59 ` Jacob Keller @ 2020-05-21 21:51 ` Jakub Kicinski 2020-05-21 22:09 ` Jacob Keller 2020-05-22 11:00 ` Jiri Pirko 2020-05-22 11:03 ` Jiri Pirko 1 sibling, 2 replies; 13+ messages in thread From: Jakub Kicinski @ 2020-05-21 21:51 UTC (permalink / raw) To: Jacob Keller, Ido Schimmel; +Cc: Jiri Pirko, netdev, petrm, amitc On Thu, 21 May 2020 13:59:32 -0700 Jacob Keller wrote: > >> So the ice firmware can optionally send diagnostic debug messages via > >> its control queue. The current solutions we've used internally > >> essentially hex-dump the binary contents to the kernel log, and then > >> these get scraped and converted into a useful format for human consumption. > >> > >> I'm not 100% of the format, but I know it's based on a decoding file > >> that is specific to a given firmware image, and thus attempting to tie > >> this into the driver is problematic. > > > > You explained how it works, but not why it's needed :) > > Well, the reason we want it is to be able to read the debug/diagnostics > data in order to debug issues that might be related to firmware or > software mis-use of firmware interfaces. > > By having it be a separate interface rather than trying to scrape from > the kernel message buffer, it becomes something we can have as a > possibility for debugging in the field. For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit? Right Ido? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: devlink interface for asynchronous event/messages from firmware? 2020-05-21 21:51 ` Jakub Kicinski @ 2020-05-21 22:09 ` Jacob Keller 2020-05-21 22:32 ` Ido Schimmel 2020-05-22 11:00 ` Jiri Pirko 1 sibling, 1 reply; 13+ messages in thread From: Jacob Keller @ 2020-05-21 22:09 UTC (permalink / raw) To: Jakub Kicinski, Ido Schimmel; +Cc: Jiri Pirko, netdev, petrm, amitc On 5/21/2020 2:51 PM, Jakub Kicinski wrote: > On Thu, 21 May 2020 13:59:32 -0700 Jacob Keller wrote: >>>> So the ice firmware can optionally send diagnostic debug messages via >>>> its control queue. The current solutions we've used internally >>>> essentially hex-dump the binary contents to the kernel log, and then >>>> these get scraped and converted into a useful format for human consumption. >>>> >>>> I'm not 100% of the format, but I know it's based on a decoding file >>>> that is specific to a given firmware image, and thus attempting to tie >>>> this into the driver is problematic. >>> >>> You explained how it works, but not why it's needed :) >> >> Well, the reason we want it is to be able to read the debug/diagnostics >> data in order to debug issues that might be related to firmware or >> software mis-use of firmware interfaces. >> >> By having it be a separate interface rather than trying to scrape from >> the kernel message buffer, it becomes something we can have as a >> possibility for debugging in the field. > > For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit? > > Right Ido? > Hm, yes that might be more suitable for this purpose. I'll take a look at it! Thanks, Jake ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: devlink interface for asynchronous event/messages from firmware? 2020-05-21 22:09 ` Jacob Keller @ 2020-05-21 22:32 ` Ido Schimmel 0 siblings, 0 replies; 13+ messages in thread From: Ido Schimmel @ 2020-05-21 22:32 UTC (permalink / raw) To: Jacob Keller; +Cc: Jakub Kicinski, Jiri Pirko, netdev, petrm, amitc On Thu, May 21, 2020 at 03:09:57PM -0700, Jacob Keller wrote: > > > On 5/21/2020 2:51 PM, Jakub Kicinski wrote: > > On Thu, 21 May 2020 13:59:32 -0700 Jacob Keller wrote: > >>>> So the ice firmware can optionally send diagnostic debug messages via > >>>> its control queue. The current solutions we've used internally > >>>> essentially hex-dump the binary contents to the kernel log, and then > >>>> these get scraped and converted into a useful format for human consumption. > >>>> > >>>> I'm not 100% of the format, but I know it's based on a decoding file > >>>> that is specific to a given firmware image, and thus attempting to tie > >>>> this into the driver is problematic. > >>> > >>> You explained how it works, but not why it's needed :) > >> > >> Well, the reason we want it is to be able to read the debug/diagnostics > >> data in order to debug issues that might be related to firmware or > >> software mis-use of firmware interfaces. > >> > >> By having it be a separate interface rather than trying to scrape from > >> the kernel message buffer, it becomes something we can have as a > >> possibility for debugging in the field. > > > > For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit? > > > > Right Ido? > > > > Hm, yes that might be more suitable for this purpose. I'll take a look > at it! Jacob, here is more context that might help: https://lore.kernel.org/netdev/20191103083554.6317-1-idosch@idosch.org/ https://lore.kernel.org/netdev/20191112064830.27002-1-idosch@idosch.org/ > > Thanks, > Jake ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: devlink interface for asynchronous event/messages from firmware? 2020-05-21 21:51 ` Jakub Kicinski 2020-05-21 22:09 ` Jacob Keller @ 2020-05-22 11:00 ` Jiri Pirko 2020-05-22 17:46 ` Jakub Kicinski 2020-05-26 21:00 ` Jacob Keller 1 sibling, 2 replies; 13+ messages in thread From: Jiri Pirko @ 2020-05-22 11:00 UTC (permalink / raw) To: Jakub Kicinski; +Cc: Jacob Keller, Ido Schimmel, netdev, petrm, amitc Thu, May 21, 2020 at 11:51:13PM CEST, kuba@kernel.org wrote: >On Thu, 21 May 2020 13:59:32 -0700 Jacob Keller wrote: >> >> So the ice firmware can optionally send diagnostic debug messages via >> >> its control queue. The current solutions we've used internally >> >> essentially hex-dump the binary contents to the kernel log, and then >> >> these get scraped and converted into a useful format for human consumption. >> >> >> >> I'm not 100% of the format, but I know it's based on a decoding file >> >> that is specific to a given firmware image, and thus attempting to tie >> >> this into the driver is problematic. >> > >> > You explained how it works, but not why it's needed :) >> >> Well, the reason we want it is to be able to read the debug/diagnostics >> data in order to debug issues that might be related to firmware or >> software mis-use of firmware interfaces. >> >> By having it be a separate interface rather than trying to scrape from >> the kernel message buffer, it becomes something we can have as a >> possibility for debugging in the field. > >For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit? Well, trace_devlink_hwerr() is for simple errors that are mapped 1:1 with some string. From what I got, Jacob needs to pass some data structures to the user. Something more similar to health reporter dumps and their fmsg. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: devlink interface for asynchronous event/messages from firmware? 2020-05-22 11:00 ` Jiri Pirko @ 2020-05-22 17:46 ` Jakub Kicinski 2020-05-26 21:13 ` Jacob Keller 2020-05-26 21:00 ` Jacob Keller 1 sibling, 1 reply; 13+ messages in thread From: Jakub Kicinski @ 2020-05-22 17:46 UTC (permalink / raw) To: Jiri Pirko; +Cc: Jacob Keller, Ido Schimmel, netdev, petrm, amitc On Fri, 22 May 2020 13:00:28 +0200 Jiri Pirko wrote: > Thu, May 21, 2020 at 11:51:13PM CEST, kuba@kernel.org wrote: > >On Thu, 21 May 2020 13:59:32 -0700 Jacob Keller wrote: > >> >> So the ice firmware can optionally send diagnostic debug messages via > >> >> its control queue. The current solutions we've used internally > >> >> essentially hex-dump the binary contents to the kernel log, and then > >> >> these get scraped and converted into a useful format for human consumption. > >> >> > >> >> I'm not 100% of the format, but I know it's based on a decoding file > >> >> that is specific to a given firmware image, and thus attempting to tie > >> >> this into the driver is problematic. > >> > > >> > You explained how it works, but not why it's needed :) > >> > >> Well, the reason we want it is to be able to read the debug/diagnostics > >> data in order to debug issues that might be related to firmware or > >> software mis-use of firmware interfaces. > >> > >> By having it be a separate interface rather than trying to scrape from > >> the kernel message buffer, it becomes something we can have as a > >> possibility for debugging in the field. > > > >For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit? > > Well, trace_devlink_hwerr() is for simple errors that are mapped 1:1 > with some string. Ah, damn, I missed it takes char :/ > From what I got, Jacob needs to pass some data structures to the > user. Something more similar to health reporter dumps and their fmsg. 
For health reporters AFAIU right now every health reporter event indicates something bad has happened, so it should be logged and potentially reported to the vendor. My understanding is that Jake needs more of a tracing infra, for debug messages. Is that true? Do you need an on/off switch for those as well? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: devlink interface for asynchronous event/messages from firmware?
  2020-05-22 17:46 ` Jakub Kicinski
@ 2020-05-26 21:13 ` Jacob Keller
  0 siblings, 0 replies; 13+ messages in thread
From: Jacob Keller @ 2020-05-26 21:13 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko; +Cc: Ido Schimmel, netdev, petrm, amitc

On 5/22/2020 10:46 AM, Jakub Kicinski wrote:
> On Fri, 22 May 2020 13:00:28 +0200 Jiri Pirko wrote:
>> Thu, May 21, 2020 at 11:51:13PM CEST, kuba@kernel.org wrote:
>>> For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit?
>>
>> Well, trace_devlink_hwerr() is for simple errors that are mapped 1:1
>> with some string.
>
> Ah, damn, I missed it takes char :/

Using trace_devlink_hwerr is better than what we *have* been doing at
least. :) I think if we instead made our own driver trace point it might
work well enough.

>
>> From what I got, Jacob needs to pass some data structures to the
>> user. Something more similar to health reporter dumps and their fmsg.
>
> For health reporters AFAIU right now every health reporter event
> indicates something bad has happened, so it should be logged and
> potentially reported to the vendor.
>

Right, that's why I don't think it's a great fit.

> My understanding is that Jake needs more of a tracing infra, for
> debug messages. Is that true? Do you need an on/off switch for
> those as well?
>

The messages come from different "modules" of the firmware. I think we
have ~16-20 or so modules, so ideally we'd have an on-off switch for
each module, and there's also a message level range which is sort of
like the dbg, info, err messaging.

The current solution relies on a custom driver build that enables the
logging and the messaging, and uses some module parameters to configure
this stuff. The big downside is that we don't feel the current
implementation can be left in, certainly not upstream.
This means that anytime a firmware engineer says "please get us firmware
logs" we have to reproduce whatever issue with the custom build of the
driver.

The value of having this information is a significant increase in
productivity when debugging issues that might be occurring in the
firmware, or in misuse of fw<->driver interfaces, or missed expectations
between developers, etc.

Our goal is to find something that we can safely leave in the driver
that will be off by default, but enabled if necessary to capture the
logging data.

From the sounds of it, maybe the best solution is to implement this as a
trace event. Possibly we could just implement it as a driver-specific
trace event so it'd show up in tracing/events/<driver>/fwlogs, or
something like that.

That still leaves open the question of the best way to configure which
modules and levels are enabled...

This debug logging is separate from a similar-sounding system that is
intended to report non-debug messages such as link-failure reason. I do
agree that something like that ought to instead be handled by the driver
determining "oh this is a link failure indication, so I'll report it
over the ethtool netlink interface, and convert it to the value expected
by that interface".

I'm not sure what other data besides link-failure reporting is intended
to be sent in this simpler format, as I haven't gotten any other
examples yet. The intent was to have these messages displayed by doing a
simple lookup from code to message, as there would be significantly
fewer of these and they are intended to help guide system
administrators. But given that the only example I've seen so far is the
link messages, it's unclear to me what else they would be used for.

And just to clarify, in either case the intention is that these are
one-way and read-only interfaces.

Thanks,
Jake

^ permalink raw reply [flat|nested] 13+ messages in thread
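To make the open configuration question above more concrete, here is a minimal Python sketch of a per-module switch combined with a minimum message level, with everything off by default. It is only a model of the filtering decision and says nothing about how the setting would be exposed (devlink parameter, tracefs, or otherwise). The `Level` values, the `FwLogConfig` class and the module indices are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict


class Level(IntEnum):
    # Hypothetical severity levels, loosely mirroring dbg/info/err.
    DEBUG = 0
    INFO = 1
    ERR = 2


@dataclass
class FwLogConfig:
    """Per-module logging configuration; every module is off by default."""

    # Maps a module index to the minimum level that should be captured.
    enabled: Dict[int, Level] = field(default_factory=dict)

    def enable(self, module: int, min_level: Level) -> None:
        self.enabled[module] = min_level

    def disable(self, module: int) -> None:
        self.enabled.pop(module, None)

    def wants(self, module: int, level: Level) -> bool:
        """Return True if a message from `module` at `level` should be kept."""
        min_level = self.enabled.get(module)
        return min_level is not None and level >= min_level
```

For example, after `cfg.enable(3, Level.INFO)` the check `cfg.wants(3, Level.ERR)` passes while `cfg.wants(3, Level.DEBUG)` and any query for another module do not.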
* Re: devlink interface for asynchronous event/messages from firmware? 2020-05-22 11:00 ` Jiri Pirko 2020-05-22 17:46 ` Jakub Kicinski @ 2020-05-26 21:00 ` Jacob Keller 1 sibling, 0 replies; 13+ messages in thread From: Jacob Keller @ 2020-05-26 21:00 UTC (permalink / raw) To: Jiri Pirko, Jakub Kicinski; +Cc: Ido Schimmel, netdev, petrm, amitc On 5/22/2020 4:00 AM, Jiri Pirko wrote: > Thu, May 21, 2020 at 11:51:13PM CEST, kuba@kernel.org wrote: >> On Thu, 21 May 2020 13:59:32 -0700 Jacob Keller wrote: >>>>> So the ice firmware can optionally send diagnostic debug messages via >>>>> its control queue. The current solutions we've used internally >>>>> essentially hex-dump the binary contents to the kernel log, and then >>>>> these get scraped and converted into a useful format for human consumption. >>>>> >>>>> I'm not 100% of the format, but I know it's based on a decoding file >>>>> that is specific to a given firmware image, and thus attempting to tie >>>>> this into the driver is problematic. >>>> >>>> You explained how it works, but not why it's needed :) >>> >>> Well, the reason we want it is to be able to read the debug/diagnostics >>> data in order to debug issues that might be related to firmware or >>> software mis-use of firmware interfaces. >>> >>> By having it be a separate interface rather than trying to scrape from >>> the kernel message buffer, it becomes something we can have as a >>> possibility for debugging in the field. >> >> For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit? > > Well, trace_devlink_hwerr() is for simple errors that are mapped 1:1 > with some string. From what I got, Jacob needs to pass some data > structures to the user. Something more similar to health reporter dumps > and their fmsg. > Right. From my understanding the messages for debugging are not in a format that can be immediately turned into a text string. 
The reasoning behind this is that the set of messages changes
(especially during early firmware bringup), and thus sending actual
ASCII messages doesn't work well.

It goes back to the "firmware is a black box". The problem is that in
practice, we need ways to help debug this black box, and this was one
method that doesn't require hooking up a more expensive device to
intercept and debug with a step-through debugger. It also enables
capturing more verbose information about what the firmware is doing.

But from how I understand it, the messages can't really be immediately
interpreted into a usable format by the kernel. I suppose in theory they
could, but it then requires carrying the full translation table.

Today, this is done by using a custom driver which logs the messages
directly to the kernel log buffer, which we know isn't the best
solution. Using a trace point is less bad, since it is disabled by
default and goes into tracefs instead of the default print buffer...

The pain is the fact that we have to request loading a custom driver
that enables these prints, meaning that it is harder to obtain the data
than if we could just say "enable firmware logs, reproduce the issue, and
grab this data".

Thanks,
Jake

^ permalink raw reply [flat|nested] 13+ messages in thread
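The split described above, where the kernel only passes opaque binary records and a userspace tool applies a translation table that matches the firmware image, might look roughly like the Python sketch below. The record layout (module, level, message id, argument count, then 32-bit arguments) and the table contents are invented for illustration. The real ice log format is not described in this thread.

```python
import struct

# Hypothetical record header: module (u8), level (u8), message id (u16),
# argument count (u8), all little-endian with no padding.
HEADER = struct.Struct("<BBHB")


def decode_record(raw: bytes, table: dict) -> str:
    """Decode one binary log record using a firmware-specific table.

    `table` maps a message id to a format string with `{}` placeholders.
    It stands in for the decoding file shipped with a firmware image.
    """
    module, level, msg_id, argc = HEADER.unpack_from(raw, 0)
    # The arguments follow the header as `argc` little-endian u32 values.
    args = struct.unpack_from(f"<{argc}I", raw, HEADER.size)
    prefix = f"module={module} level={level} "
    template = table.get(msg_id)
    if template is None:
        # Keep the raw bytes so the record can be decoded later with
        # the right table.
        return prefix + f"undecoded id={msg_id} raw={raw.hex()}"
    return prefix + template.format(*args)
```

Because the table lives outside the kernel, it can change with every firmware image without touching the driver, which is the property the message above is asking for.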
* Re: devlink interface for asynchronous event/messages from firmware? 2020-05-21 20:59 ` Jacob Keller 2020-05-21 21:51 ` Jakub Kicinski @ 2020-05-22 11:03 ` Jiri Pirko 1 sibling, 0 replies; 13+ messages in thread From: Jiri Pirko @ 2020-05-22 11:03 UTC (permalink / raw) To: Jacob Keller; +Cc: Ido Schimmel, Jakub Kicinski, netdev, petrm, amitc Thu, May 21, 2020 at 10:59:32PM CEST, jacob.e.keller@intel.com wrote: > > >On 5/21/2020 1:52 PM, Ido Schimmel wrote: >> On Thu, May 21, 2020 at 01:22:34PM -0700, Jacob Keller wrote: >>> On 5/20/2020 5:16 PM, Jakub Kicinski wrote: >>>> On Wed, 20 May 2020 17:03:02 -0700 Jacob Keller wrote: >>>>> Hi Jiri, Jakub, >>>>> >>>>> I've been asked to investigate using devlink as a mechanism for >>>>> reporting asynchronous events/messages from firmware including >>>>> diagnostic messages, etc. >>>>> >>>>> Essentially, the ice firmware can report various status or diagnostic >>>>> messages which are useful for debugging internal behavior. We want to be >>>>> able to get these messages (and relevant data associated with them) in a >>>>> format beyond just "dump it to the dmesg buffer and recover it later". >>>>> >>>>> It seems like this would be an appropriate use of devlink. I thought >>>>> maybe this would work with devlink health: >>>>> >>>>> i.e. we create a devlink health reporter, and then when firmware sends a >>>>> message, we use devlink_health_report. >>>>> >>>>> But when I dug into this, it doesn't seem like a natural fit. The health >>>>> reporters expect to see an "error" state, and don't seem to really fit >>>>> the notion of "log a message from firmware" notion. >>>>> >>>>> One of the issues is that the health reporter only keeps one dump, when >>>>> what we really want is a way to have a monitoring application get the >>>>> dump and then store its contents. >>>>> >>>>> Thoughts on what might make sense for this? It feels like a stretch of >>>>> the health interface... 
>>>>>
>>>>> I mean basically what I am thinking of having is using the devlink_fmsg
>>>>> interface to just send a netlink message that then gets sent over the
>>>>> devlink monitor socket and gets dumped immediately.
>>>>
>>>> Why does user space need a raw firmware interface in the first place?
>>>>
>>>> Examples?
>>>>
>>>
>>> So the ice firmware can optionally send diagnostic debug messages via
>>> its control queue. The current solutions we've used internally
>>> essentially hex-dump the binary contents to the kernel log, and then
>>> these get scraped and converted into a useful format for human consumption.
>>>
>>> I'm not 100% of the format, but I know it's based on a decoding file
>>> that is specific to a given firmware image, and thus attempting to tie
>>> this into the driver is problematic.
>>
>> You explained how it works, but not why it's needed :)
>
>Well, the reason we want it is to be able to read the debug/diagnostics
>data in order to debug issues that might be related to firmware or
>software mis-use of firmware interfaces.

I think that the health reporter would be able to serve this purpose.
There is an event in firmware -> the event is propagated to the user.
The limitation we have in devlink health right now is that we only store
the last event. So perhaps we need to extend to optionally hold
a list/ring-buffer of events?

>
>By having it be a separate interface rather than trying to scrape from
>the kernel message buffer, it becomes something we can have as a
>possibility for debugging in the field.
>
>>
>>> There is also a plan to provide a simpler interface for some of the
>>> diagnostic messages where a simple bijection between one code to one
>>> message for a handful of events, like if the link engine can detect a
>>> known reason why it wasn't able to get link. I suppose these could be
>>> translated and immediately printed by the driver without a special
>>> interface.
>> >> Petr worked on something similar last year: >> https://lore.kernel.org/netdev/cover.1552672441.git.petrm@mellanox.com/ >> >> Amit is currently working on a new version based on ethtool (netlink). >> > >I'll take a look, thanks! > >-Jake ^ permalink raw reply [flat|nested] 13+ messages in thread
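Jiri's suggestion in the message above, holding a list or ring buffer of events rather than only the last one, can be modelled in a few lines. The Python sketch below is only a toy model of the retention policy, not of devlink health itself. `EventHistory` and its methods are hypothetical names.

```python
from collections import deque


class EventHistory:
    """Keep the most recent `capacity` events instead of only the last one."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen drops the oldest entry once it is full.
        self._events = deque(maxlen=capacity)

    def report(self, event: bytes) -> None:
        """Record a new event, evicting the oldest one if necessary."""
        self._events.append(event)

    def dump(self) -> list:
        """Return the stored events, oldest first."""
        return list(self._events)
```

With `capacity=1` this behaves like the current single-dump limitation described in the thread. A larger capacity trades memory for the chance that a monitoring application reads every event before it is overwritten.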