netdev.vger.kernel.org archive mirror
From: Jacob Keller <jacob.e.keller@intel.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, jiri@resnulli.us, valex@mellanox.com,
	linyunsheng@huawei.com, lihong.yang@intel.com
Subject: Re: [RFC PATCH v2 06/22] ice: add basic handler for devlink .info_get
Date: Wed, 19 Feb 2020 09:33:09 -0800
Message-ID: <6b4dd025-bcf8-12de-99b0-1e05e16333e8@intel.com>
In-Reply-To: <20200218184552.7077647b@kicinski-fedora-PC1C0HJN>

On 2/18/2020 6:45 PM, Jakub Kicinski wrote:
> On Fri, 14 Feb 2020 15:22:05 -0800 Jacob Keller wrote:
>> The devlink .info_get callback allows the driver to report detailed
>> version information. The following devlink versions are reported with
>> this initial implementation:
>>
>>  "fw.mgmt" -> The version of the firmware that controls PHY, link, etc
>>  "fw.mgmt.api" -> API version of interface exposed over the AdminQ
>>  "fw.mgmt.bundle" -> Unique identifier for the firmware bundle
>>  "fw.undi.orom" -> Version of the Option ROM containing the UEFI driver
>>  "nvm.psid" -> Version of the format for the NVM parameter set
>>  "nvm.bundle" -> Unique identifier for the combined NVM image
> 
> I spent some time today trying to write up the design choices behind
> the original implementation but I think I can't complete that unless 
> I understand what the PSID thing really is.
> 

Ok.

> So the original design is motivated by two things:
>  - making FW versions understandable / per component (as opposed 
>    to the crowded ethtool string)
>  - making it possible to automate FW management in a fleet of machines
>    across vendors.
> 
> The second one is more important.
> 
> The design was expecting the following:
>  - HW design is uniquely identified by 'fixed' versions;
>  - each HW design requires only one FW build (but FW build can cover
>    multiple versions of HW);
> 
> This is why serial number is not part of the fixed versions, even
> though it is fixed. Serial is different per board; users should be
> able to map HW design to the FW version they want to run.
> 

Right. The serial number identifies an individual board, while something
like board.id identifies the *design* of the board, not any single unit.

> Effectively FW update agent does this:
> 
>   # Get unique HW design identifier
>   $hw_id = devlink-dev-info['fixed']
> 
>   # Find out which FW we want to use for this NIC
>   $want_fw_id = some-db-backed.lookup($hw_id)
> 
>   # Update if necessary  
>   if $want_fw_id != devlink-dev-info['stored']:
>      # maybe download the file
>      devlink-dev-flash()
> 
>   # Reboot if necessary
>   if $want_fw_id != devlink-dev-info['running']:
>      reboot()
> 
> 
> dev-info sets can obviously contain multiple values, but field by field
> comparison for simple == and != should work just fine.
> 
> The complications which had arisen so far are two:
>  - even though all components are versioned, some customers expressed
>    uneasiness about only identifying the components but not the entire
>    "build". That's why we added the 'fw.bundle'. When multiple
>    components are "bundled" together into a flashable firmware image
>    that bundle itself gets an ID.
>    I'd expect there to be a bundle for each set of components which are
>    distributed as a FW image. IOW bundle ID per type of file that can
>    be downloaded from the vendor support site. For max convenience I'd
>    think there should be one file that contains all components so
>    customers don't have to juggle files. That means overall fw.bundle
>    that covers all.
>    Note that fw.bundle is only meaningful if _all_ components are
>    unchanged from flash image, so the FW must do a self-check to
>    validate any component covered by a bundle id is unchanged.
> 

Right, that makes sense.

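The update-agent pseudocode quoted above can be sketched in Python as below. This is a minimal illustration, assuming the output of `devlink dev info` has already been parsed into three dicts (`fixed`, `stored`, `running`) mapping version names to strings. The `FW_DB` table, the helper names, and all version values are hypothetical, and the sketch only returns the actions ("flash", "reboot") instead of performing them.

```python
from typing import Dict, List, Optional, Tuple

# A version set as parsed from `devlink dev info`: name -> version string.
Versions = Dict[str, str]
DesignKey = Tuple[Tuple[str, str], ...]

# Hypothetical fleet database mapping a HW design (its complete 'fixed'
# version set) to the firmware versions the operator wants it to run.
FW_DB: Dict[DesignKey, Versions] = {
    (("board.id", "EXAMPLE-BOARD"),): {"fw.bundle": "bundle-B", "fw.mgmt": "1.2.3"},
}


def hw_design_key(fixed: Versions) -> DesignKey:
    """The 'fixed' set identifies the HW design (no serial number in it)."""
    return tuple(sorted(fixed.items()))


def differs(want: Versions, have: Versions) -> bool:
    """Field-by-field comparison: True if any wanted field is absent or different."""
    return any(have.get(name) != value for name, value in want.items())


def plan_update(fixed: Versions, stored: Versions, running: Versions) -> List[str]:
    """Return the actions the agent would take: 'flash' and/or 'reboot'."""
    want: Optional[Versions] = FW_DB.get(hw_design_key(fixed))
    if want is None:
        return []  # unknown HW design: leave the device alone
    actions = []
    if differs(want, stored):
        actions.append("flash")  # stored set should match `want` afterwards
    if differs(want, running):
        actions.append("reboot")  # running set changes only after a reboot
    return actions


if __name__ == "__main__":
    fixed = {"board.id": "EXAMPLE-BOARD"}
    old = {"fw.bundle": "bundle-A", "fw.mgmt": "1.2.0"}
    print(plan_update(fixed, stored=old, running=old))  # ['flash', 'reboot']
```

Keying the lookup on the whole 'fixed' set follows the design described above: serial numbers stay out of it, so every board of one design maps to the same FW entry.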
>  - the PSID stuff was added, which IIUC is either (a) an identifier 
>    for configuration parameters which are not visible via normal Linux
>    tools, or (b) a way for an OEM to label a product.
>    This changes where this thing should reside because we don't expect
>    an OEM to relabel the product/SKU (or do we?) and hence it's a fixed
>    version.
>    If it's an identifier for random parameters of the board (serdes
>    params, temperature info, IDK) which is expected to maybe be updated
>    or tuned it should be in running/stored.
> 

Hmm. In my case nvm.psid basically describes the format of the NVM
parameter set, but I don't think it actually covers the contents. This
version can change when you update to a newer image.

I probably need to reword the versions as "fw.bundle" and "fw.psid"
rather than using "nvm", given how you're describing the fields above.

>    So any further info on what's an EETRACK in your case?
> 

EETRACK is basically the name we used for "bundle", as it is a unique
identifier generated when new images are prepared.

I think this should probably just become "fw.bundle".

What I have now as "fw.mgmt.bundle" is a little different. It's
basically a unique identifier obtained from the build system of the
management firmware that identifies exactly what was built for that
firmware (i.e., it would change even if the developers failed to update
their version number).

>    For MLX there's a bunch of documents which tell us how we can create
>    an ini file with parameters, but no info on what those parameters
>    actually are.
>
>    Jiri, would you be able to help? Please chime in.
> 
> 
> Sorry for the painful review process, it's quite hard to review what
> people are doing without knowing the back end. Hopefully above gives
> you an idea of the intentions when this code was added :)
> 

I understand the difficulty.

> I see that the next patch adds a 'fixed' version, so if that's
> sufficient to identify your board there isn't any blocker here.

Yes, the board.id is the unique identifier of the physical board design.
It's what we've called the Product Board Assembly identifier.

> 
> What I'd still like to consider is:
>  - whether fw.mgmt.bundle needs to be a bundle if it doesn't bundle
>    multiple things? If it's hard to inject the build ID into the fw.mgmt
>    version, that's fine.

I mostly didn't want it in the same version field because it is
somewhat distinct. I don't think it's a "bundle" in the sense you're
describing.

It is just an identifier from that component's build system, and it
changes even if the developer did not update the firmware version. It's
useful primarily to identify precisely where that build of the firmware
binary came from (which is why I originally used ".build").

>  - fw.undi.orom - do we need to say orom? Is there anything else than
>    orom for UNDI in the flash?

Hmm. I'll double-check this. I wasn't entirely sure whether we had other
components, which is why I went that route. I think you're right, though,
and this can just be "fw.undi".

>  - nvm.psid may perhaps be better as nvm.psid.api? Following your
>    fw.mgmt.api?

Hmm. Yeah, this isn't really a parameter set ID; it describes the format
more than the contents. I am not sure I fully understand it myself yet.

>  - nvm.bundle - eetrack sounds more like a stream, so perhaps this is
>    the PSID?
> 

So, I think this should probably become "fw.bundle", and I can drop the
nvm bits altogether. The EETRACK ID is a unique identifier we create
when new images are created. If you have the EETRACK ID, you can look up
data on the source binary that the NVM image came from.

It doesn't cover the parameters that can be changed, so I don't think
it's a PSID.


Given this discussion, here is what I have so far:

"fw.bundle" -> What was "nvm.bundle": the identifier for the combined
firmware image. This would be our EETRACK ID.
"fw.mgmt" -> The 3-digit (maj.min.patch) version of the management firmware
"fw.mgmt.api" -> The version of the API exposed by this firmware
"fw.mgmt.build" -> The build identifier. I really do think this should
be ".build" rather than ".bundle", as it's definitely not a bundle in the
same sense. I *could* simply make "fw.mgmt" be "maj.min.patch build", but
I think it makes sense as its own field.

"fw.undi" -> Version of the Option ROM containing the UEFI driver

"fw.psid.api" -> What was "nvm.psid". I think this needs a bit more
work to define. It makes sense to me as some sort of "api" because (if I
understand it correctly) it is the format for the parameters, but it does
not itself define the parameter contents.
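The proposed naming can be sketched as a mapping from raw firmware fields to version strings. This is a hypothetical Python illustration, not the ice driver code: the `IceFwInfo` fields, the hex formatting of the EETRACK and build IDs, and the example values are all assumptions. It also shows why a separate "fw.mgmt.build" is useful: a rebuild that did not bump maj.min.patch is still visible as a change in that one field.

```python
from dataclasses import dataclass, replace
from typing import Dict, Set


@dataclass(frozen=True)
class IceFwInfo:
    """Hypothetical raw values a driver might read from the device and NVM."""
    mgmt_major: int
    mgmt_minor: int
    mgmt_patch: int
    api_major: int
    api_minor: int
    mgmt_build: int  # build-system identifier of the management firmware
    eetrack: int     # EETRACK ID of the combined NVM image
    undi: str        # Option ROM (UEFI driver) version, already formatted
    psid_api: str    # NVM parameter-set format version, already formatted


def info_versions(fw: IceFwInfo) -> Dict[str, str]:
    """Map the raw fields onto the proposed devlink version names."""
    return {
        "fw.bundle": f"0x{fw.eetrack:08x}",
        "fw.mgmt": f"{fw.mgmt_major}.{fw.mgmt_minor}.{fw.mgmt_patch}",
        "fw.mgmt.api": f"{fw.api_major}.{fw.api_minor}",
        "fw.mgmt.build": f"0x{fw.mgmt_build:08x}",
        "fw.undi": fw.undi,
        "fw.psid.api": fw.psid_api,
    }


def changed_fields(a: Dict[str, str], b: Dict[str, str]) -> Set[str]:
    """Names whose values differ (or exist in only one of the two sets)."""
    return {name for name in a.keys() | b.keys() if a.get(name) != b.get(name)}


if __name__ == "__main__":
    fw = IceFwInfo(4, 1, 2, 1, 5, 0x1A2B3C4D, 0x80003AB1, "1.2688.0", "2.28")
    # A rebuild where the developer forgot to bump 4.1.2:
    rebuilt = replace(fw, mgmt_build=0x5E6F7A8B)
    print(changed_fields(info_versions(fw), info_versions(rebuilt)))  # {'fw.mgmt.build'}
```

Folding the build ID into "fw.mgmt" as "maj.min.patch build" would carry the same information, but a separate field lets tools compare the release version and the exact build independently.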

The original reason for using "fw" and "nvm" was that we (internally)
use "fw" to mean the management firmware, whereas these APIs combine the
blocks and use "fw.mgmt" for the management firmware. Thus I think it
makes sense to move from "nvm.*" to "fw.*" names.

I also have a couple of other oddities that need to be sorted out. We want
to display the DDP version (a piece of "firmware" that the driver loads at
initialization and that is not stored permanently in the NVM). In some
sense this is our "fw.app", but because the driver loads it at load time
rather than storing it permanently in the NVM, I'm not sure it makes sense
to report it as "fw.app", since the firmware flash update neither updates
nor really touches it.
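One way to express the DDP question is in terms of the running and stored sets: a component that a flash update never writes has a running version but no stored one. The Python sketch below assumes that reading; the "fw.app" key, the `split_versions` helper, and the version strings are placeholders, not a settled choice.

```python
from typing import Dict, Optional, Tuple

Versions = Dict[str, str]


def split_versions(
    nvm_running: Versions,
    nvm_stored: Versions,
    ddp_version: Optional[str],
) -> Tuple[Versions, Versions]:
    """Build the (running, stored) version sets reported to devlink.

    Components that live in the NVM appear in both sets. The DDP package is
    loaded by the driver at init time and is never written by a flash
    update, so it is reported only as a running version.
    """
    running = dict(nvm_running)
    stored = dict(nvm_stored)
    if ddp_version is not None:
        running["fw.app"] = ddp_version  # placeholder name, not decided
    return running, stored
```

A consequence for an update agent that compares wanted versions against 'stored' (as in the flow quoted earlier): it must skip driver-loaded components like DDP, or it would try to flash on every run without ever converging.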

Finally, we also have a component we call the "netlist". I'm still not
fully up to speed on exactly what it represents, but it has multiple
pieces of data, including a 2-digit Major.Minor version of the base, a
type field indicating the format, and a 2-digit revision field that is
incremented on internal and external changes to the contents. There is
also a hash that I think might *actually* be something like a PSID or a
bundle that uniquely represents this component. I haven't included this
component yet because I'm still trying to grasp exactly what it
represents and how best to describe each piece.
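If the hash does turn out to identify the component uniquely, one option is to mirror the "fw.mgmt"/"fw.mgmt.build" split: report the version-like fields together and the hash separately. The sketch below is purely hypothetical; the "fw.netlist" and "fw.netlist.build" names, the field types, and the "major.minor.type-rev" layout are assumptions until the netlist fields are better understood.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NetlistInfo:
    """Hypothetical netlist fields, following the description above."""
    major: int    # base version, major part
    minor: int    # base version, minor part
    kind: int     # type field indicating the format
    rev: int      # incremented on internal/external content changes
    hash_id: int  # hash that may uniquely identify this component


def netlist_versions(nl: NetlistInfo) -> Dict[str, str]:
    """Version-like fields in one entry, the hash in a separate build-like entry."""
    return {
        "fw.netlist": f"{nl.major}.{nl.minor}.{nl.kind:x}-{nl.rev}",
        "fw.netlist.build": f"0x{nl.hash_id:08x}",
    }
```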

Thanks for your review,
Jake

>> With this, devlink can now report at least the same information as
>> reported by the older ethtool interface. Each section of the
>> "firmware-version" is also reported independently so that it is easier
>> to understand the meaning.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
