netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Florian Fainelli <f.fainelli@gmail.com>
To: Jakub Kicinski <kuba@kernel.org>, David Ahern <dsahern@gmail.com>
Cc: Ido Schimmel <idosch@idosch.org>,
	netdev@vger.kernel.org, davem@davemloft.net, jiri@nvidia.com,
	amcohen@nvidia.com, danieller@nvidia.com, mlxsw@nvidia.com,
	roopa@nvidia.com, andrew@lunn.ch, vivien.didelot@gmail.com,
	tariqt@nvidia.com, ayal@nvidia.com, mkubecek@suse.cz,
	Ido Schimmel <idosch@nvidia.com>
Subject: Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
Date: Tue, 18 Aug 2020 21:30:16 -0700	[thread overview]
Message-ID: <55e40430-a52f-f77b-0d1e-ef79386a0a53@gmail.com> (raw)
In-Reply-To: <20200818203501.5c51e61a@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>



On 8/18/2020 8:35 PM, Jakub Kicinski wrote:
> On Tue, 18 Aug 2020 20:43:11 -0600 David Ahern wrote:
>> On 8/18/20 6:24 PM, Jakub Kicinski wrote:
>>> On Mon, 17 Aug 2020 15:50:53 +0300 Ido Schimmel wrote:
>>>> From: Ido Schimmel <idosch@nvidia.com>
>>>>
>>>> This patch set extends devlink to allow device drivers to expose device
>>>> metrics to user space in a standard and extensible fashion, as opposed
>>>> to the driver-specific debugfs approach.
>>>
>>> I feel like all those loose hardware interfaces are a huge maintenance
>>> burden. I don't know what the solution is, but the status quo is not
>>> great.
>>
>> I don't agree with the 'loose' characterization.
> 
> Loose as in not bound by any standard or best practices.
> 
>> Ido and team are pushing what is arguably a modern version of
>> `ethtool -S`, so it provides a better API for retrieving data.
> 
> ethtool -S is absolutely terrible. Everybody comes up with their own
> names for IEEE stats, and dumps stats which clearly have corresponding
> fields in rtnl_link_stats64 there. We don't need a modern ethtool -S,
> we need to get away from that mess.
> 
>>> I spend way too much time patrolling ethtool -S outputs already.
>>
>> But that's the nature of detailed stats which are often essential to
>> ensuring the system is operating as expected or debugging some problem.
>> Commonality is certainly desired in names when relevant to be able to
>> build tooling around the stats.
> 
> There are stats which are clearly detailed and device specific,
> but what ends up happening is that people expose very much not
> implementation specific stats through the free form interfaces,
> because it's the easiest.
> 
> And users are left picking up the pieces, having to ask vendors what
> each stat means, and trying to create abstractions in their user space
> glue.

Should we require vendors to either provide a Documentation/ entry for 
each statistics they have (and be guaranteed that it will be outdated 
unless someone notices), or would you rather have the statistics 
description be part of the devlink interface itself? Should we define 
namespaces such that standard metrics should be under the standard 
namespace and the vendor standard is the wild west?

> 
>> As an example, per-queue stats have been
>> essential to me for recent investigations. ethq has been really helpful
>> in crossing NIC vendors and viewing those stats as it handles the
>> per-vendor naming differences, but it requires changes to show anything
>> else - errors per queue, xdp stats, drops, etc. This part could be simpler.
> 
> Sounds like you're agreeing with me?
> 
>> As for this set, I believe the metrics exposed here are more unique to
>> switch ASICs.
> 
> This is the list from patch 6:
> 
>     * - ``nve_vxlan_encap``
>     * - ``nve_vxlan_decap``
>     * - ``nve_vxlan_decap_errors``
>     * - ``nve_vxlan_decap_discards``
> 
> What's so unique?
> 
>> At least one company I know of has built a business model
>> around exposing detailed telemetry of switch ASICs, so clearly some find
>> them quite valuable.
> 
> It's a question of interface, not the value of exposed data.
> 
> If I have to download vendor documentation and tooling, or adapt my own
> scripts for every new vendor, I could have as well downloaded an SDK.

Are not you being a bit over dramatic here with your example? At least 
you can run the same command to obtain the stats regardless of the 
driver and vendor, so from that perspective Linux continues to be the 
abstraction and that is not broken.
-- 
Florian

  reply	other threads:[~2020-08-19  4:30 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-17 12:50 [RFC PATCH net-next 0/6] devlink: Add device metric support Ido Schimmel
2020-08-17 12:50 ` [RFC PATCH net-next 1/6] devlink: Add device metric infrastructure Ido Schimmel
2020-08-17 14:12   ` Andrew Lunn
2020-08-17 12:50 ` [RFC PATCH net-next 2/6] netdevsim: Add devlink metric support Ido Schimmel
2020-08-17 12:50 ` [RFC PATCH net-next 3/6] selftests: netdevsim: Add devlink metric tests Ido Schimmel
2020-08-17 12:50 ` [RFC PATCH net-next 4/6] mlxsw: reg: Add Tunneling NVE Counters Register Ido Schimmel
2020-08-17 12:50 ` [RFC PATCH net-next 5/6] mlxsw: reg: Add Tunneling NVE Counters Register Version 2 Ido Schimmel
2020-08-17 12:50 ` [RFC PATCH net-next 6/6] mlxsw: spectrum_nve: Expose VXLAN counters via devlink-metric Ido Schimmel
2020-08-17 14:29   ` Andrew Lunn
2020-08-18  6:59     ` Ido Schimmel
2020-08-19  0:24 ` [RFC PATCH net-next 0/6] devlink: Add device metric support Jakub Kicinski
2020-08-19  2:43   ` David Ahern
2020-08-19  3:35     ` Jakub Kicinski
2020-08-19  4:30       ` Florian Fainelli [this message]
2020-08-19 16:18         ` Jakub Kicinski
2020-08-19 17:20           ` Florian Fainelli
2020-08-19 18:07             ` Jakub Kicinski
2020-08-20 14:35               ` David Ahern
2020-08-20 16:09                 ` Jakub Kicinski
2020-08-21 10:30                   ` Ido Schimmel
2020-08-21 16:53                     ` Jakub Kicinski
2020-08-21 19:12                       ` David Ahern
2020-08-21 23:50                         ` Jakub Kicinski
2020-08-21 23:59                           ` David Ahern
2020-08-22  0:37                             ` Jakub Kicinski
2020-08-22  1:18                               ` David Ahern
2020-08-22 16:27                                 ` Jakub Kicinski
2020-08-23  7:04                                   ` Ido Schimmel
2020-08-24 19:11                                     ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55e40430-a52f-f77b-0d1e-ef79386a0a53@gmail.com \
    --to=f.fainelli@gmail.com \
    --cc=amcohen@nvidia.com \
    --cc=andrew@lunn.ch \
    --cc=ayal@nvidia.com \
    --cc=danieller@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@gmail.com \
    --cc=idosch@idosch.org \
    --cc=idosch@nvidia.com \
    --cc=jiri@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=mkubecek@suse.cz \
    --cc=mlxsw@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=roopa@nvidia.com \
    --cc=tariqt@nvidia.com \
    --cc=vivien.didelot@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).