Re: OpenBMC Project metrics

From: krtaylor <kurt.r.taylor@gmail.com>
To: Patrick Williams <patrick@stwcx.xyz>
Cc: Andrew Jeffery <andrew@aj.id.au>,
	OpenBMC Maillist <openbmc@lists.ozlabs.org>
Subject: Re: OpenBMC Project metrics
Date: Mon, 9 Dec 2019 12:41:14 -0600
Message-ID: <c8172655-7ba4-0c7e-a52b-7d7eb08858f3@gmail.com>
In-Reply-To: <20191206165144.GA48825@patrickw3-mbp>

On 12/6/19 10:51 AM, Patrick Williams wrote:
> On Fri, Dec 06, 2019 at 07:33:26AM -0600, krtaylor wrote:
>> On 12/4/19 4:33 PM, Andrew Jeffery wrote:
>>> On Thu, 5 Dec 2019, at 05:14, Kurt Taylor wrote:
>>>>
>>>> NOTE: these metrics should be used *very carefully*. They do not
>>>> represent the total contributions to the project. We value
>>>> many contributions that do not show up in these charts, including
>>>> reviews, mail list involvement, IRC involvement, etc.
>>>
>>> Given all the caveats and the lopsided view the graphs display, what
>>> are we trying to achieve by graphing the metric of commits per company?
>>
>> "What gets measured, gets managed." I am a firm believer in this simple
>> quote. Measuring a project always improves it. That, and I have been asked
>> to start gathering metrics from several of our contributing companies. They
>> appreciate it.
> 
> I recognize some other projects publish statistics like this and it is
> all publicly available information but I personally have slight
> apprehension about this data.  This project is nowhere near as mature as the
> Linux kernel and the data is highly skewed towards one company.  I have
> some concern that this data could be used for political purposes, both
> externally in community interaction and internally to member companies
> w.r.t. their decisions on future involvement.
> 
> The data is public (from Gerrit), no doubt about it, but I think it is
> reasonable to question whether it is a net positive or net negative for
> the community to gather the data and put it on GitHub, to put it on GitHub
> and advertise it, or to put it on GitHub and have the advertisement come
> from the Community Manager.  (i.e. there is a spectrum of possible ways to
> deal with this data, each with different pros/cons)
> 
>> "Measuring a project always improves it."
> 
> Maybe a first step here is answering what change is desired from
> publishing this data?  And whose desire is it?  That isn't obvious to me.

Thanks again for the comments! See the previous reply to Andrew where I 
address some of these points.

> 
>>> It's also not clear to me what the inputs to these graphs are, for instance
>>> whether changes to Linux, u-boot, qemu or other major projects that we
>>> consume and contribute to are included or whether it's just repositories
>>> under the openbmc org on github. If we're excluding upstream projects,
>>> why?
>>
>> It is only for contributions under openbmc. Other projects have been
>> excluded simply because they have their own project metrics. For example:
> 
> The commit-count-from-Gerrit approach is slightly disappointing to me for
> two reasons:
> 
>      1. Commit count does little to assess the impact of the
>         contribution.  Ex. a one-line recipe update to add a dependency
>         counts the same as a feature.
> 
>      2. There are significant contributions on the kernel side done by
>         and pretty much exclusively for this project.  The effort
>         involved with getting kernel patches upstream is at least an
>         order of magnitude higher than userspace changes (see also
>         "impact").
> 
>>> Where are the scripts to reproduce the graphs? Can you contribute them
>>> to openbmc-tools?
>>
>> Eventually yes, if my employer will let me do more upstream. :) But, the
>> data is publicly available, you can get it yourself from gerrit. Simply go
>> to our gerrit dashboard and search something like: " status:merged AND
>> after:<date> AND before:<date> AND NOT topic:autobump AND owner:<gerrit id>
>> "
> 
> One aspect that isn't immediately obvious, since it isn't available via
> source code, is how you've done the company assignment.  I suspect the
> ones for your employer are correct but for other companies there might be
> mistakes or oversights when people are using personal email addresses.

I have been using the per-company lists of developers who have access to
run tests (ci-authorized). I match on the gerrit id rather than the email
address, which is error prone. You are right that it may not be a complete
or accurate list, but it's all we have at the moment. The groups are
listed here:

https://gerrit.openbmc-project.xyz/admin/groups

As you can see, some contributors listed don't even have email addresses 
in their gerrit profile.
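
For anyone who wants to reproduce the numbers, here is a rough sketch of
the approach described above. It is not the script behind the published
charts. It only builds the search string quoted earlier in this thread and
sums merged-change counts per company. The function names, the
groups/merged_counts data shapes, and the "unknown" bucket are assumptions
made for illustration; the ids and counts would have to come from the
ci-authorized groups and from gerrit search results.

```python
from collections import Counter


def build_query(owner_id, after, before):
    """Build the gerrit search string quoted earlier in this thread."""
    return (
        f"status:merged AND after:{after} AND before:{before} "
        f"AND NOT topic:autobump AND owner:{owner_id}"
    )


def commits_per_company(groups, merged_counts):
    """Sum merged-change counts per company.

    groups: company name -> list of gerrit ids (hypothetical data, e.g.
            copied by hand from the ci-authorized groups).
    merged_counts: gerrit id -> number of merged changes the query found.
    Ids that appear in no group are counted under "unknown".
    """
    # Invert the group lists so each gerrit id maps to one company.
    owner_to_company = {
        gid: company for company, members in groups.items() for gid in members
    }
    totals = Counter()
    for gid, count in merged_counts.items():
        totals[owner_to_company.get(gid, "unknown")] += count
    return dict(totals)
```

Keeping the id-to-company mapping in one plain dictionary means there is a
single place to audit when someone is assigned to the wrong company.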

> I think this concern also ties into the ask a month ago with the
> "computer readable CLA database."  If we had a CLA database and this
> tool used it, we would have one place to audit for correctness.

Absolutely! I have been working with the LF on a tool, but found out
that the tool they have developed costs money and requires an LF ID. Using
that system is an option for the future, and it would put
CLA/CCLA/ID/developer ACLs all in one place, but that will require
"member" companies and monetary contributions to the project. That, or
we develop our own. All good topics for discussion.

Thanks again!
Kurt Taylor (krtaylor)