xen-devel.lists.xenproject.org archive mirror
* Prototype Code Review Dashboards (input required)
@ 2016-03-01 13:53 Lars Kurth
  2016-03-01 17:04 ` Lars Kurth
  0 siblings, 1 reply; 11+ messages in thread
From: Lars Kurth @ 2016-03-01 13:53 UTC (permalink / raw)
  To: xen-devel; +Cc: dizquierdo, Jesus M. Gonzalez-Barahona

Hi everyone,

We have a first publicly available prototype of the code review dashboard, and I am looking for input. You can access the dashboard via:

- kibana-xen.bitergia.com  
  There are a couple of links at the top, which get you to two different dashboards: 
  - Dash 1 contains panels for the A use-cases: all data applies to series, not individual patches
  - Dash 2 contains panels for the B use-cases: all data applies to series, not individual patches
  - The A0 and B0 use-cases are based on some older dashboards, which are not published

I have not looked at this version in detail yet, but have CC'ed Bitergia and am planning to use this thread to provide feedback. I am also looking for input from you folks. 

General instructions on usage:
- For every table, you can click on an item. This will add a filter, which is displayed in the 3rd line from the top and can be deleted and edited by clicking on the filter. Also see https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html - note that I have not played with it
- You can also create time-based filters by clicking on time-based diagrams (just select a portion of the diagram). This will set up a time range selector. You can delete or change the filter in the top right corner. 
- It is also possible to create more advanced filters in the 2nd line from the top (the default is *). For example, you can type in statements such as
  - subject: "libxc*"
  - subject: "libxc*" AND post_ack_comment: 1
  - subject: "libxc*" AND merged: 0
  - The syntax for the search bar can, I believe, be found at http://lucene.apache.org/core/2_9_4/queryparsersyntax.html
  - You can get a sense of the fields that you can use from the big tables at the bottom (also see the scripting sketch below).
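
If you prefer scripting over the UI, the same Lucene query strings should work against the underlying ElasticSearch index directly. A minimal sketch using the Python elasticsearch client - the endpoint and index name below are placeholders, as I don't know the real ones:

  from elasticsearch import Elasticsearch

  # Hypothetical endpoint and index name - adjust to the real deployment.
  es = Elasticsearch("http://example-es-host:9200")

  # Same Lucene syntax as the Kibana search bar.
  result = es.search(index="xen-code-review",
                     q='subject:"libxc*" AND post_ack_comment:1',
                     size=50)

  for hit in result["hits"]["hits"]:
      doc = hit["_source"]
      print(doc.get("subject"), doc.get("sender"))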

Notes:
- I also want to add a message ID, so that individual reviews can easily be found within a mail client. That has not yet been done.
- I also want to see whether there is a way to get from a patch series to its parts.

https://www.elastic.co/guide/en/kibana/current/dashboard.html provides an overview, but note that the dashboards are read-only. 

Below are a number of use-cases we looked at ...

Regards
Lars

Case Study A.1: Identify top reviewers (for both individuals and companies)
------------------------------------------------------------------------------

Goal: Highlight review contributions - ability to use the data to "reward" review contributions and encourage more "review" contributions

Possible places where this could be added: a separate table which is not time based, but can be filtered by time
Possible metrics: number of review comments by person, number of patches/patch series a person is actively commenting on, number of ACKs and Reviewed-by tags submitted by person

Actions: we could try to have a panel only focused on rewarding people and only based on information per domain and individual, with some large table at the end with all of the reviews.

Case Study A.2: Identify imbalances between reviewing and contribution
------------------------------------------------------------------------

Context: We suspect that we have some imbalances in the community, i.e.
- some companies/individuals which primarily commit patches, but do not review code
- some companies/individuals which primarily comment on patches, but do not write code
- some which do both

Goal: Highlight imbalances

Possible places where this could be added: a separate table which is not time based, but can be filtered by time, or some histogram
Possible metrics: number of patches/patch series a person/company is actively commenting on divided by number of patches/patch series a person/company is actively submitting

Actions: build this dashboard with pre-processed information about the balances.
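
To make the balance metric above concrete, here is a rough sketch of how it could be computed from per-person counts (pandas; the column names are made up, and later in this thread the difference rather than the ratio is suggested):

  import pandas as pd

  # Hypothetical per-person counts, e.g. exported from the dashboard tables.
  df = pd.DataFrame({
      "person": ["alice", "bob"],
      "commented_series": [40, 5],
      "submitted_series": [10, 50],
  })

  # Ratio as proposed above; the difference (reviews - patches) is an
  # alternative discussed later in this thread.
  df["balance_ratio"] = df["commented_series"] / df["submitted_series"]
  df["balance_diff"] = df["commented_series"] - df["submitted_series"]
  print(df)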

Case Study A.3: Identify post-ACK commenting on patches
----------------------------------------------------------

Background: We suspect that post-ACK commenting on patches may be a key issue in our community. Post-ACK comments would be an indicator for something having gone wrong in the review process.

Goal:
- Identify people and companies which persistently comment post-ACK
- Potentially this could be very powerful, if we had a widget such as a pie chart which shows the proportion of patches/patch series with no post-ACK comments vs. with post-ACK comments
- AND if that could be used to see how all the other data was different if one or the other were selected
- In addition, being able to get a table of people/companies which shows data by person/company, such as: # of comments post-ACK, # of patches/series impacted by post-ACK comments, and then being able to get to those series would be incredible

NOTE: Need to check the data. It seems there are many post-ACK comments in a 5 year view, but none in the last year. That seems wrong. 

Case Study A.0: With the people-focused dashboard as it is
-------------------------------------------------------------
NOTE: this view/use-case is not yet shown in the current dashboard: it very much focuses on the analysis in https://github.com/dicortazar/ipython-notebooks/blob/master/projects/xen-analysis/Code-Review-Metrics.ipynb

Context: From 'Comments per Domain': select a company with a high patch-to-comment ratio and select one with a high number. Then use other diagrams to check for any bad interactions between people. This seems to be powerful.

Required Improvements:
- filter by ACKs, adding a company table that lists the number of ACKs and time to ACK.
- filter by the average number of patch version revisions by person and/or company.

More context: 'Selecting a time period for Patch Time to comment' and then repeating the above is very useful. Going to peaks of the time to merge helped to drill down to the cause of the issue.

Actions: we could probably improve this panel with information about ACKs.


Note that the following is not fully implemented yet.

Case Study B.1: Identify post-ACK commenting on patches
----------------------------------------------------------

Goal:
- Identify and track whether post ACK commenting significantly impacts review times
- Also see use-case A.3 for the people view: a selector which would allow me to select patches with/without post-ACK comments would be very powerful, making it possible to see whether there is an impact on some of the common stats

Case Study B.2: See whether the backlog is increasing
--------------------------------------------------------

Context:
Right now it is impossible to tell:
a) Whether/how the backlog is increasing
b) At which rate we are completing reviews
c) Whether these should be counted as series or as patches
Admittedly, some of this will require us to get the data set accuracy up.

Extra context:
We could treat patch series that are more than 1 year old as abandoned.
Let's add information about the efficiency of the community by adding the BMI index.

Actions: try to work on a backlog panel with some efficiency metrics on top of this (see the BMI sketch below).
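
For reference, BMI (the Backlog Management Index) is usually defined as the number of items closed in a period divided by the number opened in the same period; a value below 1 means the backlog is growing. A minimal sketch:

  def bmi(closed_in_period, opened_in_period):
      """Backlog Management Index: < 1.0 means the backlog is growing."""
      if opened_in_period == 0:
          return float("inf")
      return closed_in_period / opened_in_period

  # Example: 80 series completed vs. 100 new series posted in a month.
  print(bmi(80, 100))  # 0.8 -> the backlog grew that month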

Case Study B.3: Focus attention on nearly completed reviews (by % ACKed)
---------------------------------------------------------------------------

One of my goals for these dashboards was to enable developers to make use of them, and also to make sure that reviews which are almost complete get finished more quickly.
To do this, we need some sort of selectors that allow us to differentiate between
a) COMMITTED (past) and UNCOMMITTED reviews
b) For UNCOMMITTED reviews:
- Maybe have a pie chart which allows us to select those which are 25% ACKed, 50% ACKed, ...
- Similarly, a pie chart to select those where there was activity in the "last week", "last 2 weeks", "last month", "last quarter", "last year", "older"
This would be very powerful, in particular if it can be linked back to a specific review (see the bucketing sketch below).
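
A rough sketch of the bucketing such a pie chart would need, assuming we can count acked vs. total patches per series (the thresholds and labels are illustrative):

  def ack_bucket(acked_patches, total_patches):
      """Bucket a series by the share of its patches that are ACKed."""
      if total_patches == 0:
          return "empty"
      pct = 100.0 * acked_patches / total_patches
      for threshold, label in [(100, "100% ACKed"), (75, ">=75% ACKed"),
                               (50, ">=50% ACKed"), (25, ">=25% ACKed")]:
          if pct >= threshold:
              return label
      return "<25% ACKed"

  print(ack_bucket(6, 8))  # ">=75% ACKed"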

Case Study B.0: With the time-focused dashboard as it is
-----------------------------------------------------------
NOTE: this view/use-case is not yet shown in the current dashboard: it very much focuses on the analysis in https://github.com/dicortazar/ipython-notebooks/blob/master/projects/xen-analysis/Code-Review-Metrics.ipynb

Context:
- I have not been able to play with this extensively, but generally playing with the different views has been instructive in many cases, and I could very quickly get to a small set of problem reviews.
- The key problem I had with this view is that there is no bridge back to people and companies: say I identified a patch series (or a class of them) which causes a time-based problem; I was then not able to see who the "actors" (individuals and companies) were who caused the problem.

Actions: have a look and check whether it's possible to add filters by domain and individual.


* Re: Prototype Code Review Dashboards (input required)
  2016-03-01 13:53 Prototype Code Review Dashboards (input required) Lars Kurth
@ 2016-03-01 17:04 ` Lars Kurth
  2016-03-02 22:45   ` Daniel Izquierdo
  0 siblings, 1 reply; 11+ messages in thread
From: Lars Kurth @ 2016-03-01 17:04 UTC (permalink / raw)
  To: dizquierdo, Jesus M. Gonzalez-Barahona; +Cc: xen-devel

Daniel, Jesus,

I am going to break my comments down into different sections to make this more consumable. Let's focus on the A1-A3 use-cases in this mail.

First I wanted to start off with some questions about definitions, as I am seeing some discrepancies in some of the data shown and am trying to understand exactly what the data means; then I'll have a look at the individual sections.

General comments on the Xen-A1.A2.A3 dashboard:
- I played with some filters and noticed some oddities, e.g. if I filter on "merged: 0" all views change as expected
- If I filter on "merged: 1", a lot of widgets show no data. Is this showing that there is an issue with the data somewhere?
- I see similar issues with other filters, e.g. 'emailtype: "patch"'

> On 1 Mar 2016, at 13:53, Lars Kurth <lars.kurth.xen@gmail.com> wrote:
> 
> Case Study A.1: Identify top reviewers (for both individuals and companies)
> ------------------------------------------------------------------------------
> 
> Goal: Highlight review contributions - ability to use the data to "reward" review contributions and encourage more "review" contributions

The widgets in question are:
- Evolution 'Reviewed by-flag' (no patchseries, no patches)
- What is the difference to 'Evolution of patches'?
- Top People/Domains Reviewing patches

Q1: Are these the Reviewed-by flags? 
Q2: What is the scope? Do the numbers count 
- the # files someone reviewed
- the # patches someone reviewed
- the # series someone reviewed

If a reviewer is solely defined by the Reviewed-by tags, the data does not provide a correct picture.
It may be better to use the following definition (although others may disagree):
A reviewer is someone who did one of the following for a patch or series:
- Added a reviewed-by flag
- Added an acked-by flag (maintainers tend to use acked-by)
- Made a comment, but is NOT the author
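
To make that definition testable, here it is as a sketch in code. The event fields mirror the emailtype/flag fields visible in the dashboard tables, but the exact names are a guess on my part:

  def is_review_event(event, patch_author):
      """Proposed definition: a Reviewed-by flag, an Acked-by flag,
      or a comment by someone other than the author."""
      if event.get("emailtype") == "flag" and \
         event.get("flag") in ("Reviewed-by", "Acked-by"):
          return True
      if event.get("emailtype") == "comment" and \
         event.get("sender") != patch_author:
          return True
      return False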

Related to that use-case are also the following widgets
- Evolution of Comments Activity
- Top People/Domains Commenting (which also contain post-ACK comments and are thus also related to A.3)
- Evolution of Email activity

Q3: Again, the scope isn't quite clear.
Q4: The figures are higher than those in "People/Domains Reviewing patches". Are comments on people's own patches included (these would be replies to the comments of others)?

> Possible places where this could be added : a separate table which is not time based, but can be filtered by time
> Possible metrics: number of review comments by person, number of patches/patch series a person is actively commenting on, number of ACKS and reviewed by tags submitted by person
> 
> Actions: we could try to have a panel only focused on rewarding people and only based on information per domain and individual with some large table at the end with all of the reviews.

I don't have a strong view, but I think that there are too many tables and graphs and that they could possibly be consolidated. For example

- The "Top People/Domains Reviewing Patches" views are a subset of the imbalance tables. In particular exporting the data would be useful for me. 
  - Personally, I don't mind just having the superset.
  - In particular, as the tables are sortable
  - And the only filters that can be added are sender and sender_domain

- Depending on the answer to Q2, the "Evolution 'Reviewed by-flag'..." and "Evolution of patches..." views could probably be shown as a combined bar graph
  - If they have the same scope, then a bar chart may be better
  - I will probably have some further thoughts about this, based on the answer.

> Case Study A.2: Identify imbalances between reviewing and contribution
> ------------------------------------------------------------------------
> 
> Context: We suspect that we have some imbalances in the community, i.e.
> - some companies/individuals which primarily commit patches, but do not review code
> - some companies/individuals which primarily comment on patches, but do not write code
> - some which do both

I think this works quite well, and the data is consistent with A.1.
The only comment I would have is that we should calculate the balance as Reviews - Patches posted.

> Goal: Highlight imbalances
> 
> Possible places where this could be added : a separate table which is not time based, but can be filtered by time or some histogram
> Possible metrics: number of patches/patch series a person/company is actively commenting on divided by number of patches/patch series a person/company is actively submitting
> 
> Actions: build this dashboard with pre-processed information about the balances.

This seems to be quite good: the only observation is that authors (and domains) are case-sensitive and probably should be normalised.
There also seem to be entries such as "Jan Beulich [mailto:JBeulich@suse.com]"

I also found lots of entries, with multiple e-mail addresses such as
- "andrew.cooper3@citrix.com; Ian.Campbell@citrix.com; wei.liu2@citrix.com; ian.jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com; Dong, Eddie; Nakajima, Jun; Tian, Kevin; xen-devel@lists.xen.org; keir@xen.org"
- Most of these have (0,0,0) and can probably be removed, if that's possible
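
As a sketch of a possible cleaning pass (the '[mailto:...]' and multi-address patterns are taken from the examples above; this is not meant as the actual implementation):

  import re

  EMAIL_RE = re.compile(r"[\w.+-]+@[\w.-]+")

  def normalise_senders(raw):
      """Extract and lower-case all e-mail addresses from a raw sender string."""
      return [m.lower() for m in EMAIL_RE.findall(raw)]

  print(normalise_senders("Jan Beulich [mailto:JBeulich@suse.com]"))
  # ['jbeulich@suse.com']
  print(normalise_senders("andrew.cooper3@citrix.com; Ian.Campbell@citrix.com"))
  # ['andrew.cooper3@citrix.com', 'ian.campbell@citrix.com']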

> Case Study A.3: Identify post-ACK commenting on patches
> ----------------------------------------------------------
> 
> Background: We suspect that post-ACK commenting on patches may be a key issue in our community. Post-ACK comments would be an indicator for something having gone wrong in the review process.

> Goal:
> - Identify people and companies which persistently comment post ACK
> - Potentially this could be very powerful, if we had a widget such as a pie chart which shows the proportion of patches/patch series with no post-ACK comments vs. with post-ACK comments

I think that
- Evolution of Comments Activity
- Top People/Domains Commenting (which also contain post-ACK comments and are thus also related to A.3)
- Evolution of Email activity
- _emailtype views: not quite sure what the difference is

serve two purposes: they contribute to A.1, but also to A.3.

Related to this seem to be 
- Evolution of Flags
- Top People/Domain analysis
- Evolution of Patch series

Q5: What are All Flags? This seems awfully high: maybe Signed-off-by? 
Something doesn't quite seem to add up, e.g. if I look at March 16th, I get
- Patch e-mails = 1851
- All flags = 1533
- Reviewed by = 117
- Acked by = 150
- Comments = 480 
 All flags seems to be tracking Patch e-mails

Q6: Same question about the scope. The background is whether we can consolidate some of these, and what we don't need.

> NOTE: Need to check the data. It seems there are many post-ACK comments in a 5 year view, but none in the last year. That seems wrong. 

However, it seems that the data for post-ACK comments may be wrong: in the last year, there were 0 post-ACK comments. That is clearly wrong.

> - AND if that could be used to see how all the other data was different if one or the other were selected
> - In addition being able to get a table of people/companies which shows data by person/company such as: #of comments post-ACK, #of patches/series impacted by post-ACK comments and then being able to get to those series would be incredible

This seems not to be there yet. If there were a filter (manual or through some widget), that would be good. But I guess we need to look at the data first.

> Case Study A.0: With the people-focused dashboard as it is
> -------------------------------------------------------------
> NOTE: this view/use-case is not yet shown in the current dashboard: it very much focuses on the analysis in https://github.com/dicortazar/ipython-notebooks/blob/master/projects/xen-analysis/Code-Review-Metrics.ipynb
> 
> Context: From 'Comments per Domain': select a company with a high patch-to-comment ratio and select one with a high number. Then use other diagrams to check for any bad interactions between people. This seems to be powerful.
> 
> Required Improvements:
> - filter by ACKs adding a company table that lists number of ACKs and time to ACK.
> - filter by average number of patch version revisions by person and/or company.
> 
> More context: 'Selecting a time period for Patch Time to comment' and then repeating the above is very useful. Going to peaks of the time to merge helped to drill down to the cause of the issue.
> 
> Actions: we could probably improve this panel with information about ACKs.

Given that the Xen-A1.A2.A3 dashboard is quite busy, we should keep that separate.




* Re: Prototype Code Review Dashboards (input required)
  2016-03-01 17:04 ` Lars Kurth
@ 2016-03-02 22:45   ` Daniel Izquierdo
  2016-03-03 18:55     ` Lars Kurth
  2016-03-09 17:06     ` Daniel Izquierdo
  0 siblings, 2 replies; 11+ messages in thread
From: Daniel Izquierdo @ 2016-03-02 22:45 UTC (permalink / raw)
  To: Lars Kurth, Jesus M. Gonzalez-Barahona; +Cc: xen-devel

On 01/03/16 18:04, Lars Kurth wrote:
> Daniel, Jesus,
>
> I am going to break my comments down into different sections to make this more consumable. Let's focus on the A1-A3 use-cases in this mail.
>
> First I wanted to start off with some questions about definitions, as I am seeing some discrepancies in some of the data shown and am trying to understand exactly what the data means; then I'll have a look at the individual sections.
>
> General comments on the Xen-A1.A2.A3 dashboard:
> - I played with some filters and noticed some oddities, e.g. if I filter on "merged: 0" all views change as expected
> - If I filter on "merged: 1", a lot of widgets show no data. Is this showing that there is an issue with the data somewhere?
> - I see similar issues with other filters, e.g. 'emailtype: "patch"'

In order to bring some context to the dataset: ElasticSearch was 
initially used for parsing Apache logs. That means that data should be 
formatted as 'a row = an event'.

In this dataset there are several event types, defined by the field 
'EmailType': 'patchserie', 'patch', 'comment', 'flag'. Then, depending 
on the 'EmailType', each of the columns may or may not be meaningful.

This structure uses the 'EmailType' as the central key, where the rest of 
the columns provide extra semantics. For instance, the post_ack_comment 
field only makes sense for EmailType:comment.
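
For illustration, two events might look roughly like this (a simplified sketch using field names mentioned in this thread; not the complete field list):

  comment_event = {
      "emailtype": "comment",
      "subject": "Re: [PATCH v2 3/7] libxc: ...",
      "sender": "reviewer@example.com",
      "sender_domain": "example.com",
      "post_ack_comment": 1,   # only meaningful for comments
      "patchserie_id": 12345,
  }

  patch_event = {
      "emailtype": "patch",
      "subject": "[PATCH v2 3/7] libxc: ...",
      "merged": 1,             # only meaningful for patches
      "patchserie_id": 12345,
  }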

Coming back to the comments:

There are fields that apply only to specific types of events. In the case 
of 'merged', this applies only to patches. merged:1 would 
filter the patches that are merged (so the rest of the information is 
literally removed, as those rows are not merged). If we filter by merged:0, 
we get the rest of the information (even including flags).

Thus, using the filter merged:1 leads to having info only related to 
'patches' in this case.

As this panel shows information about types other than 'patch', if you 
filter by some 'emailtype' such as 'patch' then you're focusing only on 
patches data, and this will display both the merged and the not merged ones.

In order to improve this, we can either create one panel per type of 
analysis (one panel for patches, one for comments, etc.), or we can play 
with adding the 'merged' field to any flag, patchserie, patch and comment 
whose patch was merged at some point. The latter may sound a bit weird, 
as a 'merged' status does not apply to a flag (Reviewed-by), for instance.

>> On 1 Mar 2016, at 13:53, Lars Kurth <lars.kurth.xen@gmail.com> wrote:
>>
>> Case Study A.1: Identify top reviewers (for both individuals and companies)
>> ------------------------------------------------------------------------------
>>
>> Goal: Highlight review contributions - ability to use the data to "reward" review contributions and encourage more "review" contributions
> The widgets in question are:
> - Evolution 'Reviewed by-flag' (no patchseries, no patches)
> - What is the difference to Evolution of patches
> - Top People/Domains Reviewing patches
>
> Q1: Are these the Reviewed-by flags?

They are only the Reviewed-by flags.

> Q2: What is the scope? Do the numbers count
> - the # files someone reviewed
> - the # patches someone reviewed
> - the # series someone reviewed

The number counts the number of reviews accomplished by a developer or 
by a domain. A review is accomplished when the flag 'Reviewed-by' is 
detected in an email replying to a patch.

If a developer reviews several patches or several versions of the same 
patch, each of those is counted as a different review.
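
Roughly, the detection works like this (a simplified sketch of the idea, not the exact code we run):

  import re

  REVIEWED_BY = re.compile(r"^\s*Reviewed-by:\s*(.+)$", re.MULTILINE)

  def count_reviews(email_body):
      """Each Reviewed-by line found in a reply counts as one review."""
      return len(REVIEWED_BY.findall(email_body))

  body = """Looks good to me.

  Reviewed-by: Jane Doe <jane@example.com>
  """
  print(count_reviews(body))  # 1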

>
> If a reviewer is solely defined by the reviewed-by tags, the data does not provide a correct picture.
This is how this works so far.

> It may be better to use the following definition (although, others may disagree)
> A reviewer is someone who did one of the following for a patch or series:
> - Added a reviewed-by flag
> - Added an acked-by flag (maintainers tend to use acked-by)
> - Made a comment, but is NOT the author

We can update that definition. Do we want to have extra discussion in 
this respect?

> Related to that use-case are also the following widgets
> - Evolution of Comments Activity
> - Top People/Domains Commenting (which also contain post-ACK comments and are thus also related to A.3)
> - Evolution of Email activity
>
> Q3: Again, the scope isn't quite clear

This is the number of comments replying to a patch. A comment is defined 
as an email reply to a patch.

> Q4: The figures are higher than those in "People/Domains Reviewing patches". Are comments on people's own patches included (these would be replies to the comments of others)

I should check the last question. I'd say that we're including them, as 
they are 'comments' to a patch. You can indeed comment on your own patches 
:). But we can deal with this if it does not make sense.

>
>> Possible places where this could be added : a separate table which is not time based, but can be filtered by time
>> Possible metrics: number of review comments by person, number of patches/patch series a person is actively commenting on, number of ACKS and reviewed by tags submitted by person
>>
>> Actions: we could try to have a panel only focused on rewarding people and only based on information per domain and individual with some large table at the end with all of the reviews.
> I don't have a strong view, but I think that there are too many tables and graphs and that they could possibly be consolidated. For example
>
> - The "Top People/Domains Reviewing Patches" views are a subset of the imbalance tables. In particular exporting the data would be useful for me.
>    - Personally, I don't mind just having the superset.
>    - In particular, as the tables are sortable
>    - And the only filters that can be added are sender and sender_domain

Each set of evolutionary charts and the two tables is based on a 
different 'search' in ElasticSearch, which is in the end like a 'view' in SQL.
So the consolidation may require extra work here and is not that easy to 
change, although we can discuss this. The dashboard is divided by use 
cases, so consolidating the widgets may mean consolidating the use cases 
in the first place.

>
> - Depending on the answer of Q2, the "Evolution 'Reviewed by-flag'..." and "Evolution of patches..." could probably be shown as a combined bar graph
>    - If they have the same scope, then a bar chart may be better
>    - I will probably have some further thoughts about this, based on the answer.

In the case of patches, this is the number of patches. The several 
versions of a patch are each counted as a new patch. We assumed 
that patches re-sent were those that required extra work, so they could 
be counted as new patches. Reviews follow a similar approach: the 
tables and the evolutionary chart show the total number of 
reviews that were detected in the specific emails.
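
For illustration, the version handling amounts to something like this (a simplified sketch; the real parser is more involved):

  import re

  SUBJECT_RE = re.compile(r"\[PATCH(?:\s+v(?P<version>\d+))?[^\]]*\]")

  def patch_version(subject):
      """Return the patch version from a subject line; v1 if none is given."""
      m = SUBJECT_RE.search(subject)
      if not m:
          return None  # not a patch e-mail
      return int(m.group("version") or 1)

  print(patch_version("[PATCH 1/3] xen: foo"))       # 1
  print(patch_version("[PATCH v4 2/7] libxc: bar"))  # 4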

>> Case Study A.2: Identify imbalances between reviewing and contribution
>> ------------------------------------------------------------------------
>>
>> Context: We suspect that we have some imbalances in the community, i.e.
>> - some companies/individuals which primarily commit patches, but do not review code
>> - some companies/individuals which primarily comment on patches, but do not write code
>> - some which do both
> I think this works quite well, and the data is consistent with A.1.
> The only comment I would have is that we should calculate the balance as Reviews - Patches posted.

We can change that.

>
>> Goal: Highlight imbalances
>>
>> Possible places where this could be added : a separate table which is not time based, but can be filtered by time or some histogram
>> Possible metrics: number of patches/patch series a person/company is actively commenting on divided by number of patches/patch series a person/company is actively submitting
>>
>> Actions: build this dashboard with pre-processed information about the balances.
> This seems to be quite good: the only observation is that authors (and domains) are case sensitive and probably should be normalised
> There also seem to be entries such as "Jan Beulich [mailto:JBeulich@suse.com]"
>
> I also found lots of entries, with multiple e-mail addresses such as
> - "andrew.cooper3@citrix.com; Ian.Campbell@citrix.com; wei.liu2@citrix.com; ian.jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com; Dong, Eddie; Nakajima, Jun; Tian, Kevin; xen-devel@lists.xen.org; keir@xen.org"
> - Most of these have (0,0,0) and can probably be removed, if that's possible

This needs some cleaning actions, thanks for the pointer, I'll have a 
look at this!

>
>> Case Study A.3: Identify post-ACK commenting on patches
>> ----------------------------------------------------------
>>
>> Background: We suspect that post-ACK commenting on patches may be a key issue in our community. Post-ACK comments would be an indicator for something having gone wrong in the review process.
>> Goal:
>> - Identify people and companies which persistently comment post ACK
>> - Potentially this could be very powerful, if we had a widget such as a pie chart which shows the proportion of patches/patch series with no post-ACK comments vs. with post-ACK comments
> I think that
> - Evolution of Comments Activity
> - Top People/Domains Commenting (which also contain post-ACK comments and are thus also related to A.3)
> - Evolution of Email activity
> - _emailtype views: not quite sure what the difference is
>
> serve two purposes: they contribute to A.1, but also to A.3.
>
> Related to this seem to be
> - Evolution of Flags
> - Top People/Domain analysis
> - Evolution of Patch series
>
> Q5: What are All Flags? This seems awfully high: maybe Signed-off-by?
> Something doesn't quite seem to add up, e.g. if I look at March 16th, I get
> - Patch e-mails = 1851
> - All flags = 1533
> - Reviewed by = 117
> - Acked by = 150
> - Comments = 480
>   All flags seems to be tracking Patch e-mails

Flags are found in some replies to some patches. When a flag is found in 
an email, there's a new entry in the list of flags. If there were three 
flags (e.g. Signed-off-by, Cc and Reported-by), those are three new entries in 
the table.
In this case, 'all flags' is the aggregation of all of the flags found. 
Each of those special flags counts.

That said, we can reduce the flags in the dataset down to the flags of 
interest for the analysis: Reviewed-by and Acked-by. These flags 
carry the richer information.

>
> Q6: Same question about the scope. The background is whether we can consolidate some of these, and what we don't need.
>
>> NOTE: Need to check the data. It seems there are many post-ACK comments in a 5 year view, but none in the last year. That seems wrong.
> However it seems that the data for post-ACK comments may be wrong: in the last year, there were 0 post-ACK comments. That is clearly wrong.

This sounds like a bug. I'll review this. Thanks again for this pointer.
>
>> - AND if that could be used to see how all the other data was different if one or the other were selected
>> - In addition being able to get a table of people/companies which shows data by person/company such as: #of comments post-ACK, #of patches/series impacted by post-ACK comments and then being able to get to those series would be incredible
> This seems to not be there yet. If there was a filter (manual or through some widget) that would be good. But I guess we need to look at the data first.

With the current data and for this specific case, a workaround would be 
to write in the search box: "sender_domain:citrix.com AND 
post_ack_comment:1". This provides a list of comments in one of the 
bottom tables (so far with wrong data, as you mentioned regarding the 
missing post-ACK comments in 2015). There you can see that there's a 
patchserie_id that can later be used for tracking those patch series 
affected by post-ACK comments.
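
Programmatically, the same workaround could be scripted against the index, for instance (endpoint and index name are placeholders):

  from elasticsearch import Elasticsearch

  es = Elasticsearch("http://example-es-host:9200")  # hypothetical endpoint

  result = es.search(
      index="xen-code-review",  # hypothetical index name
      q="sender_domain:citrix.com AND post_ack_comment:1",
      size=1000,
  )

  # Collect the patch series affected by post-ACK comments.
  series_ids = {hit["_source"]["patchserie_id"] for hit in result["hits"]["hits"]}
  print(len(series_ids), "patch series with post-ACK comments")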

>
>> Case Study A.0: With the people-focused dashboard as it is
>> -------------------------------------------------------------
>> NOTE: this view/use-case is not yet shown in the current dashboard: it very much focuses on the analysis in https://github.com/dicortazar/ipython-notebooks/blob/master/projects/xen-analysis/Code-Review-Metrics.ipynb
>>
>> Context: From 'Comments per Domain': select a company with a high patch-to-comment ratio and select one with a high number. Then use other diagrams to check for any bad interactions between people. This seems to be powerful.
>>
>> Required Improvements:
>> - filter by ACKs adding a company table that lists number of ACKs and time to ACK.
>> - filter by average number of patch version revisions by person and/or company.
>>
>> More context: 'Selecting a time period for Patch Time to comment' and then repeating the above is very useful. Going to peaks of the time to merge helped to drill down to the cause of the issue.
>>
>> Actions: we could probably improve this panel with information about ACKs.
> Given that the Xen-A1.A2.A3 dashboard is quite busy, we should keep that separate.
>
>
>

I still have to upload this, sorry for the delay!


This is a summary of the actions unless extra comments are provided:

* Balance should be calculated as reviews - patches
* Cleaning actions in the dataset when finding multiple email addresses
* Fix the bugs with the post-ACK comments
* Add the extra panel defined as use case A.0

Some extra feedback or discussion:

* Reduce the flags to be used. We're currently using all of the flags 
available and sub-setting the Reviewed-by and Acked-by flags
* I'm not sure if we need extra discussion related to the merge:1 filter
* Check how reviews and patches are counted. Any new version of a patch 
is counted as a new patch. Any new reviewed-by flag in a reply to a 
patch is counted as a review.


Regards,
Daniel.


-- 
Daniel Izquierdo Cortazar, PhD
Chief Data Officer
---------
"Software Analytics for your peace of mind"
www.bitergia.com
@bitergia



* Re: Prototype Code Review Dashboards (input required)
  2016-03-02 22:45   ` Daniel Izquierdo
@ 2016-03-03 18:55     ` Lars Kurth
  2016-03-04  8:42       ` Jan Beulich
  2016-03-09 16:58       ` Daniel Izquierdo
  2016-03-09 17:06     ` Daniel Izquierdo
  1 sibling, 2 replies; 11+ messages in thread
From: Lars Kurth @ 2016-03-03 18:55 UTC (permalink / raw)
  To: Daniel Izquierdo, Jan Beulich; +Cc: xen-devel, Jesus M. Gonzalez-Barahona

Daniel, (added Jan - search for @Jan)

thanks for the feedback

> On 2 Mar 2016, at 22:45, Daniel Izquierdo <dizquierdo@bitergia.com> wrote:
> 
> On 01/03/16 18:04, Lars Kurth wrote:
>> Daniel, Jesus,
>> 
>> I am going to break my comments down into different sections to make this more consumable. Let's focus on the A1-A3 use-cases in this mail.
>> 
>> First I wanted to start off with some questions about definitions, as I am seeing some discrepancies in some of the data shown and am trying to understand exactly what the data means; then I'll have a look at the individual sections.
>> 
>> General comments on the Xen-A1.A2.A3 dashboard:
>> - I played with some filters and noticed some oddities, e.g. if I filter on "merged: 0" all views change as expected
>> - If I filter on "merged: 1", a lot of widgets show no data. Is this showing that there is an issue with the data somewhere?
>> - I see similar issues with other filters, e.g. 'emailtype: "patch"'
> 
> In order to bring some context to the dataset: ElasticSearch was initially used for parsing Apache logs. That means that data should be formatted as 'a row = an event'.
> 
> In this dataset there are several event types, defined by the field 'EmailType': 'patchserie', 'patch', 'comment', 'flag'. Then, depending on the 'EmailType', each of the columns may or may not be meaningful.

That makes sense and is quite important background information, and different from what I had originally assumed. I think for people to be able to build sophisticated search queries - once we have finalised all the fields - we should probably document this and the exact meaning of the fields (also see below). I was thinking - and I volunteer - to put together an extra view that acts like a legend. With the editable version of the dashboard, that should be easy to do.

I think we either
- need to look at the panels and come up with some cleaner and clearer terminology 
- or split the panels and make sure that only views which refer to the same EmailType are grouped together.
Let me think about it and make a proposal.

> This structure uses the 'EmailType' as the central key, where the rest of the columns provide extra semantics. For instance, the post_ack_comment field only makes sense for EmailType:comment.

Now I understand.

> Coming back to the comments:
> 
> There are fields that apply only to specific types of events. In the case of 'merged', this applies only to patches. merged:1 would filter the patches that are merged (so the rest of the information is literally removed, as those rows are not merged). If we filter by merged:0, we get the rest of the information (even including flags).

OK. I think this is very confusing, because when you look at the 3 tables at the bottom, all of them show the same fields. And the default for an undefined field seems to be 0. Is there any way to make sure that undefined fields are set to NA or -1 or similar? That would make things clearer. 

> Thus, using the filter merged:1 leads to having info only related to 'patches' in this case.
> 
> As this panel shows information about types other than 'patch', if you filter by some 'emailtype' such as 'patch' then you're focusing only on patches data, and this will display both the merged and the not merged ones.
> 
> In order to improve this, we can either create one panel per type of analysis (one panel for patches, one for comments, etc.), or we can play with adding the 'merged' field to any flag, patchserie, patch and comment whose patch was merged at some point. The latter may sound a bit weird, as a 'merged' status does not apply to a flag (Reviewed-by), for instance.

Given that we have a lot of stuff on the panels, we should probably go for the 1st option. But there may be corner cases where we may want to go for the second. When I go to the editable dashboard version, how can I tell which EmailType is used? I couldn't figure that out.

> 
>>> On 1 Mar 2016, at 13:53, Lars Kurth <lars.kurth.xen@gmail.com> wrote:
>>> 
>>> Case Study A.1: Identify top reviewers (for both individuals and companies)
>>> ------------------------------------------------------------------------------
>>> 
>>> Goal: Highlight review contributions - ability to use the data to "reward" review contributions and encourage more "review" contributions
>> The widgets in question are:
>> - Evolution 'Reviewed by-flag' (no patchseries, no patches)
>> - What is the difference to Evolution of patches
>> - Top People/Domains Reviewing patches
>> 
>> Q1: Are these the Reviewed-by flags?
> 
> They are only the Reviewed-by flags.
> 
>> Q2: What is the scope? Do the numbers count
>> - the # files someone reviewed
>> - the # patches someone reviewed
>> - the # series someone reviewed
> 
> The number counts the number of reviews accomplished by a developer or by a domain. A review is accomplished when the flag 'Reviewed-by' is detected in an email replying to a patch.
> 
> If a developer reviews several patches or several versions of the same patch, each of those is counted as a different review.

So this is basically the number of Reviewed-by flags aggregated per developer. 

>> If a reviewer is solely defined by the reviewed-by tags, the data does not provide a correct picture.
> This is how this works so far.
> 
>> It may be better to use the following definition (although, others may disagree)
>> A reviewer is someone who did one of the following for a patch or series:
>> - Added a reviewed-by flag
>> - Added an acked-by flag (maintainers tend to use acked-by)
>> - Made a comment, but is NOT the author
> 
> We can update that definition. Do we want to have extra discussion in this respect?

I think that would be more correct. In particular, as we still will be able 
@Jan, what is your view? This use-case was primarily created because of 

> 
>> Related to that use-case are also the following widgets
>> - Evolution of Comments Activity
>> - Top People/Domains Commenting (which also contain post-ACK comments and are thus also related to A.3)
>> - Evolution of Email activity
>> 
>> Q3: Again, the scope isn't quite clear
> 
> This is the number of comments replying to a patch. A comment is defined as an email reply to a patch.
> 
>> Q4: The figures are higher than those in "People/Domains Reviewing patches". Are comments on people's own patches included (these would be replies to the comments of others)
> 
> I should check the last question. I'd say that we're including them, as they are 'comments' to a patch. You can indeed comment on your own patches :). But we can deal with this if it does not make sense.

Given the use-case of highlighting people commenting on the patches of others, we should probably not count comments on one's own patches. 


>>> Possible places where this could be added : a separate table which is not time based, but can be filtered by time
>>> Possible metrics: number of review comments by person, number of patches/patch series a person is actively commenting on, number of ACKS and reviewed by tags submitted by person
>>> 
>>> Actions: we could try to have a panel only focused on rewarding people and only based on information per domain and individual with some large table at the end with all of the reviews.
>> I don't have a strong view, but I think that there are too many tables and graphs and that they could possibly be consolidated. For example
>> 
>> - The "Top People/Domains Reviewing Patches" views are a subset of the imbalance tables. In particular exporting the data would be useful for me.
>>   - Personally, I don't mind just having the superset.
>>   - In particular, as the tables are sortable
>>   - And the only filters that can be added are sender and sender_domain
> 
> Each set of evolutionary charts and the two tables is based on a different 'search' in ElasticSearch, which is in the end like a 'view' in SQL.
> So the consolidation may require extra work here and is not that easy to change, although we can discuss this. The dashboard is divided by use cases, so consolidating the widgets may mean consolidating the use cases in the first place.

Ah OK. This is not such a big deal. I think we may need to consolidate a little bit though, looking at the data. There seems to be a lot of duplicate information.

>> - Depending on the answer of Q2, the "Evolution 'Reviewed by-flag'..." and "Evolution of patches..." could probably be shown as a combined bar graph
>>   - If they have the same scope, then a bar chart may be better
>>   - I will probably have some further thoughts about this, based on the answer.
> 
> In the case of patches, this is the number of patches. The several versions of a patch are each counted as a new patch.

OK: this just brings me back to the point that we need a legend with clear definitions and consistently applied terminology. Is this also true for the tables? 

> We assumed that patches re-sent were those that required extra work, so they could be counted as new patches. Reviews follow a similar approach: the tables and the evolutionary chart show the total number of reviews that were detected in the specific emails.

OK: I think this is fine.


>>> Case Study A.2: Identify imbalances between reviewing and contribution
>>> ------------------------------------------------------------------------
>>> 
>>> Context: We suspect that we have some imbalances in the community, i.e.
>>> - some companies/individuals which primarily commit patches, but do not review code
>>> - some companies/individuals which primarily comment on patches, but do not write code
>>> - some which do both
>> I think this works quite well, and the data is consistent with A.1.
>> The only comment I would have is that we should calculate the balance as Reviews - Patches posted.
> 
> We can change that.
> 
>> 
>>> Goal: Highlight imbalances
>>> 
>>> Possible places where this could be added : a separate table which is not time based, but can be filtered by time or some histogram
>>> Possible metrics: number of patches/patch series a person/company is actively commenting on divided by number of patches/patch series a person/company is actively submitting
>>> 
>>> Actions: build this dashboard with pre-processed information about the balances.
>> This seems to be quite good: the only observation is that authors (and domains) are case sensitive and probably should be normalised
>> There also seem to be entries such as "Jan Beulich [mailto:JBeulich@suse.com]"
>> 
>> I also found lots of entries, with multiple e-mail addresses such as
>> - "andrew.cooper3@citrix.com; Ian.Campbell@citrix.com; wei.liu2@citrix.com; ian.jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com; Dong, Eddie; Nakajima, Jun; Tian, Kevin; xen-devel@lists.xen.org; keir@xen.org"
>> - Most of these have (0,0,0) and can probably be removed, if that's possible
> 
> This needs some cleaning actions, thanks for the pointer, I'll have a look at this!

Before you start, I wanted to seed the following idea. I noticed that a lot of other CCs are in those lists also. Remember, we were talking about increasing the matching rate between xen-devel@ e-mail traffic and the git repos. One of the key outstanding issues was that a significant portion of patch reviews posted to xen-devel are in fact for QEMU, Linux, FreeBSD, NetBSD, etc. All of them follow a similar review process to Xen, but to identify the actual backlog in Xen - and to not have to deal with a lot of extra complexity - we originally decided to filter these out.

But that was before I understood how powerful Kibana was: an alternative may be to add an extra field to comments, patches, ... (I think that field would apply to EmailType=*) that deals with CC'ed mailing lists, such as 
- qemu-devel@nongnu.org
- linux-api@vger.kernel.org
- linux-arch@vger.kernel.org
  ... 
- linuxppc-dev@lists.ozlabs.org
- port-xen@netbsd.org
- (this may not be a complete list)

and put patchseries, patches, comments and flags into different buckets via an extra flag (e.g. project: xen-only, linux, netbsd, freebsd, ...) and allow filtering (e.g. via a pie chart). This would mean that we could use a filter, instead of complex matching, to cut the data. The default could be 'project: "xen-only"'. A sketch of what I mean follows below. 
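
As a sketch of what I mean (the list-to-project mapping below is illustrative and incomplete):

  # Illustrative mapping from CC'ed list address to project bucket.
  LIST_TO_PROJECT = {
      "qemu-devel@nongnu.org": "qemu",
      "linux-api@vger.kernel.org": "linux",
      "linux-arch@vger.kernel.org": "linux",
      "linuxppc-dev@lists.ozlabs.org": "linux",
      "port-xen@netbsd.org": "netbsd",
  }

  def project_bucket(cc_addresses):
      """Derive the 'project' field from the CC'ed mailing lists."""
      projects = {LIST_TO_PROJECT[a] for a in cc_addresses if a in LIST_TO_PROJECT}
      if not projects:
          return "xen-only"
      return projects.pop() if len(projects) == 1 else "multiple"

  print(project_bucket(["xen-devel@lists.xen.org"]))  # xen-only
  print(project_bucket(["xen-devel@lists.xen.org", "qemu-devel@nongnu.org"]))  # qemu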

>>> Case Study A.3: Identify post-ACK commenting on patches
>>> ----------------------------------------------------------
>>> 
>>> Background: We suspect that post-ACK commenting on patches may be a key issue in our community. Post-ACK comments would be an indicator for something having gone wrong in the review process.
>>> Goal:
>>> - Identify people and companies which persistently comment post ACK
>>> - Potentially this could be very powerful, if we had a widget such as a pie chart which shows the proportion of patches/patch series with no post-ACK comments vs. with post-ACK comments
>> I think that
>> - Evolution of Comments Activity
>> - Top People/Domains Commenting (which also contain post-ACK comments and are thus also related to A.3)
>> - Evolution of Email activity
>> - _emailtype views : not quite sure what the difference is
>> 
>> serve two purposes: they contribute to A.1, but also to A.3
>> 
>> Related to this seem to be
>> - Evolution of Flags
>> - Top People/Domain analysis
>> - Evolution of Patch series
>> 
>> Q5: What are All Flags? This seems awfully high: maybe Signed-off-by?
>> Something doesn't quite seem to add up, e.g. if I look at March 16th, I get
>> - Patch e-mails = 1851
>> - All flags = 1533
>> - Reviewed by = 117
>> - Acked by = 150
>> - Comments = 480
>>  All flags seems to be tracking Patch e-mails
> 
> Flags are found in some replies to some patches. When a flag is found in an email, there's a new entry in the list of flags. If there were three flags (e.g. Signed-off-by, Cc and Reported-by), those are three new entries in the table.
> In this case, 'all flags' is the aggregation of all of the flags found. Each of those special flags counts.
> 
> That said, we can reduce the flags in the dataset down to the flags of interest for the analysis: Reviewed-by and Acked-by. These flags carry the richer information.

Is there an easy way to identify the flags which are available? I can then decide which ones we care about.

>> Q6: Same question about the scope. The background is whether we can consolidate some of these, and what we don't need.
>> 
>>> NOTE: Need to check the data. It seems there are many post-ACK comments in a 5 year view, but none in the last year. That seems wrong.
>> However it seems that the data for post-ACK comments may be wrong: in the last year, there were 0 post-ACK comments. That is clearly wrong.
> 
> This sounds like a bug. I'll review this. Thanks again for this pointer.

Thanks.

>>> - AND if that could be used to see how all the other data was different if one or the other were selected
>>> - In addition being able to get a table of people/companies which shows data by person/company such as: #of comments post-ACK, #of patches/series impacted by post-ACK comments and then being able to get to those series would be incredible
>> This seems to not be there yet. If there was a filter (manual or through some widget) that would be good. But I guess we need to look at the data first.
> 
> With the current data and for this specific case, a workaround would be to write in the search box: "sender_domain:citrix.com AND post_ack_comment:1". This provides a list of comments in one of the bottom tables (so far with wrong data, as you mentioned regarding the missing post-ACK comments in 2015). There you can see that there's a patchserie_id that can later be used for tracking those patch series affected by post-ACK comments.

Let me try it, once you have fixed the bug.

Regards
Lars
P.S.: The more I play with this, the more I like it. 

* Re: Prototype Code Review Dashboards (input required)
  2016-03-03 18:55     ` Lars Kurth
@ 2016-03-04  8:42       ` Jan Beulich
  2016-03-04  9:05         ` Lars Kurth
  2016-03-09 16:58       ` Daniel Izquierdo
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2016-03-04  8:42 UTC (permalink / raw)
  To: Lars Kurth; +Cc: Daniel Izquierdo, Jesus M. Gonzalez-Barahona, xen-devel

>>> On 03.03.16 at 19:55, <lars.kurth.xen@gmail.com> wrote:
>> On 2 Mar 2016, at 22:45, Daniel Izquierdo <dizquierdo@bitergia.com> wrote:
>> On 01/03/16 18:04, Lars Kurth wrote:
>>> Q2: What is the scope? Do the numbers count
>>> - the # files someone reviewed
>>> - the # patches someone reviewed
>>> - the # series someone reviewed
>> 
>> The number counts the number of reviews accomplished by a developer or by a 
>> domain. A review is accomplished when the flag 'Reviewed-by' is detected in an 
>> email replying to a patch.
>> 
>> If a developer reviews several patches or several versions of the same 
>> patch, each of those is counted as a different review.
> 
> So this is basically the number of Reviewed-by flags aggregated per 
> developer. 
> 
>>> If a reviewer is solely defined by the reviewed-by tags, the data does not provide a correct picture.
>> This is how this works so far.
>> 
>>> It may be better to use the following definition (although, others may disagree)
>>> A reviewer is someone who did one of the following for a patch or series:
>>> - Added a reviewed-by flag
>>> - Added an acked-by flag (maintainers tend to use acked-by)
>>> - Made a comment, but is NOT the author
>> 
>> We can update that definition. Do we want to have extra discussion in this respect?
> 
> I think that would be more correct. In particular, as we still will be able 
> @Jan, what is your view? This use-case was primarily created because of 

Two of your reply sentences seem to be missing their tails, so it's
really hard for me to tell my view, as it's not really clear what
you're asking for.

Jan



* Re: Prototype Code Review Dashboards (input required)
  2016-03-04  8:42       ` Jan Beulich
@ 2016-03-04  9:05         ` Lars Kurth
  2016-03-04  9:21           ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Lars Kurth @ 2016-03-04  9:05 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Daniel Izquierdo, Jesus M. Gonzalez-Barahona, xen-devel


> On 4 Mar 2016, at 08:42, Jan Beulich <JBeulich@suse.com> wrote:
> 
>>>> On 03.03.16 at 19:55, <lars.kurth.xen@gmail.com> wrote:
>>> On 2 Mar 2016, at 22:45, Daniel Izquierdo <dizquierdo@bitergia.com> wrote:
>>> On 01/03/16 18:04, Lars Kurth wrote:
>>>> Q2: What is the scope? Do the numbers count
>>>> - the # files someone reviewed
>>>> - the # patches someone reviewed
>>>> - the # series someone reviewed
>>> 
>>> The number counts the number of reviews accomplished by a developer or by a 
>>> domain. A review is accomplished when the flag 'Reviewed-by' is detected in an 
>>> email replying to a patch.
>>> 
>>> If a developer reviews several patches or several versions of the same 
>>> patch, each of those is counted as a different review.
>> 
>> So this is basically the number of Reviewed-by flags aggregated per 
>> developer. 
>> 
>>>> If a reviewer is solely defined by the reviewed-by tags, the data does not provide a correct picture.
>>> This is how this works so far.
>>> 
>>>> It may be better to use the following definition (although, others may disagree)
>>>> A reviewer is someone who did one of the following for a patch or series:
>>>> - Added a reviewed-by flag
>>>> - Added an acked-by flag (maintainers tend to use acked-by)
>>>> - Made a comment, but is NOT the author
>>> 
>>> We can update that definition. Do we want to have extra discussion in this respect?
>> 
>> I think that would be more correct. In particular, as we still will be able 
>> @Jan, what is your view? This use-case was primarily created because of 


> Two of your reply sentences seem to be missing their tails, so it's
> really hard for me to tell my view, as it's not really clear what
> you're asking for.

Apologies

I think that would be more correct. In particular, as we will still be able to get the Reviewed-by and Acked-by flags from the tools we already have (and they are also covered in graphs). They represent an outcome, but not really the effort that is spent on reviews. And the comments, as used in the other panels, do not differentiate between people reviewing and people responding to reviews.

@Jan, the use-case to measure real review contributions was primarily added at your request. Do you think the proposed definition above is good enough?

Lars

* Re: Prototype Code Review Dashboards (input required)
  2016-03-04  9:05         ` Lars Kurth
@ 2016-03-04  9:21           ` Jan Beulich
  2016-03-07 17:24             ` Lars Kurth
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2016-03-04  9:21 UTC (permalink / raw)
  To: Lars Kurth; +Cc: Daniel Izquierdo, Jesus M. Gonzalez-Barahona, xen-devel

>>> On 04.03.16 at 10:05, <lars.kurth.xen@gmail.com> wrote:
>> On 4 Mar 2016, at 08:42, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 03.03.16 at 19:55, <lars.kurth.xen@gmail.com> wrote:
>>>> On 2 Mar 2016, at 22:45, Daniel Izquierdo <dizquierdo@bitergia.com> wrote:
>>>> On 01/03/16 18:04, Lars Kurth wrote:
>>>>> It may be better to use the following definition (although, others may disagree)
>>>>> A reviewer is someone who did one of the following for a patch or series:
>>>>> - Added a reviewed-by flag
>>>>> - Added an acked-by flag (maintainers tend to use acked-by)
>>>>> - Made a comment, but is NOT the author
>>>> 
>>>> We can update that definition. Do we want to have extra discussion in this respect?
>>> 
>>> I think that would be more correct. In particular, as we still will be able 
>>> @Jan, what is your view? This use-case was primarily created because of 
> 
> 
>> Two of your reply sentences seem to be missing their tails, so it's
>> really hard for me to tell my view, as it's not really clear what
>> you're asking for.
> 
> Apologies
> 
> I think that would be more correct. In particular, as we will still be able 
> to get the Reviewed-by and Acked-by flags from the tools we already have (and 
> they are also covered in graphs). They represent an outcome, but not really 
> the effort that is spent on reviews. And the comments, as used in the other 
> panels, do not differentiate between people reviewing and people responding to 
> reviews.
> 
> @Jan, the use-case to measure real review contributions was primarily added 
> at your request. Do you think the proposed definition above is good enough?

Yes, the last bullet point should be what mostly addresses my
original concern. Some differentiation between Acked-by and
Reviewed-by may also help - remember that in the case of
maintainers we generally mean the latter to imply the former,
and that in the case of non-maintainers the former doesn't
really mean much.

Jan



* Re: Prototype Code Review Dashboards (input required)
  2016-03-04  9:21           ` Jan Beulich
@ 2016-03-07 17:24             ` Lars Kurth
  0 siblings, 0 replies; 11+ messages in thread
From: Lars Kurth @ 2016-03-07 17:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Daniel Izquierdo, Jesus M. Gonzalez-Barahona, xen-devel


> On 4 Mar 2016, at 09:21, Jan Beulich <jbeulich@suse.com> wrote:
> 
>>>> On 04.03.16 at 10:05, <lars.kurth.xen@gmail.com> wrote:
>>> On 4 Mar 2016, at 08:42, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 03.03.16 at 19:55, <lars.kurth.xen@gmail.com> wrote:
>>>>> On 2 Mar 2016, at 22:45, Daniel Izquierdo <dizquierdo@bitergia.com> wrote:
>>>>> On 01/03/16 18:04, Lars Kurth wrote:
>>>>>> It may be better to use the following definition (although others may disagree)
>>>>>> A reviewer is someone who did one of the following for a patch or series:
>>>>>> - Added a reviewed-by flag
>>>>>> - Added an acked-by flag (maintainers tend to use acked-by)
>>>>>> - Made a comment, but is NOT the author
>>>>> 
...
>> 
>> @Jan, the use-case to measure real review contributions was primarily added 
>> at your request. Do you think the proposed definition above is good enough?
> 
> Yes, the last bullet point should be what mostly addresses my
> original concern. Some differentiation between Acked-by and
> Reviewed-by may also help - remember that in the case of
> maintainers we generally mean the latter to imply the former,
> and that in the case of non-maintainers the former doesn't
> really mean much.

Sounds as if an approach similar to the one taken for the commit vs. review balance may make sense. For illustration, see the sketch below.
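(A sketch only: the weighting is one possible reading of Jan's point above, and the maintainer set is purely illustrative, not a real maintainer list.)

  # Python sketch: weight review flags by maintainer status (hypothetical rule).
  MAINTAINERS = {"jbeulich@suse.com"}   # illustrative stand-in

  def review_credit(flag, sender):
      if flag == "Reviewed-by":
          return 1            # a full review, regardless of who sent it
      if flag == "Acked-by":
          # A maintainer's Acked-by carries weight; from non-maintainers
          # it "doesn't really mean much" (see Jan's remark above).
          return 1 if sender in MAINTAINERS else 0
      return 0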
Regards
Lars

* Re: Prototype Code Review Dashboards (input required)
  2016-03-03 18:55     ` Lars Kurth
  2016-03-04  8:42       ` Jan Beulich
@ 2016-03-09 16:58       ` Daniel Izquierdo
  1 sibling, 0 replies; 11+ messages in thread
From: Daniel Izquierdo @ 2016-03-09 16:58 UTC (permalink / raw)
  To: Lars Kurth, Jan Beulich; +Cc: xen-devel, Jesus M. Gonzalez-Barahona

On 03/03/16 19:55, Lars Kurth wrote:
> Daniel, (added Jan - search for @Jan)
>
> thanks for the feedback
>
>> On 2 Mar 2016, at 22:45, Daniel Izquierdo <dizquierdo@bitergia.com> wrote:
>>
>> On 01/03/16 18:04, Lars Kurth wrote:
>>> Daniel, Jesus,
>>>
>>> I am going to break my comments down into different sections to make this more consumable. Let's focus on the A1-A3 use-cases in this mail.
>>>
>>> First I wanted to start off with some questions about definitions, as I am seeing some discrepancies in some of the data shown and am trying to understand exactly what the data means, and then have a look at the individual sections.
>>>
>>> General, regarding the Xen-A1.A2.A3 dashboard
>>> - I played with some filters and noticed some oddities, e.g. if I filter on "merged: 0" all views change as expected
>>> - If I filter on "merged: 1", a lot of widgets show no data. Is this showing that there is an issue with the data somewhere?
>>> - I see similar issues with other filters, e.g. 'emailtype: "patch"'
>> In order to bring some context to the dataset, ElasticSearch was initially used for parsing Apache logs. That means that data should be formatted as 'a row = an event'.
>>
>> In this dataset there are several events that are defined by the field 'EmailType': 'patchserie', 'patch', 'comment', 'flag'. And then, depending on that 'EmailType', each of the columns may have one meaning or another.
> That makes sense and is quite important background information and different from the original. I think for people to be able to build sophisticated search queries - once we have finalised all the fields - we should probably document this and the exact meaning of fields (also see below). I was thinking - and I volunteer - to put together an extra view that acts like a legend. With the editable version of the dashboard, that should be easy to do.
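(To illustrate the 'a row = an event' model described above, here is a minimal sketch; every field name and value not mentioned in this thread is a guess, not the actual schema:)

  # Each mailing list event becomes one row/document, keyed by emailtype.
  rows = [
      {"emailtype": "patchserie", "sender": "alice@example.org"},
      {"emailtype": "patch", "sender": "alice@example.org", "merged": 1},
      {"emailtype": "comment", "sender": "bob@example.org", "post_ack_comment": 0},
      {"emailtype": "flag", "sender": "bob@example.org", "flag": "Reviewed-by"},
  ]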
>
> I think we either
> - need to look at the panels and come up with some cleaner and clearer terminology
> - or split the panels and make sure that only views which refer
> let me think about it and make a proposal.
>
>> This structure uses the 'EmailType' as the central key where the rest of the columns provide extra syntax. For instance, post_ack_comment field only makes sense for the EmailType:comment.
> Now I understand.
>
>> Coming back to the comments:
>>
>> There are fields that apply only to specific types of events. In the case of 'merge', this applies only to patches. merge:1 would filter for patches that are merged (the rest of the information is literally removed, as it is not merged). If we filter by merge:0, we get the rest of the information (even including flags).
> OK. I think this is very confusing, because when you look at the 3 tables at the bottom, all of them show the same fields. And the default for an undefined field seems to be 0. Is there any way to make sure that undefined fields are set to na or -1 or similar? That would make things clearer.
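(Illustrative query-bar examples of this behaviour, in the same search syntax as the examples at the top of the thread:)
- merged: 1  (implicitly restricts every view to patches; all other event types disappear)
- emailtype: "patch" AND merged: 1  (equivalent, but makes the intent explicit)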

Those have now been updated and (hopefully) only contain useful info in each 
of the cases.

>
>> Thus, using the filter merge:1 leads to having info only related to 'patches' in this case.
>>
>> As this panel shows information about types other than 'patch', if you filter by some 'emailtype' such as 'patch' then you're focusing only on patch data, and this will display the merged and not merged ones.
>>
>> In order to improve this, we can either create a panel per type of analysis (one panel for patches, one for comments, etc). Or we can play with adding the 'merge' field to any flag, patchserie, patch and comment whose patch was merged at some point. The latter may sound a bit weird, as a 'merged' status does not apply to a flag (Reviewed-by) for instance.
> Given that we have a lot of stuff on the panels, we should probably go for the 1st option. But there may be corner cases where we may want to go for the second.

>   When I go to the editable dashboard version, how can I tell which EmailType is used? I couldn't figure that out.

You need to 'edit' the visualization. The visualization panel then tells 
you whether it is based on a specific search (and thus whether a specific 
EmailType is used).

I guess this is simpler if we prepare some documentation on how to 
proceed and edit the panels once we agree on the final state.

>>>> On 1 Mar 2016, at 13:53, Lars Kurth <lars.kurth.xen@gmail.com> wrote:
>>>>
>>>> Case of Study A.1: Identify top reviewers (for both individuals and companies)
>>>> ------------------------------------------------------------------------------
>>>>
>>>> Goal: Highlight review contributions - ability to use the data to "reward" review contributions and encourage more "review" contributions
>>> The widgets in question are:
>>> - Evolution 'Reviewed by-flag' (no patchseries, no patches)
>>> - What is the difference to Evolution of patches
>>> - Top People/Domains Reviewing patches
>>>
>>> Q1: Are these the reviewed-by flags?
>> They are only the Reviewed-by flags.
>>
>>> Q2: What is the scope? Do the numbers count
>>> - the # files someone reviewed
>>> - the # patches someone reviewed
>>> - the # series someone reviewed
>> The number counts the number of reviews accomplished by a developer or by a domain. A review is accomplished when the flag 'reviewed-by' is detected in an email replying to a patch.
>>
>> If a developer reviews several patches or several versions of the same patch, each of those is counted as a different review.
> So this is basically the number of reviewed-by flags aggregated per developer.

Exactly.
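(A minimal sketch of that counting rule; the parsing details below are assumptions, not necessarily the actual implementation:)

  import re
  from collections import Counter

  # One review per Reviewed-by flag detected in an email replying to a patch;
  # reviews of several patches, or of several versions, each count separately.
  REVIEWED_BY = re.compile(r"^\s*Reviewed-by:\s*(.+)$", re.IGNORECASE | re.MULTILINE)

  def count_reviews(reply_bodies):
      reviews = Counter()
      for body in reply_bodies:
          for reviewer in REVIEWED_BY.findall(body):
              reviews[reviewer.strip()] += 1
      return reviews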

>>> If a reviewer is solely defined by the reviewed-by tags, the data does not provide a correct picture.
>> This is how this works so far.
>>
>>> It may be better to use the following definition (although others may disagree)
>>> A reviewer is someone who did one of the following for a patch or series:
>>> - Added a reviewed-by flag
>>> - Added an acked-by flag (maintainers tend to use acked-by)
>>> - Made a comment, but is NOT the author
>> We can update that definition. Do we want to have extra discussion in this respect?
> I think that would be more correct. In particular, as we still will be able
> @Jan, what is your view? This use-case was primarily created because of
>
>>> Related to that use-case are also the following widgets
>>> - Evolution of Comments Activity
>>> - Top People/Domains Commenting (which also contain post-ACK comments and are thus also related to A.3)
>>> - Evolution of Email activity
>>>
>>> Q3: Again, the scope isn't quite clear
>> This is the number of comments replying to a patch. A comment is defined as an email reply to a patch.
>>
>>> Q4: The figures are higher than those in "People/Domains Reviewing patches". Are comments on people's own patches included (these would be replies to the comments of others)
>> I should check the last question. I'd say that we're including them, as they are 'comments' to a patch. You can indeed comment on your own patches :). But we can deal with this if it does not make sense.
> Given the use-case of highlighting people commenting on the patches of others, we should probably not count comments on one's own patches.

I should then filter this a bit more (TODO).
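(With the hypothetical row model sketched earlier, the exclusion could be expressed as:)

  def is_review_comment(comment, patch_author):
      # A comment only counts as review activity when the commenter
      # is not the author of the patch being commented on.
      return comment["emailtype"] == "comment" and comment["sender"] != patch_author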

>
>
>>>> Possible places where this could be added : a separate table which is not time based, but can be filtered by time
>>>> Possible metrics: number of review comments by person, number of patches/patch series a person is actively commenting on, number of ACKS and reviewed by tags submitted by person
>>>>
>>>> Actions: we could try to have a panel only focused on rewarding people and only based on information per domain and individual with some large table at the end with all of the reviews.
>>> I don't have a strong view, but I think that there are too many tables and graphs and that they could possibly be consolidated. For example
>>>
>>> - The "Top People/Domains Reviewing Patches" views are a subset of the imbalance tables. In particular exporting the data would be useful for me.
>>>    - Personally, I don't mind just having the superset.
>>>    - In particular, as the tables are sortable
>>>    - And the only filters that can be added are sender and sender_domain
>> Each set of evolutionary charts and the two tables is based on a different 'search' in ElasticSearch, which is in the end like a 'view' in SQL.
>> So the consolidation may require extra work here and is not that easy to change, although we can discuss it. This is divided by use cases, so consolidating this may mean consolidating the use cases in the first place.
> Ah OK. This is not such a big deal. I think we may need to consolidate a little bit though, looking at the data. There seems to be a lot of duplicate information.
>
>>> - Depending on the answer of Q2, the "Evolution 'Reviewed by-flag'..." and "Evolution of patches..." could probably be shown as a combined bar graph
>>>    - If they have the same scope, then a bar chart may be better
>>>    - I will probably have some further thoughts about this, based on the answer.
>> In the case of patches, this is the number of patches. Each of the several versions of a patch is counted as a new patch in this case.
> OK: this just brings me back to the point that we need a legend with clear definitions and consistently applied terminology. Is this also true for the tables?

This also applies to the tables.


>
>> We assumed that re-sent patches were those that required extra work, so they could be counted as new patches. Regarding the reviews, this is a similar approach: the tables and the evolutionary chart show the total number of reviews that were detected in the specific email.
> OK: I think this is fine
>
>
>>>> Case of Study A.2: Identify imbalances between reviewing and contribution
>>>> ------------------------------------------------------------------------
>>>>
>>>> Context: We suspect that we have some imbalances in the community, aka
>>>> - some companies/individuals which primarily commit patches, but not review code
>>>> - some companies/individuals which primarily comment on patches, but do not write code
>>>> - some which do both
>>> I think this works quite fine, and the data is consistent with A.1
>>> The only comment I would have is that we should calculate the Balance as Reviews - Patches posted
>> We can change that.
>>
>>>> Goal: Highlight imbalances
>>>>
>>>> Possible places where this could be added : a separate table which is not time based, but can be filtered by time or some histogram
>>>> Possible metrics: number of patches/patch series a person/company is actively commenting on divided by number of patches/patch series a person/company is actively submitting
>>>>
>>>> Actions: build this dashboard with pre-processed information about the balances.
>>> This seems to be quite good: the only observation is that authors (and domains) are case sensitive and probably should be normalised
>>> There also seem to be entries such as "Jan Beulich [mailto:JBeulich@suse.com]"
>>>
>>> I also found lots of entries, with multiple e-mail addresses such as
>>> - "andrew.cooper3@citrix.com; Ian.Campbell@citrix.com; wei.liu2@citrix.com; ian.jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com; Dong, Eddie; Nakajima, Jun; Tian, Kevin; xen-devel@lists.xen.org; keir@xen.org"
>>> - Most of these have (0,0,0) and can probably be removed, if that's possible
>> This needs some cleaning actions, thanks for the pointer, I'll have a look at this!
> Before you start, I wanted to seed the following idea. I noticed that a lot of other CC's are in those lists also. Remember, we were talking about better matching xen-devel@ e-mail traffic against the git repos. One of the key outstanding issues was that a significant portion of patch reviews posted to xen-devel are in fact for QEMU, Linux, FreeBSD, NetBSD, etc. All of them follow a similar review process to Xen, but to identify the actual backlog in Xen - and to not have to deal with a lot of extra complexity - we originally decided to filter these out.
>
> But that was before I understood how powerful Kibana was: an alternative may be to add an extra field to comments, patches, ... (I think that field would apply to EmailType=*) that deals with CC'ed mailing lists, such as
> - qemu-devel@nongnu.org
> - linux-api@vger.kernel.org
> - linux-arch@vger.kernel.org
>    ...
> - linuxppc-dev@lists.ozlabs.org
> - port-xen@netbsd.org
> - (this may not be a complete list)
>
> and put patchseries, patches, comments, flags into different buckets via an extra flag (e.g. project: xen-only, linux, netbsd, freebsd, ...) and allow filtering (e.g. via a pie chart). This would mean that we could use a filter, instead of complex matching, to cut the data. The default could be 'project: "xen-only"'.

I was checking this filter, but I realized that once a first patchserie 
is sent, any of the replies add the mailing list in CC. I also found 
some cases where other mailing lists are CC'ed.

In any case, I'd say that this can be done.
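(A sketch of the bucketing idea: the list-to-project mapping uses Lars's examples above, while the field handling and the per-series application are assumptions:)

  # Map CC'ed mailing lists to project buckets; default to "xen-only".
  PROJECT_LISTS = {
      "qemu-devel@nongnu.org": "qemu",
      "linux-api@vger.kernel.org": "linux",
      "linux-arch@vger.kernel.org": "linux",
      "linuxppc-dev@lists.ozlabs.org": "linux",
      "port-xen@netbsd.org": "netbsd",
  }

  def project_buckets(cc_addresses):
      # Best applied once per patch series (replies CC the list as well,
      # as noted above), then inherited by its patches, comments and flags.
      buckets = {PROJECT_LISTS[a] for a in cc_addresses if a in PROJECT_LISTS}
      return sorted(buckets) if buckets else ["xen-only"]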

>>>> Case of Study A.3: identify post ACK-commenting on patches
>>>> ----------------------------------------------------------
>>>>
>>>> Background: We suspect that post-ACK commenting on patches may be a key issue in our community. Post-ACK comments would be an indicator for something having gone wrong in the review process.
>>>> Goal:
>>>> - Identify people and companies which persistently comment post ACK
>>>> - Potentially this could be very powerful, if we had a widget such as a pie chart which shows the proportion of patches/patch series with no post-ACK comments vs. with post-ACK comments
>>> I think that
>>> - Evolution of Comments Activity
>>> - Top People/Domains Commenting (which also contain post-ACK comments and are thus also related to A.3)
>>> - Evolution of Email activity
>>> - _emailtype views : not quite sure what the difference is
>>>
>>> serves two purposes: it contributes to A.1, but also to A.3
>>>
>>> Related to this seem to be
>>> - Evolution of Flags
>>> - Top People/Domain analysis
>>> - Evolution of Patch series
>>>
>>> Q5: What are All Flags? This seems awfully high: maybe signed off?
>>> Something doesn't quite seem to add up, e.g. if I look at March 16th, I get
>>> - Patch e-mails = 1851
>>> - All flags = 1533
>>> - Reviewed by = 117
>>> - Acked by = 150
>>> - Comments = 480
>>>   All flags seems to be tracking Patch e-mails
>> Flags are found in some replies to some patches. When a flag is found in an email, there's a new entry in the list of flags. If there were three flags (e.g. Signed-off-by, Cc and Reported-by), those are three new entries in the table.
>> In this case, 'all flags' is the aggregation of all of the flags found. Each of those special flags counts.
>>
>> That said, we can reduce the flags in the dataset down to the flags of interest for the analysis: reviewed-by and acked-by. This flag info contains the richer information.
> Is there an easy way to identify the flags which are available? I can then decide which ones we care about.

You should go to the 'discover' section. Or we can add a new table with 
all of the available flags in any of the panels (although this may mean 
extra noise in those panels).

Another option is again to prepare some documentation.
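(Alternatively, a terms aggregation could list the distinct flags directly. A sketch of the request body, where the 'flag' and 'emailtype' field names are assumptions about the index schema:)

  # ElasticSearch request body: count occurrences of each flag value.
  list_flags_query = {
      "size": 0,
      "query": {"term": {"emailtype": "flag"}},
      "aggs": {"available_flags": {"terms": {"field": "flag", "size": 50}}},
  }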

>
>>> Q6: Same question about the scope. The background is whether we can consolidate some of these, and what we don't need.
>>>
>>>> NOTE: Need to check the data. It seems there are many post-ACK comments in a 5 year view, but none in the last year. That seems wrong.
>>> However it seems that the data for post-ACK comments may be wrong: in the last year, there were 0 post-ACK comments. That is clearly wrong.
>> This sounds like a bug. I'll review this. Thanks again for this pointer.
> Thanks.
>
>>>> - AND if that could be used to see how all the other data was different if one or the other were selected
>>>> - In addition being able to get a table of people/companies which shows data by person/company such as: #of comments post-ACK, #of patches/series impacted by post-ACK comments and then being able to get to those series would be incredible
>>> This seems to not be there yet. If there were a filter (manual or through some widget), that would be good. But I guess we need to look at the data first.
>> With the current data and for this specific case, a workaround would be to write in the search box: "sender_domain:citrix.com AND post_ack_comment:1". This provides a list of comments in one of the bottom tables (so far with wrong data, as you mentioned regarding the non-existing post-ACK comments in 2015). There you can see that there's a patchserie_id that can later be used for tracking those patchseries affected by post-ACK comments.
> Let me try it, once you fixed the bug.
>
> Regards
> Lars
> P.S.: The more I play with this, the more I like it.

Great :).

Thanks for your comments.

I'll now reply to my previous email, pointing to some specific actions 
and following up on the earlier ones.

Regards,
Daniel.



-- 
Daniel Izquierdo Cortazar, PhD
Chief Data Officer
---------
"Software Analytics for your peace of mind"
www.bitergia.com
@bitergia



* Re: Prototype Code Review Dashboards (input required)
  2016-03-02 22:45   ` Daniel Izquierdo
  2016-03-03 18:55     ` Lars Kurth
@ 2016-03-09 17:06     ` Daniel Izquierdo
  2016-03-09 20:03       ` Lars Kurth
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel Izquierdo @ 2016-03-09 17:06 UTC (permalink / raw)
  To: xen-devel

On 02/03/16 23:45, Daniel Izquierdo wrote:

[...]
>> Given that the Xen-A1.A2.A3 dashboard is quite busy, we should keep 
>> that separate.
>>
>>
>>
>
> I still have to upload this, sorry for the delay!
>
>
> This is a summary of the actions unless extra comments are provided:
>
> * Balance should be calculated as reviews - patches
> * Cleaning actions in the dataset when finding multiple email addresses
> * Bugs with the post-ack comments
> * Add the extra panel defined as use case A.0
>
> Some extra feedback or discussion:
>
> * Reduce the flags to be used. We're currently using all of the flags 
> available and sub-setting the reviewed-by and acked-by
> * I'm not sure if we need extra discussion related to the merge:1 filter
> * Check how reviews and patches are counted. Any new version of a 
> patch is counted as a new patch. Any new reviewed-by flag in a reply 
> to a patch is counted as a review.


From those actions, this is the current status:

* Balance should be calculated as reviews - patches (DONE)
* Cleaning actions in the dataset when finding multiple email addresses 
(still in progress; I should also add your comments from a previous email)
* Bugs with the post-ack comments (DONE)
* Add the extra panel defined as use case A.0 (DONE)
* Tables at the bottom of the panels contain proper information and not 
all of the fields (DONE)
* Do not count comments on your own patches (to be done)

Regarding the extra feedback:
* Reduce the flags to be used. We're currently using all of the flags 
available and sub-setting the reviewed-by and acked-by.
** This opened a new discussion about the CC mailing lists and the 
creation of some buckets of info.
* I'm not sure if we need extra discussion related to the merge:1 filter
** No advances
* Check how reviews and patches are counted. Any new version of a patch 
is counted as a new patch. Any new reviewed-by flag in a reply to a 
patch is counted as a review.
** No advances


Regards,
Daniel.

-- 
Daniel Izquierdo Cortazar, PhD
Chief Data Officer
---------
"Software Analytics for your peace of mind"
www.bitergia.com
@bitergia



* Re: Prototype Code Review Dashboards (input required)
  2016-03-09 17:06     ` Daniel Izquierdo
@ 2016-03-09 20:03       ` Lars Kurth
  0 siblings, 0 replies; 11+ messages in thread
From: Lars Kurth @ 2016-03-09 20:03 UTC (permalink / raw)
  To: Daniel Izquierdo; +Cc: xen-devel


> On 9 Mar 2016, at 17:06, Daniel Izquierdo <dizquierdo@bitergia.com> wrote:
> 
> On 02/03/16 23:45, Daniel Izquierdo wrote:
> 
> [...]
>>> Given that the Xen-A1.A2.A3 dashboard is quite busy, we should keep that separate.
>>> 
>>> 
>>> 
>> 
>> I still have to upload this, sorry for the delay!
>> 
>> 
>> This is a summary of the actions unless extra comments are provided:
>> 
>> * Balance should be calculated as reviews - patches
>> * Cleaning actions in the dataset when finding multiple email addresses
>> * Bugs with the post-ack comments
>> * Add the extra panel defined as use case A.0
>> 
>> Some extra feedback or discussion:
>> 
>> * Reduce the flags to be used. We're currently using all of the flags available and sub-setting the reviewed-by and acked-by
>> * I'm not sure if we need extra discussion related to the merge:1 filter
>> * Check how reviews and patches are counted. Any new version of a patch is counted as a new patch. Any new reviewed-by flag in a reply to a patch is counted as a review.
> 

Thank you for the progress update.

> From those actions, this is the current status:
> 
> * Balance should be calculated as reviews - patches (DONE)
> * Cleaning actions in the dataset when finding multiple email addresses (still in progress; I should also add your comments from a previous email)
> * Bugs with the post-ack comments (DONE)
> * Add the extra panel defined as use case A.0 (DONE)
I noticed that the field naming in the panels is inconsistent, e.g. patchserie_sender_domain vs sender_domain.
That makes writing queries hard and, if possible, should be cleaned up at some stage.

> * Tables at the bottom of the panels contain proper information and not all of the fields (DONE)
> * Do not count comments to your own patches (to be done)
Note that we thus should also not count post-ACK comments on one's own patches.

> Regarding to the extra feedback:
> * Reduce the flags to be used. We're currently using all of the flags available and sub-setting the reviewed-by and acked-by.
> ** This opened a new discussion about the CC mailing lists and the creation of some buckets of info.
We should discuss the best way forward

> * I'm not sure if we need extra discussion related to the merge:1 filter
> ** No advances
Are there any areas where you need my input? I think there were 2 issues:
a) One generally with uninitialised values: however, that applies to other flags also (see the workaround sketched below)
b) One about the views not shown: I think that is OK, and fixing a) would make it clear which values are defined and which are not
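(If re-coding undefined fields is not feasible, a possible query-bar workaround is the _exists_ syntax of the ElasticSearch query string parser - assuming the deployed Kibana version supports it:)
- emailtype: "patch" AND _exists_:merged  (only rows where 'merged' was actually set)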

> * Check how reviews and patches are counted. Any new version of a patch is counted as a new patch. Any new reviewed-by flag in a reply to a patch is counted as a review.
> ** No advances

I think these are both OK. Rationale:
- Reviewed-by flags can be removed and then added again. As we count multiple review comments, we can count multiple reviewed-by flags.
- As for counting a new version of a patch as a separate one, I think we need some discussion.
Firstly, we need to be careful that we don't include old versions of patches/patch series within the uncommitted backlog.
Also, we need to make sure that the counting of patch-related items (e.g. reviews) is consistent with how we count patches.

Regards
Lars 

 


Thread overview: 11+ messages
2016-03-01 13:53 Prototype Code Review Dashboards (input required) Lars Kurth
2016-03-01 17:04 ` Lars Kurth
2016-03-02 22:45   ` Daniel Izquierdo
2016-03-03 18:55     ` Lars Kurth
2016-03-04  8:42       ` Jan Beulich
2016-03-04  9:05         ` Lars Kurth
2016-03-04  9:21           ` Jan Beulich
2016-03-07 17:24             ` Lars Kurth
2016-03-09 16:58       ` Daniel Izquierdo
2016-03-09 17:06     ` Daniel Izquierdo
2016-03-09 20:03       ` Lars Kurth
