linux-kernel.vger.kernel.org archive mirror
* Small student project idea on appropriate integration trees in MAINTAINERS
@ 2021-01-22  8:22 Lukas Bulwahn
  2021-01-28 23:54 ` Jonathan Corbet
  0 siblings, 1 reply; 4+ messages in thread
From: Lukas Bulwahn @ 2021-01-22  8:22 UTC
  To: devel
  Cc: Ralf Ramsauer, Wolfgang Mauerer, Jonathan Corbet,
	Linux Kernel Mailing List, Pia Eichinger, Başak Erdamar

Dear all,


here is a small student project idea:


In previous work on MAINTAINERS and process conformance, Pia Eichinger
[1] has investigated whether patches are integrated by the maintainers
that MAINTAINERS defines as responsible for them.

In this project, we are interested in a related (possibly simpler)
question: Are the commits integrated into the appropriate integration
trees referenced in MAINTAINERS?

I believe a main difference between considering maintainers and
integration trees is that the information in MAINTAINERS about
integration trees is more error-prone: it is not used as prominently
as the personal maintainer information (name and email), which is
exercised widely through ./scripts/get_maintainer.pl. So, errors on
integration trees in MAINTAINERS are probably more common (but also
simpler to correct) than errors on personal maintainer information
in MAINTAINERS.

The answer to the question above can then ultimately be used to
identify which integration tree entries should be added to specific
sections in MAINTAINERS to best match the actual integration
observed in git.

Determining the factors and the metric for what is best is of course
the challenging part: identifying a suitable heuristic that is:
  1. good enough to be used to create a change to MAINTAINERS that is
accepted by the community, and
  2. simple enough to be implemented with reasonable effort.

Background:

The sections in MAINTAINERS include references, through the T: entries,
to the location of a source configuration management (SCM) tree with
its type, e.g., git, quilt, hg.
For each commit, the kernel git history carries the commit's
integration tree path, i.e., the information through which SCM trees
a commit was integrated until it was finally integrated into Linus
Torvalds' tree.
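
As a rough illustration of the first half of this background, here is
a minimal Python sketch that collects the T: entries per section. It
assumes the usual MAINTAINERS layout (a title line, then "X:<tab>value"
lines, with blank lines between sections); the function name and the
sample data are made up for illustration and are not taken from gitdm
or pasta.

```python
import re

# Matches an entry line such as "T:\tgit git://git.example.org/example.git".
ENTRY_RE = re.compile(r"^([A-Z]):\s+(\S.*)$")


def parse_tree_entries(text):
    """Map each section title to its list of (scm_type, location) T: entries."""
    trees = {}
    title = None
    for line in text.splitlines():
        if not line.strip():
            title = None  # a blank line ends the current section
            continue
        m = ENTRY_RE.match(line)
        if m is None:
            # Assumption: any non-entry line starts a new section.
            title = line.strip()
            trees.setdefault(title, [])
            continue
        if m.group(1) == "T" and title is not None:
            parts = m.group(2).split(None, 1)
            scm_type = parts[0]
            location = parts[1].strip() if len(parts) > 1 else ""
            trees[title].append((scm_type, location))
    return trees


# Hypothetical sample data, not real MAINTAINERS content.
SAMPLE = """\
EXAMPLE DRIVER
M:\tJane Doe <jane@example.org>
T:\tgit git://git.example.org/example.git
F:\tdrivers/example/

OTHER SUBSYSTEM
M:\tJohn Roe <john@example.org>
F:\tdrivers/other/
"""

if __name__ == "__main__":
    for section, entries in parse_tree_entries(SAMPLE).items():
        print(section, entries)
```

Sections without any T: entry end up with an empty list, which makes
them easy to spot as candidates for additions.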

Ideally the references in the MAINTAINERS sections are:
  - complete, i.e., all integration trees used for recent kernel
releases are mentioned in MAINTAINERS.
  - sound, i.e., the majority of the commits are integrated through
the trees referenced in the MAINTAINERS sections a patch belongs to.
  - precise, i.e., for each MAINTAINERS section, the majority of the
commits that belong to a section are integrated through the tree
referenced in that section.
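
One possible way to turn these three properties into numbers is
sketched below in Python. The data model is an assumption made for
illustration: each commit is reduced to the set of MAINTAINERS sections
it belongs to and the set of trees on its integration path (without
Linus Torvalds' tree itself), and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sections: frozenset  # MAINTAINERS sections the commit belongs to
    path: frozenset      # trees the commit passed through before mainline


def completeness(commits, section_trees):
    """Share of observed integration trees that some section references."""
    observed = set().union(*(c.path for c in commits))
    listed = set().union(*section_trees.values())
    return len(observed & listed) / len(observed) if observed else 1.0


def soundness(commits, section_trees):
    """Share of commits integrated through a tree referenced by any of
    the sections the commit belongs to."""
    def ok(c):
        referenced = set().union(
            *(section_trees.get(s, set()) for s in c.sections))
        return bool(referenced & c.path)
    return sum(ok(c) for c in commits) / len(commits) if commits else 1.0


def precision(commits, section_trees):
    """Per section: share of its commits that went through its own trees."""
    result = {}
    for section, trees in section_trees.items():
        members = [c for c in commits if section in c.sections]
        if members:
            hits = sum(bool(trees & c.path) for c in members)
            result[section] = hits / len(members)
    return result


# Hypothetical toy data.
SECTION_TREES = {"A": {"tree-a"}, "B": {"tree-b"}}
COMMITS = [
    Commit(frozenset({"A"}), frozenset({"tree-a"})),
    Commit(frozenset({"A"}), frozenset({"tree-x"})),
    Commit(frozenset({"B"}), frozenset({"tree-b"})),
    Commit(frozenset({"A", "B"}), frozenset({"tree-b"})),
]

if __name__ == "__main__":
    print(completeness(COMMITS, SECTION_TREES))
    print(soundness(COMMITS, SECTION_TREES))
    print(precision(COMMITS, SECTION_TREES))
```

Under this reading, "the majority" in the definitions above would
correspond to a ratio above 0.5; whether that threshold is the right
one is part of the question.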

Goal:

We identify and measure the properties above: completeness,
soundness and precision.

Then, we use that information to determine which integration tree
entries should be added to which specific sections to maximally
increase the three properties.

To evaluate the adequacy of this method, we can obtain feedback from
the responsible kernel maintainers by proposing patches to the
MAINTAINERS file for the additions that we identified as most
relevant, i.e., those that maximally increase the properties. We
would apply two thresholds: one on the number of patch proposals (to
not swamp maintainers initially) and one on relevance (to not send
out minor changes that are largely irrelevant to the community).
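
To make the selection step concrete, here is a minimal Python sketch
of one conceivable heuristic. Everything in it is an assumption for
illustration: commits are given as (sections, trees) pairs, and the
parameters min_share, min_commits and max_proposals stand in for the
relevance and patch-count thresholds; their default values are
arbitrary.

```python
from collections import Counter


def propose_additions(commits, section_trees,
                      min_share=0.5, min_commits=10, max_proposals=5):
    """Return up to max_proposals (section, tree, share) additions.

    A tree is proposed for a section if it is not yet referenced there,
    carried at least min_commits of the section's commits, and carried
    at least min_share of them.
    """
    totals = Counter()  # commits per section
    via = Counter()     # commits per (section, unreferenced tree)
    for sections, trees in commits:
        for s in sections:
            totals[s] += 1
            for t in set(trees) - section_trees.get(s, set()):
                via[(s, t)] += 1
    candidates = [
        (n / totals[s], n, s, t)
        for (s, t), n in via.items()
        if n >= min_commits and n / totals[s] >= min_share
    ]
    # Highest share first, then highest commit count; names break ties.
    candidates.sort(key=lambda c: (-c[0], -c[1], c[2], c[3]))
    return [(s, t, share) for share, n, s, t in candidates[:max_proposals]]


# Hypothetical toy data: section B has no T: entry yet.
SECTION_TREES = {"A": {"tree-a"}, "B": set()}
COMMITS = (
    [({"B"}, {"tree-b"})] * 12
    + [({"B"}, {"tree-c"})] * 3
    + [({"A"}, {"tree-a"})] * 20
)

if __name__ == "__main__":
    for section, tree, share in propose_additions(COMMITS, SECTION_TREES):
        print(f"add T: {tree} to section {section} ({share:.0%} of commits)")
```

In the toy data, only tree-b for section B passes both thresholds;
tree-c carries too few commits to be worth a patch.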

In this project, we can make use of:

- gitdm [git://git.lwn.net/gitdm.git]: gitdm includes some scripts to
parse MAINTAINERS and obtain the integration tree path of a commit.

and/or

- pasta [https://github.com/lfd/PaStA]: Similarly to gitdm, pasta
provides functionality to parse MAINTAINERS and to extract
information on commits.
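
Independent of how these two tools do it, the basic idea of reading an
integration tree off the git history can be sketched in a few lines of
Python. The sketch assumes that a merge commit keeps git's default
subject for a pull from a remote ("Merge tag '...' of <url>" or
"Merge branch '...' of <url>"); subjects in any other form are simply
not recognized. The function name and the example URL are made up.

```python
import re

# Default subject git generates when merging a tag or branch from a remote.
MERGE_RE = re.compile(
    r"^Merge (?:tag|branch) '(?P<ref>[^']+)' of (?P<url>\S+)")


def tree_from_merge_subject(subject):
    """Return (url, ref) for a remote merge subject, else None."""
    m = MERGE_RE.match(subject)
    if m is None:
        return None
    return m.group("url"), m.group("ref")


if __name__ == "__main__":
    subjects = [
        "Merge tag 'example-5.11' of git://git.example.org/example",
        "Merge branch 'fixes' into next",
        "example: fix a typo",
    ]
    for subject in subjects:
        print(subject, "->", tree_from_merge_subject(subject))
```

Walking from a commit towards mainline and collecting the trees named
by the merges on the way would then give one approximation of the
commit's integration tree path.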

Potential project phases:

- In the first phase (PoC phase), we could probably just create a
setup that combines or extends the functionality in gitdm and/or in
pasta.

- In the second phase (MAINTAINERS patch creation phase), we send out
some patches and collect feedback from maintainers.

- In a third phase, with a better understanding of the individual
pieces in gitdm and/or in pasta, we could then create a cleaner design
that also refactors gitdm and pasta to share the same implementation
when essentially the same basic functionality is used within the
various analyses.


References:

[1] https://lists.elisa.tech/g/devel/message/1269


---

Any thoughts on this small student project?

If it is not too crazy, I will mentor a student on this project
through one of the next mentoring programs (Google Summer of Code, LF
mentorship, etc.).


Lukas


* Re: Small student project idea on appropriate integration trees in MAINTAINERS
  2021-01-22  8:22 Small student project idea on appropriate integration trees in MAINTAINERS Lukas Bulwahn
@ 2021-01-28 23:54 ` Jonathan Corbet
  2021-02-05  6:42   ` Lukas Bulwahn
  0 siblings, 1 reply; 4+ messages in thread
From: Jonathan Corbet @ 2021-01-28 23:54 UTC
  To: Lukas Bulwahn
  Cc: devel, Ralf Ramsauer, Wolfgang Mauerer,
	Linux Kernel Mailing List, Pia Eichinger, Başak Erdamar

On Fri, 22 Jan 2021 09:22:24 +0100
Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:

> In this project, we can make use of:
> 
> - gitdm [git://git.lwn.net/gitdm.git]: gitdm includes some scripts to
> parse MAINTAINERS and obtain the integration tree patch of a commit.

Look also at the 'treeplot' tool there, which determines which tree(s)
each patch went through and makes pretty (OK, not hugely pretty) pictures
from the result.

I suspect you'll find that the tree information is mostly correct.
Developers need to know that to be able to base their patches properly; an
incorrect entry would lead to a certain amount of maintainer misery.

Thanks,

jon


* Re: Small student project idea on appropriate integration trees in MAINTAINERS
  2021-01-28 23:54 ` Jonathan Corbet
@ 2021-02-05  6:42   ` Lukas Bulwahn
  2021-02-05 14:06     ` Joe Perches
  0 siblings, 1 reply; 4+ messages in thread
From: Lukas Bulwahn @ 2021-02-05  6:42 UTC
  To: Jonathan Corbet
  Cc: devel, Ralf Ramsauer, Wolfgang Mauerer,
	Linux Kernel Mailing List, Pia Eichinger, Başak Erdamar

On Fri, Jan 29, 2021 at 12:54 AM Jonathan Corbet <corbet@lwn.net> wrote:
>
> On Fri, 22 Jan 2021 09:22:24 +0100
> Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
>
> > In this project, we can make use of:
> >
> > - gitdm [git://git.lwn.net/gitdm.git]: gitdm includes some scripts to
> > parse MAINTAINERS and obtain the integration tree patch of a commit.
>
> Look also at the 'treeplot' tool there, which determines which tree(s)
> each patch went through and makes pretty (OK, not hugely pretty) pictures
> from the result.

Thanks, we are well aware, and that is a good reminder for Basak and
me to get our gitdm treeplot patches in shape for proper submission.

>
> I suspect you'll find that the tree information is mostly correct.

Your suspicion, which is counter to my hypothesis, makes this
investigation worthwhile just to see how correct that information
really is.

> Developers need to know that to be able to base their patches properly; an
> incorrect entry would lead to a certain amount of maintainer misery.
>

Maybe missing or wrong information in MAINTAINERS, or the lack of a
clear recommendation to developers new to a kernel subsystem on which
integration tree a patch should apply to, is one of the reasons for
some maintainers' misery.

Let us find someone interested to measure and investigate and then we
will see...


Lukas


* Re: Small student project idea on appropriate integration trees in MAINTAINERS
  2021-02-05  6:42   ` Lukas Bulwahn
@ 2021-02-05 14:06     ` Joe Perches
  0 siblings, 0 replies; 4+ messages in thread
From: Joe Perches @ 2021-02-05 14:06 UTC
  To: Lukas Bulwahn, Jonathan Corbet
  Cc: devel, Ralf Ramsauer, Wolfgang Mauerer,
	Linux Kernel Mailing List, Pia Eichinger, Başak Erdamar

On Fri, 2021-02-05 at 07:42 +0100, Lukas Bulwahn wrote:
> On Fri, Jan 29, 2021 at 12:54 AM Jonathan Corbet <corbet@lwn.net> wrote:
> > 
> > On Fri, 22 Jan 2021 09:22:24 +0100
> > Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
> > 
> > > In this project, we can make use of:
> > > 
> > > - gitdm [git://git.lwn.net/gitdm.git]: gitdm includes some scripts to
> > > parse MAINTAINERS and obtain the integration tree patch of a commit.
> > 
> > Look also at the 'treeplot' tool there, which determines which tree(s)
> > each patch went through and makes pretty (OK, not hugely pretty) pictures
> > from the result.
> 
> Thanks, we are well aware, and that is a good reminder for Basak and
> me to get our gitdm treeplot patches in shape for proper submission.
> 
> > 
> > I suspect you'll find that the tree information is mostly correct.
> 
> Your suspicion, which is counter to my hypothesis, makes this
> investigation worthwhile just to see how correct that information
> really is.

I suspect the specific development trees listed in each MAINTAINERS
subsystem are mostly useless information.

Likewise, the number of subsystems listed and their nominal maintainers
are more vanity than actual.  MAINTAINERS subsystems may be active for
a small window of time when submitted, but these entries, mostly for
drivers, are quickly aged out as the driver is typically completed.

The 80:20 rule when applied to MAINTAINERS I suspect is more like 90:10.

> > Developers need to know that to be able to base their patches properly; an
> > incorrect entry would lead to a certain amount of maintainer misery.

The -next integration tree works relatively well as the basis for
development.  On the rare occasion that a patch doesn't apply to a
particular active tree, it's typically fairly easy to rebase it.



