Re: git trees organization

From: Stephen Hemminger <stephen@networkplumber.org>
To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Cc: Thomas Monjalon <thomas@monjalon.net>,
	Ferruh Yigit <ferruh.yigit@intel.com>,
	Bruce Richardson <bruce.richardson@intel.com>,
	dev@dpdk.org
Subject: Re: git trees organization
Date: Wed, 13 Sep 2017 19:25:15 -0700
Message-ID: <CAOaVG150jLnm=RcqRmmCUUO8cUw3+k+0QmqH1Txp4S3poCX9Pw@mail.gmail.com>
In-Reply-To: <20170913145402.GA2481@6wind.com>

On Sep 13, 2017 7:54 AM, "Adrien Mazarguil" <adrien.mazarguil@6wind.com> wrote:

On Wed, Sep 13, 2017 at 02:21:00PM +0100, Ferruh Yigit wrote:
> On 9/13/2017 1:25 PM, Adrien Mazarguil wrote:
> > On Wed, Sep 13, 2017 at 12:38:37PM +0100, Ferruh Yigit wrote:
> >> On 9/13/2017 8:58 AM, Adrien Mazarguil wrote:
> >>> Hi,
> >>>
> >>> On Tue, Sep 12, 2017 at 09:32:07AM +0100, Bruce Richardson wrote:
> >>>> On Tue, Sep 12, 2017 at 12:03:30AM +0200, Thomas Monjalon wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> As you know I am currently the only maintainer of the master tree.
> >>>>> It is very convenient because I need to synchronize with others
> >>>>> only when pulling "next-*" trees.
> >>>>> But the drawback is that I should be available very often to
> >>>>> avoid stalled patches waiting in patchwork backlog.
> >>>>>
> >>>>> I feel it is a good time to move to a slightly different
> >>>>> organization.
> >>>>> I have been working closely with Ferruh Yigit for almost a year, as
> >>>>> next-net maintainer, and I think it would be very efficient to
> >>>>> delegate some work for the master tree to him.
> >>>>
> >>>> I think Ferruh has been doing an excellent job on the net tree, and
> >>>> would be an excellent candidate to help with the workload on the
> >>>> master tree.
> >>>>
> >>>>> I mean that I would use the patchwork delegation to explicitly
> >>>>> divide the workload given our different experiences.
> >>>>> Ferruh, do you agree taking this new responsibility?
> >>>>>
> >>>>> At the same time, we can think how to add more git sub-trees:
> >>>>
> >>>> In principle, I'm in favour, but I think that the subtrees of the
> >>>> master tree should be at a fairly coarse granularity, and there
> >>>> should not be too many of them. The more subtrees, the more likely we
> >>>> are to have issues with patchsets needing to be split across trees, or
> >>>> having to take bits from multiple trees in order to test if everything
> >>>> is working.
> >>> <snip>
> >>>
> >>> About that, how about we start allowing true merge commits instead of
> >>> rebasing (rewriting history) in order to ease things for maintainers?
> >>>
> >>> This approach makes pull requests show up as merge commits that
> >>> contain the (ideally trivial) changes needed to resolve any conflicts;
> >>> this has the following benefits:
> >>>
> >>> - The work done by a maintainer during that merge is tracked, not
> >>>   silently ignored or lost. The merge commit itself is signed-off by
> >>>   its author.
> >>>
> >>> - This allows tracing mistakes or bugs to the conflict resolution
> >>>   itself.
> >>>
> >>> - Upstream can reject pull requests on the basis that merging them is
> >>>   not trivial enough (i.e. downstream must merge upstream changes
> >>>   first).
> >>>
> >>> - Sub-trees can merge among themselves in case they need features that
> >>>   encompass several trees, not necessarily always against the master
> >>>   tree. Everything is tracked.
> >>>
> >>> - Maintainers do not ever modify the commits they get from other
> >>>   trees, which keep their SHAs unmodified as part of the history. A
> >>>   given commit ID is truly unique among all trees (back-port trees
> >>>   remain the only exception since commits are cherry-picked).
> >>>
> >>> - It shifts the entire responsibility to the maintainers of sub-trees.
> >>>
> >>> The only downside is that commits can have several parents and history
> >>> becomes a graph that developers need to get used to (some might call
> >>> it a mess); however, that's probably not an issue for those already
> >>> used to Linux kernel development and other large projects.
> >>>
> >>> I know this was already discussed in the past; however, I think that
> >>> without merges, adding more sub-trees will make rebasing too complex.
> >>> Thoughts?
> >>>
> >>
> >> Using git merge looks like more proper git usage, but I have one
> >> question / concern:
> >>
> >> For next-net, sometimes there are dependent patches in the main tree,
> >> and what I am doing is rebasing the sub-tree on top of the latest main
> >> tree.
> >>
> >> Once we switch to the merge method, how can dependent patches get into
> >> the sub-tree? Merge from the main tree into the sub-tree?
> >
> > Yes, that's the idea. On the other hand, as a maintainer, you are not
> > responsible for the contents of what's merged from other official
> > trees. Commits are taken as they are, this implies trust between tree
> > maintainers.
> >
> >> Won't this bidirectional merging be confusing?
> >
> > Probably at first; this certainly needs some getting used to. We can
> > attempt to avoid such merges as much as possible with proper
> > coordination. Avoiding them entirely is likely not possible though if we
> > want to keep history intact.
>
> Let's assume I need to merge from the main tree to next-net three times
> before the integration, because of dependencies.
> When Thomas merges next-net into the main tree for rc1, will those three
> merge commits be visible in the main tree?

Yes, all merge commits will remain visible and part of history. There's only
one case when such commits are optional: fast-forwards. For instance,
assuming the HEAD of your current branch is part of upstream's history, you
may not get a merge commit if you pull from upstream before applying a
series instead of doing the reverse.

Whoever gets the fast-forward wins, therefore merging often is better.
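
To make the fast-forward case concrete, here is a minimal sketch. It compares pulling upstream before applying a local series with pulling it afterwards. The throwaway repository and the branch names "upstream", "subtree", "ff" and "late" are hypothetical, empty commits stand in for real patches, and only standard git commands are used.

```shell
#!/bin/sh
# Sketch: pulling upstream before vs. after applying a local series.
# Repository and branch names are hypothetical; empty commits stand in for patches.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore user/system git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp" && git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/upstream
git commit -q --allow-empty -m "base"
git branch subtree                       # the sub-tree forks here
git commit -q --allow-empty -m "upstream: fix"

# Pull first: subtree's HEAD is an ancestor of upstream, so this fast-forwards.
git checkout -q -b ff subtree
git merge -q upstream
git commit -q --allow-empty -m "series: patch"
ff_merges=$(git rev-list --merges --count HEAD)

# Apply the series first: histories diverge, so pulling creates a merge commit.
git checkout -q -b late subtree
git commit -q --allow-empty -m "series: patch"
git merge -q --no-edit upstream
late_merges=$(git rev-list --merges --count HEAD)

git log --oneline --graph late
echo "pull first: $ff_merges merge commit(s); pull last: $late_merges"
```

"git rev-list --merges --count" counts the merge commits reachable from a revision. In this sketch it reports none on the branch that pulled first and one on the branch that pulled last.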

Even with many subsequent merges, it's not all that bad. Graph views such as
"git log --oneline --graph" help a lot in clarifying things (once you get
used to them).
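
For the "three merges from the main tree into next-net" question, here is a minimal sketch. It uses a throwaway repository, hypothetical branch names "main" and "next-net", and empty commits standing in for patches. It shows that all three intermediate merges remain reachable from the main tree after the final merge. It also shows that "git log --first-parent" still gives a linear view of the main tree's own history.

```shell
#!/bin/sh
# Sketch: next-net merges main three times, then main merges next-net.
# Repository and branch names are hypothetical; empty commits stand in for patches.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore user/system git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp" && git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "base"
git branch next-net

for i in 1 2 3; do
    git checkout -q main
    git commit -q --allow-empty -m "main: fix $i"
    git checkout -q next-net
    git commit -q --allow-empty -m "next-net: patch $i"
    git merge -q --no-edit main            # pull a dependency from main
done

git checkout -q main
git commit -q --allow-empty -m "main: own patch"   # main moved on: no fast-forward
git merge -q --no-edit next-net                    # integration, e.g. for rc1

merges=$(git rev-list --merges --count main)       # 3 sub-tree merges + 1
first_parent=$(git rev-list --first-parent --count main)
git log --oneline --graph main                     # full graph
git log --oneline --first-parent main              # main's own line of history
echo "merge commits reachable from main: $merges"
```

In this sketch, main's first-parent line is base, the three fixes, its own patch and the final merge. The sub-tree's commits and its three merges only show up in the full graph.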

> For Linux, I guess sub-trees are not merging from Linus' tree because of
> dependencies; I assume they only merge after a release to get new
> commits that are not in their sub-tree.

Linux does that all the time and not necessarily after a release, even among
sub-trees. We don't plan to have as many different trees as Linux so it
should remain much simpler for us in any case.

> But for DPDK, the main tree is also getting new patches on its own.

Same for Linux, even if those are a minority. I think it's not a problem
either way; Git is really good at merging histories.

Note that a maintainer can also add glue commits of his own on top of his
tree in order to simplify a subsequent merge. This is better than addressing
everything in the merge commit itself in non-trivial cases (although it
should be the job of the requester).

> >> And the following are notes from my current experience:
> >>
> >> - Having rewritable history gives some flexibility to sub-trees. It is
> >>   possible to update commit logs and amend patches even after they are
> >>   pushed.
> >
> > This is both a good and a bad thing. Thanks to that, history is
> > currently linear and extremely clean; I think we haven't had a single
> > patch that doesn't compile or a bad merge artifact for a very long time.
> >
> > On the other hand, if you look at the effort required to maintain a
> > single sub-tree that way, it likely becomes exponential for several.
> > All of them need to constantly rebase their stuff, with conflicts to
> > address. The more people, the more mistakes and so on.
> >
> > Once part of an official tree, a commit cannot be amended anymore; it's
> > too late, so revert commits will be more common. We need to accept this
> > first.
> >
> >> - It is hard to confirm pulled commits in the main tree; I guess a
> >>   merge commit will make this easier.
> >>
> >> - To track the main tree, I am continuously rebasing and rewriting
> >>   history, almost daily; this may be hard for people working on top of
> >>   next-net.
> >
> > Yes, that's one of the "bad" things with the current approach. Consider
> > automatic non-regression testing against disappearing commit IDs.
> > Tracking their status is difficult when history is changing. For
> > instance, we usually add local tags to track CI successes/failures. All
> > of them now point to nonexistent commits.
> >
> >> - Conflicts are resolved by sub-trees during rebase, instead of by the
> >>   main tree during merge. So this may be a more distributed effort.
> >
> > It's not too different actually. With merge you can reject PRs on the
> > basis that there are too many conflicts to take care of, not unlike a
> > request for rebase.
> >
> > On the plus side, if you make any changes yourself in order to solve
> > them, they are made part of the merge commit, not of the original
> > commits, which remain unmodified.
>
> You are right, we are losing the original code when there is a merge
> conflict, and I think there is no way to trace back to the original
> commit in the repo, except perhaps through the mailing list and patchwork.
>
> And it is very hard to find out what has been changed to resolve the
> conflict, and whether something went wrong!
>
> >
> >> - Rebasing gives a more straightforward history in the main repo; merge
> >>   commits look more confusing, although I would expect it won't be as
> >>   complex as the Linux tree, so it may not be a problem.
> >
> > Right, that's the main drawback. I think there's no way to know the
> > impact unless we attempt it as an experiment.

--
Adrien Mazarguil
6WIND

Bisecting a tree with lots of subtree merges is terrible. That is why Linus
rebases and doesn't directly take linux-next.
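
Here is a minimal sketch of what bisecting across a sub-tree merge involves. It uses a throwaway repository, hypothetical branch names "main" and "next-net", and a file named "bug" standing in for a regression. "git bisect run" walks into the commits of the merged sub-tree, so the first bad commit it reports was never on the main tree's first-parent line. Each intermediate sub-tree commit therefore has to be testable on its own, which is presumably part of why bisecting heavily merged history is painful.

```shell
#!/bin/sh
# Sketch: a regression introduced inside a sub-tree, bisected after the
# sub-tree was merged into main. All names are hypothetical.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore user/system git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp" && git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/main

add_file() {   # commit a new file whose name and content are "$1"
    echo "$1" > "$1" && git add "$1" && git commit -q -m "add $1"
}

add_file base
good=$(git rev-parse HEAD)
git checkout -q -b next-net
add_file net1
add_file bug                       # the regression lands inside the sub-tree
culprit=$(git rev-parse HEAD)
add_file net2
git checkout -q main
add_file main1
git merge -q --no-edit next-net

# A revision is "bad" when the file named "bug" is present in its tree.
git bisect start HEAD "$good"
# Don't abort on the exit status; the result is read from refs/bisect/bad.
git bisect run sh -c 'test ! -f bug' || true
found=$(git rev-parse refs/bisect/bad)
git bisect reset

if git rev-list --first-parent main | grep -q "$culprit"; then
    on_main_line=yes
else
    on_main_line=no
fi
echo "first bad commit: $found (on main's first-parent line: $on_main_line)"
```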