git.vger.kernel.org archive mirror
From: Jerry Zhang <jerry@skydio.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Ross Yeager <ross@skydio.com>, Abraham Bachrach <abe@skydio.com>,
	Brian Kubisiak <brian.kubisiak@skydio.com>
Subject: Re: [PATCH] git-rev-list: add --exclude-path-first-parent flag
Date: Tue, 20 Apr 2021 17:16:59 -0700
Message-ID: <CAMKO5Cu68cnUu6UEuwQSHoFQ31g9g4TtYgy5vpe35cr90cETXw@mail.gmail.com>
In-Reply-To: <xmqqczutiddk.fsf@gitster.g>

On Sat, Apr 17, 2021 at 12:22 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jerry Zhang <jerry@skydio.com> writes:
>
> > On Fri, Apr 16, 2021 at 5:45 PM Junio C Hamano <gitster@pobox.com> wrote:
> >>
> >> Jerry Zhang <jerry@skydio.com> writes:
> >>
> >> > Add the --exclude-path-first-parent flag,
> >> > which works similarly to --first-parent,
> >> > but affects only the graph traversal for
> >> > the set of commits being excluded.
> >> >
> >> >    -A-------E-HEAD
> >> >      \     /
> >> >       B-C-D
> >> >
> >> > In this example, the goal is to return the
> >> > set {B, C, D} which represents a working
> >> > branch that has been merged into main branch
> >> > E. `git rev-list D ^E` will end up returning
> >> > no commits since the exclude path eliminates
> >> > D and its ancestors.
> >> > `git rev-list --exclude-path-first-parent D ^E`
> >> > however will return {B, C, D} as desired.
> >>
> >> It is not clear why you want to have this, instead of doing a more
> >> obvious "D..E^".  Even better is "E^..E", which is often what you
> >> want when viewing a history like my 'seen' that is a straight-line
> >> into which tips of branches are merged.
> > My motivation is to find the point at which a release branch forked off from
> > a main branch, even though the release branch could have been merged
> > into the main branch multiple times since it was forked off.
> >
> > If we add another merge from release to main, it will be more clear
> > that those give different results:
> >
> >         -A-----E-F-main
> >           \   / /
> >            B-C-D-release
> >
> > `git rev-list --exclude-path-first-parent release ^main` returns {B, C, D}.
> > I've added commit F to show that we don't necessarily have info on E;
> > there could be many commits between it and the tip of main.
>
> OK, you meant to deal with repeated merges into an integration branch.
>
> So the idea is to just name the end point merge, say F (you also
> could name D as the starting point, but see below), and
>
>  - initially mark its first parent as UNINTERESTING (i.e. E), and
>    other parents as INTERESTING (i.e. D).
>
>  - run the revision traversal machinery, but when propagating the
>    UNINTERESTING bit, give it only to the first parent.  The second
>    and later parents won't become UNINTERESTING.
>
>  - stop after we exhaust INTERESTING commits.
>
> It would probably work for your idealized topology, but I do not
> know what happens when there are criss-cross merges.  In the revised
> picture, you are merging down from the B-C-D chain into the
> mainline, but once the B-C-D chain becomes longer and diverges too
> much from the mainline, it becomes tempting to break the "merge only
> in one direction" discipline and merge back from the mainline, to
> "catch up", and such a merge will have the history of B-C-D line of
> development as its first parent.  Would that screw up the selection
> of which line of development is uninteresting?
Yes, this flag (like the --first-parent flag) is useful mainly because
"git merge" always records the branch you are on as parent 1
and the branch being merged in as parent 2. It is possible to break
this assumption, either with "git commit-tree" or by merging while on one
branch and pushing the result to another, but a user who does so should
understand the consequences. In our case it cannot happen, because
a server performs all merges into the main branches.
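To make the traversal Junio outlines concrete (start from a merge, treat its first parent as UNINTERESTING, propagate that bit only along first parents, and keep what remains of the other parents' history), here is a minimal Python sketch over the toy history from the second example (E merges C, F merges D). It is a set-based model, not git's revision.c walk: the graph is a plain dict with parents in "git merge" order, it computes a full set difference instead of stopping once the INTERESTING commits are exhausted, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Toy history from the second example: E merges C into main, F merges D.
# Parents are listed in "git merge" order: the branch you are on comes first.
HISTORY: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["A", "C"],
    "F": ["E", "D"],
}


def first_parent_chain(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Commits reached from `start` by following only first parents."""
    chain: Set[str] = set()
    commit: Optional[str] = start
    while commit is not None and commit not in chain:
        chain.add(commit)
        parents = graph[commit]
        commit = parents[0] if parents else None
    return chain


def all_ancestors(graph: Dict[str, List[str]], starts: List[str]) -> Set[str]:
    """Commits reachable from `starts` through any parent, starts included."""
    seen: Set[str] = set()
    stack = list(starts)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


def side_branch_of_merge(graph: Dict[str, List[str]], merge: str) -> Set[str]:
    """Hypothetical helper: history brought in by the non-first parents of `merge`.

    The first parent is UNINTERESTING and that bit spreads only along first
    parents; the other parents are INTERESTING and spread through all parents.
    A set difference stands in for git's incremental walk with early stopping.
    """
    first, *others = graph[merge]
    uninteresting = first_parent_chain(graph, first)
    interesting = all_ancestors(graph, others)
    return interesting - uninteresting


print(sorted(side_branch_of_merge(HISTORY, "F")))  # ['B', 'C', 'D']
```

Because the first-parent chain of E stops at A, commits B and C stay interesting even though E already merged C, which is the result the release-branch query is after.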
>
> In any case, it was totally unclear from the proposed log message,
> and the overlong option name that does not say much did not help me
> guess what you wanted to do with it.  Specifically, it is not clear
> what "exclude" means (we do not usually use the word in the context
"Exclude" appears in the first paragraph of the git rev-list man page:
    "List commits that are reachable by following the parent
    links from the given commit(s), but exclude commits that
    are reachable from the one(s) given with a ^ in front of
    them. The output is given in reverse chronological order
    by default."
It appears at least five more times in the man page with the same meaning.
> of revision traversal), and when we talk about "path" in the context
> of revision traversal, we almost always mean the paths to the files,
> i.e. pathspec that limits and simplifies the shape of the history.
The same man page also uses "path", for the --ancestry-path option.
I agree that it could be ambiguous, though, so perhaps "chain" would
be better.
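For contrast, the ordinary ^ exclusion described in that man page paragraph can be modelled as a set difference: everything reachable from the included tips minus everything reachable from the excluded tips through any parent. The minimal Python sketch below (hypothetical helper names; toy history from the original patch description, where E merges D) shows why "git rev-list D ^E" prints nothing, and what Junio's "E^..E" selects. It ignores output order; git itself lists commits in reverse chronological order by default.

```python
from typing import Dict, Iterable, List, Set

# Toy history from the original patch description: E merges D into main.
HISTORY: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["A", "D"],
}


def reachable(graph: Dict[str, List[str]], tips: Iterable[str]) -> Set[str]:
    """Commits reachable from `tips` by following every parent link."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


def rev_list(graph: Dict[str, List[str]], include: List[str], exclude: List[str]) -> Set[str]:
    """Hypothetical set model of `git rev-list <include>... ^<exclude>...` (order ignored)."""
    return reachable(graph, include) - reachable(graph, exclude)


print(sorted(rev_list(HISTORY, ["D"], ["E"])))  # [] because D is an ancestor of E
print(sorted(rev_list(HISTORY, ["E"], ["A"])))  # ['B', 'C', 'D', 'E'], i.e. E^..E
```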
> Also, it claims that it works similarly to --first-parent, but what
> you are doing is to propagate UNINTERESTING bit on the first-parent
> chain, which ends up showing the side branch (i.e. B-C-D chain),
> without showing the commits on the first-parent chain (A and E).
>
> What are the words that convey the idea behind this operation
> clearly at the conceptual level?  Let's think aloud to see if we can
> come up with a better name.
>
>  * first parents are uninteresting
>
>  * show commits on side branch(es)
>
>  * follow side branch.
>
> I think that is closer to the problem you are solving, if I
> understand what you wrote above correctly.
>
> Perhaps --show-side-branch or --follow-side-branch?  I dunno.
For my particular use case, I use it together with --first-parent,
with a single included and a single excluded commit, to show the
commits on the "side branch" of the included commit. But if you specify
multiple commits on either side, or don't use --first-parent, the behavior
is different, and I don't think "--side-branch" describes it well in those cases.

Since I don't believe I can predict all use cases for the flag,
I'd rather name it by what it "does" than by what it is "for".
If we're concerned about length, maybe "--first-parent-not" could
get the meaning across:
- for "rev-list --first-parent A --not B", only first parents are visited
  along A's ancestry;
- for "rev-list --first-parent-not A --not B", it would be reasonable to
  expect that, since B is a "not" commit, only first parents are visited
  along B's ancestry.
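The two bullets differ only in which side of the walk is restricted to first parents. Here is a minimal Python sketch of that reading, assuming the semantics stated in the bullets rather than git's actual code; the walk/rev_list helpers and the first_parent_not parameter are hypothetical, and the history is the second example's, with main = F and release = D.

```python
from typing import Dict, List, Set

# Toy history from the second example: main = F, release = D.
HISTORY: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["A", "C"],
    "F": ["E", "D"],
}


def walk(graph: Dict[str, List[str]], tips: List[str], first_parent_only: bool) -> Set[str]:
    """Commits reachable from `tips`, optionally following only first parents."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        parents = graph[commit]
        stack.extend(parents[:1] if first_parent_only else parents)
    return seen


def rev_list(
    graph: Dict[str, List[str]],
    include: List[str],
    exclude: List[str],
    first_parent: bool = False,
    first_parent_not: bool = False,
) -> Set[str]:
    """Hypothetical model: first_parent restricts the include walk,
    first_parent_not restricts the exclude walk (the proposed flag)."""
    return walk(graph, include, first_parent) - walk(graph, exclude, first_parent_not)


print(sorted(rev_list(HISTORY, ["D"], ["F"])))  # []
print(sorted(rev_list(HISTORY, ["D"], ["F"], first_parent_not=True)))  # ['B', 'C', 'D']
print(sorted(rev_list(HISTORY, ["D"], ["F"], first_parent=True, first_parent_not=True)))  # ['B', 'C', 'D']
```

With neither flag the result is empty, because D is an ancestor of F; restricting only the excluded side to first parents yields B, C and D, matching the "release ^main" query from earlier in the thread.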

Overall I don't think we can make a name so clear that the user
can avoid the man page anyway.

Thread overview: 9+ messages
2021-04-17  0:15 [PATCH] git-rev-list: add --exclude-path-first-parent flag Jerry Zhang
2021-04-17  0:45 ` Junio C Hamano
2021-04-17  1:07   ` Jerry Zhang
2021-04-17  4:09     ` Felipe Contreras
2021-04-17  7:22     ` Junio C Hamano
2021-04-21  0:16       ` Jerry Zhang [this message]
2021-04-21  0:48 ` [PATCH V2] git-rev-list: add --first-parent-not flag Jerry Zhang
2021-07-28  3:20   ` [PATCH V3] " Jerry Zhang
2021-12-11  2:13     ` Jerry Zhang
