git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <stolee@gmail.com>,
	Jeff Hostetler <git@jeffhostetler.com>
Subject: Re: [PATCH] rebase --merge: optionally skip upstreamed commits
Date: Sat, 14 Mar 2020 01:04:55 -0700	[thread overview]
Message-ID: <CABPp-BGcQHT5PkqYw3YRWOv8kbiSumJ+YQ3owkd_7viKWs3q7A@mail.gmail.com> (raw)
In-Reply-To: <CABPp-BE83ZhezkgmwatxAhqh4rptMUggcjSwBeiSByyPTUi6Lw@mail.gmail.com>

On Thu, Mar 12, 2020 at 3:40 PM Elijah Newren <newren@gmail.com> wrote:
>
>  On Thu, Mar 12, 2020 at 11:04 AM Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > > > Does this suggest that the cherry-pick detection is suboptimal and
> > > > needs to be improved?  When rebasing, it is typical that you are just
> > > > rebasing a small number of patches compared to how many exist
> > > > upstream.  As such, any upstream patch modifying files outside the set
> > > > of files modified on the rebased side is known to not be PATCHSAME
> > > > without looking at those new files.
> > >
> > > That's true - and this would drastically reduce the fetches necessary in
> > > partial clone, perhaps enough that we no longer need this check.
> > >
> > > In the absence of partial clone, this also might improve performance
> > > sufficiently, such that we no longer need my new option. (Or it might
> > > not.)
> >
> > I took a further look at this. patch-ids.c and its caller
> > (cherry_pick_list() in revision.c) implement duplicate checking by first
> > generating full diff outputs for the commits in the shorter side,
> > putting them in a hashmap keyed by the SHA-1 of the diff output (and
> > values being the commit itself), and then generating full diff outputs
> > for the commits in the longer side and checking them against the
> > hashmap. When processing the shorter side, we could also generate
> > filename-only diffs and put their hashes into a hashset; so when
> > processing the longer side, we could generate the filename-only diff
> > first (without reading any blobs) and checking them against our new
> > hashset, and only if it appears in our new hashset, then do we generate
> > the full diff (thus reading blobs).
> >
> > One issue with this is unpredictability to the user (since which blobs
> > get read depend on which side is longer), but that seems resolvable by
> > not doing any length checks but always reading the blobs on the right
> > side (that is, the non-upstream side).
> >
> > So I would say that yes, the cherry-pick detection is suboptimal and
> > could be improved.
>
> Sweet, thanks for doing this investigative work!  Sounds promising.
>
> > So the question is...what to do with my patch? An
> > argument could be made that my patch should be dropped because an
> > improvement in cherry-pick detection would eliminate the need for the
> > option I'm introducing in my patch, but after some thought, I think that
> > this option will still be useful even with cherry-pick detection.
>
> The option may be totally justified here, but can I go on a possible
> tangent and vent for a little bit?  We seem to introduce options an
> awful lot.  While options can be valuable they also have a cost, and I
> think we tend not to acknowledge that.  Some of the negatives:
>
> - Developers add options when they run into bugs instead of fixing
> bugs (usually they don't realize that the behavior in question is a
> bug, but that's exacerbated by a willingness to add options and not
> consider costs)
> - Developers add options without considering combinations of options
> and what they mean (though it's hard to fault them because considering
> combinations becomes harder and harder with the more options we have;
> it's a negative feedback cycle)
> - Growth in number of options leads to code that is hard or impossible
> to refactor based on a maze of competing options with myriad edge and
> corner cases that are fundamentally broken
> - Users get overloaded by the sheer number of options and minor distinctions
>
> The fourth case is probably obvious, so let me just include some
> examples of the first three cases above:
> * Commits b00bf1c9a8 ("git-rebase: make --allow-empty-message the
> default", 2018-06-27), 22a69fda19 ("git-rebase.txt: update description
> of --allow-empty-message", 2020-01-16), and d48e5e21da ("rebase
> (interactive-backend): make --keep-empty the default", 2020-02-15)
> noted that options that previously existed were just workarounds to
> buggy behavior and the flags should have been always on.
> * Commit e86bbcf987 ("clean: disambiguate the definition of -d",
> 2019-09-17) showed a pretty hairy case where the combination of
> options led to cases where I not only didn't know how to implement
> correct behavior, I didn't even know how to state what the desired
> behavior for end-users was.  Despite a few different reports over a
> year and a half that I had a series that fixed some known issues for
> users the series languished because I couldn't get an answer on what
> was right.  See also
> https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/
> * See the huge "Behavioral differences" section of the git-rebase
> manpage, and a combination of rants from me on dir.c:
>   - https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/
>   - https://lore.kernel.org/git/CABPp-BFG3FkTkC=L1v97LUksndkOmCN8ZhNJh5eoNdquE7v9DA@mail.gmail.com/
>   - https://lore.kernel.org/git/pull.676.v3.git.git.1576571586.gitgitgadget@gmail.com/
>   - The commit message of
> https://lore.kernel.org/git/d3136ef52f3306d465a5a6004cdc9ba5b1ae4148.1580495486.git.gitgitgadget@gmail.com/
>
> >  If we
> > move in a direction where not only blobs but also trees (or even
> > commits) are omitted, we'll definitely want this new option.
>
> Why would this new option be needed if we omitted trees?   If trees
> are omitted based on something like sparse-checkouts, then they are
> omitted based on path; shouldn't we be able to avoid walking trees
> just by noting they modified some path outside a requested sparse
> checkout?
>
> I want grep, log, etc. to behave within the cone of a sparse checkout,
> which means that I need trees of upstream branches within the relevant
> paths anyway.  But theoretically I should certainly be able to avoid
> walking trees outside those paths.
>
> >  And even if
> > a user is not using partial clone at all, I think it is still useful to
> > suppress both the filtering of commits (e.g. when upstream has a commit
> > then revert, it would be reasonable to cherry-pick the same commit on
> > top) and reduce disk reads (although I don't know if this will be the
> > case in practice).
>
> That sounds like yet another argument that the behavior you're arguing
> for should be the default, not a flag we make the users pick to
> workaround bugs.  Yes, sometimes weird behaviors beget usecases (cue
> link to xkcd's comic on emacs spacebar overheating) and we need to
> provide transition plans, but I think this might be a case where
> transitioning makes sense.  From a high level, here's my guess
> (emphasis on guess) at the history:
>
> * am checked for upstream patches, because apply would get confused
> trying to apply an already applied patch
> * legacy-interactive-rebase would check for upstream patches as a
> performance optimization because having to shell out to a separate
> cherry-pick process for each commit is slow (and may have also been
> done partially to match am, even though am did it as a workaround)

Or maybe that wasn't the reasoning?  I'm having a hard time parsing the
history to verify:

a6ec3c1599 ("git-rebase: Use --ignore-if-in-upstream option when
            executing git-format-patch.", 2006-10-03)
  '''This reduces the number of conflicts when rebasing after a series of
     patches to the same piece of code is committed upstream.'''

96ffe892e3 ("rebase -i: ignore patches that are already in the
            upstream", 2007-08-01)
  '''Non-interactive rebase had this from the beginning -- match it by
     using --cherry-pick option to rev-list.'''

and related: 1e0dacdbdb ("rebase: omit patch-identical commits with
    --fork-point", 2014-07-16)

> And now we're in the state where:
> * The check-for-upstream bits hurt performance, significantly enough
> that we have three different reports of folks not liking it (you, me,
> and Taylor)
> * It actively does the wrong thing in cases such as revert + re-apply
> sequences, which exist in practice (and exist a lot more than they
> should, but they absolutely do exist)

Though these are definitely still problems with
check-if-upstream-already behavior.

> We've made changes in other places (e.g. opening an editor for merge
> or rebase, push.default, etc.); is there any reason a similar change
> wouldn't be justified here?

After another day of thought, and my attempt to figure out the reason
above, perhaps my assumptions about the reason behind the original
behavior, any my assumptions about the sanity of switching the default
might not be as grounded as I thought.  Thus, my worries about yet
another flag may be overstated as well, at least in this case.

  reply	other threads:[~2020-03-15  3:22 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-09 20:55 [PATCH] rebase --merge: optionally skip upstreamed commits Jonathan Tan
2020-03-10  2:10 ` Taylor Blau
2020-03-10 15:51   ` Jonathan Tan
2020-03-10 12:17 ` Johannes Schindelin
2020-03-10 16:00   ` Jonathan Tan
2020-03-10 18:56 ` Elijah Newren
2020-03-10 22:56   ` Jonathan Tan
2020-03-12 18:04     ` Jonathan Tan
2020-03-12 22:40       ` Elijah Newren
2020-03-14  8:04         ` Elijah Newren [this message]
2020-03-17  3:03           ` Jonathan Tan
2020-03-18 17:30 ` [PATCH v2] " Jonathan Tan
2020-03-18 18:47   ` Junio C Hamano
2020-03-18 19:28     ` Jonathan Tan
2020-03-18 19:55       ` Junio C Hamano
2020-03-18 20:41         ` Elijah Newren
2020-03-18 23:39           ` Junio C Hamano
2020-03-19  0:17             ` Elijah Newren
2020-03-18 20:20   ` Junio C Hamano
2020-03-26 17:50   ` Jonathan Tan
2020-03-26 19:17     ` Elijah Newren
2020-03-26 19:27     ` Junio C Hamano
2020-03-29 10:12   ` [PATCH v2 4/4] t3402: use POSIX compliant regex(7) Đoàn Trần Công Danh
2020-03-30  4:06 ` [PATCH v3] rebase --merge: optionally skip upstreamed commits Jonathan Tan
2020-03-30  5:09   ` Junio C Hamano
2020-03-30  5:22   ` Danh Doan
2020-03-30 12:13   ` Derrick Stolee
2020-03-30 16:49     ` Junio C Hamano
2020-03-30 16:57     ` Jonathan Tan
2020-03-31 11:55       ` Derrick Stolee
2020-03-31 16:27   ` Elijah Newren
2020-03-31 18:34     ` Junio C Hamano
2020-03-31 18:43       ` Junio C Hamano
2020-04-10 22:27     ` Jonathan Tan
2020-04-11  0:06       ` Elijah Newren
2020-04-11  1:11         ` Jonathan Tan
2020-04-11  2:46           ` Elijah Newren
2020-03-26  7:35 [PATCH 0/3] add travis job for linux with musl libc Đoàn Trần Công Danh
2020-03-26  7:35 ` [PATCH 1/3] ci: libify logic for usage and checking CI_USER Đoàn Trần Công Danh
2020-03-26  7:35 ` [PATCH 2/3] ci: refactor docker runner script Đoàn Trần Công Danh
2020-03-26 16:06   ` Eric Sunshine
2020-03-28 17:53   ` SZEDER Gábor
2020-03-29  6:36     ` Danh Doan
2020-03-26  7:35 ` [PATCH 3/3] travis: build and test on Linux with musl libc and busybox Đoàn Trần Công Danh
2020-03-29  5:49 ` [PATCH 0/3] add travis job for linux with musl libc Junio C Hamano
2020-03-29 10:12 ` [PATCH v2 0/4] Travis + Azure jobs " Đoàn Trần Công Danh
2020-03-29 10:12   ` [PATCH v2 1/4] ci: libify logic for usage and checking CI_USER Đoàn Trần Công Danh
2020-03-29 10:12   ` [PATCH v2 2/4] ci: refactor docker runner script Đoàn Trần Công Danh
2020-04-01 21:51     ` SZEDER Gábor
2020-03-29 10:12   ` [PATCH v2 3/4] travis: build and test on Linux with musl libc and busybox Đoàn Trần Công Danh
2020-04-01 22:18     ` SZEDER Gábor
2020-04-02  1:42       ` Danh Doan
2020-04-07 14:53       ` Johannes Schindelin
2020-04-07 21:35         ` Junio C Hamano
2020-04-10 13:38           ` Johannes Schindelin
2020-03-29 16:23   ` [PATCH v2 0/4] Travis + Azure jobs for linux with musl libc Junio C Hamano
2020-04-02 13:03   ` [PATCH v3 0/6] " Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 1/6] ci: make MAKEFLAGS available inside the Docker container in the Linux32 job Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 2/6] ci/lib-docker: preserve required environment variables Đoàn Trần Công Danh
2020-04-03  8:22       ` SZEDER Gábor
2020-04-03 10:09         ` Danh Doan
2020-04-03 19:55           ` SZEDER Gábor
2020-04-02 13:04     ` [PATCH v3 3/6] ci/linux32: parameterise command to switch arch Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 4/6] ci: refactor docker runner script Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 5/6] ci/linux32: libify install-dependencies step Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 6/6] travis: build and test on Linux with musl libc and busybox Đoàn Trần Công Danh
2020-04-02 17:53     ` [PATCH v3 0/6] Travis + Azure jobs for linux with musl libc Junio C Hamano
2020-04-03  0:23       ` Danh Doan
2020-04-04  1:08   ` [PATCH v4 0/6] Travis " Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 1/6] ci: make MAKEFLAGS available inside the Docker container in the Linux32 job Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 2/6] ci/lib-docker: preserve required environment variables Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 3/6] ci/linux32: parameterise command to switch arch Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 4/6] ci: refactor docker runner script Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 5/6] ci/linux32: libify install-dependencies step Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 6/6] travis: build and test on Linux with musl libc and busybox Đoàn Trần Công Danh
2020-04-05 20:39     ` [PATCH v4 0/6] Travis jobs for linux with musl libc Junio C Hamano
2020-04-07 14:55       ` Johannes Schindelin
2020-04-07 19:25         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BGcQHT5PkqYw3YRWOv8kbiSumJ+YQ3owkd_7viKWs3q7A@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).