git.vger.kernel.org archive mirror
From: Jonathan Nieder <jrnieder@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <stolee@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Minh Thai <mthai@google.com>
Subject: Re: Huge push upload despite only having a tiny change
Date: Tue, 2 Jun 2020 18:53:14 -0700
Message-ID: <20200603015314.GA253041@google.com>
In-Reply-To: <CABPp-BEswHLhymQ1_07g3qqu=7kFR3eQyAHR0qMgSvi6THy=zQ@mail.gmail.com>

(cc: Jonathan Tan for "git push" discussion; Minh Thai for negotiate
 hook discussion)

Hi,

Elijah Newren wrote:

> I had a user report that two nearly identical pushes (the second being
> an amended commit of the first) took dramatically differing amounts of
> time and amount of data uploaded (from 4.5 seconds and about 21k
> uploaded, to 223 seconds and over 100 MB uploaded).

Yes, this is why I want push negotiation.  (It's been something we've
been discussing for protocol v2 for push.)

If they fetch before they push, does that help?

[...]
> * The server was running Gerrit 3.1.4 (i.e. jgit).

Gerrit servers have the interesting property that many people are
pushing to the same Git repo.  (This is common in some other hosting
scenarios such as GitLab, but the most common case among Git users
still seems to be pushing to a repo you own.)

When you push, because there's no negotiation phase, the only
information we have about what is present on the server is what is in
the ref advertisement.  (We have remote-tracking branches which seem
potentially useful, but we don't have a way to ask the server "are
these objects you still have?")  The ref advertisement describes the
*current* state of all refs.  If I am pushing a new topic branch (in
Gerrit jargon, a new change for review) based on the *old* state of a
branch that has moved on, then we can only hope that some other ref
(for example a tag) points to a recent enough state to give us a base
for what to upload.
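
To make that concrete, here is a toy model in Python (not Git's actual
code; the graph, commit names and helper functions are all made up for
illustration).  The client can only use an advertised tip as a base if
it already has that object locally:

```python
# Toy model: commit id -> list of parent ids. All names are hypothetical.
def reachable(graph, tips):
    """Return every commit reachable from tips by following parents."""
    seen = set()
    stack = [t for t in tips if t in graph]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph[commit])
    return seen


def commits_to_upload(local_graph, advertised_tips, new_tip):
    """Commits the client sends when all it knows is the advertised tips."""
    # Advertised tips the client does not have locally are useless as a base.
    known = [t for t in advertised_tips if t in local_graph]
    return reachable(local_graph, [new_tip]) - reachable(local_graph, known)


# Local history: A <- B <- C (stale copy of the branch), topic T on top of C.
local = {"A": [], "B": ["A"], "C": ["B"], "T": ["C"]}

# The server branch has moved on to E, which the client has never fetched.
stale = commits_to_upload(local, ["E"], "T")

# After a fetch the client has D and E, so E becomes a usable base.
fetched = dict(local, D=["C"], E=["D"])
fresh = commits_to_upload(fetched, ["E"], "T")

print(sorted(stale), sorted(fresh))
```

In the stale case every commit behind T gets uploaded, because nothing
in the advertisement is recognizable to the client; after the fetch
only T is sent.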

There is one trick a server can use to mitigate this: advertise some
refs that don't exist!  If you advertise a ref ".have", then Git
will understand that the server has that object but it is not an
actual ref.  Gerrit uses this trick in its HackPushNegotiateHook[1]
to advertise a few recent commits.
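
A rough sketch of what such an advertisement looks like, in Python.
This is only an illustration: it leaves out pkt-line framing and
capabilities, the object ids are made up, and advertisement_lines is a
hypothetical helper, not anything from Gerrit or Git:

```python
def advertisement_lines(refs, extra_haves):
    """Build '<oid> <name>' lines; extra objects use the name '.have'."""
    lines = [f"{oid} {name}" for name, oid in sorted(refs.items())]
    already_listed = set(refs.values())
    for oid in extra_haves:
        # No point repeating an object the advertisement already covers.
        if oid not in already_listed:
            lines.append(f"{oid} .have")
            already_listed.add(oid)
    return lines


# Made-up object ids, for illustration only.
tip = "a" * 40
older = "b" * 40

adv = advertisement_lines({"refs/heads/master": tip}, [older, tip, older])
for line in adv:
    print(line)
```

The client treats the ".have" entry as "the server has this object"
without it showing up as a ref it could fetch or update.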

At $DAYJOB we ran into some clients where "a few recent commits" was
not sufficient to get to history that the client is aware of.  We
tried changing it to do some exponential deepening, and that helped.
We should probably upstream that change for other Gerrit users.
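
One plausible reading of "exponential deepening", sketched in Python.
This is an assumption about the general idea, not the actual change:
the function name, the limit parameter and the choice of walking a
single first-parent chain are all hypothetical.

```python
def exponential_haves(chain, limit=32):
    """Pick commits at distances 0, 1, 2, 4, 8, ... from the tip.

    chain is a first-parent history, newest commit first.
    """
    picks = []
    if chain:
        picks.append(chain[0])
    step = 1
    while step < len(chain) and len(picks) < limit:
        picks.append(chain[step])
        step *= 2  # double the distance each time
    return picks


# 100 made-up commits, c0 being the branch tip.
history = [f"c{i}" for i in range(100)]
haves = exponential_haves(history)
print(haves)
```

With doubling gaps a handful of ".have" lines reaches far back into
history, so a client with an old base has a better chance of
recognizing one of them, at the cost of a coarser common base the
further back it is.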

Gerrit also advertises some other ".have"s, for example for recent
changes by the same author in case you're uploading an amended
version.  That's less relevant here.

But fundamentally, this is something that cannot be addressed properly
without improving the "git push" protocol (adding a negotiation
phase).

Summary: (1) try fetching first, (2) let's improve
HackPushNegotiateHook#advertiseRefs, and (3) let's improve the "git push"
protocol to make this a problem of the past.

Thanks and hope that helps,
Jonathan

[1] https://gerrit.googlesource.com/gerrit/+/e1f4fee1f3ce674f44cb9788e6798ff8522bb876/java/com/google/gerrit/server/git/receive/HackPushNegotiateHook.java#111

