git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Lubomir Rintel <lkundrak@v3.sk>
Cc: Jonathan Nieder <jrnieder@gmail.com>, git@vger.kernel.org
Subject: Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
Date: Wed, 22 Apr 2020 05:57:02 -0400
Message-ID: <20200422095702.GA475060@coredump.intra.peff.net>
In-Reply-To: <20200422084254.GA27502@furthur.local>

[-- Attachment #1: Type: text/plain, Size: 2366 bytes --]

On Wed, Apr 22, 2020 at 10:42:54AM +0200, Lubomir Rintel wrote:

> my git repository with Linux grows several gigabytes each time I fetch:

Thanks for this report. We've been tracking the issue but have had
trouble reproducing it.

To get you unstuck, the immediate workaround is to drop back to the
older protocol, like:

  git -c protocol.version=0 fetch --all
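The `-c` option only applies to that single command. If you'd rather not type it on every fetch, you can persist the same setting for one repository. The sketch below assumes a per-repository setting is what you want. The throwaway repository exists only so the snippet runs standalone; in practice, run the `config` command inside your Linux clone. `git config --unset protocol.version` undoes it later.

```shell
set -e

# Illustration only: a throwaway repository standing in for your clone.
repo=$(mktemp -d)
git init -q "$repo"

# Persist the workaround for this repository; equivalent to passing
# "-c protocol.version=0" to every fetch run inside it.
git -C "$repo" config protocol.version 0

# Show the value fetch will now pick up.
git -C "$repo" config --get protocol.version
```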

>   [lkundrak@furthur linux]$ git fetch --all

Here's a recipe based on your fetches that shows the problem.

  # start with an up-to-date regular clone of linus's tree; I had one
  # lying around from https://github.com/torvalds/linux, but the source
  # shouldn't matter
  rm -rf repo.git
  git clone --bare /path/to/linux repo.git
  cd repo.git

  git remote add next git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next
  git remote add xo git@github.com:hackerspace/olpc-xo175-linux
  git fetch --all

The "next" fetch grabs about 30MB of objects. But the xo one downloads
1.5GB from 7.4M objects. That's using v2.26.2, which defaults to protocol v2.

If I switch to the v0 protocol like:

  git -c protocol.version=0 fetch --all

then the xo fetch is only 48k objects, 23MB. So this is definitely
exhibiting the problem.
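To put "many times more" into numbers, here is a quick back-of-the-envelope calculation from the two xo fetches above. It treats 1.5GB as 1500MB, which is an approximation. Shell arithmetic truncates to integers.

```shell
# Figures from the v2 and v0 xo fetches; 1.5GB approximated as 1500MB.
v2_objects=7400000
v0_objects=48000
v2_mb=1500
v0_mb=23

# Integer ratios (shell arithmetic truncates toward zero).
echo "objects: $(( v2_objects / v0_objects ))x"   # prints "objects: 154x"
echo "bytes:   $(( v2_mb / v0_mb ))x"             # prints "bytes:   65x"
```

So the v2 fetch moved roughly 150 times as many objects and about 65 times as many bytes as the v0 one.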

There are a few data points we've been wanting to collect:

 - does setting fetch.negotiationAlgorithm=skipping help? Yes, but not
   as much as the v0 protocol does. It sends 84k objects, 33MB.

 - does the same fetch over v0 stateless-http have similar problems? No,
   swapping out the second "remote add" for:

     git remote add xo https://github.com/hackerspace/olpc-xo175-linux

   results in the same 48k, 32MB fetch. The v0 conversation involved 10
   POST requests. The v2 conversation only took 6 (and generates the
   same big response as the ssh session, unsurprisingly).
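One way to reproduce request counts like these is to trace the HTTP fetch to a file and count the POST request lines. This sketch assumes that `GIT_TRACE_CURL` output marks each outgoing request header with `=> Send header: `. Current git versions do this, but the trace format is not a stable interface. The helper name `count_posts` is hypothetical.

```shell
# Hypothetical helper: count HTTP POST requests in a GIT_TRACE_CURL log.
# Assumes request lines look like "=> Send header: POST /path HTTP/1.1";
# the initial GET of info/refs is deliberately not counted.
count_posts() {
	grep -c '=> Send header: POST ' "$1"
}

# Usage (needs network access and the https "xo" remote):
#   GIT_TRACE_CURL=/tmp/v0.curl git -c protocol.version=0 fetch xo
#   GIT_TRACE_CURL=/tmp/v2.curl git -c protocol.version=2 fetch xo
#   count_posts /tmp/v0.curl
#   count_posts /tmp/v2.curl
```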

So it really does seem like something in v2 is not trying as hard to
negotiate as v0 did, even when using stateless-http.

I'm attaching for-each-ref output before and after the xo fetch. That
should be sufficient to recreate the situation synthetically even once
these repos have moved on.
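You can capture the same kind of before/after snapshot on another repository with plain `git for-each-ref`. The sketch below shows how. The throwaway repository and its empty commit are assumptions that exist only so the snippet runs standalone, and the file names simply mirror the attachments.

```shell
set -e

# Illustration only: a throwaway repository with a single empty commit.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git -c user.name=demo -c user.email=demo@example.com \
	commit -q --allow-empty -m init

# Snapshot every ref (object name, object type, refname) before the fetch.
git for-each-ref >refs.before

# ... run the problematic fetch here, e.g. "git fetch xo" ...

git for-each-ref >refs.after

# Show what the fetch changed, then compress like the attachments.
diff refs.before refs.after || true
gzip -f refs.before refs.after
```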

I have GIT_TRACE_PACKET output showing the whole negotiation, but it's
pretty hard to look at. I _think_ a lot more is said in the v0
conversation, but it's difficult to sort out because there's a lot of
extra packet framing as we shuttle bits back and forth between
remote-curl and fetch-pack.
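A rough way to test that hunch without reading the whole trace is to count the distinct `have` lines the client sends in each conversation. This sketch makes two assumptions about the `GIT_TRACE_PACKET` format. First, each traced line carries a direction marker followed by the pkt-line payload (e.g. `fetch> have <oid>`). Second, the name before `>` varies between `fetch`, `fetch-pack`, and the remote helper. Only client-written (`>`) lines are matched, so packets that remote-curl merely reads back (`<`) are ignored. Stateless HTTP rounds may resend earlier haves, so object IDs are de-duplicated. The helper name `count_haves` is hypothetical.

```shell
# Hypothetical helper: count distinct object IDs offered as "have" by the
# client in a GIT_TRACE_PACKET log. Matches only written ("> have") lines
# and de-duplicates IDs resent across stateless rounds.
count_haves() {
	grep -oE '> have [0-9a-f]{40}' "$1" | sort -u | wc -l | tr -d ' '
}

# Usage (needs network access):
#   GIT_TRACE_PACKET=/tmp/v0.trace git -c protocol.version=0 fetch xo
#   GIT_TRACE_PACKET=/tmp/v2.trace git -c protocol.version=2 fetch xo
#   count_haves /tmp/v0.trace
#   count_haves /tmp/v2.trace
```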

-Peff

[-- Attachment #2: refs.before.gz --]
[-- Type: application/gzip, Size: 18588 bytes --]

[-- Attachment #3: refs.after.gz --]
[-- Type: application/gzip, Size: 23471 bytes --]

Thread overview: 18+ messages
2020-04-22  8:42 Git 2.26 fetches many times more objects than it should, wasting gigabytes Lubomir Rintel
2020-04-22  9:57 ` Jeff King [this message]
2020-04-22 10:30   ` Jeff King
2020-04-22 10:40     ` Jeff King
2020-04-22 15:33       ` Junio C Hamano
2020-04-22 19:33         ` Jeff King
2020-04-23 21:37       ` Jonathan Tan
2020-04-23 21:54         ` Junio C Hamano
2020-04-24  5:32         ` Jeff King
2020-04-22 15:40   ` Jonathan Nieder
2020-04-22 19:36     ` Jeff King
2020-04-22 15:50   ` [PATCH] Revert "fetch: default to protocol version 2" Jonathan Nieder
2020-04-22 18:23     ` Junio C Hamano
2020-04-22 19:40     ` Jeff King
2020-04-22 19:47       ` Jeff King
2020-04-22 16:53   ` Git 2.26 fetches many times more objects than it should, wasting gigabytes Jonathan Nieder
2020-04-22 17:32     ` Junio C Hamano
2020-04-22 19:18     ` Jeff King
