From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Mike Hommey <mh@glandium.org>
Cc: git@vger.kernel.org
Subject: Re: fast-import slowness when importing large files with small differences
Date: Sat, 30 Jun 2018 00:10:24 +0200
Message-ID: <87o9ftckhb.fsf@evledraar.gmail.com>
In-Reply-To: <20180629094413.bgltep6ntlza6vhz@glandium.org>
On Fri, Jun 29 2018, Mike Hommey wrote:
> I noticed some slowness when fast-importing data from the Firefox mercurial
> repository, where fast-import spends more than 5 minutes importing ~2000
> revisions of one particular file. I reduced a testcase while still
> using real data. One could synthesize data with kind of the same
> properties, but I figured real data could be useful.
>
> To reproduce:
> $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo
> $ git init bar
> $ cd bar
> $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000
>
> [...]
> So maybe it would make sense to consolidate the diff code (after all,
> diff-delta.c is an old specialized fork of xdiff). With manual trimming
> of common head and tail, this gets down to 3:33.
>
> I'll also note that Facebook has imported xdiff from the git code base
> into mercurial and improved performance on it, so it might also be worth
> looking at what's worth taking from there.
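For readers unfamiliar with the "manual trimming of common head and tail"
mentioned above, here is a minimal sketch of the general idea, not the
actual change that was timed. The function name trim_common is made up
for illustration: it strips the bytes two versions share at the start and
at the end, so that only the differing middle needs to go to the delta
code.

```python
def trim_common(a: bytes, b: bytes):
    """Strip the longest common prefix and suffix of two byte strings.

    Returns (head, tail, a_middle, b_middle), where head and tail are the
    lengths of the shared prefix and suffix. Hypothetical helper, not
    taken from git's diff-delta.c or xdiff.
    """
    limit = min(len(a), len(b))

    # Length of the common prefix.
    head = 0
    while head < limit and a[head] == b[head]:
        head += 1

    # Length of the common suffix, never overlapping the prefix.
    tail = 0
    while tail < limit - head and a[len(a) - 1 - tail] == b[len(b) - 1 - tail]:
        tail += 1

    return head, tail, a[head:len(a) - tail], b[head:len(b) - tail]
```

For two large files that differ only in a small region, the two middle
slices are tiny compared to the inputs, which is presumably where the
time saving comes from.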
It would be interesting to see how this compares with a more naïve
approach of committing every version of this file one-at-a-time into a
new repository (with & without gc.auto=0). Perhaps deltaing as we go is
suboptimal compared to just writing out a lot of redundant data and
repacking it all at once later.
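A rough sketch of that experiment, under some assumptions: the versions
are available as a list of byte strings, the helper names (git_argv,
commit_each_version) are made up for this example, and the repack depth
simply mirrors the --depth=2000 from the reproduction recipe. It commits
each version in turn, optionally with gc.auto=0, then repacks once and
reports the wall-clock time of both phases.

```python
import subprocess
import time
from pathlib import Path


def git_argv(repo, *args, config=()):
    """Build a git command line for `repo`, with optional -c name=value pairs."""
    argv = ["git", "-C", str(repo)]
    for item in config:
        argv += ["-c", item]
    return argv + list(args)


def commit_each_version(repo, versions, filename="file", disable_auto_gc=False):
    """Commit every version one at a time, then repack once at the end.

    Returns (seconds spent committing, seconds spent repacking).
    """
    repo = Path(repo)
    subprocess.run(["git", "init", "-q", str(repo)], check=True)

    # A throwaway identity so the commits work without global configuration.
    config = ["user.name=bench", "user.email=bench@example.invalid"]
    if disable_auto_gc:
        # Keep "git commit" from triggering "git gc --auto" along the way.
        config.append("gc.auto=0")

    start = time.perf_counter()
    for i, data in enumerate(versions):
        (repo / filename).write_bytes(data)
        subprocess.run(git_argv(repo, "add", filename, config=config), check=True)
        # --allow-empty avoids a failure if two consecutive versions are identical.
        subprocess.run(
            git_argv(repo, "commit", "-q", "--allow-empty", "-m", f"version {i}",
                     config=config),
            check=True,
        )
    committed = time.perf_counter()

    # Delta everything in one go, recomputing deltas from scratch (-f).
    subprocess.run(
        git_argv(repo, "repack", "-a", "-d", "-f", "-q", "--depth=2000"),
        check=True,
    )
    repacked = time.perf_counter()
    return committed - start, repacked - committed
```

Running commit_each_version() twice on fresh directories, once with
disable_auto_gc=True and once without, gives the two numbers to compare
against the fast-import timing.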