* fast-import slowness when importing large files with small differences
@ 2018-06-29  9:44 Mike Hommey
  2018-06-29 20:14 ` Stefan Beller
  2018-06-29 22:10 ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 17+ messages in thread

From: Mike Hommey @ 2018-06-29  9:44 UTC (permalink / raw)
To: git

Hi,

I noticed some slowness when fast-importing data from the Firefox mercurial
repository, where fast-import spends more than 5 minutes importing ~2000
revisions of one particular file. I reduced a testcase while still
using real data. One could synthesize data with kind of the same
properties, but I figured real data could be useful.

To reproduce:
$ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo
$ git init bar
$ cd bar
$ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000

(--depth=2000 to minimize the pack size)

The python script doesn't have much overhead:
$ time python ../foo/import.py ../foo/data.gz > /dev/null
real    0m14.564s
user    0m9.813s
sys     0m4.703s

It generates about 26GB of data from that 4.2MB data.gz.
$ python ../foo/import.py ../foo/data.gz | time git fast-import --depth=2000
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:       5000
Total objects:         1868 (       133 duplicates                  )
      blobs  :         1868 (       133 duplicates       1867 deltas of       1868 attempts)
      trees  :            0 (         0 duplicates          0 deltas of          0 attempts)
      commits:            0 (         0 duplicates          0 deltas of          0 attempts)
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:           0 (         0 loads     )
      marks:           1024 (         0 unique    )
      atoms:              0
Memory total:          2282 KiB
       pools:          2048 KiB
     objects:           234 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 35184372088832
pack_report: pack_used_ctr            =          0
pack_report: pack_mmap_calls          =          0
pack_report: pack_open_windows        =          0 /          0
pack_report: pack_mapped              =          0 /          0
---------------------------------------------------------------------
321.61user 6.60system 5:50.08elapsed 93%CPU (0avgtext+0avgdata 83192maxresident)k
0inputs+10568outputs (0major+38689minor)pagefaults 0swaps

(The resulting pack is 5.3MB, fwiw)

Obviously, sha1'ing 26GB is not going to be free, but it's also not the
dominating cost, according to perf:

 63.52%  git-fast-import  git-fast-import  [.] create_delta_index
 17.46%  git-fast-import  git-fast-import  [.] sha1_compression_states
  9.89%  git-fast-import  git-fast-import  [.] ubc_check
  6.23%  git-fast-import  git-fast-import  [.] create_delta
  2.49%  git-fast-import  git-fast-import  [.] sha1_process

That's a whole lot of time spent on create_delta_index.

FWIW, if delta were 100% free (yes, I tested that), the fast-import would
take 1:40 with the following profile:

 58.74%  git-fast-import  git-fast-import  [.] sha1_compression_states
 32.45%  git-fast-import  git-fast-import  [.] ubc_check
  8.25%  git-fast-import  git-fast-import  [.] sha1_process

I toyed with the idea of eliminating the common head and tail before
creating the delta, and got some promising results: a fast-import taking
3:22 instead of 5:50, with the following profile:

 34.67%  git-fast-import  git-fast-import  [.] create_delta_index
 30.88%  git-fast-import  git-fast-import  [.] sha1_compression_states
 17.15%  git-fast-import  git-fast-import  [.] ubc_check
  7.25%  git-fast-import  git-fast-import  [.] store_object
  4.47%  git-fast-import  git-fast-import  [.] sha1_process
  2.72%  git-fast-import  git-fast-import  [.] create_delta2

The resulting pack is, however, much larger (for some reason, many objects
are left non-deltified), and the deltas are partly broken (they don't apply
cleanly), but that just shows the code is not ready to be sent. I don't
expect working code to be much slower than this. The remaining question is
whether this is beneficial for more normal cases.

I also seem to remember, from testing a while ago, that xdiff somehow
handles those files faster than diff-delta, and I'm wondering if it would
make sense to make the pack code use xdiff. So I tested replacing
diff_delta with a call to xdi_diff_outf with a callback that does nothing,
and with zeroed-out xpparam_t and xdemitconf_t (not sure that's best,
though; I haven't looked very deeply), and that finished in 5:15 with the
following profile (without common head trimming; xdiff-interface apparently
does common tail trimming already):

 32.99%  git-fast-import  git-fast-import  [.] xdl_prepare_ctx.isra.0
 20.42%  git-fast-import  git-fast-import  [.] sha1_compression_states
 15.26%  git-fast-import  git-fast-import  [.] xdl_hash_record
 11.65%  git-fast-import  git-fast-import  [.] ubc_check
  3.09%  git-fast-import  git-fast-import  [.] xdl_recs_cmp
  3.03%  git-fast-import  git-fast-import  [.] sha1_process
  2.91%  git-fast-import  git-fast-import  [.] xdl_prepare_env

So maybe it would make sense to consolidate the diff code (after all,
diff-delta.c is an old specialized fork of xdiff).
With manual trimming of common head and tail, this gets down to 3:33.

I'll also note that Facebook has imported xdiff from the git code base
into mercurial and improved performance on it, so it might also be worth
looking at what's worth taking from there.

Cheers,
Mike

^ permalink raw reply	[flat|nested] 17+ messages in thread
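The common head/tail trimming Mike describes can be sketched as follows (a
hypothetical helper in Python, not the code he actually tested; the idea is
simply that the expensive delta/index step then only sees the differing
middle of the two buffers):

```python
def trim_common(a: bytes, b: bytes):
    """Split off the common head and tail of two buffers, so that an
    expensive delta/diff routine only needs to look at the middles."""
    # Common head: advance while the bytes match.
    head = 0
    limit = min(len(a), len(b))
    while head < limit and a[head] == b[head]:
        head += 1
    # Common tail: advance from the end, never overlapping the head.
    tail = 0
    while tail < limit - head and a[-1 - tail] == b[-1 - tail]:
        tail += 1
    mid_a = a[head:len(a) - tail]
    mid_b = b[head:len(b) - tail]
    return head, tail, mid_a, mid_b
```

For revisions of a large file with small differences, `mid_a` and `mid_b`
are tiny compared to the inputs, which is why trimming cuts so much time off
create_delta_index.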
* Re: fast-import slowness when importing large files with small differences
  2018-06-29  9:44 fast-import slowness when importing large files with small differences Mike Hommey
@ 2018-06-29 20:14 ` Stefan Beller
  2018-06-29 20:28   ` [PATCH] xdiff: reduce indent heuristic overhead Stefan Beller
  2018-06-29 20:39   ` fast-import slowness when importing large files with small differences Jeff King
  2018-06-29 22:10 ` Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 17+ messages in thread

From: Stefan Beller @ 2018-06-29 20:14 UTC (permalink / raw)
To: Mike Hommey, Jameson Miller; +Cc: git

On Fri, Jun 29, 2018 at 3:18 AM Mike Hommey <mh@glandium.org> wrote:
>
> Hi,
>
> I noticed some slowness when fast-importing data from the Firefox mercurial
> repository, where fast-import spends more than 5 minutes importing ~2000
> revisions of one particular file. I reduced a testcase while still
> using real data. One could synthesize data with kind of the same
> properties, but I figured real data could be useful.

I cc'd Jameson, who refactored memory allocation in fast-import recently.
(I am not aware of other refactorings in the area of fast-import.)

> To reproduce:
[...]
> Memory total:          2282 KiB
>        pools:          2048 KiB
>      objects:           234 KiB
[...]

> Obviously, sha1'ing 26GB is not going to be free, but it's also not the
> dominating cost, according to perf:
>
>  63.52%  git-fast-import  git-fast-import  [.] create_delta_index

So this doesn't sound like a memory issue, but a diffing/deltaing issue.

> So maybe it would make sense to consolidate the diff code (after all,
> diff-delta.c is an old specialized fork of xdiff). With manual trimming
> of common head and tail, this gets down to 3:33.

This sounds interesting. I'd love to see that code unified.

> I'll also note that Facebook has imported xdiff from the git code base
> into mercurial and improved performance on it, so it might also be worth
> looking at what's worth taking from there.
So starting with https://www.mercurial-scm.org/repo/hg/rev/34e2ff1f9cd8
("xdiff: vendor xdiff library from git") they adapted it slightly:

$ hg log --template '{node|short} {desc|firstline}\n' -- mercurial/thirdparty/xdiff/
a2baa61bbb14 xdiff: move stdint.h to xdiff.h
d40b9e29c114 xdiff: fix a hard crash on Windows
651c80720eed xdiff: silence a 32-bit shift warning on Windows
d255744de97a xdiff: backport int64_t and uint64_t types to Windows
e5b14f5b8b94 xdiff: resolve signed unsigned comparison warning
f1ef0e53e628 xdiff: use int64 for hash table size
f0d9811dda8e xdiff: remove unused xpp and xecfg parameters
49fe6249937a xdiff: remove unused flags parameter
882657a9f768 xdiff: replace {unsigned ,}long with {u,}int64_t
0c7350656f93 xdiff: add comments for fields in xdfile_t
f33a87cf60cc xdiff: add a preprocessing step that trims files
3cf40112efb7 xdiff: remove xmerge related logic
90f8fe72446c xdiff: remove xemit related logic
b5bb0f99064d xdiff: remove unused structure, functions, and constants
09f320067591 xdiff: remove whitespace related feature
1f9bbd1d6b8a xdiff: fix builds on Windows
c420792217c8 xdiff: reduce indent heuristic overhead
b3c9c483cac9 xdiff: add a bdiff hunk mode
9e7b14caf67f xdiff: remove patience and histogram diff algorithms
34e2ff1f9cd8 xdiff: vendor xdiff library from git

Interesting pieces regarding performance:

c420792217c8 xdiff: reduce indent heuristic overhead
https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5

f33a87cf60cc xdiff: add a preprocessing step that trims files
https://phab.mercurial-scm.org/rHGf33a87cf60ccb8b46e06b85e60bc5031420707d6

I'll see if I can make that into patches.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [PATCH] xdiff: reduce indent heuristic overhead
  2018-06-29 20:14 ` Stefan Beller
@ 2018-06-29 20:28   ` Stefan Beller
  2018-06-29 21:17     ` Junio C Hamano
  2018-07-01 15:57     ` Michael Haggerty
  2018-06-29 20:39   ` fast-import slowness when importing large files with small differences Jeff King
  1 sibling, 2 replies; 17+ messages in thread

From: Stefan Beller @ 2018-06-29 20:28 UTC (permalink / raw)
To: sbeller; +Cc: mhagger, git, jamill, mh

This patch was written originally for mercurial at
https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5

changeset:   36674:c420792217c8
user:        Jun Wu <quark@fb.com>
date:        Sat Mar 03 12:39:11 2018 -0800
files:       mercurial/thirdparty/xdiff/xdiffi.c
description:
xdiff: reduce indent heuristic overhead

Adds some threshold to avoid expensive cases, like:

```
#!python
open('a', 'w').write(" \n" * 1000000)
open('b', 'w').write(" \n" * 1000001)
```

The indent heuristic is O(N * 20) (N = 1000000) for the above case, and
makes diff much slower.

Before this patch (system git 2.14.2):

```
git diff --no-indent-heuristic a b  0.21s user 0.03s system 100% cpu 0.239 total
git diff --indent-heuristic a b     0.77s user 0.02s system 99% cpu 0.785 total
```

After this patch (git 2fc74f41, with xdiffi.c patched):

```
# with the changed xdiffi.c
git diff --indent-heuristic a b     0.16s user 0.01s system 90% cpu 0.188 total
git diff --no-indent-heuristic a b  0.18s user 0.01s system 99% cpu 0.192 total
```

Now turning on indent-heuristic has no visible impact on performance.

Differential Revision: https://phab.mercurial-scm.org/D2601

Signed-off-by: Stefan Beller <sbeller@google.com>
---
This applies on our master branch; I have not thought of a good commit
message or whether we need to test it.
Thanks,
Stefan

 xdiff/xdiffi.c | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 0de1ef463bf..c74ec77da58 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -807,6 +807,14 @@ static void xdl_bug(const char *msg)
 	exit(1);
 }
 
+/*
+ * For indentation heuristic, skip searching for better slide position after
+ * checking MAX_BORING lines without finding an improvement. This defends the
+ * indentation heuristic logic against pathological cases. The value is not
+ * picked scientifically but should be good enough.
+ */
+#define MAX_BORING 100
+
 /*
  * Move back and forward change groups for a consistent and pretty diff output.
  * This also helps in finding joinable change groups and reducing the diff
@@ -903,19 +911,43 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			long shift, best_shift = -1;
 			struct split_score best_score;
 
-			for (shift = earliest_end; shift <= g.end; shift++) {
+			/*
+			 * This is O(N * MAX_BORING) (N = shift-able lines).
+			 * Even with MAX_BORING bounded to a small value, a
+			 * large N could still make this loop take several
+			 * times longer than the main diff algorithm. The
+			 * "boring" value is to help cut down N to something
+			 * like (MAX_BORING + groupsize).
+			 *
+			 * Scan from bottom to top. So we can exit the loop
+			 * without compromising the assumption "for a same best
+			 * score, pick the bottommost shift".
+			 */
+			int boring = 0;
+			for (shift = g.end; shift >= earliest_end; shift--) {
 				struct split_measurement m;
 				struct split_score score = {0, 0};
+				int cmp;
 
 				measure_split(xdf, shift, &m);
 				score_add_split(&m, &score);
 				measure_split(xdf, shift - groupsize, &m);
 				score_add_split(&m, &score);
-				if (best_shift == -1 ||
-				    score_cmp(&score, &best_score) <= 0) {
+
+				if (best_shift == -1) {
+					cmp = -1;
+				} else {
+					cmp = score_cmp(&score, &best_score);
+				}
+				if (cmp < 0) {
+					boring = 0;
 					best_score.effective_indent = score.effective_indent;
 					best_score.penalty = score.penalty;
 					best_shift = shift;
+				} else {
+					boring += 1;
+					if (boring >= MAX_BORING)
+						break;
 				}
 			}
-- 
2.18.0.399.gad0ab374a1-goog

^ permalink raw reply related	[flat|nested] 17+ messages in thread
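The scan-direction trick in the patch can be isolated into a small sketch
(hypothetical Python, with a list of scores standing in for the score_cmp
results; lower is better, and ties keep the bottommost candidate, mirroring
the patch's strict `cmp < 0` test):

```python
MAX_BORING = 100  # same constant as the patch

def best_with_boring_cutoff(scores, max_boring=MAX_BORING):
    """Scan candidate shift positions from bottom to top and stop after
    `max_boring` candidates in a row fail to improve on the best score
    seen so far."""
    best_idx, best = -1, None
    boring = 0
    for idx in range(len(scores) - 1, -1, -1):  # bottom to top
        score = scores[idx]
        if best is None or score < best:        # strict improvement only
            best_idx, best = idx, score
            boring = 0
        else:
            boring += 1
            if boring >= max_boring:
                break
    return best_idx
```

Because equal scores count as "boring", a long run of identical candidates
(the pathological all-blank-lines case) exits after `max_boring` steps while
still returning the bottommost position with the best score.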
* Re: [PATCH] xdiff: reduce indent heuristic overhead
  2018-06-29 20:28 ` [PATCH] xdiff: reduce indent heuristic overhead Stefan Beller
@ 2018-06-29 21:17   ` Junio C Hamano
  2018-06-29 23:37     ` Stefan Beller
  2018-07-01 15:57   ` Michael Haggerty
  1 sibling, 1 reply; 17+ messages in thread

From: Junio C Hamano @ 2018-06-29 21:17 UTC (permalink / raw)
To: Stefan Beller; +Cc: mhagger, git, jamill, mh

Stefan Beller <sbeller@google.com> writes:

> This patch was written originally for mercurial at
> https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5
>
> changeset:   36674:c420792217c8
> user:        Jun Wu <quark@fb.com>
> date:        Sat Mar 03 12:39:11 2018 -0800
> files:       mercurial/thirdparty/xdiff/xdiffi.c
> description:
> xdiff: reduce indent heuristic overhead
> ...

This should come with an in-body header to credit the original author as
the author, I think.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [PATCH] xdiff: reduce indent heuristic overhead
  2018-06-29 21:17 ` Junio C Hamano
@ 2018-06-29 23:37   ` Stefan Beller
  2018-06-30  1:11     ` Jun Wu
  0 siblings, 1 reply; 17+ messages in thread

From: Stefan Beller @ 2018-06-29 23:37 UTC (permalink / raw)
To: gitster; +Cc: git, jamill, mh, mhagger, sbeller, quark

From: Jun Wu <quark@fb.com>

This patch was written originally for mercurial at [1], adding a limit on
how long we'd be looking for an optimal indent heuristic. Choose the limit
high enough to only limit edge cases.

Adds some threshold to avoid expensive cases, like:

```
#!python
open('a', 'w').write(" \n" * 1000000)
open('b', 'w').write(" \n" * 1000001)
```

The indent heuristic is O(N * 20) (N = 1000000) for the above case, and
makes diff much slower.

Before this patch (system git 2.14.2):

```
git diff --no-indent-heuristic a b  0.21s user 0.03s system 100% cpu 0.239 total
git diff --indent-heuristic a b     0.77s user 0.02s system 99% cpu 0.785 total
```

After this patch (git 2fc74f41, with xdiffi.c patched):

```
# with the changed xdiffi.c
git diff --indent-heuristic a b     0.16s user 0.01s system 90% cpu 0.188 total
git diff --no-indent-heuristic a b  0.18s user 0.01s system 99% cpu 0.192 total
```

Now turning on indent-heuristic has no visible impact on performance.

Differential Revision: https://phab.mercurial-scm.org/D2601

[1] https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5

Signed-off-by: Stefan Beller <sbeller@google.com>
---
Jun, Junio,

By changing the authorship we'd want to have a sign-off from the original
author before applying; in the previous attempt I was merely taking the
code from mercurial, as their copy of xdiff is also LGPLv2, so we are free
to use that.
Thanks,
Stefan

 xdiff/xdiffi.c | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 0de1ef463bf..c74ec77da58 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -807,6 +807,14 @@ static void xdl_bug(const char *msg)
 	exit(1);
 }
 
+/*
+ * For indentation heuristic, skip searching for better slide position after
+ * checking MAX_BORING lines without finding an improvement. This defends the
+ * indentation heuristic logic against pathological cases. The value is not
+ * picked scientifically but should be good enough.
+ */
+#define MAX_BORING 100
+
 /*
  * Move back and forward change groups for a consistent and pretty diff output.
  * This also helps in finding joinable change groups and reducing the diff
@@ -903,19 +911,43 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			long shift, best_shift = -1;
 			struct split_score best_score;
 
-			for (shift = earliest_end; shift <= g.end; shift++) {
+			/*
+			 * This is O(N * MAX_BORING) (N = shift-able lines).
+			 * Even with MAX_BORING bounded to a small value, a
+			 * large N could still make this loop take several
+			 * times longer than the main diff algorithm. The
+			 * "boring" value is to help cut down N to something
+			 * like (MAX_BORING + groupsize).
+			 *
+			 * Scan from bottom to top. So we can exit the loop
+			 * without compromising the assumption "for a same best
+			 * score, pick the bottommost shift".
+			 */
+			int boring = 0;
+			for (shift = g.end; shift >= earliest_end; shift--) {
 				struct split_measurement m;
 				struct split_score score = {0, 0};
+				int cmp;
 
 				measure_split(xdf, shift, &m);
 				score_add_split(&m, &score);
 				measure_split(xdf, shift - groupsize, &m);
 				score_add_split(&m, &score);
-				if (best_shift == -1 ||
-				    score_cmp(&score, &best_score) <= 0) {
+
+				if (best_shift == -1) {
+					cmp = -1;
+				} else {
+					cmp = score_cmp(&score, &best_score);
+				}
+				if (cmp < 0) {
+					boring = 0;
 					best_score.effective_indent = score.effective_indent;
 					best_score.penalty = score.penalty;
 					best_shift = shift;
+				} else {
+					boring += 1;
+					if (boring >= MAX_BORING)
+						break;
 				}
 			}
-- 
2.18.0.399.gad0ab374a1-goog

^ permalink raw reply related	[flat|nested] 17+ messages in thread
* Re: [PATCH] xdiff: reduce indent heuristic overhead
  2018-06-29 23:37 ` Stefan Beller
@ 2018-06-30  1:11   ` Jun Wu
  0 siblings, 0 replies; 17+ messages in thread

From: Jun Wu @ 2018-06-30  1:11 UTC (permalink / raw)
To: Stefan Beller; +Cc: gitster, git, jamill, mh, mhagger

Excerpts from Stefan Beller's message of 2018-06-29 16:37:41 -0700:
> [...]
> Jun, Junio
>
> By changing the authorship we'd want to have a sign off from the original author,
> before applying; in the previous attempt, I was merely taking the code from
> mercurial as their copy of xdiff is also LGPLv2 so we are free to use that.

I'm fine with signing off the patch. I didn't send this one here mainly
because the indent heuristic was off by default at the time I made the
changes, and I wasn't sure how to test this properly according to the git
community standard. If this change is fine without additional tests, I can
send it myself, too.

> Thanks,
> Stefan
>
> [...]

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH] xdiff: reduce indent heuristic overhead
  2018-06-29 20:28 ` [PATCH] xdiff: reduce indent heuristic overhead Stefan Beller
  2018-06-29 21:17   ` Junio C Hamano
@ 2018-07-01 15:57   ` Michael Haggerty
  2018-07-02 17:27     ` Stefan Beller
  2018-07-03 18:14     ` Junio C Hamano
  1 sibling, 2 replies; 17+ messages in thread

From: Michael Haggerty @ 2018-07-01 15:57 UTC (permalink / raw)
To: Stefan Beller; +Cc: git, jamill, mh

On 06/29/2018 10:28 PM, Stefan Beller wrote:
> [...]
> Adds some threshold to avoid expensive cases, like:
>
> ```
> #!python
> open('a', 'w').write(" \n" * 1000000)
> open('b', 'w').write(" \n" * 1000001)
> ```
>
> The indent heuristic is O(N * 20) (N = 1000000) for the above case, and
> makes diff much slower.
> [...]
> +/*
> + * For indentation heuristic, skip searching for better slide position after
> + * checking MAX_BORING lines without finding an improvement. This defends the
> + * indentation heuristic logic against pathological cases. The value is not
> + * picked scientifically but should be good enough.
> + */
> +#define MAX_BORING 100

This is an interesting case, and a speed difference of almost a factor
of five seems impressive. But this is a pretty pathological case, isn't
it? And I'm pretty sure that the algorithm is `O(N)` both before and
after this change. Remember that to find `earliest_end` and `g.end`,
there has already been a scan through all 1000000 lines. In other words,
you're not improving how the overall algorithm scales with `N`; you're
only changing the constant factor in front. So it's a little bit
questionable whether it is worth complicating the code for this unusual
case.

But *if* we want to improve this case, I think that we could be smarter
about it.

By the time we get to this point in the code, we already know that there
is a "slider" hunk of length `M` (`groupsize`) that can be slid up or
down within a range of `N` (`g.end - earliest_end + 1`) possible
positions.
The interesting case here is `N ≫ M`, because then naively the number of
positions to try out is a lot bigger than the size of the hunk itself.
(In the case described above, `N` is 1000000 and `M` is 1.)

But how can that situation even arise? Remember, a hunk can only be slid
down by a line if the first line *after* the hunk is identical to the
first line *of* the hunk. It follows that if you shift a hunk down `M`
lines, then it has the same contents as when you started—you've just
rotated all of the hunk lines around once.

So if `N ≫ M`, there is necessarily a lot of repetition among the `N +
M` lines that the hunk could possibly overlay. Specifically, it must
consist of `floor((N + M)/M)` identical copies of the hunk, plus
possibly a few leftover lines constituting the start of another repetition.

Given this large amount of repetition, it seems to me that there is
never a need to scan more than the bottom `M + 1` possible positions [1]
plus the highest possible position [2] to be sure of finding the very
best one. In the pathological case that you described above, where `M`
is 1, only three positions have to be evaluated, not 100.

In fact, it *could* be that there is even more repetition, namely if the
hunk itself contains multiple copies of an even shorter block of `K`
lines. In that case, you would only have to scan `K + 1` positions at
the bottom plus one at the top to be sure to find the best hunk
position. This would be an interesting optimization for a case like

> open('a', 'w').write(" \n" * 1000000)
> open('b', 'w').write(" \n" * 1100000)

(`N = 1000000`, `M = 100000`, `K = 1`) or

> open('a', 'w').write("<item>\nMISSING\n</item>\n" * 1000000)
> open('b', 'w').write("<item>\nMISSING\n</item>\n" * 1100000)

(`N = 3000000`, `M = 300000`, `K = 3`). On the other hand, it's not
entirely trivial to find periodicity in a group of lines (i.e., to find
`K`), and I don't know offhand how that task scales with `M`.
Michael

[1] Actually, to be rigorously correct it might be necessary to check
even a bit more than `M + 1` positions at the bottom because the
heuristic looks a bit beyond the lines of the hunk.

[2] The position at the top has different predecessor lines than the
other positions, so it could have a lower score than all of the others.
It's worth checking it. Here too, to be rigorously correct it might be
necessary to check more than one position at the top because the
heuristic looks a bit beyond the lines of the hunk.

^ permalink raw reply	[flat|nested] 17+ messages in thread
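Michael's open question about finding `K` — the minimal period of the hunk
lines — can in fact be answered in time linear in `M`, using the classic
KMP failure-function argument. A sketch (hypothetical helper, not code from
the thread):

```python
def minimal_period(lines):
    """Smallest K such that `lines` consists of whole copies of its
    first K lines; returns len(lines) if there is no such repetition.

    Computes the KMP failure function, so the cost is O(M) in the hunk
    size -- i.e. finding K scales linearly with M."""
    n = len(lines)
    fail = [0] * (n + 1)   # fail[i]: longest proper border of lines[:i]
    k = 0
    for i in range(1, n):
        while k and lines[i] != lines[k]:
            k = fail[k]
        if lines[i] == lines[k]:
            k += 1
        fail[i + 1] = k
    period = n - fail[n]
    return period if n % period == 0 else n
```

So the `K + 1`-positions optimization would not add to the asymptotic cost:
the periodicity check itself is no more expensive than one pass over the hunk.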
* Re: [PATCH] xdiff: reduce indent heuristic overhead
  2018-07-01 15:57 ` Michael Haggerty
@ 2018-07-02 17:27   ` Stefan Beller
  2018-07-03  9:15     ` Michael Haggerty
  2018-07-03 18:14   ` Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread

From: Stefan Beller @ 2018-07-02 17:27 UTC (permalink / raw)
To: Michael Haggerty, quark; +Cc: git, Jameson Miller, Mike Hommey

On Sun, Jul 1, 2018 at 8:57 AM Michael Haggerty <mhagger@alum.mit.edu> wrote:
>
> On 06/29/2018 10:28 PM, Stefan Beller wrote:
> > [...]
> > Adds some threshold to avoid expensive cases, like:
> >
> > ```
> > #!python
> > open('a', 'w').write(" \n" * 1000000)
> > open('b', 'w').write(" \n" * 1000001)
> > ```
> >
> > The indent heuristic is O(N * 20) (N = 1000000) for the above case, and
> > makes diff much slower.
> > [...]
> > +/*
> > + * For indentation heuristic, skip searching for better slide position after
> > + * checking MAX_BORING lines without finding an improvement. This defends the
> > + * indentation heuristic logic against pathological cases. The value is not
> > + * picked scientifically but should be good enough.
> > + */
> > +#define MAX_BORING 100
>
> This is an interesting case, and a speed difference of almost a factor
> of five seems impressive. But this is a pretty pathological case, isn't
> it? And I'm pretty sure that the algorithm is `O(N)` both before and
> after this change. Remember that to find `earliest_end` and `g.end`,
> there has already been a scan through all 1000000 lines. In other words,
> you're not improving how the overall algorithm scales with `N`; you're
> only changing the constant factor in front. So it's a little bit
> questionable whether it is worth complicating the code for this unusual
> case.
>
> But *if* we want to improve this case, I think that we could be smarter
> about it.
>
> By the time we get to this point in the code, we already know that there
> is a "slider" hunk of length `M` (`groupsize`) that can be slid up or
> down within a range of `N` (`g.end - earliest_end + 1`) possible
> positions. The interesting case here is `N ≫ M`, because then naively
> the number of positions to try out is a lot bigger than the size of the
> hunk itself. (In the case described above, `N` is 1000000 and `M` is 1.)
>
> But how can that situation even arise? Remember, a hunk can only be slid
> down by a line if the first line *after* the hunk is identical to the
> first line *of* the hunk. It follows that if you shift a hunk down `M`
> lines, then it has the same contents as when you started—you've just
> rotated all of the hunk lines around once.
>
> So if `N ≫ M`, there is necessarily a lot of repetition among the `N +
> M` lines that the hunk could possibly overlay. Specifically, it must
> consist of `floor((N + M)/M)` identical copies of the hunk, plus
> possibly a few leftover lines constituting the start of another repetition.
>
> Given this large amount of repetition, it seems to me that there is
> never a need to scan more than the bottom `M + 1` possible positions [1]
> plus the highest possible position [2] to be sure of finding the very
> best one. In the pathological case that you described above, where `M`
> is 1, only three positions have to be evaluated, not 100.
>
> In fact, it *could* be that there is even more repetition, namely if the
> hunk itself contains multiple copies of an even shorter block of `K`
> lines. In that case, you would only have to scan `K + 1` positions at
> the bottom plus one at the top to be sure to find the best hunk
> position. This would be an interesting optimization for a case like
>
> > open('a', 'w').write(" \n" * 1000000)
> > open('b', 'w').write(" \n" * 1100000)
>
> (`N = 1000000`, `M = 100000`, `K = 1`) or
>
> > open('a', 'w').write("<item>\nMISSING\n</item>\n" * 1000000)
> > open('b', 'w').write("<item>\nMISSING\n</item>\n" * 1100000)
>
> (`N = 3000000`, `M = 300000`, `K = 3`). On the other hand, it's not
> entirely trivial to find periodicity in a group of lines (i.e., to find
> `K`), and I don't know offhand how that task scales with `M`.
>
> Michael
>
> [1] Actually, to be rigorously correct it might be necessary to check
> even a bit more than `M + 1` positions at the bottom because the
> heuristic looks a bit beyond the lines of the hunk.
>
> [2] The position at the top has different predecessor lines than the
> other positions, so it could have a lower score than all of the others.
> It's worth checking it. Here too, to be rigorously correct it might be
> necessary to check more than one position at the top because the
> heuristic looks a bit beyond the lines of the hunk.

So this suggests having MAX_BORING be "hunk size + some small constant
offset"?

Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH] xdiff: reduce indent heuristic overhead
  2018-07-02 17:27 ` Stefan Beller
@ 2018-07-03  9:15   ` Michael Haggerty
  2018-07-27 22:23     ` Stefan Beller
  0 siblings, 1 reply; 17+ messages in thread

From: Michael Haggerty @ 2018-07-03  9:15 UTC (permalink / raw)
To: Stefan Beller; +Cc: quark, Git Mailing List, jamill, mh

On Mon, Jul 2, 2018 at 7:27 PM Stefan Beller <sbeller@google.com> wrote:
> On Sun, Jul 1, 2018 at 8:57 AM Michael Haggerty <mhagger@alum.mit.edu> wrote:
> [...]
> So this suggests to have MAX_BORING to be
> "hunk size + some small constant offset" ?

That would be my suggestion, yes. There are cases where it will be
more expensive than a fixed `MAX_BORING`, but I bet on average it will
be faster. Plus, it should always give the right answer.

Michael

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [PATCH] xdiff: reduce indent heuristic overhead
  2018-07-03  9:15 ` Michael Haggerty
@ 2018-07-27 22:23   ` Stefan Beller
  0 siblings, 0 replies; 17+ messages in thread

From: Stefan Beller @ 2018-07-27 22:23 UTC (permalink / raw)
To: mhagger; +Cc: git, jamill, mh, quark, sbeller

Skip searching for a better indent heuristic if we'd slide a hunk more
than its size. This is the easiest fix proposed in the analysis[1] in
response to a patch that mercurial took for xdiff to limit the searching
by a constant.

Using a performance test as:

    #!python
    open('a', 'w').write(" \n" * 1000000)
    open('b', 'w').write(" \n" * 1000001)

this patch reduces the execution of "git diff --no-index a b" from 0.70s
to 0.31s. However limiting the sliding to the size of the diff hunk,
which was proposed as a solution (and which I found easiest to implement
for now), is not optimal for cases like

    open('a', 'w').write(" \n" * 1000000)
    open('b', 'w').write(" \n" * 2000000)

as then we'd still slide 1000000 times. In addition to limiting the
sliding to the size of the hunk, also limit it by a constant. Choose 100
lines as the constant, as that is more than fits on a screen, which
really means that the diff sliding is probably not providing a lot of
benefit anyway.

[1] https://public-inbox.org/git/72ac1ac2-f567-f241-41d6-d0f83072e0b3@alum.mit.edu/

Reported-by: Jun Wu <quark@fb.com>
Analysis-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Stefan Beller <sbeller@google.com>
---

> Plus, it should always give the right answer.

I was tempted to do just that, but I caved. The diff is correct, and the
hunk sliding is purely to appease the visual aspect of humans looking at
diffs. If your diff can slide more than a monitor height, you're not
interested in the best slid diff; something else is going on.

> There are cases where it will be
> more expensive than a fixed `MAX_BORING`, but I bet on average it will
> be faster.

So I did both, settling for performance as the utmost desire. ;-)

 xdiff/xdiffi.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 0de1ef463bf..91e98ee9869 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -591,6 +591,11 @@ static void measure_split(const xdfile_t *xdf, long split,
  */
 #define INDENT_WEIGHT 60
 
+/*
+ * How far do we slide a hunk at most?
+ */
+#define INDENT_HEURISTIC_MAX_SLIDING 100
+
 /*
  * Compute a badness score for the hypothetical split whose measurements are
  * stored in m. The weight factors were determined empirically using the tools and
@@ -903,7 +908,12 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			long shift, best_shift = -1;
 			struct split_score best_score;
 
-			for (shift = earliest_end; shift <= g.end; shift++) {
+			shift = earliest_end;
+			if (g.end - groupsize - 1 > shift)
+				shift = g.end - groupsize - 1;
+			if (g.end - INDENT_HEURISTIC_MAX_SLIDING > shift)
+				shift = g.end - INDENT_HEURISTIC_MAX_SLIDING;
+			for (; shift <= g.end; shift++) {
 				struct split_measurement m;
 				struct split_score score = {0, 0};
 
-- 
2.18.0.345.g5c9ce644c3-goog

^ permalink raw reply related	[flat|nested] 17+ messages in thread
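The clamping in that final hunk can be re-expressed in Python (a sketch;
the helper name is mine, the variable names mirror the C). The loop then
runs from the returned value up to `group_end`, so it scores at most about
`min(groupsize + 1, max_sliding) + 1` positions instead of the full
`group_end - earliest_end + 1` range:

```python
def start_shift(earliest_end, group_end, groupsize, max_sliding=100):
    """Compute where the sliding loop starts after both clamps."""
    shift = earliest_end
    # Never slide more than the hunk size (plus one extra position).
    if group_end - groupsize - 1 > shift:
        shift = group_end - groupsize - 1
    # ...and never more than a constant, roughly a screenful of lines.
    if group_end - max_sliding > shift:
        shift = group_end - max_sliding
    return shift
```

For the pathological single-blank-line case (`groupsize = 1`,
`earliest_end = 0`, `group_end = 1000000`) the loop starts at 999998, i.e.
only three positions are scored, matching Michael's analysis.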
* Re: [PATCH] xdiff: reduce indent heuristic overhead
  2018-07-01 15:57 ` Michael Haggerty
  2018-07-02 17:27   ` Stefan Beller
@ 2018-07-03 18:14   ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread

From: Junio C Hamano @ 2018-07-03 18:14 UTC (permalink / raw)
To: Michael Haggerty; +Cc: Stefan Beller, git, jamill, mh

Michael Haggerty <mhagger@alum.mit.edu> writes:

> So if `N ≫ M`, there is necessarily a lot of repetition among the `N +
> M` lines that the hunk could possibly overlay. Specifically, it must
> consist of `floor((N + M)/M)` identical copies of the hunk, plus
> possibly a few leftover lines constituting the start of another repetition.
>
> Given this large amount of repetition, it seems to me that there is
> never a need to scan more than the bottom `M + 1` possible positions [1]
> plus the highest possible position [2] to be sure of finding the very
> best one. In the pathological case that you described above, where `M`
> is 1, only three positions have to be evaluated, not 100.

Nicely analysed.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: fast-import slowness when importing large files with small differences
From: Jeff King @ 2018-06-29 20:39 UTC
To: Stefan Beller; +Cc: Mike Hommey, Jameson Miller, git

On Fri, Jun 29, 2018 at 01:14:52PM -0700, Stefan Beller wrote:

> Interesting pieces regarding performance:
>
> c420792217c8 xdiff: reduce indent heuristic overhead
> https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5
>
> f33a87cf60cc xdiff: add a preprocessing step that trims files
> https://phab.mercurial-scm.org/rHGf33a87cf60ccb8b46e06b85e60bc5031420707d6
>
> I'll see if I can make that into patches.

Apparently the second one is not so trivial as you might hope; see
https://public-inbox.org/git/1520337165-sup-4504@x1c/.

-Peff
* Re: fast-import slowness when importing large files with small differences
From: Stefan Beller @ 2018-06-29 20:51 UTC
To: Jeff King, quark; +Cc: Mike Hommey, Jameson Miller, git

+cc Jun Wu, original author of these patches

On Fri, Jun 29, 2018 at 1:39 PM Jeff King <peff@peff.net> wrote:
> > Interesting pieces regarding performance:
> >
> > c420792217c8 xdiff: reduce indent heuristic overhead
> > https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5

Going by the mailing list, the first patch was not brought over yet, so
sending it here was warranted. Jun, you may want to take ownership of
https://public-inbox.org/git/20180629202811.131265-1-sbeller@google.com/
as I merely resent it to the git mailing list? If not, that is fine, too.

> > f33a87cf60cc xdiff: add a preprocessing step that trims files
> > https://phab.mercurial-scm.org/rHGf33a87cf60ccb8b46e06b85e60bc5031420707d6
> >
> > I'll see if I can make that into patches.
>
> Apparently the second one is not so trivial as you might hope; see
> https://public-inbox.org/git/1520337165-sup-4504@x1c/.

Thanks so much, this saves me further effort to dig there. So I'll just
stop porting this.

Thanks,
Stefan
* Re: fast-import slowness when importing large files with small differences
From: Ævar Arnfjörð Bjarmason @ 2018-06-29 22:10 UTC
To: Mike Hommey; +Cc: git

On Fri, Jun 29 2018, Mike Hommey wrote:

> I noticed some slowness when fast-importing data from the Firefox
> mercurial repository, where fast-import spends more than 5 minutes
> importing ~2000 revisions of one particular file. I reduced a testcase
> while still using real data. One could synthesize data with kind of
> the same properties, but I figured real data could be useful.
>
> To reproduce:
> $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo
> $ git init bar
> $ cd bar
> $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000
>
> [...]
> So maybe it would make sense to consolidate the diff code (after all,
> diff-delta.c is an old specialized fork of xdiff). With manual trimming
> of common head and tail, this gets down to 3:33.
>
> I'll also note that Facebook has imported xdiff from the git code base
> into mercurial and improved performance on it, so it might also be
> worth looking at what's worth taking from there.

It would be interesting to see how this compares with a more naïve
approach of committing every version of this file one-at-a-time into a
new repository (with & without gc.auto=0). Perhaps deltaing as we go is
suboptimal compared to just writing out a lot of redundant data and
repacking it all at once later.
* Re: fast-import slowness when importing large files with small differences
From: Mike Hommey @ 2018-06-29 23:35 UTC
To: Ævar Arnfjörð Bjarmason; +Cc: git

On Sat, Jun 30, 2018 at 12:10:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
> On Fri, Jun 29 2018, Mike Hommey wrote:
>
> > I noticed some slowness when fast-importing data from the Firefox
> > mercurial repository, where fast-import spends more than 5 minutes
> > importing ~2000 revisions of one particular file. I reduced a
> > testcase while still using real data. One could synthesize data with
> > kind of the same properties, but I figured real data could be useful.
> >
> > To reproduce:
> > $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo
> > $ git init bar
> > $ cd bar
> > $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000
> >
> > [...]
> > So maybe it would make sense to consolidate the diff code (after all,
> > diff-delta.c is an old specialized fork of xdiff). With manual
> > trimming of common head and tail, this gets down to 3:33.
> >
> > I'll also note that Facebook has imported xdiff from the git code
> > base into mercurial and improved performance on it, so it might also
> > be worth looking at what's worth taking from there.
>
> It would be interesting to see how this compares with a more naïve
> approach of committing every version of this file one-at-a-time into a
> new repository (with & without gc.auto=0). Perhaps deltaing as we go is
> suboptimal compared to just writing out a lot of redundant data and
> repacking it all at once later.

"Just" writing 26GB? And that's only one file. If I were to do that for
the whole repository, it would yield a > 100GB pack, instead of < 2GB
currently.

Mike
* Re: fast-import slowness when importing large files with small differences
From: Ævar Arnfjörð Bjarmason @ 2018-07-03 16:05 UTC
To: Mike Hommey; +Cc: git

On Fri, Jun 29 2018, Mike Hommey wrote:

> On Sat, Jun 30, 2018 at 12:10:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> On Fri, Jun 29 2018, Mike Hommey wrote:
>>
>> > I noticed some slowness when fast-importing data from the Firefox
>> > mercurial repository, where fast-import spends more than 5 minutes
>> > importing ~2000 revisions of one particular file. I reduced a
>> > testcase while still using real data. One could synthesize data with
>> > kind of the same properties, but I figured real data could be useful.
>> >
>> > To reproduce:
>> > $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo
>> > $ git init bar
>> > $ cd bar
>> > $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000
>> >
>> > [...]
>> > So maybe it would make sense to consolidate the diff code (after
>> > all, diff-delta.c is an old specialized fork of xdiff). With manual
>> > trimming of common head and tail, this gets down to 3:33.
>> >
>> > I'll also note that Facebook has imported xdiff from the git code
>> > base into mercurial and improved performance on it, so it might also
>> > be worth looking at what's worth taking from there.
>>
>> It would be interesting to see how this compares with a more naïve
>> approach of committing every version of this file one-at-a-time into a
>> new repository (with & without gc.auto=0). Perhaps deltaing as we go
>> is suboptimal compared to just writing out a lot of redundant data and
>> repacking it all at once later.
>
> "Just" writing 26GB? And that's only one file. If I were to do that for
> the whole repository, it would yield a > 100GB pack, instead of < 2GB
> currently.

To clarify on my terse response: I mean to try this on an isolated test
case to see to what extent the problem you're describing is unique to
fast-import, and to what extent it's encountered during "normal" git use
when you commit all the revisions of that file in succession.

Perhaps the difference between the two would give some hint as to how to
proceed, or not.
* Re: fast-import slowness when importing large files with small differences
From: Mike Hommey @ 2018-07-03 22:38 UTC
To: Ævar Arnfjörð Bjarmason; +Cc: git

On Tue, Jul 03, 2018 at 06:05:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
> On Fri, Jun 29 2018, Mike Hommey wrote:
>
> > On Sat, Jun 30, 2018 at 12:10:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> On Fri, Jun 29 2018, Mike Hommey wrote:
> >>
> >> > I noticed some slowness when fast-importing data from the Firefox
> >> > mercurial repository, where fast-import spends more than 5 minutes
> >> > importing ~2000 revisions of one particular file. I reduced a
> >> > testcase while still using real data. One could synthesize data
> >> > with kind of the same properties, but I figured real data could be
> >> > useful.
> >> >
> >> > To reproduce:
> >> > $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo
> >> > $ git init bar
> >> > $ cd bar
> >> > $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000
> >> >
> >> > [...]
> >> > So maybe it would make sense to consolidate the diff code (after
> >> > all, diff-delta.c is an old specialized fork of xdiff). With manual
> >> > trimming of common head and tail, this gets down to 3:33.
> >> >
> >> > I'll also note that Facebook has imported xdiff from the git code
> >> > base into mercurial and improved performance on it, so it might
> >> > also be worth looking at what's worth taking from there.
> >>
> >> It would be interesting to see how this compares with a more naïve
> >> approach of committing every version of this file one-at-a-time into
> >> a new repository (with & without gc.auto=0). Perhaps deltaing as we
> >> go is suboptimal compared to just writing out a lot of redundant
> >> data and repacking it all at once later.
> >
> > "Just" writing 26GB? And that's only one file. If I were to do that
> > for the whole repository, it would yield a > 100GB pack, instead of
> > < 2GB currently.
>
> To clarify on my terse response: I mean to try this on an isolated test
> case to see to what extent the problem you're describing is unique to
> fast-import, and to what extent it's encountered during "normal" git
> use when you commit all the revisions of that file in succession.
>
> Perhaps the difference between the two would give some hint as to how
> to proceed, or not.

AIUI, git repack will end up creating delta indexes for every blob, so
the problem should exist there too, but because it will be comparing
"random" blobs, it can't take the same kinds of shortcuts as fast-import
could, because fast-import only cares about diffing with the last
imported blob. So while fast-import can reduce the amount of work it
does by not creating an index for the common head and tail of the
compared blobs, git repack can't.

Mike
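The head/tail trimming alluded to above could look roughly like this (Python, a hypothetical sketch — not git's actual implementation): strip the common prefix and suffix of consecutive blob versions so a delta index only has to be built over the small middle that actually changed.

```python
def trim_common(a, b):
    """Return (head, tail, mid_a, mid_b): the lengths of the common
    prefix and suffix of a and b, and the remaining middles that a
    delta index would actually need to cover."""
    head = 0
    while head < len(a) and head < len(b) and a[head] == b[head]:
        head += 1
    tail = 0
    # stop before the suffix overlaps the already-matched prefix
    while (tail < len(a) - head and tail < len(b) - head
           and a[len(a) - 1 - tail] == b[len(b) - 1 - tail]):
        tail += 1
    return head, tail, a[head:len(a) - tail], b[head:len(b) - tail]
```

For two 13MB blobs that differ only in a few lines, the middles passed on to index creation would be a few hundred bytes instead of 13MB — which is the shortcut fast-import could take (it always deltas against the previous blob) but repack, comparing arbitrary blob pairs, cannot amortize the same way.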
end of thread, other threads: [~2018-07-27 22:24 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-29  9:44 fast-import slowness when importing large files with small differences Mike Hommey
2018-06-29 20:14 ` Stefan Beller
2018-06-29 20:28   ` [PATCH] xdiff: reduce indent heuristic overhead Stefan Beller
2018-06-29 21:17     ` Junio C Hamano
2018-06-29 23:37       ` Stefan Beller
2018-06-30  1:11     ` Jun Wu
2018-07-01 15:57     ` Michael Haggerty
2018-07-02 17:27       ` Stefan Beller
2018-07-03  9:15         ` Michael Haggerty
2018-07-27 22:23           ` Stefan Beller
2018-07-03 18:14       ` Junio C Hamano
2018-06-29 20:39   ` fast-import slowness when importing large files with small differences Jeff King
2018-06-29 20:51     ` Stefan Beller
2018-06-29 22:10 ` Ævar Arnfjörð Bjarmason
2018-06-29 23:35   ` Mike Hommey
2018-07-03 16:05     ` Ævar Arnfjörð Bjarmason
2018-07-03 22:38       ` Mike Hommey