* Git blame performance on files with a lot of history
From: Clement Moyroud @ 2018-12-14 18:29 UTC (permalink / raw)
  To: git

Hello,

My group at work is migrating a CVS repo to Git. The biggest issue we
face so far is the performance of git blame, especially compared to
CVS on the same file. One file in particular causes us trouble: a
30k-line file with 25 years of history in 3k+ commits. The complete
repo has 200k+ commits over that same period.
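
For readers who want to put similar numbers on their own files, here is a minimal sketch of how the line count and the number of commits touching a path can be measured. It builds a throwaway repository so that it runs anywhere; the file name `file.C`, the identity settings and the three sample commits are hypothetical stand-ins, not data from the repository described above.

```shell
#!/bin/sh
# Sketch only: gauge how much history blame has to walk for one file.
# A scratch repository is created so the script is self-contained; in a
# real repository only the last three commands are of interest.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"

# Create a small file with three commits of history (made-up content).
for i in 1 2 3; do
    echo "line $i" >> file.C
    git add file.C
    git commit -q -m "change $i"
done

# Number of lines in the file, and number of commits that touch it.
lines=$(wc -l < file.C | tr -d ' ')
commits=$(git rev-list --count HEAD -- file.C)
echo "file.C: $lines lines, $commits commits touching it"
```

Note that `git rev-list` applies its default history simplification when given a path, so the count can be lower than what `--full-history` reports on a repository with merges.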

Currently, 'cvs annotate' takes 2.7 seconds, while 'git blame'
(without -M or -C) takes 145 seconds.

I tried using the commit-graph with the Bloom filter, per
https://public-inbox.org/git/61559c5b-546e-d61b-d2e1-68de692f5972@gmail.com/.
No dice:
    > time GIT_TEST_BLOOM_FILTERS=1 /wv/cmoyroud/calibre-src/git-bloom-filters/git-bloom-bin/bin/git commit-graph write --reachable
    Annotating commits in commit graph: 573705, done.
    Computing commit graph generation numbers: 100% (286441/286441), done.
    Computing commit diff Bloom filters: 100% (286441/286441), done.
    GIT_TEST_BLOOM_FILTERS=1  commit-graph write --reachable  386.80s user 31.78s system 78% cpu 8:53.87 total
    > time GIT_TEST_BLOOM_FILTERS=1 GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y /path/to/git blame master -- important/file.C > /tmp/foo.compiler.bloom
    Blaming lines: 100% (33179/33179), done.
    GIT_TEST_BLOOM_FILTERS=1 GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y   145.11s user 0.97s system 99% cpu 2:26.22 total
    > time /path/to/git blame master -- important/file.C > /tmp/foo.compiler.nobloom
    Blaming lines: 100% (33179/33179), done.
    GIT_TEST_BLOOM_FILTERS=1 GIT_TEST_BLOOM_FILTERS=1 GIT_USE_POC_BLOOM_FILTER=y   141.69s user 0.77s system 99% cpu 2:22.56 total

I used Derrick Stolee's tree at
https://github.com/derrickstolee/git/tree/bloom/stolee

Looking at the blame code, it does not seem to be able to use the
commit graph, so I tried the same rev-list command from the e-mail,
using my own file:
    > GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y /path/to/git rev-list --count --full-history HEAD -- important/file.C
    3576

No trace information there either. Running 'strings' on the binary
shows the environment variable names, so I'm not totally crazy. Let me
know whether I tried the right thing :)
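
One sanity check that may be worth running in this situation is confirming that a commit-graph file was actually written and that Git is configured to read it. The sketch below does this in a scratch repository with stock Git commands. It assumes the non-split layout, where the file lands at `.git/objects/info/commit-graph`. In Git releases of this period, reading the file was, as far as I know, gated on `core.commitGraph`. Whether either point explains the missing trace output here is only a guess, and the experimental Bloom filter variables from the proof-of-concept branch are deliberately left out.

```shell
#!/bin/sh
# Sketch only: check that a commit-graph exists and is enabled.
# Runs in a throwaway repository; names and content are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"
echo "hello" > file.C
git add file.C
git commit -q -m "initial"

# Write the graph for all reachable commits, as in the session above.
git commit-graph write --reachable

# Assumed location of the non-split commit-graph file.
graph=.git/objects/info/commit-graph
if [ -f "$graph" ]; then graph_present=yes; else graph_present=no; fi

# Explicitly enable reading the graph, then read the setting back.
git config core.commitGraph true
enabled=$(git config --get core.commitGraph)

# Ask Git to check the file's integrity; exits non-zero on corruption.
git commit-graph verify
echo "commit-graph present: $graph_present, core.commitGraph: $enabled"
```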

Looks like blame performance is gonna be the biggest issue for us, so
I'm really interested in seeing improvements there. Let me know if
there's anything else I can try.

Cheers,

Clément
