From: "D. Ben Knoble" <ben.knoble@gmail.com>
To: git@vger.kernel.org
Subject: git-status performance with submodules
Date: Mon, 2 Dec 2019 01:19:49 -0500
Message-ID: <CALnO6CCoXOZTsfag6yN_Ffn+H7KE-KTzm+P-GqLKnDMg8j_Qmg@mail.gmail.com>
[If this has already gone through multiple times, I apologize for the
repetition; I have had a hard time getting GMail to send this. Past
versions had attachments, which I believe contributed to failures.
This one has none, but has links to all the content.]
Hello all,
I have a concern about the performance of git-status with many (~38)
submodules. As part of a (large-scale) system dynamics class, I was tasked
with identifying a performance problem, tracing it using KUTrace(2)[3], and
subsequently investigating it. I ended up with some unique observations about
git-status and submodules[2].
The interactive HTML traces are available on Google Drive[4][5].
I won't recreate all the details here, but I would encourage you to play with
the traces, or at least go through the slides[1].
### The short version
git-status is slow in a repository with many submodules(1)(3).
### Baseline
- time git-status, with many submodules, and --ignore-submodules=none
  0.497s
- time git-status in non-submodule heavy repos
  0.014s
### What I consider a temporary fix
- time git-status, with many submodules, and --ignore-submodules=all
  0.026s
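
For anyone who wants to repeat the comparison, here is a minimal sketch of a timing harness. It is not how the numbers above were produced (those came from `time`); the function names, the run count, and the use of a median are my assumptions for illustration.

```python
import statistics
import subprocess
import time


def median_runtime(cmd, runs=5, cwd=None):
    """Return the median wall-clock seconds over `runs` executions of cmd."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal I/O does not skew the measurement.
        subprocess.run(cmd, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_ignore_modes(repo, modes=("none", "all")):
    """Time `git status --ignore-submodules=<mode>` in repo for each mode."""
    return {
        mode: median_runtime(
            ["git", "status", f"--ignore-submodules={mode}"], cwd=repo)
        for mode in modes
    }
```

Calling `compare_ignore_modes("/path/to/repo")` (a placeholder path) returns a dict of seconds keyed by mode. The median is used so that a single cold-cache run does not dominate the result.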
### What I would like to see
I would like to improve the git-status performance with this many submodules,
so that I can remove diff.ignoreSubmodules=all from my config (submodule
status is useful information, and the setting affects many commands). I would
be willing to work on a discussed and designed fix.
### What I am curious about
From the traces (linked above), it appears that git-status suffers from a lack of
(possibly embarrassing) parallelism: I would expect each submodule to be
independently check-able, but the process section of the trace has them
executing serially (for reasons unknown to me). The apparent need to fork/exec
many processes in this way appears to also be a source of latency, along with
the very large number of filesystem-related syscalls (if my understanding is
correct).
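
To make the idea concrete, here is a rough sketch of what "independently check-able" could look like from the outside. This is only an illustration written against the git command line, not a description of how git-status works internally or a proposed patch; the function names, the `max_workers` default, and the choice of `git status --porcelain` as the per-submodule check are all assumptions.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def submodule_is_dirty(path):
    """Return True if `git status --porcelain` prints anything for path."""
    out = subprocess.run(
        ["git", "-C", path, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    return bool(out.strip())


def check_submodules(paths, check=submodule_is_dirty, max_workers=8):
    """Run `check` on every path concurrently and return {path: result}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path
        # with its own result.
        return dict(zip(paths, pool.map(check, paths)))
```

Threads are enough here because each worker spends its time waiting on a child process. Note that this still pays one fork/exec per submodule, so it would only address the serial execution, not the process-spawning or filesystem syscall costs.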
What can we do to fix this? Is there a reason for this (really terribly slow)
serial execution? Is this something developers haven't bothered to optimize
("unexpected use case")? If so, I would like to discuss taking a crack at it,
because I do have at least one repository with this many submodules, and I
care about its performance.
---
Notes
1) All timings were taken with the https://github.com/benknoble/Dotfiles repo
from around commit da194a8f4104a9fc74e8895ebc8512434f07d393
2) KUTrace is a set of kernel patches and userspace programs that provide
low-overhead tracing and post-processing of the resulting traces
3) Timings taken on my machine (2012 MacBook Pro; can provide more details if
requested)
---
Links
[1]: https://docs.google.com/presentation/d/1z-6ffE9KY-Jswl2BiWzYV2DG6fOutgWSi_aZ5uql__s/edit?usp=sharing
[2]: https://benknoble.github.io/blog/2019/11/07/git-stat/
[3]: https://github.com/dicksites/KUtrace
[4]: https://drive.google.com/file/d/1JyYO420yWp7XvNJJ8HLOPU0o6mesSKZf/view?usp=sharing
[5]: https://drive.google.com/file/d/1BqqxH0PRCYz_vvYkBBFpbL5dkFTLPyuK/view?usp=sharing