From: Matheus Tavares <matheus.bernardino@usp.br>
To: Derrick Stolee <stolee@gmail.com>
Cc: Jeff Hostetler via GitGitGadget <gitgitgadget@gmail.com>,
	git <git@vger.kernel.org>,
	Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH 0/9] Trace2 stopwatch timers and global counters
Date: Tue, 21 Dec 2021 20:27:58 -0300
Message-ID: <CAHd-oW6ChTb94hDOUzZZCAo5KBu5_QvD8sbpbSb2BQiWsXkMaw@mail.gmail.com>
In-Reply-To: <92923ca0-fbf9-e763-5735-214f3ad0cc3a@gmail.com>

On Tue, Dec 21, 2021 at 11:51 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> >
> >  3. Rationale
> >
> > Timers and counters are an alternative to the existing "region" and "data"
> > events. The latter are intended to trace the major flow (or phases) of the
> > program and possibly capture the amount of work performed within a loop, for
> > example. The former are offered as a way to measure activity that is not
> > localized, such as the time spent in zlib or lstat, which may be called from
> > many different parts of the program.
>
> I'm excited for these API features.

Me too! This would have been very useful for some experiments I had to
run in the past.

Thanks for working on it, Jeff :)

> I also like your attention to thread contexts. I think these timers
> would be very interesting to use in parallel checkout. CC'ing Matheus
> for his thoughts on where he would want timer summaries for that
> feature.

For parallel checkout, I think it would be interesting to have timer
summaries for open/close, fstat/lstat, write, and
inflation/delta-reconstruction. Perhaps pkt-line routines too, so that
we can see how much time we spend in inter-process communication.
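
As an illustration of what such a timer summary involves, here is a
minimal sketch of an accumulating stopwatch wrapped around lstat(). It
is not the trace2 API from this series: `struct stopwatch`,
`stopwatch_start()`, `stopwatch_stop()`, `lstat_timer`, and
`timed_lstat()` are all hypothetical names, and the sketch assumes a
single thread and a POSIX system with clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical accumulating timer; NOT the trace2 API. */
struct stopwatch {
	uint64_t total_ns;  /* sum of all completed intervals */
	uint64_t start_ns;  /* start of the interval in progress */
	uint64_t intervals; /* number of completed intervals */
	int running;
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

static void stopwatch_start(struct stopwatch *sw)
{
	assert(!sw->running); /* nested starts are not supported here */
	sw->running = 1;
	sw->start_ns = now_ns();
}

static void stopwatch_stop(struct stopwatch *sw)
{
	uint64_t end_ns = now_ns();

	assert(sw->running);
	sw->total_ns += end_ns - sw->start_ns;
	sw->intervals++;
	sw->running = 0;
}

/* One global timer shared by every call site (single-threaded sketch). */
static struct stopwatch lstat_timer;

/* Drop-in wrapper: any call site can use this in place of lstat(). */
static int timed_lstat(const char *path, struct stat *st)
{
	int ret;

	stopwatch_start(&lstat_timer);
	ret = lstat(path, st);
	stopwatch_stop(&lstat_timer);
	return ret;
}
```

At program exit, `lstat_timer.total_ns` and `lstat_timer.intervals`
would give the total time and the call count regardless of which part
of the program made the calls.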

It would be nice to have timer information for disk reading as well
(more on that below), but I don't think it is possible since we read
the objects through mmap() and thus, we cannot easily isolate the
actual reading time from the decompression time :(

> I would probably want the per-thread summary to know if we
> are blocked on one really long thread while the others finish quickly.

That would be interesting. Parallel checkout actually uses
subprocesses, but I can see the per-thread summary being useful for
grep, for example. (Nevertheless, the use case you mentioned for the
timers -- to evaluate the work balance in parallel checkout -- seems
very interesting.)
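
To make the work-balance idea concrete, here is a minimal sketch that
summarizes per-worker busy times, assuming those totals have already
been collected somehow (per thread or per subprocess).
`struct balance_summary` and `summarize_workers()` are hypothetical
names, not part of Git or of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of how evenly work was spread across workers. */
struct balance_summary {
	uint64_t min_ns;
	uint64_t max_ns;
	double mean_ns;
};

/* busy_ns[i] is the total busy time of worker i, in nanoseconds. */
static struct balance_summary summarize_workers(const uint64_t *busy_ns,
						size_t nr)
{
	struct balance_summary s = { 0, 0, 0.0 };
	uint64_t sum = 0;
	size_t i;

	assert(busy_ns || !nr);
	if (!nr)
		return s;

	s.min_ns = s.max_ns = busy_ns[0];
	for (i = 0; i < nr; i++) {
		if (busy_ns[i] < s.min_ns)
			s.min_ns = busy_ns[i];
		if (busy_ns[i] > s.max_ns)
			s.max_ns = busy_ns[i];
		sum += busy_ns[i];
	}
	s.mean_ns = (double)sum / (double)nr;
	return s;
}
```

A `max_ns` far above `mean_ns` would indicate the situation described
above: one long-running worker holding up the others.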

> Within that: what are the things causing us to be slow? Is it zlib?
> Is it lstat()?

In my tests, the bottleneck on checkout heavily depended on the
underlying storage type. On HDDs, the bottleneck was object reading
(i.e. page faults on mmap()-ed files), which accounted for about 70%
to 80% of the checkout runtime.

On SSDs, reading was much faster, so CPU (i.e. inflation) became the
bottleneck, taking 50% of the runtime. (Inflation only lost to reading
when checking out from *many* loose objects.)

Finally, on NFS, file creation with open(O_CREAT | O_EXCL) and fstat()
(which makes the NFS client flush previously cached writes to the
server) were the bottlenecks, each accounting for about 40% of the
total runtime.
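
For reference, a minimal sketch of how the create/write/fstat phases
of writing a single file could be timed separately. This is only an
illustration, not how the numbers in this message were gathered (those
came from eBPF-based profilers), and it is not Git's checkout code;
`struct phase_times` and `create_and_measure()` are hypothetical
names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-phase accumulators, in nanoseconds. */
struct phase_times {
	uint64_t open_ns;
	uint64_t write_ns;
	uint64_t fstat_ns;
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Create `path` exclusively, write `len` bytes, then fstat() it,
 * adding the time of each phase to `t`. Returns 0 on success and
 * -1 on failure (including when `path` already exists).
 */
static int create_and_measure(const char *path, const void *buf,
			      size_t len, struct phase_times *t)
{
	struct stat st;
	uint64_t t0, t1, t2, t3;
	int fd, ret = -1;

	assert(path && t);

	t0 = now_ns();
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
	t1 = now_ns();
	if (fd < 0)
		return -1;
	t->open_ns += t1 - t0;

	/* For simplicity, a short write is treated as a failure. */
	if (write(fd, buf, len) != (ssize_t)len)
		goto out;
	t2 = now_ns();
	t->write_ns += t2 - t1;

	if (fstat(fd, &st) < 0)
		goto out;
	t3 = now_ns();
	t->fstat_ns += t3 - t2;
	ret = 0;
out:
	close(fd);
	return ret;
}
```

On a filesystem where fstat() forces cached writes out to the server,
as described above for NFS, the cost of that flush would be attributed
to `fstat_ns` rather than `write_ns`.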

These numbers come from a (sequential) `git checkout .` execution on
an empty working tree of the Linux kernel (v5.12), and they were
gathered using eBPF-based profilers. For other operations, especially
ones that require many file removals or more laborious tree merging in
unpack_trees(), I suspect the bottlenecks may change.

If anyone is interested in seeing the flamegraphs and other plots for
these profiling numbers, I have them at:
https://matheustavares.gitlab.io/annexes/parallel-checkout/profiling

And there is a bit more context at:
https://matheustavares.gitlab.io/posts/parallel-checkout
