All of lore.kernel.org
 help / color / mirror / Atom feed
* [GSoC] Git Blog 6
@ 2021-06-27 12:43 ZheNing Hu
  2021-06-27 21:52 ` Christian Couder
  0 siblings, 1 reply; 3+ messages in thread
From: ZheNing Hu @ 2021-06-27 12:43 UTC (permalink / raw)
  To: Git List
  Cc: Junio C Hamano, Christian Couder, Hariom verma, Bagas Sanjaya, Jeff King

My sixth week blog finished:
The web version is here:
https://adlternative.github.io/GSOC-Git-Blog-6/

## Week6: Performance Testing

This week, `Christan`, `Hariom`, and `Bagas` reviewed my patches. I
revised the commit
messages and made some content adjustments based on their suggestions.
Patch is here:
 [[PATCH v6 00/15] [GSOC][RFC] cat-file: reuse ref-filter
logic](https://lore.kernel.org/git/pull.980.v6.git.1624797350.gitgitgadget@gmail.com/).

Compared with the big changes that made `git cat-file` re-use
`ref-filter` logic in previous
weeks, the overall change this week is not very large.

Although from a functional point of view, the current version of `git
cat-file` can support more
atoms, but from a performance point of view, it is far less than the
previous version.

The following is an excerpt from the commit message of `[GSOC]
cat-file: reuse ref-filter logic`:

```
The performance for `git cat-file --batch-all-objects
--batch-check` on the Git repository itself with performance
testing tool `hyperfine` changes from 669.4 ms ± 31.1 ms to
1.134 s ± 0.063 s.

The performance for `git cat-file --batch-all-objects --batch
>/dev/null` on the Git repository itself with performance testing
tool `time` change from "27.37s user 0.29s system 98% cpu 28.089
total" to "33.69s user 1.54s system 87% cpu 40.258 total".
```

Now the performance of `cat-file --batch` is almost half of the
previous performance of it,
because of the complex logic in `ref-filter`. So I am thinking, how
can I find the performance
bottleneck in `ref-filter` and optimize it?

So I try to find and use performance testing tools to analyze the
time-consuming steps of
`git cat-file --batch`.

* Using Google's `gperftools`:
1. Add the link parameter `-lprofiler` in `config.mak`: `CFLAGS += -lprofiler`.
2. `make`.
3. Use `CPUPROFILE=/tmp/prof.out /<path>/git cat-file --batch-check
--batch-all-objects`
to run the git and general `prof.out`, which contains the results of
performance analysis.
4. Use `pprof --text /<path>/git /tmp/prof.out` to display the result
in the terminal.

* `git cat-file` after re-using the `ref-filter` logic:

```
Using local file /<path>/git.
Using local file /<path>/prof.out.
/usr/bin/addr2line: /<path>/git: file format not recognized
Total: 283 samples
62 21.9% 21.9% 62 21.9% __GI___libc_write
34 12.0% 33.9% 34 12.0% 000055d9164cdc36
20 7.1% 41.0% 20 7.1% inflate
12 4.2% 45.2% 12 4.2% inflateCodesUsed@@ZLIB_1.2.9
8 2.8% 48.1% 8 2.8% __memcmp_avx2_movbe
7 2.5% 50.5% 7 2.5% 000055d9164cdc33
5 1.8% 52.3% 5 1.8% __libc_open64
5 1.8% 54.1% 5 1.8% __memmove_avx_unaligned_erms
5 1.8% 55.8% 5 1.8% _int_malloc
3 1.1% 56.9% 3 1.1% 000055d9164ccbce
3 1.1% 58.0% 3 1.1% malloc_consolidate
2 0.7% 58.7% 2 0.7% 000055d916478d78
2 0.7% 59.4% 2 0.7% 000055d9164cc821
2 0.7% 60.1% 2 0.7% 000055d9164cc8f5
2 0.7% 60.8% 2 0.7% 000055d9164ccc49
2 0.7% 61.5% 2 0.7% 000055d91659dc02
2 0.7% 62.2% 2 0.7% __GI___libc_free
2 0.7% 62.9% 2 0.7% __calloc
2 0.7% 63.6% 2 0.7% __fstatat64
2 0.7% 64.3% 2 0.7% __memset_avx2_unaligned_erms
2 0.7% 65.0% 2 0.7% __strlen_avx2
2 0.7% 65.7% 2 0.7% __vfprintf_internal
...
1 0.4% 92.9% 1 0.4% _IO_new_do_write
1 0.4% 93.3% 1 0.4% _IO_new_file_write
1 0.4% 93.6% 1 0.4% _IO_old_init
1 0.4% 94.0% 1 0.4% _IO_str_init_static_internal
1 0.4% 94.3% 1 0.4% __GI__IO_default_xsputn
1 0.4% 94.7% 1 0.4% __GI__IO_fwrite
1 0.4% 95.1% 1 0.4% __GI__IO_setb
1 0.4% 95.4% 1 0.4% __GI___libc_malloc
1 0.4% 95.8% 1 0.4% __GI___mmap64
1 0.4% 96.1% 1 0.4% __GI___qsort_r
1 0.4% 96.5% 1 0.4% __GI_munmap
1 0.4% 96.8% 1 0.4% __abi_tag
1 0.4% 97.2% 1 0.4% __strchr_avx2
1 0.4% 97.5% 1 0.4% _int_free
1 0.4% 97.9% 1 0.4% _itoa_word
1 0.4% 98.2% 1 0.4% adler32_z@@ZLIB_1.2.9
1 0.4% 98.6% 1 0.4% inflateBackEnd@@ZLIB_1.2.0
1 0.4% 98.9% 1 0.4% inflateReset2@@ZLIB_1.2.3.4
1 0.4% 99.3% 4 1.4% msort_with_tmp.part.0
1 0.4% 99.6% 1 0.4% unlink_chunk.constprop.0
1 0.4% 100.0% 1 0.4% zError
...
0 0.0% 100.0% 150 53.0% __libc_start_main
```

* `git cat-file` before re-using the `ref-filter` logic:

```
Using local file /<path>/git.
Using local file /tmp/prof.out2.
/usr/bin/addr2line: /<path>/git: file format not recognized
Total: 234 samples
52 22.2% 22.2% 52 22.2% __GI___libc_write
24 10.3% 32.5% 24 10.3% 00005564fbe6b0da
23 9.8% 42.3% 23 9.8% inflateCodesUsed@@ZLIB_1.2.9
11 4.7% 47.0% 11 4.7% inflate
9 3.8% 50.9% 9 3.8% 00005564fbe6b0d7
7 3.0% 53.8% 7 3.0% __memcmp_avx2_movbe
5 2.1% 56.0% 5 2.1% _int_malloc
4 1.7% 57.7% 4 1.7% __libc_open64
3 1.3% 59.0% 3 1.3% __GI__IO_default_xsputn
3 1.3% 60.3% 3 1.3% __memmove_avx_unaligned_erms
3 1.3% 61.5% 3 1.3% __strchrnul_avx2
2 0.9% 62.4% 2 0.9% 00005564fbe1621c
2 0.9% 63.2% 2 0.9% 00005564fbe6813a
2 0.9% 64.1% 2 0.9% 00005564fbe6b0dd
2 0.9% 65.0% 2 0.9% 00005564fbe6b0e9
2 0.9% 65.8% 2 0.9% 00005564fbe6b8f6
2 0.9% 66.7% 2 0.9% 00005564fbef4ce6
2 0.9% 67.5% 2 0.9% __GI___libc_malloc
...
1 0.4% 97.4% 1 0.4% _IO_new_file_write
1 0.4% 97.9% 1 0.4% _IO_new_file_xsputn
1 0.4% 98.3% 1 0.4% __GI___libc_free
1 0.4% 98.7% 1 0.4% __strlen_avx2
1 0.4% 99.1% 1 0.4% __vsnprintf_internal
1 0.4% 99.6% 1 0.4% adler32_z@@ZLIB_1.2.9
1 0.4% 100.0% 1 0.4% inflateBackEnd@@ZLIB_1.2.0
...
0 0.0% 100.0% 118 50.4% __libc_start_main
0 0.0% 100.0% 4 1.7% msort_with_tmp.part.0
```

Ignore `write`, `inflate` and `000055d9164cdc36`, the execution time of `memcmp`
and `memmove` is very large.

However, whether it is `git cat-file` before or after re-using the
`ref-filter` logic,
the functions they call take up similar proportions of time.
* Using `perf`:
`perf top -p <git-pid>`

```
12.72% libc-2.33.so [.] __memmove_avx_unaligned_erms
7.39% libz.so.1.2.11 [.] inflate
5.56% libz.so.1.2.11 [.] 0x00000000000088ba
5.27% libz.so.1.2.11 [.] adler32_z
3.46% git [.] patch_delta
```
We can see that `memmove` is still the part that accounts for the
largest proportion of time.

We can already come to a conclusion: A lot of the time of `cat-file
--batch` is used for data copy,
this may be the focus of our later performance optimization.

--
ZheNing Hu

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [GSoC] Git Blog 6
  2021-06-27 12:43 [GSoC] Git Blog 6 ZheNing Hu
@ 2021-06-27 21:52 ` Christian Couder
  2021-06-28 15:55   ` ZheNing Hu
  0 siblings, 1 reply; 3+ messages in thread
From: Christian Couder @ 2021-06-27 21:52 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: Git List, Junio C Hamano, Hariom verma, Bagas Sanjaya, Jeff King

On Sun, Jun 27, 2021 at 2:43 PM ZheNing Hu <adlternative@gmail.com> wrote:
>
> My sixth week blog finished:
> The web version is here:
> https://adlternative.github.io/GSOC-Git-Blog-6/

Great!

[...]

> Now the performance of `cat-file --batch` is almost half of the
> previous performance of it,
> because of the complex logic in `ref-filter`. So I am thinking, how
> can I find the performance
> bottleneck in `ref-filter` and optimize it?
>
> So I try to find and use performance testing tools to analyze the
> time-consuming steps of
> `git cat-file --batch`.

Yeah, that's the right thing to do at this point!

> * Using Google's `gperftools`:

[...]

> Ignore `write`, `inflate` and `000055d9164cdc36`, the execution time of `memcmp`
> and `memmove` is very large.
>
> However, whether it is `git cat-file` before or after re-using the
> `ref-filter` logic,
> the functions they call take up similar proportions of time.

Yeah, it's not clear what is causing performance issues from this. It
may be that the issue is that with the ref-filter code more objects
need to be fully read. I wonder if there is a way to count that.

> * Using `perf`:
> `perf top -p <git-pid>`
>
> ```
> 12.72% libc-2.33.so [.] __memmove_avx_unaligned_erms
> 7.39% libz.so.1.2.11 [.] inflate
> 5.56% libz.so.1.2.11 [.] 0x00000000000088ba
> 5.27% libz.so.1.2.11 [.] adler32_z
> 3.46% git [.] patch_delta
> ```
> We can see that `memmove` is still the part that accounts for the
> largest proportion of time.

Yeah. Do you think that the code is calling memmove directly or
through another function? And anyway which part of the code is
responsible for these calls?

Flame graphs (http://www.brendangregg.com/flamegraphs.html) might help
get a better idea of what happens.

> We can already come to a conclusion: A lot of the time of `cat-file
> --batch` is used for data copy,
> this may be the focus of our later performance optimization.

Thanks!

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [GSoC] Git Blog 6
  2021-06-27 21:52 ` Christian Couder
@ 2021-06-28 15:55   ` ZheNing Hu
  0 siblings, 0 replies; 3+ messages in thread
From: ZheNing Hu @ 2021-06-28 15:55 UTC (permalink / raw)
  To: Christian Couder
  Cc: Git List, Junio C Hamano, Hariom verma, Bagas Sanjaya, Jeff King

Christian Couder <christian.couder@gmail.com> 于2021年6月28日周一 上午5:53写道:
>
> > * Using Google's `gperftools`:
>
> [...]
>
> > Ignore `write`, `inflate` and `000055d9164cdc36`, the execution time of `memcmp`
> > and `memmove` is very large.
> >
> > However, whether it is `git cat-file` before or after re-using the
> > `ref-filter` logic,
> > the functions they call take up similar proportions of time.
>
> Yeah, it's not clear what is causing performance issues from this. It
> may be that the issue is that with the ref-filter code more objects
> need to be fully read. I wonder if there is a way to count that.
>

Yes, maybe we need more precise time, and if it can locate the function
in git will be better.

> > * Using `perf`:
> > `perf top -p <git-pid>`
> >
> > ```
> > 12.72% libc-2.33.so [.] __memmove_avx_unaligned_erms
> > 7.39% libz.so.1.2.11 [.] inflate
> > 5.56% libz.so.1.2.11 [.] 0x00000000000088ba
> > 5.27% libz.so.1.2.11 [.] adler32_z
> > 3.46% git [.] patch_delta
> > ```
> > We can see that `memmove` is still the part that accounts for the
> > largest proportion of time.
>
> Yeah. Do you think that the code is calling memmove directly or
> through another function? And anyway which part of the code is
> responsible for these calls?
>

There is no clear conclusion yet. But it is likely to be xmemdupz, xstrdup.


> Flame graphs (http://www.brendangregg.com/flamegraphs.html) might help
> get a better idea of what happens.
>

Thanks, I will check if we can take use of it.

> > We can already come to a conclusion: A lot of the time of `cat-file
> > --batch` is used for data copy,
> > this may be the focus of our later performance optimization.
>
> Thanks!

Thanks.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-06-28 15:55 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-27 12:43 [GSoC] Git Blog 6 ZheNing Hu
2021-06-27 21:52 ` Christian Couder
2021-06-28 15:55   ` ZheNing Hu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.