From: Andrzej Hunt <andrzej@ahunt.org>
To: "SZEDER Gábor" <szeder.dev@gmail.com>,
"Andrzej Hunt via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Andrzej Hunt <ajrhunt@google.com>
Subject: Re: [PATCH 04/12] bloom: clear each bloom_key after use
Date: Sun, 25 Apr 2021 15:17:38 +0200
Message-ID: <a641ca69-05c8-a2c0-59a8-93711eb3d349@ahunt.org>
In-Reply-To: <20210411072651.GF2947267@szeder.dev>
On 11/04/2021 09:26, SZEDER Gábor wrote:
> On Fri, Apr 09, 2021 at 06:47:23PM +0000, Andrzej Hunt via GitGitGadget wrote:
>> From: Andrzej Hunt <ajrhunt@google.com>
>>
>> fill_bloom_key() allocates memory into bloom_key, so we need to clean
>> that up once the key is no longer needed.
>>
>> This fixes the following leak which was found while running t0002-t0099.
>> Although this leak is happening in code being called from a test-helper,
>> the same code is also used in various locations around git, and could
>> presumably happen during normal usage too.
>
> It does indeed happen: 'git commit-graph write --reachable
> --changed-paths' generates Bloom filters for every commit, with each
> filter containing all paths modified by its associated commit, so it
> leaks a lot of 7 * 4-byte hashes. This patch reduces the memory usage
> of that command:
>
>                          Max RSS
>                     before      after
>   ---------------------------------------------
>   android-base     1275028k   1006576k   -21.1%
>   chromium         3245144k   3127764k    -3.6%
>   cmssw             793996k    699156k   -12.0%
>   cpython           371584k    343480k    -7.6%
>   elasticsearch     748104k    637936k   -14.7%
>   freebsd-src       819020k    741272k    -9.5%
>   gcc               867412k    730332k   -15.8%
>   gecko-dev        2619112k   2457280k    -6.2%
>   git               252684k    216900k   -14.2%
>   glibc             239000k    222228k    -7.0%
>   go                264132k    251344k    -4.9%
>   homebrew-cask     542188k    480588k   -11.4%
>   homebrew-core     805332k    715848k   -11.1%
>   jdk               417832k    342928k   -17.9%
>   libreoff-core    1257296k   1089980k   -13.3%
>   linux            2033296k   1759712k   -13.5%
>   llvm-project     1067216k    956704k   -10.4%
>   mariadb-srv       695172k    559508k   -19.5%
>   postgres          340132k    317416k    -6.7%
>   rails             325432k    294332k    -9.6%
>   rust              655244k    584904k   -10.7%
>   tensorflow        507308k    480848k    -5.2%
>   webkit           2466812k   2237332k    -9.3%
>
> Just out of curiosity, I disabled the questionable hardcoded 512-path
> limit on the size of modified path Bloom filters, and the memory usage
> in the jdk repository sank by over 55%, from 849520k to 379760k.
>
> Please feel free to include any of the above data points in the commit
> message.
Thank you for the detailed analysis - these kinds of results are very
motivating! I will include a brief summary (something like "10% typical
improvement for 'commit-graph write' for large repos") along with a link
to your posting for those who want the full picture.
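
For anyone who wants to see the pattern in isolation, here is a minimal
standalone sketch of the fill/clear pairing. It does not use git's
headers: struct demo_key, demo_fill_key(), demo_clear_key() and
demo_process_paths() are hypothetical stand-ins for bloom_key,
fill_bloom_key() and the clear function added by the patch, and the
hash is a toy one rather than whatever git uses. The only detail taken
from this thread is that each key owns an array of seven 4-byte hashes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Seven 4-byte hashes per key, as in the figure quoted in this thread. */
#define DEMO_NUM_HASHES 7

/* Hypothetical stand-in for git's bloom_key: it owns a heap array. */
struct demo_key {
	uint32_t *hashes;
};

/* Toy seeded hash (FNV-1a style); an assumption, not git's hash. */
static uint32_t demo_hash(uint32_t seed, const char *data, size_t len)
{
	uint32_t h = 2166136261u ^ seed;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)data[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Allocates key->hashes and fills it. The caller owns the allocation
 * and must release it with demo_clear_key(). Returns 0 on success,
 * -1 if the allocation failed.
 */
static int demo_fill_key(const char *data, size_t len, struct demo_key *key)
{
	uint32_t i;

	key->hashes = calloc(DEMO_NUM_HASHES, sizeof(*key->hashes));
	if (!key->hashes)
		return -1;
	for (i = 0; i < DEMO_NUM_HASHES; i++)
		key->hashes[i] = demo_hash(i, data, len);
	return 0;
}

/* Frees the array and resets the pointer so a second clear is harmless. */
static void demo_clear_key(struct demo_key *key)
{
	free(key->hashes);
	key->hashes = NULL;
}

/*
 * Builds one key per path and clears it before moving on. Without the
 * clear, every iteration would leak DEMO_NUM_HASHES * 4 bytes. Returns
 * the number of bytes that were allocated and released again.
 */
static size_t demo_process_paths(const char **paths, size_t nr)
{
	size_t released = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		struct demo_key key;

		if (demo_fill_key(paths[i], strlen(paths[i]), &key))
			break;
		/* ... the key would be added to a filter here ... */
		demo_clear_key(&key);
		assert(key.hashes == NULL);
		released += DEMO_NUM_HASHES * sizeof(uint32_t);
	}
	return released;
}
```

Calling demo_process_paths() on three paths allocates and releases
3 * 28 bytes. Dropping the demo_clear_key() call reproduces the kind of
per-path leak described above, which adds up quickly when a filter is
computed for every path of every commit.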