From: Andrzej Hunt <andrzej@ahunt.org>
To: "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Andrzej Hunt via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Andrzej Hunt <ajrhunt@google.com>
Subject: Re: [PATCH 04/12] bloom: clear each bloom_key after use
Date: Sun, 25 Apr 2021 15:17:38 +0200
Message-ID: <a641ca69-05c8-a2c0-59a8-93711eb3d349@ahunt.org>
In-Reply-To: <20210411072651.GF2947267@szeder.dev>



On 11/04/2021 09:26, SZEDER Gábor wrote:
> On Fri, Apr 09, 2021 at 06:47:23PM +0000, Andrzej Hunt via GitGitGadget wrote:
>> From: Andrzej Hunt <ajrhunt@google.com>
>>
>> fill_bloom_key() allocates memory into bloom_key; we need to clean that
>> up once the key is no longer needed.
>>
>> This fixes the following leak which was found while running t0002-t0099.
>> Although this leak is happening in code being called from a test-helper,
>> the same code is also used in various locations around git, and could
>> presumably happen during normal usage too.
> 
> It does indeed happen: 'git commit-graph write --reachable
> --changed-paths' generates Bloom filters for every commit, with each
> filter containing all paths modified by its associated commit, so it
> leaks a lot of 7 * 4byte hashes.  This patch reduces the memory usage
> of that command:
> 
>                           Max RSS
>                      before      after
>    ---------------------------------------------
>    android-base     1275028k   1006576k   -21.1%
>    chromium         3245144k   3127764k    -3.6%
>    cmssw             793996k    699156k   -12.0%
>    cpython           371584k    343480k    -7.6%
>    elasticsearch     748104k    637936k   -14.7%
>    freebsd-src       819020k    741272k    -9.5%
>    gcc               867412k    730332k   -15.8%
>    gecko-dev        2619112k   2457280k    -6.2%
>    git               252684k    216900k   -14.2%
>    glibc             239000k    222228k    -7.0%
>    go                264132k    251344k    -4.9%
>    homebrew-cask     542188k    480588k   -11.4%
>    homebrew-core     805332k    715848k   -11.1%
>    jdk               417832k    342928k   -17.9%
>    libreoff-core    1257296k   1089980k   -13.3%
>    linux            2033296k   1759712k   -13.5%
>    llvm-project     1067216k    956704k   -10.4%
>    mariadb-srv       695172k    559508k   -19.5%
>    postgres          340132k    317416k    -6.7%
>    rails             325432k    294332k    -9.6%
>    rust              655244k    584904k   -10.7%
>    tensorflow        507308k    480848k    -5.2%
>    webkit           2466812k   2237332k    -9.3%
> 
> Just out of curiosity, I disabled the questionable hardcoded 512-path
> limit on the size of modified path Bloom filters, and the memory usage
> in the jdk repository sank by over 55%, from 849520k to 379760k.
> 
> Please feel free to include any of the above data points in the commit
> message.

Thank you for the detailed analysis - these kinds of results are very 
motivating! I will include a brief summary (something like "10% typical 
improvement for 'commit-graph write' for large repos") along with a link 
to your posting for those who want the full picture.
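
For anyone following along without the patch at hand, the fill/clear
pairing discussed above can be sketched roughly as below. This is a
standalone toy and not git's code: every toy_* name is hypothetical, the
hash is a placeholder (seeded FNV-1a, not what git uses), and the
live-allocation counter exists only to make the effect visible. The one
detail taken from this thread is that each key holds 7 hashes of 4 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 7 hashes of 4 bytes each per key, as mentioned in the thread. */
#define TOY_NUM_HASHES 7

/* Hypothetical stand-in for a Bloom key; not git's struct bloom_key. */
struct toy_bloom_key {
	uint32_t *hashes;
};

/* Number of keys that have been filled but not yet cleared. */
static size_t toy_live_allocations;

/* Placeholder hash: seeded FNV-1a, used only to fill the array. */
static uint32_t toy_hash(uint32_t seed, const char *data, size_t len)
{
	uint32_t h = 2166136261u ^ seed;
	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)data[i];
		h *= 16777619u;
	}
	return h;
}

/* Allocates key->hashes; the caller owns it until the key is cleared. */
static int toy_fill_bloom_key(const char *data, size_t len,
			      struct toy_bloom_key *key)
{
	key->hashes = calloc(TOY_NUM_HASHES, sizeof(*key->hashes));
	if (!key->hashes)
		return -1;
	for (uint32_t i = 0; i < TOY_NUM_HASHES; i++)
		key->hashes[i] = toy_hash(i, data, len);
	toy_live_allocations++;
	return 0;
}

/* Releases what toy_fill_bloom_key() allocated; safe to call twice. */
static void toy_clear_bloom_key(struct toy_bloom_key *key)
{
	if (key->hashes)
		toy_live_allocations--;
	free(key->hashes);
	key->hashes = NULL;
}

/*
 * Fill and clear one key per path. Returns the number of keys still
 * allocated afterwards, which is 0 when every fill is paired with a
 * clear. Dropping the clear call would leak 28 bytes per path.
 */
static size_t toy_process_paths(const char **paths, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		struct toy_bloom_key key;

		if (toy_fill_bloom_key(paths[i], strlen(paths[i]), &key))
			break;
		/* ... the key would be added to a filter here ... */
		toy_clear_bloom_key(&key);
	}
	return toy_live_allocations;
}
```

Calling toy_process_paths() on any list of paths should return 0 with the
clear call in place. Without it, the counter grows by one per path, which
is the per-path leak the table above measures in aggregate.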
