All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com,
	chakrabortyabhradeep79@gmail.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH] midx:  reduce memory pressure while writing bitmaps
Date: Mon, 18 Jul 2022 20:36:05 +0000	[thread overview]
Message-ID: <pull.1292.git.1658176565751.gitgitgadget@gmail.com> (raw)

From: Derrick Stolee <derrickstolee@github.com>

We noticed that some 'git multi-pack-index write --bitmap' processes
were running with very high memory. It turns out that a lot of this
memory is required to store a list of every object in the written
multi-pack-index, with a second copy that has additional information
used for the bitmap writing logic.

Using 'valgrind --tool=massif' before this change, the following chart
shows how memory load increased and was maintained throughout the
process:

    GB
4.102^                                                             ::
     |              @  @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: :
     |         :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |      :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |    :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |    : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |    : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
   0 +--------------------------------------------------------------->

It turns out that the 'struct write_midx_context' data is persisting
through the life of the process, including the 'entries' array. This
array is used last inside find_commits_for_midx_bitmap() within
write_midx_bitmap(). If we free (and nullify) the array at that point,
we can free a decent chunk of memory before the bitmap logic adds more
to the memory footprint.

Here is the massif memory load chart after this change:

    GB
3.111^      #
     |      #                              :::::::::::@::::::::::::::@
     |      #        ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@
     |     @#  :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |  :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
   0 +--------------------------------------------------------------->

It is unfortunate that the lifetime of the 'entries' array is less
clear. To make this simpler, I added a few things to try and prevent an
accidental reference:

 1. Using FREE_AND_NULL() we will at least get a segfault from reading a
    NULL pointer instead of a use-after-free.

 2. 'entries_nr' is also set to zero to make any loop that would iterate
    over the entries be trivial.

 3. Set the 'ctx' pointer to NULL within write_midx_bitmap() so it does
    not get another reference later. This requires adding a local copy
    of 'pack_order' giving us a reference that we can use later in the
    method.

 4. Add significant comments in write_midx_bitmap() and
    write_midx_internal() to add warnings for future authors who might
    accidentally add references to this cleared memory.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
    midx: reduce memory pressure while writing bitmaps
    
    The thing I'm most worried about with this patch is whether there is
    enough (or too much) defensive programming.
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1292%2Fderrickstolee%2Fbitmap-memory-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1292/derrickstolee/bitmap-memory-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1292

 midx.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 5f0dd386b02..cc31d803a5f 100644
--- a/midx.c
+++ b/midx.c
@@ -1063,6 +1063,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 	struct commit **commits = NULL;
 	uint32_t i, commits_nr;
 	uint16_t options = 0;
+	uint32_t *pack_order;
 	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
 	int ret;
 
@@ -1076,6 +1077,15 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 
 	commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
 
+	/*
+	 * Remove the ctx.entries to reduce memory pressure.
+	 * Nullify 'ctx' to help avoid adding new references to ctx->entries.
+	 */
+	FREE_AND_NULL(ctx->entries);
+	ctx->entries_nr = 0;
+	pack_order = ctx->pack_order;
+	ctx = NULL;
+
 	/*
 	 * Build the MIDX-order index based on pdata.objects (which is already
 	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
@@ -1102,7 +1112,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 	 * bitmap_writer_finish().
 	 */
 	for (i = 0; i < pdata.nr_objects; i++)
-		index[ctx->pack_order[i]] = &pdata.objects[i].idx;
+		index[pack_order[i]] = &pdata.objects[i].idx;
 
 	bitmap_writer_select_commits(commits, commits_nr, -1);
 	ret = bitmap_writer_build(&pdata);
@@ -1443,6 +1453,11 @@ static int write_midx_internal(const char *object_dir,
 	if (flags & MIDX_WRITE_REV_INDEX &&
 	    git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
 		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
+
+	/*
+	 * Writing the bitmap must be last, as it will free ctx.entries
+	 * to reduce memory pressure during the bitmap write.
+	 */
 	if (flags & MIDX_WRITE_BITMAP) {
 		if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx,
 				      refs_snapshot, flags) < 0) {

base-commit: 9dd64cb4d310986dd7b8ca7fff92f9b61e0bd21a
-- 
gitgitgadget

             reply	other threads:[~2022-07-18 20:36 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-18 20:36 Derrick Stolee via GitGitGadget [this message]
2022-07-18 21:47 ` [PATCH] midx: reduce memory pressure while writing bitmaps Ævar Arnfjörð Bjarmason
2022-07-19 13:50   ` Derrick Stolee
2022-07-19 15:26 ` [PATCH v2 0/3] " Derrick Stolee via GitGitGadget
2022-07-19 15:26   ` [PATCH v2 1/3] pack-bitmap-write: use const for hashes Derrick Stolee via GitGitGadget
2022-07-19 15:26   ` [PATCH v2 2/3] midx: extract bitmap write setup Derrick Stolee via GitGitGadget
2022-07-19 15:26   ` [PATCH v2 3/3] midx: reduce memory pressure while writing bitmaps Derrick Stolee via GitGitGadget
2022-07-19 15:59     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1292.git.1658176565751.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=chakrabortyabhradeep79@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.