From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com,
chakrabortyabhradeep79@gmail.com,
Derrick Stolee <derrickstolee@github.com>,
Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH] midx: reduce memory pressure while writing bitmaps
Date: Mon, 18 Jul 2022 20:36:05 +0000 [thread overview]
Message-ID: <pull.1292.git.1658176565751.gitgitgadget@gmail.com> (raw)
From: Derrick Stolee <derrickstolee@github.com>
We noticed that some 'git multi-pack-index write --bitmap' processes
were running with very high memory. It turns out that a lot of this
memory is required to store a list of every object in the written
multi-pack-index, with a second copy that has additional information
used for the bitmap writing logic.
Using 'valgrind --tool=massif' before this change, the following chart
shows how memory load increased and was maintained throughout the
process:
GB
4.102^ ::
| @ @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: :
| :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
0 +--------------------------------------------------------------->
It turns out that the 'struct write_midx_context' data is persisting
through the life of the process, including the 'entries' array. This
array is used last inside find_commits_for_midx_bitmap() within
write_midx_bitmap(). If we free (and nullify) the array at that point,
we can free a decent chunk of memory before the bitmap logic adds more
to the memory footprint.
Here is the massif memory load chart after this change:
GB
3.111^ #
| # :::::::::::@::::::::::::::@
| # ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@
| @# :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
0 +--------------------------------------------------------------->
It is unfortunate that the lifetime of the 'entries' array is less
clear. To make this simpler, I added a few things to try and prevent an
accidental reference:
1. Using FREE_AND_NULL() we will at least get a segfault from reading a
NULL pointer instead of a use-after-free.
2. 'entries_nr' is also set to zero to make any loop that would iterate
over the entries be trivial.
3. Set the 'ctx' pointer to NULL within write_midx_bitmap() so it does
not get another reference later. This requires adding a local copy
of 'pack_order' giving us a reference that we can use later in the
method.
4. Add significant comments in write_midx_bitmap() and
write_midx_internal() to add warnings for future authors who might
accidentally add references to this cleared memory.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
midx: reduce memory pressure while writing bitmaps
The thing I'm most worried about with this patch is whether there is
enough (or too much) defensive programming.
Thanks, -Stolee
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1292%2Fderrickstolee%2Fbitmap-memory-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1292/derrickstolee/bitmap-memory-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1292
midx.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/midx.c b/midx.c
index 5f0dd386b02..cc31d803a5f 100644
--- a/midx.c
+++ b/midx.c
@@ -1063,6 +1063,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
struct commit **commits = NULL;
uint32_t i, commits_nr;
uint16_t options = 0;
+ uint32_t *pack_order;
char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
int ret;
@@ -1076,6 +1077,15 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
+ /*
+ * Remove the ctx.entries to reduce memory pressure.
+ * Nullify 'ctx' to help avoid adding new references to ctx->entries.
+ */
+ FREE_AND_NULL(ctx->entries);
+ ctx->entries_nr = 0;
+ pack_order = ctx->pack_order;
+ ctx = NULL;
+
/*
* Build the MIDX-order index based on pdata.objects (which is already
* in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
@@ -1102,7 +1112,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
* bitmap_writer_finish().
*/
for (i = 0; i < pdata.nr_objects; i++)
- index[ctx->pack_order[i]] = &pdata.objects[i].idx;
+ index[pack_order[i]] = &pdata.objects[i].idx;
bitmap_writer_select_commits(commits, commits_nr, -1);
ret = bitmap_writer_build(&pdata);
@@ -1443,6 +1453,11 @@ static int write_midx_internal(const char *object_dir,
if (flags & MIDX_WRITE_REV_INDEX &&
git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
+
+ /*
+ * Writing the bitmap must be last, as it will free ctx.entries
+ * to reduce memory pressure during the bitmap write.
+ */
if (flags & MIDX_WRITE_BITMAP) {
if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx,
refs_snapshot, flags) < 0) {
base-commit: 9dd64cb4d310986dd7b8ca7fff92f9b61e0bd21a
--
gitgitgadget
next reply other threads:[~2022-07-18 20:36 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-18 20:36 Derrick Stolee via GitGitGadget [this message]
2022-07-18 21:47 ` [PATCH] midx: reduce memory pressure while writing bitmaps Ævar Arnfjörð Bjarmason
2022-07-19 13:50 ` Derrick Stolee
2022-07-19 15:26 ` [PATCH v2 0/3] " Derrick Stolee via GitGitGadget
2022-07-19 15:26 ` [PATCH v2 1/3] pack-bitmap-write: use const for hashes Derrick Stolee via GitGitGadget
2022-07-19 15:26 ` [PATCH v2 2/3] midx: extract bitmap write setup Derrick Stolee via GitGitGadget
2022-07-19 15:26 ` [PATCH v2 3/3] midx: reduce memory pressure while writing bitmaps Derrick Stolee via GitGitGadget
2022-07-19 15:59 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.1292.git.1658176565751.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=chakrabortyabhradeep79@gmail.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=vdye@github.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.