From: "René Scharfe" <l.s.r@web.de>
To: Git Mailing List <git@vger.kernel.org>
Cc: "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Martin Ågren" <martin.agren@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: [PATCH 10/10] name-rev: release unused name strings
Date: Tue, 4 Feb 2020 22:26:18 +0100
Message-ID: <4eddc458-6294-9b9c-857b-50ba484a7168@web.de>
In-Reply-To: <084909f8-fefa-1fe0-b2ce-74eff47c4972@web.de>

name_rev() assigns a name to a commit and its parents and grandparents
and so on.  Commits share their name string with their first parent,
which in turn does the same, recursively to the root.  That saves a lot
of allocations.  When a better name is found, the old name is replaced,
but its memory is not released.  That leakage can become significant.

Can we release these old strings exactly once even though they are
referenced multiple times?  Yes, indeed -- we can make use of the fact
that name_rev() visits the ancestors of a commit after it has set a new name
for it and tries to update their names as well.

Members of the first ancestral line have the same taggerdate and
from_tag values, but a higher distance value than their child commit at
generation 0.  These are the only criteria used by is_better_name().
Lower distance values are considered better, so a name that is better
for a child will also be better for its parent and grandparent etc.
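
To illustrate why the comparison carries over to ancestors, here is a
minimal sketch with a simplified comparator.  It is not git's
is_better_name(); the names toy_candidate, toy_is_better() and
toy_at_ancestor() are made up, and the priority among the three fields
is an assumption of this sketch.  The only point is that moving both
names the same number of steps along the first ancestral line changes
nothing but the distance, by the same amount on both sides.

```c
#include <assert.h>

/* Hypothetical stand-in for the three values the comparison looks at. */
struct toy_candidate {
	long taggerdate;
	int from_tag;
	int distance;
};

/*
 * Return 1 if cand should replace cur.  Simplified ordering, assumed
 * for illustration only: tags beat non-tags, then older dates win,
 * then the lower distance wins.
 */
static int toy_is_better(const struct toy_candidate *cur,
			 const struct toy_candidate *cand)
{
	assert(cur && cand);
	if (cur->from_tag != cand->from_tag)
		return cand->from_tag;
	if (cur->taggerdate != cand->taggerdate)
		return cand->taggerdate < cur->taggerdate;
	return cand->distance < cur->distance;
}

/*
 * The same name as seen from the n-th first parent: taggerdate and
 * from_tag are unchanged, only the distance grows.
 */
static struct toy_candidate toy_at_ancestor(struct toy_candidate c, int n)
{
	c.distance += n;
	return c;
}
```

If toy_is_better(&cur, &cand) holds at the child, it also holds for
toy_at_ancestor(cur, n) and toy_at_ancestor(cand, n) for any n >= 0,
because both distances are shifted by the same amount.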

That means we can free(3) an inferior name at generation 0 and rely on
name_rev() to replace all references in ancestors as well.

If we do that then we need to stop using the string pointer alone to
distinguish new empty rev_name slots from initialized ones, though, as
it technically becomes invalid after the free(3) call -- even though its
value is still different from NULL.

We can check the generation value first, as empty slots will have it
initialized to 0, and for the actual generation 0 we'll set a new valid
name right after the create_or_update_name() call that releases the
string.
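
As a toy model of the scheme (not the code in builtin/name-rev.c), the
sketch below names a first-parent line stored in a plain array, where
index i is the i-th first parent of the tip.  All names in it
(toy_name, toy_is_valid(), toy_update(), toy_name_line(), toy_frees)
are hypothetical, and "lower distance wins" stands in for
is_better_name().  It shows the old string being released exactly once,
at generation 0, and the validity check looking at the generation
before the pointer so that a stale pointer in an ancestor slot is never
read.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_name {
	char *tip_name;  /* shared by every slot on the line */
	int generation;  /* 0 for the tip, i for its i-th first parent */
	int distance;    /* lower is better in this toy */
};

static int toy_frees;    /* counts released strings, for illustration */

/* Check generation first; empty (zeroed) slots have generation 0. */
static int toy_is_valid(const struct toy_name *slot)
{
	return slot->generation || slot->tip_name;
}

/* Return 1 if the slot took the new name, 0 if the old one was kept. */
static int toy_update(struct toy_name *slot, char *tip_name,
		      int generation, int distance)
{
	if (toy_is_valid(slot)) {
		if (slot->distance <= distance)
			return 0;
		/* Only the generation 0 slot releases the shared string. */
		if (!slot->generation) {
			free(slot->tip_name);
			toy_frees++;
		}
	}
	slot->tip_name = tip_name;
	slot->generation = generation;
	slot->distance = distance;
	return 1;
}

/* Name the tip, then walk its first parents and update them as well. */
static void toy_name_line(struct toy_name *line, size_t n,
			  const char *tip, int distance)
{
	char *copy = malloc(strlen(tip) + 1);
	size_t i;

	assert(copy);
	strcpy(copy, tip);
	for (i = 0; i < n; i++)
		if (!toy_update(&line[i], copy, (int)i, distance + (int)i))
			break;
	if (!i)
		free(copy);  /* the new name lost right at the tip */
}
```

Calling toy_name_line() on a zeroed array first with a distance of 5 and
then with a distance of 2 leaves every slot pointing at the second
string and bumps toy_frees once; a third call with a larger distance
changes nothing.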

For the Chromium repo, releasing superseded names reduces the memory
footprint of name-rev --all significantly.  Here's the output of GNU
time before:

0.98user 0.48system 0:01.46elapsed 99%CPU (0avgtext+0avgdata 2601812maxresident)k
0inputs+0outputs (0major+571470minor)pagefaults 0swaps

... and with this patch:

1.01user 0.26system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 1559196maxresident)k
0inputs+0outputs (0major+314370minor)pagefaults 0swaps

It also gets faster; hyperfine before:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.534 s ±  0.006 s    [User: 1.039 s, System: 0.494 s]
  Range (min … max):    1.522 s …  1.542 s    10 runs

... and with this patch:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.338 s ±  0.006 s    [User: 1.047 s, System: 0.291 s]
  Range (min … max):    1.327 s …  1.346 s    10 runs

For the Linux repo it doesn't pay off; memory usage only goes down from:

0.76user 0.03system 0:00.80elapsed 99%CPU (0avgtext+0avgdata 292848maxresident)k
0inputs+0outputs (0major+44579minor)pagefaults 0swaps

... to:

0.78user 0.03system 0:00.81elapsed 100%CPU (0avgtext+0avgdata 284696maxresident)k
0inputs+0outputs (0major+44892minor)pagefaults 0swaps

The runtime actually increases slightly from:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     828.8 ms ±   5.0 ms    [User: 797.2 ms, System: 31.6 ms]
  Range (min … max):   824.1 ms … 838.9 ms    10 runs

... to:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     847.6 ms ±   3.4 ms    [User: 807.9 ms, System: 39.6 ms]
  Range (min … max):   843.4 ms … 854.3 ms    10 runs

Why is that?  In the Chromium repo, ca. 44000 free(3) calls in
create_or_update_name() release almost 1GB, while in the Linux repo
240000+ calls release a bit more than 5MB, so the average discarded
name is ca. 1000x longer in the former.
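
The per-name averages follow from dividing the released bytes by the
number of free(3) calls.  A minimal sketch of that arithmetic, using
the rounded figures quoted above (1e9 bytes over 44000 calls, 5e6 bytes
over 240000 calls) as assumed inputs; avg_released_bytes() is a
hypothetical helper, not part of git:

```c
#include <assert.h>

/* Average number of bytes released per free(3) call. */
static double avg_released_bytes(double total_bytes, double calls)
{
	assert(calls > 0);
	return total_bytes / calls;
}
```

With those inputs the result is roughly 23000 bytes per discarded name
for Chromium and about 21 bytes for Linux, a ratio of around 1000.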

Overall I think it's the right tradeoff to make, as it helps curb the
memory usage in repositories with big discarded names, and the added
overhead is small.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/name-rev.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 98f55bcea9..23a639ff30 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -17,7 +17,7 @@
 #define CUTOFF_DATE_SLOP 86400

 struct rev_name {
-	const char *tip_name;
+	char *tip_name;
 	timestamp_t taggerdate;
 	int generation;
 	int distance;
@@ -34,7 +34,7 @@ static struct commit_rev_name rev_names;

 static int is_valid_rev_name(const struct rev_name *name)
 {
-	return name && name->tip_name;
+	return name && (name->generation || name->tip_name);
 }

 static struct rev_name *get_commit_rev_name(const struct commit *commit)
@@ -87,9 +87,20 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 {
 	struct rev_name *name = commit_rev_name_at(&rev_names, commit);

-	if (is_valid_rev_name(name) &&
-	    !is_better_name(name, taggerdate, distance, from_tag))
-		return NULL;
+	if (is_valid_rev_name(name)) {
+		if (!is_better_name(name, taggerdate, distance, from_tag))
+			return NULL;
+
+		/*
+		 * This string might still be shared with ancestors
+		 * (generation > 0).  We can release it here regardless,
+		 * because the new name that has just won will be better
+		 * for them as well, so name_rev() will replace these
+		 * stale pointers when it processes the parents.
+		 */
+		if (!name->generation)
+			free(name->tip_name);
+	}

 	name->taggerdate = taggerdate;
 	name->generation = generation;
--
2.25.0
