From: Derrick Stolee <stolee@gmail.com>
To: "René Scharfe" <l.s.r@web.de>, "Git Mailing List" <git@vger.kernel.org>
Cc: "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Martin Ågren" <martin.agren@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH 10/10] name-rev: release unused name strings
Date: Wed, 5 Feb 2020 10:19:16 -0500
Message-ID: <35b282f8-c3a9-e7e3-5ea8-0542e7ce24ac@gmail.com>
In-Reply-To: <4eddc458-6294-9b9c-857b-50ba484a7168@web.de>

On 2/4/2020 4:26 PM, René Scharfe wrote:
> The runtime actually increases slightly from:
> 
> Benchmark #1: ./git -C ../linux/ name-rev --all
>   Time (mean ± σ):     828.8 ms ±   5.0 ms    [User: 797.2 ms, System: 31.6 ms]
>   Range (min … max):   824.1 ms … 838.9 ms    10 runs
> 
> ... to:
> 
> Benchmark #1: ./git -C ../linux/ name-rev --all
>   Time (mean ± σ):     847.6 ms ±   3.4 ms    [User: 807.9 ms, System: 39.6 ms]
>   Range (min … max):   843.4 ms … 854.3 ms    10 runs
> 
> Why is that?  In the Chromium repo, ca. 44000 free(3) calls in
> create_or_update_name() release almost 1GB, while in the Linux repo
> 240000+ calls release a bit more than 5MB, so the average discarded
> name is ca. 1000x longer in the former.
> 
> Overall I think it's the right tradeoff to make, as it helps curb the
> memory usage in repositories with big discarded names, and the added
> overhead is small.

I agree this trade-off is worth it. Your reasoning for why it is
happening makes sense, too.
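
A quick check of the quoted figures shows where the long names live.
The sketch below is only a back-of-the-envelope calculation: the byte
totals are rounded assumptions ("almost 1GB" taken as 1e9 bytes, "a bit
more than 5MB" as 5e6 bytes), and the helper names are made up for
illustration.

```c
#include <assert.h>

/* Average number of bytes released per free(3) call. */
static double avg_bytes_per_call(double total_bytes, double calls)
{
	assert(calls > 0);
	return total_bytes / calls;
}

/* Chromium: ca. 44000 calls, "almost 1GB" assumed to be 1e9 bytes. */
static double chromium_avg(void)
{
	return avg_bytes_per_call(1e9, 44000);
}

/* Linux: 240000+ calls, "a bit more than 5MB" assumed to be 5e6 bytes. */
static double linux_avg(void)
{
	return avg_bytes_per_call(5e6, 240000);
}
```

With these assumptions, chromium_avg() comes out at roughly 23 KB per
discarded name and linux_avg() at roughly 21 bytes, so the discarded
names in the Chromium repo are about three orders of magnitude longer.
That would explain why the extra free(3) calls pay off in memory there
while showing up only as a small overhead in the Linux repo.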

> +	if (is_valid_rev_name(name)) {
> +		if (!is_better_name(name, taggerdate, distance, from_tag))
> +			return NULL;
> +
> +		/*
> +		 * This string might still be shared with ancestors
> +		 * (generation > 0).  We can release it here regardless,
> +		 * because the new name that has just won will be better
> +		 * for them as well, so name_rev() will replace these
> +		 * stale pointers when it processes the parents.
> +		 */
> +		if (!name->generation)
> +			free(name->tip_name);
> +	}

Here, the idea that the string might "still be shared with ancestors"
is confusing without the additional context that the name-rev algorithm
uses a depth-first search to find the "best" name. At this point, we
are replacing the existing name with a better one, and we use
"generation == 0" to declare "I am the initial owner of tip_name".
The ancestors will then replace their tip_name pointers with the new
name, without ever accessing the freed memory.
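
The ownership rule described above can be sketched in isolation. This
is a simplified model, not the code from the patch: struct toy_name,
toy_replace_name(), toy_share_name() and copy_string() are hypothetical
names, and the "is the new name better?" check is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for git's struct rev_name. */
struct toy_name {
	char *tip_name;	/* may be shared with ancestors */
	int generation;	/* 0: this entry owns tip_name */
};

/* Duplicate a string using only ISO C library calls. */
static char *copy_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	assert(copy);
	memcpy(copy, s, len);
	return copy;
}

/*
 * Install a better name on an entry.  Only the owner
 * (generation == 0) frees the old string; entries with
 * generation > 0 merely borrow the pointer.
 * Returns 1 if the old string was freed, 0 otherwise.
 */
static int toy_replace_name(struct toy_name *name, const char *better)
{
	int freed = 0;

	if (name->tip_name && !name->generation) {
		free(name->tip_name);
		freed = 1;
	}
	name->tip_name = copy_string(better);
	name->generation = 0;
	return freed;
}

/* Point an ancestor at its child's string without copying it. */
static void toy_share_name(struct toy_name *ancestor,
			   const struct toy_name *child)
{
	ancestor->tip_name = child->tip_name;
	ancestor->generation = child->generation + 1;
}
```

After toy_replace_name() frees the owner's string, any ancestor that
borrowed it holds a stale pointer until toy_share_name() is called on
it again. The sketch stays safe only because that pointer is
overwritten before anyone reads through it, which is the same argument
the comment in the patch makes for name_rev().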

Keeping such dangling references to freed memory is certainly
dangerous, but these references are short-lived within the name_rev()
function. That limits the ways this could cause issues in the future.

Thanks,
-Stolee


