All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Elijah Newren <newren@gmail.com>,
	Derrick Stolee <stolee@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: [PATCH v3 0/9] Final optimization batch (#15): use memory pools
Date: Fri, 30 Jul 2021 11:47:35 +0000	[thread overview]
Message-ID: <pull.990.v3.git.1627645664.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.990.v2.git.1627531121.gitgitgadget@gmail.com>

This series textually depends on en/ort-perf-batch-14, but the ideas are
orthogonal to it and orthogonal to previous series. It can be reviewed
independently.

Changes since v1, addressing Eric's feedback:

 * Fixed a comment that became out-of-date in patch 1
 * Swapped commits 2 and 3 so that one can better motivate the other.

Changes since v2, addressing Peff's feedback

 * Rebased on en/ort-perf-batch-14 (resolving a trivial conflict with the
   new string_list_init_nodup() usage)
 * Added a new preliminary patch renaming str*_func() to str*_clear_func()
 * Added a new final patch that hardcodes that we'll just use memory pools

=== Basic Optimization idea ===

In this series, I make use of memory pools to get faster allocations and
deallocations for many data structures that tend to all be deallocated at
the same time anyway.

=== Results ===

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28), the
changes in just this series improves the performance as follows:

                     Before Series           After Series
no-renames:      204.2  ms ±  3.0  ms    198.3 ms ±  2.9 ms
mega-renames:      1.076 s ±  0.015 s    661.8 ms ±  5.9 ms
just-one-mega:   364.1  ms ±  7.0  ms    264.6 ms ±  2.5 ms


As a reminder, before any merge-ort/diffcore-rename performance work, the
performance results we started with were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s


=== Overall Results across all optimization work ===

This is my final prepared optimization series. It might be worth reviewing
how my optimizations fared overall, comparing the original merge-recursive
timings with three things: how much merge-recursive improved (as a
side-effect of optimizing merge-ort), how much improvement we would have
gotten from a hypothetical infinite parallelization of rename detection, and
what I achieved at the end with merge-ort:

                               Timings

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename    merge-ort
                 v2.30.0      current     detection     current
                ----------   ---------   -----------   ---------
no-renames:       18.912 s    18.030 s     11.699 s     198.3 ms
mega-renames:   5964.031 s   361.281 s    203.886 s     661.8 ms
just-one-mega:   149.583 s    11.009 s      7.553 s     264.6 ms

                           Speedup factors

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
no-renames:         1           1.05         1.6           95
mega-renames:       1          16.5         29           9012
just-one-mega:      1          13.6         20            565


And, for partial clone users:

             Factor reduction in number of objects needed

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
mega-renames:       1            1            1          181.3


=== Caveat ===

It may be worth noting, though, that my optimization numbers above for
merge-ort use test-tool fast-rebase. git rebase -s ort on the three
testcases above is 5-20 times slower (taking 3.835s, 6.798s, and 1.235s,
respectively). At this point, any further optimization work should go into
making a faster full-featured rebase by copying the ideas from fast-rebase:
avoid unnecessary process forking, avoid updating the index and working copy
until either the rebase is finished or you hit a conflict (and don't write
rebase metadata to disk until that point either), get rid of the glacially
slow revision walking of the upstream side of history (nuke
can_fast_forward(), make --reapply-cherry-picks the default) or at least
don't revision walk so many times (multiple calls to get_merge_bases in
can_fast_forward() plus a is_linear_history() walk, checking for upstream
cherry-picks, probably more), turn off per-commit hooks that probably should
have never been on anyway, etc.

Elijah Newren (9):
  merge-ort: rename str{map,intmap,set}_func()
  diffcore-rename: use a mem_pool for exact rename detection's hashmap
  merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  merge-ort: set up a memory pool
  merge-ort: switch our strmaps over to using memory pools
  diffcore-rename, merge-ort: add wrapper functions for filepair
    alloc/dealloc
  merge-ort: store filepairs and filespecs in our mem_pool
  merge-ort: reuse path strings in pool_alloc_filespec
  merge-ort: remove compile-time ability to turn off usage of memory
    pools

 diffcore-rename.c |  68 +++++++++++--
 diffcore.h        |   3 +
 merge-ort.c       | 238 ++++++++++++++++++++++++++++++++--------------
 3 files changed, 231 insertions(+), 78 deletions(-)


base-commit: 8b09a900a1f1f00d4deb04f567994ae8f1804b5e
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-990%2Fnewren%2Fort-perf-batch-15-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-990/newren/ort-perf-batch-15-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/990

Range-diff vs v2:

  -:  ----------- >  1:  e075d985f26 merge-ort: rename str{map,intmap,set}_func()
  1:  ea08b34d29b =  2:  8416afa89fb diffcore-rename: use a mem_pool for exact rename detection's hashmap
  2:  fdfc2b93ba4 !  3:  2c0b90eaba5 merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
     @@ Metadata
       ## Commit message ##
          merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
      
     -    We need functions which will either call
     +    Make the code more flexible so that it can handle both being run with or
     +    without a memory pool by adding utility functions which will either call
              xmalloc, xcalloc, xstrndup
          or
              mem_pool_alloc, mem_pool_calloc, mem_pool_strndup
     -    depending on whether we have a non-NULL memory pool.  Add these
     -    functions; a subsequent commit will make use of these.
     +    depending on whether we have a non-NULL memory pool.  A subsequent
     +    commit will make use of these.
     +
     +    (We will actually be dropping these functions soon and just assuming we
     +    always have a memory pool, but the flexibility was very useful during
     +    development of merge-ort so I want to be able to restore it if needed.)
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
  3:  c7150869107 =  4:  6646f6fd1ca merge-ort: set up a memory pool
  4:  dd8839b2843 !  5:  7c49aa601d0 merge-ort: switch our strmaps over to using memory pools
     @@ Commit message
      
       ## merge-ort.c ##
      @@ merge-ort.c: static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
     - 	void (*strset_func)(struct strset *) =
     + 	void (*strset_clear_func)(struct strset *) =
       		reinitialize ? strset_partial_clear : strset_clear;
       
      -	/*
     @@ merge-ort.c: static void clear_or_reinit_internal_opts(struct merge_options_inte
      -	 * to deallocate them.
      -	 */
      -	free_strmap_strings(&opti->paths);
     --	strmap_func(&opti->paths, 1);
     +-	strmap_clear_func(&opti->paths, 1);
      +	if (opti->pool)
     -+		strmap_func(&opti->paths, 0);
     ++		strmap_clear_func(&opti->paths, 0);
      +	else {
      +		/*
      +		 * We marked opti->paths with strdup_strings = 0, so that
     @@ merge-ort.c: static void clear_or_reinit_internal_opts(struct merge_options_inte
      +		 * to these strings, it is time to deallocate them.
      +		 */
      +		free_strmap_strings(&opti->paths);
     -+		strmap_func(&opti->paths, 1);
     ++		strmap_clear_func(&opti->paths, 1);
      +	}
       
       	/*
       	 * All keys and values in opti->conflicted are a subset of those in
      @@ merge-ort.c: static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
       	 */
     - 	strmap_func(&opti->conflicted, 0);
     + 	strmap_clear_func(&opti->conflicted, 0);
       
      -	/*
      -	 * opti->paths_to_free is similar to opti->paths; we created it with
     @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res
       	 */
      -	strmap_init_with_options(&opt->priv->paths, NULL, 0);
      -	strmap_init_with_options(&opt->priv->conflicted, NULL, 0);
     --	string_list_init(&opt->priv->paths_to_free, 0);
     +-	string_list_init_nodup(&opt->priv->paths_to_free);
      +	strmap_init_with_options(&opt->priv->paths, pool, 0);
      +	strmap_init_with_options(&opt->priv->conflicted, pool, 0);
      +	if (!opt->priv->pool)
     -+		string_list_init(&opt->priv->paths_to_free, 0);
     ++		string_list_init_nodup(&opt->priv->paths_to_free);
       
       	/*
       	 * keys & strbufs in output will sometimes need to outlive "paths",
  5:  560800a80ef =  6:  08cf2498f96 diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc
  6:  94d60c8a476 =  7:  4ffa5af8b57 merge-ort: store filepairs and filespecs in our mem_pool
  7:  fda885dabe6 =  8:  1556f0443c3 merge-ort: reuse path strings in pool_alloc_filespec
  -:  ----------- >  9:  de30dbac25e merge-ort: remove compile-time ability to turn off usage of memory pools

-- 
gitgitgadget

  parent reply	other threads:[~2021-07-30 11:47 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-23 12:54 [PATCH 0/7] Final optimization batch (#15): use memory pools Elijah Newren via GitGitGadget
2021-07-23 12:54 ` [PATCH 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
2021-07-23 21:59   ` Eric Sunshine
2021-07-23 22:03     ` Elijah Newren
2021-07-23 12:54 ` [PATCH 2/7] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
2021-07-23 12:54 ` [PATCH 3/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
2021-07-23 22:07   ` Eric Sunshine
2021-07-26 14:36   ` Derrick Stolee
2021-07-28 22:49     ` Elijah Newren
2021-07-29 15:26       ` Jeff King
2021-07-30  2:27         ` Elijah Newren
2021-07-30 16:12           ` Jeff King
2021-07-23 12:54 ` [PATCH 4/7] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
2021-07-23 12:54 ` [PATCH 5/7] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
2021-07-23 12:54 ` [PATCH 6/7] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
2021-07-23 12:54 ` [PATCH 7/7] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
2021-07-26 14:44 ` [PATCH 0/7] Final optimization batch (#15): use memory pools Derrick Stolee
2021-07-28 22:52   ` Elijah Newren
2021-07-29  3:58 ` [PATCH v2 " Elijah Newren via GitGitGadget
2021-07-29  3:58   ` [PATCH v2 1/7] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
2021-07-29  3:58   ` [PATCH v2 2/7] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
2021-07-29  3:58   ` [PATCH v2 3/7] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
2021-07-29  3:58   ` [PATCH v2 4/7] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
2021-07-29 15:28     ` Jeff King
2021-07-29 18:37       ` Elijah Newren
2021-07-29 20:09         ` Jeff King
2021-07-30  2:30           ` Elijah Newren
2021-07-30 16:12             ` Jeff King
2021-07-30 13:30           ` Ævar Arnfjörð Bjarmason
2021-07-30 14:36             ` Elijah Newren
2021-07-30 16:23               ` Ævar Arnfjörð Bjarmason
2021-07-29  3:58   ` [PATCH v2 5/7] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
2021-07-29  3:58   ` [PATCH v2 6/7] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
2021-07-29  3:58   ` [PATCH v2 7/7] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
2021-07-29 14:58   ` [PATCH v2 0/7] Final optimization batch (#15): use memory pools Derrick Stolee
2021-07-29 16:20   ` Jeff King
2021-07-29 16:23     ` Jeff King
2021-07-29 19:46       ` Junio C Hamano
2021-07-29 20:48         ` Junio C Hamano
2021-07-29 21:05           ` Elijah Newren
2021-07-29 20:46     ` Elijah Newren
2021-07-29 21:14       ` Jeff King
2021-07-30 11:47   ` Elijah Newren via GitGitGadget [this message]
2021-07-30 11:47     ` [PATCH v3 1/9] merge-ort: rename str{map,intmap,set}_func() Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 2/9] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 3/9] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 4/9] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 5/9] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 6/9] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 7/9] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 8/9] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
2021-07-30 11:47     ` [PATCH v3 9/9] merge-ort: remove compile-time ability to turn off usage of memory pools Elijah Newren via GitGitGadget
2021-07-30 16:24       ` Jeff King
2021-07-31 17:27     ` [PATCH v4 0/9] Final optimization batch (#15): use " Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 1/9] merge-ort: rename str{map,intmap,set}_func() Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 2/9] diffcore-rename: use a mem_pool for exact rename detection's hashmap Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 3/9] merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 4/9] merge-ort: set up a memory pool Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 5/9] merge-ort: switch our strmaps over to using memory pools Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 6/9] diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 7/9] merge-ort: store filepairs and filespecs in our mem_pool Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 8/9] merge-ort: reuse path strings in pool_alloc_filespec Elijah Newren via GitGitGadget
2021-07-31 17:27       ` [PATCH v4 9/9] merge-ort: remove compile-time ability to turn off usage of memory pools Elijah Newren via GitGitGadget
2021-08-02 15:27       ` [PATCH v4 0/9] Final optimization batch (#15): use " Derrick Stolee
2021-08-03 15:45       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.990.v3.git.1627645664.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.