From: Elijah Newren <newren@gmail.com> To: Jeff King <peff@peff.net> Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>, Git Mailing List <git@vger.kernel.org> Subject: Re: [PATCH 4/5] strmap: add strdup_strings option Date: Fri, 28 Aug 2020 10:20:40 -0700 [thread overview] Message-ID: <CABPp-BFL_f4ee=NrfKwVEcLzuCZy7pfXOTLimp5Qn3P89RYfpA@mail.gmail.com> (raw) In-Reply-To: <20200828070802.GC2105050@coredump.intra.peff.net> On Fri, Aug 28, 2020 at 12:08 AM Jeff King <peff@peff.net> wrote: > > On Fri, Aug 21, 2020 at 03:25:44PM -0700, Elijah Newren wrote: > > > > - That sounds like a lot of maps. :) I guess you've looked at > > > compacting some of them into a single map-to-struct? > > > > Oh, map-to-struct is the primary use. But compacting them won't work, > > because the reason for the additional maps is that they have different > > sets of keys (this set of paths meet a certain condition...). Only > > one map contains all the paths involved in the merge. > > OK, I guess I'm not surprised that you would not have missed such an > obvious optimization. :) > > > Also, several of those maps don't even store a value; and are really > > just a set implemented via strmap (thus meaning the only bit of data I > > need for some conditions is whether any given path meets it). It > > seems slightly ugly to have to call strmap_put(map, string, NULL) for > > those. I wonder if I should have another strset type much like your > > suggesting for strintmap. Hmm... > > FWIW, khash does have a "set" mode where it avoids allocating the value > array at all. Cool. > What's the easiest way to benchmark merge-ort? Note that I discovered another optimization that I'm working on implementing; when finished, it should cut down a little more on the time spent on inexact rename detection. That should have the side effect of having the time spent on strmaps stick out some more in the overall timings (as a percentage of overall time anyway). So, I'm focused on that before I do other benchmarking work (which is part of the reason I mentioned my strmap/hashmap benchmarking last week might take a while). Anyway, on to your question: === If you just want to be able to run the ort merge algorithm === Clone git@github.com:newren/git and checkout the 'ort' branch and build it. It currently changes the default merge algorithm to 'ort' and even ignores '-s recursive' by remapping it to '-s ort' (because I wanted to see how regression tests fared with ort as a replacement for recrusive). It should pass the regression tests if you want to run those first. But note that if you want to compare 'ort' to 'recursive', then currently you need to have two different git builds, one of my branch and one with a different checkout of something else (e.g. 2.28.0 or 'master' or whatever). === Decide the granularity of your timing === I suspect you know more than me here, but maybe my pointers are useful anyway... Decide if you want to measure overall program runtime, or dive into details. I used both a simple 'time' and the better 'hyperfine' for the former, and used both 'perf' and GIT_TRACE2_PERF for the latter. One nice thing about GIT_TRACE2_PERF was I wrote a simple program to aggregate the times per region and provide percentages, in a script at the toplevel named 'summarize-perf' that I can use to prefix commands. Thus, I could for example run from my linux clone: $ ../git/summarize-perf git fast-rebase --onto HEAD base hwmon-updates and I'd get output that looks something like (note that this is a subset of the real output): 1.400 : 35 : label:inmemory_nonrecursive 0.827 : 41 : ..label:renames 0.019 : <unmeasured> ( 2.2%) 0.803 : 37 : ....label:regular renames 0.004 : 31 : ....label:directory renames 0.001 : 31 : ....label:process renames 0.513 : 41 : ..label:collect_merge_info 0.048 : 35 : ..label:process_entries 0.117 : 1 : label:checkout 0.000 : 1 : label:record_unmerged and where those fields are <time> : <count> : <region label>. === If you want to time the testcases I used heavily while developing === The rebase-testcase/redo-timings script (in the ort branch) has details on what I actually ran, though it has some paranoia around attempting to make my laptop run semi-quietly and try to avoid all the variance that I wished I could control a bit better. And it assumes you are running in a linux clone with a few branches set up a certain way. Let me explain those tests without using that script, as simply as I can: The setup for the particular cases I was testing is as follows: * Clone the linux kernel, and run the following: $ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e $ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34 $ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e $ git switch -c 5.4-renames v5.4 $ git mv drivers pilots $ git commit -m "Rename drivers/ to pilots/" And from there, there were three primary tests I was comparing: * Rename testcase, 35 patches: $ git checkout 5.4-renames^0 $ git fast-rebase --onto HEAD base hwmon-updates * Rename testcase, just 1 patch: $ git switch 5.4-renames^0 $ git fast-rebase --onto HEAD base hwmon-just-one * No renames (or at least very few renames) testcase, 35 patches: $ git checkout v5.4^0 $ git branch -f hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e # Need to reset hwmon-updates, due to fast-rebase done above $ git fast-rebase --onto HEAD base hwmon-updates (If you want to compare with 'recursive' from a different build of git, just replace 'fast-rebase' with 'rebase'. You can also use 'rebase' instead of 'fast-rebase' on the ort branch and it'll use the ort merge algorithm, but you get all the annoying working-tree-updates-while-rebasing rather than just having the working tree updated at the end of the rebase. You also get all the annoying forks of 'git checkout' and 'git commit' that sequencer is guilty of spawning. But it certainly supports a lot more options and can save state to allow resuming after conflicts, unlike 'fast-rebase'.) > I suspect I could swap out hashmap for khash (messily) in an hour or less. Well, you might be assuming I used sane strmaps, with each strmap having a fixed type for the stored value. That's mostly true, but there were two counterexamples I can think of: "paths" (the biggest strmap) is a map of string -> {merged_info OR conflict_info}, because merged_info is a smaller subset of conflict_info and saves space for each path that can be trivially merged. Also, in diffcore-rename, "dir_rename" starts life as a map of string -> strmap, but later transitions to string -> string, because I'm evil and didn't set up a temporary strmap like I probably should have. Also, the code is littered with FIXME comments, unnecessary #ifdefs, and is generally in need of lots of cleanup. Sorry.
next prev parent reply other threads:[~2020-08-28 17:21 UTC|newest] Thread overview: 144+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-08-21 18:52 [PATCH 0/5] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget 2020-08-21 18:52 ` [PATCH 1/5] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget 2020-08-21 19:22 ` Jeff King 2020-08-21 18:52 ` [PATCH 2/5] strmap: new utility functions Elijah Newren via GitGitGadget 2020-08-21 19:48 ` Jeff King 2020-08-21 18:52 ` [PATCH 3/5] strmap: add more " Elijah Newren via GitGitGadget 2020-08-21 19:58 ` Jeff King 2020-08-21 18:52 ` [PATCH 4/5] strmap: add strdup_strings option Elijah Newren via GitGitGadget 2020-08-21 20:01 ` Jeff King 2020-08-21 20:41 ` Elijah Newren 2020-08-21 21:03 ` Jeff King 2020-08-21 22:25 ` Elijah Newren 2020-08-28 7:08 ` Jeff King 2020-08-28 17:20 ` Elijah Newren [this message] 2020-08-21 18:52 ` [PATCH 5/5] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget 2020-08-21 20:10 ` Jeff King 2020-08-21 20:51 ` Elijah Newren 2020-08-21 21:05 ` Jeff King 2020-08-21 20:16 ` [PATCH 0/5] Add struct strmap and associated utility functions Jeff King 2020-08-21 21:33 ` Elijah Newren 2020-08-21 22:28 ` Elijah Newren 2020-08-28 7:03 ` Jeff King 2020-08-28 15:29 ` Elijah Newren 2020-09-01 9:27 ` Jeff King 2020-10-13 0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget 2020-10-13 0:40 ` [PATCH v2 01/10] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget 2020-10-30 12:50 ` Jeff King 2020-10-30 19:55 ` Elijah Newren 2020-11-03 16:26 ` Jeff King 2020-11-03 16:48 ` Elijah Newren 2020-10-13 0:40 ` [PATCH v2 02/10] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget 2020-10-30 12:51 ` Jeff King 2020-10-13 0:40 ` [PATCH v2 03/10] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget 2020-10-30 13:35 ` Jeff King 2020-10-30 15:37 ` Elijah Newren 2020-11-03 16:08 ` Jeff King 2020-11-03 16:16 ` Elijah Newren 2020-10-13 0:40 ` [PATCH v2 04/10] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget 2020-10-30 13:41 ` Jeff King 2020-10-30 16:03 ` Elijah Newren 2020-11-03 16:10 ` Jeff King 2020-10-13 0:40 ` [PATCH v2 05/10] strmap: new utility functions Elijah Newren via GitGitGadget 2020-10-30 14:12 ` Jeff King 2020-10-30 16:26 ` Elijah Newren 2020-10-13 0:40 ` [PATCH v2 06/10] strmap: add more " Elijah Newren via GitGitGadget 2020-10-30 14:23 ` Jeff King 2020-10-30 16:43 ` Elijah Newren 2020-11-03 16:12 ` Jeff King 2020-10-13 0:40 ` [PATCH v2 07/10] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget 2020-10-30 14:27 ` Jeff King 2020-10-13 0:40 ` [PATCH v2 08/10] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget 2020-10-30 14:39 ` Jeff King 2020-10-30 17:28 ` Elijah Newren 2020-11-03 16:20 ` Jeff King 2020-11-03 16:46 ` Elijah Newren 2020-10-13 0:40 ` [PATCH v2 09/10] strmap: add a strset sub-type Elijah Newren via GitGitGadget 2020-10-30 14:44 ` Jeff King 2020-10-30 18:02 ` Elijah Newren 2020-10-13 0:40 ` [PATCH v2 10/10] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget 2020-10-30 14:56 ` Jeff King 2020-10-30 19:31 ` Elijah Newren 2020-11-03 16:24 ` Jeff King 2020-11-02 18:55 ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget 2020-11-02 18:55 ` [PATCH v3 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget 2020-11-02 18:55 ` [PATCH v3 02/13] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget 2020-11-02 18:55 ` [PATCH v3 03/13] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget 2020-11-02 18:55 ` [PATCH v3 04/13] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget 2020-11-02 18:55 ` [PATCH v3 05/13] hashmap: provide deallocation function names Elijah Newren via GitGitGadget 2020-11-02 18:55 ` [PATCH v3 06/13] strmap: new utility functions Elijah Newren via GitGitGadget 2020-11-02 18:55 ` [PATCH v3 07/13] strmap: add more " Elijah Newren via GitGitGadget 2020-11-04 20:13 ` Jeff King 2020-11-04 20:24 ` Elijah Newren 2020-11-02 18:55 ` [PATCH v3 08/13] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget 2020-11-02 18:55 ` [PATCH v3 09/13] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget 2020-11-04 20:21 ` Jeff King 2020-11-02 18:55 ` [PATCH v3 10/13] strmap: add a strset sub-type Elijah Newren via GitGitGadget 2020-11-04 20:31 ` Jeff King 2020-11-02 18:55 ` [PATCH v3 11/13] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget 2020-11-02 18:55 ` [PATCH v3 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget 2020-11-04 20:43 ` Jeff King 2020-11-02 18:55 ` [PATCH v3 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget 2020-11-04 20:48 ` Jeff King 2020-11-04 20:52 ` [PATCH v3 00/13] Add struct strmap and associated utility functions Jeff King 2020-11-04 22:20 ` Elijah Newren 2020-11-05 0:22 ` [PATCH v4 " Elijah Newren via GitGitGadget 2020-11-05 0:22 ` [PATCH v4 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget 2020-11-05 0:22 ` [PATCH v4 02/13] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget 2020-11-05 0:22 ` [PATCH v4 03/13] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget 2020-11-05 0:22 ` [PATCH v4 04/13] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget 2020-11-05 0:22 ` [PATCH v4 05/13] hashmap: provide deallocation function names Elijah Newren via GitGitGadget 2020-11-05 0:22 ` [PATCH v4 06/13] strmap: new utility functions Elijah Newren via GitGitGadget 2020-11-05 0:22 ` [PATCH v4 07/13] strmap: add more " Elijah Newren via GitGitGadget 2020-11-05 0:22 ` [PATCH v4 08/13] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget 2020-11-05 0:22 ` [PATCH v4 09/13] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget 2020-11-05 0:22 ` [PATCH v4 10/13] strmap: add a strset sub-type Elijah Newren via GitGitGadget 2020-11-05 0:22 ` [PATCH v4 11/13] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget 2020-11-05 0:22 ` [PATCH v4 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget 2020-11-05 0:22 ` [PATCH v4 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget 2020-11-05 13:29 ` [PATCH v4 00/13] Add struct strmap and associated utility functions Jeff King 2020-11-05 20:25 ` Junio C Hamano 2020-11-05 21:17 ` Jeff King 2020-11-05 21:22 ` Elijah Newren 2020-11-05 22:15 ` Junio C Hamano 2020-11-06 0:24 ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget 2020-11-06 0:24 ` [PATCH v5 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget 2020-11-06 0:24 ` [PATCH v5 02/15] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget 2020-11-06 0:24 ` [PATCH v5 03/15] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget 2020-11-06 0:24 ` [PATCH v5 04/15] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget 2020-11-06 0:24 ` [PATCH v5 05/15] hashmap: provide deallocation function names Elijah Newren via GitGitGadget 2020-11-06 0:24 ` [PATCH v5 06/15] strmap: new utility functions Elijah Newren via GitGitGadget 2020-11-06 0:24 ` [PATCH v5 07/15] strmap: add more " Elijah Newren via GitGitGadget 2020-11-06 0:24 ` [PATCH v5 08/15] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget 2020-11-06 0:24 ` [PATCH v5 09/15] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget 2020-11-06 0:24 ` [PATCH v5 10/15] strmap: split create_entry() out of strmap_put() Elijah Newren via GitGitGadget 2020-11-06 0:24 ` [PATCH v5 11/15] strmap: add a strset sub-type Elijah Newren via GitGitGadget 2020-11-06 0:24 ` [PATCH v5 12/15] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget 2020-11-11 17:33 ` Phillip Wood 2020-11-11 18:49 ` Elijah Newren 2020-11-11 19:01 ` Jeff King 2020-11-11 20:34 ` Chris Torek 2020-11-06 0:24 ` [PATCH v5 13/15] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget 2020-11-06 0:24 ` [PATCH v5 14/15] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget 2020-11-06 0:24 ` [PATCH v5 15/15] shortlog: use strset from strmap.h Elijah Newren via GitGitGadget 2020-11-06 2:00 ` [PATCH v5 00/15] Add struct strmap and associated utility functions Junio C Hamano 2020-11-06 2:42 ` Elijah Newren 2020-11-06 2:48 ` Jeff King 2020-11-06 17:32 ` Junio C Hamano 2020-11-11 20:02 ` [PATCH v6 " Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 02/15] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 03/15] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 04/15] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 05/15] hashmap: provide deallocation function names Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 06/15] strmap: new utility functions Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 07/15] strmap: add more " Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 08/15] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 09/15] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 10/15] strmap: split create_entry() out of strmap_put() Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 11/15] strmap: add a strset sub-type Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 12/15] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 13/15] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 14/15] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget 2020-11-11 20:02 ` [PATCH v6 15/15] shortlog: use strset from strmap.h Elijah Newren via GitGitGadget 2020-11-11 20:07 ` [PATCH v6 00/15] Add struct strmap and associated utility functions Jeff King
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CABPp-BFL_f4ee=NrfKwVEcLzuCZy7pfXOTLimp5Qn3P89RYfpA@mail.gmail.com' \ --to=newren@gmail.com \ --cc=git@vger.kernel.org \ --cc=gitgitgadget@gmail.com \ --cc=peff@peff.net \ --subject='Re: [PATCH 4/5] strmap: add strdup_strings option' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).