All of lore.kernel.org
 help / color / mirror / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Abhishek Kumar <abhishekkumar8222@gmail.com>,
	jnareb@gmail.com, git@vger.kernel.org
Subject: Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab
Date: Mon, 8 Jun 2020 18:46:37 +0200	[thread overview]
Message-ID: <20200608164637.GE8232@szeder.dev> (raw)
In-Reply-To: <13db757a-9412-7f1e-805c-8a028c4ab2b1@gmail.com>

On Mon, Jun 08, 2020 at 09:45:12AM -0400, Derrick Stolee wrote:
> On 6/8/2020 4:36 AM, SZEDER Gábor wrote:
> > On Mon, Jun 08, 2020 at 11:18:27AM +0530, Abhishek Kumar wrote:
> >> On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote:
> >>> On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote:
> >>>> On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
> >>>>> The struct commit is used in many contexts. However, members generation
> >>>>> and graph_pos are only used for commit-graph related operations and
> >>>>> otherwise waste memory.
> >>>>>
> >>>>> This wastage would have been more pronounced as transistion to
> >>>>> generation number v2, which uses 64-bit generation number instead of
> >>>>> current 32-bits.
> >>>>
> >>>> Thanks! This is an important step, and will already improve
> >>>> performance in subtle ways.
> >>>
> >>> While the reduced memory footprint of each commit object might improve
> >>> performance, accessing graph position and generation numbers in a
> >>> commit-slab is more expensive than direct field accesses in 'struct
> >>> commit' instances.  Consequently, these patches increase the runtime
> >>> of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux
> >>> repository from 0.630s to 0.940s.
> >>>
> >>
> >> Thank you for checking performance. Performance penalty was something we
> >> had discussed here [1]. 
> >>
> >> Caching the commit slab results in local variables helped wonderfully in v2 [2].
> >> For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD'
> >> in the linux repository increased from 0.762 to 0.767s. Since this is a
> >> change of <1%, it is *no longer* a performance regression in my opinion.
> > 
> > Interesting, I measured 0.870s with v2, still a notable increase from
> > 0.630s.
> 
> This is an interesting point. The --is-ancestor is critical to the
> performance issue (as measured on my machine).
> 
> For "git merge-base HEAD~50000 HEAD" on the Linux repo, I get
> 
> v2.27.0:
> real    0m0.515s
> user    0m0.467s
> sys     0m0.048s
> 
> v2 series:
> real    0m0.534s
> user    0m0.481s
> sys     0m0.053s

I, too, see similarly small differences in this case.

> With "--is-ancestor" I see the following:
> 
> v2.27.0:
> real    0m0.591s
> user    0m0.539s
> sys     0m0.052s
> 
> v2 series:
> real    0m0.773s
> user    0m0.733s
> sys     0m0.040s
> 
> The --is-ancestor option [1] says
> 
>     Check if the first <commit> is an ancestor of the second
>     <commit>, and exit with status 0 if true, or with status
>     1 if not. Errors are signaled by a non-zero status that
>     is not 1.
> 
> [1] https://git-scm.com/docs/git-merge-base#Documentation/git-merge-base.txt---is-ancestor
> 
> This _should_ be faster than "git branch --contains HEAD~50000",
> but it is much much slower:
> 
> $ time git branch --contains HEAD~50000
> real    0m0.068s
> user    0m0.061s
> sys     0m0.008s
> 
> So, there is definitely something going on that slows the
> "--is-ancestor" path in this case. But, the solution is not
> to halt the current patch (which likely has memory footprint
> benefits when dealing with a lot of tree and blob objects)
> and instead fix the underlying algorithm.

Other, more common cases are affected as well, notably the simple 'git
rev-list --topo-order':

  performance: 1.226479734 s: git command: /home/szeder/src/git/BUILDS/v2.27.0/bin/git rev-list --topo-order HEAD
  max RSS: 162400k
  
  performance: 1.741309536 s: git command: /home/szeder/src/git/git rev-list --topo-order HEAD
  max RSS: 169556k

Is the supposed memory footprint reduction that large to justify this
runtime increase?

> Let's add that to the list of things to do.

And to the commit messages.

> >>>  create mode 100644 contrib/coccinelle/generation.cocci
> >>>  create mode 100644 contrib/coccinelle/graph_pos.cocci
> >>
> >> I appreciate the Coccinelle scripts to help identify
> >> automatic fixes for other topics in-flight. However,
> >> I wonder if they would be better placed inside the
> >> existing commit.cocci file?
> >
> > We add Coccinelle scripts to avoid undesirable code patterns entering
> > our code base.  That, however, is not the case here: this is a
> > one-time conversion, and at the end of this series 'struct commit'
> > won't have a 'generation' field anymore, so once it's merged the
> > compiler will catch any new 'commit->generation' accesses.  Therefore
> > I don't think that these Coccinelle scripts should be added at all.
> 
> I disagree. We _also_ add Coccinelle scripts when doing one-time
> refactors to avoid logical merge conflicts with other topics in
> flight. If someone else is working on a parallel topic that adds
> references to graph_pos or generation member, then the scripts provide
> an easy way for the maintainer to update those references in the merge
> commit. Alternatively, the contributor could rebase on top of this
> series and run the scripts themselves to fix their patches before
> submission.
> 
> For example, this was done carefully in the sha->object_id
> conversion using contrib/coccinelle/object_id.cocci.

'object_id.cocci' is not about sha->object_id conversions, but about
avoiding undesirable code patterns, e.g. we prefer oideq() over
!oidcmp(), and the compiler, of course, can't help to catch that.
Coccinelle scripts used for actual sha->object_id transformations were
not added to 'object_id.cocci', but were recorded only in the commit
messages for reference, see e.g.  9b56149996 (merge-recursive: convert
struct merge_file_info to object_id, 2016-06-24) and a couple of its
ancestors.


  reply	other threads:[~2020-06-08 16:46 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-04  7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar
2020-06-04  7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar
2020-06-04 14:36   ` Derrick Stolee
2020-06-04 17:35   ` Junio C Hamano
2020-06-05 23:23   ` Jakub Narębski
2020-06-04  7:27 ` [GSoC Patch 2/3] commit: convert commit->generation to a slab Abhishek Kumar
2020-06-04 14:27   ` Derrick Stolee
2020-06-04 17:49   ` Junio C Hamano
2020-06-06 22:03   ` Jakub Narębski
2020-06-04  7:27 ` [GSoC Patch 3/3] commit: convert commit->graph_pos " Abhishek Kumar
2020-06-07 12:12   ` Jakub Narębski
2020-06-04 14:22 ` [GSoC Patch 0/3] Move generation, graph_pos " Derrick Stolee
2020-06-04 17:55   ` Junio C Hamano
2020-06-07 19:53   ` SZEDER Gábor
2020-06-08  5:48     ` Abhishek Kumar
2020-06-08  8:36       ` SZEDER Gábor
2020-06-08 13:45         ` Derrick Stolee
2020-06-08 16:46           ` SZEDER Gábor [this message]
2020-06-08 15:21         ` Jakub Narębski
2020-06-05 19:00 ` Jakub Narębski
2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar
2020-06-07 19:32   ` [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
2020-06-15 16:27     ` Taylor Blau
2020-06-07 19:32   ` [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab Abhishek Kumar
2020-06-08  8:26     ` SZEDER Gábor
2020-06-08 12:35       ` Derrick Stolee
2020-06-07 19:32   ` [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph Abhishek Kumar
2020-06-08 16:31     ` Jakub Narębski
2020-06-15 16:31       ` Taylor Blau
2020-06-07 19:32   ` [GSOC Patch v2 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar
2020-06-08 16:22   ` [GSOC Patch v2 0/4] Move generation, graph_pos to a slab Jakub Narębski
2020-06-15 16:24   ` Taylor Blau
2020-06-17  9:14 ` [GSOC Patch v4 " Abhishek Kumar
2020-06-17  9:14   ` [GSOC Patch v4 1/4] object: drop parsed_object_pool->commit_count Abhishek Kumar
2020-06-17  9:14   ` [GSOC Patch v4 2/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
2020-06-17  9:14   ` [GSOC Patch v4 3/4] commit: move members graph_pos, generation to a slab Abhishek Kumar
2020-06-17  9:14   ` [GSOC Patch v4 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar
2020-06-19 13:59   ` [GSOC Patch v4 0/4] Move generation, graph_pos to a slab Derrick Stolee
2020-06-19 17:44     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200608164637.GE8232@szeder.dev \
    --to=szeder.dev@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jnareb@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.