All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, git@jeffhostetler.com, peff@peff.net,
	jonathantanmy@google.com, szeder.dev@gmail.com,
	sbeller@google.com, Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v4 04/13] commit-graph: implement write_commit_graph()
Date: Fri, 23 Feb 2018 15:02:21 -0500	[thread overview]
Message-ID: <8d29382b-e0a7-968f-b231-f27f89f275d3@gmail.com> (raw)
In-Reply-To: <xmqqsh9rtsg0.fsf@gitster-ct.c.googlers.com>



On 2/23/2018 2:30 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>> jt/binsearch-with-fanout introduces one when there is a 256-entry
>> fanout table (not the case here).
>>
>> The bsearch() method in search.h (and used in
>> pack-write.c:need_large_offset) does not return the _position_ of a
>> found element.
>>
>> Neither of these suit my needs, but I could just be searching for the
>> wrong strings. Also, I could divert my energies in this area to create
>> a generic search in the style of jt/binsearch-with-fanout.
> ... me goes and digs ...
>
> What I had in mind was the one in sha1-lookup.c, actually.  Having
> said that, hand-rolling another one is not too huge a deal;
> eventually people will notice and consolidate code after the series
> stabilizes anyway ;-)

Ah, sha1_pos(). That definitely satisfies my use case. Thanks! My local 
branch has this replacement.

>
>>>> +				num_large_edges++;
>>>> +				parent = parent->next;
>>>> +			} while (parent);
>>> It feels somewhat wasteful to traverse the commit's parents list
>>> only to count, without populating the octopus table (which I
>>> understand is assumed to be minority case under this design).
>> Since we are writing the commit graph file in-order, we cannot write
>> the octopus table until after the chunk lengths are known.
> Oh, my "minority case" comment was me wondering "since we expect
> there are only a few, why not keep them in memory while we discover
> them here, so that the writing phase that come later do not have to
> go through all commits again counting their parents?  would it be
> more performant and a better trade-off?"  We can certainly do such
> an optimization later (iow, it is not all that crucial issue and
> certainly I didn't mention the above as something that needs to be
> "fixed"--there is nothing broken).
>
>> store the octopus table in memory and then dump it into the file
>> later, but walking the parents is quite fast after all the commits are
>> loaded. I'm not sure the time optimization merits the extra complexity
>> here. (I'm happy to revisit this if we do see this performance
>> lacking.)
>>
>> P.S. I really like the name "octopus table" and will use that for
>> informal discussions of this format.
> I actually do not mind that name used as the official term.  I find
> it far more descriptive and understandable than "long edge" / "large
> edge" at least to a Git person.

I will consider using this in the format and design documents.

>
>> You're right that int_id isn't great, and your more-specific
>> "oid_table_pos" shows an extra reason why it isn't great: when we add
>> the GRAPH_LAST_EDGE bit or set it to GRAPH_PARENT_MISSING, the value
>> is NOT a table position.
> Perhaps I am somewhat biased, but it is quite natural for our
> codebase and internal API to say something like this:
>
>      x_pos(table, key) function's return value is the non-negative
>      position for the key in the table when the key is there; when it
>      returns a negative value, it is (-1 - position) where the "position"
>      is the position in the table they key would have been found if
>      it was in there.
>
> and store the return value from such a function in a variable called
> "pos".  Surely, sometimes "pos" does not have _the_ position, but
> that does not make it a bad name.
>
> Saying "MISSING is a special value that denotes 'nothing is here'"
> and allowing it to be set to a variable that meant to hold the
> position is not such a bad thing, though.  After all, that is how
> you use NULL as a special value for a pointer variable ;-).
>
> Same for using the high bit to mean something else.  Taking these
> together you would explain "low 31-bit of pos holds the position for
> the item in the table.  MISSING is a special value that you can use
> to denote there is nothing.  The LAST_EDGE bit denotes that one
> group of positions ends there", or something like that.
>
>> I think the current name makes the following call very clear:
> It is still a strange name nevertheless.
>
>>>> +char *write_commit_graph(const char *obj_dir)
>>>> +{
>>>> +	struct packed_oid_list oids;
>>>> +	struct packed_commit_list commits;
>>>> +	struct sha1file *f;
>>>> +	int i, count_distinct = 0;
>>>> +	DIR *info_dir;
>>>> +	struct strbuf tmp_file = STRBUF_INIT;
>>>> +	struct strbuf graph_file = STRBUF_INIT;
>>>> +	unsigned char final_hash[GIT_MAX_RAWSZ];
>>>> +	char *graph_name;
>>>> +	int fd;
>>>> +	uint32_t chunk_ids[5];
>>>> +	uint64_t chunk_offsets[5];
>>>> +	int num_chunks;
>>>> +	int num_long_edges;
>>>> +	struct commit_list *parent;
>>>> +
>>>> +	oids.nr = 0;
>>>> +	oids.alloc = (int)(0.15 * approximate_object_count());
>>> Heh, traditionalist would probably avoid unnecessary use of float
>>> and use something like 1/4 or 1/8 ;-)  After all, it is merely a
>>> ballpark guestimate.
>>>
>>>> +	num_long_edges = 0;
>>> This again is about naming, but I find it a bit unnatural to call
>>> the edge between a chind and its octopus parents "long".  Individual
>>> edges are not long--the only thing that is long is your "list of
>>> edges".  Some other codepaths in this patch seems to call the same
>>> concept with s/long/large/, which I found somewhat puzzling.
>> How about "num_extra_edges"...
> Yes, "extra" in the name makes it very understandable.
>


  parent reply	other threads:[~2018-02-23 20:02 UTC|newest]

Thread overview: 146+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-30 21:39 [PATCH v2 00/14] Serialized Git Commit Graph Derrick Stolee
2018-01-30 21:39 ` [PATCH v2 01/14] commit-graph: add format document Derrick Stolee
2018-02-01 21:44   ` Jonathan Tan
2018-01-30 21:39 ` [PATCH v2 02/14] graph: add commit graph design document Derrick Stolee
2018-01-31  2:19   ` Stefan Beller
2018-01-30 21:39 ` [PATCH v2 03/14] commit-graph: create git-commit-graph builtin Derrick Stolee
2018-02-02  0:53   ` SZEDER Gábor
2018-01-30 21:39 ` [PATCH v2 04/14] commit-graph: implement construct_commit_graph() Derrick Stolee
2018-02-01 22:23   ` Jonathan Tan
2018-02-01 23:46   ` SZEDER Gábor
2018-02-02 15:32   ` SZEDER Gábor
2018-02-05 16:06     ` Derrick Stolee
2018-02-07 15:08       ` SZEDER Gábor
2018-02-07 15:10         ` Derrick Stolee
2018-01-30 21:39 ` [PATCH v2 05/14] commit-graph: implement git-commit-graph --write Derrick Stolee
2018-02-01 23:33   ` Jonathan Tan
2018-02-02 18:36     ` Stefan Beller
2018-02-02 22:48       ` Junio C Hamano
2018-02-03  1:58         ` Derrick Stolee
2018-02-03  9:28           ` Jeff King
2018-02-05 18:48             ` Junio C Hamano
2018-02-06 18:55               ` Derrick Stolee
2018-02-01 23:48   ` SZEDER Gábor
2018-02-05 18:07     ` Derrick Stolee
2018-02-02  1:47   ` SZEDER Gábor
2018-01-30 21:39 ` [PATCH v2 06/14] commit-graph: implement git-commit-graph --read Derrick Stolee
2018-01-31  2:22   ` Stefan Beller
2018-02-02  0:02   ` SZEDER Gábor
2018-02-02  0:23   ` Jonathan Tan
2018-02-05 19:29     ` Derrick Stolee
2018-01-30 21:39 ` [PATCH v2 07/14] commit-graph: implement git-commit-graph --update-head Derrick Stolee
2018-02-02  1:35   ` SZEDER Gábor
2018-02-05 21:01     ` Derrick Stolee
2018-02-02  2:45   ` SZEDER Gábor
2018-01-30 21:39 ` [PATCH v2 08/14] commit-graph: implement git-commit-graph --clear Derrick Stolee
2018-02-02  4:01   ` SZEDER Gábor
2018-01-30 21:39 ` [PATCH v2 09/14] commit-graph: teach git-commit-graph --delete-expired Derrick Stolee
2018-02-02 15:04   ` SZEDER Gábor
2018-01-30 21:39 ` [PATCH v2 10/14] commit-graph: add core.commitgraph setting Derrick Stolee
2018-01-31 22:44   ` Igor Djordjevic
2018-02-02 16:01   ` SZEDER Gábor
2018-01-30 21:39 ` [PATCH v2 11/14] commit: integrate commit graph with commit parsing Derrick Stolee
2018-02-02  1:51   ` Jonathan Tan
2018-02-06 14:53     ` Derrick Stolee
2018-01-30 21:39 ` [PATCH v2 12/14] commit-graph: read only from specific pack-indexes Derrick Stolee
2018-01-30 21:39 ` [PATCH v2 13/14] commit-graph: close under reachability Derrick Stolee
2018-01-30 21:39 ` [PATCH v2 14/14] commit-graph: build graph from starting commits Derrick Stolee
2018-01-30 21:47 ` [PATCH v2 00/14] Serialized Git Commit Graph Stefan Beller
2018-02-01  2:34   ` Stefan Beller
2018-02-08 20:37 ` [PATCH v3 " Derrick Stolee
2018-02-08 20:37   ` [PATCH v3 01/14] commit-graph: add format document Derrick Stolee
2018-02-08 21:21     ` Junio C Hamano
2018-02-08 21:33       ` Derrick Stolee
2018-02-08 23:16         ` Junio C Hamano
2018-02-08 20:37   ` [PATCH v3 02/14] graph: add commit graph design document Derrick Stolee
2018-02-08 20:37   ` [PATCH v3 03/14] commit-graph: create git-commit-graph builtin Derrick Stolee
2018-02-08 21:27     ` Junio C Hamano
2018-02-08 21:36       ` Derrick Stolee
2018-02-08 23:21         ` Junio C Hamano
2018-02-08 20:37   ` [PATCH v3 04/14] commit-graph: implement write_commit_graph() Derrick Stolee
2018-02-08 22:14     ` Junio C Hamano
2018-02-15 18:19     ` Junio C Hamano
2018-02-15 18:23       ` Derrick Stolee
2018-02-08 20:37   ` [PATCH v3 05/14] commit-graph: implement 'git-commit-graph write' Derrick Stolee
2018-02-13 21:57     ` Jonathan Tan
2018-02-08 20:37   ` [PATCH v3 06/14] commit-graph: implement 'git-commit-graph read' Derrick Stolee
2018-02-08 23:38     ` Junio C Hamano
2018-02-08 20:37   ` [PATCH v3 07/14] commit-graph: update graph-head during write Derrick Stolee
2018-02-12 18:56     ` Junio C Hamano
2018-02-12 20:37       ` Junio C Hamano
2018-02-12 21:24         ` Derrick Stolee
2018-02-13 22:38     ` Jonathan Tan
2018-02-08 20:37   ` [PATCH v3 08/14] commit-graph: implement 'git-commit-graph clear' Derrick Stolee
2018-02-13 22:49     ` Jonathan Tan
2018-02-08 20:37   ` [PATCH v3 09/14] commit-graph: implement --delete-expired Derrick Stolee
2018-02-08 20:37   ` [PATCH v3 10/14] commit-graph: add core.commitGraph setting Derrick Stolee
2018-02-08 20:37   ` [PATCH v3 11/14] commit: integrate commit graph with commit parsing Derrick Stolee
2018-02-14  0:12     ` Jonathan Tan
2018-02-14 18:08       ` Derrick Stolee
2018-02-15 18:25     ` Junio C Hamano
2018-02-08 20:37   ` [PATCH v3 12/14] commit-graph: close under reachability Derrick Stolee
2018-02-08 20:37   ` [PATCH v3 13/14] commit-graph: read only from specific pack-indexes Derrick Stolee
2018-02-08 20:37   ` [PATCH v3 14/14] commit-graph: build graph from starting commits Derrick Stolee
2018-02-09 13:02     ` SZEDER Gábor
2018-02-09 13:45       ` Derrick Stolee
2018-02-14 18:15   ` [PATCH v3 00/14] Serialized Git Commit Graph Derrick Stolee
2018-02-14 18:27     ` Stefan Beller
2018-02-14 19:11       ` Derrick Stolee
2018-02-19 18:53     ` [PATCH v4 00/13] " Derrick Stolee
2018-02-19 18:53       ` [PATCH v4 01/13] commit-graph: add format document Derrick Stolee
2018-02-20 20:49         ` Junio C Hamano
2018-02-21 19:23         ` Stefan Beller
2018-02-21 19:45           ` Derrick Stolee
2018-02-21 19:48             ` Stefan Beller
2018-03-30 13:25         ` Jakub Narebski
2018-04-02 13:09           ` Derrick Stolee
2018-04-02 14:09             ` Jakub Narebski
2018-02-19 18:53       ` [PATCH v4 02/13] graph: add commit graph design document Derrick Stolee
2018-02-20 21:42         ` Junio C Hamano
2018-02-23 15:44           ` Derrick Stolee
2018-02-21 19:34         ` Stefan Beller
2018-02-19 18:53       ` [PATCH v4 03/13] commit-graph: create git-commit-graph builtin Derrick Stolee
2018-02-20 21:51         ` Junio C Hamano
2018-02-21 18:58           ` Junio C Hamano
2018-02-23 16:07             ` Derrick Stolee
2018-02-26 16:25         ` SZEDER Gábor
2018-02-26 17:08           ` Derrick Stolee
2018-02-19 18:53       ` [PATCH v4 04/13] commit-graph: implement write_commit_graph() Derrick Stolee
2018-02-20 22:57         ` Junio C Hamano
2018-02-23 17:23           ` Derrick Stolee
2018-02-23 19:30             ` Junio C Hamano
2018-02-23 19:48               ` Junio C Hamano
2018-02-23 20:02               ` Derrick Stolee [this message]
2018-02-26 16:10         ` SZEDER Gábor
2018-02-28 18:47         ` Junio C Hamano
2018-02-19 18:53       ` [PATCH v4 05/13] commit-graph: implement 'git-commit-graph write' Derrick Stolee
2018-02-21 19:25         ` Junio C Hamano
2018-02-19 18:53       ` [PATCH v4 06/13] commit-graph: implement git commit-graph read Derrick Stolee
2018-02-21 20:11         ` Junio C Hamano
2018-02-22 18:25           ` Junio C Hamano
2018-02-19 18:53       ` [PATCH v4 07/13] commit-graph: implement --set-latest Derrick Stolee
2018-02-22 18:31         ` Junio C Hamano
2018-02-23 17:53           ` Derrick Stolee
2018-02-19 18:53       ` [PATCH v4 08/13] commit-graph: implement --delete-expired Derrick Stolee
2018-02-21 21:34         ` Stefan Beller
2018-02-23 17:43           ` Derrick Stolee
2018-02-22 18:48         ` Junio C Hamano
2018-02-23 17:59           ` Derrick Stolee
2018-02-23 19:33             ` Junio C Hamano
2018-02-23 19:41               ` Derrick Stolee
2018-02-23 19:51                 ` Junio C Hamano
2018-02-19 18:53       ` [PATCH v4 09/13] commit-graph: add core.commitGraph setting Derrick Stolee
2018-02-19 18:53       ` [PATCH v4 10/13] commit-graph: close under reachability Derrick Stolee
2018-02-19 18:53       ` [PATCH v4 11/13] commit: integrate commit graph with commit parsing Derrick Stolee
2018-02-19 18:53       ` [PATCH v4 12/13] commit-graph: read only from specific pack-indexes Derrick Stolee
2018-02-21 22:25         ` Stefan Beller
2018-02-23 19:19           ` Derrick Stolee
2018-02-19 18:53       ` [PATCH v4 13/13] commit-graph: build graph from starting commits Derrick Stolee
2018-03-30 11:10       ` [PATCH v4 00/13] Serialized Git Commit Graph Jakub Narebski
2018-04-02 13:02         ` Derrick Stolee
2018-04-02 14:46           ` Jakub Narebski
2018-04-02 15:02             ` Derrick Stolee
2018-04-02 17:35               ` Stefan Beller
2018-04-02 17:54                 ` Derrick Stolee
2018-04-02 18:02                   ` Stefan Beller
2018-04-07 22:37               ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8d29382b-e0a7-968f-b231-f27f89f275d3@gmail.com \
    --to=stolee@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.