git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, sandals@crustytoothpaste.net,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v3 0/6] Create commit-graph file format v2
Date: Thu, 02 May 2019 20:02:22 +0200	[thread overview]
Message-ID: <87h8acivkh.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <bb0c22f9-9d0b-0fa6-e826-8e2ac146c6f9@gmail.com>


On Thu, May 02 2019, Derrick Stolee wrote:

> On 5/1/2019 4:25 PM, Ævar Arnfjörð Bjarmason wrote:
>> I won't repeat my outstanding v2 feedback about v1 & v2
>> incompatibilities, except to say that I'd in principle be fine with
>> having a v2 format the way this series is adding it. I.e. saying "here
>> it is, it's never written by default, we'll figure out these compat
>> issues later".
>>
>> My only objection/nit on that point would be that the current
>> docs/commit messages should make some mention of the really bad
>> interactions between v1 and v2 on different git versions.
>
> Good idea to add some warnings in the docs to say something like
> "version 2 is not supported by Git 2.2x and earlier".
>
>> However, having written this again I really don't understand why we need
>> a v2 of this format at all.
>
> [snip]
>
>> How about we instead just don't change the header? I.e.:
>>
>>  * Let's just live with "1" as the marker for SHA-1.
>>
>>    Yeah it would be cute to use 0x73686131 instead like "struct
>>    git_hash_algo", but we can live with a 1=0x73686131 ("sha1"),
>>    2=0x73323536 ("s256") mapping somewhere. It's not like we're going to
>>    be running into the 255 limit of hash algorithms Git will support any
>>    time soon.
>
> This was the intended direction of using the 1-byte value before, but
> we have a preferred plan to use the 4-byte value in all future file formats.

Right, and I wouldn't argue about such a pointless thing for a future
file format.

But since the v1->v2 migration story is so unfriendly already for
reasons that can't be helped at this point (existing released versions)
I think we need to weigh the trade-offs of changing the header v.s. just
doing the conceptually less clean thing that allows existing clients a
painless transition.

>>  * Don't add the reachability index version *to the header* or change
>>    the reserved byte to be an error (see [1] again).
>
> Since we can make the "corrected commit date" offset for a commit be
> strictly larger than the offset of a parent, we can make it so an old client
> will not give incorrect values when we use the new values. The only downside
> would be that we would fail on 'git commit-graph verify' since the offsets
> are not actually generation numbers in all cases.

Aren't you talking about how the *content* (presumably in the chunk part
of the graph) is going to look like? I just mean these couple of bytes
in the header, again as a proxy discussion for "do we *really* need to
change this?".

>> Instead we just add these things to new "chunks" as appropriate. As this
>> patch of mine shows we can easily do that, and it doesn't error out on
>> any existing version of git:
>> https://github.com/avar/git/commit/3fca63e12a9d38867d4bc0a8a25d419c00a09d95
>
> I like the idea of a "metadata" chunk. This can be useful for a lot of things.
> If we start the chunk with a "number of items" and only append items to the
> list, we can dynamically grow the chunk as we add values.

Right. I like it too :) But right now I'm just using it as a demo for
how new arbitrary chunk data can be added to the v1 format in backwards
compatible ways.

My inclination for an actual version of that patch would be to make it
easier to read/extend (even just dump JSON there, or a series of
key/values) over micro-optimizing the storage size. Such metadata will
always be tiny v.s. the rest, but that's for later bikeshedding...

>> I now can't imagine a situation where we'd ever need to change the
>> format. We have 32 bits of chunk ids to play with, and can have 255 of
>> these chunks at a time, and unknown chunks are ignored by existing
>> versions and future version.
>
> The solutions you have discussed work for 2 of the 3 problems at hand.
> The incremental file format is a high-value feature, but _would_ break
> existing clients if they don't understand the extra data. Unless I am
> missing something for how to succeed here.

We would write out a file like this:

    <CGPH signature>
    <rest of v1 header incl. chunk offsets (but higher chunk count)>
    <chunks git understands now>
    <new chunks>
    <signature>

Where one of the new chunks could be INCC ("incremental count") or
whatever, serving the same purpose as the v2 modification of the header
to use the padding byte for the count. Then we'd have more chunks with
the incremental data (or pack it into one "magic" chunk with its own
format...).

Existing clients deal with the graph being incomplete, so the writer
could just not bother to update that part of the data and a newer
clients would know to find the rest in a series of incremental updates.

IOW an "empty" commit-graph now is 1100 bytes. Worst case we'd be
writing at least that number of bytes that would be mostly or entirely
useless to older clients, with the rest being new stuff newer clients
understand.

On the incremental format: I don't like:

 1) The idea that an incremental format would involve in-place
    modification of an existing file (or would we write a completely new
    one and move it in-place?).

    If it's in-place modification we get away a lot of things/avoid
    complexity with "we might delete, but we never modify existing" on
    existing *.{idx,bitmap} formats. E.g. mmap() is a royal PITA
    (more-so than now) once you need to deal with modifications.

    Also, if it's in-place we'd need to fully recompute the checksum
    SHA-1 as we modify the file.

 2) The assumption that we'd just have 255 of these, wouldn't it be
    reasonable to have a MIDX-like for it & write it along with packs as
    they come in? I.e. eventually have PACK.{pack,idx,bitmap,graph}.

    We support more than 255 packs, and it seems likely that we'd
    eventually want to generate/coalesce packs/idx and any other
    side-indexes on the same "gc" schedule.

But those are separate from any back-compat concerns, which is what I
think makes sense to focus on now.

>> 1. See feedback on the v2 patch in
>>    https://public-inbox.org/git/87muk6q98k.fsf@evledraar.gmail.com/
>
> My response [2] to that message includes the discussion of the
> incremental file format.
>
> [2] https://public-inbox.org/git/87muk6q98k.fsf@evledraar.gmail.com/

  reply	other threads:[~2019-05-02 18:02 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-23 21:59 [PATCH 0/6] Create commit-graph file format v2 Derrick Stolee via GitGitGadget
2019-01-23 21:59 ` [PATCH 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-01-23 21:59 ` [PATCH 2/6] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-01-23 21:59 ` [PATCH 3/6] commit-graph: create new version flags Derrick Stolee via GitGitGadget
2019-01-23 21:59 ` [PATCH 4/6] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
2019-01-24  9:31   ` Ævar Arnfjörð Bjarmason
2019-01-23 21:59 ` [PATCH 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
2019-01-23 23:56   ` Jonathan Tan
2019-01-24  9:40   ` Ævar Arnfjörð Bjarmason
2019-01-24 14:34     ` Derrick Stolee
2019-03-21  9:21   ` Ævar Arnfjörð Bjarmason
2019-01-23 21:59 ` [PATCH 6/6] commit-graph: test verifying a corrupt v2 header Derrick Stolee via GitGitGadget
2019-01-23 23:59   ` Jonathan Tan
2019-01-24 23:05 ` [PATCH 0/6] Create commit-graph file format v2 Junio C Hamano
2019-01-24 23:39 ` Junio C Hamano
2019-01-25 13:54   ` Derrick Stolee
2019-04-24 19:58 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
2019-04-24 19:58   ` [PATCH v2 1/5] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-04-24 19:58   ` [PATCH v2 2/5] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-04-25  5:21     ` Junio C Hamano
2019-04-24 19:58   ` [PATCH v2 3/5] commit-graph: create new version flags Derrick Stolee via GitGitGadget
2019-04-25  5:29     ` Junio C Hamano
2019-04-25 11:09       ` Derrick Stolee
2019-04-25 21:31     ` Ævar Arnfjörð Bjarmason
2019-04-26  2:20       ` Junio C Hamano
2019-04-24 19:58   ` [PATCH v2 4/5] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
2019-04-24 19:58   ` [PATCH v2 5/5] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
2019-04-25 22:09   ` [PATCH v2 0/5] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason
2019-04-26  2:28     ` Junio C Hamano
2019-04-26  8:33       ` Ævar Arnfjörð Bjarmason
2019-04-26 12:06         ` Derrick Stolee
2019-04-26 13:55           ` Ævar Arnfjörð Bjarmason
2019-04-27 12:57     ` Ævar Arnfjörð Bjarmason
2019-05-01 13:11   ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget
2019-05-01 13:11     ` [PATCH v3 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-05-01 14:46       ` Ævar Arnfjörð Bjarmason
2019-05-01 13:11     ` [PATCH v3 2/6] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-05-01 13:11     ` [PATCH v3 3/6] commit-graph: create new version parameter Derrick Stolee via GitGitGadget
2019-05-01 13:11     ` [PATCH v3 4/6] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
2019-05-01 13:11     ` [PATCH v3 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
2019-05-01 19:12       ` Ævar Arnfjörð Bjarmason
2019-05-01 19:56         ` Derrick Stolee
2019-05-01 13:11     ` [PATCH v3 6/6] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
2019-05-01 14:58       ` Ævar Arnfjörð Bjarmason
2019-05-01 19:59         ` Derrick Stolee
2019-05-01 20:25     ` [PATCH v3 0/6] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason
2019-05-02 13:26       ` Derrick Stolee
2019-05-02 18:02         ` Ævar Arnfjörð Bjarmason [this message]
2019-05-03 12:47           ` Derrick Stolee
2019-05-03 13:41             ` Ævar Arnfjörð Bjarmason
2019-05-06  8:27               ` Christian Couder
2019-05-06 13:47                 ` Derrick Stolee
2019-05-03 14:16             ` SZEDER Gábor
2019-05-03 15:11               ` Derrick Stolee
2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
2019-05-13  2:56         ` Junio C Hamano
2019-05-09 14:22       ` [PATCH v4 02/11] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-05-13  3:13         ` Junio C Hamano
2019-05-13 11:04           ` Derrick Stolee
2019-05-13 11:22             ` Derrick Stolee
2019-05-09 14:22       ` [PATCH v4 03/11] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-05-13  3:44         ` Junio C Hamano
2019-05-13 11:07           ` Derrick Stolee
2019-05-09 14:22       ` [PATCH v4 04/11] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 05/11] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 07/11] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 06/11] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget
2019-05-13  5:05         ` Junio C Hamano
2019-05-09 14:22       ` [PATCH v4 08/11] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 09/11] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 10/11] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 11/11] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget
2019-05-13  5:09         ` Junio C Hamano
2019-05-09 17:58       ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Josh Steadmon
2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 02/11] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-06-29 17:23           ` SZEDER Gábor
2019-07-01 12:19             ` Derrick Stolee
2019-06-12 13:29         ` [PATCH v5 03/11] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 04/11] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 05/11] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 06/11] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 08/11] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 07/11] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 09/11] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 10/11] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 11/11] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h8acivkh.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=sandals@crustytoothpaste.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).