All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Derrick Stolee <derrickstolee@github.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, me@ttaylorr.com, gitster@pobox.com,
	abhishekkumar8222@gmail.com
Subject: Re: [PATCH 5/7] commit-graph: document file format v2
Date: Mon, 28 Feb 2022 15:27:49 +0100	[thread overview]
Message-ID: <220228.86pmn73toq.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <d19f5ee8-af92-805f-7ea2-8285862c1123@github.com>


On Mon, Feb 28 2022, Derrick Stolee wrote:

> On 2/25/2022 5:31 PM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Thu, Feb 24 2022, Derrick Stolee via GitGitGadget wrote:
>> 
> ...
>>>    Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional]
>>>      * This list of 4-byte values store corrected commit date offsets for the
>>> @@ -103,6 +112,9 @@ CHUNK DATA:
>>>      * Generation Data chunk is present only when commit-graph file is written
>>>        by compatible versions of Git and in case of split commit-graph chains,
>>>        the topmost layer also has Generation Data chunk.
>>> +    * This chunk does not exist if the commit-graph file format version is 2,
>>> +      because the corrected commit date offset data is stored in the Commit
>>> +      Data chunk.
>>>  
>>>    Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional]
>>>      * This list of 8-byte values stores the corrected commit date offsets
>> 
>> We talked a while ago now about how we do commit-graph format changes
>> and this is partially echoing those earlier questions[1] from 2019.
>> 
>> I fully understand why we're writing this amended CDAT chunk in a
>> different layout. By not having the GDAT side-chunk to look up in the
>> data is more local, that part of the file is more compact etc.
>> 
>> What I don't understand is why getting those performance improvements
>> requires the breaking version change & the writing of the incompatible
>> version number.
>> 
>> I.e. couldn't the differently formatted CDAT chunk be written instead to a new
>> chunk name (say "2DAT") instead? Per [1] we'd pay a small fixed cost for
>> a possibly empty chunk (I didn't re-do those numbers), but surely the
>> performance improvements will be about the same for that miniscule
>> overhead.
>
> CDAT is a required chunk. It is part of the v1 spec that CDAT exists
> and is correct. All other Git clients will error out when reading a
> "v1" graph without such a chunk, and in a way that is less helpful to
> users. Instead of clearly indicating "file version is too new" it will
> say "commit-graph is missing the Commit Data chunk" which is not
> helpful.

Yes. That would be the worst of both worlds.

I thought the reference to the 2019-era post made it clear (which is
explicit about this aspect), but I'm talking about writing one of:

 A. An empty chunk
 B. Keeping a "stale" chunk around (as we re-write the graph)
 C. Duplicate writes of new/old chunks.

And not simply omitting the CDAT chunk. As you point out would give you
all the drawbacks of a version number change, with none of the benefits.

I haven't re-tested this now, but at the time doing any of (A..C) would
work smoothly for older clients, while giving newer ones improved data.

>> It will give you something you can't have here, which is optional
>> compatibility with older clients by writing both versions. That'll be a
>> ~2x as large file on disk, but with the page cache & each client version
>> skipping to the data it needs caching characteristics & data locality
>> should work out to about the same thing.
>
> Writing both is the only way that this could work without incrementing
> the graph version number, but I'd rather just update the number and
> avoid wasting the effort to write that extra data.

...

> It seems you are hyper-focused on "we don't _need_ to update the version
> number" and you are willing to recommend wasteful approaches in order to
> support that stance.

I'd say less hyper-focused, and more clarifying an IMO major unstated
trade-off of the proposed format change.

> So: you're right. We don't _need_ to update the version number. But this
> is the best choice among the options available.

...

>> Or maybe they won't. I just found it surprising when reviewing this to
>> not find an answer to why that approach wasn't
>> considered.
>
> The point is to create a new format that can be chosen when deployed
> in an environment where older Git versions will not exist (such as
> a Git server). The new version is not chosen by default and instead
> is opt-in through the commitGraph.generationVersion config option.
>
> Perhaps in a year or two we would consider making this the new
> default, but there is no rush to do so.

Looking into this a bit more I think that in either case this is less of
a big deal after my 43d35618055 (commit-graph write: don't die if the
existing graph is corrupt, 2019-03-25), which came out of some of those
discussions at the time of [1].

I.e. now a client that only understands version N-1 will warn when
loading it, wheras it's only if a pre-v2.22.0 client (which has that
commit) reads the repository that we'd hard die on it, correct?

But speaking of hyper-focus. I think that arguably applies to you in
this case when considering the trade-offs of these sorts of format
changes :)

I.e. you're primarily considering cases of say a git server (presumably
running on GitHub) or another such deployment where it's easy to have
full control over all of your versions "in the wild".

And thus a three-phase rollout of something like a format change can be
done in a timely and predictable manner.

But git is used by *a lot* of people in a bunch of different
scenarios. E.g.:

 * A shared (hopefully read-only) NFS mounted by remote "unmanaged" clients.
 * A tarred-up directory including a .git, which may be transferred to
   a machine with a pre-v2.22.0 version.

Or even softer cases of failure, such as:

 * A cronjob causes an alert/incident somewhere because the server 
   operator started writing a new version, but forgot about a set
   of machines that are still on the old version.

I think that even if it's less conceptually clean it's worth considering
being over backwards to be kinder to such use-cases, unless it's really
required for other reasons to break such in-the-wild use-cases.

Or in this case, if it's thought to be worth it to help reviewers decide
by separating the performance improvement aspect from the changed
interaction between new graphs and older clients.

As a further nit on the proposed end-state here: Do I understand it
correctly that commitGraph.generationVersion=[1|2] (i.e. on current
"master") will always result in a file that's compatible with older
versions, since the only thing "v2" there controls now is to write the
optional GDAT and GDOV chunks?

Whereas going from commitGraph.generationVersion=2 to
commitGraph.generationVersion=3 in this series will impact older clients
as noted above, since we're bumping the version (of the file, to 2 if
the config is 3, which as Junio noted is a bit confusing).

I think if you're set on going down the path of bumping the top-level
version that deserves to be made much clearer in the added
documentation. Right now the only hint to that is a passing mention that
for v3:

    [it] will be incompatible with some old versions of Git

Which if we're opting for breaking format changes really should note
some of the caveats above, that pre-v2.22.0 hard-dies, and probably
describe "some old versions of Git" a bit more clearly.

It actually means once this gets released "the git version that was the
latest one you could download yesterday". Which a reader of the docs
probably won't expect when starting to play with this in mixed-version
environment.

1. https://lore.kernel.org/git/87h8acivkh.fsf@evledraar.gmail.com/

  reply	other threads:[~2022-02-28 15:19 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-24 20:38 [PATCH 0/7] Commit-graph: Generation Number v2 Fixes, v3 implementation Derrick Stolee via GitGitGadget
2022-02-24 20:38 ` [PATCH 1/7] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-02-24 20:38 ` [PATCH 2/7] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-02-24 22:15   ` Junio C Hamano
2022-02-25 13:51     ` Derrick Stolee
2022-02-25 17:35       ` Junio C Hamano
2022-02-24 20:38 ` [PATCH 3/7] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-02-28 15:18   ` Patrick Steinhardt
2022-02-28 16:23     ` Derrick Stolee
2022-02-28 16:59       ` Patrick Steinhardt
2022-02-28 18:44         ` Derrick Stolee
2022-03-01  9:46           ` Patrick Steinhardt
2022-03-01 10:35             ` Patrick Steinhardt
2022-03-01 14:06               ` Derrick Stolee
2022-03-01 14:53                 ` Patrick Steinhardt
2022-03-01 15:25                   ` Derrick Stolee
2022-03-02 13:57                     ` Patrick Steinhardt
2022-03-02 14:57                       ` Derrick Stolee
2022-03-02 18:15                         ` Junio C Hamano
2022-03-02 18:46                           ` Derrick Stolee
2022-03-02 22:42                             ` Junio C Hamano
2022-03-03 11:19                         ` Patrick Steinhardt
2022-03-03 16:00                           ` Derrick Stolee
2022-03-04 14:03                             ` Derrick Stolee
2022-03-07 10:34                               ` Patrick Steinhardt
2022-03-07 13:45                                 ` Derrick Stolee
2022-03-07 17:22                                   ` Junio C Hamano
2022-03-10 13:58                                   ` Patrick Steinhardt
2022-03-10 17:18                                     ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 4/7] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
2022-02-24 22:35   ` Junio C Hamano
2022-02-25 13:53     ` Derrick Stolee
2022-02-25 17:38       ` Junio C Hamano
2022-02-24 20:38 ` [PATCH 5/7] commit-graph: document file format v2 Derrick Stolee via GitGitGadget
2022-02-24 22:55   ` Junio C Hamano
2022-02-25 22:31   ` Ævar Arnfjörð Bjarmason
2022-02-28 13:44     ` Derrick Stolee
2022-02-28 14:27       ` Ævar Arnfjörð Bjarmason [this message]
2022-02-28 16:39         ` Derrick Stolee
2022-02-28 21:14           ` Ævar Arnfjörð Bjarmason
2022-03-01 14:19             ` Derrick Stolee
2022-03-01 14:29               ` Ævar Arnfjörð Bjarmason
2022-03-01 15:59                 ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 6/7] commit-graph: parse " Derrick Stolee via GitGitGadget
2022-02-24 23:01   ` Junio C Hamano
2022-02-25 13:54     ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 7/7] commit-graph: write " Derrick Stolee via GitGitGadget
2022-02-24 21:42 ` [PATCH 0/7] Commit-graph: Generation Number v2 Fixes, v3 implementation Junio C Hamano
2022-02-24 23:06   ` Junio C Hamano
2022-02-25 13:55     ` Derrick Stolee
2022-02-28 13:53 ` [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes Derrick Stolee via GitGitGadget
2022-02-28 13:53   ` [PATCH v2 1/4] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-02-28 15:22     ` Ævar Arnfjörð Bjarmason
2022-02-28 13:53   ` [PATCH v2 2/4] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-02-28 15:25     ` Ævar Arnfjörð Bjarmason
2022-02-28 13:53   ` [PATCH v2 3/4] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-02-28 15:30     ` Ævar Arnfjörð Bjarmason
2022-02-28 16:43       ` Derrick Stolee
2022-02-28 13:53   ` [PATCH v2 4/4] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
2022-02-28 15:40     ` Ævar Arnfjörð Bjarmason
2022-03-01 17:23   ` [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes Ævar Arnfjörð Bjarmason
2022-03-01 19:48   ` [PATCH v3 0/5] " Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 1/5] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 2/5] t5318: extract helpers to lib-commit-graph.sh Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 3/5] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-03-01 20:13       ` Junio C Hamano
2022-03-01 20:30         ` Junio C Hamano
2022-03-02 14:13           ` Derrick Stolee
2022-03-01 19:48     ` [PATCH v3 4/5] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 5/5] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=220228.86pmn73toq.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.