All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, szeder.dev@gmail.com,
	Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH 00/15] Refactor chunk-format into an API
Date: Tue, 8 Dec 2020 13:49:57 -0500	[thread overview]
Message-ID: <X8/K1dUgUmwp8ZOv@nand.local> (raw)
In-Reply-To: <ee0b73f7-8f59-a1dc-0a21-bf796bf9f2e2@web.de>

On Fri, Dec 04, 2020 at 01:48:31PM +0100, René Scharfe wrote:
> Am 03.12.20 um 17:16 schrieb Derrick Stolee via GitGitGadget:
> > I was thinking about file formats recently and realized that the "chunks"
> > that are common to the commit-graph and multi-pack-index could inform future
> > file formats. To make that process easier, let's combine the process of
> > writing and reading chunks into a common API that both of these existing
> > formats use.
> >
> > There is some extra benefit immediately: the writing and reading code for
> > each gets a bit cleaner. Also, there were different checks in each that made
> > the process more robust. Now, these share a common set of checks.
>
> >  Documentation/technical/chunk-format.txt      |  54 ++
> >  .../technical/commit-graph-format.txt         |   3 +
> >  Documentation/technical/pack-format.txt       |   3 +
> >  Makefile                                      |   1 +
> >  chunk-format.c                                | 105 ++++
> >  chunk-format.h                                |  69 +++
> >  commit-graph.c                                | 298 ++++++-----
> >  midx.c                                        | 466 ++++++++----------
> >  t/t5318-commit-graph.sh                       |   2 +-
> >  t/t5319-multi-pack-index.sh                   |   6 +-
> >  10 files changed, 623 insertions(+), 384 deletions(-)
>
> 623-384-54-3-3-1-69-2-6 = 101
>
> So if we ignore changes to documentation, headers, tests and build
> script this spends ca. 100 more lines of code than the current version.
> That's roughly the size of the new file chunk-format.c -- from this
> bird's-eye-view the new API seems to be pure overhead.
>
> In the new code I see several magic numbers, use of void pointers and
> casting as well as repetition -- is this really going in the right
> direction?  I get the feeling that YAGNI.

I think that Stolee is going in the right direction. I suggested earlier
in the thread making a new "chunkfile" type which can handle allocating
new chunks, writing their tables of contents, and so on.

So, I think that we should pursues that direction a little further
before deciding whether or not this is worth continuing. My early
experiments showed that it does add a little more code to the
chunk-format.{c,h} files, but you get negative diffs in midx.c and
commit-graph.c, which is more in line with what I would expect from this
series.

I do think that the "overhead" here is more tolerable than we might
think; I'd rather have a well-documented "chunkfile" implementation
written once and called twice, than two near-identical implementations
of _writing_ the chunks / table of contents at each of the call sites.
So, even if this does end up being a net-lines-added kind of diff, I'd
still say that it's worth it.

With regards to the "YAGNI" comment... I do have thoughts about
extending the reachability bitmap format to use chunks (of course, this
would break compatibility with JGit, and it isn't something that I plan
to do in the short-term, or even necessarily in the future).

In any event, I'm sure that this won't be these two won't be the last
chunk-based formats that we have in Git.

> René

Thanks,
Taylor

  parent reply	other threads:[~2020-12-08 20:39 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-03 16:16 [PATCH 00/15] Refactor chunk-format into an API Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 01/15] commit-graph: anonymize data in chunk_write_fn Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 02/15] chunk-format: add API for writing table of contents Derrick Stolee via GitGitGadget
2020-12-08 17:56   ` Taylor Blau
2020-12-03 16:16 ` [PATCH 03/15] midx: rename pack_info to write_midx_context Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 04/15] midx: use context in write_midx_pack_names() Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 05/15] midx: add entries to write_midx_context Derrick Stolee via GitGitGadget
2020-12-03 21:42   ` Junio C Hamano
2020-12-04 13:39     ` Derrick Stolee
2020-12-08 18:00   ` Taylor Blau
2020-12-03 16:16 ` [PATCH 06/15] midx: add pack_perm " Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 07/15] midx: add num_large_offsets " Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 08/15] midx: convert chunk write methods to return int Derrick Stolee via GitGitGadget
2020-12-03 21:50   ` Junio C Hamano
2020-12-04 13:40     ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 09/15] midx: drop chunk progress during write Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 10/15] midx: use chunk-format API in write_midx_internal() Derrick Stolee via GitGitGadget
2020-12-08 18:42   ` Taylor Blau
2020-12-10 14:36     ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 11/15] midx: use 64-bit multiplication for chunk sizes Derrick Stolee via GitGitGadget
2020-12-03 22:00   ` Junio C Hamano
2020-12-08 18:43     ` Taylor Blau
2020-12-03 16:16 ` [PATCH 12/15] chunk-format: create write_chunks() Derrick Stolee via GitGitGadget
2020-12-08 18:45   ` Taylor Blau
2020-12-03 16:16 ` [PATCH 13/15] chunk-format: create chunk reading API Derrick Stolee via GitGitGadget
2020-12-03 22:17   ` Junio C Hamano
2020-12-04 13:47     ` Derrick Stolee
2020-12-04 20:17       ` Junio C Hamano
2020-12-03 22:43   ` Junio C Hamano
2020-12-04 13:45     ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 14/15] commit-graph: restore duplicate chunk checks Derrick Stolee via GitGitGadget
2020-12-07 13:43   ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 15/15] chunk-format: add technical docs Derrick Stolee via GitGitGadget
2020-12-04 12:48 ` [PATCH 00/15] Refactor chunk-format into an API René Scharfe
2020-12-04 13:57   ` Derrick Stolee
2020-12-04 19:42   ` Junio C Hamano
2020-12-08 18:49   ` Taylor Blau [this message]
2020-12-09 17:13     ` René Scharfe
2020-12-10  0:50       ` Taylor Blau
2020-12-10 14:30         ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=X8/K1dUgUmwp8ZOv@nand.local \
    --to=me@ttaylorr.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=l.s.r@web.de \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.