git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, szeder.dev@gmail.com, me@ttaylorr.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 13/15] chunk-format: create chunk reading API
Date: Thu, 03 Dec 2020 14:43:21 -0800	[thread overview]
Message-ID: <xmqq1rg6h5o6.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <6801e231f7414444a272f2ea87dcc6f60f29e25a.1607012215.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Thu, 03 Dec 2020 16:16:52 +0000")

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
> ...
> -		case GRAPH_CHUNKID_DATA:
> -			if (graph->chunk_commit_data)
> -				chunk_repeated = 1;
> -			else
> -				graph->chunk_commit_data = data + chunk_offset;
> -			break;
> +	/* limit the chunk-format list if we are ignoring Bloom filters */
> +	if (!r->settings.commit_graph_read_changed_paths)
> +		chunks_nr = 5;

I presume that this magic number 5 is directly connected to ...

> +static struct read_chunk_info read_chunks[] = {
> +	[0] = {
> +		GRAPH_CHUNKID_OIDFANOUT,
> +		graph_read_oid_fanout
> +	},
> +	[1] = {
> +		GRAPH_CHUNKID_OIDLOOKUP,
> +		graph_read_oid_lookup
> +	},
> +	[2] = {
> +		GRAPH_CHUNKID_DATA,
> +		graph_read_data
> +	},
> +	[3] = {
> +		GRAPH_CHUNKID_EXTRAEDGES,
> +		graph_read_extra_edges
> +	},
> +	[4] = {
> +		GRAPH_CHUNKID_BASE,
> +		graph_read_base_graphs
> +	},
> +	[5] = {
> +		GRAPH_CHUNKID_BLOOMINDEXES,
> +		graph_read_bloom_indices
> +	},
> +	[6] = {
> +		GRAPH_CHUNKID_BLOOMDATA,
> +		graph_read_bloom_data
> +	}
> +};

... the index of these entries in the table.

I find this arrangement brittle.  For one thing, it means that new
chunks cannot be added at an arbitrary place, and BLOOMDATA must be
at the end for the "limit the list when ignoring" logic to work
correctly (even when the developer who is adding a new chunk type
realizes that new one must be inserted before [5], and the magic
number 5 above must be updated), and it won't work at all if a
similar "we can optionally ignore reading the data" treatment needs
to be made for new chunk types, since two or more things cannot be
at the end of the list at the same time X-<.

Would it be a cleaner way to implement this "when member X in the
settings structure is not set, ignore BLOOMINDEXES and BLOOMDATA"
feature to let these graph_read_*() functions have access to the
settings structure, so that bloom_indices and bloom_data callback
functions can make decisions for themselves?

Once we do that, as far as I can tell, there is no longer any reason
to initialize read_chunks[] array with designated initializer.  The
implementation of read_table_of_contents() does not tolerate if this
array is sparsely populated anyway, and except for the "do we ignore
bloom?" hack, nothing really depends on the order of entries in the
array, right?

Thanks.

  parent reply	other threads:[~2020-12-03 22:44 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-03 16:16 [PATCH 00/15] Refactor chunk-format into an API Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 01/15] commit-graph: anonymize data in chunk_write_fn Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 02/15] chunk-format: add API for writing table of contents Derrick Stolee via GitGitGadget
2020-12-08 17:56   ` Taylor Blau
2020-12-03 16:16 ` [PATCH 03/15] midx: rename pack_info to write_midx_context Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 04/15] midx: use context in write_midx_pack_names() Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 05/15] midx: add entries to write_midx_context Derrick Stolee via GitGitGadget
2020-12-03 21:42   ` Junio C Hamano
2020-12-04 13:39     ` Derrick Stolee
2020-12-08 18:00   ` Taylor Blau
2020-12-03 16:16 ` [PATCH 06/15] midx: add pack_perm " Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 07/15] midx: add num_large_offsets " Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 08/15] midx: convert chunk write methods to return int Derrick Stolee via GitGitGadget
2020-12-03 21:50   ` Junio C Hamano
2020-12-04 13:40     ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 09/15] midx: drop chunk progress during write Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 10/15] midx: use chunk-format API in write_midx_internal() Derrick Stolee via GitGitGadget
2020-12-08 18:42   ` Taylor Blau
2020-12-10 14:36     ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 11/15] midx: use 64-bit multiplication for chunk sizes Derrick Stolee via GitGitGadget
2020-12-03 22:00   ` Junio C Hamano
2020-12-08 18:43     ` Taylor Blau
2020-12-03 16:16 ` [PATCH 12/15] chunk-format: create write_chunks() Derrick Stolee via GitGitGadget
2020-12-08 18:45   ` Taylor Blau
2020-12-03 16:16 ` [PATCH 13/15] chunk-format: create chunk reading API Derrick Stolee via GitGitGadget
2020-12-03 22:17   ` Junio C Hamano
2020-12-04 13:47     ` Derrick Stolee
2020-12-04 20:17       ` Junio C Hamano
2020-12-03 22:43   ` Junio C Hamano [this message]
2020-12-04 13:45     ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 14/15] commit-graph: restore duplicate chunk checks Derrick Stolee via GitGitGadget
2020-12-07 13:43   ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 15/15] chunk-format: add technical docs Derrick Stolee via GitGitGadget
2020-12-04 12:48 ` [PATCH 00/15] Refactor chunk-format into an API René Scharfe
2020-12-04 13:57   ` Derrick Stolee
2020-12-04 19:42   ` Junio C Hamano
2020-12-08 18:49   ` Taylor Blau
2020-12-09 17:13     ` René Scharfe
2020-12-10  0:50       ` Taylor Blau
2020-12-10 14:30         ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq1rg6h5o6.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).