All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, szeder.dev@gmail.com, me@ttaylorr.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 13/15] chunk-format: create chunk reading API
Date: Fri, 4 Dec 2020 08:45:10 -0500	[thread overview]
Message-ID: <1df7dec3-37d1-9ebf-b9c5-66d5bd4cbaa1@gmail.com> (raw)
In-Reply-To: <xmqq1rg6h5o6.fsf@gitster.c.googlers.com>

On 12/3/2020 5:43 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>> ...
>> -		case GRAPH_CHUNKID_DATA:
>> -			if (graph->chunk_commit_data)
>> -				chunk_repeated = 1;
>> -			else
>> -				graph->chunk_commit_data = data + chunk_offset;
>> -			break;
>> +	/* limit the chunk-format list if we are ignoring Bloom filters */
>> +	if (!r->settings.commit_graph_read_changed_paths)
>> +		chunks_nr = 5;
> 
> I presume that this magic number 5 is directly connected to ...
> 
>> +static struct read_chunk_info read_chunks[] = {
>> +	[0] = {
>> +		GRAPH_CHUNKID_OIDFANOUT,
>> +		graph_read_oid_fanout
>> +	},
>> +	[1] = {
>> +		GRAPH_CHUNKID_OIDLOOKUP,
>> +		graph_read_oid_lookup
>> +	},
>> +	[2] = {
>> +		GRAPH_CHUNKID_DATA,
>> +		graph_read_data
>> +	},
>> +	[3] = {
>> +		GRAPH_CHUNKID_EXTRAEDGES,
>> +		graph_read_extra_edges
>> +	},
>> +	[4] = {
>> +		GRAPH_CHUNKID_BASE,
>> +		graph_read_base_graphs
>> +	},
>> +	[5] = {
>> +		GRAPH_CHUNKID_BLOOMINDEXES,
>> +		graph_read_bloom_indices
>> +	},
>> +	[6] = {
>> +		GRAPH_CHUNKID_BLOOMDATA,
>> +		graph_read_bloom_data
>> +	}
>> +};
> 
> ... the index of these entries in the table.
> 
> I find this arrangement brittle.  For one thing, it means that new
> chunks cannot be added at an arbitrary place, and BLOOMDATA must be
> at the end for the "limit the list when ignoring" logic to work
> correctly (even when the developer who is adding a new chunk type
> realizes that new one must be inserted before [5], and the magic
> number 5 above must be updated), and it won't work at all if a
> similar "we can optionally ignore reading the data" treatment needs
> to be made for new chunk types, since two or more things cannot be
> at the end of the list at the same time X-<.
> 
> Would it be a cleaner way to implement this "when member X in the
> settings structure is not set, ignore BLOOMINDEXES and BLOOMDATA"
> feature to let these graph_read_*() functions have access to the
> settings structure, so that bloom_indices and bloom_data callback
> functions can make decisions for themselves?
> 
> Once we do that, as far as I can tell, there is no longer any reason
> to initialize read_chunks[] array with designated initializer.  The
> implementation of read_table_of_contents() does not tolerate if this
> array is sparsely populated anyway, and except for the "do we ignore
> bloom?" hack, nothing really depends on the order of entries in the
> array, right?

This is a good point, and it's my fault for not noticing this behavior.
In an earlier version, I was initializing read_chunks dynamically
inside parse_commit_graph(), where the change made more sense (the
Bloom chunks were not initialized at all based on this condition). I
think the better way to handle this is to check the config within
the reading functions graph_read_bloom_(indices|data).

The added benefit of checking in the read_chunk_fn is that the
chunk-format API can see that these chunks are "known" chunk types,
in case we were to add logging for "I don't understand this chunk
type".

Thanks,
-Stolee

  reply	other threads:[~2020-12-04 13:45 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-03 16:16 [PATCH 00/15] Refactor chunk-format into an API Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 01/15] commit-graph: anonymize data in chunk_write_fn Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 02/15] chunk-format: add API for writing table of contents Derrick Stolee via GitGitGadget
2020-12-08 17:56   ` Taylor Blau
2020-12-03 16:16 ` [PATCH 03/15] midx: rename pack_info to write_midx_context Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 04/15] midx: use context in write_midx_pack_names() Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 05/15] midx: add entries to write_midx_context Derrick Stolee via GitGitGadget
2020-12-03 21:42   ` Junio C Hamano
2020-12-04 13:39     ` Derrick Stolee
2020-12-08 18:00   ` Taylor Blau
2020-12-03 16:16 ` [PATCH 06/15] midx: add pack_perm " Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 07/15] midx: add num_large_offsets " Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 08/15] midx: convert chunk write methods to return int Derrick Stolee via GitGitGadget
2020-12-03 21:50   ` Junio C Hamano
2020-12-04 13:40     ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 09/15] midx: drop chunk progress during write Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 10/15] midx: use chunk-format API in write_midx_internal() Derrick Stolee via GitGitGadget
2020-12-08 18:42   ` Taylor Blau
2020-12-10 14:36     ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 11/15] midx: use 64-bit multiplication for chunk sizes Derrick Stolee via GitGitGadget
2020-12-03 22:00   ` Junio C Hamano
2020-12-08 18:43     ` Taylor Blau
2020-12-03 16:16 ` [PATCH 12/15] chunk-format: create write_chunks() Derrick Stolee via GitGitGadget
2020-12-08 18:45   ` Taylor Blau
2020-12-03 16:16 ` [PATCH 13/15] chunk-format: create chunk reading API Derrick Stolee via GitGitGadget
2020-12-03 22:17   ` Junio C Hamano
2020-12-04 13:47     ` Derrick Stolee
2020-12-04 20:17       ` Junio C Hamano
2020-12-03 22:43   ` Junio C Hamano
2020-12-04 13:45     ` Derrick Stolee [this message]
2020-12-03 16:16 ` [PATCH 14/15] commit-graph: restore duplicate chunk checks Derrick Stolee via GitGitGadget
2020-12-07 13:43   ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 15/15] chunk-format: add technical docs Derrick Stolee via GitGitGadget
2020-12-04 12:48 ` [PATCH 00/15] Refactor chunk-format into an API René Scharfe
2020-12-04 13:57   ` Derrick Stolee
2020-12-04 19:42   ` Junio C Hamano
2020-12-08 18:49   ` Taylor Blau
2020-12-09 17:13     ` René Scharfe
2020-12-10  0:50       ` Taylor Blau
2020-12-10 14:30         ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1df7dec3-37d1-9ebf-b9c5-66d5bd4cbaa1@gmail.com \
    --to=stolee@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.