All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josh Steadmon <steadmon@google.com>
To: git@vger.kernel.org
Cc: me@ttaylorr.com, derrickstolee@github.com,
	lessleydennington@gmail.com, gitster@pobox.com, vdye@github.com,
	avarab@gmail.com, jonathantanmy@google.com
Subject: [PATCH v3] commit-graph: refactor to avoid prepare_repo_settings
Date: Tue, 14 Jun 2022 15:37:21 -0700	[thread overview]
Message-ID: <9b56496b0809cc8a25af877ea97042e2cb7f2af6.1655246092.git.steadmon@google.com> (raw)
In-Reply-To: <Yjt6mLIfw0V3aVTO@nand.local>

From: Taylor Blau <me@ttaylorr.com>

Prior to ab14d0676c (commit-graph: pass a 'struct repository *' in more
places, 2020-09-09), parsing a commit-graph was a pure function
depending only on the contents of the commit-graph itself. Commit
ab14d0676c introduced a dependency on a `struct repository` pointer, and
later commits such as b66d84756f (commit-graph: respect
'commitGraph.readChangedPaths', 2020-09-09) added dependencies on config
settings, which were accessed through the `settings` field of the
repository pointer. This field was initialized via a call to
`prepare_repo_settings()`.

In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only in git
repos), prepare_repo_settings was changed to issue a BUG() if it is
called by a process whose CWD is not a Git repository.

This series of changes broke fuzz-commit-graph, which attempts to parse
arbitrary fuzzing-engine-provided bytes as a commit graph file.
commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
since we run the fuzz tests without a valid repository, we are hitting
the BUG() from 44c7e62 for every test case.

Fix this by moving the majority of the implementaiton of
`parse_commit_graph()` into a new function,
`parse_commit_graph_settings()` that accepts a repo_settings pointer.
This allows fuzz-commit-graph to continue to test the commit-graph
parser implementation without relying on prepare_repo_settings().

Additionally, properly initialize the
repo_settings.commit_graph_generation_version field in
prepare_repo_settings(). Load the value from the config if present, and
default to version 2 otherwise.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
Range-diff against v2:
1:  2c2bfc7b43 ! 1:  9b56496b08 commit-graph: refactor to avoid prepare_repo_settings
    @@ fuzz-commit-graph.c: int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size
      	initialize_the_repository();
     -	g = parse_commit_graph(the_repository, (void *)data, size);
     +	/*
    -+	 * Manually initialize commit-graph settings, to avoid the need to run
    -+	 * in an actual repository.
    ++	 * Initialize the_repository with commit-graph settings that would
    ++	 * normally be read from the repository's gitdir. We want to avoid
    ++	 * touching the disk to keep the individual fuzz-test cases as fast as
    ++	 * possible.
     +	 */
     +	the_repository->settings.commit_graph_generation_version = 2;
     +	the_repository->settings.commit_graph_read_changed_paths = 1;

 commit-graph.c      | 22 ++++++++++------------
 commit-graph.h      |  2 ++
 fuzz-commit-graph.c | 10 +++++++++-
 repo-settings.c     | 12 +++++++++++-
 repository.h        |  1 +
 5 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 265c010122..c54a734619 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -96,13 +96,6 @@ define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
 static struct commit_graph_data_slab commit_graph_data_slab =
 	COMMIT_SLAB_INIT(1, commit_graph_data_slab);
 
-static int get_configured_generation_version(struct repository *r)
-{
-	int version = 2;
-	repo_config_get_int(r, "commitgraph.generationversion", &version);
-	return version;
-}
-
 uint32_t commit_graph_position(const struct commit *c)
 {
 	struct commit_graph_data *data =
@@ -335,6 +328,13 @@ static int graph_read_bloom_data(const unsigned char *chunk_start,
 
 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size)
+{
+	prepare_repo_settings(r);
+	return parse_commit_graph_settings(&r->settings, graph_map, graph_size);
+}
+
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+						 void *graph_map, size_t graph_size)
 {
 	const unsigned char *data;
 	struct commit_graph *graph;
@@ -371,8 +371,6 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 		return NULL;
 	}
 
-	prepare_repo_settings(r);
-
 	graph = alloc_commit_graph();
 
 	graph->hash_len = the_hash_algo->rawsz;
@@ -402,14 +400,14 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
 	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
 
-	if (get_configured_generation_version(r) >= 2) {
+	if (s->commit_graph_generation_version >= 2) {
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
 			&graph->chunk_generation_data);
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
 			&graph->chunk_generation_data_overflow);
 	}
 
-	if (r->settings.commit_graph_read_changed_paths) {
+	if (s->commit_graph_read_changed_paths) {
 		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
 			   &graph->chunk_bloom_indexes);
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
@@ -2288,7 +2286,7 @@ int write_commit_graph(struct object_directory *odb,
 	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
 	ctx->opts = opts;
 	ctx->total_bloom_filter_data_size = 0;
-	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
+	ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2);
 	ctx->num_generation_data_overflows = 0;
 
 	bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
diff --git a/commit-graph.h b/commit-graph.h
index 04a94e1830..0f0d28b129 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -95,6 +95,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r,
 					   struct object_directory *odb);
 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size);
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+					void *graph_map, size_t graph_size);
 
 /*
  * Return 1 if and only if the repository has a commit-graph
diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
index e7cf6d5b0f..810b1a8001 100644
--- a/fuzz-commit-graph.c
+++ b/fuzz-commit-graph.c
@@ -11,7 +11,15 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
 	struct commit_graph *g;
 
 	initialize_the_repository();
-	g = parse_commit_graph(the_repository, (void *)data, size);
+	/*
+	 * Initialize the_repository with commit-graph settings that would
+	 * normally be read from the repository's gitdir. We want to avoid
+	 * touching the disk to keep the individual fuzz-test cases as fast as
+	 * possible.
+	 */
+	the_repository->settings.commit_graph_generation_version = 2;
+	the_repository->settings.commit_graph_read_changed_paths = 1;
+	g = parse_commit_graph_settings(&the_repository->settings, (void *)data, size);
 	repo_clear(the_repository);
 	free_commit_graph(g);
 
diff --git a/repo-settings.c b/repo-settings.c
index b4fbd16cdc..26241c1c2c 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -10,6 +10,13 @@ static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
 		*dest = def;
 }
 
+static void repo_cfg_int(struct repository *r, const char *key, int *dest,
+			 int def)
+{
+	if (repo_config_get_int(r, key, dest))
+		*dest = def;
+}
+
 void prepare_repo_settings(struct repository *r)
 {
 	int experimental;
@@ -41,11 +48,14 @@ void prepare_repo_settings(struct repository *r)
 		r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
 	}
 
-	/* Boolean config or default, does not cascade (simple)  */
+	/* Commit graph config or default, does not cascade (simple) */
 	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
+	repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
 	repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
 	repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
 	repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
+
+	/* Boolean config or default, does not cascade (simple)  */
 	repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
 	repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
 	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
diff --git a/repository.h b/repository.h
index ca837cb9e9..4f8275f97c 100644
--- a/repository.h
+++ b/repository.h
@@ -29,6 +29,7 @@ struct repo_settings {
 	int initialized;
 
 	int core_commit_graph;
+	int commit_graph_generation_version;
 	int commit_graph_read_changed_paths;
 	int gc_write_commit_graph;
 	int fetch_write_commit_graph;

base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7
-- 
2.36.1.476.g0c4daa206d-goog


  parent reply	other threads:[~2022-06-14 22:37 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-23 18:03 [RFC PATCH] repo-settings: set defaults even when not in a repo Josh Steadmon
2022-03-23 19:22 ` Derrick Stolee
2022-03-23 19:52   ` Taylor Blau
2022-03-28 19:15     ` Josh Steadmon
2022-03-29  1:21       ` Taylor Blau
2022-03-28 19:53     ` Josh Steadmon
2022-03-29  1:22       ` Taylor Blau
2022-03-29  9:03     ` Ævar Arnfjörð Bjarmason
2022-03-30  2:26       ` Taylor Blau
2022-04-09  6:33         ` Josh Steadmon
2022-03-29  9:04     ` Ævar Arnfjörð Bjarmason
2022-03-30  2:34       ` Taylor Blau
2022-03-30 17:38         ` Ævar Arnfjörð Bjarmason
2022-03-30 20:14           ` Junio C Hamano
2022-04-09  6:52     ` [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings Josh Steadmon
2022-06-07 20:02       ` Jonathan Tan
2022-06-14 22:38         ` Josh Steadmon
2022-06-14 22:37     ` Josh Steadmon [this message]
2022-06-14 23:32       ` [PATCH v3] " Taylor Blau
2022-06-23 21:59       ` Junio C Hamano
2022-07-14 21:44         ` Josh Steadmon
2022-07-14 21:43     ` [PATCH v4] commit-graph: pass repo_settings instead of repository Josh Steadmon
2022-07-14 22:48       ` Junio C Hamano
2022-03-23 20:11 ` [RFC PATCH] repo-settings: set defaults even when not in a repo Victoria Dye
2022-03-23 20:54   ` Junio C Hamano
2022-03-23 21:19     ` Victoria Dye
2022-03-23 20:51 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9b56496b0809cc8a25af877ea97042e2cb7f2af6.1655246092.git.steadmon@google.com \
    --to=steadmon@google.com \
    --cc=avarab@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=lessleydennington@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.