* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
2022-03-23 19:52 ` Taylor Blau
@ 2022-03-28 19:15 ` Josh Steadmon
2022-03-29 1:21 ` Taylor Blau
2022-03-28 19:53 ` Josh Steadmon
` (5 subsequent siblings)
6 siblings, 1 reply; 27+ messages in thread
From: Josh Steadmon @ 2022-03-28 19:15 UTC (permalink / raw)
To: Taylor Blau; +Cc: Derrick Stolee, git, lessleydennington, gitster, vdye
On 2022.03.23 15:52, Taylor Blau wrote:
> On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
> > On 3/23/2022 2:03 PM, Josh Steadmon wrote:
> > > prepare_repo_settings() initializes a `struct repository` with various
> > > default config options and settings read from a repository-local config
> > > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> > > in git repos), prepare_repo_settings was changed to issue a BUG() if it
> > > is called by a process whose CWD is not a Git repository. This approach
> > > was suggested in [1].
> > >
> > > This breaks fuzz-commit-graph, which attempts to parse arbitrary
> > > fuzzing-engine-provided bytes as a commit graph file.
> > > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> > > since we run the fuzz tests without a valid repository, we are hitting
> > > the BUG() from 44c7e62 for every test case.
> > >
> > > Fix this by refactoring prepare_repo_settings() such that it sets
> > > default options unconditionally; if its process is in a Git repository,
> > > it will also load settings from the local config. This eliminates the
> > > need for a BUG() when not in a repository.
> >
> > I think you have the right idea and this can work.
>
> Hmmm. To me this feels like bending over backwards in
> `prepare_repo_settings()` to accommodate one particular caller. I'm not
> necessarily opposed to it, but it does feel strange to make
> `prepare_repo_settings()` a noop here, since I would expect that any
> callers who do want to call `prepare_repo_settings()` are likely
> convinced that they are inside of a repository, and it probably should
> be a BUG() if they aren't.
>
> I was initially thinking that Josh's alternative of introducing and
> calling a lower-level version of `prepare_commit_graph()` that doesn't
> require the use of any repository settings would make sense.
>
> But when I started looking at implementing it, I became confused at how
> this is supposed to work at all without using a repository. We depend on
> the values parsed from:
>
> - commitGraph.generationVersion
> - commitGraph.readChangedPaths
>
> to determine which part(s) of the commit graph we do and don't read.
>
> Looking around, I think I probably inadvertently broke this in
> ab14d0676c (commit-graph: pass a 'struct repository *' in more places,
> 2020-09-09). But prior to ab14d0676c, neither of those settings existed,
> so parsing the commit graph was a pure function of the commit graph's
> contents alone, and didn't rely on the existence of a repository.
Yeah, I have not done a great job keeping the fuzzers up to date with
commit-graph changes :(.
> We could pretend as if `commitGraph.generationVersion` is always "2" and
> `commitGraph.readChangedPaths` is always "true", and I think that would
> still give us good-enough coverage.
It might also be worthwhile for the fuzzer to test each interesting
combination of settings, using the same arbitrary input.
> I assume that libFuzzer doesn't give us a convenient way to stub out a
> repository. Maybe we should have a variant of `parse_commit_graph` that
> instead takes a `struct repo_settings` filled out with the defaults?
>
> We could use that to teach libFuzzer about additional states that the
> commit graph parser can be in, too (though probably outside the scope of
> this patch).
>
> I tried to sketch all of this out, which seems to work. Let me know what
> you think:
Yes, your patch looks like a much smaller change than I feared would be
the case for a parse_commit_graph refactor. I'll test it out, and
probably add some additional fuzzer improvements on top. Thanks for the
patch!
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
2022-03-28 19:15 ` Josh Steadmon
@ 2022-03-29 1:21 ` Taylor Blau
0 siblings, 0 replies; 27+ messages in thread
From: Taylor Blau @ 2022-03-29 1:21 UTC (permalink / raw)
To: Josh Steadmon, Derrick Stolee, git, lessleydennington, gitster, vdye
On Mon, Mar 28, 2022 at 12:15:53PM -0700, Josh Steadmon wrote:
> > Looking around, I think I probably inadvertently broke this in
> > ab14d0676c (commit-graph: pass a 'struct repository *' in more places,
> > 2020-09-09). But prior to ab14d0676c, neither of those settings existed,
> > so parsing the commit graph was a pure function of the commit graph's
> > contents alone, and didn't rely on the existence of a repository.
>
> Yeah, I have not done a great job keeping the fuzzers up to date with
> commit-graph changes :(.
I think that puts you and I in the same boat, since the original
breakage from ab14d0676c blames back to me. I'm sorry that I didn't
notice that my change had broken the fuzzing code at the time, and I
appreciate you working on fixing it!
> > We could pretend as if `commitGraph.generationVersion` is always "2" and
> > `commitGraph.readChangedPaths` is always "true", and I think that would
> > still give us good-enough coverage.
>
> It might also be worthwhile for the fuzzer to test each interesting
> combination of settings, using the same arbitrary input.
Definitely. I don't think it hurts to just focus on getting the common
case ("2", "true") working again. And if libFuzzer makes it easy-ish to
test more of the possible input space, I'm all for it.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
2022-03-23 19:52 ` Taylor Blau
2022-03-28 19:15 ` Josh Steadmon
@ 2022-03-28 19:53 ` Josh Steadmon
2022-03-29 1:22 ` Taylor Blau
2022-03-29 9:03 ` Ævar Arnfjörð Bjarmason
` (4 subsequent siblings)
6 siblings, 1 reply; 27+ messages in thread
From: Josh Steadmon @ 2022-03-28 19:53 UTC (permalink / raw)
To: Taylor Blau; +Cc: Derrick Stolee, git, lessleydennington, gitster, vdye
On 2022.03.23 15:52, Taylor Blau wrote:
> I tried to sketch all of this out, which seems to work. Let me know what
> you think:
BTW, is it OK if I include your Signed-off-by on this when I send my V2?
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
2022-03-28 19:53 ` Josh Steadmon
@ 2022-03-29 1:22 ` Taylor Blau
0 siblings, 0 replies; 27+ messages in thread
From: Taylor Blau @ 2022-03-29 1:22 UTC (permalink / raw)
To: Josh Steadmon, Derrick Stolee, git, lessleydennington, gitster, vdye
On Mon, Mar 28, 2022 at 12:53:33PM -0700, Josh Steadmon wrote:
> On 2022.03.23 15:52, Taylor Blau wrote:
> > I tried to sketch all of this out, which seems to work. Let me know what
> > you think:
>
> BTW, is it OK if I include your Signed-off-by on this when I send my V2?
Yes, absolutely. Thanks for asking; it's fine to include my
Signed-off-by on any patches / diffs that I send to the mailing list.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
2022-03-23 19:52 ` Taylor Blau
2022-03-28 19:15 ` Josh Steadmon
2022-03-28 19:53 ` Josh Steadmon
@ 2022-03-29 9:03 ` Ævar Arnfjörð Bjarmason
2022-03-30 2:26 ` Taylor Blau
2022-03-29 9:04 ` Ævar Arnfjörð Bjarmason
` (3 subsequent siblings)
6 siblings, 1 reply; 27+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-29 9:03 UTC (permalink / raw)
To: Taylor Blau
Cc: Derrick Stolee, Josh Steadmon, git, lessleydennington, gitster
On Wed, Mar 23 2022, Taylor Blau wrote:
> /* Boolean config or default, does not cascade (simple) */
> repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> + repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
If this ends up being kept let's add it to its own commented "section",
the comment 2-lines above it is now incorrect.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
2022-03-29 9:03 ` Ævar Arnfjörð Bjarmason
@ 2022-03-30 2:26 ` Taylor Blau
2022-04-09 6:33 ` Josh Steadmon
0 siblings, 1 reply; 27+ messages in thread
From: Taylor Blau @ 2022-03-30 2:26 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Taylor Blau, Derrick Stolee, Josh Steadmon, git,
lessleydennington, gitster
On Tue, Mar 29, 2022 at 11:03:14AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Mar 23 2022, Taylor Blau wrote:
>
> > /* Boolean config or default, does not cascade (simple) */
> > repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> > + repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
>
> If this ends up being kept let's add it to its own commented "section",
> the comment 2-lines above it is now incorrect.
Thanks for pointing it out; indeed that comment is definitely stale with
respect to the newer code. This patch was just a sketch of an
alternative approach for Josh to consider, so I can't guarantee that it
isn't immune to nit-picks ;).
I think that Josh is planning on picking this up, so hopefully he
doesn't mind applying these and any other touch-ups locally before
resubmitting.
(Josh, if you do: thank you very much in advance!)
Thanks,
Taylor
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
2022-03-30 2:26 ` Taylor Blau
@ 2022-04-09 6:33 ` Josh Steadmon
0 siblings, 0 replies; 27+ messages in thread
From: Josh Steadmon @ 2022-04-09 6:33 UTC (permalink / raw)
To: Taylor Blau
Cc: Ævar Arnfjörð Bjarmason, Derrick Stolee, git,
lessleydennington, gitster
On 2022.03.29 22:26, Taylor Blau wrote:
> On Tue, Mar 29, 2022 at 11:03:14AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >
> > On Wed, Mar 23 2022, Taylor Blau wrote:
> >
> > > /* Boolean config or default, does not cascade (simple) */
> > > repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> > > + repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
> >
> > If this ends up being kept let's add it to its own commented "section",
> > the comment 2-lines above it is now incorrect.
>
> Thanks for pointing it out; indeed that comment is definitely stale with
> respect to the newer code. This patch was just a sketch of an
> alternative approach for Josh to consider, so I can't guarantee that it
> isn't immune to nit-picks ;).
>
> I think that Josh is planning on picking this up, so hopefully he
> doesn't mind applying these and any other touch-ups locally before
> resubmitting.
>
> (Josh, if you do: thank you very much in advance!)
>
> Thanks,
> Taylor
Done in V2, to be sent out shortly.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
2022-03-23 19:52 ` Taylor Blau
` (2 preceding siblings ...)
2022-03-29 9:03 ` Ævar Arnfjörð Bjarmason
@ 2022-03-29 9:04 ` Ævar Arnfjörð Bjarmason
2022-03-30 2:34 ` Taylor Blau
2022-04-09 6:52 ` [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings Josh Steadmon
` (2 subsequent siblings)
6 siblings, 1 reply; 27+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-29 9:04 UTC (permalink / raw)
To: Taylor Blau
Cc: Derrick Stolee, Josh Steadmon, git, lessleydennington, gitster
On Wed, Mar 23 2022, Taylor Blau wrote:
> On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
>> On 3/23/2022 2:03 PM, Josh Steadmon wrote:
>> > prepare_repo_settings() initializes a `struct repository` with various
>> > default config options and settings read from a repository-local config
>> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
>> > in git repos), prepare_repo_settings was changed to issue a BUG() if it
>> > is called by a process whose CWD is not a Git repository. This approach
>> > was suggested in [1].
>> >
>> > This breaks fuzz-commit-graph, which attempts to parse arbitrary
>> > fuzzing-engine-provided bytes as a commit graph file.
>> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
>> > since we run the fuzz tests without a valid repository, we are hitting
>> > the BUG() from 44c7e62 for every test case.
>> >
>> > Fix this by refactoring prepare_repo_settings() such that it sets
>> > default options unconditionally; if its process is in a Git repository,
>> > it will also load settings from the local config. This eliminates the
>> > need for a BUG() when not in a repository.
>>
>> I think you have the right idea and this can work.
>
> Hmmm. To me this feels like bending over backwards in
> `prepare_repo_settings()` to accommodate one particular caller. I'm not
> necessarily opposed to it, but it does feel strange to make
> `prepare_repo_settings()` a noop here, since I would expect that any
> callers who do want to call `prepare_repo_settings()` are likely
> convinced that they are inside of a repository, and it probably should
> be a BUG() if they aren't.
I think adding that BUG() was overzelous in the first place, per
https://lore.kernel.org/git/211207.86r1apow9f.gmgdl@evledraar.gmail.com/;
I don't see what purpose it solves to be this overly anal in this code,
and 44c7e62e51e (repo-settings: prepare_repo_settings only in git repos,
2021-12-06) just discusses "what" and not "why".
I think a perfectly fine solution to this is just to revert it:
diff --git a/repo-settings.c b/repo-settings.c
index b4fbd16cdcc..e162c1479bf 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -18,7 +18,7 @@ void prepare_repo_settings(struct repository *r)
int manyfiles;
if (!r->gitdir)
- BUG("Cannot add settings for uninitialized repository");
+ return;
if (r->settings.initialized++)
return;
I have that in my local integration branch, because I ended up wanting
to add prepare_repo_settings() to usage.c, which may or may not run
inside a repo (and maybe we'll have that config, maybe not).
But really, in common-main.c we do a initialize_the_repository(), so a
"struct repository" is already a thing we have before we get to the
"RUN_SETUP_GENTLY" or whatever in git.c, and a bunch of things all over
the place assume that it's the equivalent of { 0 }-initialized.
If we actually want to turn repository.[ch] into some strict API where
"Tho Shalt Not Use the_repository unless" we're actually in a repo
surely we should have it be NULL then, and to add that BUG() to the
likes of initialize_the_repository().
Except I think there's no point in that, and it would just lead to
pointless churn, so why do it for the settings in particular? Why can't
they just be { 0 }-init'd too?
If some caller cares about the distinction between r->settings being
like it is because of us actually having a repo, or us using the
defaults why can't they just check r->gitdir themselves?
For the rest the default of "just provide the defaults then" is a much
saner API.
I think *maybe* what this actually wanted to do was to bridge the gap
between "startup_info->have_repository" and a caller in builtin/ calling
prepare_repo_settings(), i.e. that it was a logic error to have that
RUN_SETUP_GENTLY caller do that.
I can see how that *might* be useful as some sanity assertion, but then
maybe we could add a more narrow BUG() just for that case, even having a
builtin_prepare_repo_settings() wrapper in builtin.h or whatever.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
2022-03-29 9:04 ` Ævar Arnfjörð Bjarmason
@ 2022-03-30 2:34 ` Taylor Blau
2022-03-30 17:38 ` Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 27+ messages in thread
From: Taylor Blau @ 2022-03-30 2:34 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Taylor Blau, Derrick Stolee, Josh Steadmon, git,
lessleydennington, gitster
On Tue, Mar 29, 2022 at 11:04:18AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Mar 23 2022, Taylor Blau wrote:
>
> > On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
> >> On 3/23/2022 2:03 PM, Josh Steadmon wrote:
> >> > prepare_repo_settings() initializes a `struct repository` with various
> >> > default config options and settings read from a repository-local config
> >> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> >> > in git repos), prepare_repo_settings was changed to issue a BUG() if it
> >> > is called by a process whose CWD is not a Git repository. This approach
> >> > was suggested in [1].
> >> >
> >> > This breaks fuzz-commit-graph, which attempts to parse arbitrary
> >> > fuzzing-engine-provided bytes as a commit graph file.
> >> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> >> > since we run the fuzz tests without a valid repository, we are hitting
> >> > the BUG() from 44c7e62 for every test case.
> >> >
> >> > Fix this by refactoring prepare_repo_settings() such that it sets
> >> > default options unconditionally; if its process is in a Git repository,
> >> > it will also load settings from the local config. This eliminates the
> >> > need for a BUG() when not in a repository.
> >>
> >> I think you have the right idea and this can work.
> >
> > Hmmm. To me this feels like bending over backwards in
> > `prepare_repo_settings()` to accommodate one particular caller. I'm not
> > necessarily opposed to it, but it does feel strange to make
> > `prepare_repo_settings()` a noop here, since I would expect that any
> > callers who do want to call `prepare_repo_settings()` are likely
> > convinced that they are inside of a repository, and it probably should
> > be a BUG() if they aren't.
>
> I think adding that BUG() was overzelous in the first place, per
> https://lore.kernel.org/git/211207.86r1apow9f.gmgdl@evledraar.gmail.com/;
I think Junio raised a good point in
https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/
, though some of the detail was lost in 44c7e62e51 (repo-settings:
prepare_repo_settings only in git repos, 2021-12-06).
> I have that in my local integration branch, because I ended up wanting
> to add prepare_repo_settings() to usage.c, which may or may not run
> inside a repo (and maybe we'll have that config, maybe not).
I see what you're saying, though I think we would be equally OK to have
a default value of the repo_settings struct that we could rely on. I
said some of this back in
https://lore.kernel.org/git/Yjt6mLIfw0V3aVTO@nand.local/
, namely the parts around "I would expect that any callers who do want
to call `prepare_repo_settings()` are likely convinced that they are
inside of a repository, and it probably should be a BUG() if they
aren't."
Thinking in terms of your message, though, I think the distinction (from
my perspective, at least) is between (a) using something called
_repo_-settings in a non-repo context, and (b) calling a function which
is supposed to fill in its values from a repository (which the caller
implicitly expects to exist).
Neither feel _good_ to me, but (b) feels worse, since it is making it OK
to operate in a likely-unexpected context with respect to the caller's
expectations.
Anyway, I think that we are pretty far into the weeds, and it's likely
time to turn around. I don't have that strong a feeling either way, and
in all honesty either approach is probably just fine.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
2022-03-30 2:34 ` Taylor Blau
@ 2022-03-30 17:38 ` Ævar Arnfjörð Bjarmason
2022-03-30 20:14 ` Junio C Hamano
0 siblings, 1 reply; 27+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-30 17:38 UTC (permalink / raw)
To: Taylor Blau
Cc: Derrick Stolee, Josh Steadmon, git, lessleydennington, gitster
On Tue, Mar 29 2022, Taylor Blau wrote:
> On Tue, Mar 29, 2022 at 11:04:18AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Wed, Mar 23 2022, Taylor Blau wrote:
>>
>> > On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
>> >> On 3/23/2022 2:03 PM, Josh Steadmon wrote:
>> >> > prepare_repo_settings() initializes a `struct repository` with various
>> >> > default config options and settings read from a repository-local config
>> >> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
>> >> > in git repos), prepare_repo_settings was changed to issue a BUG() if it
>> >> > is called by a process whose CWD is not a Git repository. This approach
>> >> > was suggested in [1].
>> >> >
>> >> > This breaks fuzz-commit-graph, which attempts to parse arbitrary
>> >> > fuzzing-engine-provided bytes as a commit graph file.
>> >> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
>> >> > since we run the fuzz tests without a valid repository, we are hitting
>> >> > the BUG() from 44c7e62 for every test case.
>> >> >
>> >> > Fix this by refactoring prepare_repo_settings() such that it sets
>> >> > default options unconditionally; if its process is in a Git repository,
>> >> > it will also load settings from the local config. This eliminates the
>> >> > need for a BUG() when not in a repository.
>> >>
>> >> I think you have the right idea and this can work.
>> >
>> > Hmmm. To me this feels like bending over backwards in
>> > `prepare_repo_settings()` to accommodate one particular caller. I'm not
>> > necessarily opposed to it, but it does feel strange to make
>> > `prepare_repo_settings()` a noop here, since I would expect that any
>> > callers who do want to call `prepare_repo_settings()` are likely
>> > convinced that they are inside of a repository, and it probably should
>> > be a BUG() if they aren't.
>>
>> I think adding that BUG() was overzelous in the first place, per
>> https://lore.kernel.org/git/211207.86r1apow9f.gmgdl@evledraar.gmail.com/;
>
> I think Junio raised a good point in
>
> https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/
>
> , though some of the detail was lost in 44c7e62e51 (repo-settings:
> prepare_repo_settings only in git repos, 2021-12-06).
>
>> I have that in my local integration branch, because I ended up wanting
>> to add prepare_repo_settings() to usage.c, which may or may not run
>> inside a repo (and maybe we'll have that config, maybe not).
>
> I see what you're saying, though I think we would be equally OK to have
> a default value of the repo_settings struct that we could rely on. I
> said some of this back in
>
> https://lore.kernel.org/git/Yjt6mLIfw0V3aVTO@nand.local/
>
> , namely the parts around "I would expect that any callers who do want
> to call `prepare_repo_settings()` are likely convinced that they are
> inside of a repository, and it probably should be a BUG() if they
> aren't."
>
> Thinking in terms of your message, though, I think the distinction (from
> my perspective, at least) is between (a) using something called
> _repo_-settings in a non-repo context, and (b) calling a function which
> is supposed to fill in its values from a repository (which the caller
> implicitly expects to exist).
>
> Neither feel _good_ to me, but (b) feels worse, since it is making it OK
> to operate in a likely-unexpected context with respect to the caller's
> expectations.
I agree that it's a bit iffy. I'm basically advocating for treating
"the_repository->settings" as though it's a new "the_config" or
whatever.
Maybe we'd be better off just making that move, or having
the_repository->settings contain only settings relevant to cases where
we only have a repository.
But I think predicating useful uses of it on that refactoring is
overdoing it a bit, especially as I think your "(b)" concern here is
already something we deal with when it comes to
initialize_the_repository() and checks for
"the_repository->gitdir".
Can't we just have callers that really care about the distinction check
"->gitdir" instead? As they're already doing in some cases already?
Or just:
git mv {repo,global}-settings.c
Since that's what it seems to want to be anyway.
> Anyway, I think that we are pretty far into the weeds, and it's likely
> time to turn around. I don't have that strong a feeling either way, and
> in all honesty either approach is probably just fine.
*nod*
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
2022-03-30 17:38 ` Ævar Arnfjörð Bjarmason
@ 2022-03-30 20:14 ` Junio C Hamano
0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2022-03-30 20:14 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Taylor Blau, Derrick Stolee, Josh Steadmon, git, lessleydennington
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> Or just:
>
> git mv {repo,global}-settings.c
>
> Since that's what it seems to want to be anyway.
Hmph, care to elaborate a bit more on "seems to"?
Here is my take
- The code makes extensive use of repo_cfg_bool(), which is a thin
wrapper around repo_config_get_bool(); despite its name, it is
not about reading from the configuration file of that repository
and nowhere else. It can be affected by global configuration.
- Other uses of repo_config_get_*() it uses is the same way.
So, it wants to grab a set of configuration that would +apply+ to
this specific instance of "struct repository".
But that is quite different from "give us settings that would apply
in general, I do not have a specific repository in mind", which is
what "global-settings.c" would imply at least to me.
And in order for the "this specific instance" to make sense, the
caller should have made sure that it is indeed a repository.
Lifting that BUG() from the code path not only smells sloppy way to
work around some corner case code that does not prepare the
repository properly, but does not make much sense, at least to me.
In exchange for scrapping the safety to help a caller that forgets
to prepare repository before it is ready to call this function, what
are we gaining?
I went back to the thread-starter message and re-read its
justification. It talks about:
> Concerns:
>
> Are any callers strictly dependent on having a BUG() here? I suspect
> that the worst that would happen is that rather than this BUG(), the
> caller would later hit its own BUG() or die(), so I do not think this is
> a blocker. Additionally, every builtin that directly calls
> prepare_repo_settings is either marked as RUN_SETUP, which means we
> would die() prior to calling it anyway, or checks on its own before
> calling it (builtin/diff.c). There are several callers in library code,
> though, and I have not tracked down how all of those are used.
Asking for existing callers being dependent on having a BUG() is a
pure nonsense. The existing callers are there in shipped versions
of Git exactly because they do things correctly not to hit the BUG(),
so BY DEFINITION, they do not care if the BUG() is there or not.
So that is not "a blocker", but is a non-argument to ask if existing
code paths care if the BUG() is gone.
What BUG() is protecting us against is a careless developer who
writes a new code or alters an existing code path that ends up
making the control flow in such a way that a proper set-up of the
repository structure is bypassed by mistake before calling this
function. The function is call-once by r->settings.initialized
guarding it, calling it and then doing a set-up will result in an
unexplainable bug even if the caller tries to compensate by calling
it twice, as r->settings that is set incorrectly will be sticky.
Having said all that, I can be pursuaded to consider an approach to
allow callers to explicitly ask for running outside repository, just
like the more strict setup_git_directory() for majority of callers
has looser setup_git_directory_gently() counterpart. The current
callers should retain the "you must have discovered gitdir" there,
but a special purpose code that is not even Git (like fuzzer) can
say
prepare_repo_settings_gently(r, &nongit_ok);
instead.
diff --git c/repo-settings.c w/repo-settings.c
index b4fbd16cdc..c492bc7671 100644
--- c/repo-settings.c
+++ w/repo-settings.c
@@ -10,15 +10,24 @@ static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
*dest = def;
}
-void prepare_repo_settings(struct repository *r)
+void prepare_repo_settings_gently(struct repository *r, int *nongit)
{
int experimental;
int value;
char *strval;
int manyfiles;
- if (!r->gitdir)
- BUG("Cannot add settings for uninitialized repository");
+ if (!r->gitdir) {
+ /*
+ * The caller can pass nongit (out paremeter) to ask if r is already
+ * initialized (and act on it after this function returns).
+ */
+ if (!nongit)
+ BUG("Cannot add settings for uninitialized repository");
+ *nongit = 1;
+ } else if (nongit) {
+ *nongit = 0;
+ }
if (r->settings.initialized++)
return;
diff --git c/repository.h w/repository.h
index e29f361703..98f6ec12cc 100644
--- c/repository.h
+++ w/repository.h
@@ -222,7 +222,8 @@ int repo_read_index_unmerged(struct repository *);
*/
void repo_update_index_if_able(struct repository *, struct lock_file *);
-void prepare_repo_settings(struct repository *r);
+#define prepare_repo_settings(r) prepare_repo_settings_gently((r), NULL)
+void prepare_repo_settings_gently(struct repository *r, int *nongit);
/*
* Return 1 if upgrade repository format to target_version succeeded,
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings
2022-03-23 19:52 ` Taylor Blau
` (3 preceding siblings ...)
2022-03-29 9:04 ` Ævar Arnfjörð Bjarmason
@ 2022-04-09 6:52 ` Josh Steadmon
2022-06-07 20:02 ` Jonathan Tan
2022-06-14 22:37 ` [PATCH v3] " Josh Steadmon
2022-07-14 21:43 ` [PATCH v4] commit-graph: pass repo_settings instead of repository Josh Steadmon
6 siblings, 1 reply; 27+ messages in thread
From: Josh Steadmon @ 2022-04-09 6:52 UTC (permalink / raw)
To: git; +Cc: me, derrickstolee, lessleydennington, gitster, vdye, avarab
From: Taylor Blau <me@ttaylorr.com>
Prior to ab14d0676c (commit-graph: pass a 'struct repository *' in more
places, 2020-09-09), parsing a commit-graph was a pure function
depending only on the contents of the commit-graph itself. Commit
ab14d0676c introduced a dependency on a `struct repository` pointer, and
later commits such as b66d84756f (commit-graph: respect
'commitGraph.readChangedPaths', 2020-09-09) added dependencies on config
settings, which were accessed through the `settings` field of the
repository pointer. This field was initialized via a call to
`prepare_repo_settings()`.
In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only in git
repos), prepare_repo_settings was changed to issue a BUG() if it is
called by a process whose CWD is not a Git repository.
This series of changes broke fuzz-commit-graph, which attempts to parse
arbitrary fuzzing-engine-provided bytes as a commit graph file.
commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
since we run the fuzz tests without a valid repository, we are hitting
the BUG() from 44c7e62 for every test case.
Fix this by moving the majority of the implementaiton of
`parse_commit_graph()` into a new function,
`parse_commit_graph_settings()` that accepts a repo_settings pointer.
This allows fuzz-commit-graph to continue to test the commit-graph
parser implementation without relying on prepare_repo_settings().
Additionally, properly initialize the
repo_settings.commit_graph_generation_version field in
prepare_repo_settings(). Load the value from the config if present, and
default to version 2 otherwise.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
I've taken the diff from Taylor's message Yjt6mLIfw0V3aVTO@nand.local
with a small tweak to the fuzzer: I didn't see that the commit graph
settings were being initialized anywhere outside of
prepare_repo_settings(), so I manually initialized them in
fuzz-commit-graph. I've also moved the commit-graph settings in
prepare_repo_settings() to their own section, as suggested by Ævar.
I've tried to combine Taylor's explanation from his email with the
commit message from my original patch. Taylor, if you feel like anything
needs to be changed please let me know or feel free to resend with your
changes. Thanks again for providing this fix!
commit-graph.c | 22 ++++++++++------------
commit-graph.h | 2 ++
fuzz-commit-graph.c | 8 +++++++-
repo-settings.c | 12 +++++++++++-
repository.h | 1 +
5 files changed, 31 insertions(+), 14 deletions(-)
diff --git a/commit-graph.c b/commit-graph.c
index 265c010122..c54a734619 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -96,13 +96,6 @@ define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
static struct commit_graph_data_slab commit_graph_data_slab =
COMMIT_SLAB_INIT(1, commit_graph_data_slab);
-static int get_configured_generation_version(struct repository *r)
-{
- int version = 2;
- repo_config_get_int(r, "commitgraph.generationversion", &version);
- return version;
-}
-
uint32_t commit_graph_position(const struct commit *c)
{
struct commit_graph_data *data =
@@ -335,6 +328,13 @@ static int graph_read_bloom_data(const unsigned char *chunk_start,
struct commit_graph *parse_commit_graph(struct repository *r,
void *graph_map, size_t graph_size)
+{
+ prepare_repo_settings(r);
+ return parse_commit_graph_settings(&r->settings, graph_map, graph_size);
+}
+
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+ void *graph_map, size_t graph_size)
{
const unsigned char *data;
struct commit_graph *graph;
@@ -371,8 +371,6 @@ struct commit_graph *parse_commit_graph(struct repository *r,
return NULL;
}
- prepare_repo_settings(r);
-
graph = alloc_commit_graph();
graph->hash_len = the_hash_algo->rawsz;
@@ -402,14 +400,14 @@ struct commit_graph *parse_commit_graph(struct repository *r,
pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
- if (get_configured_generation_version(r) >= 2) {
+ if (s->commit_graph_generation_version >= 2) {
pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
&graph->chunk_generation_data);
pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
&graph->chunk_generation_data_overflow);
}
- if (r->settings.commit_graph_read_changed_paths) {
+ if (s->commit_graph_read_changed_paths) {
pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
&graph->chunk_bloom_indexes);
read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
@@ -2288,7 +2286,7 @@ int write_commit_graph(struct object_directory *odb,
ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
ctx->opts = opts;
ctx->total_bloom_filter_data_size = 0;
- ctx->write_generation_data = (get_configured_generation_version(r) == 2);
+ ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2);
ctx->num_generation_data_overflows = 0;
bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
diff --git a/commit-graph.h b/commit-graph.h
index 04a94e1830..0f0d28b129 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -95,6 +95,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r,
struct object_directory *odb);
struct commit_graph *parse_commit_graph(struct repository *r,
void *graph_map, size_t graph_size);
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+ void *graph_map, size_t graph_size);
/*
* Return 1 if and only if the repository has a commit-graph
diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
index e7cf6d5b0f..e53a2635f6 100644
--- a/fuzz-commit-graph.c
+++ b/fuzz-commit-graph.c
@@ -11,7 +11,13 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
struct commit_graph *g;
initialize_the_repository();
- g = parse_commit_graph(the_repository, (void *)data, size);
+ /*
+ * Manually initialize commit-graph settings, to avoid the need to run
+ * in an actual repository.
+ */
+ the_repository->settings.commit_graph_generation_version = 2;
+ the_repository->settings.commit_graph_read_changed_paths = 1;
+ g = parse_commit_graph_settings(&the_repository->settings, (void *)data, size);
repo_clear(the_repository);
free_commit_graph(g);
diff --git a/repo-settings.c b/repo-settings.c
index b4fbd16cdc..26241c1c2c 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -10,6 +10,13 @@ static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
*dest = def;
}
+static void repo_cfg_int(struct repository *r, const char *key, int *dest,
+ int def)
+{
+ if (repo_config_get_int(r, key, dest))
+ *dest = def;
+}
+
void prepare_repo_settings(struct repository *r)
{
int experimental;
@@ -41,11 +48,14 @@ void prepare_repo_settings(struct repository *r)
r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
}
- /* Boolean config or default, does not cascade (simple) */
+ /* Commit graph config or default, does not cascade (simple) */
repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
+ repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
+
+ /* Boolean config or default, does not cascade (simple) */
repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
diff --git a/repository.h b/repository.h
index ca837cb9e9..4f8275f97c 100644
--- a/repository.h
+++ b/repository.h
@@ -29,6 +29,7 @@ struct repo_settings {
int initialized;
int core_commit_graph;
+ int commit_graph_generation_version;
int commit_graph_read_changed_paths;
int gc_write_commit_graph;
int fetch_write_commit_graph;
base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7
--
2.35.1.1178.g4f1659d476-goog
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings
2022-04-09 6:52 ` [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings Josh Steadmon
@ 2022-06-07 20:02 ` Jonathan Tan
2022-06-14 22:38 ` Josh Steadmon
0 siblings, 1 reply; 27+ messages in thread
From: Jonathan Tan @ 2022-06-07 20:02 UTC (permalink / raw)
To: Josh Steadmon
Cc: Jonathan Tan, git, me, derrickstolee, lessleydennington, gitster,
vdye, avarab
Josh Steadmon <steadmon@google.com> writes:
> diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
> index e7cf6d5b0f..e53a2635f6 100644
> --- a/fuzz-commit-graph.c
> +++ b/fuzz-commit-graph.c
> @@ -11,7 +11,13 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
> struct commit_graph *g;
>
> initialize_the_repository();
> - g = parse_commit_graph(the_repository, (void *)data, size);
> + /*
> + * Manually initialize commit-graph settings, to avoid the need to run
> + * in an actual repository.
> + */
> + the_repository->settings.commit_graph_generation_version = 2;
> + the_repository->settings.commit_graph_read_changed_paths = 1;
> + g = parse_commit_graph_settings(&the_repository->settings, (void *)data, size);
> repo_clear(the_repository);
> free_commit_graph(g);
The comment doesn't explain why we need to avoid an actual repository. Maybe
better: Initialize the commit-graph settings that would normally be read from
the repository's gitdir.
Other than that, this patch looks good to me. Isolating a part of the
commit graph mechanism that can be fuzzed without access to the disk is,
I think, a good idea, and this patch is a good way to do it.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings
2022-06-07 20:02 ` Jonathan Tan
@ 2022-06-14 22:38 ` Josh Steadmon
0 siblings, 0 replies; 27+ messages in thread
From: Josh Steadmon @ 2022-06-14 22:38 UTC (permalink / raw)
To: Jonathan Tan
Cc: git, me, derrickstolee, lessleydennington, gitster, vdye, avarab
On 2022.06.07 13:02, Jonathan Tan wrote:
> Josh Steadmon <steadmon@google.com> writes:
> > diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
> > index e7cf6d5b0f..e53a2635f6 100644
> > --- a/fuzz-commit-graph.c
> > +++ b/fuzz-commit-graph.c
> > @@ -11,7 +11,13 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
> > struct commit_graph *g;
> >
> > initialize_the_repository();
> > - g = parse_commit_graph(the_repository, (void *)data, size);
> > + /*
> > + * Manually initialize commit-graph settings, to avoid the need to run
> > + * in an actual repository.
> > + */
> > + the_repository->settings.commit_graph_generation_version = 2;
> > + the_repository->settings.commit_graph_read_changed_paths = 1;
> > + g = parse_commit_graph_settings(&the_repository->settings, (void *)data, size);
> > repo_clear(the_repository);
> > free_commit_graph(g);
>
> The comment doesn't explain why we need to avoid an actual repository. Maybe
> better: Initialize the commit-graph settings that would normally be read from
> the repository's gitdir.
>
> Other than that, this patch looks good to me. Isolating a part of the
> commit graph mechanism that can be fuzzed without access to the disk is,
> I think, a good idea, and this patch is a good way to do it.
Done in V3, thanks for taking a look!
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v3] commit-graph: refactor to avoid prepare_repo_settings
2022-03-23 19:52 ` Taylor Blau
` (4 preceding siblings ...)
2022-04-09 6:52 ` [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings Josh Steadmon
@ 2022-06-14 22:37 ` Josh Steadmon
2022-06-14 23:32 ` Taylor Blau
2022-06-23 21:59 ` Junio C Hamano
2022-07-14 21:43 ` [PATCH v4] commit-graph: pass repo_settings instead of repository Josh Steadmon
6 siblings, 2 replies; 27+ messages in thread
From: Josh Steadmon @ 2022-06-14 22:37 UTC (permalink / raw)
To: git
Cc: me, derrickstolee, lessleydennington, gitster, vdye, avarab,
jonathantanmy
From: Taylor Blau <me@ttaylorr.com>
Prior to ab14d0676c (commit-graph: pass a 'struct repository *' in more
places, 2020-09-09), parsing a commit-graph was a pure function
depending only on the contents of the commit-graph itself. Commit
ab14d0676c introduced a dependency on a `struct repository` pointer, and
later commits such as b66d84756f (commit-graph: respect
'commitGraph.readChangedPaths', 2020-09-09) added dependencies on config
settings, which were accessed through the `settings` field of the
repository pointer. This field was initialized via a call to
`prepare_repo_settings()`.
In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only in git
repos), prepare_repo_settings was changed to issue a BUG() if it is
called by a process whose CWD is not a Git repository.
This series of changes broke fuzz-commit-graph, which attempts to parse
arbitrary fuzzing-engine-provided bytes as a commit graph file.
commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
since we run the fuzz tests without a valid repository, we are hitting
the BUG() from 44c7e62 for every test case.
Fix this by moving the majority of the implementaiton of
`parse_commit_graph()` into a new function,
`parse_commit_graph_settings()` that accepts a repo_settings pointer.
This allows fuzz-commit-graph to continue to test the commit-graph
parser implementation without relying on prepare_repo_settings().
Additionally, properly initialize the
repo_settings.commit_graph_generation_version field in
prepare_repo_settings(). Load the value from the config if present, and
default to version 2 otherwise.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
Range-diff against v2:
1: 2c2bfc7b43 ! 1: 9b56496b08 commit-graph: refactor to avoid prepare_repo_settings
@@ fuzz-commit-graph.c: int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size
initialize_the_repository();
- g = parse_commit_graph(the_repository, (void *)data, size);
+ /*
-+ * Manually initialize commit-graph settings, to avoid the need to run
-+ * in an actual repository.
++ * Initialize the_repository with commit-graph settings that would
++ * normally be read from the repository's gitdir. We want to avoid
++ * touching the disk to keep the individual fuzz-test cases as fast as
++ * possible.
+ */
+ the_repository->settings.commit_graph_generation_version = 2;
+ the_repository->settings.commit_graph_read_changed_paths = 1;
commit-graph.c | 22 ++++++++++------------
commit-graph.h | 2 ++
fuzz-commit-graph.c | 10 +++++++++-
repo-settings.c | 12 +++++++++++-
repository.h | 1 +
5 files changed, 33 insertions(+), 14 deletions(-)
diff --git a/commit-graph.c b/commit-graph.c
index 265c010122..c54a734619 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -96,13 +96,6 @@ define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
static struct commit_graph_data_slab commit_graph_data_slab =
COMMIT_SLAB_INIT(1, commit_graph_data_slab);
-static int get_configured_generation_version(struct repository *r)
-{
- int version = 2;
- repo_config_get_int(r, "commitgraph.generationversion", &version);
- return version;
-}
-
uint32_t commit_graph_position(const struct commit *c)
{
struct commit_graph_data *data =
@@ -335,6 +328,13 @@ static int graph_read_bloom_data(const unsigned char *chunk_start,
struct commit_graph *parse_commit_graph(struct repository *r,
void *graph_map, size_t graph_size)
+{
+ prepare_repo_settings(r);
+ return parse_commit_graph_settings(&r->settings, graph_map, graph_size);
+}
+
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+ void *graph_map, size_t graph_size)
{
const unsigned char *data;
struct commit_graph *graph;
@@ -371,8 +371,6 @@ struct commit_graph *parse_commit_graph(struct repository *r,
return NULL;
}
- prepare_repo_settings(r);
-
graph = alloc_commit_graph();
graph->hash_len = the_hash_algo->rawsz;
@@ -402,14 +400,14 @@ struct commit_graph *parse_commit_graph(struct repository *r,
pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
- if (get_configured_generation_version(r) >= 2) {
+ if (s->commit_graph_generation_version >= 2) {
pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
&graph->chunk_generation_data);
pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
&graph->chunk_generation_data_overflow);
}
- if (r->settings.commit_graph_read_changed_paths) {
+ if (s->commit_graph_read_changed_paths) {
pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
&graph->chunk_bloom_indexes);
read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
@@ -2288,7 +2286,7 @@ int write_commit_graph(struct object_directory *odb,
ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
ctx->opts = opts;
ctx->total_bloom_filter_data_size = 0;
- ctx->write_generation_data = (get_configured_generation_version(r) == 2);
+ ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2);
ctx->num_generation_data_overflows = 0;
bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
diff --git a/commit-graph.h b/commit-graph.h
index 04a94e1830..0f0d28b129 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -95,6 +95,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r,
struct object_directory *odb);
struct commit_graph *parse_commit_graph(struct repository *r,
void *graph_map, size_t graph_size);
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+ void *graph_map, size_t graph_size);
/*
* Return 1 if and only if the repository has a commit-graph
diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
index e7cf6d5b0f..810b1a8001 100644
--- a/fuzz-commit-graph.c
+++ b/fuzz-commit-graph.c
@@ -11,7 +11,15 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
struct commit_graph *g;
initialize_the_repository();
- g = parse_commit_graph(the_repository, (void *)data, size);
+ /*
+ * Initialize the_repository with commit-graph settings that would
+ * normally be read from the repository's gitdir. We want to avoid
+ * touching the disk to keep the individual fuzz-test cases as fast as
+ * possible.
+ */
+ the_repository->settings.commit_graph_generation_version = 2;
+ the_repository->settings.commit_graph_read_changed_paths = 1;
+ g = parse_commit_graph_settings(&the_repository->settings, (void *)data, size);
repo_clear(the_repository);
free_commit_graph(g);
diff --git a/repo-settings.c b/repo-settings.c
index b4fbd16cdc..26241c1c2c 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -10,6 +10,13 @@ static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
*dest = def;
}
+static void repo_cfg_int(struct repository *r, const char *key, int *dest,
+ int def)
+{
+ if (repo_config_get_int(r, key, dest))
+ *dest = def;
+}
+
void prepare_repo_settings(struct repository *r)
{
int experimental;
@@ -41,11 +48,14 @@ void prepare_repo_settings(struct repository *r)
r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
}
- /* Boolean config or default, does not cascade (simple) */
+ /* Commit graph config or default, does not cascade (simple) */
repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
+ repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
+
+ /* Boolean config or default, does not cascade (simple) */
repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
diff --git a/repository.h b/repository.h
index ca837cb9e9..4f8275f97c 100644
--- a/repository.h
+++ b/repository.h
@@ -29,6 +29,7 @@ struct repo_settings {
int initialized;
int core_commit_graph;
+ int commit_graph_generation_version;
int commit_graph_read_changed_paths;
int gc_write_commit_graph;
int fetch_write_commit_graph;
base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7
--
2.36.1.476.g0c4daa206d-goog
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v3] commit-graph: refactor to avoid prepare_repo_settings
2022-06-14 22:37 ` [PATCH v3] " Josh Steadmon
@ 2022-06-14 23:32 ` Taylor Blau
2022-06-23 21:59 ` Junio C Hamano
1 sibling, 0 replies; 27+ messages in thread
From: Taylor Blau @ 2022-06-14 23:32 UTC (permalink / raw)
To: Josh Steadmon
Cc: git, me, derrickstolee, lessleydennington, gitster, vdye, avarab,
jonathantanmy
On Tue, Jun 14, 2022 at 03:37:21PM -0700, Josh Steadmon wrote:
> Range-diff against v2:
> 1: 2c2bfc7b43 ! 1: 9b56496b08 commit-graph: refactor to avoid prepare_repo_settings
> @@ fuzz-commit-graph.c: int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size
> initialize_the_repository();
> - g = parse_commit_graph(the_repository, (void *)data, size);
> + /*
> -+ * Manually initialize commit-graph settings, to avoid the need to run
> -+ * in an actual repository.
> ++ * Initialize the_repository with commit-graph settings that would
> ++ * normally be read from the repository's gitdir. We want to avoid
> ++ * touching the disk to keep the individual fuzz-test cases as fast as
> ++ * possible.
> + */
> + the_repository->settings.commit_graph_generation_version = 2;
> + the_repository->settings.commit_graph_read_changed_paths = 1;
This version looks good to me. Thanks for working on this, Josh!
Thanks,
Taylor
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v3] commit-graph: refactor to avoid prepare_repo_settings
2022-06-14 22:37 ` [PATCH v3] " Josh Steadmon
2022-06-14 23:32 ` Taylor Blau
@ 2022-06-23 21:59 ` Junio C Hamano
2022-07-14 21:44 ` Josh Steadmon
1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2022-06-23 21:59 UTC (permalink / raw)
To: Josh Steadmon
Cc: git, me, derrickstolee, lessleydennington, vdye, avarab, jonathantanmy
Josh Steadmon <steadmon@google.com> writes:
> This series of changes broke fuzz-commit-graph, which attempts to parse
> arbitrary fuzzing-engine-provided bytes as a commit graph file.
> commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> since we run the fuzz tests without a valid repository, we are hitting
> the BUG() from 44c7e62 for every test case.
>
> Fix this by moving the majority of the implementaiton of
> `parse_commit_graph()` into a new function,
> `parse_commit_graph_settings()` that accepts a repo_settings pointer.
> This allows fuzz-commit-graph to continue to test the commit-graph
> parser implementation without relying on prepare_repo_settings().
It sounds like this is not a "fix" but a workaround to bend the
production code so that a non-production test shim can be inserted
more easily.
I am OK with the idea, but have a huge problem with the new name.
Is it just me who thinks parse_commit_graph_settings() is a function
that parses some kind of settings that affects the way the commit
graph gets used or written?
Stepping back a bit, why can't fuzz-commit-graph prepare a
repository object that looks sufficiently real? Something along the
lines of...
struct repository fake_repo;
fake_repo.settings.initialized = 1;
fake_repo.gitdir = ".";
parse_commit_graph(&fake_repo, (void *)data, size);
...
Also, I feel somewhat uneasy to see these changes:
> - if (get_configured_generation_version(r) >= 2) {
> + if (s->commit_graph_generation_version >= 2) {
> - if (r->settings.commit_graph_read_changed_paths) {
> + if (s->commit_graph_read_changed_paths) {
> - ctx->write_generation_data = (get_configured_generation_version(r) == 2);
> + ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2);
> ctx->num_generation_data_overflows = 0;
that makes the production code bend over backwards to _avoid_
referencing 'r', only to cater to the test shim. That's an
artificial limitation we are forcing on our developers who works on
this code. It might be that what is in the repository settings is
sufficient for today's code to work, but I do not think needs for
fuzz tests should tie the hand of this production code by forbidding
it to look at other things in the repository in the future. After
all, tests are to serve the production code, not the other way
around.
On the other hand, I think a change that is slightly smaller than
the posted patch, which justifies itself with a completely different
rationale, would be totally acceptable. You can justify this change
with NO mention of fuzzers.
The parse_commit_graph() function takes a "struct repository *"
pointer, but all it cares about is the .settings member of it.
Update the function and all its existing callers so that it
takes "struct repo_settings *" instead.
Now, in the future, some developers _might_ find it necessary to
access stuff other than the repository settings to validate the
contents of the graph file, and we may need to change it to take a
full repository structure again. The test should adjust to such
needs of the production code, not the other way around. But until
then...
Thanks.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v3] commit-graph: refactor to avoid prepare_repo_settings
2022-06-23 21:59 ` Junio C Hamano
@ 2022-07-14 21:44 ` Josh Steadmon
0 siblings, 0 replies; 27+ messages in thread
From: Josh Steadmon @ 2022-07-14 21:44 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, me, derrickstolee, lessleydennington, vdye, avarab, jonathantanmy
On 2022.06.23 14:59, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
>
> > This series of changes broke fuzz-commit-graph, which attempts to parse
> > arbitrary fuzzing-engine-provided bytes as a commit graph file.
> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> > since we run the fuzz tests without a valid repository, we are hitting
> > the BUG() from 44c7e62 for every test case.
> >
> > Fix this by moving the majority of the implementaiton of
> > `parse_commit_graph()` into a new function,
> > `parse_commit_graph_settings()` that accepts a repo_settings pointer.
> > This allows fuzz-commit-graph to continue to test the commit-graph
> > parser implementation without relying on prepare_repo_settings().
>
> It sounds like this is not a "fix" but a workaround to bend the
> production code so that a non-production test shim can be inserted
> more easily.
>
> I am OK with the idea, but have a huge problem with the new name.
>
> Is it just me who thinks parse_commit_graph_settings() is a function
> that parses some kind of settings that affects the way the commit
> graph gets used or written?
>
> Stepping back a bit, why can't fuzz-commit-graph prepare a
> repository object that looks sufficiently real? Something along the
> lines of...
>
> struct repository fake_repo;
>
> fake_repo.settings.initialized = 1;
> fake_repo.gitdir = ".";
> parse_commit_graph(&fake_repo, (void *)data, size);
> ...
>
> Also, I feel somewhat uneasy to see these changes:
>
> > - if (get_configured_generation_version(r) >= 2) {
> > + if (s->commit_graph_generation_version >= 2) {
> > - if (r->settings.commit_graph_read_changed_paths) {
> > + if (s->commit_graph_read_changed_paths) {
> > - ctx->write_generation_data = (get_configured_generation_version(r) == 2);
> > + ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2);
> > ctx->num_generation_data_overflows = 0;
>
> that makes the production code bend over backwards to _avoid_
> referencing 'r', only to cater to the test shim. That's an
> artificial limitation we are forcing on our developers who works on
> this code. It might be that what is in the repository settings is
> sufficient for today's code to work, but I do not think needs for
> fuzz tests should tie the hand of this production code by forbidding
> it to look at other things in the repository in the future. After
> all, tests are to serve the production code, not the other way
> around.
>
> On the other hand, I think a change that is slightly smaller than
> the posted patch, which justifies itself with a completely different
> rationale, would be totally acceptable. You can justify this change
> with NO mention of fuzzers.
>
> The parse_commit_graph() function takes a "struct repository *"
> pointer, but all it cares about is the .settings member of it.
>
> Update the function and all its existing callers so that it
> takes "struct repo_settings *" instead.
>
> Now, in the future, some developers _might_ find it necessary to
> access stuff other than the repository settings to validate the
> contents of the graph file, and we may need to change it to take a
> full repository structure again. The test should adjust to such
> needs of the production code, not the other way around. But until
> then...
>
> Thanks.
I trimmed down the changes and reworded the commit message for V4. I'm
assuming you don't object to mentioning the fuzzer, just that it
shouldn't be the main justifiation for the change.
Sorry for the delayed response, and thanks for the feedback.
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v4] commit-graph: pass repo_settings instead of repository
2022-03-23 19:52 ` Taylor Blau
` (5 preceding siblings ...)
2022-06-14 22:37 ` [PATCH v3] " Josh Steadmon
@ 2022-07-14 21:43 ` Josh Steadmon
2022-07-14 22:48 ` Junio C Hamano
6 siblings, 1 reply; 27+ messages in thread
From: Josh Steadmon @ 2022-07-14 21:43 UTC (permalink / raw)
To: git; +Cc: me, gitster
From: Taylor Blau <me@ttaylorr.com>
The parse_commit_graph() function takes a 'struct repository *' pointer,
but it only ever accesses config settings (either directly or through
the .settings field of the repo struct). Move all relevant config
settings into the repo_settings struct, and update parse_commit_graph()
and its existing callers so that it takes 'struct repo_settings *'
instead.
Callers of parse_commit_graph() will now need to call
prepare_repo_settings() themselves, or initialize a 'struct
repo_settings' directly.
Prior to ab14d0676c (commit-graph: pass a 'struct repository *' in more
places, 2020-09-09), parsing a commit-graph was a pure function
depending only on the contents of the commit-graph itself. Commit
ab14d0676c introduced a dependency on a `struct repository` pointer, and
later commits such as b66d84756f (commit-graph: respect
'commitGraph.readChangedPaths', 2020-09-09) added dependencies on config
settings, which were accessed through the `settings` field of the
repository pointer. This field was initialized via a call to
`prepare_repo_settings()`.
Additionally, this fixes an issue in fuzz-commit-graph: In 44c7e62
(2021-12-06, repo-settings:prepare_repo_settings only in git repos),
prepare_repo_settings was changed to issue a BUG() if it is called by a
process whose CWD is not a Git repository.
The combination of commits mentioned above broke fuzz-commit-graph,
which attempts to parse arbitrary fuzzing-engine-provided bytes as a
commit graph file. Prior to this change, parse_commit_graph() called
prepare_repo_settings(), but since we run the fuzz tests without a valid
repository, we are hitting the BUG() from 44c7e62 for every test case.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
Range-diff against v3:
1: 9b56496b08 ! 1: fd70b61191 commit-graph: refactor to avoid prepare_repo_settings
@@ Metadata
Author: Taylor Blau <me@ttaylorr.com>
## Commit message ##
- commit-graph: refactor to avoid prepare_repo_settings
+ commit-graph: pass repo_settings instead of repository
+
+ The parse_commit_graph() function takes a 'struct repository *' pointer,
+ but it only ever accesses config settings (either directly or through
+ the .settings field of the repo struct). Move all relevant config
+ settings into the repo_settings struct, and update parse_commit_graph()
+ and its existing callers so that it takes 'struct repo_settings *'
+ instead.
+
+ Callers of parse_commit_graph() will now need to call
+ prepare_repo_settings() themselves, or initialize a 'struct
+ repo_settings' directly.
Prior to ab14d0676c (commit-graph: pass a 'struct repository *' in more
places, 2020-09-09), parsing a commit-graph was a pure function
@@ Commit message
repository pointer. This field was initialized via a call to
`prepare_repo_settings()`.
- In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only in git
- repos), prepare_repo_settings was changed to issue a BUG() if it is
- called by a process whose CWD is not a Git repository.
-
- This series of changes broke fuzz-commit-graph, which attempts to parse
- arbitrary fuzzing-engine-provided bytes as a commit graph file.
- commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
- since we run the fuzz tests without a valid repository, we are hitting
- the BUG() from 44c7e62 for every test case.
+ Additionally, this fixes an issue in fuzz-commit-graph: In 44c7e62
+ (2021-12-06, repo-settings:prepare_repo_settings only in git repos),
+ prepare_repo_settings was changed to issue a BUG() if it is called by a
+ process whose CWD is not a Git repository.
- Fix this by moving the majority of the implementaiton of
- `parse_commit_graph()` into a new function,
- `parse_commit_graph_settings()` that accepts a repo_settings pointer.
- This allows fuzz-commit-graph to continue to test the commit-graph
- parser implementation without relying on prepare_repo_settings().
-
- Additionally, properly initialize the
- repo_settings.commit_graph_generation_version field in
- prepare_repo_settings(). Load the value from the config if present, and
- default to version 2 otherwise.
+ The combination of commits mentioned above broke fuzz-commit-graph,
+ which attempts to parse arbitrary fuzzing-engine-provided bytes as a
+ commit graph file. Prior to this change, parse_commit_graph() called
+ prepare_repo_settings(), but since we run the fuzz tests without a valid
+ repository, we are hitting the BUG() from 44c7e62 for every test case.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
## commit-graph.c ##
-@@ commit-graph.c: define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
- static struct commit_graph_data_slab commit_graph_data_slab =
- COMMIT_SLAB_INIT(1, commit_graph_data_slab);
+@@ commit-graph.c: struct commit_graph *load_commit_graph_one_fd_st(struct repository *r,
+ }
+ graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
+ close(fd);
+- ret = parse_commit_graph(r, graph_map, graph_size);
++ prepare_repo_settings(r);
++ ret = parse_commit_graph(&r->settings, graph_map, graph_size);
--static int get_configured_generation_version(struct repository *r)
--{
-- int version = 2;
-- repo_config_get_int(r, "commitgraph.generationversion", &version);
-- return version;
--}
--
- uint32_t commit_graph_position(const struct commit *c)
- {
- struct commit_graph_data *data =
+ if (ret)
+ ret->odb = odb;
@@ commit-graph.c: static int graph_read_bloom_data(const unsigned char *chunk_start,
+ return 0;
+ }
- struct commit_graph *parse_commit_graph(struct repository *r,
+-struct commit_graph *parse_commit_graph(struct repository *r,
++struct commit_graph *parse_commit_graph(struct repo_settings *s,
void *graph_map, size_t graph_size)
-+{
-+ prepare_repo_settings(r);
-+ return parse_commit_graph_settings(&r->settings, graph_map, graph_size);
-+}
-+
-+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
-+ void *graph_map, size_t graph_size)
{
const unsigned char *data;
- struct commit_graph *graph;
@@ commit-graph.c: struct commit_graph *parse_commit_graph(struct repository *r,
return NULL;
}
@@ commit-graph.c: struct commit_graph *parse_commit_graph(struct repository *r,
pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
&graph->chunk_bloom_indexes);
read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
-@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
- ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
- ctx->opts = opts;
- ctx->total_bloom_filter_data_size = 0;
-- ctx->write_generation_data = (get_configured_generation_version(r) == 2);
-+ ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2);
- ctx->num_generation_data_overflows = 0;
-
- bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
## commit-graph.h ##
-@@ commit-graph.h: struct commit_graph *read_commit_graph_one(struct repository *r,
+@@ commit-graph.h: struct commit_graph *load_commit_graph_one_fd_st(struct repository *r,
+ struct object_directory *odb);
+ struct commit_graph *read_commit_graph_one(struct repository *r,
struct object_directory *odb);
- struct commit_graph *parse_commit_graph(struct repository *r,
+-struct commit_graph *parse_commit_graph(struct repository *r,
++
++/*
++ * Callers should initialize the repo_settings with prepare_repo_settings()
++ * prior to calling parse_commit_graph().
++ */
++struct commit_graph *parse_commit_graph(struct repo_settings *s,
void *graph_map, size_t graph_size);
-+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
-+ void *graph_map, size_t graph_size);
/*
- * Return 1 if and only if the repository has a commit-graph
## fuzz-commit-graph.c ##
+@@
+ #include "commit-graph.h"
+ #include "repository.h"
+
+-struct commit_graph *parse_commit_graph(struct repository *r,
++struct commit_graph *parse_commit_graph(struct repo_settings *s,
+ void *graph_map, size_t graph_size);
+
+ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
@@ fuzz-commit-graph.c: int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
struct commit_graph *g;
@@ fuzz-commit-graph.c: int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size
+ */
+ the_repository->settings.commit_graph_generation_version = 2;
+ the_repository->settings.commit_graph_read_changed_paths = 1;
-+ g = parse_commit_graph_settings(&the_repository->settings, (void *)data, size);
++ g = parse_commit_graph(&the_repository->settings, (void *)data, size);
repo_clear(the_repository);
free_commit_graph(g);
commit-graph.c | 11 +++++------
commit-graph.h | 7 ++++++-
fuzz-commit-graph.c | 12 ++++++++++--
repo-settings.c | 12 +++++++++++-
repository.h | 1 +
5 files changed, 33 insertions(+), 10 deletions(-)
diff --git a/commit-graph.c b/commit-graph.c
index 265c010122..f305b65117 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -264,7 +264,8 @@ struct commit_graph *load_commit_graph_one_fd_st(struct repository *r,
}
graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);
- ret = parse_commit_graph(r, graph_map, graph_size);
+ prepare_repo_settings(r);
+ ret = parse_commit_graph(&r->settings, graph_map, graph_size);
if (ret)
ret->odb = odb;
@@ -333,7 +334,7 @@ static int graph_read_bloom_data(const unsigned char *chunk_start,
return 0;
}
-struct commit_graph *parse_commit_graph(struct repository *r,
+struct commit_graph *parse_commit_graph(struct repo_settings *s,
void *graph_map, size_t graph_size)
{
const unsigned char *data;
@@ -371,8 +372,6 @@ struct commit_graph *parse_commit_graph(struct repository *r,
return NULL;
}
- prepare_repo_settings(r);
-
graph = alloc_commit_graph();
graph->hash_len = the_hash_algo->rawsz;
@@ -402,14 +401,14 @@ struct commit_graph *parse_commit_graph(struct repository *r,
pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
- if (get_configured_generation_version(r) >= 2) {
+ if (s->commit_graph_generation_version >= 2) {
pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
&graph->chunk_generation_data);
pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
&graph->chunk_generation_data_overflow);
}
- if (r->settings.commit_graph_read_changed_paths) {
+ if (s->commit_graph_read_changed_paths) {
pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
&graph->chunk_bloom_indexes);
read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
diff --git a/commit-graph.h b/commit-graph.h
index 04a94e1830..c89b336791 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -93,7 +93,12 @@ struct commit_graph *load_commit_graph_one_fd_st(struct repository *r,
struct object_directory *odb);
struct commit_graph *read_commit_graph_one(struct repository *r,
struct object_directory *odb);
-struct commit_graph *parse_commit_graph(struct repository *r,
+
+/*
+ * Callers should initialize the repo_settings with prepare_repo_settings()
+ * prior to calling parse_commit_graph().
+ */
+struct commit_graph *parse_commit_graph(struct repo_settings *s,
void *graph_map, size_t graph_size);
/*
diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
index e7cf6d5b0f..914026f5d8 100644
--- a/fuzz-commit-graph.c
+++ b/fuzz-commit-graph.c
@@ -1,7 +1,7 @@
#include "commit-graph.h"
#include "repository.h"
-struct commit_graph *parse_commit_graph(struct repository *r,
+struct commit_graph *parse_commit_graph(struct repo_settings *s,
void *graph_map, size_t graph_size);
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
@@ -11,7 +11,15 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
struct commit_graph *g;
initialize_the_repository();
- g = parse_commit_graph(the_repository, (void *)data, size);
+ /*
+ * Initialize the_repository with commit-graph settings that would
+ * normally be read from the repository's gitdir. We want to avoid
+ * touching the disk to keep the individual fuzz-test cases as fast as
+ * possible.
+ */
+ the_repository->settings.commit_graph_generation_version = 2;
+ the_repository->settings.commit_graph_read_changed_paths = 1;
+ g = parse_commit_graph(&the_repository->settings, (void *)data, size);
repo_clear(the_repository);
free_commit_graph(g);
diff --git a/repo-settings.c b/repo-settings.c
index b4fbd16cdc..26241c1c2c 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -10,6 +10,13 @@ static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
*dest = def;
}
+static void repo_cfg_int(struct repository *r, const char *key, int *dest,
+ int def)
+{
+ if (repo_config_get_int(r, key, dest))
+ *dest = def;
+}
+
void prepare_repo_settings(struct repository *r)
{
int experimental;
@@ -41,11 +48,14 @@ void prepare_repo_settings(struct repository *r)
r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
}
- /* Boolean config or default, does not cascade (simple) */
+ /* Commit graph config or default, does not cascade (simple) */
repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
+ repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
+
+ /* Boolean config or default, does not cascade (simple) */
repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
diff --git a/repository.h b/repository.h
index ca837cb9e9..4f8275f97c 100644
--- a/repository.h
+++ b/repository.h
@@ -29,6 +29,7 @@ struct repo_settings {
int initialized;
int core_commit_graph;
+ int commit_graph_generation_version;
int commit_graph_read_changed_paths;
int gc_write_commit_graph;
int fetch_write_commit_graph;
base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7
--
2.37.0.170.g444d1eabd0-goog
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v4] commit-graph: pass repo_settings instead of repository
2022-07-14 21:43 ` [PATCH v4] commit-graph: pass repo_settings instead of repository Josh Steadmon
@ 2022-07-14 22:48 ` Junio C Hamano
0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2022-07-14 22:48 UTC (permalink / raw)
To: Josh Steadmon; +Cc: git, me
Josh Steadmon <steadmon@google.com> writes:
> From: Taylor Blau <me@ttaylorr.com>
>
> The parse_commit_graph() function takes a 'struct repository *' pointer,
> but it only ever accesses config settings (either directly or through
> the .settings field of the repo struct). Move all relevant config
> settings into the repo_settings struct, and update parse_commit_graph()
> and its existing callers so that it takes 'struct repo_settings *'
> instead.
Sounds good. Will queue.
I think this is ready to be merged down to 'next' by now?
Thanks.
^ permalink raw reply [flat|nested] 27+ messages in thread