git.vger.kernel.org archive mirror
* RFC on packfile URIs and .gitmodules check
@ 2021-01-15 23:43 Jonathan Tan
  2021-01-16  0:30 ` Junio C Hamano
                   ` (3 more replies)
  0 siblings, 4 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-01-15 23:43 UTC (permalink / raw)
  To: git; +Cc: peff, Jonathan Tan

Someone at $DAYJOB noticed that if a .gitmodules-containing tree and the
.gitmodules blob itself are sent in 2 separate packfiles during a fetch
(which can happen when packfile URIs are used), transfer.fsckobjects
causes the fetch to fail. You can reproduce it as follows (as of the
time of writing):

  $ git -c fetch.uriprotocols=https -c transfer.fsckobjects=true clone https://chromium.googlesource.com/chromiumos/codesearch
  Cloning into 'codesearch'...
  remote: Total 2242 (delta 0), reused 2242 (delta 0)
  Receiving objects: 100% (2242/2242), 1.77 MiB | 4.62 MiB/s, done.
  error: object 1f155c20935ee1154a813a814f03ef2b3976680f: gitmodulesMissing: unable to read .gitmodules blob
  fatal: fsck error in pack objects
  fatal: index-pack failed

This happens because the fsck part is currently being done in
index-pack, which operates on one pack at a time. When index-pack sees
the tree, it runs fsck on it (like any other object), and the fsck
subsystem remembers the .gitmodules target (specifically, in
gitmodules_found in fsck.c). Later, index-pack runs fsck_finish() which
checks if the target exists, but it doesn't, so it reports the failure.
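
To illustrate the failure described above, here is a toy model of a
per-pack check. It is a sketch under stated assumptions, not git's
actual code: Pack, fsck_one_pack and the object IDs are hypothetical
names, and the local repository is assumed to be empty (as in a fresh
clone), so a blob is "readable" only if it is in the pack being indexed.

```python
from dataclasses import dataclass, field


@dataclass
class Pack:
    # Blob IDs stored in this pack (hypothetical string IDs).
    blobs: set = field(default_factory=set)
    # Blob IDs that trees in this pack name as their ".gitmodules" entry.
    gitmodules_targets: set = field(default_factory=set)


def fsck_one_pack(pack):
    """Return the .gitmodules targets that this pack alone cannot satisfy."""
    gitmodules_found = set()
    for target in pack.gitmodules_targets:
        # Remembered while each tree is checked.
        gitmodules_found.add(target)
    # Final step: every remembered target must be readable by now.
    return sorted(t for t in gitmodules_found if t not in pack.blobs)


# The inline pack has the tree; the pack fetched by URI has the blob.
inline_pack = Pack(blobs={"blob-a"}, gitmodules_targets={"gitmodules-blob"})
uri_pack = Pack(blobs={"gitmodules-blob"})

print(fsck_one_pack(inline_pack))  # the target lives in the other pack
print(fsck_one_pack(uri_pack))     # nothing to complain about here
```

Checking the two packs together would find the blob; checking the
inline pack on its own cannot.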

One option is for fetch to do its own pass of checking all downloaded
objects once all packfiles have been downloaded, but that seems wasteful
as all trees would have to be re-inflated.

Another option is to do it within the connectivity check instead - so,
update rev-list and the object walking mechanism to be able to detect
.gitmodules in trees and fsck the target blob whenever such an entry
occurs. This has the advantage that there is no extra re-inflation,
although it might be strange to have object walking be able to fsck.

The simplest solution would be to just relax this - check the blob if it
exists, but if it doesn't, it's OK. Some things in favor of this
solution:

 - This is something we already do in the partial clone case (although
   it could be argued that in this case, we're already trusting the
   server for far more than .gitmodules, so just because it's OK in the
   partial clone case doesn't mean that it's OK in the regular case).

 - Also, the commit message for this feature (from ed8b10f631 ("fsck: check
   .gitmodules content", 2018-05-21)) gives a rationale of a newer
   server being able to protect older clients.
    - Servers using receive-pack (instead of fetch-pack) to obtain
      objects would still be protected, since receive-pack still only
      accepts one packfile at a time (and there are currently no plans
      to expand this).
    - Also, malicious .gitmodules files could still be crafted that pass
      fsck checking - for example, by containing a URL (of another
      server) that refers to a repo with a .gitmodules that would fail
      fsck.

So I would rather go with just relaxing the check, but if consensus is
that we should still do it, I'll investigate doing it in the
connectivity check.


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-15 23:43 RFC on packfile URIs and .gitmodules check Jonathan Tan
@ 2021-01-16  0:30 ` Junio C Hamano
  2021-01-16  3:22   ` Taylor Blau
  2021-01-20  8:07 ` Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-01-16  0:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff

Jonathan Tan <jonathantanmy@google.com> writes:

> Someone at $DAYJOB noticed that if a .gitmodules-containing tree and the
> .gitmodules blob itself are sent in 2 separate packfiles during a fetch
> (which can happen when packfile URIs are used), transfer.fsckobjects
> causes the fetch to fail. You can reproduce it as follows (as of the
> time of writing):
>
>   $ git -c fetch.uriprotocols=https -c transfer.fsckobjects=true clone https://chromium.googlesource.com/chromiumos/codesearch
>   Cloning into 'codesearch'...
>   remote: Total 2242 (delta 0), reused 2242 (delta 0)
>   Receiving objects: 100% (2242/2242), 1.77 MiB | 4.62 MiB/s, done.
>   error: object 1f155c20935ee1154a813a814f03ef2b3976680f: gitmodulesMissing: unable to read .gitmodules blob
>   fatal: fsck error in pack objects
>   fatal: index-pack failed
>
> This happens because the fsck part is currently being done in
> index-pack, which operates on one pack at a time. When index-pack sees
> the tree, it runs fsck on it (like any other object), and the fsck
> subsystem remembers the .gitmodules target (specifically, in
> gitmodules_found in fsck.c). Later, index-pack runs fsck_finish() which
> checks if the target exists, but it doesn't, so it reports the failure.

Is this because the gitmodules blob is contained in the base image
served via the pack URI mechanism, and the "dynamic" packfile for
the latest part of the history refers to the gitmodules file that is
unchanged, hence the latter one lacks it?

> Another option is to do it within the connectivity check instead - so,
> update rev-list and the object walking mechanism to be able to detect
> .gitmodules in trees and fsck the target blob whenever such an entry
> occurs. This has the advantage that there is no extra re-inflation,
> although it might be strange to have object walking be able to fsck.
>
> The simplest solution would be to just relax this - check the blob if it
> exists, but if it doesn't, it's OK. Some things in favor of this
> solution:
>
>  - This is something we already do in the partial clone case (although
>    it could be argued that in this case, we're already trusting the
>    server for far more than .gitmodules, so just because it's OK in the
>    partial clone case doesn't mean that it's OK in the regular case).
>
>  - Also, the commit message for this feature (from ed8b10f631 ("fsck: check
>    .gitmodules content", 2018-05-21)) gives a rationale of a newer
>    server being able to protect older clients.
>     - Servers using receive-pack (instead of fetch-pack) to obtain
>       objects would still be protected, since receive-pack still only
>       accepts one packfile at a time (and there are currently no plans
>       to expand this).
>     - Also, malicious .gitmodules files could still be crafted that pass
>       fsck checking - for example, by containing a URL (of another
>       server) that refers to a repo with a .gitmodules that would fail
>       fsck.
>
> So I would rather go with just relaxing the check, but if consensus is
> that we should still do it, I'll investigate doing it in the
> connectivity check.

You've listed two possible solutions, i.e.

 (1) punt and declare that we assume a missing and uncheckable blob
     is OK,

 (2) defer the check until after the transfer completes.

Between the two, my gut feeling is that the latter is preferable.
If we assume a missing and uncheckable one is OK, then even if a
blob is available to be checked, there is not much point in
checking, no?

As long as the quarantine of incoming pack works correctly,
streaming the incoming packdata (and packfile downloaded out of line
via a separate mechanism like pack URI) to index-pack that does not
check to complete the transfer, with a separate step to check the
sanity of these packs as a whole, should not harm the repository
even if it is interrupted in the middle, after transfer is done but
before checking says it is OK.

As a potential third option, I wonder if it is easier for everybody
involved (including third-party implementation of their
index-pack/fsck equivalent) if we made it a rule that a pack that
has a tree that refers to .git<something> must include the blob for
it?

Thanks.


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-16  0:30 ` Junio C Hamano
@ 2021-01-16  3:22   ` Taylor Blau
  2021-01-19 12:56     ` Derrick Stolee
  2021-01-19 19:02     ` Jonathan Tan
  0 siblings, 2 replies; 229+ messages in thread
From: Taylor Blau @ 2021-01-16  3:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git, peff

On Fri, Jan 15, 2021 at 04:30:07PM -0800, Junio C Hamano wrote:
> Jonathan Tan <jonathantanmy@google.com> writes:
>
> > Someone at $DAYJOB noticed that if a .gitmodules-containing tree and the
> > .gitmodules blob itself are sent in 2 separate packfiles during a fetch
> > (which can happen when packfile URIs are used), transfer.fsckobjects
> > causes the fetch to fail. You can reproduce it as follows (as of the
> > time of writing):
> >
> >   $ git -c fetch.uriprotocols=https -c transfer.fsckobjects=true clone https://chromium.googlesource.com/chromiumos/codesearch
> >   Cloning into 'codesearch'...
> >   remote: Total 2242 (delta 0), reused 2242 (delta 0)
> >   Receiving objects: 100% (2242/2242), 1.77 MiB | 4.62 MiB/s, done.
> >   error: object 1f155c20935ee1154a813a814f03ef2b3976680f: gitmodulesMissing: unable to read .gitmodules blob
> >   fatal: fsck error in pack objects
> >   fatal: index-pack failed
> >
> > This happens because the fsck part is currently being done in
> > index-pack, which operates on one pack at a time. When index-pack sees
> > the tree, it runs fsck on it (like any other object), and the fsck
> > subsystem remembers the .gitmodules target (specifically, in
> > gitmodules_found in fsck.c). Later, index-pack runs fsck_finish() which
> > checks if the target exists, but it doesn't, so it reports the failure.
>
> Is this because the gitmodules blob is contained in the base image
> served via the pack URI mechanism, and the "dynamic" packfile for
> the latest part of the history refers to the gitmodules file that is
> unchanged, hence the latter one lacks it?

That seems like a likely explanation, although this seems ultimately up
to what the pack CDN serves.

> You've listed two possible solutions, i.e.
>
>  (1) punt and declare that we assume a missing and uncheckable blob
>      is OK,
>
>  (2) defer the check until after the transfer completes.
>
> Between the two, my gut feeling is that the latter is preferable.
> If we assume a missing and uncheckable one is OK, then even if a
> blob is available to be checked, there is not much point in
> checking, no?

I'm going to second this. If this were a more benign check, then I'd
perhaps feel differently, but .gitmodules fsck checks seem to get
hardened fairly often during security releases, and so it seems
important to keep performing them when the user asked for it.

> As long as the quarantine of incoming pack works correctly,
> streaming the incoming packdata (and packfile downloaded out of line
> via a separate mechanism like pack URI) to index-pack that does not
> check to complete the transfer, with a separate step to check the
> sanity of these packs as a whole, should not harm the repository
> even if it is interrupted in the middle, after transfer is done but
> before checking says it is OK.

Agreed. Bear in mind that I am pretty unfamiliar with this code, and so
I'm not sure if it's 'easy' or not to change it in this way. The obvious
downside, which Jonathan notes, is that you almost certainly have to
reinflate all of the trees again.

But, since the user is asking for transfer.fsckObjects explicitly, I
don't think that it's a problem.

> As a potential third option, I wonder if it is easier for everybody
> involved (including third-party implementation of their
> index-pack/fsck equivalent) if we made it a rule that a pack that
> has a tree that refers to .git<something> must include the blob for
> it?

Interesting, but I'm sure CDN administrators would prefer to have as few
restrictions in place as possible.

A potential fourth option that I can think of is that we can try to
eagerly perform the .gitmodules fsck checks as we receive objects, under
the assumption that the .gitmodules blob and the tree which contains it
appear in the same pack.

If they do, then we ought to be able to check them as we currently do
(and avoid leaving them to the slow post-processing step). Any blobs
that we _can't_ find get placed into an array, and then that array is
iterated over after we have received all packs, including from the CDN.
Any blob that can't be found in the pack transferred from the
remote, the CDN, or the local repository (and isn't explicitly excluded
via an object --filter) is declared missing.
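
The two passes described above can be sketched as follows. This is only
a model of the idea: check_pack, resolve_deferred and the IDs are
hypothetical names, and a blob that turns up in the second pass would
still need its content checked, which the sketch leaves out.

```python
def check_pack(gitmodules_targets, pack_blobs):
    """Split targets into those checkable now and those to retry later."""
    checked, deferred = [], []
    for oid in gitmodules_targets:
        (checked if oid in pack_blobs else deferred).append(oid)
    return checked, deferred


def resolve_deferred(deferred, available, filtered_out=frozenset()):
    """After all packs have arrived, list the targets still missing.

    `available` stands for blobs reachable from any pack (remote or CDN)
    or the local repository; `filtered_out` stands for blobs excluded by
    an object filter.
    """
    return [oid for oid in deferred
            if oid not in available and oid not in filtered_out]


# First pass over the inline pack, which only contains gm-1.
checked, deferred = check_pack(["gm-1", "gm-2", "gm-3"], {"gm-1"})
# Second pass once the CDN pack (containing gm-2) has been downloaded.
still_missing = resolve_deferred(deferred, available={"gm-2"})
print(checked, deferred, still_missing)
```

Only the deferred entries are revisited, so the common case (tree and
blob in the same pack) stays as cheap as it is today.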

Thoughts?

> Thanks.

Thanks,
Taylor


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-16  3:22   ` Taylor Blau
@ 2021-01-19 12:56     ` Derrick Stolee
  2021-01-19 19:13       ` Jonathan Tan
  2021-01-19 19:02     ` Jonathan Tan
  1 sibling, 1 reply; 229+ messages in thread
From: Derrick Stolee @ 2021-01-19 12:56 UTC (permalink / raw)
  To: Taylor Blau, Junio C Hamano; +Cc: Jonathan Tan, git, peff

On 1/15/2021 10:22 PM, Taylor Blau wrote:
> On Fri, Jan 15, 2021 at 04:30:07PM -0800, Junio C Hamano wrote:
>> Jonathan Tan <jonathantanmy@google.com> writes:
>>
>>> Someone at $DAYJOB noticed that if a .gitmodules-containing tree and the
>>> .gitmodules blob itself are sent in 2 separate packfiles during a fetch
>>> (which can happen when packfile URIs are used), transfer.fsckobjects
>>> causes the fetch to fail. You can reproduce it as follows (as of the
>>> time of writing):
>>>
>>>   $ git -c fetch.uriprotocols=https -c transfer.fsckobjects=true clone https://chromium.googlesource.com/chromiumos/codesearch
>>>   Cloning into 'codesearch'...
>>>   remote: Total 2242 (delta 0), reused 2242 (delta 0)
>>>   Receiving objects: 100% (2242/2242), 1.77 MiB | 4.62 MiB/s, done.
>>>   error: object 1f155c20935ee1154a813a814f03ef2b3976680f: gitmodulesMissing: unable to read .gitmodules blob
>>>   fatal: fsck error in pack objects
>>>   fatal: index-pack failed

I'm contributing a quick suggestion for just this item:

>>> This happens because the fsck part is currently being done in
>>> index-pack, which operates on one pack at a time. When index-pack sees
>>> the tree, it runs fsck on it (like any other object), and the fsck
>>> subsystem remembers the .gitmodules target (specifically, in
>>> gitmodules_found in fsck.c). Later, index-pack runs fsck_finish() which
>>> checks if the target exists, but it doesn't, so it reports the failure.
>>
>> Is this because the gitmodules blob is contained in the base image
>> served via the pack URI mechanism, and the "dynamic" packfile for
>> the latest part of the history refers to the gitmodules file that is
>> unchanged, hence the latter one lacks it?
> 
> That seems like a likely explanation, although this seems ultimately up
> to what the pack CDN serves.
>
>> You've listed two possible solutions, i.e.
>>
>>  (1) punt and declare that we assume a missing and uncheckable blob
>>      is OK,
>>
>>  (2) defer the check until after the transfer completes.
>>
>> Between the two, my gut feeling is that the latter is preferable.
>> If we assume a missing and uncheckable one is OK, then even if a
>> blob is available to be checked, there is not much point in
>> checking, no?
> 
> I'm going to second this. If this were a more benign check, then I'd
> perhaps feel differently, but .gitmodules fsck checks seem to get
> hardened fairly often during security releases, and so it seems
> important to keep performing them when the user asked for it.

It might be nice to teach 'index-pack' a mode that says certain
errors should be reported as warnings by writing the problematic
OIDs to stdout/stderr. Then, the second check after all packs are
present can focus on those problematic objects instead of
re-scanning everything.

Thanks,
-Stolee


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-16  3:22   ` Taylor Blau
  2021-01-19 12:56     ` Derrick Stolee
@ 2021-01-19 19:02     ` Jonathan Tan
  1 sibling, 0 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-01-19 19:02 UTC (permalink / raw)
  To: me; +Cc: gitster, jonathantanmy, git, peff

> > Is this because the gitmodules blob is contained in the base image
> > served via the pack URI mechanism, and the "dynamic" packfile for
> > the latest part of the history refers to the gitmodules file that is
> > unchanged, hence the latter one lacks it?
> 
> That seems like a likely explanation, although this seems ultimately up
> to what the pack CDN serves.

In this case, yes, that is what is happening.

> > You've listed two possible solutions, i.e.
> >
> >  (1) punt and declare that we assume a missing and uncheckable blob
> >      is OK,
> >
> >  (2) defer the check until after the transfer completes.
> >
> > Between the two, my gut feeling is that the latter is preferable.
> > If we assume a missing and uncheckable one is OK, then even if a
> > blob is available to be checked, there is not much point in
> > checking, no?
> 
> I'm going to second this. If this were a more benign check, then I'd
> perhaps feel differently, but .gitmodules fsck checks seem to get
> hardened fairly often during security releases, and so it seems
> important to keep performing them when the user asked for it.

That makes sense.

> > As long as the quarantine of incoming pack works correctly,
> > streaming the incoming packdata (and packfile downloaded out of line
> > via a separate mechanism like pack URI) to index-pack that does not
> > check to complete the transfer, with a separate step to check the
> > sanity of these packs as a whole, should not harm the repository
> > even if it is interrupted in the middle, after transfer is done but
> > before checking says it is OK.
> 
> Agreed. Bear in mind that I am pretty unfamiliar with this code, and so
> I'm not sure if it's 'easy' or not to change it in this way. The obvious
> downside, which Jonathan notes, is that you almost certainly have to
> reinflate all of the trees again.
> 
> But, since the user is asking for transfer.fsckObjects explicitly, I
> don't think that it's a problem.

We might be able to avoid the reinflate if we do it as part of the
connectivity check or somehow teach index-pack a way to communicate the
dangling .gitmodules links (as you suggest below).

> > As a potential third option, I wonder if it is easier for everybody
> > involved (including third-party implementation of their
> > index-pack/fsck equivalent) if we made it a rule that a pack that
> > has a tree that refers to .git<something> must include the blob for
> > it?
> 
> Interesting, but I'm sure CDN administrators would prefer to have as few
> restrictions in place as possible.

That rule would help, but it also seems inelegant in that if we put
commits that have the same .gitmodules in 2 or more different packs,
there would be identical objects across those packs (besides the reason
Taylor mentioned).

> A potential fourth option that I can think of is that we can try to
> eagerly perform the .gitmodules fsck checks as we receive objects, under
> the assumption that the .gitmodules blob and the tree which contains it
> appear in the same pack.
> 
> If they do, then we ought to be able to check them as we currently do
> (and avoid leaving them to the slow post-processing step). Any blobs
> that we _can't_ find get placed into an array, and then that array is
> iterated over after we have received all packs, including from the CDN.
> Any blob that can't be found in the pack transferred from the
> remote, the CDN, or the local repository (and isn't explicitly excluded
> via an object --filter) is declared missing.
> 
> Thoughts?

The hard part is communicating this array to the parent fetch process.
Stolee has a suggestion [1] which I will reply to directly.

[1] https://lore.kernel.org/git/d2ca2fec-a353-787a-15a7-3831a665523e@gmail.com/


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-19 12:56     ` Derrick Stolee
@ 2021-01-19 19:13       ` Jonathan Tan
  2021-01-20  1:04         ` Junio C Hamano
  0 siblings, 1 reply; 229+ messages in thread
From: Jonathan Tan @ 2021-01-19 19:13 UTC (permalink / raw)
  To: stolee; +Cc: me, gitster, jonathantanmy, git, peff

> I'm contributing a quick suggestion for just this item:
> 
> >>> This happens because the fsck part is currently being done in
> >>> index-pack, which operates on one pack at a time. When index-pack sees
> >>> the tree, it runs fsck on it (like any other object), and the fsck
> >>> subsystem remembers the .gitmodules target (specifically, in
> >>> gitmodules_found in fsck.c). Later, index-pack runs fsck_finish() which
> >>> checks if the target exists, but it doesn't, so it reports the failure.
> >>
> >> Is this because the gitmodules blob is contained in the base image
> >> served via the pack URI mechanism, and the "dynamic" packfile for
> >> the latest part of the history refers to the gitmodules file that is
> >> unchanged, hence the latter one lacks it?
> > 
> > That seems like a likely explanation, although this seems ultimately up
> > to what the pack CDN serves.
> >
> >> You've listed two possible solutions, i.e.
> >>
> >>  (1) punt and declare that we assume a missing and uncheckable blob
> >>      is OK,
> >>
> >>  (2) defer the check until after the transfer completes.
> >>
> >> Between the two, my gut feeling is that the latter is preferable.
> >> If we assume a missing and uncheckable one is OK, then even if a
> >> blob is available to be checked, there is not much point in
> >> checking, no?
> > 
> > I'm going to second this. If this were a more benign check, then I'd
> > perhaps feel differently, but .gitmodules fsck checks seem to get
> > hardened fairly often during security releases, and so it seems
> > important to keep performing them when the user asked for it.
> 
> It might be nice to teach 'index-pack' a mode that says certain
> errors should be reported as warnings by writing the problematic
> OIDs to stdout/stderr. Then, the second check after all packs are
> present can focus on those problematic objects instead of
> re-scanning everything.

My initial reaction was that stdout is already used to report the hash
part of the generated name and that stderr is already used for whatever
warnings there are, but looking at the documentation, index-pack
--fsck-objects is "[for] internal use only", so it might be fine to
extend the output format in this case and report the problematic OIDs
after the hash. I'll take a look.
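
As a rough sketch of what the parent process might do with such output,
assuming a purely hypothetical layout in which the first line ends with
the pack hash and every following line is one OID to re-check (nothing
here is an existing index-pack format, and parse_index_pack_output is a
made-up name):

```python
def parse_index_pack_output(text):
    """Split hypothetical index-pack output into (pack_hash, pending_oids)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output from index-pack")
    # Assumption: the first line ends with the pack hash.
    pack_hash = lines[0].split()[-1]
    # Assumption: each remaining line is one OID needing a later check.
    pending_oids = lines[1:]
    return pack_hash, pending_oids


# Placeholder hashes, not real object names.
sample = "pack\t" + "a" * 40 + "\n" + "b" * 40 + "\n"
pack_hash, pending = parse_index_pack_output(sample)
print(pack_hash, pending)
```

The fetch process could then collect the pending OIDs from every
index-pack invocation and check only those once all packs are present.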


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-19 19:13       ` Jonathan Tan
@ 2021-01-20  1:04         ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-01-20  1:04 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: stolee, me, git, peff

Jonathan Tan <jonathantanmy@google.com> writes:

>> It might be nice to teach 'index-pack' a mode that says certain
>> errors should be reported as warnings by writing the problematic
>> OIDs to stdout/stderr. Then, the second check after all packs are
>> present can focus on those problematic objects instead of
>> re-scanning everything.
>
> My initial reaction was that stdout is already used to report the hash
> part of the generated name and that stderr is already used for whatever
> warnings there are, but looking at the documentation, index-pack
> --fsck-objects is "[for] internal use only", so it might be fine to
> extend the output format in this case and report the problematic OIDs
> after the hash. I'll take a look.

If I am not mistaken, Taylor also mentioned the possibility to give
"these objects need reinspecting" to a later process, and it is an
excellent suggestion.  And I think it is perfectly fine to adjust
the internal format used purely for internal use.

Thanks.


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-15 23:43 RFC on packfile URIs and .gitmodules check Jonathan Tan
  2021-01-16  0:30 ` Junio C Hamano
@ 2021-01-20  8:07 ` Ævar Arnfjörð Bjarmason
  2021-01-20 19:30   ` Jonathan Tan
  2021-01-20 19:36   ` [PATCH] Doc: clarify contents of packfile sent as URI Jonathan Tan
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
  2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
  3 siblings, 2 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-01-20  8:07 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, Derrick Stolee, Taylor Blau


On Sat, Jan 16 2021, Jonathan Tan wrote:

> Someone at $DAYJOB noticed that if a .gitmodules-containing tree and the
> .gitmodules blob itself are sent in 2 separate packfiles during a fetch
> (which can happen when packfile URIs are used), transfer.fsckobjects
> causes the fetch to fail. You can reproduce it as follows (as of the
> time of writing):
>
>   $ git -c fetch.uriprotocols=https -c transfer.fsckobjects=true clone https://chromium.googlesource.com/chromiumos/codesearch
>   Cloning into 'codesearch'...
>   remote: Total 2242 (delta 0), reused 2242 (delta 0)
>   Receiving objects: 100% (2242/2242), 1.77 MiB | 4.62 MiB/s, done.
>   error: object 1f155c20935ee1154a813a814f03ef2b3976680f: gitmodulesMissing: unable to read .gitmodules blob
>   fatal: fsck error in pack objects
>   fatal: index-pack failed
>
> This happens because the fsck part is currently being done in
> index-pack, which operates on one pack at a time. When index-pack sees
> the tree, it runs fsck on it (like any other object), and the fsck
> subsystem remembers the .gitmodules target (specifically, in
> gitmodules_found in fsck.c). Later, index-pack runs fsck_finish() which
> checks if the target exists, but it doesn't, so it reports the failure.
>
> One option is for fetch to do its own pass of checking all downloaded
> objects once all packfiles have been downloaded, but that seems wasteful
> as all trees would have to be re-inflated.
>
> Another option is to do it within the connectivity check instead - so,
> update rev-list and the object walking mechanism to be able to detect
> .gitmodules in trees and fsck the target blob whenever such an entry
> occurs. This has the advantage that there is no extra re-inflation,
> although it might be strange to have object walking be able to fsck.
>
> The simplest solution would be to just relax this - check the blob if it
> exists, but if it doesn't, it's OK. Some things in favor of this
> solution:
>
>  - This is something we already do in the partial clone case (although
>    it could be argued that in this case, we're already trusting the
>    server for far more than .gitmodules, so just because it's OK in the
>    partial clone case doesn't mean that it's OK in the regular case).
>
>  - Also, the commit message for this feature (from ed8b10f631 ("fsck: check
>    .gitmodules content", 2018-05-21)) gives a rationale of a newer
>    server being able to protect older clients.
>     - Servers using receive-pack (instead of fetch-pack) to obtain
>       objects would still be protected, since receive-pack still only
>       accepts one packfile at a time (and there are currently no plans
>       to expand this).
>     - Also, malicious .gitmodules files could still be crafted that pass
>       fsck checking - for example, by containing a URL (of another
>       server) that refers to a repo with a .gitmodules that would fail
>       fsck.
>
> So I would rather go with just relaxing the check, but if consensus is
> that we should still do it, I'll investigate doing it in the
> connectivity check.

Would this still happen if $DAYJOB's packfile-uri server support
behaved as documented in packfile-uri.txt, or does it happen only
because it has outside-spec behavior?

I.e. the spec[1] says this:

    This is the implementation: a feature, marked experimental, that
    allows the server to be configured by one or more
    `uploadpack.blobPackfileUri=<sha1> <uri>` entries. Whenever the list
    of objects to be sent is assembled, all such blobs are excluded,
    replaced with URIs. The client will download those URIs, expecting
    them to each point to packfiles containing single blobs.

I can't see that leaving an opening for packfile-uri to do anything
other than serve up packfiles which each contain a single blob.

In that case it seems to me we'd be OK (but I haven't tested), because
fsck_finish() will call read_object_file(), which will try to read that
blob from the object store when it encounters the ".gitmodules" tree,
and because we'd have already downloaded the packfile with the blob
before moving on to the main dialog.

But as we discussed on-list before[2] this isn't the way packfile-uri
actually works in the wild. It's really just sending some arbitrary data
in a pack in that URI, with a server that knows what's in that pack and
will send the rest in such a way that everything ends up being
connected.

As far as I can tell the only reason this is called "packfile URI" and
behaves this way in git.git is because of the convenience of
instrumenting pack-objects.c with an "oidset excluded_by_config" to not
stream those blobs in a pack, but it isn't how the only (I'm pretty
sure) production server implementation in the wild behaves at all.

So *poke* about the reply I had in [3] late last year. I think the first
thing worth doing here is fixing the docs so they describe how this
works. You didn't get back on that (and I also forgot about it until
this thread), but it would be nice to know what you think about the
suggested prose there.

Re-reading it I'd add something like this to the spec:

 A. That the config is called "uploadpack.blobPackfileUri" in git.git
    has nothing to do with how this is expected to behave on the
    wire. It's just to serve the narrow support pack-objects.c has for
    crafting such a pack.

 B. It's then called "packfile-uris" on the wire, nothing to do with
    blobs. Just packs with a checksum that we'll validate. Older
    versions of this spec said "packfiles containing single blobs"
    but it can be any combination of blob/tree/commit data.

 C. A client is then expected to deal with any combination of data
    ordered/sliced/split up etc. in any possible way from such a
    combination of "packfile-uris" and PACK dialog, as long as the end
    result is valid.

Except that the result of this discussion will perhaps be a more narrow
definition for "C".

1. https://github.com/git/git/blob/cd8402e0fd8cfc0ec9fb10e22ffb6aabd992eae1/Documentation/technical/packfile-uri.txt#L37-L41
2. https://lore.kernel.org/git/20201125190957.1113461-1-jonathantanmy@google.com/
3. https://lore.kernel.org/git/87tut5vghw.fsf@evledraar.gmail.com/


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-20  8:07 ` Ævar Arnfjörð Bjarmason
@ 2021-01-20 19:30   ` Jonathan Tan
  2021-01-21  3:06     ` Junio C Hamano
  2021-01-20 19:36   ` [PATCH] Doc: clarify contents of packfile sent as URI Jonathan Tan
  1 sibling, 1 reply; 229+ messages in thread
From: Jonathan Tan @ 2021-01-20 19:30 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git, peff, stolee, me

> Would this still happen if $DAYJOB's packfile-uri server support
> behaved as documented in packfile-uri.txt, or does it happen only
> because it has outside-spec behavior?
> 
> I.e. the spec[1] says this:
> 
>     This is the implementation: a feature, marked experimental, that
>     allows the server to be configured by one or more
>     `uploadpack.blobPackfileUri=<sha1> <uri>` entries. Whenever the list
>     of objects to be sent is assembled, all such blobs are excluded,
>     replaced with URIs. The client will download those URIs, expecting
>     them to each point to packfiles containing single blobs.
> 
> Which I can't see leaving an opening for more than packfile-uri being to
> serve up packfiles which each contain a single blob.

I meant to leave an opening by referring to this just as a Minimum
Viable Product and by explaining in Future Work that the protocol allows
evolution of (among other things) which objects the server sends through
a URI without any protocol changes.

But in any case, this will also happen even if we constrain ourselves to
excluding single blobs and sending them via other packfiles instead -
see below.

> In that case it seems to me we'd be OK (but I haven't tested), because
> fsck_finish() will call read_object_file() which'll try to read that
> blob from the object store when it encounters the ".gitmodules" tree,
> and because we'd have already downloaded the packfile with the blob
> before moving onto the main dialog.

We wouldn't be OK, actually. Suppose we have a separate packfile
containing only the ".gitmodules" blob - when we call fsck_finish(), we
would not have downloaded the other packfile yet. Git processes the
entire fetch response by piping the inline packfile (after demux) into
index-pack (which is the one that calls fsck_finish()) before it
downloads any of the other packfile(s).
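
The ordering problem described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: `toy_store` and the object names are hypothetical, and the real check reads the blob from the object store inside index-pack rather than consulting an in-memory list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: the objects the client can read at a given moment. */
struct toy_store {
	const char *ids[8];
	size_t nr;
};

static void toy_store_add(struct toy_store *s, const char *id)
{
	assert(s->nr < sizeof(s->ids) / sizeof(s->ids[0]));
	s->ids[s->nr++] = id;
}

static int toy_store_has(const struct toy_store *s, const char *id)
{
	size_t i;

	for (i = 0; i < s->nr; i++)
		if (!strcmp(s->ids[i], id))
			return 1;
	return 0;
}

/*
 * Replays the order described above and returns the result of the
 * check made right after the inline pack is indexed:
 * 0 if the .gitmodules blob is readable at that point, -1 if not.
 */
static int toy_check_after_inline_pack(void)
{
	struct toy_store store = { { NULL }, 0 };
	int result;

	/* Step 1: the inline pack (holding only the tree) is indexed. */
	toy_store_add(&store, "tree-with-gitmodules");

	/* Step 2: the end-of-pack check looks for the blob right away. */
	result = toy_store_has(&store, "gitmodules-blob") ? 0 : -1;

	/* Step 3: only now is the URI-referenced pack downloaded. */
	toy_store_add(&store, "gitmodules-blob");

	return result;
}
```

In this model the check in step 2 fails even though the blob does arrive in step 3, which mirrors the gitmodulesMissing error in the original report.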

> But as we discussed on-list before[2] this isn't the way packfile-uri
> actually works in the wild. It's really just sending some arbitrary data
> in a pack in that URI, with a server that knows what's in that pack and
> will send the rest in such a way that everything ends up being
> connected.
> 
> As far as I can tell the only reason this is called "packfile URI" and
> behaves this way in git.git is because of the convenience of
> instrumenting pack-objects.c with an "oidset excluded_by_config" to not
> stream those blobs in a pack, but it isn't how the only (I'm pretty
> sure) production server implementation in the wild behaves at all.

I don't know if this is the only production server implementation, but
yes, this particular one (googlesource.com) can put objects of multiple
types in the other packfile, not only a single blob. There is some JGit
code here [1] that can send a URI corresponding to a "CachedPack" (which
may contain all objects, not only blobs) if that pack is also available
through a URI.

[1] https://gerrit.googlesource.com/jgit/+/a004820858b54d18c6f72fc94dc33bce8b606d66

> So *poke* about the reply I had in [3] late last year. I think the first
> thing worth doing here is fixing the docs so they describe how this
> works. You didn't get back on that (and I also forgot about it until
> this thread), but it would be nice to know what you think about the
> suggested prose there.

Rereading that, the issue is that uploadpack.blobPackfileUri is indeed
how the current Git server handles it - it excludes a blob and sends a
URI instead. The client is not supposed to see how the server has
configured it, and should not be constrained by the fact that the server
that is being shipped with it only excludes single blobs.

> Re-reading it I'd add something like this to the spec:
> 
>  A. That the config is called "uploadpack.blobPackfileUri" in git.git
>     has nothing to do with how this is expected to behave on the
>     wire. It's just to serve the narrow support pack-objects.c has for
>     crafting such a pack.

Yes, that's true.

>  B. It's then called "packfile-uris" on the wire, nothing to do with
>     blobs. Just packs with a checksum that we'll validate. An older
>     version of this spec said "packfiles containing single blobs"
>     but it can be any combination of blob/tree/commit data.

Yes, we can delete that line.

>  C. A client is then expected to deal with any combination of data
>     ordered/sliced/split up etc. in any possible way from such a
>     combination of "packfile-uris" and PACK dialog, as long as the end
>     result is valid.
> 
> Except that the result of this discussion will perhaps be a more narrow
> definition for "C".

Yes. I think all these can be done just by changing the last sentence in
"Server design" - I'll send a patch.

> 1. https://github.com/git/git/blob/cd8402e0fd8cfc0ec9fb10e22ffb6aabd992eae1/Documentation/technical/packfile-uri.txt#L37-L41
> 2. https://lore.kernel.org/git/20201125190957.1113461-1-jonathantanmy@google.com/
> 3. https://lore.kernel.org/git/87tut5vghw.fsf@evledraar.gmail.com/


* [PATCH] Doc: clarify contents of packfile sent as URI
  2021-01-20  8:07 ` Ævar Arnfjörð Bjarmason
  2021-01-20 19:30   ` Jonathan Tan
@ 2021-01-20 19:36   ` Jonathan Tan
  1 sibling, 0 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-01-20 19:36 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, avarab

Clarify that, when the packfile-uri feature is used, the client should
not assume that the extra packfiles downloaded would only contain a
single blob, but support packfiles containing multiple objects of all
types.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/technical/packfile-uri.txt | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt
index 318713abc3..f7eabc6c76 100644
--- a/Documentation/technical/packfile-uri.txt
+++ b/Documentation/technical/packfile-uri.txt
@@ -37,8 +37,11 @@ at least so that we can test the client.
 This is the implementation: a feature, marked experimental, that allows the
 server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
 <uri>` entries. Whenever the list of objects to be sent is assembled, all such
-blobs are excluded, replaced with URIs. The client will download those URIs,
-expecting them to each point to packfiles containing single blobs.
+blobs are excluded, replaced with URIs. As noted in "Future work" below, the
+server can evolve in the future to support excluding other objects (or other
+implementations of servers could be made that support excluding other objects)
+without needing a protocol change, so clients should not expect that packfiles
+downloaded in this way only contain single blobs.
 
 Client design
 -------------
-- 
2.30.0.284.gd98b1dd5eaa7-goog



* Re: RFC on packfile URIs and .gitmodules check
  2021-01-20 19:30   ` Jonathan Tan
@ 2021-01-21  3:06     ` Junio C Hamano
  2021-01-21 18:32       ` Jonathan Tan
  0 siblings, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-01-21  3:06 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: avarab, git, peff, stolee, me

Jonathan Tan <jonathantanmy@google.com> writes:

> We wouldn't be OK, actually. Suppose we have a separate packfile
> containing only the ".gitmodules" blob - when we call fsck_finish(), we
> would not have downloaded the other packfile yet. Git processes the
> entire fetch response by piping the inline packfile (after demux) into
> index-pack (which is the one that calls fsck_finish()) before it
> downloads any of the other packfile(s).

Is that order documented as a requirement for implementation?

Naïvely, I would expect that a CDN offload would be to relieve
servers from the burden of having to repack ancient part of the
history all the time for any new "clone" clients and that is what
the "here is a URI, go fetch it because I won't give you objects
that already appear there" feature is about.  Because we expect that
the offloaded contents would not be up-to-date, the traditional
packfile transfer would then be used to complete the history with
objects necessary for the parts of the history newer than the
offloaded contents.

And from that viewpoint, it sounds totally backwards to start
processing the up-to-the-minute fresh packfile that came via the
traditional packfile transfer before the CDN offloaded contents are
fetched and stored safely in our repository.

We probably want to finish interaction with the live server as
quickly as possible---it would go counter to that wish if we force
the live part of the history to hang in flight, unprocessed, while the
client downloads offloaded bulk from CDN and processes it, making
the server side stuck waiting for some write(2) to go through.

But I still wonder if it is an option to locally delay the
processing of the up-to-the-minute-fresh part.

Instead of feeding what comes from them directly to "index-pack
--fsck-objects", would it make sense to spool it to a temporary file, so
that we can release the server early, but then make sure to fetch
and process packfile URI material before coming back to process the
spooled packdata?  That would allow the newer part of the history to
have newer trees that still reference the same old .gitmodules that
is found in the frozen packfile that comes from CDN, no?

Or can there be a situation where some objects in CDN pack are
referred to by objects in the up-to-the-minute-fresh pack (e.g. a
".gitmodules" blob in CDN pack is still unchanged and used in an
updated tree in the latest revision) and some other objects in CDN
pack refer to an object in the live part of the history?  If there
is such a cyclic dependency, "index-pack --fsck" one pack at a time
would not work, but I doubt such a cycle can arise.

Thanks.


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-21  3:06     ` Junio C Hamano
@ 2021-01-21 18:32       ` Jonathan Tan
  2021-01-21 18:39         ` Junio C Hamano
  0 siblings, 1 reply; 229+ messages in thread
From: Jonathan Tan @ 2021-01-21 18:32 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, avarab, git, peff, stolee, me

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > We wouldn't be OK, actually. Suppose we have a separate packfile
> > containing only the ".gitmodules" blob - when we call fsck_finish(), we
> > would not have downloaded the other packfile yet. Git processes the
> > entire fetch response by piping the inline packfile (after demux) into
> > index-pack (which is the one that calls fsck_finish()) before it
> > downloads any of the other packfile(s).
> 
> Is that order documented as a requirement for implementation?
> 
> Naïvely, I would expect that a CDN offload would be to relieve
> servers from the burden of having to repack ancient part of the
> history all the time for any new "clone" clients and that is what
> the "here is a URI, go fetch it because I won't give you objects
> that already appear there" feature is about.  Because we expect that
> the offloaded contents would not be up-to-date, the traditional
> packfile transfer would then be used to complete the history with
> objects necessary for the parts of the history newer than the
> offloaded contents.
> 
> And from that viewpoint, it sounds totally backwards to start
> processing the up-to-the-minute fresh packfile that came via the
> traditional packfile transfer before the CDN offloaded contents are
> fetched and stored safely in our repository.
> 
> We probably want to finish interaction with the live server as
> quickly as possible---it would go counter to that wish if we force
> the live part of the history to hang in flight, unprocessed, while the
> client downloads offloaded bulk from CDN and processes it, making
> the server side stuck waiting for some write(2) to go through.
> 
> But I still wonder if it is an option to locally delay the
> processing of the up-to-the-minute-fresh part.
> 
> Instead of feeding what comes from them directly to "index-pack
> --fsck-objects", would it make sense to spool it to a temporary file, so
> that we can release the server early, but then make sure to fetch
> and process packfile URI material before coming back to process the
> spooled packdata?  That would allow the newer part of the history to
> have newer trees that still reference the same old .gitmodules that
> is found in the frozen packfile that comes from CDN, no?
> 
> Or can there be a situation where some objects in CDN pack are
> referred to by objects in the up-to-the-minute-fresh pack (e.g. a
> ".gitmodules" blob in CDN pack is still unchanged and used in an
> updated tree in the latest revision) and some other objects in CDN
> pack refer to an object in the live part of the history?  If there
> is such a cyclic dependency, "index-pack --fsck" one pack at a time
> would not work, but I doubt such a cycle can arise.

My intention is that the order of the packfiles (and cyclic
dependencies) would not matter, so we wouldn't need to delay any
processing of the up-to-the-minute-fresh part. I'm currently working on
getting index-pack to output a list of the dangling .gitmodules files,
so that fetch-pack (its consumer) can do one final fsck on those files.

Another way, as you said, is to say that the order of the packfiles
matters (which potentially allows some simplification on the client
side) but I don't think that we need to lose this flexibility.
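
The order-independent approach described above could look roughly like the following sketch. It is a simplified model, not the actual implementation: `toy_pack`, `toy_count_unresolved()` and the object names are hypothetical, and the sketch only tests for presence, whereas a real final pass would presumably also fsck the blob contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one indexed pack. */
struct toy_pack {
	const char **objects;  /* IDs of objects stored in this pack */
	size_t nr_objects;
	const char **dangling; /* .gitmodules blob IDs it references but lacks */
	size_t nr_dangling;
};

static int toy_any_pack_has(const struct toy_pack *packs, size_t nr_packs,
			    const char *id)
{
	size_t i, j;

	for (i = 0; i < nr_packs; i++)
		for (j = 0; j < packs[i].nr_objects; j++)
			if (!strcmp(packs[i].objects[j], id))
				return 1;
	return 0;
}

/*
 * Final pass, run once every pack has been indexed: count the dangling
 * IDs that no pack supplies. A result of 0 means the check passes,
 * regardless of the order in which the packs arrived.
 */
static size_t toy_count_unresolved(const struct toy_pack *packs,
				   size_t nr_packs)
{
	size_t i, j, missing = 0;

	for (i = 0; i < nr_packs; i++)
		for (j = 0; j < packs[i].nr_dangling; j++)
			if (!toy_any_pack_has(packs, nr_packs,
					      packs[i].dangling[j]))
				missing++;
	return missing;
}

/* Tree in the inline pack, blob in the (optional) URI pack. */
static size_t toy_demo(int with_uri_pack)
{
	static const char *inline_objects[] = { "tree-with-gitmodules" };
	static const char *inline_dangling[] = { "gitmodules-blob" };
	static const char *uri_objects[] = { "gitmodules-blob" };
	struct toy_pack packs[2] = {
		{ inline_objects, 1, inline_dangling, 1 },
		{ uri_objects, 1, NULL, 0 },
	};

	return toy_count_unresolved(packs, with_uri_pack ? 2 : 1);
}
```

With both packs present, `toy_demo(1)` finds nothing unresolved; with only the inline pack, `toy_demo(0)` reports one missing blob.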


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-21 18:32       ` Jonathan Tan
@ 2021-01-21 18:39         ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-01-21 18:39 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: avarab, git, peff, stolee, me

Jonathan Tan <jonathantanmy@google.com> writes:

>> Jonathan Tan <jonathantanmy@google.com> writes:
>> 
>> Or can there be a situation where some objects in CDN pack are
>> referred to by objects in the up-to-the-minute-fresh pack (e.g. a
>> ".gitmodules" blob in CDN pack is still unchanged and used in an
>> updated tree in the latest revision) and some other objects in CDN
>> pack refer to an object in the live part of the history?  If there
>> is such a cyclic dependency, "index-pack --fsck" one pack at a time
>> would not work, but I doubt such a cycle can arise.
>
> My intention is that the order of the packfiles (and cyclic
> dependencies) would not matter...
> I'm currently working on
> getting index-pack to output a list of the dangling .gitmodules files,
> so that fetch-pack (its consumer) can do one final fsck on those files.

In other words, it essentially becomes "we check everything we
obtained as a single unit across multiple packs, but for performance
we'll let index-pack work as much as possible on each individual
pack while it has necessary data in its core, and then we conclude
by checking the objects on the 'boundaries' that cannot be validated
using info that is only in one pack".

That does sound like the right approach.  Thanks.


* [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-01-15 23:43 RFC on packfile URIs and .gitmodules check Jonathan Tan
  2021-01-16  0:30 ` Junio C Hamano
  2021-01-20  8:07 ` Ævar Arnfjörð Bjarmason
@ 2021-01-24  2:34 ` Jonathan Tan
  2021-01-24  2:34   ` [PATCH 1/4] http: allow custom index-pack args Jonathan Tan
                     ` (5 more replies)
  2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
  3 siblings, 6 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-01-24  2:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This patch set resolves the .gitmodules-and-tree-in-separate-packfiles
issue I mentioned in [1] by having index-pack print out all dangling
.gitmodules (instead of returning with an error code) and then teaching
fetch-pack to read those and run its own fsck checks after all
index-pack invocations are complete.

As part of this, index-pack has to output (1) the hash that goes into
the name of the .pack/.idx file and (2) the hashes of all dangling
.gitmodules. I just had (2) come after (1). If anyone has a better idea,
I'm interested.

I also discovered a bug in that different index-pack arguments were used
when processing the inline packfile and when processing the ones
referenced by URIs. Patches 1-3 fix that bug by passing the arguments to
use as a space-separated URL-encoded list. (URL-encoded so that we can
have spaces in the arguments.) Again, if anyone has a better idea, I'm
interested. It is only in patch 4 that we have the dangling .gitmodules
fix.
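
To illustrate the space-separated, URL-encoded argument list described above, here is a minimal standalone sketch. It does not use Git's strbuf API: `join_encoded()` and `is_unreserved()` are hypothetical helpers, the fixed-size output buffer is a simplification, and uppercase hex digits are an arbitrary choice for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* RFC 3986 "unreserved" characters are copied through unchanged. */
static int is_unreserved(unsigned char c)
{
	return isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~';
}

/*
 * Percent-encode each argument and join the results with single spaces.
 * Because a space inside an argument becomes "%20", a literal space in
 * the output can only be a separator. Returns 0 on success, or -1 if
 * "out" is too small.
 */
static int join_encoded(const char *const *argv, char *out, size_t outsz)
{
	size_t len = 0;
	size_t i;

	for (i = 0; argv[i]; i++) {
		const unsigned char *p = (const unsigned char *)argv[i];

		if (i) {
			if (len + 1 >= outsz)
				return -1;
			out[len++] = ' ';
		}
		for (; *p; p++) {
			if (is_unreserved(*p)) {
				if (len + 1 >= outsz)
					return -1;
				out[len++] = (char)*p;
			} else {
				if (len + 3 >= outsz)
					return -1;
				/* writes "%XX" plus a NUL that is overwritten later */
				snprintf(out + len, 4, "%%%02X", (unsigned)*p);
				len += 3;
			}
		}
	}
	if (len >= outsz)
		return -1;
	out[len] = '\0';
	return 0;
}
```

For example, the arguments `index-pack`, `--stdin` and `--keep=fetch-pack 123 on host` would be joined as `index-pack --stdin --keep%3Dfetch-pack%20123%20on%20host`, so the receiver can split on spaces and then percent-decode each piece.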

[1] https://lore.kernel.org/git/20210115234300.350442-1-jonathantanmy@google.com/

Jonathan Tan (4):
  http: allow custom index-pack args
  http-fetch: allow custom index-pack args
  fetch-pack: with packfile URIs, use index-pack arg
  fetch-pack: print and use dangling .gitmodules

 Documentation/git-http-fetch.txt |   9 ++-
 Documentation/git-index-pack.txt |   7 +-
 builtin/index-pack.c             |   9 ++-
 builtin/receive-pack.c           |   2 +-
 fetch-pack.c                     | 106 ++++++++++++++++++++++++++-----
 fsck.c                           |  16 +++--
 fsck.h                           |   8 +++
 http-fetch.c                     |  35 +++++++++-
 http.c                           |  15 +++--
 http.h                           |  10 +--
 pack-write.c                     |   8 ++-
 pack.h                           |   2 +-
 t/t5550-http-fetch-dumb.sh       |   3 +-
 t/t5702-protocol-v2.sh           |  47 ++++++++++++++
 14 files changed, 232 insertions(+), 45 deletions(-)

-- 
2.30.0.280.ga3ce27912f-goog



* [PATCH 1/4] http: allow custom index-pack args
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
@ 2021-01-24  2:34   ` Jonathan Tan
  2021-01-24  2:34   ` [PATCH 2/4] http-fetch: " Jonathan Tan
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-01-24  2:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Currently, when fetching, packfiles referenced by URIs are run through
index-pack without any arguments other than --stdin and --keep, no
matter what arguments are used for the packfile that is inline in the
fetch response. As a preparation for ensuring that all packs (whether
inline or not) use the same index-pack arguments, teach the http
subsystem to allow custom index-pack arguments.

http-fetch has been updated to use the new API. For now, it passes
--keep alone instead of --keep with a process ID, but this is only
temporary because http-fetch itself will be taught to accept index-pack
parameters (instead of using a hardcoded constant) in a subsequent
commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 http-fetch.c |  6 +++++-
 http.c       | 15 ++++++++-------
 http.h       | 10 +++++-----
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/http-fetch.c b/http-fetch.c
index c4ccc5fea9..2d1d9d054f 100644
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -43,6 +43,9 @@ static int fetch_using_walker(const char *raw_url, int get_verbosely,
 	return rc;
 }
 
+static const char *index_pack_args[] =
+	{"index-pack", "--stdin", "--keep", NULL};
+
 static void fetch_single_packfile(struct object_id *packfile_hash,
 				  const char *url) {
 	struct http_pack_request *preq;
@@ -55,7 +58,8 @@ static void fetch_single_packfile(struct object_id *packfile_hash,
 	if (preq == NULL)
 		die("couldn't create http pack request");
 	preq->slot->results = &results;
-	preq->generate_keep = 1;
+	preq->index_pack_args = index_pack_args;
+	preq->preserve_index_pack_stdout = 1;
 
 	if (start_active_slot(preq->slot)) {
 		run_active_slot(preq->slot);
diff --git a/http.c b/http.c
index 8b23a546af..f8ea28bb2e 100644
--- a/http.c
+++ b/http.c
@@ -2259,6 +2259,9 @@ void release_http_pack_request(struct http_pack_request *preq)
 	free(preq);
 }
 
+static const char *default_index_pack_args[] =
+	{"index-pack", "--stdin", NULL};
+
 int finish_http_pack_request(struct http_pack_request *preq)
 {
 	struct child_process ip = CHILD_PROCESS_INIT;
@@ -2270,17 +2273,15 @@ int finish_http_pack_request(struct http_pack_request *preq)
 
 	tmpfile_fd = xopen(preq->tmpfile.buf, O_RDONLY);
 
-	strvec_push(&ip.args, "index-pack");
-	strvec_push(&ip.args, "--stdin");
 	ip.git_cmd = 1;
 	ip.in = tmpfile_fd;
-	if (preq->generate_keep) {
-		strvec_pushf(&ip.args, "--keep=git %"PRIuMAX,
-			     (uintmax_t)getpid());
+	ip.argv = preq->index_pack_args ? preq->index_pack_args
+					: default_index_pack_args;
+
+	if (preq->preserve_index_pack_stdout)
 		ip.out = 0;
-	} else {
+	else
 		ip.no_stdout = 1;
-	}
 
 	if (run_command(&ip)) {
 		ret = -1;
diff --git a/http.h b/http.h
index 5de792ef3f..bf3d1270ad 100644
--- a/http.h
+++ b/http.h
@@ -218,12 +218,12 @@ struct http_pack_request {
 	char *url;
 
 	/*
-	 * If this is true, finish_http_pack_request() will pass "--keep" to
-	 * index-pack, resulting in the creation of a keep file, and will not
-	 * suppress its stdout (that is, the "keep\t<hash>\n" line will be
-	 * printed to stdout).
+	 * index-pack command to run. Must be terminated by NULL.
+	 *
+	 * If NULL, defaults to	{"index-pack", "--stdin", NULL}.
 	 */
-	unsigned generate_keep : 1;
+	const char **index_pack_args;
+	unsigned preserve_index_pack_stdout : 1;
 
 	FILE *packfile;
 	struct strbuf tmpfile;
-- 
2.30.0.280.ga3ce27912f-goog



* [PATCH 2/4] http-fetch: allow custom index-pack args
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
  2021-01-24  2:34   ` [PATCH 1/4] http: allow custom index-pack args Jonathan Tan
@ 2021-01-24  2:34   ` Jonathan Tan
  2021-01-24 11:52     ` Ævar Arnfjörð Bjarmason
  2021-02-16 20:49     ` Josh Steadmon
  2021-01-24  2:34   ` [PATCH 3/4] fetch-pack: with packfile URIs, use index-pack arg Jonathan Tan
                     ` (3 subsequent siblings)
  5 siblings, 2 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-01-24  2:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This is the next step in teaching fetch-pack to pass its index-pack
arguments when processing packfiles referenced by URIs.

The "--keep" in fetch-pack.c will be replaced with a full message in a
subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/git-http-fetch.txt |  9 ++++++--
 fetch-pack.c                     |  1 +
 http-fetch.c                     | 35 +++++++++++++++++++++++++++-----
 t/t5550-http-fetch-dumb.sh       |  3 ++-
 4 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-http-fetch.txt b/Documentation/git-http-fetch.txt
index 4deb4893f5..aa171088e8 100644
--- a/Documentation/git-http-fetch.txt
+++ b/Documentation/git-http-fetch.txt
@@ -41,11 +41,16 @@ commit-id::
 		<commit-id>['\t'<filename-as-in--w>]
 
 --packfile=<hash>::
-	Instead of a commit id on the command line (which is not expected in
+	For internal use only. Instead of a commit id on the command line (which is not expected in
 	this case), 'git http-fetch' fetches the packfile directly at the given
 	URL and uses index-pack to generate corresponding .idx and .keep files.
 	The hash is used to determine the name of the temporary file and is
-	arbitrary. The output of index-pack is printed to stdout.
+	arbitrary. The output of index-pack is printed to stdout. Requires
+	--index-pack-args.
+
+--index-pack-args=<args>::
+	For internal use only. The command to run on the contents of the
+	downloaded pack. Arguments are URL-encoded separated by spaces.
 
 --recover::
 	Verify that everything reachable from target is fetched.  Used after
diff --git a/fetch-pack.c b/fetch-pack.c
index 876f90c759..274ae602f7 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1645,6 +1645,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 		strvec_pushf(&cmd.args, "--packfile=%.*s",
 			     (int) the_hash_algo->hexsz,
 			     packfile_uris.items[i].string);
+		strvec_push(&cmd.args, "--index-pack-args=index-pack --stdin --keep");
 		strvec_push(&cmd.args, uri);
 		cmd.git_cmd = 1;
 		cmd.no_stdin = 1;
diff --git a/http-fetch.c b/http-fetch.c
index 2d1d9d054f..12feb84e71 100644
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -3,6 +3,7 @@
 #include "exec-cmd.h"
 #include "http.h"
 #include "walker.h"
+#include "strvec.h"
 
 static const char http_fetch_usage[] = "git http-fetch "
 "[-c] [-t] [-a] [-v] [--recover] [-w ref] [--stdin | --packfile=hash | commit-id] url";
@@ -43,11 +44,9 @@ static int fetch_using_walker(const char *raw_url, int get_verbosely,
 	return rc;
 }
 
-static const char *index_pack_args[] =
-	{"index-pack", "--stdin", "--keep", NULL};
-
 static void fetch_single_packfile(struct object_id *packfile_hash,
-				  const char *url) {
+				  const char *url,
+				  const char **index_pack_args) {
 	struct http_pack_request *preq;
 	struct slot_results results;
 	int ret;
@@ -90,6 +89,7 @@ int cmd_main(int argc, const char **argv)
 	int packfile = 0;
 	int nongit;
 	struct object_id packfile_hash;
+	const char *index_pack_args = NULL;
 
 	setup_git_directory_gently(&nongit);
 
@@ -116,6 +116,8 @@ int cmd_main(int argc, const char **argv)
 			packfile = 1;
 			if (parse_oid_hex(p, &packfile_hash, &end) || *end)
 				die(_("argument to --packfile must be a valid hash (got '%s')"), p);
+		} else if (skip_prefix(argv[arg], "--index-pack-args=", &p)) {
+			index_pack_args = p;
 		}
 		arg++;
 	}
@@ -128,10 +130,33 @@ int cmd_main(int argc, const char **argv)
 	git_config(git_default_config, NULL);
 
 	if (packfile) {
-		fetch_single_packfile(&packfile_hash, argv[arg]);
+		struct strvec encoded = STRVEC_INIT;
+		char **raw;
+		int i;
+
+		if (!index_pack_args)
+			die(_("--packfile requires --index-pack-args"));
+
+		strvec_split(&encoded, index_pack_args);
+
+		CALLOC_ARRAY(raw, encoded.nr + 1);
+		for (i = 0; i < encoded.nr; i++)
+			raw[i] = url_percent_decode(encoded.v[i]);
+
+		fetch_single_packfile(&packfile_hash, argv[arg],
+				      (const char **) raw);
+
+		for (i = 0; i < encoded.nr; i++)
+			free(raw[i]);
+		free(raw);
+		strvec_clear(&encoded);
+
 		return 0;
 	}
 
+	if (index_pack_args)
+		die(_("--index-pack-args can only be used with --packfile"));
+
 	if (commits_on_stdin) {
 		commits = walker_targets_stdin(&commit_id, &write_ref);
 	} else {
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 483578b2d7..af90e7efed 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -224,7 +224,8 @@ test_expect_success 'http-fetch --packfile' '
 
 	git init packfileclient &&
 	p=$(cd "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git && ls objects/pack/pack-*.pack) &&
-	git -C packfileclient http-fetch --packfile=$ARBITRARY "$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
+	git -C packfileclient http-fetch --packfile=$ARBITRARY \
+		--index-pack-args="index-pack --stdin --keep" "$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
 
 	grep "^keep.[0-9a-f]\{16,\}$" out &&
 	cut -c6- out >packhash &&
-- 
2.30.0.280.ga3ce27912f-goog



* [PATCH 3/4] fetch-pack: with packfile URIs, use index-pack arg
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
  2021-01-24  2:34   ` [PATCH 1/4] http: allow custom index-pack args Jonathan Tan
  2021-01-24  2:34   ` [PATCH 2/4] http-fetch: " Jonathan Tan
@ 2021-01-24  2:34   ` Jonathan Tan
  2021-01-24  2:34   ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-01-24  2:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Unify the index-pack arguments used when processing the inline pack and
when downloading packfiles referenced by URIs. This is done by teaching
get_pack() to also store the index-pack arguments whenever at least one
packfile URI is given, and then when processing the packfile URI(s),
using the stored arguments.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 fetch-pack.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 274ae602f7..fe69635eb5 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -797,12 +797,13 @@ static void write_promisor_file(const char *keep_name,
 }
 
 /*
- * Pass 1 as "only_packfile" if the pack received is the only pack in this
- * fetch request (that is, if there were no packfile URIs provided).
+ * If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
+ * The string to pass as the --index-pack-args argument to http-fetch will be
+ * stored there. (It must be freed by the caller.)
  */
 static int get_pack(struct fetch_pack_args *args,
 		    int xd[2], struct string_list *pack_lockfiles,
-		    int only_packfile,
+		    char **index_pack_args,
 		    struct ref **sought, int nr_sought)
 {
 	struct async demux;
@@ -845,7 +846,7 @@ static int get_pack(struct fetch_pack_args *args,
 		strvec_push(&cmd.args, alternate_shallow_file);
 	}
 
-	if (do_keep || args->from_promisor) {
+	if (do_keep || args->from_promisor || index_pack_args) {
 		if (pack_lockfiles)
 			cmd.out = -1;
 		cmd_name = "index-pack";
@@ -863,7 +864,7 @@ static int get_pack(struct fetch_pack_args *args,
 				     "--keep=fetch-pack %"PRIuMAX " on %s",
 				     (uintmax_t)getpid(), hostname);
 		}
-		if (only_packfile && args->check_self_contained_and_connected)
+		if (!index_pack_args && args->check_self_contained_and_connected)
 			strvec_push(&cmd.args, "--check-self-contained-and-connected");
 		else
 			/*
@@ -901,7 +902,7 @@ static int get_pack(struct fetch_pack_args *args,
 	    : transfer_fsck_objects >= 0
 	    ? transfer_fsck_objects
 	    : 0) {
-		if (args->from_promisor || !only_packfile)
+		if (args->from_promisor || index_pack_args)
 			/*
 			 * We cannot use --strict in index-pack because it
 			 * checks both broken objects and links, but we only
@@ -913,6 +914,19 @@ static int get_pack(struct fetch_pack_args *args,
 				     fsck_msg_types.buf);
 	}
 
+	if (index_pack_args) {
+		struct strbuf joined = STRBUF_INIT;
+		int i;
+
+		for (i = 0; i < cmd.args.nr; i++) {
+			if (i)
+				strbuf_addch(&joined, ' ');
+			strbuf_addstr_urlencode(&joined, cmd.args.v[i],
+						is_rfc3986_unreserved);
+		}
+		*index_pack_args = strbuf_detach(&joined, NULL);
+	}
+
 	cmd.in = demux.out;
 	cmd.git_cmd = 1;
 	if (start_command(&cmd))
@@ -1084,7 +1098,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		alternate_shallow_file = setup_temporary_shallow(si->shallow);
 	else
 		alternate_shallow_file = NULL;
-	if (get_pack(args, fd, pack_lockfiles, 1, sought, nr_sought))
+	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought))
 		die(_("git fetch-pack: fetch failed."));
 
  all_done:
@@ -1535,6 +1549,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	int seen_ack = 0;
 	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
 	int i;
+	char *index_pack_args = NULL;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
@@ -1624,7 +1639,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 				receive_packfile_uris(&reader, &packfile_uris);
 			process_section_header(&reader, "packfile", 0);
 			if (get_pack(args, fd, pack_lockfiles,
-				     !packfile_uris.nr, sought, nr_sought))
+				     packfile_uris.nr ? &index_pack_args : NULL,
+				     sought, nr_sought))
 				die(_("git fetch-pack: fetch failed."));
 			do_check_stateless_delimiter(args, &reader);
 
@@ -1645,7 +1661,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 		strvec_pushf(&cmd.args, "--packfile=%.*s",
 			     (int) the_hash_algo->hexsz,
 			     packfile_uris.items[i].string);
-		strvec_push(&cmd.args, "--index-pack-args=index-pack --stdin --keep");
+		strvec_pushf(&cmd.args, "--index-pack-args=%s", index_pack_args);
 		strvec_push(&cmd.args, uri);
 		cmd.git_cmd = 1;
 		cmd.no_stdin = 1;
@@ -1681,6 +1697,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 						 packname));
 	}
 	string_list_clear(&packfile_uris, 0);
+	FREE_AND_NULL(index_pack_args);
 
 	if (negotiator)
 		negotiator->release(negotiator);
-- 
2.30.0.280.ga3ce27912f-goog



* [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
                     ` (2 preceding siblings ...)
  2021-01-24  2:34   ` [PATCH 3/4] fetch-pack: with packfile URIs, use index-pack arg Jonathan Tan
@ 2021-01-24  2:34   ` Jonathan Tan
  2021-01-24  7:56     ` Junio C Hamano
                       ` (3 more replies)
  2021-01-24  6:29   ` [PATCH 0/4] Check .gitmodules when using packfile URIs Junio C Hamano
  2021-02-18 23:34   ` Junio C Hamano
  5 siblings, 4 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-01-24  2:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Teach index-pack to print dangling .gitmodules links after its "keep" or
"pack" line instead of declaring an error, and teach fetch-pack to check
such lines printed.

This allows the tree side of the .gitmodules link to be in one packfile
and the blob side to be in another without failing the fsck check,
because it is now fetch-pack which checks such objects after all
packfiles have been downloaded and indexed (and not index-pack on an
individual packfile, as it is before this commit).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/git-index-pack.txt |  7 ++-
 builtin/index-pack.c             |  9 +++-
 builtin/receive-pack.c           |  2 +-
 fetch-pack.c                     | 78 +++++++++++++++++++++++++++-----
 fsck.c                           | 16 +++++--
 fsck.h                           |  8 ++++
 pack-write.c                     |  8 +++-
 pack.h                           |  2 +-
 t/t5702-protocol-v2.sh           | 47 +++++++++++++++++++
 9 files changed, 155 insertions(+), 22 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index af0c26232c..e74a4a1eda 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -78,7 +78,12 @@ OPTIONS
 	Die if the pack contains broken links. For internal use only.
 
 --fsck-objects::
-	Die if the pack contains broken objects. For internal use only.
+	For internal use only.
++
+Die if the pack contains broken objects. If the pack contains a tree
+pointing to a .gitmodules blob that does not exist, prints the hash of
+that blob (for the caller to check) after the hash that goes into the
+name of the pack/idx file (see "Notes").
 
 --threads=<n>::
 	Specifies the number of threads to spawn when resolving
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 557bd2f348..f995c15115 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1888,8 +1888,13 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	else
 		close(input_fd);
 
-	if (do_fsck_object && fsck_finish(&fsck_options))
-		die(_("fsck error in pack objects"));
+	if (do_fsck_object) {
+		struct fsck_options fo = FSCK_OPTIONS_STRICT;
+
+		fo.print_dangling_gitmodules = 1;
+		if (fsck_finish(&fo))
+			die(_("fsck error in pack objects"));
+	}
 
 	free(objects);
 	strbuf_release(&index_name_buf);
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index d49d050e6e..ed2c9b42e9 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2275,7 +2275,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		status = start_command(&child);
 		if (status)
 			return "index-pack fork failed";
-		pack_lockfile = index_pack_lockfile(child.out);
+		pack_lockfile = index_pack_lockfile(child.out, NULL);
 		close(child.out);
 		status = finish_command(&child);
 		if (status)
diff --git a/fetch-pack.c b/fetch-pack.c
index fe69635eb5..128362e0ba 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -796,6 +796,26 @@ static void write_promisor_file(const char *keep_name,
 	strbuf_release(&promisor_name);
 }
 
+static void parse_gitmodules_oids(int fd, struct oidset *gitmodules_oids)
+{
+	int len = the_hash_algo->hexsz + 1; /* hash + NL */
+
+	do {
+		char hex_hash[GIT_MAX_HEXSZ + 1];
+		int read_len = read_in_full(fd, hex_hash, len);
+		struct object_id oid;
+		const char *end;
+
+		if (!read_len)
+			return;
+		if (read_len != len)
+			die("invalid length read %d", read_len);
+		if (parse_oid_hex(hex_hash, &oid, &end) || *end != '\n')
+			die("invalid hash");
+		oidset_insert(gitmodules_oids, &oid);
+	} while (1);
+}
+
 /*
  * If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
  * The string to pass as the --index-pack-args argument to http-fetch will be
@@ -804,7 +824,8 @@ static void write_promisor_file(const char *keep_name,
 static int get_pack(struct fetch_pack_args *args,
 		    int xd[2], struct string_list *pack_lockfiles,
 		    char **index_pack_args,
-		    struct ref **sought, int nr_sought)
+		    struct ref **sought, int nr_sought,
+		    struct oidset *gitmodules_oids)
 {
 	struct async demux;
 	int do_keep = args->keep_pack;
@@ -812,6 +833,7 @@ static int get_pack(struct fetch_pack_args *args,
 	struct pack_header header;
 	int pass_header = 0;
 	struct child_process cmd = CHILD_PROCESS_INIT;
+	int fsck_objects = 0;
 	int ret;
 
 	memset(&demux, 0, sizeof(demux));
@@ -846,8 +868,15 @@ static int get_pack(struct fetch_pack_args *args,
 		strvec_push(&cmd.args, alternate_shallow_file);
 	}
 
-	if (do_keep || args->from_promisor || index_pack_args) {
-		if (pack_lockfiles)
+	if (fetch_fsck_objects >= 0
+	    ? fetch_fsck_objects
+	    : transfer_fsck_objects >= 0
+	    ? transfer_fsck_objects
+	    : 0)
+		fsck_objects = 1;
+
+	if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
+		if (pack_lockfiles || fsck_objects)
 			cmd.out = -1;
 		cmd_name = "index-pack";
 		strvec_push(&cmd.args, cmd_name);
@@ -897,11 +926,7 @@ static int get_pack(struct fetch_pack_args *args,
 		strvec_pushf(&cmd.args, "--pack_header=%"PRIu32",%"PRIu32,
 			     ntohl(header.hdr_version),
 				 ntohl(header.hdr_entries));
-	if (fetch_fsck_objects >= 0
-	    ? fetch_fsck_objects
-	    : transfer_fsck_objects >= 0
-	    ? transfer_fsck_objects
-	    : 0) {
+	if (fsck_objects) {
 		if (args->from_promisor || index_pack_args)
 			/*
 			 * We cannot use --strict in index-pack because it
@@ -931,10 +956,15 @@ static int get_pack(struct fetch_pack_args *args,
 	cmd.git_cmd = 1;
 	if (start_command(&cmd))
 		die(_("fetch-pack: unable to fork off %s"), cmd_name);
-	if (do_keep && pack_lockfiles) {
-		char *pack_lockfile = index_pack_lockfile(cmd.out);
+	if (do_keep && (pack_lockfiles || fsck_objects)) {
+		int is_well_formed;
+		char *pack_lockfile = index_pack_lockfile(cmd.out, &is_well_formed);
+
+		if (!is_well_formed)
+			die(_("fetch-pack: invalid index-pack output"));
 		if (pack_lockfile)
 			string_list_append_nodup(pack_lockfiles, pack_lockfile);
+		parse_gitmodules_oids(cmd.out, gitmodules_oids);
 		close(cmd.out);
 	}
 
@@ -969,6 +999,22 @@ static int cmp_ref_by_name(const void *a_, const void *b_)
 	return strcmp(a->name, b->name);
 }
 
+static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
+{
+	struct oidset_iter iter;
+	const struct object_id *oid;
+	struct fsck_options fo = FSCK_OPTIONS_STRICT;
+
+	if (!oidset_size(gitmodules_oids))
+		return;
+
+	oidset_iter_init(gitmodules_oids, &iter);
+	while ((oid = oidset_iter_next(&iter)))
+		register_found_gitmodules(oid);
+	if (fsck_finish(&fo))
+		die("fsck failed");
+}
+
 static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 				 int fd[2],
 				 const struct ref *orig_ref,
@@ -983,6 +1029,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	int agent_len;
 	struct fetch_negotiator negotiator_alloc;
 	struct fetch_negotiator *negotiator;
+	struct oidset gitmodules_oids = OIDSET_INIT;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
@@ -1098,8 +1145,10 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		alternate_shallow_file = setup_temporary_shallow(si->shallow);
 	else
 		alternate_shallow_file = NULL;
-	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought))
+	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought,
+		     &gitmodules_oids))
 		die(_("git fetch-pack: fetch failed."));
+	fsck_gitmodules_oids(&gitmodules_oids);
 
  all_done:
 	if (negotiator)
@@ -1550,6 +1599,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
 	int i;
 	char *index_pack_args = NULL;
+	struct oidset gitmodules_oids = OIDSET_INIT;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
@@ -1640,7 +1690,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 			process_section_header(&reader, "packfile", 0);
 			if (get_pack(args, fd, pack_lockfiles,
 				     packfile_uris.nr ? &index_pack_args : NULL,
-				     sought, nr_sought))
+				     sought, nr_sought, &gitmodules_oids))
 				die(_("git fetch-pack: fetch failed."));
 			do_check_stateless_delimiter(args, &reader);
 
@@ -1680,6 +1730,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 
 		packname[the_hash_algo->hexsz] = '\0';
 
+		parse_gitmodules_oids(cmd.out, &gitmodules_oids);
+
 		close(cmd.out);
 
 		if (finish_command(&cmd))
@@ -1699,6 +1751,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	string_list_clear(&packfile_uris, 0);
 	FREE_AND_NULL(index_pack_args);
 
+	fsck_gitmodules_oids(&gitmodules_oids);
+
 	if (negotiator)
 		negotiator->release(negotiator);
 
diff --git a/fsck.c b/fsck.c
index f82e2fe9e3..04f3d342af 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1243,6 +1243,11 @@ int fsck_error_function(struct fsck_options *o,
 	return 1;
 }
 
+void register_found_gitmodules(const struct object_id *oid)
+{
+	oidset_insert(&gitmodules_found, oid);
+}
+
 int fsck_finish(struct fsck_options *options)
 {
 	int ret = 0;
@@ -1262,10 +1267,13 @@ int fsck_finish(struct fsck_options *options)
 		if (!buf) {
 			if (is_promisor_object(oid))
 				continue;
-			ret |= report(options,
-				      oid, OBJ_BLOB,
-				      FSCK_MSG_GITMODULES_MISSING,
-				      "unable to read .gitmodules blob");
+			if (options->print_dangling_gitmodules)
+				printf("%s\n", oid_to_hex(oid));
+			else
+				ret |= report(options,
+					      oid, OBJ_BLOB,
+					      FSCK_MSG_GITMODULES_MISSING,
+					      "unable to read .gitmodules blob");
 			continue;
 		}
 
diff --git a/fsck.h b/fsck.h
index 69cf715e79..4b8cf03445 100644
--- a/fsck.h
+++ b/fsck.h
@@ -41,6 +41,12 @@ struct fsck_options {
 	int *msg_type;
 	struct oidset skiplist;
 	kh_oid_map_t *object_names;
+
+	/*
+	 * If 1, print the hashes of missing .gitmodules blobs instead of
+	 * considering them to be errors.
+	 */
+	unsigned print_dangling_gitmodules:1;
 };
 
 #define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
@@ -62,6 +68,8 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
+void register_found_gitmodules(const struct object_id *oid);
+
 /*
  * Some fsck checks are context-dependent, and may end up queued; run this
  * after completing all fsck_object() calls in order to resolve any remaining
diff --git a/pack-write.c b/pack-write.c
index 3513665e1e..f66ea8e5a1 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -272,7 +272,7 @@ void fixup_pack_header_footer(int pack_fd,
 	fsync_or_die(pack_fd, pack_name);
 }
 
-char *index_pack_lockfile(int ip_out)
+char *index_pack_lockfile(int ip_out, int *is_well_formed)
 {
 	char packname[GIT_MAX_HEXSZ + 6];
 	const int len = the_hash_algo->hexsz + 6;
@@ -286,11 +286,17 @@ char *index_pack_lockfile(int ip_out)
 	 */
 	if (read_in_full(ip_out, packname, len) == len && packname[len-1] == '\n') {
 		const char *name;
+
+		if (is_well_formed)
+			*is_well_formed = 1;
 		packname[len-1] = 0;
 		if (skip_prefix(packname, "keep\t", &name))
 			return xstrfmt("%s/pack/pack-%s.keep",
 				       get_object_directory(), name);
+		return NULL;
 	}
+	if (is_well_formed)
+		*is_well_formed = 0;
 	return NULL;
 }
 
diff --git a/pack.h b/pack.h
index 9fc0945ac9..09cffec395 100644
--- a/pack.h
+++ b/pack.h
@@ -85,7 +85,7 @@ int verify_pack_index(struct packed_git *);
 int verify_pack(struct repository *, struct packed_git *, verify_fn fn, struct progress *, uint32_t);
 off_t write_pack_header(struct hashfile *f, uint32_t);
 void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t);
-char *index_pack_lockfile(int fd);
+char *index_pack_lockfile(int fd, int *is_well_formed);
 
 /*
  * The "hdr" output buffer should be at least this big, which will handle sizes
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 7d5b17909b..8b8fb43dbc 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -936,6 +936,53 @@ test_expect_success 'packfile-uri with transfer.fsckobjects fails on bad object'
 	test_i18ngrep "invalid author/committer line - missing email" error
 '
 
+test_expect_success 'packfile-uri with transfer.fsckobjects succeeds when .gitmodules is separate from tree' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo "[submodule libfoo]" >"$P/.gitmodules" &&
+	echo "path = include/foo" >>"$P/.gitmodules" &&
+	echo "url = git://example.com/git/lib.git" >>"$P/.gitmodules" &&
+	git -C "$P" add .gitmodules &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" .gitmodules >h &&
+
+	sane_unset GIT_TEST_SIDEBAND_ALL &&
+	git -c protocol.version=2 -c transfer.fsckobjects=1 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child &&
+
+	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
+	ls http_child/.git/objects/pack/* >filelist &&
+	test_line_count = 4 filelist
+'
+
+test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodules separate from tree is invalid' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child err &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo "[submodule \"..\"]" >"$P/.gitmodules" &&
+	echo "path = include/foo" >>"$P/.gitmodules" &&
+	echo "url = git://example.com/git/lib.git" >>"$P/.gitmodules" &&
+	git -C "$P" add .gitmodules &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" .gitmodules >h &&
+
+	sane_unset GIT_TEST_SIDEBAND_ALL &&
+	test_must_fail git -c protocol.version=2 -c transfer.fsckobjects=1 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child 2>err &&
+	test_i18ngrep "disallowed submodule name" err
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
 
-- 
2.30.0.280.ga3ce27912f-goog



* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
                     ` (3 preceding siblings ...)
  2021-01-24  2:34   ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
@ 2021-01-24  6:29   ` Junio C Hamano
  2021-01-28  0:35     ` Jonathan Tan
  2021-02-18 23:34   ` Junio C Hamano
  5 siblings, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-01-24  6:29 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> As part of this, index-pack has to output (1) the hash that goes into
> the name of the .pack/.idx file and (2) the hashes of all dangling
> .gitmodules. I just had (2) come after (1). If anyone has a better idea,
> I'm interested.

I have this feeling that the "blobs that need to be validated across
packs" will *not* be the last enhancement we'd need to make to the
output from index-pack to allow richer communication between it and
its invoker.  While there is no reason to change what the first line
of the output looks like, we'd probably want to make sure that the
future versions of Git can easily tell "list of blobs that require
further validation" from other additional information.

I am not comfortable recommending "ok, then let's add a delimiter
line '---\n' if/when we need to have something after the list of
blobs and append more stuff in future versions of Git", because we
may find a need to emit new kinds of info before the list of blobs
that need further validation, for example, in future versions of
Git.

Having said all that, the internal communication between
index-pack and its caller does not need as much care about
compatibility across versions as output visible to end-users, so
when a future version of Git needs to send different kinds of
information in different order from what you created here, we can do
so pretty much freely, I would guess.

Thanks.


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24  2:34   ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
@ 2021-01-24  7:56     ` Junio C Hamano
  2021-01-26  1:57       ` Junio C Hamano
  2021-01-24 12:18     ` Ævar Arnfjörð Bjarmason
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-01-24  7:56 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
> index 7d5b17909b..8b8fb43dbc 100755
> ...
> +	sane_unset GIT_TEST_SIDEBAND_ALL &&
> +	git -c protocol.version=2 -c transfer.fsckobjects=1 \
> +		-c fetch.uriprotocols=http,https \
> +		clone "$HTTPD_URL/smart/http_parent" http_child &&
> +
> +	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).

Ehh, please don't.  We may add multi-pack-index there, or perhaps
reverse index files in the future.  If you care about having two
packs logically because you are exercising the out-of-band
prepackaged packfile plus the dynamic transfer, make sure you have
two packs (and probably the idx files that go with them).  Don't
assume there will be one .idx each for them *AND* nothing else
there.

> +	ls http_child/.git/objects/pack/* >filelist &&
> +	test_line_count = 4 filelist
> +'

IOW,

	d=http_child/.git/objects/pack/
	ls "$d"/*.pack "$d"/*.idx >filelist &&
	test_line_count = 4 filelist

or something like that.


* Re: [PATCH 2/4] http-fetch: allow custom index-pack args
  2021-01-24  2:34   ` [PATCH 2/4] http-fetch: " Jonathan Tan
@ 2021-01-24 11:52     ` Ævar Arnfjörð Bjarmason
  2021-01-28  0:32       ` Jonathan Tan
  2021-02-16 20:49     ` Josh Steadmon
  1 sibling, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-01-24 11:52 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Sun, Jan 24 2021, Jonathan Tan wrote:

>  --packfile=<hash>::
> -	Instead of a commit id on the command line (which is not expected in
> +	For internal use only. Instead of a commit id on the command line (which is not expected in

Leaves the rest at ~79 and this long line at ~100. Perhaps a follow-up
change to re-word-wrap would be in order?


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24  2:34   ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
  2021-01-24  7:56     ` Junio C Hamano
@ 2021-01-24 12:18     ` Ævar Arnfjörð Bjarmason
  2021-01-28  1:03       ` Jonathan Tan
  2021-01-24 12:30     ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:27     ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-01-24 12:18 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Sun, Jan 24 2021, Jonathan Tan wrote:

> +void register_found_gitmodules(const struct object_id *oid)
> +{
> +	oidset_insert(&gitmodules_found, oid);
> +}
> +

In fsck.c we only use this variable to insert into it, or in fsck_blob()
to do the actual check, but then we either abort early if we've found
it, or right after that:

        if (object_on_skiplist(options, oid))
                return 0;

So (along with comments I have below...) you could just use the existing
"skiplist" option instead, no?

>  int fsck_finish(struct fsck_options *options)
>  {
>  	int ret = 0;
> @@ -1262,10 +1267,13 @@ int fsck_finish(struct fsck_options *options)
>  		if (!buf) {
>  			if (is_promisor_object(oid))
>  				continue;
> -			ret |= report(options,
> -				      oid, OBJ_BLOB,
> -				      FSCK_MSG_GITMODULES_MISSING,
> -				      "unable to read .gitmodules blob");
> +			if (options->print_dangling_gitmodules)
> +				printf("%s\n", oid_to_hex(oid));
> +			else
> +				ret |= report(options,
> +					      oid, OBJ_BLOB,
> +					      FSCK_MSG_GITMODULES_MISSING,
> +					      "unable to read .gitmodules blob");
>  			continue;
>  		}
>  
> diff --git a/fsck.h b/fsck.h
> index 69cf715e79..4b8cf03445 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -41,6 +41,12 @@ struct fsck_options {
>  	int *msg_type;
>  	struct oidset skiplist;
>  	kh_oid_map_t *object_names;
> +
> +	/*
> +	 * If 1, print the hashes of missing .gitmodules blobs instead of
> +	 * considering them to be errors.
> +	 */
> +	unsigned print_dangling_gitmodules:1;
>  };
>  
>  #define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
> @@ -62,6 +68,8 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
>  int fsck_object(struct object *obj, void *data, unsigned long size,
>  	struct fsck_options *options);
>  
> +void register_found_gitmodules(const struct object_id *oid);
> +
>  /*
>   * Some fsck checks are context-dependent, and may end up queued; run this
>   * after completing all fsck_object() calls in order to resolve any remaining


This whole thing seems just like the bad path I took in earlier rounds
of my in-flight mktag series. You don't need this new custom API. You
just set up an error handler for your fsck which ignores / prints / logs
/ whatever the OIDs you want if you get a FSCK_MSG_GITMODULES_MISSING
error, which you then "return 0" on.

If you don't have FSCK_MSG_GITMODULES_MISSING, punt and call
fsck_error_function().
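
To make the handler idea above concrete, here is a minimal,
self-contained sketch of its shape. It is not Git's fsck API: every
toy_* name, the enum and the fixed-size array are made up for
illustration, and a real handler would have to match the callback
signature declared in fsck.h.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for fsck message IDs (not Git's real ones). */
enum toy_msg_id { TOY_MSG_GITMODULES_MISSING, TOY_MSG_BAD_NAME };

#define TOY_MAX_DEFERRED 16
static char toy_deferred[TOY_MAX_DEFERRED][65];	/* hex OIDs to check later */
static int toy_deferred_nr;

/* Default handler: report the problem and count it as an error. */
static int toy_default_error(const char *oid_hex, enum toy_msg_id id,
			     const char *msg)
{
	(void)id;
	fprintf(stderr, "error: object %s: %s\n", oid_hex, msg);
	return 1;
}

/*
 * Custom handler: remember missing .gitmodules blobs and return 0 for
 * them; punt to the default handler for every other message, and also
 * when there is no room left to remember another OID.
 */
static int toy_deferring_error(const char *oid_hex, enum toy_msg_id id,
			       const char *msg)
{
	assert(oid_hex && msg);
	if (id != TOY_MSG_GITMODULES_MISSING ||
	    toy_deferred_nr >= TOY_MAX_DEFERRED)
		return toy_default_error(oid_hex, id, msg);
	snprintf(toy_deferred[toy_deferred_nr], sizeof(toy_deferred[0]),
		 "%s", oid_hex);
	toy_deferred_nr++;
	return 0;
}
```

The caller would then go through toy_deferred once all packs have been
indexed, instead of failing on the first pack that lacks the blob.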


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24  2:34   ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
  2021-01-24  7:56     ` Junio C Hamano
  2021-01-24 12:18     ` Ævar Arnfjörð Bjarmason
@ 2021-01-24 12:30     ` Ævar Arnfjörð Bjarmason
  2021-01-28  1:15       ` Jonathan Tan
  2021-02-17 19:27     ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-01-24 12:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Sun, Jan 24 2021, Jonathan Tan wrote:
>  --fsck-objects::
> -	Die if the pack contains broken objects. For internal use only.
> +	For internal use only.
> ++
> +Die if the pack contains broken objects. If the pack contains a tree
> +pointing to a .gitmodules blob that does not exist, prints the hash of
> +that blob (for the caller to check) after the hash that goes into the
> +name of the pack/idx file (see "Notes").

[I should have waited a bit and sent one E-Mail]

Is this really generally usable as an IPC mechanism? What if we need
another set of OIDs we care about? Shouldn't it at least be hidden
behind some option so you don't get a deluge of output from index-pack
if you're not in this packfile-uri mode?

But, along with my other E-Mail...

> [...]
> +static void parse_gitmodules_oids(int fd, struct oidset *gitmodules_oids)
> +{
> +	int len = the_hash_algo->hexsz + 1; /* hash + NL */
> +
> +	do {
> +		char hex_hash[GIT_MAX_HEXSZ + 1];
> +		int read_len = read_in_full(fd, hex_hash, len);
> +		struct object_id oid;
> +		const char *end;
> +
> +		if (!read_len)
> +			return;
> +		if (read_len != len)
> +			die("invalid length read %d", read_len);
> +		if (parse_oid_hex(hex_hash, &oid, &end) || *end != '\n')
> +			die("invalid hash");
> +		oidset_insert(gitmodules_oids, &oid);
> +	} while (1);
> +}
> +

Doesn't this IPC mechanism already exist in the form of fsck.skipList?
See my 1f3299fda9 (fsck: make fsck_config() re-usable, 2021-01-05) on
"next". I.e. as noted in my just-sent-E-Mail you could probably just
re-use skiplist as-is.

Or if not it seems to me that this whole IPC mechanism would be better
done with a tempfile and passing it along like we already pass the
fsck.skipList between these processes.

I doubt it's going to be large enough to matter; we could just put it in
.git/ somewhere, like we put gc.log etc (but created with a mktemp()
name...).

Or if we want to keep the "print <list> | process" model we can refactor
the existing fsck IPC noted in 1f3299fda9 a bit, so e.g. you pass some
version of "lines prefixed with 'fsck-skiplist: ' go into list xyz" via a
command-line option. And then existing option(s) and your potential new
list (which as noted, I think is probably redundant to the skiplist) can
use it.
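
As a rough illustration of that last variant, the reading side could
dispatch on a line prefix along these lines. This is only a sketch
under assumptions: the toy_* names are hypothetical, the "gitmodules: "
prefix is invented here, and only "fsck-skiplist: " comes from the
text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list kinds that a line of child output can feed. */
enum toy_list { TOY_LIST_NONE, TOY_LIST_SKIPLIST, TOY_LIST_GITMODULES };

struct toy_prefix {
	const char *prefix;
	enum toy_list list;
};

static const struct toy_prefix toy_prefixes[] = {
	{ "fsck-skiplist: ", TOY_LIST_SKIPLIST },
	{ "gitmodules: ", TOY_LIST_GITMODULES },	/* made-up prefix */
};

/*
 * Decide which list a line belongs to. On a match, *payload (if not
 * NULL) is set to point just past the prefix. Lines without a known
 * prefix yield TOY_LIST_NONE and leave *payload untouched.
 */
static enum toy_list toy_classify_line(const char *line, const char **payload)
{
	size_t i;

	assert(line);
	for (i = 0; i < sizeof(toy_prefixes) / sizeof(toy_prefixes[0]); i++) {
		size_t len = strlen(toy_prefixes[i].prefix);

		if (!strncmp(line, toy_prefixes[i].prefix, len)) {
			if (payload)
				*payload = line + len;
			return toy_prefixes[i].list;
		}
	}
	return TOY_LIST_NONE;
}
```

With something like this, a new kind of list only needs a new table
entry, and a reader that does not know a prefix can tell that apart
from a malformed line.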





* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24  7:56     ` Junio C Hamano
@ 2021-01-26  1:57       ` Junio C Hamano
  2021-01-28  1:04         ` Jonathan Tan
  0 siblings, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-01-26  1:57 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
>> diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
>> index 7d5b17909b..8b8fb43dbc 100755
>> ...
>> +	sane_unset GIT_TEST_SIDEBAND_ALL &&
>> +	git -c protocol.version=2 -c transfer.fsckobjects=1 \
>> +		-c fetch.uriprotocols=http,https \
>> +		clone "$HTTPD_URL/smart/http_parent" http_child &&
>> +
>> +	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
>
> Ehh, please don't.  We may add multi-pack-index there, or perhaps
> reverse index files in the future.  If you care about having two
> packs logically because you are exercising the out-of-band
> prepackaged packfile plus the dynamic transfer, make sure you have
> two packs (and probably the idx files that go with them).  Don't
> assume there will be one .idx each for them *AND* nothing else
> there.
>
>> +	ls http_child/.git/objects/pack/* >filelist &&
>> +	test_line_count = 4 filelist
>> +'
>
> IOW,
>
> 	d=http_child/.git/objects/pack/
> 	ls "$d"/*.pack "$d"/*.idx >filelist &&
> 	test_line_count = 4 filelist
>
> or something like that.

FYI, I have the following queued to make the tip of 'seen' pass the
tests.

---- >8 -------- >8 -------- >8 -------- >8 -------- >8 -------- >8 ----
From: Junio C Hamano <gitster@pobox.com>
Date: Mon, 25 Jan 2021 17:27:10 -0800
Subject: [PATCH] SQUASH??? test fix

---
 t/t5702-protocol-v2.sh | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 8b8fb43dbc..b1bc73a9a9 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -847,8 +847,9 @@ test_expect_success 'part of packfile response provided as URI' '
 	test -f hfound &&
 	test -f h2found &&
 
-	# Ensure that there are exactly 6 files (3 .pack and 3 .idx).
-	ls http_child/.git/objects/pack/* >filelist &&
+	# Ensure that there are exactly 3 packfiles with associated .idx
+	ls http_child/.git/objects/pack/*.pack \
+	    http_child/.git/objects/pack/*.idx >filelist &&
 	test_line_count = 6 filelist
 '
 
@@ -901,8 +902,9 @@ test_expect_success 'packfile-uri with transfer.fsckobjects' '
 		-c fetch.uriprotocols=http,https \
 		clone "$HTTPD_URL/smart/http_parent" http_child &&
 
-	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
-	ls http_child/.git/objects/pack/* >filelist &&
+	# Ensure that there are exactly 2 packfiles with associated .idx
+	ls http_child/.git/objects/pack/*.pack \
+	    http_child/.git/objects/pack/*.idx >filelist &&
 	test_line_count = 4 filelist
 '
 
@@ -956,8 +958,9 @@ test_expect_success 'packfile-uri with transfer.fsckobjects succeeds when .gitmo
 		-c fetch.uriprotocols=http,https \
 		clone "$HTTPD_URL/smart/http_parent" http_child &&
 
-	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
-	ls http_child/.git/objects/pack/* >filelist &&
+	# Ensure that there are exactly 2 packfiles with associated .idx
+	ls http_child/.git/objects/pack/*.pack \
+	    http_child/.git/objects/pack/*.idx >filelist &&
 	test_line_count = 4 filelist
 '
 
-- 
2.30.0-509-gbbf2750a06



* Re: [PATCH 2/4] http-fetch: allow custom index-pack args
  2021-01-24 11:52     ` Ævar Arnfjörð Bjarmason
@ 2021-01-28  0:32       ` Jonathan Tan
  0 siblings, 0 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-01-28  0:32 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git

> On Sun, Jan 24 2021, Jonathan Tan wrote:
> 
> >  --packfile=<hash>::
> > -	Instead of a commit id on the command line (which is not expected in
> > +	For internal use only. Instead of a commit id on the command line (which is not expected in
> 
> Leaves the rest at ~79 and this long line at ~100. Perhaps a follow-up
> change to re-word-wrap would be in order?

Hmm...I'll split that onto two lines then. I don't think it's worth the
extra commit in history to have it exactly wrapped right, so I'll forgo
the follow-up change for now.


* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-01-24  6:29   ` [PATCH 0/4] Check .gitmodules when using packfile URIs Junio C Hamano
@ 2021-01-28  0:35     ` Jonathan Tan
  2021-02-18 11:31       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 229+ messages in thread
From: Jonathan Tan @ 2021-01-28  0:35 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > As part of this, index-pack has to output (1) the hash that goes into
> > the name of the .pack/.idx file and (2) the hashes of all dangling
> > .gitmodules. I just had (2) come after (1). If anyone has a better idea,
> > I'm interested.
> 
> I have this feeling that the "blobs that need to be validated across
> packs" will *not* be the last enhancement we'd need to make to the
> output from index-pack to allow richer communication between it and
> its invoker.  While there is no reason to change what the first line
> of the output looks like, we'd probably want to make sure that the
> future versions of Git can easily tell "list of blobs that require
> further validation" from other additional information.
> 
> I am not comfortable recommending "ok, then let's add a delimiter
> line '---\n' if/when we need to have something after the list of
> blobs and append more stuff in future versions of Git", because we
> may find a need to emit new kinds of info before the list of blobs
> that need further validation, for example, in future versions of
> Git.
> 
> Having said all that, the internal communication between
> index-pack and its caller does not need as much care about
> compatibility across versions as output visible to end-users, so
> when a future version of Git needs to send different kinds of
> information in different order from what you created here, we can do
> so pretty much freely, I would guess.

Yeah, that's what I thought too - since this is an internal interface,
we can evolve both sides in lockstep. If we're really worried about the Git
binaries (on a user's system) getting out of sync, we could just make
sure that subsequent updates to this protocol are
non-backwards-compatible (e.g. have index-pack emit "foo <hash>", where
"foo" is a string that describes the new check, so that current
fetch-pack will reject "foo" since it is not a hash).
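
As a minimal illustration of that rejection (a sketch, not Git's code:
is_bare_hash_line() is a hypothetical helper, and the reader in this
series consumes fixed-length records rather than lines), a consumer that
accepts only bare hex hashes refuses a prefixed line such as
"foo <hash>":

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Git: return 1 if "line" consists of
 * exactly "hexsz" hex digits followed by a single newline, else 0.
 */
static int is_bare_hash_line(const char *line, size_t hexsz)
{
	size_t i;

	if (strlen(line) != hexsz + 1)
		return 0;
	for (i = 0; i < hexsz; i++)
		if (!isxdigit((unsigned char)line[i]))
			return 0;
	return line[hexsz] == '\n';
}
```

With hexsz set to 40, a plain 40-digit hash line is accepted, while the
same hash prefixed with "foo " fails the length check and is rejected.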

* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24 12:18     ` Ævar Arnfjörð Bjarmason
@ 2021-01-28  1:03       ` Jonathan Tan
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 229+ messages in thread
From: Jonathan Tan @ 2021-01-28  1:03 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git

> On Sun, Jan 24 2021, Jonathan Tan wrote:
> 
> > +void register_found_gitmodules(const struct object_id *oid)
> > +{
> > +	oidset_insert(&gitmodules_found, oid);
> > +}
> > +
> 
> In fsck.c we only use this variable to insert into it, or in fsck_blob()
> to do the actual check, but then we either abort early if we've found
> it, or right after that:

By "this variable", do you mean gitmodules_found? fsck_finish() consumes
it.

>         if (object_on_skiplist(options, oid))
>                 return 0;
> 
> So (along with comments I have below...) you could just use the existing
> "skiplist" option instead, no?

I don't understand this part (in particular, the part you quoted). About
"skiplist", I'll reply to your other email [1] which has more details.

[1] https://lore.kernel.org/git/87czxu7c15.fsf@evledraar.gmail.com/

> This whole thing seems just like the bad path I took in earlier rounds
> of my in-flight mktag series. You don't need this new custom API. You
> just set up an error handler for your fsck which ignores / prints / logs
> / whatever the OIDs you want if you get a FSCK_MSG_GITMODULES_MISSING
> error, which you then "return 0" on.
> 
> If you don't have FSCK_MSG_GITMODULES_MISSING punt and call
> fsck_error_function().

I tried that first, and the issue is that IDs like
FSCK_MSG_GITMODULES_MISSING are internal to fsck.c. As for whether we
should start exposing the IDs publicly, I think we should wait until a
few new cases like this come up, so that we more fully understand the
requirements first.

* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-26  1:57       ` Junio C Hamano
@ 2021-01-28  1:04         ` Jonathan Tan
  0 siblings, 0 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-01-28  1:04 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> > Ehh, please don't.  We may add multi-pack-index there, or perhaps
> > reverse index files in the future.  If you care about having two
> > packs logically because you are exercising the out-of-band
> > prepackaged packfile plus the dynamic transfer, make sure you have
> > two packs (and probably the idx files that go with them).  Don't
> > assume there will be one .idx each for them *AND* nothing else
> > there.
> >
> >> +	ls http_child/.git/objects/pack/* >filelist &&
> >> +	test_line_count = 4 filelist
> >> +'
> >
> > IOW,
> >
> > 	d=http_child/.git/objects/pack/
> > 	ls "$d"/*.pack "$d"/*.idx >filelist &&
> > 	test_line_count = 4 filelist
> >
> > or something like that.
> 
> FYI, I have the following queued to make the tip of 'seen' pass the
> tests.

[snip]

OK - I'll include these changes in the next version.

* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24 12:30     ` Ævar Arnfjörð Bjarmason
@ 2021-01-28  1:15       ` Jonathan Tan
  2021-02-17  2:10         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 229+ messages in thread
From: Jonathan Tan @ 2021-01-28  1:15 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git

> On Sun, Jan 24 2021, Jonathan Tan wrote:
> >  --fsck-objects::
> > -	Die if the pack contains broken objects. For internal use only.
> > +	For internal use only.
> > ++
> > +Die if the pack contains broken objects. If the pack contains a tree
> > +pointing to a .gitmodules blob that does not exist, prints the hash of
> > +that blob (for the caller to check) after the hash that goes into the
> > +name of the pack/idx file (see "Notes").
> 
> [I should have waited a bit and sent one E-Mail]
> 
> Is this really generally usable as an IPC mechanism, what if we need
> another set of OIDs we care about? Shouldn't it at least be hidden
> behind some option so you don't get a deluge of output from index-pack
> if you're not in this packfile-uri mode?

--fsck-objects is only for internal use, and it's only used by
fetch-pack.c. So its only consumer does want the output.

Junio also mentioned the possibility of another set of OIDs, and I
replied [1].

[1] https://lore.kernel.org/git/20210128003536.3874866-1-jonathantanmy@google.com/

> But, along with my other E-Mail...
> 
> > [...]
> > +static void parse_gitmodules_oids(int fd, struct oidset *gitmodules_oids)
> > +{
> > +	int len = the_hash_algo->hexsz + 1; /* hash + NL */
> > +
> > +	do {
> > +		char hex_hash[GIT_MAX_HEXSZ + 1];
> > +		int read_len = read_in_full(fd, hex_hash, len);
> > +		struct object_id oid;
> > +		const char *end;
> > +
> > +		if (!read_len)
> > +			return;
> > +		if (read_len != len)
> > +			die("invalid length read %d", read_len);
> > +		if (parse_oid_hex(hex_hash, &oid, &end) || *end != '\n')
> > +			die("invalid hash");
> > +		oidset_insert(gitmodules_oids, &oid);
> > +	} while (1);
> > +}
> > +
> 
> Doesn't this IPC mechanism already exist in the form of fsck.skipList?
> See my 1f3299fda9 (fsck: make fsck_config() re-usable, 2021-01-05) on
> "next". I.e. as noted in my just-sent-E-Mail you could probably just
> re-use skiplist as-is.

I'm not sure how fsck.skipList could be used here. Before running
fsck_finish() for the first time, we don't know which .gitmodules are
missing and which are not. And when running fsck_finish() for the second
time, we definitely do not want to skip any blobs.

> Or if not it seems to me that this whole IPC mechanism would be better
> done with a tempfile and passing it along like we already pass the
> fsck.skipList between these processes.
> 
> I doubt it's going to be large enough to matter, we could just put it in
> .git/ somewhere, like we put gc.log etc (but created with a mktemp()
> name...).
> 
> Or if we want to keep the "print <list> | process" model we can refactor
> the existing fsck IPC noted in 1f3299fda9 a bit, so e.g. you pass some
> version of "lines prefixed with "fsck-skiplist: " go into list xyz via a
> command-line option. And then existing option(s) and your potential new
> list (which as noted, I think is probably redundant to the skiplist) can
> use it.

I think using stdout is superior to using a tempfile - we don't have to
worry about interrupted invocations, for example.

What do you mean by "the existing fsck IPC noted in 1f3299fda9"? If you
mean the ability to pass a list of OIDs, for example using "-c
fsck.skipList=filename.txt", I'm not sure that it solves anything.
Firstly, I don't think that the skipList is useful here (as I said
earlier). And secondly, I don't think that OID input is the issue -
right now, the design is a process (index-pack, calling fsck_finish())
writing to its output which is then picked up by the calling process
(fetch-pack). We are not sending the dangling .gitmodules through stdin
anywhere.
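
The two-step flow described here can be sketched in isolation. This is
a toy model under stated assumptions, not the code from the series:
struct id_set stands in for an oidset, IDs are short strings rather
than real hashes, every function name is hypothetical, and the second
step only checks existence (the real second pass also runs fsck on the
blob contents).

```c
#include <string.h>

#define MAX_IDS 16

/* Toy stand-in for an oidset; IDs are short strings, not real hashes. */
struct id_set {
	const char *ids[MAX_IDS];
	int nr;
};

static int id_set_contains(const struct id_set *s, const char *id)
{
	int i;

	for (i = 0; i < s->nr; i++)
		if (!strcmp(s->ids[i], id))
			return 1;
	return 0;
}

static void id_set_add(struct id_set *s, const char *id)
{
	if (s->nr < MAX_IDS && !id_set_contains(s, id))
		s->ids[s->nr++] = id;
}

/*
 * Step 1 (per pack, the index-pack role): of the .gitmodules blobs
 * referenced by trees in this pack, record those that are not in the
 * same pack instead of failing right away.
 */
static void collect_dangling(const struct id_set *referenced,
			     const struct id_set *pack_objects,
			     struct id_set *dangling)
{
	int i;

	for (i = 0; i < referenced->nr; i++)
		if (!id_set_contains(pack_objects, referenced->ids[i]))
			id_set_add(dangling, referenced->ids[i]);
}

/*
 * Step 2 (once all packs are in, the fetch-pack role): count the
 * recorded IDs that are still absent from the complete object store.
 */
static int count_still_missing(const struct id_set *dangling,
			       const struct id_set *all_objects)
{
	int i, missing = 0;

	for (i = 0; i < dangling->nr; i++)
		if (!id_set_contains(all_objects, dangling->ids[i]))
			missing++;
	return missing;
}
```

A tree in one pack whose .gitmodules blob arrives in another pack is
recorded in step 1 and no longer counts as missing in step 2, which is
the case that currently makes the fetch fail.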

* Re: [PATCH 2/4] http-fetch: allow custom index-pack args
  2021-01-24  2:34   ` [PATCH 2/4] http-fetch: " Jonathan Tan
  2021-01-24 11:52     ` Ævar Arnfjörð Bjarmason
@ 2021-02-16 20:49     ` Josh Steadmon
  2021-02-16 22:57       ` Junio C Hamano
  1 sibling, 1 reply; 229+ messages in thread
From: Josh Steadmon @ 2021-02-16 20:49 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On 2021.01.23 18:34, Jonathan Tan wrote:
> This is the next step in teaching fetch-pack to pass its index-pack
> arguments when processing packfiles referenced by URIs.
> 
> The "--keep" in fetch-pack.c will be replaced with a full message in a
> subsequent commit.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  Documentation/git-http-fetch.txt |  9 ++++++--
>  fetch-pack.c                     |  1 +
>  http-fetch.c                     | 35 +++++++++++++++++++++++++++-----
>  t/t5550-http-fetch-dumb.sh       |  3 ++-
>  4 files changed, 40 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/git-http-fetch.txt b/Documentation/git-http-fetch.txt
> index 4deb4893f5..aa171088e8 100644
> --- a/Documentation/git-http-fetch.txt
> +++ b/Documentation/git-http-fetch.txt
> @@ -41,11 +41,16 @@ commit-id::
>  		<commit-id>['\t'<filename-as-in--w>]
>  
>  --packfile=<hash>::
> -	Instead of a commit id on the command line (which is not expected in
> +	For internal use only. Instead of a commit id on the command line (which is not expected in
>  	this case), 'git http-fetch' fetches the packfile directly at the given
>  	URL and uses index-pack to generate corresponding .idx and .keep files.
>  	The hash is used to determine the name of the temporary file and is
> -	arbitrary. The output of index-pack is printed to stdout.
> +	arbitrary. The output of index-pack is printed to stdout. Requires
> +	--index-pack-args.
> +
> +--index-pack-args=<args>::
> +	For internal use only. The command to run on the contents of the
> +	downloaded pack. Arguments are URL-encoded separated by spaces.

I'm a bit skeptical of using URL encoding to work around embedded
spaces. I believe in Emily's config-based hooks series, she wrote an
argument parser to pull repeated arguments into a strvec, could you do
something like that here?

I'm sympathetic to the idea that since this is an internal-only flag, we
can be a bit weird with the argument format, though.
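
For reference, the decoding side of such a scheme is small. The
following is a self-contained sketch, not the url_percent_decode() the
patch calls: percent_decode() and hexval() are hypothetical names, and
the sample argument in the note below is made up for illustration.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of hex digit "c", or -1 if it is not a hex digit. */
static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = tolower(c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Hypothetical helper: return a newly allocated copy of "in" with each
 * well-formed "%XX" escape replaced by the byte it encodes. Malformed
 * escapes are copied through unchanged. The caller frees the result.
 */
static char *percent_decode(const char *in)
{
	char *out = malloc(strlen(in) + 1);
	char *p = out;

	if (!out)
		return NULL;
	while (*in) {
		int hi, lo;

		if (in[0] == '%' &&
		    (hi = hexval((unsigned char)in[1])) >= 0 &&
		    (lo = hexval((unsigned char)in[2])) >= 0) {
			*p++ = (char)(hi * 16 + lo);
			in += 3;
		} else {
			*p++ = *in++;
		}
	}
	*p = '\0';
	return out;
}
```

An argument such as "--keep=fetch-pack%20123%20on%20host" survives the
split on spaces as one token and decodes to a single argument with
embedded spaces.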

>  --recover::
>  	Verify that everything reachable from target is fetched.  Used after
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 876f90c759..274ae602f7 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1645,6 +1645,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  		strvec_pushf(&cmd.args, "--packfile=%.*s",
>  			     (int) the_hash_algo->hexsz,
>  			     packfile_uris.items[i].string);
> +		strvec_push(&cmd.args, "--index-pack-args=index-pack --stdin --keep");
>  		strvec_push(&cmd.args, uri);
>  		cmd.git_cmd = 1;
>  		cmd.no_stdin = 1;
> diff --git a/http-fetch.c b/http-fetch.c
> index 2d1d9d054f..12feb84e71 100644
> --- a/http-fetch.c
> +++ b/http-fetch.c
> @@ -3,6 +3,7 @@
>  #include "exec-cmd.h"
>  #include "http.h"
>  #include "walker.h"
> +#include "strvec.h"
>  
>  static const char http_fetch_usage[] = "git http-fetch "
>  "[-c] [-t] [-a] [-v] [--recover] [-w ref] [--stdin | --packfile=hash | commit-id] url";
> @@ -43,11 +44,9 @@ static int fetch_using_walker(const char *raw_url, int get_verbosely,
>  	return rc;
>  }
>  
> -static const char *index_pack_args[] =
> -	{"index-pack", "--stdin", "--keep", NULL};
> -
>  static void fetch_single_packfile(struct object_id *packfile_hash,
> -				  const char *url) {
> +				  const char *url,
> +				  const char **index_pack_args) {
>  	struct http_pack_request *preq;
>  	struct slot_results results;
>  	int ret;
> @@ -90,6 +89,7 @@ int cmd_main(int argc, const char **argv)
>  	int packfile = 0;
>  	int nongit;
>  	struct object_id packfile_hash;
> +	const char *index_pack_args = NULL;
>  
>  	setup_git_directory_gently(&nongit);
>  
> @@ -116,6 +116,8 @@ int cmd_main(int argc, const char **argv)
>  			packfile = 1;
>  			if (parse_oid_hex(p, &packfile_hash, &end) || *end)
>  				die(_("argument to --packfile must be a valid hash (got '%s')"), p);
> +		} else if (skip_prefix(argv[arg], "--index-pack-args=", &p)) {
> +			index_pack_args = p;
>  		}
>  		arg++;
>  	}
> @@ -128,10 +130,33 @@ int cmd_main(int argc, const char **argv)
>  	git_config(git_default_config, NULL);
>  
>  	if (packfile) {
> -		fetch_single_packfile(&packfile_hash, argv[arg]);
> +		struct strvec encoded = STRVEC_INIT;
> +		char **raw;
> +		int i;
> +
> +		if (!index_pack_args)
> +			die(_("--packfile requires --index-pack-args"));
> +
> +		strvec_split(&encoded, index_pack_args);
> +
> +		CALLOC_ARRAY(raw, encoded.nr + 1);
> +		for (i = 0; i < encoded.nr; i++)
> +			raw[i] = url_percent_decode(encoded.v[i]);
> +
> +		fetch_single_packfile(&packfile_hash, argv[arg],
> +				      (const char **) raw);
> +
> +		for (i = 0; i < encoded.nr; i++)
> +			free(raw[i]);
> +		free(raw);
> +		strvec_clear(&encoded);
> +
>  		return 0;
>  	}
>  
> +	if (index_pack_args)
> +		die(_("--index-pack-args can only be used with --packfile"));
> +
>  	if (commits_on_stdin) {
>  		commits = walker_targets_stdin(&commit_id, &write_ref);
>  	} else {
> diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
> index 483578b2d7..af90e7efed 100755
> --- a/t/t5550-http-fetch-dumb.sh
> +++ b/t/t5550-http-fetch-dumb.sh
> @@ -224,7 +224,8 @@ test_expect_success 'http-fetch --packfile' '
>  
>  	git init packfileclient &&
>  	p=$(cd "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git && ls objects/pack/pack-*.pack) &&
> -	git -C packfileclient http-fetch --packfile=$ARBITRARY "$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
> +	git -C packfileclient http-fetch --packfile=$ARBITRARY \
> +		--index-pack-args="index-pack --stdin --keep" "$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
>  
>  	grep "^keep.[0-9a-f]\{16,\}$" out &&
>  	cut -c6- out >packhash &&
> -- 
> 2.30.0.280.ga3ce27912f-goog
> 

* Re: [PATCH 2/4] http-fetch: allow custom index-pack args
  2021-02-16 20:49     ` Josh Steadmon
@ 2021-02-16 22:57       ` Junio C Hamano
  2021-02-17 19:46         ` Jonathan Tan
  0 siblings, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-02-16 22:57 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: Jonathan Tan, git

Josh Steadmon <steadmon@google.com> writes:

>> +--index-pack-args=<args>::
>> +	For internal use only. The command to run on the contents of the
>> +	downloaded pack. Arguments are URL-encoded separated by spaces.
>
> I'm a bit skeptical of using URL encoding to work around embedded
> spaces. I believe in Emily's config-based hooks series, she wrote an
> argument parser to pull repeated arguments into a strvec, could you do
> something like that here?
>
> I'm sympathetic to the idea that since this is an internal-only flag, we
> can be a bit weird with the argument format, though.

We tend to prefer quote.c::sq_quote*() suite of quoting; does this
codepath have very different constraints that require different
encoding?

Thanks.

* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-28  1:03       ` Jonathan Tan
@ 2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                             ` (15 more replies)
  0 siblings, 16 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17  1:48 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Thu, Jan 28 2021, Jonathan Tan wrote:

Sorry I managed to miss this at the time. Hopefully a late reply is
better than never.

>> On Sun, Jan 24 2021, Jonathan Tan wrote:
>> 
>> > +void register_found_gitmodules(const struct object_id *oid)
>> > +{
>> > +	oidset_insert(&gitmodules_found, oid);
>> > +}
>> > +
>> 
>> In fsck.c we only use this variable to insert into it, or in fsck_blob()
>> to do the actual check, but then we either abort early if we've found
>> it, or right after that:
>
> By "this variable", do you mean gitmodules_found? fsck_finish() consumes
> it.

Yes, consumes it to emit errors with report(), no?

>>         if (object_on_skiplist(options, oid))
>>                 return 0;
>> 
>> So (along with comments I have below...) you could just use the existing
>> "skiplist" option instead, no?
>
> I don't understand this part (in particular, the part you quoted). About
> "skiplist", I'll reply to your other email [1] which has more details.
>
> [1] https://lore.kernel.org/git/87czxu7c15.fsf@evledraar.gmail.com/

*nod*

>> This whole thing seems just like the bad path I took in earlier rounds
>> of my in-flight mktag series. You don't need this new custom API. You
>> just set up an error handler for your fsck which ignores / prints / logs
>> / whatever the OIDs you want if you get a FSCK_MSG_GITMODULES_MISSING
>> error, which you then "return 0" on.
>> 
>> If you don't have FSCK_MSG_GITMODULES_MISSING punt and call
>> fsck_error_function().
>
> I tried that first, and the issue is that IDs like
> FSCK_MSG_GITMODULES_MISSING are internal to fsck.c. As for whether we
> should start exposing the IDs publicly, I think we should wait until a
> few new cases like this come up, so that we more fully understand the
> requirements first.

The requirement is that you want the object IDs we'd otherwise error
about in fsck_finish(). Yeah, we don't pass the "fsck_msg_id" down in the
"report()" function, but you can reliably strstr() it out of the
message. We document & hard rely on that already, since it's also a
config key.

But yeah, we could just change the report function to pass down the id
and move the relevant macros from fsck.c to fsck.h. I think that would
be a smaller change conceptually than a special-case flag in
fsck_options for something we could otherwise do with the error
reporting.
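
A rough sketch of the strstr() variant, to show how little it needs.
It assumes the message text carries the camelCased ID followed by ": ",
as in the "gitmodulesMissing: unable to read .gitmodules blob" output
quoted at the start of this thread; message_has_id() is a hypothetical
helper, not an existing fsck function.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if an fsck message names the given
 * camelCased message ID (e.g. "gitmodulesMissing"), else 0. Assumes
 * the ID appears in the text immediately followed by ": ". Only the
 * first occurrence of the ID is examined.
 */
static int message_has_id(const char *message, const char *id)
{
	const char *p = strstr(message, id);
	size_t len = strlen(id);

	return p && !strncmp(p + len, ": ", 2);
}
```

An error callback could test the message this way, stash the object ID
and return 0 on a match, and otherwise defer to the default handler.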


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-28  1:15       ` Jonathan Tan
@ 2021-02-17  2:10         ` Ævar Arnfjörð Bjarmason
  2021-02-17 20:10           ` Jonathan Tan
  0 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17  2:10 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Thu, Jan 28 2021, Jonathan Tan wrote:

>> On Sun, Jan 24 2021, Jonathan Tan wrote:
>> >  --fsck-objects::
>> > -	Die if the pack contains broken objects. For internal use only.
>> > +	For internal use only.
>> > ++
>> > +Die if the pack contains broken objects. If the pack contains a tree
>> > +pointing to a .gitmodules blob that does not exist, prints the hash of
>> > +that blob (for the caller to check) after the hash that goes into the
>> > +name of the pack/idx file (see "Notes").
>> 
>> [I should have waited a bit and sent one E-Mail]
>> 
>> Is this really generally usable as an IPC mechanism, what if we need
>> another set of OIDs we care about? Shouldn't it at least be hidden
>> behind some option so you don't get a deluge of output from index-pack
>> if you're not in this packfile-uri mode?
>
> --fsck-objects is only for internal use, and it's only used by
> fetch-pack.c. So its only consumer does want the output.
>
> Junio also mentioned the possibility of another set of OIDs, and I
> replied [1].
>
> [1] https://lore.kernel.org/git/20210128003536.3874866-1-jonathantanmy@google.com/
>
>> But, along with my other E-Mail...
>> 
>> > [...]
>> > +static void parse_gitmodules_oids(int fd, struct oidset *gitmodules_oids)
>> > +{
>> > +	int len = the_hash_algo->hexsz + 1; /* hash + NL */
>> > +
>> > +	do {
>> > +		char hex_hash[GIT_MAX_HEXSZ + 1];
>> > +		int read_len = read_in_full(fd, hex_hash, len);
>> > +		struct object_id oid;
>> > +		const char *end;
>> > +
>> > +		if (!read_len)
>> > +			return;
>> > +		if (read_len != len)
>> > +			die("invalid length read %d", read_len);
>> > +		if (parse_oid_hex(hex_hash, &oid, &end) || *end != '\n')
>> > +			die("invalid hash");
>> > +		oidset_insert(gitmodules_oids, &oid);
>> > +	} while (1);
>> > +}
>> > +
>> 
>> Doesn't this IPC mechanism already exist in the form of fsck.skipList?
>> See my 1f3299fda9 (fsck: make fsck_config() re-usable, 2021-01-05) on
>> "next". I.e. as noted in my just-sent-E-Mail you could probably just
>> re-use skiplist as-is.
>
> I'm not sure how fsck.skipList could be used here. Before running
> fsck_finish() for the first time, we don't know which .gitmodules are
> missing and which are not. And when running fsck_finish() for the second
> time, we definitely do not want to skip any blobs.
>
>> Or if not it seems to me that this whole IPC mechanism would be better
>> done with a tempfile and passing it along like we already pass the
>> fsck.skipList between these processes.
>> 
>> I doubt it's going to be large enough to matter, we could just put it in
>> .git/ somewhere, like we put gc.log etc (but created with a mktemp()
>> name...).
>> 
>> Or if we want to keep the "print <list> | process" model we can refactor
>> the existing fsck IPC noted in 1f3299fda9 a bit, so e.g. you pass some
>> version of "lines prefixed with "fsck-skiplist: " go into list xyz via a
>> command-line option. And then existing option(s) and your potential new
>> list (which as noted, I think is probably redundant to the skiplist) can
>> use it.
>
> I think using stdout is superior to using a tempfile - we don't have to
> worry about interrupted invocations, for example.
>
> What do you mean by "the existing fsck IPC noted in 1f3299fda9"? If you
> mean the ability to pass a list of OIDs, for example using "-c
> fsck.skipList=filename.txt", I'm not sure that it solves anything.
> Firstly, I don't think that the skipList is useful here (as I said
> earlier). And secondly, I don't think that OID input is the issue -
> right now, the design is a process (index-pack, calling fsck_finish())
> writing to its output which is then picked up by the calling process
> (fetch-pack). We are not sending the dangling .gitmodules through stdin
> anywhere.

Sorry for being unclear here. I don't think (honestly I don't remember,
it's been almost a month) that I meant that you should use the skipList.

Looking at that code again, we use object_on_skiplist() to do an early
punt in report(), but also in fsck_blob(). Presumably you never want the
latter, and that early punting wouldn't be needed if your report()
function intercepted the .gitmodules blob ID for stashing it away / later
reporting / whatever.

So yeah, I'm 99% sure now that's not what I meant :)

What I meant with:

    Or if we want to keep the "print <list> | process"[...]

Is that we have an existing ad-hoc IPC model for these commands in
passing along the skipList, which is made more complex because sometimes
the initial process reads the file, sometimes it passes it along as-is
to the child.

And then there's this patch that passes OIDs too, but through a
different mechanism.

I was suggesting that perhaps it made more sense to refactor both so
they could use the same mechanism, because we're potentially passing two
lists of OIDs between the two. Just one goes via line-at-a-time in the
output, the other via a config option on the command-line.
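
To make the shared-mechanism idea concrete, here is a minimal sketch of
routing prefixed lines to different lists. Everything in it is
hypothetical: classify_line() is not an existing function, and only the
"fsck-skiplist: " prefix comes from the earlier mail, while
"fsck-gitmodules: " is invented for the example.

```c
#include <string.h>

enum line_kind { LINE_UNKNOWN, LINE_SKIPLIST, LINE_GITMODULES };

/*
 * Hypothetical helper: classify "line" by its prefix and point
 * "*payload" at the text after the prefix. Unknown lines yield
 * LINE_UNKNOWN and a NULL payload.
 */
static enum line_kind classify_line(const char *line, const char **payload)
{
	static const struct {
		const char *prefix;
		enum line_kind kind;
	} table[] = {
		{ "fsck-skiplist: ", LINE_SKIPLIST },
		{ "fsck-gitmodules: ", LINE_GITMODULES }, /* invented prefix */
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		size_t n = strlen(table[i].prefix);

		if (!strncmp(line, table[i].prefix, n)) {
			*payload = line + n;
			return table[i].kind;
		}
	}
	*payload = NULL;
	return LINE_UNKNOWN;
}
```

A reader built on this would append each payload to the list chosen by
the returned kind, so both sets of OIDs could travel over one channel.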

* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24  2:34   ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
                       ` (2 preceding siblings ...)
  2021-01-24 12:30     ` Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:27     ` Ævar Arnfjörð Bjarmason
  2021-02-17 20:11       ` Jonathan Tan
  3 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:27 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Sun, Jan 24 2021, Jonathan Tan wrote:

> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 557bd2f348..f995c15115 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -1888,8 +1888,13 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
>  	else
>  		close(input_fd);
>  
> -	if (do_fsck_object && fsck_finish(&fsck_options))
> -		die(_("fsck error in pack objects"));
> +	if (do_fsck_object) {
> +		struct fsck_options fo = FSCK_OPTIONS_STRICT;
> +
> +		fo.print_dangling_gitmodules = 1;
> +		if (fsck_finish(&fo))
> +			die(_("fsck error in pack objects"));
> +	}
> [...]
> +static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
> +{
> +	struct oidset_iter iter;
> +	const struct object_id *oid;
> +	struct fsck_options fo = FSCK_OPTIONS_STRICT;
> +
> +	if (!oidset_size(gitmodules_oids))
> +		return;
> +
> +	oidset_iter_init(gitmodules_oids, &iter);
> +	while ((oid = oidset_iter_next(&iter)))
> +		register_found_gitmodules(oid);
> +	if (fsck_finish(&fo))
> +		die("fsck failed");
> +}
> +

What's the need for STRICT here & can't the former use the existing
fsck_options in index-pack.c? With this on top we pass all tests:

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 18531199242..5464edf4778 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1933,10 +1933,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 		close(input_fd);
 
 	if (do_fsck_object) {
-		struct fsck_options fo = FSCK_OPTIONS_STRICT;
-
-		fo.print_dangling_gitmodules = 1;
-		if (fsck_finish(&fo))
+		fsck_options.print_dangling_gitmodules = 1;
+		if (fsck_finish(&fsck_options))
 			die(_("fsck error in pack objects"));
 	}
 
diff --git a/fetch-pack.c b/fetch-pack.c
index 0a337a04f1f..a8754d97e3d 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -997,7 +997,7 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 {
 	struct oidset_iter iter;
 	const struct object_id *oid;
-	struct fsck_options fo = FSCK_OPTIONS_STRICT;
+	struct fsck_options fo = FSCK_OPTIONS_DEFAULT;
 
 	if (!oidset_size(gitmodules_oids))
 		return;

* [PATCH 00/14] fsck: API improvements
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 21:02             ` Junio C Hamano
                               ` (11 more replies)
  2021-02-17 19:42           ` [PATCH 01/14] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
                             ` (14 subsequent siblings)
  15 siblings, 12 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Jonathan Tan pointed out that the fsck error_func doesn't pass you the
ID of the fsck failure in [1]. This series improves the API so it
does, and moves the gitmodules_{found,done} variables into the
fsck_options struct.

The result is that instead of the "print_dangling_gitmodules" member
in that series we can just implement that with the diff at the end of
this cover letter (goes on top of a merge of this series & "seen"),
and without any changes to fsck_finish().

This conflicts with other in-flight fsck changes but the conflict is
rather trivial. Jeff King has another concurrent series to add a
couple of new fsck checks, those need to be moved to fsck.h, and
there's another trivial conflict in 2 hunks due to the
gitmodules_{found,done} move.

1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/

Ævar Arnfjörð Bjarmason (14):
  fsck.h: indent arguments to of fsck_set_msg_type
  fsck.h: use use "enum object_type" instead of "int"
  fsck.c: rename variables in fsck_set_msg_type() for less confusion
  fsck.c: move definition of msg_id into append_msg_id()
  fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  fsck.c: undefine temporary STR macro after use
  fsck.c: give "FOREACH_MSG_ID" a more specific name
  fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
  fsck.c: pass along the fsck_msg_id in the fsck_error callback
  fsck.c: add an fsck_set_msg_type() API that takes enums
  fsck.h: update FSCK_OPTIONS_* for object_name
  fsck.c: move gitmodules_{found,done} into fsck_options

 builtin/fsck.c           |   7 +-
 builtin/index-pack.c     |   3 +-
 builtin/mktag.c          |   7 +-
 builtin/unpack-objects.c |   3 +-
 fsck.c                   | 160 ++++++++++++---------------------------
 fsck.h                   |  98 +++++++++++++++++++++---
 6 files changed, 152 insertions(+), 126 deletions(-)

-- 

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 82f381f854..22dfcfc5de 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1713,6 +1713,20 @@ static void show_pack_info(int stat_only)
 	}
 }
 
+static int index_pack_fsck_error_func(struct fsck_options *o,
+				      const struct object_id *oid,
+				      enum object_type object_type,
+				      enum fsck_msg_type msg_type,
+				      enum fsck_msg_id msg_id,
+				      const char *message)
+{
+	if (msg_id == FSCK_MSG_GITMODULES_MISSING) {
+		puts(oid_to_hex(oid));
+		return 0;
+	}
+	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
+}
+
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, rev_index;
@@ -1934,10 +1948,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 		close(input_fd);
 
 	if (do_fsck_object) {
-		struct fsck_options fo = FSCK_OPTIONS_STRICT;
-
-		fo.print_dangling_gitmodules = 1;
-		if (fsck_finish(&fo))
+		fsck_options.error_func = index_pack_fsck_error_func;
+		if (fsck_finish(&fsck_options))
 			die(_("fsck error in pack objects"));
 	}
 
diff --git a/fetch-pack.c b/fetch-pack.c
index 0a337a04f1..9fc2ce86e4 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -40,6 +40,7 @@ static struct shallow_lock shallow_lock;
 static const char *alternate_shallow_file;
 static struct strbuf fsck_msg_types = STRBUF_INIT;
 static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
+static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 
 /* Remember to update object flag allocation in object.h */
 #define COMPLETE	(1U << 0)
@@ -993,19 +994,34 @@ static int cmp_ref_by_name(const void *a_, const void *b_)
 	return strcmp(a->name, b->name);
 }
 
+static int fetch_pack_fsck_error_func(struct fsck_options *o,
+				      const struct object_id *oid,
+				      enum object_type object_type,
+				      enum fsck_msg_type msg_type,
+				      enum fsck_msg_id msg_id,
+				      const char *message)
+{
+	if (msg_id == FSCK_MSG_GITMODULES_MISSING) {
+		puts(oid_to_hex(oid));
+		return 0;
+	}
+	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
+}
+
 static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 {
 	struct oidset_iter iter;
 	const struct object_id *oid;
-	struct fsck_options fo = FSCK_OPTIONS_STRICT;
 
 	if (!oidset_size(gitmodules_oids))
 		return;
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(oid);
-	if (fsck_finish(&fo))
+		oidset_insert(&fsck_options.gitmodules_found, oid);
+
+	fsck_options.error_func = fetch_pack_fsck_error_func;
+	if (fsck_finish(&fsck_options))
 		die("fsck failed");
 }
 
2.30.0.284.gd98b1dd5eaa7


* [PATCH 01/14] fsck.h: indent arguments to of fsck_set_msg_type
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 02/14] fsck.h: use use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
                             ` (13 subsequent siblings)
  15 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fsck.h b/fsck.h
index 423c467feb7..df0b64a2163 100644
--- a/fsck.h
+++ b/fsck.h
@@ -11,7 +11,7 @@ struct fsck_options;
 struct object;
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type);
+		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
 int is_valid_msg_type(const char *msg_id, const char *msg_type);
 
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 02/14] fsck.h: use "enum object_type" instead of "int"
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 01/14] fsck.h: indent arguments of fsck_set_msg_type Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 23:40             ` Junio C Hamano
  2021-02-17 19:42           ` [PATCH 03/14] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
                             ` (12 subsequent siblings)
  15 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the fsck_walk_func to use an "enum object_type" instead of an
"int" type. The types are compatible, and ever since this was added in
355885d5315 (add generic, type aware object chain walker, 2008-02-25)
we've used entries from object_type (OBJ_BLOB etc.).

So this doesn't really change anything as far as the generated code is
concerned; it just gives the compiler more information and makes the
code easier to read.
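
As a minimal sketch of the idea (not code from this series), a walker
callback typed with an enum might look like the following. All demo_*
names and the enum values are hypothetical stand-ins, not git's own
object_type or fsck_walk_func:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an object-type enum; values are illustrative. */
enum demo_object_type {
	DEMO_OBJ_COMMIT = 1,
	DEMO_OBJ_TREE,
	DEMO_OBJ_BLOB,
	DEMO_OBJ_TAG
};

/* Walker callback typed with the enum rather than a bare "int". */
typedef int (*demo_walk_func)(const char *name,
			      enum demo_object_type object_type,
			      void *data);

/* Counts blobs; "data" points to an int counter owned by the caller. */
static int count_blobs(const char *name, enum demo_object_type object_type,
		       void *data)
{
	int *count = data;

	assert(name && count);
	if (object_type == DEMO_OBJ_BLOB)
		(*count)++;
	return 0;
}

/* Calls "fn" for each entry and stops at the first non-zero result. */
static int demo_walk(const char *const *names,
		     const enum demo_object_type *types, size_t nr,
		     demo_walk_func fn, void *data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int ret = fn(names[i], types[i], data);

		if (ret)
			return ret;
	}
	return 0;
}
```

With the enum in the signature, a debugger can show the symbolic name,
and a "switch" over the parameter can be checked for missing cases by
compilers that support such a warning.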

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c           | 3 ++-
 builtin/index-pack.c     | 3 ++-
 builtin/unpack-objects.c | 3 ++-
 fsck.h                   | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c70..68f0329e69e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -197,7 +197,8 @@ static int traverse_reachable(void)
 	return !!result;
 }
 
-static int mark_used(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_used(struct object *obj, enum object_type object_type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return 1;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 54f74c48741..2f291a14d4a 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -212,7 +212,8 @@ static void cleanup_thread(void)
 	free(thread_data);
 }
 
-static int mark_link(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_link(struct object *obj, enum object_type type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return -1;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index dd4a75e030d..ca54fd16688 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -187,7 +187,8 @@ static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
  * that have reachability requirements and calls this function.
  * Verify its reachability and validity recursively and write it out.
  */
-static int check_object(struct object *obj, int type, void *data, struct fsck_options *options)
+static int check_object(struct object *obj, enum object_type type,
+			void *data, struct fsck_options *options)
 {
 	struct obj_buffer *obj_buf;
 
diff --git a/fsck.h b/fsck.h
index df0b64a2163..0c75789d219 100644
--- a/fsck.h
+++ b/fsck.h
@@ -23,7 +23,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type);
  *     <0	error signaled and abort
  *     >0	error signaled and do not abort
  */
-typedef int (*fsck_walk_func)(struct object *obj, int type, void *data, struct fsck_options *options);
+typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
+			      void *data, struct fsck_options *options);
 
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 03/14] fsck.c: rename variables in fsck_set_msg_type() for less confusion
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (2 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 02/14] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 04/14] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
                             ` (11 subsequent siblings)
  15 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename variables in a function added in 0282f4dced0 (fsck: offer a
function to demote fsck errors to warnings, 2015-06-22).

It was needlessly confusing that it took a "msg_type" argument, but
then later declared another "msg_type" of a different type.

Let's rename that to "tmp", and rename "id" to "msg_id" and "msg_id"
to "msg_id_str" etc. This will make a follow-up change smaller.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fsck.c b/fsck.c
index 4b7f0b73d73..acccad243ec 100644
--- a/fsck.c
+++ b/fsck.c
@@ -203,27 +203,27 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 }
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type)
+		const char *msg_id_str, const char *msg_type_str)
 {
-	int id = parse_msg_id(msg_id), type;
+	int msg_id = parse_msg_id(msg_id_str), msg_type;
 
-	if (id < 0)
-		die("Unhandled message id: %s", msg_id);
-	type = parse_msg_type(msg_type);
+	if (msg_id < 0)
+		die("Unhandled message id: %s", msg_id_str);
+	msg_type = parse_msg_type(msg_type_str);
 
-	if (type != FSCK_ERROR && msg_id_info[id].msg_type == FSCK_FATAL)
-		die("Cannot demote %s to %s", msg_id, msg_type);
+	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
+		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
 	if (!options->msg_type) {
 		int i;
-		int *msg_type;
-		ALLOC_ARRAY(msg_type, FSCK_MSG_MAX);
+		int *tmp;
+		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
-			msg_type[i] = fsck_msg_type(i, options);
-		options->msg_type = msg_type;
+			tmp[i] = fsck_msg_type(i, options);
+		options->msg_type = tmp;
 	}
 
-	options->msg_type[id] = type;
+	options->msg_type[msg_id] = msg_type;
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 04/14] fsck.c: move definition of msg_id into append_msg_id()
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (3 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 03/14] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 05/14] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
                             ` (10 subsequent siblings)
  15 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Refactor code added in 71ab8fa840f (fsck: report the ID of the
error/warning, 2015-06-22) to resolve the msg_id to a string in the
function that wants it, instead of doing it in report().
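
For illustration only, here is a standalone sketch of the same
conversion that the loop in append_msg_id() performs: lower-case every
character, except that an underscore is dropped and the character after
it is kept as-is. The fmt_* names, the two-entry table and the
fixed-size output buffer are assumptions made for this sketch; the real
function appends to a strbuf and looks the string up in msg_id_info[].

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical two-entry table standing in for msg_id_info[]. */
enum fmt_msg_id {
	FMT_MSG_NUL_IN_HEADER,
	FMT_MSG_GITMODULES_MISSING,
	FMT_MSG_MAX
};

static const char *const fmt_id_string[FMT_MSG_MAX] = {
	"NUL_IN_HEADER",
	"GITMODULES_MISSING",
};

/*
 * Writes the camelCased name of "msg_id" into "out" (NUL-terminated).
 * Returns the number of characters written, or -1 if "out" is too small.
 */
static int fmt_msg_id_name(char *out, size_t outsz, enum fmt_msg_id msg_id)
{
	const char *s;
	size_t n = 0;

	assert(msg_id < FMT_MSG_MAX);
	s = fmt_id_string[msg_id];

	while (*s) {
		char c = *s++;

		if (c == '_') {
			/* Drop the underscore, keep the next letter as-is. */
			assert(*s);
			c = *s++;
		} else {
			c = (char)tolower((unsigned char)c);
		}
		if (n + 1 >= outsz)
			return -1; /* no room for this char plus the NUL */
		out[n++] = c;
	}
	if (n >= outsz)
		return -1;
	out[n] = '\0';
	return (int)n;
}
```

Resolving the string inside the formatting helper means the caller only
has to pass the enum value around.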

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index acccad243ec..1070071ffec 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,8 +264,9 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, const char *msg_id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
 {
+	const char *msg_id = msg_id_info[id].id_string;
 	for (;;) {
 		char c = *(msg_id)++;
 
@@ -308,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, msg_id_info[id].id_string);
+	append_msg_id(&sb, id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 05/14] fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (4 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 04/14] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 06/14] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
                             ` (9 subsequent siblings)
  15 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename the remaining variables of type fsck_msg_id from "id" to
"msg_id". This change is relatively small, and is worth the churn for
a later change where we have different IDs in the "report" function.
---
 fsck.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fsck.c b/fsck.c
index 1070071ffec..dbb6f7c4ee2 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,19 +264,19 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id msg_id)
 {
-	const char *msg_id = msg_id_info[id].id_string;
+	const char *msg_id_str = msg_id_info[msg_id].id_string;
 	for (;;) {
-		char c = *(msg_id)++;
+		char c = *(msg_id_str)++;
 
 		if (!c)
 			break;
 		if (c != '_')
 			strbuf_addch(sb, tolower(c));
 		else {
-			assert(*msg_id);
-			strbuf_addch(sb, *(msg_id)++);
+			assert(*msg_id_str);
+			strbuf_addch(sb, *(msg_id_str)++);
 		}
 	}
 
@@ -292,11 +292,11 @@ static int object_on_skiplist(struct fsck_options *opts,
 __attribute__((format (printf, 5, 6)))
 static int report(struct fsck_options *options,
 		  const struct object_id *oid, enum object_type object_type,
-		  enum fsck_msg_id id, const char *fmt, ...)
+		  enum fsck_msg_id msg_id, const char *fmt, ...)
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(id, options), result;
+	int msg_type = fsck_msg_type(msg_id, options), result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -309,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, id);
+	append_msg_id(&sb, msg_id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 06/14] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (5 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 05/14] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 07/14] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
                             ` (8 subsequent siblings)
  15 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
fsck_msg_type enum.

These defines were originally introduced in:

 - ba002f3b28a (builtin-fsck: move common object checking code to
   fsck.c, 2008-02-25)
 - f50c4407305 (fsck: disallow demoting grave fsck errors to warnings,
   2015-06-22)
 - efaba7cc77f (fsck: optionally ignore specific fsck issues
   completely, 2015-06-22)
 - f27d05b1704 (fsck: allow upgrading fsck warnings to errors,
   2015-06-22)
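
A rough sketch of what the enum buys a caller, using hypothetical sev_*
names rather than the real FSCK_* ones. The numeric values mirror the
ones in the diff below, so WARN and IGNORE keep the values their old
#defines had. Mapping FATAL to "error" is an assumption of this sketch;
the INFO-to-warning mapping and the early return for IGNORE follow the
report() hunks quoted elsewhere in this series.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the enum in the patch, under a hypothetical name. */
enum sev_msg_type {
	SEV_INFO = -2,
	SEV_FATAL = -1,
	SEV_ERROR = 1,
	SEV_WARN,
	SEV_IGNORE
};

/*
 * Returns the label a message would be reported under, or NULL if the
 * message is not reported at all.
 */
static const char *sev_label(enum sev_msg_type msg_type)
{
	switch (msg_type) {
	case SEV_IGNORE:
		return NULL;		/* ignored messages are dropped */
	case SEV_INFO:			/* infos are reported as warnings */
	case SEV_WARN:
		return "warning";
	case SEV_FATAL:			/* assumption: shown as an error */
	case SEV_ERROR:
		return "error";
	}
	assert(0 && "unhandled sev_msg_type");
	return NULL;
}
```

Because the switch has no "default" arm, a compiler that warns about
unhandled enumerators can flag a newly added message type here.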

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  2 +-
 builtin/mktag.c |  3 ++-
 fsck.c          | 21 ++++++++++-----------
 fsck.h          | 17 ++++++++++-------
 4 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 68f0329e69e..d6d745dc702 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -89,7 +89,7 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   int msg_type, const char *message)
+			   enum fsck_msg_type msg_type, const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 41a399a69e4..1834394a9b6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -22,7 +22,8 @@ static int mktag_config(const char *var, const char *value, void *cb)
 static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
-				 int msg_type, const char *message)
+				 enum fsck_msg_type msg_type,
+				 const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/fsck.c b/fsck.c
index dbb6f7c4ee2..00e0fef21ca 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,9 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FSCK_FATAL -1
-#define FSCK_INFO -2
-
 #define FOREACH_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
@@ -97,7 +94,7 @@ static struct {
 	const char *id_string;
 	const char *downcased;
 	const char *camelcased;
-	int msg_type;
+	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
 	FOREACH_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
@@ -164,10 +161,10 @@ void list_config_fsck_msg_ids(struct string_list *list, const char *prefix)
 		list_config_item(list, prefix, msg_id_info[i].camelcased);
 }
 
-static int fsck_msg_type(enum fsck_msg_id msg_id,
+static enum fsck_msg_type fsck_msg_type(enum fsck_msg_id msg_id,
 	struct fsck_options *options)
 {
-	int msg_type;
+	enum fsck_msg_type msg_type;
 
 	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
 
@@ -182,7 +179,7 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 	return msg_type;
 }
 
-static int parse_msg_type(const char *str)
+static enum fsck_msg_type parse_msg_type(const char *str)
 {
 	if (!strcmp(str, "error"))
 		return FSCK_ERROR;
@@ -205,7 +202,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
-	int msg_id = parse_msg_id(msg_id_str), msg_type;
+	int msg_id = parse_msg_id(msg_id_str);
+	enum fsck_msg_type msg_type;
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
@@ -216,7 +214,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 
 	if (!options->msg_type) {
 		int i;
-		int *tmp;
+		enum fsck_msg_type *tmp;
 		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
 			tmp[i] = fsck_msg_type(i, options);
@@ -296,7 +294,8 @@ static int report(struct fsck_options *options,
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(msg_id, options), result;
+	enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);
+	int result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -1262,7 +1261,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			int msg_type, const char *message)
+			enum fsck_msg_type msg_type, const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index 0c75789d219..c77e8ddf10b 100644
--- a/fsck.h
+++ b/fsck.h
@@ -3,10 +3,13 @@
 
 #include "oidset.h"
 
-#define FSCK_ERROR 1
-#define FSCK_WARN 2
-#define FSCK_IGNORE 3
-
+enum fsck_msg_type {
+	FSCK_INFO = -2,
+	FSCK_FATAL = -1,
+	FSCK_ERROR = 1,
+	FSCK_WARN,
+	FSCK_IGNORE
+};
 struct fsck_options;
 struct object;
 
@@ -29,17 +32,17 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  int msg_type, const char *message);
+			  enum fsck_msg_type msg_type, const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			int msg_type, const char *message);
+			enum fsck_msg_type msg_type, const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
 	fsck_error error_func;
 	unsigned strict:1;
-	int *msg_type;
+	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
 	kh_oid_map_t *object_names;
 };
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 07/14] fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (6 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 06/14] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 08/14] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
                             ` (7 subsequent siblings)
  15 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

There's no reason to defer calling parse_msg_type() until after
we've checked whether "msg_id < 0". This is not a hot codepath, and
parse_msg_type() itself may die on invalid input.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index 00e0fef21ca..7c53080ad48 100644
--- a/fsck.c
+++ b/fsck.c
@@ -203,11 +203,10 @@ void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
 	int msg_id = parse_msg_id(msg_id_str);
-	enum fsck_msg_type msg_type;
+	enum fsck_msg_type msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
-	msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 08/14] fsck.c: undefine temporary STR macro after use
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (7 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 07/14] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 09/14] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
                             ` (6 subsequent siblings)
  15 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

In f417eed8cde (fsck: provide a function to parse fsck message IDs,
2015-06-22) the "STR" macro was introduced, but that short macro name
was not undefined after use, as had been done earlier in the same
series for the MSG_ID macro in c99ba492f1c (fsck: introduce identifiers
for fsck messages, 2015-06-22). Undefine it now.
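
The pattern in question can be sketched in isolation as follows. The
XM_* and xm_* names and the three-entry list are hypothetical; only the
shape (one list macro expanded twice through short helper macros that
are undefined right after use) follows fsck.c.

```c
#include <assert.h>
#include <string.h>

/* One list, expanded below into both an enum and a string table. */
#define XM_FOREACH_MSG_ID(FUNC) \
	FUNC(BAD_DATE) \
	FUNC(BAD_EMAIL) \
	FUNC(HAS_DOT)

#define MSG_ID(id) XM_MSG_##id,
enum xm_msg_id {
	XM_FOREACH_MSG_ID(MSG_ID)
	XM_MSG_MAX
};
#undef MSG_ID

#define STR(x) #x
#define MSG_ID(id) STR(id),
static const char *const xm_id_string[XM_MSG_MAX] = {
	XM_FOREACH_MSG_ID(MSG_ID)
};
#undef MSG_ID
#undef STR /* keep the short helper name from leaking further down */

/* Returns the enum value for "text", or -1 if it is not in the list. */
static int xm_parse_msg_id(const char *text)
{
	int i;

	assert(text);
	for (i = 0; i < XM_MSG_MAX; i++)
		if (!strcmp(text, xm_id_string[i]))
			return i;
	return -1;
}
```

Undefining both helpers keeps generic names such as STR available to
any code that follows in the same translation unit.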

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fsck.c b/fsck.c
index 7c53080ad48..88884e91c89 100644
--- a/fsck.c
+++ b/fsck.c
@@ -100,6 +100,7 @@ static struct {
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
+#undef STR
 
 static void prepare_msg_ids(void)
 {
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 09/14] fsck.c: give "FOREACH_MSG_ID" a more specific name
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (8 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 08/14] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 10/14] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
                             ` (5 subsequent siblings)
  15 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename the FOREACH_MSG_ID macro to FOREACH_FSCK_MSG_ID in preparation
for moving it over to fsck.h. It's a good convention to name macros
in *.h files in such a way as to clearly not clash with any other
names in other files.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fsck.c b/fsck.c
index 88884e91c89..1730acd698d 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,7 +22,7 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_MSG_ID(FUNC) \
+#define FOREACH_FSCK_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
 	FUNC(UNTERMINATED_HEADER, FATAL) \
@@ -83,7 +83,7 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,
 enum fsck_msg_id {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	FSCK_MSG_MAX
 };
 #undef MSG_ID
@@ -96,7 +96,7 @@ static struct {
 	const char *camelcased;
 	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 10/14] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (9 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 09/14] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 11/14] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
                             ` (4 subsequent siblings)
  15 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the FOREACH_FSCK_MSG_ID macro and the fsck_msg_id enum it helps
define from fsck.c to fsck.h. This is in preparation for having
non-static functions take the fsck_msg_id as an argument.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 66 ---------------------------------------------------------
 fsck.h | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+), 66 deletions(-)

diff --git a/fsck.c b/fsck.c
index 1730acd698d..980ef2cb8fa 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,72 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_FSCK_MSG_ID(FUNC) \
-	/* fatal errors */ \
-	FUNC(NUL_IN_HEADER, FATAL) \
-	FUNC(UNTERMINATED_HEADER, FATAL) \
-	/* errors */ \
-	FUNC(BAD_DATE, ERROR) \
-	FUNC(BAD_DATE_OVERFLOW, ERROR) \
-	FUNC(BAD_EMAIL, ERROR) \
-	FUNC(BAD_NAME, ERROR) \
-	FUNC(BAD_OBJECT_SHA1, ERROR) \
-	FUNC(BAD_PARENT_SHA1, ERROR) \
-	FUNC(BAD_TAG_OBJECT, ERROR) \
-	FUNC(BAD_TIMEZONE, ERROR) \
-	FUNC(BAD_TREE, ERROR) \
-	FUNC(BAD_TREE_SHA1, ERROR) \
-	FUNC(BAD_TYPE, ERROR) \
-	FUNC(DUPLICATE_ENTRIES, ERROR) \
-	FUNC(MISSING_AUTHOR, ERROR) \
-	FUNC(MISSING_COMMITTER, ERROR) \
-	FUNC(MISSING_EMAIL, ERROR) \
-	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_OBJECT, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_TAG, ERROR) \
-	FUNC(MISSING_TAG_ENTRY, ERROR) \
-	FUNC(MISSING_TREE, ERROR) \
-	FUNC(MISSING_TREE_OBJECT, ERROR) \
-	FUNC(MISSING_TYPE, ERROR) \
-	FUNC(MISSING_TYPE_ENTRY, ERROR) \
-	FUNC(MULTIPLE_AUTHORS, ERROR) \
-	FUNC(TREE_NOT_SORTED, ERROR) \
-	FUNC(UNKNOWN_TYPE, ERROR) \
-	FUNC(ZERO_PADDED_DATE, ERROR) \
-	FUNC(GITMODULES_MISSING, ERROR) \
-	FUNC(GITMODULES_BLOB, ERROR) \
-	FUNC(GITMODULES_LARGE, ERROR) \
-	FUNC(GITMODULES_NAME, ERROR) \
-	FUNC(GITMODULES_SYMLINK, ERROR) \
-	FUNC(GITMODULES_URL, ERROR) \
-	FUNC(GITMODULES_PATH, ERROR) \
-	FUNC(GITMODULES_UPDATE, ERROR) \
-	/* warnings */ \
-	FUNC(BAD_FILEMODE, WARN) \
-	FUNC(EMPTY_NAME, WARN) \
-	FUNC(FULL_PATHNAME, WARN) \
-	FUNC(HAS_DOT, WARN) \
-	FUNC(HAS_DOTDOT, WARN) \
-	FUNC(HAS_DOTGIT, WARN) \
-	FUNC(NULL_SHA1, WARN) \
-	FUNC(ZERO_PADDED_FILEMODE, WARN) \
-	FUNC(NUL_IN_COMMIT, WARN) \
-	/* infos (reported as warnings, but ignored by default) */ \
-	FUNC(GITMODULES_PARSE, INFO) \
-	FUNC(BAD_TAG_NAME, INFO) \
-	FUNC(MISSING_TAGGER_ENTRY, INFO) \
-	/* ignored (elevated when requested) */ \
-	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
-
-#define MSG_ID(id, msg_type) FSCK_MSG_##id,
-enum fsck_msg_id {
-	FOREACH_FSCK_MSG_ID(MSG_ID)
-	FSCK_MSG_MAX
-};
-#undef MSG_ID
-
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
diff --git a/fsck.h b/fsck.h
index c77e8ddf10b..b4c53aaa08c 100644
--- a/fsck.h
+++ b/fsck.h
@@ -10,6 +10,73 @@ enum fsck_msg_type {
 	FSCK_WARN,
 	FSCK_IGNORE
 };
+
+#define FOREACH_FSCK_MSG_ID(FUNC) \
+	/* fatal errors */ \
+	FUNC(NUL_IN_HEADER, FATAL) \
+	FUNC(UNTERMINATED_HEADER, FATAL) \
+	/* errors */ \
+	FUNC(BAD_DATE, ERROR) \
+	FUNC(BAD_DATE_OVERFLOW, ERROR) \
+	FUNC(BAD_EMAIL, ERROR) \
+	FUNC(BAD_NAME, ERROR) \
+	FUNC(BAD_OBJECT_SHA1, ERROR) \
+	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_TAG_OBJECT, ERROR) \
+	FUNC(BAD_TIMEZONE, ERROR) \
+	FUNC(BAD_TREE, ERROR) \
+	FUNC(BAD_TREE_SHA1, ERROR) \
+	FUNC(BAD_TYPE, ERROR) \
+	FUNC(DUPLICATE_ENTRIES, ERROR) \
+	FUNC(MISSING_AUTHOR, ERROR) \
+	FUNC(MISSING_COMMITTER, ERROR) \
+	FUNC(MISSING_EMAIL, ERROR) \
+	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
+	FUNC(MISSING_OBJECT, ERROR) \
+	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
+	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
+	FUNC(MISSING_TAG, ERROR) \
+	FUNC(MISSING_TAG_ENTRY, ERROR) \
+	FUNC(MISSING_TREE, ERROR) \
+	FUNC(MISSING_TREE_OBJECT, ERROR) \
+	FUNC(MISSING_TYPE, ERROR) \
+	FUNC(MISSING_TYPE_ENTRY, ERROR) \
+	FUNC(MULTIPLE_AUTHORS, ERROR) \
+	FUNC(TREE_NOT_SORTED, ERROR) \
+	FUNC(UNKNOWN_TYPE, ERROR) \
+	FUNC(ZERO_PADDED_DATE, ERROR) \
+	FUNC(GITMODULES_MISSING, ERROR) \
+	FUNC(GITMODULES_BLOB, ERROR) \
+	FUNC(GITMODULES_LARGE, ERROR) \
+	FUNC(GITMODULES_NAME, ERROR) \
+	FUNC(GITMODULES_SYMLINK, ERROR) \
+	FUNC(GITMODULES_URL, ERROR) \
+	FUNC(GITMODULES_PATH, ERROR) \
+	FUNC(GITMODULES_UPDATE, ERROR) \
+	/* warnings */ \
+	FUNC(BAD_FILEMODE, WARN) \
+	FUNC(EMPTY_NAME, WARN) \
+	FUNC(FULL_PATHNAME, WARN) \
+	FUNC(HAS_DOT, WARN) \
+	FUNC(HAS_DOTDOT, WARN) \
+	FUNC(HAS_DOTGIT, WARN) \
+	FUNC(NULL_SHA1, WARN) \
+	FUNC(ZERO_PADDED_FILEMODE, WARN) \
+	FUNC(NUL_IN_COMMIT, WARN) \
+	/* infos (reported as warnings, but ignored by default) */ \
+	FUNC(GITMODULES_PARSE, INFO) \
+	FUNC(BAD_TAG_NAME, INFO) \
+	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	/* ignored (elevated when requested) */ \
+	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
+
+#define MSG_ID(id, msg_type) FSCK_MSG_##id,
+enum fsck_msg_id {
+	FOREACH_FSCK_MSG_ID(MSG_ID)
+	FSCK_MSG_MAX
+};
+#undef MSG_ID
+
 struct fsck_options;
 struct object;
 
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 11/14] fsck.c: pass along the fsck_msg_id in the fsck_error callback
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (10 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 10/14] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 12/14] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
                             ` (3 subsequent siblings)
  15 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the fsck_error callback to also pass along the
fsck_msg_id. Before this change the only way to get the message id was
to parse it back out of the "message".

Let's pass it down explicitly for the benefit of callers that might
want to use it, as discussed in [1].

Passing the msg_type is now redundant, as you can always get it back
from the msg_id, but I'm not changing that convention. It's really
common to need the msg_type, and the report() function itself (which
calls "fsck_error") needs to call fsck_msg_type() to discover
it. Let's not needlessly re-do that work in the user callback.

1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/
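
A simplified sketch of how a caller might use the extra argument,
loosely modelled on the fetch-pack callback quoted earlier in this
thread. The cb_* names, the reduced parameter list and the "data"
pointer are assumptions of this sketch and not the real fsck_error
signature.

```c
#include <assert.h>
#include <stdio.h>

enum cb_msg_type { CB_ERROR = 1, CB_WARN };
enum cb_msg_id { CB_MSG_BAD_DATE, CB_MSG_GITMODULES_MISSING, CB_MSG_MAX };

/* Reduced, hypothetical callback shape: type, id, message, user data. */
typedef int (*cb_error_func)(enum cb_msg_type msg_type,
			     enum cb_msg_id msg_id,
			     const char *message, void *data);

/* Default handler: print the message, fail only on errors. */
static int cb_default_error(enum cb_msg_type msg_type,
			    enum cb_msg_id msg_id,
			    const char *message, void *data)
{
	(void)msg_id;
	(void)data;
	fprintf(stderr, "%s: %s\n",
		msg_type == CB_WARN ? "warning" : "error", message);
	return msg_type == CB_WARN ? 0 : 1;
}

/*
 * Caller-specific handler: count GITMODULES_MISSING instead of failing
 * and defer everything else to the default. No string parsing of
 * "message" is needed to recognize the condition.
 */
static int cb_collect_missing(enum cb_msg_type msg_type,
			      enum cb_msg_id msg_id,
			      const char *message, void *data)
{
	int *nr_missing = data;

	assert(nr_missing);
	if (msg_id == CB_MSG_GITMODULES_MISSING) {
		(*nr_missing)++;
		return 0;
	}
	return cb_default_error(msg_type, msg_id, message, data);
}
```

The handler still receives msg_type alongside msg_id, so the default
path does not have to look the type up again.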

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 4 +++-
 builtin/mktag.c | 1 +
 fsck.c          | 6 ++++--
 fsck.h          | 6 ++++--
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d6d745dc702..b71fac4ceca 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -89,7 +89,9 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   enum fsck_msg_type msg_type, const char *message)
+			   enum fsck_msg_type msg_type,
+			   enum fsck_msg_id msg_id,
+			   const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 1834394a9b6..dc989c356f5 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -23,6 +23,7 @@ static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
 				 enum fsck_msg_type msg_type,
+				 enum fsck_msg_id msg_id,
 				 const char *message)
 {
 	switch (msg_type) {
diff --git a/fsck.c b/fsck.c
index 980ef2cb8fa..007f02b556a 100644
--- a/fsck.c
+++ b/fsck.c
@@ -247,7 +247,7 @@ static int report(struct fsck_options *options,
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
 	result = options->error_func(options, oid, object_type,
-				     msg_type, sb.buf);
+				     msg_type, msg_id, sb.buf);
 	strbuf_release(&sb);
 	va_end(ap);
 
@@ -1195,7 +1195,9 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			enum fsck_msg_type msg_type, const char *message)
+			enum fsck_msg_type msg_type,
+			enum fsck_msg_id msg_id,
+			const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index b4c53aaa08c..56536d7f29e 100644
--- a/fsck.h
+++ b/fsck.h
@@ -99,11 +99,13 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  enum fsck_msg_type msg_type, const char *message);
+			  enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
+			  const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			enum fsck_msg_type msg_type, const char *message);
+			enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
+			const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 12/14] fsck.c: add an fsck_set_msg_type() API that takes enums
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (11 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 11/14] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 13/14] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
                             ` (2 subsequent siblings)
  15 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change code I added in acf9de4c94e (mktag: use fsck instead of custom
verify_tag(), 2021-01-05) to make use of a new API function that takes
the fsck_msg_{id,type} types, instead of arbitrary strings that
we'll (hopefully) parse into those types.

At the time that the fsck_set_msg_type() API was introduced in
0282f4dced0 (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) it was only intended to be used to parse user-supplied
data.

For things that are purely internal to the C code it makes sense to
have the compiler check these arguments, and to skip the sanity
checking of the data in fsck_set_msg_type(), which is redundant with
the checks we get from the compiler.
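
As a rough illustration of the difference (a standalone sketch, not the
fsck.c code; every name below is hypothetical), a string-based setter can
only reject a misspelled ID at run time, whereas a misspelled enum
constant passed to the enum-based setter would not compile at all:

```c
#include <string.h>

/* Hypothetical stand-ins for the fsck message enums. */
enum msg_type { MSG_ERROR, MSG_WARN, MSG_IGNORE };
enum msg_id { MSG_EXTRA_HEADER_ENTRY, MSG_MISSING_EMAIL, MSG_MAX };

/* Zero-initialized, so every ID starts out as MSG_ERROR. */
static enum msg_type severity[MSG_MAX];

/* Internal callers: arguments are checked by the compiler. */
static void set_severity_from_ids(enum msg_id id, enum msg_type type)
{
	severity[id] = type;
}

/*
 * User-supplied data: both strings must be parsed and validated.
 * Returns 0 on success, -1 if either string is unknown.
 */
static int set_severity(const char *id_str, const char *type_str)
{
	enum msg_id id;
	enum msg_type type;

	if (!strcmp(id_str, "extraheaderentry"))
		id = MSG_EXTRA_HEADER_ENTRY;
	else if (!strcmp(id_str, "missingemail"))
		id = MSG_MISSING_EMAIL;
	else
		return -1;

	if (!strcmp(type_str, "error"))
		type = MSG_ERROR;
	else if (!strcmp(type_str, "warn"))
		type = MSG_WARN;
	else if (!strcmp(type_str, "ignore"))
		type = MSG_IGNORE;
	else
		return -1;

	/* The parsing wrapper ends by delegating to the enum-based setter. */
	set_severity_from_ids(id, type);
	return 0;
}
```

This mirrors the shape of the change below only loosely: the string
variant becomes a thin parsing layer on top of the enum variant.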

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/mktag.c |  3 ++-
 fsck.c          | 27 +++++++++++++++++----------
 fsck.h          |  3 +++
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/builtin/mktag.c b/builtin/mktag.c
index dc989c356f5..de67a94f24e 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -93,7 +93,8 @@ int cmd_mktag(int argc, const char **argv, const char *prefix)
 		die_errno(_("could not read from stdin"));
 
 	fsck_options.error_func = mktag_fsck_error_func;
-	fsck_set_msg_type(&fsck_options, "extraheaderentry", "warn");
+	fsck_set_msg_type_from_ids(&fsck_options, FSCK_MSG_EXTRA_HEADER_ENTRY,
+				   FSCK_WARN);
 	/* config might set fsck.extraHeaderEntry=* again */
 	git_config(mktag_config, NULL);
 	if (fsck_tag_standalone(NULL, buf.buf, buf.len, &fsck_options,
diff --git a/fsck.c b/fsck.c
index 007f02b556a..54632404de5 100644
--- a/fsck.c
+++ b/fsck.c
@@ -134,6 +134,22 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 	return 1;
 }
 
+void fsck_set_msg_type_from_ids(struct fsck_options *options,
+				enum fsck_msg_id msg_id,
+				enum fsck_msg_type msg_type)
+{
+	if (!options->msg_type) {
+		int i;
+		enum fsck_msg_type *tmp;
+		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
+		for (i = 0; i < FSCK_MSG_MAX; i++)
+			tmp[i] = fsck_msg_type(i, options);
+		options->msg_type = tmp;
+	}
+
+	options->msg_type[msg_id] = msg_type;
+}
+
 void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
@@ -146,16 +162,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
-	if (!options->msg_type) {
-		int i;
-		enum fsck_msg_type *tmp;
-		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
-		for (i = 0; i < FSCK_MSG_MAX; i++)
-			tmp[i] = fsck_msg_type(i, options);
-		options->msg_type = tmp;
-	}
-
-	options->msg_type[msg_id] = msg_type;
+	fsck_set_msg_type_from_ids(options, msg_id, msg_type);
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
diff --git a/fsck.h b/fsck.h
index 56536d7f29e..af145bb4596 100644
--- a/fsck.h
+++ b/fsck.h
@@ -80,6 +80,9 @@ enum fsck_msg_id {
 struct fsck_options;
 struct object;
 
+void fsck_set_msg_type_from_ids(struct fsck_options *options,
+				enum fsck_msg_id msg_id,
+				enum fsck_msg_type msg_type);
 void fsck_set_msg_type(struct fsck_options *options,
 		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 13/14] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (12 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 12/14] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 14/14] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
  2021-02-17 20:05           ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
  15 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Add the object_names member to the initialization macros. This was
omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
go, 2016-07-17) when the field was added.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fsck.h b/fsck.h
index af145bb4596..28137a77df0 100644
--- a/fsck.h
+++ b/fsck.h
@@ -119,8 +119,8 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 14/14] fsck.c: move gitmodules_{found,done} into fsck_options
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (13 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 13/14] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 20:05           ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
  15 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the gitmodules_{found,done} static variables added in
159e7b080bf (fsck: detect gitmodules files, 2018-05-02) into the
fsck_options struct. It makes sense to keep all the context in the
same place.
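
A small standalone sketch of the general idea, under the assumption that
a fixed-size array is an acceptable stand-in for an oidset (struct
check_options, note_found() and finish() are hypothetical names, not
part of fsck.[ch]): once the bookkeeping lives in the options struct,
two independent checkers no longer share state.

```c
#include <stddef.h>

#define MAX_IDS 8

/* Hypothetical context struct; "found" stands in for an oidset. */
struct check_options {
	int found[MAX_IDS];
	size_t nr_found;
};

/* Record an ID in this context only; silently drops IDs past MAX_IDS. */
static void note_found(struct check_options *o, int id)
{
	if (o->nr_found < MAX_IDS)
		o->found[o->nr_found++] = id;
}

/* Return how many IDs were pending in this context, then reset it. */
static size_t finish(struct check_options *o)
{
	size_t n = o->nr_found;

	o->nr_found = 0;
	return n;
}
```

With file-scope static variables, the two contexts would see each
other's entries, and clearing one at "finish" time would clear both.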

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 19 ++++++++-----------
 fsck.h |  6 ++++--
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/fsck.c b/fsck.c
index 54632404de5..f344b6be3d3 100644
--- a/fsck.c
+++ b/fsck.c
@@ -19,9 +19,6 @@
 #include "credential.h"
 #include "help.h"
 
-static struct oidset gitmodules_found = OIDSET_INIT;
-static struct oidset gitmodules_done = OIDSET_INIT;
-
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
@@ -621,7 +618,7 @@ static int fsck_tree(const struct object_id *oid,
 
 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
 			if (!S_ISLNK(mode))
-				oidset_insert(&gitmodules_found, oid);
+				oidset_insert(&options->gitmodules_found, oid);
 			else
 				retval += report(options,
 						 oid, OBJ_TREE,
@@ -635,7 +632,7 @@ static int fsck_tree(const struct object_id *oid,
 				has_dotgit |= is_ntfs_dotgit(backslash);
 				if (is_ntfs_dotgitmodules(backslash)) {
 					if (!S_ISLNK(mode))
-						oidset_insert(&gitmodules_found, oid);
+						oidset_insert(&options->gitmodules_found, oid);
 					else
 						retval += report(options, oid, OBJ_TREE,
 								 FSCK_MSG_GITMODULES_SYMLINK,
@@ -1147,9 +1144,9 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 	struct fsck_gitmodules_data data;
 	struct config_options config_opts = { 0 };
 
-	if (!oidset_contains(&gitmodules_found, oid))
+	if (!oidset_contains(&options->gitmodules_found, oid))
 		return 0;
-	oidset_insert(&gitmodules_done, oid);
+	oidset_insert(&options->gitmodules_done, oid);
 
 	if (object_on_skiplist(options, oid))
 		return 0;
@@ -1220,13 +1217,13 @@ int fsck_finish(struct fsck_options *options)
 	struct oidset_iter iter;
 	const struct object_id *oid;
 
-	oidset_iter_init(&gitmodules_found, &iter);
+	oidset_iter_init(&options->gitmodules_found, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
 		enum object_type type;
 		unsigned long size;
 		char *buf;
 
-		if (oidset_contains(&gitmodules_done, oid))
+		if (oidset_contains(&options->gitmodules_done, oid))
 			continue;
 
 		buf = read_object_file(oid, &type, &size);
@@ -1251,8 +1248,8 @@ int fsck_finish(struct fsck_options *options)
 	}
 
 
-	oidset_clear(&gitmodules_found);
-	oidset_clear(&gitmodules_done);
+	oidset_clear(&options->gitmodules_found);
+	oidset_clear(&options->gitmodules_done);
 	return ret;
 }
 
diff --git a/fsck.h b/fsck.h
index 28137a77df0..99c77289688 100644
--- a/fsck.h
+++ b/fsck.h
@@ -116,11 +116,13 @@ struct fsck_options {
 	unsigned strict:1;
 	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
+	struct oidset gitmodules_found;
+	struct oidset gitmodules_done;
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, OIDSET_INIT, OIDSET_INIT, NULL }
+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, OIDSET_INIT, OIDSET_INIT, NULL }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.30.0.284.gd98b1dd5eaa7



* Re: [PATCH 2/4] http-fetch: allow custom index-pack args
  2021-02-16 22:57       ` Junio C Hamano
@ 2021-02-17 19:46         ` Jonathan Tan
  0 siblings, 0 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-02-17 19:46 UTC (permalink / raw)
  To: gitster; +Cc: steadmon, jonathantanmy, git

> Josh Steadmon <steadmon@google.com> writes:
> 
> >> +--index-pack-args=<args>::
> >> +	For internal use only. The command to run on the contents of the
> >> +	downloaded pack. Arguments are URL-encoded separated by spaces.
> >
> > I'm a bit skeptical of using URL encoding to work around embedded
> > spaces. I believe in Emily's config-based hooks series, she wrote an
> > argument parser to pull repeated arguments into a strvec, could you do
> > something like that here?
> >
> > I'm sympathetic to the idea that since this is an internal-only flag, we
> > can be a bit weird with the argument format, though.
> 
> We tend to prefer quote.c::sq_quote*() suite of quoting; does this
> codepath have very different constraints that require different
> encoding?

My main issue was that I needed to join arbitrary strings and then split
them, which is why I URL-encoded them (so that they would no longer
contain spaces) and then used spaces as the "join" separator. With
Josh's suggestion, I wouldn't need any sort of encoding or quoting, so I
think I'll use that.
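
For readers following along, the join-then-split scheme described above
can be sketched as below. This is an illustrative standalone program
fragment, not the http-fetch code; all function names are hypothetical,
and it assumes no argument is empty (strtok() would drop an empty token)
and that output buffers are at least one byte long.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Percent-encode all but unreserved characters, so no spaces remain. */
static void encode_arg(const char *in, char *out, size_t outsz)
{
	size_t n = 0;

	/* Keep room for one "%XX" triplet plus the terminating NUL. */
	for (; *in && n + 4 <= outsz; in++) {
		unsigned char c = (unsigned char)*in;

		if (isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~')
			out[n++] = (char)c;
		else
			n += (size_t)snprintf(out + n, 4, "%%%02X", c);
	}
	out[n] = '\0';
}

static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = tolower(c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Decode "%XX" sequences in place; malformed sequences are kept as-is. */
static void decode_arg(char *s)
{
	char *out = s;

	while (*s) {
		int hi = s[0] == '%' ? hexval((unsigned char)s[1]) : -1;
		int lo = hi >= 0 ? hexval((unsigned char)s[2]) : -1;

		if (lo >= 0) {
			*out++ = (char)(hi * 16 + lo);
			s += 3;
		} else {
			*out++ = *s++;
		}
	}
	*out = '\0';
}

/* Encode each argument and join them with single spaces. */
static void join_args(const char **argv, size_t argc, char *out, size_t outsz)
{
	size_t i, n = 0;

	out[0] = '\0';
	for (i = 0; i < argc; i++) {
		if (i && n + 1 < outsz) {
			out[n++] = ' ';
			out[n] = '\0';
		}
		encode_arg(argv[i], out + n, outsz - n);
		n += strlen(out + n);
	}
}

/* Split on spaces and decode each piece in place; modifies "joined". */
static size_t split_args(char *joined, char **argv, size_t max)
{
	size_t argc = 0;
	char *tok;

	for (tok = strtok(joined, " "); tok && argc < max;
	     tok = strtok(NULL, " ")) {
		decode_arg(tok);
		argv[argc++] = tok;
	}
	return argc;
}
```

Taking repeated options and collecting them into an array, as suggested
in the quoted review, avoids this encode/decode round trip entirely.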


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (14 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 14/14] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
@ 2021-02-17 20:05           ` Jonathan Tan
  15 siblings, 0 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-02-17 20:05 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git

> > I tried that first, and the issue is that IDs like
> > FSCK_MSG_GITMODULES_MISSING are internal to fsck.c. As for whether we
> > should start exposing the IDs publicly, I think we should wait until a
> > few new cases like this come up, so that we more fully understand the
> > requirements first.
> 
> The requirement is that you want the objects ids we'd otherwise error
> about in fsck_finish(). Yeah we don't pass the "fsck_msg_id" down in the
> "report()" function, but you can reliably strstr() it out of the
> message.

We can't strstr() because of false positives (if, e.g. there is a
submodule name that contains the string we're looking for), but looking
at report() in fsck.c, the message ID is the very first thing appended,
so I think we can use starts_with().
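
A quick standalone sketch of the difference, assuming the message starts
with the ID followed by ": " as in the "gitmodulesMissing: unable to read
.gitmodules blob" error from the original report. has_prefix() is a
local stand-in for starts_with(), and the "tricky" message in the usage
note is made up for illustration.

```c
#include <string.h>

/* Local stand-in for a starts_with()-style helper. */
static int has_prefix(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Substring match: also fires when the ID appears later in the text. */
static int is_gitmodules_missing_substr(const char *message)
{
	return strstr(message, "gitmodulesMissing") != NULL;
}

/* Prefix match: only fires when the message begins with the ID. */
static int is_gitmodules_missing_prefix(const char *message)
{
	return has_prefix(message, "gitmodulesMissing: ");
}
```

For a hypothetical message such as "gitmodulesName: disallowed submodule
name: gitmodulesMissing", the substring check matches but the prefix
check does not.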

> We document & hard rely on that already, since it's also a
> config key.

Ah, good point.

> But yeah, we could just change the report function to pass down the id
> and move the relevant macros from fsck.c to fsck.h. I think that would
> be a smaller change conceptually than a special-case flag in
> fsck_options for something we could otherwise do with the error
> reporting.

I agree - I thought this wouldn't be possible, but like you said, we can
reliably make use of the string in report() (or pass the ID, like your
patch set [1] does) so we should do this.

What would be the best way to proceed, now that we have at least 2 patch
sets (mine and yours) in play? I was thinking that I should update mine
to use the string reported in report() (with starts_with()), so that
both our patch sets can be reviewed and merged in parallel, and after
that, update the fsck code to use the ID instead of the string.

[1] https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-02-17  2:10         ` Ævar Arnfjörð Bjarmason
@ 2021-02-17 20:10           ` Jonathan Tan
  2021-02-18 12:07             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 229+ messages in thread
From: Jonathan Tan @ 2021-02-17 20:10 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git

> Sorry for being unclear here. I don't think (honestly I don't remember,
> it's been almost a month) that I meant to you should use the skipList.
> 
> Looking at that code again we use object_on_skiplist() to do an early
> punt in report(), but also fsck_blob(), presumably you never want the
> latter, and that early punting wouldn't be needed if your report()
> function intercepted the modules blob id for stashing it away / later
> reporting / whatever.
> 
> So yeah, I'm 99% sure now that's not what I meant :)
> 
> What I meant with:
> 
>     Or if we want to keep the "print <list> | process"[...]
> 
> Is that we have an existing ad-hoc IPC model for these commands in
> passing along the skipList, which is made more complex because sometimes
> the initial process reads the file, sometimes it passes it along as-is
> to the child.
> 
> And then there's this patch that passes OIDs too, but through a
> different mechanism.
> 
> I was suggesting that perhaps it made more sense to refactor both so
> they could use the same mechanism, because we're potentially passing two
> lists of OIDs between the two. Just one goes via line-at-a-time in the
> output, the other via a config option on the command-line.

Thanks for your explanation. I still think that they are quite different
- skiplist is a user-written file containing a list of OIDs that will
likely never change, whereas my list of dangling .gitmodules is a list
of OIDs dynamically generated (and thus, always different) whenever a
fetch is done. So I think it's quite reasonable to pass skiplist as a
file name, and my list should be passed line-by-line.


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-02-17 19:27     ` Ævar Arnfjörð Bjarmason
@ 2021-02-17 20:11       ` Jonathan Tan
  0 siblings, 0 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-02-17 20:11 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git

> What's the need for STRICT here & can't the former use the existing
> fsck_options in index-pack.c? With this on top we pass all tests:

[snip code]

Good point - I'll do that.


* Re: [PATCH 00/14] fsck: API improvements
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
@ 2021-02-17 21:02             ` Junio C Hamano
  2021-02-18  0:00               ` Ævar Arnfjörð Bjarmason
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                               ` (10 subsequent siblings)
  11 siblings, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-02-17 21:02 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Jonathan Tan pointed out that the fsck error_func doesn't pass you the
> ID of the fsck failure in [1]. This series improves the API so it
> does, and moves the gitmodules_{found,done} variables into the
> fsck_options struct.
>
> The result is that instead of the "print_dangling_gitmodules" member
> in that series we can just implement that with the diff at the end of
> this cover letter (goes on top of a merge of this series & "seen"),
> and without any changes to fsck_finish().
>
> This conflicts with other in-flight fsck changes but the conflict is
> rather trivial. Jeff King has another concurrent series to add a
> couple of new fsck checks, those need to be moved to fsck.h, and
> there's another trivial conflict in 2 hunks due to the
> gitmodules_{found,done} move.
>
> 1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/

Let's get this reviewed now, but with the expectation that it will be
rebased after the dust settles.


* Re: [PATCH 02/14] fsck.h: use use "enum object_type" instead of "int"
  2021-02-17 19:42           ` [PATCH 02/14] fsck.h: use use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
@ 2021-02-17 23:40             ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-02-17 23:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Subject: Re: [PATCH 02/14] fsck.h: use use "enum object_type" instead of "int"

use use.

> Change the fsck_walk_func to use an "enum object_type" instead of an
> "int" type. The types are compatible, and ever since this was added in
> 355885d5315 (add generic, type aware object chain walker, 2008-02-25)
> we've used entries from object_type (OBJ_BLOB etc.).
>
> So this doesn't really change anything as far as the generated code is
> concerned, it just gives the compiler more information and makes this
> easier to read.

Yup, as long as we won't trick the compiler into complaining "ah,
but you are not covering OBJ_OFS_DELTA or OBJ_BAD values in your
switch statement", I think a change like this is a good thing.
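
To make the concern concrete, here is a standalone sketch with a
hypothetical enum (the names and values are illustrative, not the real
object_type definition). With GCC or Clang and -Wall, a switch over an
enum-typed value that lacks both a "default" label and a case for some
enumerator triggers a -Wswitch warning; a "default" label keeps it quiet.

```c
/* Hypothetical enum with a few values that most callers never handle. */
enum obj_kind {
	KIND_BAD = -1,
	KIND_COMMIT = 1,
	KIND_TREE,
	KIND_BLOB,
	KIND_TAG,
	KIND_OFS_DELTA = 6
};

/*
 * The "default" label covers KIND_BAD and KIND_OFS_DELTA. Removing it
 * would make -Wswitch report those two enumerators as unhandled.
 */
static const char *kind_name(enum obj_kind kind)
{
	switch (kind) {
	case KIND_COMMIT:
		return "commit";
	case KIND_TREE:
		return "tree";
	case KIND_BLOB:
		return "blob";
	case KIND_TAG:
		return "tag";
	default:
		return "unknown";
	}
}
```

When the parameter is a plain "int", the compiler has no enumerator list
to compare the cases against, so no such diagnostic is possible.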

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/fsck.c           | 3 ++-
>  builtin/index-pack.c     | 3 ++-
>  builtin/unpack-objects.c | 3 ++-
>  fsck.h                   | 3 ++-
>  4 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 821e7798c70..68f0329e69e 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -197,7 +197,8 @@ static int traverse_reachable(void)
>  	return !!result;
>  }
>  
> -static int mark_used(struct object *obj, int type, void *data, struct fsck_options *options)
> +static int mark_used(struct object *obj, enum object_type object_type,
> +		     void *data, struct fsck_options *options)
>  {
>  	if (!obj)
>  		return 1;
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 54f74c48741..2f291a14d4a 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -212,7 +212,8 @@ static void cleanup_thread(void)
>  	free(thread_data);
>  }
>  
> -static int mark_link(struct object *obj, int type, void *data, struct fsck_options *options)
> +static int mark_link(struct object *obj, enum object_type type,
> +		     void *data, struct fsck_options *options)
>  {
>  	if (!obj)
>  		return -1;
> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> index dd4a75e030d..ca54fd16688 100644
> --- a/builtin/unpack-objects.c
> +++ b/builtin/unpack-objects.c
> @@ -187,7 +187,8 @@ static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
>   * that have reachability requirements and calls this function.
>   * Verify its reachability and validity recursively and write it out.
>   */
> -static int check_object(struct object *obj, int type, void *data, struct fsck_options *options)
> +static int check_object(struct object *obj, enum object_type type,
> +			void *data, struct fsck_options *options)
>  {
>  	struct obj_buffer *obj_buf;
>  
> diff --git a/fsck.h b/fsck.h
> index df0b64a2163..0c75789d219 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -23,7 +23,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type);
>   *     <0	error signaled and abort
>   *     >0	error signaled and do not abort
>   */
> -typedef int (*fsck_walk_func)(struct object *obj, int type, void *data, struct fsck_options *options);
> +typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
> +			      void *data, struct fsck_options *options);
>  
>  /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
>  typedef int (*fsck_error)(struct fsck_options *o,


* Re: [PATCH 00/14] fsck: API improvements
  2021-02-17 21:02             ` Junio C Hamano
@ 2021-02-18  0:00               ` Ævar Arnfjörð Bjarmason
  2021-02-18 19:12                 ` Junio C Hamano
  0 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18  0:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan


On Wed, Feb 17 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Jonathan Tan pointed out that the fsck error_func doesn't pass you the
>> ID of the fsck failure in [1]. This series improves the API so it
>> does, and moves the gitmodules_{found,done} variables into the
>> fsck_options struct.
>>
>> The result is that instead of the "print_dangling_gitmodules" member
>> in that series we can just implement that with the diff at the end of
>> this cover letter (goes on top of a merge of this series & "seen"),
>> and without any changes to fsck_finish().
>>
>> This conflicts with other in-flight fsck changes but the conflict is
>> rather trivial. Jeff King has another concurrent series to add a
>> couple of new fsck checks, those need to be moved to fsck.h, and
>> there's another trivial conflict in 2 hunks due to the
>> gitmodules_{found,done} move.
>>
>> 1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/
>
> Let's get this reviewed now, but with expectation that it will be
> rebased after the dust settles.

Makes sense. Pending a review of this, would you be interested in queuing
a v2 that doesn't conflict with in-flight topics?

Patches 01..09 & 13/14 can live conflict-free with what's in "seen" now
(I'd have made the 13th the 10th in v1 if I'd noticed). Then I could
re-roll the remainder of this once the other topics land.


* [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen')
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
  2021-02-17 21:02             ` Junio C Hamano
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 22:19               ` Junio C Hamano
                                 ` (23 more replies)
  2021-02-18 10:58             ` [PATCH v2 01/10] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
                               ` (9 subsequent siblings)
  11 siblings, 24 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

As suggested in
https://lore.kernel.org/git/87zh028ctp.fsf@evledraar.gmail.com/, here is
a version of this series that doesn't conflict with other in-flight
topics. I can submit the rest later.

Ævar Arnfjörð Bjarmason (10):
  fsck.h: indent arguments to of fsck_set_msg_type
  fsck.h: use "enum object_type" instead of "int"
  fsck.c: rename variables in fsck_set_msg_type() for less confusion
  fsck.c: move definition of msg_id into append_msg_id()
  fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  fsck.c: undefine temporary STR macro after use
  fsck.c: give "FOREACH_MSG_ID" a more specific name
  fsck.h: update FSCK_OPTIONS_* for object_name

 builtin/fsck.c           |  5 ++--
 builtin/index-pack.c     |  3 +-
 builtin/mktag.c          |  3 +-
 builtin/unpack-objects.c |  3 +-
 fsck.c                   | 60 ++++++++++++++++++++--------------------
 fsck.h                   | 26 +++++++++--------
 6 files changed, 54 insertions(+), 46 deletions(-)

Range-diff:
 -:  ----------- >  1:  88b347b74ed fsck.h: indent arguments to of fsck_set_msg_type
 1:  1a60d65d2ca !  2:  868eac3d4d1 fsck.h: use use "enum object_type" instead of "int"
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    fsck.h: use use "enum object_type" instead of "int"
    +    fsck.h: use "enum object_type" instead of "int"
     
         Change the fsck_walk_func to use an "enum object_type" instead of an
         "int" type. The types are compatible, and ever since this was added in
 2:  24761f269b7 =  3:  f599dc6c8f3 fsck.c: rename variables in fsck_set_msg_type() for less confusion
 3:  fb4c66f9305 =  4:  33f3b1942c1 fsck.c: move definition of msg_id into append_msg_id()
 4:  a129dbd9964 =  5:  28c9245e418 fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
 5:  d9bee41072e =  6:  d25037c6f18 fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
 6:  423568026c3 =  7:  66d0f1047cc fsck.c: call parse_msg_type() early in fsck_set_msg_type()
 7:  cb43e832738 =  8:  7643a5bf211 fsck.c: undefine temporary STR macro after use
 8:  2cd14cb4e2a =  9:  7c64e2267ce fsck.c: give "FOREACH_MSG_ID" a more specific name
 9:  1ada154ef23 <  -:  ----------- fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
10:  c4179445f22 <  -:  ----------- fsck.c: pass along the fsck_msg_id in the fsck_error callback
11:  c1fc724f0e8 <  -:  ----------- fsck.c: add an fsck_set_msg_type() API that takes enums
12:  8de91fac068 = 10:  a98a3512629 fsck.h: update FSCK_OPTIONS_* for object_name
13:  29ff97856ff <  -:  ----------- fsck.c: move gitmodules_{found,done} into fsck_options
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 01/10] fsck.h: indent arguments to of fsck_set_msg_type
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
  2021-02-17 21:02             ` Junio C Hamano
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 10:58             ` [PATCH v2 02/10] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
                               ` (8 subsequent siblings)
  11 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fsck.h b/fsck.h
index 423c467feb7..df0b64a2163 100644
--- a/fsck.h
+++ b/fsck.h
@@ -11,7 +11,7 @@ struct fsck_options;
 struct object;
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type);
+		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
 int is_valid_msg_type(const char *msg_id, const char *msg_type);
 
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 02/10] fsck.h: use "enum object_type" instead of "int"
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (2 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 01/10] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 10:58             ` [PATCH v2 03/10] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
                               ` (7 subsequent siblings)
  11 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the fsck_walk_func to use an "enum object_type" instead of an
"int" type. The types are compatible, and ever since this was added in
355885d5315 (add generic, type aware object chain walker, 2008-02-25)
we've used entries from object_type (OBJ_BLOB etc.).

So this doesn't really change anything as far as the generated code is
concerned, it just gives the compiler more information and makes this
easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c           | 3 ++-
 builtin/index-pack.c     | 3 ++-
 builtin/unpack-objects.c | 3 ++-
 fsck.h                   | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c70..68f0329e69e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -197,7 +197,8 @@ static int traverse_reachable(void)
 	return !!result;
 }
 
-static int mark_used(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_used(struct object *obj, enum object_type object_type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return 1;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 54f74c48741..2f291a14d4a 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -212,7 +212,8 @@ static void cleanup_thread(void)
 	free(thread_data);
 }
 
-static int mark_link(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_link(struct object *obj, enum object_type type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return -1;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index dd4a75e030d..ca54fd16688 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -187,7 +187,8 @@ static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
  * that have reachability requirements and calls this function.
  * Verify its reachability and validity recursively and write it out.
  */
-static int check_object(struct object *obj, int type, void *data, struct fsck_options *options)
+static int check_object(struct object *obj, enum object_type type,
+			void *data, struct fsck_options *options)
 {
 	struct obj_buffer *obj_buf;
 
diff --git a/fsck.h b/fsck.h
index df0b64a2163..0c75789d219 100644
--- a/fsck.h
+++ b/fsck.h
@@ -23,7 +23,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type);
  *     <0	error signaled and abort
  *     >0	error signaled and do not abort
  */
-typedef int (*fsck_walk_func)(struct object *obj, int type, void *data, struct fsck_options *options);
+typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
+			      void *data, struct fsck_options *options);
 
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 03/10] fsck.c: rename variables in fsck_set_msg_type() for less confusion
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (3 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 02/10] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 19:45               ` Jeff King
  2021-02-18 10:58             ` [PATCH v2 04/10] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
                               ` (6 subsequent siblings)
  11 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename variables in a function added in 0282f4dced0 (fsck: offer a
function to demote fsck errors to warnings, 2015-06-22).

It was needlessly confusing that it took a "msg_type" argument, but
then later declared another "msg_type" of a different type.

Let's rename that to "tmp", and rename "id" to "msg_id" and "msg_id"
to "msg_id_str" etc. This will make a follow-up change smaller.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fsck.c b/fsck.c
index 4b7f0b73d73..acccad243ec 100644
--- a/fsck.c
+++ b/fsck.c
@@ -203,27 +203,27 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 }
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type)
+		const char *msg_id_str, const char *msg_type_str)
 {
-	int id = parse_msg_id(msg_id), type;
+	int msg_id = parse_msg_id(msg_id_str), msg_type;
 
-	if (id < 0)
-		die("Unhandled message id: %s", msg_id);
-	type = parse_msg_type(msg_type);
+	if (msg_id < 0)
+		die("Unhandled message id: %s", msg_id_str);
+	msg_type = parse_msg_type(msg_type_str);
 
-	if (type != FSCK_ERROR && msg_id_info[id].msg_type == FSCK_FATAL)
-		die("Cannot demote %s to %s", msg_id, msg_type);
+	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
+		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
 	if (!options->msg_type) {
 		int i;
-		int *msg_type;
-		ALLOC_ARRAY(msg_type, FSCK_MSG_MAX);
+		int *tmp;
+		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
-			msg_type[i] = fsck_msg_type(i, options);
-		options->msg_type = msg_type;
+			tmp[i] = fsck_msg_type(i, options);
+		options->msg_type = tmp;
 	}
 
-	options->msg_type[id] = type;
+	options->msg_type[msg_id] = msg_type;
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 04/10] fsck.c: move definition of msg_id into append_msg_id()
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (4 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 03/10] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 10:58             ` [PATCH v2 05/10] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
                               ` (5 subsequent siblings)
  11 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Refactor code added in 71ab8fa840f (fsck: report the ID of the
error/warning, 2015-06-22) to resolve the msg_id to a string in the
function that wants it, instead of doing it in report().
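
A minimal standalone sketch of the idea, assuming a plain char buffer
instead of a strbuf and a hypothetical two-entry table
("sketch_msg_id", "sketch_id_string" and "sketch_append_msg_id" are
made-up names): the caller passes the enum value, and the helper looks
up the string and camelCases it itself.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical two-entry table; the real one is generated by a macro. */
enum sketch_msg_id {
	SKETCH_MSG_NUL_IN_HEADER,
	SKETCH_MSG_UNTERMINATED_HEADER,
	SKETCH_MSG_MAX
};

static const char *sketch_id_string[SKETCH_MSG_MAX] = {
	"NUL_IN_HEADER",
	"UNTERMINATED_HEADER"
};

/* Write the camelCased form of the message id into "out". */
static void sketch_append_msg_id(char *out, size_t outlen,
				 enum sketch_msg_id msg_id)
{
	const char *s = sketch_id_string[msg_id];
	size_t n = 0;

	if (!outlen)
		return;
	while (*s && n + 1 < outlen) {
		char c = *s++;

		if (c != '_')
			out[n++] = (char)tolower((unsigned char)c);
		else if (*s)
			out[n++] = *s++; /* keep the letter after '_' as-is */
	}
	out[n] = '\0';
}
```

Under these assumptions SKETCH_MSG_NUL_IN_HEADER comes out as
"nulInHeader".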

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index acccad243ec..1070071ffec 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,8 +264,9 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, const char *msg_id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
 {
+	const char *msg_id = msg_id_info[id].id_string;
 	for (;;) {
 		char c = *(msg_id)++;
 
@@ -308,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, msg_id_info[id].id_string);
+	append_msg_id(&sb, id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 05/10] fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (5 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 04/10] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 22:23               ` Junio C Hamano
  2021-02-18 10:58             ` [PATCH v2 06/10] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
                               ` (4 subsequent siblings)
  11 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename the remaining variables of type fsck_msg_id from "id" to
"msg_id". This change is relatively small, and is worth the churn for
a later change where we have different ids in the "report" function.
---
 fsck.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fsck.c b/fsck.c
index 1070071ffec..dbb6f7c4ee2 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,19 +264,19 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id msg_id)
 {
-	const char *msg_id = msg_id_info[id].id_string;
+	const char *msg_id_str = msg_id_info[msg_id].id_string;
 	for (;;) {
-		char c = *(msg_id)++;
+		char c = *(msg_id_str)++;
 
 		if (!c)
 			break;
 		if (c != '_')
 			strbuf_addch(sb, tolower(c));
 		else {
-			assert(*msg_id);
-			strbuf_addch(sb, *(msg_id)++);
+			assert(*msg_id_str);
+			strbuf_addch(sb, *(msg_id_str)++);
 		}
 	}
 
@@ -292,11 +292,11 @@ static int object_on_skiplist(struct fsck_options *opts,
 __attribute__((format (printf, 5, 6)))
 static int report(struct fsck_options *options,
 		  const struct object_id *oid, enum object_type object_type,
-		  enum fsck_msg_id id, const char *fmt, ...)
+		  enum fsck_msg_id msg_id, const char *fmt, ...)
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(id, options), result;
+	int msg_type = fsck_msg_type(msg_id, options), result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -309,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, id);
+	append_msg_id(&sb, msg_id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 06/10] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (6 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 05/10] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 19:52               ` Jeff King
  2021-02-18 10:58             ` [PATCH v2 07/10] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
                               ` (3 subsequent siblings)
  11 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
fsck_msg_type enum.

These defines were originally introduced in:

 - ba002f3b28a (builtin-fsck: move common object checking code to
   fsck.c, 2008-02-25)
 - f50c4407305 (fsck: disallow demoting grave fsck errors to warnings,
   2015-06-22)
 - efaba7cc77f (fsck: optionally ignore specific fsck issues
   completely, 2015-06-22)
 - f27d05b1704 (fsck: allow upgrading fsck warnings to errors,
   2015-06-22)
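
A standalone sketch of the resulting enum (hypothetical "DEMO_" names,
not the fsck.h contents) showing that the implicit numbering after
"= 1" keeps the values the old defines had:

```c
/* Hypothetical mirror of the enum this patch introduces. */
enum demo_msg_type {
	DEMO_INFO = -2,
	DEMO_FATAL = -1,
	DEMO_ERROR = 1,
	DEMO_WARN,	/* implicitly 2, as in the old define */
	DEMO_IGNORE	/* implicitly 3, as in the old define */
};

/* Map a message type to a printable name. */
static const char *demo_msg_type_name(enum demo_msg_type type)
{
	switch (type) {
	case DEMO_INFO:
		return "info";
	case DEMO_FATAL:
		return "fatal";
	case DEMO_ERROR:
		return "error";
	case DEMO_WARN:
		return "warn";
	case DEMO_IGNORE:
		return "ignore";
	}
	return "unknown";
}
```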

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  2 +-
 builtin/mktag.c |  3 ++-
 fsck.c          | 21 ++++++++++-----------
 fsck.h          | 17 ++++++++++-------
 4 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 68f0329e69e..d6d745dc702 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -89,7 +89,7 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   int msg_type, const char *message)
+			   enum fsck_msg_type msg_type, const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 41a399a69e4..1834394a9b6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -22,7 +22,8 @@ static int mktag_config(const char *var, const char *value, void *cb)
 static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
-				 int msg_type, const char *message)
+				 enum fsck_msg_type msg_type,
+				 const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/fsck.c b/fsck.c
index dbb6f7c4ee2..00e0fef21ca 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,9 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FSCK_FATAL -1
-#define FSCK_INFO -2
-
 #define FOREACH_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
@@ -97,7 +94,7 @@ static struct {
 	const char *id_string;
 	const char *downcased;
 	const char *camelcased;
-	int msg_type;
+	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
 	FOREACH_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
@@ -164,10 +161,10 @@ void list_config_fsck_msg_ids(struct string_list *list, const char *prefix)
 		list_config_item(list, prefix, msg_id_info[i].camelcased);
 }
 
-static int fsck_msg_type(enum fsck_msg_id msg_id,
+static enum fsck_msg_type fsck_msg_type(enum fsck_msg_id msg_id,
 	struct fsck_options *options)
 {
-	int msg_type;
+	enum fsck_msg_type msg_type;
 
 	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
 
@@ -182,7 +179,7 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 	return msg_type;
 }
 
-static int parse_msg_type(const char *str)
+static enum fsck_msg_type parse_msg_type(const char *str)
 {
 	if (!strcmp(str, "error"))
 		return FSCK_ERROR;
@@ -205,7 +202,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
-	int msg_id = parse_msg_id(msg_id_str), msg_type;
+	int msg_id = parse_msg_id(msg_id_str);
+	enum fsck_msg_type msg_type;
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
@@ -216,7 +214,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 
 	if (!options->msg_type) {
 		int i;
-		int *tmp;
+		enum fsck_msg_type *tmp;
 		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
 			tmp[i] = fsck_msg_type(i, options);
@@ -296,7 +294,8 @@ static int report(struct fsck_options *options,
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(msg_id, options), result;
+	enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);
+	int result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -1262,7 +1261,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			int msg_type, const char *message)
+			enum fsck_msg_type msg_type, const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index 0c75789d219..c77e8ddf10b 100644
--- a/fsck.h
+++ b/fsck.h
@@ -3,10 +3,13 @@
 
 #include "oidset.h"
 
-#define FSCK_ERROR 1
-#define FSCK_WARN 2
-#define FSCK_IGNORE 3
-
+enum fsck_msg_type {
+	FSCK_INFO = -2,
+	FSCK_FATAL = -1,
+	FSCK_ERROR = 1,
+	FSCK_WARN,
+	FSCK_IGNORE
+};
 struct fsck_options;
 struct object;
 
@@ -29,17 +32,17 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  int msg_type, const char *message);
+			  enum fsck_msg_type msg_type, const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			int msg_type, const char *message);
+			enum fsck_msg_type msg_type, const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
 	fsck_error error_func;
 	unsigned strict:1;
-	int *msg_type;
+	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
 	kh_oid_map_t *object_names;
 };
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 07/10] fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (7 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 06/10] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 22:29               ` Junio C Hamano
  2021-02-18 10:58             ` [PATCH v2 08/10] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
                               ` (2 subsequent siblings)
  11 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

There's no reason to defer calling parse_msg_type() until after we've
checked whether "msg_id < 0". This is not a hot codepath, and
parse_msg_type() itself may die on invalid input.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index 00e0fef21ca..7c53080ad48 100644
--- a/fsck.c
+++ b/fsck.c
@@ -203,11 +203,10 @@ void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
 	int msg_id = parse_msg_id(msg_id_str);
-	enum fsck_msg_type msg_type;
+	enum fsck_msg_type msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
-	msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 08/10] fsck.c: undefine temporary STR macro after use
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (8 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 07/10] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 22:30               ` Junio C Hamano
  2021-02-18 10:58             ` [PATCH v2 09/10] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
  2021-02-18 10:58             ` [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
  11 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

In f417eed8cde (fsck: provide a function to parse fsck message IDs,
2015-06-22) the "STR" macro was introduced, but that short macro name
was not undefined after use as was done earlier in the same series for
the MSG_ID macro in c99ba492f1c (fsck: introduce identifiers for fsck
messages, 2015-06-22).
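
For readers unfamiliar with the pattern, a standalone sketch (all
"DEMO_" names are hypothetical, and the list is cut down to two
entries) of a list macro expanded twice through temporary helper
macros, each of which is undefined once it has served its purpose:

```c
#include <stddef.h>

/* The list itself: one FUNC(id, default type) entry per message. */
#define DEMO_FOREACH_MSG_ID(FUNC) \
	FUNC(NUL_IN_HEADER, FATAL) \
	FUNC(UNTERMINATED_HEADER, FATAL)

/* First expansion: generate the enum. */
#define DEMO_MSG_ID(id, msg_type) DEMO_MSG_##id,
enum demo_msg_id {
	DEMO_FOREACH_MSG_ID(DEMO_MSG_ID)
	DEMO_MSG_MAX
};
#undef DEMO_MSG_ID

/* Second expansion: generate a string table in the same order. */
#define DEMO_STR(x) #x
#define DEMO_MSG_ID(id, msg_type) { DEMO_STR(id), DEMO_STR(msg_type) },
static const struct {
	const char *id_string;
	const char *default_type;
} demo_msg_id_info[DEMO_MSG_MAX + 1] = {
	DEMO_FOREACH_MSG_ID(DEMO_MSG_ID)
	{ NULL, NULL }
};
#undef DEMO_MSG_ID
#undef DEMO_STR	/* short helper names should not leak past this point */
```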

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fsck.c b/fsck.c
index 7c53080ad48..88884e91c89 100644
--- a/fsck.c
+++ b/fsck.c
@@ -100,6 +100,7 @@ static struct {
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
+#undef STR
 
 static void prepare_msg_ids(void)
 {
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 09/10] fsck.c: give "FOREACH_MSG_ID" a more specific name
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (9 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 08/10] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 19:56               ` Jeff King
  2021-02-18 10:58             ` [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
  11 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename the FOREACH_MSG_ID macro to FOREACH_FSCK_MSG_ID in preparation
for moving it over to fsck.h. It's good convention to name macros
in *.h files in such a way as to clearly not clash with any other
names in other files.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fsck.c b/fsck.c
index 88884e91c89..1730acd698d 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,7 +22,7 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_MSG_ID(FUNC) \
+#define FOREACH_FSCK_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
 	FUNC(UNTERMINATED_HEADER, FATAL) \
@@ -83,7 +83,7 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,
 enum fsck_msg_id {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	FSCK_MSG_MAX
 };
 #undef MSG_ID
@@ -96,7 +96,7 @@ static struct {
 	const char *camelcased;
 	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (10 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 09/10] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 19:56               ` Jeff King
  2021-02-18 22:32               ` Junio C Hamano
  11 siblings, 2 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Add the object_name member to the initialization macro. This was
omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
go, 2016-07-17) when the field was added.
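
Note that C initializes members omitted from a brace initializer as if
they were zero, so this should be a readability fix rather than a
behavior change. A standalone sketch with a hypothetical struct
("demo_options" and the "DEMO_OPTIONS_*" macros are made up):

```c
#include <stddef.h>

/* Hypothetical options struct; "object_names" was added last. */
struct demo_options {
	void *walk;
	int strict;
	int *msg_type;
	void *object_names;
};

/* Old-style initializer: the trailing member is left out. */
#define DEMO_OPTIONS_OLD { NULL, 1, NULL }
/* New-style initializer: every member is spelled out. */
#define DEMO_OPTIONS_NEW { NULL, 1, NULL, NULL }

/* Returns 1 if both initializers yield the same defaults. */
static int demo_same_defaults(void)
{
	struct demo_options a = DEMO_OPTIONS_OLD;
	struct demo_options b = DEMO_OPTIONS_NEW;

	return a.object_names == NULL &&
	       b.object_names == NULL &&
	       a.strict == b.strict;
}
```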

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fsck.h b/fsck.h
index c77e8ddf10b..5d44ff1c8e3 100644
--- a/fsck.h
+++ b/fsck.h
@@ -47,8 +47,8 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.30.0.284.gd98b1dd5eaa7



* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-01-28  0:35     ` Jonathan Tan
@ 2021-02-18 11:31       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 11:31 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: gitster, git, Patrick Steinhardt


On Thu, Jan 28 2021, Jonathan Tan wrote:

>> Jonathan Tan <jonathantanmy@google.com> writes:
>> 
>> > As part of this, index-pack has to output (1) the hash that goes into
>> > the name of the .pack/.idx file and (2) the hashes of all dangling
>> > .gitmodules. I just had (2) come after (1). If anyone has a better idea,
>> > I'm interested.
>> 
>> I have this feeling that the "blobs that need to be validated across
>> packs" will *not* be the last enhancement we'd need to make to the
>> output from index-pack to allow richer communication between it and
>> its invoker.  While there is no reason to change how the first line
>> of the output looks like, we'd probably want to make sure that the
>> future versions of Git can easily tell "list of blobs that require
>> further validation" from other additional information.
>> 
>> I am not comfortable to recommend "ok, then let's add a delimiter
>> line '---\n' if/when we need to have something after the list of
>> blobs and append more stuff in future versions of Git", because we
>> may find need to emit new kinds of info before the list of blobs
>> that needs further validation, for example, in future versions of
>> Git.
>> 
>> Having said all that, the internal communication between the
>> index-pack and its caller do not need as much care about
>> compatibility across versions as output visible to end-users, so
>> when a future version of Git needs to send different kinds of
>> information in different order from what you created here, we can do
>> so pretty much freely, I would guess.
>
> Yeah, that's what I thought too - since this is an internal interface,
> we can evolve them in lockstep. If we're really worried about the Git
> binaries (on a user's system) getting out of sync, 

Reading "getting out of sync", I think you may be missing an aspect of
the issue here.

We're not talking about some abnormal error in some packaging system,
but how we'd expect all installations of git to behave if you update
them with *.rpm, *.deb etc, e.g. when your binaries are in
/usr/libexec/git-core. I suppose NixOS or something where there are
hash-based paths may be exempt from this.

On those systems if you've got a server serving concurrent traffic and
update the "git" package you could expect failure if any git process
invoked by another is incompatible during such an upgrade.

This came up in the recent GIT_CONFIG_PARAMETERS discussion. I.e. even
if GIT_CONFIG_PARAMETERS is internal-only, we bent over backwards not to
change it in such a way as to have process A invoking process B and the
two not understanding each other because of such an upgrade.

That's exactly because of this case, where receive-pack may be started
on version A, someone runs "apt install git" in the background
concurrently, and now a version A of that program is talking to a
version B index-pack.

> we could just make sure that subsequent updates to this protocol are
> non-backwards-compatible (e.g. have index-pack emit "foo <hash>",
> where "foo" is a string that describes the new check, so that current
> fetch-pack will reject "foo" since it is not a hash).

And then presumably index-pack would die and receive-pack would die on
the push or whatever, so the push fails for the end user.
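
To make the failure mode concrete, a standalone sketch of a strict
reader, assuming 40-character SHA-1 hex lines ("demo_is_hex_oid" and
"DEMO_HEXSZ" are hypothetical; this is not the actual parsing code):
it accepts a bare hash and rejects a "foo <hash>" line, which is what
would make an older reader fail on newer output.

```c
#include <ctype.h>
#include <string.h>

#define DEMO_HEXSZ 40 /* assumes SHA-1 object names */

/* Return 1 if "line" is exactly one hex object name, 0 otherwise. */
static int demo_is_hex_oid(const char *line)
{
	size_t i;

	if (strlen(line) != DEMO_HEXSZ)
		return 0;
	for (i = 0; i < DEMO_HEXSZ; i++)
		if (!isxdigit((unsigned char)line[i]))
			return 0;
	return 1;
}
```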


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-02-17 20:10           ` Jonathan Tan
@ 2021-02-18 12:07             ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 12:07 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Wed, Feb 17 2021, Jonathan Tan wrote:

>> Sorry for being unclear here. I don't think (honestly I don't remember,
>> it's been almost a month) that I meant that you should use the skipList.
>> 
>> Looking at that code again we use object_on_skiplist() to do an early
>> punt in report(), but also fsck_blob(), presumably you never want the
>> latter, and that early punting wouldn't be needed if your report()
>> function intercepted the modules blob id for stashing it away / later
>> reporting / whatever.
>> 
>> So yeah, I'm 99% sure now that's not what I meant :)
>> 
>> What I meant with:
>> 
>>     Or if we want to keep the "print <list> | process"[...]
>> 
>> Is that we have an existing ad-hoc IPC model for these commands in
>> passing along the skipList, which is made more complex because sometimes
>> the initial process reads the file, sometimes it passes it along as-is
>> to the child.
>> 
>> And then there's this patch that passes OIDs too, but through a
>> different mechanism.
>> 
>> I was suggesting that perhaps it made more sense to refactor both so
>> they could use the same mechanism, because we're potentially passing two
>> lists of OIDs between the two. Just one goes via line-at-a-time in the
>> output, the other via a config option on the command-line.
>
> Thanks for your explanation. I still think that they are quite different
> - skiplist is a user-written file containing a list of OIDs that will
> likely never change, whereas my list of dangling .gitmodules is a list
> of OIDs dynamically generated (and thus, always different) whenever a
> fetch is done. So I think it's quite reasonable to pass skiplist as a
> file name, and my list should be passed line-by-line.

Sure, but I'm not talking about passing it as a tempfile.

Yes, I suggested that in the third-to-last paragraph of [1] but then
went on to say that we could also move to some IPC mechanism where you
spew in the list of dangling .gitmodules, and we also spew in the
skipList and anything else we want to pass in.

I'm not saying this needs to be part of this series. But let me
rephrase:

We now have some combination of
{receive-pack,upload-pack,send-pack,fetch-pack,unpack-objects} that need
to communicate locally or pass data back & forth, passing data either
via a CLI option to read a file, packnames/refs on --stdin, or (now) a
single list of OIDs on stdout.

Let's say we don't just need to pass the .gitmodules OIDs, but also
e.g. .mailmap OIDs or whatever (due to some future vulnerability).

Would this IPC mechanism deal with that, or would we need to introduce a
breaking change (Re: my recently sent mail about concurrent updates of
libexec programs)? Can we use something like pkt-line to talk back &
forth in an extensible way?
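
For reference, pkt-line framing is a 4-digit hex length (which counts
the 4 length bytes themselves) followed by the payload. A standalone
sketch of an encoder ("demo_pkt_line" is a hypothetical helper, not
git's pkt-line API):

```c
#include <stdio.h>
#include <string.h>

#define DEMO_PKT_MAX 65520 /* largest pkt-line, including the length */

/*
 * Frame "payload" as one pkt-line in "out".
 * Returns the framed length, or -1 if it does not fit.
 */
static int demo_pkt_line(char *out, size_t outlen, const char *payload)
{
	size_t len = strlen(payload) + 4;

	if (len > DEMO_PKT_MAX || len + 1 > outlen)
		return -1;
	snprintf(out, outlen, "%04x%s", (unsigned int)len, payload);
	return (int)len;
}
```

Since each record carries its own length, a reader can skip records
whose content it does not understand, which is the sort of
extensibility being asked about.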

Not needed now, just food for thought...

1. https://lore.kernel.org/git/87czxu7c15.fsf@evledraar.gmail.com/


* Re: [PATCH 00/14] fsck: API improvements
  2021-02-18  0:00               ` Ævar Arnfjörð Bjarmason
@ 2021-02-18 19:12                 ` Junio C Hamano
  2021-02-18 19:57                   ` Jeff King
  0 siblings, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-02-18 19:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Let's get this reviewed now, but with expectation that it will be
>> rebased after the dust settles.
>
> Makes sense. Pending a review of this would you be interested in queuing
> a v2 of this that doesn't conflict with in-flight topics?

Not really.  I am not sure your recent patches are getting the
review bandwidth they deserve.

> Patches 01..09 & 13/14 can live conflict-free with what's in "seen" now
> (I'd have made the 13th the 10th in v1 if I'd noticed). Then I could
> re-roll the remainder of this once the other topics land.


* Re: [PATCH v2 03/10] fsck.c: rename variables in fsck_set_msg_type() for less confusion
  2021-02-18 10:58             ` [PATCH v2 03/10] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
@ 2021-02-18 19:45               ` Jeff King
  0 siblings, 0 replies; 229+ messages in thread
From: Jeff King @ 2021-02-18 19:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Jonathan Tan

On Thu, Feb 18, 2021 at 11:58:33AM +0100, Ævar Arnfjörð Bjarmason wrote:

> Rename variables in a function added in 0282f4dced0 (fsck: offer a
> function to demote fsck errors to warnings, 2015-06-22).
> 
> It was needlessly confusing that it took a "msg_type" argument, but
> then later declared another "msg_type" of a different type.
> 
> Let's rename that to "tmp", and rename "id" to "msg_id" and "msg_id"
> to "msg_id_str" etc. This will make a follow-up change smaller.

I think this is an improvement, though maybe "severity" would be a
less-generic term than "type".

>  void fsck_set_msg_type(struct fsck_options *options,
> -		const char *msg_id, const char *msg_type)
> +		const char *msg_id_str, const char *msg_type_str)
>  {
> -	int id = parse_msg_id(msg_id), type;
> +	int msg_id = parse_msg_id(msg_id_str), msg_type;

I always get nervous when a refactoring renames something away from
"foo", and then renames another thing _to_ "foo". Any untouched bits of
code are vulnerable to confusing them.

But I think the types are sufficiently different that we can mostly rely
on the compiler (though things like numeric or bool comparisons can work
with either pointers or ints), and the function is small enough that we
can see the entire thing in the context here.

So I think it is OK.

-Peff

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 06/10] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  2021-02-18 10:58             ` [PATCH v2 06/10] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
@ 2021-02-18 19:52               ` Jeff King
  2021-02-18 22:27                 ` Junio C Hamano
  0 siblings, 1 reply; 229+ messages in thread
From: Jeff King @ 2021-02-18 19:52 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Jonathan Tan

On Thu, Feb 18, 2021 at 11:58:36AM +0100, Ævar Arnfjörð Bjarmason wrote:

> Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
> fsck_msg_type enum.

Makes sense. As with my previous comment, I wonder if "severity" is a
more descriptive term.

> diff --git a/fsck.h b/fsck.h
> index 0c75789d219..c77e8ddf10b 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -3,10 +3,13 @@
>  
>  #include "oidset.h"
>  
> -#define FSCK_ERROR 1
> -#define FSCK_WARN 2
> -#define FSCK_IGNORE 3
> -
> +enum fsck_msg_type {
> +	FSCK_INFO = -2,
> +	FSCK_FATAL = -1,
> +	FSCK_ERROR = 1,
> +	FSCK_WARN,
> +	FSCK_IGNORE
> +};

You kept the values the same as they were before, which is good in a
refactoring step, but...wow, the ordering is weird and confusing.

In FATAL/ERROR/WARN/IGNORE the number increases as severity decreases.
Maybe reversed from how I'd do it, but at least the order makes sense.
But somehow INFO is on the far side of FATAL?

Again, not something to address in this patch, but I hope something we
could maybe deal with in the longer term (perhaps along with fixing the
weird "INFO is a warning from the user's perspective, but WARNING is
generally an error" behavior).

I also know that this is assigning WARN and IGNORE based on
counting-by-one from ERROR, so it's correct. But I think it would be
more obvious if you simply filled in the values manually, so a reader
does not have to wonder why some are assigned and some are not.
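
Something like this, as an untested sketch: the values are the same
as in your patch, only written out one by one.

```c
/* Sketch only: identical values to the patch, each spelled out. */
enum fsck_msg_type {
	FSCK_INFO = -2,
	FSCK_FATAL = -1,
	FSCK_ERROR = 1,
	FSCK_WARN = 2,
	FSCK_IGNORE = 3
};
```

That way nobody has to count from FSCK_ERROR to know what FSCK_IGNORE
is.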

-Peff

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-02-18 10:58             ` [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
@ 2021-02-18 19:56               ` Jeff King
  2021-02-18 22:33                 ` Junio C Hamano
  2021-02-18 22:32               ` Junio C Hamano
  1 sibling, 1 reply; 229+ messages in thread
From: Jeff King @ 2021-02-18 19:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Jonathan Tan

On Thu, Feb 18, 2021 at 11:58:40AM +0100, Ævar Arnfjörð Bjarmason wrote:

> Add the object_name member to the initialization macro. This was
> omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
> go, 2016-07-17) when the field was added.

We're correct either way here, because trailing fields that are not
initialized will get the usual zero-initialization. But I don't mind
trying to be more complete.

That said, we have embraced designated initializers these days, in which
case we usually omit the NULL ones. So perhaps:

  #define FSCK_OPTIONS_DEFAULT { \
	.error_func = fsck_error_function, \
	.skiplist = OIDSET_INIT, \
  }
  #define FSCK_OPTIONS_STRICT { \
	.error_func = fsck_error_function, \
	.skiplist = OIDSET_INIT, \
	.strict = 1, \
  }

would be more readable still?
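
To illustrate the zero-initialization point, here is a small
standalone sketch. The struct and every demo_* / DEMO_* name are made
up for the example and are not the real fsck_options layout. It only
shows that members left out of either initializer style end up
NULL/0.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct fsck_options; not the real layout. */
struct demo_options {
	void *walk;
	int (*error_func)(const char *msg);
	unsigned strict;
	int *msg_type;
	void *object_names;
};

static int demo_error(const char *msg)
{
	return msg != NULL;
}

/* Positional style: trailing object_names is omitted, so it is zeroed. */
#define DEMO_OPTIONS_POSITIONAL { NULL, demo_error, 1, NULL }

/* Designated style: only the non-zero members are named. */
#define DEMO_OPTIONS_DESIGNATED { \
	.error_func = demo_error, \
	.strict = 1, \
}

static struct demo_options positional = DEMO_OPTIONS_POSITIONAL;
static struct demo_options designated = DEMO_OPTIONS_DESIGNATED;
```

Both objects end up with the same contents. The designated form just
does not make the reader count commas.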

-Peff

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 09/10] fsck.c: give "FOREACH_MSG_ID" a more specific name
  2021-02-18 10:58             ` [PATCH v2 09/10] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
@ 2021-02-18 19:56               ` Jeff King
  0 siblings, 0 replies; 229+ messages in thread
From: Jeff King @ 2021-02-18 19:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Jonathan Tan

On Thu, Feb 18, 2021 at 11:58:39AM +0100, Ævar Arnfjörð Bjarmason wrote:

> Rename the FOREACH_MSG_ID macro to FOREACH_FSCK_MSG_ID in preparation
> for moving it over to fsck.h. It's good convention to name macros
> in *.h files in such a way as to clearly not clash with any other
> names in other files.

The patch to move it is not in this v2 of the series, so arguably this
is less interesting. However, I think the resulting code is equally or
more readable, so I don't mind it standing on its own.

-Peff

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 00/14] fsck: API improvements
  2021-02-18 19:12                 ` Junio C Hamano
@ 2021-02-18 19:57                   ` Jeff King
  2021-02-18 20:27                     ` Junio C Hamano
  2021-02-18 22:36                     ` Junio C Hamano
  0 siblings, 2 replies; 229+ messages in thread
From: Jeff King @ 2021-02-18 19:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Johannes Schindelin,
	Jonathan Tan

On Thu, Feb 18, 2021 at 11:12:26AM -0800, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> 
> >> Let's get this reviewed now, but with expectation that it will be
> >> rebased after the dust settles.
> >
> > Makes sense. Pending a review of this would you be interested in queuing
> > a v2 of this that doesn't conflict with in-flight topics?
> 
> Not really.  I am not sure your recent patches are getting
> sufficient review bandwidth they deserve.

FWIW, I just read through v2 (without having looked at all at v1 yet!),
and they all seemed like quite reasonable cleanups. I left a few small
comments that might be worth a quick re-roll, but I would also be OK
with the patches being picked up as-is.

-Peff

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 00/14] fsck: API improvements
  2021-02-18 19:57                   ` Jeff King
@ 2021-02-18 20:27                     ` Junio C Hamano
  2021-02-19  0:54                       ` Ævar Arnfjörð Bjarmason
  2021-02-18 22:36                     ` Junio C Hamano
  1 sibling, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-02-18 20:27 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Johannes Schindelin,
	Jonathan Tan

Jeff King <peff@peff.net> writes:

> On Thu, Feb 18, 2021 at 11:12:26AM -0800, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>> 
>> >> Let's get this reviewed now, but with expectation that it will be
>> >> rebased after the dust settles.
>> >
>> > Makes sense. Pending a review of this would you be interested in queuing
>> > a v2 of this that doesn't conflict with in-flight topics?
>> 
>> Not really.  I am not sure your recent patches are getting
>> sufficient review bandwidth they deserve.
>
> FWIW, I just read through v2 (without having looked at all at v1 yet!),
> and they all seemed like quite reasonable cleanups. I left a few small
> comments that might be worth a quick re-roll, but I would also be OK
> with the patches being picked up as-is.

That's good to hear.  I shouldn't even have bothered to answer the
question, if the v2 were to have been sent to the list without waiting
for my reply ;-)

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen')
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
@ 2021-02-18 22:19               ` Junio C Hamano
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                 ` (22 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:19 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> As suggested in
> https://lore.kernel.org/git/87zh028ctp.fsf@evledraar.gmail.com/ a
> version of this that doesn't conflict with other in-flight topics. I
> can submit the rest later.

A bystander would not have a clue what this is about beyond that it
tweaks the fsck API: how urgent is it, and what benefit does it bring
us?

That kind of thing is expected to be described here.

The cover letter of v1 does not do a much better job, either, but is
it fair to understand that this is primarily about allowing the
callback functions (which handle various problems the fsck machinery
finds) to learn what error was encountered, so that a feature like
"enumerate missing .gitmodules blobs" from 384c9d1c (fetch-pack: print
and use dangling .gitmodules, 2021-01-23) can be written by
customizing the error reporting function, instead of by inserting
very narrow custom code into the general error reporting codepath?
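
In other words, something along the lines of this rough sketch. All
the demo_* names, the callback signature and the message IDs are
invented for illustration and are not what the series actually adds.
The point is only that a callback which is told the message ID can
set aside "missing .gitmodules" reports and treat everything else as
before.

```c
#include <stddef.h>

/* Hypothetical IDs; the real list lives in fsck.c. */
enum demo_msg_id {
	DEMO_MSG_BAD_TREE,
	DEMO_MSG_GITMODULES_MISSING
};

#define DEMO_MAX_DANGLING 16

struct demo_dangling {
	const char *oids[DEMO_MAX_DANGLING];
	size_t nr;
};

/* Callback type: receives the ID of the problem, not only its text. */
typedef int (*demo_error_fn)(void *cb_data, const char *oid,
			     enum demo_msg_id msg_id, const char *message);

/* Custom callback: remember missing .gitmodules blobs, fail on the rest. */
static int demo_collect_dangling(void *cb_data, const char *oid,
				 enum demo_msg_id msg_id, const char *message)
{
	struct demo_dangling *d = cb_data;

	(void)message; /* the text is not needed to classify the report */
	if (msg_id == DEMO_MSG_GITMODULES_MISSING) {
		if (d->nr >= DEMO_MAX_DANGLING)
			return 1; /* no room left: report as an error */
		d->oids[d->nr++] = oid;
		return 0; /* defer: the blob may arrive in another pack */
	}
	return 1; /* any other problem is still an error */
}
```

The caller can then check the collected object names once all packs
have been indexed.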

If so, can we at least say something a bit more specific and
focused, than the overly broad "API improvements"?

Thanks.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 05/10] fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  2021-02-18 10:58             ` [PATCH v2 05/10] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
@ 2021-02-18 22:23               ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Rename the remaining variables of type fsck_msg_id from "id" to
> "msg_id". This change is relatively small, and is worth the churn for
> a later change where we have different id's in the "report" function.
> ---
>  fsck.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)

Up to this point I have no objections to the patches themselves, but
this one is not signed off.

> diff --git a/fsck.c b/fsck.c
> index 1070071ffec..dbb6f7c4ee2 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -264,19 +264,19 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
>  	free(to_free);
>  }
>  
> -static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
> +static void append_msg_id(struct strbuf *sb, enum fsck_msg_id msg_id)
>  {
> -	const char *msg_id = msg_id_info[id].id_string;
> +	const char *msg_id_str = msg_id_info[msg_id].id_string;
>  	for (;;) {
> -		char c = *(msg_id)++;
> +		char c = *(msg_id_str)++;
>  
>  		if (!c)
>  			break;
>  		if (c != '_')
>  			strbuf_addch(sb, tolower(c));
>  		else {
> -			assert(*msg_id);
> -			strbuf_addch(sb, *(msg_id)++);
> +			assert(*msg_id_str);
> +			strbuf_addch(sb, *(msg_id_str)++);
>  		}
>  	}
>  
> @@ -292,11 +292,11 @@ static int object_on_skiplist(struct fsck_options *opts,
>  __attribute__((format (printf, 5, 6)))
>  static int report(struct fsck_options *options,
>  		  const struct object_id *oid, enum object_type object_type,
> -		  enum fsck_msg_id id, const char *fmt, ...)
> +		  enum fsck_msg_id msg_id, const char *fmt, ...)
>  {
>  	va_list ap;
>  	struct strbuf sb = STRBUF_INIT;
> -	int msg_type = fsck_msg_type(id, options), result;
> +	int msg_type = fsck_msg_type(msg_id, options), result;
>  
>  	if (msg_type == FSCK_IGNORE)
>  		return 0;
> @@ -309,7 +309,7 @@ static int report(struct fsck_options *options,
>  	else if (msg_type == FSCK_INFO)
>  		msg_type = FSCK_WARN;
>  
> -	append_msg_id(&sb, id);
> +	append_msg_id(&sb, msg_id);
>  
>  	va_start(ap, fmt);
>  	strbuf_vaddf(&sb, fmt, ap);

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 06/10] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  2021-02-18 19:52               ` Jeff King
@ 2021-02-18 22:27                 ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:27 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Johannes Schindelin,
	Jonathan Tan

Jeff King <peff@peff.net> writes:

> On Thu, Feb 18, 2021 at 11:58:36AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
>> fsck_msg_type enum.
>
> Makes sense. As with my previous comment, I wonder if "severity" is a
> more descriptive term.
>
>> diff --git a/fsck.h b/fsck.h
>> index 0c75789d219..c77e8ddf10b 100644
>> --- a/fsck.h
>> +++ b/fsck.h
>> @@ -3,10 +3,13 @@
>>  
>>  #include "oidset.h"
>>  
>> -#define FSCK_ERROR 1
>> -#define FSCK_WARN 2
>> -#define FSCK_IGNORE 3
>> -
>> +enum fsck_msg_type {
>> +	FSCK_INFO = -2,
>> +	FSCK_FATAL = -1,
>> +	FSCK_ERROR = 1,
>> +	FSCK_WARN,
>> +	FSCK_IGNORE
>> +};
>
> You kept the values the same as they were before, which is good in a
> refactoring step, but...wow, the ordering is weird and confusing.
>
> In FATAL/ERROR/WARN/IGNORE the number increases as severity decreases.
> Maybe reversed from how I'd do it, but at least the order makes sense.
> But somehow INFO is on the far side of FATAL?
>
> Again, not something to address in this patch, but I hope something we
> could maybe deal with in the longer term (perhaps along with fixing the
> weird "INFO is a warning from the user's perspective, but WARNING is
> generally an error" behavior).
>
> I also know that this is assigning WARN and IGNORE based on
> counting-by-one from ERROR, so it's correct. But I think it would be
> more obvious if you simply filled in the values manually, so a reader
> does not have to wonder why some are assigned and some are not.

I had the same reaction, plus "Wow, we had FSCK_* constants in two
different places and without colliding?  Have we been lucky?
Declaring it in one place, whether we use enum or not (as enum is
not very useful in C as a type checking vehicle), makes a lot of
sense but why does this come this late in the series, instead of
being at the front as a trivial low-hanging fruit?"

Thanks.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 07/10] fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  2021-02-18 10:58             ` [PATCH v2 07/10] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
@ 2021-02-18 22:29               ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> There's no reason to defer the calling of parse_msg_type() until after
> we've checked if the "id < 0". This is not a hot codepath, and
> parse_msg_type() itself may die on invalid input.

That explains why this change can be done, but does not justify why
it is a good change.  Unlike all the previous steps, I would rather
say this is borderline needless churn.

Let's keep reading as the picture may change as we touch more code
around this area.

Thanks.


>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  fsck.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fsck.c b/fsck.c
> index 00e0fef21ca..7c53080ad48 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -203,11 +203,10 @@ void fsck_set_msg_type(struct fsck_options *options,
>  		const char *msg_id_str, const char *msg_type_str)
>  {
>  	int msg_id = parse_msg_id(msg_id_str);
> -	enum fsck_msg_type msg_type;
> +	enum fsck_msg_type msg_type = parse_msg_type(msg_type_str);
>  
>  	if (msg_id < 0)
>  		die("Unhandled message id: %s", msg_id_str);
> -	msg_type = parse_msg_type(msg_type_str);
>  
>  	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
>  		die("Cannot demote %s to %s", msg_id_str, msg_type_str);

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 08/10] fsck.c: undefine temporary STR macro after use
  2021-02-18 10:58             ` [PATCH v2 08/10] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
@ 2021-02-18 22:30               ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> In f417eed8cde (fsck: provide a function to parse fsck message IDs,
> 2015-06-22) the "STR" macro was introduced, but that short macro name
> was not undefined after use as was done earlier in the same series for
> the MSG_ID macro in c99ba492f1c (fsck: introduce identifiers for fsck
> messages, 2015-06-22).
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  fsck.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fsck.c b/fsck.c
> index 7c53080ad48..88884e91c89 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -100,6 +100,7 @@ static struct {
>  	{ NULL, NULL, NULL, -1 }
>  };
>  #undef MSG_ID
> +#undef STR

Good clean-up.

>  
>  static void prepare_msg_ids(void)
>  {
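
For readers less familiar with the pattern, a cut-down sketch of it
follows. The DEMO_* / demo_* names and the two-entry list are made up
and the real table in fsck.c has more fields. The helper macros exist
only while the table is being built and are undefined right after.

```c
#include <stddef.h>

/* Hypothetical list; the real one in fsck.c is much longer. */
#define DEMO_FOREACH_MSG_ID(FUNC) \
	FUNC(BAD_TREE) \
	FUNC(GITMODULES_MISSING)

#define STR(x) #x
#define MSG_ID(id) { STR(id) },
static const struct {
	const char *id_string;
} demo_msg_id_info[] = {
	DEMO_FOREACH_MSG_ID(MSG_ID)
	{ NULL }
};
#undef MSG_ID
#undef STR
/* From here on, STR and MSG_ID are free for other code to define. */
```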

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-02-18 10:58             ` [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
  2021-02-18 19:56               ` Jeff King
@ 2021-02-18 22:32               ` Junio C Hamano
  1 sibling, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Add the object_name member to the initialization macro. This was
> omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
> go, 2016-07-17) when the field was added.

This is more of a Meh to me.  If this were to change us to
designated initializers and omit NULL and 0 initialization, it would
be more interesting.

Thanks.

>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  fsck.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fsck.h b/fsck.h
> index c77e8ddf10b..5d44ff1c8e3 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -47,8 +47,8 @@ struct fsck_options {
>  	kh_oid_map_t *object_names;
>  };
>  
> -#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
> -#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
> +#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
> +#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
>  
>  /* descend in all linked child objects
>   * the return value is:

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-02-18 19:56               ` Jeff King
@ 2021-02-18 22:33                 ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:33 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Johannes Schindelin,
	Jonathan Tan

Jeff King <peff@peff.net> writes:

> On Thu, Feb 18, 2021 at 11:58:40AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> Add the object_name member to the initialization macro. This was
>> omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
>> go, 2016-07-17) when the field was added.
>
> We're correct either way here, because trailing fields that are not
> initialized will get the usual zero-initialization. But I don't mind
> trying to be more complete.
>
> That said, we have embraced designated initializers these days, in which
> case we usually omit the NULL ones. So perhaps:
>
>   #define FSCK_OPTIONS_DEFAULT { \
> 	.error_func = fsck_error_function, \
> 	.skiplist = OIDSET_INIT, \
>   }
>   #define FSCK_OPTIONS_STRICT { \
> 	.error_func = fsck_error_function, \
> 	.skiplist = OIDSET_INIT, \
> 	.strict = 1, \
>   }
>
> would be more readable still?

Ahh, I should probably have read your reviews first before reading
patches myself ;-)

Thanks.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 00/14] fsck: API improvements
  2021-02-18 19:57                   ` Jeff King
  2021-02-18 20:27                     ` Junio C Hamano
@ 2021-02-18 22:36                     ` Junio C Hamano
  1 sibling, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:36 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff King
  Cc: git, Johannes Schindelin, Jonathan Tan

Jeff King <peff@peff.net> writes:

> On Thu, Feb 18, 2021 at 11:12:26AM -0800, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>> 
>> >> Let's get this reviewed now, but with expectation that it will be
>> >> rebased after the dust settles.
>> >
>> > Makes sense. Pending a review of this would you be interested in queuing
>> > a v2 of this that doesn't conflict with in-flight topics?
>> 
>> Not really.  I am not sure your recent patches are getting
>> sufficient review bandwidth they deserve.
>
> FWIW, I just read through v2 (without having looked at all at v1 yet!),
> and they all seemed like quite reasonable cleanups. I left a few small
> comments that might be worth a quick re-roll, but I would also be OK
> with the patches being picked up as-is.

Yeah, all except for a handful of minor nits looked good.

Thanks for writing and reviewing.  Perhaps a final reroll to tie up the
loose ends, or is it just a matter of signing off one of them and
dropping a couple of other ones (which other ones)?




^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
                     ` (4 preceding siblings ...)
  2021-01-24  6:29   ` [PATCH 0/4] Check .gitmodules when using packfile URIs Junio C Hamano
@ 2021-02-18 23:34   ` Junio C Hamano
  2021-02-19  0:46     ` Jonathan Tan
  2021-02-19  1:08     ` Ævar Arnfjörð Bjarmason
  5 siblings, 2 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-02-18 23:34 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> This patch set resolves the .gitmodules-and-tree-in-separate-packfiles
> issue I mentioned in [1] by having index-pack print out all dangling
> .gitmodules (instead of returning with an error code) and then teaching
> fetch-pack to read those and run its own fsck checks after all
> index-pack invocations are complete.
>
> As part of this, index-pack has to output (1) the hash that goes into
> the name of the .pack/.idx file and (2) the hashes of all dangling
> .gitmodules. I just had (2) come after (1). If anyone has a better idea,
> I'm interested.
>
> I also discovered a bug in that different index-pack arguments were used
> when processing the inline packfile and when processing the ones
> referenced by URIs. Patch 1-3 fixes that bug by passing the arguments to
> use as a space-separated URL-encoded list. (URL-encoded so that we can
> have spaces in the arguments.) Again, if anyone has a better idea, I'm
> interested. It is only in patch 4 that we have the dangling .gitmodules
> fix.
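
As an aside on the "space-separated URL-encoded list" above, the
encoding side could look roughly like this sketch. It is not the code
from the patches. The demo_* names are invented, and the choice to
keep only RFC 3986 unreserved characters unescaped (in the C locale)
is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Percent-encode one argument into out (of size outsz), copying only
 * unreserved characters as-is. Returns the number of bytes written
 * (excluding the NUL), or -1 if out is too small.
 */
static int demo_encode_arg(const char *arg, char *out, size_t outsz)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t n = 0;

	if (!outsz)
		return -1;
	for (; *arg; arg++) {
		unsigned char c = (unsigned char)*arg;

		if (isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~') {
			if (n + 1 >= outsz)
				return -1;
			out[n++] = (char)c;
		} else {
			if (n + 3 >= outsz)
				return -1;
			out[n++] = '%';
			out[n++] = hex[c >> 4];
			out[n++] = hex[c & 15];
		}
	}
	out[n] = '\0';
	return (int)n;
}

/* Join the encoded arguments with single spaces. */
static int demo_encode_args(const char **argv, size_t argc,
			    char *out, size_t outsz)
{
	size_t n = 0, i;

	if (!outsz)
		return -1;
	out[0] = '\0';
	for (i = 0; i < argc; i++) {
		int len;

		if (i) {
			if (n + 1 >= outsz)
				return -1;
			out[n++] = ' ';
		}
		len = demo_encode_arg(argv[i], out + n, outsz - n);
		if (len < 0)
			return -1;
		n += (size_t)len;
	}
	return (int)n;
}
```

With this, the two arguments "--fsck-objects" and "a b" become the
single string "--fsck-objects a%20b", so the receiver can split on
spaces and decode each piece.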

This seems to have been stalled but I think it would be a better
approach to use a custom callback for error reporting, suggested by
Ævar, which would be where his fsck API clean-up topic would lead
to.

If it is not ultra-urgent, perhaps you can retract the ones that are
queued right now, work with Ævar to finish the error-callback work
and rebuild this topic on top of it?  Thanks.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-02-18 23:34   ` Junio C Hamano
@ 2021-02-19  0:46     ` Jonathan Tan
  2021-02-20  3:31       ` Junio C Hamano
  2021-02-19  1:08     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 229+ messages in thread
From: Jonathan Tan @ 2021-02-19  0:46 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> This seems to have been stalled but I think it would be a better
> approach to use a custom callback for error reporting, suggested by
> Ævar, which would be where his fsck API clean-up topic would lead
> to.
> 
> If it is not ultra-urgent, perhaps you can retract the ones that are
> queued right now, work with Ævar to finish the error-callback work
> and rebuild this topic on top of it?  Thanks.

OK - that works. My original idea was to rewrite it using an
error-callback but using starts_with() instead of the ID that Ævar's
work will provide, but seeing that at least one other contributor (Peff)
seems OK with the patches, rebasing mine on top of his works too. I'll
also take a look at his patches.
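
For reference, the starts_with() variant I had in mind was roughly the
following. This is a sketch only: demo_starts_with() is a stand-in
for git's helper, demo_is_gitmodules_missing() is a made-up name, and
the prefix is taken from the error message quoted at the start of
this thread.

```c
#include <string.h>

/* Minimal stand-in for git's starts_with() helper. */
static int demo_starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Hypothetical check: recognise the report by its message text. */
static int demo_is_gitmodules_missing(const char *message)
{
	return demo_starts_with(message, "gitmodulesMissing:");
}
```

Matching on the text is fragile compared to comparing an ID, which is
why building on the ID makes sense.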

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 00/14] fsck: API improvements
  2021-02-18 20:27                     ` Junio C Hamano
@ 2021-02-19  0:54                       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-19  0:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Johannes Schindelin, Jonathan Tan


On Thu, Feb 18 2021, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
>> On Thu, Feb 18, 2021 at 11:12:26AM -0800, Junio C Hamano wrote:
>>
>>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>> 
>>> >> Let's get this reviewed now, but with expectation that it will be
>>> >> rebased after the dust settles.
>>> >
>>> > Makes sense. Pending a review of this would you be interested in queuing
>>> > a v2 of this that doesn't conflict with in-flight topics?
>>> 
>>> Not really.  I am not sure your recent patches are getting
>>> sufficient review bandwidth they deserve.
>>
>> FWIW, I just read through v2 (without having looked at all at v1 yet!),
>> and they all seemed like quite reasonable cleanups. I left a few small
>> comments that might be worth a quick re-roll, but I would also be OK
>> with the patches being picked up as-is.
>
> That's good to hear.  I shouldn't even have bothered to answer the
> question, if the v2 were to have been sent to the list without waiting
> for my reply ;-)

FWIW it's not that I didn't care about the reply, but I'm somewhat
intermittently available time/network wise in the coming days. And
there's the TZ difference between us.

I sent v1 thinking you might be willing to pick it up & resolve the
conflict, but since you expressed an interest in deferring it until
conflicting work landed, I figured I'd ask (and then just sent the
patches) whether you'd be interested in a conflict-free version to queue
alongside those changes.

If it was still "nah", fair enough, I'd just wait. But if not, those
patches would be there to pick up.

Thanks a lot to you & Jeff for the review on v2. I won't have time to
address all that today, and in any case I got the message that maybe I
should stop firehosing the list with patch series for a bit :)

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-02-18 23:34   ` Junio C Hamano
  2021-02-19  0:46     ` Jonathan Tan
@ 2021-02-19  1:08     ` Ævar Arnfjörð Bjarmason
  2021-02-20  3:29       ` Junio C Hamano
  1 sibling, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-19  1:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git


On Fri, Feb 19 2021, Junio C Hamano wrote:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
>> This patch set resolves the .gitmodules-and-tree-in-separate-packfiles
>> issue I mentioned in [1] by having index-pack print out all dangling
>> .gitmodules (instead of returning with an error code) and then teaching
>> fetch-pack to read those and run its own fsck checks after all
>> index-pack invocations are complete.
>>
>> As part of this, index-pack has to output (1) the hash that goes into
>> the name of the .pack/.idx file and (2) the hashes of all dangling
>> .gitmodules. I just had (2) come after (1). If anyone has a better idea,
>> I'm interested.
>>
>> I also discovered a bug in that different index-pack arguments were used
>> when processing the inline packfile and when processing the ones
>> referenced by URIs. Patch 1-3 fixes that bug by passing the arguments to
>> use as a space-separated URL-encoded list. (URL-encoded so that we can
>> have spaces in the arguments.) Again, if anyone has a better idea, I'm
>> interested. It is only in patch 4 that we have the dangling .gitmodules
>> fix.
>
> This seems to have been stalled but I think it would be a better
> approach to use a custom callback for error reporting, suggested by
> Ævar, which would be where his fsck API clean-up topic would lead
> to.
>
> If it is not ultra-urgent, perhaps you can retract the ones that are
> queued right now, work with Ævar to finish the error-callback work
> and rebuild this topic on top of it?  Thanks.

If my vote counts for something I think it makes sense to have
Jonathan's series go first and just ignore my fsck API improvement
patches (well, the part of my v1[1] which conflicts with his work).

I'm also happy to help him queue his on top of a v1 version of my
series.

But the end result of doing so (shown after the "--" in [1]) is just a
small re-arrangement of code to get a cleaner fsck API use, it doesn't
actually matter to anyone using git.

Whereas his patches actually do, we have in-the-wild server/repo/clone
setups that are getting on-clone errors, and the window for 2.31 is
getting closer.

We can always do the small API use refactoring later. My interest in
barking up that tree was just that I've been poking at that part of the
fsck API and have some follow-up work that hasn't made it onto the list
yet that makes other use of the fsck API.

So in the longer term I wanted us to think about not needing N special
cases like "print_dangling_gitmodules" if we could help it, but in the
shorter term having it is a non-issue.

1. https://lore.kernel.org/git/20210217194246.25342-1-avarab@gmail.com/

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-02-19  1:08     ` Ævar Arnfjörð Bjarmason
@ 2021-02-20  3:29       ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-02-20  3:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Jonathan Tan, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Fri, Feb 19 2021, Junio C Hamano wrote:
>
>> This seems to have been stalled but I think it would be a better
>> approach to use a custom callback for error reporting, suggested by
>> Ævar, which would be where his fsck API clean-up topic would lead
>> to.
>>
>> If it is not ultra-urgent, perhaps you can retract the ones that are
>> queued right now, work with Ævar to finish the error-callback work
>> and rebuild this topic on top of it?  Thanks.
>
> If my vote counts for something I think it makes sense to have
> Jonathan's series go first and just ignore my fsck API improvement
> patches (well, the part of my v1[1] which conflicts with his work).
>
> I'm also happy to help him queue his on top of a v1 version of my
> series.

Either would work for us, I would think.

Thanks.


* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-02-19  0:46     ` Jonathan Tan
@ 2021-02-20  3:31       ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-02-20  3:31 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Ævar Arnfjörð Bjarmason

Jonathan Tan <jonathantanmy@google.com> writes:

>> This seems to have been stalled but I think it would be a better
>> approach to use a custom callback for error reporting, suggested by
>> Ævar, which would be where his fsck API clean-up topic would lead
>> to.
>> 
>> If it is not ultra-urgent, perhaps you can retract the ones that are
>> queued right now, work with Ævar to finish the error-callback work
>> and rebuild this topic on top of it?  Thanks.
>
> OK - that works. My original idea was to rewrite it using an
> error-callback but using starts_with() instead of the ID that Ævar's
> work will provide, but seeing that at least one other contributor (Peff)
> seems OK with the patches, rebasing mine on top of his works too. I'll
> also take a look at his patches.

Thanks, either way would work for me, but if the suggested route
forces you to review Ævar's code and work together, that would be a
good bonus point ;-)



* [PATCH v2 0/4] Check .gitmodules when using packfile URIs
  2021-01-15 23:43 RFC on packfile URIs and .gitmodules check Jonathan Tan
                   ` (2 preceding siblings ...)
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
@ 2021-02-22 19:20 ` Jonathan Tan
  2021-02-22 19:20   ` [PATCH v2 1/4] http: allow custom index-pack args Jonathan Tan
                     ` (4 more replies)
  3 siblings, 5 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-02-22 19:20 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, avarab, gitster

Here's v2. I think I've addressed all the review comments, including
passing the index-pack args as separate arguments (which avoids having
to encode them somehow to get rid of spaces) and using a custom error
function instead of a specific option in fsck.

This applies on master. I mentioned earlier [1] that I was planning to
implement this on Ævar's fsck API improvements, but after looking at the
latest v2, I see that it omits patch 11 from v1 (which is the one I
need), so what I've done is to use a string check in the meantime.

[1] https://lore.kernel.org/git/20210219004612.1181920-1-jonathantanmy@google.com/
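
A minimal sketch of the "string check" approach mentioned above, written
as standalone C rather than against the real fsck API. The names
has_prefix(), default_report() and dangling_gitmodules_report() are made
up for illustration (the patch itself uses starts_with() and
fsck_error_function()), and the callback signature is simplified:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for git's starts_with(): 1 if str begins with prefix. */
int has_prefix(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Default reporter (illustrative): print the problem, count it as an error. */
int default_report(const char *oid_hex, const char *message)
{
	fprintf(stderr, "error: object %s: %s\n", oid_hex, message);
	return 1;
}

/*
 * Custom reporter: a missing .gitmodules blob is not an error here; its
 * object ID is written to "out" so that the caller can check it later.
 * Every other message falls through to the default reporter.
 */
int dangling_gitmodules_report(FILE *out, const char *oid_hex,
			       const char *message)
{
	assert(out && oid_hex && message);
	if (has_prefix(message, "gitmodulesMissing")) {
		fprintf(out, "%s\n", oid_hex);
		return 0;
	}
	return default_report(oid_hex, message);
}
```

With a callback shaped like this, a "gitmodulesMissing" message only
results in the object ID being printed, and any other message is still
treated as an error.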

Jonathan Tan (4):
  http: allow custom index-pack args
  http-fetch: allow custom index-pack args
  fetch-pack: with packfile URIs, use index-pack arg
  fetch-pack: print and use dangling .gitmodules

 Documentation/git-http-fetch.txt |  10 ++-
 Documentation/git-index-pack.txt |   7 ++-
 builtin/index-pack.c             |  25 +++++++-
 builtin/receive-pack.c           |   2 +-
 fetch-pack.c                     | 103 ++++++++++++++++++++++++++-----
 fsck.c                           |   5 ++
 fsck.h                           |   2 +
 http-fetch.c                     |  20 +++++-
 http.c                           |  15 ++---
 http.h                           |  10 +--
 pack-write.c                     |   8 ++-
 pack.h                           |   2 +-
 t/t5550-http-fetch-dumb.sh       |   5 +-
 t/t5702-protocol-v2.sh           |  58 +++++++++++++++--
 14 files changed, 227 insertions(+), 45 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  b7e376be16 http: allow custom index-pack args
1:  9fba6c9bcc ! 2:  57220ceb84 http-fetch: allow custom index-pack args
    @@ Documentation/git-http-fetch.txt: commit-id::
      
      --packfile=<hash>::
     -	Instead of a commit id on the command line (which is not expected in
    -+	For internal use only. Instead of a commit id on the command line (which is not expected in
    ++	For internal use only. Instead of a commit id on the command
    ++	line (which is not expected in
      	this case), 'git http-fetch' fetches the packfile directly at the given
      	URL and uses index-pack to generate corresponding .idx and .keep files.
      	The hash is used to determine the name of the temporary file and is
    @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      		strvec_pushf(&cmd.args, "--packfile=%.*s",
      			     (int) the_hash_algo->hexsz,
      			     packfile_uris.items[i].string);
    -+		strvec_push(&cmd.args, "--index-pack-args=index-pack --stdin --keep");
    ++		strvec_push(&cmd.args, "--index-pack-arg=index-pack");
    ++		strvec_push(&cmd.args, "--index-pack-arg=--stdin");
    ++		strvec_push(&cmd.args, "--index-pack-arg=--keep");
      		strvec_push(&cmd.args, uri);
      		cmd.git_cmd = 1;
      		cmd.no_stdin = 1;
    @@ http-fetch.c: int cmd_main(int argc, const char **argv)
      	int packfile = 0;
      	int nongit;
      	struct object_id packfile_hash;
    -+	const char *index_pack_args = NULL;
    ++	struct strvec index_pack_args = STRVEC_INIT;
      
      	setup_git_directory_gently(&nongit);
      
    @@ http-fetch.c: int cmd_main(int argc, const char **argv)
      			packfile = 1;
      			if (parse_oid_hex(p, &packfile_hash, &end) || *end)
      				die(_("argument to --packfile must be a valid hash (got '%s')"), p);
    -+		} else if (skip_prefix(argv[arg], "--index-pack-args=", &p)) {
    -+			index_pack_args = p;
    ++		} else if (skip_prefix(argv[arg], "--index-pack-arg=", &p)) {
    ++			strvec_push(&index_pack_args, p);
      		}
      		arg++;
      	}
    @@ http-fetch.c: int cmd_main(int argc, const char **argv)
      
      	if (packfile) {
     -		fetch_single_packfile(&packfile_hash, argv[arg]);
    -+		struct strvec encoded = STRVEC_INIT;
    -+		char **raw;
    -+		int i;
    -+
    -+		if (!index_pack_args)
    ++		if (!index_pack_args.nr)
     +			die(_("--packfile requires --index-pack-arg"));
     +
    -+		strvec_split(&encoded, index_pack_args);
    -+
    -+		CALLOC_ARRAY(raw, encoded.nr + 1);
    -+		for (i = 0; i < encoded.nr; i++)
    -+			raw[i] = url_percent_decode(encoded.v[i]);
    -+
     +		fetch_single_packfile(&packfile_hash, argv[arg],
    -+				      (const char **) raw);
    -+
    -+		for (i = 0; i < encoded.nr; i++)
    -+			free(raw[i]);
    -+		free(raw);
    -+		strvec_clear(&encoded);
    ++				      index_pack_args.v);
     +
      		return 0;
      	}
      
    -+	if (index_pack_args)
    ++	if (index_pack_args.nr)
     +		die(_("--index-pack-arg can only be used with --packfile"));
     +
      	if (commits_on_stdin) {
    @@ t/t5550-http-fetch-dumb.sh: test_expect_success 'http-fetch --packfile' '
      	p=$(cd "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git && ls objects/pack/pack-*.pack) &&
     -	git -C packfileclient http-fetch --packfile=$ARBITRARY "$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
     +	git -C packfileclient http-fetch --packfile=$ARBITRARY \
    -+		--index-pack-args="index-pack --stdin --keep" "$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
    ++		--index-pack-arg=index-pack --index-pack-arg=--stdin \
    ++		--index-pack-arg=--keep \
    ++		"$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
      
      	grep "^keep.[0-9a-f]\{16,\}$" out &&
      	cut -c6- out >packhash &&
2:  7c3244e79f ! 3:  aa87335464 fetch-pack: with packfile URIs, use index-pack arg
    @@ fetch-pack.c: static void write_promisor_file(const char *keep_name,
     - * Pass 1 as "only_packfile" if the pack received is the only pack in this
     - * fetch request (that is, if there were no packfile URIs provided).
     + * If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
    -+ * The string to pass as the --index-pack-args argument to http-fetch will be
    ++ * The strings to pass as the --index-pack-arg arguments to http-fetch will be
     + * stored there. (It must be freed by the caller.)
       */
      static int get_pack(struct fetch_pack_args *args,
      		    int xd[2], struct string_list *pack_lockfiles,
     -		    int only_packfile,
    -+		    char **index_pack_args,
    ++		    struct strvec *index_pack_args,
      		    struct ref **sought, int nr_sought)
      {
      	struct async demux;
    @@ fetch-pack.c: static int get_pack(struct fetch_pack_args *args,
      	}
      
     +	if (index_pack_args) {
    -+		struct strbuf joined = STRBUF_INIT;
     +		int i;
     +
    -+		for (i = 0; i < cmd.args.nr; i++) {
    -+			if (i)
    -+				strbuf_addch(&joined, ' ');
    -+			strbuf_addstr_urlencode(&joined, cmd.args.v[i],
    -+						is_rfc3986_unreserved);
    -+		}
    -+		*index_pack_args = strbuf_detach(&joined, NULL);
    ++		for (i = 0; i < cmd.args.nr; i++)
    ++			strvec_push(index_pack_args, cmd.args.v[i]);
     +	}
     +
      	cmd.in = demux.out;
    @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      	int seen_ack = 0;
      	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
      	int i;
    -+	char *index_pack_args = NULL;
    ++	struct strvec index_pack_args = STRVEC_INIT;
      
      	negotiator = &negotiator_alloc;
      	fetch_negotiator_init(r, negotiator);
    @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      				die(_("git fetch-pack: fetch failed."));
      			do_check_stateless_delimiter(args, &reader);
      
    +@@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
    + 	}
    + 
    + 	for (i = 0; i < packfile_uris.nr; i++) {
    ++		int j;
    + 		struct child_process cmd = CHILD_PROCESS_INIT;
    + 		char packname[GIT_MAX_HEXSZ + 1];
    + 		const char *uri = packfile_uris.items[i].string +
     @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      		strvec_pushf(&cmd.args, "--packfile=%.*s",
      			     (int) the_hash_algo->hexsz,
      			     packfile_uris.items[i].string);
    --		strvec_push(&cmd.args, "--index-pack-args=index-pack --stdin --keep");
    -+		strvec_pushf(&cmd.args, "--index-pack-args=%s", index_pack_args);
    +-		strvec_push(&cmd.args, "--index-pack-arg=index-pack");
    +-		strvec_push(&cmd.args, "--index-pack-arg=--stdin");
    +-		strvec_push(&cmd.args, "--index-pack-arg=--keep");
    ++		for (j = 0; j < index_pack_args.nr; j++)
    ++			strvec_pushf(&cmd.args, "--index-pack-arg=%s",
    ++				     index_pack_args.v[j]);
      		strvec_push(&cmd.args, uri);
      		cmd.git_cmd = 1;
      		cmd.no_stdin = 1;
    @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      						 packname));
      	}
      	string_list_clear(&packfile_uris, 0);
    -+	FREE_AND_NULL(index_pack_args);
    ++	strvec_clear(&index_pack_args);
      
      	if (negotiator)
      		negotiator->release(negotiator);
3:  384c9d1c73 ! 4:  e8b18d02e6 fetch-pack: print and use dangling .gitmodules
    @@ Documentation/git-index-pack.txt: OPTIONS
      	Specifies the number of threads to spawn when resolving
     
      ## builtin/index-pack.c ##
    +@@ builtin/index-pack.c: static void show_pack_info(int stat_only)
    + 	}
    + }
    + 
    ++static int print_dangling_gitmodules(struct fsck_options *o,
    ++				     const struct object_id *oid,
    ++				     enum object_type object_type,
    ++				     int msg_type, const char *message)
    ++{
    ++	/*
    ++	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
    ++	 * instead of relying on this string check.
    ++	 */
    ++	if (starts_with(message, "gitmodulesMissing")) {
    ++		printf("%s\n", oid_to_hex(oid));
    ++		return 0;
    ++	}
    ++	return fsck_error_function(o, oid, object_type, msg_type, message);
    ++}
    ++
    + int cmd_index_pack(int argc, const char **argv, const char *prefix)
    + {
    + 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0;
     @@ builtin/index-pack.c: int cmd_index_pack(int argc, const char **argv, const char *prefix)
      	else
      		close(input_fd);
    @@ builtin/index-pack.c: int cmd_index_pack(int argc, const char **argv, const char
     -	if (do_fsck_object && fsck_finish(&fsck_options))
     -		die(_("fsck error in pack objects"));
     +	if (do_fsck_object) {
    -+		struct fsck_options fo = FSCK_OPTIONS_STRICT;
    ++		struct fsck_options fo = fsck_options;
     +
    -+		fo.print_dangling_gitmodules = 1;
    ++		fo.error_func = print_dangling_gitmodules;
     +		if (fsck_finish(&fo))
     +			die(_("fsck error in pack objects"));
     +	}
    @@ fetch-pack.c: static void write_promisor_file(const char *keep_name,
     +
      /*
       * If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
    -  * The string to pass as the --index-pack-args argument to http-fetch will be
    +  * The strings to pass as the --index-pack-arg arguments to http-fetch will be
     @@ fetch-pack.c: static void write_promisor_file(const char *keep_name,
      static int get_pack(struct fetch_pack_args *args,
      		    int xd[2], struct string_list *pack_lockfiles,
    - 		    char **index_pack_args,
    + 		    struct strvec *index_pack_args,
     -		    struct ref **sought, int nr_sought)
     +		    struct ref **sought, int nr_sought,
     +		    struct oidset *gitmodules_oids)
    @@ fetch-pack.c: static struct ref *do_fetch_pack(struct fetch_pack_args *args,
     @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
      	int i;
    - 	char *index_pack_args = NULL;
    + 	struct strvec index_pack_args = STRVEC_INIT;
     +	struct oidset gitmodules_oids = OIDSET_INIT;
      
      	negotiator = &negotiator_alloc;
    @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      		if (finish_command(&cmd))
     @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      	string_list_clear(&packfile_uris, 0);
    - 	FREE_AND_NULL(index_pack_args);
    + 	strvec_clear(&index_pack_args);
      
     +	fsck_gitmodules_oids(&gitmodules_oids);
     +
    @@ fsck.c: int fsck_error_function(struct fsck_options *o,
      int fsck_finish(struct fsck_options *options)
      {
      	int ret = 0;
    -@@ fsck.c: int fsck_finish(struct fsck_options *options)
    - 		if (!buf) {
    - 			if (is_promisor_object(oid))
    - 				continue;
    --			ret |= report(options,
    --				      oid, OBJ_BLOB,
    --				      FSCK_MSG_GITMODULES_MISSING,
    --				      "unable to read .gitmodules blob");
    -+			if (options->print_dangling_gitmodules)
    -+				printf("%s\n", oid_to_hex(oid));
    -+			else
    -+				ret |= report(options,
    -+					      oid, OBJ_BLOB,
    -+					      FSCK_MSG_GITMODULES_MISSING,
    -+					      "unable to read .gitmodules blob");
    - 			continue;
    - 		}
    - 
     
      ## fsck.h ##
    -@@ fsck.h: struct fsck_options {
    - 	int *msg_type;
    - 	struct oidset skiplist;
    - 	kh_oid_map_t *object_names;
    -+
    -+	/*
    -+	 * If 1, print the hashes of missing .gitmodules blobs instead of
    -+	 * considering them to be errors.
    -+	 */
    -+	unsigned print_dangling_gitmodules:1;
    - };
    - 
    - #define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
     @@ fsck.h: int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
      int fsck_object(struct object *obj, void *data, unsigned long size,
      	struct fsck_options *options);
    @@ pack.h: int verify_pack_index(struct packed_git *);
       * The "hdr" output buffer should be at least this big, which will handle sizes
     
      ## t/t5702-protocol-v2.sh ##
    +@@ t/t5702-protocol-v2.sh: test_expect_success 'part of packfile response provided as URI' '
    + 	test -f hfound &&
    + 	test -f h2found &&
    + 
    +-	# Ensure that there are exactly 6 files (3 .pack and 3 .idx).
    +-	ls http_child/.git/objects/pack/* >filelist &&
    ++	# Ensure that there are exactly 3 packfiles with associated .idx
    ++	ls http_child/.git/objects/pack/*.pack \
    ++	    http_child/.git/objects/pack/*.idx >filelist &&
    + 	test_line_count = 6 filelist
    + '
    + 
    +@@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobjects' '
    + 		-c fetch.uriprotocols=http,https \
    + 		clone "$HTTPD_URL/smart/http_parent" http_child &&
    + 
    +-	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
    +-	ls http_child/.git/objects/pack/* >filelist &&
    ++	# Ensure that there are exactly 2 packfiles with associated .idx
    ++	ls http_child/.git/objects/pack/*.pack \
    ++	    http_child/.git/objects/pack/*.idx >filelist &&
    + 	test_line_count = 4 filelist
    + '
    + 
     @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobjects fails on bad object'
      	test_i18ngrep "invalid author/committer line - missing email" error
      '
    @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobje
     +		-c fetch.uriprotocols=http,https \
     +		clone "$HTTPD_URL/smart/http_parent" http_child &&
     +
    -+	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
    -+	ls http_child/.git/objects/pack/* >filelist &&
    ++	# Ensure that there are exactly 2 packfiles with associated .idx
    ++	ls http_child/.git/objects/pack/*.pack \
    ++	    http_child/.git/objects/pack/*.idx >filelist &&
     +	test_line_count = 4 filelist
     +'
     +
4:  da0d7b38ae < -:  ---------- SQUASH??? test fix
-- 
2.30.0.617.g56c4b15f3c-goog



* [PATCH v2 1/4] http: allow custom index-pack args
  2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
@ 2021-02-22 19:20   ` Jonathan Tan
  2021-02-22 19:20   ` [PATCH v2 2/4] http-fetch: " Jonathan Tan
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-02-22 19:20 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, avarab, gitster

Currently, when fetching, packfiles referenced by URIs are run through
index-pack without any arguments other than --stdin and --keep, no
matter what arguments are used for the packfile that is inline in the
fetch response. As a preparation for ensuring that all packs (whether
inline or not) use the same index-pack arguments, teach the http
subsystem to allow custom index-pack arguments.

http-fetch has been updated to use the new API. For now, it passes
--keep alone instead of --keep with a process ID, but this is only
temporary because http-fetch itself will be taught to accept index-pack
parameters (instead of using a hardcoded constant) in a subsequent
commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
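A minimal sketch of the fallback described in the commit message, as
standalone C. struct pack_request_sketch is a hypothetical, cut-down
stand-in for struct http_pack_request, and effective_index_pack_args()
and args_contain() are illustrative helpers that do not exist in git:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same default as in the patch: used when the caller supplies nothing. */
static const char *default_index_pack_args[] =
	{ "index-pack", "--stdin", NULL };

/* Hypothetical, cut-down stand-in for struct http_pack_request. */
struct pack_request_sketch {
	const char **index_pack_args;	/* NULL-terminated, or NULL */
	unsigned preserve_index_pack_stdout : 1;
};

/* Pick the command line to run: the caller's if given, else the default. */
const char **effective_index_pack_args(const struct pack_request_sketch *preq)
{
	assert(preq);
	return preq->index_pack_args ? preq->index_pack_args
				     : default_index_pack_args;
}

/* Return 1 if the NULL-terminated argv contains arg, 0 otherwise. */
int args_contain(const char **argv, const char *arg)
{
	for (; *argv; argv++)
		if (!strcmp(*argv, arg))
			return 1;
	return 0;
}
```

A request that leaves index_pack_args as NULL gets the two-element
default, while a caller such as http-fetch can supply its own array
that also contains "--keep".
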
 http-fetch.c |  6 +++++-
 http.c       | 15 ++++++++-------
 http.h       | 10 +++++-----
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/http-fetch.c b/http-fetch.c
index c4ccc5fea9..2d1d9d054f 100644
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -43,6 +43,9 @@ static int fetch_using_walker(const char *raw_url, int get_verbosely,
 	return rc;
 }
 
+static const char *index_pack_args[] =
+	{"index-pack", "--stdin", "--keep", NULL};
+
 static void fetch_single_packfile(struct object_id *packfile_hash,
 				  const char *url) {
 	struct http_pack_request *preq;
@@ -55,7 +58,8 @@ static void fetch_single_packfile(struct object_id *packfile_hash,
 	if (preq == NULL)
 		die("couldn't create http pack request");
 	preq->slot->results = &results;
-	preq->generate_keep = 1;
+	preq->index_pack_args = index_pack_args;
+	preq->preserve_index_pack_stdout = 1;
 
 	if (start_active_slot(preq->slot)) {
 		run_active_slot(preq->slot);
diff --git a/http.c b/http.c
index 8b23a546af..f8ea28bb2e 100644
--- a/http.c
+++ b/http.c
@@ -2259,6 +2259,9 @@ void release_http_pack_request(struct http_pack_request *preq)
 	free(preq);
 }
 
+static const char *default_index_pack_args[] =
+	{"index-pack", "--stdin", NULL};
+
 int finish_http_pack_request(struct http_pack_request *preq)
 {
 	struct child_process ip = CHILD_PROCESS_INIT;
@@ -2270,17 +2273,15 @@ int finish_http_pack_request(struct http_pack_request *preq)
 
 	tmpfile_fd = xopen(preq->tmpfile.buf, O_RDONLY);
 
-	strvec_push(&ip.args, "index-pack");
-	strvec_push(&ip.args, "--stdin");
 	ip.git_cmd = 1;
 	ip.in = tmpfile_fd;
-	if (preq->generate_keep) {
-		strvec_pushf(&ip.args, "--keep=git %"PRIuMAX,
-			     (uintmax_t)getpid());
+	ip.argv = preq->index_pack_args ? preq->index_pack_args
+					: default_index_pack_args;
+
+	if (preq->preserve_index_pack_stdout)
 		ip.out = 0;
-	} else {
+	else
 		ip.no_stdout = 1;
-	}
 
 	if (run_command(&ip)) {
 		ret = -1;
diff --git a/http.h b/http.h
index 5de792ef3f..bf3d1270ad 100644
--- a/http.h
+++ b/http.h
@@ -218,12 +218,12 @@ struct http_pack_request {
 	char *url;
 
 	/*
-	 * If this is true, finish_http_pack_request() will pass "--keep" to
-	 * index-pack, resulting in the creation of a keep file, and will not
-	 * suppress its stdout (that is, the "keep\t<hash>\n" line will be
-	 * printed to stdout).
+	 * index-pack command to run. Must be terminated by NULL.
+	 *
+	 * If NULL, defaults to	{"index-pack", "--stdin", NULL}.
 	 */
-	unsigned generate_keep : 1;
+	const char **index_pack_args;
+	unsigned preserve_index_pack_stdout : 1;
 
 	FILE *packfile;
 	struct strbuf tmpfile;
-- 
2.30.0.617.g56c4b15f3c-goog



* [PATCH v2 2/4] http-fetch: allow custom index-pack args
  2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
  2021-02-22 19:20   ` [PATCH v2 1/4] http: allow custom index-pack args Jonathan Tan
@ 2021-02-22 19:20   ` Jonathan Tan
  2021-02-23 13:17     ` Ævar Arnfjörð Bjarmason
  2021-03-05  0:19     ` Jonathan Nieder
  2021-02-22 19:20   ` [PATCH v2 3/4] fetch-pack: with packfile URIs, use index-pack arg Jonathan Tan
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-02-22 19:20 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, avarab, gitster

This is the next step in teaching fetch-pack to pass its index-pack
arguments when processing packfiles referenced by URIs.

The "--keep" in fetch-pack.c will be replaced with a full message in a
subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
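A minimal sketch of how repeated --index-pack-arg=<arg> options can be
collected one by one, so that an argument containing spaces needs no
encoding. This is standalone C under simplifying assumptions:
skip_prefix_sketch() is a local stand-in for git's skip_prefix(), and
collect_index_pack_args() with its fixed-size output array is
hypothetical (the patch uses a strvec instead):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INDEX_PACK_ARGS 16

/* Stand-in for git's skip_prefix(): on a match, *out points past prefix. */
int skip_prefix_sketch(const char *str, const char *prefix, const char **out)
{
	size_t n = strlen(prefix);

	if (strncmp(str, prefix, n))
		return 0;
	*out = str + n;
	return 1;
}

/*
 * Collect the value of every "--index-pack-arg=<arg>" in argv, in order.
 * "out" must have room for MAX_INDEX_PACK_ARGS + 1 pointers and is
 * NULL-terminated on return. Returns the number of values collected.
 */
size_t collect_index_pack_args(int argc, const char **argv, const char **out)
{
	size_t nr = 0;
	int i;

	for (i = 0; i < argc; i++) {
		const char *p;

		if (skip_prefix_sketch(argv[i], "--index-pack-arg=", &p)) {
			assert(nr < MAX_INDEX_PACK_ARGS); /* sketch only */
			out[nr++] = p;
		}
	}
	out[nr] = NULL;
	return nr;
}
```

Because each option carries exactly one argument, a value such as
"--keep=fetch-pack 123 on host" arrives intact as a single element.
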
 Documentation/git-http-fetch.txt | 10 ++++++++--
 fetch-pack.c                     |  3 +++
 http-fetch.c                     | 20 +++++++++++++++-----
 t/t5550-http-fetch-dumb.sh       |  5 ++++-
 4 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-http-fetch.txt b/Documentation/git-http-fetch.txt
index 4deb4893f5..9fa17b60e4 100644
--- a/Documentation/git-http-fetch.txt
+++ b/Documentation/git-http-fetch.txt
@@ -41,11 +41,17 @@ commit-id::
 		<commit-id>['\t'<filename-as-in--w>]
 
 --packfile=<hash>::
-	Instead of a commit id on the command line (which is not expected in
+	For internal use only. Instead of a commit id on the command
+	line (which is not expected in
 	this case), 'git http-fetch' fetches the packfile directly at the given
 	URL and uses index-pack to generate corresponding .idx and .keep files.
 	The hash is used to determine the name of the temporary file and is
-	arbitrary. The output of index-pack is printed to stdout.
+	arbitrary. The output of index-pack is printed to stdout. Requires
+	at least one --index-pack-arg.
+
+--index-pack-arg=<arg>::
+	For internal use only. One argument of the command to run on the
+	contents of the downloaded pack. May be given multiple times.
 
 --recover::
 	Verify that everything reachable from target is fetched.  Used after
diff --git a/fetch-pack.c b/fetch-pack.c
index 876f90c759..aeac010b0b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1645,6 +1645,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 		strvec_pushf(&cmd.args, "--packfile=%.*s",
 			     (int) the_hash_algo->hexsz,
 			     packfile_uris.items[i].string);
+		strvec_push(&cmd.args, "--index-pack-arg=index-pack");
+		strvec_push(&cmd.args, "--index-pack-arg=--stdin");
+		strvec_push(&cmd.args, "--index-pack-arg=--keep");
 		strvec_push(&cmd.args, uri);
 		cmd.git_cmd = 1;
 		cmd.no_stdin = 1;
diff --git a/http-fetch.c b/http-fetch.c
index 2d1d9d054f..fa642462a9 100644
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -3,6 +3,7 @@
 #include "exec-cmd.h"
 #include "http.h"
 #include "walker.h"
+#include "strvec.h"
 
 static const char http_fetch_usage[] = "git http-fetch "
 "[-c] [-t] [-a] [-v] [--recover] [-w ref] [--stdin | --packfile=hash | commit-id] url";
@@ -43,11 +44,9 @@ static int fetch_using_walker(const char *raw_url, int get_verbosely,
 	return rc;
 }
 
-static const char *index_pack_args[] =
-	{"index-pack", "--stdin", "--keep", NULL};
-
 static void fetch_single_packfile(struct object_id *packfile_hash,
-				  const char *url) {
+				  const char *url,
+				  const char **index_pack_args) {
 	struct http_pack_request *preq;
 	struct slot_results results;
 	int ret;
@@ -90,6 +89,7 @@ int cmd_main(int argc, const char **argv)
 	int packfile = 0;
 	int nongit;
 	struct object_id packfile_hash;
+	struct strvec index_pack_args = STRVEC_INIT;
 
 	setup_git_directory_gently(&nongit);
 
@@ -116,6 +116,8 @@ int cmd_main(int argc, const char **argv)
 			packfile = 1;
 			if (parse_oid_hex(p, &packfile_hash, &end) || *end)
 				die(_("argument to --packfile must be a valid hash (got '%s')"), p);
+		} else if (skip_prefix(argv[arg], "--index-pack-arg=", &p)) {
+			strvec_push(&index_pack_args, p);
 		}
 		arg++;
 	}
@@ -128,10 +130,18 @@ int cmd_main(int argc, const char **argv)
 	git_config(git_default_config, NULL);
 
 	if (packfile) {
-		fetch_single_packfile(&packfile_hash, argv[arg]);
+		if (!index_pack_args.nr)
+			die(_("--packfile requires --index-pack-arg"));
+
+		fetch_single_packfile(&packfile_hash, argv[arg],
+				      index_pack_args.v);
+
 		return 0;
 	}
 
+	if (index_pack_args.nr)
+		die(_("--index-pack-arg can only be used with --packfile"));
+
 	if (commits_on_stdin) {
 		commits = walker_targets_stdin(&commit_id, &write_ref);
 	} else {
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 483578b2d7..358b322e05 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -224,7 +224,10 @@ test_expect_success 'http-fetch --packfile' '
 
 	git init packfileclient &&
 	p=$(cd "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git && ls objects/pack/pack-*.pack) &&
-	git -C packfileclient http-fetch --packfile=$ARBITRARY "$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
+	git -C packfileclient http-fetch --packfile=$ARBITRARY \
+		--index-pack-arg=index-pack --index-pack-arg=--stdin \
+		--index-pack-arg=--keep \
+		"$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
 
 	grep "^keep.[0-9a-f]\{16,\}$" out &&
 	cut -c6- out >packhash &&
-- 
2.30.0.617.g56c4b15f3c-goog



* [PATCH v2 3/4] fetch-pack: with packfile URIs, use index-pack arg
  2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
  2021-02-22 19:20   ` [PATCH v2 1/4] http: allow custom index-pack args Jonathan Tan
  2021-02-22 19:20   ` [PATCH v2 2/4] http-fetch: " Jonathan Tan
@ 2021-02-22 19:20   ` Jonathan Tan
  2021-02-22 19:20   ` [PATCH v2 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
  2021-02-22 20:12   ` [PATCH v2 0/4] Check .gitmodules when using packfile URIs Junio C Hamano
  4 siblings, 0 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-02-22 19:20 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, avarab, gitster

Unify the index-pack arguments used when processing the inline pack and
when downloading packfiles referenced by URIs. This is done by teaching
get_pack() to also store the index-pack arguments whenever at least one
packfile URI is given, and then when processing the packfile URI(s),
using the stored arguments.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
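A minimal sketch of the forwarding step described above: each stored
index-pack argument is turned into one "--index-pack-arg=<arg>" option
for http-fetch. This is standalone C using malloc() rather than git's
strvec, and wrap_index_pack_args() and free_wrapped_args() are
hypothetical names:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated array returned by wrap_index_pack_args(). */
void free_wrapped_args(char **wrapped)
{
	size_t i;

	if (!wrapped)
		return;
	for (i = 0; wrapped[i]; i++)
		free(wrapped[i]);
	free(wrapped);
}

/*
 * Return a newly allocated, NULL-terminated array in which element i is
 * "--index-pack-arg=" followed by args[i], or NULL if allocation fails.
 */
char **wrap_index_pack_args(const char **args, size_t nr)
{
	static const char prefix[] = "--index-pack-arg=";
	char **out;
	size_t i;

	assert(args || !nr);
	out = calloc(nr + 1, sizeof(*out));
	if (!out)
		return NULL;
	for (i = 0; i < nr; i++) {
		size_t len = strlen(prefix) + strlen(args[i]) + 1;

		out[i] = malloc(len);
		if (!out[i]) {
			free_wrapped_args(out);
			return NULL;
		}
		snprintf(out[i], len, "%s%s", prefix, args[i]);
	}
	return out;
}
```

The inline pack and every pack referenced by a URI then go through
index-pack with the same argument list, which is the point of this
patch.
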
 fetch-pack.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index aeac010b0b..dd0a6c4b34 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -797,12 +797,13 @@ static void write_promisor_file(const char *keep_name,
 }
 
 /*
- * Pass 1 as "only_packfile" if the pack received is the only pack in this
- * fetch request (that is, if there were no packfile URIs provided).
+ * If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
+ * The strings to pass as the --index-pack-arg arguments to http-fetch will be
+ * stored there. (It must be freed by the caller.)
  */
 static int get_pack(struct fetch_pack_args *args,
 		    int xd[2], struct string_list *pack_lockfiles,
-		    int only_packfile,
+		    struct strvec *index_pack_args,
 		    struct ref **sought, int nr_sought)
 {
 	struct async demux;
@@ -845,7 +846,7 @@ static int get_pack(struct fetch_pack_args *args,
 		strvec_push(&cmd.args, alternate_shallow_file);
 	}
 
-	if (do_keep || args->from_promisor) {
+	if (do_keep || args->from_promisor || index_pack_args) {
 		if (pack_lockfiles)
 			cmd.out = -1;
 		cmd_name = "index-pack";
@@ -863,7 +864,7 @@ static int get_pack(struct fetch_pack_args *args,
 				     "--keep=fetch-pack %"PRIuMAX " on %s",
 				     (uintmax_t)getpid(), hostname);
 		}
-		if (only_packfile && args->check_self_contained_and_connected)
+		if (!index_pack_args && args->check_self_contained_and_connected)
 			strvec_push(&cmd.args, "--check-self-contained-and-connected");
 		else
 			/*
@@ -901,7 +902,7 @@ static int get_pack(struct fetch_pack_args *args,
 	    : transfer_fsck_objects >= 0
 	    ? transfer_fsck_objects
 	    : 0) {
-		if (args->from_promisor || !only_packfile)
+		if (args->from_promisor || index_pack_args)
 			/*
 			 * We cannot use --strict in index-pack because it
 			 * checks both broken objects and links, but we only
@@ -913,6 +914,13 @@ static int get_pack(struct fetch_pack_args *args,
 				     fsck_msg_types.buf);
 	}
 
+	if (index_pack_args) {
+		int i;
+
+		for (i = 0; i < cmd.args.nr; i++)
+			strvec_push(index_pack_args, cmd.args.v[i]);
+	}
+
 	cmd.in = demux.out;
 	cmd.git_cmd = 1;
 	if (start_command(&cmd))
@@ -1084,7 +1092,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		alternate_shallow_file = setup_temporary_shallow(si->shallow);
 	else
 		alternate_shallow_file = NULL;
-	if (get_pack(args, fd, pack_lockfiles, 1, sought, nr_sought))
+	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought))
 		die(_("git fetch-pack: fetch failed."));
 
  all_done:
@@ -1535,6 +1543,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	int seen_ack = 0;
 	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
 	int i;
+	struct strvec index_pack_args = STRVEC_INIT;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
@@ -1624,7 +1633,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 				receive_packfile_uris(&reader, &packfile_uris);
 			process_section_header(&reader, "packfile", 0);
 			if (get_pack(args, fd, pack_lockfiles,
-				     !packfile_uris.nr, sought, nr_sought))
+				     packfile_uris.nr ? &index_pack_args : NULL,
+				     sought, nr_sought))
 				die(_("git fetch-pack: fetch failed."));
 			do_check_stateless_delimiter(args, &reader);
 
@@ -1636,6 +1646,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	}
 
 	for (i = 0; i < packfile_uris.nr; i++) {
+		int j;
 		struct child_process cmd = CHILD_PROCESS_INIT;
 		char packname[GIT_MAX_HEXSZ + 1];
 		const char *uri = packfile_uris.items[i].string +
@@ -1645,9 +1656,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 		strvec_pushf(&cmd.args, "--packfile=%.*s",
 			     (int) the_hash_algo->hexsz,
 			     packfile_uris.items[i].string);
-		strvec_push(&cmd.args, "--index-pack-arg=index-pack");
-		strvec_push(&cmd.args, "--index-pack-arg=--stdin");
-		strvec_push(&cmd.args, "--index-pack-arg=--keep");
+		for (j = 0; j < index_pack_args.nr; j++)
+			strvec_pushf(&cmd.args, "--index-pack-arg=%s",
+				     index_pack_args.v[j]);
 		strvec_push(&cmd.args, uri);
 		cmd.git_cmd = 1;
 		cmd.no_stdin = 1;
@@ -1683,6 +1694,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 						 packname));
 	}
 	string_list_clear(&packfile_uris, 0);
+	strvec_clear(&index_pack_args);
 
 	if (negotiator)
 		negotiator->release(negotiator);
-- 
2.30.0.617.g56c4b15f3c-goog



* [PATCH v2 4/4] fetch-pack: print and use dangling .gitmodules
  2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
                     ` (2 preceding siblings ...)
  2021-02-22 19:20   ` [PATCH v2 3/4] fetch-pack: with packfile URIs, use index-pack arg Jonathan Tan
@ 2021-02-22 19:20   ` Jonathan Tan
  2021-02-22 20:12   ` [PATCH v2 0/4] Check .gitmodules when using packfile URIs Junio C Hamano
  4 siblings, 0 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-02-22 19:20 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, avarab, gitster

Teach index-pack to print dangling .gitmodules links after its "keep" or
"pack" line instead of declaring an error, and teach fetch-pack to check
such lines printed.

This allows the tree side of the .gitmodules link to be in one packfile
and the blob side to be in another without failing the fsck check,
because it is now fetch-pack which checks such objects after all
packfiles have been downloaded and indexed (and not index-pack on an
individual packfile, as it is before this commit).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-index-pack.txt |  7 ++-
 builtin/index-pack.c             | 25 +++++++++-
 builtin/receive-pack.c           |  2 +-
 fetch-pack.c                     | 78 +++++++++++++++++++++++++++-----
 fsck.c                           |  5 ++
 fsck.h                           |  2 +
 pack-write.c                     |  8 +++-
 pack.h                           |  2 +-
 t/t5702-protocol-v2.sh           | 58 ++++++++++++++++++++++--
 9 files changed, 165 insertions(+), 22 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index af0c26232c..e74a4a1eda 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -78,7 +78,12 @@ OPTIONS
 	Die if the pack contains broken links. For internal use only.
 
 --fsck-objects::
-	Die if the pack contains broken objects. For internal use only.
+	For internal use only.
++
+Die if the pack contains broken objects. If the pack contains a tree
+pointing to a .gitmodules blob that does not exist, prints the hash of
+that blob (for the caller to check) after the hash that goes into the
+name of the pack/idx file (see "Notes").
 
 --threads=<n>::
 	Specifies the number of threads to spawn when resolving
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 557bd2f348..0444febeee 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1693,6 +1693,22 @@ static void show_pack_info(int stat_only)
 	}
 }
 
+static int print_dangling_gitmodules(struct fsck_options *o,
+				     const struct object_id *oid,
+				     enum object_type object_type,
+				     int msg_type, const char *message)
+{
+	/*
+	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
+	 * instead of relying on this string check.
+	 */
+	if (starts_with(message, "gitmodulesMissing")) {
+		printf("%s\n", oid_to_hex(oid));
+		return 0;
+	}
+	return fsck_error_function(o, oid, object_type, msg_type, message);
+}
+
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0;
@@ -1888,8 +1904,13 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	else
 		close(input_fd);
 
-	if (do_fsck_object && fsck_finish(&fsck_options))
-		die(_("fsck error in pack objects"));
+	if (do_fsck_object) {
+		struct fsck_options fo = fsck_options;
+
+		fo.error_func = print_dangling_gitmodules;
+		if (fsck_finish(&fo))
+			die(_("fsck error in pack objects"));
+	}
 
 	free(objects);
 	strbuf_release(&index_name_buf);
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index d49d050e6e..ed2c9b42e9 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2275,7 +2275,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		status = start_command(&child);
 		if (status)
 			return "index-pack fork failed";
-		pack_lockfile = index_pack_lockfile(child.out);
+		pack_lockfile = index_pack_lockfile(child.out, NULL);
 		close(child.out);
 		status = finish_command(&child);
 		if (status)
diff --git a/fetch-pack.c b/fetch-pack.c
index dd0a6c4b34..f9def5ac74 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -796,6 +796,26 @@ static void write_promisor_file(const char *keep_name,
 	strbuf_release(&promisor_name);
 }
 
+static void parse_gitmodules_oids(int fd, struct oidset *gitmodules_oids)
+{
+	int len = the_hash_algo->hexsz + 1; /* hash + NL */
+
+	do {
+		char hex_hash[GIT_MAX_HEXSZ + 1];
+		int read_len = read_in_full(fd, hex_hash, len);
+		struct object_id oid;
+		const char *end;
+
+		if (!read_len)
+			return;
+		if (read_len != len)
+			die("invalid length read %d", read_len);
+		if (parse_oid_hex(hex_hash, &oid, &end) || *end != '\n')
+			die("invalid hash");
+		oidset_insert(gitmodules_oids, &oid);
+	} while (1);
+}
+
 /*
  * If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
  * The strings to pass as the --index-pack-arg arguments to http-fetch will be
@@ -804,7 +824,8 @@ static void write_promisor_file(const char *keep_name,
 static int get_pack(struct fetch_pack_args *args,
 		    int xd[2], struct string_list *pack_lockfiles,
 		    struct strvec *index_pack_args,
-		    struct ref **sought, int nr_sought)
+		    struct ref **sought, int nr_sought,
+		    struct oidset *gitmodules_oids)
 {
 	struct async demux;
 	int do_keep = args->keep_pack;
@@ -812,6 +833,7 @@ static int get_pack(struct fetch_pack_args *args,
 	struct pack_header header;
 	int pass_header = 0;
 	struct child_process cmd = CHILD_PROCESS_INIT;
+	int fsck_objects = 0;
 	int ret;
 
 	memset(&demux, 0, sizeof(demux));
@@ -846,8 +868,15 @@ static int get_pack(struct fetch_pack_args *args,
 		strvec_push(&cmd.args, alternate_shallow_file);
 	}
 
-	if (do_keep || args->from_promisor || index_pack_args) {
-		if (pack_lockfiles)
+	if (fetch_fsck_objects >= 0
+	    ? fetch_fsck_objects
+	    : transfer_fsck_objects >= 0
+	    ? transfer_fsck_objects
+	    : 0)
+		fsck_objects = 1;
+
+	if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
+		if (pack_lockfiles || fsck_objects)
 			cmd.out = -1;
 		cmd_name = "index-pack";
 		strvec_push(&cmd.args, cmd_name);
@@ -897,11 +926,7 @@ static int get_pack(struct fetch_pack_args *args,
 		strvec_pushf(&cmd.args, "--pack_header=%"PRIu32",%"PRIu32,
 			     ntohl(header.hdr_version),
 				 ntohl(header.hdr_entries));
-	if (fetch_fsck_objects >= 0
-	    ? fetch_fsck_objects
-	    : transfer_fsck_objects >= 0
-	    ? transfer_fsck_objects
-	    : 0) {
+	if (fsck_objects) {
 		if (args->from_promisor || index_pack_args)
 			/*
 			 * We cannot use --strict in index-pack because it
@@ -925,10 +950,15 @@ static int get_pack(struct fetch_pack_args *args,
 	cmd.git_cmd = 1;
 	if (start_command(&cmd))
 		die(_("fetch-pack: unable to fork off %s"), cmd_name);
-	if (do_keep && pack_lockfiles) {
-		char *pack_lockfile = index_pack_lockfile(cmd.out);
+	if (do_keep && (pack_lockfiles || fsck_objects)) {
+		int is_well_formed;
+		char *pack_lockfile = index_pack_lockfile(cmd.out, &is_well_formed);
+
+		if (!is_well_formed)
+			die(_("fetch-pack: invalid index-pack output"));
 		if (pack_lockfile)
 			string_list_append_nodup(pack_lockfiles, pack_lockfile);
+		parse_gitmodules_oids(cmd.out, gitmodules_oids);
 		close(cmd.out);
 	}
 
@@ -963,6 +993,22 @@ static int cmp_ref_by_name(const void *a_, const void *b_)
 	return strcmp(a->name, b->name);
 }
 
+static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
+{
+	struct oidset_iter iter;
+	const struct object_id *oid;
+	struct fsck_options fo = FSCK_OPTIONS_STRICT;
+
+	if (!oidset_size(gitmodules_oids))
+		return;
+
+	oidset_iter_init(gitmodules_oids, &iter);
+	while ((oid = oidset_iter_next(&iter)))
+		register_found_gitmodules(oid);
+	if (fsck_finish(&fo))
+		die("fsck failed");
+}
+
 static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 				 int fd[2],
 				 const struct ref *orig_ref,
@@ -977,6 +1023,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	int agent_len;
 	struct fetch_negotiator negotiator_alloc;
 	struct fetch_negotiator *negotiator;
+	struct oidset gitmodules_oids = OIDSET_INIT;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
@@ -1092,8 +1139,10 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		alternate_shallow_file = setup_temporary_shallow(si->shallow);
 	else
 		alternate_shallow_file = NULL;
-	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought))
+	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought,
+		     &gitmodules_oids))
 		die(_("git fetch-pack: fetch failed."));
+	fsck_gitmodules_oids(&gitmodules_oids);
 
  all_done:
 	if (negotiator)
@@ -1544,6 +1593,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
 	int i;
 	struct strvec index_pack_args = STRVEC_INIT;
+	struct oidset gitmodules_oids = OIDSET_INIT;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
@@ -1634,7 +1684,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 			process_section_header(&reader, "packfile", 0);
 			if (get_pack(args, fd, pack_lockfiles,
 				     packfile_uris.nr ? &index_pack_args : NULL,
-				     sought, nr_sought))
+				     sought, nr_sought, &gitmodules_oids))
 				die(_("git fetch-pack: fetch failed."));
 			do_check_stateless_delimiter(args, &reader);
 
@@ -1677,6 +1727,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 
 		packname[the_hash_algo->hexsz] = '\0';
 
+		parse_gitmodules_oids(cmd.out, &gitmodules_oids);
+
 		close(cmd.out);
 
 		if (finish_command(&cmd))
@@ -1696,6 +1748,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	string_list_clear(&packfile_uris, 0);
 	strvec_clear(&index_pack_args);
 
+	fsck_gitmodules_oids(&gitmodules_oids);
+
 	if (negotiator)
 		negotiator->release(negotiator);
 
diff --git a/fsck.c b/fsck.c
index f82e2fe9e3..49ef6569e8 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1243,6 +1243,11 @@ int fsck_error_function(struct fsck_options *o,
 	return 1;
 }
 
+void register_found_gitmodules(const struct object_id *oid)
+{
+	oidset_insert(&gitmodules_found, oid);
+}
+
 int fsck_finish(struct fsck_options *options)
 {
 	int ret = 0;
diff --git a/fsck.h b/fsck.h
index 69cf715e79..d75b723bd5 100644
--- a/fsck.h
+++ b/fsck.h
@@ -62,6 +62,8 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
+void register_found_gitmodules(const struct object_id *oid);
+
 /*
  * Some fsck checks are context-dependent, and may end up queued; run this
  * after completing all fsck_object() calls in order to resolve any remaining
diff --git a/pack-write.c b/pack-write.c
index 3513665e1e..f66ea8e5a1 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -272,7 +272,7 @@ void fixup_pack_header_footer(int pack_fd,
 	fsync_or_die(pack_fd, pack_name);
 }
 
-char *index_pack_lockfile(int ip_out)
+char *index_pack_lockfile(int ip_out, int *is_well_formed)
 {
 	char packname[GIT_MAX_HEXSZ + 6];
 	const int len = the_hash_algo->hexsz + 6;
@@ -286,11 +286,17 @@ char *index_pack_lockfile(int ip_out)
 	 */
 	if (read_in_full(ip_out, packname, len) == len && packname[len-1] == '\n') {
 		const char *name;
+
+		if (is_well_formed)
+			*is_well_formed = 1;
 		packname[len-1] = 0;
 		if (skip_prefix(packname, "keep\t", &name))
 			return xstrfmt("%s/pack/pack-%s.keep",
 				       get_object_directory(), name);
+		return NULL;
 	}
+	if (is_well_formed)
+		*is_well_formed = 0;
 	return NULL;
 }
 
diff --git a/pack.h b/pack.h
index 9fc0945ac9..09cffec395 100644
--- a/pack.h
+++ b/pack.h
@@ -85,7 +85,7 @@ int verify_pack_index(struct packed_git *);
 int verify_pack(struct repository *, struct packed_git *, verify_fn fn, struct progress *, uint32_t);
 off_t write_pack_header(struct hashfile *f, uint32_t);
 void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t);
-char *index_pack_lockfile(int fd);
+char *index_pack_lockfile(int fd, int *is_well_formed);
 
 /*
  * The "hdr" output buffer should be at least this big, which will handle sizes
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 7d5b17909b..b1bc73a9a9 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -847,8 +847,9 @@ test_expect_success 'part of packfile response provided as URI' '
 	test -f hfound &&
 	test -f h2found &&
 
-	# Ensure that there are exactly 6 files (3 .pack and 3 .idx).
-	ls http_child/.git/objects/pack/* >filelist &&
+	# Ensure that there are exactly 3 packfiles with associated .idx
+	ls http_child/.git/objects/pack/*.pack \
+	    http_child/.git/objects/pack/*.idx >filelist &&
 	test_line_count = 6 filelist
 '
 
@@ -901,8 +902,9 @@ test_expect_success 'packfile-uri with transfer.fsckobjects' '
 		-c fetch.uriprotocols=http,https \
 		clone "$HTTPD_URL/smart/http_parent" http_child &&
 
-	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
-	ls http_child/.git/objects/pack/* >filelist &&
+	# Ensure that there are exactly 2 packfiles with associated .idx
+	ls http_child/.git/objects/pack/*.pack \
+	    http_child/.git/objects/pack/*.idx >filelist &&
 	test_line_count = 4 filelist
 '
 
@@ -936,6 +938,54 @@ test_expect_success 'packfile-uri with transfer.fsckobjects fails on bad object'
 	test_i18ngrep "invalid author/committer line - missing email" error
 '
 
+test_expect_success 'packfile-uri with transfer.fsckobjects succeeds when .gitmodules is separate from tree' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo "[submodule libfoo]" >"$P/.gitmodules" &&
+	echo "path = include/foo" >>"$P/.gitmodules" &&
+	echo "url = git://example.com/git/lib.git" >>"$P/.gitmodules" &&
+	git -C "$P" add .gitmodules &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" .gitmodules >h &&
+
+	sane_unset GIT_TEST_SIDEBAND_ALL &&
+	git -c protocol.version=2 -c transfer.fsckobjects=1 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child &&
+
+	# Ensure that there are exactly 2 packfiles with associated .idx
+	ls http_child/.git/objects/pack/*.pack \
+	    http_child/.git/objects/pack/*.idx >filelist &&
+	test_line_count = 4 filelist
+'
+
+test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodules separate from tree is invalid' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child err &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo "[submodule \"..\"]" >"$P/.gitmodules" &&
+	echo "path = include/foo" >>"$P/.gitmodules" &&
+	echo "url = git://example.com/git/lib.git" >>"$P/.gitmodules" &&
+	git -C "$P" add .gitmodules &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" .gitmodules >h &&
+
+	sane_unset GIT_TEST_SIDEBAND_ALL &&
+	test_must_fail git -c protocol.version=2 -c transfer.fsckobjects=1 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child 2>err &&
+	test_i18ngrep "disallowed submodule name" err
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
 
-- 
2.30.0.617.g56c4b15f3c-goog
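
To make the index-pack output described in this patch concrete, here is a minimal sketch of a reader for it. It assumes the layout the patch describes: one "keep\t<hash>\n" or "pack\t<hash>\n" line, followed by zero or more "<hash>\n" lines naming dangling .gitmodules blobs. This is not the code from the patch: parse_index_pack_output(), is_hex() and HEXSZ are hypothetical names, the hash length is hard-coded for SHA-1, only lowercase hex is accepted, and the first-line check is stricter than index_pack_lockfile(), which only looks at the length and the trailing newline.

```c
#include <stddef.h>
#include <string.h>

#define HEXSZ 40 /* assumption: SHA-1 repository; SHA-256 would need 64 */

/* Return 1 if the first n characters of s are lowercase hex digits. */
static int is_hex(const char *s, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!s[i] || !strchr("0123456789abcdef", s[i]))
			return 0;
	return 1;
}

/*
 * Parse "keep\t<hash>\n" or "pack\t<hash>\n" followed by zero or more
 * "<hash>\n" lines. Returns the number of trailing hashes found, or -1
 * if the output is malformed. At most max hashes are copied into out.
 */
static int parse_index_pack_output(const char *buf,
				   char out[][HEXSZ + 1], int max)
{
	size_t len = strlen(buf);
	size_t pos = HEXSZ + 6; /* "keep\t" or "pack\t" + hash + NL */
	int n = 0;

	if (len < pos || buf[pos - 1] != '\n')
		return -1;
	if (strncmp(buf, "keep\t", 5) && strncmp(buf, "pack\t", 5))
		return -1;
	if (!is_hex(buf + 5, HEXSZ))
		return -1;

	/* Every remaining line must be exactly one hash plus NL. */
	while (pos < len) {
		if (len - pos < (size_t)(HEXSZ + 1) ||
		    buf[pos + HEXSZ] != '\n' ||
		    !is_hex(buf + pos, HEXSZ))
			return -1;
		if (n < max) {
			memcpy(out[n], buf + pos, HEXSZ);
			out[n][HEXSZ] = '\0';
		}
		n++;
		pos += HEXSZ + 1;
	}
	return n;
}
```

A caller in the position of fetch-pack would collect the hashes returned for every pack and check them only once all packs have been indexed, which is what lets the tree and the blob arrive in different packfiles.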



* Re: [PATCH v2 0/4] Check .gitmodules when using packfile URIs
  2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
                     ` (3 preceding siblings ...)
  2021-02-22 19:20   ` [PATCH v2 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
@ 2021-02-22 20:12   ` Junio C Hamano
  4 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-02-22 20:12 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, avarab

Jonathan Tan <jonathantanmy@google.com> writes:

> Here's v2. I think I've addressed all the review comments, including
> passing the index-pack args as separate arguments (to avoid the
> necessity to somehow encode in order to get rid of spaces), and by using
> a custom error function instead of a specific option in fsck.
>
> This applies on master. I mentioned earlier [1] that I was planning to
> implement this on Ævar's fsck API improvements, but after looking at the
> latest v2, I see that it omits patch 11 from v1 (which is the one I
> need), so what I've done is to use a string check in the meantime.
>
> [1] https://lore.kernel.org/git/20210219004612.1181920-1-jonathantanmy@google.com/

I only looked at the difference between this round and what is in
'seen', but everything looked reasonable to me (including the code
that is near NEEDSWORK comment, and what the comment said).

Will queue.  Thanks.
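
The "custom error function instead of a specific option in fsck" approach from the quoted cover letter can be sketched as follows. This is only an illustration under simplifying assumptions: the callback type here takes two strings, whereas the real fsck error function takes more arguments, and error_fn, default_error(), print_dangling_only() and run_checks() are hypothetical names. The string-prefix check mirrors the one that the NEEDSWORK comment in patch 4/4 refers to.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified callback type; Git's real one takes more arguments. */
typedef int (*error_fn)(const char *oid_hex, const char *message);

/* Stand-in for the default handler: report the problem and count it as an error. */
static int default_error(const char *oid_hex, const char *message)
{
	fprintf(stderr, "error: object %s: %s\n", oid_hex, message);
	return 1;
}

/*
 * Wrapper handler: a missing .gitmodules blob is printed on stdout for
 * the caller to check later; everything else goes to the default handler.
 */
static int print_dangling_only(const char *oid_hex, const char *message)
{
	static const char prefix[] = "gitmodulesMissing";

	if (!strncmp(message, prefix, strlen(prefix))) {
		printf("%s\n", oid_hex);
		return 0;
	}
	return default_error(oid_hex, message);
}

/* Hypothetical checker: reports each message through fn and sums the results. */
static int run_checks(error_fn fn, const char *oid_hex,
		      const char *const *messages, size_t nr)
{
	size_t i;
	int errors = 0;

	for (i = 0; i < nr; i++)
		errors += fn(oid_hex, messages[i]);
	return errors;
}
```

The design point is that the checker itself does not need a new option: the caller swaps the callback, and only the one message it knows how to defer is downgraded.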


* Re: [PATCH v2 2/4] http-fetch: allow custom index-pack args
  2021-02-22 19:20   ` [PATCH v2 2/4] http-fetch: " Jonathan Tan
@ 2021-02-23 13:17     ` Ævar Arnfjörð Bjarmason
  2021-02-23 16:51       ` Jonathan Tan
  2021-03-05  0:19     ` Jonathan Nieder
  1 sibling, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-23 13:17 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster


On Mon, Feb 22 2021, Jonathan Tan wrote:

> diff --git a/Documentation/git-http-fetch.txt b/Documentation/git-http-fetch.txt
> index 4deb4893f5..9fa17b60e4 100644
> --- a/Documentation/git-http-fetch.txt
> +++ b/Documentation/git-http-fetch.txt
> @@ -41,11 +41,17 @@ commit-id::
>  		<commit-id>['\t'<filename-as-in--w>]
>  
>  --packfile=<hash>::
> -	Instead of a commit id on the command line (which is not expected in
> +	For internal use only. Instead of a commit id on the command
> +	line (which is not expected in
>  	this case), 'git http-fetch' fetches the packfile directly at the given
>  	URL and uses index-pack to generate corresponding .idx and .keep files.
>  	The hash is used to determine the name of the temporary file and is
> -	arbitrary. The output of index-pack is printed to stdout.
> +	arbitrary. The output of index-pack is printed to stdout. Requires
> +	--index-pack-args.
> +
> +--index-pack-args=<args>::
> +	For internal use only. The command to run on the contents of the
> +	downloaded pack. Arguments are URL-encoded separated by spaces.
>  
>  --recover::
>  	Verify that everything reachable from target is fetched.  Used after
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 876f90c759..aeac010b0b 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1645,6 +1645,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  		strvec_pushf(&cmd.args, "--packfile=%.*s",
>  			     (int) the_hash_algo->hexsz,
>  			     packfile_uris.items[i].string);
> +		strvec_push(&cmd.args, "--index-pack-arg=index-pack");
> +		strvec_push(&cmd.args, "--index-pack-arg=--stdin");
> +		strvec_push(&cmd.args, "--index-pack-arg=--keep");

The docs say --*-args, but the code checks --*arg, that seems like a
mistake that should be fixed to make the code/tests use the plural form,
no?


* Re: [PATCH v2 2/4] http-fetch: allow custom index-pack args
  2021-02-23 13:17     ` Ævar Arnfjörð Bjarmason
@ 2021-02-23 16:51       ` Jonathan Tan
  0 siblings, 0 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-02-23 16:51 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git, gitster

> > diff --git a/Documentation/git-http-fetch.txt b/Documentation/git-http-fetch.txt
> > index 4deb4893f5..9fa17b60e4 100644
> > --- a/Documentation/git-http-fetch.txt
> > +++ b/Documentation/git-http-fetch.txt
> > @@ -41,11 +41,17 @@ commit-id::
> >  		<commit-id>['\t'<filename-as-in--w>]
> >  
> >  --packfile=<hash>::
> > -	Instead of a commit id on the command line (which is not expected in
> > +	For internal use only. Instead of a commit id on the command
> > +	line (which is not expected in
> >  	this case), 'git http-fetch' fetches the packfile directly at the given
> >  	URL and uses index-pack to generate corresponding .idx and .keep files.
> >  	The hash is used to determine the name of the temporary file and is
> > -	arbitrary. The output of index-pack is printed to stdout.
> > +	arbitrary. The output of index-pack is printed to stdout. Requires
> > +	--index-pack-args.
> > +
> > +--index-pack-args=<args>::
> > +	For internal use only. The command to run on the contents of the
> > +	downloaded pack. Arguments are URL-encoded separated by spaces.
> >  
> >  --recover::
> >  	Verify that everything reachable from target is fetched.  Used after
> > diff --git a/fetch-pack.c b/fetch-pack.c
> > index 876f90c759..aeac010b0b 100644
> > --- a/fetch-pack.c
> > +++ b/fetch-pack.c
> > @@ -1645,6 +1645,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
> >  		strvec_pushf(&cmd.args, "--packfile=%.*s",
> >  			     (int) the_hash_algo->hexsz,
> >  			     packfile_uris.items[i].string);
> > +		strvec_push(&cmd.args, "--index-pack-arg=index-pack");
> > +		strvec_push(&cmd.args, "--index-pack-arg=--stdin");
> > +		strvec_push(&cmd.args, "--index-pack-arg=--keep");
> 
> The docs say --*-args, but the code checks --*arg, that seems like a
> mistake that should be fixed to make the code/tests use the plural form,
> no?

Thanks for catching that. Originally it was plural since this single
option would give multiple arguments to index-pack, but now each
option gives only a single one, so "arg" is correct. I'll update
it in the next version.
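
The singular form can be illustrated with a small sketch: each index-pack argument is wrapped in its own "--index-pack-arg=<arg>" option, so an argument that contains spaces needs no encoding. wrap_index_pack_args() is a hypothetical helper, not a function from Git, and the "--keep=..." value used in the usage note is made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a NULL-terminated array in which every
 * element of args is prefixed with "--index-pack-arg=". The caller
 * frees each string and then the array. Returns NULL on allocation
 * failure.
 */
static char **wrap_index_pack_args(const char *const *args, size_t nr)
{
	static const char prefix[] = "--index-pack-arg=";
	char **out = calloc(nr + 1, sizeof(*out));
	size_t i;

	if (!out)
		return NULL;
	for (i = 0; i < nr; i++) {
		/* sizeof(prefix) already counts the terminating NUL */
		size_t len = sizeof(prefix) + strlen(args[i]);

		out[i] = malloc(len);
		if (!out[i]) {
			while (i--)
				free(out[i]);
			free(out);
			return NULL;
		}
		snprintf(out[i], len, "%s%s", prefix, args[i]);
	}
	return out;
}
```

Given { "index-pack", "--stdin", "--keep=some message with spaces" }, the third element comes back as a single option string with the spaces intact, because it stays one argv entry all the way to http-fetch.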


* Re: [PATCH v2 2/4] http-fetch: allow custom index-pack args
  2021-02-22 19:20   ` [PATCH v2 2/4] http-fetch: " Jonathan Tan
  2021-02-23 13:17     ` Ævar Arnfjörð Bjarmason
@ 2021-03-05  0:19     ` Jonathan Nieder
  2021-03-05  1:16       ` [PATCH] fetch-pack: do not mix --pack_header and packfile uri Jonathan Tan
  1 sibling, 1 reply; 229+ messages in thread
From: Jonathan Nieder @ 2021-03-05  0:19 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, avarab, gitster, Nathan Mulcahey

Hi Jonathan,

Jonathan Tan wrote:

> This is the next step in teaching fetch-pack to pass its index-pack
> arguments when processing packfiles referenced by URIs.
>
> The "--keep" in fetch-pack.c will be replaced with a full message in a
> subsequent commit.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/git-http-fetch.txt | 10 ++++++++--
>  fetch-pack.c                     |  3 +++
>  http-fetch.c                     | 20 +++++++++++++++-----
>  t/t5550-http-fetch-dumb.sh       |  5 ++++-
>  4 files changed, 30 insertions(+), 8 deletions(-)

This is producing an interesting symptom for me:

 git init repro
 cd repro
 git config fetch.uriprotocols https
 git config remote.origin.url https://fuchsia.googlesource.com/fuchsia
 git config remote.origin.fetch +refs/heads/*:refs/remotes/origin/*
 git fetch -p origin

Expected result: fetches

Actual result:

 fatal: pack has bad object at offset 12: unknown object type 5
 fatal: finish_http_pack_request gave result -1
 fatal: fetch-pack: expected keep then TAB at start of http-fetch output

Thanks to Nathan Mulcahey (cc-ed) for a clear report.

Bisects to b664e9ffa153189dae9b88f32d1c5fedcf85056a, which is part of
"next" and 2.31.0-rc1.  Another report of the same is at
https://crbug.com/1184814.

Known problem?

Thanks,
Jonathan


* [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05  0:19     ` Jonathan Nieder
@ 2021-03-05  1:16       ` Jonathan Tan
  2021-03-05  1:52         ` Junio C Hamano
  2021-03-05 18:50         ` Junio C Hamano
  0 siblings, 2 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-03-05  1:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, jrnieder, nmulcahey

When fetching (as opposed to cloning) from a repository with packfile
URIs enabled, an error like this may occur:

 fatal: pack has bad object at offset 12: unknown object type 5
 fatal: finish_http_pack_request gave result -1
 fatal: fetch-pack: expected keep then TAB at start of http-fetch output

This bug was introduced in b664e9ffa1 ("fetch-pack: with packfile URIs,
use index-pack arg", 2021-02-22), when the index-pack args used when
processing the inline packfile of a fetch response and when processing
packfile URIs were unified.

This bug happens because fetch, by default, partially reads (and
consumes) the header of the inline packfile to determine if it should
store the downloaded objects as a packfile or loose objects, and thus
passes --pack_header=<...> to index-pack to inform it that some bytes
are missing. However, when it subsequently fetches the additional
packfiles linked by URIs, it reuses the same index-pack arguments, thus
wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are
missing.

This does not happen when cloning because "git clone" always passes
do_keep, which instructs the fetch mechanism to always retain the
packfile, eliminating the need to read the header.

There are a few ways to fix this, including filtering out pack_header
arguments when downloading the additional packfiles, but I decided to
stick to always using index-pack throughout when packfile URIs are
present - thus, Git no longer needs to read the bytes, and no longer
needs --pack_header here.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Here's a fix for this issue.

This is on jt/transfer-fsck-across-packs.

One simplification that we could do is to eliminate the unpack-objects
codepath. As far as I understand, the main advantage of writing loose
objects is that we have automatic SHA-1 collision detection, but we have
such mitigations when writing packs too, so that might not be as large a
benefit as we think. This simplification would have enabled us to avoid
this bug, I think.
---
 fetch-pack.c           |  4 ++--
 t/t5702-protocol-v2.sh | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index f9def5ac74..e990607742 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -852,7 +852,7 @@ static int get_pack(struct fetch_pack_args *args,
 	else
 		demux.out = xd[0];
 
-	if (!args->keep_pack && unpack_limit) {
+	if (!args->keep_pack && unpack_limit && !index_pack_args) {
 
 		if (read_pack_header(demux.out, &header))
 			die(_("protocol error: bad pack header"));
@@ -885,7 +885,7 @@ static int get_pack(struct fetch_pack_args *args,
 			strvec_push(&cmd.args, "-v");
 		if (args->use_thin_pack)
 			strvec_push(&cmd.args, "--fix-thin");
-		if (do_keep && (args->lock_pack || unpack_limit)) {
+		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) {
 			char hostname[HOST_NAME_MAX + 1];
 			if (xgethostname(hostname, sizeof(hostname)))
 				xsnprintf(hostname, sizeof(hostname), "localhost");
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index b1bc73a9a9..9df1ec82ca 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -853,6 +853,27 @@ test_expect_success 'part of packfile response provided as URI' '
 	test_line_count = 6 filelist
 '
 
+test_expect_success 'packfile URIs with fetch instead of clone' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child log &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo my-blob >"$P/my-blob" &&
+	git -C "$P" add my-blob &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" my-blob >h &&
+
+	git init http_child &&
+
+	GIT_TEST_SIDEBAND_ALL=1 \
+	git -C http_child -c protocol.version=2 \
+		-c fetch.uriprotocols=http,https \
+		fetch "$HTTPD_URL/smart/http_parent"
+'
+
 test_expect_success 'fetching with valid packfile URI but invalid hash fails' '
 	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
 	rm -rf "$P" http_child log &&
-- 
2.30.1.766.gb4fecdf3b7-goog



* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05  1:16       ` [PATCH] fetch-pack: do not mix --pack_header and packfile uri Jonathan Tan
@ 2021-03-05  1:52         ` Junio C Hamano
  2021-03-05 18:50         ` Junio C Hamano
  1 sibling, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-05  1:52 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Jonathan Tan <jonathantanmy@google.com> writes:

> One simplification that we could do is to eliminate the unpack-objects
> codepath. As far as I understand, the main advantage of writing loose
> objects is that we have automatic SHA-1 collision detection, but we have
> such mitigations when writing packs too, so that might not be as large a
> benefit as we think. This simplification would have enabled us to avoid
> this bug, I think.

My understanding is that the primary advantage of the loose-object
codepath is to help us avoid having too many little packs (instead,
we can accumulate enough objects in the loose form and let GC pack
them, at least the ones among them that are still reachable, into a
single pack).  Historically, the only mode of operation "repack"
offers that reduces the number of remaining packs has been "do full
reachability of the entire history, and pack everything into one",
so avoiding creation of little packs and leaving things loose until
we accumulate enough used to matter.

With the geometric rolling repacking, it may not matter as much, and
keeping everything packed, even in a small pack, might start to be
an overall win.  So I am not opposed to such a simplification; we may
not be ready for it right now, but I think it would be a sensible
future direction.
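
The choice between the two codepaths discussed here can be sketched as a small decision function. It is a simplification of what get_pack() does, under stated assumptions: choose_sink() and enum pack_sink are hypothetical names, other conditions that force index-pack (such as promisor remotes or fsck settings) are left out, and the limit of 100 in the usage note is only an illustrative value.

```c
#include <stdint.h>

enum pack_sink {
	SINK_UNPACK_OBJECTS, /* explode the pack into loose objects */
	SINK_INDEX_PACK      /* keep the pack and write an .idx for it */
};

/*
 * Decide how an incoming pack is stored. nr_objects is the object
 * count taken from the pack header; an unpack_limit of 0 disables
 * the loose-object path.
 */
static enum pack_sink choose_sink(int keep_pack, uint32_t unpack_limit,
				  uint32_t nr_objects,
				  int have_packfile_uris)
{
	/*
	 * With packfile URIs, always keep the pack, so the header does
	 * not have to be read (and consumed) up front.
	 */
	if (keep_pack || !unpack_limit || have_packfile_uris)
		return SINK_INDEX_PACK;
	return nr_objects < unpack_limit ? SINK_UNPACK_OBJECTS
					 : SINK_INDEX_PACK;
}
```

With a limit of 100, a 5-object fetch would go to loose objects and a 100-object fetch would stay packed; dropping the unpack-objects path altogether would reduce the function to a constant.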






* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05  1:16       ` [PATCH] fetch-pack: do not mix --pack_header and packfile uri Jonathan Tan
  2021-03-05  1:52         ` Junio C Hamano
@ 2021-03-05 18:50         ` Junio C Hamano
  2021-03-05 19:46           ` Junio C Hamano
  2021-03-05 22:59           ` Jonathan Tan
  1 sibling, 2 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-05 18:50 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Jonathan Tan <jonathantanmy@google.com> writes:

> When fetching (as opposed to cloning) from a repository with packfile
> URIs enabled, an error like this may occur:
>
>  fatal: pack has bad object at offset 12: unknown object type 5
>  fatal: finish_http_pack_request gave result -1
>  fatal: fetch-pack: expected keep then TAB at start of http-fetch output
>
> This bug was introduced in b664e9ffa1 ("fetch-pack: with packfile URIs,
> use index-pack arg", 2021-02-22), when the index-pack args used when
> processing the inline packfile of a fetch response and when processing
> packfile URIs were unified.

> This bug happens because fetch, by default, partially reads (and
> consumes) the header of the inline packfile to determine if it should
> store the downloaded objects as a packfile or loose objects, and thus
> passes --pack_header=<...> to index-pack to inform it that some bytes
> are missing. 

... and what the values in them are.

> However, when it subsequently fetches the additional
> packfiles linked by URIs, it reuses the same index-pack arguments, thus
> wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are
> missing.
>
> This does not happen when cloning because "git clone" always passes
> do_keep, which instructs the fetch mechanism to always retain the
> packfile, eliminating the need to read the header.
>
> There are a few ways to fix this, including filtering out pack_header
> arguments when downloading the additional packfiles, but ...

Avoiding the condition that exhibits the breakage is possible, and I
think it is what is done here, but I actually think that the only
right fix is to pass correct argument to commands we invoke in the
first place.  Why are we reusing the same argument array to begin
with?

    ... goes back and reads the offending commit ...

commit b664e9ffa153189dae9b88f32d1c5fedcf85056a
Author: Jonathan Tan <jonathantanmy@google.com>
Date:   Mon Feb 22 11:20:08 2021 -0800

    fetch-pack: with packfile URIs, use index-pack arg
    
    Unify the index-pack arguments used when processing the inline pack and
    when downloading packfiles referenced by URIs. This is done by teaching
    get_pack() to also store the index-pack arguments whenever at least one
    packfile URI is given, and then when processing the packfile URI(s),
    using the stored arguments.

This makes it sound like the entire idea of this offending commit
was wrong, and before it, the codepath that processed the packfile
fetched from the packfile URI was using index-pack correctly
by using index-pack arguments that are independent from the one that
is used to process the packfile given in-stream.  Why isn't the fix
just a straight revert of the commit???

> This is on jt/transfer-fsck-across-packs.

Ouch.  This definitely is -rc material.


> -	if (!args->keep_pack && unpack_limit) {
> +	if (!args->keep_pack && unpack_limit && !index_pack_args) {

This one makes sense as an "avoid conditions that reveal how badly
the code is broken" band-aid.  When we have index-pack related
arguments, we cannot use the unpack-objects codepath even if we are
being fed a tiny pack, so there is no point peeking at the beginning
of the pack stream to find out how many objects it has.  OK.
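
To make the peeking concrete: a pack stream starts with a 12-byte
header (the signature "PACK", a version, and an object count, each 4
bytes in network order).  The sketch below is illustrative only; the
struct and helper names are hypothetical and are not the ones used in
fetch-pack.c.  It parses such a header, formats the
--pack_header=<version>,<count> argument, and decodes the type bits of
an entry's first byte.  Assuming index-pack, once told the header was
consumed, treats the next input byte as the start of the first object
entry, a complete pack would present 'P' (0x50) there, and its type
bits decode to 5, which would be consistent with the "unknown object
type 5" error at offset 12 quoted earlier.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decoded form of the 12-byte pack header (hypothetical struct name). */
struct pack_header_info {
	uint32_t version;
	uint32_t entries;
};

/* Read a 32-bit big-endian (network order) integer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer does not start with "PACK". */
static int parse_pack_header(const unsigned char *buf,
			     struct pack_header_info *out)
{
	assert(buf && out);
	if (memcmp(buf, "PACK", 4))
		return -1;
	out->version = read_be32(buf + 4);
	out->entries = read_be32(buf + 8);
	return 0;
}

/*
 * Hypothetical helper mirroring the patched condition: peek only when
 * the object count can still change which command gets run.
 */
static int should_peek_header(int keep_pack, unsigned unpack_limit,
			      int have_index_pack_args)
{
	return !keep_pack && unpack_limit && !have_index_pack_args;
}

/* Format the argument that tells index-pack the header was consumed. */
static int format_pack_header_arg(char *dst, size_t len,
				  const struct pack_header_info *hdr)
{
	return snprintf(dst, len, "--pack_header=%" PRIu32 ",%" PRIu32,
			hdr->version, hdr->entries);
}

/* Object type as encoded in bits 4-6 of an entry's first byte. */
static int entry_type(unsigned char first_byte)
{
	return (first_byte >> 4) & 7;
}
```

With a header announcing version 2 and 3 objects, the formatted
argument would be "--pack_header=2,3", and entry_type('P') evaluates
to 5.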

> @@ -885,7 +885,7 @@ static int get_pack(struct fetch_pack_args *args,
>  			strvec_push(&cmd.args, "-v");
>  		if (args->use_thin_pack)
>  			strvec_push(&cmd.args, "--fix-thin");
> -		if (do_keep && (args->lock_pack || unpack_limit)) {
> +		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) {
>  			char hostname[HOST_NAME_MAX + 1];
>  			if (xgethostname(hostname, sizeof(hostname)))
>  				xsnprintf(hostname, sizeof(hostname), "localhost");

I do not quite get what this hunk is doing.  Care to explain?

Thanks.

> diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
> index b1bc73a9a9..9df1ec82ca 100755
> --- a/t/t5702-protocol-v2.sh
> +++ b/t/t5702-protocol-v2.sh
> @@ -853,6 +853,27 @@ test_expect_success 'part of packfile response provided as URI' '
>  	test_line_count = 6 filelist
>  '
>  
> +test_expect_success 'packfile URIs with fetch instead of clone' '
> +	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +	rm -rf "$P" http_child log &&
> +
> +	git init "$P" &&
> +	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
> +
> +	echo my-blob >"$P/my-blob" &&
> +	git -C "$P" add my-blob &&
> +	git -C "$P" commit -m x &&
> +
> +	configure_exclusion "$P" my-blob >h &&
> +
> +	git init http_child &&
> +
> +	GIT_TEST_SIDEBAND_ALL=1 \
> +	git -C http_child -c protocol.version=2 \
> +		-c fetch.uriprotocols=http,https \
> +		fetch "$HTTPD_URL/smart/http_parent"
> +'
> +
>  test_expect_success 'fetching with valid packfile URI but invalid hash fails' '
>  	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
>  	rm -rf "$P" http_child log &&

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05 18:50         ` Junio C Hamano
@ 2021-03-05 19:46           ` Junio C Hamano
  2021-03-05 23:11             ` Jonathan Tan
  2021-03-05 23:20             ` Junio C Hamano
  2021-03-05 22:59           ` Jonathan Tan
  1 sibling, 2 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-05 19:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Junio C Hamano <gitster@pobox.com> writes:

> Avoiding the condition that exhibits the breakage is possible, and I
> think it is what is done here, but I actually think that the only
> right fix is to pass correct argument to commands we invoke in the
> first place.  Why are we reusing the same argument array to begin
> with?
>
>     ... goes back and reads the offending commit ...
>
> commit b664e9ffa153189dae9b88f32d1c5fedcf85056a
> Author: Jonathan Tan <jonathantanmy@google.com>
> Date:   Mon Feb 22 11:20:08 2021 -0800
>
>     fetch-pack: with packfile URIs, use index-pack arg
>     
>     Unify the index-pack arguments used when processing the inline pack and
>     when downloading packfiles referenced by URIs. This is done by teaching
>     get_pack() to also store the index-pack arguments whenever at least one
>     packfile URI is given, and then when processing the packfile URI(s),
>     using the stored arguments.
>
> This makes it sound like the entire idea of this offending commit
> was wrong, and before it, the codepath that processed the packfile
> fetched from the packfile URI was using index-pack correctly
> by using index-pack arguments that are independent from the one that
> is used to process the packfile given in-stream.  Why isn't the fix
> just a straight revert of the commit???

By the way, the band-aid in this patch may be OK for the upcoming
release (purely because it is easy to see that it is sufficient for
today's codebase), but I said the above because I worry about the
health of the codebase in the longer term.  The "pass_header" may
not remain the only difference between the URI packfile and the
in-stream packfile in the way they make index-pack invocations.

>> This is on jt/transfer-fsck-across-packs.
>
> Ouch.  This definitely is an -rc material.

Thanks.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05 18:50         ` Junio C Hamano
  2021-03-05 19:46           ` Junio C Hamano
@ 2021-03-05 22:59           ` Jonathan Tan
  2021-03-05 23:18             ` Junio C Hamano
  1 sibling, 1 reply; 229+ messages in thread
From: Jonathan Tan @ 2021-03-05 22:59 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, jrnieder, nmulcahey

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > When fetching (as opposed to cloning) from a repository with packfile
> > URIs enabled, an error like this may occur:
> >
> >  fatal: pack has bad object at offset 12: unknown object type 5
> >  fatal: finish_http_pack_request gave result -1
> >  fatal: fetch-pack: expected keep then TAB at start of http-fetch output
> >
> > This bug was introduced in b664e9ffa1 ("fetch-pack: with packfile URIs,
> > use index-pack arg", 2021-02-22), when the index-pack args used when
> > processing the inline packfile of a fetch response and when processing
> > packfile URIs were unified.
> 
> > This bug happens because fetch, by default, partially reads (and
> > consumes) the header of the inline packfile to determine if it should
> > store the downloaded objects as a packfile or loose objects, and thus
> > passes --pack_header=<...> to index-pack to inform it that some bytes
> > are missing. 
> 
> ... and what the values in them are.

Ah, that's true.

> > However, when it subsequently fetches the additional
> > packfiles linked by URIs, it reuses the same index-pack arguments, thus
> > wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are
> > missing.
> >
> > This does not happen when cloning because "git clone" always passes
> > do_keep, which instructs the fetch mechanism to always retain the
> > packfile, eliminating the need to read the header.
> >
> > There are a few ways to fix this, including filtering out pack_header
> > arguments when downloading the additional packfiles, but ...
> 
> Avoiding the condition that exhibits the breakage is possible, and I
> think it is what is done here, but I actually think that the only
> right fix is to pass correct argument to commands we invoke in the
> first place.  Why are we reusing the same argument array to begin
> with?
> 
>     ... goes back and reads the offending commit ...
> 
> commit b664e9ffa153189dae9b88f32d1c5fedcf85056a
> Author: Jonathan Tan <jonathantanmy@google.com>
> Date:   Mon Feb 22 11:20:08 2021 -0800
> 
>     fetch-pack: with packfile URIs, use index-pack arg
>     
>     Unify the index-pack arguments used when processing the inline pack and
>     when downloading packfiles referenced by URIs. This is done by teaching
>     get_pack() to also store the index-pack arguments whenever at least one
>     packfile URI is given, and then when processing the packfile URI(s),
>     using the stored arguments.
> 
> This makes it sound like the entire idea of this offending commit
> was wrong, and before it, the codepath that processed the packfile
> fetched from the packfile URI was using index-pack correctly
> by using index-pack arguments that are independent from the one that
> is used to process the packfile given in-stream.  Why isn't the fix
> just a straight revert of the commit???

I should probably have written more in the commit message to justify the
unification, but it is also part of a bug fix (in particular,
--fsck-objects wasn't being passed to the index-pack that indexed the
packfiles linked by URI) and for code health purposes (to prevent future
bugs by eliminating the divergence). So reverting that commit would
reintroduce another bug.

> > @@ -885,7 +885,7 @@ static int get_pack(struct fetch_pack_args *args,
> >  			strvec_push(&cmd.args, "-v");
> >  		if (args->use_thin_pack)
> >  			strvec_push(&cmd.args, "--fix-thin");
> > -		if (do_keep && (args->lock_pack || unpack_limit)) {
> > +		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) {
> >  			char hostname[HOST_NAME_MAX + 1];
> >  			if (xgethostname(hostname, sizeof(hostname)))
> >  				xsnprintf(hostname, sizeof(hostname), "localhost");
> 
> I do not quite get what this hunk is doing.  Care to explain?

The "do_keep" part was unnecessarily restrictive and I used a band-aid
solution to loosen it. I think this started from 88e2f9ed8e ("introduce
fetch-object: fetch one promisor object", 2017-12-05) where I might have
misunderstood what do_keep was meant to do, and taught fetch-pack to use
"index-pack" if do_keep is true or args->from_promisor is true. What I
should have done is to set do_keep to true if args->from_promisor is
true. Future commits continued to do that with fsck_objects and
index_pack_args.

Maybe what I can do is to refactor get_pack() so that do_keep retains
its original meaning of whether to use "index-pack" or "unpack-objects",
and then we wouldn't need this line. What do you think (code-wise and
whether this fits in with the release schedule, if we want to get this
in before release)?
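
A minimal sketch of the refactoring idea described above, under the
assumption that every reason to use index-pack is folded into do_keep
up front, so that later conditions only need to test do_keep.  The
struct and function names are hypothetical; the real get_pack() takes
a struct fetch_pack_args and differs in detail, and the handling of a
zero unpack_limit here is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the flags that get_pack() consults. */
struct pack_flags {
	int keep_pack;
	int from_promisor;
	int fsck_objects;
	int have_index_pack_args;
	unsigned unpack_limit;
};

/*
 * Nonzero means "use index-pack", zero means "use unpack-objects".
 * entries_in_header is only meaningful when nothing forces index-pack
 * and a nonzero unpack_limit caused the header to be peeked at.
 */
static int compute_do_keep(const struct pack_flags *f,
			   uint32_t entries_in_header)
{
	int do_keep;

	assert(f);
	do_keep = f->keep_pack || f->from_promisor ||
		  f->fsck_objects || f->have_index_pack_args;

	/* Only a pack that nothing forces us to keep gets the size test. */
	if (!do_keep && f->unpack_limit)
		do_keep = entries_in_header >= f->unpack_limit;

	return do_keep;
}

/* The later condition can then go back to testing do_keep alone. */
static int wants_keep_file(int do_keep, int lock_pack,
			   unsigned unpack_limit)
{
	return do_keep && (lock_pack || unpack_limit);
}
```

With this shape, the hunk that adds "|| index_pack_args" to the
do_keep test would not be needed, because index_pack_args already
implies do_keep.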

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05 19:46           ` Junio C Hamano
@ 2021-03-05 23:11             ` Jonathan Tan
  2021-03-05 23:20             ` Junio C Hamano
  1 sibling, 0 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-03-05 23:11 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, jrnieder, nmulcahey

> By the way, the band-aid in this patch may be OK for the upcoming
> release (purely because it is easy to see that it is sufficient for
> today's codebase), but I said the above because I worry about the
> health of the codebase in the longer term.  The "pass_header" may
> not remain the only difference between the URI packfile and the
> in-stream packfile in the way they make index-pack invocations.

That is true, but at the same time, I think it's better to have the
arguments be the same because there are options (e.g. --promisor and
--fsck-objects) that have to be duplicated, and I think that for the
most part, the URI packfiles and the inline packfile will be processed
identically.
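
One of the alternatives mentioned in the patch description, filtering
out the pack_header argument when reusing the stored arguments for the
URI packfiles, could look roughly like the sketch below.  It uses
plain string arrays rather than git's strvec, and the function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy every argument from src into dst except --pack_header=<...>.
 * dst must have room for n pointers; returns the number copied.
 */
static size_t copy_args_without_pack_header(const char **dst,
					    const char *const *src,
					    size_t n)
{
	static const char prefix[] = "--pack_header=";
	size_t i, kept = 0;

	assert(dst && src);
	for (i = 0; i < n; i++) {
		/* Skip the one argument that only applies to the inline pack. */
		if (!strncmp(src[i], prefix, sizeof(prefix) - 1))
			continue;
		dst[kept++] = src[i];
	}
	return kept;
}
```

Shared options such as --fsck-objects and --promisor pass through
unchanged, so they do not have to be duplicated by hand.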

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05 22:59           ` Jonathan Tan
@ 2021-03-05 23:18             ` Junio C Hamano
  2021-03-08 19:14               ` Jonathan Tan
  0 siblings, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-03-05 23:18 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Jonathan Tan <jonathantanmy@google.com> writes:

> I should probably have written more in the commit message to justify the
> unification, but it is also part of a bug fix (in particular,
> --fsck-objects wasn't being passed to the index-pack that indexed the
> packfiles linked by URI) and for code health purposes (to prevent future
> bugs by eliminating the divergence). So reverting that commit would
> reintroduce another bug.

Not necessarily.  Unifying two that do not inherently have to be
identical makes it impossible to pass two different things, and that
is what we are seeing in the bug this patch is trying to fix (by
forcing the two to be identical by eliminating the unpack-objects
codepath in certain cases).  

The right "fix" for the original bug would have been to keep them
still separate yet making it easy to pass args that must be used in
both of them, no?
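
A rough sketch of that shape, assuming a tiny fixed-size argument list
instead of git's strvec and hypothetical function names: the options
both invocations must agree on are pushed by one helper, and each
caller then adds only what is specific to it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARGS 16

struct arg_list {
	const char *v[MAX_ARGS];
	size_t nr;
};

static void arg_push(struct arg_list *l, const char *arg)
{
	assert(l->nr < MAX_ARGS);
	l->v[l->nr++] = arg;
}

/* Options that both index-pack invocations must agree on. */
static void push_common_args(struct arg_list *l, int fsck_objects,
			     int from_promisor)
{
	if (fsck_objects)
		arg_push(l, "--fsck-objects");
	if (from_promisor)
		arg_push(l, "--promisor");
}

/* In-stream pack: its header may already have been consumed. */
static void build_inline_args(struct arg_list *l, int fsck_objects,
			      int from_promisor,
			      const char *pack_header_arg)
{
	l->nr = 0;
	push_common_args(l, fsck_objects, from_promisor);
	if (pack_header_arg)
		arg_push(l, pack_header_arg);
}

/* Pack downloaded from a URI: a complete file, so no header argument. */
static void build_uri_args(struct arg_list *l, int fsck_objects,
			   int from_promisor)
{
	l->nr = 0;
	push_common_args(l, fsck_objects, from_promisor);
}
```

The two lists stay independent, so a future difference between the
two invocations only needs a change in one of the two builders.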

>> > -		if (do_keep && (args->lock_pack || unpack_limit)) {
>> > +		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) {
>> >  			char hostname[HOST_NAME_MAX + 1];
>> >  			if (xgethostname(hostname, sizeof(hostname)))
>> >  				xsnprintf(hostname, sizeof(hostname), "localhost");
>> 
>> I do not quite get what this hunk is doing.  Care to explain?
>
> The "do_keep" part was unnecessarily restrictive and I used a band-aid
> solution to loosen it. I think this started from 88e2f9ed8e ("introduce
> fetch-object: fetch one promisor object", 2017-12-05) where I might have
> misunderstood what do_keep was meant to do, and taught fetch-pack to use
> "index-pack" if do_keep is true or args->from_promisor is true. What I
> should have done is to set do_keep to true if args->from_promisor is
> true. Future commits continued to do that with fsck_objects and
> index_pack_args.

> Maybe what I can do is to refactor get_pack() so that do_keep retains
> its original meaning of whether to use "index-pack" or "unpack-objects",
> and then we wouldn't need this line. What do you think (code-wise and
> whether this fits in with the release schedule, if we want to get this
> in before release)?

How bad is the breakage this one is trying to fix?  I know it would
only affect folks who have to interact with a server that uses the
packfile URI feature, but do they have a workaround, perhaps with a
configuration knob or command line option to ignore the packfile
URI, and how large is the affected population?

I cannot shake the feeling that we are seeing band-aid on top of
band-aid forced by having chosen to go in a wrong direction in the
beginning X-<, and would prefer not to see the code drift even further into
the same direction; hence my earlier suggestion to go back to the
root cause by first reverting the wrong fix that introduced this bug
and fixing the original bug in a different way.

I dunno how involved the necessary surgery would be, though.  If
this is easy to work around, perhaps it might be a better option for
the overall project to ship the upcoming release with this listed as
a known breakage.

Thanks.




^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05 19:46           ` Junio C Hamano
  2021-03-05 23:11             ` Jonathan Tan
@ 2021-03-05 23:20             ` Junio C Hamano
  1 sibling, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-05 23:20 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Junio C Hamano <gitster@pobox.com> writes:

>> This makes it sound like the entire idea of this offending commit
>> was wrong, and before it, the codepath that processed the packfile
>> fetched from the packfile URI was using index-pack correctly
>> by using index-pack arguments that are independent from the one that
>> is used to process the packfile given in-stream.  Why isn't the fix
>> just a straight revert of the commit???
>
> By the way, the band-aid in this patch may be OK for the upcoming
> release (purely because it is easy to see that it is sufficient for
> today's codebase), but I said the above because I worry about the
> health of the codebase in the longer term.  The "pass_header" may
> not remain the only difference between the URI packfile and the
> in-stream packfile in the way they make index-pack invocations.

For example, the URI one is presumably a CDN-hosted, long-term pack,
which may be a good candidate for --keep, while the in-stream one,
especially when the packfile URI feature is used, can be expected to
hold recent small leftover bits that we likely do not want to keep
as a pack (in fact, if they are small enough, we'd prefer to keep
them loose).

^ permalink raw reply	[flat|nested] 229+ messages in thread

* [PATCH v3 00/22] fsck: API improvements
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
  2021-02-18 22:19               ` Junio C Hamano
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-07 23:04                 ` Junio C Hamano
                                   ` (23 more replies)
  2021-03-06 11:04               ` [PATCH v3 01/22] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
                                 ` (21 subsequent siblings)
  23 siblings, 24 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Now that jt/transfer-fsck-across-packs has been merged to master,
here's a re-roll of v1[1]+v2[2] of this series. v2 was slimmed-down +
had a trivial typo fix, so I've done the range-diff against v1.

This makes the recent fetch-pack work use the fsck_msg_id API to
distinguish messages, and has other various cleanups and improvements
to make the fsck API easier to use in the future.
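
As a rough, self-contained illustration of the pattern the series
moves towards (one macro list that generates the message-id enum, and
an error callback that receives the id so it can compare enums rather
than message text), consider the sketch below.  All DEMO_ and demo_
names are hypothetical, and the two entries and their severities are
only examples; the actual definitions are FOREACH_FSCK_MSG_ID and
friends in fsck.h.

```c
#include <assert.h>

enum demo_msg_type { DEMO_ERROR = 1, DEMO_WARN };

/* One list of (id, default severity) pairs drives everything below. */
#define FOREACH_DEMO_MSG_ID(FUNC) \
	FUNC(GITMODULES_MISSING, ERROR) \
	FUNC(ZERO_PADDED_FILEMODE, WARN)

/* Expand the list into enum members. */
#define MSG_ID(id, severity) DEMO_MSG_##id,
enum demo_msg_id {
	FOREACH_DEMO_MSG_ID(MSG_ID)
	DEMO_MSG_MAX
};
#undef MSG_ID

/* Expand the same list into a table of default severities. */
#define MSG_TYPE(id, severity) DEMO_##severity,
static const enum demo_msg_type default_type[] = {
	FOREACH_DEMO_MSG_ID(MSG_TYPE)
};
#undef MSG_TYPE

static enum demo_msg_type default_msg_type(enum demo_msg_id id)
{
	assert(id >= 0 && id < DEMO_MSG_MAX);
	return default_type[id];
}

/* The callback gets the id in addition to the severity and the text. */
typedef int (*demo_error_fn)(enum demo_msg_type type,
			     enum demo_msg_id id, const char *message);

static int report_dangling_gitmodules(enum demo_msg_type type,
				      enum demo_msg_id id,
				      const char *message)
{
	(void)message;
	/* Handled specially, e.g. recorded so a later pass can check it. */
	if (id == DEMO_MSG_GITMODULES_MISSING)
		return 0;
	return type == DEMO_ERROR;
}
```

A caller that wants special handling for one message can then branch
on the enum value instead of parsing the formatted message.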

There's an easy merge conflict here with other in-flight changes to
fsck. I figured it was better to send this now than wait for those to
land.

1. https://lore.kernel.org/git/20210217194246.25342-1-avarab@gmail.com/
2. https://lore.kernel.org/git/20210218105840.11989-1-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (22):
  fsck.h: update FSCK_OPTIONS_* for object_name
  fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
  fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT}
  fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro
  fsck.h: indent arguments to of fsck_set_msg_type
  fsck.h: use "enum object_type" instead of "int"
  fsck.c: rename variables in fsck_set_msg_type() for less confusion
  fsck.c: move definition of msg_id into append_msg_id()
  fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
  fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  fsck.h: re-order and re-assign "enum fsck_msg_type"
  fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  fsck.c: undefine temporary STR macro after use
  fsck.c: give "FOREACH_MSG_ID" a more specific name
  fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
  fsck.c: pass along the fsck_msg_id in the fsck_error callback
  fsck.c: add an fsck_set_msg_type() API that takes enums
  fsck.c: move gitmodules_{found,done} into fsck_options
  fetch-pack: don't needlessly copy fsck_options
  fetch-pack: use file-scope static struct for fsck_options
  fetch-pack: use new fsck API to printing dangling submodules

 Makefile                 |   1 +
 builtin/fsck.c           |   7 +-
 builtin/index-pack.c     |  30 ++-----
 builtin/mktag.c          |   7 +-
 builtin/unpack-objects.c |   3 +-
 fetch-pack.c             |   6 +-
 fsck-cb.c                |  16 ++++
 fsck.c                   | 175 ++++++++++++---------------------------
 fsck.h                   | 132 ++++++++++++++++++++++++++---
 9 files changed, 211 insertions(+), 166 deletions(-)
 create mode 100644 fsck-cb.c

Range-diff:
13:  8de91fac068 =  1:  9d809466bd1 fsck.h: update FSCK_OPTIONS_* for object_name
 -:  ----------- >  2:  33e8b6d6545 fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
 -:  ----------- >  3:  c23f7ce9e4a fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT}
 -:  ----------- >  4:  5dde68df6c3 fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro
 1:  88b347b74ed =  5:  7ae35a6e9d2 fsck.h: indent arguments to of fsck_set_msg_type
 2:  1a60d65d2ca !  6:  dfb5f754b37 fsck.h: use use "enum object_type" instead of "int"
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    fsck.h: use use "enum object_type" instead of "int"
    +    fsck.h: use "enum object_type" instead of "int"
     
         Change the fsck_walk_func to use an "enum object_type" instead of an
         "int" type. The types are compatible, and ever since this was added in
 3:  24761f269b7 !  7:  fd58ec73c6b fsck.c: rename variables in fsck_set_msg_type() for less confusion
    @@ Commit message
         It was needlessly confusing that it took a "msg_type" argument, but
         then later declared another "msg_type" of a different type.
     
    -    Let's rename that to "tmp", and rename "id" to "msg_id" and "msg_id"
    -    to "msg_id_str" etc. This will make a follow-up change smaller.
    +    Let's rename that to "severity", and rename "id" to "msg_id" and
    +    "msg_id" to "msg_id_str" etc. This will make a follow-up change
    +    smaller.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ fsck.c: int is_valid_msg_type(const char *msg_id, const char *msg_type)
      		int i;
     -		int *msg_type;
     -		ALLOC_ARRAY(msg_type, FSCK_MSG_MAX);
    -+		int *tmp;
    -+		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
    ++		int *severity;
    ++		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
      		for (i = 0; i < FSCK_MSG_MAX; i++)
     -			msg_type[i] = fsck_msg_type(i, options);
     -		options->msg_type = msg_type;
    -+			tmp[i] = fsck_msg_type(i, options);
    -+		options->msg_type = tmp;
    ++			severity[i] = fsck_msg_type(i, options);
    ++		options->msg_type = severity;
      	}
      
     -	options->msg_type[id] = type;
 4:  fb4c66f9305 =  8:  48cb4d3bb70 fsck.c: move definition of msg_id into append_msg_id()
 5:  a129dbd9964 !  9:  2c80ad32038 fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
    @@ Commit message
         "msg_id". This change is relatively small, and is worth the churn for
         a later change where we have different id's in the "report" function.
     
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
    +
      ## fsck.c ##
     @@ fsck.c: void fsck_set_msg_types(struct fsck_options *options, const char *values)
      	free(to_free);
 -:  ----------- > 10:  92dfbdfb624 fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
 6:  d9bee41072e ! 11:  c1c476af69b fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
    @@ Commit message
          - f27d05b1704 (fsck: allow upgrading fsck warnings to errors,
            2015-06-22)
     
    +    The reason these were defined in two different places is because we
    +    use FSCK_{IGNORE,INFO,FATAL} only in fsck.c, but FSCK_{ERROR,WARN} are
    +    used by external callbacks.
    +
    +    Untangling that would take some more work, since we expose the new
    +    "enum fsck_msg_type" to both. Similar to "enum object_type" it's not
    +    worth structuring the API in such a way that only those who need
    +    FSCK_{ERROR,WARN} pass around a different type.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/fsck.c ##
    @@ builtin/fsck.c: static int objerror(struct object *obj, const char *err)
      	switch (msg_type) {
      	case FSCK_WARN:
     
    + ## builtin/index-pack.c ##
    +@@ builtin/index-pack.c: static void show_pack_info(int stat_only)
    + static int print_dangling_gitmodules(struct fsck_options *o,
    + 				     const struct object_id *oid,
    + 				     enum object_type object_type,
    +-				     int msg_type, const char *message)
    ++				     enum fsck_msg_type msg_type,
    ++				     const char *message)
    + {
    + 	/*
    + 	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
    +
      ## builtin/mktag.c ##
     @@ builtin/mktag.c: static int mktag_config(const char *var, const char *value, void *cb)
      static int mktag_fsck_error_func(struct fsck_options *o,
    @@ fsck.c: void list_config_fsck_msg_ids(struct string_list *list, const char *pref
     +static enum fsck_msg_type fsck_msg_type(enum fsck_msg_id msg_id,
      	struct fsck_options *options)
      {
    --	int msg_type;
    -+	enum fsck_msg_type msg_type;
    - 
      	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
      
    + 	if (!options->msg_type) {
    +-		int msg_type = msg_id_info[msg_id].msg_type;
    ++		enum fsck_msg_type msg_type = msg_id_info[msg_id].msg_type;
    + 
    + 		if (options->strict && msg_type == FSCK_WARN)
    + 			msg_type = FSCK_ERROR;
     @@ fsck.c: static int fsck_msg_type(enum fsck_msg_id msg_id,
    - 	return msg_type;
    + 	return options->msg_type[msg_id];
      }
      
     -static int parse_msg_type(const char *str)
    @@ fsck.c: void fsck_set_msg_type(struct fsck_options *options,
      
      	if (!options->msg_type) {
      		int i;
    --		int *tmp;
    -+		enum fsck_msg_type *tmp;
    - 		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
    +-		int *severity;
    ++		enum fsck_msg_type *severity;
    + 		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
      		for (i = 0; i < FSCK_MSG_MAX; i++)
    - 			tmp[i] = fsck_msg_type(i, options);
    + 			severity[i] = fsck_msg_type(i, options);
     @@ fsck.c: static int report(struct fsck_options *options,
      {
      	va_list ap;
    @@ fsck.h
     -#define FSCK_ERROR 1
     -#define FSCK_WARN 2
     -#define FSCK_IGNORE 3
    --
     +enum fsck_msg_type {
    -+	FSCK_INFO = -2,
    ++	FSCK_INFO  = -2,
     +	FSCK_FATAL = -1,
     +	FSCK_ERROR = 1,
     +	FSCK_WARN,
     +	FSCK_IGNORE
     +};
    + 
      struct fsck_options;
      struct object;
    - 
     @@ fsck.h: typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
      /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
      typedef int (*fsck_error)(struct fsck_options *o,
 -:  ----------- > 12:  d55587719a5 fsck.h: re-order and re-assign "enum fsck_msg_type"
 7:  423568026c3 = 13:  32828d1c78c fsck.c: call parse_msg_type() early in fsck_set_msg_type()
 8:  cb43e832738 = 14:  5c62066235c fsck.c: undefine temporary STR macro after use
 9:  2cd14cb4e2a = 15:  f8e50fbf7d3 fsck.c: give "FOREACH_MSG_ID" a more specific name
10:  1ada154ef23 ! 16:  cd74dee8769 fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
    @@ fsck.c
      ## fsck.h ##
     @@ fsck.h: enum fsck_msg_type {
      	FSCK_WARN,
    - 	FSCK_IGNORE
      };
    -+
    + 
     +#define FOREACH_FSCK_MSG_ID(FUNC) \
     +	/* fatal errors */ \
     +	FUNC(NUL_IN_HEADER, FATAL) \
11:  c4179445f22 ! 17:  234e287d081 fsck.c: pass along the fsck_msg_id in the fsck_error callback
    @@ builtin/fsck.c: static int objerror(struct object *obj, const char *err)
      	switch (msg_type) {
      	case FSCK_WARN:
     
    + ## builtin/index-pack.c ##
    +@@ builtin/index-pack.c: static int print_dangling_gitmodules(struct fsck_options *o,
    + 				     const struct object_id *oid,
    + 				     enum object_type object_type,
    + 				     enum fsck_msg_type msg_type,
    ++				     enum fsck_msg_id msg_id,
    + 				     const char *message)
    + {
    + 	/*
    +@@ builtin/index-pack.c: static int print_dangling_gitmodules(struct fsck_options *o,
    + 		printf("%s\n", oid_to_hex(oid));
    + 		return 0;
    + 	}
    +-	return fsck_error_function(o, oid, object_type, msg_type, message);
    ++	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
    + }
    + 
    + int cmd_index_pack(int argc, const char **argv, const char *prefix)
    +
      ## builtin/mktag.c ##
     @@ builtin/mktag.c: static int mktag_fsck_error_func(struct fsck_options *o,
      				 const struct object_id *oid,
12:  c1fc724f0e8 ! 18:  8049dc07391 fsck.c: add an fsck_set_msg_type() API that takes enums
    @@ fsck.c: int is_valid_msg_type(const char *msg_id, const char *msg_type)
     +{
     +	if (!options->msg_type) {
     +		int i;
    -+		enum fsck_msg_type *tmp;
    -+		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
    ++		enum fsck_msg_type *severity;
    ++		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
     +		for (i = 0; i < FSCK_MSG_MAX; i++)
    -+			tmp[i] = fsck_msg_type(i, options);
    -+		options->msg_type = tmp;
    ++			severity[i] = fsck_msg_type(i, options);
    ++		options->msg_type = severity;
     +	}
     +
     +	options->msg_type[msg_id] = msg_type;
    @@ fsck.c: void fsck_set_msg_type(struct fsck_options *options,
      
     -	if (!options->msg_type) {
     -		int i;
    --		enum fsck_msg_type *tmp;
    --		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
    +-		enum fsck_msg_type *severity;
    +-		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
     -		for (i = 0; i < FSCK_MSG_MAX; i++)
    --			tmp[i] = fsck_msg_type(i, options);
    --		options->msg_type = tmp;
    +-			severity[i] = fsck_msg_type(i, options);
    +-		options->msg_type = severity;
     -	}
     -
     -	options->msg_type[msg_id] = msg_type;
14:  29ff97856ff ! 19:  4224a29d15c fsck.c: move gitmodules_{found,done} into fsck_options
    @@ Commit message
         fsck_options struct. It makes sense to keep all the context in the
         same place.
     
    +    This requires changing the recently added register_found_gitmodules()
    +    function added in 5476e1efde (fetch-pack: print and use dangling
    +    .gitmodules, 2021-02-22) to take fsck_options. That function will be
    +    removed in a subsequent commit, but as it'll require the new
    +    gitmodules_found attribute of "fsck_options" we need this intermediate
    +    step first.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    + ## fetch-pack.c ##
    +@@ fetch-pack.c: static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
    + 
    + 	oidset_iter_init(gitmodules_oids, &iter);
    + 	while ((oid = oidset_iter_next(&iter)))
    +-		register_found_gitmodules(oid);
    ++		register_found_gitmodules(&fo, oid);
    + 	if (fsck_finish(&fo))
    + 		die("fsck failed");
    + }
    +
      ## fsck.c ##
     @@
      #include "credential.h"
    @@ fsck.c: static int fsck_blob(const struct object_id *oid, const char *buf,
      
      	if (object_on_skiplist(options, oid))
      		return 0;
    +@@ fsck.c: int fsck_error_function(struct fsck_options *o,
    + 	return 1;
    + }
    + 
    +-void register_found_gitmodules(const struct object_id *oid)
    ++void register_found_gitmodules(struct fsck_options *options, const struct object_id *oid)
    + {
    +-	oidset_insert(&gitmodules_found, oid);
    ++	oidset_insert(&options->gitmodules_found, oid);
    + }
    + 
    + int fsck_finish(struct fsck_options *options)
     @@ fsck.c: int fsck_finish(struct fsck_options *options)
      	struct oidset_iter iter;
      	const struct object_id *oid;
    @@ fsck.h: struct fsck_options {
      	kh_oid_map_t *object_names;
      };
      
    --#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
    --#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
    -+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, OIDSET_INIT, OIDSET_INIT, NULL }
    -+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, OIDSET_INIT, OIDSET_INIT, NULL }
    +@@ fsck.h: struct fsck_options {
    + 	.walk = NULL, \
    + 	.msg_type = NULL, \
    + 	.skiplist = OIDSET_INIT, \
    ++	.gitmodules_found = OIDSET_INIT, \
    ++	.gitmodules_done = OIDSET_INIT, \
    + 	.object_names = NULL,
    + #define FSCK_OPTIONS_COMMON_ERROR_FUNC \
    + 	FSCK_OPTIONS_COMMON \
    +@@ fsck.h: int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
    + int fsck_object(struct object *obj, void *data, unsigned long size,
    + 	struct fsck_options *options);
    + 
    +-void register_found_gitmodules(const struct object_id *oid);
    ++void register_found_gitmodules(struct fsck_options *options,
    ++			       const struct object_id *oid);
      
    - /* descend in all linked child objects
    -  * the return value is:
    + /*
    +  * fsck a tag, and pass info about it back to the caller. This is
 -:  ----------- > 20:  40b13468129 fetch-pack: don't needlessly copy fsck_options
 -:  ----------- > 21:  8e418abfbd7 fetch-pack: use file-scope static struct for fsck_options
 -:  ----------- > 22:  113de190f7d fetch-pack: use new fsck API to printing dangling submodules
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 01/22] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
  2021-02-18 22:19               ` Junio C Hamano
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 02/22] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
                                 ` (20 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Add the object_names member to the initialization macros. This was
omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
go, 2016-07-17) when the field was added.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fsck.h b/fsck.h
index 733378f1260..2274843ba0c 100644
--- a/fsck.h
+++ b/fsck.h
@@ -43,8 +43,8 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 02/22] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (2 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 01/22] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 03/22] fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
                                 ` (19 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/fsck.h b/fsck.h
index 2274843ba0c..40f3cb3f645 100644
--- a/fsck.h
+++ b/fsck.h
@@ -43,8 +43,22 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
+#define FSCK_OPTIONS_DEFAULT { \
+	.walk = NULL, \
+	.error_func = fsck_error_function, \
+	.strict = 0, \
+	.msg_type = NULL, \
+	.skiplist = OIDSET_INIT, \
+	.object_names = NULL, \
+}
+#define FSCK_OPTIONS_STRICT { \
+	.walk = NULL, \
+	.error_func = fsck_error_function, \
+	.strict = 1, \
+	.msg_type = NULL, \
+	.skiplist = OIDSET_INIT, \
+	.object_names = NULL, \
+}
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.31.0.rc0.126.g04f22c5b82
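
For readers less familiar with the syntax this patch switches to, here
is a minimal sketch contrasting positional and designated initializers.
The struct, the callback and the DEMO_* macros are hypothetical
stand-ins, not git's definitions; only the initializer style mirrors
the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct fsck_options; not git's definition. */
struct demo_options {
	int (*walk)(void);
	int (*error_func)(int severity);
	unsigned strict:1;
	int *msg_type;
	void *object_names;
};

/* Hypothetical callback that simply echoes the severity it was given. */
static int demo_error(int severity)
{
	return severity;
}

/* Positional form: every value must appear in member order. */
#define DEMO_OPTIONS_POSITIONAL { NULL, demo_error, 1, NULL, NULL }

/*
 * Designated form: members are named, may appear in any order, and
 * any member left out is zero-initialized.
 */
#define DEMO_OPTIONS_DESIGNATED { \
	.strict = 1, \
	.error_func = demo_error, \
}

struct demo_options demo_positional(void)
{
	struct demo_options o = DEMO_OPTIONS_POSITIONAL;
	return o;
}

struct demo_options demo_designated(void)
{
	struct demo_options o = DEMO_OPTIONS_DESIGNATED;
	return o;
}
```

Both helpers should yield equivalent structs. The designated form keeps
working when a member is added to the struct, which is presumably why
the earlier omission of object_names could go unnoticed in the
positional macros.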



* [PATCH v3 03/22] fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT}
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (3 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 02/22] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 04/22] fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro Ævar Arnfjörð Bjarmason
                                 ` (18 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Use a temporary macro to define what FSCK_OPTIONS_{DEFAULT,STRICT}
have in common, and define the two in terms of that macro.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/fsck.h b/fsck.h
index 40f3cb3f645..ea3a907ec3b 100644
--- a/fsck.h
+++ b/fsck.h
@@ -43,22 +43,14 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { \
+#define FSCK_OPTIONS_COMMON \
 	.walk = NULL, \
 	.error_func = fsck_error_function, \
-	.strict = 0, \
 	.msg_type = NULL, \
 	.skiplist = OIDSET_INIT, \
-	.object_names = NULL, \
-}
-#define FSCK_OPTIONS_STRICT { \
-	.walk = NULL, \
-	.error_func = fsck_error_function, \
-	.strict = 1, \
-	.msg_type = NULL, \
-	.skiplist = OIDSET_INIT, \
-	.object_names = NULL, \
-}
+	.object_names = NULL,
+#define FSCK_OPTIONS_DEFAULT	{ .strict = 0, FSCK_OPTIONS_COMMON }
+#define FSCK_OPTIONS_STRICT	{ .strict = 1, FSCK_OPTIONS_COMMON }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 04/22] fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (4 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 03/22] fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 05/22] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
                                 ` (17 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Split the default error function out of FSCK_OPTIONS_COMMON into a new
FSCK_OPTIONS_COMMON_ERROR_FUNC macro. Callers that would like to use
FSCK_OPTIONS_COMMON in their own initialization, but supply their own
error function, can then do so.

Nothing is being changed to use this yet, but in some subsequent
commits we'll make use of this macro.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fsck.h b/fsck.h
index ea3a907ec3b..dc35924cbf5 100644
--- a/fsck.h
+++ b/fsck.h
@@ -45,12 +45,15 @@ struct fsck_options {
 
 #define FSCK_OPTIONS_COMMON \
 	.walk = NULL, \
-	.error_func = fsck_error_function, \
 	.msg_type = NULL, \
 	.skiplist = OIDSET_INIT, \
 	.object_names = NULL,
-#define FSCK_OPTIONS_DEFAULT	{ .strict = 0, FSCK_OPTIONS_COMMON }
-#define FSCK_OPTIONS_STRICT	{ .strict = 1, FSCK_OPTIONS_COMMON }
+#define FSCK_OPTIONS_COMMON_ERROR_FUNC \
+	FSCK_OPTIONS_COMMON \
+	.error_func = fsck_error_function
+
+#define FSCK_OPTIONS_DEFAULT	{ .strict = 0, FSCK_OPTIONS_COMMON_ERROR_FUNC }
+#define FSCK_OPTIONS_STRICT	{ .strict = 1, FSCK_OPTIONS_COMMON_ERROR_FUNC }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.31.0.rc0.126.g04f22c5b82
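
As an illustration of how the split macros might be used, here is a
self-contained sketch. Everything prefixed DEMO_ or demo_, as well as
the two callbacks, is hypothetical; only the layering (a COMMON macro
ending in a comma, a COMMON_ERROR_FUNC macro adding the default
callback) follows the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct fsck_options. */
struct demo_options {
	int (*error_func)(int severity);
	unsigned strict:1;
	int *msg_type;
};

/* Hypothetical default callback: report the severity back. */
static int default_error(int severity)
{
	return severity;
}

/* Hypothetical caller-supplied callback: swallow everything. */
static int quiet_error(int severity)
{
	(void)severity;
	return 0;
}

/* Shared members; the trailing comma lets more initializers follow. */
#define DEMO_COMMON \
	.msg_type = NULL,
/* Shared members plus the default callback. */
#define DEMO_COMMON_ERROR_FUNC \
	DEMO_COMMON \
	.error_func = default_error

#define DEMO_DEFAULT	{ .strict = 0, DEMO_COMMON_ERROR_FUNC }
#define DEMO_STRICT	{ .strict = 1, DEMO_COMMON_ERROR_FUNC }
/* A caller with its own callback builds on DEMO_COMMON directly. */
#define DEMO_QUIET	{ .strict = 1, DEMO_COMMON .error_func = quiet_error }

struct demo_options demo_make_default(void)
{
	struct demo_options o = DEMO_DEFAULT;
	return o;
}

struct demo_options demo_make_strict(void)
{
	struct demo_options o = DEMO_STRICT;
	return o;
}

struct demo_options demo_make_quiet(void)
{
	struct demo_options o = DEMO_QUIET;
	return o;
}
```

The point of the layering is that no initializer list names
.error_func twice: the default callback lives only in the
*_ERROR_FUNC variant.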



* [PATCH v3 05/22] fsck.h: indent arguments to of fsck_set_msg_type
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (5 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 04/22] fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 06/22] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
                                 ` (16 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fsck.h b/fsck.h
index dc35924cbf5..5e488cef6b3 100644
--- a/fsck.h
+++ b/fsck.h
@@ -11,7 +11,7 @@ struct fsck_options;
 struct object;
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type);
+		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
 int is_valid_msg_type(const char *msg_id, const char *msg_type);
 
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 06/22] fsck.h: use "enum object_type" instead of "int"
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (6 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 05/22] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 07/22] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
                                 ` (15 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the fsck_walk_func to use an "enum object_type" instead of an
"int" type. The types are compatible, and ever since this was added in
355885d5315 (add generic, type aware object chain walker, 2008-02-25)
we've used entries from object_type (OBJ_BLOB etc.).

So this doesn't really change anything as far as the generated code is
concerned; it just gives the compiler more information and makes the
code easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c           | 3 ++-
 builtin/index-pack.c     | 3 ++-
 builtin/unpack-objects.c | 3 ++-
 fsck.h                   | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c70..68f0329e69e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -197,7 +197,8 @@ static int traverse_reachable(void)
 	return !!result;
 }
 
-static int mark_used(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_used(struct object *obj, enum object_type object_type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return 1;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index bad57488079..69f24fe9f76 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -212,7 +212,8 @@ static void cleanup_thread(void)
 	free(thread_data);
 }
 
-static int mark_link(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_link(struct object *obj, enum object_type type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return -1;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index dd4a75e030d..ca54fd16688 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -187,7 +187,8 @@ static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
  * that have reachability requirements and calls this function.
  * Verify its reachability and validity recursively and write it out.
  */
-static int check_object(struct object *obj, int type, void *data, struct fsck_options *options)
+static int check_object(struct object *obj, enum object_type type,
+			void *data, struct fsck_options *options)
 {
 	struct obj_buffer *obj_buf;
 
diff --git a/fsck.h b/fsck.h
index 5e488cef6b3..f67edd8f1f9 100644
--- a/fsck.h
+++ b/fsck.h
@@ -23,7 +23,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type);
  *     <0	error signaled and abort
  *     >0	error signaled and do not abort
  */
-typedef int (*fsck_walk_func)(struct object *obj, int type, void *data, struct fsck_options *options);
+typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
+			      void *data, struct fsck_options *options);
 
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 07/22] fsck.c: rename variables in fsck_set_msg_type() for less confusion
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (7 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 06/22] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 08/22] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
                                 ` (14 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename variables in a function added in 0282f4dced0 (fsck: offer a
function to demote fsck errors to warnings, 2015-06-22).

It was needlessly confusing that it took a "msg_type" argument, but
then later declared another "msg_type" of a different type.

Let's rename that to "severity", and rename "id" to "msg_id" and
"msg_id" to "msg_id_str" etc. This will make a follow-up change
smaller.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fsck.c b/fsck.c
index e3030f3b358..0a9ac9ca070 100644
--- a/fsck.c
+++ b/fsck.c
@@ -203,27 +203,27 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 }
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type)
+		const char *msg_id_str, const char *msg_type_str)
 {
-	int id = parse_msg_id(msg_id), type;
+	int msg_id = parse_msg_id(msg_id_str), msg_type;
 
-	if (id < 0)
-		die("Unhandled message id: %s", msg_id);
-	type = parse_msg_type(msg_type);
+	if (msg_id < 0)
+		die("Unhandled message id: %s", msg_id_str);
+	msg_type = parse_msg_type(msg_type_str);
 
-	if (type != FSCK_ERROR && msg_id_info[id].msg_type == FSCK_FATAL)
-		die("Cannot demote %s to %s", msg_id, msg_type);
+	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
+		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
 	if (!options->msg_type) {
 		int i;
-		int *msg_type;
-		ALLOC_ARRAY(msg_type, FSCK_MSG_MAX);
+		int *severity;
+		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
-			msg_type[i] = fsck_msg_type(i, options);
-		options->msg_type = msg_type;
+			severity[i] = fsck_msg_type(i, options);
+		options->msg_type = severity;
 	}
 
-	options->msg_type[id] = type;
+	options->msg_type[msg_id] = msg_type;
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 08/22] fsck.c: move definition of msg_id into append_msg_id()
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (8 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 07/22] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 09/22] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
                                 ` (13 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Refactor code added in 71ab8fa840f (fsck: report the ID of the
error/warning, 2015-06-22) to resolve the msg_id to a string in the
function that wants it, instead of doing it in report().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index 0a9ac9ca070..b977493f57a 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,8 +264,9 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, const char *msg_id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
 {
+	const char *msg_id = msg_id_info[id].id_string;
 	for (;;) {
 		char c = *(msg_id)++;
 
@@ -308,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, msg_id_info[id].id_string);
+	append_msg_id(&sb, id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.31.0.rc0.126.g04f22c5b82
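
To make the conversion done by append_msg_id() easier to follow, here
is a standalone sketch of the same loop writing into a plain char
buffer instead of a strbuf. The function name, the buffer handling and
the silent treatment of a trailing underscore are assumptions made for
this example (the real code asserts instead), and the ": " suffix the
real function appends afterwards is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: write the camelCase form of an UPPER_SNAKE
 * identifier into out (capacity outsz) and return out. Output is
 * truncated if the buffer is too small.
 */
char *demo_camelcase(const char *id, char *out, size_t outsz)
{
	size_t n = 0;

	assert(outsz > 0);
	while (*id && n + 1 < outsz) {
		char c = *id++;

		if (c != '_') {
			/* Ordinary characters are lowercased. */
			out[n++] = (char)tolower((unsigned char)c);
		} else if (*id) {
			/* Drop the underscore, keep the next character as-is. */
			out[n++] = *id++;
		}
	}
	out[n] = '\0';
	return out;
}
```

With a sufficiently large buffer, demo_camelcase("NUL_IN_HEADER", ...)
produces "nulInHeader", which is the form the message IDs take in
fsck's output.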



* [PATCH v3 09/22] fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (9 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 08/22] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 10/22] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
                                 ` (12 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename the remaining variables of type fsck_msg_id from "id" to
"msg_id". This change is relatively small, and is worth the churn for
a later change where we have different IDs in the "report" function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fsck.c b/fsck.c
index b977493f57a..6b72ddaa51d 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,19 +264,19 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id msg_id)
 {
-	const char *msg_id = msg_id_info[id].id_string;
+	const char *msg_id_str = msg_id_info[msg_id].id_string;
 	for (;;) {
-		char c = *(msg_id)++;
+		char c = *(msg_id_str)++;
 
 		if (!c)
 			break;
 		if (c != '_')
 			strbuf_addch(sb, tolower(c));
 		else {
-			assert(*msg_id);
-			strbuf_addch(sb, *(msg_id)++);
+			assert(*msg_id_str);
+			strbuf_addch(sb, *(msg_id_str)++);
 		}
 	}
 
@@ -292,11 +292,11 @@ static int object_on_skiplist(struct fsck_options *opts,
 __attribute__((format (printf, 5, 6)))
 static int report(struct fsck_options *options,
 		  const struct object_id *oid, enum object_type object_type,
-		  enum fsck_msg_id id, const char *fmt, ...)
+		  enum fsck_msg_id msg_id, const char *fmt, ...)
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(id, options), result;
+	int msg_type = fsck_msg_type(msg_id, options), result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -309,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, id);
+	append_msg_id(&sb, msg_id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 10/22] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (10 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 09/22] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 11/22] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
                                 ` (11 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Refactor "if options->msg_type" and other code added in
0282f4dced0 (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) to reduce the scope of the "int msg_type" variable.

This is in preparation for changing its type in a subsequent commit;
only using it in the "!options->msg_type" scope makes that change
smaller.

This also brings the code in line with the fsck_set_msg_type()
function (also added in 0282f4dced0), which does a similar check for
"!options->msg_type". Another minor benefit is getting rid of the
style violation of not having braces for the body of the "if".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fsck.c b/fsck.c
index 6b72ddaa51d..0988ab65792 100644
--- a/fsck.c
+++ b/fsck.c
@@ -167,19 +167,17 @@ void list_config_fsck_msg_ids(struct string_list *list, const char *prefix)
 static int fsck_msg_type(enum fsck_msg_id msg_id,
 	struct fsck_options *options)
 {
-	int msg_type;
-
 	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
 
-	if (options->msg_type)
-		msg_type = options->msg_type[msg_id];
-	else {
-		msg_type = msg_id_info[msg_id].msg_type;
+	if (!options->msg_type) {
+		int msg_type = msg_id_info[msg_id].msg_type;
+
 		if (options->strict && msg_type == FSCK_WARN)
 			msg_type = FSCK_ERROR;
+		return msg_type;
 	}
 
-	return msg_type;
+	return options->msg_type[msg_id];
 }
 
 static int parse_msg_type(const char *str)
-- 
2.31.0.rc0.126.g04f22c5b82
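
The control flow after this refactoring can be sketched in isolation
as follows. The DEMO_* constants, the two message IDs and the defaults
table are invented for the example; only the shape of the lookup
(built-in default with strict promotion when no override table exists,
otherwise the table entry) follows the patch.

```c
#include <stddef.h>

/* Hypothetical severities and message IDs, not git's. */
enum { DEMO_ERROR = 1, DEMO_WARN, DEMO_IGNORE };
enum { DEMO_MSG_SERIOUS, DEMO_MSG_MINOR, DEMO_MSG_MAX };

/* Built-in default severity for each message ID. */
static const int demo_defaults[DEMO_MSG_MAX] = {
	[DEMO_MSG_SERIOUS] = DEMO_ERROR,
	[DEMO_MSG_MINOR] = DEMO_WARN,
};

struct demo_options {
	unsigned strict:1;
	int *msg_type; /* NULL until some severity has been overridden */
};

int demo_msg_type(int msg_id, const struct demo_options *options)
{
	if (!options->msg_type) {
		/* No overrides: use the default, promoting WARN if strict. */
		int msg_type = demo_defaults[msg_id];

		if (options->strict && msg_type == DEMO_WARN)
			msg_type = DEMO_ERROR;
		return msg_type;
	}

	/* An override table exists and is authoritative. */
	return options->msg_type[msg_id];
}
```

Note that "strict" is only consulted on the default path. In the
patches above, fsck_set_msg_type() fills a newly allocated table by
calling fsck_msg_type() for every ID, so the strict promotion is baked
into the table at that point.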



* [PATCH v3 11/22] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (11 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 10/22] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 12/22] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
                                 ` (10 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
fsck_msg_type enum.

These defines were originally introduced in:

 - ba002f3b28a (builtin-fsck: move common object checking code to
   fsck.c, 2008-02-25)
 - f50c4407305 (fsck: disallow demoting grave fsck errors to warnings,
   2015-06-22)
 - efaba7cc77f (fsck: optionally ignore specific fsck issues
   completely, 2015-06-22)
 - f27d05b1704 (fsck: allow upgrading fsck warnings to errors,
   2015-06-22)

These were defined in two different places because we use
FSCK_{IGNORE,INFO,FATAL} only in fsck.c, but FSCK_{ERROR,WARN} are
used by external callbacks.

Untangling that would take some more work, since we expose the new
"enum fsck_msg_type" to both. Similar to "enum object_type" it's not
worth structuring the API in such a way that only those who need
FSCK_{ERROR,WARN} pass around a different type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c       |  2 +-
 builtin/index-pack.c |  3 ++-
 builtin/mktag.c      |  3 ++-
 fsck.c               | 21 ++++++++++-----------
 fsck.h               | 16 ++++++++++------
 5 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 68f0329e69e..d6d745dc702 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -89,7 +89,7 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   int msg_type, const char *message)
+			   enum fsck_msg_type msg_type, const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 69f24fe9f76..56b8efaa89b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1716,7 +1716,8 @@ static void show_pack_info(int stat_only)
 static int print_dangling_gitmodules(struct fsck_options *o,
 				     const struct object_id *oid,
 				     enum object_type object_type,
-				     int msg_type, const char *message)
+				     enum fsck_msg_type msg_type,
+				     const char *message)
 {
 	/*
 	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 41a399a69e4..1834394a9b6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -22,7 +22,8 @@ static int mktag_config(const char *var, const char *value, void *cb)
 static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
-				 int msg_type, const char *message)
+				 enum fsck_msg_type msg_type,
+				 const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/fsck.c b/fsck.c
index 0988ab65792..fb7d071bbf9 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,9 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FSCK_FATAL -1
-#define FSCK_INFO -2
-
 #define FOREACH_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
@@ -97,7 +94,7 @@ static struct {
 	const char *id_string;
 	const char *downcased;
 	const char *camelcased;
-	int msg_type;
+	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
 	FOREACH_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
@@ -164,13 +161,13 @@ void list_config_fsck_msg_ids(struct string_list *list, const char *prefix)
 		list_config_item(list, prefix, msg_id_info[i].camelcased);
 }
 
-static int fsck_msg_type(enum fsck_msg_id msg_id,
+static enum fsck_msg_type fsck_msg_type(enum fsck_msg_id msg_id,
 	struct fsck_options *options)
 {
 	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
 
 	if (!options->msg_type) {
-		int msg_type = msg_id_info[msg_id].msg_type;
+		enum fsck_msg_type msg_type = msg_id_info[msg_id].msg_type;
 
 		if (options->strict && msg_type == FSCK_WARN)
 			msg_type = FSCK_ERROR;
@@ -180,7 +177,7 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 	return options->msg_type[msg_id];
 }
 
-static int parse_msg_type(const char *str)
+static enum fsck_msg_type parse_msg_type(const char *str)
 {
 	if (!strcmp(str, "error"))
 		return FSCK_ERROR;
@@ -203,7 +200,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
-	int msg_id = parse_msg_id(msg_id_str), msg_type;
+	int msg_id = parse_msg_id(msg_id_str);
+	enum fsck_msg_type msg_type;
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
@@ -214,7 +212,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 
 	if (!options->msg_type) {
 		int i;
-		int *severity;
+		enum fsck_msg_type *severity;
 		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
 			severity[i] = fsck_msg_type(i, options);
@@ -294,7 +292,8 @@ static int report(struct fsck_options *options,
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(msg_id, options), result;
+	enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);
+	int result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -1265,7 +1264,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			int msg_type, const char *message)
+			enum fsck_msg_type msg_type, const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index f67edd8f1f9..2ecc15eee77 100644
--- a/fsck.h
+++ b/fsck.h
@@ -3,9 +3,13 @@
 
 #include "oidset.h"
 
-#define FSCK_ERROR 1
-#define FSCK_WARN 2
-#define FSCK_IGNORE 3
+enum fsck_msg_type {
+	FSCK_INFO  = -2,
+	FSCK_FATAL = -1,
+	FSCK_ERROR = 1,
+	FSCK_WARN,
+	FSCK_IGNORE
+};
 
 struct fsck_options;
 struct object;
@@ -29,17 +33,17 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  int msg_type, const char *message);
+			  enum fsck_msg_type msg_type, const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			int msg_type, const char *message);
+			enum fsck_msg_type msg_type, const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
 	fsck_error error_func;
 	unsigned strict:1;
-	int *msg_type;
+	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
 	kh_oid_map_t *object_names;
 };
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 12/22] fsck.h: re-order and re-assign "enum fsck_msg_type"
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (12 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 11/22] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 13/22] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
                                 ` (9 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the values in the "enum fsck_msg_type" from being manually
assigned to using default C enum values.

This means we end up with FSCK_IGNORE=0, which was previously
defined as "3".

I'm confident that nothing relies on these values, since we always
compare them explicitly. Let's not omit "0", so that nobody assumes
we're using these as a boolean somewhere.

This also allows us to re-structure the fields to mark which are
"private" vs. "public". See the preceding commit for a rationale for
not simply splitting these into two enums, namely that this is used
for both the private and public fsck API.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
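Not part of the commit message: a minimal sketch of what the default C
enum values work out to after the re-ordering, and of the "compare
explicitly" style the message relies on. The enum is copied from the
diff below; demo_is_reported() is a hypothetical helper made up for
illustration and is not in fsck.c.

```c
#include <assert.h>

/* Copied from the patch: no explicit values, so C counts up from 0. */
enum fsck_msg_type {
	/* for internal use only */
	FSCK_IGNORE,	/* 0 */
	FSCK_INFO,	/* 1 */
	FSCK_FATAL,	/* 2 */
	/* "public", fed to e.g. error_func callbacks */
	FSCK_ERROR,	/* 3 */
	FSCK_WARN	/* 4 */
};

/*
 * Hypothetical helper: compares against a named enumerator instead of
 * testing the value for truthiness, so it keeps working no matter
 * which number FSCK_IGNORE happens to get.
 */
static int demo_is_reported(enum fsck_msg_type t)
{
	assert((int)t >= FSCK_IGNORE && (int)t <= FSCK_WARN);
	return t != FSCK_IGNORE;
}
```

Code written as "if (msg_type)" would silently change meaning under
this re-ordering, which is why it matters that callers compare against
the enumerators by name.
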
 fsck.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fsck.h b/fsck.h
index 2ecc15eee77..fce9981a0cb 100644
--- a/fsck.h
+++ b/fsck.h
@@ -4,11 +4,13 @@
 #include "oidset.h"
 
 enum fsck_msg_type {
-	FSCK_INFO  = -2,
-	FSCK_FATAL = -1,
-	FSCK_ERROR = 1,
+	/* for internal use only */
+	FSCK_IGNORE,
+	FSCK_INFO,
+	FSCK_FATAL,
+	/* "public", fed to e.g. error_func callbacks */
+	FSCK_ERROR,
 	FSCK_WARN,
-	FSCK_IGNORE
 };
 
 struct fsck_options;
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 13/22] fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (13 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 12/22] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 14/22] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
                                 ` (8 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

There's no reason to defer calling parse_msg_type() until after we've
checked whether "msg_id < 0". This is not a hot codepath, and
parse_msg_type() itself may die on invalid input.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index fb7d071bbf9..2ccf1a2f0fd 100644
--- a/fsck.c
+++ b/fsck.c
@@ -201,11 +201,10 @@ void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
 	int msg_id = parse_msg_id(msg_id_str);
-	enum fsck_msg_type msg_type;
+	enum fsck_msg_type msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
-	msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 14/22] fsck.c: undefine temporary STR macro after use
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (14 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 13/22] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 15/22] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
                                 ` (7 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

In f417eed8cde (fsck: provide a function to parse fsck message IDs,
2015-06-22) the "STR" macro was introduced, but that short macro name
was not undefined after use as was done earlier in the same series for
the MSG_ID macro in c99ba492f1c (fsck: introduce identifiers for fsck
messages, 2015-06-22).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
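Not part of the commit message: a small sketch of why a short helper
macro such as STR is undefined once the table that needs it has been
built. The names demo_names, demo_name() and DEMO_STR_LEAKED are
assumptions for illustration only and do not exist in fsck.c.

```c
#include <assert.h>

/* Temporary stringification helper, only needed for the table below. */
#define STR(x) #x
static const char *const demo_names[] = {
	STR(BAD_DATE),
	STR(HAS_DOT),
};
#undef STR

/* After the #undef the short name is free again for later code. */
#ifdef STR
#define DEMO_STR_LEAKED 1
#else
#define DEMO_STR_LEAKED 0
#endif

/* Hypothetical accessor with a bounds check. */
static const char *demo_name(int i)
{
	assert(i >= 0 && i < 2);
	return demo_names[i];
}
```

Without the #undef, any later "#define STR" in the same translation
unit with a different body would be a redefinition, which compilers
diagnose.
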
 fsck.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fsck.c b/fsck.c
index 2ccf1a2f0fd..f4c924ed044 100644
--- a/fsck.c
+++ b/fsck.c
@@ -100,6 +100,7 @@ static struct {
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
+#undef STR
 
 static void prepare_msg_ids(void)
 {
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 15/22] fsck.c: give "FOREACH_MSG_ID" a more specific name
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (15 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 14/22] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 16/22] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
                                 ` (6 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename the FOREACH_MSG_ID macro to FOREACH_FSCK_MSG_ID in preparation
for moving it over to fsck.h. It's a good convention to name macros
in *.h files in such a way that they clearly won't clash with names
in other files.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fsck.c b/fsck.c
index f4c924ed044..6fbc56e9faa 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,7 +22,7 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_MSG_ID(FUNC) \
+#define FOREACH_FSCK_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
 	FUNC(UNTERMINATED_HEADER, FATAL) \
@@ -83,7 +83,7 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,
 enum fsck_msg_id {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	FSCK_MSG_MAX
 };
 #undef MSG_ID
@@ -96,7 +96,7 @@ static struct {
 	const char *camelcased;
 	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 16/22] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (16 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 15/22] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 17/22] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
                                 ` (5 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the FOREACH_FSCK_MSG_ID macro and the fsck_msg_id enum it helps
define from fsck.c to fsck.h. This is in preparation for having
non-static functions take the fsck_msg_id as an argument.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
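Not part of the commit message: a cut-down sketch of the "X-macro"
pattern that FOREACH_FSCK_MSG_ID implements, for readers who have not
met it before. One list is expanded twice, once into an enum and once
into a parallel info table. Everything prefixed "demo"/"DEMO" is made
up for illustration, and the lookup does a plain strcmp() rather than
whatever normalization the real parse_msg_id() performs.

```c
#include <assert.h>
#include <string.h>

enum demo_msg_type { DEMO_FATAL, DEMO_ERROR, DEMO_WARN };

/* The single source of truth: one FUNC(id, type) entry per message. */
#define FOREACH_DEMO_MSG_ID(FUNC) \
	/* fatal errors */ \
	FUNC(NUL_IN_HEADER, FATAL) \
	/* errors */ \
	FUNC(BAD_DATE, ERROR) \
	/* warnings */ \
	FUNC(HAS_DOT, WARN)

/* First expansion: the enum of message ids. */
#define MSG_ID(id, msg_type) DEMO_MSG_##id,
enum demo_msg_id {
	FOREACH_DEMO_MSG_ID(MSG_ID)
	DEMO_MSG_MAX
};
#undef MSG_ID

/* Second expansion: a table indexed by that enum. */
#define STR(x) #x
#define MSG_ID(id, msg_type) { STR(id), DEMO_##msg_type },
static const struct {
	const char *id_string;
	enum demo_msg_type msg_type;
} demo_msg_id_info[DEMO_MSG_MAX] = {
	FOREACH_DEMO_MSG_ID(MSG_ID)
};
#undef MSG_ID
#undef STR

/* Hypothetical lookup: returns the id for a name, or -1 if unknown. */
static int demo_parse_msg_id(const char *name)
{
	int i;

	assert(name);
	for (i = 0; i < DEMO_MSG_MAX; i++)
		if (!strcmp(demo_msg_id_info[i].id_string, name))
			return i;
	return -1;
}
```

Because the enum now lives in the header, other files can name
individual ids (e.g. in a function prototype) while the table itself
stays private to the *.c file.
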
 fsck.c | 66 ----------------------------------------------------------
 fsck.h | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 66 deletions(-)

diff --git a/fsck.c b/fsck.c
index 6fbc56e9faa..8a66168e516 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,72 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_FSCK_MSG_ID(FUNC) \
-	/* fatal errors */ \
-	FUNC(NUL_IN_HEADER, FATAL) \
-	FUNC(UNTERMINATED_HEADER, FATAL) \
-	/* errors */ \
-	FUNC(BAD_DATE, ERROR) \
-	FUNC(BAD_DATE_OVERFLOW, ERROR) \
-	FUNC(BAD_EMAIL, ERROR) \
-	FUNC(BAD_NAME, ERROR) \
-	FUNC(BAD_OBJECT_SHA1, ERROR) \
-	FUNC(BAD_PARENT_SHA1, ERROR) \
-	FUNC(BAD_TAG_OBJECT, ERROR) \
-	FUNC(BAD_TIMEZONE, ERROR) \
-	FUNC(BAD_TREE, ERROR) \
-	FUNC(BAD_TREE_SHA1, ERROR) \
-	FUNC(BAD_TYPE, ERROR) \
-	FUNC(DUPLICATE_ENTRIES, ERROR) \
-	FUNC(MISSING_AUTHOR, ERROR) \
-	FUNC(MISSING_COMMITTER, ERROR) \
-	FUNC(MISSING_EMAIL, ERROR) \
-	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_OBJECT, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_TAG, ERROR) \
-	FUNC(MISSING_TAG_ENTRY, ERROR) \
-	FUNC(MISSING_TREE, ERROR) \
-	FUNC(MISSING_TREE_OBJECT, ERROR) \
-	FUNC(MISSING_TYPE, ERROR) \
-	FUNC(MISSING_TYPE_ENTRY, ERROR) \
-	FUNC(MULTIPLE_AUTHORS, ERROR) \
-	FUNC(TREE_NOT_SORTED, ERROR) \
-	FUNC(UNKNOWN_TYPE, ERROR) \
-	FUNC(ZERO_PADDED_DATE, ERROR) \
-	FUNC(GITMODULES_MISSING, ERROR) \
-	FUNC(GITMODULES_BLOB, ERROR) \
-	FUNC(GITMODULES_LARGE, ERROR) \
-	FUNC(GITMODULES_NAME, ERROR) \
-	FUNC(GITMODULES_SYMLINK, ERROR) \
-	FUNC(GITMODULES_URL, ERROR) \
-	FUNC(GITMODULES_PATH, ERROR) \
-	FUNC(GITMODULES_UPDATE, ERROR) \
-	/* warnings */ \
-	FUNC(BAD_FILEMODE, WARN) \
-	FUNC(EMPTY_NAME, WARN) \
-	FUNC(FULL_PATHNAME, WARN) \
-	FUNC(HAS_DOT, WARN) \
-	FUNC(HAS_DOTDOT, WARN) \
-	FUNC(HAS_DOTGIT, WARN) \
-	FUNC(NULL_SHA1, WARN) \
-	FUNC(ZERO_PADDED_FILEMODE, WARN) \
-	FUNC(NUL_IN_COMMIT, WARN) \
-	/* infos (reported as warnings, but ignored by default) */ \
-	FUNC(GITMODULES_PARSE, INFO) \
-	FUNC(BAD_TAG_NAME, INFO) \
-	FUNC(MISSING_TAGGER_ENTRY, INFO) \
-	/* ignored (elevated when requested) */ \
-	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
-
-#define MSG_ID(id, msg_type) FSCK_MSG_##id,
-enum fsck_msg_id {
-	FOREACH_FSCK_MSG_ID(MSG_ID)
-	FSCK_MSG_MAX
-};
-#undef MSG_ID
-
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
diff --git a/fsck.h b/fsck.h
index fce9981a0cb..c3d3b47b88b 100644
--- a/fsck.h
+++ b/fsck.h
@@ -13,6 +13,72 @@ enum fsck_msg_type {
 	FSCK_WARN,
 };
 
+#define FOREACH_FSCK_MSG_ID(FUNC) \
+	/* fatal errors */ \
+	FUNC(NUL_IN_HEADER, FATAL) \
+	FUNC(UNTERMINATED_HEADER, FATAL) \
+	/* errors */ \
+	FUNC(BAD_DATE, ERROR) \
+	FUNC(BAD_DATE_OVERFLOW, ERROR) \
+	FUNC(BAD_EMAIL, ERROR) \
+	FUNC(BAD_NAME, ERROR) \
+	FUNC(BAD_OBJECT_SHA1, ERROR) \
+	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_TAG_OBJECT, ERROR) \
+	FUNC(BAD_TIMEZONE, ERROR) \
+	FUNC(BAD_TREE, ERROR) \
+	FUNC(BAD_TREE_SHA1, ERROR) \
+	FUNC(BAD_TYPE, ERROR) \
+	FUNC(DUPLICATE_ENTRIES, ERROR) \
+	FUNC(MISSING_AUTHOR, ERROR) \
+	FUNC(MISSING_COMMITTER, ERROR) \
+	FUNC(MISSING_EMAIL, ERROR) \
+	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
+	FUNC(MISSING_OBJECT, ERROR) \
+	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
+	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
+	FUNC(MISSING_TAG, ERROR) \
+	FUNC(MISSING_TAG_ENTRY, ERROR) \
+	FUNC(MISSING_TREE, ERROR) \
+	FUNC(MISSING_TREE_OBJECT, ERROR) \
+	FUNC(MISSING_TYPE, ERROR) \
+	FUNC(MISSING_TYPE_ENTRY, ERROR) \
+	FUNC(MULTIPLE_AUTHORS, ERROR) \
+	FUNC(TREE_NOT_SORTED, ERROR) \
+	FUNC(UNKNOWN_TYPE, ERROR) \
+	FUNC(ZERO_PADDED_DATE, ERROR) \
+	FUNC(GITMODULES_MISSING, ERROR) \
+	FUNC(GITMODULES_BLOB, ERROR) \
+	FUNC(GITMODULES_LARGE, ERROR) \
+	FUNC(GITMODULES_NAME, ERROR) \
+	FUNC(GITMODULES_SYMLINK, ERROR) \
+	FUNC(GITMODULES_URL, ERROR) \
+	FUNC(GITMODULES_PATH, ERROR) \
+	FUNC(GITMODULES_UPDATE, ERROR) \
+	/* warnings */ \
+	FUNC(BAD_FILEMODE, WARN) \
+	FUNC(EMPTY_NAME, WARN) \
+	FUNC(FULL_PATHNAME, WARN) \
+	FUNC(HAS_DOT, WARN) \
+	FUNC(HAS_DOTDOT, WARN) \
+	FUNC(HAS_DOTGIT, WARN) \
+	FUNC(NULL_SHA1, WARN) \
+	FUNC(ZERO_PADDED_FILEMODE, WARN) \
+	FUNC(NUL_IN_COMMIT, WARN) \
+	/* infos (reported as warnings, but ignored by default) */ \
+	FUNC(GITMODULES_PARSE, INFO) \
+	FUNC(BAD_TAG_NAME, INFO) \
+	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	/* ignored (elevated when requested) */ \
+	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
+
+#define MSG_ID(id, msg_type) FSCK_MSG_##id,
+enum fsck_msg_id {
+	FOREACH_FSCK_MSG_ID(MSG_ID)
+	FSCK_MSG_MAX
+};
+#undef MSG_ID
+
 struct fsck_options;
 struct object;
 
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 17/22] fsck.c: pass along the fsck_msg_id in the fsck_error callback
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (17 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 16/22] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 18/22] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
                                 ` (4 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the fsck_error callback to also pass along the
fsck_msg_id. Before this change the only way to get the message id was
to parse it back out of the "message".

Let's pass it down explicitly for the benefit of callers that might
want to use it, as discussed in [1].

Passing the msg_type is now redundant, as you can always get it back
from the msg_id, but I'm not changing that convention. It's really
common to need the msg_type, and the report() function itself (which
calls "fsck_error") needs to call fsck_msg_type() to discover
it. Let's not needlessly re-do that work in the user callback.

1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
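Not part of the commit message: a sketch of what a callback can do
once it is handed the msg_id directly, namely branch on the id instead
of matching on the formatted message text. The types and functions
here are simplified stand-ins with made-up "demo" names; the real
fsck_error callback also takes the options, oid and object type, as
the diff below shows.

```c
#include <assert.h>
#include <stdio.h>

enum demo_msg_type { DEMO_ERROR, DEMO_WARN };
enum demo_msg_id {
	DEMO_MSG_BAD_DATE,
	DEMO_MSG_GITMODULES_MISSING,
	DEMO_MSG_MAX
};

typedef int (*demo_error_fn)(enum demo_msg_type msg_type,
			     enum demo_msg_id msg_id,
			     const char *message);

static int demo_dangling_count;

/* Hypothetical callback: special-cases one id, reports the rest. */
static int demo_error_func(enum demo_msg_type msg_type,
			   enum demo_msg_id msg_id,
			   const char *message)
{
	assert(message);
	if (msg_id == DEMO_MSG_GITMODULES_MISSING) {
		demo_dangling_count++;	/* remember it, do not fail */
		return 0;
	}
	if (msg_type == DEMO_WARN) {
		fprintf(stderr, "warning: %s\n", message);
		return 0;
	}
	fprintf(stderr, "error: %s\n", message);
	return 1;
}

/* Stand-in for report(): passes both type and id to the callback. */
static int demo_report(demo_error_fn fn, enum demo_msg_type msg_type,
		       enum demo_msg_id msg_id, const char *message)
{
	return fn(msg_type, msg_id, message);
}
```

The callback still receives msg_type as well, matching the choice
described above to keep passing it rather than making every callback
look it up again.
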
 builtin/fsck.c       | 4 +++-
 builtin/index-pack.c | 3 ++-
 builtin/mktag.c      | 1 +
 fsck.c               | 6 ++++--
 fsck.h               | 6 ++++--
 5 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d6d745dc702..b71fac4ceca 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -89,7 +89,9 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   enum fsck_msg_type msg_type, const char *message)
+			   enum fsck_msg_type msg_type,
+			   enum fsck_msg_id msg_id,
+			   const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 56b8efaa89b..2b2266a4b7d 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1717,6 +1717,7 @@ static int print_dangling_gitmodules(struct fsck_options *o,
 				     const struct object_id *oid,
 				     enum object_type object_type,
 				     enum fsck_msg_type msg_type,
+				     enum fsck_msg_id msg_id,
 				     const char *message)
 {
 	/*
@@ -1727,7 +1728,7 @@ static int print_dangling_gitmodules(struct fsck_options *o,
 		printf("%s\n", oid_to_hex(oid));
 		return 0;
 	}
-	return fsck_error_function(o, oid, object_type, msg_type, message);
+	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
 }
 
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 1834394a9b6..dc989c356f5 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -23,6 +23,7 @@ static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
 				 enum fsck_msg_type msg_type,
+				 enum fsck_msg_id msg_id,
 				 const char *message)
 {
 	switch (msg_type) {
diff --git a/fsck.c b/fsck.c
index 8a66168e516..5a040eb4fd5 100644
--- a/fsck.c
+++ b/fsck.c
@@ -245,7 +245,7 @@ static int report(struct fsck_options *options,
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
 	result = options->error_func(options, oid, object_type,
-				     msg_type, sb.buf);
+				     msg_type, msg_id, sb.buf);
 	strbuf_release(&sb);
 	va_end(ap);
 
@@ -1198,7 +1198,9 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			enum fsck_msg_type msg_type, const char *message)
+			enum fsck_msg_type msg_type,
+			enum fsck_msg_id msg_id,
+			const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index c3d3b47b88b..33ecf3f3f16 100644
--- a/fsck.h
+++ b/fsck.h
@@ -101,11 +101,13 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  enum fsck_msg_type msg_type, const char *message);
+			  enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
+			  const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			enum fsck_msg_type msg_type, const char *message);
+			enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
+			const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 18/22] fsck.c: add an fsck_set_msg_type() API that takes enums
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (18 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 17/22] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 19/22] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
                                 ` (3 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change code I added in acf9de4c94e (mktag: use fsck instead of custom
verify_tag(), 2021-01-05) to make use of a new API function that takes
the fsck_msg_{id,type} types, instead of arbitrary strings that
we'll (hopefully) parse into those types.

At the time that the fsck_set_msg_type() API was introduced in
0282f4dced0 (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) it was only intended to be used to parse user-supplied
data.

For things that are purely internal to the C code it makes sense to
have the compiler check these arguments, and to skip the sanity
checking of the data in fsck_set_msg_type() which is redundant to
checks we get from the compiler.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
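Not part of the commit message: a self-contained sketch of the
lazily-allocated override table that the new enum-taking setter
fills in. It follows the shape of the code in the diff below, but all
"demo" names are assumptions, and plain malloc() stands in for git's
ALLOC_ARRAY().

```c
#include <assert.h>
#include <stdlib.h>

enum demo_msg_type { DEMO_IGNORE, DEMO_ERROR, DEMO_WARN };
enum demo_msg_id {
	DEMO_MSG_BAD_DATE,
	DEMO_MSG_EXTRA_HEADER_ENTRY,
	DEMO_MSG_MAX
};

/* Built-in default severity for each id. */
static const enum demo_msg_type demo_defaults[DEMO_MSG_MAX] = {
	DEMO_ERROR,	/* BAD_DATE */
	DEMO_IGNORE,	/* EXTRA_HEADER_ENTRY */
};

struct demo_options {
	enum demo_msg_type *msg_type;	/* NULL until first override */
};

static enum demo_msg_type demo_msg_type(const struct demo_options *o,
					enum demo_msg_id id)
{
	assert((unsigned)id < DEMO_MSG_MAX);
	return o->msg_type ? o->msg_type[id] : demo_defaults[id];
}

/* Enum-typed setter: the compiler checks both arguments' types. */
static void demo_set_msg_type_from_ids(struct demo_options *o,
				       enum demo_msg_id id,
				       enum demo_msg_type type)
{
	if (!o->msg_type) {
		int i;

		o->msg_type = malloc(sizeof(*o->msg_type) * DEMO_MSG_MAX);
		if (!o->msg_type)
			abort();
		/* Seed the table with defaults before overriding. */
		for (i = 0; i < DEMO_MSG_MAX; i++)
			o->msg_type[i] = demo_defaults[i];
	}
	o->msg_type[id] = type;
}
```

A misspelled string such as "extraheaderentyr" is only caught at run
time by the string-based API, whereas a misspelled enumerator fails to
compile.
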
 builtin/mktag.c |  3 ++-
 fsck.c          | 27 +++++++++++++++++----------
 fsck.h          |  3 +++
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/builtin/mktag.c b/builtin/mktag.c
index dc989c356f5..de67a94f24e 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -93,7 +93,8 @@ int cmd_mktag(int argc, const char **argv, const char *prefix)
 		die_errno(_("could not read from stdin"));
 
 	fsck_options.error_func = mktag_fsck_error_func;
-	fsck_set_msg_type(&fsck_options, "extraheaderentry", "warn");
+	fsck_set_msg_type_from_ids(&fsck_options, FSCK_MSG_EXTRA_HEADER_ENTRY,
+				   FSCK_WARN);
 	/* config might set fsck.extraHeaderEntry=* again */
 	git_config(mktag_config, NULL);
 	if (fsck_tag_standalone(NULL, buf.buf, buf.len, &fsck_options,
diff --git a/fsck.c b/fsck.c
index 5a040eb4fd5..f26f47b2a10 100644
--- a/fsck.c
+++ b/fsck.c
@@ -132,6 +132,22 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 	return 1;
 }
 
+void fsck_set_msg_type_from_ids(struct fsck_options *options,
+				enum fsck_msg_id msg_id,
+				enum fsck_msg_type msg_type)
+{
+	if (!options->msg_type) {
+		int i;
+		enum fsck_msg_type *severity;
+		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
+		for (i = 0; i < FSCK_MSG_MAX; i++)
+			severity[i] = fsck_msg_type(i, options);
+		options->msg_type = severity;
+	}
+
+	options->msg_type[msg_id] = msg_type;
+}
+
 void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
@@ -144,16 +160,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
-	if (!options->msg_type) {
-		int i;
-		enum fsck_msg_type *severity;
-		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
-		for (i = 0; i < FSCK_MSG_MAX; i++)
-			severity[i] = fsck_msg_type(i, options);
-		options->msg_type = severity;
-	}
-
-	options->msg_type[msg_id] = msg_type;
+	fsck_set_msg_type_from_ids(options, msg_id, msg_type);
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
diff --git a/fsck.h b/fsck.h
index 33ecf3f3f16..6c2fd9c5cc0 100644
--- a/fsck.h
+++ b/fsck.h
@@ -82,6 +82,9 @@ enum fsck_msg_id {
 struct fsck_options;
 struct object;
 
+void fsck_set_msg_type_from_ids(struct fsck_options *options,
+				enum fsck_msg_id msg_id,
+				enum fsck_msg_type msg_type);
 void fsck_set_msg_type(struct fsck_options *options,
 		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 19/22] fsck.c: move gitmodules_{found,done} into fsck_options
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (19 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 18/22] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 20/22] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
                                 ` (2 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the gitmodules_{found,done} static variables added in
159e7b080bf (fsck: detect gitmodules files, 2018-05-02) into the
fsck_options struct. It makes sense to keep all the context in the
same place.

This requires changing the register_found_gitmodules() function,
recently added in 5476e1efde (fetch-pack: print and use dangling
.gitmodules, 2021-02-22), to take fsck_options. That function will be
removed in a subsequent commit, but as that removal requires the new
gitmodules_found attribute of "fsck_options" we need this intermediate
step first.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
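Not part of the commit message: a sketch of the effect of moving
file-scope static state into the options struct. Two option structs
now track their "found" sets independently. A small fixed-size int
array stands in for git's oidset here, and every "demo" name is made
up for illustration.

```c
#include <assert.h>

#define DEMO_MAX_FOUND 8

/* Per-context state, instead of a file-scope static. */
struct demo_options {
	int found[DEMO_MAX_FOUND];
	int nr_found;
};
#define DEMO_OPTIONS_INIT { { 0 }, 0 }

/* Stand-in for register_found_gitmodules(): takes the context. */
static void demo_register_found(struct demo_options *o, int id)
{
	assert(o->nr_found < DEMO_MAX_FOUND);
	o->found[o->nr_found++] = id;
}

/*
 * Stand-in for fsck_finish(): returns how many entries were pending
 * in this context and clears only this context's state.
 */
static int demo_finish(struct demo_options *o)
{
	int pending = o->nr_found;

	o->nr_found = 0;
	return pending;
}
```

With the previous static variables, finishing one fsck run would also
have cleared entries registered on behalf of another.
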
 fetch-pack.c |  2 +-
 fsck.c       | 23 ++++++++++-------------
 fsck.h       |  7 ++++++-
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 0cb59acc486..53d7ef00856 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -998,7 +998,7 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(oid);
+		register_found_gitmodules(&fo, oid);
 	if (fsck_finish(&fo))
 		die("fsck failed");
 }
diff --git a/fsck.c b/fsck.c
index f26f47b2a10..565274a946c 100644
--- a/fsck.c
+++ b/fsck.c
@@ -19,9 +19,6 @@
 #include "credential.h"
 #include "help.h"
 
-static struct oidset gitmodules_found = OIDSET_INIT;
-static struct oidset gitmodules_done = OIDSET_INIT;
-
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
@@ -624,7 +621,7 @@ static int fsck_tree(const struct object_id *oid,
 
 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
 			if (!S_ISLNK(mode))
-				oidset_insert(&gitmodules_found, oid);
+				oidset_insert(&options->gitmodules_found, oid);
 			else
 				retval += report(options,
 						 oid, OBJ_TREE,
@@ -638,7 +635,7 @@ static int fsck_tree(const struct object_id *oid,
 				has_dotgit |= is_ntfs_dotgit(backslash);
 				if (is_ntfs_dotgitmodules(backslash)) {
 					if (!S_ISLNK(mode))
-						oidset_insert(&gitmodules_found, oid);
+						oidset_insert(&options->gitmodules_found, oid);
 					else
 						retval += report(options, oid, OBJ_TREE,
 								 FSCK_MSG_GITMODULES_SYMLINK,
@@ -1150,9 +1147,9 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 	struct fsck_gitmodules_data data;
 	struct config_options config_opts = { 0 };
 
-	if (!oidset_contains(&gitmodules_found, oid))
+	if (!oidset_contains(&options->gitmodules_found, oid))
 		return 0;
-	oidset_insert(&gitmodules_done, oid);
+	oidset_insert(&options->gitmodules_done, oid);
 
 	if (object_on_skiplist(options, oid))
 		return 0;
@@ -1217,9 +1214,9 @@ int fsck_error_function(struct fsck_options *o,
 	return 1;
 }
 
-void register_found_gitmodules(const struct object_id *oid)
+void register_found_gitmodules(struct fsck_options *options, const struct object_id *oid)
 {
-	oidset_insert(&gitmodules_found, oid);
+	oidset_insert(&options->gitmodules_found, oid);
 }
 
 int fsck_finish(struct fsck_options *options)
@@ -1228,13 +1225,13 @@ int fsck_finish(struct fsck_options *options)
 	struct oidset_iter iter;
 	const struct object_id *oid;
 
-	oidset_iter_init(&gitmodules_found, &iter);
+	oidset_iter_init(&options->gitmodules_found, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
 		enum object_type type;
 		unsigned long size;
 		char *buf;
 
-		if (oidset_contains(&gitmodules_done, oid))
+		if (oidset_contains(&options->gitmodules_done, oid))
 			continue;
 
 		buf = read_object_file(oid, &type, &size);
@@ -1259,8 +1256,8 @@ int fsck_finish(struct fsck_options *options)
 	}
 
 
-	oidset_clear(&gitmodules_found);
-	oidset_clear(&gitmodules_done);
+	oidset_clear(&options->gitmodules_found);
+	oidset_clear(&options->gitmodules_done);
 	return ret;
 }
 
diff --git a/fsck.h b/fsck.h
index 6c2fd9c5cc0..bb59ef05b68 100644
--- a/fsck.h
+++ b/fsck.h
@@ -118,6 +118,8 @@ struct fsck_options {
 	unsigned strict:1;
 	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
+	struct oidset gitmodules_found;
+	struct oidset gitmodules_done;
 	kh_oid_map_t *object_names;
 };
 
@@ -125,6 +127,8 @@ struct fsck_options {
 	.walk = NULL, \
 	.msg_type = NULL, \
 	.skiplist = OIDSET_INIT, \
+	.gitmodules_found = OIDSET_INIT, \
+	.gitmodules_done = OIDSET_INIT, \
 	.object_names = NULL,
 #define FSCK_OPTIONS_COMMON_ERROR_FUNC \
 	FSCK_OPTIONS_COMMON \
@@ -149,7 +153,8 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
-void register_found_gitmodules(const struct object_id *oid);
+void register_found_gitmodules(struct fsck_options *options,
+			       const struct object_id *oid);
 
 /*
  * fsck a tag, and pass info about it back to the caller. This is
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 20/22] fetch-pack: don't needlessly copy fsck_options
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (20 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 19/22] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 21/22] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 22/22] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the behavior of the .gitmodules validation added in
5476e1efde (fetch-pack: print and use dangling .gitmodules,
2021-02-22) so we're using one "fsck_options".

I found that code confusing to read. One might think that not setting
up the error_func earlier means that we're relying on the "error_func"
not being set in some code in between the two hunks being modified
here.

But we're not, all we're doing in the rest of "cmd_index_pack()" is
further setup by calling fsck_set_msg_types(), and assigning to
do_fsck_object.

So there was no reason in 5476e1efde to make a shallow copy of the
fsck_options struct before setting error_func. Let's just do this
setup at the top of the function, along with the "walk" assignment.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/index-pack.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 2b2266a4b7d..5ad80b85b47 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1761,6 +1761,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 	fsck_options.walk = mark_link;
+	fsck_options.error_func = print_dangling_gitmodules;
 
 	reset_pack_idx_option(&opts);
 	git_config(git_index_pack_config, &opts);
@@ -1951,13 +1952,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	else
 		close(input_fd);
 
-	if (do_fsck_object) {
-		struct fsck_options fo = fsck_options;
-
-		fo.error_func = print_dangling_gitmodules;
-		if (fsck_finish(&fo))
-			die(_("fsck error in pack objects"));
-	}
+	if (do_fsck_object && fsck_finish(&fsck_options))
+		die(_("fsck error in pack objects"));
 
 	free(objects);
 	strbuf_release(&index_name_buf);
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 21/22] fetch-pack: use file-scope static struct for fsck_options
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (21 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 20/22] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 22/22] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change code added in 5476e1efde (fetch-pack: print and use dangling
.gitmodules, 2021-02-22) so that we use a file-scoped "static struct
fsck_options" instead of defining one in the "fsck_gitmodules_oids()"
function.

We use this pattern in all of
builtin/{fsck,index-pack,mktag,unpack-objects}.c. It's odd to see
fetch-pack be the odd one out. One might think that we're using other
fsck_options structs in fetch-pack, or doing fsck twice there, but
we're not.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fetch-pack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 53d7ef00856..f961c3067cd 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -38,6 +38,7 @@ static int server_supports_filtering;
 static int advertise_sid;
 static struct shallow_lock shallow_lock;
 static const char *alternate_shallow_file;
+static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 static struct strbuf fsck_msg_types = STRBUF_INIT;
 static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 
@@ -991,15 +992,14 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 {
 	struct oidset_iter iter;
 	const struct object_id *oid;
-	struct fsck_options fo = FSCK_OPTIONS_STRICT;
 
 	if (!oidset_size(gitmodules_oids))
 		return;
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(&fo, oid);
-	if (fsck_finish(&fo))
+		register_found_gitmodules(&fsck_options, oid);
+	if (fsck_finish(&fsck_options))
 		die("fsck failed");
 }
 
-- 
2.31.0.rc0.126.g04f22c5b82
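
The "file-scope static struct set up by an initializer macro" pattern
this patch moves fetch-pack.c to can be sketched in isolation as below.
This is only an illustration under stated assumptions: "struct
demo_options", the DEMO_OPTIONS_* macros and demo_register_found() are
hypothetical names, not git's API, and the real "struct fsck_options"
has more members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical options struct; git's real "struct fsck_options" differs. */
struct demo_options {
	int strict;
	int (*error_func)(int msg_id);
	size_t nr_found;
};

/* Hypothetical default callback: any non-zero id counts as an error. */
static int demo_default_error(int msg_id)
{
	return msg_id != 0;
}

/* Members shared by every initializer, in the spirit of FSCK_OPTIONS_COMMON. */
#define DEMO_OPTIONS_COMMON \
	.error_func = demo_default_error, \
	.nr_found = 0,

#define DEMO_OPTIONS_STRICT { \
	.strict = 1, \
	DEMO_OPTIONS_COMMON \
}

/* One file-scope instance, configured once and used by the code below. */
static struct demo_options demo_options = DEMO_OPTIONS_STRICT;

/* Records one more "found" entry in the shared instance; returns the total. */
static size_t demo_register_found(void)
{
	assert(demo_options.error_func);
	return ++demo_options.nr_found;
}
```

Because there is a single instance, state recorded through it is
visible to every later call in the same file, which is the property
that makes it obvious only one set of options is in play.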



* [PATCH v3 22/22] fetch-pack: use new fsck API to printing dangling submodules
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (22 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 21/22] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Refactor the check added in 5476e1efde (fetch-pack: print and use
dangling .gitmodules, 2021-02-22) to make use of the "msg_id" that we
now pass to the user-defined "error_func". We can now compare against
FSCK_MSG_GITMODULES_MISSING instead of parsing the generated
message.

Let's also replace register_found_gitmodules() with directly
manipulating the "gitmodules_found" member. A recent commit moved it
into "fsck_options" so we could do this here.

Add a fsck-cb.c file similar to parse-options-cb.c. The alternative
would be to either define this directly in fsck.c as a public API, or
to create some library shared by fetch-pack.c and builtin/index-pack.c.

I expect that there won't be many of these fsck utility functions in
the future, so just having a single fsck-cb.c makes sense.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile             |  1 +
 builtin/index-pack.c | 21 +--------------------
 fetch-pack.c         |  4 ++--
 fsck-cb.c            | 16 ++++++++++++++++
 fsck.c               |  5 -----
 fsck.h               | 22 +++++++++++++++++++---
 6 files changed, 39 insertions(+), 30 deletions(-)
 create mode 100644 fsck-cb.c

diff --git a/Makefile b/Makefile
index dd08b4ced01..5bf128c5d2c 100644
--- a/Makefile
+++ b/Makefile
@@ -879,6 +879,7 @@ LIB_OBJS += fetch-negotiator.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fmt-merge-msg.o
 LIB_OBJS += fsck.o
+LIB_OBJS += fsck-cb.o
 LIB_OBJS += fsmonitor.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 5ad80b85b47..11f0fafd33b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -120,7 +120,7 @@ static int nr_threads;
 static int from_stdin;
 static int strict;
 static int do_fsck_object;
-static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
+static struct fsck_options fsck_options = FSCK_OPTIONS_MISSING_GITMODULES;
 static int verbose;
 static int show_resolving_progress;
 static int show_stat;
@@ -1713,24 +1713,6 @@ static void show_pack_info(int stat_only)
 	}
 }
 
-static int print_dangling_gitmodules(struct fsck_options *o,
-				     const struct object_id *oid,
-				     enum object_type object_type,
-				     enum fsck_msg_type msg_type,
-				     enum fsck_msg_id msg_id,
-				     const char *message)
-{
-	/*
-	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
-	 * instead of relying on this string check.
-	 */
-	if (starts_with(message, "gitmodulesMissing")) {
-		printf("%s\n", oid_to_hex(oid));
-		return 0;
-	}
-	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
-}
-
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, rev_index;
@@ -1761,7 +1743,6 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 	fsck_options.walk = mark_link;
-	fsck_options.error_func = print_dangling_gitmodules;
 
 	reset_pack_idx_option(&opts);
 	git_config(git_index_pack_config, &opts);
diff --git a/fetch-pack.c b/fetch-pack.c
index f961c3067cd..7fc305b65c4 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -38,7 +38,7 @@ static int server_supports_filtering;
 static int advertise_sid;
 static struct shallow_lock shallow_lock;
 static const char *alternate_shallow_file;
-static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
+static struct fsck_options fsck_options = FSCK_OPTIONS_MISSING_GITMODULES;
 static struct strbuf fsck_msg_types = STRBUF_INIT;
 static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 
@@ -998,7 +998,7 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(&fsck_options, oid);
+		oidset_insert(&fsck_options.gitmodules_found, oid);
 	if (fsck_finish(&fsck_options))
 		die("fsck failed");
 }
diff --git a/fsck-cb.c b/fsck-cb.c
new file mode 100644
index 00000000000..465a49235ac
--- /dev/null
+++ b/fsck-cb.c
@@ -0,0 +1,16 @@
+#include "git-compat-util.h"
+#include "fsck.h"
+
+int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
+					   const struct object_id *oid,
+					   enum object_type object_type,
+					   enum fsck_msg_type msg_type,
+					   enum fsck_msg_id msg_id,
+					   const char *message)
+{
+	if (msg_id == FSCK_MSG_GITMODULES_MISSING) {
+		puts(oid_to_hex(oid));
+		return 0;
+	}
+	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
+}
diff --git a/fsck.c b/fsck.c
index 565274a946c..b0089844db9 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1214,11 +1214,6 @@ int fsck_error_function(struct fsck_options *o,
 	return 1;
 }
 
-void register_found_gitmodules(struct fsck_options *options, const struct object_id *oid)
-{
-	oidset_insert(&options->gitmodules_found, oid);
-}
-
 int fsck_finish(struct fsck_options *options)
 {
 	int ret = 0;
diff --git a/fsck.h b/fsck.h
index bb59ef05b68..ae3107638ab 100644
--- a/fsck.h
+++ b/fsck.h
@@ -153,9 +153,6 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
-void register_found_gitmodules(struct fsck_options *options,
-			       const struct object_id *oid);
-
 /*
  * fsck a tag, and pass info about it back to the caller. This is
  * exposed fsck_object() internals for git-mktag(1).
@@ -204,4 +201,23 @@ const char *fsck_describe_object(struct fsck_options *options,
 int fsck_config_internal(const char *var, const char *value, void *cb,
 			 struct fsck_options *options);
 
+/*
+ * Initializations for callbacks in fsck-cb.c
+ */
+#define FSCK_OPTIONS_MISSING_GITMODULES { \
+	.strict = 1, \
+	.error_func = fsck_error_cb_print_missing_gitmodules, \
+	FSCK_OPTIONS_COMMON \
+}
+
+/*
+ * Error callbacks in fsck-cb.c
+ */
+int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
+					   const struct object_id *oid,
+					   enum object_type object_type,
+					   enum fsck_msg_type msg_type,
+					   enum fsck_msg_id msg_id,
+					   const char *message);
+
 #endif
-- 
2.31.0.rc0.126.g04f22c5b82
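
The difference between the string check removed from
builtin/index-pack.c and the msg_id comparison added in fsck-cb.c can
be sketched outside of git roughly as follows. This is a minimal
illustration, not git's code: "enum demo_msg_id" and both helper
functions are hypothetical stand-ins, and the enum lists only two
made-up members rather than the real set of fsck message ids.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for git's "enum fsck_msg_id"; not the real list. */
enum demo_msg_id {
	DEMO_MSG_GITMODULES_MISSING,
	DEMO_MSG_GITMODULES_NAME
};

/* Old approach: infer the message type from the formatted message text. */
static int is_missing_by_prefix(const char *message)
{
	const char *prefix = "gitmodulesMissing";

	assert(message);
	return !strncmp(message, prefix, strlen(prefix));
}

/* New approach: the caller hands over the id, so nothing is parsed. */
static int is_missing_by_id(enum demo_msg_id msg_id)
{
	return msg_id == DEMO_MSG_GITMODULES_MISSING;
}
```

The id-based check keeps working if the wording or formatting of the
message ever changes, which the prefix check cannot promise.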



* Re: [PATCH v3 00/22] fsck: API improvements
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
@ 2021-03-07 23:04                 ` Junio C Hamano
  2021-03-08  9:16                   ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-03-07 23:04 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Now that jt/transfer-fsck-across-packs has been merged to master
> here's a re-roll of v1[1]+v2[2] of this series.

Unfortunately, this is not a good time to review or help with any work
on this series, as the base topic introduced an unpleasant regression
and probably needs to either gain a band-aid or (in the worst case) be
reverted; of course, help resolving the issues on that topic would be
appreciated ;-)

Thanks.


* Re: [PATCH v3 00/22] fsck: API improvements
  2021-03-07 23:04                 ` Junio C Hamano
@ 2021-03-08  9:16                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-08  9:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan


On Mon, Mar 08 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Now that jt/transfer-fsck-across-packs has been merged to master
>> here's a re-roll of v1[1]+v2[2] of this series.
>
> Unfortunately, this is not a good time to review or help with any work
> on this series, as the base topic introduced an unpleasant regression
> and probably needs to either gain a band-aid or (in the worst case) be
> reverted; of course, help resolving the issues on that topic would be
> appreciated ;-)

I should have mentioned: I saw the bug & proposed fix thread for that.
I see that 2aec3bc4b64 (fetch-pack: do not mix --pack_header and
packfile uri, 2021-03-04) is now merged down to next.

My reading of that thread is that the reported bug is solved, but
perhaps we're not 100% happy with the solution?

In any case, that patch does not conflict with this series, and all
tests pass with/without the two merged together.

I don't foresee an issue with the two stepping on each other's toes,
since I'm just modifying the rather low-level fsck interface of spewing
out .gitmodules entries, not touching the logic of what's then done with
that information...




* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05 23:18             ` Junio C Hamano
@ 2021-03-08 19:14               ` Jonathan Tan
  2021-03-08 19:34                 ` Junio C Hamano
  0 siblings, 1 reply; 229+ messages in thread
From: Jonathan Tan @ 2021-03-08 19:14 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, jrnieder, nmulcahey

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > I should probably have written more in the commit message to justify the
> > unification, but it is also part of a bug fix (in particular,
> > --fsck-objects wasn't being passed to the index-pack that indexed the
> > packfiles linked by URI) and for code health purposes (to prevent future
> > bugs by eliminating the divergence). So reverting that commit would
> > reintroduce another bug.
> 
> Not necessarily.  Unifying two that do not inherently have to be
> identical makes it impossible to pass two different things, and that
> is what we are seeing in the bug this patch is trying to fix (by
> forcing the two to be identical by eliminating the unpack-objects
> codepath in certain cases).  
> 
> The right "fix" for the original bug would have been to keep them
> still separate yet making it easy to pass args that must be used in
> both of them, no?

OK - I'll do this.

> >> > -		if (do_keep && (args->lock_pack || unpack_limit)) {
> >> > +		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) {
> >> >  			char hostname[HOST_NAME_MAX + 1];
> >> >  			if (xgethostname(hostname, sizeof(hostname)))
> >> >  				xsnprintf(hostname, sizeof(hostname), "localhost");
> >> 
> >> I do not quite get what this hunk is doing.  Care to explain?
> >
> > The "do_keep" part was unnecessarily restrictive and I used a band-aid
> > solution to loosen it. I think this started from 88e2f9ed8e ("introduce
> > fetch-object: fetch one promisor object", 2017-12-05) where I might have
> > misunderstood what do_keep was meant to do, and taught fetch-pack to use
> > "index-pack" if do_keep is true or args->from_promisor is true. What I
> > should have done is to set do_keep to true if args->from_promisor is
> > true. Future commits continued to do that with fsck_objects and
> > index_pack_args.
> 
> > Maybe what I can do is to refactor get_pack() so that do_keep retains
> > its original meaning of whether to use "index-pack" or "unpack-objects",
> > and then we wouldn't need this line. What do you think (code-wise and
> > whether this fits in with the release schedule, if we want to get this
> > in before release)?
> 
> How bad is the breakage this one is trying to fix?  I know it would
> only affect folks who have to interact with the server that uses
> packfile URI feature, but do they have a workaround, perhaps with a
> configuration knob or command line option to ignore the packfile
> URI,

Yes, there's a workaround (to disable packfile URIs from the client side
using a config variable).

> and how large is the affected population?

The only issues I've seen are within $DAYJOB, and there, we can carry
our own patch to fix this issue. So the affected population (right now)
is probably small (if it even exists).

> I cannot shake the feeling that we are seeing band-aid on top of
> band-aid forced by having chosen to go in a wrong direction in the
> beginning X-<, and prefer not to see the code drift even further into
> the same direction; hence my earlier suggestion to go back to the
> root cause by first reverting the wrong fix that introduced this bug
> and fixing the original bug in a different way.
> 
> I dunno how involved the necessary surgery would be, though.  If
> this is easy to work around, perhaps it might be a better option for
> the overall project to ship the upcoming release with this listed as
> a known breakage.

I don't think it's too difficult - I think we'll only need to filter out
the --pack_header when we figure out the arguments to pass for the
packfiles given by URI. I'll take a look.


* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-08 19:14               ` Jonathan Tan
@ 2021-03-08 19:34                 ` Junio C Hamano
  2021-03-09 19:13                   ` Junio C Hamano
  0 siblings, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-03-08 19:34 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Jonathan Tan <jonathantanmy@google.com> writes:

>> I dunno how involved the necessary surgery would be, though.  If
>> this is easy to work around, perhaps it might be a better option for
>> the overall project to ship the upcoming release with this listed as
>> a known breakage.
>
> I don't think it's too difficult - I think we'll only need to filter out
> the --pack_header when we figure out the arguments to pass for the
> packfiles given by URI. I'll take a look.

What you sent earlier is a much better band-aid than "keep the
single args array but filter an element out in only one codepath"
band-aid, I would think.

Any change that is more involved than a single-liner trivial bugfix
would be too late for this cycle, as we'd be cutting -rc2 by the end
of tomorrow.

Thanks.


* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-08 19:34                 ` Junio C Hamano
@ 2021-03-09 19:13                   ` Junio C Hamano
  2021-03-10  5:24                     ` Junio C Hamano
  2021-03-10 16:57                     ` Jonathan Tan
  0 siblings, 2 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-09 19:13 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Junio C Hamano <gitster@pobox.com> writes:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
>>> I dunno how involved the necessary surgery would be, though.  If
>>> this is easy to work around, perhaps it might be a better option for
>>> the overall project to ship the upcoming release with this listed as
>>> a known breakage.
>>
>> I don't think it's too difficult - I think we'll only need to filter out
>> the --pack_header when we figure out the arguments to pass for the
>> packfiles given by URI. I'll take a look.
>
> What you sent earlier is a much better band-aid than "keep the
> single args array but filter an element out in only one codepath"
> band-aid, I would think.
>
> Any change that is more involved than a single-liner trivial bugfix
> would be too late for this cycle, as we'd be cutting -rc2 by the end
> of tomorrow.

I was looking at the index_pack_args vs pass_header codepath in
fetch-pack.c again after finishing the -rc2 stuff, and noticed
something curious.

Before running the command to process in-stream packdata, we have
this bit:

	if (index_pack_args) {
		int i;

		for (i = 0; i < cmd.args.nr; i++)
			strvec_push(index_pack_args, cmd.args.v[i]);
	}

where cmd.args is what the original code used (before the "we need to
prepare the index pack arguments for the offline HTTP transfer"
logic was bolted onto this codepath), so it could of course have
things like "--fix-thin" and "--promisor" when we are processing an
in-stream packfile that has a sufficiently large number of objects and
choose "index-pack" to process it.  None of them should be given to
the "index-pack" that processes the offline packfile that is given
via the packfile URI mechanism.

Also, because this loop copies everything in cmd.args, if our
in-stream packdata is small, cmd.args.v[0] would be "unpack-objects",
and we end up asking the command to explode the (presumably large
enough to be worth pre-generating and serving via CDN) packfile that
is given via the packfile URI mechanism.

What I think I am seeing in the code is that there are many things
other than "pass_header" that fundamentally cannot be reused between
the processing of the in-stream packdata and the offline packfile
given by the packfile URI (e.g. the in-stream one may want to use
"unpack-objects" to avoid accumulating too many tiny packs, so there
is nothing to be shared with "index-pack" that will always be used
for the offline one), and any attempt to "reuse" cmd.args while
"filtering out" inappropriate bits is fragile and unfruitful.

Instead, I think we should not touch index_pack_args in the earlier
part of the function at all (whether reading, writing, or even
checking for NULL-ness), and use the "if (index_pack_args)" block we
already have (i.e. the one before we call start_command() to process
the in-stream packdata) to decide what the command line to process the
offline pack should look like.  That way, we won't ever risk a
confusion like running "unpack-objects" instead of "index-pack" (but
we can choose to do so deliberately, of course---the important point
is to recognise that the in-stream pack and the offline one are
independent and we should decide how to cook them separately).
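
To make the contrast concrete, here is a rough stand-alone sketch of
the two approaches.  It is an illustration only: "struct demo_args" is
a hypothetical fixed-size stand-in for strvec, and the set of flags
that build_offline_args() chooses is an assumption made for the
example, not a statement of what fetch-pack.c should pass.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_ARGS 16

/* Hypothetical fixed-size stand-in for git's "struct strvec". */
struct demo_args {
	const char *v[DEMO_MAX_ARGS];
	int nr;
};

static void demo_push(struct demo_args *args, const char *arg)
{
	assert(args->nr < DEMO_MAX_ARGS);
	args->v[args->nr++] = arg;
}

/* Mirrors the quoted loop: every in-stream argument is reused verbatim. */
static void copy_all_args(struct demo_args *dst, const struct demo_args *src)
{
	int i;

	for (i = 0; i < src->nr; i++)
		demo_push(dst, src->v[i]);
}

/*
 * Alternative: build the offline command line on its own. Which flags
 * belong here is an assumption made for illustration only.
 */
static void build_offline_args(struct demo_args *dst, int fsck_objects)
{
	demo_push(dst, "index-pack");
	demo_push(dst, "--stdin");
	if (fsck_objects)
		demo_push(dst, "--fsck-objects");
}
```

With copy_all_args(), whatever ended up in the in-stream command line
(including its v[0]) is inherited by the offline one; with
build_offline_args(), the offline command line cannot pick up anything
it was not explicitly given.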



* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-09 19:13                   ` Junio C Hamano
@ 2021-03-10  5:24                     ` Junio C Hamano
  2021-03-10 16:57                     ` Jonathan Tan
  1 sibling, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-10  5:24 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Junio C Hamano <gitster@pobox.com> writes:

> Instead, I think we should not touch index_pack_args in the earlier
> part of the function at all (whether reading, writing, or even
> checking for NULL-ness), ...

I have to take the "NULL-ness" part back.  The NULL-ness of the
variable is also used to convey that a URI packfile is in use, which
in turn means we have to tell the "index-pack" we are going to use for
processing the in-stream packfile that the objects in the pack may be
pointing at objects that are not yet available.  So we do need to
check for NULL-ness in order to decide what command line to use
to process the in-stream packdata.



* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-09 19:13                   ` Junio C Hamano
  2021-03-10  5:24                     ` Junio C Hamano
@ 2021-03-10 16:57                     ` Jonathan Tan
  2021-03-10 18:30                       ` Junio C Hamano
  1 sibling, 1 reply; 229+ messages in thread
From: Jonathan Tan @ 2021-03-10 16:57 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, jrnieder, nmulcahey

> I was looking at the index_pack_args vs pass_header codepath in
> fetch-pack.c again after finishing the -rc2 stuff, and noticed
> something curious.
> 
> Before running the command to process in-stream packdata, we have
> this bit:
> 
> 	if (index_pack_args) {
> 		int i;
> 
> 		for (i = 0; i < cmd.args.nr; i++)
> 			strvec_push(index_pack_args, cmd.args.v[i]);
> 	}
> 
> where cmd.args is what the original code used (before the "we need to
> prepare the index pack arguments for the offline HTTP transfer"
> logic was bolted onto this codepath), so it could of course have
> things like "--fix-thin" and "--promisor" when we are processing an
> in-stream packfile that has a sufficiently large number of objects and
> choose "index-pack" to process it.  None of them should be given to
> the "index-pack" that processes the offline packfile that is given
> via the packfile URI mechanism.

Thanks for continuing to take a look at this. My thinking is that all
packfiles (inline or through URI) should be processed in as similar a
manner as possible. Looking at the potential arguments passed to
index-pack:

 1.  --shallow-file (before "index-pack", that is, an argument passed to
     "git" itself and not the subcommand)
 2.  index-pack
 3.  --stdin
 4.  -v
 5.  --fix-thin
 6.  --keep
 7.  [--check-self-contained-and-connected is guarded by !index_pack_args
     so we won't be passing it]
 8.  --promisor
 9.  --pack_header
 10. --fsck-objects
 11. [--strict appears in an "else" block opposite index_pack_args so we
     won't be passing it]

You mentioned --fix-thin (5) and --promisor (8). Why do you think that
none of these should be given to the "index-pack" that processes the
packfiles given by URI? Perhaps it could be argued that these extra
packfiles don't need --fix-thin (but I would say that I think servers
should be allowed to serve thin packfiles through URI too), but I think
that --promisor is necessary (so that a server could, for example,
offload all trees and commits to a packfile in a CDN, and offload all
blobs to a separate packfile in a CDN).

Looking at this list, I think that all the arguments (except 9, which
has been fixed) are necessary (or at least useful) for indexing a
packfile given by URI.

> Also, because this loop copies everything in cmd.args, if our
> in-stream packdata is small, cmd.args.v[0] would be "unpack-objects",
> and we end up asking the command to explode the (presumably large
> enough to be worth pre-generating and serving via CDN) packfile that
> is given via the packfile URI mechanism.

I specifically guard against this through the "if (do_keep ||
args->from_promisor || index_pack_args || fsck_objects) {" line (which
is a complicated line, unfortunately).

> What I think I am seeing in the code is that there are many things
> other than "pass_header" that fundamentally cannot be reused between
> the processing of the in-stream packdata and the offline packfile
> given by the packfile URI (e.g. the in-stream one may want to use
> "unpack-objects" to avoid accumulating too many tiny packs, so there
> is nothing to be shared with "index-pack" that will always be used
> for the offline one), and any attempt to "reuse" cmd.args while
> "filtering out" inappropriate bits is fragile and unfruitful.
>
> Instead, I think we should not touch index_pack_args in the earlier
> part of the function at all (whether reading, writing, or even
> checking for NULL-ness), and use the "if (index_pack_args)" block we
> already have (i.e. the one before we call start_command() to process
> the in-stream packdata) to decide what the command line to process the
> offline pack should look like.  That way, we won't ever risk a
> confusion like running "unpack-objects" instead of "index-pack" (but
> we can choose to do so deliberately, of course---the important point
> is to recognise that the in-stream pack and the offline one are
> independent and we should decide how to cook them separately).

We could do that, although I'm concerned that we would be repeating
logic a lot (deciding whether or not to pass an argument). One other
approach is for each "strvec.push?(&cmd.args" to also have another line
that pushes to index_pack_args if it's relevant. But as I said earlier,
I think that all or nearly all arguments will be relevant to both.
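
The "one push site that feeds both command lines" idea could look
roughly like the sketch below. This is only an illustration under
stated assumptions: "struct demo_args" is a hypothetical fixed-size
stand-in for strvec, push_both() is not an existing helper, and which
arguments are marked as shared is up to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ARGS 16

/* Hypothetical fixed-size stand-in for git's "struct strvec". */
struct demo_args {
	const char *v[DEMO_MAX_ARGS];
	int nr;
};

static void demo_push(struct demo_args *args, const char *arg)
{
	assert(args->nr < DEMO_MAX_ARGS);
	args->v[args->nr++] = arg;
}

/*
 * Push "arg" for the in-stream pack, and also for the packfile-URI
 * pack when "offline" is non-NULL and the caller marks it as shared.
 */
static void push_both(struct demo_args *cmd, struct demo_args *offline,
		      const char *arg, int shared)
{
	demo_push(cmd, arg);
	if (offline && shared)
		demo_push(offline, arg);
}
```

Each call site then states once whether an argument applies to both
packs, instead of repeating the condition in a second block.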


* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-10 16:57                     ` Jonathan Tan
@ 2021-03-10 18:30                       ` Junio C Hamano
  2021-03-10 19:56                         ` Junio C Hamano
  0 siblings, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-03-10 18:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Jonathan Tan <jonathantanmy@google.com> writes:

> You mentioned --fix-thin (5) and --promisor (8). Why do you think that
> none of these should be given to the "index-pack" that processes the
> packfiles given by URI?

Actually, --fix-thin is probably even worse than that.

As the code processes the in-stream packdata before processing or
even downloading the pregenerated URI packfile, the objects
necessary to fix a "thin" in-stream packdata are likely to be
unavailable (it is exactly the same problem as the one that made us
delay the fsckobjects check done in index-pack when a URI packfile is
involved, isn't it?).  Even if the client asks --thin, the server
side shouldn't produce a thin pack for in-stream packdata, no?

> Perhaps it could be argued that these extra
> packfiles don't need --fix-thin (but I would say that I think servers
> should be allowed to serve thin packfiles through URI too), 

I agree that URI packfile could be thin; after all, the server end
chooses, based on what the client claims to have, which pregenerated
packfile to hand out, so it is perfectly fine to hand out a
pregenerated packfile that is thin if the client asks for a thin
pack and says it has base objects missing from that packfile.  And
because it is (assumed to be) pregenerated, we can make a requirement
that no URI packfile should depend on objects that are created later
than that (which means it won't depend on in-stream packdata).

But we cannot process a thin in-stream packdata, if we are to
process it first, right?

> but I think
> that --promisor is necessary (so that a server could, for example,
> offload all trees and commits to a packfile in a CDN, and offload all
> blobs to a separate packfile in a CDN).

Yes, both packfiles conceptually are given by that same server who
promises to be always available to feed us everything we'd need
later, so both packfiles should be marked to have come from the same
promisor.  So this is one example that happens to be sharable
between the two.

But I do not see it as an indication that the two packs inherently
must be processed with the same options.

> Looking at this list, I think that all the arguments (except 9, which
> has been fixed) are necessary (or at least useful) for indexing a
> packfile given by URI.

I have to say that this is focusing too much on the current need by
going through how the current code handles two packs.  Of course, if
we start from the "two must be the same" viewpoint, and restrict what
the code can do by "guarding" out the bits that require the two to be
different based on "if (index_pack_args)", then the resulting
code would invoke the two index-pack processes the same way.

I am more worried about the longer term code health, so "currently
mostly the same" does not make a convincing argument for the future
why the two must be processed the same way.

>> Also, because this loop copies everything in cmd.args, if our
>> in-stream packdata is small, cmd.args.v[0] would be "unpack-objects",
>> and we end up asking the command to explode the (presumably large
>> enough to be worth pre-generating and serving via CDN) packfile that
>> is given via the packfile URI mechanism.
>
> I specifically guard against this through the "if (do_keep ||
> args->from_promisor || index_pack_args || fsck_objects) {" line (which
> is a complicated line, unfortunately).

I am aware of that line that forbids the in-stream packdata from
getting unpacked into loose objects.  But unless we were told to
keep the resulting pack, or run fsck-objects via the index-pack, I
do not see an inherent reason why the "most recent leftover bits
that are not in the pregenerated pack offloaded to CDN" objects must
be kept in a separate packfile, especially if the number of objects
in it is smaller than the unpack limit threshold.  In other words,
I view that "guard" as one of the things that blinds us into thinking
that the two packs should be handled the same way.  It is the other
way around---the guard is there only because the code wanted to handle
the two packs the same way.

When cloning from a server that offers the bulk of old history in a URI
packfile and an in-stream packfile, shouldn't the result be like
cloning from the server back when it had only the objects in the URI
packfile, and then fetching from it again when it acquired objects
that came in the in-stream packfile?  The objects that come during
the second fetch would be left loose if there aren't that many, so
that the third and subsequent fetches and local activity can
accumulate enough loose objects to be packed into a single new pack,
avoiding accumulation of too many tiny packs.  And the "guard"
breaks that only because this codepath wants to reuse cmd.args, which
is otherwise unrelated, to populate index_pack_args.  Isn't that an
artificial limitation that we may want to eventually fix?

When we want to fix that, the "options are mostly the same when we
use the index-pack command for both packdata, so let's copy the
entire command line" approach would come back and haunt us.  The
person who is doing the fix may be somebody other than you, so it may
not matter to you today, but it will hurt somebody tomorrow.

I already said that I think 2aec3bc4 (fetch-pack: do not mix
--pack_header and packfile uri, 2021-03-04) is OK as a short-term
fix for the upcoming release, but it does not change the fact that
it is piling technical debt on top of existing technical debt.

And that is why I am reacting against your earlier mention of
"filtering out" rather strongly.  The approach continues the "keep
the single args array in the belief that two must be mostly the
same", which I view as a misguided starting point that must be
rethought.

Thanks.



* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-10 18:30                       ` Junio C Hamano
@ 2021-03-10 19:56                         ` Junio C Hamano
  2021-03-10 23:29                           ` Jonathan Tan
  0 siblings, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-03-10 19:56 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Junio C Hamano <gitster@pobox.com> writes:

> I already said that I think 2aec3bc4 (fetch-pack: do not mix
> --pack_header and packfile uri, 2021-03-04) is OK as a short-term
> fix for the upcoming release, but it does not change the fact that
> it is piling technical debt on top of existing technical debt.
>
> And that is why I am reacting against your earlier mention of
> "filtering out" rather strongly.  The approach continues the "keep
> the single args array in the belief that two must be mostly the
> same", which I view as a misguided starting point that must be
> rethought.

Another way to think about the codepath is this.

Can the bulk of get_pack() that deals with a single incoming
packfile (from the part that makes the decision to use either
index-pack or unpack-objects and chooses what options to pass to the
command, to the part that actually calls run_command() and feeds the
packdata to the command) be made into a helper function that handles
one packdata stream and nothing else?  Such a helper would most
likely take as its parameters

 - a stream to read the packdata from (for in-stream packfile that
   is handled by get_pack(), we already have it available)

 - fetch_pack_args and other options that are meant to affect the
   operation of fetch-pack, among which are two bits that are of
   interest in this topic: if we want to run fsck-objects and if the
   entire fetch-pack is dealing with more than one packfile
   (currently, the only source of need to process multiple packfiles
   is packfile URI mechanism, but that does not have to stay that
   way).

Then get_pack() can move a lot of its code out to this helper and
just call it.  The code that processes the other packfile, obtained
out of band by the packfile URI mechanism, can open the packstream and
call the helper the same way.  When the packfile URI mechanism is in
use, both invocations of the helper would get the "you are not alone so
fsck may hit missing objects" bit, if fsck-objects is asked for.

That would avoid the "duplicated logic" and still allow the code to
choose the best disposition of the incoming packdata per packfile.
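
To make the idea concrete, here is a rough sketch of the per-packfile
decision such a helper could make.  It is not code from fetch-pack.c:
the names pack_opts, pack_plan and plan_one_pack() are hypothetical,
and the rule (use index-pack when keeping, fsck-ing, fetching from a
promisor, or at or above the unpack limit) only follows the
description in this thread.

```c
#include <stdbool.h>

/* Hypothetical per-fetch options; not Git's struct fetch_pack_args. */
struct pack_opts {
	bool keep_pack;              /* told to keep the resulting pack */
	bool from_promisor;          /* pack must be marked as from a promisor */
	bool fsck_objects;           /* fsck was asked for */
	bool multiple_packs;         /* more than one packfile in this fetch */
	unsigned long unpack_limit;  /* 0 means "no limit configured" */
};

/* Hypothetical result describing how one packfile should be handled. */
struct pack_plan {
	bool use_index_pack;         /* false: explode into loose objects */
	bool fsck_may_miss_objects;  /* "you are not alone" bit */
};

/* Decide the disposition of a single packfile with nr_objects objects. */
static struct pack_plan plan_one_pack(const struct pack_opts *opts,
				      unsigned long nr_objects)
{
	struct pack_plan plan = { false, false };

	if (opts->keep_pack || opts->from_promisor || opts->fsck_objects ||
	    (opts->unpack_limit && nr_objects >= opts->unpack_limit))
		plan.use_index_pack = true;

	/* Objects referenced by this pack may live in a sibling pack. */
	plan.fsck_may_miss_objects =
		opts->fsck_objects && opts->multiple_packs;

	return plan;
}
```

Because the decision is made once per packfile, a small in-stream
pack and a large out-of-band pack can end up with different plans
even though both calls receive the same options.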

In an extreme case, it is not hard to imagine that somebody prepares
a very small base packfile and feeds it via the packfile URI mechanism,
but has accumulated so many objects that are not yet rolled into an
updated base packfile---cloning from such a repository may result in
running unpack-objects for the packfile that came out of band, while
processing the in-stream packfile with index-pack.

Hmm?




* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-10 19:56                         ` Junio C Hamano
@ 2021-03-10 23:29                           ` Jonathan Tan
  2021-03-11  0:59                             ` Junio C Hamano
  2021-03-11  1:41                             ` Junio C Hamano
  0 siblings, 2 replies; 229+ messages in thread
From: Jonathan Tan @ 2021-03-10 23:29 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, jrnieder, nmulcahey

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > I already said that I think 2aec3bc4 (fetch-pack: do not mix
> > --pack_header and packfile uri, 2021-03-04) is OK as a short-term
> > fix for the upcoming release, but it does not change the fact that
> > it is piling technical debt on top of existing technical debt.
> >
> > And that is why I am reacting against your earlier mention of
> > "filtering out" rather strongly.  The approach continues the "keep
> > the single args array in the belief that two must be mostly the
> > same", which I view as a misguided starting point that must be
> > rethought.
> 
> Another way to think about the codepath is this.
> 
> Can the bulk of get_pack() that deals with a single incoming
> packfile (from the part that makes the decision to use either
> index-pack or unpack-objects and chooses what options to pass to the
> command, to the part that actually calls run_command() and feeds the
> packdata to the command) be made into a helper function that handles
> one packdata stream and nothing else?  Such a helper would most
> likely take as its parameters
> 
>  - a stream to read the packdata from (for in-stream packfile that
>    is handled by get_pack(), we already have it available)
> 
>  - fetch_pack_args and other options that are meant to affect the
>    operation of fetch-pack, among which are two bits that are of
>    interest in this topic: if we want to run fsck-objects and if the
>    entire fetch-pack is dealing with more than one packfile
>    (currently, the only source of need to process multiple packfiles
>    is packfile URI mechanism, but that does not have to stay that
>    way).

This probably means that fetch-pack.c itself (instead of
finish_http_pack_request(), currently being called from a separate
http_fetch process) should call index-pack for the out-of-band
packfiles, which is conceptually reasonable. This means that
finish_http_pack_request() will need to be able to refrain from running
index-pack itself and instead just return where the pack was downloaded.

> Then get_pack() can move a lot of its code out to this helper and
> just call it.  The code that processes the other packfile, obtained
> out of band by the packfile URI mechanism, can open the packstream and
> call the helper the same way.  When the packfile URI mechanism is in
> use, both invocations of the helper would get the "you are not alone so
> fsck may hit missing objects" bit, if fsck-objects is asked for.
> 
> That would avoid the "duplicated logic" and still allow the code to
> choose the best disposition of the incoming packdata per packfile.
> 
> In an extreme case, it is not hard to imagine that somebody prepares
> a very small base packfile and feeds it via the packfile URI mechanism,
> but has accumulated so many objects that are not yet rolled into an
> updated base packfile---cloning from such a repository may result in
> running unpack-objects for the packfile that came out of band, while
> processing the in-stream packfile with index-pack.
> 
> Hmm?

Your suggestion (as opposed to the current situation, in which we're
locked into using index-pack for the out-of-band packfiles) would make
this possible, yes.


* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-10 23:29                           ` Jonathan Tan
@ 2021-03-11  0:59                             ` Junio C Hamano
  2021-03-11  1:41                             ` Junio C Hamano
  1 sibling, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-11  0:59 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Jonathan Tan <jonathantanmy@google.com> writes:

>> Then get_pack() can move a lot of its code out to this helper and
>> just call it.  The code that processes the other packfile, obtained
>> out of band by the packfile URI mechanism, can open the packstream and
>> call the helper the same way.  When the packfile URI mechanism is in
>> use, both invocations of the helper would get the "you are not alone so
>> fsck may hit missing objects" bit, if fsck-objects is asked for.
>> 
>> That would avoid the "duplicated logic" and still allow the code to
>> choose the best disposition of the incoming packdata per packfile.
>> 
>> In an extreme case, it is not hard to imagine that somebody prepares
>> a very small base packfile and feeds it via the packfile URI mechanism,
>> but has accumulated so many objects that are not yet rolled into an
>> updated base packfile---cloning from such a repository may result in
>> running unpack-objects for the packfile that came out of band, while
>> processing the in-stream packfile with index-pack.
>> 
>> Hmm?
>
> Your suggestion (as opposed to the current situation, in which we're
> locked into using index-pack for the out-of-band packfiles) would make
> this possible, yes.

Just to make sure, I am not interested in running unpack-objects on
oob packfiles, as they are expected to be "so old, big and not
changing that it is worth pre-generating" packfiles, so "yes the
approach would make that useless thing possible" is not a useful
criterion to judge how good the alternative approach would be.  If
the approach results in a cleaner design that gives us more
flexibility without risking unnecessary code duplication, it would
be a good sign that the approach is more sound than the direction we
took so far, though.

Thanks.


* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-10 23:29                           ` Jonathan Tan
  2021-03-11  0:59                             ` Junio C Hamano
@ 2021-03-11  1:41                             ` Junio C Hamano
  2021-03-11 17:22                               ` Jonathan Tan
  1 sibling, 1 reply; 229+ messages in thread
From: Junio C Hamano @ 2021-03-11  1:41 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Jonathan Tan <jonathantanmy@google.com> writes:

> This probably means that fetch-pack.c itself (instead of
> finish_http_pack_request(), currently being called from a separate
> http_fetch process) should call index-pack for the out-of-band
> packfiles, which is conceptually reasonable. This means that
> finish_http_pack_request() will need to be able to refrain from running
> index-pack itself and instead just return where the pack was downloaded.

The HTTP downloading for a packfile specified via the packfile URI
mechanism is so different from the rest of the HTTP codepaths in
nature, isn't it?  It is a straight "download a static file over the
web, and we could even afford to resume, or send multiple requests
to gain throughput" usecase, which does not exist anywhere else in
Git (eh, other than the dumb HTTP protocol nobody sane should be
using anymore).

Since we are not in the business of writing a performant HTTP
downloader, if we can update the codepath not to rely on our http.c
code, and instead spawn one of the command line tools written
specifically for the "download a single large file over HTTP"
usecase (like curl, wget or aria2c), wait for it to do its thing and
then concentrate on the processing specific to Git (like running
index-pack with various options), it would take us closer to the
"make clone resumable" dream, wouldn't it?

Thanks.


* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-11  1:41                             ` Junio C Hamano
@ 2021-03-11 17:22                               ` Jonathan Tan
  2021-03-11 21:21                                 ` Junio C Hamano
  0 siblings, 1 reply; 229+ messages in thread
From: Jonathan Tan @ 2021-03-11 17:22 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, jrnieder, nmulcahey

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > This probably means that fetch-pack.c itself (instead of
> > finish_http_pack_request(), currently being called from a separate
> > http_fetch process) should call index-pack for the out-of-band
> > packfiles, which is conceptually reasonable. This means that
> > finish_http_pack_request() will need to be able to refrain from running
> > index-pack itself and instead just return where the pack was downloaded.
> 
> The HTTP downloading for a packfile specified via the packfile URI
> mechanism is so different from the rest of the HTTP codepaths in
> nature, isn't it?  It is a straight "download a static file over the
> web, and we could even afford to resume, or send multiple requests
> to gain throughput" usecase, which does not exist anywhere else in
> Git (eh, other than the dumb HTTP protocol nobody sane should be
> using anymore).

Yes - and I also noticed that finish_http_pack_request() is also used in
http-push.c, but I'm not familiar with that.

> Since we are not in the business of writing a performant HTTP
> downloader, if we can update the codepath not to rely on our http.c
> code, and instead spawn one of the command line tools written
> specifically for the "download a single large file over HTTP"
> usecase (like curl, wget or aria2c), wait for it to do its thing and
> then concentrate on the processing specific to Git (like running
> index-pack with various options), it would take us closer to the
> "make clone resumable" dream, wouldn't it?
> 
> Thanks.

We would have to figure out how to communicate any Git HTTP config
variables to curl/wget etc. (and also declare a dependency on such a
tool), but that could be done.


* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-11 17:22                               ` Jonathan Tan
@ 2021-03-11 21:21                                 ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-11 21:21 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Jonathan Tan <jonathantanmy@google.com> writes:

>> Since we are not in the business of writing a performant HTTP
>> downloader, if we can update the codepath not to rely on our http.c
>> code, and instead spawn one of the command line tools written
>> specifically for the "download a single large file over HTTP"
>> usecase (like curl, wget or aria2c), wait for it to do its thing and
>> then concentrate on the processing specific to Git (like running
>> index-pack with various options), it would take us closer to the
>> "make clone resumable" dream, wouldn't it?
>> 
>> Thanks.
>
> We would have to figure out how to communicate any Git HTTP config
> variables to curl/wget etc. (and also declare a dependency on such a
> tool), but that could be done.

Sure, and we do not have to go all the way there in a single step.

We'd likely need to ship with a basic "download from this URL and
store it in this specified temporary file" (or "to this fd") and use
it as the default downloader.  We just need to design the interface
to that downloader (i.e. which we want to make replaceable) to be
not too intimate with the details of the side that spawns the
downloader (i.e. git and git-fetch), and other people can write
replacements as thin wrappers around curl/wget etc. and contribute
them to us.
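
As a rough illustration of how narrow that interface could be, the
sketch below reduces a download to "URL in, destination path out" and
builds the argument vector for one possible replacement based on the
curl command line tool.  The struct download_request and the function
curl_argv() are hypothetical names, not existing Git APIs; the curl
options used (--fail, --location, --output, --url) are real, but
nothing here forwards Git's HTTP configuration, which would still
need to be worked out.

```c
#include <stddef.h>

/* Hypothetical request: everything the downloader needs to know. */
struct download_request {
	const char *url;        /* where to fetch the packfile from */
	const char *dest_path;  /* temporary file to store it in */
};

/*
 * Fill argv (capacity cap, including the terminating NULL) with a
 * curl invocation for req.  Returns the number of arguments written,
 * or -1 if the request is incomplete or argv is too small.
 */
static int curl_argv(const struct download_request *req,
		     const char **argv, size_t cap)
{
	size_t n = 0;

	if (!req || !req->url || !req->dest_path || cap < 8)
		return -1;

	argv[n++] = "curl";
	argv[n++] = "--fail";      /* report HTTP errors via exit status */
	argv[n++] = "--location";  /* follow redirects */
	argv[n++] = "--output";
	argv[n++] = req->dest_path;
	argv[n++] = "--url";
	argv[n++] = req->url;
	argv[n] = NULL;

	return (int)n;
}
```

The spawning side only ever sees the request and an exit status, so a
wget or aria2c wrapper could be swapped in by providing a different
function that fills argv.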

Thanks.


* [PATCH v4 00/22] fsck: API improvements
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
  2021-03-07 23:04                 ` Junio C Hamano
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 19:35                   ` Derrick Stolee
                                     ` (20 more replies)
  2021-03-16 16:17                 ` [PATCH v4 01/22] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
                                   ` (21 subsequent siblings)
  23 siblings, 21 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

A re-send of a rebased v3, which I sent at:
http://lore.kernel.org/git/20210306110439.27694-1-avarab@gmail.com

As seen in the range-diff, there are no changes since v3. I'm just
sending this as a post-release bump, per
https://lore.kernel.org/git/xmqqy2etczqi.fsf@gitster.g/

Ævar Arnfjörð Bjarmason (22):
  fsck.h: update FSCK_OPTIONS_* for object_name
  fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
  fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT}
  fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro
  fsck.h: indent arguments to of fsck_set_msg_type
  fsck.h: use "enum object_type" instead of "int"
  fsck.c: rename variables in fsck_set_msg_type() for less confusion
  fsck.c: move definition of msg_id into append_msg_id()
  fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
  fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  fsck.h: re-order and re-assign "enum fsck_msg_type"
  fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  fsck.c: undefine temporary STR macro after use
  fsck.c: give "FOREACH_MSG_ID" a more specific name
  fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
  fsck.c: pass along the fsck_msg_id in the fsck_error callback
  fsck.c: add an fsck_set_msg_type() API that takes enums
  fsck.c: move gitmodules_{found,done} into fsck_options
  fetch-pack: don't needlessly copy fsck_options
  fetch-pack: use file-scope static struct for fsck_options
  fetch-pack: use new fsck API to printing dangling submodules

 Makefile                 |   1 +
 builtin/fsck.c           |   7 +-
 builtin/index-pack.c     |  30 ++-----
 builtin/mktag.c          |   7 +-
 builtin/unpack-objects.c |   3 +-
 fetch-pack.c             |   6 +-
 fsck-cb.c                |  16 ++++
 fsck.c                   | 175 ++++++++++++---------------------------
 fsck.h                   | 132 ++++++++++++++++++++++++++---
 9 files changed, 211 insertions(+), 166 deletions(-)
 create mode 100644 fsck-cb.c

Range-diff:
 1:  9d809466bd =  1:  9cd942b526 fsck.h: update FSCK_OPTIONS_* for object_name
 2:  33e8b6d654 =  2:  d67966b838 fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
 3:  c23f7ce9e4 =  3:  211472e0c5 fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT}
 4:  5dde68df6c =  4:  70afee988d fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro
 5:  7ae35a6e9d =  5:  1337d53352 fsck.h: indent arguments to of fsck_set_msg_type
 6:  dfb5f754b3 =  6:  e4ef107bb4 fsck.h: use "enum object_type" instead of "int"
 7:  fd58ec73c6 =  7:  20bac3207e fsck.c: rename variables in fsck_set_msg_type() for less confusion
 8:  48cb4d3bb7 =  8:  09c3bba9e9 fsck.c: move definition of msg_id into append_msg_id()
 9:  2c80ad3203 =  9:  8067df53a2 fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
10:  92dfbdfb62 = 10:  bdf5e13f3d fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
11:  c1c476af69 = 11:  b03caa237f fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
12:  d55587719a = 12:  7b1d13b4cc fsck.h: re-order and re-assign "enum fsck_msg_type"
13:  32828d1c78 = 13:  a8e4ca7b19 fsck.c: call parse_msg_type() early in fsck_set_msg_type()
14:  5c62066235 = 14:  214c375a20 fsck.c: undefine temporary STR macro after use
15:  f8e50fbf7d = 15:  19a2499a80 fsck.c: give "FOREACH_MSG_ID" a more specific name
16:  cd74dee876 = 16:  6e1a7b6274 fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
17:  234e287d08 = 17:  42af4e164c fsck.c: pass along the fsck_msg_id in the fsck_error callback
18:  8049dc0739 = 18:  fa47f473a8 fsck.c: add an fsck_set_msg_type() API that takes enums
19:  4224a29d15 = 19:  4cc3880cc4 fsck.c: move gitmodules_{found,done} into fsck_options
20:  40b1346812 = 20:  fd219d318a fetch-pack: don't needlessly copy fsck_options
21:  8e418abfbd = 21:  e4cd8c250e fetch-pack: use file-scope static struct for fsck_options
22:  113de190f7 = 22:  fdbc3c304c fetch-pack: use new fsck API to printing dangling submodules
-- 
2.31.0.260.g719c683c1d



* [PATCH v4 01/22] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
  2021-03-07 23:04                 ` Junio C Hamano
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:35                   ` Junio C Hamano
  2021-03-19 14:43                   ` Johannes Schindelin
  2021-03-16 16:17                 ` [PATCH v4 02/22] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
                                   ` (20 subsequent siblings)
  23 siblings, 2 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Add the object_names member to the initialization macros. This was
omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
go, 2016-07-17) when the field was added.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fsck.h b/fsck.h
index 733378f126..2274843ba0 100644
--- a/fsck.h
+++ b/fsck.h
@@ -43,8 +43,8 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.31.0.260.g719c683c1d



* [PATCH v4 02/22] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (2 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 01/22] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 18:59                   ` Derrick Stolee
  2021-03-17 18:38                   ` Junio C Hamano
  2021-03-16 16:17                 ` [PATCH v4 03/22] fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
                                   ` (19 subsequent siblings)
  23 siblings, 2 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/fsck.h b/fsck.h
index 2274843ba0..40f3cb3f64 100644
--- a/fsck.h
+++ b/fsck.h
@@ -43,8 +43,22 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
+#define FSCK_OPTIONS_DEFAULT { \
+	.walk = NULL, \
+	.error_func = fsck_error_function, \
+	.strict = 0, \
+	.msg_type = NULL, \
+	.skiplist = OIDSET_INIT, \
+	.object_names = NULL, \
+}
+#define FSCK_OPTIONS_STRICT { \
+	.walk = NULL, \
+	.error_func = fsck_error_function, \
+	.strict = 1, \
+	.msg_type = NULL, \
+	.skiplist = OIDSET_INIT, \
+	.object_names = NULL, \
+}
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.31.0.260.g719c683c1d
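
For illustration only, the difference between the two initializer
styles can be seen on a cut-down stand-in struct.  The names
demo_options, DEMO_POSITIONAL and DEMO_DESIGNATED are hypothetical
and not part of fsck.h.  A positional initializer depends on member
order and leaves trailing members implicitly zeroed, while a
designated one names each member and may list them in any order.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct fsck_options. */
struct demo_options {
	void *walk;
	int strict;
	void *object_names;
};

/*
 * Positional: values are matched to members by position, and
 * object_names, which is not listed, is implicitly set to NULL.
 */
#define DEMO_POSITIONAL { NULL, 1 }

/* Designated: each member is named, so the order does not matter. */
#define DEMO_DESIGNATED { \
	.strict = 1, \
	.walk = NULL, \
	.object_names = NULL, \
}
```

Both macros produce the same struct contents; the designated form
keeps working if members are later reordered or inserted.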



* [PATCH v4 03/22] fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT}
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (3 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 02/22] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 04/22] fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro Ævar Arnfjörð Bjarmason
                                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Use a temporary macro to define what FSCK_OPTIONS_{DEFAULT,STRICT}
have in common, and define the two in terms of that macro.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/fsck.h b/fsck.h
index 40f3cb3f64..ea3a907ec3 100644
--- a/fsck.h
+++ b/fsck.h
@@ -43,22 +43,14 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { \
+#define FSCK_OPTIONS_COMMON \
 	.walk = NULL, \
 	.error_func = fsck_error_function, \
-	.strict = 0, \
 	.msg_type = NULL, \
 	.skiplist = OIDSET_INIT, \
-	.object_names = NULL, \
-}
-#define FSCK_OPTIONS_STRICT { \
-	.walk = NULL, \
-	.error_func = fsck_error_function, \
-	.strict = 1, \
-	.msg_type = NULL, \
-	.skiplist = OIDSET_INIT, \
-	.object_names = NULL, \
-}
+	.object_names = NULL,
+#define FSCK_OPTIONS_DEFAULT	{ .strict = 0, FSCK_OPTIONS_COMMON }
+#define FSCK_OPTIONS_STRICT	{ .strict = 1, FSCK_OPTIONS_COMMON }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.31.0.260.g719c683c1d



* [PATCH v4 04/22] fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (4 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 03/22] fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 19:06                   ` Derrick Stolee
  2021-03-16 16:17                 ` [PATCH v4 05/22] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
                                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro for those that would like
to use FSCK_OPTIONS_COMMON in their own initialization, but supply
their own error functions.

Nothing is being changed to use this yet, but in some subsequent
commits we'll make use of this macro.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fsck.h b/fsck.h
index ea3a907ec3..dc35924cbf 100644
--- a/fsck.h
+++ b/fsck.h
@@ -45,12 +45,15 @@ struct fsck_options {
 
 #define FSCK_OPTIONS_COMMON \
 	.walk = NULL, \
-	.error_func = fsck_error_function, \
 	.msg_type = NULL, \
 	.skiplist = OIDSET_INIT, \
 	.object_names = NULL,
-#define FSCK_OPTIONS_DEFAULT	{ .strict = 0, FSCK_OPTIONS_COMMON }
-#define FSCK_OPTIONS_STRICT	{ .strict = 1, FSCK_OPTIONS_COMMON }
+#define FSCK_OPTIONS_COMMON_ERROR_FUNC \
+	FSCK_OPTIONS_COMMON \
+	.error_func = fsck_error_function
+
+#define FSCK_OPTIONS_DEFAULT	{ .strict = 0, FSCK_OPTIONS_COMMON_ERROR_FUNC }
+#define FSCK_OPTIONS_STRICT	{ .strict = 1, FSCK_OPTIONS_COMMON_ERROR_FUNC }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.31.0.260.g719c683c1d
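
A sketch of how the split could be used by a caller that supplies its
own error function, written against a cut-down stand-in struct rather
than the real fsck.h.  Every DEMO_* and demo_* name is hypothetical.
Note that the common macro ends in a comma, so the caller's
.error_func follows it directly.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct fsck_options. */
struct demo_options {
	void *walk;
	int (*error_func)(const char *message);
	int strict;
	void *msg_type;
};

/* Stand-in for the default error function: report an error. */
static int demo_default_error(const char *message)
{
	(void)message;
	return 1;
}

/* A caller-supplied replacement that swallows the message. */
static int demo_quiet_error(const char *message)
{
	(void)message;
	return 0;
}

/* Members shared by every initializer; note the trailing comma. */
#define DEMO_OPTIONS_COMMON \
	.walk = NULL, \
	.msg_type = NULL,
/* The common members plus the default error function. */
#define DEMO_OPTIONS_COMMON_ERROR_FUNC \
	DEMO_OPTIONS_COMMON \
	.error_func = demo_default_error

#define DEMO_OPTIONS_STRICT { .strict = 1, DEMO_OPTIONS_COMMON_ERROR_FUNC }

/* A caller reusing only the common part with its own error function. */
#define DEMO_OPTIONS_QUIET { \
	.strict = 1, \
	DEMO_OPTIONS_COMMON \
	.error_func = demo_quiet_error, \
}
```

Keeping .error_func out of the common macro avoids initializing the
same member twice when a caller wants a different callback.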



* [PATCH v4 05/22] fsck.h: indent arguments to of fsck_set_msg_type
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (5 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 04/22] fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 06/22] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
                                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fsck.h b/fsck.h
index dc35924cbf..5e488cef6b 100644
--- a/fsck.h
+++ b/fsck.h
@@ -11,7 +11,7 @@ struct fsck_options;
 struct object;
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type);
+		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
 int is_valid_msg_type(const char *msg_id, const char *msg_type);
 
-- 
2.31.0.260.g719c683c1d



* [PATCH v4 06/22] fsck.h: use "enum object_type" instead of "int"
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (6 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 05/22] fsck.h: indent arguments of fsck_set_msg_type Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 07/22] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
                                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the fsck_walk_func to use an "enum object_type" instead of an
"int" type. The types are compatible, and ever since this was added in
355885d5315 (add generic, type aware object chain walker, 2008-02-25)
we've used entries from object_type (OBJ_BLOB etc.).

So this doesn't really change anything as far as the generated code is
concerned, it just gives the compiler more information and makes this
easier to read.
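
As a standalone illustration (not code from this patch), here is a
minimal sketch of a walk callback typed with an enum instead of "int".
All "demo_" names and the sample entries are hypothetical stand-ins,
not git's actual API.

```c
#include <stddef.h>

/* Hypothetical stand-in for git's "enum object_type". */
enum demo_object_type {
	DEMO_OBJ_COMMIT = 1,
	DEMO_OBJ_TREE,
	DEMO_OBJ_BLOB,
	DEMO_OBJ_TAG
};

/* Walk callback typed with the enum rather than a bare "int". */
typedef int (*demo_walk_func)(const char *name, enum demo_object_type type,
			      void *data);

/*
 * Count blobs. Because "type" is an enum, a compiler can warn when a
 * switch misses an enumerator (e.g. GCC/Clang with -Wswitch).
 */
int count_blobs(const char *name, enum demo_object_type type, void *data)
{
	int *count = data;

	(void)name; /* unused in this sketch */
	switch (type) {
	case DEMO_OBJ_BLOB:
		(*count)++;
		return 0;
	case DEMO_OBJ_COMMIT:
	case DEMO_OBJ_TREE:
	case DEMO_OBJ_TAG:
		return 0;
	}
	return -1; /* value outside the enum */
}

/* Call "fn" on a few made-up entries; stop at the first non-zero result. */
int demo_walk(demo_walk_func fn, void *data)
{
	static const struct {
		const char *name;
		enum demo_object_type type;
	} entries[] = {
		{ ".gitmodules", DEMO_OBJ_BLOB },
		{ "src", DEMO_OBJ_TREE },
		{ "README", DEMO_OBJ_BLOB },
	};
	size_t i;

	for (i = 0; i < sizeof(entries) / sizeof(entries[0]); i++) {
		int ret = fn(entries[i].name, entries[i].type, data);
		if (ret)
			return ret;
	}
	return 0;
}
```

The generated code is the same as with "int"; the benefit is only in
what the compiler and the reader can infer from the signature.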

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c           | 3 ++-
 builtin/index-pack.c     | 3 ++-
 builtin/unpack-objects.c | 3 ++-
 fsck.h                   | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c7..68f0329e69 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -197,7 +197,8 @@ static int traverse_reachable(void)
 	return !!result;
 }
 
-static int mark_used(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_used(struct object *obj, enum object_type object_type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return 1;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index bad5748807..69f24fe9f7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -212,7 +212,8 @@ static void cleanup_thread(void)
 	free(thread_data);
 }
 
-static int mark_link(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_link(struct object *obj, enum object_type type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return -1;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index dd4a75e030..ca54fd1668 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -187,7 +187,8 @@ static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
  * that have reachability requirements and calls this function.
  * Verify its reachability and validity recursively and write it out.
  */
-static int check_object(struct object *obj, int type, void *data, struct fsck_options *options)
+static int check_object(struct object *obj, enum object_type type,
+			void *data, struct fsck_options *options)
 {
 	struct obj_buffer *obj_buf;
 
diff --git a/fsck.h b/fsck.h
index 5e488cef6b..f67edd8f1f 100644
--- a/fsck.h
+++ b/fsck.h
@@ -23,7 +23,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type);
  *     <0	error signaled and abort
  *     >0	error signaled and do not abort
  */
-typedef int (*fsck_walk_func)(struct object *obj, int type, void *data, struct fsck_options *options);
+typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
+			      void *data, struct fsck_options *options);
 
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 07/22] fsck.c: rename variables in fsck_set_msg_type() for less confusion
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (7 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 06/22] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 08/22] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
                                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename variables in a function added in 0282f4dced0 (fsck: offer a
function to demote fsck errors to warnings, 2015-06-22).

It was needlessly confusing that it took a "msg_type" argument, but
then later declared another "msg_type" of a different type.

Let's rename that to "severity", and rename "id" to "msg_id" and
"msg_id" to "msg_id_str" etc. This will make a follow-up change
smaller.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fsck.c b/fsck.c
index e3030f3b35..0a9ac9ca07 100644
--- a/fsck.c
+++ b/fsck.c
@@ -203,27 +203,27 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 }
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type)
+		const char *msg_id_str, const char *msg_type_str)
 {
-	int id = parse_msg_id(msg_id), type;
+	int msg_id = parse_msg_id(msg_id_str), msg_type;
 
-	if (id < 0)
-		die("Unhandled message id: %s", msg_id);
-	type = parse_msg_type(msg_type);
+	if (msg_id < 0)
+		die("Unhandled message id: %s", msg_id_str);
+	msg_type = parse_msg_type(msg_type_str);
 
-	if (type != FSCK_ERROR && msg_id_info[id].msg_type == FSCK_FATAL)
-		die("Cannot demote %s to %s", msg_id, msg_type);
+	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
+		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
 	if (!options->msg_type) {
 		int i;
-		int *msg_type;
-		ALLOC_ARRAY(msg_type, FSCK_MSG_MAX);
+		int *severity;
+		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
-			msg_type[i] = fsck_msg_type(i, options);
-		options->msg_type = msg_type;
+			severity[i] = fsck_msg_type(i, options);
+		options->msg_type = severity;
 	}
 
-	options->msg_type[id] = type;
+	options->msg_type[msg_id] = msg_type;
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 08/22] fsck.c: move definition of msg_id into append_msg_id()
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (8 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 07/22] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:45                   ` Junio C Hamano
  2021-03-16 16:17                 ` [PATCH v4 09/22] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
                                   ` (13 subsequent siblings)
  23 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Refactor code added in 71ab8fa840f (fsck: report the ID of the
error/warning, 2015-06-22) to resolve the msg_id to a string in the
function that wants it, instead of doing it in report().
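
For readers unfamiliar with append_msg_id(), the loop visible in the
diff context turns an id_string such as "GITMODULES_MISSING" into the
camelCased form seen in fsck output. Below is a minimal standalone
sketch of that conversion, assuming a plain char buffer instead of a
strbuf; demo_camelcase_msg_id() is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Write the camelCased form of an UPPER_SNAKE id into "out".
 * "out" holds at most "outsz" bytes and is NUL-terminated
 * whenever outsz > 0; longer input is truncated.
 */
void demo_camelcase_msg_id(const char *id_str, char *out, size_t outsz)
{
	size_t n = 0;

	if (!outsz)
		return;
	while (*id_str && n + 1 < outsz) {
		char c = *id_str++;

		if (c != '_') {
			out[n++] = (char)tolower((unsigned char)c);
		} else {
			/* an underscore must be followed by a character */
			assert(*id_str);
			/* copy it unchanged, i.e. keep it upper-case */
			out[n++] = *id_str++;
		}
	}
	out[n] = '\0';
}
```

With this sketch, "GITMODULES_MISSING" becomes "gitmodulesMissing".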

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index 0a9ac9ca07..b977493f57 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,8 +264,9 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, const char *msg_id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
 {
+	const char *msg_id = msg_id_info[id].id_string;
 	for (;;) {
 		char c = *(msg_id)++;
 
@@ -308,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, msg_id_info[id].id_string);
+	append_msg_id(&sb, id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 09/22] fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (9 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 08/22] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 10/22] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
                                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename the remaining variables of type fsck_msg_id from "id" to
"msg_id". This change is relatively small, and is worth the churn for
a later change where we have different IDs in the "report" function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fsck.c b/fsck.c
index b977493f57..6b72ddaa51 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,19 +264,19 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id msg_id)
 {
-	const char *msg_id = msg_id_info[id].id_string;
+	const char *msg_id_str = msg_id_info[msg_id].id_string;
 	for (;;) {
-		char c = *(msg_id)++;
+		char c = *(msg_id_str)++;
 
 		if (!c)
 			break;
 		if (c != '_')
 			strbuf_addch(sb, tolower(c));
 		else {
-			assert(*msg_id);
-			strbuf_addch(sb, *(msg_id)++);
+			assert(*msg_id_str);
+			strbuf_addch(sb, *(msg_id_str)++);
 		}
 	}
 
@@ -292,11 +292,11 @@ static int object_on_skiplist(struct fsck_options *opts,
 __attribute__((format (printf, 5, 6)))
 static int report(struct fsck_options *options,
 		  const struct object_id *oid, enum object_type object_type,
-		  enum fsck_msg_id id, const char *fmt, ...)
+		  enum fsck_msg_id msg_id, const char *fmt, ...)
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(id, options), result;
+	int msg_type = fsck_msg_type(msg_id, options), result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -309,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, id);
+	append_msg_id(&sb, msg_id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 10/22] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (10 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 09/22] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 11/22] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
                                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Refactor "if options->msg_type" and other code added in
0282f4dced0 (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) to reduce the scope of the "int msg_type" variable.

This is in preparation for changing its type in a subsequent commit;
using it only in the "!options->msg_type" scope makes that change
smaller.

This also brings the code in line with the fsck_set_msg_type()
function (also added in 0282f4dced0), which does a similar check for
"!options->msg_type". Another minor benefit is getting rid of the
style violation of not having braces for the body of the "if".
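
As a standalone illustration of the resulting control flow (not code
from this patch), here is a minimal sketch of the lookup: per-message
overrides win when present, otherwise the built-in default is used and
"strict" promotes warnings to errors. All "demo_" names and the
two-entry default table are hypothetical.

```c
#include <stddef.h>

enum demo_msg_type { DEMO_ERROR = 1, DEMO_WARN, DEMO_IGNORE };
enum demo_msg_id { DEMO_MSG_BAD_DATE, DEMO_MSG_HAS_DOT, DEMO_MSG_MAX };

/* Built-in defaults, indexed by message id. */
static const enum demo_msg_type demo_defaults[DEMO_MSG_MAX] = {
	DEMO_ERROR, /* DEMO_MSG_BAD_DATE */
	DEMO_WARN,  /* DEMO_MSG_HAS_DOT */
};

struct demo_options {
	unsigned strict:1;
	enum demo_msg_type *msg_type; /* NULL until something is overridden */
};

enum demo_msg_type demo_lookup_msg_type(enum demo_msg_id msg_id,
					const struct demo_options *options)
{
	if (!options->msg_type) {
		/* the temporary only lives in this scope */
		enum demo_msg_type msg_type = demo_defaults[msg_id];

		if (options->strict && msg_type == DEMO_WARN)
			msg_type = DEMO_ERROR;
		return msg_type;
	}

	return options->msg_type[msg_id];
}
```

Note that once an override array exists, "strict" is no longer
consulted at lookup time in this sketch, mirroring the early return
in the diff below.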

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fsck.c b/fsck.c
index 6b72ddaa51..0988ab6579 100644
--- a/fsck.c
+++ b/fsck.c
@@ -167,19 +167,17 @@ void list_config_fsck_msg_ids(struct string_list *list, const char *prefix)
 static int fsck_msg_type(enum fsck_msg_id msg_id,
 	struct fsck_options *options)
 {
-	int msg_type;
-
 	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
 
-	if (options->msg_type)
-		msg_type = options->msg_type[msg_id];
-	else {
-		msg_type = msg_id_info[msg_id].msg_type;
+	if (!options->msg_type) {
+		int msg_type = msg_id_info[msg_id].msg_type;
+
 		if (options->strict && msg_type == FSCK_WARN)
 			msg_type = FSCK_ERROR;
+		return msg_type;
 	}
 
-	return msg_type;
+	return options->msg_type[msg_id];
 }
 
 static int parse_msg_type(const char *str)
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 11/22] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (11 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 10/22] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:48                   ` Junio C Hamano
  2021-03-16 16:17                 ` [PATCH v4 12/22] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
                                   ` (10 subsequent siblings)
  23 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
fsck_msg_type enum.

These defines were originally introduced in:

 - ba002f3b28a (builtin-fsck: move common object checking code to
   fsck.c, 2008-02-25)
 - f50c4407305 (fsck: disallow demoting grave fsck errors to warnings,
   2015-06-22)
 - efaba7cc77f (fsck: optionally ignore specific fsck issues
   completely, 2015-06-22)
 - f27d05b1704 (fsck: allow upgrading fsck warnings to errors,
   2015-06-22)

These were defined in two different places because we use
FSCK_{IGNORE,INFO,FATAL} only in fsck.c, but FSCK_{ERROR,WARN} are
used by external callbacks.

Untangling that would take some more work, since we expose the new
"enum fsck_msg_type" to both. Similar to "enum object_type" it's not
worth structuring the API in such a way that only those who need
FSCK_{ERROR,WARN} pass around a different type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c       |  2 +-
 builtin/index-pack.c |  3 ++-
 builtin/mktag.c      |  3 ++-
 fsck.c               | 21 ++++++++++-----------
 fsck.h               | 16 ++++++++++------
 5 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 68f0329e69..d6d745dc70 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -89,7 +89,7 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   int msg_type, const char *message)
+			   enum fsck_msg_type msg_type, const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 69f24fe9f7..56b8efaa89 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1716,7 +1716,8 @@ static void show_pack_info(int stat_only)
 static int print_dangling_gitmodules(struct fsck_options *o,
 				     const struct object_id *oid,
 				     enum object_type object_type,
-				     int msg_type, const char *message)
+				     enum fsck_msg_type msg_type,
+				     const char *message)
 {
 	/*
 	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 41a399a69e..1834394a9b 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -22,7 +22,8 @@ static int mktag_config(const char *var, const char *value, void *cb)
 static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
-				 int msg_type, const char *message)
+				 enum fsck_msg_type msg_type,
+				 const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/fsck.c b/fsck.c
index 0988ab6579..fb7d071bbf 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,9 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FSCK_FATAL -1
-#define FSCK_INFO -2
-
 #define FOREACH_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
@@ -97,7 +94,7 @@ static struct {
 	const char *id_string;
 	const char *downcased;
 	const char *camelcased;
-	int msg_type;
+	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
 	FOREACH_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
@@ -164,13 +161,13 @@ void list_config_fsck_msg_ids(struct string_list *list, const char *prefix)
 		list_config_item(list, prefix, msg_id_info[i].camelcased);
 }
 
-static int fsck_msg_type(enum fsck_msg_id msg_id,
+static enum fsck_msg_type fsck_msg_type(enum fsck_msg_id msg_id,
 	struct fsck_options *options)
 {
 	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
 
 	if (!options->msg_type) {
-		int msg_type = msg_id_info[msg_id].msg_type;
+		enum fsck_msg_type msg_type = msg_id_info[msg_id].msg_type;
 
 		if (options->strict && msg_type == FSCK_WARN)
 			msg_type = FSCK_ERROR;
@@ -180,7 +177,7 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 	return options->msg_type[msg_id];
 }
 
-static int parse_msg_type(const char *str)
+static enum fsck_msg_type parse_msg_type(const char *str)
 {
 	if (!strcmp(str, "error"))
 		return FSCK_ERROR;
@@ -203,7 +200,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
-	int msg_id = parse_msg_id(msg_id_str), msg_type;
+	int msg_id = parse_msg_id(msg_id_str);
+	enum fsck_msg_type msg_type;
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
@@ -214,7 +212,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 
 	if (!options->msg_type) {
 		int i;
-		int *severity;
+		enum fsck_msg_type *severity;
 		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
 			severity[i] = fsck_msg_type(i, options);
@@ -294,7 +292,8 @@ static int report(struct fsck_options *options,
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(msg_id, options), result;
+	enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);
+	int result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -1265,7 +1264,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			int msg_type, const char *message)
+			enum fsck_msg_type msg_type, const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index f67edd8f1f..2ecc15eee7 100644
--- a/fsck.h
+++ b/fsck.h
@@ -3,9 +3,13 @@
 
 #include "oidset.h"
 
-#define FSCK_ERROR 1
-#define FSCK_WARN 2
-#define FSCK_IGNORE 3
+enum fsck_msg_type {
+	FSCK_INFO  = -2,
+	FSCK_FATAL = -1,
+	FSCK_ERROR = 1,
+	FSCK_WARN,
+	FSCK_IGNORE
+};
 
 struct fsck_options;
 struct object;
@@ -29,17 +33,17 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  int msg_type, const char *message);
+			  enum fsck_msg_type msg_type, const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			int msg_type, const char *message);
+			enum fsck_msg_type msg_type, const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
 	fsck_error error_func;
 	unsigned strict:1;
-	int *msg_type;
+	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
 	kh_oid_map_t *object_names;
 };
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 12/22] fsck.h: re-order and re-assign "enum fsck_msg_type"
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (12 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 11/22] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:50                   ` Junio C Hamano
  2021-03-16 16:17                 ` [PATCH v4 13/22] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
                                   ` (9 subsequent siblings)
  23 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the values in the "enum fsck_msg_type" from being manually
assigned to using default C enum values.

This means we end up with a FSCK_IGNORE=0, which was previously
defined as "3".

I'm confident that nothing relies on these values, since we always
compare them explicitly. Let's not omit "0" so it won't be assumed that
we're using these as a boolean somewhere.

This also allows us to re-structure the fields to mark which are
"private" vs. "public". See the preceding commit for a rationale for
not simply splitting these into two enums, namely that this is used
for both the private and public fsck API.
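
As a standalone illustration (not code from this patch), the sketch
below shows how default C enum values are assigned for this ordering:
the first enumerator is 0 and each subsequent one is the previous plus
one. The "DEMO_" names and demo_msg_type_name() are hypothetical.

```c
#include <stddef.h>

enum demo_fsck_msg_type {
	/* "private" group */
	DEMO_FSCK_IGNORE, /* 0 */
	DEMO_FSCK_INFO,   /* 1 */
	DEMO_FSCK_FATAL,  /* 2 */
	/* "public" group */
	DEMO_FSCK_ERROR,  /* 3 */
	DEMO_FSCK_WARN    /* 4 */
};

/* Compare explicitly against enumerators; never rely on truthiness. */
const char *demo_msg_type_name(enum demo_fsck_msg_type type)
{
	switch (type) {
	case DEMO_FSCK_IGNORE:
		return "ignore";
	case DEMO_FSCK_INFO:
		return "info";
	case DEMO_FSCK_FATAL:
		return "fatal";
	case DEMO_FSCK_ERROR:
		return "error";
	case DEMO_FSCK_WARN:
		return "warn";
	}
	return NULL; /* value outside the enum */
}
```

Code written like "if (msg_type)" would silently change meaning under
such a renumbering, which is why explicit comparisons matter here.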

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fsck.h b/fsck.h
index 2ecc15eee7..fce9981a0c 100644
--- a/fsck.h
+++ b/fsck.h
@@ -4,11 +4,13 @@
 #include "oidset.h"
 
 enum fsck_msg_type {
-	FSCK_INFO  = -2,
-	FSCK_FATAL = -1,
-	FSCK_ERROR = 1,
+	/* for internal use only */
+	FSCK_IGNORE,
+	FSCK_INFO,
+	FSCK_FATAL,
+	/* "public", fed to e.g. error_func callbacks */
+	FSCK_ERROR,
 	FSCK_WARN,
-	FSCK_IGNORE
 };
 
 struct fsck_options;
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 13/22] fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (13 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 12/22] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 14/22] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
                                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

There's no reason to defer the calling of parse_msg_type() until after
we've checked whether "msg_id < 0". This is not a hot codepath, and
parse_msg_type() itself may die on invalid input.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index fb7d071bbf..2ccf1a2f0f 100644
--- a/fsck.c
+++ b/fsck.c
@@ -201,11 +201,10 @@ void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
 	int msg_id = parse_msg_id(msg_id_str);
-	enum fsck_msg_type msg_type;
+	enum fsck_msg_type msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
-	msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 14/22] fsck.c: undefine temporary STR macro after use
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (14 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 13/22] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:57                   ` Junio C Hamano
  2021-03-16 16:17                 ` [PATCH v4 15/22] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
                                   ` (7 subsequent siblings)
  23 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

In f417eed8cde (fsck: provide a function to parse fsck message IDs,
2015-06-22) the "STR" macro was introduced, but that short macro name
was not undefined after use as was done earlier in the same series for
the MSG_ID macro in c99ba492f1c (fsck: introduce identifiers for fsck
messages, 2015-06-22).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fsck.c b/fsck.c
index 2ccf1a2f0f..f4c924ed04 100644
--- a/fsck.c
+++ b/fsck.c
@@ -100,6 +100,7 @@ static struct {
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
+#undef STR
 
 static void prepare_msg_ids(void)
 {
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 15/22] fsck.c: give "FOREACH_MSG_ID" a more specific name
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (15 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 14/22] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 16/22] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
                                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename the FOREACH_MSG_ID macro to FOREACH_FSCK_MSG_ID in preparation
for moving it over to fsck.h. It's good convention to name macros
in *.h files in such a way as to clearly not clash with any other
names in other files.
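
As a standalone illustration of the "X-macro" pattern this macro
implements (not code from this patch), here is a minimal sketch with a
three-entry list. The "DEMO_" names are hypothetical; the helper macros
MSG_ID and STR are defined temporarily and undefined after use so they
cannot leak into other code.

```c
#include <stddef.h>

enum demo_severity { DEMO_FATAL, DEMO_ERROR, DEMO_WARN };

/* One list of (id, severity) pairs; FUNC decides what to generate. */
#define DEMO_FOREACH_MSG_ID(FUNC) \
	FUNC(NUL_IN_HEADER, FATAL) \
	FUNC(BAD_DATE, ERROR) \
	FUNC(HAS_DOT, WARN)

/* First expansion: the enum of message ids. */
#define MSG_ID(id, severity) DEMO_MSG_##id,
enum demo_msg_id {
	DEMO_FOREACH_MSG_ID(MSG_ID)
	DEMO_MSG_MAX
};
#undef MSG_ID

/* Second expansion: a table indexed by that enum. */
#define STR(x) #x
#define MSG_ID(id, severity) { STR(id), DEMO_##severity },
static const struct {
	const char *id_string;
	enum demo_severity severity;
} demo_msg_id_info[DEMO_MSG_MAX + 1] = {
	DEMO_FOREACH_MSG_ID(MSG_ID)
	{ NULL, DEMO_FATAL } /* sentinel entry */
};
#undef MSG_ID
#undef STR
```

Because the list macro itself would be visible to every file that
includes the header, a prefixed name lowers the chance of a clash.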

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fsck.c b/fsck.c
index f4c924ed04..6fbc56e9fa 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,7 +22,7 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_MSG_ID(FUNC) \
+#define FOREACH_FSCK_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
 	FUNC(UNTERMINATED_HEADER, FATAL) \
@@ -83,7 +83,7 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,
 enum fsck_msg_id {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	FSCK_MSG_MAX
 };
 #undef MSG_ID
@@ -96,7 +96,7 @@ static struct {
 	const char *camelcased;
 	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 16/22] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (16 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 15/22] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 17/22] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
                                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the FOREACH_FSCK_MSG_ID macro and the fsck_msg_id enum it helps
define from fsck.c to fsck.h. This is in preparation for having
non-static functions take the fsck_msg_id as an argument.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 66 ----------------------------------------------------------
 fsck.h | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 66 deletions(-)

diff --git a/fsck.c b/fsck.c
index 6fbc56e9fa..8a66168e51 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,72 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_FSCK_MSG_ID(FUNC) \
-	/* fatal errors */ \
-	FUNC(NUL_IN_HEADER, FATAL) \
-	FUNC(UNTERMINATED_HEADER, FATAL) \
-	/* errors */ \
-	FUNC(BAD_DATE, ERROR) \
-	FUNC(BAD_DATE_OVERFLOW, ERROR) \
-	FUNC(BAD_EMAIL, ERROR) \
-	FUNC(BAD_NAME, ERROR) \
-	FUNC(BAD_OBJECT_SHA1, ERROR) \
-	FUNC(BAD_PARENT_SHA1, ERROR) \
-	FUNC(BAD_TAG_OBJECT, ERROR) \
-	FUNC(BAD_TIMEZONE, ERROR) \
-	FUNC(BAD_TREE, ERROR) \
-	FUNC(BAD_TREE_SHA1, ERROR) \
-	FUNC(BAD_TYPE, ERROR) \
-	FUNC(DUPLICATE_ENTRIES, ERROR) \
-	FUNC(MISSING_AUTHOR, ERROR) \
-	FUNC(MISSING_COMMITTER, ERROR) \
-	FUNC(MISSING_EMAIL, ERROR) \
-	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_OBJECT, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_TAG, ERROR) \
-	FUNC(MISSING_TAG_ENTRY, ERROR) \
-	FUNC(MISSING_TREE, ERROR) \
-	FUNC(MISSING_TREE_OBJECT, ERROR) \
-	FUNC(MISSING_TYPE, ERROR) \
-	FUNC(MISSING_TYPE_ENTRY, ERROR) \
-	FUNC(MULTIPLE_AUTHORS, ERROR) \
-	FUNC(TREE_NOT_SORTED, ERROR) \
-	FUNC(UNKNOWN_TYPE, ERROR) \
-	FUNC(ZERO_PADDED_DATE, ERROR) \
-	FUNC(GITMODULES_MISSING, ERROR) \
-	FUNC(GITMODULES_BLOB, ERROR) \
-	FUNC(GITMODULES_LARGE, ERROR) \
-	FUNC(GITMODULES_NAME, ERROR) \
-	FUNC(GITMODULES_SYMLINK, ERROR) \
-	FUNC(GITMODULES_URL, ERROR) \
-	FUNC(GITMODULES_PATH, ERROR) \
-	FUNC(GITMODULES_UPDATE, ERROR) \
-	/* warnings */ \
-	FUNC(BAD_FILEMODE, WARN) \
-	FUNC(EMPTY_NAME, WARN) \
-	FUNC(FULL_PATHNAME, WARN) \
-	FUNC(HAS_DOT, WARN) \
-	FUNC(HAS_DOTDOT, WARN) \
-	FUNC(HAS_DOTGIT, WARN) \
-	FUNC(NULL_SHA1, WARN) \
-	FUNC(ZERO_PADDED_FILEMODE, WARN) \
-	FUNC(NUL_IN_COMMIT, WARN) \
-	/* infos (reported as warnings, but ignored by default) */ \
-	FUNC(GITMODULES_PARSE, INFO) \
-	FUNC(BAD_TAG_NAME, INFO) \
-	FUNC(MISSING_TAGGER_ENTRY, INFO) \
-	/* ignored (elevated when requested) */ \
-	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
-
-#define MSG_ID(id, msg_type) FSCK_MSG_##id,
-enum fsck_msg_id {
-	FOREACH_FSCK_MSG_ID(MSG_ID)
-	FSCK_MSG_MAX
-};
-#undef MSG_ID
-
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
diff --git a/fsck.h b/fsck.h
index fce9981a0c..c3d3b47b88 100644
--- a/fsck.h
+++ b/fsck.h
@@ -13,6 +13,72 @@ enum fsck_msg_type {
 	FSCK_WARN,
 };
 
+#define FOREACH_FSCK_MSG_ID(FUNC) \
+	/* fatal errors */ \
+	FUNC(NUL_IN_HEADER, FATAL) \
+	FUNC(UNTERMINATED_HEADER, FATAL) \
+	/* errors */ \
+	FUNC(BAD_DATE, ERROR) \
+	FUNC(BAD_DATE_OVERFLOW, ERROR) \
+	FUNC(BAD_EMAIL, ERROR) \
+	FUNC(BAD_NAME, ERROR) \
+	FUNC(BAD_OBJECT_SHA1, ERROR) \
+	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_TAG_OBJECT, ERROR) \
+	FUNC(BAD_TIMEZONE, ERROR) \
+	FUNC(BAD_TREE, ERROR) \
+	FUNC(BAD_TREE_SHA1, ERROR) \
+	FUNC(BAD_TYPE, ERROR) \
+	FUNC(DUPLICATE_ENTRIES, ERROR) \
+	FUNC(MISSING_AUTHOR, ERROR) \
+	FUNC(MISSING_COMMITTER, ERROR) \
+	FUNC(MISSING_EMAIL, ERROR) \
+	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
+	FUNC(MISSING_OBJECT, ERROR) \
+	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
+	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
+	FUNC(MISSING_TAG, ERROR) \
+	FUNC(MISSING_TAG_ENTRY, ERROR) \
+	FUNC(MISSING_TREE, ERROR) \
+	FUNC(MISSING_TREE_OBJECT, ERROR) \
+	FUNC(MISSING_TYPE, ERROR) \
+	FUNC(MISSING_TYPE_ENTRY, ERROR) \
+	FUNC(MULTIPLE_AUTHORS, ERROR) \
+	FUNC(TREE_NOT_SORTED, ERROR) \
+	FUNC(UNKNOWN_TYPE, ERROR) \
+	FUNC(ZERO_PADDED_DATE, ERROR) \
+	FUNC(GITMODULES_MISSING, ERROR) \
+	FUNC(GITMODULES_BLOB, ERROR) \
+	FUNC(GITMODULES_LARGE, ERROR) \
+	FUNC(GITMODULES_NAME, ERROR) \
+	FUNC(GITMODULES_SYMLINK, ERROR) \
+	FUNC(GITMODULES_URL, ERROR) \
+	FUNC(GITMODULES_PATH, ERROR) \
+	FUNC(GITMODULES_UPDATE, ERROR) \
+	/* warnings */ \
+	FUNC(BAD_FILEMODE, WARN) \
+	FUNC(EMPTY_NAME, WARN) \
+	FUNC(FULL_PATHNAME, WARN) \
+	FUNC(HAS_DOT, WARN) \
+	FUNC(HAS_DOTDOT, WARN) \
+	FUNC(HAS_DOTGIT, WARN) \
+	FUNC(NULL_SHA1, WARN) \
+	FUNC(ZERO_PADDED_FILEMODE, WARN) \
+	FUNC(NUL_IN_COMMIT, WARN) \
+	/* infos (reported as warnings, but ignored by default) */ \
+	FUNC(GITMODULES_PARSE, INFO) \
+	FUNC(BAD_TAG_NAME, INFO) \
+	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	/* ignored (elevated when requested) */ \
+	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
+
+#define MSG_ID(id, msg_type) FSCK_MSG_##id,
+enum fsck_msg_id {
+	FOREACH_FSCK_MSG_ID(MSG_ID)
+	FSCK_MSG_MAX
+};
+#undef MSG_ID
+
 struct fsck_options;
 struct object;
 
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 17/22] fsck.c: pass along the fsck_msg_id in the fsck_error callback
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (17 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 16/22] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-17 19:01                   ` Junio C Hamano
  2021-03-16 16:17                 ` [PATCH v4 18/22] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
                                   ` (4 subsequent siblings)
  23 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the fsck_error callback to also pass along the
fsck_msg_id. Before this change the only way to get the message id was
to parse it back out of the "message".

Let's pass it down explicitly for the benefit of callers that might
want to use it, as discussed in [1].

Passing the msg_type is now redundant, as you can always get it back
from the msg_id, but I'm not changing that convention. It's really
common to need the msg_type, and the report() function itself (which
calls "fsck_error") needs to call fsck_msg_type() to discover
it. Let's not needlessly re-do that work in the user callback.
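
As a rough illustration of what a callback gains from this, the sketch
below dispatches on an id instead of inspecting the message text. All
demo_* names and the reduced signature are assumptions made for the
example; the real callback also receives the options, oid and object
type:

```c
#include <assert.h>
#include <stdio.h>

enum demo_msg_type { DEMO_ERROR, DEMO_WARN };
enum demo_msg_id { DEMO_MSG_BAD_DATE, DEMO_MSG_GITMODULES_MISSING };

/* Hypothetical, reduced callback type: severity, id and message. */
typedef int (*demo_error_fn)(enum demo_msg_type msg_type,
			     enum demo_msg_id msg_id,
			     const char *message);

/*
 * A callback that special-cases one id. No string comparison is
 * needed, and the severity arrives ready-made in msg_type.
 */
int demo_error_cb(enum demo_msg_type msg_type,
		  enum demo_msg_id msg_id,
		  const char *message)
{
	if (msg_id == DEMO_MSG_GITMODULES_MISSING)
		return 0; /* handled by the caller, not an error here */
	fprintf(stderr, "%s: %s\n",
		msg_type == DEMO_WARN ? "warning" : "error", message);
	return msg_type == DEMO_WARN ? 0 : 1;
}

/* Stand-in for report(): hand type, id and message to the callback. */
int demo_report(demo_error_fn fn, enum demo_msg_type msg_type,
		enum demo_msg_id msg_id, const char *message)
{
	return fn(msg_type, msg_id, message);
}
```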

1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c       | 4 +++-
 builtin/index-pack.c | 3 ++-
 builtin/mktag.c      | 1 +
 fsck.c               | 6 ++++--
 fsck.h               | 6 ++++--
 5 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d6d745dc70..b71fac4cec 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -89,7 +89,9 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   enum fsck_msg_type msg_type, const char *message)
+			   enum fsck_msg_type msg_type,
+			   enum fsck_msg_id msg_id,
+			   const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 56b8efaa89..2b2266a4b7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1717,6 +1717,7 @@ static int print_dangling_gitmodules(struct fsck_options *o,
 				     const struct object_id *oid,
 				     enum object_type object_type,
 				     enum fsck_msg_type msg_type,
+				     enum fsck_msg_id msg_id,
 				     const char *message)
 {
 	/*
@@ -1727,7 +1728,7 @@ static int print_dangling_gitmodules(struct fsck_options *o,
 		printf("%s\n", oid_to_hex(oid));
 		return 0;
 	}
-	return fsck_error_function(o, oid, object_type, msg_type, message);
+	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
 }
 
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 1834394a9b..dc989c356f 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -23,6 +23,7 @@ static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
 				 enum fsck_msg_type msg_type,
+				 enum fsck_msg_id msg_id,
 				 const char *message)
 {
 	switch (msg_type) {
diff --git a/fsck.c b/fsck.c
index 8a66168e51..5a040eb4fd 100644
--- a/fsck.c
+++ b/fsck.c
@@ -245,7 +245,7 @@ static int report(struct fsck_options *options,
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
 	result = options->error_func(options, oid, object_type,
-				     msg_type, sb.buf);
+				     msg_type, msg_id, sb.buf);
 	strbuf_release(&sb);
 	va_end(ap);
 
@@ -1198,7 +1198,9 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			enum fsck_msg_type msg_type, const char *message)
+			enum fsck_msg_type msg_type,
+			enum fsck_msg_id msg_id,
+			const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index c3d3b47b88..33ecf3f3f1 100644
--- a/fsck.h
+++ b/fsck.h
@@ -101,11 +101,13 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  enum fsck_msg_type msg_type, const char *message);
+			  enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
+			  const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			enum fsck_msg_type msg_type, const char *message);
+			enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
+			const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 18/22] fsck.c: add an fsck_set_msg_type() API that takes enums
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (18 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 17/22] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 19/22] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
                                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change code I added in acf9de4c94e (mktag: use fsck instead of custom
verify_tag(), 2021-01-05) to make use of a new API function that takes
the fsck_msg_{id,type} types, instead of arbitrary strings that
we'll (hopefully) parse into those types.

At the time that the fsck_set_msg_type() API was introduced in
0282f4dced0 (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) it was only intended to be used to parse user-supplied
data.

For things that are purely internal to the C code it makes sense to
have the compiler check these arguments, and to skip the sanity
checking of the data in fsck_set_msg_type() which is redundant to
checks we get from the compiler.
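
To make the enum-typed setter concrete, here is a simplified sketch of
a lazily allocated severity table that is filled from defaults on first
use. The demo_* names, the two-entry id list and the use of plain
malloc() are assumptions for the example, not the fsck.c code itself:

```c
#include <assert.h>
#include <stdlib.h>

enum demo_msg_type { DEMO_ERROR, DEMO_WARN, DEMO_IGNORE };
enum demo_msg_id {
	DEMO_MSG_BAD_DATE,
	DEMO_MSG_EXTRA_HEADER_ENTRY,
	DEMO_MSG_MAX
};

/* Hypothetical built-in severities, indexed by demo_msg_id. */
static const enum demo_msg_type demo_defaults[DEMO_MSG_MAX] = {
	DEMO_ERROR,
	DEMO_IGNORE,
};

struct demo_options {
	enum demo_msg_type *msg_type; /* NULL until the first override */
};

/* Effective severity: the override table if present, else defaults. */
enum demo_msg_type demo_get_msg_type(const struct demo_options *o,
				     enum demo_msg_id id)
{
	assert((unsigned)id < DEMO_MSG_MAX);
	return o->msg_type ? o->msg_type[id] : demo_defaults[id];
}

/* Enum-typed setter: the compiler checks the arguments, no parsing. */
void demo_set_msg_type_from_ids(struct demo_options *o,
				enum demo_msg_id id,
				enum demo_msg_type type)
{
	assert((unsigned)id < DEMO_MSG_MAX);
	if (!o->msg_type) {
		int i;
		enum demo_msg_type *severity;

		severity = malloc(sizeof(*severity) * DEMO_MSG_MAX);
		if (!severity)
			abort();
		for (i = 0; i < DEMO_MSG_MAX; i++)
			severity[i] = demo_defaults[i];
		o->msg_type = severity;
	}
	o->msg_type[id] = type;
}
```

A string-based setter can then do its parsing and validation up front
and delegate to the enum-typed one, which is the split this patch makes.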

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/mktag.c |  3 ++-
 fsck.c          | 27 +++++++++++++++++----------
 fsck.h          |  3 +++
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/builtin/mktag.c b/builtin/mktag.c
index dc989c356f..de67a94f24 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -93,7 +93,8 @@ int cmd_mktag(int argc, const char **argv, const char *prefix)
 		die_errno(_("could not read from stdin"));
 
 	fsck_options.error_func = mktag_fsck_error_func;
-	fsck_set_msg_type(&fsck_options, "extraheaderentry", "warn");
+	fsck_set_msg_type_from_ids(&fsck_options, FSCK_MSG_EXTRA_HEADER_ENTRY,
+				   FSCK_WARN);
 	/* config might set fsck.extraHeaderEntry=* again */
 	git_config(mktag_config, NULL);
 	if (fsck_tag_standalone(NULL, buf.buf, buf.len, &fsck_options,
diff --git a/fsck.c b/fsck.c
index 5a040eb4fd..f26f47b2a1 100644
--- a/fsck.c
+++ b/fsck.c
@@ -132,6 +132,22 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 	return 1;
 }
 
+void fsck_set_msg_type_from_ids(struct fsck_options *options,
+				enum fsck_msg_id msg_id,
+				enum fsck_msg_type msg_type)
+{
+	if (!options->msg_type) {
+		int i;
+		enum fsck_msg_type *severity;
+		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
+		for (i = 0; i < FSCK_MSG_MAX; i++)
+			severity[i] = fsck_msg_type(i, options);
+		options->msg_type = severity;
+	}
+
+	options->msg_type[msg_id] = msg_type;
+}
+
 void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
@@ -144,16 +160,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
-	if (!options->msg_type) {
-		int i;
-		enum fsck_msg_type *severity;
-		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
-		for (i = 0; i < FSCK_MSG_MAX; i++)
-			severity[i] = fsck_msg_type(i, options);
-		options->msg_type = severity;
-	}
-
-	options->msg_type[msg_id] = msg_type;
+	fsck_set_msg_type_from_ids(options, msg_id, msg_type);
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
diff --git a/fsck.h b/fsck.h
index 33ecf3f3f1..6c2fd9c5cc 100644
--- a/fsck.h
+++ b/fsck.h
@@ -82,6 +82,9 @@ enum fsck_msg_id {
 struct fsck_options;
 struct object;
 
+void fsck_set_msg_type_from_ids(struct fsck_options *options,
+				enum fsck_msg_id msg_id,
+				enum fsck_msg_type msg_type);
 void fsck_set_msg_type(struct fsck_options *options,
 		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 19/22] fsck.c: move gitmodules_{found,done} into fsck_options
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (19 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 18/22] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 20/22] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
                                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the gitmodules_{found,done} static variables added in
159e7b080bf (fsck: detect gitmodules files, 2018-05-02) into the
fsck_options struct. It makes sense to keep all the context in the
same place.
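
A small sketch of why per-struct state is convenient: two independent
option structs no longer share one hidden set. The fixed-size demo_set
and every demo_* name below are hypothetical; the real code uses
"struct oidset":

```c
#include <assert.h>
#include <string.h>

#define DEMO_SET_MAX 8

/* Tiny stand-in for an oidset: a fixed-size set of id strings. */
struct demo_set {
	const char *items[DEMO_SET_MAX];
	int nr;
};

/* State that would otherwise be file-scope statics. */
struct demo_options {
	struct demo_set gitmodules_found;
	struct demo_set gitmodules_done;
};
#define DEMO_OPTIONS_INIT { { { NULL }, 0 }, { { NULL }, 0 } }

int demo_set_contains(const struct demo_set *s, const char *id)
{
	int i;

	for (i = 0; i < s->nr; i++)
		if (!strcmp(s->items[i], id))
			return 1;
	return 0;
}

void demo_set_insert(struct demo_set *s, const char *id)
{
	if (demo_set_contains(s, id))
		return;
	assert(s->nr < DEMO_SET_MAX);
	s->items[s->nr++] = id;
}

/* A tree entry named .gitmodules was seen pointing at blob_id. */
void demo_saw_tree_entry(struct demo_options *o, const char *blob_id)
{
	demo_set_insert(&o->gitmodules_found, blob_id);
}

/* The blob itself was checked. */
void demo_checked_blob(struct demo_options *o, const char *blob_id)
{
	demo_set_insert(&o->gitmodules_done, blob_id);
}

/* Number of found blobs still unchecked, as a finish step would see. */
int demo_pending(const struct demo_options *o)
{
	int i, pending = 0;

	for (i = 0; i < o->gitmodules_found.nr; i++)
		if (!demo_set_contains(&o->gitmodules_done,
				       o->gitmodules_found.items[i]))
			pending++;
	return pending;
}
```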

This requires changing the register_found_gitmodules() function,
recently added in 5476e1efde (fetch-pack: print and use dangling
.gitmodules, 2021-02-22), to take fsck_options. That function will be
removed in a subsequent commit, but since that removal relies on the
new gitmodules_found member of "fsck_options", we need this
intermediate step first.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fetch-pack.c |  2 +-
 fsck.c       | 23 ++++++++++-------------
 fsck.h       |  7 ++++++-
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 6a61a46428..82c3c2c043 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -998,7 +998,7 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(oid);
+		register_found_gitmodules(&fo, oid);
 	if (fsck_finish(&fo))
 		die("fsck failed");
 }
diff --git a/fsck.c b/fsck.c
index f26f47b2a1..565274a946 100644
--- a/fsck.c
+++ b/fsck.c
@@ -19,9 +19,6 @@
 #include "credential.h"
 #include "help.h"
 
-static struct oidset gitmodules_found = OIDSET_INIT;
-static struct oidset gitmodules_done = OIDSET_INIT;
-
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
@@ -624,7 +621,7 @@ static int fsck_tree(const struct object_id *oid,
 
 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
 			if (!S_ISLNK(mode))
-				oidset_insert(&gitmodules_found, oid);
+				oidset_insert(&options->gitmodules_found, oid);
 			else
 				retval += report(options,
 						 oid, OBJ_TREE,
@@ -638,7 +635,7 @@ static int fsck_tree(const struct object_id *oid,
 				has_dotgit |= is_ntfs_dotgit(backslash);
 				if (is_ntfs_dotgitmodules(backslash)) {
 					if (!S_ISLNK(mode))
-						oidset_insert(&gitmodules_found, oid);
+						oidset_insert(&options->gitmodules_found, oid);
 					else
 						retval += report(options, oid, OBJ_TREE,
 								 FSCK_MSG_GITMODULES_SYMLINK,
@@ -1150,9 +1147,9 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 	struct fsck_gitmodules_data data;
 	struct config_options config_opts = { 0 };
 
-	if (!oidset_contains(&gitmodules_found, oid))
+	if (!oidset_contains(&options->gitmodules_found, oid))
 		return 0;
-	oidset_insert(&gitmodules_done, oid);
+	oidset_insert(&options->gitmodules_done, oid);
 
 	if (object_on_skiplist(options, oid))
 		return 0;
@@ -1217,9 +1214,9 @@ int fsck_error_function(struct fsck_options *o,
 	return 1;
 }
 
-void register_found_gitmodules(const struct object_id *oid)
+void register_found_gitmodules(struct fsck_options *options, const struct object_id *oid)
 {
-	oidset_insert(&gitmodules_found, oid);
+	oidset_insert(&options->gitmodules_found, oid);
 }
 
 int fsck_finish(struct fsck_options *options)
@@ -1228,13 +1225,13 @@ int fsck_finish(struct fsck_options *options)
 	struct oidset_iter iter;
 	const struct object_id *oid;
 
-	oidset_iter_init(&gitmodules_found, &iter);
+	oidset_iter_init(&options->gitmodules_found, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
 		enum object_type type;
 		unsigned long size;
 		char *buf;
 
-		if (oidset_contains(&gitmodules_done, oid))
+		if (oidset_contains(&options->gitmodules_done, oid))
 			continue;
 
 		buf = read_object_file(oid, &type, &size);
@@ -1259,8 +1256,8 @@ int fsck_finish(struct fsck_options *options)
 	}
 
 
-	oidset_clear(&gitmodules_found);
-	oidset_clear(&gitmodules_done);
+	oidset_clear(&options->gitmodules_found);
+	oidset_clear(&options->gitmodules_done);
 	return ret;
 }
 
diff --git a/fsck.h b/fsck.h
index 6c2fd9c5cc..bb59ef05b6 100644
--- a/fsck.h
+++ b/fsck.h
@@ -118,6 +118,8 @@ struct fsck_options {
 	unsigned strict:1;
 	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
+	struct oidset gitmodules_found;
+	struct oidset gitmodules_done;
 	kh_oid_map_t *object_names;
 };
 
@@ -125,6 +127,8 @@ struct fsck_options {
 	.walk = NULL, \
 	.msg_type = NULL, \
 	.skiplist = OIDSET_INIT, \
+	.gitmodules_found = OIDSET_INIT, \
+	.gitmodules_done = OIDSET_INIT, \
 	.object_names = NULL,
 #define FSCK_OPTIONS_COMMON_ERROR_FUNC \
 	FSCK_OPTIONS_COMMON \
@@ -149,7 +153,8 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
-void register_found_gitmodules(const struct object_id *oid);
+void register_found_gitmodules(struct fsck_options *options,
+			       const struct object_id *oid);
 
 /*
  * fsck a tag, and pass info about it back to the caller. This is
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 20/22] fetch-pack: don't needlessly copy fsck_options
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (20 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 19/22] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 21/22] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 22/22] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the behavior of the .gitmodules validation added in
5476e1efde (fetch-pack: print and use dangling .gitmodules,
2021-02-22) so we're using one "fsck_options".

I found that code confusing to read. One might think that not setting
up the error_func earlier means that we're relying on the "error_func"
not being set in some code in between the two hunks being modified
here.

But we're not. All we're doing in the rest of "cmd_index_pack()" is
further setup: calling fsck_set_msg_types() and assigning to
do_fsck_object.

So there was no reason in 5476e1efde to make a shallow copy of the
fsck_options struct before setting error_func. Let's just do this
setup at the top of the function, along with the "walk" assignment.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/index-pack.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 2b2266a4b7..5ad80b85b4 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1761,6 +1761,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 	fsck_options.walk = mark_link;
+	fsck_options.error_func = print_dangling_gitmodules;
 
 	reset_pack_idx_option(&opts);
 	git_config(git_index_pack_config, &opts);
@@ -1951,13 +1952,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	else
 		close(input_fd);
 
-	if (do_fsck_object) {
-		struct fsck_options fo = fsck_options;
-
-		fo.error_func = print_dangling_gitmodules;
-		if (fsck_finish(&fo))
-			die(_("fsck error in pack objects"));
-	}
+	if (do_fsck_object && fsck_finish(&fsck_options))
+		die(_("fsck error in pack objects"));
 
 	free(objects);
 	strbuf_release(&index_name_buf);
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 21/22] fetch-pack: use file-scope static struct for fsck_options
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (21 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 20/22] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:17                 ` [PATCH v4 22/22] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
  23 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change code added in 5476e1efde (fetch-pack: print and use dangling
.gitmodules, 2021-02-22) so that we use a file-scoped "static struct
fsck_options" instead of defining one in the "fsck_gitmodules_oids()"
function.

We use this pattern in all of
builtin/{fsck,index-pack,mktag,unpack-objects}.c, so it's strange for
fetch-pack to be the odd one out. One might think that we're using other
fsck_options structs in fetch-pack, or running fsck twice there, but
we're not.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fetch-pack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 82c3c2c043..229fd8e2c2 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -38,6 +38,7 @@ static int server_supports_filtering;
 static int advertise_sid;
 static struct shallow_lock shallow_lock;
 static const char *alternate_shallow_file;
+static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 static struct strbuf fsck_msg_types = STRBUF_INIT;
 static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 
@@ -991,15 +992,14 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 {
 	struct oidset_iter iter;
 	const struct object_id *oid;
-	struct fsck_options fo = FSCK_OPTIONS_STRICT;
 
 	if (!oidset_size(gitmodules_oids))
 		return;
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(&fo, oid);
-	if (fsck_finish(&fo))
+		register_found_gitmodules(&fsck_options, oid);
+	if (fsck_finish(&fsck_options))
 		die("fsck failed");
 }
 
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 22/22] fetch-pack: use new fsck API to printing dangling submodules
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                   ` (22 preceding siblings ...)
  2021-03-16 16:17                 ` [PATCH v4 21/22] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-16 16:17                 ` Ævar Arnfjörð Bjarmason
  2021-03-16 19:32                   ` Derrick Stolee
  2021-03-17 19:12                   ` Junio C Hamano
  23 siblings, 2 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-16 16:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Refactor the check added in 5476e1efde (fetch-pack: print and use
dangling .gitmodules, 2021-02-22) to take advantage of the "msg_id" now
being passed to the user-defined "error_func". We can now compare
against FSCK_MSG_GITMODULES_MISSING instead of parsing the generated
message.

Let's also replace register_found_gitmodules() with directly
manipulating the "gitmodules_found" member. A recent commit moved it
into "fsck_options" so we could do this here.

Add a fsck-cb.c file similar to parse-options-cb.c. The alternative
would be to either define this directly in fsck.c as a public API, or
to create some library shared by fetch-pack.c and builtin/index-pack.c.

I expect that there won't be many of these fsck utility functions in
the future, so just having a single fsck-cb.c makes sense.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile             |  1 +
 builtin/index-pack.c | 21 +--------------------
 fetch-pack.c         |  4 ++--
 fsck-cb.c            | 16 ++++++++++++++++
 fsck.c               |  5 -----
 fsck.h               | 22 +++++++++++++++++++---
 6 files changed, 39 insertions(+), 30 deletions(-)
 create mode 100644 fsck-cb.c

diff --git a/Makefile b/Makefile
index dfb0f1000f..3faa8bd0d3 100644
--- a/Makefile
+++ b/Makefile
@@ -882,6 +882,7 @@ LIB_OBJS += fetch-negotiator.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fmt-merge-msg.o
 LIB_OBJS += fsck.o
+LIB_OBJS += fsck-cb.o
 LIB_OBJS += fsmonitor.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 5ad80b85b4..11f0fafd33 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -120,7 +120,7 @@ static int nr_threads;
 static int from_stdin;
 static int strict;
 static int do_fsck_object;
-static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
+static struct fsck_options fsck_options = FSCK_OPTIONS_MISSING_GITMODULES;
 static int verbose;
 static int show_resolving_progress;
 static int show_stat;
@@ -1713,24 +1713,6 @@ static void show_pack_info(int stat_only)
 	}
 }
 
-static int print_dangling_gitmodules(struct fsck_options *o,
-				     const struct object_id *oid,
-				     enum object_type object_type,
-				     enum fsck_msg_type msg_type,
-				     enum fsck_msg_id msg_id,
-				     const char *message)
-{
-	/*
-	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
-	 * instead of relying on this string check.
-	 */
-	if (starts_with(message, "gitmodulesMissing")) {
-		printf("%s\n", oid_to_hex(oid));
-		return 0;
-	}
-	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
-}
-
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, rev_index;
@@ -1761,7 +1743,6 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 	fsck_options.walk = mark_link;
-	fsck_options.error_func = print_dangling_gitmodules;
 
 	reset_pack_idx_option(&opts);
 	git_config(git_index_pack_config, &opts);
diff --git a/fetch-pack.c b/fetch-pack.c
index 229fd8e2c2..008a3facd4 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -38,7 +38,7 @@ static int server_supports_filtering;
 static int advertise_sid;
 static struct shallow_lock shallow_lock;
 static const char *alternate_shallow_file;
-static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
+static struct fsck_options fsck_options = FSCK_OPTIONS_MISSING_GITMODULES;
 static struct strbuf fsck_msg_types = STRBUF_INIT;
 static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 
@@ -998,7 +998,7 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(&fsck_options, oid);
+		oidset_insert(&fsck_options.gitmodules_found, oid);
 	if (fsck_finish(&fsck_options))
 		die("fsck failed");
 }
diff --git a/fsck-cb.c b/fsck-cb.c
new file mode 100644
index 0000000000..465a49235a
--- /dev/null
+++ b/fsck-cb.c
@@ -0,0 +1,16 @@
+#include "git-compat-util.h"
+#include "fsck.h"
+
+int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
+					   const struct object_id *oid,
+					   enum object_type object_type,
+					   enum fsck_msg_type msg_type,
+					   enum fsck_msg_id msg_id,
+					   const char *message)
+{
+	if (msg_id == FSCK_MSG_GITMODULES_MISSING) {
+		puts(oid_to_hex(oid));
+		return 0;
+	}
+	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
+}
diff --git a/fsck.c b/fsck.c
index 565274a946..b0089844db 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1214,11 +1214,6 @@ int fsck_error_function(struct fsck_options *o,
 	return 1;
 }
 
-void register_found_gitmodules(struct fsck_options *options, const struct object_id *oid)
-{
-	oidset_insert(&options->gitmodules_found, oid);
-}
-
 int fsck_finish(struct fsck_options *options)
 {
 	int ret = 0;
diff --git a/fsck.h b/fsck.h
index bb59ef05b6..ae3107638a 100644
--- a/fsck.h
+++ b/fsck.h
@@ -153,9 +153,6 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
-void register_found_gitmodules(struct fsck_options *options,
-			       const struct object_id *oid);
-
 /*
  * fsck a tag, and pass info about it back to the caller. This is
  * exposed fsck_object() internals for git-mktag(1).
@@ -204,4 +201,23 @@ const char *fsck_describe_object(struct fsck_options *options,
 int fsck_config_internal(const char *var, const char *value, void *cb,
 			 struct fsck_options *options);
 
+/*
+ * Initializations for callbacks in fsck-cb.c
+ */
+#define FSCK_OPTIONS_MISSING_GITMODULES { \
+	.strict = 1, \
+	.error_func = fsck_error_cb_print_missing_gitmodules, \
+	FSCK_OPTIONS_COMMON \
+}
+
+/*
+ * Error callbacks in fsck-cb.c
+ */
+int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
+					   const struct object_id *oid,
+					   enum object_type object_type,
+					   enum fsck_msg_type msg_type,
+					   enum fsck_msg_id msg_id,
+					   const char *message);
+
 #endif
-- 
2.31.0.260.g719c683c1d


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 02/22] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
  2021-03-16 16:17                 ` [PATCH v4 02/22] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
@ 2021-03-16 18:59                   ` Derrick Stolee
  2021-03-17 18:38                   ` Junio C Hamano
  1 sibling, 0 replies; 229+ messages in thread
From: Derrick Stolee @ 2021-03-16 18:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan

On 3/16/2021 12:17 PM, Ævar Arnfjörð Bjarmason wrote:
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  fsck.h | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/fsck.h b/fsck.h
> index 2274843ba0..40f3cb3f64 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -43,8 +43,22 @@ struct fsck_options {
>  	kh_oid_map_t *object_names;
>  };
>  
> -#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
> -#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }

You just edited these lines in the previous patch. Seems unnecessary
to split them. You can point out that the object_names portion was
previously excluded in the message here.

> +#define FSCK_OPTIONS_DEFAULT { \
> +	.walk = NULL, \
> +	.error_func = fsck_error_function, \
> +	.strict = 0, \
> +	.msg_type = NULL, \
> +	.skiplist = OIDSET_INIT, \
> +	.object_names = NULL, \
> +}
> +#define FSCK_OPTIONS_STRICT { \
> +	.walk = NULL, \
> +	.error_func = fsck_error_function, \
> +	.strict = 1, \
> +	.msg_type = NULL, \
> +	.skiplist = OIDSET_INIT, \
> +	.object_names = NULL, \
> +}

This explicit definition is better.

Thanks,
-Stolee


* Re: [PATCH v4 04/22] fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro
  2021-03-16 16:17                 ` [PATCH v4 04/22] fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro Ævar Arnfjörð Bjarmason
@ 2021-03-16 19:06                   ` Derrick Stolee
  0 siblings, 0 replies; 229+ messages in thread
From: Derrick Stolee @ 2021-03-16 19:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan

On 3/16/2021 12:17 PM, Ævar Arnfjörð Bjarmason wrote:
> Add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro for those that would like
> to use FSCK_OPTIONS_COMMON in their own initialization, but supply
> their own error functions.
> 
> Nothing is being changed to use this yet, but in some subsequent
> commits we'll make use of this macro.
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  fsck.h | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/fsck.h b/fsck.h
> index ea3a907ec3..dc35924cbf 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -45,12 +45,15 @@ struct fsck_options {
>  
>  #define FSCK_OPTIONS_COMMON \
>  	.walk = NULL, \
> -	.error_func = fsck_error_function, \
>  	.msg_type = NULL, \
>  	.skiplist = OIDSET_INIT, \
>  	.object_names = NULL,
> -#define FSCK_OPTIONS_DEFAULT	{ .strict = 0, FSCK_OPTIONS_COMMON }
> -#define FSCK_OPTIONS_STRICT	{ .strict = 1, FSCK_OPTIONS_COMMON }
> +#define FSCK_OPTIONS_COMMON_ERROR_FUNC \
> +	FSCK_OPTIONS_COMMON \
> +	.error_func = fsck_error_function
> +
> +#define FSCK_OPTIONS_DEFAULT	{ .strict = 0, FSCK_OPTIONS_COMMON_ERROR_FUNC }
> +#define FSCK_OPTIONS_STRICT	{ .strict = 1, FSCK_OPTIONS_COMMON_ERROR_FUNC }

OK. It seems like you are converging on your final definitions
for these macros. At first glance, this seems like an unnecessary
split to demonstrate the tiny changes between them; it could
just be done with one change and a description of why you want
the four different entry points as macros.

Thanks,
-Stolee


* Re: [PATCH v4 22/22] fetch-pack: use new fsck API to printing dangling submodules
  2021-03-16 16:17                 ` [PATCH v4 22/22] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
@ 2021-03-16 19:32                   ` Derrick Stolee
  2021-03-17 13:47                     ` Ævar Arnfjörð Bjarmason
  2021-03-17 19:12                   ` Junio C Hamano
  1 sibling, 1 reply; 229+ messages in thread
From: Derrick Stolee @ 2021-03-16 19:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan

On 3/16/2021 12:17 PM, Ævar Arnfjörð Bjarmason wrote:
> Refactor the check added in 5476e1efde (fetch-pack: print and use
> dangling .gitmodules, 2021-02-22) to make use of us now passing the
> "msg_id" to the user defined "error_func". We can now compare against
> the FSCK_MSG_GITMODULES_MISSING instead of parsing the generated
> message.
> 
> Let's also replace register_found_gitmodules() with directly
> manipulating the "gitmodules_found" member. A recent commit moved it
> into "fsck_options" so we could do this here.
> 
> Add a fsck-cb.c file similar to parse-options-cb.c, the alternative
> would be to either define this directly in fsck.c as a public API, or
> to create some library shared by fetch-pack.c and builtin/index-pack.
> 
> I expect that there won't be many of these fsck utility functions in
> the future, so just having a single fsck-cb.c makes sense.

I'm not convinced that having a single cb function merits its
own file. But, if you expect this pattern to be expanded a
couple more times, then I would say it is worth it. Do you have
such plans?

Thanks,
-Stolee


* Re: [PATCH v4 00/22] fsck: API improvements
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
@ 2021-03-16 19:35                   ` Derrick Stolee
  2021-03-17 18:20                   ` [PATCH v5 00/19] " Ævar Arnfjörð Bjarmason
                                     ` (19 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Derrick Stolee @ 2021-03-16 19:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan

On 3/16/2021 12:17 PM, Ævar Arnfjörð Bjarmason wrote:
> A re-send of a rebased v3, which I sent at:
> http://lore.kernel.org/git/20210306110439.27694-1-avarab@gmail.com as
> seen in the range-diff there are no changes since v3. I'm just sending
> this as a post-release bump of this, per
> https://lore.kernel.org/git/xmqqy2etczqi.fsf@gitster.g/
> 
> Ævar Arnfjörð Bjarmason (22):
>   fsck.h: update FSCK_OPTIONS_* for object_name
>   fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
>   fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT}
>   fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro
>   fsck.h: indent arguments to of fsck_set_msg_type
>   fsck.h: use "enum object_type" instead of "int"
>   fsck.c: rename variables in fsck_set_msg_type() for less confusion
>   fsck.c: move definition of msg_id into append_msg_id()
>   fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
>   fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
>   fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
>   fsck.h: re-order and re-assign "enum fsck_msg_type"
>   fsck.c: call parse_msg_type() early in fsck_set_msg_type()
>   fsck.c: undefine temporary STR macro after use
>   fsck.c: give "FOREACH_MSG_ID" a more specific name
>   fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
>   fsck.c: pass along the fsck_msg_id in the fsck_error callback
>   fsck.c: add an fsck_set_msg_type() API that takes enums
>   fsck.c: move gitmodules_{found,done} into fsck_options
>   fetch-pack: don't needlessly copy fsck_options
>   fetch-pack: use file-scope static struct for fsck_options
>   fetch-pack: use new fsck API to printing dangling submodules

This series is carefully organized and motivated. It was quite
easy to read.

My complaints were minor. One was that patches 1-4 seemed to be
unnecessarily granular. I'm not sure that having four patches
like that will be more helpful for inspecting the history in
the future. But, I don't care enough to say this should be
re-rolled.

Finally, the last issue is that fsck-cb.c is loosely justified
with only one method inside. If you have plans in the near
future to add similar methods there, then I think that is fine.
Otherwise, it would be simpler to avoid the extra file and
code move.

Thanks,
-Stolee


* Re: [PATCH v4 22/22] fetch-pack: use new fsck API to printing dangling submodules
  2021-03-16 19:32                   ` Derrick Stolee
@ 2021-03-17 13:47                     ` Ævar Arnfjörð Bjarmason
  2021-03-17 20:27                       ` Derrick Stolee
  0 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 13:47 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan


On Tue, Mar 16 2021, Derrick Stolee wrote:

> On 3/16/2021 12:17 PM, Ævar Arnfjörð Bjarmason wrote:
>> Refactor the check added in 5476e1efde (fetch-pack: print and use
>> dangling .gitmodules, 2021-02-22) to make use of us now passing the
>> "msg_id" to the user defined "error_func". We can now compare against
>> the FSCK_MSG_GITMODULES_MISSING instead of parsing the generated
>> message.
>> 
>> Let's also replace register_found_gitmodules() with directly
>> manipulating the "gitmodules_found" member. A recent commit moved it
>> into "fsck_options" so we could do this here.
>> 
>> Add a fsck-cb.c file similar to parse-options-cb.c, the alternative
>> would be to either define this directly in fsck.c as a public API, or
>> to create some library shared by fetch-pack.c and builtin/index-pack.
>> 
>> I expect that there won't be many of these fsck utility functions in
>> the future, so just having a single fsck-cb.c makes sense.
>
> I'm not convinced that having a single cb function merits its
> own file. But, if you expect this pattern to be expanded a
> couple more times, then I would say it is worth it. Do you have
> such plans?

Not really. Well, vague ones, but nothing I even have local patches for.

It just seemed odd to stick random callback functions shared by related
programs into fsck.h's interface, but I guess with
FSCK_OPTIONS_MISSING_GITMODULES I already did that.

Do you suggest just putting it into fsck.c?


* [PATCH v5 00/19] fsck: API improvements
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
  2021-03-16 19:35                   ` Derrick Stolee
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 20:30                     ` Derrick Stolee
                                       ` (2 more replies)
  2021-03-17 18:20                   ` [PATCH v5 01/19] fsck.c: refactor and rename common config callback Ævar Arnfjörð Bjarmason
                                     ` (18 subsequent siblings)
  20 siblings, 3 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

A v5 with changes suggested by Derrick Stolee. Link to v4:
https://lore.kernel.org/git/20210316161738.30254-1-avarab@gmail.com/

Changes:

 * 1/19 is new, it's a simple refactoring of some git_config() code in
   fsck.c that I changed recently.

 * Squashed the first 4 patches, which incrementally redefined two
   macros, into one.

 * Squashed a whitespace-only change into another patch that changed
   the same code.

 * Got rid of fsck-cb.c, that one function just lives at the bottom of
   fsck.c now.

Ævar Arnfjörð Bjarmason (19):
  fsck.c: refactor and rename common config callback
  fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
  fsck.h: use "enum object_type" instead of "int"
  fsck.c: rename variables in fsck_set_msg_type() for less confusion
  fsck.c: move definition of msg_id into append_msg_id()
  fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
  fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  fsck.h: re-order and re-assign "enum fsck_msg_type"
  fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  fsck.c: undefine temporary STR macro after use
  fsck.c: give "FOREACH_MSG_ID" a more specific name
  fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
  fsck.c: pass along the fsck_msg_id in the fsck_error callback
  fsck.c: add an fsck_set_msg_type() API that takes enums
  fsck.c: move gitmodules_{found,done} into fsck_options
  fetch-pack: don't needlessly copy fsck_options
  fetch-pack: use file-scope static struct for fsck_options
  fetch-pack: use new fsck API to printing dangling submodules

 builtin/fsck.c           |  14 ++-
 builtin/index-pack.c     |  30 +-----
 builtin/mktag.c          |  14 ++-
 builtin/unpack-objects.c |   3 +-
 fetch-pack.c             |   6 +-
 fsck.c                   | 197 +++++++++++++++------------------------
 fsck.h                   | 131 +++++++++++++++++++++++---
 7 files changed, 213 insertions(+), 182 deletions(-)

Range-diff:
 1:  9cd942b526 <  -:  ---------- fsck.h: update FSCK_OPTIONS_* for object_name
 2:  d67966b838 <  -:  ---------- fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
 3:  211472e0c5 <  -:  ---------- fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT}
 4:  70afee988d <  -:  ---------- fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro
 5:  1337d53352 <  -:  ---------- fsck.h: indent arguments to of fsck_set_msg_type
 -:  ---------- >  1:  fe33015e0d fsck.c: refactor and rename common config callback
 -:  ---------- >  2:  72f2e53afa fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
 6:  e4ef107bb4 =  3:  237a280686 fsck.h: use "enum object_type" instead of "int"
 7:  20bac3207e !  4:  13b76c73dd fsck.c: rename variables in fsck_set_msg_type() for less confusion
    @@ Commit message
         "msg_id" to "msg_id_str" etc. This will make a follow-up change
         smaller.
     
    +    While I'm at it properly indent the fsck_set_msg_type() argument list.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## fsck.c ##
    @@ fsck.c: int is_valid_msg_type(const char *msg_id, const char *msg_type)
      
      void fsck_set_msg_type(struct fsck_options *options,
     -		const char *msg_id, const char *msg_type)
    -+		const char *msg_id_str, const char *msg_type_str)
    ++		       const char *msg_id_str, const char *msg_type_str)
      {
     -	int id = parse_msg_id(msg_id), type;
     +	int msg_id = parse_msg_id(msg_id_str), msg_type;
    @@ fsck.c: int is_valid_msg_type(const char *msg_id, const char *msg_type)
      }
      
      void fsck_set_msg_types(struct fsck_options *options, const char *values)
    +
    + ## fsck.h ##
    +@@ fsck.h: struct fsck_options;
    + struct object;
    + 
    + void fsck_set_msg_type(struct fsck_options *options,
    +-		const char *msg_id, const char *msg_type);
    ++		       const char *msg_id, const char *msg_type);
    + void fsck_set_msg_types(struct fsck_options *options, const char *values);
    + int is_valid_msg_type(const char *msg_id, const char *msg_type);
    + 
 8:  09c3bba9e9 =  5:  4ae83403b7 fsck.c: move definition of msg_id into append_msg_id()
 9:  8067df53a2 =  6:  82107f1dac fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
10:  bdf5e13f3d =  7:  796096bf73 fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
11:  b03caa237f !  8:  3664abb23d fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
    @@ builtin/index-pack.c: static void show_pack_info(int stat_only)
      	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
     
      ## builtin/mktag.c ##
    -@@ builtin/mktag.c: static int mktag_config(const char *var, const char *value, void *cb)
    +@@ builtin/mktag.c: static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
      static int mktag_fsck_error_func(struct fsck_options *o,
      				 const struct object_id *oid,
      				 enum object_type object_type,
    @@ fsck.c: static int fsck_msg_type(enum fsck_msg_id msg_id,
      		return FSCK_ERROR;
     @@ fsck.c: int is_valid_msg_type(const char *msg_id, const char *msg_type)
      void fsck_set_msg_type(struct fsck_options *options,
    - 		const char *msg_id_str, const char *msg_type_str)
    + 		       const char *msg_id_str, const char *msg_type_str)
      {
     -	int msg_id = parse_msg_id(msg_id_str), msg_type;
     +	int msg_id = parse_msg_id(msg_id_str);
12:  7b1d13b4cc =  9:  81e6d7ab45 fsck.h: re-order and re-assign "enum fsck_msg_type"
13:  a8e4ca7b19 ! 10:  5c2e8e7b84 fsck.c: call parse_msg_type() early in fsck_set_msg_type()
    @@ Commit message
     
      ## fsck.c ##
     @@ fsck.c: void fsck_set_msg_type(struct fsck_options *options,
    - 		const char *msg_id_str, const char *msg_type_str)
    + 		       const char *msg_id_str, const char *msg_type_str)
      {
      	int msg_id = parse_msg_id(msg_id_str);
     -	enum fsck_msg_type msg_type;
14:  214c375a20 = 11:  7ffbf9af3f fsck.c: undefine temporary STR macro after use
15:  19a2499a80 = 12:  12ff0f75eb fsck.c: give "FOREACH_MSG_ID" a more specific name
16:  6e1a7b6274 = 13:  0c49dd5164 fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
17:  42af4e164c = 14:  900263f503 fsck.c: pass along the fsck_msg_id in the fsck_error callback
18:  fa47f473a8 ! 15:  5f270e88a0 fsck.c: add an fsck_set_msg_type() API that takes enums
    @@ builtin/mktag.c: int cmd_mktag(int argc, const char **argv, const char *prefix)
     +	fsck_set_msg_type_from_ids(&fsck_options, FSCK_MSG_EXTRA_HEADER_ENTRY,
     +				   FSCK_WARN);
      	/* config might set fsck.extraHeaderEntry=* again */
    - 	git_config(mktag_config, NULL);
    + 	git_config(git_fsck_config, &fsck_options);
      	if (fsck_tag_standalone(NULL, buf.buf, buf.len, &fsck_options,
     
      ## fsck.c ##
    @@ fsck.c: int is_valid_msg_type(const char *msg_id, const char *msg_type)
     +}
     +
      void fsck_set_msg_type(struct fsck_options *options,
    - 		const char *msg_id_str, const char *msg_type_str)
    + 		       const char *msg_id_str, const char *msg_type_str)
      {
     @@ fsck.c: void fsck_set_msg_type(struct fsck_options *options,
      	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
19:  4cc3880cc4 = 16:  539d019712 fsck.c: move gitmodules_{found,done} into fsck_options
20:  fd219d318a = 17:  1acf744236 fetch-pack: don't needlessly copy fsck_options
21:  e4cd8c250e = 18:  b47c3d5ac6 fetch-pack: use file-scope static struct for fsck_options
22:  fdbc3c304c ! 19:  f05fa5c3ec fetch-pack: use new fsck API to printing dangling submodules
    @@ Commit message
         manipulating the "gitmodules_found" member. A recent commit moved it
         into "fsck_options" so we could do this here.
     
    -    Add a fsck-cb.c file similar to parse-options-cb.c, the alternative
    -    would be to either define this directly in fsck.c as a public API, or
    -    to create some library shared by fetch-pack.c ad builtin/index-pack.
    +    I'm sticking this callback in fsck.c. Perhaps in the future we'd like
    +    to accumulate such callbacks into another file (maybe fsck-cb.c,
    +    similar to parse-options-cb.c?), but while we've got just the one
    +    let's just put it into fsck.c.
     
    -    I expect that there won't be many of these fsck utility functions in
    -    the future, so just having a single fsck-cb.c makes sense.
    +    A better alternative in this case would be some library some more
    +    obvious library shared by fetch-pack.c ad builtin/index-pack.c, but
    +    there isn't such a thing.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    - ## Makefile ##
    -@@ Makefile: LIB_OBJS += fetch-negotiator.o
    - LIB_OBJS += fetch-pack.o
    - LIB_OBJS += fmt-merge-msg.o
    - LIB_OBJS += fsck.o
    -+LIB_OBJS += fsck-cb.o
    - LIB_OBJS += fsmonitor.o
    - LIB_OBJS += gettext.o
    - LIB_OBJS += gpg-interface.o
    -
      ## builtin/index-pack.c ##
     @@ builtin/index-pack.c: static int nr_threads;
      static int from_stdin;
    @@ fetch-pack.c: static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
      		die("fsck failed");
      }
     
    - ## fsck-cb.c (new) ##
    -@@
    -+#include "git-compat-util.h"
    -+#include "fsck.h"
    + ## fsck.c ##
    +@@ fsck.c: int fsck_error_function(struct fsck_options *o,
    + 	return 1;
    + }
    + 
    +-void register_found_gitmodules(struct fsck_options *options, const struct object_id *oid)
    +-{
    +-	oidset_insert(&options->gitmodules_found, oid);
    +-}
    +-
    + int fsck_finish(struct fsck_options *options)
    + {
    + 	int ret = 0;
    +@@ fsck.c: int git_fsck_config(const char *var, const char *value, void *cb)
    + 
    + 	return git_default_config(var, value, cb);
    + }
    ++
    ++/*
    ++ * Custom error callbacks that are used in more than one place.
    ++ */
     +
     +int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
     +					   const struct object_id *oid,
    @@ fsck-cb.c (new)
     +	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
     +}
     
    - ## fsck.c ##
    -@@ fsck.c: int fsck_error_function(struct fsck_options *o,
    - 	return 1;
    - }
    - 
    --void register_found_gitmodules(struct fsck_options *options, const struct object_id *oid)
    --{
    --	oidset_insert(&options->gitmodules_found, oid);
    --}
    --
    - int fsck_finish(struct fsck_options *options)
    - {
    - 	int ret = 0;
    -
      ## fsck.h ##
     @@ fsck.h: int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
      int fsck_object(struct object *obj, void *data, unsigned long size,
    @@ fsck.h: int fsck_walk(struct object *obj, void *data, struct fsck_options *optio
       * fsck a tag, and pass info about it back to the caller. This is
       * exposed fsck_object() internals for git-mktag(1).
     @@ fsck.h: const char *fsck_describe_object(struct fsck_options *options,
    - int fsck_config_internal(const char *var, const char *value, void *cb,
    - 			 struct fsck_options *options);
    +  */
    + int git_fsck_config(const char *var, const char *value, void *cb);
      
     +/*
    -+ * Initializations for callbacks in fsck-cb.c
    ++ * Custom error callbacks that are used in more than one place.
     + */
     +#define FSCK_OPTIONS_MISSING_GITMODULES { \
     +	.strict = 1, \
     +	.error_func = fsck_error_cb_print_missing_gitmodules, \
     +	FSCK_OPTIONS_COMMON \
     +}
    -+
    -+/*
    -+ * Error callbacks in fsck-cb.c
    -+ */
     +int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
     +					   const struct object_id *oid,
     +					   enum object_type object_type,
-- 
2.31.0.260.g719c683c1d
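
The callback the cover letter moves to the bottom of fsck.c branches on
the message ID instead of parsing the generated message text. A minimal
sketch of that idea follows. It does not use git's real API: every
"demo_" name, the reduced argument list and the FILE * parameter are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical subset of the message IDs; the real list lives in fsck.h. */
enum demo_msg_id {
	DEMO_MSG_BAD_DATE,
	DEMO_MSG_GITMODULES_MISSING
};

/* Hypothetical default handler: report the problem and signal an error. */
int demo_error_function(enum demo_msg_id msg_id, const char *message)
{
	assert(message != NULL);
	fprintf(stderr, "error: %d: %s\n", (int)msg_id, message);
	return 1;
}

/*
 * Custom handler: a missing .gitmodules blob is printed to "out" so the
 * caller can re-check it later; everything else goes to the default
 * handler. The decision is an enum comparison, not a string match.
 */
int demo_error_cb_print_missing(FILE *out, const char *oid_hex,
				enum demo_msg_id msg_id, const char *message)
{
	if (msg_id == DEMO_MSG_GITMODULES_MISSING) {
		fprintf(out, "%s\n", oid_hex);
		return 0;
	}
	return demo_error_function(msg_id, message);
}
```

Because the handler compares an enum value, rewording the human-readable
message cannot silently break the check.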



* [PATCH v5 01/19] fsck.c: refactor and rename common config callback
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
  2021-03-16 19:35                   ` Derrick Stolee
  2021-03-17 18:20                   ` [PATCH v5 00/19] " Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 02/19] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
                                     ` (17 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Refactor code I recently changed in 1f3299fda9 (fsck: make
fsck_config() re-usable, 2021-01-05) so that I could use fsck's config
callback in mktag.

I don't know what I was thinking in structuring the code this way, but
it clearly makes no sense to have an fsck_config_internal() at all
just so it can get a fsck_options when git_config() already supports
passing along some void* data.

Let's just make use of that instead, which lets us get rid of the two
wrapper functions, and brings fsck's common config callback in line
with other such reusable config callbacks.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 7 +------
 builtin/mktag.c | 7 +------
 fsck.c          | 4 ++--
 fsck.h          | 3 +--
 4 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c7..a56a2d0513 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -71,11 +71,6 @@ static const char *printable_type(const struct object_id *oid,
 	return ret;
 }
 
-static int fsck_config(const char *var, const char *value, void *cb)
-{
-	return fsck_config_internal(var, value, cb, &fsck_obj_options);
-}
-
 static int objerror(struct object *obj, const char *err)
 {
 	errors_found |= ERROR_OBJECT;
@@ -803,7 +798,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	if (name_objects)
 		fsck_enable_object_names(&fsck_walk_options);
 
-	git_config(fsck_config, NULL);
+	git_config(git_fsck_config, &fsck_obj_options);
 
 	if (connectivity_only) {
 		for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 41a399a69e..23c4b8763f 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -14,11 +14,6 @@ static int option_strict = 1;
 
 static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 
-static int mktag_config(const char *var, const char *value, void *cb)
-{
-	return fsck_config_internal(var, value, cb, &fsck_options);
-}
-
 static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
@@ -93,7 +88,7 @@ int cmd_mktag(int argc, const char **argv, const char *prefix)
 	fsck_options.error_func = mktag_fsck_error_func;
 	fsck_set_msg_type(&fsck_options, "extraheaderentry", "warn");
 	/* config might set fsck.extraHeaderEntry=* again */
-	git_config(mktag_config, NULL);
+	git_config(git_fsck_config, &fsck_options);
 	if (fsck_tag_standalone(NULL, buf.buf, buf.len, &fsck_options,
 				&tagged_oid, &tagged_type))
 		die(_("tag on stdin did not pass our strict fsck check"));
diff --git a/fsck.c b/fsck.c
index e3030f3b35..5dfb99665a 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1323,9 +1323,9 @@ int fsck_finish(struct fsck_options *options)
 	return ret;
 }
 
-int fsck_config_internal(const char *var, const char *value, void *cb,
-			 struct fsck_options *options)
+int git_fsck_config(const char *var, const char *value, void *cb)
 {
+	struct fsck_options *options = cb;
 	if (strcmp(var, "fsck.skiplist") == 0) {
 		const char *path;
 		struct strbuf sb = STRBUF_INIT;
diff --git a/fsck.h b/fsck.h
index 733378f126..f70d11c559 100644
--- a/fsck.h
+++ b/fsck.h
@@ -109,7 +109,6 @@ const char *fsck_describe_object(struct fsck_options *options,
  * git_config() callback for use by fsck-y tools that want to support
  * fsck.<msg> fsck.skipList etc.
  */
-int fsck_config_internal(const char *var, const char *value, void *cb,
-			 struct fsck_options *options);
+int git_fsck_config(const char *var, const char *value, void *cb);
 
 #endif
-- 
2.31.0.260.g719c683c1d
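
The refactoring above relies on a config callback receiving
caller-supplied state through its void * argument. A minimal sketch of
that pattern follows, assuming a toy driver in place of git_config();
every "demo_" name and the "demo.strict" key are hypothetical and not
part of git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for git's "struct fsck_options"; one field only. */
struct demo_options {
	int strict;
};

/* Same shape as a git_config() callback: (var, value, void *cb). */
typedef int (*demo_config_fn)(const char *var, const char *value, void *cb);

/* Hypothetical stand-in for git_config(): feeds one key/value pair to "fn". */
int demo_config(demo_config_fn fn, void *cb)
{
	return fn("demo.strict", "true", cb);
}

/*
 * Reusable callback: the options struct arrives through "cb", so no
 * per-caller wrapper function or file-scope variable is needed.
 */
int demo_fsck_config(const char *var, const char *value, void *cb)
{
	struct demo_options *options = cb;

	assert(options != NULL);
	if (!strcmp(var, "demo.strict"))
		options->strict = !strcmp(value, "true");
	return 0;
}
```

A caller would declare its own struct demo_options and call
demo_config(demo_fsck_config, &opts), which mirrors the
git_config(git_fsck_config, &fsck_options) call in the diff.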



* [PATCH v5 02/19] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (2 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 01/19] fsck.c: refactor and rename common config callback Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 03/19] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
                                     ` (16 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Refactor the definitions of FSCK_OPTIONS_{DEFAULT,STRICT} to use
designated initializers.

While I'm at it add the "object_names" member to the
initialization. This was omitted in 7b35efd734e (fsck_walk():
optionally name objects on the go, 2016-07-17) when the field was
added.

I'm using new FSCK_OPTIONS_COMMON and FSCK_OPTIONS_COMMON_ERROR_FUNC
helper macros to define what FSCK_OPTIONS_{DEFAULT,STRICT} have in
common, and define the two in terms of those macros.

The FSCK_OPTIONS_COMMON macro will be used in a subsequent commit to
define other variants of common fsck initialization that want to use
a custom error function, but share the rest of the defaults.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fsck.h b/fsck.h
index f70d11c559..15e12f292f 100644
--- a/fsck.h
+++ b/fsck.h
@@ -43,8 +43,17 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
+#define FSCK_OPTIONS_COMMON \
+	.walk = NULL, \
+	.msg_type = NULL, \
+	.skiplist = OIDSET_INIT, \
+	.object_names = NULL,
+#define FSCK_OPTIONS_COMMON_ERROR_FUNC \
+	FSCK_OPTIONS_COMMON \
+	.error_func = fsck_error_function
+
+#define FSCK_OPTIONS_DEFAULT	{ .strict = 0, FSCK_OPTIONS_COMMON_ERROR_FUNC }
+#define FSCK_OPTIONS_STRICT	{ .strict = 1, FSCK_OPTIONS_COMMON_ERROR_FUNC }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.31.0.260.g719c683c1d
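
The macro layering in this patch can be tried in isolation. The sketch
below uses a trimmed-down struct and "demo_"/"DEMO_" names, all of which
are hypothetical. It only shows how a shared fragment ending in a comma
composes with further designated initializers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for "struct fsck_options". */
struct demo_options {
	void *walk;
	int (*error_func)(const char *message);
	unsigned strict:1;
	void *object_names;
};

int demo_error_function(const char *message)
{
	assert(message != NULL);
	return 1;
}

int demo_custom_error(const char *message)
{
	(void)message;
	return -1;
}

/*
 * Shared fragment: the trailing comma lets another designated
 * initializer follow it inside the braces.
 */
#define DEMO_OPTIONS_COMMON \
	.walk = NULL, \
	.object_names = NULL,
#define DEMO_OPTIONS_COMMON_ERROR_FUNC \
	DEMO_OPTIONS_COMMON \
	.error_func = demo_error_function

#define DEMO_OPTIONS_DEFAULT { .strict = 0, DEMO_OPTIONS_COMMON_ERROR_FUNC }
#define DEMO_OPTIONS_STRICT  { .strict = 1, DEMO_OPTIONS_COMMON_ERROR_FUNC }

/* A variant with its own error function reuses only the common part. */
#define DEMO_OPTIONS_CUSTOM { \
	.strict = 1, \
	.error_func = demo_custom_error, \
	DEMO_OPTIONS_COMMON \
}
```

Designated initializers do not depend on member order, and any member
left unnamed is zero-initialized. A newly added field therefore gets a
defined value even when an initializer macro is not updated.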



* [PATCH v5 03/19] fsck.h: use "enum object_type" instead of "int"
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (3 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 02/19] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 04/19] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
                                     ` (15 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Change the fsck_walk_func to use an "enum object_type" instead of an
"int" type. The types are compatible, and ever since this was added in
355885d5315 (add generic, type aware object chain walker, 2008-02-25)
we've used entries from object_type (OBJ_BLOB etc.).

So this doesn't really change anything as far as the generated code is
concerned, it just gives the compiler more information and makes this
easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c           | 3 ++-
 builtin/index-pack.c     | 3 ++-
 builtin/unpack-objects.c | 3 ++-
 fsck.h                   | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index a56a2d0513..ed5f2af6b5 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -192,7 +192,8 @@ static int traverse_reachable(void)
 	return !!result;
 }
 
-static int mark_used(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_used(struct object *obj, enum object_type object_type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return 1;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index bad5748807..69f24fe9f7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -212,7 +212,8 @@ static void cleanup_thread(void)
 	free(thread_data);
 }
 
-static int mark_link(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_link(struct object *obj, enum object_type type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return -1;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index dd4a75e030..ca54fd1668 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -187,7 +187,8 @@ static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
  * that have reachability requirements and calls this function.
  * Verify its reachability and validity recursively and write it out.
  */
-static int check_object(struct object *obj, int type, void *data, struct fsck_options *options)
+static int check_object(struct object *obj, enum object_type type,
+			void *data, struct fsck_options *options)
 {
 	struct obj_buffer *obj_buf;
 
diff --git a/fsck.h b/fsck.h
index 15e12f292f..e3edaff8e7 100644
--- a/fsck.h
+++ b/fsck.h
@@ -23,7 +23,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type);
  *     <0	error signaled and abort
  *     >0	error signaled and do not abort
  */
-typedef int (*fsck_walk_func)(struct object *obj, int type, void *data, struct fsck_options *options);
+typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
+			      void *data, struct fsck_options *options);
 
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 04/19] fsck.c: rename variables in fsck_set_msg_type() for less confusion
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (4 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 03/19] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 05/19] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
                                     ` (14 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Rename variables in a function added in 0282f4dced0 (fsck: offer a
function to demote fsck errors to warnings, 2015-06-22).

It was needlessly confusing that it took a "msg_type" argument, but
then later declared another "msg_type" of a different type.

Let's rename that to "severity", and rename "id" to "msg_id" and
"msg_id" to "msg_id_str" etc. This will make a follow-up change
smaller.

While I'm at it, properly indent the fsck_set_msg_type() argument list.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 24 ++++++++++++------------
 fsck.h |  2 +-
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fsck.c b/fsck.c
index 5dfb99665a..7cc722a25c 100644
--- a/fsck.c
+++ b/fsck.c
@@ -203,27 +203,27 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 }
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type)
+		       const char *msg_id_str, const char *msg_type_str)
 {
-	int id = parse_msg_id(msg_id), type;
+	int msg_id = parse_msg_id(msg_id_str), msg_type;
 
-	if (id < 0)
-		die("Unhandled message id: %s", msg_id);
-	type = parse_msg_type(msg_type);
+	if (msg_id < 0)
+		die("Unhandled message id: %s", msg_id_str);
+	msg_type = parse_msg_type(msg_type_str);
 
-	if (type != FSCK_ERROR && msg_id_info[id].msg_type == FSCK_FATAL)
-		die("Cannot demote %s to %s", msg_id, msg_type);
+	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
+		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
 	if (!options->msg_type) {
 		int i;
-		int *msg_type;
-		ALLOC_ARRAY(msg_type, FSCK_MSG_MAX);
+		int *severity;
+		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
-			msg_type[i] = fsck_msg_type(i, options);
-		options->msg_type = msg_type;
+			severity[i] = fsck_msg_type(i, options);
+		options->msg_type = severity;
 	}
 
-	options->msg_type[id] = type;
+	options->msg_type[msg_id] = msg_type;
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
diff --git a/fsck.h b/fsck.h
index e3edaff8e7..12ff99b56e 100644
--- a/fsck.h
+++ b/fsck.h
@@ -11,7 +11,7 @@ struct fsck_options;
 struct object;
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type);
+		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
 int is_valid_msg_type(const char *msg_id, const char *msg_type);
 
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 05/19] fsck.c: move definition of msg_id into append_msg_id()
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (5 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 04/19] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 06/19] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
                                     ` (13 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Refactor code added in 71ab8fa840f (fsck: report the ID of the
error/warning, 2015-06-22) to resolve the msg_id to a string in the
function that wants it, instead of doing it in report().
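
For readers unfamiliar with what append_msg_id() does with that string:
going by the loop in the diff, it lower-cases every character except the
one following an underscore, which is copied as-is while the underscore
itself is dropped. A rough standalone sketch of that conversion, using a
plain char buffer where git uses a strbuf (demo_append_msg_id() and its
signature are invented for this example):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: writes the camelCased form of "id_string" (e.g.
 * "BAD_DATE") into "out", which holds "outsz" bytes. Returns 0 on
 * success, -1 if "out" is too small.
 */
static int demo_append_msg_id(char *out, size_t outsz, const char *id_string)
{
	size_t len = 0;

	while (*id_string) {
		char c = *id_string++;

		if (c == '_') {
			/* Drop the underscore, keep the next character as-is. */
			assert(*id_string);
			c = *id_string++;
		} else {
			c = (char)tolower((unsigned char)c);
		}
		if (len + 1 >= outsz)
			return -1; /* no room for this byte plus the NUL */
		out[len++] = c;
	}
	if (!outsz)
		return -1;
	out[len] = '\0';
	return 0;
}
```

Fed "BAD_DATE" this yields "badDate", and "MISSING_NAME_BEFORE_EMAIL"
becomes "missingNameBeforeEmail".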

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index 7cc722a25c..ffb9115ddb 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,8 +264,9 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, const char *msg_id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
 {
+	const char *msg_id = msg_id_info[id].id_string;
 	for (;;) {
 		char c = *(msg_id)++;
 
@@ -308,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, msg_id_info[id].id_string);
+	append_msg_id(&sb, id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 06/19] fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (6 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 05/19] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 07/19] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
                                     ` (12 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Rename the remaining variables of type fsck_msg_id from "id" to
"msg_id". This change is relatively small, and is worth the churn for
a later change where we have different id's in the "report" function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fsck.c b/fsck.c
index ffb9115ddb..a9a8783aeb 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,19 +264,19 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id msg_id)
 {
-	const char *msg_id = msg_id_info[id].id_string;
+	const char *msg_id_str = msg_id_info[msg_id].id_string;
 	for (;;) {
-		char c = *(msg_id)++;
+		char c = *(msg_id_str)++;
 
 		if (!c)
 			break;
 		if (c != '_')
 			strbuf_addch(sb, tolower(c));
 		else {
-			assert(*msg_id);
-			strbuf_addch(sb, *(msg_id)++);
+			assert(*msg_id_str);
+			strbuf_addch(sb, *(msg_id_str)++);
 		}
 	}
 
@@ -292,11 +292,11 @@ static int object_on_skiplist(struct fsck_options *opts,
 __attribute__((format (printf, 5, 6)))
 static int report(struct fsck_options *options,
 		  const struct object_id *oid, enum object_type object_type,
-		  enum fsck_msg_id id, const char *fmt, ...)
+		  enum fsck_msg_id msg_id, const char *fmt, ...)
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(id, options), result;
+	int msg_type = fsck_msg_type(msg_id, options), result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -309,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, id);
+	append_msg_id(&sb, msg_id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 07/19] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (7 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 06/19] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 08/19] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
                                     ` (11 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Refactor "if options->msg_type" and other code added in
0282f4dced0 (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) to reduce the scope of the "int msg_type" variable.

This is in preparation for changing its type in a subsequent commit.
Only using it in the "!options->msg_type" scope makes that change
smaller.

This also brings the code in line with the fsck_set_msg_type()
function (also added in 0282f4dced0), which does a similar check for
"!options->msg_type". Another minor benefit is getting rid of the
style violation of not having braces for the body of the "if".
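
The resulting control flow, reduced to a standalone sketch. The demo_*
types and the three-entry defaults table are invented for illustration;
only the shape of the function is meant to mirror the patch:

```c
#include <assert.h>
#include <stddef.h>

enum demo_severity { DEMO_IGNORE, DEMO_ERROR, DEMO_WARN };

enum demo_msg_id { DEMO_MSG_A, DEMO_MSG_B, DEMO_MSG_C, DEMO_MSG_MAX };

/* Built-in default severity for each (hypothetical) message id. */
static const enum demo_severity demo_defaults[DEMO_MSG_MAX] = {
	DEMO_ERROR, DEMO_WARN, DEMO_IGNORE
};

struct demo_options {
	unsigned strict:1;
	const enum demo_severity *overrides; /* NULL until configured */
};

static enum demo_severity demo_msg_severity(enum demo_msg_id msg_id,
					    const struct demo_options *options)
{
	assert(msg_id < DEMO_MSG_MAX);

	if (!options->overrides) {
		/* The temporary only lives in the "no overrides" branch. */
		enum demo_severity severity = demo_defaults[msg_id];

		if (options->strict && severity == DEMO_WARN)
			severity = DEMO_ERROR;
		return severity;
	}

	return options->overrides[msg_id];
}
```

With no overrides, DEMO_MSG_B is a warning by default and an error under
"strict"; once an overrides table is installed it is consulted directly
and "strict" is no longer looked at here.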

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fsck.c b/fsck.c
index a9a8783aeb..2f23255f99 100644
--- a/fsck.c
+++ b/fsck.c
@@ -167,19 +167,17 @@ void list_config_fsck_msg_ids(struct string_list *list, const char *prefix)
 static int fsck_msg_type(enum fsck_msg_id msg_id,
 	struct fsck_options *options)
 {
-	int msg_type;
-
 	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
 
-	if (options->msg_type)
-		msg_type = options->msg_type[msg_id];
-	else {
-		msg_type = msg_id_info[msg_id].msg_type;
+	if (!options->msg_type) {
+		int msg_type = msg_id_info[msg_id].msg_type;
+
 		if (options->strict && msg_type == FSCK_WARN)
 			msg_type = FSCK_ERROR;
+		return msg_type;
 	}
 
-	return msg_type;
+	return options->msg_type[msg_id];
 }
 
 static int parse_msg_type(const char *str)
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 08/19] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (8 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 07/19] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 09/19] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
                                     ` (10 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
fsck_msg_type enum.

These defines were originally introduced in:

 - ba002f3b28a (builtin-fsck: move common object checking code to
   fsck.c, 2008-02-25)
 - f50c4407305 (fsck: disallow demoting grave fsck errors to warnings,
   2015-06-22)
 - efaba7cc77f (fsck: optionally ignore specific fsck issues
   completely, 2015-06-22)
 - f27d05b1704 (fsck: allow upgrading fsck warnings to errors,
   2015-06-22)

These were defined in two different places because we use
FSCK_{IGNORE,INFO,FATAL} only in fsck.c, but FSCK_{ERROR,WARN} are
used by external callbacks.

Untangling that would take some more work, since we expose the new
"enum fsck_msg_type" to both. Similar to "enum object_type" it's not
worth structuring the API in such a way that only those who need
FSCK_{ERROR,WARN} pass around a different type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c       |  2 +-
 builtin/index-pack.c |  3 ++-
 builtin/mktag.c      |  3 ++-
 fsck.c               | 21 ++++++++++-----------
 fsck.h               | 16 ++++++++++------
 5 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index ed5f2af6b5..17940a4e24 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -84,7 +84,7 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   int msg_type, const char *message)
+			   enum fsck_msg_type msg_type, const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 69f24fe9f7..56b8efaa89 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1716,7 +1716,8 @@ static void show_pack_info(int stat_only)
 static int print_dangling_gitmodules(struct fsck_options *o,
 				     const struct object_id *oid,
 				     enum object_type object_type,
-				     int msg_type, const char *message)
+				     enum fsck_msg_type msg_type,
+				     const char *message)
 {
 	/*
 	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 23c4b8763f..052a510ad7 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -17,7 +17,8 @@ static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
-				 int msg_type, const char *message)
+				 enum fsck_msg_type msg_type,
+				 const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/fsck.c b/fsck.c
index 2f23255f99..e1e942821d 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,9 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FSCK_FATAL -1
-#define FSCK_INFO -2
-
 #define FOREACH_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
@@ -97,7 +94,7 @@ static struct {
 	const char *id_string;
 	const char *downcased;
 	const char *camelcased;
-	int msg_type;
+	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
 	FOREACH_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
@@ -164,13 +161,13 @@ void list_config_fsck_msg_ids(struct string_list *list, const char *prefix)
 		list_config_item(list, prefix, msg_id_info[i].camelcased);
 }
 
-static int fsck_msg_type(enum fsck_msg_id msg_id,
+static enum fsck_msg_type fsck_msg_type(enum fsck_msg_id msg_id,
 	struct fsck_options *options)
 {
 	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
 
 	if (!options->msg_type) {
-		int msg_type = msg_id_info[msg_id].msg_type;
+		enum fsck_msg_type msg_type = msg_id_info[msg_id].msg_type;
 
 		if (options->strict && msg_type == FSCK_WARN)
 			msg_type = FSCK_ERROR;
@@ -180,7 +177,7 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 	return options->msg_type[msg_id];
 }
 
-static int parse_msg_type(const char *str)
+static enum fsck_msg_type parse_msg_type(const char *str)
 {
 	if (!strcmp(str, "error"))
 		return FSCK_ERROR;
@@ -203,7 +200,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 void fsck_set_msg_type(struct fsck_options *options,
 		       const char *msg_id_str, const char *msg_type_str)
 {
-	int msg_id = parse_msg_id(msg_id_str), msg_type;
+	int msg_id = parse_msg_id(msg_id_str);
+	enum fsck_msg_type msg_type;
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
@@ -214,7 +212,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 
 	if (!options->msg_type) {
 		int i;
-		int *severity;
+		enum fsck_msg_type *severity;
 		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
 			severity[i] = fsck_msg_type(i, options);
@@ -294,7 +292,8 @@ static int report(struct fsck_options *options,
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(msg_id, options), result;
+	enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);
+	int result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -1265,7 +1264,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			int msg_type, const char *message)
+			enum fsck_msg_type msg_type, const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index 12ff99b56e..0fff04373e 100644
--- a/fsck.h
+++ b/fsck.h
@@ -3,9 +3,13 @@
 
 #include "oidset.h"
 
-#define FSCK_ERROR 1
-#define FSCK_WARN 2
-#define FSCK_IGNORE 3
+enum fsck_msg_type {
+	FSCK_INFO  = -2,
+	FSCK_FATAL = -1,
+	FSCK_ERROR = 1,
+	FSCK_WARN,
+	FSCK_IGNORE
+};
 
 struct fsck_options;
 struct object;
@@ -29,17 +33,17 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  int msg_type, const char *message);
+			  enum fsck_msg_type msg_type, const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			int msg_type, const char *message);
+			enum fsck_msg_type msg_type, const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
 	fsck_error error_func;
 	unsigned strict:1;
-	int *msg_type;
+	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
 	kh_oid_map_t *object_names;
 };
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 09/19] fsck.h: re-order and re-assign "enum fsck_msg_type"
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (9 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 08/19] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 10/19] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
                                     ` (9 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Change the values in the "enum fsck_msg_type" from being manually
assigned to using default C enum values.

This means we end up with a FSCK_IGNORE=0, which was previously
defined as "3".

I'm confident that nothing relies on these values, as we always compare
them explicitly. Let's not omit "0" so it won't be assumed that we're
using these as a boolean somewhere.

This also allows us to re-structure the fields to mark which are
"private" vs. "public". See the preceding commit for a rationale for
not simply splitting these into two enums, namely that this is used
for both the private and public fsck API.
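
As a quick illustration of the renumbering (a standalone sketch; the
DEMO_* names are placeholders mirroring the new layout, not the real
enum): with no explicit initializers C numbers the enumerators from 0 in
declaration order, so callers should compare against the named constants
rather than test for truthiness:

```c
#include <assert.h>

enum demo_msg_type {
	/* for internal use only */
	DEMO_IGNORE,	/* 0 */
	DEMO_INFO,	/* 1 */
	DEMO_FATAL,	/* 2 */
	/* "public" */
	DEMO_ERROR,	/* 3 */
	DEMO_WARN	/* 4 */
};

/* Returns 1 if a message of this type should be reported, 0 otherwise. */
static int demo_should_report(enum demo_msg_type type)
{
	/*
	 * Explicit comparison: "if (type)" would only happen to work
	 * because DEMO_IGNORE is 0, and would break on reordering.
	 */
	return type != DEMO_IGNORE;
}
```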

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fsck.h b/fsck.h
index 0fff04373e..25c456bbd3 100644
--- a/fsck.h
+++ b/fsck.h
@@ -4,11 +4,13 @@
 #include "oidset.h"
 
 enum fsck_msg_type {
-	FSCK_INFO  = -2,
-	FSCK_FATAL = -1,
-	FSCK_ERROR = 1,
+	/* for internal use only */
+	FSCK_IGNORE,
+	FSCK_INFO,
+	FSCK_FATAL,
+	/* "public", fed to e.g. error_func callbacks */
+	FSCK_ERROR,
 	FSCK_WARN,
-	FSCK_IGNORE
 };
 
 struct fsck_options;
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 10/19] fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (10 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 09/19] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 11/19] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
                                     ` (8 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

There's no reason to defer the calling of parse_msg_type() until after
we've checked if the "msg_id < 0". This is not a hot codepath, and
parse_msg_type() itself may die on invalid input.
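
A rough sketch of the resulting ordering, with return codes standing in
for die(). Everything named demo_* is hypothetical, as are the accepted
strings and the two-entry id table; the point is only that both inputs
are parsed up front:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_msg_type { DEMO_IGNORE, DEMO_ERROR, DEMO_WARN };

/* Returns 0 and sets *out on success, -1 on an unknown type string. */
static int demo_parse_msg_type(const char *str, enum demo_msg_type *out)
{
	if (!strcmp(str, "error"))
		*out = DEMO_ERROR;
	else if (!strcmp(str, "warn"))
		*out = DEMO_WARN;
	else if (!strcmp(str, "ignore"))
		*out = DEMO_IGNORE;
	else
		return -1;
	return 0;
}

/* Returns an index >= 0 for a known id, -1 otherwise. */
static int demo_parse_msg_id(const char *str)
{
	static const char *const known[] = { "baddate", "bademail" };
	size_t i;

	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++)
		if (!strcmp(str, known[i]))
			return (int)i;
	return -1;
}

/* Returns 0 on success, -1 for a bad type string, -2 for a bad id. */
static int demo_set_msg_type(enum demo_msg_type *table,
			     const char *id_str, const char *type_str)
{
	int msg_id = demo_parse_msg_id(id_str);
	enum demo_msg_type msg_type;

	/* Both strings are parsed before either result is acted upon. */
	if (demo_parse_msg_type(type_str, &msg_type) < 0)
		return -1;
	if (msg_id < 0)
		return -2;

	table[msg_id] = msg_type;
	return 0;
}
```

One consequence in this sketch: when both strings are invalid, the bad
type is what gets reported.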

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index e1e942821d..341c482fed 100644
--- a/fsck.c
+++ b/fsck.c
@@ -201,11 +201,10 @@ void fsck_set_msg_type(struct fsck_options *options,
 		       const char *msg_id_str, const char *msg_type_str)
 {
 	int msg_id = parse_msg_id(msg_id_str);
-	enum fsck_msg_type msg_type;
+	enum fsck_msg_type msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
-	msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 11/19] fsck.c: undefine temporary STR macro after use
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (11 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 10/19] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 12/19] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
                                     ` (7 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

In f417eed8cde (fsck: provide a function to parse fsck message IDs,
2015-06-22) the "STR" macro was introduced, but that short macro name
was not undefined after use as was done earlier in the same series for
the MSG_ID macro in c99ba492f1c (fsck: introduce identifiers for fsck
messages, 2015-06-22).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fsck.c b/fsck.c
index 341c482fed..e657636a6f 100644
--- a/fsck.c
+++ b/fsck.c
@@ -100,6 +100,7 @@ static struct {
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
+#undef STR
 
 static void prepare_msg_ids(void)
 {
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 12/19] fsck.c: give "FOREACH_MSG_ID" a more specific name
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (12 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 11/19] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 13/19] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
                                     ` (6 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Rename the FOREACH_MSG_ID macro to FOREACH_FSCK_MSG_ID in preparation
for moving it over to fsck.h. It's a good convention to name macros
in *.h files in such a way as to clearly not clash with any other
names in other files.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fsck.c b/fsck.c
index e657636a6f..b64526ea35 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,7 +22,7 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_MSG_ID(FUNC) \
+#define FOREACH_FSCK_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
 	FUNC(UNTERMINATED_HEADER, FATAL) \
@@ -83,7 +83,7 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,
 enum fsck_msg_id {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	FSCK_MSG_MAX
 };
 #undef MSG_ID
@@ -96,7 +96,7 @@ static struct {
 	const char *camelcased;
 	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 13/19] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (13 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 12/19] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 14/19] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
                                     ` (5 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Move the FOREACH_FSCK_MSG_ID macro and the fsck_msg_id enum it helps
define from fsck.c to fsck.h. This is in preparation for having
non-static functions take the fsck_msg_id as an argument.
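
For anyone not used to this "X-macro" idiom, a cut-down standalone
sketch of how one list drives both the enum and a table indexed by it.
The three entries and all DEMO_/demo_ names are chosen for illustration
only; the real list is the one being moved in the diff:

```c
#include <assert.h>
#include <string.h>

/* One list of (id, default type) pairs, expanded twice below. */
#define FOREACH_DEMO_MSG_ID(FUNC) \
	FUNC(BAD_DATE, ERROR) \
	FUNC(HAS_DOT, WARN) \
	FUNC(BAD_TAG_NAME, INFO)

enum demo_msg_type { DEMO_INFO, DEMO_ERROR, DEMO_WARN };

/* First expansion: the enum of ids, terminated by a "MAX" count. */
#define MSG_ID(id, msg_type) DEMO_MSG_##id,
enum demo_msg_id {
	FOREACH_DEMO_MSG_ID(MSG_ID)
	DEMO_MSG_MAX
};
#undef MSG_ID

/* Second expansion: a table indexed by that enum. */
#define STR(x) #x
#define MSG_ID(id, msg_type) { STR(id), DEMO_##msg_type },
static const struct {
	const char *id_string;
	enum demo_msg_type msg_type;
} demo_msg_id_info[DEMO_MSG_MAX] = {
	FOREACH_DEMO_MSG_ID(MSG_ID)
};
/* Short helper names are undefined again so they cannot leak. */
#undef MSG_ID
#undef STR
```

Because the enum and the table come from the same list they cannot get
out of sync, and a header that exposes the enum has to carry the list
macro along with it.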

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 66 ----------------------------------------------------------
 fsck.h | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 66 deletions(-)

diff --git a/fsck.c b/fsck.c
index b64526ea35..49208ec636 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,72 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_FSCK_MSG_ID(FUNC) \
-	/* fatal errors */ \
-	FUNC(NUL_IN_HEADER, FATAL) \
-	FUNC(UNTERMINATED_HEADER, FATAL) \
-	/* errors */ \
-	FUNC(BAD_DATE, ERROR) \
-	FUNC(BAD_DATE_OVERFLOW, ERROR) \
-	FUNC(BAD_EMAIL, ERROR) \
-	FUNC(BAD_NAME, ERROR) \
-	FUNC(BAD_OBJECT_SHA1, ERROR) \
-	FUNC(BAD_PARENT_SHA1, ERROR) \
-	FUNC(BAD_TAG_OBJECT, ERROR) \
-	FUNC(BAD_TIMEZONE, ERROR) \
-	FUNC(BAD_TREE, ERROR) \
-	FUNC(BAD_TREE_SHA1, ERROR) \
-	FUNC(BAD_TYPE, ERROR) \
-	FUNC(DUPLICATE_ENTRIES, ERROR) \
-	FUNC(MISSING_AUTHOR, ERROR) \
-	FUNC(MISSING_COMMITTER, ERROR) \
-	FUNC(MISSING_EMAIL, ERROR) \
-	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_OBJECT, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_TAG, ERROR) \
-	FUNC(MISSING_TAG_ENTRY, ERROR) \
-	FUNC(MISSING_TREE, ERROR) \
-	FUNC(MISSING_TREE_OBJECT, ERROR) \
-	FUNC(MISSING_TYPE, ERROR) \
-	FUNC(MISSING_TYPE_ENTRY, ERROR) \
-	FUNC(MULTIPLE_AUTHORS, ERROR) \
-	FUNC(TREE_NOT_SORTED, ERROR) \
-	FUNC(UNKNOWN_TYPE, ERROR) \
-	FUNC(ZERO_PADDED_DATE, ERROR) \
-	FUNC(GITMODULES_MISSING, ERROR) \
-	FUNC(GITMODULES_BLOB, ERROR) \
-	FUNC(GITMODULES_LARGE, ERROR) \
-	FUNC(GITMODULES_NAME, ERROR) \
-	FUNC(GITMODULES_SYMLINK, ERROR) \
-	FUNC(GITMODULES_URL, ERROR) \
-	FUNC(GITMODULES_PATH, ERROR) \
-	FUNC(GITMODULES_UPDATE, ERROR) \
-	/* warnings */ \
-	FUNC(BAD_FILEMODE, WARN) \
-	FUNC(EMPTY_NAME, WARN) \
-	FUNC(FULL_PATHNAME, WARN) \
-	FUNC(HAS_DOT, WARN) \
-	FUNC(HAS_DOTDOT, WARN) \
-	FUNC(HAS_DOTGIT, WARN) \
-	FUNC(NULL_SHA1, WARN) \
-	FUNC(ZERO_PADDED_FILEMODE, WARN) \
-	FUNC(NUL_IN_COMMIT, WARN) \
-	/* infos (reported as warnings, but ignored by default) */ \
-	FUNC(GITMODULES_PARSE, INFO) \
-	FUNC(BAD_TAG_NAME, INFO) \
-	FUNC(MISSING_TAGGER_ENTRY, INFO) \
-	/* ignored (elevated when requested) */ \
-	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
-
-#define MSG_ID(id, msg_type) FSCK_MSG_##id,
-enum fsck_msg_id {
-	FOREACH_FSCK_MSG_ID(MSG_ID)
-	FSCK_MSG_MAX
-};
-#undef MSG_ID
-
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
diff --git a/fsck.h b/fsck.h
index 25c456bbd3..7c868410eb 100644
--- a/fsck.h
+++ b/fsck.h
@@ -13,6 +13,72 @@ enum fsck_msg_type {
 	FSCK_WARN,
 };
 
+#define FOREACH_FSCK_MSG_ID(FUNC) \
+	/* fatal errors */ \
+	FUNC(NUL_IN_HEADER, FATAL) \
+	FUNC(UNTERMINATED_HEADER, FATAL) \
+	/* errors */ \
+	FUNC(BAD_DATE, ERROR) \
+	FUNC(BAD_DATE_OVERFLOW, ERROR) \
+	FUNC(BAD_EMAIL, ERROR) \
+	FUNC(BAD_NAME, ERROR) \
+	FUNC(BAD_OBJECT_SHA1, ERROR) \
+	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_TAG_OBJECT, ERROR) \
+	FUNC(BAD_TIMEZONE, ERROR) \
+	FUNC(BAD_TREE, ERROR) \
+	FUNC(BAD_TREE_SHA1, ERROR) \
+	FUNC(BAD_TYPE, ERROR) \
+	FUNC(DUPLICATE_ENTRIES, ERROR) \
+	FUNC(MISSING_AUTHOR, ERROR) \
+	FUNC(MISSING_COMMITTER, ERROR) \
+	FUNC(MISSING_EMAIL, ERROR) \
+	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
+	FUNC(MISSING_OBJECT, ERROR) \
+	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
+	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
+	FUNC(MISSING_TAG, ERROR) \
+	FUNC(MISSING_TAG_ENTRY, ERROR) \
+	FUNC(MISSING_TREE, ERROR) \
+	FUNC(MISSING_TREE_OBJECT, ERROR) \
+	FUNC(MISSING_TYPE, ERROR) \
+	FUNC(MISSING_TYPE_ENTRY, ERROR) \
+	FUNC(MULTIPLE_AUTHORS, ERROR) \
+	FUNC(TREE_NOT_SORTED, ERROR) \
+	FUNC(UNKNOWN_TYPE, ERROR) \
+	FUNC(ZERO_PADDED_DATE, ERROR) \
+	FUNC(GITMODULES_MISSING, ERROR) \
+	FUNC(GITMODULES_BLOB, ERROR) \
+	FUNC(GITMODULES_LARGE, ERROR) \
+	FUNC(GITMODULES_NAME, ERROR) \
+	FUNC(GITMODULES_SYMLINK, ERROR) \
+	FUNC(GITMODULES_URL, ERROR) \
+	FUNC(GITMODULES_PATH, ERROR) \
+	FUNC(GITMODULES_UPDATE, ERROR) \
+	/* warnings */ \
+	FUNC(BAD_FILEMODE, WARN) \
+	FUNC(EMPTY_NAME, WARN) \
+	FUNC(FULL_PATHNAME, WARN) \
+	FUNC(HAS_DOT, WARN) \
+	FUNC(HAS_DOTDOT, WARN) \
+	FUNC(HAS_DOTGIT, WARN) \
+	FUNC(NULL_SHA1, WARN) \
+	FUNC(ZERO_PADDED_FILEMODE, WARN) \
+	FUNC(NUL_IN_COMMIT, WARN) \
+	/* infos (reported as warnings, but ignored by default) */ \
+	FUNC(GITMODULES_PARSE, INFO) \
+	FUNC(BAD_TAG_NAME, INFO) \
+	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	/* ignored (elevated when requested) */ \
+	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
+
+#define MSG_ID(id, msg_type) FSCK_MSG_##id,
+enum fsck_msg_id {
+	FOREACH_FSCK_MSG_ID(MSG_ID)
+	FSCK_MSG_MAX
+};
+#undef MSG_ID
+
 struct fsck_options;
 struct object;
 
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 14/19] fsck.c: pass along the fsck_msg_id in the fsck_error callback
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (14 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 13/19] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 15/19] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
                                     ` (4 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Change the fsck_error callback to also pass along the
fsck_msg_id. Before this change the only way to get the message id was
to parse it back out of the "message".

Let's pass it down explicitly for the benefit of callers that might
want to use it, as discussed in [1].

Passing the msg_type is now redundant, as you can always get it back
from the msg_id, but I'm not changing that convention. It's really
common to need the msg_type, and the report() function itself (which
calls "fsck_error") needs to call fsck_msg_type() to discover
it. Let's not needlessly re-do that work in the user callback.

1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/
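
As a rough illustration of why both values are handed to the callback,
here is a minimal standalone sketch. It is not git's code: the demo_*
names, the two-entry message table and the simplified callback signature
are assumptions made up for this example. The reporter has to look up
the type anyway, so it passes the type and the id together.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-ins for the enums in fsck.h. */
enum demo_msg_type { DEMO_ERROR, DEMO_WARN, DEMO_IGNORE };
enum demo_msg_id { DEMO_MSG_BAD_DATE, DEMO_MSG_HAS_DOT, DEMO_MSG_MAX };

/* Default severity per message id, indexed by enum demo_msg_id. */
static const enum demo_msg_type demo_default_type[DEMO_MSG_MAX] = {
	[DEMO_MSG_BAD_DATE] = DEMO_ERROR,
	[DEMO_MSG_HAS_DOT] = DEMO_WARN,
};

typedef int (*demo_error_cb)(enum demo_msg_type msg_type,
			     enum demo_msg_id msg_id,
			     const char *message);

/* The reporter already needs the type, so it hands over both values. */
static int demo_report(demo_error_cb cb, enum demo_msg_id msg_id,
		       const char *message)
{
	enum demo_msg_type msg_type;

	assert(msg_id < DEMO_MSG_MAX);
	msg_type = demo_default_type[msg_id];
	if (msg_type == DEMO_IGNORE)
		return 0;
	return cb(msg_type, msg_id, message);
}

/* The callback switches on the type without repeating the lookup. */
static int demo_cb(enum demo_msg_type msg_type, enum demo_msg_id msg_id,
		   const char *message)
{
	fprintf(stderr, "%s (id %d): %s\n",
		msg_type == DEMO_WARN ? "warning" : "error",
		(int)msg_id, message);
	return msg_type == DEMO_WARN ? 0 : 1;
}
```

In this sketch a warning makes the callback return 0 and an error
makes it return 1; that convention is only assumed here for brevity.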

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c       | 4 +++-
 builtin/index-pack.c | 3 ++-
 builtin/mktag.c      | 1 +
 fsck.c               | 6 ++++--
 fsck.h               | 6 ++++--
 5 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 17940a4e24..70ff95837a 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -84,7 +84,9 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   enum fsck_msg_type msg_type, const char *message)
+			   enum fsck_msg_type msg_type,
+			   enum fsck_msg_id msg_id,
+			   const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 56b8efaa89..2b2266a4b7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1717,6 +1717,7 @@ static int print_dangling_gitmodules(struct fsck_options *o,
 				     const struct object_id *oid,
 				     enum object_type object_type,
 				     enum fsck_msg_type msg_type,
+				     enum fsck_msg_id msg_id,
 				     const char *message)
 {
 	/*
@@ -1727,7 +1728,7 @@ static int print_dangling_gitmodules(struct fsck_options *o,
 		printf("%s\n", oid_to_hex(oid));
 		return 0;
 	}
-	return fsck_error_function(o, oid, object_type, msg_type, message);
+	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
 }
 
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 052a510ad7..96e63bc772 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -18,6 +18,7 @@ static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
 				 enum fsck_msg_type msg_type,
+				 enum fsck_msg_id msg_id,
 				 const char *message)
 {
 	switch (msg_type) {
diff --git a/fsck.c b/fsck.c
index 49208ec636..01b2724ac0 100644
--- a/fsck.c
+++ b/fsck.c
@@ -245,7 +245,7 @@ static int report(struct fsck_options *options,
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
 	result = options->error_func(options, oid, object_type,
-				     msg_type, sb.buf);
+				     msg_type, msg_id, sb.buf);
 	strbuf_release(&sb);
 	va_end(ap);
 
@@ -1198,7 +1198,9 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			enum fsck_msg_type msg_type, const char *message)
+			enum fsck_msg_type msg_type,
+			enum fsck_msg_id msg_id,
+			const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index 7c868410eb..80b1984f34 100644
--- a/fsck.h
+++ b/fsck.h
@@ -101,11 +101,13 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  enum fsck_msg_type msg_type, const char *message);
+			  enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
+			  const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			enum fsck_msg_type msg_type, const char *message);
+			enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
+			const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 15/19] fsck.c: add an fsck_set_msg_type() API that takes enums
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (15 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 14/19] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 16/19] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
                                     ` (3 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Change code I added in acf9de4c94e (mktag: use fsck instead of custom
verify_tag(), 2021-01-05) to make use of a new API function that takes
the fsck_msg_{id,type} types, instead of arbitrary strings that
we'll (hopefully) parse into those types.

At the time that the fsck_set_msg_type() API was introduced in
0282f4dced0 (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) it was only intended to be used to parse user-supplied
data.

For things that are purely internal to the C code it makes sense to
have the compiler check these arguments, and to skip the sanity
checking of the data in fsck_set_msg_type() which is redundant to
checks we get from the compiler.
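
The split between a typed setter and a string-parsing wrapper can be
sketched outside of git roughly as follows. Everything named demo_* is
hypothetical, the message table has only two entries, and the wrapper
returns -1 on bad input where the real fsck_set_msg_type() would die().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum demo_msg_type { DEMO_ERROR, DEMO_WARN, DEMO_IGNORE };
enum demo_msg_id { DEMO_MSG_BAD_DATE, DEMO_MSG_EXTRA_HEADER_ENTRY, DEMO_MSG_MAX };

static const enum demo_msg_type demo_default_type[DEMO_MSG_MAX] = {
	[DEMO_MSG_BAD_DATE] = DEMO_ERROR,
	[DEMO_MSG_EXTRA_HEADER_ENTRY] = DEMO_IGNORE,
};
static const char *const demo_id_name[DEMO_MSG_MAX] = {
	[DEMO_MSG_BAD_DATE] = "baddate",
	[DEMO_MSG_EXTRA_HEADER_ENTRY] = "extraheaderentry",
};

struct demo_options {
	enum demo_msg_type *msg_type; /* NULL until the first override */
};

/* Typed setter: the compiler checks both arguments, nothing is parsed. */
static void demo_set_msg_type_from_ids(struct demo_options *o,
				       enum demo_msg_id id,
				       enum demo_msg_type type)
{
	if (!o->msg_type) {
		int i;

		o->msg_type = malloc(sizeof(*o->msg_type) * DEMO_MSG_MAX);
		assert(o->msg_type);
		for (i = 0; i < DEMO_MSG_MAX; i++)
			o->msg_type[i] = demo_default_type[i];
	}
	o->msg_type[id] = type;
}

/* String setter for user input: validate first, then delegate. */
static int demo_set_msg_type(struct demo_options *o,
			     const char *id_str, const char *type_str)
{
	int id;
	enum demo_msg_type type;

	if (!strcmp(type_str, "error"))
		type = DEMO_ERROR;
	else if (!strcmp(type_str, "warn"))
		type = DEMO_WARN;
	else if (!strcmp(type_str, "ignore"))
		type = DEMO_IGNORE;
	else
		return -1;

	for (id = 0; id < DEMO_MSG_MAX; id++)
		if (!strcmp(id_str, demo_id_name[id]))
			break;
	if (id == DEMO_MSG_MAX)
		return -1;

	demo_set_msg_type_from_ids(o, (enum demo_msg_id)id, type);
	return 0;
}
```

A misspelled id such as "extraheaderentri" is only caught at runtime by
the string variant, whereas a misspelled enum constant would not compile.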

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/mktag.c |  3 ++-
 fsck.c          | 27 +++++++++++++++++----------
 fsck.h          |  3 +++
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/builtin/mktag.c b/builtin/mktag.c
index 96e63bc772..dddcccdd36 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -88,7 +88,8 @@ int cmd_mktag(int argc, const char **argv, const char *prefix)
 		die_errno(_("could not read from stdin"));
 
 	fsck_options.error_func = mktag_fsck_error_func;
-	fsck_set_msg_type(&fsck_options, "extraheaderentry", "warn");
+	fsck_set_msg_type_from_ids(&fsck_options, FSCK_MSG_EXTRA_HEADER_ENTRY,
+				   FSCK_WARN);
 	/* config might set fsck.extraHeaderEntry=* again */
 	git_config(git_fsck_config, &fsck_options);
 	if (fsck_tag_standalone(NULL, buf.buf, buf.len, &fsck_options,
diff --git a/fsck.c b/fsck.c
index 01b2724ac0..307d454d92 100644
--- a/fsck.c
+++ b/fsck.c
@@ -132,6 +132,22 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 	return 1;
 }
 
+void fsck_set_msg_type_from_ids(struct fsck_options *options,
+				enum fsck_msg_id msg_id,
+				enum fsck_msg_type msg_type)
+{
+	if (!options->msg_type) {
+		int i;
+		enum fsck_msg_type *severity;
+		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
+		for (i = 0; i < FSCK_MSG_MAX; i++)
+			severity[i] = fsck_msg_type(i, options);
+		options->msg_type = severity;
+	}
+
+	options->msg_type[msg_id] = msg_type;
+}
+
 void fsck_set_msg_type(struct fsck_options *options,
 		       const char *msg_id_str, const char *msg_type_str)
 {
@@ -144,16 +160,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
-	if (!options->msg_type) {
-		int i;
-		enum fsck_msg_type *severity;
-		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
-		for (i = 0; i < FSCK_MSG_MAX; i++)
-			severity[i] = fsck_msg_type(i, options);
-		options->msg_type = severity;
-	}
-
-	options->msg_type[msg_id] = msg_type;
+	fsck_set_msg_type_from_ids(options, msg_id, msg_type);
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
diff --git a/fsck.h b/fsck.h
index 80b1984f34..344c3ddc74 100644
--- a/fsck.h
+++ b/fsck.h
@@ -82,6 +82,9 @@ enum fsck_msg_id {
 struct fsck_options;
 struct object;
 
+void fsck_set_msg_type_from_ids(struct fsck_options *options,
+				enum fsck_msg_id msg_id,
+				enum fsck_msg_type msg_type);
 void fsck_set_msg_type(struct fsck_options *options,
 		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 16/19] fsck.c: move gitmodules_{found,done} into fsck_options
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (16 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 15/19] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 17/19] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
                                     ` (2 subsequent siblings)
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Move the gitmodules_{found,done} static variables added in
159e7b080bf (fsck: detect gitmodules files, 2018-05-02) into the
fsck_options struct. It makes sense to keep all the context in the
same place.

This requires changing the recently added register_found_gitmodules()
function added in 5476e1efde (fetch-pack: print and use dangling
.gitmodules, 2021-02-22) to take fsck_options. That function will be
removed in a subsequent commit, but as it'll require the new
gitmodules_found attribute of "fsck_options" we need this intermediate
step first.
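
To illustrate what keeping this state in the options struct buys us,
here is a small sketch that is independent of git. The demo_set type is
a made-up fixed-size stand-in for "struct oidset", and integer ids stand
in for object ids. Two option structs track their own found/done sets
and do not interfere with each other.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SET_MAX 8

/* Tiny fixed-size set of integer ids; a stand-in for git's oidset. */
struct demo_set {
	int items[DEMO_SET_MAX];
	size_t nr;
};

static int demo_set_contains(const struct demo_set *s, int id)
{
	size_t i;

	for (i = 0; i < s->nr; i++)
		if (s->items[i] == id)
			return 1;
	return 0;
}

static void demo_set_insert(struct demo_set *s, int id)
{
	if (demo_set_contains(s, id))
		return;
	assert(s->nr < DEMO_SET_MAX);
	s->items[s->nr++] = id;
}

/* The per-run state lives here instead of in file-scope statics. */
struct demo_options {
	struct demo_set gitmodules_found;
	struct demo_set gitmodules_done;
};

/* Count found entries that were never checked, then reset both sets. */
static size_t demo_finish(struct demo_options *o)
{
	size_t i, pending = 0;

	for (i = 0; i < o->gitmodules_found.nr; i++)
		if (!demo_set_contains(&o->gitmodules_done,
				       o->gitmodules_found.items[i]))
			pending++;
	o->gitmodules_found.nr = 0;
	o->gitmodules_done.nr = 0;
	return pending;
}
```

With file-scope statics, finishing one run would also have cleared the
entries registered for the other; here each struct is finished on its own.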

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fetch-pack.c |  2 +-
 fsck.c       | 23 ++++++++++-------------
 fsck.h       |  7 ++++++-
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 6a61a46428..82c3c2c043 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -998,7 +998,7 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(oid);
+		register_found_gitmodules(&fo, oid);
 	if (fsck_finish(&fo))
 		die("fsck failed");
 }
diff --git a/fsck.c b/fsck.c
index 307d454d92..00760b1f42 100644
--- a/fsck.c
+++ b/fsck.c
@@ -19,9 +19,6 @@
 #include "credential.h"
 #include "help.h"
 
-static struct oidset gitmodules_found = OIDSET_INIT;
-static struct oidset gitmodules_done = OIDSET_INIT;
-
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
@@ -624,7 +621,7 @@ static int fsck_tree(const struct object_id *oid,
 
 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
 			if (!S_ISLNK(mode))
-				oidset_insert(&gitmodules_found, oid);
+				oidset_insert(&options->gitmodules_found, oid);
 			else
 				retval += report(options,
 						 oid, OBJ_TREE,
@@ -638,7 +635,7 @@ static int fsck_tree(const struct object_id *oid,
 				has_dotgit |= is_ntfs_dotgit(backslash);
 				if (is_ntfs_dotgitmodules(backslash)) {
 					if (!S_ISLNK(mode))
-						oidset_insert(&gitmodules_found, oid);
+						oidset_insert(&options->gitmodules_found, oid);
 					else
 						retval += report(options, oid, OBJ_TREE,
 								 FSCK_MSG_GITMODULES_SYMLINK,
@@ -1150,9 +1147,9 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 	struct fsck_gitmodules_data data;
 	struct config_options config_opts = { 0 };
 
-	if (!oidset_contains(&gitmodules_found, oid))
+	if (!oidset_contains(&options->gitmodules_found, oid))
 		return 0;
-	oidset_insert(&gitmodules_done, oid);
+	oidset_insert(&options->gitmodules_done, oid);
 
 	if (object_on_skiplist(options, oid))
 		return 0;
@@ -1217,9 +1214,9 @@ int fsck_error_function(struct fsck_options *o,
 	return 1;
 }
 
-void register_found_gitmodules(const struct object_id *oid)
+void register_found_gitmodules(struct fsck_options *options, const struct object_id *oid)
 {
-	oidset_insert(&gitmodules_found, oid);
+	oidset_insert(&options->gitmodules_found, oid);
 }
 
 int fsck_finish(struct fsck_options *options)
@@ -1228,13 +1225,13 @@ int fsck_finish(struct fsck_options *options)
 	struct oidset_iter iter;
 	const struct object_id *oid;
 
-	oidset_iter_init(&gitmodules_found, &iter);
+	oidset_iter_init(&options->gitmodules_found, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
 		enum object_type type;
 		unsigned long size;
 		char *buf;
 
-		if (oidset_contains(&gitmodules_done, oid))
+		if (oidset_contains(&options->gitmodules_done, oid))
 			continue;
 
 		buf = read_object_file(oid, &type, &size);
@@ -1259,8 +1256,8 @@ int fsck_finish(struct fsck_options *options)
 	}
 
 
-	oidset_clear(&gitmodules_found);
-	oidset_clear(&gitmodules_done);
+	oidset_clear(&options->gitmodules_found);
+	oidset_clear(&options->gitmodules_done);
 	return ret;
 }
 
diff --git a/fsck.h b/fsck.h
index 344c3ddc74..b25ae9d8b9 100644
--- a/fsck.h
+++ b/fsck.h
@@ -118,6 +118,8 @@ struct fsck_options {
 	unsigned strict:1;
 	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
+	struct oidset gitmodules_found;
+	struct oidset gitmodules_done;
 	kh_oid_map_t *object_names;
 };
 
@@ -125,6 +127,8 @@ struct fsck_options {
 	.walk = NULL, \
 	.msg_type = NULL, \
 	.skiplist = OIDSET_INIT, \
+	.gitmodules_found = OIDSET_INIT, \
+	.gitmodules_done = OIDSET_INIT, \
 	.object_names = NULL,
 #define FSCK_OPTIONS_COMMON_ERROR_FUNC \
 	FSCK_OPTIONS_COMMON \
@@ -149,7 +153,8 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
-void register_found_gitmodules(const struct object_id *oid);
+void register_found_gitmodules(struct fsck_options *options,
+			       const struct object_id *oid);
 
 /*
  * fsck a tag, and pass info about it back to the caller. This is
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 17/19] fetch-pack: don't needlessly copy fsck_options
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (17 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 16/19] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 18/19] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 19/19] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Change the behavior of the .gitmodules validation added in
5476e1efde (fetch-pack: print and use dangling .gitmodules,
2021-02-22) so we're using one "fsck_options".

I found that code confusing to read. One might think that not setting
up the error_func earlier means that we're relying on the "error_func"
not being set in some code in between the two hunks being modified
here.

But we're not, all we're doing in the rest of "cmd_index_pack()" is
further setup by calling fsck_set_msg_types(), and assigning to
do_fsck_object.

So there was no reason in 5476e1efde to make a shallow copy of the
fsck_options struct before setting error_func. Let's just do this
setup at the top of the function, along with the "walk" assignment.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/index-pack.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 2b2266a4b7..5ad80b85b4 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1761,6 +1761,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 	fsck_options.walk = mark_link;
+	fsck_options.error_func = print_dangling_gitmodules;
 
 	reset_pack_idx_option(&opts);
 	git_config(git_index_pack_config, &opts);
@@ -1951,13 +1952,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	else
 		close(input_fd);
 
-	if (do_fsck_object) {
-		struct fsck_options fo = fsck_options;
-
-		fo.error_func = print_dangling_gitmodules;
-		if (fsck_finish(&fo))
-			die(_("fsck error in pack objects"));
-	}
+	if (do_fsck_object && fsck_finish(&fsck_options))
+		die(_("fsck error in pack objects"));
 
 	free(objects);
 	strbuf_release(&index_name_buf);
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 18/19] fetch-pack: use file-scope static struct for fsck_options
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (18 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 17/19] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:20                   ` [PATCH v5 19/19] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Change code added in 5476e1efde (fetch-pack: print and use dangling
.gitmodules, 2021-02-22) so that we use a file-scoped "static struct
fsck_options" instead of defining one in the "fsck_gitmodules_oids()"
function.

We use this pattern in all of
builtin/{fsck,index-pack,mktag,unpack-objects}.c. It's odd to see
fetch-pack be the odd one out. One might think that we're using other
fsck_options structs in fetch-pack, or doing fsck twice there, but
we're not.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fetch-pack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 82c3c2c043..229fd8e2c2 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -38,6 +38,7 @@ static int server_supports_filtering;
 static int advertise_sid;
 static struct shallow_lock shallow_lock;
 static const char *alternate_shallow_file;
+static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 static struct strbuf fsck_msg_types = STRBUF_INIT;
 static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 
@@ -991,15 +992,14 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 {
 	struct oidset_iter iter;
 	const struct object_id *oid;
-	struct fsck_options fo = FSCK_OPTIONS_STRICT;
 
 	if (!oidset_size(gitmodules_oids))
 		return;
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(&fo, oid);
-	if (fsck_finish(&fo))
+		register_found_gitmodules(&fsck_options, oid);
+	if (fsck_finish(&fsck_options))
 		die("fsck failed");
 }
 
-- 
2.31.0.260.g719c683c1d



* [PATCH v5 19/19] fetch-pack: use new fsck API to printing dangling submodules
  2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
                                     ` (19 preceding siblings ...)
  2021-03-17 18:20                   ` [PATCH v5 18/19] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:20                   ` Ævar Arnfjörð Bjarmason
  20 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:20 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Refactor the check added in 5476e1efde (fetch-pack: print and use
dangling .gitmodules, 2021-02-22) to make use of the "msg_id" that is
now passed to the user-defined "error_func". We can now compare
against FSCK_MSG_GITMODULES_MISSING instead of parsing the generated
message.
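
The difference between the two checks can be sketched in isolation as
follows. This is not the callback from the patch: the demo_* names, the
two-value enum and the reduced argument list are assumptions for the
example, and the object name is passed as a plain hex string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum demo_msg_id { DEMO_MSG_GITMODULES_MISSING, DEMO_MSG_GITMODULES_URL };

/* Old style: recover the id from the formatted message text. */
static int demo_is_missing_by_prefix(const char *message)
{
	const char *prefix = "gitmodulesMissing";

	return !strncmp(message, prefix, strlen(prefix));
}

/* New style: compare the enum value the caller passed in. */
static int demo_is_missing_by_id(enum demo_msg_id msg_id)
{
	return msg_id == DEMO_MSG_GITMODULES_MISSING;
}

/*
 * Print the object name and return 0 for a missing .gitmodules blob;
 * report anything else as an error and return 1.
 */
static int demo_error_cb(const char *hex_oid, enum demo_msg_id msg_id,
			 const char *message)
{
	assert(hex_oid && message);
	if (demo_is_missing_by_id(msg_id)) {
		puts(hex_oid);
		return 0;
	}
	fprintf(stderr, "error: object %s: %s\n", hex_oid, message);
	return 1;
}
```

The enum comparison keeps working if the wording or casing of the
message prefix ever changes, which the string check would not.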

Let's also replace register_found_gitmodules() with directly
manipulating the "gitmodules_found" member. A recent commit moved it
into "fsck_options" so we could do this here.

I'm sticking this callback in fsck.c. Perhaps in the future we'd like
to accumulate such callbacks into another file (maybe fsck-cb.c,
similar to parse-options-cb.c?), but while we've got just the one
let's just put it into fsck.c.

A better alternative in this case would be some more obvious library
shared by fetch-pack.c and builtin/index-pack.c, but there isn't such a
thing.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/index-pack.c | 21 +--------------------
 fetch-pack.c         |  4 ++--
 fsck.c               | 23 ++++++++++++++++++-----
 fsck.h               | 18 +++++++++++++++---
 4 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 5ad80b85b4..11f0fafd33 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -120,7 +120,7 @@ static int nr_threads;
 static int from_stdin;
 static int strict;
 static int do_fsck_object;
-static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
+static struct fsck_options fsck_options = FSCK_OPTIONS_MISSING_GITMODULES;
 static int verbose;
 static int show_resolving_progress;
 static int show_stat;
@@ -1713,24 +1713,6 @@ static void show_pack_info(int stat_only)
 	}
 }
 
-static int print_dangling_gitmodules(struct fsck_options *o,
-				     const struct object_id *oid,
-				     enum object_type object_type,
-				     enum fsck_msg_type msg_type,
-				     enum fsck_msg_id msg_id,
-				     const char *message)
-{
-	/*
-	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
-	 * instead of relying on this string check.
-	 */
-	if (starts_with(message, "gitmodulesMissing")) {
-		printf("%s\n", oid_to_hex(oid));
-		return 0;
-	}
-	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
-}
-
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, rev_index;
@@ -1761,7 +1743,6 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 	fsck_options.walk = mark_link;
-	fsck_options.error_func = print_dangling_gitmodules;
 
 	reset_pack_idx_option(&opts);
 	git_config(git_index_pack_config, &opts);
diff --git a/fetch-pack.c b/fetch-pack.c
index 229fd8e2c2..008a3facd4 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -38,7 +38,7 @@ static int server_supports_filtering;
 static int advertise_sid;
 static struct shallow_lock shallow_lock;
 static const char *alternate_shallow_file;
-static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
+static struct fsck_options fsck_options = FSCK_OPTIONS_MISSING_GITMODULES;
 static struct strbuf fsck_msg_types = STRBUF_INIT;
 static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 
@@ -998,7 +998,7 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(&fsck_options, oid);
+		oidset_insert(&fsck_options.gitmodules_found, oid);
 	if (fsck_finish(&fsck_options))
 		die("fsck failed");
 }
diff --git a/fsck.c b/fsck.c
index 00760b1f42..048cf81937 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1214,11 +1214,6 @@ int fsck_error_function(struct fsck_options *o,
 	return 1;
 }
 
-void register_found_gitmodules(struct fsck_options *options, const struct object_id *oid)
-{
-	oidset_insert(&options->gitmodules_found, oid);
-}
-
 int fsck_finish(struct fsck_options *options)
 {
 	int ret = 0;
@@ -1284,3 +1279,21 @@ int git_fsck_config(const char *var, const char *value, void *cb)
 
 	return git_default_config(var, value, cb);
 }
+
+/*
+ * Custom error callbacks that are used in more than one place.
+ */
+
+int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
+					   const struct object_id *oid,
+					   enum object_type object_type,
+					   enum fsck_msg_type msg_type,
+					   enum fsck_msg_id msg_id,
+					   const char *message)
+{
+	if (msg_id == FSCK_MSG_GITMODULES_MISSING) {
+		puts(oid_to_hex(oid));
+		return 0;
+	}
+	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
+}
diff --git a/fsck.h b/fsck.h
index b25ae9d8b9..da58f585d7 100644
--- a/fsck.h
+++ b/fsck.h
@@ -153,9 +153,6 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
-void register_found_gitmodules(struct fsck_options *options,
-			       const struct object_id *oid);
-
 /*
  * fsck a tag, and pass info about it back to the caller. This is
  * exposed fsck_object() internals for git-mktag(1).
@@ -203,4 +200,19 @@ const char *fsck_describe_object(struct fsck_options *options,
  */
 int git_fsck_config(const char *var, const char *value, void *cb);
 
+/*
+ * Custom error callbacks that are used in more than one place.
+ */
+#define FSCK_OPTIONS_MISSING_GITMODULES { \
+	.strict = 1, \
+	.error_func = fsck_error_cb_print_missing_gitmodules, \
+	FSCK_OPTIONS_COMMON \
+}
+int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
+					   const struct object_id *oid,
+					   enum object_type object_type,
+					   enum fsck_msg_type msg_type,
+					   enum fsck_msg_id msg_id,
+					   const char *message);
+
 #endif
-- 
2.31.0.260.g719c683c1d



* Re: [PATCH v4 01/22] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-03-16 16:17                 ` [PATCH v4 01/22] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:35                   ` Junio C Hamano
  2021-03-19 14:43                   ` Johannes Schindelin
  1 sibling, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-17 18:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Add the object_name member to the initialization macro. This was
> omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
> go, 2016-07-17) when the field was added.

While this does not hurt, as the missing one was and is at the end
of the struct members, this has no effect.  As you'll be rewriting
everything into designated initializers anyway, does it matter, I
have to wonder (it would affect your commit count karma, but you
already have enough of them ;-)?

>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  fsck.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fsck.h b/fsck.h
> index 733378f126..2274843ba0 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -43,8 +43,8 @@ struct fsck_options {
>  	kh_oid_map_t *object_names;
>  };
>  
> -#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
> -#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
> +#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
> +#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
>  
>  /* descend in all linked child objects
>   * the return value is:


* Re: [PATCH v4 02/22] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
  2021-03-16 16:17                 ` [PATCH v4 02/22] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
  2021-03-16 18:59                   ` Derrick Stolee
@ 2021-03-17 18:38                   ` Junio C Hamano
  1 sibling, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-17 18:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  fsck.h | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/fsck.h b/fsck.h
> index 2274843ba0..40f3cb3f64 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -43,8 +43,22 @@ struct fsck_options {
>  	kh_oid_map_t *object_names;
>  };
>  
> -#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
> -#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
> +#define FSCK_OPTIONS_DEFAULT { \
> +	.walk = NULL, \
> +	.error_func = fsck_error_function, \
> +	.strict = 0, \
> +	.msg_type = NULL, \
> +	.skiplist = OIDSET_INIT, \
> +	.object_names = NULL, \
> +}
> +#define FSCK_OPTIONS_STRICT { \
> +	.walk = NULL, \
> +	.error_func = fsck_error_function, \
> +	.strict = 1, \
> +	.msg_type = NULL, \
> +	.skiplist = OIDSET_INIT, \
> +	.object_names = NULL, \
> +}

Being explicit is good, but spelling out zero initialization sounds
more like cluttering than clarifying.  I do not mind .strict = 0 in
the DEFAULT one only because it contrasts well with .strict = 1 on
the STRICT side, but it would be easier to read to omit these zero
initialization of the .walk, .msg_type and .object_names members.
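
For illustration only, a shortened form along those lines might look
like the sketch below. The demo_* struct and macros are hypothetical
stand-ins rather than the real fsck.h definitions; the point is that C
zero-initializes every member a designated initializer does not name.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for struct fsck_options. */
struct demo_options {
	int (*walk)(void);
	int (*error_func)(void);
	unsigned strict:1;
	int *msg_type;
	void *object_names;
};

static int demo_error_function(void)
{
	return 1;
}

/* Only the interesting members are named; the rest start as 0/NULL. */
#define DEMO_OPTIONS_DEFAULT { \
	.error_func = demo_error_function, \
	.strict = 0, \
}
#define DEMO_OPTIONS_STRICT { \
	.error_func = demo_error_function, \
	.strict = 1, \
}

/* Returns 1 if every member left out of the initializer is zero. */
static int demo_unnamed_members_are_zero(const struct demo_options *o)
{
	assert(o);
	return !o->walk && !o->msg_type && !o->object_names;
}
```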

>  /* descend in all linked child objects
>   * the return value is:


* Re: [PATCH v4 08/22] fsck.c: move definition of msg_id into append_msg_id()
  2021-03-16 16:17                 ` [PATCH v4 08/22] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:45                   ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-17 18:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Refactor code added in 71ab8fa840f (fsck: report the ID of the
> error/warning, 2015-06-22) to resolve the msg_id to a string in the
> function that wants it, instead of doing it in report().

This reintroduces the same confusion 07/22 tried to get rid of,
unless msg_id variable is renamed to msg_id_str in this step,
instead of being left to the next step, no?


> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  fsck.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fsck.c b/fsck.c
> index 0a9ac9ca07..b977493f57 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -264,8 +264,9 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
>  	free(to_free);
>  }
>  
> -static void append_msg_id(struct strbuf *sb, const char *msg_id)
> +static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
>  {
> +	const char *msg_id = msg_id_info[id].id_string;
>  	for (;;) {
>  		char c = *(msg_id)++;
>  
> @@ -308,7 +309,7 @@ static int report(struct fsck_options *options,
>  	else if (msg_type == FSCK_INFO)
>  		msg_type = FSCK_WARN;
>  
> -	append_msg_id(&sb, msg_id_info[id].id_string);
> +	append_msg_id(&sb, id);
>  
>  	va_start(ap, fmt);
>  	strbuf_vaddf(&sb, fmt, ap);

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 11/22] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  2021-03-16 16:17                 ` [PATCH v4 11/22] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:48                   ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-17 18:48 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
> fsck_msg_type enum.

Nice.

> Untangling that would take some more work, since we expose the new
> "enum fsck_msg_type" to both. Similar to "enum object_type" it's not
> worth structuring the API in such a way that only those who need
> FSCK_{ERROR,WARN} pass around a different type.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 12/22] fsck.h: re-order and re-assign "enum fsck_msg_type"
  2021-03-16 16:17                 ` [PATCH v4 12/22] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:50                   ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-17 18:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change the values in the "enum fsck_msg_type" from being manually
> assigned to using default C enum values.
>
> This means we end up with a FSCK_IGNORE=0, which was previously
> defined as "2".
>
> I'm confident that nothing relies on these values, we always compare
> them explicitly. Let's not omit "0" so it won't be assumed that we're
> using these as a boolean somewhere.

By "compare them explicitly", do you mean that we always compare for
equality?

If the code had depended on constructs like "if (msg < FSCK_ERROR)",
this change would break badly.
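
A minimal sketch of that concern, using the old and new enumerator
orders from the quoted diff under hypothetical OLD_ and NEW_ prefixes.
The demo_*_below_error() helpers are invented for illustration; nothing
here claims that such an ordering comparison exists in fsck.c.

```c
#include <assert.h>

/* Enumerator values before the re-ordering, as in the quoted diff. */
enum demo_old_msg_type {
	OLD_INFO  = -2,
	OLD_FATAL = -1,
	OLD_ERROR = 1,
	OLD_WARN,
	OLD_IGNORE
};

/* Enumerator values after the re-ordering, using default C numbering. */
enum demo_new_msg_type {
	NEW_IGNORE,
	NEW_INFO,
	NEW_FATAL,
	NEW_ERROR,
	NEW_WARN
};

/* Hypothetical ordering-based checks: their meaning depends on the values. */
int demo_old_below_error(enum demo_old_msg_type t)
{
	return t < OLD_ERROR;
}

int demo_new_below_error(enum demo_new_msg_type t)
{
	return t < NEW_ERROR;
}

/* Equality-based checks: unaffected by how the enumerators are numbered. */
int demo_old_is_ignore(enum demo_old_msg_type t)
{
	return t == OLD_IGNORE;
}

int demo_new_is_ignore(enum demo_new_msg_type t)
{
	return t == NEW_IGNORE;
}

/* Usage example: IGNORE flips sides of the "<" test, equality does not care. */
void demo_selfcheck(void)
{
	assert(!demo_old_below_error(OLD_IGNORE));
	assert(demo_new_below_error(NEW_IGNORE));
	assert(demo_old_is_ignore(OLD_IGNORE));
	assert(demo_new_is_ignore(NEW_IGNORE));
}
```

So the re-numbering is safe only as long as every user sticks to "=="
and "!=" (or a switch), which is what the commit message should say.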

> diff --git a/fsck.h b/fsck.h
> index 2ecc15eee7..fce9981a0c 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -4,11 +4,13 @@
>  #include "oidset.h"
>  
>  enum fsck_msg_type {
> -	FSCK_INFO  = -2,
> -	FSCK_FATAL = -1,
> -	FSCK_ERROR = 1,
> +	/* for internal use only */
> +	FSCK_IGNORE,
> +	FSCK_INFO,
> +	FSCK_FATAL,
> +	/* "public", fed to e.g. error_func callbacks */
> +	FSCK_ERROR,
>  	FSCK_WARN,
> -	FSCK_IGNORE
>  };
>  
>  struct fsck_options;

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 14/22] fsck.c: undefine temporary STR macro after use
  2021-03-16 16:17                 ` [PATCH v4 14/22] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:57                   ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-17 18:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> In f417eed8cde (fsck: provide a function to parse fsck message IDs,
> 2015-06-22) the "STR" macro was introduced, but that short macro name
> was not undefined after use as was done earlier in the same series for
> the MSG_ID macro in c99ba492f1c (fsck: introduce identifiers for fsck
> messages, 2015-06-22).

Makes sense.  Thanks.

>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  fsck.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fsck.c b/fsck.c
> index 2ccf1a2f0f..f4c924ed04 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -100,6 +100,7 @@ static struct {
>  	{ NULL, NULL, NULL, -1 }
>  };
>  #undef MSG_ID
> +#undef STR
>  
>  static void prepare_msg_ids(void)
>  {

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 17/22] fsck.c: pass along the fsck_msg_id in the fsck_error callback
  2021-03-16 16:17                 ` [PATCH v4 17/22] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
@ 2021-03-17 19:01                   ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-17 19:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change the fsck_error callback to also pass along the
> fsck_msg_id. Before this change the only way to get the message id was
> to parse it back out of the "message".

Nice.

> Let's pass it down explicitly for the benefit of callers that might
> want to use it, as discussed in [1].
>
> Passing the msg_type is now redundant, as you can always get it back
> from the msg_id, but I'm not changing that convention. It's really
> common to need the msg_type, and the report() function itself (which
> calls "fsck_error") needs to call fsck_msg_type() to discover
> it. Let's not needlessly re-do that work in the user callback.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 22/22] fetch-pack: use new fsck API to printing dangling submodules
  2021-03-16 16:17                 ` [PATCH v4 22/22] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
  2021-03-16 19:32                   ` Derrick Stolee
@ 2021-03-17 19:12                   ` Junio C Hamano
  1 sibling, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-17 19:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 5ad80b85b4..11f0fafd33 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -120,7 +120,7 @@ static int nr_threads;
>  static int from_stdin;
>  static int strict;
>  static int do_fsck_object;
> -static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
> +static struct fsck_options fsck_options = FSCK_OPTIONS_MISSING_GITMODULES;

Hmph, I do not think this is a good way to go.  Specifically, with
fsck-cb.c holding the definition of what this thing is, and the
normal "options" initializers in fsck.h being defined quite far away
from it, it is hard to see what is different between the normal
strict one and the MISSING_GITMODULES one.

Rather, it may be far simpler to keep only DEFAULT and STRICT, and
override .error_func at runtime in the codepath(s) that needs to,
which would make it more clear what is going on.  That way, we do
not need the split initializers with _ERROR_FUNC, which is another
reason why the approach taken by this series is not a good idea (it
does not scale---error-func may seem special enough to deserve having
two sets of macros, one that uses the default and one that leaves the
member unspecified, but it won't stay special forever).

IOW,

> @@ -1761,7 +1743,6 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
>  
>  	read_replace_refs = 0;
>  	fsck_options.walk = mark_link;
> -	fsck_options.error_func = print_dangling_gitmodules;

I doubt this hunk is an improvement.
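
A rough sketch of the "initialize with STRICT, then override
.error_func at runtime" shape, assuming simplified stand-ins: the
demo_* types, the DEMO_OPTIONS_STRICT macro and the "swallowed" counter
are hypothetical and much smaller than the real fsck.h API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message IDs; the real list lives in fsck. */
enum demo_msg_id {
	DEMO_MSG_GITMODULES_MISSING,
	DEMO_MSG_OTHER
};

struct demo_options;
typedef int (*demo_error_fn)(struct demo_options *o, enum demo_msg_id msg_id);

/* Hypothetical, reduced options struct. */
struct demo_options {
	demo_error_fn error_func;
	unsigned strict;
	int swallowed; /* illustration only: counts messages we handled */
};

/* Stand-in for the default callback: treat every message as an error. */
int demo_error_function(struct demo_options *o, enum demo_msg_id msg_id)
{
	(void)o;
	(void)msg_id;
	return 1;
}

/* Special-case one message ID, defer to the default for the rest. */
int demo_print_missing_gitmodules(struct demo_options *o,
				  enum demo_msg_id msg_id)
{
	if (msg_id == DEMO_MSG_GITMODULES_MISSING) {
		o->swallowed++;
		return 0;
	}
	return demo_error_function(o, msg_id);
}

#define DEMO_OPTIONS_STRICT { \
	.error_func = demo_error_function, \
	.strict = 1, \
}

/* Stand-in for report(): dispatch to whatever callback is installed. */
int demo_report(struct demo_options *o, enum demo_msg_id msg_id)
{
	return o->error_func(o, msg_id);
}

/* Usage example: only the caller that needs it overrides the callback. */
void demo_selfcheck(void)
{
	struct demo_options opts = DEMO_OPTIONS_STRICT;

	assert(demo_report(&opts, DEMO_MSG_GITMODULES_MISSING) == 1);
	opts.error_func = demo_print_missing_gitmodules; /* runtime override */
	assert(demo_report(&opts, DEMO_MSG_GITMODULES_MISSING) == 0);
	assert(demo_report(&opts, DEMO_MSG_OTHER) == 1);
	assert(opts.swallowed == 1);
}
```

The override sits next to the code that depends on it, so a reader of
that codepath sees right away how it differs from plain STRICT.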

> diff --git a/fsck-cb.c b/fsck-cb.c
> new file mode 100644
> index 0000000000..465a49235a
> --- /dev/null
> +++ b/fsck-cb.c
> @@ -0,0 +1,16 @@
> +#include "git-compat-util.h"
> +#include "fsck.h"
> +
> +int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
> +					   const struct object_id *oid,
> +					   enum object_type object_type,
> +					   enum fsck_msg_type msg_type,
> +					   enum fsck_msg_id msg_id,
> +					   const char *message)
> +{
> +	if (msg_id == FSCK_MSG_GITMODULES_MISSING) {
> +		puts(oid_to_hex(oid));
> +		return 0;
> +	}
> +	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
> +}

As Derrick noticed, I do not know if we want to have a separate file
for this single function.  Shouldn't it be part of builtin/index-pack.c,
or do we want other places to do the same kind of checks?

Thanks.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 22/22] fetch-pack: use new fsck API to printing dangling submodules
  2021-03-17 13:47                     ` Ævar Arnfjörð Bjarmason
@ 2021-03-17 20:27                       ` Derrick Stolee
  0 siblings, 0 replies; 229+ messages in thread
From: Derrick Stolee @ 2021-03-17 20:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan

On 3/17/2021 9:47 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Mar 16 2021, Derrick Stolee wrote:
> 
>> On 3/16/2021 12:17 PM, Ævar Arnfjörð Bjarmason wrote:
>>> I expect that there won't be many of these fsck utility functions in
>>> the future, so just having a single fsck-cb.c makes sense.
>>
>> I'm not convinced that having a single cb function merits its
>> own file. But, if you expect this pattern to be expanded a
>> couple more times, then I would say it is worth it. Do you have
>> such plans?
> 
> Not really, well. Vague ones, but nothing I have even local patches for.
> 
> It just seemed odd to stick random callback functions shared by related
> programs into fsck.h's interface, but I guess with
> FSCK_OPTIONS_MISSING_GITMODULES I already did that.
> 
> Do you suggest just putting it into fsck.c?

Yeah, if it is frequently paired with fsck operations, I think it
makes the most sense there.

And looking at it again, I'm not sure parse-options-cb.c has a
good excuse for being separate from parse-options.c, but that's
the current state so I wouldn't change it now.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v5 00/19] fsck: API improvements
  2021-03-17 18:20                   ` [PATCH v5 00/19] " Ævar Arnfjörð Bjarmason
@ 2021-03-17 20:30                     ` Derrick Stolee
  2021-03-17 21:06                     ` Junio C Hamano
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 229+ messages in thread
From: Derrick Stolee @ 2021-03-17 20:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan

On 3/17/2021 2:20 PM, Ævar Arnfjörð Bjarmason wrote:
> A v5 with changes suggested by Derrick Stolee. Link to v4:
> https://lore.kernel.org/git/20210316161738.30254-1-avarab@gmail.com/
> 
> Changes:
> 
>  * 1/19 is new, it's a simple refactoring of some git_config() code in
>    fsck.c code I changed recently.

This new patch is simple, as advertised.
 
>  * Squashed the first 4x patches of incrementally redefining two
>    macros into one.

Thanks.

>  * Squashed a whitespace-only change into another patch that changed
>    the same code.
> 
>  * Got rid of fsck-cb.c, that one function just lives at the bottom of
>    fsck.c now.

I was late in giving you confirmation that fsck.c is a good place,
but you got there, anyway. Thanks!

This version LGTM.

-Stolee

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v5 00/19] fsck: API improvements
  2021-03-17 18:20                   ` [PATCH v5 00/19] " Ævar Arnfjörð Bjarmason
  2021-03-17 20:30                     ` Derrick Stolee
@ 2021-03-17 21:06                     ` Junio C Hamano
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-17 21:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan, Derrick Stolee

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> A v5 with changes suggested by Derrick Stolee. Link to v4:
> https://lore.kernel.org/git/20210316161738.30254-1-avarab@gmail.com/
>
> Changes:
>
>  * 1/19 is new, it's a simple refactoring of some git_config() code in
>    fsck.c code I changed recently.

The new step makes sense.  I think the series and my comment e-mails
crossed, and all of my comments on the previous round still apply
(including the fsck-cb.c thing, which I think should be added to its
sole user index-pack.c, not to fsck.c).

Thanks, will queue.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 01/22] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-03-16 16:17                 ` [PATCH v4 01/22] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
  2021-03-17 18:35                   ` Junio C Hamano
@ 2021-03-19 14:43                   ` Johannes Schindelin
  2021-03-20  9:16                     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 229+ messages in thread
From: Johannes Schindelin @ 2021-03-19 14:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan

Hi Ævar,

just a general note: this patch, which is the first of v4, is marked as
replying to the cover letter of v3. That feels quite odd. If you use
threading, why not let it reply to the cover letter of the same patch
series iteration?

In other words, would you mind using the `--thread=shallow` option in the
future, for better structuring on the mailing list?

Thanks,
Johannes

On Tue, 16 Mar 2021, Ævar Arnfjörð Bjarmason wrote:

> Add the object_name member to the initialization macro. This was
> omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
> go, 2016-07-17) when the field was added.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  fsck.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fsck.h b/fsck.h
> index 733378f126..2274843ba0 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -43,8 +43,8 @@ struct fsck_options {
>  	kh_oid_map_t *object_names;
>  };
>
> -#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
> -#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
> +#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
> +#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
>
>  /* descend in all linked child objects
>   * the return value is:
> --
> 2.31.0.260.g719c683c1d
>
>

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 01/22] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-03-19 14:43                   ` Johannes Schindelin
@ 2021-03-20  9:16                     ` Ævar Arnfjörð Bjarmason
  2021-03-20 20:04                       ` Junio C Hamano
  0 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-20  9:16 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Junio C Hamano, Jeff King, Jonathan Tan


On Fri, Mar 19 2021, Johannes Schindelin wrote:

> Hi Ævar,
>
> just a general note: this patch, which is the first of v4, is marked as
> replying to the cover letter of v3. That feels quite odd. If you use
> threading, why not let it reply to the cover letter of the same patch
> series iteration?
>
> In other words, would you mind using the `--thread=shallow` option in the
> future, for better structuring on the mailing list?

Not at all, I've set it in my config now.

I've just been using the default configuration of format-patch
--in-reply-to --cover-letter && send-email *.patch all this time.

Looking around at other patch submissions (aside from GGG) this seems to
be the norm though, but isn't documented in SubmittingPatches
etc. AFAICT.

So I wonder if I'm using some different process from the norm, or if
most everyone else is just looking carefully at Message-ID/In-Reply-To
norms before sending...

> On Tue, 16 Mar 2021, Ævar Arnfjörð Bjarmason wrote:
>
>> Add the object_name member to the initialization macro. This was
>> omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
>> go, 2016-07-17) when the field was added.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  fsck.h | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/fsck.h b/fsck.h
>> index 733378f126..2274843ba0 100644
>> --- a/fsck.h
>> +++ b/fsck.h
>> @@ -43,8 +43,8 @@ struct fsck_options {
>>  	kh_oid_map_t *object_names;
>>  };
>>
>> -#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
>> -#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
>> +#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
>> +#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
>>
>>  /* descend in all linked child objects
>>   * the return value is:
>> --
>> 2.31.0.260.g719c683c1d
>>
>>


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 01/22] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-03-20  9:16                     ` Ævar Arnfjörð Bjarmason
@ 2021-03-20 20:04                       ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-20 20:04 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin, git, Jeff King, Jonathan Tan

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> In other words, would you mind using the `--thread=shallow` option in the
>> future, for better structuring on the mailing list?
>
> Not at all, I've set it in my config now.
>
> I've just been using the default configuration of format-patch
> --in-reply-to --cover-letter && send-email *.patch all this time.
> ...
> So I wonder if I'm using some different process from the norm, or if
> most everyone else is just looking carefully at Message-ID/In-Reply-To
> norms before sending...

Interesting.  I always let send-email assign the message IDs and
haven't used --thread=<any> option at all.  In other words, my
format-patch output files have no message IDs in them or In-reply-to
header fields.  That in turn means that in-reply-to is decided not
when format-patch is run, but when send-email sends things out, it
gives them the ids and structures the in-reply-to chains.

I guess we have too much flexibility in our tooling X-<.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* [PATCH v6 00/19] fsck: API improvements
  2021-03-17 18:20                   ` [PATCH v5 00/19] " Ævar Arnfjörð Bjarmason
  2021-03-17 20:30                     ` Derrick Stolee
  2021-03-17 21:06                     ` Junio C Hamano
@ 2021-03-28 13:15                     ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 01/19] fsck.c: refactor and rename common config callback Ævar Arnfjörð Bjarmason
                                         ` (19 more replies)
  2 siblings, 20 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

To recap the goals in v1[1]: this series gets rid of the need to
have the recently added "print_dangling_gitmodules" function in
favor of a better fsck API to get at that information.

Changes since v5[2]:

 * Addressed all outstanding feedback AFAICT
 * The fields we init to 0/NULL in the new designated initializer are
   gone
 * There were comments on the refactoring of append_msg_id(). It turns
   out that we can entirely remove that function, so a new commit got
   added + one ejected to do that.
 * Clarifications in commit messages.
 * I'd still left behind a remnant of the old
   "print_dangling_gitmodules" code in v5's last commit. I.e. we had
   code that was accumulating its own list of gitmodules OIDs and then
   injecting it into the fsck state; now that the fsck state tracks
   those itself we can use that list directly instead.
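
For readers unfamiliar with the camel-casing mentioned above, here is a
self-contained sketch of the conversion the removed append_msg_id() did
on the fly (e.g. "GITMODULES_MISSING" becomes "gitmodulesMissing"),
paired with a prepare-once table. All demo_* names, the two-entry ID
list and the buffer size are assumptions made for illustration; this is
not the code in fsck.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Lowercase everything; drop each '_' and copy the next char unchanged. */
void demo_camelcase(char *dst, size_t dstlen, const char *src)
{
	size_t n = 0;

	if (!dstlen)
		return;
	while (*src && n + 1 < dstlen) {
		char c = *src++;

		if (c != '_') {
			dst[n++] = (char)tolower((unsigned char)c);
		} else {
			assert(*src); /* a trailing '_' would be malformed */
			dst[n++] = *src++;
		}
	}
	dst[n] = '\0';
}

/* Hypothetical two-entry message table, prepared lazily exactly once. */
#define DEMO_NR_IDS 2
static const char *const demo_id_strings[DEMO_NR_IDS] = {
	"GITMODULES_MISSING",
	"BAD_DATE",
};
static char demo_camelcased[DEMO_NR_IDS][32];
static int demo_prepared;

void demo_prepare_msg_ids(void)
{
	int i;

	if (demo_prepared)
		return; /* later calls reuse the cached strings */
	for (i = 0; i < DEMO_NR_IDS; i++)
		demo_camelcase(demo_camelcased[i], sizeof(demo_camelcased[i]),
			       demo_id_strings[i]);
	demo_prepared = 1;
}

/* Accessor used by a would-be report(): prepare, then look up. */
const char *demo_camelcased_id(int i)
{
	demo_prepare_msg_ids();
	return demo_camelcased[i];
}
```

The first lookup converts every entry; each later lookup is just an
array access.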

1. https://lore.kernel.org/git/20210217194246.25342-1-avarab@gmail.com/
2. https://lore.kernel.org/git/20210317182054.5986-1-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (19):
  fsck.c: refactor and rename common config callback
  fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
  fsck.h: use "enum object_type" instead of "int"
  fsck.c: rename variables in fsck_set_msg_type() for less confusion
  fsck.c: remove (mostly) redundant append_msg_id() function
  fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
  fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  fsck.h: re-order and re-assign "enum fsck_msg_type"
  fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  fsck.c: undefine temporary STR macro after use
  fsck.c: give "FOREACH_MSG_ID" a more specific name
  fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
  fsck.c: pass along the fsck_msg_id in the fsck_error callback
  fsck.c: add an fsck_set_msg_type() API that takes enums
  fsck.c: move gitmodules_{found,done} into fsck_options
  fetch-pack: don't needlessly copy fsck_options
  fetch-pack: use file-scope static struct for fsck_options
  fetch-pack: use new fsck API to printing dangling submodules

 builtin/fsck.c           |  14 ++-
 builtin/index-pack.c     |  30 +-----
 builtin/mktag.c          |  14 ++-
 builtin/unpack-objects.c |   3 +-
 fetch-pack.c             |  31 ++----
 fsck.c                   | 207 +++++++++++++--------------------------
 fsck.h                   | 127 +++++++++++++++++++++---
 7 files changed, 210 insertions(+), 216 deletions(-)

Range-diff:
 1:  fe33015e0d9 =  1:  579af32ab3e fsck.c: refactor and rename common config callback
 2:  72f2e53afac !  2:  b17c982293e fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
    @@ Commit message
         fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
     
         Refactor the definitions of FSCK_OPTIONS_{DEFAULT,STRICT} to use
    -    designated initializers.
    -
    -    While I'm at it add the "object_names" member to the
    -    initialization. This was omitted in 7b35efd734e (fsck_walk():
    -    optionally name objects on the go, 2016-07-17) when the field was
    -    added.
    -
    -    I'm using a new FSCK_OPTIONS_COMMON and FSCK_OPTIONS_COMMON_ERROR_FUNC
    -    helper macros to define what FSCK_OPTIONS_{DEFAULT,STRICT} have in
    -    common, and define the two in terms of those macro.
    -
    -    The FSCK_OPTIONS_COMMON macro will be used in a subsequent commit to
    -    define other variants of common fsck initialization that wants to use
    -    a custom error function, but share the rest of the defaults.
    +    designated initializers. This allows us to omit those fields that
    +    are initialized to zero or NULL.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ fsck.h: struct fsck_options {
      
     -#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
     -#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
    -+#define FSCK_OPTIONS_COMMON \
    -+	.walk = NULL, \
    -+	.msg_type = NULL, \
    ++#define FSCK_OPTIONS_DEFAULT { \
     +	.skiplist = OIDSET_INIT, \
    -+	.object_names = NULL,
    -+#define FSCK_OPTIONS_COMMON_ERROR_FUNC \
    -+	FSCK_OPTIONS_COMMON \
    -+	.error_func = fsck_error_function
    -+
    -+#define FSCK_OPTIONS_DEFAULT	{ .strict = 0, FSCK_OPTIONS_COMMON_ERROR_FUNC }
    -+#define FSCK_OPTIONS_STRICT	{ .strict = 1, FSCK_OPTIONS_COMMON_ERROR_FUNC }
    ++	.error_func = fsck_error_function \
    ++}
    ++#define FSCK_OPTIONS_STRICT { \
    ++	.strict = 1, \
    ++	.error_func = fsck_error_function, \
    ++}
      
      /* descend in all linked child objects
       * the return value is:
 3:  237a2806865 =  3:  a721c396c50 fsck.h: use "enum object_type" instead of "int"
 4:  13b76c73dd7 =  4:  fcdba2f8fe8 fsck.c: rename variables in fsck_set_msg_type() for less confusion
 5:  4ae83403b73 !  5:  b07e8e026ac fsck.c: move definition of msg_id into append_msg_id()
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    fsck.c: move definition of msg_id into append_msg_id()
    +    fsck.c: remove (mostly) redundant append_msg_id() function
     
    -    Refactor code added in 71ab8fa840f (fsck: report the ID of the
    -    error/warning, 2015-06-22) to resolve the msg_id to a string in the
    -    function that wants it, instead of doing it in report().
    +    Remove the append_msg_id() function in favor of calling
    +    prepare_msg_ids(). We already have code to compute the camel-cased
    +    msg_id strings in msg_id_info, let's use it.
    +
    +    When the append_msg_id() function was added in 71ab8fa840f (fsck:
    +    report the ID of the error/warning, 2015-06-22) the prepare_msg_ids()
    +    function didn't exist. When prepare_msg_ids() was added in
    +    a46baac61eb (fsck: factor out msg_id_info[] lazy initialization code,
    +    2018-05-26) this code wasn't moved over to lazy initialization.
    +
    +    This changes the behavior of the code to initialize all the messages
    +    instead of just camel-casing the one we need on the fly. Since the
    +    common case is that we're printing just one message this is mostly
    +    redundant work.
    +
    +    But that's OK in this case, reporting this fsck issue to the user
    +    isn't performance-sensitive. If we were somehow doing so in a tight
    +    loop (in a hopelessly broken repository?) this would help, since we'd
    +    save ourselves from re-doing this work for identical messages, we
    +    could just grab the prepared string from msg_id_info after the first
    +    invocation.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ fsck.c: void fsck_set_msg_types(struct fsck_options *options, const char *values
      }
      
     -static void append_msg_id(struct strbuf *sb, const char *msg_id)
    -+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
    +-{
    +-	for (;;) {
    +-		char c = *(msg_id)++;
    +-
    +-		if (!c)
    +-			break;
    +-		if (c != '_')
    +-			strbuf_addch(sb, tolower(c));
    +-		else {
    +-			assert(*msg_id);
    +-			strbuf_addch(sb, *(msg_id)++);
    +-		}
    +-	}
    +-
    +-	strbuf_addstr(sb, ": ");
    +-}
    +-
    + static int object_on_skiplist(struct fsck_options *opts,
    + 			      const struct object_id *oid)
      {
    -+	const char *msg_id = msg_id_info[id].id_string;
    - 	for (;;) {
    - 		char c = *(msg_id)++;
    - 
     @@ fsck.c: static int report(struct fsck_options *options,
      	else if (msg_type == FSCK_INFO)
      		msg_type = FSCK_WARN;
      
     -	append_msg_id(&sb, msg_id_info[id].id_string);
    -+	append_msg_id(&sb, id);
    ++	prepare_msg_ids();
    ++	strbuf_addf(&sb, "%s: ", msg_id_info[id].camelcased);
      
      	va_start(ap, fmt);
      	strbuf_vaddf(&sb, fmt, ap);
 6:  82107f1dac0 !  6:  321b0c652de fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
    @@ Commit message
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## fsck.c ##
    -@@ fsck.c: void fsck_set_msg_types(struct fsck_options *options, const char *values)
    - 	free(to_free);
    - }
    - 
    --static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
    -+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id msg_id)
    - {
    --	const char *msg_id = msg_id_info[id].id_string;
    -+	const char *msg_id_str = msg_id_info[msg_id].id_string;
    - 	for (;;) {
    --		char c = *(msg_id)++;
    -+		char c = *(msg_id_str)++;
    - 
    - 		if (!c)
    - 			break;
    - 		if (c != '_')
    - 			strbuf_addch(sb, tolower(c));
    - 		else {
    --			assert(*msg_id);
    --			strbuf_addch(sb, *(msg_id)++);
    -+			assert(*msg_id_str);
    -+			strbuf_addch(sb, *(msg_id_str)++);
    - 		}
    - 	}
    - 
     @@ fsck.c: static int object_on_skiplist(struct fsck_options *opts,
      __attribute__((format (printf, 5, 6)))
      static int report(struct fsck_options *options,
    @@ fsck.c: static int object_on_skiplist(struct fsck_options *opts,
      	if (msg_type == FSCK_IGNORE)
      		return 0;
     @@ fsck.c: static int report(struct fsck_options *options,
    - 	else if (msg_type == FSCK_INFO)
      		msg_type = FSCK_WARN;
      
    --	append_msg_id(&sb, id);
    -+	append_msg_id(&sb, msg_id);
    + 	prepare_msg_ids();
    +-	strbuf_addf(&sb, "%s: ", msg_id_info[id].camelcased);
    ++	strbuf_addf(&sb, "%s: ", msg_id_info[msg_id].camelcased);
      
      	va_start(ap, fmt);
      	strbuf_vaddf(&sb, fmt, ap);
 7:  796096bf73e =  7:  948689ad5c8 fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
 8:  3664abb23de =  8:  8ea468bf4d8 fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
 9:  81e6d7ab450 !  9:  9316b35cd3b fsck.h: re-order and re-assign "enum fsck_msg_type"
    @@ Commit message
         defined as "2".
     
         I'm confident that nothing relies on these values, we always compare
    -    them explicitly. Let's not omit "0" so it won't be assumed that we're
    -    using these as a boolean somewhere.
    +    them for equality. Let's not omit "0" so it won't be assumed that
    +    we're using these as a boolean somewhere.
     
         This also allows us to re-structure the fields to mark which are
         "private" v.s. "public". See the preceding commit for a rationale for
10:  5c2e8e7b842 = 10:  d7f1c5d37de fsck.c: call parse_msg_type() early in fsck_set_msg_type()
11:  7ffbf9af3fa = 11:  ae5efd745cf fsck.c: undefine temporary STR macro after use
12:  12ff0f75ebf = 12:  96995244806 fsck.c: give "FOREACH_MSG_ID" a more specific name
13:  0c49dd5164f = 13:  1b42aea3a64 fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
14:  900263f503a = 14:  563e6a0e5e6 fsck.c: pass along the fsck_msg_id in the fsck_error callback
15:  5f270e88a0a = 15:  5e504f25c51 fsck.c: add an fsck_set_msg_type() API that takes enums
16:  539d0197129 ! 16:  611631dd779 fsck.c: move gitmodules_{found,done} into fsck_options
    @@ Commit message
         gitmodules_found attribute of "fsck_options" we need this intermediate
         step first.
     
    +    An earlier version of this patch removed the small amount of
    +    duplication we now have between FSCK_OPTIONS_{DEFAULT,STRICT} with a
    +    FSCK_OPTIONS_COMMON macro. I don't think such de-duplication is worth
    +    it for this amount of copy/pasting.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## fetch-pack.c ##
    @@ fsck.h: struct fsck_options {
      	kh_oid_map_t *object_names;
      };
      
    -@@ fsck.h: struct fsck_options {
    - 	.walk = NULL, \
    - 	.msg_type = NULL, \
    + #define FSCK_OPTIONS_DEFAULT { \
      	.skiplist = OIDSET_INIT, \
     +	.gitmodules_found = OIDSET_INIT, \
     +	.gitmodules_done = OIDSET_INIT, \
    - 	.object_names = NULL,
    - #define FSCK_OPTIONS_COMMON_ERROR_FUNC \
    - 	FSCK_OPTIONS_COMMON \
    + 	.error_func = fsck_error_function \
    + }
    + #define FSCK_OPTIONS_STRICT { \
    + 	.strict = 1, \
    ++	.gitmodules_found = OIDSET_INIT, \
    ++	.gitmodules_done = OIDSET_INIT, \
    + 	.error_func = fsck_error_function, \
    + }
    + 
     @@ fsck.h: int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
      int fsck_object(struct object *obj, void *data, unsigned long size,
      	struct fsck_options *options);
17:  1acf7442365 = 17:  03d512c8448 fetch-pack: don't needlessly copy fsck_options
18:  b47c3d5ac6f = 18:  581c87c63c6 fetch-pack: use file-scope static struct for fsck_options
19:  f05fa5c3ec9 ! 19:  6a38cade8c3 fetch-pack: use new fsck API to printing dangling submodules
    @@ fetch-pack.c: static int server_supports_filtering;
      static struct strbuf fsck_msg_types = STRBUF_INIT;
      static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
      
    -@@ fetch-pack.c: static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
    +@@ fetch-pack.c: static int cmp_ref_by_name(const void *a_, const void *b_)
    + 	return strcmp(a->name, b->name);
    + }
      
    - 	oidset_iter_init(gitmodules_oids, &iter);
    - 	while ((oid = oidset_iter_next(&iter)))
    +-static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
    +-{
    +-	struct oidset_iter iter;
    +-	const struct object_id *oid;
    +-
    +-	if (!oidset_size(gitmodules_oids))
    +-		return;
    +-
    +-	oidset_iter_init(gitmodules_oids, &iter);
    +-	while ((oid = oidset_iter_next(&iter)))
     -		register_found_gitmodules(&fsck_options, oid);
    -+		oidset_insert(&fsck_options.gitmodules_found, oid);
    - 	if (fsck_finish(&fsck_options))
    - 		die("fsck failed");
    - }
    +-	if (fsck_finish(&fsck_options))
    +-		die("fsck failed");
    +-}
    +-
    + static struct ref *do_fetch_pack(struct fetch_pack_args *args,
    + 				 int fd[2],
    + 				 const struct ref *orig_ref,
    +@@ fetch-pack.c: static struct ref *do_fetch_pack(struct fetch_pack_args *args,
    + 	int agent_len;
    + 	struct fetch_negotiator negotiator_alloc;
    + 	struct fetch_negotiator *negotiator;
    +-	struct oidset gitmodules_oids = OIDSET_INIT;
    + 
    + 	negotiator = &negotiator_alloc;
    + 	fetch_negotiator_init(r, negotiator);
    +@@ fetch-pack.c: static struct ref *do_fetch_pack(struct fetch_pack_args *args,
    + 	else
    + 		alternate_shallow_file = NULL;
    + 	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought,
    +-		     &gitmodules_oids))
    ++		     &fsck_options.gitmodules_found))
    + 		die(_("git fetch-pack: fetch failed."));
    +-	fsck_gitmodules_oids(&gitmodules_oids);
    ++	if (fsck_finish(&fsck_options))
    ++		die("fsck failed");
    + 
    +  all_done:
    + 	if (negotiator)
    +@@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
    + 	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
    + 	int i;
    + 	struct strvec index_pack_args = STRVEC_INIT;
    +-	struct oidset gitmodules_oids = OIDSET_INIT;
    + 
    + 	negotiator = &negotiator_alloc;
    + 	fetch_negotiator_init(r, negotiator);
    +@@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
    + 			process_section_header(&reader, "packfile", 0);
    + 			if (get_pack(args, fd, pack_lockfiles,
    + 				     packfile_uris.nr ? &index_pack_args : NULL,
    +-				     sought, nr_sought, &gitmodules_oids))
    ++				     sought, nr_sought, &fsck_options.gitmodules_found))
    + 				die(_("git fetch-pack: fetch failed."));
    + 			do_check_stateless_delimiter(args, &reader);
    + 
    +@@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
    + 
    + 		packname[the_hash_algo->hexsz] = '\0';
    + 
    +-		parse_gitmodules_oids(cmd.out, &gitmodules_oids);
    ++		parse_gitmodules_oids(cmd.out, &fsck_options.gitmodules_found);
    + 
    + 		close(cmd.out);
    + 
    +@@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
    + 	string_list_clear(&packfile_uris, 0);
    + 	strvec_clear(&index_pack_args);
    + 
    +-	fsck_gitmodules_oids(&gitmodules_oids);
    ++	if (fsck_finish(&fsck_options))
    ++		die("fsck failed");
    + 
    + 	if (negotiator)
    + 		negotiator->release(negotiator);
     
      ## fsck.c ##
     @@ fsck.c: int fsck_error_function(struct fsck_options *o,
    @@ fsck.c: int git_fsck_config(const char *var, const char *value, void *cb)
     +}
     
      ## fsck.h ##
    +@@ fsck.h: int fsck_error_function(struct fsck_options *o,
    + 			const struct object_id *oid, enum object_type object_type,
    + 			enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
    + 			const char *message);
    ++int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
    ++					   const struct object_id *oid,
    ++					   enum object_type object_type,
    ++					   enum fsck_msg_type msg_type,
    ++					   enum fsck_msg_id msg_id,
    ++					   const char *message);
    + 
    + struct fsck_options {
    + 	fsck_walk_func walk;
    +@@ fsck.h: struct fsck_options {
    + 	.gitmodules_done = OIDSET_INIT, \
    + 	.error_func = fsck_error_function, \
    + }
    ++#define FSCK_OPTIONS_MISSING_GITMODULES { \
    ++	.strict = 1, \
    ++	.gitmodules_found = OIDSET_INIT, \
    ++	.gitmodules_done = OIDSET_INIT, \
    ++	.error_func = fsck_error_cb_print_missing_gitmodules, \
    ++}
    + 
    + /* descend in all linked child objects
    +  * the return value is:
     @@ fsck.h: int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
      int fsck_object(struct object *obj, void *data, unsigned long size,
      	struct fsck_options *options);
    @@ fsck.h: int fsck_walk(struct object *obj, void *data, struct fsck_options *optio
      /*
       * fsck a tag, and pass info about it back to the caller. This is
       * exposed fsck_object() internals for git-mktag(1).
    -@@ fsck.h: const char *fsck_describe_object(struct fsck_options *options,
    -  */
    - int git_fsck_config(const char *var, const char *value, void *cb);
    - 
    -+/*
    -+ * Custom error callbacks that are used in more than one place.
    -+ */
    -+#define FSCK_OPTIONS_MISSING_GITMODULES { \
    -+	.strict = 1, \
    -+	.error_func = fsck_error_cb_print_missing_gitmodules, \
    -+	FSCK_OPTIONS_COMMON \
    -+}
    -+int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
    -+					   const struct object_id *oid,
    -+					   enum object_type object_type,
    -+					   enum fsck_msg_type msg_type,
    -+					   enum fsck_msg_id msg_id,
    -+					   const char *message);
    -+
    - #endif
-- 
2.31.1.445.g087790d4945


^ permalink raw reply	[flat|nested] 229+ messages in thread

* [PATCH v6 01/19] fsck.c: refactor and rename common config callback
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 02/19] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
                                         ` (18 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Refactor code I recently changed in 1f3299fda9 (fsck: make
fsck_config() re-usable, 2021-01-05) so that I could use fsck's config
callback in mktag.

I don't know what I was thinking in structuring the code this way, but
it clearly makes no sense to have an fsck_config_internal() at all
just so it can get a fsck_options when git_config() already supports
passing along some void* data.

Let's just make use of that instead, which lets us get rid of the two
wrapper functions, and brings fsck's common config callback in line
with other such reusable config callbacks.
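
As a rough illustration of the pattern described above (not git's actual
code), a callback that receives its state through the void* argument
needs no per-caller wrapper. Every name starting with "demo_" below is
hypothetical, and demo_config() is only a toy stand-in for a config
iterator:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are not git's real types or functions. */
struct demo_options {
	int strict;
};

typedef int (*demo_config_fn)(const char *var, const char *value, void *cb);

/* Reusable callback: the options struct arrives through the void pointer. */
int demo_fsck_config(const char *var, const char *value, void *cb)
{
	struct demo_options *options = cb;

	assert(options);
	if (!strcmp(var, "demo.strict"))
		options->strict = !strcmp(value, "true");
	return 0;
}

/* Toy config iterator: feeds one fixed key/value pair to the callback. */
int demo_config(demo_config_fn fn, void *data)
{
	return fn("demo.strict", "true", data);
}

/* Only the struct that was passed along is modified. */
int demo_run(void)
{
	struct demo_options a = { 0 }, b = { 0 };

	demo_config(demo_fsck_config, &a);
	return a.strict == 1 && b.strict == 0;
}
```

Each caller passes its own options struct as the data pointer, so one
callback can serve several commands.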

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 7 +------
 builtin/mktag.c | 7 +------
 fsck.c          | 4 ++--
 fsck.h          | 3 +--
 4 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c70..a56a2d0513a 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -71,11 +71,6 @@ static const char *printable_type(const struct object_id *oid,
 	return ret;
 }
 
-static int fsck_config(const char *var, const char *value, void *cb)
-{
-	return fsck_config_internal(var, value, cb, &fsck_obj_options);
-}
-
 static int objerror(struct object *obj, const char *err)
 {
 	errors_found |= ERROR_OBJECT;
@@ -803,7 +798,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	if (name_objects)
 		fsck_enable_object_names(&fsck_walk_options);
 
-	git_config(fsck_config, NULL);
+	git_config(git_fsck_config, &fsck_obj_options);
 
 	if (connectivity_only) {
 		for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 41a399a69e4..23c4b8763fa 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -14,11 +14,6 @@ static int option_strict = 1;
 
 static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 
-static int mktag_config(const char *var, const char *value, void *cb)
-{
-	return fsck_config_internal(var, value, cb, &fsck_options);
-}
-
 static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
@@ -93,7 +88,7 @@ int cmd_mktag(int argc, const char **argv, const char *prefix)
 	fsck_options.error_func = mktag_fsck_error_func;
 	fsck_set_msg_type(&fsck_options, "extraheaderentry", "warn");
 	/* config might set fsck.extraHeaderEntry=* again */
-	git_config(mktag_config, NULL);
+	git_config(git_fsck_config, &fsck_options);
 	if (fsck_tag_standalone(NULL, buf.buf, buf.len, &fsck_options,
 				&tagged_oid, &tagged_type))
 		die(_("tag on stdin did not pass our strict fsck check"));
diff --git a/fsck.c b/fsck.c
index e3030f3b358..5dfb99665ae 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1323,9 +1323,9 @@ int fsck_finish(struct fsck_options *options)
 	return ret;
 }
 
-int fsck_config_internal(const char *var, const char *value, void *cb,
-			 struct fsck_options *options)
+int git_fsck_config(const char *var, const char *value, void *cb)
 {
+	struct fsck_options *options = cb;
 	if (strcmp(var, "fsck.skiplist") == 0) {
 		const char *path;
 		struct strbuf sb = STRBUF_INIT;
diff --git a/fsck.h b/fsck.h
index 733378f1260..f70d11c5594 100644
--- a/fsck.h
+++ b/fsck.h
@@ -109,7 +109,6 @@ const char *fsck_describe_object(struct fsck_options *options,
  * git_config() callback for use by fsck-y tools that want to support
  * fsck.<msg> fsck.skipList etc.
  */
-int fsck_config_internal(const char *var, const char *value, void *cb,
-			 struct fsck_options *options);
+int git_fsck_config(const char *var, const char *value, void *cb);
 
 #endif
-- 
2.31.1.445.g087790d4945


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 02/19] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 01/19] fsck.c: refactor and rename common config callback Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 17:15                         ` Ramsay Jones
  2021-03-28 13:15                       ` [PATCH v6 03/19] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
                                         ` (17 subsequent siblings)
  19 siblings, 1 reply; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Refactor the definitions of FSCK_OPTIONS_{DEFAULT,STRICT} to use
designated initializers. This allows us to omit those fields that
are initialized to zero or NULL.
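
A minimal sketch of why that works, using a hypothetical struct rather
than the real "struct fsck_options": in C99 and later, members that a
designated initializer does not name are initialized as if to zero, so
only the non-zero fields need to be spelled out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct, loosely shaped like an options struct. */
struct demo_options {
	void *walk;
	int (*error_func)(const char *message);
	unsigned strict:1;
	int *msg_type;
};

int demo_error(const char *message)
{
	return message ? 1 : 0;
}

/* Only non-zero members are named; the rest default to 0 / NULL. */
#define DEMO_OPTIONS_DEFAULT { .error_func = demo_error }
#define DEMO_OPTIONS_STRICT { .strict = 1, .error_func = demo_error }

int demo_omitted_fields_are_zero(void)
{
	struct demo_options o = DEMO_OPTIONS_STRICT;

	assert(o.strict == 1);
	return o.walk == NULL && o.msg_type == NULL &&
	       o.error_func == demo_error;
}
```

A side benefit is that adding a new member to the struct does not
require touching every initializer macro, as long as zero is a sensible
default for it.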

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fsck.h b/fsck.h
index f70d11c5594..73e8b9f3e4e 100644
--- a/fsck.h
+++ b/fsck.h
@@ -43,8 +43,14 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
+#define FSCK_OPTIONS_DEFAULT { \
+	.skiplist = OIDSET_INIT, \
+	.error_func = fsck_error_function \
+}
+#define FSCK_OPTIONS_STRICT { \
+	.strict = 1, \
+	.error_func = fsck_error_function, \
+}
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.31.1.445.g087790d4945


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 03/19] fsck.h: use "enum object_type" instead of "int"
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 01/19] fsck.c: refactor and rename common config callback Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 02/19] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 04/19] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
                                         ` (16 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Change the fsck_walk_func to use an "enum object_type" instead of an
"int" type. The types are compatible, and ever since this was added in
355885d5315 (add generic, type aware object chain walker, 2008-02-25)
we've used entries from object_type (OBJ_BLOB etc.).

So this doesn't really change anything as far as the generated code is
concerned; it just gives the compiler more information and makes this
easier to read.
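
To illustrate the "more information" point with a sketch (all "demo_"
names are hypothetical, not git's): once the callback parameter is an
enum, compilers such as GCC and Clang can warn, e.g. via -Wswitch, when
a switch over it misses an enumerator, which they cannot do for a plain
"int".

```c
#include <assert.h>

enum demo_object_type { DEMO_OBJ_COMMIT = 1, DEMO_OBJ_TREE, DEMO_OBJ_BLOB };

/* The callback type names the enum instead of "int". */
typedef int (*demo_walk_func)(enum demo_object_type type, void *data);

int demo_count_blobs(enum demo_object_type type, void *data)
{
	int *count = data;

	assert(count);
	switch (type) {
	case DEMO_OBJ_BLOB:
		(*count)++;
		break;
	case DEMO_OBJ_COMMIT:
	case DEMO_OBJ_TREE:
		break; /* every enumerator is listed explicitly */
	}
	return 0;
}

/* Toy walker: reports one tree and two blobs to the callback. */
int demo_walk(demo_walk_func fn)
{
	int count = 0;

	fn(DEMO_OBJ_TREE, &count);
	fn(DEMO_OBJ_BLOB, &count);
	fn(DEMO_OBJ_BLOB, &count);
	return count;
}
```

Leaving out a "default:" label in the switch is deliberate here, since
a default case would typically silence that kind of warning.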

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c           | 3 ++-
 builtin/index-pack.c     | 3 ++-
 builtin/unpack-objects.c | 3 ++-
 fsck.h                   | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index a56a2d0513a..ed5f2af6b5c 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -192,7 +192,8 @@ static int traverse_reachable(void)
 	return !!result;
 }
 
-static int mark_used(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_used(struct object *obj, enum object_type object_type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return 1;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 21899687e2c..f6e1178df90 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -212,7 +212,8 @@ static void cleanup_thread(void)
 	free(thread_data);
 }
 
-static int mark_link(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_link(struct object *obj, enum object_type type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return -1;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index a4ba2ebac69..4a70b17f8fb 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -187,7 +187,8 @@ static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
  * that have reachability requirements and calls this function.
  * Verify its reachability and validity recursively and write it out.
  */
-static int check_object(struct object *obj, int type, void *data, struct fsck_options *options)
+static int check_object(struct object *obj, enum object_type type,
+			void *data, struct fsck_options *options)
 {
 	struct obj_buffer *obj_buf;
 
diff --git a/fsck.h b/fsck.h
index 73e8b9f3e4e..f20f1259e84 100644
--- a/fsck.h
+++ b/fsck.h
@@ -23,7 +23,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type);
  *     <0	error signaled and abort
  *     >0	error signaled and do not abort
  */
-typedef int (*fsck_walk_func)(struct object *obj, int type, void *data, struct fsck_options *options);
+typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
+			      void *data, struct fsck_options *options);
 
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
-- 
2.31.1.445.g087790d4945


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 04/19] fsck.c: rename variables in fsck_set_msg_type() for less confusion
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (2 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 03/19] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 05/19] fsck.c: remove (mostly) redundant append_msg_id() function Ævar Arnfjörð Bjarmason
                                         ` (15 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Rename variables in a function added in 0282f4dced0 (fsck: offer a
function to demote fsck errors to warnings, 2015-06-22).

It was needlessly confusing that it took a "msg_type" argument, but
then later declared another "msg_type" of a different type.

Let's rename that to "severity", and rename "id" to "msg_id" and
"msg_id" to "msg_id_str" etc. This will make a follow-up change
smaller.

While I'm at it, properly indent the fsck_set_msg_type() argument list.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 24 ++++++++++++------------
 fsck.h |  2 +-
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fsck.c b/fsck.c
index 5dfb99665ae..7cc722a25cd 100644
--- a/fsck.c
+++ b/fsck.c
@@ -203,27 +203,27 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 }
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type)
+		       const char *msg_id_str, const char *msg_type_str)
 {
-	int id = parse_msg_id(msg_id), type;
+	int msg_id = parse_msg_id(msg_id_str), msg_type;
 
-	if (id < 0)
-		die("Unhandled message id: %s", msg_id);
-	type = parse_msg_type(msg_type);
+	if (msg_id < 0)
+		die("Unhandled message id: %s", msg_id_str);
+	msg_type = parse_msg_type(msg_type_str);
 
-	if (type != FSCK_ERROR && msg_id_info[id].msg_type == FSCK_FATAL)
-		die("Cannot demote %s to %s", msg_id, msg_type);
+	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
+		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
 	if (!options->msg_type) {
 		int i;
-		int *msg_type;
-		ALLOC_ARRAY(msg_type, FSCK_MSG_MAX);
+		int *severity;
+		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
-			msg_type[i] = fsck_msg_type(i, options);
-		options->msg_type = msg_type;
+			severity[i] = fsck_msg_type(i, options);
+		options->msg_type = severity;
 	}
 
-	options->msg_type[id] = type;
+	options->msg_type[msg_id] = msg_type;
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
diff --git a/fsck.h b/fsck.h
index f20f1259e84..30a3acabc50 100644
--- a/fsck.h
+++ b/fsck.h
@@ -11,7 +11,7 @@ struct fsck_options;
 struct object;
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type);
+		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
 int is_valid_msg_type(const char *msg_id, const char *msg_type);
 
-- 
2.31.1.445.g087790d4945


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 05/19] fsck.c: remove (mostly) redundant append_msg_id() function
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (3 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 04/19] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 06/19] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
                                         ` (14 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Remove the append_msg_id() function in favor of calling
prepare_msg_ids(). We already have code to compute the camel-cased
msg_id strings in msg_id_info, let's use it.

When the append_msg_id() function was added in 71ab8fa840f (fsck:
report the ID of the error/warning, 2015-06-22) the prepare_msg_ids()
function didn't exist. When prepare_msg_ids() was added in
a46baac61eb (fsck: factor out msg_id_info[] lazy initialization code,
2018-05-26) this code wasn't moved over to lazy initialization.

This changes the behavior of the code to initialize all the messages
instead of just camel-casing the one we need on the fly. Since the
common case is that we're printing just one message, this is mostly
redundant work.

But that's OK in this case: reporting this fsck issue to the user
isn't performance-sensitive. If we were somehow doing so in a tight
loop (in a hopelessly broken repository?) this would help, since we'd
save ourselves from re-doing this work for identical messages; we
could just grab the prepared string from msg_id_info after the first
invocation.
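
The trade-off can be sketched as follows. This is not the real
prepare_msg_ids(); the table, its two entries and all "demo_" names are
assumptions made for illustration. The first lookup camel-cases every
entry, and later lookups return the cached string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table; the real msg_id_info has more fields and entries. */
static struct {
	const char *id_string;
	char *camelcased;
} demo_info[] = {
	{ "NUL_IN_HEADER", NULL },
	{ "GITMODULES_MISSING", NULL },
};

#define DEMO_NR (sizeof(demo_info) / sizeof(demo_info[0]))

/* "FOO_BAR" becomes "fooBar": lowercase, except the letter after '_'. */
static char *demo_camelcase(const char *s)
{
	char *out = malloc(strlen(s) + 1), *p = out;

	assert(out);
	while (*s) {
		if (*s != '_') {
			*p++ = tolower((unsigned char)*s++);
		} else {
			s++;		/* drop the underscore */
			assert(*s);
			*p++ = *s++;	/* keep the next letter uppercase */
		}
	}
	*p = '\0';
	return out;
}

/* Fill in every camel-cased string once; later calls return at once. */
void demo_prepare_ids(void)
{
	size_t i;

	if (demo_info[0].camelcased)
		return;
	for (i = 0; i < DEMO_NR; i++)
		demo_info[i].camelcased = demo_camelcase(demo_info[i].id_string);
}

const char *demo_camelcased(size_t i)
{
	demo_prepare_ids();
	return demo_info[i].camelcased;
}
```

The up-front cost is one pass over the whole table, after which each
report is a plain array lookup.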

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 21 ++-------------------
 1 file changed, 2 insertions(+), 19 deletions(-)

diff --git a/fsck.c b/fsck.c
index 7cc722a25cd..25c697fa6a2 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,24 +264,6 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, const char *msg_id)
-{
-	for (;;) {
-		char c = *(msg_id)++;
-
-		if (!c)
-			break;
-		if (c != '_')
-			strbuf_addch(sb, tolower(c));
-		else {
-			assert(*msg_id);
-			strbuf_addch(sb, *(msg_id)++);
-		}
-	}
-
-	strbuf_addstr(sb, ": ");
-}
-
 static int object_on_skiplist(struct fsck_options *opts,
 			      const struct object_id *oid)
 {
@@ -308,7 +290,8 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, msg_id_info[id].id_string);
+	prepare_msg_ids();
+	strbuf_addf(&sb, "%s: ", msg_id_info[id].camelcased);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.31.1.445.g087790d4945


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 06/19] fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (4 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 05/19] fsck.c: remove (mostly) redundant append_msg_id() function Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 07/19] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
                                         ` (13 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Rename the remaining variables of type fsck_msg_id from "id" to
"msg_id". This change is relatively small, and is worth the churn for
a later change where we have different id's in the "report" function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fsck.c b/fsck.c
index 25c697fa6a2..a0463ea22cc 100644
--- a/fsck.c
+++ b/fsck.c
@@ -273,11 +273,11 @@ static int object_on_skiplist(struct fsck_options *opts,
 __attribute__((format (printf, 5, 6)))
 static int report(struct fsck_options *options,
 		  const struct object_id *oid, enum object_type object_type,
-		  enum fsck_msg_id id, const char *fmt, ...)
+		  enum fsck_msg_id msg_id, const char *fmt, ...)
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(id, options), result;
+	int msg_type = fsck_msg_type(msg_id, options), result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -291,7 +291,7 @@ static int report(struct fsck_options *options,
 		msg_type = FSCK_WARN;
 
 	prepare_msg_ids();
-	strbuf_addf(&sb, "%s: ", msg_id_info[id].camelcased);
+	strbuf_addf(&sb, "%s: ", msg_id_info[msg_id].camelcased);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.31.1.445.g087790d4945


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 07/19] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (5 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 06/19] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 08/19] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
                                         ` (12 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Refactor "if options->msg_type" and other code added in
0282f4dced0 (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) to reduce the scope of the "int msg_type" variable.

This is in preparation for changing its type in a subsequent commit;
only using it in the "!options->msg_type" scope makes that change
smaller.

This also brings the code in line with the fsck_set_msg_type()
function (also added in 0282f4dced0), which does a similar check for
"!options->msg_type". Another minor benefit is getting rid of the
style violation of not having braces for the body of the "if".
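
The resulting control flow can be sketched in isolation as below. The
enums, the defaults table and every "demo_" name are hypothetical
simplifications, not the definitions in fsck.c: when no override array
has been allocated, the built-in default is used and "strict" promotes
warnings to errors; otherwise the override array wins.

```c
#include <assert.h>
#include <stddef.h>

enum demo_msg_type { DEMO_IGNORE, DEMO_WARN, DEMO_ERROR };
enum demo_msg_id { DEMO_MSG_A, DEMO_MSG_B, DEMO_MSG_MAX };

/* Built-in severities, indexed by message id. */
static const enum demo_msg_type demo_defaults[DEMO_MSG_MAX] = {
	DEMO_WARN,	/* DEMO_MSG_A */
	DEMO_ERROR,	/* DEMO_MSG_B */
};

struct demo_options {
	unsigned strict:1;
	enum demo_msg_type *msg_type; /* NULL until an override is set */
};

enum demo_msg_type demo_msg_type(enum demo_msg_id msg_id,
				 const struct demo_options *options)
{
	assert((int)msg_id >= 0 && msg_id < DEMO_MSG_MAX);

	if (!options->msg_type) {
		/* The local variable only lives in this branch. */
		enum demo_msg_type msg_type = demo_defaults[msg_id];

		if (options->strict && msg_type == DEMO_WARN)
			msg_type = DEMO_ERROR;
		return msg_type;
	}

	return options->msg_type[msg_id];
}
```

Note that in this sketch "strict" has no effect once an override array
exists, mirroring the early return in the patch.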

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fsck.c b/fsck.c
index a0463ea22cc..8614ee2c2a0 100644
--- a/fsck.c
+++ b/fsck.c
@@ -167,19 +167,17 @@ void list_config_fsck_msg_ids(struct string_list *list, const char *prefix)
 static int fsck_msg_type(enum fsck_msg_id msg_id,
 	struct fsck_options *options)
 {
-	int msg_type;
-
 	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
 
-	if (options->msg_type)
-		msg_type = options->msg_type[msg_id];
-	else {
-		msg_type = msg_id_info[msg_id].msg_type;
+	if (!options->msg_type) {
+		int msg_type = msg_id_info[msg_id].msg_type;
+
 		if (options->strict && msg_type == FSCK_WARN)
 			msg_type = FSCK_ERROR;
+		return msg_type;
 	}
 
-	return msg_type;
+	return options->msg_type[msg_id];
 }
 
 static int parse_msg_type(const char *str)
-- 
2.31.1.445.g087790d4945


^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 08/19] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (6 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 07/19] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 09/19] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
                                         ` (11 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
fsck_msg_type enum.

These defines were originally introduced in:

 - ba002f3b28a (builtin-fsck: move common object checking code to
   fsck.c, 2008-02-25)
 - f50c4407305 (fsck: disallow demoting grave fsck errors to warnings,
   2015-06-22)
 - efaba7cc77f (fsck: optionally ignore specific fsck issues
   completely, 2015-06-22)
 - f27d05b1704 (fsck: allow upgrading fsck warnings to errors,
   2015-06-22)

The reason these were defined in two different places is that we
use FSCK_{IGNORE,INFO,FATAL} only in fsck.c, but FSCK_{ERROR,WARN} are
used by external callbacks.

Untangling that would take some more work, since we expose the new
"enum fsck_msg_type" to both. Similar to "enum object_type", it's not
worth structuring the API in such a way that only those who need
FSCK_{ERROR,WARN} pass around a different type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c       |  2 +-
 builtin/index-pack.c |  3 ++-
 builtin/mktag.c      |  3 ++-
 fsck.c               | 21 ++++++++++-----------
 fsck.h               | 16 ++++++++++------
 5 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index ed5f2af6b5c..17940a4e24a 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -84,7 +84,7 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   int msg_type, const char *message)
+			   enum fsck_msg_type msg_type, const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f6e1178df90..8338b832b63 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1716,7 +1716,8 @@ static void show_pack_info(int stat_only)
 static int print_dangling_gitmodules(struct fsck_options *o,
 				     const struct object_id *oid,
 				     enum object_type object_type,
-				     int msg_type, const char *message)
+				     enum fsck_msg_type msg_type,
+				     const char *message)
 {
 	/*
 	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 23c4b8763fa..052a510ad7f 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -17,7 +17,8 @@ static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
-				 int msg_type, const char *message)
+				 enum fsck_msg_type msg_type,
+				 const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/fsck.c b/fsck.c
index 8614ee2c2a0..c5a81e4ff05 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,9 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FSCK_FATAL -1
-#define FSCK_INFO -2
-
 #define FOREACH_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
@@ -97,7 +94,7 @@ static struct {
 	const char *id_string;
 	const char *downcased;
 	const char *camelcased;
-	int msg_type;
+	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
 	FOREACH_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
@@ -164,13 +161,13 @@ void list_config_fsck_msg_ids(struct string_list *list, const char *prefix)
 		list_config_item(list, prefix, msg_id_info[i].camelcased);
 }
 
-static int fsck_msg_type(enum fsck_msg_id msg_id,
+static enum fsck_msg_type fsck_msg_type(enum fsck_msg_id msg_id,
 	struct fsck_options *options)
 {
 	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
 
 	if (!options->msg_type) {
-		int msg_type = msg_id_info[msg_id].msg_type;
+		enum fsck_msg_type msg_type = msg_id_info[msg_id].msg_type;
 
 		if (options->strict && msg_type == FSCK_WARN)
 			msg_type = FSCK_ERROR;
@@ -180,7 +177,7 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 	return options->msg_type[msg_id];
 }
 
-static int parse_msg_type(const char *str)
+static enum fsck_msg_type parse_msg_type(const char *str)
 {
 	if (!strcmp(str, "error"))
 		return FSCK_ERROR;
@@ -203,7 +200,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 void fsck_set_msg_type(struct fsck_options *options,
 		       const char *msg_id_str, const char *msg_type_str)
 {
-	int msg_id = parse_msg_id(msg_id_str), msg_type;
+	int msg_id = parse_msg_id(msg_id_str);
+	enum fsck_msg_type msg_type;
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
@@ -214,7 +212,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 
 	if (!options->msg_type) {
 		int i;
-		int *severity;
+		enum fsck_msg_type *severity;
 		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
 			severity[i] = fsck_msg_type(i, options);
@@ -275,7 +273,8 @@ static int report(struct fsck_options *options,
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(msg_id, options), result;
+	enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);
+	int result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -1247,7 +1246,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			int msg_type, const char *message)
+			enum fsck_msg_type msg_type, const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index 30a3acabc50..baf37620760 100644
--- a/fsck.h
+++ b/fsck.h
@@ -3,9 +3,13 @@
 
 #include "oidset.h"
 
-#define FSCK_ERROR 1
-#define FSCK_WARN 2
-#define FSCK_IGNORE 3
+enum fsck_msg_type {
+	FSCK_INFO  = -2,
+	FSCK_FATAL = -1,
+	FSCK_ERROR = 1,
+	FSCK_WARN,
+	FSCK_IGNORE
+};
 
 struct fsck_options;
 struct object;
@@ -29,17 +33,17 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  int msg_type, const char *message);
+			  enum fsck_msg_type msg_type, const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			int msg_type, const char *message);
+			enum fsck_msg_type msg_type, const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
 	fsck_error error_func;
 	unsigned strict:1;
-	int *msg_type;
+	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
 	kh_oid_map_t *object_names;
 };
-- 
2.31.1.445.g087790d4945



* [PATCH v6 09/19] fsck.h: re-order and re-assign "enum fsck_msg_type"
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (7 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 08/19] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 10/19] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
                                         ` (10 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Change the values in the "enum fsck_msg_type" from being manually
assigned to using default C enum values.

This means we end up with a FSCK_IGNORE=0, which was previously
defined as "3".

I'm confident that nothing relies on these values; we always compare
them for equality. Let's not omit "0" so it won't be assumed that
we're using these as a boolean somewhere.

This also allows us to re-structure the fields to mark which are
"private" vs. "public". See the preceding commit for a rationale for
not simply splitting these into two enums, namely that this is used
for both the private and public fsck API.
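
As a minimal sketch of what "default C enum values" means here (the
enum mirrors the one in the diff below; is_reported() is a
hypothetical helper and not part of fsck.c), the first enumerator is 0
and each following one is the previous value plus one. Callers compare
against the named constants rather than testing truthiness:

```c
#include <assert.h>

/* Same ordering as in this patch; the values are the C defaults. */
enum fsck_msg_type {
	/* for internal use only */
	FSCK_IGNORE,	/* 0 */
	FSCK_INFO,	/* 1 */
	FSCK_FATAL,	/* 2 */
	/* "public", fed to e.g. error_func callbacks */
	FSCK_ERROR,	/* 3 */
	FSCK_WARN	/* 4 */
};

/*
 * Hypothetical helper: compares for equality against a named
 * constant instead of relying on FSCK_IGNORE being "false".
 */
static int is_reported(enum fsck_msg_type t)
{
	assert(t <= FSCK_WARN);
	return t != FSCK_IGNORE;
}
```

With this layout is_reported(FSCK_IGNORE) is 0 and
is_reported(FSCK_INFO) is 1, and neither result depends on the
numeric values chosen for the enumerators.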

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fsck.h b/fsck.h
index baf37620760..a7e092d3fb4 100644
--- a/fsck.h
+++ b/fsck.h
@@ -4,11 +4,13 @@
 #include "oidset.h"
 
 enum fsck_msg_type {
-	FSCK_INFO  = -2,
-	FSCK_FATAL = -1,
-	FSCK_ERROR = 1,
+	/* for internal use only */
+	FSCK_IGNORE,
+	FSCK_INFO,
+	FSCK_FATAL,
+	/* "public", fed to e.g. error_func callbacks */
+	FSCK_ERROR,
 	FSCK_WARN,
-	FSCK_IGNORE
 };
 
 struct fsck_options;
-- 
2.31.1.445.g087790d4945



* [PATCH v6 10/19] fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (8 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 09/19] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 11/19] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
                                         ` (9 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

There's no reason to defer the calling of parse_msg_type() until after
we've checked whether "msg_id < 0". This is not a hot codepath, and
parse_msg_type() itself may die on invalid input.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index c5a81e4ff05..80365e62842 100644
--- a/fsck.c
+++ b/fsck.c
@@ -201,11 +201,10 @@ void fsck_set_msg_type(struct fsck_options *options,
 		       const char *msg_id_str, const char *msg_type_str)
 {
 	int msg_id = parse_msg_id(msg_id_str);
-	enum fsck_msg_type msg_type;
+	enum fsck_msg_type msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
-	msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
-- 
2.31.1.445.g087790d4945



* [PATCH v6 11/19] fsck.c: undefine temporary STR macro after use
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (9 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 10/19] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 12/19] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
                                         ` (8 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

In f417eed8cde (fsck: provide a function to parse fsck message IDs,
2015-06-22) the "STR" macro was introduced, but that short macro name
was not undefined after use as was done earlier in the same series for
the MSG_ID macro in c99ba492f1c (fsck: introduce identifiers for fsck
messages, 2015-06-22).
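
To illustrate the pattern, here is a small sketch of a short-named
stringification macro that is undefined once its table has been
built. The demo_* names and the two-entry table are assumptions made
for the example; they are not the fsck.c definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STR(x) #x
#define MSG_ID(id) { STR(id) },
static const struct {
	const char *id_string;
} demo_msg_ids[] = {
	MSG_ID(BAD_DATE)
	MSG_ID(BAD_EMAIL)
};
#undef MSG_ID
#undef STR	/* neither short name is visible past this point */

/* Hypothetical lookup: index of "name" in the table, or -1. */
static int demo_find_msg_id(const char *name)
{
	size_t i;

	assert(name);
	for (i = 0; i < sizeof(demo_msg_ids) / sizeof(demo_msg_ids[0]); i++)
		if (!strcmp(demo_msg_ids[i].id_string, name))
			return (int)i;
	return -1;
}
```

Any later use of STR() in the same file would now fail to compile
instead of silently picking up this definition.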

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fsck.c b/fsck.c
index 80365e62842..1b12e824ef6 100644
--- a/fsck.c
+++ b/fsck.c
@@ -100,6 +100,7 @@ static struct {
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
+#undef STR
 
 static void prepare_msg_ids(void)
 {
-- 
2.31.1.445.g087790d4945



* [PATCH v6 12/19] fsck.c: give "FOREACH_MSG_ID" a more specific name
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (10 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 11/19] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 13/19] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
                                         ` (7 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Rename the FOREACH_MSG_ID macro to FOREACH_FSCK_MSG_ID in preparation
for moving it over to fsck.h. It's good convention to name macros
in *.h files in such a way as to clearly not clash with any other
names in other files.
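
For readers unfamiliar with the construct, a reduced sketch of how
such a list macro is expanded more than once follows. The three-entry
list and all DEMO_*/demo_* names are hypothetical; the real list is
the one in the diff:

```c
#include <assert.h>

enum demo_msg_type {
	DEMO_IGNORE, DEMO_INFO, DEMO_FATAL, DEMO_ERROR, DEMO_WARN
};

/* A prefixed name is unlikely to clash once this lives in a header. */
#define FOREACH_DEMO_MSG_ID(FUNC) \
	FUNC(NUL_IN_HEADER, FATAL) \
	FUNC(BAD_DATE, ERROR) \
	FUNC(HAS_DOTGIT, WARN)

/* First expansion: one enumerator per list entry. */
#define MSG_ID(id, msg_type) DEMO_MSG_##id,
enum demo_msg_id {
	FOREACH_DEMO_MSG_ID(MSG_ID)
	DEMO_MSG_MAX
};
#undef MSG_ID

/* Second expansion: the default severity for each entry. */
#define MSG_ID(id, msg_type) DEMO_##msg_type,
static const enum demo_msg_type demo_default_type[DEMO_MSG_MAX] = {
	FOREACH_DEMO_MSG_ID(MSG_ID)
};
#undef MSG_ID

static enum demo_msg_type demo_default(enum demo_msg_id id)
{
	assert(id < DEMO_MSG_MAX);
	return demo_default_type[id];
}
```

Because both expansions walk the same list, the enum and the table
cannot get out of sync when an entry is added.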

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fsck.c b/fsck.c
index 1b12e824ef6..31c9088e3f7 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,7 +22,7 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_MSG_ID(FUNC) \
+#define FOREACH_FSCK_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
 	FUNC(UNTERMINATED_HEADER, FATAL) \
@@ -83,7 +83,7 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,
 enum fsck_msg_id {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	FSCK_MSG_MAX
 };
 #undef MSG_ID
@@ -96,7 +96,7 @@ static struct {
 	const char *camelcased;
 	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
-- 
2.31.1.445.g087790d4945



* [PATCH v6 13/19] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (11 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 12/19] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 14/19] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
                                         ` (6 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Move the FOREACH_FSCK_MSG_ID macro and the fsck_msg_id enum it helps
define from fsck.c to fsck.h. This is in preparation for having
non-static functions take the fsck_msg_id as an argument.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 66 ----------------------------------------------------------
 fsck.h | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 66 deletions(-)

diff --git a/fsck.c b/fsck.c
index 31c9088e3f7..150fe467e43 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,72 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_FSCK_MSG_ID(FUNC) \
-	/* fatal errors */ \
-	FUNC(NUL_IN_HEADER, FATAL) \
-	FUNC(UNTERMINATED_HEADER, FATAL) \
-	/* errors */ \
-	FUNC(BAD_DATE, ERROR) \
-	FUNC(BAD_DATE_OVERFLOW, ERROR) \
-	FUNC(BAD_EMAIL, ERROR) \
-	FUNC(BAD_NAME, ERROR) \
-	FUNC(BAD_OBJECT_SHA1, ERROR) \
-	FUNC(BAD_PARENT_SHA1, ERROR) \
-	FUNC(BAD_TAG_OBJECT, ERROR) \
-	FUNC(BAD_TIMEZONE, ERROR) \
-	FUNC(BAD_TREE, ERROR) \
-	FUNC(BAD_TREE_SHA1, ERROR) \
-	FUNC(BAD_TYPE, ERROR) \
-	FUNC(DUPLICATE_ENTRIES, ERROR) \
-	FUNC(MISSING_AUTHOR, ERROR) \
-	FUNC(MISSING_COMMITTER, ERROR) \
-	FUNC(MISSING_EMAIL, ERROR) \
-	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_OBJECT, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_TAG, ERROR) \
-	FUNC(MISSING_TAG_ENTRY, ERROR) \
-	FUNC(MISSING_TREE, ERROR) \
-	FUNC(MISSING_TREE_OBJECT, ERROR) \
-	FUNC(MISSING_TYPE, ERROR) \
-	FUNC(MISSING_TYPE_ENTRY, ERROR) \
-	FUNC(MULTIPLE_AUTHORS, ERROR) \
-	FUNC(TREE_NOT_SORTED, ERROR) \
-	FUNC(UNKNOWN_TYPE, ERROR) \
-	FUNC(ZERO_PADDED_DATE, ERROR) \
-	FUNC(GITMODULES_MISSING, ERROR) \
-	FUNC(GITMODULES_BLOB, ERROR) \
-	FUNC(GITMODULES_LARGE, ERROR) \
-	FUNC(GITMODULES_NAME, ERROR) \
-	FUNC(GITMODULES_SYMLINK, ERROR) \
-	FUNC(GITMODULES_URL, ERROR) \
-	FUNC(GITMODULES_PATH, ERROR) \
-	FUNC(GITMODULES_UPDATE, ERROR) \
-	/* warnings */ \
-	FUNC(BAD_FILEMODE, WARN) \
-	FUNC(EMPTY_NAME, WARN) \
-	FUNC(FULL_PATHNAME, WARN) \
-	FUNC(HAS_DOT, WARN) \
-	FUNC(HAS_DOTDOT, WARN) \
-	FUNC(HAS_DOTGIT, WARN) \
-	FUNC(NULL_SHA1, WARN) \
-	FUNC(ZERO_PADDED_FILEMODE, WARN) \
-	FUNC(NUL_IN_COMMIT, WARN) \
-	/* infos (reported as warnings, but ignored by default) */ \
-	FUNC(GITMODULES_PARSE, INFO) \
-	FUNC(BAD_TAG_NAME, INFO) \
-	FUNC(MISSING_TAGGER_ENTRY, INFO) \
-	/* ignored (elevated when requested) */ \
-	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
-
-#define MSG_ID(id, msg_type) FSCK_MSG_##id,
-enum fsck_msg_id {
-	FOREACH_FSCK_MSG_ID(MSG_ID)
-	FSCK_MSG_MAX
-};
-#undef MSG_ID
-
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
diff --git a/fsck.h b/fsck.h
index a7e092d3fb4..66c4a71139a 100644
--- a/fsck.h
+++ b/fsck.h
@@ -13,6 +13,72 @@ enum fsck_msg_type {
 	FSCK_WARN,
 };
 
+#define FOREACH_FSCK_MSG_ID(FUNC) \
+	/* fatal errors */ \
+	FUNC(NUL_IN_HEADER, FATAL) \
+	FUNC(UNTERMINATED_HEADER, FATAL) \
+	/* errors */ \
+	FUNC(BAD_DATE, ERROR) \
+	FUNC(BAD_DATE_OVERFLOW, ERROR) \
+	FUNC(BAD_EMAIL, ERROR) \
+	FUNC(BAD_NAME, ERROR) \
+	FUNC(BAD_OBJECT_SHA1, ERROR) \
+	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_TAG_OBJECT, ERROR) \
+	FUNC(BAD_TIMEZONE, ERROR) \
+	FUNC(BAD_TREE, ERROR) \
+	FUNC(BAD_TREE_SHA1, ERROR) \
+	FUNC(BAD_TYPE, ERROR) \
+	FUNC(DUPLICATE_ENTRIES, ERROR) \
+	FUNC(MISSING_AUTHOR, ERROR) \
+	FUNC(MISSING_COMMITTER, ERROR) \
+	FUNC(MISSING_EMAIL, ERROR) \
+	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
+	FUNC(MISSING_OBJECT, ERROR) \
+	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
+	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
+	FUNC(MISSING_TAG, ERROR) \
+	FUNC(MISSING_TAG_ENTRY, ERROR) \
+	FUNC(MISSING_TREE, ERROR) \
+	FUNC(MISSING_TREE_OBJECT, ERROR) \
+	FUNC(MISSING_TYPE, ERROR) \
+	FUNC(MISSING_TYPE_ENTRY, ERROR) \
+	FUNC(MULTIPLE_AUTHORS, ERROR) \
+	FUNC(TREE_NOT_SORTED, ERROR) \
+	FUNC(UNKNOWN_TYPE, ERROR) \
+	FUNC(ZERO_PADDED_DATE, ERROR) \
+	FUNC(GITMODULES_MISSING, ERROR) \
+	FUNC(GITMODULES_BLOB, ERROR) \
+	FUNC(GITMODULES_LARGE, ERROR) \
+	FUNC(GITMODULES_NAME, ERROR) \
+	FUNC(GITMODULES_SYMLINK, ERROR) \
+	FUNC(GITMODULES_URL, ERROR) \
+	FUNC(GITMODULES_PATH, ERROR) \
+	FUNC(GITMODULES_UPDATE, ERROR) \
+	/* warnings */ \
+	FUNC(BAD_FILEMODE, WARN) \
+	FUNC(EMPTY_NAME, WARN) \
+	FUNC(FULL_PATHNAME, WARN) \
+	FUNC(HAS_DOT, WARN) \
+	FUNC(HAS_DOTDOT, WARN) \
+	FUNC(HAS_DOTGIT, WARN) \
+	FUNC(NULL_SHA1, WARN) \
+	FUNC(ZERO_PADDED_FILEMODE, WARN) \
+	FUNC(NUL_IN_COMMIT, WARN) \
+	/* infos (reported as warnings, but ignored by default) */ \
+	FUNC(GITMODULES_PARSE, INFO) \
+	FUNC(BAD_TAG_NAME, INFO) \
+	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	/* ignored (elevated when requested) */ \
+	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
+
+#define MSG_ID(id, msg_type) FSCK_MSG_##id,
+enum fsck_msg_id {
+	FOREACH_FSCK_MSG_ID(MSG_ID)
+	FSCK_MSG_MAX
+};
+#undef MSG_ID
+
 struct fsck_options;
 struct object;
 
-- 
2.31.1.445.g087790d4945



* [PATCH v6 14/19] fsck.c: pass along the fsck_msg_id in the fsck_error callback
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (12 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 13/19] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 15/19] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
                                         ` (5 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Change the fsck_error callback to also pass along the
fsck_msg_id. Before this change the only way to get the message id was
to parse it back out of the "message".

Let's pass it down explicitly for the benefit of callers that might
want to use it, as discussed in [1].

Passing the msg_type is now redundant, as you can always get it back
from the msg_id, but I'm not changing that convention. It's really
common to need the msg_type, and the report() function itself (which
calls "fsck_error") needs to call fsck_msg_type() to discover
it. Let's not needlessly re-do that work in the user callback.

1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/
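
A rough sketch of what a callback can do once it receives the id
directly. All demo_* names are stand-ins invented for this example,
and the signature is simplified compared to the real fsck_error
typedef:

```c
#include <assert.h>

enum demo_msg_type { DEMO_ERROR, DEMO_WARN };
enum demo_msg_id {
	DEMO_MSG_BAD_DATE,
	DEMO_MSG_GITMODULES_MISSING,
	DEMO_MSG_MAX
};

struct demo_options;
typedef int (*demo_error_fn)(struct demo_options *o,
			     enum demo_msg_type msg_type,
			     enum demo_msg_id msg_id,
			     const char *message);

struct demo_options {
	demo_error_fn error_func;
	int dangling_gitmodules;	/* counted by the callback below */
};

/* Stand-in for report(): hands both severity and id to the callback. */
static int demo_report(struct demo_options *o, enum demo_msg_type msg_type,
		       enum demo_msg_id msg_id, const char *message)
{
	assert(o && o->error_func);
	return o->error_func(o, msg_type, msg_id, message);
}

/* Special-cases one id without parsing it back out of "message". */
static int demo_count_gitmodules(struct demo_options *o,
				 enum demo_msg_type msg_type,
				 enum demo_msg_id msg_id,
				 const char *message)
{
	(void)message;
	if (msg_id == DEMO_MSG_GITMODULES_MISSING) {
		o->dangling_gitmodules++;
		return 0;
	}
	return msg_type == DEMO_ERROR;
}
```

The callback still gets msg_type, so the common "is this an error or
a warning" check needs no extra lookup.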

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c       | 4 +++-
 builtin/index-pack.c | 3 ++-
 builtin/mktag.c      | 1 +
 fsck.c               | 6 ++++--
 fsck.h               | 6 ++++--
 5 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 17940a4e24a..70ff95837ae 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -84,7 +84,9 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   enum fsck_msg_type msg_type, const char *message)
+			   enum fsck_msg_type msg_type,
+			   enum fsck_msg_id msg_id,
+			   const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 8338b832b63..2f93957fb5e 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1717,6 +1717,7 @@ static int print_dangling_gitmodules(struct fsck_options *o,
 				     const struct object_id *oid,
 				     enum object_type object_type,
 				     enum fsck_msg_type msg_type,
+				     enum fsck_msg_id msg_id,
 				     const char *message)
 {
 	/*
@@ -1727,7 +1728,7 @@ static int print_dangling_gitmodules(struct fsck_options *o,
 		printf("%s\n", oid_to_hex(oid));
 		return 0;
 	}
-	return fsck_error_function(o, oid, object_type, msg_type, message);
+	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
 }
 
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 052a510ad7f..96e63bc772a 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -18,6 +18,7 @@ static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
 				 enum fsck_msg_type msg_type,
+				 enum fsck_msg_id msg_id,
 				 const char *message)
 {
 	switch (msg_type) {
diff --git a/fsck.c b/fsck.c
index 150fe467e43..23a77fe2e0f 100644
--- a/fsck.c
+++ b/fsck.c
@@ -227,7 +227,7 @@ static int report(struct fsck_options *options,
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
 	result = options->error_func(options, oid, object_type,
-				     msg_type, sb.buf);
+				     msg_type, msg_id, sb.buf);
 	strbuf_release(&sb);
 	va_end(ap);
 
@@ -1180,7 +1180,9 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			enum fsck_msg_type msg_type, const char *message)
+			enum fsck_msg_type msg_type,
+			enum fsck_msg_id msg_id,
+			const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index 66c4a71139a..fa2d4955ab3 100644
--- a/fsck.h
+++ b/fsck.h
@@ -101,11 +101,13 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  enum fsck_msg_type msg_type, const char *message);
+			  enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
+			  const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			enum fsck_msg_type msg_type, const char *message);
+			enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
+			const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
-- 
2.31.1.445.g087790d4945



* [PATCH v6 15/19] fsck.c: add an fsck_set_msg_type() API that takes enums
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (13 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 14/19] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 16/19] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
                                         ` (4 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Change code I added in acf9de4c94e (mktag: use fsck instead of custom
verify_tag(), 2021-01-05) to make use of a new API function that takes
the fsck_msg_{id,type} types, instead of arbitrary strings that
we'll (hopefully) parse into those types.

At the time that the fsck_set_msg_type() API was introduced in
0282f4dced0 (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) it was only intended to be used to parse user-supplied
data.

For things that are purely internal to the C code it makes sense to
have the compiler check these arguments, and to skip the sanity
checking of the data in fsck_set_msg_type() which is redundant to
checks we get from the compiler.
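
The split can be sketched roughly as follows. The demo_* names, the
fixed-size array and the -1 return are assumptions for the example;
the real string-based function die()s on bad input and allocates
msg_type lazily:

```c
#include <assert.h>
#include <string.h>

enum demo_msg_type { DEMO_IGNORE, DEMO_ERROR, DEMO_WARN };
enum demo_msg_id {
	DEMO_MSG_BAD_DATE,
	DEMO_MSG_EXTRA_HEADER_ENTRY,
	DEMO_MSG_MAX
};

struct demo_options {
	enum demo_msg_type msg_type[DEMO_MSG_MAX];
};

/* Typed entry point for C callers: no string parsing needed. */
static void demo_set_msg_type_from_ids(struct demo_options *o,
				       enum demo_msg_id msg_id,
				       enum demo_msg_type msg_type)
{
	assert(msg_id < DEMO_MSG_MAX);
	o->msg_type[msg_id] = msg_type;
}

/*
 * String entry point for user-supplied data: validate, then delegate.
 * Returns 0 on success and -1 on unknown input.
 */
static int demo_set_msg_type(struct demo_options *o,
			     const char *msg_id_str, const char *msg_type_str)
{
	enum demo_msg_id msg_id;
	enum demo_msg_type msg_type;

	if (!strcmp(msg_id_str, "baddate"))
		msg_id = DEMO_MSG_BAD_DATE;
	else if (!strcmp(msg_id_str, "extraheaderentry"))
		msg_id = DEMO_MSG_EXTRA_HEADER_ENTRY;
	else
		return -1;

	if (!strcmp(msg_type_str, "error"))
		msg_type = DEMO_ERROR;
	else if (!strcmp(msg_type_str, "warn"))
		msg_type = DEMO_WARN;
	else if (!strcmp(msg_type_str, "ignore"))
		msg_type = DEMO_IGNORE;
	else
		return -1;

	demo_set_msg_type_from_ids(o, msg_id, msg_type);
	return 0;
}
```

A typo such as DEMO_MSG_EXTRA_HEADER_ENTY in a call to the typed
function is a compile error, whereas the same typo in a string would
only be caught at runtime.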

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/mktag.c |  3 ++-
 fsck.c          | 27 +++++++++++++++++----------
 fsck.h          |  3 +++
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/builtin/mktag.c b/builtin/mktag.c
index 96e63bc772a..dddcccdd368 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -88,7 +88,8 @@ int cmd_mktag(int argc, const char **argv, const char *prefix)
 		die_errno(_("could not read from stdin"));
 
 	fsck_options.error_func = mktag_fsck_error_func;
-	fsck_set_msg_type(&fsck_options, "extraheaderentry", "warn");
+	fsck_set_msg_type_from_ids(&fsck_options, FSCK_MSG_EXTRA_HEADER_ENTRY,
+				   FSCK_WARN);
 	/* config might set fsck.extraHeaderEntry=* again */
 	git_config(git_fsck_config, &fsck_options);
 	if (fsck_tag_standalone(NULL, buf.buf, buf.len, &fsck_options,
diff --git a/fsck.c b/fsck.c
index 23a77fe2e0f..a59832a1650 100644
--- a/fsck.c
+++ b/fsck.c
@@ -132,6 +132,22 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 	return 1;
 }
 
+void fsck_set_msg_type_from_ids(struct fsck_options *options,
+				enum fsck_msg_id msg_id,
+				enum fsck_msg_type msg_type)
+{
+	if (!options->msg_type) {
+		int i;
+		enum fsck_msg_type *severity;
+		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
+		for (i = 0; i < FSCK_MSG_MAX; i++)
+			severity[i] = fsck_msg_type(i, options);
+		options->msg_type = severity;
+	}
+
+	options->msg_type[msg_id] = msg_type;
+}
+
 void fsck_set_msg_type(struct fsck_options *options,
 		       const char *msg_id_str, const char *msg_type_str)
 {
@@ -144,16 +160,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
-	if (!options->msg_type) {
-		int i;
-		enum fsck_msg_type *severity;
-		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
-		for (i = 0; i < FSCK_MSG_MAX; i++)
-			severity[i] = fsck_msg_type(i, options);
-		options->msg_type = severity;
-	}
-
-	options->msg_type[msg_id] = msg_type;
+	fsck_set_msg_type_from_ids(options, msg_id, msg_type);
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
diff --git a/fsck.h b/fsck.h
index fa2d4955ab3..d284bac3614 100644
--- a/fsck.h
+++ b/fsck.h
@@ -82,6 +82,9 @@ enum fsck_msg_id {
 struct fsck_options;
 struct object;
 
+void fsck_set_msg_type_from_ids(struct fsck_options *options,
+				enum fsck_msg_id msg_id,
+				enum fsck_msg_type msg_type);
 void fsck_set_msg_type(struct fsck_options *options,
 		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
-- 
2.31.1.445.g087790d4945



* [PATCH v6 16/19] fsck.c: move gitmodules_{found,done} into fsck_options
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (14 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 15/19] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 17/19] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
                                         ` (3 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Move the gitmodules_{found,done} static variables added in
159e7b080bf (fsck: detect gitmodules files, 2018-05-02) into the
fsck_options struct. It makes sense to keep all the context in the
same place.

This requires changing the register_found_gitmodules() function
recently added in 5476e1efde (fetch-pack: print and use dangling
.gitmodules, 2021-02-22) to take fsck_options. That function will be
removed in a subsequent commit, but as it'll require the new
gitmodules_found attribute of "fsck_options" we need this intermediate
step first.

An earlier version of this patch removed the small amount of
duplication we now have between FSCK_OPTIONS_{DEFAULT,STRICT} with a
FSCK_OPTIONS_COMMON macro. I don't think such de-duplication is worth
it for this amount of copy/pasting.
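
A simplified sketch of why per-struct state is useful. The int array
is a stand-in for an oidset, and every demo_* name is hypothetical.
With the state in the options struct, two independent checks no
longer share (or clear) each other's bookkeeping:

```c
#include <assert.h>

#define DEMO_MAX_FOUND 8

struct demo_options {
	int found[DEMO_MAX_FOUND];	/* stand-in for gitmodules_found */
	int nr_found;
};

#define DEMO_OPTIONS_INIT { { 0 }, 0 }

static void demo_register_found(struct demo_options *o, int id)
{
	assert(o->nr_found < DEMO_MAX_FOUND);
	o->found[o->nr_found++] = id;
}

/* Returns how many ids were pending, then resets this instance only. */
static int demo_finish(struct demo_options *o)
{
	int pending = o->nr_found;

	o->nr_found = 0;
	return pending;
}
```

Calling demo_finish() on one instance leaves the ids registered on
another instance untouched, which a file-scope static could not
guarantee.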

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fetch-pack.c |  2 +-
 fsck.c       | 23 ++++++++++-------------
 fsck.h       |  9 ++++++++-
 3 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index fb04a76ca26..0f898a5ae14 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -998,7 +998,7 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(oid);
+		register_found_gitmodules(&fo, oid);
 	if (fsck_finish(&fo))
 		die("fsck failed");
 }
diff --git a/fsck.c b/fsck.c
index a59832a1650..642bd2ef9da 100644
--- a/fsck.c
+++ b/fsck.c
@@ -19,9 +19,6 @@
 #include "credential.h"
 #include "help.h"
 
-static struct oidset gitmodules_found = OIDSET_INIT;
-static struct oidset gitmodules_done = OIDSET_INIT;
-
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
@@ -606,7 +603,7 @@ static int fsck_tree(const struct object_id *oid,
 
 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
 			if (!S_ISLNK(mode))
-				oidset_insert(&gitmodules_found, oid);
+				oidset_insert(&options->gitmodules_found, oid);
 			else
 				retval += report(options,
 						 oid, OBJ_TREE,
@@ -620,7 +617,7 @@ static int fsck_tree(const struct object_id *oid,
 				has_dotgit |= is_ntfs_dotgit(backslash);
 				if (is_ntfs_dotgitmodules(backslash)) {
 					if (!S_ISLNK(mode))
-						oidset_insert(&gitmodules_found, oid);
+						oidset_insert(&options->gitmodules_found, oid);
 					else
 						retval += report(options, oid, OBJ_TREE,
 								 FSCK_MSG_GITMODULES_SYMLINK,
@@ -1132,9 +1129,9 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 	struct fsck_gitmodules_data data;
 	struct config_options config_opts = { 0 };
 
-	if (!oidset_contains(&gitmodules_found, oid))
+	if (!oidset_contains(&options->gitmodules_found, oid))
 		return 0;
-	oidset_insert(&gitmodules_done, oid);
+	oidset_insert(&options->gitmodules_done, oid);
 
 	if (object_on_skiplist(options, oid))
 		return 0;
@@ -1199,9 +1196,9 @@ int fsck_error_function(struct fsck_options *o,
 	return 1;
 }
 
-void register_found_gitmodules(const struct object_id *oid)
+void register_found_gitmodules(struct fsck_options *options, const struct object_id *oid)
 {
-	oidset_insert(&gitmodules_found, oid);
+	oidset_insert(&options->gitmodules_found, oid);
 }
 
 int fsck_finish(struct fsck_options *options)
@@ -1210,13 +1207,13 @@ int fsck_finish(struct fsck_options *options)
 	struct oidset_iter iter;
 	const struct object_id *oid;
 
-	oidset_iter_init(&gitmodules_found, &iter);
+	oidset_iter_init(&options->gitmodules_found, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
 		enum object_type type;
 		unsigned long size;
 		char *buf;
 
-		if (oidset_contains(&gitmodules_done, oid))
+		if (oidset_contains(&options->gitmodules_done, oid))
 			continue;
 
 		buf = read_object_file(oid, &type, &size);
@@ -1241,8 +1238,8 @@ int fsck_finish(struct fsck_options *options)
 	}
 
 
-	oidset_clear(&gitmodules_found);
-	oidset_clear(&gitmodules_done);
+	oidset_clear(&options->gitmodules_found);
+	oidset_clear(&options->gitmodules_done);
 	return ret;
 }
 
diff --git a/fsck.h b/fsck.h
index d284bac3614..e20f9bcb394 100644
--- a/fsck.h
+++ b/fsck.h
@@ -118,15 +118,21 @@ struct fsck_options {
 	unsigned strict:1;
 	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
+	struct oidset gitmodules_found;
+	struct oidset gitmodules_done;
 	kh_oid_map_t *object_names;
 };
 
 #define FSCK_OPTIONS_DEFAULT { \
 	.skiplist = OIDSET_INIT, \
+	.gitmodules_found = OIDSET_INIT, \
+	.gitmodules_done = OIDSET_INIT, \
 	.error_func = fsck_error_function \
 }
 #define FSCK_OPTIONS_STRICT { \
 	.strict = 1, \
+	.gitmodules_found = OIDSET_INIT, \
+	.gitmodules_done = OIDSET_INIT, \
 	.error_func = fsck_error_function, \
 }
 
@@ -146,7 +152,8 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
-void register_found_gitmodules(const struct object_id *oid);
+void register_found_gitmodules(struct fsck_options *options,
+			       const struct object_id *oid);
 
 /*
  * fsck a tag, and pass info about it back to the caller. This is
-- 
2.31.1.445.g087790d4945
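
The effect of the diff above, moving the two sets out of file scope and into the options struct, can be sketched in isolation. This is only an illustration under stated assumptions: "struct id_set", "struct check_options" and "check_finish()" are hypothetical stand-ins for git's "struct oidset", "struct fsck_options" and "fsck_finish()", not the real API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for git's "struct oidset": a tiny set of ints. */
#define ID_SET_MAX 16
struct id_set {
	int ids[ID_SET_MAX];
	size_t nr;
};

/* Hypothetical stand-in for "struct fsck_options". */
struct check_options {
	struct id_set found; /* per-instance, no longer a file-scope static */
	struct id_set done;  /* likewise */
};

static int id_set_contains(const struct id_set *set, int id)
{
	size_t i;

	for (i = 0; i < set->nr; i++)
		if (set->ids[i] == id)
			return 1;
	return 0;
}

/* Returns 1 if "id" was newly inserted, 0 if already present or full. */
static int id_set_insert(struct id_set *set, int id)
{
	assert(set->nr <= ID_SET_MAX);
	if (id_set_contains(set, id) || set->nr == ID_SET_MAX)
		return 0;
	set->ids[set->nr++] = id;
	return 1;
}

/* Count ids that were found but never marked done, then reset the state. */
static size_t check_finish(struct check_options *o)
{
	size_t i, pending = 0;

	for (i = 0; i < o->found.nr; i++)
		if (!id_set_contains(&o->done, o->found.ids[i]))
			pending++;
	o->found.nr = 0;
	o->done.nr = 0;
	return pending;
}
```

With the sets stored per struct, two "struct check_options" instances in the same process keep separate "found" and "done" state, which a pair of file-scope statics could not provide.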



* [PATCH v6 17/19] fetch-pack: don't needlessly copy fsck_options
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (15 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 16/19] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 18/19] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
                                         ` (2 subsequent siblings)
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Change the behavior of the .gitmodules validation added in
5476e1efde (fetch-pack: print and use dangling .gitmodules,
2021-02-22) so we're using one "fsck_options".

I found that code confusing to read. One might think that not setting
up the error_func earlier means that we're relying on the "error_func"
not being set in some code in between the two hunks being modified
here.

But we're not; all we're doing in the rest of "cmd_index_pack()" is
further setup by calling fsck_set_msg_types(), and assigning to
do_fsck_object.

So there was no reason in 5476e1efde to make a shallow copy of the
fsck_options struct before setting error_func. Let's just do this
setup at the top of the function, along with the "walk" assignment.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/index-pack.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 2f93957fb5e..5b7bc3c8947 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1761,6 +1761,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 	fsck_options.walk = mark_link;
+	fsck_options.error_func = print_dangling_gitmodules;
 
 	reset_pack_idx_option(&opts);
 	git_config(git_index_pack_config, &opts);
@@ -1951,13 +1952,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	else
 		close(input_fd);
 
-	if (do_fsck_object) {
-		struct fsck_options fo = fsck_options;
-
-		fo.error_func = print_dangling_gitmodules;
-		if (fsck_finish(&fo))
-			die(_("fsck error in pack objects"));
-	}
+	if (do_fsck_object && fsck_finish(&fsck_options))
+		die(_("fsck error in pack objects"));
 
 	free(objects);
 	strbuf_release(&index_name_buf);
-- 
2.31.1.445.g087790d4945
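
The difference between the removed copy-then-configure code and the new configure-up-front code can be sketched with a toy struct. Everything here is an assumption made for illustration: "struct check_options", "report_quiet()" and the "nr_reported" counter are hypothetical and do not exist in git.

```c
#include <assert.h>
#include <stddef.h>

struct check_options;
typedef int (*report_fn)(struct check_options *o, const char *message);

/* Hypothetical stand-in for "struct fsck_options". */
struct check_options {
	report_fn error_func;
	int nr_reported;
};

static int report_default(struct check_options *o, const char *message)
{
	(void)message;
	o->nr_reported++;
	return 1; /* treat the problem as an error */
}

static int report_quiet(struct check_options *o, const char *message)
{
	(void)message;
	o->nr_reported++;
	return 0; /* tolerate the problem */
}

/* Analogue of fsck_finish(): report one problem through the callback. */
static int check_finish(struct check_options *o)
{
	assert(o->error_func != NULL);
	return o->error_func(o, "dangling entry");
}

/* Copy-then-configure, in the shape of the removed hunk. */
static int finish_with_copy(const struct check_options *shared)
{
	struct check_options copy = *shared;

	copy.error_func = report_quiet;
	return check_finish(&copy);
}

/* Configure-up-front: the caller sets error_func once, early on. */
static int finish_direct(struct check_options *shared)
{
	return check_finish(shared);
}
```

Both paths end up calling the same callback. The one difference in this sketch is that state recorded through the by-value copy (here the counter) never reaches the original struct, whereas the direct path needs no second struct at all.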



* [PATCH v6 18/19] fetch-pack: use file-scope static struct for fsck_options
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (16 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 17/19] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-28 13:15                       ` [PATCH v6 19/19] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
  2021-03-29  2:06                       ` [PATCH v6 00/19] fsck: API improvements Junio C Hamano
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Change code added in 5476e1efde (fetch-pack: print and use dangling
.gitmodules, 2021-02-22) so that we use a file-scoped "static struct
fsck_options" instead of defining one in the "fsck_gitmodules_oids()"
function.

We use this pattern in all of
builtin/{fsck,index-pack,mktag,unpack-objects}.c. It's odd to see
fetch-pack be the odd one out. One might think that we're using other
fsck_options structs in fetch-pack, or doing fsck twice there, but
we're not.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fetch-pack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 0f898a5ae14..4ec10a15852 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -38,6 +38,7 @@ static int server_supports_filtering;
 static int advertise_sid;
 static struct shallow_lock shallow_lock;
 static const char *alternate_shallow_file;
+static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 static struct strbuf fsck_msg_types = STRBUF_INIT;
 static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 
@@ -991,15 +992,14 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 {
 	struct oidset_iter iter;
 	const struct object_id *oid;
-	struct fsck_options fo = FSCK_OPTIONS_STRICT;
 
 	if (!oidset_size(gitmodules_oids))
 		return;
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(&fo, oid);
-	if (fsck_finish(&fo))
+		register_found_gitmodules(&fsck_options, oid);
+	if (fsck_finish(&fsck_options))
 		die("fsck failed");
 }
 
-- 
2.31.1.445.g087790d4945



* [PATCH v6 19/19] fetch-pack: use new fsck API to printing dangling submodules
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (17 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 18/19] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-28 13:15                       ` Ævar Arnfjörð Bjarmason
  2021-03-29  2:06                       ` [PATCH v6 00/19] fsck: API improvements Junio C Hamano
  19 siblings, 0 replies; 229+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 13:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

Refactor the check added in 5476e1efde (fetch-pack: print and use
dangling .gitmodules, 2021-02-22) to make use of the "msg_id" that is
now passed to the user-defined "error_func". We can now compare
against FSCK_MSG_GITMODULES_MISSING instead of parsing the generated
message.

Let's also replace register_found_gitmodules() with directly
manipulating the "gitmodules_found" member. A recent commit moved it
into "fsck_options" so we could do this here.

I'm sticking this callback in fsck.c. Perhaps in the future we'd like
to accumulate such callbacks into another file (maybe fsck-cb.c,
similar to parse-options-cb.c?), but while we've got just the one
let's just put it into fsck.c.

A better alternative in this case would be some more obvious library
shared by fetch-pack.c and builtin/index-pack.c, but there isn't such
a thing.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/index-pack.c | 21 +--------------------
 fetch-pack.c         | 31 ++++++++-----------------------
 fsck.c               | 23 ++++++++++++++++++-----
 fsck.h               | 15 ++++++++++++---
 4 files changed, 39 insertions(+), 51 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 5b7bc3c8947..15507b5cff0 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -120,7 +120,7 @@ static int nr_threads;
 static int from_stdin;
 static int strict;
 static int do_fsck_object;
-static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
+static struct fsck_options fsck_options = FSCK_OPTIONS_MISSING_GITMODULES;
 static int verbose;
 static int show_resolving_progress;
 static int show_stat;
@@ -1713,24 +1713,6 @@ static void show_pack_info(int stat_only)
 	}
 }
 
-static int print_dangling_gitmodules(struct fsck_options *o,
-				     const struct object_id *oid,
-				     enum object_type object_type,
-				     enum fsck_msg_type msg_type,
-				     enum fsck_msg_id msg_id,
-				     const char *message)
-{
-	/*
-	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
-	 * instead of relying on this string check.
-	 */
-	if (starts_with(message, "gitmodulesMissing")) {
-		printf("%s\n", oid_to_hex(oid));
-		return 0;
-	}
-	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
-}
-
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, rev_index;
@@ -1761,7 +1743,6 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 	fsck_options.walk = mark_link;
-	fsck_options.error_func = print_dangling_gitmodules;
 
 	reset_pack_idx_option(&opts);
 	git_config(git_index_pack_config, &opts);
diff --git a/fetch-pack.c b/fetch-pack.c
index 4ec10a15852..c80eaee7694 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -38,7 +38,7 @@ static int server_supports_filtering;
 static int advertise_sid;
 static struct shallow_lock shallow_lock;
 static const char *alternate_shallow_file;
-static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
+static struct fsck_options fsck_options = FSCK_OPTIONS_MISSING_GITMODULES;
 static struct strbuf fsck_msg_types = STRBUF_INIT;
 static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 
@@ -988,21 +988,6 @@ static int cmp_ref_by_name(const void *a_, const void *b_)
 	return strcmp(a->name, b->name);
 }
 
-static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
-{
-	struct oidset_iter iter;
-	const struct object_id *oid;
-
-	if (!oidset_size(gitmodules_oids))
-		return;
-
-	oidset_iter_init(gitmodules_oids, &iter);
-	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(&fsck_options, oid);
-	if (fsck_finish(&fsck_options))
-		die("fsck failed");
-}
-
 static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 				 int fd[2],
 				 const struct ref *orig_ref,
@@ -1017,7 +1002,6 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	int agent_len;
 	struct fetch_negotiator negotiator_alloc;
 	struct fetch_negotiator *negotiator;
-	struct oidset gitmodules_oids = OIDSET_INIT;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
@@ -1134,9 +1118,10 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	else
 		alternate_shallow_file = NULL;
 	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought,
-		     &gitmodules_oids))
+		     &fsck_options.gitmodules_found))
 		die(_("git fetch-pack: fetch failed."));
-	fsck_gitmodules_oids(&gitmodules_oids);
+	if (fsck_finish(&fsck_options))
+		die("fsck failed");
 
  all_done:
 	if (negotiator)
@@ -1587,7 +1572,6 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
 	int i;
 	struct strvec index_pack_args = STRVEC_INIT;
-	struct oidset gitmodules_oids = OIDSET_INIT;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
@@ -1678,7 +1662,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 			process_section_header(&reader, "packfile", 0);
 			if (get_pack(args, fd, pack_lockfiles,
 				     packfile_uris.nr ? &index_pack_args : NULL,
-				     sought, nr_sought, &gitmodules_oids))
+				     sought, nr_sought, &fsck_options.gitmodules_found))
 				die(_("git fetch-pack: fetch failed."));
 			do_check_stateless_delimiter(args, &reader);
 
@@ -1721,7 +1705,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 
 		packname[the_hash_algo->hexsz] = '\0';
 
-		parse_gitmodules_oids(cmd.out, &gitmodules_oids);
+		parse_gitmodules_oids(cmd.out, &fsck_options.gitmodules_found);
 
 		close(cmd.out);
 
@@ -1742,7 +1726,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	string_list_clear(&packfile_uris, 0);
 	strvec_clear(&index_pack_args);
 
-	fsck_gitmodules_oids(&gitmodules_oids);
+	if (fsck_finish(&fsck_options))
+		die("fsck failed");
 
 	if (negotiator)
 		negotiator->release(negotiator);
diff --git a/fsck.c b/fsck.c
index 642bd2ef9da..f5ed6a26358 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1196,11 +1196,6 @@ int fsck_error_function(struct fsck_options *o,
 	return 1;
 }
 
-void register_found_gitmodules(struct fsck_options *options, const struct object_id *oid)
-{
-	oidset_insert(&options->gitmodules_found, oid);
-}
-
 int fsck_finish(struct fsck_options *options)
 {
 	int ret = 0;
@@ -1266,3 +1261,21 @@ int git_fsck_config(const char *var, const char *value, void *cb)
 
 	return git_default_config(var, value, cb);
 }
+
+/*
+ * Custom error callbacks that are used in more than one place.
+ */
+
+int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
+					   const struct object_id *oid,
+					   enum object_type object_type,
+					   enum fsck_msg_type msg_type,
+					   enum fsck_msg_id msg_id,
+					   const char *message)
+{
+	if (msg_id == FSCK_MSG_GITMODULES_MISSING) {
+		puts(oid_to_hex(oid));
+		return 0;
+	}
+	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
+}
diff --git a/fsck.h b/fsck.h
index e20f9bcb394..7202c3c87e8 100644
--- a/fsck.h
+++ b/fsck.h
@@ -111,6 +111,12 @@ int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
 			enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
 			const char *message);
+int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
+					   const struct object_id *oid,
+					   enum object_type object_type,
+					   enum fsck_msg_type msg_type,
+					   enum fsck_msg_id msg_id,
+					   const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
@@ -135,6 +141,12 @@ struct fsck_options {
 	.gitmodules_done = OIDSET_INIT, \
 	.error_func = fsck_error_function, \
 }
+#define FSCK_OPTIONS_MISSING_GITMODULES { \
+	.strict = 1, \
+	.gitmodules_found = OIDSET_INIT, \
+	.gitmodules_done = OIDSET_INIT, \
+	.error_func = fsck_error_cb_print_missing_gitmodules, \
+}
 
 /* descend in all linked child objects
  * the return value is:
@@ -152,9 +164,6 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
-void register_found_gitmodules(struct fsck_options *options,
-			       const struct object_id *oid);
-
 /*
  * fsck a tag, and pass info about it back to the caller. This is
  * exposed fsck_object() internals for git-mktag(1).
-- 
2.31.1.445.g087790d4945
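
Why comparing an id is sturdier than parsing the message can be shown with a small standalone sketch. The enum and the helper names below are hypothetical and chosen for illustration; git's real message ids are generated in fsck.h, and the real callback takes more arguments than this one.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical message ids; not git's real list. */
enum msg_id {
	MSG_BAD_TREE,
	MSG_GITMODULES_MISSING
};

/* Old approach: infer the kind of problem from the formatted message. */
static int is_missing_by_text(const char *message)
{
	static const char prefix[] = "gitmodulesMissing";

	return !strncmp(message, prefix, strlen(prefix));
}

/* New approach: the caller passes the id, so no parsing is needed. */
static int is_missing_by_id(enum msg_id id)
{
	return id == MSG_GITMODULES_MISSING;
}

/*
 * Callback in the spirit of the patch: return 0 to tolerate a missing
 * .gitmodules target, 1 to treat anything else as an error.
 */
static int report(enum msg_id id, const char *message)
{
	assert(message != NULL);
	return is_missing_by_id(id) ? 0 : 1;
}
```

The text check depends on how the message happens to be formatted, so rewording the message or dropping its prefix changes the result. The id check is unaffected by the message text.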



* Re: [PATCH v6 02/19] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
  2021-03-28 13:15                       ` [PATCH v6 02/19] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
@ 2021-03-28 17:15                         ` Ramsay Jones
  2021-03-29  2:04                           ` Junio C Hamano
  0 siblings, 1 reply; 229+ messages in thread
From: Ramsay Jones @ 2021-03-28 17:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Johannes Schindelin,
	Jonathan Tan, Derrick Stolee

On Sun, Mar 28, 2021 at 03:15:34PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Refactor the definitions of FSCK_OPTIONS_{DEFAULT,STRICT} to use
> designated initializers. This allows us to omit those fields that
> aren't initialized to zero or NULL.

s/aren't/are/

[I apologize in advance - I am using mutt for the first time to reply
to a ML post and I don't know if I should be using L-ist-reply or a
g-roup-reply! :D ]

ATB,
Ramsay Jones
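
The point behind the correction above is that, with designated initializers, any member left out of the initializer list is implicitly set to zero or NULL. A minimal sketch, assuming a hypothetical "struct check_options" and "CHECK_OPTIONS_*" macros that only loosely mirror the FSCK_OPTIONS_{DEFAULT,STRICT} macros under discussion:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical options struct; not git's "struct fsck_options". */
struct check_options {
	int (*error_func)(const char *message);
	unsigned strict:1;
	const char *skiplist;
	void *object_names;
};

static int report_error(const char *message)
{
	assert(message != NULL);
	return 1;
}

/* Only the non-zero members are spelled out; the rest become 0 or NULL. */
#define CHECK_OPTIONS_DEFAULT { \
	.error_func = report_error \
}
#define CHECK_OPTIONS_STRICT { \
	.strict = 1, \
	.error_func = report_error, \
}
```

Adding a new member to the struct then requires no change to either macro, as long as zero or NULL is the right default for it.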



* Re: [PATCH v6 02/19] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
  2021-03-28 17:15                         ` Ramsay Jones
@ 2021-03-29  2:04                           ` Junio C Hamano
  0 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-29  2:04 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: Ævar Arnfjörð Bjarmason, git, Jeff King,
	Johannes Schindelin, Jonathan Tan, Derrick Stolee

Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

> On Sun, Mar 28, 2021 at 03:15:34PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Refactor the definitions of FSCK_OPTIONS_{DEFAULT,STRICT} to use
>> designated initializers. This allows us to omit those fields that
>> aren't initialized to zero or NULL.
>
> s/aren't/are/

Thanks; tweak applied while queuing.

> [I apologize in advance - I am using mutt for the first time to reply
> to a ML post and I don't know if I should be using L-ist-reply or a
> g-roup-reply! :D ]

FWIW, on lore (reading via nntp), the message I am responding to
looks just fine.



* Re: [PATCH v6 00/19] fsck: API improvements
  2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
                                         ` (18 preceding siblings ...)
  2021-03-28 13:15                       ` [PATCH v6 19/19] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
@ 2021-03-29  2:06                       ` Junio C Hamano
  19 siblings, 0 replies; 229+ messages in thread
From: Junio C Hamano @ 2021-03-29  2:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan, Derrick Stolee

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> To recap on the goals in v1[1] this series gets rid of the need to
> have the recently added "print_dangling_gitmodules" function in
> favor of a better fsck API to get at that information.

Read the whole series afresh, as well as "git diff @{1}" after
replacing to see what changed since the previous round.  Didn't find
anything iffy.

Unless somebody finds improvement opportunities in the coming couple
of days, let's declare it is good enough and merge to 'next',
polishing incrementally if needed.

Thanks.


end of thread, other threads:[~2021-03-29  2:07 UTC | newest]

Thread overview: 229+ messages
2021-01-15 23:43 RFC on packfile URIs and .gitmodules check Jonathan Tan
2021-01-16  0:30 ` Junio C Hamano
2021-01-16  3:22   ` Taylor Blau
2021-01-19 12:56     ` Derrick Stolee
2021-01-19 19:13       ` Jonathan Tan
2021-01-20  1:04         ` Junio C Hamano
2021-01-19 19:02     ` Jonathan Tan
2021-01-20  8:07 ` Ævar Arnfjörð Bjarmason
2021-01-20 19:30   ` Jonathan Tan
2021-01-21  3:06     ` Junio C Hamano
2021-01-21 18:32       ` Jonathan Tan
2021-01-21 18:39         ` Junio C Hamano
2021-01-20 19:36   ` [PATCH] Doc: clarify contents of packfile sent as URI Jonathan Tan
2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
2021-01-24  2:34   ` [PATCH 1/4] http: allow custom index-pack args Jonathan Tan
2021-01-24  2:34   ` [PATCH 2/4] http-fetch: " Jonathan Tan
2021-01-24 11:52     ` Ævar Arnfjörð Bjarmason
2021-01-28  0:32       ` Jonathan Tan
2021-02-16 20:49     ` Josh Steadmon
2021-02-16 22:57       ` Junio C Hamano
2021-02-17 19:46         ` Jonathan Tan
2021-01-24  2:34   ` [PATCH 3/4] fetch-pack: with packfile URIs, use index-pack arg Jonathan Tan
2021-01-24  2:34   ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
2021-01-24  7:56     ` Junio C Hamano
2021-01-26  1:57       ` Junio C Hamano
2021-01-28  1:04         ` Jonathan Tan
2021-01-24 12:18     ` Ævar Arnfjörð Bjarmason
2021-01-28  1:03       ` Jonathan Tan
2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
2021-02-17 21:02             ` Junio C Hamano
2021-02-18  0:00               ` Ævar Arnfjörð Bjarmason
2021-02-18 19:12                 ` Junio C Hamano
2021-02-18 19:57                   ` Jeff King
2021-02-18 20:27                     ` Junio C Hamano
2021-02-19  0:54                       ` Ævar Arnfjörð Bjarmason
2021-02-18 22:36                     ` Junio C Hamano
2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
2021-02-18 22:19               ` Junio C Hamano
2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
2021-03-07 23:04                 ` Junio C Hamano
2021-03-08  9:16                   ` Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 " Ævar Arnfjörð Bjarmason
2021-03-16 19:35                   ` Derrick Stolee
2021-03-17 18:20                   ` [PATCH v5 00/19] " Ævar Arnfjörð Bjarmason
2021-03-17 20:30                     ` Derrick Stolee
2021-03-17 21:06                     ` Junio C Hamano
2021-03-28 13:15                     ` [PATCH v6 " Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 01/19] fsck.c: refactor and rename common config callback Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 02/19] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
2021-03-28 17:15                         ` Ramsay Jones
2021-03-29  2:04                           ` Junio C Hamano
2021-03-28 13:15                       ` [PATCH v6 03/19] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 04/19] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 05/19] fsck.c: remove (mostly) redundant append_msg_id() function Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 06/19] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 07/19] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 08/19] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 09/19] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 10/19] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 11/19] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 12/19] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 13/19] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 14/19] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 15/19] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 16/19] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 17/19] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 18/19] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
2021-03-28 13:15                       ` [PATCH v6 19/19] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
2021-03-29  2:06                       ` [PATCH v6 00/19] fsck: API improvements Junio C Hamano
2021-03-17 18:20                   ` [PATCH v5 01/19] fsck.c: refactor and rename common config callback Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 02/19] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 03/19] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 04/19] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 05/19] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 06/19] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 07/19] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 08/19] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 09/19] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 10/19] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 11/19] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 12/19] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 13/19] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 14/19] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 15/19] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 16/19] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 17/19] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 18/19] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
2021-03-17 18:20                   ` [PATCH v5 19/19] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 01/22] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
2021-03-17 18:35                   ` Junio C Hamano
2021-03-19 14:43                   ` Johannes Schindelin
2021-03-20  9:16                     ` Ævar Arnfjörð Bjarmason
2021-03-20 20:04                       ` Junio C Hamano
2021-03-16 16:17                 ` [PATCH v4 02/22] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
2021-03-16 18:59                   ` Derrick Stolee
2021-03-17 18:38                   ` Junio C Hamano
2021-03-16 16:17                 ` [PATCH v4 03/22] fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 04/22] fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro Ævar Arnfjörð Bjarmason
2021-03-16 19:06                   ` Derrick Stolee
2021-03-16 16:17                 ` [PATCH v4 05/22] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 06/22] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 07/22] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 08/22] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
2021-03-17 18:45                   ` Junio C Hamano
2021-03-16 16:17                 ` [PATCH v4 09/22] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 10/22] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 11/22] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
2021-03-17 18:48                   ` Junio C Hamano
2021-03-16 16:17                 ` [PATCH v4 12/22] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
2021-03-17 18:50                   ` Junio C Hamano
2021-03-16 16:17                 ` [PATCH v4 13/22] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 14/22] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
2021-03-17 18:57                   ` Junio C Hamano
2021-03-16 16:17                 ` [PATCH v4 15/22] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 16/22] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 17/22] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
2021-03-17 19:01                   ` Junio C Hamano
2021-03-16 16:17                 ` [PATCH v4 18/22] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 19/22] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 20/22] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 21/22] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
2021-03-16 16:17                 ` [PATCH v4 22/22] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
2021-03-16 19:32                   ` Derrick Stolee
2021-03-17 13:47                     ` Ævar Arnfjörð Bjarmason
2021-03-17 20:27                       ` Derrick Stolee
2021-03-17 19:12                   ` Junio C Hamano
2021-03-06 11:04               ` [PATCH v3 01/22] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 02/22] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 03/22] fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 04/22] fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 05/22] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 06/22] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 07/22] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 08/22] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 09/22] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 10/22] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 11/22] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 12/22] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 13/22] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 14/22] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 15/22] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 16/22] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 17/22] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 18/22] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 19/22] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 20/22] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 21/22] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 22/22] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
2021-02-18 10:58             ` [PATCH v2 01/10] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
2021-02-18 10:58             ` [PATCH v2 02/10] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
2021-02-18 10:58             ` [PATCH v2 03/10] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
2021-02-18 19:45               ` Jeff King
2021-02-18 10:58             ` [PATCH v2 04/10] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
2021-02-18 10:58             ` [PATCH v2 05/10] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
2021-02-18 22:23               ` Junio C Hamano
2021-02-18 10:58             ` [PATCH v2 06/10] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
2021-02-18 19:52               ` Jeff King
2021-02-18 22:27                 ` Junio C Hamano
2021-02-18 10:58             ` [PATCH v2 07/10] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
2021-02-18 22:29               ` Junio C Hamano
2021-02-18 10:58             ` [PATCH v2 08/10] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
2021-02-18 22:30               ` Junio C Hamano
2021-02-18 10:58             ` [PATCH v2 09/10] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
2021-02-18 19:56               ` Jeff King
2021-02-18 10:58             ` [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
2021-02-18 19:56               ` Jeff King
2021-02-18 22:33                 ` Junio C Hamano
2021-02-18 22:32               ` Junio C Hamano
2021-02-17 19:42           ` [PATCH 01/14] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 02/14] fsck.h: use use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
2021-02-17 23:40             ` Junio C Hamano
2021-02-17 19:42           ` [PATCH 03/14] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 04/14] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 05/14] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 06/14] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 07/14] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 08/14] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 09/14] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 10/14] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 11/14] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 12/14] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 13/14] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 14/14] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
2021-02-17 20:05           ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
2021-01-24 12:30     ` Ævar Arnfjörð Bjarmason
2021-01-28  1:15       ` Jonathan Tan
2021-02-17  2:10         ` Ævar Arnfjörð Bjarmason
2021-02-17 20:10           ` Jonathan Tan
2021-02-18 12:07             ` Ævar Arnfjörð Bjarmason
2021-02-17 19:27     ` Ævar Arnfjörð Bjarmason
2021-02-17 20:11       ` Jonathan Tan
2021-01-24  6:29   ` [PATCH 0/4] Check .gitmodules when using packfile URIs Junio C Hamano
2021-01-28  0:35     ` Jonathan Tan
2021-02-18 11:31       ` Ævar Arnfjörð Bjarmason
2021-02-18 23:34   ` Junio C Hamano
2021-02-19  0:46     ` Jonathan Tan
2021-02-20  3:31       ` Junio C Hamano
2021-02-19  1:08     ` Ævar Arnfjörð Bjarmason
2021-02-20  3:29       ` Junio C Hamano
2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
2021-02-22 19:20   ` [PATCH v2 1/4] http: allow custom index-pack args Jonathan Tan
2021-02-22 19:20   ` [PATCH v2 2/4] http-fetch: " Jonathan Tan
2021-02-23 13:17     ` Ævar Arnfjörð Bjarmason
2021-02-23 16:51       ` Jonathan Tan
2021-03-05  0:19     ` Jonathan Nieder
2021-03-05  1:16       ` [PATCH] fetch-pack: do not mix --pack_header and packfile uri Jonathan Tan
2021-03-05  1:52         ` Junio C Hamano
2021-03-05 18:50         ` Junio C Hamano
2021-03-05 19:46           ` Junio C Hamano
2021-03-05 23:11             ` Jonathan Tan
2021-03-05 23:20             ` Junio C Hamano
2021-03-05 22:59           ` Jonathan Tan
2021-03-05 23:18             ` Junio C Hamano
2021-03-08 19:14               ` Jonathan Tan
2021-03-08 19:34                 ` Junio C Hamano
2021-03-09 19:13                   ` Junio C Hamano
2021-03-10  5:24                     ` Junio C Hamano
2021-03-10 16:57                     ` Jonathan Tan
2021-03-10 18:30                       ` Junio C Hamano
2021-03-10 19:56                         ` Junio C Hamano
2021-03-10 23:29                           ` Jonathan Tan
2021-03-11  0:59                             ` Junio C Hamano
2021-03-11  1:41                             ` Junio C Hamano
2021-03-11 17:22                               ` Jonathan Tan
2021-03-11 21:21                                 ` Junio C Hamano
2021-02-22 19:20   ` [PATCH v2 3/4] fetch-pack: with packfile URIs, use index-pack arg Jonathan Tan
2021-02-22 19:20   ` [PATCH v2 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
2021-02-22 20:12   ` [PATCH v2 0/4] Check .gitmodules when using packfile URIs Junio C Hamano
