All of lore.kernel.org
 help / color / mirror / Atom feed
* Infinite loop + memory leak in annotate_refs_with_symref_info
@ 2023-04-11 19:53 Jonas Haag
  2023-04-11 20:25 ` Taylor Blau
  2023-04-11 21:06 ` Jeff King
  0 siblings, 2 replies; 30+ messages in thread
From: Jonas Haag @ 2023-04-11 19:53 UTC (permalink / raw)
  To: git

Hello!

There is an infinite loop with an accompanying memory leak in annotate_refs_with_symref_info that was introduced in Git 2.28 (I think in commit 2c6a403: “connect: add function to parse multiple v1 capability values”).

To reproduce the issue, start Klaus [1] using the --smarthttp option and attempt to clone a repository. git-remote-http will enter an infinite loop.

I think this is triggered by a bug in Dulwich, the Python Git implementation that Klaus uses. I’m assuming that Dulwich sends some invalid responses that make the Git client go into an infinite loop.

I believe the bug in Git is in connect.c, function parse_feature_value, in the updating of `*offset`: It doesn’t seem to take into account that `feature_list` has already been offset by `*offset`. I believe the update needs to use `*offset +=` instead of `*offset =`. When I make this change, the infinite loop seems to go away, and cloning via Klaus/Dulwich will fail with “invalid index-pack output”. Cloning from github.com works, although I’m not sure if that’s a relevant smoke test in this case.

Jonas

[1] https://github.com/jonashaag/klaus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Infinite loop + memory leak in annotate_refs_with_symref_info
  2023-04-11 19:53 Infinite loop + memory leak in annotate_refs_with_symref_info Jonas Haag
@ 2023-04-11 20:25 ` Taylor Blau
  2023-04-11 23:59   ` Taylor Blau
  2023-04-12  0:53   ` brian m. carlson
  2023-04-11 21:06 ` Jeff King
  1 sibling, 2 replies; 30+ messages in thread
From: Taylor Blau @ 2023-04-11 20:25 UTC (permalink / raw)
  To: Jonas Haag; +Cc: brian m. carlson, git

Hi Jonas,

On Tue, Apr 11, 2023 at 10:53:59PM +0300, Jonas Haag wrote:
> Hello!
>
> There is an infinite loop with an accompanying memory leak in
> annotate_refs_with_symref_info that was introduced in Git 2.28 (I
> think in commit 2c6a403: “connect: add function to parse multiple v1
> capability values”).

I'm not familiar with Klaus and don't have it installed, but a couple of
questions: were you able to reproduce this result with any other forges
or tools, and were you able to confirm that 2c6a403 is the culprit via a
bisection?

In case the answer to the latter question is "yes", I cc'd brian carlson
on this thread, since they are the original author of that patch.

> I believe the bug in Git is in connect.c, function
> parse_feature_value, in the updating of `*offset`: It doesn’t seem to
> take into account that `feature_list` has already been offset by
> `*offset`. I believe the update needs to use `*offset +=` instead of
> `*offset =`. When I make this change, the infinite loop seems to go
> away, and cloning via Klaus/Dulwich will fail with “invalid index-pack
> output”. Cloning from github.com works, although I’m not sure if
> that’s a relevant smoke test in this case.

I'm not sure I understand. Looking at the relevant bits in
connect.c::parse_feature_value(), it all seems correct to me, since the
beginning of `feature_list` is adjusted by the current value of
`*offset`.

> [1] https://github.com/jonashaag/klaus

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Infinite loop + memory leak in annotate_refs_with_symref_info
  2023-04-11 19:53 Infinite loop + memory leak in annotate_refs_with_symref_info Jonas Haag
  2023-04-11 20:25 ` Taylor Blau
@ 2023-04-11 21:06 ` Jeff King
  2023-04-11 21:16   ` Jeff King
  1 sibling, 1 reply; 30+ messages in thread
From: Jeff King @ 2023-04-11 21:06 UTC (permalink / raw)
  To: Jonas Haag; +Cc: git

On Tue, Apr 11, 2023 at 10:53:59PM +0300, Jonas Haag wrote:

> There is an infinite loop with an accompanying memory leak in
> annotate_refs_with_symref_info that was introduced in Git 2.28 (I
> think in commit 2c6a403: “connect: add function to parse multiple v1
> capability values”).

Have you tried to reproduce with a more recent version of Git? This
sounds a lot like the bug fixed in 44d2aec6e8 (connect: also update
offset for features without values, 2021-09-26), which is in v2.33.1.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Infinite loop + memory leak in annotate_refs_with_symref_info
  2023-04-11 21:06 ` Jeff King
@ 2023-04-11 21:16   ` Jeff King
  2023-04-11 21:22     ` Taylor Blau
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2023-04-11 21:16 UTC (permalink / raw)
  To: Jonas Haag; +Cc: Taylor Blau, brian m. carlson, git

On Tue, Apr 11, 2023 at 05:06:33PM -0400, Jeff King wrote:

> On Tue, Apr 11, 2023 at 10:53:59PM +0300, Jonas Haag wrote:
> 
> > There is an infinite loop with an accompanying memory leak in
> > annotate_refs_with_symref_info that was introduced in Git 2.28 (I
> > think in commit 2c6a403: “connect: add function to parse multiple v1
> > capability values”).
> 
> Have you tried to reproduce with a more recent version of Git? This
> sounds a lot like the bug fixed in 44d2aec6e8 (connect: also update
> offset for features without values, 2021-09-26), which is in v2.33.1.

Never mind. I was able to reproduce (I never used klaus, but it's
packaged for Debian, so it was pretty easy to do). And yes, the problem
still exists today. And bisection confirms it's from 2c6a403.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Infinite loop + memory leak in annotate_refs_with_symref_info
  2023-04-11 21:16   ` Jeff King
@ 2023-04-11 21:22     ` Taylor Blau
  2023-04-11 21:58       ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Taylor Blau @ 2023-04-11 21:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonas Haag, brian m. carlson, git

On Tue, Apr 11, 2023 at 05:16:04PM -0400, Jeff King wrote:
> On Tue, Apr 11, 2023 at 05:06:33PM -0400, Jeff King wrote:
>
> > On Tue, Apr 11, 2023 at 10:53:59PM +0300, Jonas Haag wrote:
> >
> > > There is an infinite loop with an accompanying memory leak in
> > > annotate_refs_with_symref_info that was introduced in Git 2.28 (I
> > > think in commit 2c6a403: “connect: add function to parse multiple v1
> > > capability values”).
> >
> > Have you tried to reproduce with a more recent version of Git? This
> > sounds a lot like the bug fixed in 44d2aec6e8 (connect: also update
> > offset for features without values, 2021-09-26), which is in v2.33.1.
>
> Never mind. I was able to reproduce (I never used klaus, but it's
> packaged for Debian, so it was pretty easy to do). And yes, the problem
> still exists today. And bisection confirms it's from 2c6a403.

Yeah, same here. I hadn't used it either, but it's easily installable
via pip, too. Indeed, you can see the value of *offset jumping backwards
in `connect.c::parse_value_value()` (whose caller in this case is
`connect.c::annotate_refs_with_symref_info()`).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Infinite loop + memory leak in annotate_refs_with_symref_info
  2023-04-11 21:22     ` Taylor Blau
@ 2023-04-11 21:58       ` Jeff King
  2023-04-11 22:52         ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2023-04-11 21:58 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jonas Haag, brian m. carlson, git

On Tue, Apr 11, 2023 at 05:22:02PM -0400, Taylor Blau wrote:

> > Never mind. I was able to reproduce (I never used klaus, but it's
> > packaged for Debian, so it was pretty easy to do). And yes, the problem
> > still exists today. And bisection confirms it's from 2c6a403.
> 
> Yeah, same here. I hadn't used it either, but it's easily installable
> via pip, too. Indeed, you can see the value of *offset jumping backwards
> in `connect.c::parse_value_value()` (whose caller in this case is
> `connect.c::annotate_refs_with_symref_info()`).

Yep. And Jonas's suggested fix is the right thing. Assigning offset
directly _would_ be the right thing, since we are taking the distance
back to the beginning of the feature_list string. Except that earlier in
the function we incremented feature_list by the incoming value of
the offset!

Here's a test that demonstrates the problem (and passes with the
accompanying code change):

diff --git a/connect.c b/connect.c
index c0c8a38178..5e265ba9d7 100644
--- a/connect.c
+++ b/connect.c
@@ -616,7 +616,7 @@ const char *parse_feature_value(const char *feature_list, const char *feature, i
 				if (lenp)
 					*lenp = 0;
 				if (offset)
-					*offset = found + len - feature_list;
+					*offset += found + len - feature_list;
 				return value;
 			}
 			/* feature with a value (e.g., "agent=git/1.2.3") */
@@ -628,7 +628,7 @@ const char *parse_feature_value(const char *feature_list, const char *feature, i
 				if (lenp)
 					*lenp = end;
 				if (offset)
-					*offset = value + end - feature_list;
+					*offset += value + end - feature_list;
 				return value;
 			}
 			/*
diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index 20d063fb9a..c8422d66e7 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -360,4 +360,39 @@ test_expect_success 'ls-remote prefixes work with all protocol versions' '
 	test_cmp expect actual.v2
 '
 
+test_expect_success 'v0 clients can handle multiple symrefs' '
+	# Git will not generate multiple symref entries for v0 these days, but it
+	# is technically allowed, and we did so until d007dbf7d6 (Revert
+	# "upload-pack: send non-HEAD symbolic refs", 2013-11-18). Test the
+	# client handling here by faking that older behavior.
+	#
+	# Note that our oid is hard-coded to always be sha1, and not using
+	# test_oid. Since our fake capabilities line does not have an
+	# object-format entry, the client will always use sha1 mode.
+	oid=1234567890123456789012345678901234567890 &&
+	symrefs="symref=refs/remotes/origin/HEAD:refs/remotes/origin/main" &&
+	symrefs="$symrefs symref=HEAD:refs/heads/main" &&
+
+	test-tool pkt-line pack >input.q <<-EOF &&
+	$oid HEADQ$symrefs
+	$oid refs/heads/main
+	$oid refs/remotes/origin/HEAD
+	$oid refs/remotes/origin/main
+	0000
+	EOF
+	q_to_nul <input.q >input &&
+
+	cat >expect <<-EOF &&
+	ref: refs/heads/main	HEAD
+	$oid	HEAD
+	$oid	refs/heads/main
+	ref: refs/remotes/origin/main	refs/remotes/origin/HEAD
+	$oid	refs/remotes/origin/HEAD
+	$oid	refs/remotes/origin/main
+	EOF
+
+	git ls-remote --symref --upload-pack="cat input; cat;:" . >actual &&
+	test_cmp expect actual
+'
+
 test_done


I was going to ask if Jonas wanted to wrap this up as a patch, but I
think that's pretty much it right there. :) I'll write up a commit
message for it later tonight.

I also wondered if we tested this multiple-symref case for protocol v2
(where it works fine), but it looks like we may not. There are earlier
tests which _would_ trigger it, but we force them into v0 mode, due to
b2f73b70b2 (t5512: compensate for v0 only sending HEAD symrefs,
2019-02-25). I think we really should be letting ls-remote use the
protocol it prefers (v2 by default, and v0 if the suite is run in that
mode), and the expected output should be adjusted based on the mode.
I'll see if I can do that as well, to make this a two-patch series.

-Peff

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: Infinite loop + memory leak in annotate_refs_with_symref_info
  2023-04-11 21:58       ` Jeff King
@ 2023-04-11 22:52         ` Junio C Hamano
  2023-04-12  6:23           ` [PATCH 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2023-04-11 22:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

Jeff King <peff@peff.net> writes:

> Yep. And Jonas's suggested fix is the right thing. Assigning offset
> directly _would_ be the right thing, since we are taking the distance
> back to the beginning of the feature_list string. Except that earlier in
> the function we incremented feature_list by the incoming value of
> the offset!

Sigh.  Thanks for finding the problem with a fix.  The data flow in
this function is horrible, but yes, "found + len - feature_list" is
smaller than the code expects to be because feature_list is moved
forward before entering the loop, and I can see how the patch fixes
the problem.

> diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
> index 20d063fb9a..c8422d66e7 100755
> --- a/t/t5512-ls-remote.sh
> +++ b/t/t5512-ls-remote.sh
> @@ -360,4 +360,39 @@ test_expect_success 'ls-remote prefixes work with all protocol versions' '
>  	test_cmp expect actual.v2
>  '
>  
> +test_expect_success 'v0 clients can handle multiple symrefs' '
> +	# Git will not generate multiple symref entries for v0 these days, but it
> +	# is technically allowed, and we did so until d007dbf7d6 (Revert
> +	# "upload-pack: send non-HEAD symbolic refs", 2013-11-18). Test the
> +	# client handling here by faking that older behavior.

Yeah, I remember that fix where somebody had tons of symbolic refs
and busted the protocol limit.  Is "multiple symref" used here
because it is the easiest to reproduce the issue, or have we saw
such a potentially broken server in the wild?

> +	# Note that our oid is hard-coded to always be sha1, and not using
> +	# test_oid. Since our fake capabilities line does not have an
> +	# object-format entry, the client will always use sha1 mode.

It probably is OK to run the test in that "undeclared - assume
SHA-1" mode, even though I think we give an explicit "object-format"
capability even when talking from the SHA-1 repository these days.

> +	oid=1234567890123456789012345678901234567890 &&
> +	symrefs="symref=refs/remotes/origin/HEAD:refs/remotes/origin/main" &&
> +	symrefs="$symrefs symref=HEAD:refs/heads/main" &&

> I also wondered if we tested this multiple-symref case for protocol v2
> (where it works fine), but it looks like we may not. There are earlier
> tests which _would_ trigger it, but we force them into v0 mode, due to
> b2f73b70b2 (t5512: compensate for v0 only sending HEAD symrefs,
> 2019-02-25). I think we really should be letting ls-remote use the
> protocol it prefers (v2 by default, and v0 if the suite is run in that
> mode), and the expected output should be adjusted based on the mode.
> I'll see if I can do that as well, to make this a two-patch series.

Thanks.  I really appreciate your being almost always thorough and
wish more contributors took inspirations.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Infinite loop + memory leak in annotate_refs_with_symref_info
  2023-04-11 20:25 ` Taylor Blau
@ 2023-04-11 23:59   ` Taylor Blau
  2023-04-12  0:53   ` brian m. carlson
  1 sibling, 0 replies; 30+ messages in thread
From: Taylor Blau @ 2023-04-11 23:59 UTC (permalink / raw)
  To: Jonas Haag; +Cc: brian m. carlson, git

On Tue, Apr 11, 2023 at 04:25:13PM -0400, Taylor Blau wrote:
> > I believe the bug in Git is in connect.c, function
> > parse_feature_value, in the updating of `*offset`: It doesn’t seem to
> > take into account that `feature_list` has already been offset by
> > `*offset`. I believe the update needs to use `*offset +=` instead of
> > `*offset =`. When I make this change, the infinite loop seems to go
> > away, and cloning via Klaus/Dulwich will fail with “invalid index-pack
> > output”. Cloning from github.com works, although I’m not sure if
> > that’s a relevant smoke test in this case.
>
> I'm not sure I understand. Looking at the relevant bits in
> connect.c::parse_feature_value(), it all seems correct to me, since the
> beginning of `feature_list` is adjusted by the current value of
> `*offset`.

Oops. This was exactly[1] the problem as you suggested, I was just
thinking about it backwards. When we write into `*offset`, we need to
take into account that `feature_list` has already been moved forward by
`*offset`.

Obviously the discussion can continue below [1], but just wanted to
correct my wrong here and acknowledge that you were absolutely right in
your original report.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/20230411215845.GA678138@coredump.intra.peff.net/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Infinite loop + memory leak in annotate_refs_with_symref_info
  2023-04-11 20:25 ` Taylor Blau
  2023-04-11 23:59   ` Taylor Blau
@ 2023-04-12  0:53   ` brian m. carlson
  1 sibling, 0 replies; 30+ messages in thread
From: brian m. carlson @ 2023-04-12  0:53 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jonas Haag, git

[-- Attachment #1: Type: text/plain, Size: 1267 bytes --]

On 2023-04-11 at 20:25:13, Taylor Blau wrote:
> Hi Jonas,
> 
> On Tue, Apr 11, 2023 at 10:53:59PM +0300, Jonas Haag wrote:
> > Hello!
> >
> > There is an infinite loop with an accompanying memory leak in
> > annotate_refs_with_symref_info that was introduced in Git 2.28 (I
> > think in commit 2c6a403: “connect: add function to parse multiple v1
> > capability values”).
> 
> I'm not familiar with Klaus and don't have it installed, but a couple of
> questions: were you able to reproduce this result with any other forges
> or tools, and were you able to confirm that 2c6a403 is the culprit via a
> bisection?
> 
> In case the answer to the latter question is "yes", I cc'd brian carlson
> on this thread, since they are the original author of that patch.

I may be the author of the patch, but I honestly don't remember much
about that code except that it was a bear to write, and I am honestly
not terribly surprised that it came out less than perfect.  I do
apologize for the buggy code, though.

It looks like Peff has come up with a patch downthread, and a test even,
which I think will probably fix the problem, so I'll refrain from
sending one myself for now.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/7] v0 multiple-symref infinite loop fix and test cleanup
  2023-04-11 22:52         ` Junio C Hamano
@ 2023-04-12  6:23           ` Jeff King
  2023-04-12  6:29             ` [PATCH 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities Jeff King
                               ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Jeff King @ 2023-04-12  6:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

On Tue, Apr 11, 2023 at 03:52:43PM -0700, Junio C Hamano wrote:

> > +test_expect_success 'v0 clients can handle multiple symrefs' '
> > +	# Git will not generate multiple symref entries for v0 these days, but it
> > +	# is technically allowed, and we did so until d007dbf7d6 (Revert
> > +	# "upload-pack: send non-HEAD symbolic refs", 2013-11-18). Test the
> > +	# client handling here by faking that older behavior.
> 
> Yeah, I remember that fix where somebody had tons of symbolic refs
> and busted the protocol limit.  Is "multiple symref" used here
> because it is the easiest to reproduce the issue, or have we saw
> such a potentially broken server in the wild?

Both. :) It was what we saw in the wild, but it's also one of only two
capabilities that can cause the problem, because it requires passing an
offset parameter, which we only do for capabilities we expect to see
multiple times. The other is "object-format", but we only ever print it
once. More details in the commit message to follow.

> > +	# Note that our oid is hard-coded to always be sha1, and not using
> > +	# test_oid. Since our fake capabilities line does not have an
> > +	# object-format entry, the client will always use sha1 mode.
> 
> It probably is OK to run the test in that "undeclared - assume
> SHA-1" mode, even though I think we give an explicit "object-format"
> capability even when talking from the SHA-1 repository these days.

We do, but I think it is OK to ignore that. The test will continue to
work even in a world where sha256 becomes the default. If we eventually
remove all vestiges of backwards-compatible sha1 support, it will have
to be updated, but I'm OK with that (keep in mind that the test is also
v0-only, as the v2 parser is totally different).

> > I also wondered if we tested this multiple-symref case for protocol v2
> > (where it works fine), but it looks like we may not. There are earlier
> > tests which _would_ trigger it, but we force them into v0 mode, due to
> > b2f73b70b2 (t5512: compensate for v0 only sending HEAD symrefs,
> > 2019-02-25). I think we really should be letting ls-remote use the
> > protocol it prefers (v2 by default, and v0 if the suite is run in that
> > mode), and the expected output should be adjusted based on the mode.
> > I'll see if I can do that as well, to make this a two-patch series.
> 
> Thanks.  I really appreciate your being almost always thorough and
> wish more contributors took inspirations.

Well, no good deed goes unpunished. :) These tests have not aged well,
so there were a number of fixes to make. Here's the series I came up
with.

The first one is the bug fix; I put it at the front because it's
obviously the most urgent. Patches 2-6 are test cleanups, and as such
should not be very risky, but I didn't want to hold up the fix. But they
do depend on the helper script introduced by patch 1, so they can't be
applied separately.

Patch 7 is a cleanup in the code which should have no behavior change.
It could be applied separately (or even dropped if you don't like it).

  [1/7]: v0 protocol: fix infinite loop when parsing multi-valued capabilities
  [2/7]: t5512: stop referring to "v1" protocol
  [3/7]: t5512: stop using jgit for capabilities^{} test
  [4/7]: t5512: add v2 support for "ls-remote --symref" test
  [5/7]: t5512: allow any protocol version for filtered symref test
  [6/7]: t5512: test "ls-remote --heads --symref" filtering with v0 and v2
  [7/7]: v0 protocol: use size_t for capability length/offset

 builtin/receive-pack.c |   2 +-
 connect.c              |  26 ++++----
 connect.h              |   4 +-
 fetch-pack.c           |   4 +-
 send-pack.c            |   2 +-
 t/t5512-ls-remote.sh   | 148 ++++++++++++++++++++++-------------------
 transport.c            |   2 +-
 upload-pack.c          |   2 +-
 8 files changed, 101 insertions(+), 89 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities
  2023-04-12  6:23           ` [PATCH 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
@ 2023-04-12  6:29             ` Jeff King
  2023-04-12  6:46               ` Jeff King
  2023-04-12  6:29             ` [PATCH 2/7] t5512: stop referring to "v1" protocol Jeff King
                               ` (5 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2023-04-12  6:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

If Git's client-side parsing of an upload-pack response (so git-fetch or
ls-remote) sees multiple instances of a single capability, it can enter
an infinite loop due to a bug in advancing the "offset" parameter in the
parser.

This bug can't happen between a client and server of the same Git
version. The client bug is in parse_feature_value() when the caller
passes in an offset parameter. And that only happens when the v0
protocol is parsing "symref" and "object-format" capabilities, via
next_server_feature_value(). But Git has never produced multiple
object-format capabilities, and it stopped producing multiple symref
values in d007dbf7d6 (Revert "upload-pack: send non-HEAD symbolic refs",
2013-11-18).

However, upload-pack did produce multiple symref entries for a while,
and they are valid. Plus other implementations, such as Dulwich will
still do so. So we should handle them. And even if we do not expect it,
it is obviously a bug for the parser to enter an infinite loop.

The bug itself is pretty simple. Commit 2c6a403d96 (connect: add
function to parse multiple v1 capability values, 2020-05-25) added the
"offset" parameter, which is used as both an in- and out-parameter. When
parsing the first "symref" capability, *offset will be 0 on input, and
after parsing the capability, we set *offset to an index just past the
value by taking a pointer difference "(value + end) - feature_list".

But on the second call, now *offset is set to that larger index, which
lets us skip past the first "symref" capability. However, we do so by
incrementing feature_list. That means our pointer difference is now too
small; it is counting from where we resumed parsing, not from the start
of the original feature_list pointer. And because we incremented
feature_list only inside our function, and not the caller, that
increment is lost next time the function is called.

The simplest solution is to account for those skipped bytes by
incrementing *offset, rather than assigning to it. (The other possible
solution is to add an extra level of pointer indirection to feature_list
so that the caller knows we moved it forward, but that seems more
complicated).

Since the bug can't be reproduced using the current version of
git-upload-pack, we'll instead hard-code an input which triggers the
problem. Before this patch it loops forever re-parsing the second symref
entry. Now we check both that it finishes, and that it parses both
entries correctly (a case we could not test at all before).

We don't need to worry about testing v2 here; it communicates the
capabilities in a completely different way, and doesn't use this code at
all. There are tests earlier in t5512 that are meant to cover this (they
don't, but we'll address that in a future patch).

Reported-by: Jonas Haag <jonas@lophus.org>
Signed-off-by: Jeff King <peff@peff.net>
---
 connect.c            |  4 ++--
 t/t5512-ls-remote.sh | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/connect.c b/connect.c
index c0c8a38178..5e265ba9d7 100644
--- a/connect.c
+++ b/connect.c
@@ -616,7 +616,7 @@ const char *parse_feature_value(const char *feature_list, const char *feature, i
 				if (lenp)
 					*lenp = 0;
 				if (offset)
-					*offset = found + len - feature_list;
+					*offset += found + len - feature_list;
 				return value;
 			}
 			/* feature with a value (e.g., "agent=git/1.2.3") */
@@ -628,7 +628,7 @@ const char *parse_feature_value(const char *feature_list, const char *feature, i
 				if (lenp)
 					*lenp = end;
 				if (offset)
-					*offset = value + end - feature_list;
+					*offset += value + end - feature_list;
 				return value;
 			}
 			/*
diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index 20d063fb9a..64dc6fa91c 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -15,6 +15,19 @@ generate_references () {
 	done
 }
 
+test_expect_success 'set up fake upload-pack' '
+	# This can be used to simulate an upload-pack that just shows the
+	# contents of the "input" file (prepared with the test-tool pkt-line
+	# helper), and does not do any negotiation (since ls-remote does not
+	# need it).
+	write_script cat-input <<-\EOF
+	# send our initial advertisement/response
+	cat input
+	# soak up the flush packet from the client
+	cat
+	EOF
+'
+
 test_expect_success 'dies when no remote found' '
 	test_must_fail git ls-remote
 '
@@ -360,4 +373,35 @@ test_expect_success 'ls-remote prefixes work with all protocol versions' '
 	test_cmp expect actual.v2
 '
 
+test_expect_success 'v0 clients can handle multiple symrefs' '
+	# Modern versions of Git will not return multiple symref capabilities
+	# for v0, so we have to hard-code the response. Note that we will
+	# always use both v0 and object-format=sha1 here, as the hard-coded
+	# response reflects a server that only supports those.
+	oid=1234567890123456789012345678901234567890 &&
+	symrefs="symref=refs/remotes/origin/HEAD:refs/remotes/origin/main" &&
+	symrefs="$symrefs symref=HEAD:refs/heads/main" &&
+
+	test-tool pkt-line pack >input.q <<-EOF &&
+	$oid HEADQ$symrefs
+	$oid refs/heads/main
+	$oid refs/remotes/origin/HEAD
+	$oid refs/remotes/origin/main
+	0000
+	EOF
+	q_to_nul <input.q >input &&
+
+	cat >expect <<-EOF &&
+	ref: refs/heads/main	HEAD
+	$oid	HEAD
+	$oid	refs/heads/main
+	ref: refs/remotes/origin/main	refs/remotes/origin/HEAD
+	$oid	refs/remotes/origin/HEAD
+	$oid	refs/remotes/origin/main
+	EOF
+
+	git ls-remote --symref --upload-pack=./cat-input . >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.40.0.493.gfc602f1919


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/7] t5512: stop referring to "v1" protocol
  2023-04-12  6:23           ` [PATCH 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
  2023-04-12  6:29             ` [PATCH 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities Jeff King
@ 2023-04-12  6:29             ` Jeff King
  2023-04-12  6:31             ` [PATCH 3/7] t5512: stop using jgit for capabilities^{} test Jeff King
                               ` (4 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2023-04-12  6:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

There really isn't a "v1" Git protocol. It's just v0 with an extra probe
which we used to test compatibility in preparation for v2. Any tests
that are looking for before/after behavior for v2 really care about
"v0". Mentioning "v1" in these tests is just making things more
confusing, because we don't care about that probe; we're really testing
v0. So let's say so.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5512-ls-remote.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index 64dc6fa91c..ce7a9f1594 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -358,17 +358,17 @@ test_expect_success 'ls-remote --sort fails gracefully outside repository' '
 test_expect_success 'ls-remote patterns work with all protocol versions' '
 	git for-each-ref --format="%(objectname)	%(refname)" \
 		refs/heads/main refs/remotes/origin/main >expect &&
-	git -c protocol.version=1 ls-remote . main >actual.v1 &&
-	test_cmp expect actual.v1 &&
+	git -c protocol.version=0 ls-remote . main >actual.v0 &&
+	test_cmp expect actual.v0 &&
 	git -c protocol.version=2 ls-remote . main >actual.v2 &&
 	test_cmp expect actual.v2
 '
 
 test_expect_success 'ls-remote prefixes work with all protocol versions' '
 	git for-each-ref --format="%(objectname)	%(refname)" \
 		refs/heads/ refs/tags/ >expect &&
-	git -c protocol.version=1 ls-remote --heads --tags . >actual.v1 &&
-	test_cmp expect actual.v1 &&
+	git -c protocol.version=0 ls-remote --heads --tags . >actual.v0 &&
+	test_cmp expect actual.v0 &&
 	git -c protocol.version=2 ls-remote --heads --tags . >actual.v2 &&
 	test_cmp expect actual.v2
 '
-- 
2.40.0.493.gfc602f1919


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/7] t5512: stop using jgit for capabilities^{} test
  2023-04-12  6:23           ` [PATCH 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
  2023-04-12  6:29             ` [PATCH 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities Jeff King
  2023-04-12  6:29             ` [PATCH 2/7] t5512: stop referring to "v1" protocol Jeff King
@ 2023-04-12  6:31             ` Jeff King
  2023-04-12  9:04               ` Jeff King
  2023-04-12  6:34             ` [PATCH 4/7] t5512: add v2 support for "ls-remote --symref" test Jeff King
                               ` (3 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2023-04-12  6:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

Commit eb398797cd (connect: advertized capability is not a ref,
2016-09-09) added support for an upload-pack server responding with:

  0000000000000000000000000000000000000000        capabilities^{}

followed by a NUL and capabilities. This is not something Git itself has
ever produced for upload-pack, but JGit does. And hence the test used
JGit to reproduce the real-world situation. That was good for verifying
that the incompatibility was fixed, but it's a lousy regression test for
a few reasons:

  - hardly anybody runs it, because you have to have jgit installed

  - we're depending on jgit's behavior for the test to do anything
    useful. In particular, this behavior is only relevant to the v0
    protocol, but these days we ask for the v2 protocol by default. So
    for modern jgit, this is probably testing nothing.

  - it's complicated and slow. We had to do some fifo trickery to handle
    races, and this one test makes up 40% of the runtime of the total
    script.

Instead, let's just hard-code the response that's of interest to us.
That will test exactly what we want for every run.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5512-ls-remote.sh | 51 +++++++++++++-------------------------------
 1 file changed, 15 insertions(+), 36 deletions(-)

diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index ce7a9f1594..4e21ab60bf 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -300,44 +300,23 @@ test_expect_success 'ls-remote --symref omits filtered-out matches' '
 	test_cmp expect actual
 '
 
-test_lazy_prereq GIT_DAEMON '
-	test_bool_env GIT_TEST_GIT_DAEMON true
-'
-
-# This test spawns a daemon, so run it only if the user would be OK with
-# testing with git-daemon.
-test_expect_success PIPE,JGIT,GIT_DAEMON 'indicate no refs in standards-compliant empty remote' '
-	test_set_port JGIT_DAEMON_PORT &&
-	JGIT_DAEMON_PID= &&
-	git init --bare empty.git &&
-	>empty.git/git-daemon-export-ok &&
-	mkfifo jgit_daemon_output &&
-	{
-		jgit daemon --port="$JGIT_DAEMON_PORT" . >jgit_daemon_output &
-		JGIT_DAEMON_PID=$!
-	} &&
-	test_when_finished kill "$JGIT_DAEMON_PID" &&
-	{
-		read line &&
-		case $line in
-		Exporting*)
-			;;
-		*)
-			echo "Expected: Exporting" &&
-			false;;
-		esac &&
-		read line &&
-		case $line in
-		"Listening on"*)
-			;;
-		*)
-			echo "Expected: Listening on" &&
-			false;;
-		esac
-	} <jgit_daemon_output &&
+test_expect_success 'indicate no refs in v0 standards-compliant empty remote' '
+	# Git does not produce an output like this, but it does match the
+	# standard and is produced by other implementations like JGit. So
+	# hard-code the case we care about.
+	#
+	# The actual capabilities do not matter; there are none that would
+	# change how ls-remote behaves.
+	oid=0000000000000000000000000000000000000000 &&
+	test-tool pkt-line pack >input.q <<-EOF &&
+	$oid capabilities^{}Qcaps-go-here
+	0000
+	EOF
+	q_to_nul <input.q >input &&
+
 	# --exit-code asks the command to exit with 2 when no
 	# matching refs are found.
-	test_expect_code 2 git ls-remote --exit-code git://localhost:$JGIT_DAEMON_PORT/empty.git
+	test_expect_code 2 git ls-remote --exit-code --upload-pack=./cat-input .
 '
 
 test_expect_success 'ls-remote works outside repository' '
-- 
2.40.0.493.gfc602f1919


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/7] t5512: add v2 support for "ls-remote --symref" test
  2023-04-12  6:23           ` [PATCH 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
                               ` (2 preceding siblings ...)
  2023-04-12  6:31             ` [PATCH 3/7] t5512: stop using jgit for capabilities^{} test Jeff King
@ 2023-04-12  6:34             ` Jeff King
  2023-04-12  6:35             ` [PATCH 5/7] t5512: allow any protocol version for filtered symref test Jeff King
                               ` (2 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2023-04-12  6:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

Commit b2f73b70b2 (t5512: compensate for v0 only sending HEAD symrefs,
2019-02-25) configured this test to always run with protocol v0, since
the output is different for v2.

But that means we are not getting any test coverage of the feature with
v2 at all. We could obviously switch to using and expecting v2, but then
that leaves v0 behind (and while we don't use it by default, it's still
important for testing interoperability with older servers). Likewise, we
could switch the expected output based on $GIT_TEST_PROTOCOL_VERSION,
but hardly anybody runs the tests for v0 these days.

Instead, let's explicitly run it for both protocol versions to make sure
they're well behaved. This matches other similar tests added later in
6a139cdd74 (ls-remote: pass heads/tags prefixes to transport,
2018-10-31), etc.

Signed-off-by: Jeff King <peff@peff.net>
---
I wondered briefly if anybody ever runs with GIT_TEST_PROTOCOL_VERSION=0,
but I'm pretty sure the answer is "no", because a bunch of test scripts
fail. Those are almost certainly all non-bugs, but rather just tests
that assume modern v2 behavior. I'm not sure if it's worth putting any
effort into fixing. I didn't do so here (though before and after my
patches t5512 itself runs OK in either mode).

 t/t5512-ls-remote.sh | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index 4e21ab60bf..74f0204c5b 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -244,22 +244,25 @@ test_expect_success 'protocol v2 supports hiderefs' '
 
 test_expect_success 'ls-remote --symref' '
 	git fetch origin &&
-	echo "ref: refs/heads/main	HEAD" >expect &&
+	echo "ref: refs/heads/main	HEAD" >expect.v2 &&
 	generate_references \
 		HEAD \
-		refs/heads/main >>expect &&
+		refs/heads/main >>expect.v2 &&
+	echo "ref: refs/remotes/origin/main	refs/remotes/origin/HEAD" >>expect.v2 &&
 	oid=$(git rev-parse HEAD) &&
-	echo "$oid	refs/remotes/origin/HEAD" >>expect &&
+	echo "$oid	refs/remotes/origin/HEAD" >>expect.v2 &&
 	generate_references \
 		refs/remotes/origin/main \
 		refs/tags/mark \
 		refs/tags/mark1.1 \
 		refs/tags/mark1.10 \
-		refs/tags/mark1.2 >>expect &&
-	# Protocol v2 supports sending symrefs for refs other than HEAD, so use
-	# protocol v0 here.
-	GIT_TEST_PROTOCOL_VERSION=0 git ls-remote --symref >actual &&
-	test_cmp expect actual
+		refs/tags/mark1.2 >>expect.v2 &&
+	# v0 does not show non-HEAD symrefs
+	grep -v "ref: refs/remotes" <expect.v2 >expect.v0 &&
+	git -c protocol.version=0 ls-remote --symref >actual.v0 &&
+	test_cmp expect.v0 actual.v0 &&
+	git -c protocol.version=2 ls-remote --symref >actual.v2 &&
+	test_cmp expect.v2 actual.v2
 '
 
 test_expect_success 'ls-remote with filtered symref (refname)' '
-- 
2.40.0.493.gfc602f1919


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/7] t5512: allow any protocol version for filtered symref test
  2023-04-12  6:23           ` [PATCH 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
                               ` (3 preceding siblings ...)
  2023-04-12  6:34             ` [PATCH 4/7] t5512: add v2 support for "ls-remote --symref" test Jeff King
@ 2023-04-12  6:35             ` Jeff King
  2023-04-12  6:37             ` [PATCH 6/7] t5512: test "ls-remote --heads --symref" filtering with v0 and v2 Jeff King
  2023-04-12  6:40             ` [PATCH 7/7] v0 protocol: use size_t for capability length/offset Jeff King
  6 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2023-04-12  6:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

We have a test that checks that ls-remote, when asked only about HEAD,
will report the HEAD symref, and not others. This was marked to always
run with the v0 protocol by b2f73b70b2 (t5512: compensate for v0 only
sending HEAD symrefs, 2019-02-25).

But in v0 this test is doing nothing! For v0, upload-pack only reports
the HEAD symref anyway, so we'd never have any other symref to report.

For v2, it is useful; we learn about all symrefs (and the test repo has
multiple), so this demonstrates that we correctly avoid showing them.

We could perhaps mark this to test explicitly with v2, but since that is
the default these days, it's sufficient to just run ls-remote without
any protocol specification. It still passes if somebody does an explicit
GIT_TEST_PROTOCOL_VERSION=0; it's just testing less.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5512-ls-remote.sh | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index 74f0204c5b..1ac3b50ca5 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -271,9 +271,7 @@ test_expect_success 'ls-remote with filtered symref (refname)' '
 	ref: refs/heads/main	HEAD
 	$rev	HEAD
 	EOF
-	# Protocol v2 supports sending symrefs for refs other than HEAD, so use
-	# protocol v0 here.
-	GIT_TEST_PROTOCOL_VERSION=0 git ls-remote --symref . HEAD >actual &&
+	git ls-remote --symref . HEAD >actual &&
 	test_cmp expect actual
 '
 
-- 
2.40.0.493.gfc602f1919


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 6/7] t5512: test "ls-remote --heads --symref" filtering with v0 and v2
  2023-04-12  6:23           ` [PATCH 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
                               ` (4 preceding siblings ...)
  2023-04-12  6:35             ` [PATCH 5/7] t5512: allow any protocol version for filtered symref test Jeff King
@ 2023-04-12  6:37             ` Jeff King
  2023-04-12  6:40             ` [PATCH 7/7] v0 protocol: use size_t for capability length/offset Jeff King
  6 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2023-04-12  6:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

We have two overlapping tests for checking the behavior of "ls-remote
--symref" when filtering output. The first test checks that using
"--heads" will omit the symref for HEAD (since we don't print anything
about HEAD at all), but still prints other symrefs.

This has been marked as expecting failure since it was added in
99c08d4eb2 (ls-remote: add support for showing symrefs, 2016-01-19).
That's because back then, we only had the v0 protocol, and it only
reported on the HEAD symref, not others. But these days we have v2,
which does exactly what the test wants. It would even have started
unexpectedly passing when we switched to v2 by default, except that
b2f73b70b2 (t5512: compensate for v0 only sending HEAD symrefs,
2019-02-25) over-zealously marked it to run only in v0 mode.

So let's run it with both protocol versions, and adjust the expected
output for each. It passes in v2 without modification. In v0 mode, we'll
drop the extra symref, but this is still testing something useful: it
ensures that we do omit HEAD.

The test after this checks "--heads" again, this time using the expected
v0 output. That's now redundant. It also checks that limiting with a
pattern like "refs/heads/*" works similarly, but that's redundant with a
test earlier in the script which limits by HEAD (again, back then the
"HEAD" test was less interesting because there were no other symrefs to
omit, but in a modern v2 world, there are). So we can just delete that
second test entirely.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5512-ls-remote.sh | 26 +++++++-------------------
 1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index 1ac3b50ca5..92cb9fed78 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -275,30 +275,18 @@ test_expect_success 'ls-remote with filtered symref (refname)' '
 	test_cmp expect actual
 '
 
-test_expect_failure 'ls-remote with filtered symref (--heads)' '
+test_expect_success 'ls-remote with filtered symref (--heads)' '
 	git symbolic-ref refs/heads/foo refs/tags/mark &&
-	cat >expect <<-EOF &&
+	cat >expect.v2 <<-EOF &&
 	ref: refs/tags/mark	refs/heads/foo
 	$rev	refs/heads/foo
 	$rev	refs/heads/main
 	EOF
-	# Protocol v2 supports sending symrefs for refs other than HEAD, so use
-	# protocol v0 here.
-	GIT_TEST_PROTOCOL_VERSION=0 git ls-remote --symref --heads . >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'ls-remote --symref omits filtered-out matches' '
-	cat >expect <<-EOF &&
-	$rev	refs/heads/foo
-	$rev	refs/heads/main
-	EOF
-	# Protocol v2 supports sending symrefs for refs other than HEAD, so use
-	# protocol v0 here.
-	GIT_TEST_PROTOCOL_VERSION=0 git ls-remote --symref --heads . >actual &&
-	test_cmp expect actual &&
-	GIT_TEST_PROTOCOL_VERSION=0 git ls-remote --symref . "refs/heads/*" >actual &&
-	test_cmp expect actual
+	grep -v "^ref: refs/tags/" <expect.v2 >expect.v0 &&
+	git -c protocol.version=0 ls-remote --symref --heads . >actual.v0 &&
+	test_cmp expect.v0 actual.v0 &&
+	git -c protocol.version=2 ls-remote --symref --heads . >actual.v2 &&
+	test_cmp expect.v2 actual.v2
 '
 
 test_expect_success 'indicate no refs in v0 standards-compliant empty remote' '
-- 
2.40.0.493.gfc602f1919


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 7/7] v0 protocol: use size_t for capability length/offset
  2023-04-12  6:23           ` [PATCH 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
                               ` (5 preceding siblings ...)
  2023-04-12  6:37             ` [PATCH 6/7] t5512: test "ls-remote --heads --symref" filtering with v0 and v2 Jeff King
@ 2023-04-12  6:40             ` Jeff King
  6 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2023-04-12  6:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

When parsing server capabilities, we use "int" to store lengths and
offsets. At first glance this seems like a spot where our parser may be
confused by integer overflow if somebody sent us a malicious response.

In practice these strings are all bounded by the 64k limit of a
pkt-line, so using "int" is OK. However, it makes the code simpler to
audit if they just use size_t everywhere. Note that because we take
these parameters as pointers, this also forces many callers to update
their declared types.

Signed-off-by: Jeff King <peff@peff.net>
---
I guess you could argue that "int" is fine here, given the context.
Mostly I just did a double-take while looking at the code, so I thought
this might save the next person from doing the same.

 builtin/receive-pack.c |  2 +-
 connect.c              | 22 +++++++++++-----------
 connect.h              |  4 ++--
 fetch-pack.c           |  4 ++--
 send-pack.c            |  2 +-
 transport.c            |  2 +-
 upload-pack.c          |  2 +-
 7 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 9109552533..3b495ecc84 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2093,7 +2093,7 @@ static struct command *read_head_info(struct packet_reader *reader,
 			const char *feature_list = reader->line + linelen + 1;
 			const char *hash = NULL;
 			const char *client_sid;
-			int len = 0;
+			size_t len = 0;
 			if (parse_feature_request(feature_list, "report-status"))
 				report_status = 1;
 			if (parse_feature_request(feature_list, "report-status-v2"))
diff --git a/connect.c b/connect.c
index 5e265ba9d7..5685551cf0 100644
--- a/connect.c
+++ b/connect.c
@@ -22,7 +22,7 @@
 
 static char *server_capabilities_v1;
 static struct strvec server_capabilities_v2 = STRVEC_INIT;
-static const char *next_server_feature_value(const char *feature, int *len, int *offset);
+static const char *next_server_feature_value(const char *feature, size_t *len, size_t *offset);
 
 static int check_ref(const char *name, unsigned int flags)
 {
@@ -205,10 +205,10 @@ static void parse_one_symref_info(struct string_list *symref, const char *val, i
 static void annotate_refs_with_symref_info(struct ref *ref)
 {
 	struct string_list symref = STRING_LIST_INIT_DUP;
-	int offset = 0;
+	size_t offset = 0;
 
 	while (1) {
-		int len;
+		size_t len;
 		const char *val;
 
 		val = next_server_feature_value("symref", &len, &offset);
@@ -231,7 +231,7 @@ static void annotate_refs_with_symref_info(struct ref *ref)
 static void process_capabilities(struct packet_reader *reader, int *linelen)
 {
 	const char *feat_val;
-	int feat_len;
+	size_t feat_len;
 	const char *line = reader->line;
 	int nul_location = strlen(line);
 	if (nul_location == *linelen)
@@ -595,9 +595,9 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 	return list;
 }
 
-const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
+const char *parse_feature_value(const char *feature_list, const char *feature, size_t *lenp, size_t *offset)
 {
-	int len;
+	size_t len;
 
 	if (!feature_list)
 		return NULL;
@@ -621,7 +621,7 @@ const char *parse_feature_value(const char *feature_list, const char *feature, i
 			}
 			/* feature with a value (e.g., "agent=git/1.2.3") */
 			else if (*value == '=') {
-				int end;
+				size_t end;
 
 				value++;
 				end = strcspn(value, " \t\n");
@@ -643,8 +643,8 @@ const char *parse_feature_value(const char *feature_list, const char *feature, i
 
 int server_supports_hash(const char *desired, int *feature_supported)
 {
-	int offset = 0;
-	int len;
+	size_t offset = 0;
+	size_t len;
 	const char *hash;
 
 	hash = next_server_feature_value("object-format", &len, &offset);
@@ -668,12 +668,12 @@ int parse_feature_request(const char *feature_list, const char *feature)
 	return !!parse_feature_value(feature_list, feature, NULL, NULL);
 }
 
-static const char *next_server_feature_value(const char *feature, int *len, int *offset)
+static const char *next_server_feature_value(const char *feature, size_t *len, size_t *offset)
 {
 	return parse_feature_value(server_capabilities_v1, feature, len, offset);
 }
 
-const char *server_feature_value(const char *feature, int *len)
+const char *server_feature_value(const char *feature, size_t *len)
 {
 	return parse_feature_value(server_capabilities_v1, feature, len, NULL);
 }
diff --git a/connect.h b/connect.h
index f41a0b4c1f..1645126c17 100644
--- a/connect.h
+++ b/connect.h
@@ -12,14 +12,14 @@ int finish_connect(struct child_process *conn);
 int git_connection_is_socket(struct child_process *conn);
 int server_supports(const char *feature);
 int parse_feature_request(const char *features, const char *feature);
-const char *server_feature_value(const char *feature, int *len_ret);
+const char *server_feature_value(const char *feature, size_t *len_ret);
 int url_is_local_not_ssh(const char *url);
 
 struct packet_reader;
 enum protocol_version discover_version(struct packet_reader *reader);
 
 int server_supports_hash(const char *desired, int *feature_supported);
-const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset);
+const char *parse_feature_value(const char *feature_list, const char *feature, size_t *lenp, size_t *offset);
 int server_supports_v2(const char *c);
 void ensure_server_supports_v2(const char *c);
 int server_feature_v2(const char *c, const char **v);
diff --git a/fetch-pack.c b/fetch-pack.c
index 368f2ed25a..97a44ed582 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1099,7 +1099,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	struct ref *ref = copy_ref_list(orig_ref);
 	struct object_id oid;
 	const char *agent_feature;
-	int agent_len;
+	size_t agent_len;
 	struct fetch_negotiator negotiator_alloc;
 	struct fetch_negotiator *negotiator;
 
@@ -1117,7 +1117,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		agent_supported = 1;
 		if (agent_len)
 			print_verbose(args, _("Server version is %.*s"),
-				      agent_len, agent_feature);
+				      (int)agent_len, agent_feature);
 	}
 
 	if (!server_supports("session-id"))
diff --git a/send-pack.c b/send-pack.c
index f81bbb28d7..97344b629e 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -538,7 +538,7 @@ int send_pack(struct send_pack_args *args,
 		die(_("the receiving end does not support this repository's hash algorithm"));
 
 	if (args->push_cert != SEND_PACK_PUSH_CERT_NEVER) {
-		int len;
+		size_t len;
 		push_cert_nonce = server_feature_value("push-cert", &len);
 		if (push_cert_nonce) {
 			reject_invalid_nonce(push_cert_nonce, len);
diff --git a/transport.c b/transport.c
index 89a220425e..6223dc3de2 100644
--- a/transport.c
+++ b/transport.c
@@ -317,7 +317,7 @@ static struct ref *handshake(struct transport *transport, int for_push,
 	struct git_transport_data *data = transport->data;
 	struct ref *refs = NULL;
 	struct packet_reader reader;
-	int sid_len;
+	size_t sid_len;
 	const char *server_sid;
 
 	connect_setup(transport, for_push);
diff --git a/upload-pack.c b/upload-pack.c
index e23f16dfdd..565e46058f 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1067,7 +1067,7 @@ static void receive_needs(struct upload_pack_data *data,
 		const char *features;
 		struct object_id oid_buf;
 		const char *arg;
-		int feature_len;
+		size_t feature_len;
 
 		reset_timeout(data->timeout);
 		if (packet_reader_read(reader) != PACKET_READ_NORMAL)
-- 
2.40.0.493.gfc602f1919

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities
  2023-04-12  6:29             ` [PATCH 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities Jeff King
@ 2023-04-12  6:46               ` Jeff King
  2023-04-12  7:25                 ` [PATCH v2 " Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2023-04-12  6:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

On Wed, Apr 12, 2023 at 02:29:24AM -0400, Jeff King wrote:

> But on the second call, now *offset is set to that larger index, which
> lets us skip past the first "symref" capability. However, we do so by
> incrementing feature_list. That means our pointer difference is now too
> small; it is counting from where we resumed parsing, not from the start
> of the original feature_list pointer. And because we incremented
> feature_list only inside our function, and not the caller, that
> increment is lost next time the function is called.
> 
> The simplest solution is to account for those skipped bytes by
> incrementing *offset, rather than assigning to it. (The other possible
> solution is to add an extra level of pointer indirection to feature_list
> so that the caller knows we moved it forward, but that seems more
> complicated).

Hmph. So after convincing myself that was the end of it, now I'm not so
sure. We also increment feature_list if we find a false positive in the
middle of another entry. I.e., the code even after this patch looks
like:

          while (*feature_list) {
                  const char *found = strstr(feature_list, feature);
                  if (!found)
                          return NULL;
                  if (feature_list == found || isspace(found[-1])) {
                          const char *value = found + len;
                          /* feature with no value (e.g., "thin-pack") */
                          if (!*value || isspace(*value)) {
                                  if (lenp)
                                          *lenp = 0;
                                  if (offset)
                                          *offset += found + len - feature_list;
                                  return value;
                          }
                          /* feature with a value (e.g., "agent=git/1.2.3") */
                          else if (*value == '=') {
                                  size_t end;
  
                                  value++;
                                  end = strcspn(value, " \t\n");
                                  if (lenp)
                                          *lenp = end;
                                  if (offset)
                                          *offset += value + end - feature_list;
                                  return value;
                          }
                          /*
                           * otherwise we matched a substring of another feature;
                           * keep looking
                           */
                  }
                  feature_list = found + 1;
          }
          return NULL;

So if we have something like:

   agent=i-like-symrefs symref=HEAD:refs/heads/foo

then we'd find the "symref" value in the agent line, increment
feature_list, and then find the real one. But our pointer difference
will again be too short! And incrementing "offset" rather than assigning
it won't help, because those skipped bytes are not accounted for in the
existing value of "offset".

So what we probably want is a third possibility I didn't allow for: keep
the original value of feature_list intact, and use a separate pointer to
increment. And then assigning "*offset = value + end - feature_list"
will always be correct, because the offset will always be from the
original, true beginning of the string.

The fix is easy, but let me see if I can come up with a test.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities
  2023-04-12  6:46               ` Jeff King
@ 2023-04-12  7:25                 ` Jeff King
  2023-04-12  7:26                   ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2023-04-12  7:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

On Wed, Apr 12, 2023 at 02:46:51AM -0400, Jeff King wrote:

> So if we have something like:
> 
>    agent=i-like-symrefs symref=HEAD:refs/heads/foo
> 
> then we'd find the "symref" value in the agent line, increment
> feature_list, and then find the real one. But our pointer difference
> will again be too short! And incrementing "offset" rather than assigning
> it won't help, because those skipped bytes are not accounted for in the
> existing value of "offset".
> 
> So what we probably want is a third possibility I didn't allow for: keep
> the original value of feature_list intact, and use a separate pointer to
> increment. And then assigning "*offset = value + end - feature_list"
> will always be correct, because the offset will always be from the
> original, true beginning of the string.
> 
> The fix is easy, but let me see if I can come up with a test.

OK, this is indeed buggy, but I'm not sure you can actually get it to
produce a user-visible result with the incrementing patch applied.

The problem is that "offset" is incremented by too-small an amount. So
in the example above, we might end up partway through the value of
"symref=", rather than at the end. In the worst case we might see the
same "symref=" entry again, which would cause us to attach it to the
list a second time, leaking the memory. But either way we always make
forward progress through the string, so we'd never loop infinitely, nor
produce a wrong answer (we can never _skip_ an entry, only accidentally
process it twice).

So I think it's worth fixing, and even worth beefing up the test to
include this case, but sadly there's no specific case to check for. So I
just rolled it into the fix in patch 1.

Here's that patch (and I'll send a range-diff in a second). The rest are
the same, modulo a small textual conflict in patch 7 that touches a
nearby line, but I'll refrain from re-sending the rest of them until
I've gotten more review.

-- >8 --
Subject: v0 protocol: fix infinite loop when parsing multi-valued capabilities

If Git's client-side parsing of an upload-pack response (so git-fetch or
ls-remote) sees multiple instances of a single capability, it can enter
an infinite loop due to a bug in advancing the "offset" parameter in the
parser.

This bug can't happen between a client and server of the same Git
version. The client bug is in parse_feature_value() when the caller
passes in an offset parameter. And that only happens when the v0
protocol is parsing "symref" and "object-format" capabilities, via
next_server_feature_value(). But Git has never produced multiple
object-format capabilities, and it stopped producing multiple symref
values in d007dbf7d6 (Revert "upload-pack: send non-HEAD symbolic refs",
2013-11-18).

However, upload-pack did produce multiple symref entries for a while,
and they are valid. Plus other implementations, such as Dulwich will
still do so. So we should handle them. And even if we do not expect it,
it is obviously a bug for the parser to enter an infinite loop.

The bug itself is pretty simple. Commit 2c6a403d96 (connect: add
function to parse multiple v1 capability values, 2020-05-25) added the
"offset" parameter, which is used as both an in- and out-parameter. When
parsing the first "symref" capability, *offset will be 0 on input, and
after parsing the capability, we set *offset to an index just past the
value by taking a pointer difference "(value + end) - feature_list".

But on the second call, now *offset is set to that larger index, which
lets us skip past the first "symref" capability. However, we do so by
incrementing feature_list. That means our pointer difference is now too
small; it is counting from where we resumed parsing, not from the start
of the original feature_list pointer. And because we incremented
feature_list only inside our function, and not the caller, that
increment is lost next time the function is called.

One solution would be to account for those skipped bytes by incrementing
*offset, rather than assigning to it. But wait, there's more!

We also increment feature_list if we have a near-miss. Say we are
looking for "symref" and find "almost-symref". In that case we'll point
feature_list to the "y" in "almost-symref" and restart our search. But
that again means our offset won't be correct, as it won't account for
the bytes between the start of the string and that "y".

So instead, let's just record the beginning of the feature_list string
in a separate pointer that we never touch. That offset we take in and
return is meant to be using that point as a base, and now we'll do so
consistently.

Since the bug can't be reproduced using the current version of
git-upload-pack, we'll instead hard-code an input which triggers the
problem. Before this patch it loops forever re-parsing the second symref
entry. Now we check both that it finishes, and that it parses both
entries correctly (a case we could not test at all before).

We don't need to worry about testing v2 here; it communicates the
capabilities in a completely different way, and doesn't use this code at
all. There are tests earlier in t5512 that are meant to cover this (they
don't, but we'll address that in a future patch).

Reported-by: Jonas Haag <jonas@lophus.org>
Signed-off-by: Jeff King <peff@peff.net>
---
 connect.c            |  5 +++--
 t/t5512-ls-remote.sh | 52 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/connect.c b/connect.c
index c0c8a38178..0dc739c4e5 100644
--- a/connect.c
+++ b/connect.c
@@ -597,6 +597,7 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 
 const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
 {
+	const char *orig_start = feature_list;
 	int len;
 
 	if (!feature_list)
@@ -616,7 +617,7 @@ const char *parse_feature_value(const char *feature_list, const char *feature, i
 				if (lenp)
 					*lenp = 0;
 				if (offset)
-					*offset = found + len - feature_list;
+					*offset = found + len - orig_start;
 				return value;
 			}
 			/* feature with a value (e.g., "agent=git/1.2.3") */
@@ -628,7 +629,7 @@ const char *parse_feature_value(const char *feature_list, const char *feature, i
 				if (lenp)
 					*lenp = end;
 				if (offset)
-					*offset = value + end - feature_list;
+					*offset = value + end - orig_start;
 				return value;
 			}
 			/*
diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index 20d063fb9a..cab67282df 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -15,6 +15,19 @@ generate_references () {
 	done
 }
 
+test_expect_success 'set up fake upload-pack' '
+	# This can be used to simulate an upload-pack that just shows the
+	# contents of the "input" file (prepared with the test-tool pkt-line
+	# helper), and does not do any negotiation (since ls-remote does not
+	# need it).
+	write_script cat-input <<-\EOF
+	# send our initial advertisement/response
+	cat input
+	# soak up the flush packet from the client
+	cat
+	EOF
+'
+
 test_expect_success 'dies when no remote found' '
 	test_must_fail git ls-remote
 '
@@ -360,4 +373,43 @@ test_expect_success 'ls-remote prefixes work with all protocol versions' '
 	test_cmp expect actual.v2
 '
 
+test_expect_success 'v0 clients can handle multiple symrefs' '
+	# Modern versions of Git will not return multiple symref capabilities
+	# for v0, so we have to hard-code the response. Note that we will
+	# always use both v0 and object-format=sha1 here, as the hard-coded
+	# response reflects a server that only supports those.
+	oid=1234567890123456789012345678901234567890 &&
+	symrefs="symref=refs/remotes/origin/HEAD:refs/remotes/origin/main" &&
+	symrefs="$symrefs symref=HEAD:refs/heads/main" &&
+
+	# Likewise we want to make sure our parser is not fooled by the string
+	# "symref" appearing as part of an earlier cap. But there is no way to
+	# do that via upload-pack, as arbitrary strings can appear only in a
+	# "symref" value itself (where we skip past the values as a whole)
+	# and "agent" (which always appears after "symref", so putting our
+	# parser in a confused state is less interesting).
+	caps="some other caps including a-fake-symref-cap" &&
+
+	test-tool pkt-line pack >input.q <<-EOF &&
+	$oid HEADQ$caps $symrefs
+	$oid refs/heads/main
+	$oid refs/remotes/origin/HEAD
+	$oid refs/remotes/origin/main
+	0000
+	EOF
+	q_to_nul <input.q >input &&
+
+	cat >expect <<-EOF &&
+	ref: refs/heads/main	HEAD
+	$oid	HEAD
+	$oid	refs/heads/main
+	ref: refs/remotes/origin/main	refs/remotes/origin/HEAD
+	$oid	refs/remotes/origin/HEAD
+	$oid	refs/remotes/origin/main
+	EOF
+
+	git ls-remote --symref --upload-pack=./cat-input . >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.40.0.493.gfc602f1919


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities
  2023-04-12  7:25                 ` [PATCH v2 " Jeff King
@ 2023-04-12  7:26                   ` Jeff King
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2023-04-12  7:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

On Wed, Apr 12, 2023 at 03:25:10AM -0400, Jeff King wrote:

> So I think it's worth fixing, and even worth beefing up the test to
> include this case, but sadly there's no specific case to check for. So I
> just rolled it into the fix in patch 1.
> 
> Here's that patch (and I'll send a range-diff in a second). The rest are
> the same, modulo a small textual conflict in patch 7 that touches a
> nearby line, but I'll refrain from re-sending the rest of them until
> I've gotten more review.

And here's the range-diff.

1:  c379efdcc0 ! 1:  3958a89100 v0 protocol: fix infinite loop when parsing multi-valued capabilities
    @@ Commit message
         feature_list only inside our function, and not the caller, that
         increment is lost next time the function is called.
     
    -    The simplest solution is to account for those skipped bytes by
    -    incrementing *offset, rather than assigning to it. (The other possible
    -    solution is to add an extra level of pointer indirection to feature_list
    -    so that the caller knows we moved it forward, but that seems more
    -    complicated).
    +    One solution would be to account for those skipped bytes by incrementing
    +    *offset, rather than assigning to it. But wait, there's more!
    +
    +    We also increment feature_list if we have a near-miss. Say we are
    +    looking for "symref" and find "almost-symref". In that case we'll point
    +    feature_list to the "y" in "almost-symref" and restart our search. But
    +    that again means our offset won't be correct, as it won't account for
    +    the bytes between the start of the string and that "y".
    +
    +    So instead, let's just record the beginning of the feature_list string
    +    in a separate pointer that we never touch. That offset we take in and
    +    return is meant to be using that point as a base, and now we'll do so
    +    consistently.
     
         Since the bug can't be reproduced using the current version of
         git-upload-pack, we'll instead hard-code an input which triggers the
    @@ Commit message
         Signed-off-by: Jeff King <peff@peff.net>
     
      ## connect.c ##
    +@@ connect.c: struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
    + 
    + const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
    + {
    ++	const char *orig_start = feature_list;
    + 	int len;
    + 
    + 	if (!feature_list)
     @@ connect.c: const char *parse_feature_value(const char *feature_list, const char *feature, i
      				if (lenp)
      					*lenp = 0;
      				if (offset)
     -					*offset = found + len - feature_list;
    -+					*offset += found + len - feature_list;
    ++					*offset = found + len - orig_start;
      				return value;
      			}
      			/* feature with a value (e.g., "agent=git/1.2.3") */
    @@ connect.c: const char *parse_feature_value(const char *feature_list, const char
      					*lenp = end;
      				if (offset)
     -					*offset = value + end - feature_list;
    -+					*offset += value + end - feature_list;
    ++					*offset = value + end - orig_start;
      				return value;
      			}
      			/*
    @@ t/t5512-ls-remote.sh: test_expect_success 'ls-remote prefixes work with all prot
     +	symrefs="symref=refs/remotes/origin/HEAD:refs/remotes/origin/main" &&
     +	symrefs="$symrefs symref=HEAD:refs/heads/main" &&
     +
    ++	# Likewise we want to make sure our parser is not fooled by the string
    ++	# "symref" appearing as part of an earlier cap. But there is no way to
    ++	# do that via upload-pack, as arbitrary strings can appear only in a
    ++	# "symref" value itself (where we skip past the values as a whole)
    ++	# and "agent" (which always appears after "symref", so putting our
    ++	# parser in a confused state is less interesting).
    ++	caps="some other caps including a-fake-symref-cap" &&
    ++
     +	test-tool pkt-line pack >input.q <<-EOF &&
    -+	$oid HEADQ$symrefs
    ++	$oid HEADQ$caps $symrefs
     +	$oid refs/heads/main
     +	$oid refs/remotes/origin/HEAD
     +	$oid refs/remotes/origin/main
2:  2a8e6943e5 = 2:  4cb0739770 t5512: stop referring to "v1" protocol
3:  7e411ed56b = 3:  4fe3be0476 t5512: stop using jgit for capabilities^{} test
4:  65387a02bf = 4:  77875ff7f6 t5512: add v2 support for "ls-remote --symref" test
5:  3409bd94b0 = 5:  642608ac2d t5512: allow any protocol version for filtered symref test
6:  c42a25ceca = 6:  20c78c9e41 t5512: test "ls-remote --heads --symref" filtering with v0 and v2
7:  86067753cb ! 7:  48a46fc6fc v0 protocol: use size_t for capability length/offset
    @@ connect.c: struct ref **get_remote_refs(int fd_out, struct packet_reader *reader
     -const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
     +const char *parse_feature_value(const char *feature_list, const char *feature, size_t *lenp, size_t *offset)
      {
    + 	const char *orig_start = feature_list;
     -	int len;
     +	size_t len;
      

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/7] t5512: stop using jgit for capabilities^{} test
  2023-04-12  6:31             ` [PATCH 3/7] t5512: stop using jgit for capabilities^{} test Jeff King
@ 2023-04-12  9:04               ` Jeff King
  2023-04-14 21:24                 ` [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2023-04-12  9:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

On Wed, Apr 12, 2023 at 02:31:18AM -0400, Jeff King wrote:

> Commit eb398797cd (connect: advertized capability is not a ref,
> 2016-09-09) added support for an upload-pack server responding with:
> 
>   0000000000000000000000000000000000000000        capabilities^{}
> 
> followed by a NUL and capabilities. This is not something Git itself has
> ever produced for upload-pack, but JGit does. And hence the test used
> JGit to reproduce the real-world situation. That was good for verifying
> that the incompatibility was fixed, but it's a lousy regression test for
> a few reasons:
> 
>   - hardly anybody runs it, because you have to have jgit installed
> 
>   - we're depending on jgit's behavior for the test to do anything
>     useful. In particular, this behavior is only relevant to the v0
>     protocol, but these days we ask for the v2 protocol by default. So
>     for modern jgit, this is probably testing nothing.

I was worried that changing this one might be churn. But as it turns
out, it reveals that there's a bug in the code which we've been missing
all this time because nobody was running the test!

When run with GIT_TEST_DEFAULT_HASH=sha256, this will fail because the
code uses the local idea of the hash algorithm, rather than what the
other side advertises. We have a linux-sha256 CI job, but of course it
didn't have jgit installed, so it always skipped this test (until my
patch, where it now reveals the bug).

The fix is just:

diff --git a/connect.c b/connect.c
index 66397cc911..c54adc652f 100644
--- a/connect.c
+++ b/connect.c
@@ -263,7 +263,8 @@ static int process_dummy_ref(const struct packet_reader *reader)
 		return 0;
 	name++;
 
-	return oideq(null_oid(), &oid) && !strcmp(name, "capabilities^{}");
+	return oideq(reader->hash_algo->null_oid, &oid) &&
+		!strcmp(name, "capabilities^{}");
 }
 
 static void check_no_capabilities(const char *line, int len)

I'll squash that in and update the commit message before I do the next
re-roll (but will still hold off a bit to get further comments).

-Peff

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup
  2023-04-12  9:04               ` Jeff King
@ 2023-04-14 21:24                 ` Jeff King
  2023-04-14 21:24                   ` [PATCH v3 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities Jeff King
                                     ` (7 more replies)
  0 siblings, 8 replies; 30+ messages in thread
From: Jeff King @ 2023-04-14 21:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

On Wed, Apr 12, 2023 at 05:04:23AM -0400, Jeff King wrote:

> I'll squash that in and update the commit message before I do the next
> re-roll (but will still hold off a bit to get further comments).

Nobody said anything, so I assume the rest of the series is perfect. ;)

Junio, I see that you picked up this fix as a "squash", along with my
other "v2" update. Here's a v3 that does the actual squash along with a
commit message update. That ties up all loose ends from my perspective,
but of course if anybody has review comments, please send them.

The range-diff against what you have in jk/protocol-cap-parse-fix
(after squashing) is below.

  [1/7]: v0 protocol: fix infinite loop when parsing multi-valued capabilities
  [2/7]: t5512: stop referring to "v1" protocol
  [3/7]: v0 protocol: fix sha1/sha256 confusion for capabilities^{}
  [4/7]: t5512: add v2 support for "ls-remote --symref" test
  [5/7]: t5512: allow any protocol version for filtered symref test
  [6/7]: t5512: test "ls-remote --heads --symref" filtering with v0 and v2
  [7/7]: v0 protocol: use size_t for capability length/offset

 builtin/receive-pack.c |   2 +-
 connect.c              |  30 ++++----
 connect.h              |   4 +-
 fetch-pack.c           |   4 +-
 send-pack.c            |   2 +-
 t/t5512-ls-remote.sh   | 156 +++++++++++++++++++++++------------------
 transport.c            |   2 +-
 upload-pack.c          |   2 +-
 8 files changed, 112 insertions(+), 90 deletions(-)

1:  5471cf388b = 1:  117f371be1 v0 protocol: fix infinite loop when parsing multi-valued capabilities
2:  eb3e6e6d1c = 2:  de695291b0 t5512: stop referring to "v1" protocol
3:  c77b8ae4a0 ! 3:  515149d67a t5512: stop using jgit for capabilities^{} test
    @@ Metadata
     Author: Jeff King <peff@peff.net>
     
      ## Commit message ##
    -    t5512: stop using jgit for capabilities^{} test
    +    v0 protocol: fix sha1/sha256 confusion for capabilities^{}
     
         Commit eb398797cd (connect: advertized capability is not a ref,
         2016-09-09) added support for an upload-pack server responding with:
     
           0000000000000000000000000000000000000000        capabilities^{}
     
    -    followed by a NUL and capabilities. This is not something Git itself has
    -    ever produced for upload-pack, but JGit does. And hence the test used
    -    JGit to reproduce the real-world situation. That was good for verifying
    -    that the incompatibility was fixed, but it's a lousy regression test for
    -    a few reasons:
    +    followed by a NUL and the actual capabilities. We correctly parse the
    +    oid using the packet_reader's hash_algo field, but then we compare it to
    +    null_oid(), which will instead use our current repo's default algorithm.
    +    If we're defaulting to sha256 locally but the other side is sha1, they
    +    won't match and we'll fail to parse the line (and thus die()).
     
    -      - hardly anybody runs it, because you have to have jgit installed
    +    This can cause a test failure when the suite is run with
    +    GIT_TEST_DEFAULT_HASH=sha256, and we even do so regularly via the
    +    linux-sha256 CI job. But since the test requires JGit to run, it's
    +    usually just skipped, and nobody noticed the problem.
    +
    +    The reason the original patch used JGit is that Git itself does not ever
    +    produce such a line via upload-pack; the feature was added to fix a
    +    real-world problem when interacting with JGit. That was good for
    +    verifying that the incompatibility was fixed, but it's not a good
    +    regression test:
    +
    +      - hardly anybody runs it, because you have to have jgit installed;
    +        hence this bug going unnoticed
     
           - we're depending on jgit's behavior for the test to do anything
             useful. In particular, this behavior is only relevant to the v0
    @@ Commit message
             script.
     
         Instead, let's just hard-code the response that's of interest to us.
    -    That will test exactly what we want for every run.
    +    That will test exactly what we want for every run, and reveals the bug
    +    when run in sha256 mode. And of course we'll fix the actual bug by using
    +    the correct hash_algo struct.
     
         Signed-off-by: Jeff King <peff@peff.net>
     
4:  8db5b3c3bf = 4:  152d904a4a t5512: add v2 support for "ls-remote --symref" test
5:  f1cd63e16e = 5:  87053ab90b t5512: allow any protocol version for filtered symref test
6:  b6b9d1ad44 = 6:  37d300d244 t5512: test "ls-remote --heads --symref" filtering with v0 and v2
7:  870d6e0a3b = 7:  4db6853ea2 v0 protocol: use size_t for capability length/offset

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities
  2023-04-14 21:24                 ` [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
@ 2023-04-14 21:24                   ` Jeff King
  2023-04-14 21:24                   ` [PATCH v3 2/7] t5512: stop referring to "v1" protocol Jeff King
                                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2023-04-14 21:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

If Git's client-side parsing of an upload-pack response (so git-fetch or
ls-remote) sees multiple instances of a single capability, it can enter
an infinite loop due to a bug in advancing the "offset" parameter in the
parser.

This bug can't happen between a client and server of the same Git
version. The client bug is in parse_feature_value() when the caller
passes in an offset parameter. And that only happens when the v0
protocol is parsing "symref" and "object-format" capabilities, via
next_server_feature_value(). But Git has never produced multiple
object-format capabilities, and it stopped producing multiple symref
values in d007dbf7d6 (Revert "upload-pack: send non-HEAD symbolic refs",
2013-11-18).

However, upload-pack did produce multiple symref entries for a while,
and they are valid. Plus other implementations, such as Dulwich will
still do so. So we should handle them. And even if we do not expect it,
it is obviously a bug for the parser to enter an infinite loop.

The bug itself is pretty simple. Commit 2c6a403d96 (connect: add
function to parse multiple v1 capability values, 2020-05-25) added the
"offset" parameter, which is used as both an in- and out-parameter. When
parsing the first "symref" capability, *offset will be 0 on input, and
after parsing the capability, we set *offset to an index just past the
value by taking a pointer difference "(value + end) - feature_list".

But on the second call, now *offset is set to that larger index, which
lets us skip past the first "symref" capability. However, we do so by
incrementing feature_list. That means our pointer difference is now too
small; it is counting from where we resumed parsing, not from the start
of the original feature_list pointer. And because we incremented
feature_list only inside our function, and not the caller, that
increment is lost next time the function is called.

One solution would be to account for those skipped bytes by incrementing
*offset, rather than assigning to it. But wait, there's more!

We also increment feature_list if we have a near-miss. Say we are
looking for "symref" and find "almost-symref". In that case we'll point
feature_list to the "y" in "almost-symref" and restart our search. But
that again means our offset won't be correct, as it won't account for
the bytes between the start of the string and that "y".

So instead, let's just record the beginning of the feature_list string
in a separate pointer that we never touch. That offset we take in and
return is meant to be using that point as a base, and now we'll do so
consistently.

Since the bug can't be reproduced using the current version of
git-upload-pack, we'll instead hard-code an input which triggers the
problem. Before this patch it loops forever re-parsing the second symref
entry. Now we check both that it finishes, and that it parses both
entries correctly (a case we could not test at all before).

We don't need to worry about testing v2 here; it communicates the
capabilities in a completely different way, and doesn't use this code at
all. There are tests earlier in t5512 that are meant to cover this (they
don't, but we'll address that in a future patch).

Reported-by: Jonas Haag <jonas@lophus.org>
Signed-off-by: Jeff King <peff@peff.net>
---
 connect.c            |  5 +++--
 t/t5512-ls-remote.sh | 52 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/connect.c b/connect.c
index c0c8a38178..0dc739c4e5 100644
--- a/connect.c
+++ b/connect.c
@@ -597,6 +597,7 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 
 const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
 {
+	const char *orig_start = feature_list;
 	int len;
 
 	if (!feature_list)
@@ -616,7 +617,7 @@ const char *parse_feature_value(const char *feature_list, const char *feature, i
 				if (lenp)
 					*lenp = 0;
 				if (offset)
-					*offset = found + len - feature_list;
+					*offset = found + len - orig_start;
 				return value;
 			}
 			/* feature with a value (e.g., "agent=git/1.2.3") */
@@ -628,7 +629,7 @@ const char *parse_feature_value(const char *feature_list, const char *feature, i
 				if (lenp)
 					*lenp = end;
 				if (offset)
-					*offset = value + end - feature_list;
+					*offset = value + end - orig_start;
 				return value;
 			}
 			/*
diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index 20d063fb9a..cab67282df 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -15,6 +15,19 @@ generate_references () {
 	done
 }
 
+test_expect_success 'set up fake upload-pack' '
+	# This can be used to simulate an upload-pack that just shows the
+	# contents of the "input" file (prepared with the test-tool pkt-line
+	# helper), and does not do any negotiation (since ls-remote does not
+	# need it).
+	write_script cat-input <<-\EOF
+	# send our initial advertisement/response
+	cat input
+	# soak up the flush packet from the client
+	cat
+	EOF
+'
+
 test_expect_success 'dies when no remote found' '
 	test_must_fail git ls-remote
 '
@@ -360,4 +373,43 @@ test_expect_success 'ls-remote prefixes work with all protocol versions' '
 	test_cmp expect actual.v2
 '
 
+test_expect_success 'v0 clients can handle multiple symrefs' '
+	# Modern versions of Git will not return multiple symref capabilities
+	# for v0, so we have to hard-code the response. Note that we will
+	# always use both v0 and object-format=sha1 here, as the hard-coded
+	# response reflects a server that only supports those.
+	oid=1234567890123456789012345678901234567890 &&
+	symrefs="symref=refs/remotes/origin/HEAD:refs/remotes/origin/main" &&
+	symrefs="$symrefs symref=HEAD:refs/heads/main" &&
+
+	# Likewise we want to make sure our parser is not fooled by the string
+	# "symref" appearing as part of an earlier cap. But there is no way to
+	# do that via upload-pack, as arbitrary strings can appear only in a
+	# "symref" value itself (where we skip past the values as a whole)
+	# and "agent" (which always appears after "symref", so putting our
+	# parser in a confused state is less interesting).
+	caps="some other caps including a-fake-symref-cap" &&
+
+	test-tool pkt-line pack >input.q <<-EOF &&
+	$oid HEADQ$caps $symrefs
+	$oid refs/heads/main
+	$oid refs/remotes/origin/HEAD
+	$oid refs/remotes/origin/main
+	0000
+	EOF
+	q_to_nul <input.q >input &&
+
+	cat >expect <<-EOF &&
+	ref: refs/heads/main	HEAD
+	$oid	HEAD
+	$oid	refs/heads/main
+	ref: refs/remotes/origin/main	refs/remotes/origin/HEAD
+	$oid	refs/remotes/origin/HEAD
+	$oid	refs/remotes/origin/main
+	EOF
+
+	git ls-remote --symref --upload-pack=./cat-input . >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.40.0.515.gdfb9e78b42


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 2/7] t5512: stop referring to "v1" protocol
  2023-04-14 21:24                 ` [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
  2023-04-14 21:24                   ` [PATCH v3 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities Jeff King
@ 2023-04-14 21:24                   ` Jeff King
  2023-04-14 21:25                   ` [PATCH v3 3/7] v0 protocol: fix sha1/sha256 confusion for capabilities^{} Jeff King
                                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2023-04-14 21:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

There really isn't a "v1" Git protocol. It's just v0 with an extra probe
which we used to test compatibility in preparation for v2. Any tests
that are looking for before/after behavior for v2 really care about
"v0". Mentioning "v1" in these tests is just making things more
confusing, because we don't care about that probe; we're really testing
v0. So let's say so.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5512-ls-remote.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index cab67282df..b868e20d7e 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -358,17 +358,17 @@ test_expect_success 'ls-remote --sort fails gracefully outside repository' '
 test_expect_success 'ls-remote patterns work with all protocol versions' '
 	git for-each-ref --format="%(objectname)	%(refname)" \
 		refs/heads/main refs/remotes/origin/main >expect &&
-	git -c protocol.version=1 ls-remote . main >actual.v1 &&
-	test_cmp expect actual.v1 &&
+	git -c protocol.version=0 ls-remote . main >actual.v0 &&
+	test_cmp expect actual.v0 &&
 	git -c protocol.version=2 ls-remote . main >actual.v2 &&
 	test_cmp expect actual.v2
 '
 
 test_expect_success 'ls-remote prefixes work with all protocol versions' '
 	git for-each-ref --format="%(objectname)	%(refname)" \
 		refs/heads/ refs/tags/ >expect &&
-	git -c protocol.version=1 ls-remote --heads --tags . >actual.v1 &&
-	test_cmp expect actual.v1 &&
+	git -c protocol.version=0 ls-remote --heads --tags . >actual.v0 &&
+	test_cmp expect actual.v0 &&
 	git -c protocol.version=2 ls-remote --heads --tags . >actual.v2 &&
 	test_cmp expect actual.v2
 '
-- 
2.40.0.515.gdfb9e78b42


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 3/7] v0 protocol: fix sha1/sha256 confusion for capabilities^{}
  2023-04-14 21:24                 ` [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
  2023-04-14 21:24                   ` [PATCH v3 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities Jeff King
  2023-04-14 21:24                   ` [PATCH v3 2/7] t5512: stop referring to "v1" protocol Jeff King
@ 2023-04-14 21:25                   ` Jeff King
  2023-04-14 21:25                   ` [PATCH v3 4/7] t5512: add v2 support for "ls-remote --symref" test Jeff King
                                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2023-04-14 21:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

Commit eb398797cd (connect: advertized capability is not a ref,
2016-09-09) added support for an upload-pack server responding with:

  0000000000000000000000000000000000000000        capabilities^{}

followed by a NUL and the actual capabilities. We correctly parse the
oid using the packet_reader's hash_algo field, but then we compare it to
null_oid(), which will instead use our current repo's default algorithm.
If we're defaulting to sha256 locally but the other side is sha1, they
won't match and we'll fail to parse the line (and thus die()).

This can cause a test failure when the suite is run with
GIT_TEST_DEFAULT_HASH=sha256, and we even do so regularly via the
linux-sha256 CI job. But since the test requires JGit to run, it's
usually just skipped, and nobody noticed the problem.

The reason the original patch used JGit is that Git itself does not ever
produce such a line via upload-pack; the feature was added to fix a
real-world problem when interacting with JGit. That was good for
verifying that the incompatibility was fixed, but it's not a good
regression test:

  - hardly anybody runs it, because you have to have jgit installed;
    hence this bug going unnoticed

  - we're depending on jgit's behavior for the test to do anything
    useful. In particular, this behavior is only relevant to the v0
    protocol, but these days we ask for the v2 protocol by default. So
    for modern jgit, this is probably testing nothing.

  - it's complicated and slow. We had to do some fifo trickery to handle
    races, and this one test makes up 40% of the runtime of the total
    script.

Instead, let's just hard-code the response that's of interest to us.
That will test exactly what we want for every run, and reveals the bug
when run in sha256 mode. And of course we'll fix the actual bug by using
the correct hash_algo struct.

Signed-off-by: Jeff King <peff@peff.net>
---
 connect.c            |  3 ++-
 t/t5512-ls-remote.sh | 51 +++++++++++++-------------------------------
 2 files changed, 17 insertions(+), 37 deletions(-)

diff --git a/connect.c b/connect.c
index 0dc739c4e5..b7ba5f5727 100644
--- a/connect.c
+++ b/connect.c
@@ -263,7 +263,8 @@ static int process_dummy_ref(const struct packet_reader *reader)
 		return 0;
 	name++;
 
-	return oideq(null_oid(), &oid) && !strcmp(name, "capabilities^{}");
+	return oideq(reader->hash_algo->null_oid, &oid) &&
+		!strcmp(name, "capabilities^{}");
 }
 
 static void check_no_capabilities(const char *line, int len)
diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index b868e20d7e..6fb67ddce4 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -300,44 +300,23 @@ test_expect_success 'ls-remote --symref omits filtered-out matches' '
 	test_cmp expect actual
 '
 
-test_lazy_prereq GIT_DAEMON '
-	test_bool_env GIT_TEST_GIT_DAEMON true
-'
-
-# This test spawns a daemon, so run it only if the user would be OK with
-# testing with git-daemon.
-test_expect_success PIPE,JGIT,GIT_DAEMON 'indicate no refs in standards-compliant empty remote' '
-	test_set_port JGIT_DAEMON_PORT &&
-	JGIT_DAEMON_PID= &&
-	git init --bare empty.git &&
-	>empty.git/git-daemon-export-ok &&
-	mkfifo jgit_daemon_output &&
-	{
-		jgit daemon --port="$JGIT_DAEMON_PORT" . >jgit_daemon_output &
-		JGIT_DAEMON_PID=$!
-	} &&
-	test_when_finished kill "$JGIT_DAEMON_PID" &&
-	{
-		read line &&
-		case $line in
-		Exporting*)
-			;;
-		*)
-			echo "Expected: Exporting" &&
-			false;;
-		esac &&
-		read line &&
-		case $line in
-		"Listening on"*)
-			;;
-		*)
-			echo "Expected: Listening on" &&
-			false;;
-		esac
-	} <jgit_daemon_output &&
+test_expect_success 'indicate no refs in v0 standards-compliant empty remote' '
+	# Git does not produce an output like this, but it does match the
+	# standard and is produced by other implementations like JGit. So
+	# hard-code the case we care about.
+	#
+	# The actual capabilities do not matter; there are none that would
+	# change how ls-remote behaves.
+	oid=0000000000000000000000000000000000000000 &&
+	test-tool pkt-line pack >input.q <<-EOF &&
+	$oid capabilities^{}Qcaps-go-here
+	0000
+	EOF
+	q_to_nul <input.q >input &&
+
 	# --exit-code asks the command to exit with 2 when no
 	# matching refs are found.
-	test_expect_code 2 git ls-remote --exit-code git://localhost:$JGIT_DAEMON_PORT/empty.git
+	test_expect_code 2 git ls-remote --exit-code --upload-pack=./cat-input .
 '
 
 test_expect_success 'ls-remote works outside repository' '
-- 
2.40.0.515.gdfb9e78b42


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 4/7] t5512: add v2 support for "ls-remote --symref" test
  2023-04-14 21:24                 ` [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
                                     ` (2 preceding siblings ...)
  2023-04-14 21:25                   ` [PATCH v3 3/7] v0 protocol: fix sha1/sha256 confusion for capabilities^{} Jeff King
@ 2023-04-14 21:25                   ` Jeff King
  2023-04-14 21:25                   ` [PATCH v3 5/7] t5512: allow any protocol version for filtered symref test Jeff King
                                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2023-04-14 21:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

Commit b2f73b70b2 (t5512: compensate for v0 only sending HEAD symrefs,
2019-02-25) configured this test to always run with protocol v0, since
the output is different for v2.

But that means we are not getting any test coverage of the feature with
v2 at all. We could obviously switch to using and expecting v2, but then
that leaves v0 behind (and while we don't use it by default, it's still
important for testing interoperability with older servers). Likewise, we
could switch the expected output based on $GIT_TEST_PROTOCOL_VERSION,
but hardly anybody runs the tests for v0 these days.

Instead, let's explicitly run it for both protocol versions to make sure
they're well behaved. This matches other similar tests added later in
6a139cdd74 (ls-remote: pass heads/tags prefixes to transport,
2018-10-31), etc.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5512-ls-remote.sh | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index 6fb67ddce4..a34cab12ae 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -244,22 +244,25 @@ test_expect_success 'protocol v2 supports hiderefs' '
 
 test_expect_success 'ls-remote --symref' '
 	git fetch origin &&
-	echo "ref: refs/heads/main	HEAD" >expect &&
+	echo "ref: refs/heads/main	HEAD" >expect.v2 &&
 	generate_references \
 		HEAD \
-		refs/heads/main >>expect &&
+		refs/heads/main >>expect.v2 &&
+	echo "ref: refs/remotes/origin/main	refs/remotes/origin/HEAD" >>expect.v2 &&
 	oid=$(git rev-parse HEAD) &&
-	echo "$oid	refs/remotes/origin/HEAD" >>expect &&
+	echo "$oid	refs/remotes/origin/HEAD" >>expect.v2 &&
 	generate_references \
 		refs/remotes/origin/main \
 		refs/tags/mark \
 		refs/tags/mark1.1 \
 		refs/tags/mark1.10 \
-		refs/tags/mark1.2 >>expect &&
-	# Protocol v2 supports sending symrefs for refs other than HEAD, so use
-	# protocol v0 here.
-	GIT_TEST_PROTOCOL_VERSION=0 git ls-remote --symref >actual &&
-	test_cmp expect actual
+		refs/tags/mark1.2 >>expect.v2 &&
+	# v0 does not show non-HEAD symrefs
+	grep -v "ref: refs/remotes" <expect.v2 >expect.v0 &&
+	git -c protocol.version=0 ls-remote --symref >actual.v0 &&
+	test_cmp expect.v0 actual.v0 &&
+	git -c protocol.version=2 ls-remote --symref >actual.v2 &&
+	test_cmp expect.v2 actual.v2
 '
 
 test_expect_success 'ls-remote with filtered symref (refname)' '
-- 
2.40.0.515.gdfb9e78b42


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 5/7] t5512: allow any protocol version for filtered symref test
  2023-04-14 21:24                 ` [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
                                     ` (3 preceding siblings ...)
  2023-04-14 21:25                   ` [PATCH v3 4/7] t5512: add v2 support for "ls-remote --symref" test Jeff King
@ 2023-04-14 21:25                   ` Jeff King
  2023-04-14 21:25                   ` [PATCH v3 6/7] t5512: test "ls-remote --heads --symref" filtering with v0 and v2 Jeff King
                                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2023-04-14 21:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

We have a test that checks that ls-remote, when asked only about HEAD,
will report the HEAD symref, and not others. This was marked to always
run with the v0 protocol by b2f73b70b2 (t5512: compensate for v0 only
sending HEAD symrefs, 2019-02-25).

But in v0 this test is doing nothing! For v0, upload-pack only reports
the HEAD symref anyway, so we'd never have any other symref to report.

For v2, it is useful; we learn about all symrefs (and the test repo has
multiple), so this demonstrates that we correctly avoid showing them.

We could perhaps mark this to test explicitly with v2, but since that is
the default these days, it's sufficient to just run ls-remote without
any protocol specification. It still passes if somebody does an explicit
GIT_TEST_PROTOCOL_VERSION=0; it's just testing less.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5512-ls-remote.sh | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index a34cab12ae..fbf23d12a6 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -271,9 +271,7 @@ test_expect_success 'ls-remote with filtered symref (refname)' '
 	ref: refs/heads/main	HEAD
 	$rev	HEAD
 	EOF
-	# Protocol v2 supports sending symrefs for refs other than HEAD, so use
-	# protocol v0 here.
-	GIT_TEST_PROTOCOL_VERSION=0 git ls-remote --symref . HEAD >actual &&
+	git ls-remote --symref . HEAD >actual &&
 	test_cmp expect actual
 '
 
-- 
2.40.0.515.gdfb9e78b42


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 6/7] t5512: test "ls-remote --heads --symref" filtering with v0 and v2
  2023-04-14 21:24                 ` [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
                                     ` (4 preceding siblings ...)
  2023-04-14 21:25                   ` [PATCH v3 5/7] t5512: allow any protocol version for filtered symref test Jeff King
@ 2023-04-14 21:25                   ` Jeff King
  2023-04-14 21:25                   ` [PATCH v3 7/7] v0 protocol: use size_t for capability length/offset Jeff King
  2023-04-17 16:06                   ` [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup Junio C Hamano
  7 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2023-04-14 21:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

We have two overlapping tests for checking the behavior of "ls-remote
--symref" when filtering output. The first test checks that using
"--heads" will omit the symref for HEAD (since we don't print anything
about HEAD at all), but still prints other symrefs.

This has been marked as expecting failure since it was added in
99c08d4eb2 (ls-remote: add support for showing symrefs, 2016-01-19).
That's because back then, we only had the v0 protocol, and it only
reported on the HEAD symref, not others. But these days we have v2,
which does exactly what the test wants. It would even have started
unexpectedly passing when we switched to v2 by default, except that
b2f73b70b2 (t5512: compensate for v0 only sending HEAD symrefs,
2019-02-25) over-zealously marked it to run only in v0 mode.

So let's run it with both protocol versions, and adjust the expected
output for each. It passes in v2 without modification. In v0 mode, we'll
drop the extra symref, but this is still testing something useful: it
ensures that we do omit HEAD.

The test after this checks "--heads" again, this time using the expected
v0 output. That's now redundant. It also checks that limiting with a
pattern like "refs/heads/*" works similarly, but that's redundant with a
test earlier in the script which limits by HEAD (again, back then the
"HEAD" test was less interesting because there were no other symrefs to
omit, but in a modern v2 world, there are). So we can just delete that
second test entirely.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5512-ls-remote.sh | 26 +++++++-------------------
 1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index fbf23d12a6..151c76eb09 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -275,30 +275,18 @@ test_expect_success 'ls-remote with filtered symref (refname)' '
 	test_cmp expect actual
 '
 
-test_expect_failure 'ls-remote with filtered symref (--heads)' '
+test_expect_success 'ls-remote with filtered symref (--heads)' '
 	git symbolic-ref refs/heads/foo refs/tags/mark &&
-	cat >expect <<-EOF &&
+	cat >expect.v2 <<-EOF &&
 	ref: refs/tags/mark	refs/heads/foo
 	$rev	refs/heads/foo
 	$rev	refs/heads/main
 	EOF
-	# Protocol v2 supports sending symrefs for refs other than HEAD, so use
-	# protocol v0 here.
-	GIT_TEST_PROTOCOL_VERSION=0 git ls-remote --symref --heads . >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'ls-remote --symref omits filtered-out matches' '
-	cat >expect <<-EOF &&
-	$rev	refs/heads/foo
-	$rev	refs/heads/main
-	EOF
-	# Protocol v2 supports sending symrefs for refs other than HEAD, so use
-	# protocol v0 here.
-	GIT_TEST_PROTOCOL_VERSION=0 git ls-remote --symref --heads . >actual &&
-	test_cmp expect actual &&
-	GIT_TEST_PROTOCOL_VERSION=0 git ls-remote --symref . "refs/heads/*" >actual &&
-	test_cmp expect actual
+	grep -v "^ref: refs/tags/" <expect.v2 >expect.v0 &&
+	git -c protocol.version=0 ls-remote --symref --heads . >actual.v0 &&
+	test_cmp expect.v0 actual.v0 &&
+	git -c protocol.version=2 ls-remote --symref --heads . >actual.v2 &&
+	test_cmp expect.v2 actual.v2
 '
 
 test_expect_success 'indicate no refs in v0 standards-compliant empty remote' '
-- 
2.40.0.515.gdfb9e78b42


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 7/7] v0 protocol: use size_t for capability length/offset
  2023-04-14 21:24                 ` [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
                                     ` (5 preceding siblings ...)
  2023-04-14 21:25                   ` [PATCH v3 6/7] t5512: test "ls-remote --heads --symref" filtering with v0 and v2 Jeff King
@ 2023-04-14 21:25                   ` Jeff King
  2023-04-17 16:06                   ` [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup Junio C Hamano
  7 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2023-04-14 21:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

When parsing server capabilities, we use "int" to store lengths and
offsets. At first glance this seems like a spot where our parser may be
confused by integer overflow if somebody sent us a malicious response.

In practice these strings are all bounded by the 64k limit of a
pkt-line, so using "int" is OK. However, it makes the code simpler to
audit if they just use size_t everywhere. Note that because we take
these parameters as pointers, this also forces many callers to update
their declared types.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/receive-pack.c |  2 +-
 connect.c              | 22 +++++++++++-----------
 connect.h              |  4 ++--
 fetch-pack.c           |  4 ++--
 send-pack.c            |  2 +-
 transport.c            |  2 +-
 upload-pack.c          |  2 +-
 7 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 9109552533..3b495ecc84 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2093,7 +2093,7 @@ static struct command *read_head_info(struct packet_reader *reader,
 			const char *feature_list = reader->line + linelen + 1;
 			const char *hash = NULL;
 			const char *client_sid;
-			int len = 0;
+			size_t len = 0;
 			if (parse_feature_request(feature_list, "report-status"))
 				report_status = 1;
 			if (parse_feature_request(feature_list, "report-status-v2"))
diff --git a/connect.c b/connect.c
index b7ba5f5727..c54adc652f 100644
--- a/connect.c
+++ b/connect.c
@@ -22,7 +22,7 @@
 
 static char *server_capabilities_v1;
 static struct strvec server_capabilities_v2 = STRVEC_INIT;
-static const char *next_server_feature_value(const char *feature, int *len, int *offset);
+static const char *next_server_feature_value(const char *feature, size_t *len, size_t *offset);
 
 static int check_ref(const char *name, unsigned int flags)
 {
@@ -205,10 +205,10 @@ static void parse_one_symref_info(struct string_list *symref, const char *val, i
 static void annotate_refs_with_symref_info(struct ref *ref)
 {
 	struct string_list symref = STRING_LIST_INIT_DUP;
-	int offset = 0;
+	size_t offset = 0;
 
 	while (1) {
-		int len;
+		size_t len;
 		const char *val;
 
 		val = next_server_feature_value("symref", &len, &offset);
@@ -231,7 +231,7 @@ static void annotate_refs_with_symref_info(struct ref *ref)
 static void process_capabilities(struct packet_reader *reader, int *linelen)
 {
 	const char *feat_val;
-	int feat_len;
+	size_t feat_len;
 	const char *line = reader->line;
 	int nul_location = strlen(line);
 	if (nul_location == *linelen)
@@ -596,10 +596,10 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 	return list;
 }
 
-const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
+const char *parse_feature_value(const char *feature_list, const char *feature, size_t *lenp, size_t *offset)
 {
 	const char *orig_start = feature_list;
-	int len;
+	size_t len;
 
 	if (!feature_list)
 		return NULL;
@@ -623,7 +623,7 @@ const char *parse_feature_value(const char *feature_list, const char *feature, i
 			}
 			/* feature with a value (e.g., "agent=git/1.2.3") */
 			else if (*value == '=') {
-				int end;
+				size_t end;
 
 				value++;
 				end = strcspn(value, " \t\n");
@@ -645,8 +645,8 @@ const char *parse_feature_value(const char *feature_list, const char *feature, i
 
 int server_supports_hash(const char *desired, int *feature_supported)
 {
-	int offset = 0;
-	int len;
+	size_t offset = 0;
+	size_t len;
 	const char *hash;
 
 	hash = next_server_feature_value("object-format", &len, &offset);
@@ -670,12 +670,12 @@ int parse_feature_request(const char *feature_list, const char *feature)
 	return !!parse_feature_value(feature_list, feature, NULL, NULL);
 }
 
-static const char *next_server_feature_value(const char *feature, int *len, int *offset)
+static const char *next_server_feature_value(const char *feature, size_t *len, size_t *offset)
 {
 	return parse_feature_value(server_capabilities_v1, feature, len, offset);
 }
 
-const char *server_feature_value(const char *feature, int *len)
+const char *server_feature_value(const char *feature, size_t *len)
 {
 	return parse_feature_value(server_capabilities_v1, feature, len, NULL);
 }
diff --git a/connect.h b/connect.h
index f41a0b4c1f..1645126c17 100644
--- a/connect.h
+++ b/connect.h
@@ -12,14 +12,14 @@ int finish_connect(struct child_process *conn);
 int git_connection_is_socket(struct child_process *conn);
 int server_supports(const char *feature);
 int parse_feature_request(const char *features, const char *feature);
-const char *server_feature_value(const char *feature, int *len_ret);
+const char *server_feature_value(const char *feature, size_t *len_ret);
 int url_is_local_not_ssh(const char *url);
 
 struct packet_reader;
 enum protocol_version discover_version(struct packet_reader *reader);
 
 int server_supports_hash(const char *desired, int *feature_supported);
-const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset);
+const char *parse_feature_value(const char *feature_list, const char *feature, size_t *lenp, size_t *offset);
 int server_supports_v2(const char *c);
 void ensure_server_supports_v2(const char *c);
 int server_feature_v2(const char *c, const char **v);
diff --git a/fetch-pack.c b/fetch-pack.c
index 368f2ed25a..97a44ed582 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1099,7 +1099,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	struct ref *ref = copy_ref_list(orig_ref);
 	struct object_id oid;
 	const char *agent_feature;
-	int agent_len;
+	size_t agent_len;
 	struct fetch_negotiator negotiator_alloc;
 	struct fetch_negotiator *negotiator;
 
@@ -1117,7 +1117,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		agent_supported = 1;
 		if (agent_len)
 			print_verbose(args, _("Server version is %.*s"),
-				      agent_len, agent_feature);
+				      (int)agent_len, agent_feature);
 	}
 
 	if (!server_supports("session-id"))
diff --git a/send-pack.c b/send-pack.c
index f81bbb28d7..97344b629e 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -538,7 +538,7 @@ int send_pack(struct send_pack_args *args,
 		die(_("the receiving end does not support this repository's hash algorithm"));
 
 	if (args->push_cert != SEND_PACK_PUSH_CERT_NEVER) {
-		int len;
+		size_t len;
 		push_cert_nonce = server_feature_value("push-cert", &len);
 		if (push_cert_nonce) {
 			reject_invalid_nonce(push_cert_nonce, len);
diff --git a/transport.c b/transport.c
index 89a220425e..6223dc3de2 100644
--- a/transport.c
+++ b/transport.c
@@ -317,7 +317,7 @@ static struct ref *handshake(struct transport *transport, int for_push,
 	struct git_transport_data *data = transport->data;
 	struct ref *refs = NULL;
 	struct packet_reader reader;
-	int sid_len;
+	size_t sid_len;
 	const char *server_sid;
 
 	connect_setup(transport, for_push);
diff --git a/upload-pack.c b/upload-pack.c
index e23f16dfdd..565e46058f 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1067,7 +1067,7 @@ static void receive_needs(struct upload_pack_data *data,
 		const char *features;
 		struct object_id oid_buf;
 		const char *arg;
-		int feature_len;
+		size_t feature_len;
 
 		reset_timeout(data->timeout);
 		if (packet_reader_read(reader) != PACKET_READ_NORMAL)
-- 
2.40.0.515.gdfb9e78b42

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup
  2023-04-14 21:24                 ` [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
                                     ` (6 preceding siblings ...)
  2023-04-14 21:25                   ` [PATCH v3 7/7] v0 protocol: use size_t for capability length/offset Jeff King
@ 2023-04-17 16:06                   ` Junio C Hamano
  7 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2023-04-17 16:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Jonas Haag, brian m. carlson, git

Jeff King <peff@peff.net> writes:

> On Wed, Apr 12, 2023 at 05:04:23AM -0400, Jeff King wrote:
>
>> I'll squash that in and update the commit message before I do the next
>> re-roll (but will still hold off a bit to get further comments).
>
> Nobody said anything, so I assume the rest of the series is perfect. ;)
>
> Junio, I see that you picked up this fix as a "squash", along with my
> other "v2" update. Here's a v3 that does the actual squash along with a
> commit message update. That ties up all loose ends from my perspective,
> but of course if anybody has review comments, please send them.

These look good.  Queued.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-04-17 16:06 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-11 19:53 Infinite loop + memory leak in annotate_refs_with_symref_info Jonas Haag
2023-04-11 20:25 ` Taylor Blau
2023-04-11 23:59   ` Taylor Blau
2023-04-12  0:53   ` brian m. carlson
2023-04-11 21:06 ` Jeff King
2023-04-11 21:16   ` Jeff King
2023-04-11 21:22     ` Taylor Blau
2023-04-11 21:58       ` Jeff King
2023-04-11 22:52         ` Junio C Hamano
2023-04-12  6:23           ` [PATCH 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
2023-04-12  6:29             ` [PATCH 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities Jeff King
2023-04-12  6:46               ` Jeff King
2023-04-12  7:25                 ` [PATCH v2 " Jeff King
2023-04-12  7:26                   ` Jeff King
2023-04-12  6:29             ` [PATCH 2/7] t5512: stop referring to "v1" protocol Jeff King
2023-04-12  6:31             ` [PATCH 3/7] t5512: stop using jgit for capabilities^{} test Jeff King
2023-04-12  9:04               ` Jeff King
2023-04-14 21:24                 ` [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup Jeff King
2023-04-14 21:24                   ` [PATCH v3 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities Jeff King
2023-04-14 21:24                   ` [PATCH v3 2/7] t5512: stop referring to "v1" protocol Jeff King
2023-04-14 21:25                   ` [PATCH v3 3/7] v0 protocol: fix sha1/sha256 confusion for capabilities^{} Jeff King
2023-04-14 21:25                   ` [PATCH v3 4/7] t5512: add v2 support for "ls-remote --symref" test Jeff King
2023-04-14 21:25                   ` [PATCH v3 5/7] t5512: allow any protocol version for filtered symref test Jeff King
2023-04-14 21:25                   ` [PATCH v3 6/7] t5512: test "ls-remote --heads --symref" filtering with v0 and v2 Jeff King
2023-04-14 21:25                   ` [PATCH v3 7/7] v0 protocol: use size_t for capability length/offset Jeff King
2023-04-17 16:06                   ` [PATCH v3 0/7] v0 multiple-symref infinite loop fix and test cleanup Junio C Hamano
2023-04-12  6:34             ` [PATCH 4/7] t5512: add v2 support for "ls-remote --symref" test Jeff King
2023-04-12  6:35             ` [PATCH 5/7] t5512: allow any protocol version for filtered symref test Jeff King
2023-04-12  6:37             ` [PATCH 6/7] t5512: test "ls-remote --heads --symref" filtering with v0 and v2 Jeff King
2023-04-12  6:40             ` [PATCH 7/7] v0 protocol: use size_t for capability length/offset Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.