git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	孟子易 <mengziyi540841@gmail.com>,
	git@vger.kernel.org
Subject: [PATCH v2 0/3] get rid of sscanf() when shortening refs
Date: Wed, 15 Feb 2023 10:16:02 -0500	[thread overview]
Message-ID: <Y+z3MtgayoXsxaHA@coredump.intra.peff.net> (raw)
In-Reply-To: <Y+vVFFCRem6t4IGM@coredump.intra.peff.net>

On Tue, Feb 14, 2023 at 01:38:13PM -0500, Jeff King wrote:

> OK, here it is. I split it into a few patches to hopefully make it a bit
> easier to follow.
> 
>   [1/3]: shorten_unambiguous_ref(): avoid integer truncation
>   [2/3]: shorten_unambiguous_ref(): use NUM_REV_PARSE_RULES constant
>   [3/3]: shorten_unambiguous_ref(): avoid sscanf()

And here's a v2 fixing the "refs/headsXfoo" problem from v1. I also
added a few tests to cover that case along with some others, including
the original problem that spawned this thread.

I hope setting LC_ALL is OK in the test suite (i.e., at worst it just
becomes a noop on platforms that don't have that locale).  We can drop
it, but then the test itself becomes kind of meaningless, as I do not
think it would even fail on macOS. I would be curious to see if the test
as written does fail with the current code (I lack access to a system to
test).

In writing these, I noticed that this patch is actually fixing another
bug in the original. ;) The "%s" match in scanf is greedy, so we'd never
turn "refs/remotes/foo/HEAD" into "foo" (instead we say "foo/HEAD"). I
doubt anybody cares very much, and you might even argue that "foo/HEAD",
while not the shortest possible answer, is preferable because it's more
descriptive. But I think "foo" matches the intent of the code, and
certainly it's an unambiguous shortening.

Anyway, here's the patches and the range diff.

  [1/3]: shorten_unambiguous_ref(): avoid integer truncation
  [2/3]: shorten_unambiguous_ref(): use NUM_REV_PARSE_RULES constant
  [3/3]: shorten_unambiguous_ref(): avoid sscanf()

 refs.c                  | 93 +++++++++++++++++++++--------------------
 t/t1401-symbolic-ref.sh | 34 +++++++++++++++
 2 files changed, 82 insertions(+), 45 deletions(-)

1:  56762dc84b = 1:  f84edd1791 shorten_unambiguous_ref(): avoid integer truncation
2:  83326a396a = 2:  9287afb50f shorten_unambiguous_ref(): use NUM_REV_PARSE_RULES constant
3:  754ea4eb40 ! 3:  7cc6a8b89d shorten_unambiguous_ref(): avoid sscanf()
    @@ Commit message
         that against the refname, pulling the "%s" content into a separate
         buffer.
     
    -    This has two downsides:
    +    This has a few downsides:
     
           - sscanf("%s") reportedly misbehaves on macOS with some input and
             locale combinations, returning a partial or garbled string. See
             this thread:
     
               https://lore.kernel.org/git/CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com/
     
    +      - scanf's matching of "%s" is greedy. So the "refs/remotes/%s/HEAD"
    +        rule would never pull "origin" out of "refs/remotes/origin/HEAD".
    +        Instead it always produced "origin/HEAD", which is redundant with
    +        the "refs/remotes/%s" rule.
    +
           - scanf in general is an error-prone interface. For example, scanning
             for "%s" will copy bytes into a destination string, which must have
             been correctly sized ahead of time to avoid a buffer overflow. In
    @@ Commit message
         to the lookup rules with "%.*s"), and then copy it only when returning
         to the caller.
     
    +    There are a few new tests here, all using symbolic-ref (the code can be
    +    triggered in many ways, but symrefs are convenient in that we don't need
    +    to create a real ref, which avoids any complications from the filesystem
    +    munging the name):
    +
    +      - the first covers the real-world case which misbehaved on macOS.
    +        Setting LC_ALL is required to trigger the problem there (since
    +        otherwise our tests use LC_ALL=C), and hopefully is at worst simply
    +        ignored on other systems (and doesn't cause libc to complain, etc,
    +        on systems without that locale).
    +
    +      - the second covers the "origin/HEAD" case as discussed above, which
    +        is now fixed
    +
    +      - the remainder are for "weird" cases that work both before and after
    +        this patch, but would be easy to get wrong with off-by-one problems
    +        in the parsing (and came out of discussions and earlier iterations
    +        of the patch that did get them wrong).
    +
    +      - absent here are tests of boring, expected-to-work cases like
    +        "refs/heads/foo", etc. Those are covered all over the test suite
    +        both explicitly (for-each-ref's refname:short) and implicitly (in
    +        the output of git-status, etc).
    +
         Reported-by: 孟子易 <mengziyi540841@gmail.com>
         Helped-by: Eric Sunshine <sunshine@sunshineco.com>
         Signed-off-by: Jeff King <peff@peff.net>
    @@ refs.c: int update_ref(const char *msg, const char *refname,
     -		size_t total_len = 0;
     -		size_t offset = 0;
     +	/*
    -+	 * Check that rule matches refname up to the first percent
    -+	 * in the rule. This is basically skip_prefix(), but
    -+	 * ending at percent in the prefix, rather than end-of-string.
    ++	 * Check that rule matches refname up to the first percent in the rule.
    ++	 * We can bail immediately if not, but otherwise we leave "rule" at the
    ++	 * %-placeholder, and "refname" at the start of the potential matched
    ++	 * name.
     +	 */
    -+	do {
    ++	while (*rule != '%') {
     +		if (!*rule)
     +			BUG("rev-parse rule did not have percent");
    -+		if (*rule == '%')
    -+			break;
    -+	} while (*refname++ == *rule++);
    ++		if (*refname++ != *rule++)
    ++			return NULL;
    ++	}
      
     -		for (i = 0; i < NUM_REV_PARSE_RULES; i++)
     -			/* -2 for strlen("%.*s") - strlen("%s"); +1 for NUL */
     -			total_len += strlen(ref_rev_parse_rules[i]) - 2 + 1;
     +	/*
    -+	 * Check that we matched all the way to the "%" placeholder,
    -+	 * and skip past it within the rule string. If so, "refname" at
    -+	 * this point is the beginning of a potential match.
    ++	 * Check that our "%" is the expected placeholder. This assumes there
    ++	 * are no other percents (placeholder or quoted) in the string, but
    ++	 * that is sufficient for our rev-parse rules.
     +	 */
     +	if (!skip_prefix(rule, "%.*s", &rule))
     +		return NULL;
    @@ refs.c: char *refs_shorten_unambiguous_ref(struct ref_store *refs,
      	return xstrdup(refname);
      }
      
    +
    + ## t/t1401-symbolic-ref.sh ##
    +@@ t/t1401-symbolic-ref.sh: test_expect_success 'symbolic-ref pointing at another' '
    + 	test_cmp expect actual
    + '
    + 
    ++test_expect_success 'symbolic-ref --short handles complex utf8 case' '
    ++	name="测试-加-增加-加-增加" &&
    ++	git symbolic-ref TEST_SYMREF "refs/heads/$name" &&
    ++	# In the real world, we saw problems with this case only
    ++	# when the locale includes UTF-8. Set it here to try to make things as
    ++	# hard as possible for us to pass, but in practice we should do the
    ++	# right thing regardless (and of course some platforms may not even
    ++	# have this locale).
    ++	LC_ALL=en_US.UTF-8 git symbolic-ref --short TEST_SYMREF >actual &&
    ++	echo "$name" >expect &&
    ++	test_cmp expect actual
    ++'
    ++
    ++test_expect_success 'symbolic-ref --short handles name with suffix' '
    ++	git symbolic-ref TEST_SYMREF "refs/remotes/origin/HEAD" &&
    ++	git symbolic-ref --short TEST_SYMREF >actual &&
    ++	echo "origin" >expect &&
    ++	test_cmp expect actual
    ++'
    ++
    ++test_expect_success 'symbolic-ref --short handles almost-matching name' '
    ++	git symbolic-ref TEST_SYMREF "refs/headsXfoo" &&
    ++	git symbolic-ref --short TEST_SYMREF >actual &&
    ++	echo "headsXfoo" >expect &&
    ++	test_cmp expect actual
    ++'
    ++
    ++test_expect_success 'symbolic-ref --short handles name with percent' '
    ++	git symbolic-ref TEST_SYMREF "refs/heads/%foo" &&
    ++	git symbolic-ref --short TEST_SYMREF >actual &&
    ++	echo "%foo" >expect &&
    ++	test_cmp expect actual
    ++'
    ++
    + test_done

  parent reply	other threads:[~2023-02-15 15:16 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-13  6:38 bug report: symbolic-ref --short command echos the wrong text while use Chinese language 孟子易
2023-02-13 20:18 ` Jeff King
2023-02-13 22:58   ` Eric Sunshine
2023-02-14  1:39     ` Jeff King
2023-02-14  5:15       ` Eric Sunshine
2023-02-14  5:33         ` Eric Sunshine
2023-02-14  5:40           ` Junio C Hamano
2023-02-14  6:05             ` Eric Sunshine
2023-02-14  6:45               ` Junio C Hamano
2023-02-14  6:55                 ` Eric Sunshine
2023-02-14 16:01                   ` Jeff King
2023-02-14 16:29                     ` Eric Sunshine
2023-02-14 17:07                       ` Jeff King
2023-02-14 18:38                         ` [PATCH 0/3] get rid of sscanf() when shortening refs Jeff King
2023-02-14 18:39                           ` [PATCH 1/3] shorten_unambiguous_ref(): avoid integer truncation Jeff King
2023-02-14 18:40                           ` [PATCH 2/3] shorten_unambiguous_ref(): use NUM_REV_PARSE_RULES constant Jeff King
2023-02-14 21:34                             ` Junio C Hamano
2023-02-14 22:23                               ` Jeff King
2023-02-14 18:41                           ` [PATCH 3/3] shorten_unambiguous_ref(): avoid sscanf() Jeff King
2023-02-14 21:48                             ` Junio C Hamano
2023-02-14 22:25                               ` Junio C Hamano
2023-02-14 22:30                               ` Jeff King
2023-02-14 22:34                                 ` Junio C Hamano
2023-02-14 22:40                                   ` Jeff King
2023-02-15  5:10                                     ` Junio C Hamano
2023-02-15 14:30                                       ` Jeff King
2023-02-15 16:41                                         ` Junio C Hamano
2023-02-14 23:20                               ` Eric Sunshine
2023-02-15 15:16                           ` Jeff King [this message]
2023-02-15 15:16                             ` [PATCH v2 1/3] shorten_unambiguous_ref(): avoid integer truncation Jeff King
2023-02-15 15:16                             ` [PATCH v2 2/3] shorten_unambiguous_ref(): use NUM_REV_PARSE_RULES constant Jeff King
2023-02-15 15:16                             ` [PATCH v2 3/3] shorten_unambiguous_ref(): avoid sscanf() Jeff King
2023-02-16  5:56                               ` Torsten Bögershausen
2023-02-16  6:16                                 ` Eric Sunshine
2023-02-16 17:21                                   ` Junio C Hamano
2023-02-16 17:28                                     ` Jeff King
2023-02-16 23:36                                       ` Junio C Hamano
2023-02-16 17:31                                 ` Jeff King
2023-02-17  6:46                                   ` Torsten Bögershausen
2023-02-15 18:00                             ` [PATCH v2 0/3] get rid of sscanf() when shortening refs Junio C Hamano
2023-02-14 16:40                     ` bug report: symbolic-ref --short command echos the wrong text while use Chinese language Junio C Hamano
2023-02-14 17:40                       ` Jeff King
2023-02-15 16:26   ` Torsten Bögershausen
2023-02-15 16:37     ` Eric Sunshine
2023-02-15 17:19       ` Torsten Bögershausen
2023-02-16  6:08         ` Eric Sunshine

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y+z3MtgayoXsxaHA@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mengziyi540841@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).