All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/22] Clean up refname checks and normalization
@ 2011-09-15 21:10 Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 01/22] t1402: add some more tests Michael Haggerty
                   ` (21 more replies)
  0 siblings, 22 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

This patch series cleans up refname checks and normalization in a
different way than v1 and v2.  (Unnormalized refnames are those that
have extra leading slashes or runs of repeated slashes between
components, like "//refs/heads///master".)

As discussed on the mailing list [1], adding consistent support for
unnormalized refnames everywhere was becoming a bottomless pit, so
instead I am taking the opposite approach--only accept normalized
refnames.  Support for unnormalized refnames was broken anyway, so it
shouldn't be missed.

There is only one command that is now meant to accept unnormalized
refnames:

    $ git check-ref-format --normalize //refs/heads///master
    refs/heads/master
    $

This command can be used to normalize refnames for use elsewhere.

I have added a few internal checks that refnames have the correct
format; more tests should still be added.  But this patch series is
getting quite long already, so I would like to submit it.

During the development of this patch series, I discovered that
fetch_with_import() in remote-helper.c sometimes passes NULL to
read_ref(), and thereby to resolve_ref().  The two commits labeled
"remote: *" fix this in a naive way.  But somebody more familiar with
this code should check whether the fix is OK and more specifically
whether this is a symptom of a bigger problem in the remote-helper
code.

I should mention that this cleanup is preparation for my main goal:
storing ref caches hierarchically.  But there is still a lot of
tangled up and seemingly redundant code in refs.c, so it might be a
while before I get to the main project.

[1] http://permalink.gmane.org/gmane.comp.version-control.git/181268

Michael Haggerty (22):
  t1402: add some more tests
  git check-ref-format: add options --allow-onelevel and
    --refspec-pattern
  Change bad_ref_char() to return a boolean value
  Change check_ref_format() to take a flags argument
  Refactor check_refname_format()
  Do not allow ".lock" at the end of any refname component
  Make collapse_slashes() allocate memory for its result
  Inline function refname_format_print()
  Change check_refname_format() to reject unnormalized refnames
  resolve_ref(): explicitly fail if a symlink is not readable
  resolve_ref(): use prefixcmp()
  resolve_ref(): only follow a symlink that contains a valid,
    normalized refname
  resolve_ref(): turn buffer into a proper string as soon as possible
  resolve_ref(): extract a function get_packed_ref()
  resolve_ref(): do not follow incorrectly-formatted symbolic refs
  remote: use xstrdup() instead of strdup()
  remote: avoid passing NULL to read_ref()
  resolve_ref(): verify that the input refname has the right format
  resolve_ref(): emit warnings for improperly-formatted references
  resolve_ref(): also treat a too-long SHA1 as invalid
  resolve_ref(): expand documentation
  add_ref(): verify that the refname is formatted correctly

 Documentation/git-check-ref-format.txt |   53 ++++++--
 builtin/check-ref-format.c             |   61 ++++++---
 builtin/checkout.c                     |    2 +-
 builtin/fetch-pack.c                   |    2 +-
 builtin/receive-pack.c                 |    2 +-
 builtin/replace.c                      |    2 +-
 builtin/show-ref.c                     |    2 +-
 builtin/tag.c                          |    4 +-
 cache.h                                |   34 +++++-
 connect.c                              |    2 +-
 environment.c                          |    2 +-
 fast-import.c                          |    7 +-
 git_remote_helpers/git/git.py          |    2 +-
 notes-merge.c                          |    5 +-
 pack-refs.c                            |    2 +-
 refs.c                                 |  222 +++++++++++++++++++-------------
 refs.h                                 |   21 +++-
 remote.c                               |   55 ++------
 sha1_name.c                            |    4 +-
 t/t1402-check-ref-format.sh            |  120 +++++++++++++++--
 transport-helper.c                     |   10 +-
 transport.c                            |   16 +--
 walker.c                               |    2 +-
 23 files changed, 408 insertions(+), 224 deletions(-)

-- 
1.7.6.8.gd2879

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v3 01/22] t1402: add some more tests
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 02/22] git check-ref-format: add options --allow-onelevel and --refspec-pattern Michael Haggerty
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

The new tests reflect the status quo.  Soon the rule for "*.lock" in
refname components will be tightened up.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 t/t1402-check-ref-format.sh |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index ed4275a..dc43171 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -13,6 +13,8 @@ invalid_ref() {
 		"test_must_fail git check-ref-format '$1'"
 }
 
+invalid_ref ''
+invalid_ref '/'
 valid_ref 'heads/foo'
 invalid_ref 'foo'
 valid_ref 'foo/bar/baz'
@@ -27,6 +29,9 @@ invalid_ref 'heads/foo..bar'
 invalid_ref 'heads/foo?bar'
 valid_ref 'foo./bar'
 invalid_ref 'heads/foo.lock'
+invalid_ref 'heads///foo.lock'
+valid_ref 'foo.lock/bar'
+valid_ref 'foo.lock///bar'
 valid_ref 'heads/foo@bar'
 invalid_ref 'heads/v@{ation'
 invalid_ref 'heads/foo\bar'
@@ -83,5 +88,9 @@ invalid_ref_normalized '/foo'
 invalid_ref_normalized 'heads/foo/../bar'
 invalid_ref_normalized 'heads/./foo'
 invalid_ref_normalized 'heads\foo'
+invalid_ref_normalized 'heads/foo.lock'
+invalid_ref_normalized 'heads///foo.lock'
+valid_ref_normalized 'foo.lock/bar' 'foo.lock/bar'
+valid_ref_normalized 'foo.lock///bar' 'foo.lock/bar'
 
 test_done
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 02/22] git check-ref-format: add options --allow-onelevel and --refspec-pattern
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 01/22] t1402: add some more tests Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 03/22] Change bad_ref_char() to return a boolean value Michael Haggerty
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

Also add tests of the new options.  (Actually, one big reason to add
the new options is to make it easy to test check_ref_format(), though
the options should also be useful to other scripts.)

Interpret the result of check_ref_format() based on which types of
refnames are allowed.  However, because check_ref_format() can only
return a single value, one test case is still broken.  Specifically,
the case "git check-ref-format --onelevel '*'" incorrectly succeeds
because check_ref_format() returns CHECK_REF_FORMAT_ONELEVEL for this
refname even though the refname is also CHECK_REF_FORMAT_WILDCARD.
The type of check that leads to this failure is used elsewhere in
"real" code and could lead to bugs; it will be fixed over the next few
commits.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 Documentation/git-check-ref-format.txt |   29 +++++++++--
 builtin/check-ref-format.c             |   56 +++++++++++++++++---
 t/t1402-check-ref-format.sh            |   88 +++++++++++++++++++++++++++++---
 3 files changed, 152 insertions(+), 21 deletions(-)

diff --git a/Documentation/git-check-ref-format.txt b/Documentation/git-check-ref-format.txt
index c9fdf84..dcb8cc3 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -8,8 +8,8 @@ git-check-ref-format - Ensures that a reference name is well formed
 SYNOPSIS
 --------
 [verse]
-'git check-ref-format' <refname>
-'git check-ref-format' --print <refname>
+'git check-ref-format' [--print]
+       [--[no-]allow-onelevel] [--refspec-pattern] <refname>
 'git check-ref-format' --branch <branchname-shorthand>
 
 DESCRIPTION
@@ -32,14 +32,18 @@ git imposes the following rules on how references are named:
 
 . They must contain at least one `/`. This enforces the presence of a
   category like `heads/`, `tags/` etc. but the actual names are not
-  restricted.
+  restricted.  If the `--allow-onelevel` option is used, this rule
+  is waived.
 
 . They cannot have two consecutive dots `..` anywhere.
 
 . They cannot have ASCII control characters (i.e. bytes whose
   values are lower than \040, or \177 `DEL`), space, tilde `~`,
-  caret `{caret}`, colon `:`, question-mark `?`, asterisk `*`,
-  or open bracket `[` anywhere.
+  caret `{caret}`, or colon `:` anywhere.
+
+. They cannot have question-mark `?`, asterisk `{asterisk}`, or open
+  bracket `[` anywhere.  See the `--refspec-pattern` option below for
+  an exception to this rule.
 
 . They cannot end with a slash `/` nor a dot `.`.
 
@@ -78,6 +82,21 @@ were on.  This option should be used by porcelains to accept this
 syntax anywhere a branch name is expected, so they can act as if you
 typed the branch name.
 
+OPTIONS
+-------
+--allow-onelevel::
+--no-allow-onelevel::
+	Controls whether one-level refnames are accepted (i.e.,
+	refnames that do not contain multiple `/`-separated
+	components).  The default is `--no-allow-onelevel`.
+
+--refspec-pattern::
+	Interpret <refname> as a reference name pattern for a refspec
+	(as used with remote repositories).  If this option is
+	enabled, <refname> is allowed to contain a single `{asterisk}`
+	in place of a one full pathname component (e.g.,
+	`foo/{asterisk}/bar` but not `foo/bar{asterisk}`).
+
 EXAMPLES
 --------
 
diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 0723cf2..7295954 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -8,7 +8,7 @@
 #include "strbuf.h"
 
 static const char builtin_check_ref_format_usage[] =
-"git check-ref-format [--print] <refname>\n"
+"git check-ref-format [--print] [options] <refname>\n"
 "   or: git check-ref-format --branch <branchname-shorthand>";
 
 /*
@@ -45,27 +45,65 @@ static int check_ref_format_branch(const char *arg)
 	return 0;
 }
 
-static int check_ref_format_print(const char *arg)
+static void refname_format_print(const char *arg)
 {
 	char *refname = xmalloc(strlen(arg) + 1);
 
-	if (check_ref_format(arg))
-		return 1;
 	collapse_slashes(refname, arg);
 	printf("%s\n", refname);
-	return 0;
 }
 
+#define REFNAME_ALLOW_ONELEVEL 1
+#define REFNAME_REFSPEC_PATTERN 2
+
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 {
+	int i;
+	int print = 0;
+	int flags = 0;
+
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(builtin_check_ref_format_usage);
 
 	if (argc == 3 && !strcmp(argv[1], "--branch"))
 		return check_ref_format_branch(argv[2]);
-	if (argc == 3 && !strcmp(argv[1], "--print"))
-		return check_ref_format_print(argv[2]);
-	if (argc != 2)
+
+	for (i = 1; i < argc && argv[i][0] == '-'; i++) {
+		if (!strcmp(argv[i], "--print"))
+			print = 1;
+		else if (!strcmp(argv[i], "--allow-onelevel"))
+			flags |= REFNAME_ALLOW_ONELEVEL;
+		else if (!strcmp(argv[i], "--no-allow-onelevel"))
+			flags &= ~REFNAME_ALLOW_ONELEVEL;
+		else if (!strcmp(argv[i], "--refspec-pattern"))
+			flags |= REFNAME_REFSPEC_PATTERN;
+		else
+			usage(builtin_check_ref_format_usage);
+	}
+	if (! (i == argc - 1))
 		usage(builtin_check_ref_format_usage);
-	return !!check_ref_format(argv[1]);
+
+	switch (check_ref_format(argv[i])) {
+	case CHECK_REF_FORMAT_OK:
+		break;
+	case CHECK_REF_FORMAT_ERROR:
+		return 1;
+	case CHECK_REF_FORMAT_ONELEVEL:
+		if (!(flags & REFNAME_ALLOW_ONELEVEL))
+			return 1;
+		else
+			break;
+	case CHECK_REF_FORMAT_WILDCARD:
+		if (!(flags & REFNAME_REFSPEC_PATTERN))
+			return 1;
+		else
+			break;
+	default:
+		die("internal error: unexpected value from check_ref_format()");
+	}
+
+	if (print)
+		refname_format_print(argv[i]);
+
+	return 0;
 }
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index dc43171..f551eef 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -5,25 +5,38 @@ test_description='Test git check-ref-format'
 . ./test-lib.sh
 
 valid_ref() {
-	test_expect_success "ref name '$1' is valid" \
-		"git check-ref-format '$1'"
+	if test "$#" = 1
+	then
+		test_expect_success "ref name '$1' is valid" \
+			"git check-ref-format '$1'"
+	else
+		test_expect_success "ref name '$1' is valid with options $2" \
+			"git check-ref-format $2 '$1'"
+	fi
 }
 invalid_ref() {
-	test_expect_success "ref name '$1' is not valid" \
-		"test_must_fail git check-ref-format '$1'"
+	if test "$#" = 1
+	then
+		test_expect_success "ref name '$1' is invalid" \
+			"test_must_fail git check-ref-format '$1'"
+	else
+		test_expect_success "ref name '$1' is invalid with options $2" \
+			"test_must_fail git check-ref-format $2 '$1'"
+	fi
 }
 
 invalid_ref ''
 invalid_ref '/'
-valid_ref 'heads/foo'
-invalid_ref 'foo'
+invalid_ref '/' --allow-onelevel
 valid_ref 'foo/bar/baz'
 valid_ref 'refs///heads/foo'
 invalid_ref 'heads/foo/'
 valid_ref '/heads/foo'
 valid_ref '///heads/foo'
-invalid_ref '/foo'
 invalid_ref './foo'
+invalid_ref './foo/bar'
+invalid_ref 'foo/./bar'
+invalid_ref 'foo/bar/.'
 invalid_ref '.refs/foo'
 invalid_ref 'heads/foo..bar'
 invalid_ref 'heads/foo?bar'
@@ -38,6 +51,67 @@ invalid_ref 'heads/foo\bar'
 invalid_ref "$(printf 'heads/foo\t')"
 invalid_ref "$(printf 'heads/foo\177')"
 valid_ref "$(printf 'heads/fu\303\237')"
+invalid_ref 'heads/*foo/bar' --refspec-pattern
+invalid_ref 'heads/foo*/bar' --refspec-pattern
+invalid_ref 'heads/f*o/bar' --refspec-pattern
+
+ref='foo'
+invalid_ref "$ref"
+valid_ref "$ref" --allow-onelevel
+invalid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='foo/bar'
+valid_ref "$ref"
+valid_ref "$ref" --allow-onelevel
+valid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='foo/*'
+invalid_ref "$ref"
+invalid_ref "$ref" --allow-onelevel
+valid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='*/foo'
+invalid_ref "$ref"
+invalid_ref "$ref" --allow-onelevel
+valid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='foo/*/bar'
+invalid_ref "$ref"
+invalid_ref "$ref" --allow-onelevel
+valid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='*'
+invalid_ref "$ref"
+
+#invalid_ref "$ref" --allow-onelevel
+test_expect_failure "ref name '$ref' is invalid with options --allow-onelevel" \
+	"test_must_fail git check-ref-format --allow-onelevel '$ref'"
+
+invalid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='foo/*/*'
+invalid_ref "$ref" --refspec-pattern
+invalid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='*/foo/*'
+invalid_ref "$ref" --refspec-pattern
+invalid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='*/*/foo'
+invalid_ref "$ref" --refspec-pattern
+invalid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='/foo'
+invalid_ref "$ref"
+valid_ref "$ref" --allow-onelevel
+invalid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
 
 test_expect_success "check-ref-format --branch @{-1}" '
 	T=$(git write-tree) &&
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 03/22] Change bad_ref_char() to return a boolean value
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 01/22] t1402: add some more tests Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 02/22] git check-ref-format: add options --allow-onelevel and --refspec-pattern Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 04/22] Change check_ref_format() to take a flags argument Michael Haggerty
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

Previously most bad characters were indicated by returning 1, but "*"
was special-cased to return 2 instead of 1.  One caller examined the
return value to see whether the special case occurred.

But it is easier (to document and understand) for bad_ref_char()
simply to return a boolean value, treating "*" like any other bad
character.  Special-case the handling of "*" (which only occurs in
very specific circumstances) at the caller.  The resulting calling
code thereby also becomes more transparent.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |   15 ++++++---------
 1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/refs.c b/refs.c
index a615043..fd29d89 100644
--- a/refs.c
+++ b/refs.c
@@ -860,22 +860,21 @@ int for_each_rawref(each_ref_fn fn, void *cb_data)
  * - it contains a "\" (backslash)
  */
 
+/* Return true iff ch is not allowed in reference names. */
 static inline int bad_ref_char(int ch)
 {
 	if (((unsigned) ch) <= ' ' || ch == 0x7f ||
 	    ch == '~' || ch == '^' || ch == ':' || ch == '\\')
 		return 1;
 	/* 2.13 Pattern Matching Notation */
-	if (ch == '?' || ch == '[') /* Unsupported */
+	if (ch == '*' || ch == '?' || ch == '[') /* Unsupported */
 		return 1;
-	if (ch == '*') /* Supported at the end */
-		return 2;
 	return 0;
 }
 
 int check_ref_format(const char *ref)
 {
-	int ch, level, bad_type, last;
+	int ch, level, last;
 	int ret = CHECK_REF_FORMAT_OK;
 	const char *cp = ref;
 
@@ -890,9 +889,8 @@ int check_ref_format(const char *ref)
 		/* we are at the beginning of the path component */
 		if (ch == '.')
 			return CHECK_REF_FORMAT_ERROR;
-		bad_type = bad_ref_char(ch);
-		if (bad_type) {
-			if (bad_type == 2 && (!*cp || *cp == '/') &&
+		if (bad_ref_char(ch)) {
+			if (ch == '*' && (!*cp || *cp == '/') &&
 			    ret == CHECK_REF_FORMAT_OK)
 				ret = CHECK_REF_FORMAT_WILDCARD;
 			else
@@ -902,8 +900,7 @@ int check_ref_format(const char *ref)
 		last = ch;
 		/* scan the rest of the path component */
 		while ((ch = *cp++) != 0) {
-			bad_type = bad_ref_char(ch);
-			if (bad_type)
+			if (bad_ref_char(ch))
 				return CHECK_REF_FORMAT_ERROR;
 			if (ch == '/')
 				break;
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 04/22] Change check_ref_format() to take a flags argument
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (2 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 03/22] Change bad_ref_char() to return a boolean value Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 05/22] Refactor check_refname_format() Michael Haggerty
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

Change check_ref_format() to take a flags argument that indicates what
is acceptable in the reference name (analogous to "git
check-ref-format"'s "--allow-onelevel" and "--refspec-pattern").  This
is more convenient for callers and also fixes a failure in the test
suite (and likely elsewhere in the code) by enabling "onelevel" and
"refspec-pattern" to be allowed independently of each other.

Also rename check_ref_format() to check_refname_format() to make it
obvious that it deals with refnames rather than references themselves.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/check-ref-format.c    |   21 +---------------
 builtin/checkout.c            |    2 +-
 builtin/fetch-pack.c          |    2 +-
 builtin/receive-pack.c        |    2 +-
 builtin/replace.c             |    2 +-
 builtin/show-ref.c            |    2 +-
 builtin/tag.c                 |    4 +-
 connect.c                     |    2 +-
 environment.c                 |    2 +-
 fast-import.c                 |    7 +----
 git_remote_helpers/git/git.py |    2 +-
 notes-merge.c                 |    5 ++-
 pack-refs.c                   |    2 +-
 refs.c                        |   42 ++++++++++++++-----------------
 refs.h                        |   17 +++++++++----
 remote.c                      |   53 +++++++++++------------------------------
 sha1_name.c                   |    4 +-
 t/t1402-check-ref-format.sh   |    6 +----
 transport.c                   |   16 +++---------
 walker.c                      |    2 +-
 20 files changed, 69 insertions(+), 126 deletions(-)

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 7295954..8f41696 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -53,9 +53,6 @@ static void refname_format_print(const char *arg)
 	printf("%s\n", refname);
 }
 
-#define REFNAME_ALLOW_ONELEVEL 1
-#define REFNAME_REFSPEC_PATTERN 2
-
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 {
 	int i;
@@ -83,24 +80,8 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 	if (! (i == argc - 1))
 		usage(builtin_check_ref_format_usage);
 
-	switch (check_ref_format(argv[i])) {
-	case CHECK_REF_FORMAT_OK:
-		break;
-	case CHECK_REF_FORMAT_ERROR:
+	if (check_refname_format(argv[i], flags))
 		return 1;
-	case CHECK_REF_FORMAT_ONELEVEL:
-		if (!(flags & REFNAME_ALLOW_ONELEVEL))
-			return 1;
-		else
-			break;
-	case CHECK_REF_FORMAT_WILDCARD:
-		if (!(flags & REFNAME_REFSPEC_PATTERN))
-			return 1;
-		else
-			break;
-	default:
-		die("internal error: unexpected value from check_ref_format()");
-	}
 
 	if (print)
 		refname_format_print(argv[i]);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 3bb6525..574d2b6 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -882,7 +882,7 @@ static int parse_branchname_arg(int argc, const char **argv,
 	new->name = arg;
 	setup_branch_path(new);
 
-	if (check_ref_format(new->path) == CHECK_REF_FORMAT_OK &&
+	if (!check_refname_format(new->path, 0) &&
 	    resolve_ref(new->path, branch_rev, 1, NULL))
 		hashcpy(rev, branch_rev);
 	else
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 412bd32..b51e478 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -544,7 +544,7 @@ static void filter_refs(struct ref **refs, int nr_match, char **match)
 	for (ref = *refs; ref; ref = next) {
 		next = ref->next;
 		if (!memcmp(ref->name, "refs/", 5) &&
-		    check_ref_format(ref->name + 5))
+		    check_refname_format(ref->name + 5, 0))
 			; /* trash */
 		else if (args.fetch_all &&
 			 (!args.depth || prefixcmp(ref->name, "refs/tags/") )) {
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index ae164da..0600efa 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -356,7 +356,7 @@ static const char *update(struct command *cmd)
 	struct ref_lock *lock;
 
 	/* only refs/... are allowed */
-	if (prefixcmp(name, "refs/") || check_ref_format(name + 5)) {
+	if (prefixcmp(name, "refs/") || check_refname_format(name + 5, 0)) {
 		rp_error("refusing to create funny ref '%s' remotely", name);
 		return "funny refname";
 	}
diff --git a/builtin/replace.c b/builtin/replace.c
index fe3a647..517fa10 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -94,7 +94,7 @@ static int replace_object(const char *object_ref, const char *replace_ref,
 		     "refs/replace/%s",
 		     sha1_to_hex(object)) > sizeof(ref) - 1)
 		die("replace ref name too long: %.*s...", 50, ref);
-	if (check_ref_format(ref))
+	if (check_refname_format(ref, 0))
 		die("'%s' is not a valid ref name.", ref);
 
 	if (!resolve_ref(ref, prev, 1, NULL))
diff --git a/builtin/show-ref.c b/builtin/show-ref.c
index 45f0340..fafb6dd 100644
--- a/builtin/show-ref.c
+++ b/builtin/show-ref.c
@@ -145,7 +145,7 @@ static int exclude_existing(const char *match)
 			if (strncmp(ref, match, matchlen))
 				continue;
 		}
-		if (check_ref_format(ref)) {
+		if (check_refname_format(ref, 0)) {
 			warning("ref '%s' ignored", ref);
 			continue;
 		}
diff --git a/builtin/tag.c b/builtin/tag.c
index 667515e..48be745 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -407,12 +407,12 @@ static int parse_msg_arg(const struct option *opt, const char *arg, int unset)
 static int strbuf_check_tag_ref(struct strbuf *sb, const char *name)
 {
 	if (name[0] == '-')
-		return CHECK_REF_FORMAT_ERROR;
+		return -1;
 
 	strbuf_reset(sb);
 	strbuf_addf(sb, "refs/tags/%s", name);
 
-	return check_ref_format(sb->buf);
+	return check_refname_format(sb->buf, 0);
 }
 
 int cmd_tag(int argc, const char **argv, const char *prefix)
diff --git a/connect.c b/connect.c
index ee1d4b4..51990fa 100644
--- a/connect.c
+++ b/connect.c
@@ -22,7 +22,7 @@ static int check_ref(const char *name, int len, unsigned int flags)
 	len -= 5;
 
 	/* REF_NORMAL means that we don't want the magic fake tag refs */
-	if ((flags & REF_NORMAL) && check_ref_format(name) < 0)
+	if ((flags & REF_NORMAL) && check_refname_format(name, 0))
 		return 0;
 
 	/* REF_HEADS means that we want regular branch heads */
diff --git a/environment.c b/environment.c
index e96edcf..8174b70 100644
--- a/environment.c
+++ b/environment.c
@@ -106,7 +106,7 @@ static char *expand_namespace(const char *raw_namespace)
 		if (strcmp((*c)->buf, "/") != 0)
 			strbuf_addf(&buf, "refs/namespaces/%s", (*c)->buf);
 	strbuf_list_free(components);
-	if (check_ref_format(buf.buf) != CHECK_REF_FORMAT_OK)
+	if (check_refname_format(buf.buf, 0))
 		die("bad git namespace path \"%s\"", raw_namespace);
 	strbuf_addch(&buf, '/');
 	return strbuf_detach(&buf, NULL);
diff --git a/fast-import.c b/fast-import.c
index 742e7da..f9347f5 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -722,13 +722,8 @@ static struct branch *new_branch(const char *name)
 
 	if (b)
 		die("Invalid attempt to create duplicate branch: %s", name);
-	switch (check_ref_format(name)) {
-	case 0: break; /* its valid */
-	case CHECK_REF_FORMAT_ONELEVEL:
-		break; /* valid, but too few '/', allow anyway */
-	default:
+	if (check_refname_format(name, REFNAME_ALLOW_ONELEVEL))
 		die("Branch name doesn't conform to GIT standards: %s", name);
-	}
 
 	b = pool_calloc(1, sizeof(struct branch));
 	b->name = pool_strdup(name);
diff --git a/git_remote_helpers/git/git.py b/git_remote_helpers/git/git.py
index a383e6c..007a1bf 100644
--- a/git_remote_helpers/git/git.py
+++ b/git_remote_helpers/git/git.py
@@ -54,7 +54,7 @@ def valid_git_ref (ref_name):
     # The following is a reimplementation of the git check-ref-format
     # command.  The rules were derived from the git check-ref-format(1)
     # manual page.  This code should be replaced by a call to
-    # check_ref_format() in the git library, when such is available.
+    # check_refname_format() in the git library, when such is available.
     if ref_name.endswith('/') or \
        ref_name.startswith('.') or \
        ref_name.count('/.') or \
diff --git a/notes-merge.c b/notes-merge.c
index e1aaf43..3bbcc9d 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -570,7 +570,8 @@ int notes_merge(struct notes_merge_options *o,
 	/* Dereference o->local_ref into local_sha1 */
 	if (!resolve_ref(o->local_ref, local_sha1, 0, NULL))
 		die("Failed to resolve local notes ref '%s'", o->local_ref);
-	else if (!check_ref_format(o->local_ref) && is_null_sha1(local_sha1))
+	else if (!check_refname_format(o->local_ref, 0) &&
+		is_null_sha1(local_sha1))
 		local = NULL; /* local_sha1 == null_sha1 indicates unborn ref */
 	else if (!(local = lookup_commit_reference(local_sha1)))
 		die("Could not parse local commit %s (%s)",
@@ -583,7 +584,7 @@ int notes_merge(struct notes_merge_options *o,
 		 * Failed to get remote_sha1. If o->remote_ref looks like an
 		 * unborn ref, perform the merge using an empty notes tree.
 		 */
-		if (!check_ref_format(o->remote_ref)) {
+		if (!check_refname_format(o->remote_ref, 0)) {
 			hashclr(remote_sha1);
 			remote = NULL;
 		} else {
diff --git a/pack-refs.c b/pack-refs.c
index 1290570..23bbd00 100644
--- a/pack-refs.c
+++ b/pack-refs.c
@@ -72,7 +72,7 @@ static void try_remove_empty_parents(char *name)
 	for (i = 0; i < 2; i++) { /* refs/{heads,tags,...}/ */
 		while (*p && *p != '/')
 			p++;
-		/* tolerate duplicate slashes; see check_ref_format() */
+		/* tolerate duplicate slashes; see check_refname_format() */
 		while (*p == '/')
 			p++;
 	}
diff --git a/refs.c b/refs.c
index fd29d89..aaa8730 100644
--- a/refs.c
+++ b/refs.c
@@ -872,10 +872,9 @@ static inline int bad_ref_char(int ch)
 	return 0;
 }
 
-int check_ref_format(const char *ref)
+int check_refname_format(const char *ref, int flags)
 {
 	int ch, level, last;
-	int ret = CHECK_REF_FORMAT_OK;
 	const char *cp = ref;
 
 	level = 0;
@@ -884,41 +883,42 @@ int check_ref_format(const char *ref)
 			; /* tolerate duplicated slashes */
 		if (!ch)
 			/* should not end with slashes */
-			return CHECK_REF_FORMAT_ERROR;
+			return -1;
 
 		/* we are at the beginning of the path component */
 		if (ch == '.')
-			return CHECK_REF_FORMAT_ERROR;
+			return -1;
 		if (bad_ref_char(ch)) {
-			if (ch == '*' && (!*cp || *cp == '/') &&
-			    ret == CHECK_REF_FORMAT_OK)
-				ret = CHECK_REF_FORMAT_WILDCARD;
+			if ((flags & REFNAME_REFSPEC_PATTERN) && ch == '*' &&
+				(!*cp || *cp == '/'))
+				/* Accept one wildcard as a full refname component. */
+				flags &= ~REFNAME_REFSPEC_PATTERN;
 			else
-				return CHECK_REF_FORMAT_ERROR;
+				return -1;
 		}
 
 		last = ch;
 		/* scan the rest of the path component */
 		while ((ch = *cp++) != 0) {
 			if (bad_ref_char(ch))
-				return CHECK_REF_FORMAT_ERROR;
+				return -1;
 			if (ch == '/')
 				break;
 			if (last == '.' && ch == '.')
-				return CHECK_REF_FORMAT_ERROR;
+				return -1;
 			if (last == '@' && ch == '{')
-				return CHECK_REF_FORMAT_ERROR;
+				return -1;
 			last = ch;
 		}
 		level++;
 		if (!ch) {
 			if (ref <= cp - 2 && cp[-2] == '.')
-				return CHECK_REF_FORMAT_ERROR;
-			if (level < 2)
-				return CHECK_REF_FORMAT_ONELEVEL;
+				return -1;
+			if (level < 2 && !(flags & REFNAME_ALLOW_ONELEVEL))
+				return -1;
 			if (has_extension(ref, ".lock"))
-				return CHECK_REF_FORMAT_ERROR;
-			return ret;
+				return -1;
+			return 0;
 		}
 	}
 }
@@ -1103,7 +1103,7 @@ static struct ref_lock *lock_ref_sha1_basic(const char *ref, const unsigned char
 struct ref_lock *lock_ref_sha1(const char *ref, const unsigned char *old_sha1)
 {
 	char refpath[PATH_MAX];
-	if (check_ref_format(ref))
+	if (check_refname_format(ref, 0))
 		return NULL;
 	strcpy(refpath, mkpath("refs/%s", ref));
 	return lock_ref_sha1_basic(refpath, old_sha1, 0, NULL);
@@ -1111,13 +1111,9 @@ struct ref_lock *lock_ref_sha1(const char *ref, const unsigned char *old_sha1)
 
 struct ref_lock *lock_any_ref_for_update(const char *ref, const unsigned char *old_sha1, int flags)
 {
-	switch (check_ref_format(ref)) {
-	default:
+	if (check_refname_format(ref, REFNAME_ALLOW_ONELEVEL))
 		return NULL;
-	case 0:
-	case CHECK_REF_FORMAT_ONELEVEL:
-		return lock_ref_sha1_basic(ref, old_sha1, flags, NULL);
-	}
+	return lock_ref_sha1_basic(ref, old_sha1, flags, NULL);
 }
 
 static struct lock_file packlock;
diff --git a/refs.h b/refs.h
index dfb086e..48540c0 100644
--- a/refs.h
+++ b/refs.h
@@ -97,11 +97,18 @@ int for_each_recent_reflog_ent(const char *ref, each_reflog_ent_fn fn, long, voi
  */
 extern int for_each_reflog(each_ref_fn, void *);
 
-#define CHECK_REF_FORMAT_OK 0
-#define CHECK_REF_FORMAT_ERROR (-1)
-#define CHECK_REF_FORMAT_ONELEVEL (-2)
-#define CHECK_REF_FORMAT_WILDCARD (-3)
-extern int check_ref_format(const char *target);
+#define REFNAME_ALLOW_ONELEVEL 1
+#define REFNAME_REFSPEC_PATTERN 2
+
+/*
+ * Return 0 iff ref has the correct format for a refname according to
+ * the rules described in Documentation/git-check-ref-format.txt.  If
+ * REFNAME_ALLOW_ONELEVEL is set in flags, then accept one-level
+ * reference names.  If REFNAME_REFSPEC_PATTERN is set in flags, then
+ * allow a "*" wildcard character in place of one of the name
+ * components.
+ */
+extern int check_refname_format(const char *ref, int flags);
 
 extern const char *prettify_refname(const char *refname);
 extern char *shorten_unambiguous_ref(const char *ref, int strict);
diff --git a/remote.c b/remote.c
index b8ecfa5..6fcf809 100644
--- a/remote.c
+++ b/remote.c
@@ -493,23 +493,6 @@ static void read_config(void)
 }
 
 /*
- * We need to make sure the remote-tracking branches are well formed, but a
- * wildcard refspec in "struct refspec" must have a trailing slash. We
- * temporarily drop the trailing '/' while calling check_ref_format(),
- * and put it back.  The caller knows that a CHECK_REF_FORMAT_ONELEVEL
- * error return is Ok for a wildcard refspec.
- */
-static int verify_refname(char *name, int is_glob)
-{
-	int result;
-
-	result = check_ref_format(name);
-	if (is_glob && result == CHECK_REF_FORMAT_WILDCARD)
-		result = CHECK_REF_FORMAT_OK;
-	return result;
-}
-
-/*
  * This function frees a refspec array.
  * Warning: code paths should be checked to ensure that the src
  *          and dst pointers are always freeable pointers as well
@@ -532,13 +515,13 @@ static void free_refspecs(struct refspec *refspec, int nr_refspec)
 static struct refspec *parse_refspec_internal(int nr_refspec, const char **refspec, int fetch, int verify)
 {
 	int i;
-	int st;
 	struct refspec *rs = xcalloc(sizeof(*rs), nr_refspec);
 
 	for (i = 0; i < nr_refspec; i++) {
 		size_t llen;
 		int is_glob;
 		const char *lhs, *rhs;
+		int flags;
 
 		is_glob = 0;
 
@@ -576,6 +559,7 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 
 		rs[i].pattern = is_glob;
 		rs[i].src = xstrndup(lhs, llen);
+		flags = REFNAME_ALLOW_ONELEVEL | (is_glob ? REFNAME_REFSPEC_PATTERN : 0);
 
 		if (fetch) {
 			/*
@@ -585,26 +569,20 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 			 */
 			if (!*rs[i].src)
 				; /* empty is ok */
-			else {
-				st = verify_refname(rs[i].src, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
-					goto invalid;
-			}
+			else if (check_refname_format(rs[i].src, flags))
+				goto invalid;
 			/*
 			 * RHS
 			 * - missing is ok, and is same as empty.
 			 * - empty is ok; it means not to store.
 			 * - otherwise it must be a valid looking ref.
 			 */
-			if (!rs[i].dst) {
+			if (!rs[i].dst)
 				; /* ok */
-			} else if (!*rs[i].dst) {
+			else if (!*rs[i].dst)
 				; /* ok */
-			} else {
-				st = verify_refname(rs[i].dst, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
-					goto invalid;
-			}
+			else if (check_refname_format(rs[i].dst, flags))
+				goto invalid;
 		} else {
 			/*
 			 * LHS
@@ -616,8 +594,7 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 			if (!*rs[i].src)
 				; /* empty is ok */
 			else if (is_glob) {
-				st = verify_refname(rs[i].src, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
+				if (check_refname_format(rs[i].src, flags))
 					goto invalid;
 			}
 			else
@@ -630,14 +607,12 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 			 * - otherwise it must be a valid looking ref.
 			 */
 			if (!rs[i].dst) {
-				st = verify_refname(rs[i].src, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
+				if (check_refname_format(rs[i].src, flags))
 					goto invalid;
 			} else if (!*rs[i].dst) {
 				goto invalid;
 			} else {
-				st = verify_refname(rs[i].dst, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
+				if (check_refname_format(rs[i].dst, flags))
 					goto invalid;
 			}
 		}
@@ -1427,8 +1402,8 @@ int get_fetch_map(const struct ref *remote_refs,
 
 	for (rmp = &ref_map; *rmp; ) {
 		if ((*rmp)->peer_ref) {
-			int st = check_ref_format((*rmp)->peer_ref->name + 5);
-			if (st && st != CHECK_REF_FORMAT_ONELEVEL) {
+			if (check_refname_format((*rmp)->peer_ref->name + 5,
+				REFNAME_ALLOW_ONELEVEL)) {
 				struct ref *ignore = *rmp;
 				error("* Ignoring funny ref '%s' locally",
 				      (*rmp)->peer_ref->name);
@@ -1620,7 +1595,7 @@ static int one_local_ref(const char *refname, const unsigned char *sha1, int fla
 	int len;
 
 	/* we already know it starts with refs/ to get here */
-	if (check_ref_format(refname + 5))
+	if (check_refname_format(refname + 5, 0))
 		return 0;
 
 	len = strlen(refname) + 1;
diff --git a/sha1_name.c b/sha1_name.c
index ff5992a..143fd97 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -972,9 +972,9 @@ int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
 {
 	strbuf_branchname(sb, name);
 	if (name[0] == '-')
-		return CHECK_REF_FORMAT_ERROR;
+		return -1;
 	strbuf_splice(sb, 0, 0, "refs/heads/", 11);
-	return check_ref_format(sb->buf);
+	return check_refname_format(sb->buf, 0);
 }
 
 /*
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index f551eef..1cad88f 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -87,11 +87,7 @@ valid_ref "$ref" '--refspec-pattern --allow-onelevel'
 
 ref='*'
 invalid_ref "$ref"
-
-#invalid_ref "$ref" --allow-onelevel
-test_expect_failure "ref name '$ref' is invalid with options --allow-onelevel" \
-	"test_must_fail git check-ref-format --allow-onelevel '$ref'"
-
+invalid_ref "$ref" --allow-onelevel
 invalid_ref "$ref" --refspec-pattern
 valid_ref "$ref" '--refspec-pattern --allow-onelevel'
 
diff --git a/transport.c b/transport.c
index fa279d5..feb2ff5 100644
--- a/transport.c
+++ b/transport.c
@@ -754,18 +754,10 @@ void transport_verify_remote_names(int nr_heads, const char **heads)
 			continue;
 
 		remote = remote ? (remote + 1) : local;
-		switch (check_ref_format(remote)) {
-		case 0: /* ok */
-		case CHECK_REF_FORMAT_ONELEVEL:
-			/* ok but a single level -- that is fine for
-			 * a match pattern.
-			 */
-		case CHECK_REF_FORMAT_WILDCARD:
-			/* ok but ends with a pattern-match character */
-			continue;
-		}
-		die("remote part of refspec is not a valid name in %s",
-		    heads[i]);
+		if (check_refname_format(remote,
+				REFNAME_ALLOW_ONELEVEL|REFNAME_REFSPEC_PATTERN))
+			die("remote part of refspec is not a valid name in %s",
+				heads[i]);
 	}
 }
 
diff --git a/walker.c b/walker.c
index dce7128..be389dc 100644
--- a/walker.c
+++ b/walker.c
@@ -190,7 +190,7 @@ static int interpret_target(struct walker *walker, char *target, unsigned char *
 {
 	if (!get_sha1_hex(target, sha1))
 		return 0;
-	if (!check_ref_format(target)) {
+	if (!check_refname_format(target, 0)) {
 		struct ref *ref = alloc_ref(target);
 		if (!walker->fetch_ref(walker, ref)) {
 			hashcpy(sha1, ref->old_sha1);
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 05/22] Refactor check_refname_format()
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (3 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 04/22] Change check_ref_format() to take a flags argument Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 06/22] Do not allow ".lock" at the end of any refname component Michael Haggerty
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

Among other things, extract a function check_refname_component().

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |   95 +++++++++++++++++++++++++++++++++++++---------------------------
 1 files changed, 55 insertions(+), 40 deletions(-)

diff --git a/refs.c b/refs.c
index aaa8730..5259724 100644
--- a/refs.c
+++ b/refs.c
@@ -872,55 +872,70 @@ static inline int bad_ref_char(int ch)
 	return 0;
 }
 
+/*
+ * Try to read one refname component from the front of ref.  Return
+ * the length of the component found, or -1 if the component is not
+ * legal.
+ */
+static int check_refname_component(const char *ref)
+{
+	const char *cp;
+	char last = '\0';
+
+	for (cp = ref; ; cp++) {
+		char ch = *cp;
+		if (ch == '\0' || ch == '/')
+			break;
+		if (bad_ref_char(ch))
+			return -1; /* Illegal character in refname. */
+		if (last == '.' && ch == '.')
+			return -1; /* Refname contains "..". */
+		if (last == '@' && ch == '{')
+			return -1; /* Refname contains "@{". */
+		last = ch;
+	}
+	if (cp == ref)
+		return -1; /* Component has zero length. */
+	if (ref[0] == '.')
+		return -1; /* Component starts with '.'. */
+	return cp - ref;
+}
+
 int check_refname_format(const char *ref, int flags)
 {
-	int ch, level, last;
-	const char *cp = ref;
+	int component_len, component_count = 0;
 
-	level = 0;
 	while (1) {
-		while ((ch = *cp++) == '/')
-			; /* tolerate duplicated slashes */
-		if (!ch)
-			/* should not end with slashes */
-			return -1;
-
-		/* we are at the beginning of the path component */
-		if (ch == '.')
-			return -1;
-		if (bad_ref_char(ch)) {
-			if ((flags & REFNAME_REFSPEC_PATTERN) && ch == '*' &&
-				(!*cp || *cp == '/'))
+		while (*ref == '/')
+			ref++; /* tolerate leading and repeated slashes */
+
+		/* We are at the start of a path component. */
+		component_len = check_refname_component(ref);
+		if (component_len < 0) {
+			if ((flags & REFNAME_REFSPEC_PATTERN) &&
+					ref[0] == '*' &&
+					(ref[1] == '\0' || ref[1] == '/')) {
 				/* Accept one wildcard as a full refname component. */
 				flags &= ~REFNAME_REFSPEC_PATTERN;
-			else
-				return -1;
-		}
-
-		last = ch;
-		/* scan the rest of the path component */
-		while ((ch = *cp++) != 0) {
-			if (bad_ref_char(ch))
-				return -1;
-			if (ch == '/')
-				break;
-			if (last == '.' && ch == '.')
-				return -1;
-			if (last == '@' && ch == '{')
-				return -1;
-			last = ch;
-		}
-		level++;
-		if (!ch) {
-			if (ref <= cp - 2 && cp[-2] == '.')
-				return -1;
-			if (level < 2 && !(flags & REFNAME_ALLOW_ONELEVEL))
-				return -1;
-			if (has_extension(ref, ".lock"))
+				component_len = 1;
+			} else {
 				return -1;
-			return 0;
+			}
 		}
+		component_count++;
+		if (ref[component_len] == '\0')
+			break;
+		/* Skip to next component. */
+		ref += component_len + 1;
 	}
+
+	if (ref[component_len - 1] == '.')
+		return -1; /* Refname ends with '.'. */
+	if (component_len >= 5 && !memcmp(&ref[component_len - 5], ".lock", 5))
+		return -1; /* Refname ends with ".lock". */
+	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
+		return -1; /* Refname has only one component. */
+	return 0;
 }
 
 const char *prettify_refname(const char *name)
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 06/22] Do not allow ".lock" at the end of any refname component
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (4 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 05/22] Refactor check_refname_format() Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 07/22] Make collapse_slashes() allocate memory for its result Michael Haggerty
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

Allowing any refname component to end with ".lock" is looking for
trouble; for example,

    $ git br foo.lock/bar
    $ git br foo
    fatal: Unable to create '[...]/.git/refs/heads/foo.lock': File exists.

Therefore, do not allow any refname component to end with ".lock".

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---

This change was discussed on the mailing list [1].  It is regrettable
that we can't change the name of the lock files to something that
cannot appear in a reference name (like .refname.lock), but such a
change would cause problems if two versions of git are simultaneously
accessing the same repository.

[1] http://thread.gmane.org/gmane.comp.version-control.git/181051/focus=181069

 Documentation/git-check-ref-format.txt |    4 +---
 refs.c                                 |    4 ++--
 t/t1402-check-ref-format.sh            |    8 ++++----
 3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-check-ref-format.txt b/Documentation/git-check-ref-format.txt
index dcb8cc3..9114751 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -28,7 +28,7 @@ git imposes the following rules on how references are named:
 
 . They can include slash `/` for hierarchical (directory)
   grouping, but no slash-separated component can begin with a
-  dot `.`.
+  dot `.` or end with the sequence `.lock`.
 
 . They must contain at least one `/`. This enforces the presence of a
   category like `heads/`, `tags/` etc. but the actual names are not
@@ -47,8 +47,6 @@ git imposes the following rules on how references are named:
 
 . They cannot end with a slash `/` nor a dot `.`.
 
-. They cannot end with the sequence `.lock`.
-
 . They cannot contain a sequence `@{`.
 
 . They cannot contain a `\`.
diff --git a/refs.c b/refs.c
index 5259724..5a0bd0f 100644
--- a/refs.c
+++ b/refs.c
@@ -898,6 +898,8 @@ static int check_refname_component(const char *ref)
 		return -1; /* Component has zero length. */
 	if (ref[0] == '.')
 		return -1; /* Component starts with '.'. */
+	if (cp - ref >= 5 && !memcmp(cp - 5, ".lock", 5))
+		return -1; /* Refname ends with ".lock". */
 	return cp - ref;
 }
 
@@ -931,8 +933,6 @@ int check_refname_format(const char *ref, int flags)
 
 	if (ref[component_len - 1] == '.')
 		return -1; /* Refname ends with '.'. */
-	if (component_len >= 5 && !memcmp(&ref[component_len - 5], ".lock", 5))
-		return -1; /* Refname ends with ".lock". */
 	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
 		return -1; /* Refname has only one component. */
 	return 0;
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index 1cad88f..419788f 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -43,8 +43,8 @@ invalid_ref 'heads/foo?bar'
 valid_ref 'foo./bar'
 invalid_ref 'heads/foo.lock'
 invalid_ref 'heads///foo.lock'
-valid_ref 'foo.lock/bar'
-valid_ref 'foo.lock///bar'
+invalid_ref 'foo.lock/bar'
+invalid_ref 'foo.lock///bar'
 valid_ref 'heads/foo@bar'
 invalid_ref 'heads/v@{ation'
 invalid_ref 'heads/foo\bar'
@@ -160,7 +160,7 @@ invalid_ref_normalized 'heads/./foo'
 invalid_ref_normalized 'heads\foo'
 invalid_ref_normalized 'heads/foo.lock'
 invalid_ref_normalized 'heads///foo.lock'
-valid_ref_normalized 'foo.lock/bar' 'foo.lock/bar'
-valid_ref_normalized 'foo.lock///bar' 'foo.lock/bar'
+invalid_ref_normalized 'foo.lock/bar'
+invalid_ref_normalized 'foo.lock///bar'
 
 test_done
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 07/22] Make collapse_slashes() allocate memory for its result
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (5 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 06/22] Do not allow ".lock" at the end of any refname component Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 08/22] Inline function refname_format_print() Michael Haggerty
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

This will make upcoming changes a tiny bit easier.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/check-ref-format.c |   19 ++++++++++---------
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 8f41696..989ee5c 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -12,25 +12,28 @@ static const char builtin_check_ref_format_usage[] =
 "   or: git check-ref-format --branch <branchname-shorthand>";
 
 /*
- * Remove leading slashes and replace each run of adjacent slashes in
- * src with a single slash, and write the result to dst.
+ * Return a copy of refname but with leading slashes removed and runs
+ * of adjacent slashes replaced with single slashes.
  *
  * This function is similar to normalize_path_copy(), but stripped down
  * to meet check_ref_format's simpler needs.
  */
-static void collapse_slashes(char *dst, const char *src)
+static char *collapse_slashes(const char *refname)
 {
+	char *ret = xmalloc(strlen(refname) + 1);
 	char ch;
 	char prev = '/';
+	char *cp = ret;
 
-	while ((ch = *src++) != '\0') {
+	while ((ch = *refname++) != '\0') {
 		if (prev == '/' && ch == prev)
 			continue;
 
-		*dst++ = ch;
+		*cp++ = ch;
 		prev = ch;
 	}
-	*dst = '\0';
+	*cp = '\0';
+	return ret;
 }
 
 static int check_ref_format_branch(const char *arg)
@@ -47,9 +50,7 @@ static int check_ref_format_branch(const char *arg)
 
 static void refname_format_print(const char *arg)
 {
-	char *refname = xmalloc(strlen(arg) + 1);
-
-	collapse_slashes(refname, arg);
+	char *refname = collapse_slashes(arg);
 	printf("%s\n", refname);
 }
 
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 08/22] Inline function refname_format_print()
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (6 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 07/22] Make collapse_slashes() allocate memory for its result Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 09/22] Change check_refname_format() to reject unnormalized refnames Michael Haggerty
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

Soon we will make printing independent of collapsing.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/check-ref-format.c |   16 +++++++---------
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 989ee5c..f5df9aa 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -48,17 +48,12 @@ static int check_ref_format_branch(const char *arg)
 	return 0;
 }
 
-static void refname_format_print(const char *arg)
-{
-	char *refname = collapse_slashes(arg);
-	printf("%s\n", refname);
-}
-
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 {
 	int i;
 	int print = 0;
 	int flags = 0;
+	const char *refname;
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(builtin_check_ref_format_usage);
@@ -81,11 +76,14 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 	if (! (i == argc - 1))
 		usage(builtin_check_ref_format_usage);
 
-	if (check_refname_format(argv[i], flags))
+	refname = argv[i];
+	if (check_refname_format(refname, flags))
 		return 1;
 
-	if (print)
-		refname_format_print(argv[i]);
+	if (print) {
+		refname = collapse_slashes(refname);
+		printf("%s\n", refname);
+	}
 
 	return 0;
 }
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 09/22] Change check_refname_format() to reject unnormalized refnames
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (7 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 08/22] Inline function refname_format_print() Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 10/22] resolve_ref(): explicitly fail if a symlink is not readable Michael Haggerty
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

Since much of the infrastructure does not work correctly with
unnormalized refnames, change check_refname_format() to reject them.

Similarly, change "git check-ref-format" to reject unnormalized
refnames by default.  But add an option --normalize, which causes "git
check-ref-format" to normalize the refname before checking its format,
and print the normalized refname.  This is exactly the behavior of the
old --print option, which is retained but deprecated.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 Documentation/git-check-ref-format.txt |   26 ++++++++++++++++++--------
 builtin/check-ref-format.c             |   15 +++++++--------
 refs.c                                 |    3 ---
 refs.h                                 |    2 +-
 t/t1402-check-ref-format.sh            |   31 +++++++++++++++++++++++--------
 5 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/Documentation/git-check-ref-format.txt b/Documentation/git-check-ref-format.txt
index 9114751..103e7b1 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -8,8 +8,9 @@ git-check-ref-format - Ensures that a reference name is well formed
 SYNOPSIS
 --------
 [verse]
-'git check-ref-format' [--print]
-       [--[no-]allow-onelevel] [--refspec-pattern] <refname>
+'git check-ref-format' [--normalize]
+       [--[no-]allow-onelevel] [--refspec-pattern]
+       <refname>
 'git check-ref-format' --branch <branchname-shorthand>
 
 DESCRIPTION
@@ -45,7 +46,11 @@ git imposes the following rules on how references are named:
   bracket `[` anywhere.  See the `--refspec-pattern` option below for
   an exception to this rule.
 
-. They cannot end with a slash `/` nor a dot `.`.
+. They cannot begin or end with a slash `/` or contain multiple
+  consecutive slashes (see the `--normalize` option below for an
+  exception to this rule)
+
+. They cannot end with a dot `.`.
 
 . They cannot contain a sequence `@{`.
 
@@ -70,10 +75,6 @@ reference name expressions (see linkgit:gitrevisions[7]):
 
 . at-open-brace `@{` is used as a notation to access a reflog entry.
 
-With the `--print` option, if 'refname' is acceptable, it prints the
-canonicalized name of a hypothetical reference with that name.  That is,
-it prints 'refname' with any extra `/` characters removed.
-
 With the `--branch` option, it expands the ``previous branch syntax''
 `@{-n}`.  For example, `@{-1}` is a way to refer the last branch you
 were on.  This option should be used by porcelains to accept this
@@ -95,6 +96,15 @@ OPTIONS
 	in place of a one full pathname component (e.g.,
 	`foo/{asterisk}/bar` but not `foo/bar{asterisk}`).
 
+--normalize::
+	Normalize 'refname' by removing any leading slash (`/`)
+	characters and collapsing runs of adjacent slashes between
+	name components into a single slash.  Iff the normalized
+	refname is valid then print it to standard output and exit
+	with a status of 0.  (`--print` is a deprecated way to spell
+	`--normalize`.)
+
+
 EXAMPLES
 --------
 
@@ -107,7 +117,7 @@ $ git check-ref-format --branch @{-1}
 * Determine the reference name to use for a new branch:
 +
 ------------
-$ ref=$(git check-ref-format --print "refs/heads/$newbranch") ||
+$ ref=$(git check-ref-format --normalize "refs/heads/$newbranch") ||
 die "we do not like '$newbranch' as a branch name."
 ------------
 
diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index f5df9aa..28a7320 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -8,7 +8,7 @@
 #include "strbuf.h"
 
 static const char builtin_check_ref_format_usage[] =
-"git check-ref-format [--print] [options] <refname>\n"
+"git check-ref-format [--normalize] [options] <refname>\n"
 "   or: git check-ref-format --branch <branchname-shorthand>";
 
 /*
@@ -51,7 +51,7 @@ static int check_ref_format_branch(const char *arg)
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 {
 	int i;
-	int print = 0;
+	int normalize = 0;
 	int flags = 0;
 	const char *refname;
 
@@ -62,8 +62,8 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 		return check_ref_format_branch(argv[2]);
 
 	for (i = 1; i < argc && argv[i][0] == '-'; i++) {
-		if (!strcmp(argv[i], "--print"))
-			print = 1;
+		if (!strcmp(argv[i], "--normalize") || !strcmp(argv[i], "--print"))
+			normalize = 1;
 		else if (!strcmp(argv[i], "--allow-onelevel"))
 			flags |= REFNAME_ALLOW_ONELEVEL;
 		else if (!strcmp(argv[i], "--no-allow-onelevel"))
@@ -77,13 +77,12 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 		usage(builtin_check_ref_format_usage);
 
 	refname = argv[i];
+	if (normalize)
+		refname = collapse_slashes(refname);
 	if (check_refname_format(refname, flags))
 		return 1;
-
-	if (print) {
-		refname = collapse_slashes(refname);
+	if (normalize)
 		printf("%s\n", refname);
-	}
 
 	return 0;
 }
diff --git a/refs.c b/refs.c
index 5a0bd0f..d2aac24 100644
--- a/refs.c
+++ b/refs.c
@@ -908,9 +908,6 @@ int check_refname_format(const char *ref, int flags)
 	int component_len, component_count = 0;
 
 	while (1) {
-		while (*ref == '/')
-			ref++; /* tolerate leading and repeated slashes */
-
 		/* We are at the start of a path component. */
 		component_len = check_refname_component(ref);
 		if (component_len < 0) {
diff --git a/refs.h b/refs.h
index 48540c0..b0da5fc 100644
--- a/refs.h
+++ b/refs.h
@@ -106,7 +106,7 @@ extern int for_each_reflog(each_ref_fn, void *);
  * REFNAME_ALLOW_ONELEVEL is set in flags, then accept one-level
  * reference names.  If REFNAME_REFSPEC_PATTERN is set in flags, then
  * allow a "*" wildcard character in place of one of the name
- * components.
+ * components.  No leading or repeated slashes are accepted.
  */
 extern int check_refname_format(const char *ref, int flags);
 
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index 419788f..710fcca 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -28,11 +28,17 @@ invalid_ref() {
 invalid_ref ''
 invalid_ref '/'
 invalid_ref '/' --allow-onelevel
+invalid_ref '/' --normalize
+invalid_ref '/' '--allow-onelevel --normalize'
 valid_ref 'foo/bar/baz'
-valid_ref 'refs///heads/foo'
+valid_ref 'foo/bar/baz' --normalize
+invalid_ref 'refs///heads/foo'
+valid_ref 'refs///heads/foo' --normalize
 invalid_ref 'heads/foo/'
-valid_ref '/heads/foo'
-valid_ref '///heads/foo'
+invalid_ref '/heads/foo'
+valid_ref '/heads/foo' --normalize
+invalid_ref '///heads/foo'
+valid_ref '///heads/foo' --normalize
 invalid_ref './foo'
 invalid_ref './foo/bar'
 invalid_ref 'foo/./bar'
@@ -60,12 +66,15 @@ invalid_ref "$ref"
 valid_ref "$ref" --allow-onelevel
 invalid_ref "$ref" --refspec-pattern
 valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+invalid_ref "$ref" --normalize
+valid_ref "$ref" '--allow-onelevel --normalize'
 
 ref='foo/bar'
 valid_ref "$ref"
 valid_ref "$ref" --allow-onelevel
 valid_ref "$ref" --refspec-pattern
 valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+valid_ref "$ref" --normalize
 
 ref='foo/*'
 invalid_ref "$ref"
@@ -78,6 +87,8 @@ invalid_ref "$ref"
 invalid_ref "$ref" --allow-onelevel
 valid_ref "$ref" --refspec-pattern
 valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+invalid_ref "$ref" --normalize
+valid_ref "$ref" '--refspec-pattern --normalize'
 
 ref='foo/*/bar'
 invalid_ref "$ref"
@@ -105,9 +116,13 @@ invalid_ref "$ref" '--refspec-pattern --allow-onelevel'
 
 ref='/foo'
 invalid_ref "$ref"
-valid_ref "$ref" --allow-onelevel
+invalid_ref "$ref" --allow-onelevel
 invalid_ref "$ref" --refspec-pattern
-valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+invalid_ref "$ref" '--refspec-pattern --allow-onelevel'
+invalid_ref "$ref" --normalize
+valid_ref "$ref" '--allow-onelevel --normalize'
+invalid_ref "$ref" '--refspec-pattern --normalize'
+valid_ref "$ref" '--refspec-pattern --allow-onelevel --normalize'
 
 test_expect_success "check-ref-format --branch @{-1}" '
 	T=$(git write-tree) &&
@@ -141,12 +156,12 @@ test_expect_success 'check-ref-format --branch from subdir' '
 
 valid_ref_normalized() {
 	test_expect_success "ref name '$1' simplifies to '$2'" "
-		refname=\$(git check-ref-format --print '$1') &&
+		refname=\$(git check-ref-format --normalize '$1') &&
 		test \"\$refname\" = '$2'"
 }
 invalid_ref_normalized() {
-	test_expect_success "check-ref-format --print rejects '$1'" "
-		test_must_fail git check-ref-format --print '$1'"
+	test_expect_success "check-ref-format --normalize rejects '$1'" "
+		test_must_fail git check-ref-format --normalize '$1'"
 }
 
 valid_ref_normalized 'heads/foo' 'heads/foo'
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 10/22] resolve_ref(): explicitly fail if a symlink is not readable
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (8 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 09/22] Change check_refname_format() to reject unnormalized refnames Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 11/22] resolve_ref(): use prefixcmp() Michael Haggerty
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

Previously the failure came later, after a few steps in which the
length was treated like the actual length of a string.  Even though
the old code gave the same answers, it was somewhat misleading.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/refs.c b/refs.c
index d2aac24..c51fd45 100644
--- a/refs.c
+++ b/refs.c
@@ -518,6 +518,8 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 		/* Follow "normalized" - ie "refs/.." symlinks by hand */
 		if (S_ISLNK(st.st_mode)) {
 			len = readlink(path, buffer, sizeof(buffer)-1);
+			if (len < 0)
+				return NULL;
 			if (len >= 5 && !memcmp("refs/", buffer, 5)) {
 				buffer[len] = 0;
 				strcpy(ref_buffer, buffer);
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 11/22] resolve_ref(): use prefixcmp()
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (9 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 10/22] resolve_ref(): explicitly fail if a symlink is not readable Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 12/22] resolve_ref(): only follow a symlink that contains a valid, normalized refname Michael Haggerty
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

Terminate the link content string one step earlier, allowing
prefixcmp() to be used instead of the less clear memcmp().

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/refs.c b/refs.c
index c51fd45..da9737f 100644
--- a/refs.c
+++ b/refs.c
@@ -520,8 +520,8 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 			len = readlink(path, buffer, sizeof(buffer)-1);
 			if (len < 0)
 				return NULL;
-			if (len >= 5 && !memcmp("refs/", buffer, 5)) {
-				buffer[len] = 0;
+			buffer[len] = 0;
+			if (!prefixcmp(buffer, "refs/")) {
 				strcpy(ref_buffer, buffer);
 				ref = ref_buffer;
 				if (flag)
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 12/22] resolve_ref(): only follow a symlink that contains a valid, normalized refname
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (10 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 11/22] resolve_ref(): use prefixcmp() Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 13/22] resolve_ref(): turn buffer into a proper string as soon as possible Michael Haggerty
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty


Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/refs.c b/refs.c
index da9737f..8f0b871 100644
--- a/refs.c
+++ b/refs.c
@@ -521,7 +521,8 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 			if (len < 0)
 				return NULL;
 			buffer[len] = 0;
-			if (!prefixcmp(buffer, "refs/")) {
+			if (!prefixcmp(buffer, "refs/") &&
+					!check_refname_format(buffer, 0)) {
 				strcpy(ref_buffer, buffer);
 				ref = ref_buffer;
 				if (flag)
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 13/22] resolve_ref(): turn buffer into a proper string as soon as possible
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (11 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 12/22] resolve_ref(): only follow a symlink that contains a valid, normalized refname Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-23  8:17   ` Thomas Rast
  2011-09-15 21:10 ` [PATCH v3 14/22] resolve_ref(): extract a function get_packed_ref() Michael Haggerty
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

Immediately strip off trailing spaces and null-terminate the string
holding the contents of the reference file; this allows the use of
string functions and avoids the need to keep separate track of the
string's length.  (get_sha1_hex() fails automatically if the string is
too short.)

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/refs.c b/refs.c
index 8f0b871..79ab0eb 100644
--- a/refs.c
+++ b/refs.c
@@ -546,25 +546,25 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 			return NULL;
 		len = read_in_full(fd, buffer, sizeof(buffer)-1);
 		close(fd);
+		if (len < 0)
+			return NULL;
+		while (len && isspace(buffer[len-1]))
+			len--;
+		buffer[len] = '\0';
 
 		/*
 		 * Is it a symbolic ref?
 		 */
-		if (len < 4 || memcmp("ref:", buffer, 4))
+		if (prefixcmp(buffer, "ref:"))
 			break;
 		buf = buffer + 4;
-		len -= 4;
-		while (len && isspace(*buf))
-			buf++, len--;
-		while (len && isspace(buf[len-1]))
-			len--;
-		buf[len] = 0;
-		memcpy(ref_buffer, buf, len + 1);
-		ref = ref_buffer;
+		while (isspace(*buf))
+			buf++;
+		ref = strcpy(ref_buffer, buf);
 		if (flag)
 			*flag |= REF_ISSYMREF;
 	}
-	if (len < 40 || get_sha1_hex(buffer, sha1))
+	if (get_sha1_hex(buffer, sha1))
 		return NULL;
 	return ref;
 }
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 14/22] resolve_ref(): extract a function get_packed_ref()
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (12 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 13/22] resolve_ref(): turn buffer into a proper string as soon as possible Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 15/22] resolve_ref(): do not follow incorrectly-formatted symbolic refs Michael Haggerty
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

Making it a function and giving it a name makes the code clearer.  I
also have a strong suspicion that the function will find other uses in
the future.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |   47 ++++++++++++++++++++++++++++++++++-------------
 1 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/refs.c b/refs.c
index 79ab0eb..473f7f6 100644
--- a/refs.c
+++ b/refs.c
@@ -466,6 +466,23 @@ int resolve_gitlink_ref(const char *path, const char *refname, unsigned char *re
 }
 
 /*
+ * Try to read ref from the packed references.  On success, set sha1
+ * and return 0; otherwise, return -1.
+ */
+static int get_packed_ref(const char *ref, unsigned char *sha1)
+{
+	struct ref_list *list = get_packed_refs(NULL);
+	while (list) {
+		if (!strcmp(ref, list->name)) {
+			hashcpy(sha1, list->sha1);
+			return 0;
+		}
+		list = list->next;
+	}
+	return -1;
+}
+
+/*
  * If the "reading" argument is set, this function finds out what _object_
  * the ref points at by "reading" the ref.  The ref, if it is not symbolic,
  * has to exist, and if it is symbolic, it has to point at an existing ref,
@@ -497,22 +514,26 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 			return NULL;
 
 		git_snpath(path, sizeof(path), "%s", ref);
-		/* Special case: non-existing file. */
+
 		if (lstat(path, &st) < 0) {
-			struct ref_list *list = get_packed_refs(NULL);
-			while (list) {
-				if (!strcmp(ref, list->name)) {
-					hashcpy(sha1, list->sha1);
-					if (flag)
-						*flag |= REF_ISPACKED;
-					return ref;
-				}
-				list = list->next;
+			if (errno != ENOENT)
+				return NULL;
+			/*
+			 * The loose reference file does not exist;
+			 * check for a packed reference.
+			 */
+			if (!get_packed_ref(ref, sha1)) {
+				if (flag)
+					*flag |= REF_ISPACKED;
+				return ref;
 			}
-			if (reading || errno != ENOENT)
+			/* The reference is not a packed reference, either. */
+			if (reading) {
 				return NULL;
-			hashclr(sha1);
-			return ref;
+			} else {
+				hashclr(sha1);
+				return ref;
+			}
 		}
 
 		/* Follow "normalized" - ie "refs/.." symlinks by hand */
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 15/22] resolve_ref(): do not follow incorrectly-formatted symbolic refs
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (13 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 14/22] resolve_ref(): extract a function get_packed_ref() Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 16/22] remote: use xstrdup() instead of strdup() Michael Haggerty
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

Emit a warning and fail if a symbolic reference refers to an
incorrectly-formatted refname.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/refs.c b/refs.c
index 473f7f6..b055501 100644
--- a/refs.c
+++ b/refs.c
@@ -581,6 +581,11 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 		buf = buffer + 4;
 		while (isspace(*buf))
 			buf++;
+		if (check_refname_format(buf, REFNAME_ALLOW_ONELEVEL)) {
+			warning("symbolic reference in %s is formatted incorrectly",
+				path);
+			return NULL;
+		}
 		ref = strcpy(ref_buffer, buf);
 		if (flag)
 			*flag |= REF_ISSYMREF;
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 16/22] remote: use xstrdup() instead of strdup()
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (14 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 15/22] resolve_ref(): do not follow incorrectly-formatted symbolic refs Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 17/22] remote: avoid passing NULL to read_ref() Michael Haggerty
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty


Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 remote.c           |    2 +-
 transport-helper.c |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/remote.c b/remote.c
index 6fcf809..e52aa9b 100644
--- a/remote.c
+++ b/remote.c
@@ -815,7 +815,7 @@ char *apply_refspecs(struct refspec *refspecs, int nr_refspec,
 						    refspec->dst, &ret))
 				return ret;
 		} else if (!strcmp(refspec->src, name))
-			return strdup(refspec->dst);
+			return xstrdup(refspec->dst);
 	}
 	return NULL;
 }
diff --git a/transport-helper.c b/transport-helper.c
index 4eab844..0713126 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -183,7 +183,7 @@ static struct child_process *get_helper(struct transport *transport)
 			ALLOC_GROW(refspecs,
 				   refspec_nr + 1,
 				   refspec_alloc);
-			refspecs[refspec_nr++] = strdup(capname + strlen("refspec "));
+			refspecs[refspec_nr++] = xstrdup(capname + strlen("refspec "));
 		} else if (!strcmp(capname, "connect")) {
 			data->connect = 1;
 		} else if (!prefixcmp(capname, "export-marks ")) {
@@ -445,7 +445,7 @@ static int fetch_with_import(struct transport *transport,
 		if (data->refspecs)
 			private = apply_refspecs(data->refspecs, data->refspec_nr, posn->name);
 		else
-			private = strdup(posn->name);
+			private = xstrdup(posn->name);
 		read_ref(private, posn->old_sha1);
 		free(private);
 	}
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 17/22] remote: avoid passing NULL to read_ref()
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (15 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 16/22] remote: use xstrdup() instead of strdup() Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 18/22] resolve_ref(): verify that the input refname has the right format Michael Haggerty
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

read_ref() can (and in test t5800, actually *does*) return NULL.
Don't pass the NULL along to read_ref().  Coincidentally, this mistake
didn't make resolve_ref() blow up, but upcoming changes to
resolve_ref() will make it less forgiving.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 transport-helper.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index 0713126..6f227e2 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -446,8 +446,10 @@ static int fetch_with_import(struct transport *transport,
 			private = apply_refspecs(data->refspecs, data->refspec_nr, posn->name);
 		else
 			private = xstrdup(posn->name);
-		read_ref(private, posn->old_sha1);
-		free(private);
+		if (private) {
+			read_ref(private, posn->old_sha1);
+			free(private);
+		}
 	}
 	strbuf_release(&buf);
 	return 0;
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 18/22] resolve_ref(): verify that the input refname has the right format
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (16 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 17/22] remote: avoid passing NULL to read_ref() Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references Michael Haggerty
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty


Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/refs.c b/refs.c
index b055501..ee3e0cc 100644
--- a/refs.c
+++ b/refs.c
@@ -504,6 +504,9 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 	if (flag)
 		*flag = 0;
 
+	if (check_refname_format(ref, REFNAME_ALLOW_ONELEVEL))
+		return NULL;
+
 	for (;;) {
 		char path[PATH_MAX];
 		struct stat st;
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (17 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 18/22] resolve_ref(): verify that the input refname has the right format Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-10-11 16:16   ` Jeff King
  2011-09-15 21:10 ` [PATCH v3 20/22] resolve_ref(): also treat a too-long SHA1 as invalid Michael Haggerty
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

While resolving references, if a reference is found that is in an
unrecognized format, emit a warning (and then fail, as before).
Wouldn't *you* want to know?

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/refs.c b/refs.c
index ee3e0cc..2387f4e 100644
--- a/refs.c
+++ b/refs.c
@@ -500,6 +500,7 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 	ssize_t len;
 	char buffer[256];
 	static char ref_buffer[256];
+	char path[PATH_MAX];
 
 	if (flag)
 		*flag = 0;
@@ -508,7 +509,6 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 		return NULL;
 
 	for (;;) {
-		char path[PATH_MAX];
 		struct stat st;
 		char *buf;
 		int fd;
@@ -593,8 +593,10 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 		if (flag)
 			*flag |= REF_ISSYMREF;
 	}
-	if (get_sha1_hex(buffer, sha1))
+	if (get_sha1_hex(buffer, sha1)) {
+		warning("reference in %s is formatted incorrectly", path);
 		return NULL;
+	}
 	return ref;
 }
 
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 20/22] resolve_ref(): also treat a too-long SHA1 as invalid
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (18 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 21/22] resolve_ref(): expand documentation Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 22/22] add_ref(): verify that the refname is formatted correctly Michael Haggerty
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

If the SHA1 in a reference file is not terminated by a space or
end-of-file, consider it malformed and emit a warning.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/refs.c b/refs.c
index 2387f4e..0baa500 100644
--- a/refs.c
+++ b/refs.c
@@ -593,7 +593,8 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 		if (flag)
 			*flag |= REF_ISSYMREF;
 	}
-	if (get_sha1_hex(buffer, sha1)) {
+	/* Please note that FETCH_HEAD has a second line containing other data. */
+	if (get_sha1_hex(buffer, sha1) || (buffer[40] != '\0' && !isspace(buffer[40]))) {
 		warning("reference in %s is formatted incorrectly", path);
 		return NULL;
 	}
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 21/22] resolve_ref(): expand documentation
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (19 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 20/22] resolve_ref(): also treat a too-long SHA1 as invalid Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  2011-09-15 21:10 ` [PATCH v3 22/22] add_ref(): verify that the refname is formatted correctly Michael Haggerty
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

Record information about resolve_ref(), hard-won via reverse
engineering, in a comment for future spelunkers.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 cache.h |   34 +++++++++++++++++++++++++++++++++-
 refs.c  |   12 ------------
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/cache.h b/cache.h
index 607c2ea..aea8685 100644
--- a/cache.h
+++ b/cache.h
@@ -822,7 +822,39 @@ static inline int get_sha1_with_context(const char *str, unsigned char *sha1, st
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
 extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
 extern int read_ref(const char *filename, unsigned char *sha1);
-extern const char *resolve_ref(const char *path, unsigned char *sha1, int, int *);
+
+/*
+ * Resolve a reference, recursively following symbolic refererences.
+ *
+ * Store the referred-to object's name in sha1 and return the name of
+ * the non-symbolic reference that ultimately pointed at it.  The
+ * return value, if not NULL, is a pointer into either a static buffer
+ * or the input ref.
+ *
+ * If the reference cannot be resolved to an object, the behavior
+ * depends on the "reading" argument:
+ *
+ * - If reading is set, return NULL.
+ *
+ * - If reading is not set, clear sha1 and return the name of the last
+ *   reference name in the chain, which will either be a non-symbolic
+ *   reference or an undefined reference.  If this is a prelude to
+ *   "writing" to the ref, the return value is the name of the ref
+ *   that will actually be created or changed.
+ *
+ * If flag is non-NULL, set the value that it points to the
+ * combination of REF_ISPACKED (if the reference was found among the
+ * packed references) and REF_ISSYMREF (if the initial reference was a
+ * symbolic reference).
+ *
+ * If ref is not a properly-formatted, normalized reference, return
+ * NULL.  If more than MAXDEPTH recursive symbolic lookups are needed,
+ * give up and return NULL.
+ *
+ * errno is sometimes set on errors, but not always.
+ */
+extern const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *flag);
+
 extern int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref);
 extern int dwim_log(const char *str, int len, unsigned char *sha1, char **ref);
 extern int interpret_branch_name(const char *str, struct strbuf *);
diff --git a/refs.c b/refs.c
index 0baa500..096b42c 100644
--- a/refs.c
+++ b/refs.c
@@ -482,18 +482,6 @@ static int get_packed_ref(const char *ref, unsigned char *sha1)
 	return -1;
 }
 
-/*
- * If the "reading" argument is set, this function finds out what _object_
- * the ref points at by "reading" the ref.  The ref, if it is not symbolic,
- * has to exist, and if it is symbolic, it has to point at an existing ref,
- * because the "read" goes through the symref to the ref it points at.
- *
- * The access that is not "reading" may often be "writing", but does not
- * have to; it can be merely checking _where it leads to_. If it is a
- * prelude to "writing" to the ref, a write to a symref that points at
- * yet-to-be-born ref will create the real ref pointed by the symref.
- * reading=0 allows the caller to check where such a symref leads to.
- */
 const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *flag)
 {
 	int depth = MAXDEPTH;
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 22/22] add_ref(): verify that the refname is formatted correctly
  2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
                   ` (20 preceding siblings ...)
  2011-09-15 21:10 ` [PATCH v3 21/22] resolve_ref(): expand documentation Michael Haggerty
@ 2011-09-15 21:10 ` Michael Haggerty
  21 siblings, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-09-15 21:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier, Michael Haggerty

In add_ref(), verify that the refname is formatted correctly before
adding it to the ref_list.  Here we have to allow refname components
that start with ".", since (for example) the remote protocol uses
synthetic reference name ".have".  So add a new REFNAME_DOT_COMPONENT
flag that can be passed to check_refname_format() to allow leading
dots.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |   18 ++++++++++++++----
 refs.h |    6 +++++-
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/refs.c b/refs.c
index 096b42c..832a52f 100644
--- a/refs.c
+++ b/refs.c
@@ -56,6 +56,8 @@ static struct ref_list *add_ref(const char *name, const unsigned char *sha1,
 	entry = xmalloc(sizeof(struct ref_list) + len);
 	hashcpy(entry->sha1, sha1);
 	hashclr(entry->peeled);
+	if (check_refname_format(name, REFNAME_ALLOW_ONELEVEL|REFNAME_DOT_COMPONENT))
+		die("Reference has invalid format: '%s'", name);
 	memcpy(entry->name, name, len);
 	entry->flag = flag;
 	entry->next = list;
@@ -900,7 +902,7 @@ static inline int bad_ref_char(int ch)
  * the length of the component found, or -1 if the component is not
  * legal.
  */
-static int check_refname_component(const char *ref)
+static int check_refname_component(const char *ref, int flags)
 {
 	const char *cp;
 	char last = '\0';
@@ -919,8 +921,16 @@ static int check_refname_component(const char *ref)
 	}
 	if (cp == ref)
 		return -1; /* Component has zero length. */
-	if (ref[0] == '.')
-		return -1; /* Component starts with '.'. */
+	if (ref[0] == '.') {
+		if (!(flags & REFNAME_DOT_COMPONENT))
+			return -1; /* Component starts with '.'. */
+		/*
+		 * Even if leading dots are allowed, don't allow "."
+		 * as a component (".." is prevented by a rule above).
+		 */
+		if (ref[1] == '\0')
+			return -1; /* Component equals ".". */
+	}
 	if (cp - ref >= 5 && !memcmp(cp - 5, ".lock", 5))
 		return -1; /* Refname ends with ".lock". */
 	return cp - ref;
@@ -932,7 +942,7 @@ int check_refname_format(const char *ref, int flags)
 
 	while (1) {
 		/* We are at the start of a path component. */
-		component_len = check_refname_component(ref);
+		component_len = check_refname_component(ref, flags);
 		if (component_len < 0) {
 			if ((flags & REFNAME_REFSPEC_PATTERN) &&
 					ref[0] == '*' &&
diff --git a/refs.h b/refs.h
index b0da5fc..d5ac133 100644
--- a/refs.h
+++ b/refs.h
@@ -99,6 +99,7 @@ extern int for_each_reflog(each_ref_fn, void *);
 
 #define REFNAME_ALLOW_ONELEVEL 1
 #define REFNAME_REFSPEC_PATTERN 2
+#define REFNAME_DOT_COMPONENT 4
 
 /*
  * Return 0 iff ref has the correct format for a refname according to
@@ -106,7 +107,10 @@ extern int for_each_reflog(each_ref_fn, void *);
  * REFNAME_ALLOW_ONELEVEL is set in flags, then accept one-level
  * reference names.  If REFNAME_REFSPEC_PATTERN is set in flags, then
  * allow a "*" wildcard character in place of one of the name
- * components.  No leading or repeated slashes are accepted.
+ * components.  No leading or repeated slashes are accepted.  If
+ * REFNAME_DOT_COMPONENT is set in flags, then allow refname
+ * components to start with "." (but not a whole component equal to
+ * "." or "..").
  */
 extern int check_refname_format(const char *ref, int flags);
 
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 13/22] resolve_ref(): turn buffer into a proper string as soon as possible
  2011-09-15 21:10 ` [PATCH v3 13/22] resolve_ref(): turn buffer into a proper string as soon as possible Michael Haggerty
@ 2011-09-23  8:17   ` Thomas Rast
  2011-09-23 13:11     ` Michael Haggerty
  0 siblings, 1 reply; 60+ messages in thread
From: Thomas Rast @ 2011-09-23  8:17 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: git, Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

Hi Michael

Michael Haggerty wrote:
> Immediately strip off trailing spaces and null-terminate the string
> holding the contents of the reference file; this allows the use of
> string functions and avoids the need to keep separate track of the
> string's length.  (get_sha1_hex() fails automatically if the string is
> too short.)
> 
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>

I'm getting valgrind failures in t1450-fsck and t3800-mktag which
blame to this commit.  For t1450 it looks as follows:

    ok 5 - object with bad sha1

    expecting success: 
            git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
            test_when_finished "git update-ref -d refs/heads/invalid" &&
            git fsck 2>out &&
            cat out &&
            grep "not a commit" out

    ==19623== Use of uninitialised value of size 8
    ==19623==    at 0x4B6747: hexval (cache.h:798)
    ==19623==    by 0x4B6797: get_sha1_hex (hex.c:42)
    ==19623==    by 0x4DD12A: resolve_ref (refs.c:588)
    ==19623==    by 0x4DC777: get_ref_dir (refs.c:313)
    ==19623==    by 0x4DC6FA: get_ref_dir (refs.c:302)
    ==19623==    by 0x4DC963: get_loose_refs (refs.c:368)
    ==19623==    by 0x4DD556: do_for_each_ref (refs.c:687)
    ==19623==    by 0x4DDA05: for_each_replace_ref (refs.c:806)
    ==19623==    by 0x4E5CE9: prepare_replace_object (replace_object.c:86)
    ==19623==    by 0x4E5D3C: do_lookup_replace_object (replace_object.c:103)
    ==19623==    by 0x4C99BB: lookup_replace_object (cache.h:764)
    ==19623==    by 0x4C9FA6: parse_object (object.c:191)
    ==19623==  Uninitialised value was created by a stack allocation
    ==19623==    at 0x4DCE34: resolve_ref (refs.c:498)

or when I run it at the tip of pu instead of at the commit itself,
line numbers are like so:

    ==2308== Use of uninitialised value of size 8
    ==2308==    at 0x4ADB8B: get_sha1_hex (cache.h:800)
    ==2308==    by 0x4D4283: resolve_ref (refs.c:629)
    ==2308==    by 0x4D4851: get_ref_dir (refs.c:361)
    ==2308==    by 0x4D48C6: get_ref_dir (refs.c:350)
    ==2308==    by 0x4D4D29: do_for_each_ref (refs.c:412)
    ==2308==    by 0x4DCD93: do_lookup_replace_object (replace_object.c:86)
    ==2308==    by 0x4C31F4: parse_object (cache.h:764)
    ==2308==    by 0x4F2A1D: get_sha1_1 (sha1_name.c:567)
    ==2308==    by 0x4F2D5F: get_sha1_with_context_1 (sha1_name.c:1117)
    ==2308==    by 0x4F3543: get_sha1 (cache.h:822)
    ==2308==    by 0x461B50: cmd_rev_parse (rev-parse.c:723)
    ==2308==    by 0x404B71: run_builtin (git.c:308)
    ==2308==  Uninitialised value was created by a stack allocation
    ==2308==    at 0x4D4006: resolve_ref (refs.c:530)

Can you look into this?

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 13/22] resolve_ref(): turn buffer into a proper string as soon as possible
  2011-09-23  8:17   ` Thomas Rast
@ 2011-09-23 13:11     ` Michael Haggerty
  2011-09-23 13:38       ` [PATCH 1/1] get_sha1_hex(): do not read past a NUL character Michael Haggerty
  0 siblings, 1 reply; 60+ messages in thread
From: Michael Haggerty @ 2011-09-23 13:11 UTC (permalink / raw)
  To: Thomas Rast
  Cc: git, Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

On 09/23/2011 10:17 AM, Thomas Rast wrote:
> Michael Haggerty wrote:
>> Immediately strip off trailing spaces and null-terminate the string
>> holding the contents of the reference file; this allows the use of
>> string functions and avoids the need to keep separate track of the
>> string's length.  (get_sha1_hex() fails automatically if the string is
>> too short.)
>>
>> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> 
> I'm getting valgrind failures in t1450-fsck and t3800-mktag which
> blame to this commit.  For t1450 it looks as follows:
> 
>     ok 5 - object with bad sha1
> 
>     expecting success: 
>             git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
>             test_when_finished "git update-ref -d refs/heads/invalid" &&
>             git fsck 2>out &&
>             cat out &&
>             grep "not a commit" out
> 
>     ==19623== Use of uninitialised value of size 8
>     ==19623==    at 0x4B6747: hexval (cache.h:798)
>     ==19623==    by 0x4B6797: get_sha1_hex (hex.c:42)
>     ==19623==    by 0x4DD12A: resolve_ref (refs.c:588)

Thanks for finding this.

The problem comes from passing get_sha1_hex() an arbitrary
NUL-terminated string.  get_sha1_hex() calls hexval(), which returns -1
if it finds a NUL character, and therefore (correctly) rejects any
string that is less than 40 characters.  The problem is that
get_sha1_hex() reads characters pairwise, and even if the first
character in a pair is NUL, the second character is read anyway.
Therefore, for strings whose strlen is an even number less than 40,
get_sha1_hex() reads past the terminating NUL.

I don't know whether get_sha1_hex() is *supposed* to work for arbitrary
NUL-terminated strings because it is not documented.  But other callers
seem to make the same assumption that I did, so I think that I will fix
get_sha1_hex() to not read past a NUL value (albeit at the cost of an
extra check in each iteration of the loop).

The fix is trivial and will be sent shortly.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 1/1] get_sha1_hex(): do not read past a NUL character
  2011-09-23 13:11     ` Michael Haggerty
@ 2011-09-23 13:38       ` Michael Haggerty
  2011-09-23 18:59         ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Michael Haggerty @ 2011-09-23 13:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Rast, Michael Haggerty

Previously, get_sha1_hex() would read one character past the end of a
null-terminated string whose strlen was an even number less than 40.
Although the function correctly returned -1 in these cases, the extra
memory access might have been to uninitialized (or even, conceivably,
unallocated) memory.

Add a check to avoid reading past the end of a string.

This problem was discovered by Thomas Rast <trast@student.ethz.ch>
using valgrind.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
It is suggested to apply this bugfix before mh/check-ref-format-3;
otherwise the latter triggers the bug, leading to a valgrind error.

This patch could optionally be applied to maint (to which it can be
rebased cleanly), though the bug that it fixes is relatively benign.

 cache.h |    9 +++++++++
 hex.c   |   10 +++++++++-
 2 files changed, 18 insertions(+), 1 deletions(-)

diff --git cache.h cache.h
index 607c2ea..e7bbc0d 100644
--- cache.h
+++ cache.h
@@ -819,7 +819,16 @@ static inline int get_sha1_with_context(const char *str, unsigned char *sha1, st
 {
 	return get_sha1_with_context_1(str, sha1, orc, 0, NULL);
 }
+
+/*
+ * Try to read a SHA1 in hexadecimal format from the 40 characters
+ * starting at hex.  Write the 20-byte result to sha1 in binary form.
+ * Return 0 on success.  Reading stops if a NUL is encountered in the
+ * input, so it is safe to pass this function an arbitrary
+ * null-terminated string.
+ */
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
+
 extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
 extern int read_ref(const char *filename, unsigned char *sha1);
 extern const char *resolve_ref(const char *path, unsigned char *sha1, int, int *);
diff --git hex.c hex.c
index bb402fb..9ebc050 100644
--- hex.c
+++ hex.c
@@ -39,7 +39,15 @@ int get_sha1_hex(const char *hex, unsigned char *sha1)
 {
 	int i;
 	for (i = 0; i < 20; i++) {
-		unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
+		unsigned int val;
+		/*
+		 * hex[1]=='\0' is caught when val is checked below,
+		 * but if hex[0] is NUL we have to avoid reading
+		 * past the end of the string:
+		 */
+		if (!hex[0])
+			return -1;
+		val = (hexval(hex[0]) << 4) | hexval(hex[1]);
 		if (val & ~0xff)
 			return -1;
 		*sha1++ = val;
-- 
1.7.7.rc2

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] get_sha1_hex(): do not read past a NUL character
  2011-09-23 13:38       ` [PATCH 1/1] get_sha1_hex(): do not read past a NUL character Michael Haggerty
@ 2011-09-23 18:59         ` Junio C Hamano
  2011-10-05 19:11           ` Thomas Rast
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2011-09-23 18:59 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git, Thomas Rast

Michael Haggerty <mhagger@alum.mit.edu> writes:

> Previously, get_sha1_hex() would read one character past the end of a
> null-terminated string whose strlen was an even number less than 40.
> Although the function correctly returned -1 in these cases, the extra
> memory access might have been to uninitialized (or even, conceivably,
> unallocated) memory.
>
> Add a check to avoid reading past the end of a string.

Makes sense; thanks.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] get_sha1_hex(): do not read past a NUL character
  2011-09-23 18:59         ` Junio C Hamano
@ 2011-10-05 19:11           ` Thomas Rast
  2011-10-05 20:37             ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Thomas Rast @ 2011-10-05 19:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael Haggerty, git

Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
> > Previously, get_sha1_hex() would read one character past the end of a
> > null-terminated string whose strlen was an even number less than 40.
> > Although the function correctly returned -1 in these cases, the extra
> > memory access might have been to uninitialized (or even, conceivably,
> > unallocated) memory.
> >
> > Add a check to avoid reading past the end of a string.
> 
> Makes sense; thanks.

Has this fixed patch ever made it to pu?  I'm still seeing the same
breakage in the automated valgrind runs.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] get_sha1_hex(): do not read past a NUL character
  2011-10-05 19:11           ` Thomas Rast
@ 2011-10-05 20:37             ` Junio C Hamano
  0 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2011-10-05 20:37 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Michael Haggerty, git

Thomas Rast <trast@student.ethz.ch> writes:

> Junio C Hamano wrote:
>> Michael Haggerty <mhagger@alum.mit.edu> writes:
>> 
>> > Previously, get_sha1_hex() would read one character past the end of a
>> > null-terminated string whose strlen was an even number less than 40.
>> > Although the function correctly returned -1 in these cases, the extra
>> > memory access might have been to uninitialized (or even, conceivably,
>> > unallocated) memory.
>> >
>> > Add a check to avoid reading past the end of a string.
>> 
>> Makes sense; thanks.
>
> Has this fixed patch ever made it to pu?  I'm still seeing the same
> breakage in the automated valgrind runs.

I do not think so.

I was under the impression that Michael wanted to include this in the
early part of a re-roll of check-ref-format series.

I'll do the rebase myself. Thanks.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-09-15 21:10 ` [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references Michael Haggerty
@ 2011-10-11 16:16   ` Jeff King
  2011-10-11 17:53     ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2011-10-11 16:16 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: git, Junio C Hamano, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

On Thu, Sep 15, 2011 at 11:10:40PM +0200, Michael Haggerty wrote:

> While resolving references, if a reference is found that is in an
> unrecognized format, emit a warning (and then fail, as before).
> Wouldn't *you* want to know?

Not necessarily. :)

I noticed this today, and bisected it to this patch:

  $ git.v1.7.7 show config
  fatal: ambiguous argument 'config': unknown revision or path not in the working tree.
  Use '--' to separate paths from revisions

  $ git.mh.check-ref-format-3 show config
  warning: reference in .git/config is formatted incorrectly
  warning: reference in .git/config is formatted incorrectly
  fatal: ambiguous argument 'config': unknown revision or path not in the working tree.
  Use '--' to separate paths from revisions

which is arguably not a big deal, as we eventually report failure
anyway. But it's even worse with:

  $ git branch config v1.7.7
  $ git show --oneline config
  warning: reference in .git/config is formatted incorrectly
  703f05a Git 1.7.7

As you probably guessed, this is due to the ref resolution rules first
looking directly in .git for refs, which catches things like "HEAD" and
fully-qualified refs like "refs/heads/foo".

Which means we have been considering ".git/config" as a possible ref for
a long time, which is arguably wrong. Your patch only makes it more
obvious that this is the case.

I wonder if the right solution is for us to be more picky about what can
be found in $GIT_DIR. Maybe matching all-uppercase, or starting with
"refs/", which I think would match existing convention?

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-11 16:16   ` Jeff King
@ 2011-10-11 17:53     ` Junio C Hamano
  2011-10-11 18:07       ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2011-10-11 17:53 UTC (permalink / raw)
  To: Jeff King
  Cc: Michael Haggerty, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

Jeff King <peff@peff.net> writes:

> I wonder if the right solution is for us to be more picky about what can
> be found in $GIT_DIR. Maybe matching all-uppercase, or starting with
> "refs/", which I think would match existing convention?

I think we've discussed tightening it a few years ago already.

HEAD, MERGE_HEAD, FETCH_HEAD, etc. all are "^[_A-Z]*$" and it may even be
a good idea to insist "^[_A-Z]*HEAD$" or even "^([A-Z][A-Z]*_)?HEAD$".

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-11 17:53     ` Junio C Hamano
@ 2011-10-11 18:07       ` Junio C Hamano
  2011-10-11 20:14         ` Re* " Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2011-10-11 18:07 UTC (permalink / raw)
  To: Jeff King, Michael Haggerty
  Cc: git, cmn, A Large Angry SCM, Daniel Barkalow, Sverre Rabbelier

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> I wonder if the right solution is for us to be more picky about what can
>> be found in $GIT_DIR. Maybe matching all-uppercase, or starting with
>> "refs/", which I think would match existing convention?
>
> I think we've discussed tightening it a few years ago already.
>
> HEAD, MERGE_HEAD, FETCH_HEAD, etc. all are "^[_A-Z]*$" and it may even be
> a good idea to insist "^[_A-Z]*HEAD$" or even "^([A-Z][A-Z]*_)?HEAD$".

Perhaps like this? Only compile tested...

 sha1_name.c |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index ba976b4..5effb1a 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -261,6 +261,23 @@ static char *substitute_branch_name(const char **string, int *len)
 	return NULL;
 }
 
+static int is_kind_of_head(const char *str)
+{
+	int len = strlen(str) - 4;
+	if (len < 0 || strcmp(str + len, "HEAD"))
+		return 0;
+	if (!len)
+		return 1;
+	if (str[--len] != '_' || !len)
+		return 0;
+	while (len) {
+		char ch = str[--len];
+		if (ch < 'A' || 'Z' < ch)
+			return 0;
+	}
+	return 1;
+}
+
 int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref)
 {
 	char *last_branch = substitute_branch_name(&str, &len);
@@ -274,6 +291,8 @@ int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref)
 		unsigned char *this_result;
 		int flag;
 
+		if (p == ref_rev_parse_rules && !is_kind_of_head(str))
+			continue;
 		this_result = refs_found ? sha1_from_ref : sha1;
 		mksnpath(fullref, sizeof(fullref), *p, len, str);
 		r = resolve_ref(fullref, this_result, 1, &flag);
@@ -302,6 +321,8 @@ int dwim_log(const char *str, int len, unsigned char *sha1, char **log)
 		char path[PATH_MAX];
 		const char *ref, *it;
 
+		if (p == ref_rev_parse_rules && !is_kind_of_head(str))
+			continue;
 		mksnpath(path, sizeof(path), *p, len, str);
 		ref = resolve_ref(path, hash, 1, NULL);
 		if (!ref)

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-11 18:07       ` Junio C Hamano
@ 2011-10-11 20:14         ` Junio C Hamano
  2011-10-11 20:39           ` Jeff King
  2011-10-11 23:07           ` Jeff King
  0 siblings, 2 replies; 60+ messages in thread
From: Junio C Hamano @ 2011-10-11 20:14 UTC (permalink / raw)
  To: Jeff King, Michael Haggerty
  Cc: git, cmn, A Large Angry SCM, Daniel Barkalow, Sverre Rabbelier

Junio C Hamano <gitster@pobox.com> writes:

>> I think we've discussed tightening it a few years ago already.
>>
>> HEAD, MERGE_HEAD, FETCH_HEAD, etc. all are "^[_A-Z]*$" and it may even be
>> a good idea to insist "^[_A-Z]*HEAD$" or even "^([A-Z][A-Z]*_)?HEAD$".
>
> Perhaps like this? Only compile tested...

Not quite. There are at least three bugs in the patch.

 - Some subsystems use random refnames like NOTES_MERGE_PARTIAL that would
   not match "^([A-Z][A-Z]*_)?HEAD$". The rule needs to be relaxed;

 - dwim_ref() can be fed "refs/heads/master" and is expected to dwim it to
   the master branch.

 - These codepaths get pointer+length so that it can be told to parse only
   the first 4 bytes in "HEAD:$path".

-- >8 --
Subject: [PATCH] Restrict ref-like names immediately below $GIT_DIR

We have always dwimmed the user input $string into a ref by first looking
directly inside $GIT_DIR, and then in $GIT_DIR/refs, $GIT_DIR/refs/tags,
etc., and that is what made

	git log HEAD..MERGE_HEAD

work correctly. This however means that

	git rev-parse config
        git log index

would look at $GIT_DIR/config and $GIT_DIR/index and see if they are valid
refs.

To reduce confusion, let's not dwim a path immediately below $GIT_DIR that
is not all-caps.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 sha1_name.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 143fd97..5eb19c2 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -261,6 +261,25 @@ static char *substitute_branch_name(const char **string, int *len)
 	return NULL;
 }
 
+static int ok_at_root_level(const char *str, int len)
+{
+	int seen_non_root_char = 0;
+
+	while (len--) {
+		char ch = *str++;
+
+		if (ch == '/')
+			return 1;
+		/*
+		 * Only accept likes of .git/HEAD, .git/MERGE_HEAD at
+		 * the root level as a ref.
+		 */
+		if (ch != '_' && (ch < 'A' || 'Z' < ch))
+			seen_non_root_char = 1;
+	}
+	return !seen_non_root_char;
+}
+
 int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref)
 {
 	char *last_branch = substitute_branch_name(&str, &len);
@@ -274,6 +293,8 @@ int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref)
 		unsigned char *this_result;
 		int flag;
 
+		if (p == ref_rev_parse_rules && !ok_at_root_level(str, len))
+			continue;
 		this_result = refs_found ? sha1_from_ref : sha1;
 		mksnpath(fullref, sizeof(fullref), *p, len, str);
 		r = resolve_ref(fullref, this_result, 1, &flag);
@@ -302,6 +323,8 @@ int dwim_log(const char *str, int len, unsigned char *sha1, char **log)
 		char path[PATH_MAX];
 		const char *ref, *it;
 
+		if (p == ref_rev_parse_rules && !ok_at_root_level(str, len))
+			continue;
 		mksnpath(path, sizeof(path), *p, len, str);
 		ref = resolve_ref(path, hash, 1, NULL);
 		if (!ref)

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-11 20:14         ` Re* " Junio C Hamano
@ 2011-10-11 20:39           ` Jeff King
  2011-10-11 21:31             ` Junio C Hamano
  2011-10-11 23:07           ` Jeff King
  1 sibling, 1 reply; 60+ messages in thread
From: Jeff King @ 2011-10-11 20:39 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Michael Haggerty, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

On Tue, Oct 11, 2011 at 01:14:26PM -0700, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> >> I think we've discussed tightening it a few years ago already.
> >>
> >> HEAD, MERGE_HEAD, FETCH_HEAD, etc. all are "^[_A-Z]*$" and it may even be
> >> a good idea to insist "^[_A-Z]*HEAD$" or even "^([A-Z][A-Z]*_)?HEAD$".
> >
> > Perhaps like this? Only compile tested...
> 
> Not quite. There are at least three bugs in the patch.
> 
>  - Some subsystems use random refnames like NOTES_MERGE_PARTIAL that would
>    not match "^([A-Z][A-Z]*_)?HEAD$". The rule needs to be relaxed;

Yeah, I found one of these also. I thought at first we could rename
things like NOTES_MERGE_PARTIAL, as it's more about internal
communication within a specific version of git than it is about letting
an external program peek at our state. But there really are several of
them. And I think it makes sense to keep this safety valve conservative,
and try to document existing practice rather than dictating it. I'm
worried that some porcelain or other tool is using their own all-caps
name, and that tightening it too much would be breaking that.

Your relaxed rule does match things like COMMIT_EDITMSG and NOTES_MSG,
which are obviously bogus. At the same time, I'm not sure it's a big
deal. The point of this is to restrict the lookup to a class of names
which are likely magical to git, and users should probably avoid the
magical namespace (i.e., it's still not a good idea to call your branch
"HEAD"). Something like "config" is an easy branch name to unknowingly
use. Something like "COMMIT_EDITMSG" is not.

Your rule does disallow RENAMED-REF, which is used in branch renaming.
However, I'm having trouble figuring out what it's really for. It's not
mentioned in the documentation at all, and c976d41, its origin, says
only:

  This also indroduces $GIT_DIR/RENAME_REF which is a "metabranch"
  used when renaming branches. It will always hold the original
  sha1 for the latest renamed branch.

  Additionally, if $GIT_DIR/logs/RENAME_REF exists,
  all branch rename events are logged there.

But in the code, it is spelled RENAMED-REF (with a dash). And as far as
I can tell, does not actually create a reflog. And it's not documented
anywhere, so I suspect nobody is using it. Maybe it is worth switching
that name.

>  - dwim_ref() can be fed "refs/heads/master" and is expected to dwim it to
>    the master branch.

It looks like your code will allow any subdirectory. I had thought to
limit it to "refs/". Otherwise, my "config" example could be
"objects/pack", or "lost-found/commits", "remotes/foo", or something.
Obviously the longer the name, the smaller the possibility of an
accidental collision.  But I couldn't think of any other subdirectory
into which refs should go.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-11 20:39           ` Jeff King
@ 2011-10-11 21:31             ` Junio C Hamano
  2011-10-11 22:54               ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2011-10-11 21:31 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Michael Haggerty, git, cmn, A Large Angry SCM,
	Daniel Barkalow, Sverre Rabbelier

Jeff King <peff@peff.net> writes:

> But in the code, it is spelled RENAMED-REF (with a dash). And as far as
> I can tell, does not actually create a reflog. And it's not documented
> anywhere, so I suspect nobody is using it. Maybe it is worth switching
> that name.

Or even better get rid of it?

>>  - dwim_ref() can be fed "refs/heads/master" and is expected to dwim it to
>>    the master branch.
>
> It looks like your code will allow any subdirectory. I had thought to
> limit it to "refs/". Otherwise, my "config" example could be
> "objects/pack", or "lost-found/commits", "remotes/foo", or something.
> Obviously the longer the name, the smaller the possibility of an
> accidental collision.  But I couldn't think of any other subdirectory
> into which refs should go.

I wanted to start as loose as possible to avoid negatively impacting
existing users, later to tighten.  As fsck and friends never look outside
of refs/, I think the prefix refs/ is a reasonable restriction that is
safe.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-11 21:31             ` Junio C Hamano
@ 2011-10-11 22:54               ` Jeff King
  2011-10-12 16:52                 ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2011-10-11 22:54 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Lars Hjemli, Michael Haggerty, git, cmn, A Large Angry SCM,
	Daniel Barkalow, Sverre Rabbelier

On Tue, Oct 11, 2011 at 02:31:48PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > But in the code, it is spelled RENAMED-REF (with a dash). And as far as
> > I can tell, does not actually create a reflog. And it's not documented
> > anywhere, so I suspect nobody is using it. Maybe it is worth switching
> > that name.
> 
> Or even better get rid of it?

Fine by me. I'm not sure anyone is even aware that it exists. Just to
double-check, I grepped the list archives, and the biggest mention of it
was its presence causing a weird bug:

  http://thread.gmane.org/gmane.comp.version-control.git/143737

Googling turns up only confusion, nobody actually using it or
recommending that it be used.

OTOH, it does actually serialize branch renames, since we take a lock on
it. Maybe that's important. Cc'ing Lars. Certainly renaming it would be
the conservative choice.

-Peff

PS I mentioned above that it does not actually create a reflog. Digging
in the code more, I think it is capable of it, but my git.git clone had
RENAMED-REF, but no reflog. I would have thought core.logAllRefUpdates
would turn it on, but maybe there is a funny interaction with things not
in refs/.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-11 20:14         ` Re* " Junio C Hamano
  2011-10-11 20:39           ` Jeff King
@ 2011-10-11 23:07           ` Jeff King
  2011-10-11 23:50             ` Junio C Hamano
  2011-10-12 19:20             ` Junio C Hamano
  1 sibling, 2 replies; 60+ messages in thread
From: Jeff King @ 2011-10-11 23:07 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Michael Haggerty, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

On Tue, Oct 11, 2011 at 01:14:26PM -0700, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> >> I think we've discussed tightening it a few years ago already.
> >>
> >> HEAD, MERGE_HEAD, FETCH_HEAD, etc. all are "^[_A-Z]*$" and it may even be
> >> a good idea to insist "^[_A-Z]*HEAD$" or even "^([A-Z][A-Z]*_)?HEAD$".
> >
> > Perhaps like this? Only compile tested...
> 
> Not quite. There are at least three bugs in the patch.
> 
>  - Some subsystems use random refnames like NOTES_MERGE_PARTIAL that would
>    not match "^([A-Z][A-Z]*_)?HEAD$". The rule needs to be relaxed;
> 
>  - dwim_ref() can be fed "refs/heads/master" and is expected to dwim it to
>    the master branch.
> 
>  - These codepaths get pointer+length so that it can be told to parse only
>    the first 4 bytes in "HEAD:$path".

One more bug. :)

We also look at ref_rev_parse_rules in shorten_unambiguous_ref. So even
with your patch, I still get the warning with:

  $ git branch config
  $ git for-each-ref --format='%(refname:short)' refs/heads/

It looks like we also use it in remote.c:count_refspec_match, but I
haven't figured out if that can trigger a warning or not.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-11 23:07           ` Jeff King
@ 2011-10-11 23:50             ` Junio C Hamano
  2011-10-12  2:11               ` Jeff King
  2011-10-12  2:56               ` Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references Michael Haggerty
  2011-10-12 19:20             ` Junio C Hamano
  1 sibling, 2 replies; 60+ messages in thread
From: Junio C Hamano @ 2011-10-11 23:50 UTC (permalink / raw)
  To: Jeff King
  Cc: Michael Haggerty, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

Jeff King <peff@peff.net> writes:

> It looks like we also use it in remote.c:count_refspec_match, but I
> haven't figured out if that can trigger a warning or not.

It starts sounding like that the ill-thought-out warning should be ripped
out regardless of what other things we do.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-11 23:50             ` Junio C Hamano
@ 2011-10-12  2:11               ` Jeff King
  2011-10-12  4:41                 ` Junio C Hamano
  2011-10-12  2:56               ` Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references Michael Haggerty
  1 sibling, 1 reply; 60+ messages in thread
From: Jeff King @ 2011-10-12  2:11 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Michael Haggerty, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

On Tue, Oct 11, 2011 at 04:50:46PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > It looks like we also use it in remote.c:count_refspec_match, but I
> > haven't figured out if that can trigger a warning or not.
> 
> It starts sounding like that the ill-thought-out warning should be ripped
> out regardless of what other things we do.

Maybe. I think it is not the warning that is wrong, but that it is
exposing a slightly hack-ish part of git (that we consider things like
$GIT_DIR/config as possible refs, and just silently reject them because
they happen not to have the right format).

On the other hand, it has been working fine that way for years, so maybe
it is not worth changing now.

At any rate, I think the changes should be all or nothing. If the
warning goes away, fine. But if the warning stays, and dwim_ref is going
to have special rules for looking in the top-level $GIT_DIR, then things
like shorten_unambiguous_ref need to respect those rules, or we've just
created a new bug.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-11 23:50             ` Junio C Hamano
  2011-10-12  2:11               ` Jeff King
@ 2011-10-12  2:56               ` Michael Haggerty
  1 sibling, 0 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-10-12  2:56 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

On 10/12/2011 01:50 AM, Junio C Hamano wrote:
> It starts sounding like that the ill-thought-out warning should be ripped
> out regardless of what other things we do.

ISTM that the warning is proving its worth already by illuminating some
questionable practices (treating any file in $GIT_DIR as a potential
reference) :-)

However, it might be that fixing 100% of the questionable practices is
too much work.  In that case, I suggest that the warning not be ripped
out altogether, but rather that the warning only be emitted for invalid
references whose names start with "refs/".

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-12  2:11               ` Jeff King
@ 2011-10-12  4:41                 ` Junio C Hamano
  2011-10-12  4:50                   ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2011-10-12  4:41 UTC (permalink / raw)
  To: Jeff King
  Cc: Michael Haggerty, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

Jeff King <peff@peff.net> writes:

> At any rate, I think the changes should be all or nothing. If the
> warning goes away, fine. But if the warning stays, and dwim_ref is going
> to have special rules for looking in the top-level $GIT_DIR, then things
> like shorten_unambiguous_ref need to respect those rules, or we've just
> created a new bug.

Whether we remove the warning or not, I think it would be an improvement
not to look at random files directly underneath $GIT_DIR/. I am not sure
how we can be confident that we caught everything, though.

In other words, is shorten-unambiguous-refs the last one that needs
fixing? How would we know for certain?

Also I tend to think Michael's "only warn in refs/" is probably not the
right solution. When a caller asks to resolve_ref(MERGE_HEAD), one of
these things can be true:

 - A file $GIT_DIR/MERGE_HEAD does not exist; this is not inherently an
   error unless we were supposed to be in the middle of a conflicted
   merge.

 - A file $GIT_DIR/MERGE_HEAD exists, and records a correct 40-hexadecimal
   get_sha1_hex() can grok. This is perfectly normal.

 - A file $GIT_DIR/MERGE_HEAD exists, but get_sha1_hex() does not grok
   it. Michael warns against this twice, and I think it is a wrong thing
   to pass this unnoticed.

Once we tighten all the "too loose accesses to $GIT_DIR/$random_filename",
we might even want to have an option to cause the caller to die() in the
error case, and the logic is the same for refs under $GIT_DIR/refs/, not
just the ref-like-things directly under $GIT_DIR.

A regular ref, can also appear in $GIT_DIR/packed-refs, but a corruption
of an entry in the file will be caught when the file is read and outside
the scope of this discussion, I think.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-12  4:41                 ` Junio C Hamano
@ 2011-10-12  4:50                   ` Jeff King
  2011-10-12 17:48                     ` [PATCH 1/2] refs.c: move dwim_ref()/dwim_log() from sha1_name.c Junio C Hamano
  2011-10-12 17:49                     ` [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR Junio C Hamano
  0 siblings, 2 replies; 60+ messages in thread
From: Jeff King @ 2011-10-12  4:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Michael Haggerty, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

On Tue, Oct 11, 2011 at 09:41:39PM -0700, Junio C Hamano wrote:

> Whether we remove the warning or not, I think it would be an improvement
> not to look at random files directly underneath $GIT_DIR/. I am not sure
> how we can be confident that we caught everything, though.
> 
> In other words, is shorten-unambiguous-refs the last one that needs
> fixing? How would we know for certain?

My assumption was that the set of rules is defined by
ref_rev_parse_rules, so grepping for functions that access that would be
sufficient.

It looks like there is also ref_fetch_rules, which serves a similar
purpose. Uses of that would also need to be audited.

I _think_ that should be enough for arbitrary lookup. I'm sure other
callsites directly say things like 'git_path("MERGE_HEAD")', but that's
not a problem. They would be doing so with a well-defined top-level
refname. This is really just about the ref_rev_parse_rules lookups.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-11 22:54               ` Jeff King
@ 2011-10-12 16:52                 ` Junio C Hamano
  0 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2011-10-12 16:52 UTC (permalink / raw)
  To: Lars Hjemli, Jeff King
  Cc: Michael Haggerty, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

Subject: [PATCH] branch -m/-M: remove undocumented RENAMED-REF

The commit message for c976d41 (git-branch: add options and tests for
branch renaming, 2006-11-28) mentions RENAME_REF but otherwise this is not
documented anywhere, and it does not appear in any of the tests.

Worse yet, the name of the actual file is "RENAMED-REF".

This was supposed to hold the commit object name at the tip of the branch
the most recent "branch -m/-M" renamed, but that is not necessary in order
to be able to recover from a mistake. Even when "branch -M A B" overwrites
an existing branch B, what is kept in RENAMED-REF is the commit at the tip
of the original branch A, not the commit B from the now-lost branch.

Just remove this unused "feature".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * Let's do this.

 refs.c |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/refs.c b/refs.c
index cdedb45..ecfc22c 100644
--- a/refs.c
+++ b/refs.c
@@ -786,7 +786,6 @@ int delete_ref(const char *refname, unsigned char *sha1)
 
 int rename_ref(const char *oldref, const char *newref)
 {
-	static const char renamed_ref[] = "RENAMED-REF";
 	unsigned char sha1[20], orig_sha1[20];
 	int flag = 0, logmoved = 0;
 	struct ref_lock *lock;
@@ -809,13 +808,6 @@ int rename_ref(const char *oldref, const char *newref)
 	if (snprintf(msg, sizeof(msg), "renamed %s to %s", oldref, newref) > sizeof(msg))
 		return error("Refnames to long");
 
-	lock = lock_ref_sha1_basic(renamed_ref, NULL, NULL);
-	if (!lock)
-		return error("unable to lock %s", renamed_ref);
-	lock->force_write = 1;
-	if (write_ref_sha1(lock, orig_sha1, msg))
-		return error("unable to save current sha1 in %s", renamed_ref);
-
 	if (log && rename(git_path("logs/%s", oldref), git_path("tmp-renamed-log")))
 		return error("unable to move logfile logs/%s to tmp-renamed-log: %s",
 			oldref, strerror(errno));
-- 
1.7.7.213.g8b0e1

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 1/2] refs.c: move dwim_ref()/dwim_log() from sha1_name.c
  2011-10-12  4:50                   ` Jeff King
@ 2011-10-12 17:48                     ` Junio C Hamano
  2011-10-12 17:49                     ` [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR Junio C Hamano
  1 sibling, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2011-10-12 17:48 UTC (permalink / raw)
  To: Jeff King
  Cc: Michael Haggerty, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

OK, so here is a re-roll of the one from yesterday. This one is new but is
a mere preparatory step.

-- >8 --

Both dwim_ref()/dwim_log() functions are intimately related to the ref
parsing rules defined in refs.c and better fits there. Move them together
with substitute_branch_name(), a file scope static helper function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 refs.c      |   85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 sha1_name.c |   85 -----------------------------------------------------------
 2 files changed, 85 insertions(+), 85 deletions(-)

diff --git a/refs.c b/refs.c
index 832a52f..e3692bd 100644
--- a/refs.c
+++ b/refs.c
@@ -1067,6 +1067,91 @@ static int is_refname_available(const char *ref, const char *oldref,
 	return 1;
 }
 
+/*
+ * *string and *len will only be substituted, and *string returned (for
+ * later free()ing) if the string passed in is a magic short-hand form
+ * to name a branch.
+ */
+static char *substitute_branch_name(const char **string, int *len)
+{
+	struct strbuf buf = STRBUF_INIT;
+	int ret = interpret_branch_name(*string, &buf);
+
+	if (ret == *len) {
+		size_t size;
+		*string = strbuf_detach(&buf, &size);
+		*len = size;
+		return (char *)*string;
+	}
+
+	return NULL;
+}
+
+int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref)
+{
+	char *last_branch = substitute_branch_name(&str, &len);
+	const char **p, *r;
+	int refs_found = 0;
+
+	*ref = NULL;
+	for (p = ref_rev_parse_rules; *p; p++) {
+		char fullref[PATH_MAX];
+		unsigned char sha1_from_ref[20];
+		unsigned char *this_result;
+		int flag;
+
+		this_result = refs_found ? sha1_from_ref : sha1;
+		mksnpath(fullref, sizeof(fullref), *p, len, str);
+		r = resolve_ref(fullref, this_result, 1, &flag);
+		if (r) {
+			if (!refs_found++)
+				*ref = xstrdup(r);
+			if (!warn_ambiguous_refs)
+				break;
+		} else if ((flag & REF_ISSYMREF) && strcmp(fullref, "HEAD"))
+			warning("ignoring dangling symref %s.", fullref);
+	}
+	free(last_branch);
+	return refs_found;
+}
+
+int dwim_log(const char *str, int len, unsigned char *sha1, char **log)
+{
+	char *last_branch = substitute_branch_name(&str, &len);
+	const char **p;
+	int logs_found = 0;
+
+	*log = NULL;
+	for (p = ref_rev_parse_rules; *p; p++) {
+		struct stat st;
+		unsigned char hash[20];
+		char path[PATH_MAX];
+		const char *ref, *it;
+
+		mksnpath(path, sizeof(path), *p, len, str);
+		ref = resolve_ref(path, hash, 1, NULL);
+		if (!ref)
+			continue;
+		if (!stat(git_path("logs/%s", path), &st) &&
+		    S_ISREG(st.st_mode))
+			it = path;
+		else if (strcmp(ref, path) &&
+			 !stat(git_path("logs/%s", ref), &st) &&
+			 S_ISREG(st.st_mode))
+			it = ref;
+		else
+			continue;
+		if (!logs_found++) {
+			*log = xstrdup(it);
+			hashcpy(sha1, hash);
+		}
+		if (!warn_ambiguous_refs)
+			break;
+	}
+	free(last_branch);
+	return logs_found;
+}
+
 static struct ref_lock *lock_ref_sha1_basic(const char *ref, const unsigned char *old_sha1, int flags, int *type_p)
 {
 	char *ref_file;
diff --git a/sha1_name.c b/sha1_name.c
index 143fd97..d423635 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -241,91 +241,6 @@ static int ambiguous_path(const char *path, int len)
 	return slash;
 }
 
-/*
- * *string and *len will only be substituted, and *string returned (for
- * later free()ing) if the string passed in is a magic short-hand form
- * to name a branch.
- */
-static char *substitute_branch_name(const char **string, int *len)
-{
-	struct strbuf buf = STRBUF_INIT;
-	int ret = interpret_branch_name(*string, &buf);
-
-	if (ret == *len) {
-		size_t size;
-		*string = strbuf_detach(&buf, &size);
-		*len = size;
-		return (char *)*string;
-	}
-
-	return NULL;
-}
-
-int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref)
-{
-	char *last_branch = substitute_branch_name(&str, &len);
-	const char **p, *r;
-	int refs_found = 0;
-
-	*ref = NULL;
-	for (p = ref_rev_parse_rules; *p; p++) {
-		char fullref[PATH_MAX];
-		unsigned char sha1_from_ref[20];
-		unsigned char *this_result;
-		int flag;
-
-		this_result = refs_found ? sha1_from_ref : sha1;
-		mksnpath(fullref, sizeof(fullref), *p, len, str);
-		r = resolve_ref(fullref, this_result, 1, &flag);
-		if (r) {
-			if (!refs_found++)
-				*ref = xstrdup(r);
-			if (!warn_ambiguous_refs)
-				break;
-		} else if ((flag & REF_ISSYMREF) && strcmp(fullref, "HEAD"))
-			warning("ignoring dangling symref %s.", fullref);
-	}
-	free(last_branch);
-	return refs_found;
-}
-
-int dwim_log(const char *str, int len, unsigned char *sha1, char **log)
-{
-	char *last_branch = substitute_branch_name(&str, &len);
-	const char **p;
-	int logs_found = 0;
-
-	*log = NULL;
-	for (p = ref_rev_parse_rules; *p; p++) {
-		struct stat st;
-		unsigned char hash[20];
-		char path[PATH_MAX];
-		const char *ref, *it;
-
-		mksnpath(path, sizeof(path), *p, len, str);
-		ref = resolve_ref(path, hash, 1, NULL);
-		if (!ref)
-			continue;
-		if (!stat(git_path("logs/%s", path), &st) &&
-		    S_ISREG(st.st_mode))
-			it = path;
-		else if (strcmp(ref, path) &&
-			 !stat(git_path("logs/%s", ref), &st) &&
-			 S_ISREG(st.st_mode))
-			it = ref;
-		else
-			continue;
-		if (!logs_found++) {
-			*log = xstrdup(it);
-			hashcpy(sha1, hash);
-		}
-		if (!warn_ambiguous_refs)
-			break;
-	}
-	free(last_branch);
-	return logs_found;
-}
-
 static inline int upstream_mark(const char *string, int len)
 {
 	const char *suffix[] = { "@{upstream}", "@{u}" };
-- 
1.7.7.213.g8b0e1

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
  2011-10-12  4:50                   ` Jeff King
  2011-10-12 17:48                     ` [PATCH 1/2] refs.c: move dwim_ref()/dwim_log() from sha1_name.c Junio C Hamano
@ 2011-10-12 17:49                     ` Junio C Hamano
  2011-10-12 18:01                       ` Michael Haggerty
  2011-10-12 21:51                       ` Jeff King
  1 sibling, 2 replies; 60+ messages in thread
From: Junio C Hamano @ 2011-10-12 17:49 UTC (permalink / raw)
  To: Jeff King
  Cc: Michael Haggerty, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

We have always dwimmed the user input $string into a ref by first looking
directly inside $GIT_DIR, and then in $GIT_DIR/refs, $GIT_DIR/refs/tags,
etc., and that is what made

    git log HEAD..MERGE_HEAD

work correctly. This however means that

    git rev-parse config
    git log index

would look at $GIT_DIR/config and $GIT_DIR/index and see if they are valid
refs.

To reduce confusion, let's not dwim a path immediately below $GIT_DIR that
is not all-caps.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * And this adds coverage to refname_match() and shorten_unambiguous_ref()
   on top of the one from yesterday.
 
 refs.c |   36 ++++++++++++++++++++++++++++++++++--
 1 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/refs.c b/refs.c
index e3692bd..e54c482 100644
--- a/refs.c
+++ b/refs.c
@@ -994,12 +994,34 @@ const char *ref_fetch_rules[] = {
 	NULL
 };
 
+static int refname_ok_at_root_level(const char *str, int len)
+{
+	int seen_non_root_char = 0;
+
+	while (len--) {
+		char ch = *str++;
+
+		if (ch == '/')
+			return 1;
+		/*
+		 * Only accept likes of .git/HEAD, .git/MERGE_HEAD at
+		 * the root level as a ref.
+		 */
+		if (ch != '_' && (ch < 'A' || 'Z' < ch))
+			seen_non_root_char = 1;
+	}
+	return !seen_non_root_char;
+}
+
 int refname_match(const char *abbrev_name, const char *full_name, const char **rules)
 {
 	const char **p;
 	const int abbrev_name_len = strlen(abbrev_name);
 
 	for (p = rules; *p; p++) {
+		if (p == rules &&
+		    !refname_ok_at_root_level(abbrev_name, abbrev_name_len))
+			continue;
 		if (!strcmp(full_name, mkpath(*p, abbrev_name_len, abbrev_name))) {
 			return 1;
 		}
@@ -1100,6 +1122,8 @@ int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref)
 		unsigned char *this_result;
 		int flag;
 
+		if (p == ref_rev_parse_rules && !refname_ok_at_root_level(str, len))
+			continue;
 		this_result = refs_found ? sha1_from_ref : sha1;
 		mksnpath(fullref, sizeof(fullref), *p, len, str);
 		r = resolve_ref(fullref, this_result, 1, &flag);
@@ -1128,6 +1152,8 @@ int dwim_log(const char *str, int len, unsigned char *sha1, char **log)
 		char path[PATH_MAX];
 		const char *ref, *it;
 
+		if (p == ref_rev_parse_rules && !refname_ok_at_root_level(str, len))
+			continue;
 		mksnpath(path, sizeof(path), *p, len, str);
 		ref = resolve_ref(path, hash, 1, NULL);
 		if (!ref)
@@ -2045,12 +2071,14 @@ char *shorten_unambiguous_ref(const char *ref, int strict)
 	/* buffer for scanf result, at most ref must fit */
 	short_name = xstrdup(ref);
 
-	/* skip first rule, it will always match */
-	for (i = nr_rules - 1; i > 0 ; --i) {
+	for (i = nr_rules - 1; i >= 0; i--) {
 		int j;
 		int rules_to_fail = i;
 		int short_name_len;
 
+		if (!i && !refname_ok_at_root_level(ref, strlen(ref)))
+			continue;
+
 		if (1 != sscanf(ref, scanf_fmts[i], short_name))
 			continue;
 
@@ -2076,6 +2104,10 @@ char *shorten_unambiguous_ref(const char *ref, int strict)
 			if (i == j)
 				continue;
 
+			if (!j &&
+			    !refname_ok_at_root_level(short_name, short_name_len))
+				continue;
+
 			/*
 			 * the short name is ambiguous, if it resolves
 			 * (with this previous rule) to a valid ref
-- 
1.7.7.213.g8b0e1

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
  2011-10-12 17:49                     ` [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR Junio C Hamano
@ 2011-10-12 18:01                       ` Michael Haggerty
  2011-10-12 18:07                         ` Junio C Hamano
  2011-10-12 21:51                       ` Jeff King
  1 sibling, 1 reply; 60+ messages in thread
From: Michael Haggerty @ 2011-10-12 18:01 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

On 10/12/2011 07:49 PM, Junio C Hamano wrote:
> diff --git a/refs.c b/refs.c
> index e3692bd..e54c482 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -994,12 +994,34 @@ const char *ref_fetch_rules[] = {
>  	NULL
>  };
>  
> +static int refname_ok_at_root_level(const char *str, int len)
> +{
> +	int seen_non_root_char = 0;
> +
> +	while (len--) {
> +		char ch = *str++;
> +
> +		if (ch == '/')
> +			return 1;
> +		/*
> +		 * Only accept likes of .git/HEAD, .git/MERGE_HEAD at
> +		 * the root level as a ref.
> +		 */
> +		if (ch != '_' && (ch < 'A' || 'Z' < ch))
> +			seen_non_root_char = 1;
> +	}
> +	return !seen_non_root_char;
> +}
> +

Nit: the seen_non_root_char variable can be replaced by an early "return
0" from the loop and "return 1" if the loop falls through.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
  2011-10-12 18:01                       ` Michael Haggerty
@ 2011-10-12 18:07                         ` Junio C Hamano
  2011-10-12 21:42                           ` Michael Haggerty
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2011-10-12 18:07 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Jeff King, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

Michael Haggerty <mhagger@alum.mit.edu> writes:

> On 10/12/2011 07:49 PM, Junio C Hamano wrote:
>> diff --git a/refs.c b/refs.c
>> index e3692bd..e54c482 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -994,12 +994,34 @@ const char *ref_fetch_rules[] = {
>>  	NULL
>>  };
>>  
>> +static int refname_ok_at_root_level(const char *str, int len)
>> +{
>> +	int seen_non_root_char = 0;
>> +
>> +	while (len--) {
>> +		char ch = *str++;
>> +
>> +		if (ch == '/')
>> +			return 1;
>> +		/*
>> +		 * Only accept likes of .git/HEAD, .git/MERGE_HEAD at
>> +		 * the root level as a ref.
>> +		 */
>> +		if (ch != '_' && (ch < 'A' || 'Z' < ch))
>> +			seen_non_root_char = 1;
>> +	}
>> +	return !seen_non_root_char;
>> +}
>> +
>
> Nit: the seen_non_root_char variable can be replaced by an early "return
> 0" from the loop and "return 1" if the loop falls through.

Hmm, I thought that would fail when you feed "refs/heads/master" to the
function.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-11 23:07           ` Jeff King
  2011-10-11 23:50             ` Junio C Hamano
@ 2011-10-12 19:20             ` Junio C Hamano
  2011-10-12 19:26               ` Jeff King
  1 sibling, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2011-10-12 19:20 UTC (permalink / raw)
  To: Jeff King
  Cc: Michael Haggerty, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

Jeff King <peff@peff.net> writes:

> We also look at ref_rev_parse_rules in shorten_unambiguous_ref. So even
> with your patch, I still get the warning with:
>
>   $ git branch config
>   $ git for-each-ref --format='%(refname:short)' refs/heads/
>
> It looks like we also use it in remote.c:count_refspec_match, but I
> haven't figured out if that can trigger a warning or not.

I do not think this is "do not trigger a warning" exercise. This is "we no
longer consider names outside refs/ as potential magic refs unless they
are all uppercase".

If the updated dwim_ref() says $GIT_DIR/frotz will no longer create
ambiguity with $GIT_DIR/refs/heads/frotz, then refname_match() needs to
know about it, and cause count_refspec_match() to say that "frotz"
uniquely names "refs/heads/frotz".

The same stands for shorten_unambiguous_ref(). If $GIT_DIR/frotz no longer
is ambiguous with $GIT_DIR/refs/heads/frotz, then %(refname:shrot) for
refs/heads/frotz should yield "frotz", and we should not have to qualify
it as "heads/frotz".

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references
  2011-10-12 19:20             ` Junio C Hamano
@ 2011-10-12 19:26               ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2011-10-12 19:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Michael Haggerty, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

On Wed, Oct 12, 2011 at 12:20:18PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > We also look at ref_rev_parse_rules in shorten_unambiguous_ref. So even
> > with your patch, I still get the warning with:
> >
> >   $ git branch config
> >   $ git for-each-ref --format='%(refname:short)' refs/heads/
> >
> > It looks like we also use it in remote.c:count_refspec_match, but I
> > haven't figured out if that can trigger a warning or not.
> 
> I do not think this is "do not trigger a warning" exercise. This is "we no
> longer consider names outside refs/ as potential magic refs unless they
> are all uppercase".
> 
> If the updated dwim_ref() says $GIT_DIR/frotz will no longer create
> ambiguity with $GIT_DIR/refs/heads/frotz, then refname_match() needs to
> know about it, and cause count_refspec_match() to say that "frotz"
> uniquely names "refs/heads/frotz".
> 
> The same stands for shorten_unambiguous_ref(). If $GIT_DIR/frotz no longer
> is ambiguous with $GIT_DIR/refs/heads/frotz, then %(refname:shrot) for
> refs/heads/frotz should yield "frotz", and we should not have to qualify
> it as "heads/frotz".

Absolutely. I was lazy in how I woreded it, but I meant that the warning
was the indicator that we were doing something buggy, not that it was
the bug itself. It just happened to be a convenient way to test. :)

Your most recent patch looks right.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
  2011-10-12 18:07                         ` Junio C Hamano
@ 2011-10-12 21:42                           ` Michael Haggerty
  2011-10-12 22:26                             ` Junio C Hamano
  2011-10-19  5:28                             ` Junio C Hamano
  0 siblings, 2 replies; 60+ messages in thread
From: Michael Haggerty @ 2011-10-12 21:42 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

On 10/12/2011 08:07 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> On 10/12/2011 07:49 PM, Junio C Hamano wrote:
>>> diff --git a/refs.c b/refs.c
>>> index e3692bd..e54c482 100644
>>> --- a/refs.c
>>> +++ b/refs.c
>>> @@ -994,12 +994,34 @@ const char *ref_fetch_rules[] = {
>>>  	NULL
>>>  };
>>>  
>>> +static int refname_ok_at_root_level(const char *str, int len)
>>> +{
>>> +	int seen_non_root_char = 0;
>>> +
>>> +	while (len--) {
>>> +		char ch = *str++;
>>> +
>>> +		if (ch == '/')
>>> +			return 1;
>>> +		/*
>>> +		 * Only accept likes of .git/HEAD, .git/MERGE_HEAD at
>>> +		 * the root level as a ref.
>>> +		 */
>>> +		if (ch != '_' && (ch < 'A' || 'Z' < ch))
>>> +			seen_non_root_char = 1;
>>> +	}
>>> +	return !seen_non_root_char;
>>> +}
>>> +
>>
>> Nit: the seen_non_root_char variable can be replaced by an early "return
>> 0" from the loop and "return 1" if the loop falls through.
> 
> Hmm, I thought that would fail when you feed "refs/heads/master" to the
> function.

You're right.  My brain must be scrambled from all of the rebasing that
I have been doing ;-)

How about adding

/*
 * Accept strings that are either ALL_CAPS or include a '/'.
 */

(I think the underscore is implied by the example, but the comment could
be expanded if necessary to be explicit.)

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
  2011-10-12 17:49                     ` [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR Junio C Hamano
  2011-10-12 18:01                       ` Michael Haggerty
@ 2011-10-12 21:51                       ` Jeff King
  1 sibling, 0 replies; 60+ messages in thread
From: Jeff King @ 2011-10-12 21:51 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Michael Haggerty, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

On Wed, Oct 12, 2011 at 10:49:41AM -0700, Junio C Hamano wrote:

> +static int refname_ok_at_root_level(const char *str, int len)
> +{
> +	int seen_non_root_char = 0;
> +
> +	while (len--) {
> +		char ch = *str++;
> +
> +		if (ch == '/')
> +			return 1;
> +		/*
> +		 * Only accept likes of .git/HEAD, .git/MERGE_HEAD at
> +		 * the root level as a ref.
> +		 */
> +		if (ch != '_' && (ch < 'A' || 'Z' < ch))
> +			seen_non_root_char = 1;
> +	}
> +	return !seen_non_root_char;
> +}

I thought from your earlier comment:

> I wanted to start as loose as possible to avoid negatively impacting
> existing users, later to tighten.  As fsck and friends never look
> outside of refs/, I think the prefix refs/ is a reasonable restriction
> that is safe.

that you did agree with tightening this up to allow just refs/ as a
subdirectory.

Squashable patch is below.

diff --git a/refs.c b/refs.c
index 0f26d9d..b159c4a 100644
--- a/refs.c
+++ b/refs.c
@@ -994,21 +994,20 @@ int check_refname_format(const char *ref, int flags)
 
 static int refname_ok_at_root_level(const char *str, int len)
 {
-	int seen_non_root_char = 0;
+	if (len >= 5 && !memcmp(str, "refs/", 5))
+		return 1;
 
 	while (len--) {
 		char ch = *str++;
 
-		if (ch == '/')
-			return 1;
 		/*
 		 * Only accept likes of .git/HEAD, .git/MERGE_HEAD at
 		 * the root level as a ref.
 		 */
 		if (ch != '_' && (ch < 'A' || 'Z' < ch))
-			seen_non_root_char = 1;
+			return 0;
 	}
-	return !seen_non_root_char;
+	return 1;
 }
 
 int refname_match(const char *abbrev_name, const char *full_name, const char **rules)

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
  2011-10-12 21:42                           ` Michael Haggerty
@ 2011-10-12 22:26                             ` Junio C Hamano
  2011-10-19  5:28                             ` Junio C Hamano
  1 sibling, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2011-10-12 22:26 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Jeff King, git, cmn, A Large Angry SCM,
	Daniel Barkalow, Sverre Rabbelier

Michael Haggerty <mhagger@alum.mit.edu> writes:

> /*
>  * Accept strings that are either ALL_CAPS or include a '/'.
>  */
>
> (I think the underscore is implied by the example, but the comment could
> be expanded if necessary to be explicit.)

I like that comment hints "_" by having it between all and caps ;-).

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
  2011-10-12 21:42                           ` Michael Haggerty
  2011-10-12 22:26                             ` Junio C Hamano
@ 2011-10-19  5:28                             ` Junio C Hamano
  2011-10-19  6:19                               ` Junio C Hamano
  2011-10-19 19:29                               ` Junio C Hamano
  1 sibling, 2 replies; 60+ messages in thread
From: Junio C Hamano @ 2011-10-19  5:28 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Jeff King, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

Michael Haggerty <mhagger@alum.mit.edu> writes:

>>> Nit: the seen_non_root_char variable can be replaced by an early "return
>>> 0" from the loop and "return 1" if the loop falls through.
>> 
>> Hmm, I thought that would fail when you feed "refs/heads/master" to the
>> function.
>
> You're right.  My brain must be scrambled from all of the rebasing that
> I have been doing ;-)
>
> How about adding
>
> /*
>  * Accept strings that are either ALL_CAPS or include a '/'.
>  */
>
> (I think the underscore is implied by the example, but the comment could
> be expanded if necessary to be explicit.)

I was trying to summarize this topic for Release Notes.

  Possibly incompatible changes
  -----------------------------

   * Special refs such as "HEAD", "MERGE_HEAD", and "FETCH_HEAD" are now
     restricted to names with only uppercase letters and underscore. All
     other refs must live under refs/ namespace. Earlier, you could
     confuse git by storing an object name in $GIT_DIR/tmp/junk and say
     "git show tmp/junk", but this will no longer work.

But noticed that "git update-ref tmp/junk HEAD" does create such a ref
that won't be recognized, and "git check-ref-format tmp/junk" is happy.

I think we would need to restrict check_ref_format() so that these
commands (and possibly others, but I think that single function will cover
pretty much everything) also reject "tmp/junk" immediately below $GIT_DIR
as a bad string. Otherwise we cannot merge these fixups, which would mean
we would have to revert the "Clean up refname checks and normalization"
series, at least the part that started emitting the "warning", which is a
mess I would rather want to avoid.

Opinions on how to get us out of this mess?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
  2011-10-19  5:28                             ` Junio C Hamano
@ 2011-10-19  6:19                               ` Junio C Hamano
  2011-10-19 15:18                                 ` Michael Haggerty
  2011-10-19 19:29                               ` Junio C Hamano
  1 sibling, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2011-10-19  6:19 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Jeff King, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

Junio C Hamano <gitster@pobox.com> writes:

> I was trying to summarize this topic for Release Notes.
>
>   Possibly incompatible changes
>   -----------------------------
>
>    * Special refs such as "HEAD", "MERGE_HEAD", and "FETCH_HEAD" are now
>      restricted to names with only uppercase letters and underscore. All
>      other refs must live under refs/ namespace. Earlier, you could
>      confuse git by storing an object name in $GIT_DIR/tmp/junk and say
>      "git show tmp/junk", but this will no longer work.
>
> But noticed that "git update-ref tmp/junk HEAD" does create such a ref
> that won't be recognized, and "git check-ref-format tmp/junk" is happy.
>
> I think we would need to restrict check_ref_format() so that these
> commands (and possibly others, but I think that single function will cover
> pretty much everything) also reject "tmp/junk" immediately below $GIT_DIR
> as a bad string. Otherwise we cannot merge these fixups, which would mean
> we would have to revert the "Clean up refname checks and normalization"
> series, at least the part that started emitting the "warning", which is a
> mess I would rather want to avoid.
>
> Opinions on how to get us out of this mess?

Addendum.

I was digging this further and see fairly large conflicts between the bulk
of "clean up refname checks and normalization" topic and the logic to
avoid the additional warning by tightening the dwimmery.

check_refname_format() has always assumed that it is OK to be called at
any level of substring, and there are many code like this one (example is
from remote.c::get_fetch_map()):

        for (rmp = &ref_map; *rmp; ) {
                if ((*rmp)->peer_ref) {
                        if (check_refname_format((*rmp)->peer_ref->name + 5,
                                REFNAME_ALLOW_ONELEVEL)) {
                                struct ref *ignore = *rmp;
                                error("* Ignoring funny ref '%s' locally",
                                      (*rmp)->peer_ref->name);

This code somehow _knows_ that peer_ref->name begins with "refs/" and that
is the reason it adds 5 to skip that known part. In this particular case,
I think we can simply drop the +5 and allow-onelevel, but there are other
instances of the calls to the function that feeds the rest of the refname
string, skipping leading substring (not necessarily "refs/"), assuming
that any component string is either valid or invalid no matter where it
appears in the full refname. I wouldn't be surprised if some callers do
not even have enough information to tell what the leading substring would
be in the full refname context (e.g. parsing of "master:master" refspec,
relying on the later dwimmery to add refs/heads/ in front, could just
verify that "master" is in good format with allow-onelevel).

The new restriction bolted on to that logic in jc/check-ref-format-fixup
series to work around the new warning in 629cd3a (resolve_ref(): emit
warnings for improperly-formatted references, 2011-09-15) is incompatible
with the assumption, as we would need to check full refname, and treat the
first refname component very differently from other components. It has to
be either "refs" in multi-component refname, or all caps in a one-level
one, but the callers of check_refname_format() are not designed to feed
the full refname to begin with.

I am tempted to revert 629cd3a (resolve_ref(): emit warnings for
improperly-formatted references, 2011-09-15) that started giving the
warnings, and drop the jc/check-ref-format-fixup topic [*1*] altogether,
but that is a short-sighted workaround I would rather want to avoid. It
essentially declares that the "Clean up refname checks" topic was a
failure and did not manage to really clean things up.

A possible alternative might be to leave check_refname_format() and its
callers as they are, introduce check_full_refname() function that knows
the new restriction on top of check-ref-format-fixup, and use that in
lock_ref_sha1(), lock_any_ref_for_update() and is_refname_available()
[*2*]. That way, we can keep the potentially useful "ill-formed contents
in the ref" warning and avoid possible confusion caused by random files
that are directly under $GIT_DIR, which would be far more preferable in
the longer term.

Anybody wants to give it a try?


[Footnote]

*1* The topic is misnamed; it is about fixing dwimmery, not checking.

*2* This currently does not seem to check the well-formedness of the ref
it is creating at all; it should check the "ref", but not "oldref" to
allow renaming a malformed ref to correct earlier mistakes.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
  2011-10-19  6:19                               ` Junio C Hamano
@ 2011-10-19 15:18                                 ` Michael Haggerty
  2011-10-19 17:10                                   ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Michael Haggerty @ 2011-10-19 15:18 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

On 10/19/2011 08:19 AM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> I was trying to summarize this topic for Release Notes.
>>
>>   Possibly incompatible changes
>>   -----------------------------
>>
>>    * Special refs such as "HEAD", "MERGE_HEAD", and "FETCH_HEAD" are now
>>      restricted to names with only uppercase letters and underscore. All
>>      other refs must live under refs/ namespace. Earlier, you could
>>      confuse git by storing an object name in $GIT_DIR/tmp/junk and say
>>      "git show tmp/junk", but this will no longer work.
>>
>> But noticed that "git update-ref tmp/junk HEAD" does create such a ref
>> that won't be recognized, and "git check-ref-format tmp/junk" is happy.
>>
>> I think we would need to restrict check_ref_format() so that these
>> commands (and possibly others, but I think that single function will cover
>> pretty much everything) also reject "tmp/junk" immediately below $GIT_DIR
>> as a bad string. Otherwise we cannot merge these fixups, which would mean
>> we would have to revert the "Clean up refname checks and normalization"
>> series, at least the part that started emitting the "warning", which is a
>> mess I would rather want to avoid.
>>
>> Opinions on how to get us out of this mess?
> 
> Addendum.
> 
> I was digging this further and see fairly large conflicts between the bulk
> of "clean up refname checks and normalization" topic and the logic to
> avoid the additional warning by tightening the dwimmery.
> 
> check_refname_format() has always assumed that it is OK to be called at
> any level of substring, and there are many code like this one (example is
> from remote.c::get_fetch_map()):
> 
>         for (rmp = &ref_map; *rmp; ) {
>                 if ((*rmp)->peer_ref) {
>                         if (check_refname_format((*rmp)->peer_ref->name + 5,
>                                 REFNAME_ALLOW_ONELEVEL)) {
>                                 struct ref *ignore = *rmp;
>                                 error("* Ignoring funny ref '%s' locally",
>                                       (*rmp)->peer_ref->name);
> 
> This code somehow _knows_ that peer_ref->name begins with "refs/" and that
> is the reason it adds 5 to skip that known part. In this particular case,
> I think we can simply drop the +5 and allow-onelevel, but there are other
> instances of the calls to the function that feeds the rest of the refname
> string, skipping leading substring (not necessarily "refs/"), assuming
> that any component string is either valid or invalid no matter where it
> appears in the full refname. I wouldn't be surprised if some callers do
> not even have enough information to tell what the leading substring would
> be in the full refname context (e.g. parsing of "master:master" refspec,
> relying on the later dwimmery to add refs/heads/ in front, could just
> verify that "master" is in good format with allow-onelevel).
> 
> The new restriction bolted on to that logic in jc/check-ref-format-fixup
> series to work around the new warning in 629cd3a (resolve_ref(): emit
> warnings for improperly-formatted references, 2011-09-15) is incompatible
> with the assumption, as we would need to check full refname, and treat the
> first refname component very differently from other components. It has to
> be either "refs" in multi-component refname, or all caps in a one-level
> one, but the callers of check_refname_format() are not designed to feed
> the full refname to begin with.
> 
> I am tempted to revert 629cd3a (resolve_ref(): emit warnings for
> improperly-formatted references, 2011-09-15) that started giving the
> warnings, and drop the jc/check-ref-format-fixup topic [*1*] altogether,
> but that is a short-sighted workaround I would rather want to avoid. It
> essentially declares that the "Clean up refname checks" topic was a
> failure and did not manage to really clean things up.
> 
> A possible alternative might be to leave check_refname_format() and its
> callers as they are, introduce check_full_refname() function that knows
> the new restriction on top of check-ref-format-fixup, and use that in
> lock_ref_sha1(), lock_any_ref_for_update() and is_refname_available()
> [*2*]. That way, we can keep the potentially useful "ill-formed contents
> in the ref" warning and avoid possible confusion caused by random files
> that are directly under $GIT_DIR, which would be far more preferable in
> the longer term.
> 
> Anybody wants to give it a try?

I think that the refs/-or-ALL_CAPS test fits most naturally in
check_refname_format(), controlled by an option flag.

I'm starting by building parts of the solution, namely something like:

* Add an option REFNAME_ALLOW_DWIM which causes check_refname_format()
to accept reference names that *could* be dwimmed into a valid refname.
 Specifically, when this option is *not* specified, then the option must
start with "refs/" or be ALL_CAPS.  When it *is* specified, the behavior
will be much like the current behavior when ALLOW_ONELEVEL is specified
(in fact, the new option might make ALLOW_ONELEVEL obsolete).

* Add a new function parse_refname_prefix(str, len, flags) which tries
to read a refname from the front of str (of length len, not necessarily
NUL-terminated) and returns the length of the part that could be used.
This function will support the same flag argument as
check_refname_format() (in fact the latter would become a trivial
wrapper of the new function).  I expect that this function will be
useful for dwimmery.

Hopefully I'll have patches before the end of the (Berlin) day.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
  2011-10-19 15:18                                 ` Michael Haggerty
@ 2011-10-19 17:10                                   ` Junio C Hamano
  0 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2011-10-19 17:10 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Jeff King, git, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

Michael Haggerty <mhagger@alum.mit.edu> writes:

> On 10/19/2011 08:19 AM, Junio C Hamano wrote:
> 
>> A possible alternative might be to leave check_refname_format() and its
>> callers as they are, introduce check_full_refname() function that knows
>> the new restriction on top of check-ref-format-fixup, and use that in
>> lock_ref_sha1(), lock_any_ref_for_update() and is_refname_available()
>> [*2*]. That way, we can keep the potentially useful "ill-formed contents
>> in the ref" warning and avoid possible confusion caused by random files
>> that are directly under $GIT_DIR, which would be far more preferable in
>> the longer term.
>> 
>> Anybody wants to give it a try?
>
> I think that the refs/-or-ALL_CAPS test fits most naturally in
> check_refname_format(), controlled by an option flag.

That is fine too, if a separate function check_full_refname() would end up
being nothing more than a thin wrapper that passes the "we are checking a
full refname" option.

> I'm starting by building parts of the solution, namely something like:
> ...
>
> Hopefully I'll have patches before the end of the (Berlin) day.

It is unclear how these two functions could be related to the issue, but
hopefully we will soon find out when we see your patch.  Thanks.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
  2011-10-19  5:28                             ` Junio C Hamano
  2011-10-19  6:19                               ` Junio C Hamano
@ 2011-10-19 19:29                               ` Junio C Hamano
  2011-10-19 19:39                                 ` [PATCH] resolve_ref(): report breakage to the caller without warning Junio C Hamano
  2011-10-19 20:31                                 ` [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR Michael Haggerty
  1 sibling, 2 replies; 60+ messages in thread
From: Junio C Hamano @ 2011-10-19 19:29 UTC (permalink / raw)
  To: git
  Cc: Michael Haggerty, Jeff King, cmn, A Large Angry SCM,
	Daniel Barkalow, Sverre Rabbelier

Junio C Hamano <gitster@pobox.com> writes:

> I was trying to summarize this topic for Release Notes.
>
>   Possibly incompatible changes
>   -----------------------------
>
>    * Special refs such as "HEAD", "MERGE_HEAD", and "FETCH_HEAD" are now
>      restricted to names with only uppercase letters and underscore. All
>      other refs must live under refs/ namespace. Earlier, you could
>      confuse git by storing an object name in $GIT_DIR/tmp/junk and say
>      "git show tmp/junk", but this will no longer work.
>
> But noticed that "git update-ref tmp/junk HEAD" does create such a ref
> that won't be recognized, and "git check-ref-format tmp/junk" is happy.
>
> I think we would need to restrict check_ref_format() so that these
> commands (and possibly others, but I think that single function will cover
> pretty much everything) also reject "tmp/junk" immediately below $GIT_DIR
> as a bad string. Otherwise we cannot merge these fixups, which would mean
> we would have to revert the "Clean up refname checks and normalization"
> series, at least the part that started emitting the "warning", which is a
> mess I would rather want to avoid.
>
> Opinions on how to get us out of this mess?

Let me step back a bit to avoid making hasty decisions that may cause
negative effect on the end users only because a single topic with a
possible fallout has been merged to 'master' already (a sane position for
such a situation always has been to revert the merge of the topic so far,
but I ended up being hasty, only because the merge was deliberately made
early in the cycle to give us enough time to deal with fallouts).

In general, the "Hey that file that appears in our dwimmed ref namespace
does not store correct [0-9a-f]{40} contents" warning message is a good
thing to have. When we try the dwimmery and disambiguation, we however
look at the potential refs and warn disambiguity only when two or more
such files have good contents. E.g. if I do this:

    $ git rev-parse HEAD >.git/refs/heads/frotz
    $ echo hello >.git/refs/tags/frotz
    $ git show frotz

we have never paid attention to the broken tag and showed the 'frotz'
branch without complaining. Once tags/frotz gets a real object name,
however, we start giving ambiguity warnings.

Perhaps that is what we should be fixing instead.

Right now, dwim_ref() classifies candidates into two categories, ones that
resolve_ref() succeeds, and others that resolve_ref() does not. And when
we have two or more candidates that successfully resolves, there is an
ambiguity.

Perhaps we need the third kind: ones that exist on the filesystem but
are ill-formed.

When naïvely looked at, the code in 'master' may seem to be doing just
that, but it does _not_ have any way to affect the ambiguity detection
logic even if the caller wanted to. The warning is issued at a wrong
level.

Perhaps resolve_ref() should return in its *flag parameter that "a file
exists there but incorrectly formatted", and dwim_ref() should notice and
use that information to warn about ambiguity and also illformed-ness.

A patch is attached at the end of this message to minimally fix what is in
'master' (without the jc/check-ref-format-fixup topic). This update allows
us to make dwim_ref() notice such "exists but broken" candidates and later
take them into consideration when deciding if a name is ambiguous, but I
did not want to change the semantics from the traditional implementation,
so it only uses this information to warn ref breakages. Also this squelches
the phony "index is not well formed" warning against "git show index --"
by knowing that many files directly under $GIT_DIR are not ref-like things.
Michaels's "when taken as a full refname, is this string valid?" update
mentioned in the thread may be used to replace the strchr(fullref, '/') check
that can be seen in this patch.

 refs.c      |   22 +++++++++++-----------
 refs.h      |    5 +++--
 sha1_name.c |    5 ++++-
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/refs.c b/refs.c
index cab4394..0f58e46 100644
--- a/refs.c
+++ b/refs.c
@@ -4,9 +4,8 @@
 #include "tag.h"
 #include "dir.h"
 
-/* ISSYMREF=01 and ISPACKED=02 are public interfaces */
-#define REF_KNOWS_PEELED 04
-#define REF_BROKEN 010
+/* ISSYMREF=0x01, ISPACKED=0x02 and ISBROKEN=0x04 are public interfaces */
+#define REF_KNOWS_PEELED 0x10
 
 struct ref_entry {
 	unsigned char flag; /* ISSYMREF? ISPACKED? */
@@ -329,12 +328,12 @@ static void get_ref_dir(const char *submodule, const char *base,
 				flag = 0;
 				if (resolve_gitlink_ref(submodule, ref, sha1) < 0) {
 					hashclr(sha1);
-					flag |= REF_BROKEN;
+					flag |= REF_ISBROKEN;
 				}
 			} else
 				if (!resolve_ref(ref, sha1, 1, &flag)) {
 					hashclr(sha1);
-					flag |= REF_BROKEN;
+					flag |= REF_ISBROKEN;
 				}
 			add_ref(ref, sha1, flag, array, NULL);
 		}
@@ -501,7 +500,6 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 	ssize_t len;
 	char buffer[256];
 	static char ref_buffer[256];
-	char path[PATH_MAX];
 
 	if (flag)
 		*flag = 0;
@@ -510,6 +508,7 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 		return NULL;
 
 	for (;;) {
+		char path[PATH_MAX];
 		struct stat st;
 		char *buf;
 		int fd;
@@ -586,8 +585,8 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 		while (isspace(*buf))
 			buf++;
 		if (check_refname_format(buf, REFNAME_ALLOW_ONELEVEL)) {
-			warning("symbolic reference in %s is formatted incorrectly",
-				path);
+			if (flag)
+				*flag |= REF_ISBROKEN;
 			return NULL;
 		}
 		ref = strcpy(ref_buffer, buf);
@@ -596,7 +595,8 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 	}
 	/* Please note that FETCH_HEAD has a second line containing other data. */
 	if (get_sha1_hex(buffer, sha1) || (buffer[40] != '\0' && !isspace(buffer[40]))) {
-		warning("reference in %s is formatted incorrectly", path);
+		if (flag)
+			*flag |= REF_ISBROKEN;
 		return NULL;
 	}
 	return ref;
@@ -624,8 +624,8 @@ static int do_one_ref(const char *base, each_ref_fn fn, int trim,
 		return 0;
 
 	if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) {
-		if (entry->flag & REF_BROKEN)
-			return 0; /* ignore dangling symref */
+		if (entry->flag & REF_ISBROKEN)
+			return 0; /* ignore dangling symref and corrupt ref */
 		if (!has_sha1_file(entry->sha1)) {
 			error("%s does not point to a valid object!", entry->name);
 			return 0;
diff --git a/refs.h b/refs.h
index 0229c57..7442b29 100644
--- a/refs.h
+++ b/refs.h
@@ -10,8 +10,9 @@ struct ref_lock {
 	int force_write;
 };
 
-#define REF_ISSYMREF 01
-#define REF_ISPACKED 02
+#define REF_ISSYMREF 0x01
+#define REF_ISPACKED 0x02
+#define REF_ISBROKEN 0x04
 
 /*
  * Calls the specified function for each ref file until it returns nonzero,
diff --git a/sha1_name.c b/sha1_name.c
index ba976b4..1fe37c6 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -282,8 +282,11 @@ int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref)
 				*ref = xstrdup(r);
 			if (!warn_ambiguous_refs)
 				break;
-		} else if ((flag & REF_ISSYMREF) && strcmp(fullref, "HEAD"))
+		} else if ((flag & REF_ISSYMREF) && strcmp(fullref, "HEAD")) {
 			warning("ignoring dangling symref %s.", fullref);
+		} else if ((flag & REF_ISBROKEN) && strchr(fullref, '/')) {
+			warning("ignoring broken ref %s.", fullref);
+		}
 	}
 	free(last_branch);
 	return refs_found;

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH] resolve_ref(): report breakage to the caller without warning
  2011-10-19 19:29                               ` Junio C Hamano
@ 2011-10-19 19:39                                 ` Junio C Hamano
  2011-10-19 20:31                                 ` [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR Michael Haggerty
  1 sibling, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2011-10-19 19:39 UTC (permalink / raw)
  To: git
  Cc: Michael Haggerty, Jeff King, cmn, A Large Angry SCM,
	Daniel Barkalow, Sverre Rabbelier

629cd3a (resolve_ref(): emit warnings for improperly-formatted references,
2011-09-15) made resolve_ref() warn against files that are found in the
directories the ref dwimmery looks at. The intent may be good, but these
messages come from a wrong level of the API hierarchy.

Instead record the breakage in "flags" whose purpose is to explain the
result of the function to the caller, who is in a much better position to
make intelligent decision based on the information.

This updates sha1_name.c::dwim_ref() to warn against such a broken
candidate only when it does not appear directly below $GIT_DIR to restore
the traditional behaviour, as we know many files directly underneath
$GIT_DIR/ are are not refs.

Warning against "git show config --" with "$GIT_DIR/config does not look
like a well-formed ref" does not make sense, and we may later tweak the
dwimmery not to even consider them as candidates, but that is a longer
term topic.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * With a log message and the same patch text.

 refs.c      |   22 +++++++++++-----------
 refs.h      |    5 +++--
 sha1_name.c |    5 ++++-
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/refs.c b/refs.c
index cab4394..0f58e46 100644
--- a/refs.c
+++ b/refs.c
@@ -4,9 +4,8 @@
 #include "tag.h"
 #include "dir.h"
 
-/* ISSYMREF=01 and ISPACKED=02 are public interfaces */
-#define REF_KNOWS_PEELED 04
-#define REF_BROKEN 010
+/* ISSYMREF=0x01, ISPACKED=0x02 and ISBROKEN=0x04 are public interfaces */
+#define REF_KNOWS_PEELED 0x10
 
 struct ref_entry {
 	unsigned char flag; /* ISSYMREF? ISPACKED? */
@@ -329,12 +328,12 @@ static void get_ref_dir(const char *submodule, const char *base,
 				flag = 0;
 				if (resolve_gitlink_ref(submodule, ref, sha1) < 0) {
 					hashclr(sha1);
-					flag |= REF_BROKEN;
+					flag |= REF_ISBROKEN;
 				}
 			} else
 				if (!resolve_ref(ref, sha1, 1, &flag)) {
 					hashclr(sha1);
-					flag |= REF_BROKEN;
+					flag |= REF_ISBROKEN;
 				}
 			add_ref(ref, sha1, flag, array, NULL);
 		}
@@ -501,7 +500,6 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 	ssize_t len;
 	char buffer[256];
 	static char ref_buffer[256];
-	char path[PATH_MAX];
 
 	if (flag)
 		*flag = 0;
@@ -510,6 +508,7 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 		return NULL;
 
 	for (;;) {
+		char path[PATH_MAX];
 		struct stat st;
 		char *buf;
 		int fd;
@@ -586,8 +585,8 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 		while (isspace(*buf))
 			buf++;
 		if (check_refname_format(buf, REFNAME_ALLOW_ONELEVEL)) {
-			warning("symbolic reference in %s is formatted incorrectly",
-				path);
+			if (flag)
+				*flag |= REF_ISBROKEN;
 			return NULL;
 		}
 		ref = strcpy(ref_buffer, buf);
@@ -596,7 +595,8 @@ const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *
 	}
 	/* Please note that FETCH_HEAD has a second line containing other data. */
 	if (get_sha1_hex(buffer, sha1) || (buffer[40] != '\0' && !isspace(buffer[40]))) {
-		warning("reference in %s is formatted incorrectly", path);
+		if (flag)
+			*flag |= REF_ISBROKEN;
 		return NULL;
 	}
 	return ref;
@@ -624,8 +624,8 @@ static int do_one_ref(const char *base, each_ref_fn fn, int trim,
 		return 0;
 
 	if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) {
-		if (entry->flag & REF_BROKEN)
-			return 0; /* ignore dangling symref */
+		if (entry->flag & REF_ISBROKEN)
+			return 0; /* ignore dangling symref and corrupt ref */
 		if (!has_sha1_file(entry->sha1)) {
 			error("%s does not point to a valid object!", entry->name);
 			return 0;
diff --git a/refs.h b/refs.h
index 0229c57..7442b29 100644
--- a/refs.h
+++ b/refs.h
@@ -10,8 +10,9 @@ struct ref_lock {
 	int force_write;
 };
 
-#define REF_ISSYMREF 01
-#define REF_ISPACKED 02
+#define REF_ISSYMREF 0x01
+#define REF_ISPACKED 0x02
+#define REF_ISBROKEN 0x04
 
 /*
  * Calls the specified function for each ref file until it returns nonzero,
diff --git a/sha1_name.c b/sha1_name.c
index ba976b4..1fe37c6 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -282,8 +282,11 @@ int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref)
 				*ref = xstrdup(r);
 			if (!warn_ambiguous_refs)
 				break;
-		} else if ((flag & REF_ISSYMREF) && strcmp(fullref, "HEAD"))
+		} else if ((flag & REF_ISSYMREF) && strcmp(fullref, "HEAD")) {
 			warning("ignoring dangling symref %s.", fullref);
+		} else if ((flag & REF_ISBROKEN) && strchr(fullref, '/')) {
+			warning("ignoring broken ref %s.", fullref);
+		}
 	}
 	free(last_branch);
 	return refs_found;
-- 
1.7.7.494.g7d7707

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
  2011-10-19 19:29                               ` Junio C Hamano
  2011-10-19 19:39                                 ` [PATCH] resolve_ref(): report breakage to the caller without warning Junio C Hamano
@ 2011-10-19 20:31                                 ` Michael Haggerty
  2011-10-19 20:39                                   ` Junio C Hamano
  1 sibling, 1 reply; 60+ messages in thread
From: Michael Haggerty @ 2011-10-19 20:31 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

On 10/19/2011 09:29 PM, Junio C Hamano wrote:
> In general, the "Hey that file that appears in our dwimmed ref namespace
> does not store correct [0-9a-f]{40} contents" warning message is a good
> thing to have. When we try the dwimmery and disambiguation, we however
> look at the potential refs and warn disambiguity only when two or more
> such files have good contents. E.g. if I do this:
> 
>     $ git rev-parse HEAD >.git/refs/heads/frotz
>     $ echo hello >.git/refs/tags/frotz
>     $ git show frotz
> 
> we have never paid attention to the broken tag and showed the 'frotz'
> branch without complaining. Once tags/frotz gets a real object name,
> however, we start giving ambiguity warnings.
> 
> Perhaps that is what we should be fixing instead.

This all sounds very sane.  dwim_ref() currently casts about wildly
looking for possible references, so it makes sense that it be selective
about what warnings it emits.  I also agree with the principle that this
warning is better emitted from higher-level code.

> Perhaps resolve_ref() should return in its *flag parameter that "a file
> exists there but incorrectly formatted", and dwim_ref() should notice and
> use that information to warn about ambiguity and also illformed-ness.

ISTM that such warnings should also be emitted when (for example) the
following commands encounter corrupt references: "git fsck", "git
pack-refs", "git gc", and "git push".  (Maybe these commands already
emit warnings; I haven't checked.)  These commands are not run so
frequently (so that the warnings would not be so annoying).  They are
also presumably not so promiscuous about where they look for refs and
therefore won't generate so many false alarms.

But it will be easy to add warnings to these commands using the
REF_ISBROKEN flag that you made public.

> A patch is attached at the end of this message to minimally fix what is in
> 'master' (without the jc/check-ref-format-fixup topic).  [...]

I would have preferred this change being applied on top of your first
patch in jc/check-ref-format-fixup, because moving these functions to
refs.c makes sense.  (Not to mention that my "REFNAME_FULL" patch series
is built on top of jc/check-ref-format-fixup.)

>  refs.c      |   22 +++++++++++-----------
>  refs.h      |    5 +++--
>  sha1_name.c |    5 ++++-
>  3 files changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index cab4394..0f58e46 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -329,12 +328,12 @@ static void get_ref_dir(const char *submodule, const char *base,
>  				flag = 0;
>  				if (resolve_gitlink_ref(submodule, ref, sha1) < 0) {
>  					hashclr(sha1);
> -					flag |= REF_BROKEN;
> +					flag |= REF_ISBROKEN;
>  				}

The renaming of this constant could have been done in a separate patch
to reduce noise like this in the main patch.

Otherwise the patch looks fine to me.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR
  2011-10-19 20:31                                 ` [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR Michael Haggerty
@ 2011-10-19 20:39                                   ` Junio C Hamano
  0 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2011-10-19 20:39 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: git, Jeff King, cmn, A Large Angry SCM, Daniel Barkalow,
	Sverre Rabbelier

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I would have preferred this change being applied on top of your first
> patch in jc/check-ref-format-fixup, because moving these functions to
> refs.c makes sense.

Thanks for a quick sanity check, and I agree.

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2011-10-19 20:39 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-15 21:10 [PATCH v3 00/22] Clean up refname checks and normalization Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 01/22] t1402: add some more tests Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 02/22] git check-ref-format: add options --allow-onelevel and --refspec-pattern Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 03/22] Change bad_ref_char() to return a boolean value Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 04/22] Change check_ref_format() to take a flags argument Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 05/22] Refactor check_refname_format() Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 06/22] Do not allow ".lock" at the end of any refname component Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 07/22] Make collapse_slashes() allocate memory for its result Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 08/22] Inline function refname_format_print() Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 09/22] Change check_refname_format() to reject unnormalized refnames Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 10/22] resolve_ref(): explicitly fail if a symlink is not readable Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 11/22] resolve_ref(): use prefixcmp() Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 12/22] resolve_ref(): only follow a symlink that contains a valid, normalized refname Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 13/22] resolve_ref(): turn buffer into a proper string as soon as possible Michael Haggerty
2011-09-23  8:17   ` Thomas Rast
2011-09-23 13:11     ` Michael Haggerty
2011-09-23 13:38       ` [PATCH 1/1] get_sha1_hex(): do not read past a NUL character Michael Haggerty
2011-09-23 18:59         ` Junio C Hamano
2011-10-05 19:11           ` Thomas Rast
2011-10-05 20:37             ` Junio C Hamano
2011-09-15 21:10 ` [PATCH v3 14/22] resolve_ref(): extract a function get_packed_ref() Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 15/22] resolve_ref(): do not follow incorrectly-formatted symbolic refs Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 16/22] remote: use xstrdup() instead of strdup() Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 17/22] remote: avoid passing NULL to read_ref() Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 18/22] resolve_ref(): verify that the input refname has the right format Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references Michael Haggerty
2011-10-11 16:16   ` Jeff King
2011-10-11 17:53     ` Junio C Hamano
2011-10-11 18:07       ` Junio C Hamano
2011-10-11 20:14         ` Re* " Junio C Hamano
2011-10-11 20:39           ` Jeff King
2011-10-11 21:31             ` Junio C Hamano
2011-10-11 22:54               ` Jeff King
2011-10-12 16:52                 ` Junio C Hamano
2011-10-11 23:07           ` Jeff King
2011-10-11 23:50             ` Junio C Hamano
2011-10-12  2:11               ` Jeff King
2011-10-12  4:41                 ` Junio C Hamano
2011-10-12  4:50                   ` Jeff King
2011-10-12 17:48                     ` [PATCH 1/2] refs.c: move dwim_ref()/dwim_log() from sha1_name.c Junio C Hamano
2011-10-12 17:49                     ` [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR Junio C Hamano
2011-10-12 18:01                       ` Michael Haggerty
2011-10-12 18:07                         ` Junio C Hamano
2011-10-12 21:42                           ` Michael Haggerty
2011-10-12 22:26                             ` Junio C Hamano
2011-10-19  5:28                             ` Junio C Hamano
2011-10-19  6:19                               ` Junio C Hamano
2011-10-19 15:18                                 ` Michael Haggerty
2011-10-19 17:10                                   ` Junio C Hamano
2011-10-19 19:29                               ` Junio C Hamano
2011-10-19 19:39                                 ` [PATCH] resolve_ref(): report breakage to the caller without warning Junio C Hamano
2011-10-19 20:31                                 ` [PATCH 2/2] Restrict ref-like names immediately below $GIT_DIR Michael Haggerty
2011-10-19 20:39                                   ` Junio C Hamano
2011-10-12 21:51                       ` Jeff King
2011-10-12  2:56               ` Re* [PATCH v3 19/22] resolve_ref(): emit warnings for improperly-formatted references Michael Haggerty
2011-10-12 19:20             ` Junio C Hamano
2011-10-12 19:26               ` Jeff King
2011-09-15 21:10 ` [PATCH v3 20/22] resolve_ref(): also treat a too-long SHA1 as invalid Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 21/22] resolve_ref(): expand documentation Michael Haggerty
2011-09-15 21:10 ` [PATCH v3 22/22] add_ref(): verify that the refname is formatted correctly Michael Haggerty

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.