All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] Improved infrastructure for refname normalization
@ 2011-09-09 11:46 Michael Haggerty
  2011-09-09 11:46 ` [PATCH 1/6] Change bad_ref_char() to return a boolean value Michael Haggerty
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-09 11:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, Michael Haggerty

As a prerequisite to storing references caches hierarchically (itself
needed for performance reasons), here is a patch series to help us get
refname normalization under control.

The problem is that some UI accepts unnormalized reference names (like
"/foo/bar" or "foo///bar" instead of "foo/bar") and passes them on to
library routines without normalizing them.  The library, on the other
hand, assumes that the refnames are normalized.  Sometimes (mostly in
the case of loose references) unnormalized refnames happen to work,
but in other cases (like packed references or when looking up refnames
in the cache) they silently fail.  Given that refnames are sometimes
treated as path names, there is a chance that some security-relevant
bugs are lurking in this area, if not in git proper then in scripts
that interact with git.

This patch series adds the following tools for dealing with refnames
and their normalization (without actually doing much to fix the
problem; see below):

* Fix check_ref_format() to make it easier and reliable to specify
  which types of refnames are allowed in a particular situation
  (multi-level vs. one-level, with vs. without a refspec-like "*",
  normalized or unnormalized).

* Add a function normalize_refname() that can check and optionally
  normalize refnames according to the one true set of rules.

* Add options to "git check-ref-format" to give scripts access to
  these facilities (and to allow them to be tested in the test suite).

* Forbid ".lock" at the end of any refname component, as directories
  with such names can conflict with attempts to create lock files for
  other refnames.


The patches just provide the tools for handling refnames more
consistently.  How do we actually fix the problem of inconsistent
handling of refname normalization?  First we need policy.  I suggest
the following:

Unnormalized refnames should only be accepted at the UI level and
should be normalized before use.  This can be done using code like

    char normalized_refname[strlen(arg) + 1];
    if (normalize_refname(normalized_refname, sizeof(normalized_refname),
                           arg, REFNAME_ALLOW_UNNORMALIZED|other_flags))
        die("invalid refname '%s'", arg);

    /* From now on, use normalized_refname. */

Refnames coming from other sources, such as from a remote repository,
should be checked that they are in the correct *normalized* format,
like so:

    if (check_ref_format(refname, other_flags))
        die("invalid refname '%s'", refname);

Refnames from the local repository (e.g., from the packed references
file) should also be checked that they are in the correct normalized
format, though this policy could be debated if there are performance
concerns.

Refnames probably do not need to be checked at the entrance to the
refs.{c,h} library functions because callers are responsible for not
passing invalid or unnormalized refnames in.  However, some assert()s
would probably be justified, especially during the transition while we
are fixing broken callers.

If there is agreement about this policy, I would be happy to write it
up (presumably in Documentation/technical/api-ref.txt, maybe also
incorporating the content from
Documentation/technical/api-ref-iteration.txt).

I do not yet have enough global overview of the code to know which
callers need fixing, but once these tools are in place the callers can
be fixed incrementally.

Michael Haggerty (6):
  Change bad_ref_char() to return a boolean value
  git check-ref-format: add options --onelevel-ok and --refname-pattern
  Change check_ref_format() to take a flags argument
  Add a library function normalize_refname()
  Do not allow ".lock" at the end of any refname component
  Add a REFNAME_ALLOW_UNNORMALIZED flag to check_ref_format()

 Documentation/git-check-ref-format.txt |   42 ++++++++--
 builtin/check-ref-format.c             |   62 ++++++++--------
 builtin/checkout.c                     |    2 +-
 builtin/fetch-pack.c                   |    2 +-
 builtin/receive-pack.c                 |    3 +-
 builtin/replace.c                      |    2 +-
 builtin/show-ref.c                     |    2 +-
 builtin/tag.c                          |    4 +-
 connect.c                              |    2 +-
 environment.c                          |    2 +-
 fast-import.c                          |    7 +--
 notes-merge.c                          |    5 +-
 pack-refs.c                            |    2 +-
 refs.c                                 |  133 +++++++++++++++++++-------------
 refs.h                                 |   30 ++++++-
 remote.c                               |   55 ++++---------
 sha1_name.c                            |    4 +-
 t/t1402-check-ref-format.sh            |   93 ++++++++++++++++++++--
 transport.c                            |   18 ++---
 walker.c                               |    2 +-
 20 files changed, 294 insertions(+), 178 deletions(-)

-- 
1.7.6.8.gd2879

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/6] Change bad_ref_char() to return a boolean value
  2011-09-09 11:46 [PATCH 0/6] Improved infrastructure for refname normalization Michael Haggerty
@ 2011-09-09 11:46 ` Michael Haggerty
  2011-09-09 11:46 ` [PATCH 2/6] git check-ref-format: add options --onelevel-ok and --refname-pattern Michael Haggerty
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-09 11:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, Michael Haggerty

Previously most bad characters were indicated by returning 1, but "*"
was special-cased to return 2 instead of 1.  One caller examined the
return value to see whether the special case occurred.

But it is easier (to document and understand) for bad_ref_char()
simply to return a boolean value, treating "*" like any other bad
character.  Special-case the handling of "*" (which only occurs in
very specific circumstances) at the caller.  The resulting calling
code thereby also becomes more transparent.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---

This is just a random refactoring.

 refs.c |   15 ++++++---------
 1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/refs.c b/refs.c
index a615043..fd29d89 100644
--- a/refs.c
+++ b/refs.c
@@ -860,22 +860,21 @@ int for_each_rawref(each_ref_fn fn, void *cb_data)
  * - it contains a "\" (backslash)
  */
 
+/* Return true iff ch is not allowed in reference names. */
 static inline int bad_ref_char(int ch)
 {
 	if (((unsigned) ch) <= ' ' || ch == 0x7f ||
 	    ch == '~' || ch == '^' || ch == ':' || ch == '\\')
 		return 1;
 	/* 2.13 Pattern Matching Notation */
-	if (ch == '?' || ch == '[') /* Unsupported */
+	if (ch == '*' || ch == '?' || ch == '[') /* Unsupported */
 		return 1;
-	if (ch == '*') /* Supported at the end */
-		return 2;
 	return 0;
 }
 
 int check_ref_format(const char *ref)
 {
-	int ch, level, bad_type, last;
+	int ch, level, last;
 	int ret = CHECK_REF_FORMAT_OK;
 	const char *cp = ref;
 
@@ -890,9 +889,8 @@ int check_ref_format(const char *ref)
 		/* we are at the beginning of the path component */
 		if (ch == '.')
 			return CHECK_REF_FORMAT_ERROR;
-		bad_type = bad_ref_char(ch);
-		if (bad_type) {
-			if (bad_type == 2 && (!*cp || *cp == '/') &&
+		if (bad_ref_char(ch)) {
+			if (ch == '*' && (!*cp || *cp == '/') &&
 			    ret == CHECK_REF_FORMAT_OK)
 				ret = CHECK_REF_FORMAT_WILDCARD;
 			else
@@ -902,8 +900,7 @@ int check_ref_format(const char *ref)
 		last = ch;
 		/* scan the rest of the path component */
 		while ((ch = *cp++) != 0) {
-			bad_type = bad_ref_char(ch);
-			if (bad_type)
+			if (bad_ref_char(ch))
 				return CHECK_REF_FORMAT_ERROR;
 			if (ch == '/')
 				break;
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/6] git check-ref-format: add options --onelevel-ok and --refname-pattern
  2011-09-09 11:46 [PATCH 0/6] Improved infrastructure for refname normalization Michael Haggerty
  2011-09-09 11:46 ` [PATCH 1/6] Change bad_ref_char() to return a boolean value Michael Haggerty
@ 2011-09-09 11:46 ` Michael Haggerty
  2011-09-09 11:46 ` [PATCH 3/6] Change check_ref_format() to take a flags argument Michael Haggerty
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-09 11:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, Michael Haggerty

Also add tests of the new options.  (Actually, one big reason to add
the new options is to make it easy to test check_ref_format(), though
the options should also be useful to other scripts.)

Interpret the result of check_ref_format() based on which types of
refnames are allowed.  However, because check_ref_format() can only
return a single value, one test case is still broken.  Specifically,
the case "git check-ref-format --onelevel '*'" incorrectly succeeds
because check_ref_format() returns CHECK_REF_FORMAT_ONELEVEL for this
refname even though the refname is also CHECK_REF_FORMAT_WILDCARD.
The type of check that leads to this failure is used elsewhere in
"real" code and could lead to bugs; it will be fixed over the next few
commits.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---

The failing test is expressed awkwardly in
t/t1402-check-ref-format.sh, but since it will be fixed in the next
commit it doesn't seem worth investing any work in it.

 Documentation/git-check-ref-format.txt |   29 +++++++++--
 builtin/check-ref-format.c             |   56 +++++++++++++++++---
 t/t1402-check-ref-format.sh            |   87 +++++++++++++++++++++++++++++---
 3 files changed, 151 insertions(+), 21 deletions(-)

diff --git a/Documentation/git-check-ref-format.txt b/Documentation/git-check-ref-format.txt
index c9fdf84..3ab22b9 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -8,8 +8,8 @@ git-check-ref-format - Ensures that a reference name is well formed
 SYNOPSIS
 --------
 [verse]
-'git check-ref-format' <refname>
-'git check-ref-format' --print <refname>
+'git check-ref-format' [--print]
+       [--[no-]allow-onelevel] [--refspec-pattern] <refname>
 'git check-ref-format' --branch <branchname-shorthand>
 
 DESCRIPTION
@@ -32,14 +32,18 @@ git imposes the following rules on how references are named:
 
 . They must contain at least one `/`. This enforces the presence of a
   category like `heads/`, `tags/` etc. but the actual names are not
-  restricted.
+  restricted.  If the `--allow-onelevel` option is used, this rule
+  is waived.
 
 . They cannot have two consecutive dots `..` anywhere.
 
 . They cannot have ASCII control characters (i.e. bytes whose
   values are lower than \040, or \177 `DEL`), space, tilde `~`,
-  caret `{caret}`, colon `:`, question-mark `?`, asterisk `*`,
-  or open bracket `[` anywhere.
+  caret `{caret}`, or colon `:` anywhere.
+
+. They cannot have question-mark `?`, asterisk `*`, or open bracket
+  `[` anywhere.  See the `--refspec-pattern` option below for an
+  exception to this rule.
 
 . They cannot end with a slash `/` nor a dot `.`.
 
@@ -78,6 +82,21 @@ were on.  This option should be used by porcelains to accept this
 syntax anywhere a branch name is expected, so they can act as if you
 typed the branch name.
 
+OPTIONS
+-------
+--allow-onelevel::
+--no-allow-onelevel::
+	Controls whether one-level refnames are accepted (i.e.,
+	refnames that do not contain multiple `/`-separated
+	components).  The default is `--no-allow-onelevel`.
+
+--refspec-pattern::
+	Interpret <refname> as a reference name pattern for a refspec
+	(as used with remote repositories).  If this option is
+	enabled, <refname> is allowed to contain a single `*` in place
+	of a one full pathname component (e.g., `foo/*/bar` but not
+	`foo/bar*`).
+
 EXAMPLES
 --------
 
diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 0723cf2..6bb9377 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -8,7 +8,7 @@
 #include "strbuf.h"
 
 static const char builtin_check_ref_format_usage[] =
-"git check-ref-format [--print] <refname>\n"
+"git check-ref-format [--print] [options] <refname>\n"
 "   or: git check-ref-format --branch <branchname-shorthand>";
 
 /*
@@ -45,27 +45,65 @@ static int check_ref_format_branch(const char *arg)
 	return 0;
 }
 
-static int check_ref_format_print(const char *arg)
+static void refname_format_print(const char *arg)
 {
 	char *refname = xmalloc(strlen(arg) + 1);
 
-	if (check_ref_format(arg))
-		return 1;
 	collapse_slashes(refname, arg);
 	printf("%s\n", refname);
-	return 0;
 }
 
+#define REFNAME_ALLOW_ONELEVEL 1
+#define REFNAME_REFSPEC_PATTERN 2
+
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 {
+	int i;
+	int print = 0;
+	int flags = 0;
+
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(builtin_check_ref_format_usage);
 
 	if (argc == 3 && !strcmp(argv[1], "--branch"))
 		return check_ref_format_branch(argv[2]);
-	if (argc == 3 && !strcmp(argv[1], "--print"))
-		return check_ref_format_print(argv[2]);
-	if (argc != 2)
+
+	for (i = 1; i < argc && argv[i][0] == '-'; ++i) {
+		if (!strcmp(argv[i], "--print"))
+			print = 1;
+		else if (!strcmp(argv[i], "--allow-onelevel"))
+			flags |= REFNAME_ALLOW_ONELEVEL;
+		else if (!strcmp(argv[i], "--no-allow-onelevel"))
+			flags &= ~REFNAME_ALLOW_ONELEVEL;
+		else if (!strcmp(argv[i], "--refspec-pattern"))
+			flags |= REFNAME_REFSPEC_PATTERN;
+		else
+			usage(builtin_check_ref_format_usage);
+	}
+	if (! (i == argc - 1))
 		usage(builtin_check_ref_format_usage);
-	return !!check_ref_format(argv[1]);
+
+	switch (check_ref_format(argv[i])) {
+	case CHECK_REF_FORMAT_OK:
+		break;
+	case CHECK_REF_FORMAT_ERROR:
+		return 1;
+	case CHECK_REF_FORMAT_ONELEVEL:
+		if (!(flags & REFNAME_ALLOW_ONELEVEL))
+			return 1;
+		else
+			break;
+	case CHECK_REF_FORMAT_WILDCARD:
+		if (!(flags & REFNAME_REFSPEC_PATTERN))
+			return 1;
+		else
+			break;
+	default:
+		die("internal error: unexpected value from check_ref_format()");
+	}
+
+	if (print)
+		refname_format_print(argv[i]);
+
+	return 0;
 }
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index ed4275a..95dcd31 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -5,23 +5,35 @@ test_description='Test git check-ref-format'
 . ./test-lib.sh
 
 valid_ref() {
-	test_expect_success "ref name '$1' is valid" \
-		"git check-ref-format '$1'"
+	if test "$#" = 1
+	then
+		test_expect_success "ref name '$1' is valid" \
+			"git check-ref-format '$1'"
+	else
+		test_expect_success "ref name '$1' is valid with options $2" \
+			"git check-ref-format $2 '$1'"
+	fi
 }
 invalid_ref() {
-	test_expect_success "ref name '$1' is not valid" \
-		"test_must_fail git check-ref-format '$1'"
+	if test "$#" = 1
+	then
+		test_expect_success "ref name '$1' is invalid" \
+			"test_must_fail git check-ref-format '$1'"
+	else
+		test_expect_success "ref name '$1' is invalid with options $2" \
+			"test_must_fail git check-ref-format $2 '$1'"
+	fi
 }
 
-valid_ref 'heads/foo'
-invalid_ref 'foo'
 valid_ref 'foo/bar/baz'
 valid_ref 'refs///heads/foo'
 invalid_ref 'heads/foo/'
 valid_ref '/heads/foo'
 valid_ref '///heads/foo'
-invalid_ref '/foo'
 invalid_ref './foo'
+invalid_ref './foo/bar'
+invalid_ref 'foo/./bar'
+invalid_ref 'foo/bar/.'
 invalid_ref '.refs/foo'
 invalid_ref 'heads/foo..bar'
 invalid_ref 'heads/foo?bar'
@@ -33,6 +45,67 @@ invalid_ref 'heads/foo\bar'
 invalid_ref "$(printf 'heads/foo\t')"
 invalid_ref "$(printf 'heads/foo\177')"
 valid_ref "$(printf 'heads/fu\303\237')"
+invalid_ref 'heads/*foo/bar' --refspec-pattern
+invalid_ref 'heads/foo*/bar' --refspec-pattern
+invalid_ref 'heads/f*o/bar' --refspec-pattern
+
+ref='foo'
+invalid_ref "$ref"
+valid_ref "$ref" --allow-onelevel
+invalid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='foo/bar'
+valid_ref "$ref"
+valid_ref "$ref" --allow-onelevel
+valid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='foo/*'
+invalid_ref "$ref"
+invalid_ref "$ref" --allow-onelevel
+valid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='*/foo'
+invalid_ref "$ref"
+invalid_ref "$ref" --allow-onelevel
+valid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='foo/*/bar'
+invalid_ref "$ref"
+invalid_ref "$ref" --allow-onelevel
+valid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='*'
+invalid_ref "$ref"
+
+#invalid_ref "$ref" --allow-onelevel
+test_expect_failure "ref name '$ref' is invalid with options --allow-onelevel" \
+	"test_must_fail git check-ref-format --allow-onelevel '$ref'"
+
+invalid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='foo/*/*'
+invalid_ref "$ref" --refspec-pattern
+invalid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='*/foo/*'
+invalid_ref "$ref" --refspec-pattern
+invalid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='*/*/foo'
+invalid_ref "$ref" --refspec-pattern
+invalid_ref "$ref" '--refspec-pattern --allow-onelevel'
+
+ref='/foo'
+invalid_ref "$ref"
+valid_ref "$ref" --allow-onelevel
+invalid_ref "$ref" --refspec-pattern
+valid_ref "$ref" '--refspec-pattern --allow-onelevel'
 
 test_expect_success "check-ref-format --branch @{-1}" '
 	T=$(git write-tree) &&
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/6] Change check_ref_format() to take a flags argument
  2011-09-09 11:46 [PATCH 0/6] Improved infrastructure for refname normalization Michael Haggerty
  2011-09-09 11:46 ` [PATCH 1/6] Change bad_ref_char() to return a boolean value Michael Haggerty
  2011-09-09 11:46 ` [PATCH 2/6] git check-ref-format: add options --onelevel-ok and --refname-pattern Michael Haggerty
@ 2011-09-09 11:46 ` Michael Haggerty
  2011-09-09 11:46 ` [PATCH 4/6] Add a library function normalize_refname() Michael Haggerty
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-09 11:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, Michael Haggerty

Change check_ref_format() to take a flags argument that indicates what
is acceptable in the reference name (analogous to --allow-onelevel and
--refspec-pattern).  This is more convenient for callers and also
fixes a failure in the test suite (and likely elsewhere in the code)
by enabling "onelevel" and "refspec-pattern" to be allowed
independently of each other.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/check-ref-format.c  |   21 +----------------
 builtin/checkout.c          |    2 +-
 builtin/fetch-pack.c        |    2 +-
 builtin/receive-pack.c      |    2 +-
 builtin/replace.c           |    2 +-
 builtin/show-ref.c          |    2 +-
 builtin/tag.c               |    4 +-
 connect.c                   |    2 +-
 environment.c               |    2 +-
 fast-import.c               |    7 +-----
 notes-merge.c               |    5 ++-
 pack-refs.c                 |    2 +-
 refs.c                      |   42 +++++++++++++++------------------
 refs.h                      |   17 +++++++++----
 remote.c                    |   53 +++++++++++-------------------------------
 sha1_name.c                 |    4 +-
 t/t1402-check-ref-format.sh |    6 +----
 transport.c                 |   15 ++---------
 walker.c                    |    2 +-
 19 files changed, 67 insertions(+), 125 deletions(-)

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 6bb9377..c639400 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -53,9 +53,6 @@ static void refname_format_print(const char *arg)
 	printf("%s\n", refname);
 }
 
-#define REFNAME_ALLOW_ONELEVEL 1
-#define REFNAME_REFSPEC_PATTERN 2
-
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 {
 	int i;
@@ -83,24 +80,8 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 	if (! (i == argc - 1))
 		usage(builtin_check_ref_format_usage);
 
-	switch (check_ref_format(argv[i])) {
-	case CHECK_REF_FORMAT_OK:
-		break;
-	case CHECK_REF_FORMAT_ERROR:
+	if (check_ref_format(argv[i], flags))
 		return 1;
-	case CHECK_REF_FORMAT_ONELEVEL:
-		if (!(flags & REFNAME_ALLOW_ONELEVEL))
-			return 1;
-		else
-			break;
-	case CHECK_REF_FORMAT_WILDCARD:
-		if (!(flags & REFNAME_REFSPEC_PATTERN))
-			return 1;
-		else
-			break;
-	default:
-		die("internal error: unexpected value from check_ref_format()");
-	}
 
 	if (print)
 		refname_format_print(argv[i]);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 3bb6525..2882116 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -882,7 +882,7 @@ static int parse_branchname_arg(int argc, const char **argv,
 	new->name = arg;
 	setup_branch_path(new);
 
-	if (check_ref_format(new->path) == CHECK_REF_FORMAT_OK &&
+	if (!check_ref_format(new->path, 0) &&
 	    resolve_ref(new->path, branch_rev, 1, NULL))
 		hashcpy(rev, branch_rev);
 	else
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 412bd32..411aa7d 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -544,7 +544,7 @@ static void filter_refs(struct ref **refs, int nr_match, char **match)
 	for (ref = *refs; ref; ref = next) {
 		next = ref->next;
 		if (!memcmp(ref->name, "refs/", 5) &&
-		    check_ref_format(ref->name + 5))
+		    check_ref_format(ref->name + 5, 0))
 			; /* trash */
 		else if (args.fetch_all &&
 			 (!args.depth || prefixcmp(ref->name, "refs/tags/") )) {
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index ae164da..4e880ef 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -356,7 +356,7 @@ static const char *update(struct command *cmd)
 	struct ref_lock *lock;
 
 	/* only refs/... are allowed */
-	if (prefixcmp(name, "refs/") || check_ref_format(name + 5)) {
+	if (prefixcmp(name, "refs/") || check_ref_format(name + 5, 0)) {
 		rp_error("refusing to create funny ref '%s' remotely", name);
 		return "funny refname";
 	}
diff --git a/builtin/replace.c b/builtin/replace.c
index fe3a647..15f0e5e 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -94,7 +94,7 @@ static int replace_object(const char *object_ref, const char *replace_ref,
 		     "refs/replace/%s",
 		     sha1_to_hex(object)) > sizeof(ref) - 1)
 		die("replace ref name too long: %.*s...", 50, ref);
-	if (check_ref_format(ref))
+	if (check_ref_format(ref, 0))
 		die("'%s' is not a valid ref name.", ref);
 
 	if (!resolve_ref(ref, prev, 1, NULL))
diff --git a/builtin/show-ref.c b/builtin/show-ref.c
index 45f0340..375a14b 100644
--- a/builtin/show-ref.c
+++ b/builtin/show-ref.c
@@ -145,7 +145,7 @@ static int exclude_existing(const char *match)
 			if (strncmp(ref, match, matchlen))
 				continue;
 		}
-		if (check_ref_format(ref)) {
+		if (check_ref_format(ref, 0)) {
 			warning("ref '%s' ignored", ref);
 			continue;
 		}
diff --git a/builtin/tag.c b/builtin/tag.c
index 667515e..7aceaab 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -407,12 +407,12 @@ static int parse_msg_arg(const struct option *opt, const char *arg, int unset)
 static int strbuf_check_tag_ref(struct strbuf *sb, const char *name)
 {
 	if (name[0] == '-')
-		return CHECK_REF_FORMAT_ERROR;
+		return -1;
 
 	strbuf_reset(sb);
 	strbuf_addf(sb, "refs/tags/%s", name);
 
-	return check_ref_format(sb->buf);
+	return check_ref_format(sb->buf, 0);
 }
 
 int cmd_tag(int argc, const char **argv, const char *prefix)
diff --git a/connect.c b/connect.c
index ee1d4b4..292a9e2 100644
--- a/connect.c
+++ b/connect.c
@@ -22,7 +22,7 @@ static int check_ref(const char *name, int len, unsigned int flags)
 	len -= 5;
 
 	/* REF_NORMAL means that we don't want the magic fake tag refs */
-	if ((flags & REF_NORMAL) && check_ref_format(name) < 0)
+	if ((flags & REF_NORMAL) && check_ref_format(name, 0))
 		return 0;
 
 	/* REF_HEADS means that we want regular branch heads */
diff --git a/environment.c b/environment.c
index e96edcf..8acbb87 100644
--- a/environment.c
+++ b/environment.c
@@ -106,7 +106,7 @@ static char *expand_namespace(const char *raw_namespace)
 		if (strcmp((*c)->buf, "/") != 0)
 			strbuf_addf(&buf, "refs/namespaces/%s", (*c)->buf);
 	strbuf_list_free(components);
-	if (check_ref_format(buf.buf) != CHECK_REF_FORMAT_OK)
+	if (check_ref_format(buf.buf, 0))
 		die("bad git namespace path \"%s\"", raw_namespace);
 	strbuf_addch(&buf, '/');
 	return strbuf_detach(&buf, NULL);
diff --git a/fast-import.c b/fast-import.c
index 742e7da..4d55ee6 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -722,13 +722,8 @@ static struct branch *new_branch(const char *name)
 
 	if (b)
 		die("Invalid attempt to create duplicate branch: %s", name);
-	switch (check_ref_format(name)) {
-	case 0: break; /* its valid */
-	case CHECK_REF_FORMAT_ONELEVEL:
-		break; /* valid, but too few '/', allow anyway */
-	default:
+	if (check_ref_format(name, REFNAME_ALLOW_ONELEVEL))
 		die("Branch name doesn't conform to GIT standards: %s", name);
-	}
 
 	b = pool_calloc(1, sizeof(struct branch));
 	b->name = pool_strdup(name);
diff --git a/notes-merge.c b/notes-merge.c
index e1aaf43..bb8d7c8 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -570,7 +570,8 @@ int notes_merge(struct notes_merge_options *o,
 	/* Dereference o->local_ref into local_sha1 */
 	if (!resolve_ref(o->local_ref, local_sha1, 0, NULL))
 		die("Failed to resolve local notes ref '%s'", o->local_ref);
-	else if (!check_ref_format(o->local_ref) && is_null_sha1(local_sha1))
+	else if (!check_ref_format(o->local_ref, 0) &&
+		is_null_sha1(local_sha1))
 		local = NULL; /* local_sha1 == null_sha1 indicates unborn ref */
 	else if (!(local = lookup_commit_reference(local_sha1)))
 		die("Could not parse local commit %s (%s)",
@@ -583,7 +584,7 @@ int notes_merge(struct notes_merge_options *o,
 		 * Failed to get remote_sha1. If o->remote_ref looks like an
 		 * unborn ref, perform the merge using an empty notes tree.
 		 */
-		if (!check_ref_format(o->remote_ref)) {
+		if (!check_ref_format(o->remote_ref, 0)) {
 			hashclr(remote_sha1);
 			remote = NULL;
 		} else {
diff --git a/pack-refs.c b/pack-refs.c
index 1290570..bc0032d 100644
--- a/pack-refs.c
+++ b/pack-refs.c
@@ -72,7 +72,7 @@ static void try_remove_empty_parents(char *name)
 	for (i = 0; i < 2; i++) { /* refs/{heads,tags,...}/ */
 		while (*p && *p != '/')
 			p++;
-		/* tolerate duplicate slashes; see check_ref_format() */
+		/* tolerate duplicate slashes; see normalize_refname() */
 		while (*p == '/')
 			p++;
 	}
diff --git a/refs.c b/refs.c
index fd29d89..a206a4c 100644
--- a/refs.c
+++ b/refs.c
@@ -872,10 +872,9 @@ static inline int bad_ref_char(int ch)
 	return 0;
 }
 
-int check_ref_format(const char *ref)
+int check_ref_format(const char *ref, int flags)
 {
 	int ch, level, last;
-	int ret = CHECK_REF_FORMAT_OK;
 	const char *cp = ref;
 
 	level = 0;
@@ -884,41 +883,42 @@ int check_ref_format(const char *ref)
 			; /* tolerate duplicated slashes */
 		if (!ch)
 			/* should not end with slashes */
-			return CHECK_REF_FORMAT_ERROR;
+			return -1;
 
 		/* we are at the beginning of the path component */
 		if (ch == '.')
-			return CHECK_REF_FORMAT_ERROR;
+			return -1;
 		if (bad_ref_char(ch)) {
-			if (ch == '*' && (!*cp || *cp == '/') &&
-			    ret == CHECK_REF_FORMAT_OK)
-				ret = CHECK_REF_FORMAT_WILDCARD;
+			if ((flags & REFNAME_REFSPEC_PATTERN) && ch == '*' &&
+				(!*cp || *cp == '/'))
+				/* Accept one wildcard as a full refname component. */
+				flags &= ~REFNAME_REFSPEC_PATTERN;
 			else
-				return CHECK_REF_FORMAT_ERROR;
+				return -1;
 		}
 
 		last = ch;
 		/* scan the rest of the path component */
 		while ((ch = *cp++) != 0) {
 			if (bad_ref_char(ch))
-				return CHECK_REF_FORMAT_ERROR;
+				return -1;
 			if (ch == '/')
 				break;
 			if (last == '.' && ch == '.')
-				return CHECK_REF_FORMAT_ERROR;
+				return -1;
 			if (last == '@' && ch == '{')
-				return CHECK_REF_FORMAT_ERROR;
+				return -1;
 			last = ch;
 		}
 		level++;
 		if (!ch) {
 			if (ref <= cp - 2 && cp[-2] == '.')
-				return CHECK_REF_FORMAT_ERROR;
-			if (level < 2)
-				return CHECK_REF_FORMAT_ONELEVEL;
+				return -1;
+			if (level < 2 && !(flags & REFNAME_ALLOW_ONELEVEL))
+				return -1;
 			if (has_extension(ref, ".lock"))
-				return CHECK_REF_FORMAT_ERROR;
-			return ret;
+				return -1;
+			return 0;
 		}
 	}
 }
@@ -1103,7 +1103,7 @@ static struct ref_lock *lock_ref_sha1_basic(const char *ref, const unsigned char
 struct ref_lock *lock_ref_sha1(const char *ref, const unsigned char *old_sha1)
 {
 	char refpath[PATH_MAX];
-	if (check_ref_format(ref))
+	if (check_ref_format(ref, 0))
 		return NULL;
 	strcpy(refpath, mkpath("refs/%s", ref));
 	return lock_ref_sha1_basic(refpath, old_sha1, 0, NULL);
@@ -1111,13 +1111,9 @@ struct ref_lock *lock_ref_sha1(const char *ref, const unsigned char *old_sha1)
 
 struct ref_lock *lock_any_ref_for_update(const char *ref, const unsigned char *old_sha1, int flags)
 {
-	switch (check_ref_format(ref)) {
-	default:
+	if (check_ref_format(ref, REFNAME_ALLOW_ONELEVEL))
 		return NULL;
-	case 0:
-	case CHECK_REF_FORMAT_ONELEVEL:
-		return lock_ref_sha1_basic(ref, old_sha1, flags, NULL);
-	}
+	return lock_ref_sha1_basic(ref, old_sha1, flags, NULL);
 }
 
 static struct lock_file packlock;
diff --git a/refs.h b/refs.h
index dfb086e..b248ce6 100644
--- a/refs.h
+++ b/refs.h
@@ -97,11 +97,18 @@ int for_each_recent_reflog_ent(const char *ref, each_reflog_ent_fn fn, long, voi
  */
 extern int for_each_reflog(each_ref_fn, void *);
 
-#define CHECK_REF_FORMAT_OK 0
-#define CHECK_REF_FORMAT_ERROR (-1)
-#define CHECK_REF_FORMAT_ONELEVEL (-2)
-#define CHECK_REF_FORMAT_WILDCARD (-3)
-extern int check_ref_format(const char *target);
+#define REFNAME_ALLOW_ONELEVEL 1
+#define REFNAME_REFSPEC_PATTERN 2
+
+/*
+ * Return 0 iff ref has the correct format for a refname according to
+ * the rules described in Documentation/git-check-ref-format.txt.  If
+ * REFNAME_ALLOW_ONELEVEL is set in flags, then accept one-level
+ * reference names.  If REFNAME_REFSPEC_PATTERN is set in flags, then
+ * allow a "*" wildcard character in place of one of the name
+ * components.
+ */
+extern int check_ref_format(const char *target, int flags);
 
 extern const char *prettify_refname(const char *refname);
 extern char *shorten_unambiguous_ref(const char *ref, int strict);
diff --git a/remote.c b/remote.c
index b8ecfa5..7059885 100644
--- a/remote.c
+++ b/remote.c
@@ -493,23 +493,6 @@ static void read_config(void)
 }
 
 /*
- * We need to make sure the remote-tracking branches are well formed, but a
- * wildcard refspec in "struct refspec" must have a trailing slash. We
- * temporarily drop the trailing '/' while calling check_ref_format(),
- * and put it back.  The caller knows that a CHECK_REF_FORMAT_ONELEVEL
- * error return is Ok for a wildcard refspec.
- */
-static int verify_refname(char *name, int is_glob)
-{
-	int result;
-
-	result = check_ref_format(name);
-	if (is_glob && result == CHECK_REF_FORMAT_WILDCARD)
-		result = CHECK_REF_FORMAT_OK;
-	return result;
-}
-
-/*
  * This function frees a refspec array.
  * Warning: code paths should be checked to ensure that the src
  *          and dst pointers are always freeable pointers as well
@@ -532,13 +515,13 @@ static void free_refspecs(struct refspec *refspec, int nr_refspec)
 static struct refspec *parse_refspec_internal(int nr_refspec, const char **refspec, int fetch, int verify)
 {
 	int i;
-	int st;
 	struct refspec *rs = xcalloc(sizeof(*rs), nr_refspec);
 
 	for (i = 0; i < nr_refspec; i++) {
 		size_t llen;
 		int is_glob;
 		const char *lhs, *rhs;
+		int flags;
 
 		is_glob = 0;
 
@@ -576,6 +559,7 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 
 		rs[i].pattern = is_glob;
 		rs[i].src = xstrndup(lhs, llen);
+		flags = REFNAME_ALLOW_ONELEVEL | (is_glob ? REFNAME_REFSPEC_PATTERN : 0);
 
 		if (fetch) {
 			/*
@@ -585,26 +569,20 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 			 */
 			if (!*rs[i].src)
 				; /* empty is ok */
-			else {
-				st = verify_refname(rs[i].src, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
-					goto invalid;
-			}
+			else if (check_ref_format(rs[i].src, flags))
+				goto invalid;
 			/*
 			 * RHS
 			 * - missing is ok, and is same as empty.
 			 * - empty is ok; it means not to store.
 			 * - otherwise it must be a valid looking ref.
 			 */
-			if (!rs[i].dst) {
+			if (!rs[i].dst)
 				; /* ok */
-			} else if (!*rs[i].dst) {
+			else if (!*rs[i].dst)
 				; /* ok */
-			} else {
-				st = verify_refname(rs[i].dst, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
-					goto invalid;
-			}
+			else if (check_ref_format(rs[i].dst, flags))
+				goto invalid;
 		} else {
 			/*
 			 * LHS
@@ -616,8 +594,7 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 			if (!*rs[i].src)
 				; /* empty is ok */
 			else if (is_glob) {
-				st = verify_refname(rs[i].src, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
+				if (check_ref_format(rs[i].src, flags))
 					goto invalid;
 			}
 			else
@@ -630,14 +607,12 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 			 * - otherwise it must be a valid looking ref.
 			 */
 			if (!rs[i].dst) {
-				st = verify_refname(rs[i].src, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
+				if (check_ref_format(rs[i].src, flags))
 					goto invalid;
 			} else if (!*rs[i].dst) {
 				goto invalid;
 			} else {
-				st = verify_refname(rs[i].dst, is_glob);
-				if (st && st != CHECK_REF_FORMAT_ONELEVEL)
+				if (check_ref_format(rs[i].dst, flags))
 					goto invalid;
 			}
 		}
@@ -1427,8 +1402,8 @@ int get_fetch_map(const struct ref *remote_refs,
 
 	for (rmp = &ref_map; *rmp; ) {
 		if ((*rmp)->peer_ref) {
-			int st = check_ref_format((*rmp)->peer_ref->name + 5);
-			if (st && st != CHECK_REF_FORMAT_ONELEVEL) {
+			if (check_ref_format((*rmp)->peer_ref->name + 5,
+				REFNAME_ALLOW_ONELEVEL)) {
 				struct ref *ignore = *rmp;
 				error("* Ignoring funny ref '%s' locally",
 				      (*rmp)->peer_ref->name);
@@ -1620,7 +1595,7 @@ static int one_local_ref(const char *refname, const unsigned char *sha1, int fla
 	int len;
 
 	/* we already know it starts with refs/ to get here */
-	if (check_ref_format(refname + 5))
+	if (check_ref_format(refname + 5, 0))
 		return 0;
 
 	len = strlen(refname) + 1;
diff --git a/sha1_name.c b/sha1_name.c
index ff5992a..975ec3b 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -972,9 +972,9 @@ int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
 {
 	strbuf_branchname(sb, name);
 	if (name[0] == '-')
-		return CHECK_REF_FORMAT_ERROR;
+		return -1;
 	strbuf_splice(sb, 0, 0, "refs/heads/", 11);
-	return check_ref_format(sb->buf);
+	return check_ref_format(sb->buf, 0);
 }
 
 /*
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index 95dcd31..6ac5025 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -81,11 +81,7 @@ valid_ref "$ref" '--refspec-pattern --allow-onelevel'
 
 ref='*'
 invalid_ref "$ref"
-
-#invalid_ref "$ref" --allow-onelevel
-test_expect_failure "ref name '$ref' is invalid with options --allow-onelevel" \
-	"test_must_fail git check-ref-format --allow-onelevel '$ref'"
-
+invalid_ref "$ref" --allow-onelevel
 invalid_ref "$ref" --refspec-pattern
 valid_ref "$ref" '--refspec-pattern --allow-onelevel'
 
diff --git a/transport.c b/transport.c
index fa279d5..225d9b8 100644
--- a/transport.c
+++ b/transport.c
@@ -754,18 +754,9 @@ void transport_verify_remote_names(int nr_heads, const char **heads)
 			continue;
 
 		remote = remote ? (remote + 1) : local;
-		switch (check_ref_format(remote)) {
-		case 0: /* ok */
-		case CHECK_REF_FORMAT_ONELEVEL:
-			/* ok but a single level -- that is fine for
-			 * a match pattern.
-			 */
-		case CHECK_REF_FORMAT_WILDCARD:
-			/* ok but ends with a pattern-match character */
-			continue;
-		}
-		die("remote part of refspec is not a valid name in %s",
-		    heads[i]);
+		if (check_ref_format(remote, REFNAME_ALLOW_ONELEVEL|REFNAME_REFSPEC_PATTERN))
+			die("remote part of refspec is not a valid name in %s",
+				heads[i]);
 	}
 }
 
diff --git a/walker.c b/walker.c
index dce7128..e5d8eb2 100644
--- a/walker.c
+++ b/walker.c
@@ -190,7 +190,7 @@ static int interpret_target(struct walker *walker, char *target, unsigned char *
 {
 	if (!get_sha1_hex(target, sha1))
 		return 0;
-	if (!check_ref_format(target)) {
+	if (!check_ref_format(target, 0)) {
 		struct ref *ref = alloc_ref(target);
 		if (!walker->fetch_ref(walker, ref)) {
 			hashcpy(sha1, ref->old_sha1);
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/6] Add a library function normalize_refname()
  2011-09-09 11:46 [PATCH 0/6] Improved infrastructure for refname normalization Michael Haggerty
                   ` (2 preceding siblings ...)
  2011-09-09 11:46 ` [PATCH 3/6] Change check_ref_format() to take a flags argument Michael Haggerty
@ 2011-09-09 11:46 ` Michael Haggerty
  2011-09-09 11:46 ` [PATCH 5/6] Do not allow ".lock" at the end of any refname component Michael Haggerty
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-09 11:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, Michael Haggerty

Implement check_ref_format() using the new function.  Also use it to
replace collapse_slashes() in check-ref-format.c.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/check-ref-format.c  |   41 ++++-------------
 refs.c                      |  109 +++++++++++++++++++++++++++----------------
 refs.h                      |   24 +++++++---
 t/t1402-check-ref-format.sh |    3 +
 4 files changed, 98 insertions(+), 79 deletions(-)

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index c639400..4c202af 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -11,28 +11,6 @@ static const char builtin_check_ref_format_usage[] =
 "git check-ref-format [--print] [options] <refname>\n"
 "   or: git check-ref-format --branch <branchname-shorthand>";
 
-/*
- * Remove leading slashes and replace each run of adjacent slashes in
- * src with a single slash, and write the result to dst.
- *
- * This function is similar to normalize_path_copy(), but stripped down
- * to meet check_ref_format's simpler needs.
- */
-static void collapse_slashes(char *dst, const char *src)
-{
-	char ch;
-	char prev = '/';
-
-	while ((ch = *src++) != '\0') {
-		if (prev == '/' && ch == prev)
-			continue;
-
-		*dst++ = ch;
-		prev = ch;
-	}
-	*dst = '\0';
-}
-
 static int check_ref_format_branch(const char *arg)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -45,12 +23,15 @@ static int check_ref_format_branch(const char *arg)
 	return 0;
 }
 
-static void refname_format_print(const char *arg)
+static int check_ref_format_print(const char *arg, int flags)
 {
-	char *refname = xmalloc(strlen(arg) + 1);
+	int refnamelen = strlen(arg) + 1;
+	char *refname = xmalloc(refnamelen);
 
-	collapse_slashes(refname, arg);
+	if (normalize_refname(refname, refnamelen, arg, flags))
+		return 1;
 	printf("%s\n", refname);
+	return 0;
 }
 
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
@@ -79,12 +60,8 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 	}
 	if (! (i == argc - 1))
 		usage(builtin_check_ref_format_usage);
-
-	if (check_ref_format(argv[i], flags))
-		return 1;
-
 	if (print)
-		refname_format_print(argv[i]);
-
-	return 0;
+		return check_ref_format_print(argv[i], flags);
+	else
+		return !!check_ref_format(argv[i], flags);
 }
diff --git a/refs.c b/refs.c
index a206a4c..372350e 100644
--- a/refs.c
+++ b/refs.c
@@ -872,55 +872,84 @@ static inline int bad_ref_char(int ch)
 	return 0;
 }
 
-int check_ref_format(const char *ref, int flags)
+int normalize_refname(char *dst, int dstlen, const char *ref, int flags)
 {
-	int ch, level, last;
-	const char *cp = ref;
-
-	level = 0;
-	while (1) {
-		while ((ch = *cp++) == '/')
-			; /* tolerate duplicated slashes */
-		if (!ch)
-			/* should not end with slashes */
-			return -1;
+	int ch, last, component_len, component_count = 0;
+	const char *cp = ref, *component;
 
-		/* we are at the beginning of the path component */
-		if (ch == '.')
-			return -1;
-		if (bad_ref_char(ch)) {
-			if ((flags & REFNAME_REFSPEC_PATTERN) && ch == '*' &&
-				(!*cp || *cp == '/'))
-				/* Accept one wildcard as a full refname component. */
-				flags &= ~REFNAME_REFSPEC_PATTERN;
-			else
-				return -1;
-		}
+	ch = *cp;
+	do {
+		while (ch == '/')
+			ch = *++cp; /* tolerate leading and repeated slashes */
 
-		last = ch;
-		/* scan the rest of the path component */
-		while ((ch = *cp++) != 0) {
-			if (bad_ref_char(ch))
-				return -1;
-			if (ch == '/')
-				break;
+		/*
+		 * We are at the start of a path component.  Record
+		 * its start for later reference.  If we are copying
+		 * to dst, use the copy there, because we might be
+		 * overwriting ref; otherwise, use the copy from the
+		 * input string.
+		 */
+		component = dst ? dst : cp;
+		component_len = 0;
+		last = '\0';
+		while (1) {
+			if (ch != 0 && bad_ref_char(ch)) {
+				if ((flags & REFNAME_REFSPEC_PATTERN) &&
+					ch == '*' &&
+					component_len == 0 &&
+					(cp[1] == 0 || cp[1] == '/')) {
+					/* Accept one wildcard as a full refname component. */
+					flags &= ~REFNAME_REFSPEC_PATTERN;
+				} else {
+					/* Illegal character in refname */
+					return -1;
+				}
+			}
 			if (last == '.' && ch == '.')
+				/* Refname must not contain "..". */
 				return -1;
 			if (last == '@' && ch == '{')
+				/* Refname must not contain "@{". */
 				return -1;
+			if (dst) {
+				if (dstlen-- <= 0)
+					/* Output array was too small. */
+					return -1;
+				*dst++ = ch;
+			}
+			if (ch == 0 || ch == '/')
+				break;
+			++component_len;
 			last = ch;
+			ch = *++cp;
 		}
-		level++;
-		if (!ch) {
-			if (ref <= cp - 2 && cp[-2] == '.')
-				return -1;
-			if (level < 2 && !(flags & REFNAME_ALLOW_ONELEVEL))
-				return -1;
-			if (has_extension(ref, ".lock"))
-				return -1;
-			return 0;
-		}
-	}
+
+		/* We are at the end of a path component. */
+		++component_count;
+		if (component_len == 0)
+			/* Either ref was zero length or it ended with slash. */
+			return -1;
+
+		if (component[0] == '.')
+			/* Components must not start with '.'. */
+			return -1;
+	} while (ch != 0);
+
+	if (last == '.')
+		/* Refname must not end with '.'. */
+		return -1;
+	if (component_len >= 5 && !memcmp(&component[component_len - 5], ".lock", 5))
+		/* Refname must not end with ".lock". */
+		return -1;
+	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
+		/* Refname must have at least two components. */
+		return -1;
+	return 0;
+}
+
+int check_ref_format(const char *ref, int flags)
+{
+	return normalize_refname(NULL, 0, ref, flags);
 }
 
 const char *prettify_refname(const char *name)
diff --git a/refs.h b/refs.h
index b248ce6..8a15f83 100644
--- a/refs.h
+++ b/refs.h
@@ -101,14 +101,24 @@ extern int for_each_reflog(each_ref_fn, void *);
 #define REFNAME_REFSPEC_PATTERN 2
 
 /*
- * Return 0 iff ref has the correct format for a refname according to
- * the rules described in Documentation/git-check-ref-format.txt.  If
- * REFNAME_ALLOW_ONELEVEL is set in flags, then accept one-level
- * reference names.  If REFNAME_REFSPEC_PATTERN is set in flags, then
- * allow a "*" wildcard character in place of one of the name
- * components.
+ * Check that ref is a valid refname according to the rules described
+ * in Documentation/git-check-ref-format.txt and normalize it by
+ * stripping out superfluous "/" characters.  If dst != NULL, write
+ * the normalized refname to dst, which must be an allocated character
+ * array with length dstlen (typically at least as long as ref).  dst
+ * may point at the same memory as ref.  Return 0 iff the refname was
+ * OK and fit into dst.  If REFNAME_ALLOW_ONELEVEL is set in flags,
+ * then accept one-level reference names.  If REFNAME_REFSPEC_PATTERN
+ * is set in flags, then allow a "*" wildcard characters in place of
+ * one of the name components.
  */
-extern int check_ref_format(const char *target, int flags);
+extern int normalize_refname(char *dst, int dstlen, const char *ref, int flags);
+
+/*
+ * Return 0 iff ref has the correct format for a refname.  See
+ * normalize_refname() for details.
+ */
+extern int check_ref_format(const char *ref, int flags);
 
 extern const char *prettify_refname(const char *refname);
 extern char *shorten_unambiguous_ref(const char *ref, int strict);
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index 6ac5025..20a7782 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -39,6 +39,7 @@ invalid_ref 'heads/foo..bar'
 invalid_ref 'heads/foo?bar'
 valid_ref 'foo./bar'
 invalid_ref 'heads/foo.lock'
+invalid_ref 'heads///foo.lock'
 valid_ref 'heads/foo@bar'
 invalid_ref 'heads/v@{ation'
 invalid_ref 'heads/foo\bar'
@@ -152,5 +153,7 @@ invalid_ref_normalized '/foo'
 invalid_ref_normalized 'heads/foo/../bar'
 invalid_ref_normalized 'heads/./foo'
 invalid_ref_normalized 'heads\foo'
+invalid_ref_normalized 'heads/foo.lock'
+invalid_ref_normalized 'heads///foo.lock'
 
 test_done
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/6] Do not allow ".lock" at the end of any refname component
  2011-09-09 11:46 [PATCH 0/6] Improved infrastructure for refname normalization Michael Haggerty
                   ` (3 preceding siblings ...)
  2011-09-09 11:46 ` [PATCH 4/6] Add a library function normalize_refname() Michael Haggerty
@ 2011-09-09 11:46 ` Michael Haggerty
  2011-09-09 11:46 ` [PATCH 6/6] Add a REFNAME_ALLOW_UNNORMALIZED flag to check_ref_format() Michael Haggerty
  2011-09-09 14:06 ` [PATCH 0/6] Improved infrastructure for refname normalization A Large Angry SCM
  6 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-09 11:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, Michael Haggerty

Allowing any refname component to end with ".lock" is looking for
trouble; for example,

    $ git br foo.lock/bar
    $ git br foo
    fatal: Unable to create '[...]/.git/refs/heads/foo.lock': File exists.

Therefore, do not allow any refname component to end with ".lock".

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 Documentation/git-check-ref-format.txt |    4 +---
 refs.c                                 |    6 +++---
 t/t1402-check-ref-format.sh            |    4 ++++
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-check-ref-format.txt b/Documentation/git-check-ref-format.txt
index 3ab22b9..f2d21c7 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -28,7 +28,7 @@ git imposes the following rules on how references are named:
 
 . They can include slash `/` for hierarchical (directory)
   grouping, but no slash-separated component can begin with a
-  dot `.`.
+  dot `.` or end with the sequence `.lock`.
 
 . They must contain at least one `/`. This enforces the presence of a
   category like `heads/`, `tags/` etc. but the actual names are not
@@ -47,8 +47,6 @@ git imposes the following rules on how references are named:
 
 . They cannot end with a slash `/` nor a dot `.`.
 
-. They cannot end with the sequence `.lock`.
-
 . They cannot contain a sequence `@{`.
 
 . They cannot contain a `\`.
diff --git a/refs.c b/refs.c
index 372350e..6985a3f 100644
--- a/refs.c
+++ b/refs.c
@@ -933,14 +933,14 @@ int normalize_refname(char *dst, int dstlen, const char *ref, int flags)
 		if (component[0] == '.')
 			/* Components must not start with '.'. */
 			return -1;
+		if (component_len >= 5 && !memcmp(&component[component_len - 5], ".lock", 5))
+			/* Components must not end with ".lock". */
+			return -1;
 	} while (ch != 0);
 
 	if (last == '.')
 		/* Refname must not end with '.'. */
 		return -1;
-	if (component_len >= 5 && !memcmp(&component[component_len - 5], ".lock", 5))
-		/* Refname must not end with ".lock". */
-		return -1;
 	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
 		/* Refname must have at least two components. */
 		return -1;
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index 20a7782..6848bfb 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -40,6 +40,8 @@ invalid_ref 'heads/foo?bar'
 valid_ref 'foo./bar'
 invalid_ref 'heads/foo.lock'
 invalid_ref 'heads///foo.lock'
+invalid_ref 'foo.lock/bar'
+invalid_ref 'foo.lock///bar'
 valid_ref 'heads/foo@bar'
 invalid_ref 'heads/v@{ation'
 invalid_ref 'heads/foo\bar'
@@ -155,5 +157,7 @@ invalid_ref_normalized 'heads/./foo'
 invalid_ref_normalized 'heads\foo'
 invalid_ref_normalized 'heads/foo.lock'
 invalid_ref_normalized 'heads///foo.lock'
+invalid_ref_normalized 'foo.lock/bar'
+invalid_ref_normalized 'foo.lock///bar'
 
 test_done
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/6] Add a REFNAME_ALLOW_UNNORMALIZED flag to check_ref_format()
  2011-09-09 11:46 [PATCH 0/6] Improved infrastructure for refname normalization Michael Haggerty
                   ` (4 preceding siblings ...)
  2011-09-09 11:46 ` [PATCH 5/6] Do not allow ".lock" at the end of any refname component Michael Haggerty
@ 2011-09-09 11:46 ` Michael Haggerty
  2011-09-09 23:30   ` Junio C Hamano
  2011-09-09 14:06 ` [PATCH 0/6] Improved infrastructure for refname normalization A Large Angry SCM
  6 siblings, 1 reply; 13+ messages in thread
From: Michael Haggerty @ 2011-09-09 11:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, cmn, Michael Haggerty

Let the callers of check_ref_format() (and normalize_refname()) decide
whether to accept unnormalized refnames via a new
REFNAME_ALLOW_UNNORMALIZED flag.  Change callers to set this flag,
which preserves their current behavior.  (There are likely places
where this flag can be removed.)

Add analogous options --allow-unnormalized and --no-allow-unnormalized
options to "git check-ref-format" and add tests.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 Documentation/git-check-ref-format.txt |    9 ++++++++-
 builtin/check-ref-format.c             |    6 +++++-
 builtin/checkout.c                     |    2 +-
 builtin/fetch-pack.c                   |    2 +-
 builtin/receive-pack.c                 |    3 ++-
 builtin/replace.c                      |    2 +-
 builtin/show-ref.c                     |    2 +-
 builtin/tag.c                          |    2 +-
 connect.c                              |    2 +-
 environment.c                          |    2 +-
 fast-import.c                          |    2 +-
 notes-merge.c                          |    4 ++--
 refs.c                                 |   11 +++++++----
 refs.h                                 |    5 ++++-
 remote.c                               |    8 +++++---
 sha1_name.c                            |    2 +-
 t/t1402-check-ref-format.sh            |    3 +++
 transport.c                            |    5 ++++-
 walker.c                               |    2 +-
 19 files changed, 50 insertions(+), 24 deletions(-)

diff --git a/Documentation/git-check-ref-format.txt b/Documentation/git-check-ref-format.txt
index f2d21c7..fac44ec 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -71,7 +71,7 @@ reference name expressions (see linkgit:gitrevisions[7]):
 . at-open-brace `@{` is used as a notation to access a reflog entry.
 
 With the `--print` option, if 'refname' is acceptable, it prints the
-canonicalized name of a hypothetical reference with that name.  That is,
+normalized name of a hypothetical reference with that name.  That is,
 it prints 'refname' with any extra `/` characters removed.
 
 With the `--branch` option, it expands the ``previous branch syntax''
@@ -95,6 +95,13 @@ OPTIONS
 	of a one full pathname component (e.g., `foo/*/bar` but not
 	`foo/bar*`).
 
+--allow-unnormalized::
+--no-allow-unnormalized::
+	Controls whether <refname> is allowed to contain extra `/`
+	characters at the beginning and between name components.
+	These extra slashes are stripped out of the value printed by
+	the `--print` option.)  The default is `--allow-unnormalized`.
+
 EXAMPLES
 --------
 
diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 4c202af..ba0456c 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -38,7 +38,7 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 {
 	int i;
 	int print = 0;
-	int flags = 0;
+	int flags = REFNAME_ALLOW_UNNORMALIZED;
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(builtin_check_ref_format_usage);
@@ -55,6 +55,10 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 			flags &= ~REFNAME_ALLOW_ONELEVEL;
 		else if (!strcmp(argv[i], "--refspec-pattern"))
 			flags |= REFNAME_REFSPEC_PATTERN;
+		else if (!strcmp(argv[i], "--allow-unnormalized"))
+			flags |= REFNAME_ALLOW_UNNORMALIZED;
+		else if (!strcmp(argv[i], "--no-allow-unnormalized"))
+			flags &= ~REFNAME_ALLOW_UNNORMALIZED;
 		else
 			usage(builtin_check_ref_format_usage);
 	}
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 2882116..18c3270 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -882,7 +882,7 @@ static int parse_branchname_arg(int argc, const char **argv,
 	new->name = arg;
 	setup_branch_path(new);
 
-	if (!check_ref_format(new->path, 0) &&
+	if (!check_ref_format(new->path, REFNAME_ALLOW_UNNORMALIZED) &&
 	    resolve_ref(new->path, branch_rev, 1, NULL))
 		hashcpy(rev, branch_rev);
 	else
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 411aa7d..a4416e4 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -544,7 +544,7 @@ static void filter_refs(struct ref **refs, int nr_match, char **match)
 	for (ref = *refs; ref; ref = next) {
 		next = ref->next;
 		if (!memcmp(ref->name, "refs/", 5) &&
-		    check_ref_format(ref->name + 5, 0))
+		    check_ref_format(ref->name + 5, REFNAME_ALLOW_UNNORMALIZED))
 			; /* trash */
 		else if (args.fetch_all &&
 			 (!args.depth || prefixcmp(ref->name, "refs/tags/") )) {
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 4e880ef..e3159c5 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -356,7 +356,8 @@ static const char *update(struct command *cmd)
 	struct ref_lock *lock;
 
 	/* only refs/... are allowed */
-	if (prefixcmp(name, "refs/") || check_ref_format(name + 5, 0)) {
+	if (prefixcmp(name, "refs/") ||
+			check_ref_format(name + 5, REFNAME_ALLOW_UNNORMALIZED)) {
 		rp_error("refusing to create funny ref '%s' remotely", name);
 		return "funny refname";
 	}
diff --git a/builtin/replace.c b/builtin/replace.c
index 15f0e5e..4490656 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -94,7 +94,7 @@ static int replace_object(const char *object_ref, const char *replace_ref,
 		     "refs/replace/%s",
 		     sha1_to_hex(object)) > sizeof(ref) - 1)
 		die("replace ref name too long: %.*s...", 50, ref);
-	if (check_ref_format(ref, 0))
+	if (check_ref_format(ref, REFNAME_ALLOW_UNNORMALIZED))
 		die("'%s' is not a valid ref name.", ref);
 
 	if (!resolve_ref(ref, prev, 1, NULL))
diff --git a/builtin/show-ref.c b/builtin/show-ref.c
index 375a14b..dba92b2 100644
--- a/builtin/show-ref.c
+++ b/builtin/show-ref.c
@@ -145,7 +145,7 @@ static int exclude_existing(const char *match)
 			if (strncmp(ref, match, matchlen))
 				continue;
 		}
-		if (check_ref_format(ref, 0)) {
+		if (check_ref_format(ref, REFNAME_ALLOW_UNNORMALIZED)) {
 			warning("ref '%s' ignored", ref);
 			continue;
 		}
diff --git a/builtin/tag.c b/builtin/tag.c
index 7aceaab..50d2018 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -412,7 +412,7 @@ static int strbuf_check_tag_ref(struct strbuf *sb, const char *name)
 	strbuf_reset(sb);
 	strbuf_addf(sb, "refs/tags/%s", name);
 
-	return check_ref_format(sb->buf, 0);
+	return check_ref_format(sb->buf, REFNAME_ALLOW_UNNORMALIZED);
 }
 
 int cmd_tag(int argc, const char **argv, const char *prefix)
diff --git a/connect.c b/connect.c
index 292a9e2..d50f217 100644
--- a/connect.c
+++ b/connect.c
@@ -22,7 +22,7 @@ static int check_ref(const char *name, int len, unsigned int flags)
 	len -= 5;
 
 	/* REF_NORMAL means that we don't want the magic fake tag refs */
-	if ((flags & REF_NORMAL) && check_ref_format(name, 0))
+	if ((flags & REF_NORMAL) && check_ref_format(name, REFNAME_ALLOW_UNNORMALIZED))
 		return 0;
 
 	/* REF_HEADS means that we want regular branch heads */
diff --git a/environment.c b/environment.c
index 8acbb87..2a41c03 100644
--- a/environment.c
+++ b/environment.c
@@ -106,7 +106,7 @@ static char *expand_namespace(const char *raw_namespace)
 		if (strcmp((*c)->buf, "/") != 0)
 			strbuf_addf(&buf, "refs/namespaces/%s", (*c)->buf);
 	strbuf_list_free(components);
-	if (check_ref_format(buf.buf, 0))
+	if (check_ref_format(buf.buf, REFNAME_ALLOW_UNNORMALIZED))
 		die("bad git namespace path \"%s\"", raw_namespace);
 	strbuf_addch(&buf, '/');
 	return strbuf_detach(&buf, NULL);
diff --git a/fast-import.c b/fast-import.c
index 4d55ee6..d8e9fe2 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -722,7 +722,7 @@ static struct branch *new_branch(const char *name)
 
 	if (b)
 		die("Invalid attempt to create duplicate branch: %s", name);
-	if (check_ref_format(name, REFNAME_ALLOW_ONELEVEL))
+	if (check_ref_format(name, REFNAME_ALLOW_ONELEVEL|REFNAME_ALLOW_UNNORMALIZED))
 		die("Branch name doesn't conform to GIT standards: %s", name);
 
 	b = pool_calloc(1, sizeof(struct branch));
diff --git a/notes-merge.c b/notes-merge.c
index bb8d7c8..c09ce7c 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -570,7 +570,7 @@ int notes_merge(struct notes_merge_options *o,
 	/* Dereference o->local_ref into local_sha1 */
 	if (!resolve_ref(o->local_ref, local_sha1, 0, NULL))
 		die("Failed to resolve local notes ref '%s'", o->local_ref);
-	else if (!check_ref_format(o->local_ref, 0) &&
+	else if (!check_ref_format(o->local_ref, REFNAME_ALLOW_UNNORMALIZED) &&
 		is_null_sha1(local_sha1))
 		local = NULL; /* local_sha1 == null_sha1 indicates unborn ref */
 	else if (!(local = lookup_commit_reference(local_sha1)))
@@ -584,7 +584,7 @@ int notes_merge(struct notes_merge_options *o,
 		 * Failed to get remote_sha1. If o->remote_ref looks like an
 		 * unborn ref, perform the merge using an empty notes tree.
 		 */
-		if (!check_ref_format(o->remote_ref, 0)) {
+		if (!check_ref_format(o->remote_ref, REFNAME_ALLOW_UNNORMALIZED)) {
 			hashclr(remote_sha1);
 			remote = NULL;
 		} else {
diff --git a/refs.c b/refs.c
index 6985a3f..c2a7c01 100644
--- a/refs.c
+++ b/refs.c
@@ -879,8 +879,11 @@ int normalize_refname(char *dst, int dstlen, const char *ref, int flags)
 
 	ch = *cp;
 	do {
-		while (ch == '/')
-			ch = *++cp; /* tolerate leading and repeated slashes */
+		if (flags && REFNAME_ALLOW_UNNORMALIZED) {
+			/* tolerate leading and repeated slashes */
+			while (ch == '/')
+				ch = *++cp;
+		}
 
 		/*
 		 * We are at the start of a path component.  Record
@@ -1132,7 +1135,7 @@ static struct ref_lock *lock_ref_sha1_basic(const char *ref, const unsigned char
 struct ref_lock *lock_ref_sha1(const char *ref, const unsigned char *old_sha1)
 {
 	char refpath[PATH_MAX];
-	if (check_ref_format(ref, 0))
+	if (check_ref_format(ref, REFNAME_ALLOW_UNNORMALIZED))
 		return NULL;
 	strcpy(refpath, mkpath("refs/%s", ref));
 	return lock_ref_sha1_basic(refpath, old_sha1, 0, NULL);
@@ -1140,7 +1143,7 @@ struct ref_lock *lock_ref_sha1(const char *ref, const unsigned char *old_sha1)
 
 struct ref_lock *lock_any_ref_for_update(const char *ref, const unsigned char *old_sha1, int flags)
 {
-	if (check_ref_format(ref, REFNAME_ALLOW_ONELEVEL))
+	if (check_ref_format(ref, REFNAME_ALLOW_ONELEVEL|REFNAME_ALLOW_UNNORMALIZED))
 		return NULL;
 	return lock_ref_sha1_basic(ref, old_sha1, flags, NULL);
 }
diff --git a/refs.h b/refs.h
index 8a15f83..f203726 100644
--- a/refs.h
+++ b/refs.h
@@ -99,6 +99,7 @@ extern int for_each_reflog(each_ref_fn, void *);
 
 #define REFNAME_ALLOW_ONELEVEL 1
 #define REFNAME_REFSPEC_PATTERN 2
+#define REFNAME_ALLOW_UNNORMALIZED 4
 
 /*
  * Check that ref is a valid refname according to the rules described
@@ -110,7 +111,9 @@ extern int for_each_reflog(each_ref_fn, void *);
  * OK and fit into dst.  If REFNAME_ALLOW_ONELEVEL is set in flags,
  * then accept one-level reference names.  If REFNAME_REFSPEC_PATTERN
  * is set in flags, then allow a "*" wildcard characters in place of
- * one of the name components.
+ * one of the name components.  If REFNAME_ALLOW_UNNORMALIZED is set
+ * in flags, then allow extra "/" characters at the start of the
+ * refname or between name components.
  */
 extern int normalize_refname(char *dst, int dstlen, const char *ref, int flags);
 
diff --git a/remote.c b/remote.c
index 7059885..87da1e9 100644
--- a/remote.c
+++ b/remote.c
@@ -559,7 +559,9 @@ static struct refspec *parse_refspec_internal(int nr_refspec, const char **refsp
 
 		rs[i].pattern = is_glob;
 		rs[i].src = xstrndup(lhs, llen);
-		flags = REFNAME_ALLOW_ONELEVEL | (is_glob ? REFNAME_REFSPEC_PATTERN : 0);
+		flags = REFNAME_ALLOW_ONELEVEL |
+			(is_glob ? REFNAME_REFSPEC_PATTERN : 0) |
+			REFNAME_ALLOW_UNNORMALIZED;
 
 		if (fetch) {
 			/*
@@ -1403,7 +1405,7 @@ int get_fetch_map(const struct ref *remote_refs,
 	for (rmp = &ref_map; *rmp; ) {
 		if ((*rmp)->peer_ref) {
 			if (check_ref_format((*rmp)->peer_ref->name + 5,
-				REFNAME_ALLOW_ONELEVEL)) {
+				REFNAME_ALLOW_ONELEVEL|REFNAME_ALLOW_UNNORMALIZED)) {
 				struct ref *ignore = *rmp;
 				error("* Ignoring funny ref '%s' locally",
 				      (*rmp)->peer_ref->name);
@@ -1595,7 +1597,7 @@ static int one_local_ref(const char *refname, const unsigned char *sha1, int fla
 	int len;
 
 	/* we already know it starts with refs/ to get here */
-	if (check_ref_format(refname + 5, 0))
+	if (check_ref_format(refname + 5, REFNAME_ALLOW_UNNORMALIZED))
 		return 0;
 
 	len = strlen(refname) + 1;
diff --git a/sha1_name.c b/sha1_name.c
index 975ec3b..6a5d104 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -974,7 +974,7 @@ int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
 	if (name[0] == '-')
 		return -1;
 	strbuf_splice(sb, 0, 0, "refs/heads/", 11);
-	return check_ref_format(sb->buf, 0);
+	return check_ref_format(sb->buf, REFNAME_ALLOW_UNNORMALIZED);
 }
 
 /*
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index 6848bfb..a1d4c5b 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -27,9 +27,12 @@ invalid_ref() {
 
 valid_ref 'foo/bar/baz'
 valid_ref 'refs///heads/foo'
+invalid_ref 'refs///heads/foo' --no-allow-unnormalized
 invalid_ref 'heads/foo/'
 valid_ref '/heads/foo'
+invalid_ref '/heads/foo' --no-allow-unnormalized
 valid_ref '///heads/foo'
+invalid_ref '///heads/foo' --no-allow-unnormalized
 invalid_ref './foo'
 invalid_ref './foo/bar'
 invalid_ref 'foo/./bar'
diff --git a/transport.c b/transport.c
index 225d9b8..65209af 100644
--- a/transport.c
+++ b/transport.c
@@ -754,7 +754,10 @@ void transport_verify_remote_names(int nr_heads, const char **heads)
 			continue;
 
 		remote = remote ? (remote + 1) : local;
-		if (check_ref_format(remote, REFNAME_ALLOW_ONELEVEL|REFNAME_REFSPEC_PATTERN))
+		if (check_ref_format(remote,
+				     REFNAME_ALLOW_ONELEVEL|
+				     REFNAME_REFSPEC_PATTERN|
+				     REFNAME_ALLOW_UNNORMALIZED))
 			die("remote part of refspec is not a valid name in %s",
 				heads[i]);
 	}
diff --git a/walker.c b/walker.c
index e5d8eb2..f0dbe58 100644
--- a/walker.c
+++ b/walker.c
@@ -190,7 +190,7 @@ static int interpret_target(struct walker *walker, char *target, unsigned char *
 {
 	if (!get_sha1_hex(target, sha1))
 		return 0;
-	if (!check_ref_format(target, 0)) {
+	if (!check_ref_format(target, REFNAME_ALLOW_UNNORMALIZED)) {
 		struct ref *ref = alloc_ref(target);
 		if (!walker->fetch_ref(walker, ref)) {
 			hashcpy(sha1, ref->old_sha1);
-- 
1.7.6.8.gd2879

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/6] Improved infrastructure for refname normalization
  2011-09-09 11:46 [PATCH 0/6] Improved infrastructure for refname normalization Michael Haggerty
                   ` (5 preceding siblings ...)
  2011-09-09 11:46 ` [PATCH 6/6] Add a REFNAME_ALLOW_UNNORMALIZED flag to check_ref_format() Michael Haggerty
@ 2011-09-09 14:06 ` A Large Angry SCM
  2011-09-09 15:33   ` Michael Haggerty
  6 siblings, 1 reply; 13+ messages in thread
From: A Large Angry SCM @ 2011-09-09 14:06 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git, Junio C Hamano, cmn

On 09/09/2011 07:46 AM, Michael Haggerty wrote:
> As a prerequisite to storing references caches hierarchically (itself
> needed for performance reasons), here is a patch series to help us get
> refname normalization under control.
>
> The problem is that some UI accepts unnormalized reference names (like
> "/foo/bar" or "foo///bar" instead of "foo/bar") and passes them on to
> library routines without normalizing them.  The library, on the other
> hand, assumes that the refnames are normalized.  Sometimes (mostly in
> the case of loose references) unnormalized refnames happen to work,
> but in other cases (like packed references or when looking up refnames
> in the cache) they silently fail.  Given that refnames are sometimes
> treated as path names, there is a chance that some security-relevant
> bugs are lurking in this area, if not in git proper then in scripts
> that interact with git.

Why can't the library do the normalization instead of expecting every 
other component that deals with reference names having to do it for the 
library?

[...]

>
> * Forbid ".lock" at the end of any refname component, as directories
>    with such names can conflict with attempts to create lock files for
>    other refnames.

I find this overly restrictive. If you need to create a lock based on a 
reference name or component, use a name for the lock object that starts 
with one of the characters that reference names or components are 
already forbidden from starting with.


Gitzilla

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/6] Improved infrastructure for refname normalization
  2011-09-09 14:06 ` [PATCH 0/6] Improved infrastructure for refname normalization A Large Angry SCM
@ 2011-09-09 15:33   ` Michael Haggerty
  2011-09-09 17:57     ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Haggerty @ 2011-09-09 15:33 UTC (permalink / raw)
  To: gitzilla; +Cc: git, Junio C Hamano, cmn

On 09/09/2011 04:06 PM, A Large Angry SCM wrote:
> On 09/09/2011 07:46 AM, Michael Haggerty wrote:
>> As a prerequisite to storing references caches hierarchically (itself
>> needed for performance reasons), here is a patch series to help us get
>> refname normalization under control.
>>
>> The problem is that some UI accepts unnormalized reference names (like
>> "/foo/bar" or "foo///bar" instead of "foo/bar") and passes them on to
>> library routines without normalizing them.  The library, on the other
>> hand, assumes that the refnames are normalized.  Sometimes (mostly in
>> the case of loose references) unnormalized refnames happen to work,
>> but in other cases (like packed references or when looking up refnames
>> in the cache) they silently fail.  Given that refnames are sometimes
>> treated as path names, there is a chance that some security-relevant
>> bugs are lurking in this area, if not in git proper then in scripts
>> that interact with git.
> 
> Why can't the library do the normalization instead of expecting every
> other component that deals with reference names having to do it for the
> library?

The library could do the normalization, but

1. It would probably cost a lot of redundant checks as reference names
pass in and out of the library and back in again

2. Normalization requires copying or overwriting the incoming string, so
each time a refname crosses the library perimeter there might have to be
an extra memory allocation with the associated headaches of dealing with
the ownership of the memory.

3. The library doesn't encapsulate all uses of reference names; for
example, for_each_ref() invokes a callback function with the refname as
an argument.  The callback function is free to do a strcmp() of the
refname (normalized by the library) with some arbitrary string that it
got from the command line.  Either the caller has to do the
normalization itself (i.e., outside of the library) or the library has
to learn how to do every possible filtering operation with refnames.

>> * Forbid ".lock" at the end of any refname component, as directories
>>    with such names can conflict with attempts to create lock files for
>>    other refnames.
> 
> I find this overly restrictive. If you need to create a lock based on a
> reference name or component, use a name for the lock object that starts
> with one of the characters that reference names or components are
> already forbidden from starting with.

I agree; this is unpleasantly restrictive.

But please remember that refnames already cannot end in ".lock"
("foo/bar.lock" is already forbidden; this change also prohibits
"foo.lock/bar").

However, your suggested solution would cause problems if two versions of
git are running on the same machine.  An old version of git would not
know to respect the new version's lock files.  ISTM that this would be
too dangerous.  Suggestions welcome.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/6] Improved infrastructure for refname normalization
  2011-09-09 15:33   ` Michael Haggerty
@ 2011-09-09 17:57     ` Junio C Hamano
  2011-09-10  3:31       ` Michael Haggerty
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2011-09-09 17:57 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: gitzilla, git, cmn

Michael Haggerty <mhagger@alum.mit.edu> writes:

> The library could do the normalization, but
>
> 1. It would probably cost a lot of redundant checks as reference names
> pass in and out of the library and back in again
>
> 2. Normalization requires copying or overwriting the incoming string, so
> each time a refname crosses the library perimeter there might have to be
> an extra memory allocation with the associated headaches of dealing with
> the ownership of the memory.
>
> 3. The library doesn't encapsulate all uses of reference names; for
> example, for_each_ref() invokes a callback function with the refname as
> an argument.  The callback function is free to do a strcmp() of the
> refname (normalized by the library) with some arbitrary string that it
> got from the command line.  Either the caller has to do the
> normalization itself (i.e., outside of the library) or the library has
> to learn how to do every possible filtering operation with refnames.

4. The caller needs to be corrected to pay attention to the normalization
the library did for it. Your code may use a string as a ref and then
create something based on the refname; illustrating with a fictitious
example:

	ref = make_branch_ref("refs/heads/%s", branch_name);
        update_ref(ref, sha1);
        write_log("created branch '%s'", branch_name);

Even though make_branch_ref() may have removed duplicated slashes from the
name in "branch_name" when it computed "ref", the log still will record
unnormalized name.

I think the callers need to be aware of the normalization in practice
anyway for this reason, and a good way forward is to give the callers a
library interface to do so. It might even make sense to make the other
parts of the API _reject_ unnormalized input to catch offending callers.

By the way, does this series introduce new infrastructure features that
can be reused in different areas, such as Hui's "alt_odb path
normalization" patch?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 6/6] Add a REFNAME_ALLOW_UNNORMALIZED flag to check_ref_format()
  2011-09-09 11:46 ` [PATCH 6/6] Add a REFNAME_ALLOW_UNNORMALIZED flag to check_ref_format() Michael Haggerty
@ 2011-09-09 23:30   ` Junio C Hamano
  2011-09-10  4:04     ` Michael Haggerty
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2011-09-09 23:30 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git, cmn

Michael Haggerty <mhagger@alum.mit.edu> writes:

> Let the callers of check_ref_format() (and normalize_refname()) decide
> whether to accept unnormalized refnames via a new
> REFNAME_ALLOW_UNNORMALIZED flag.  Change callers to set this flag,
> which preserves their current behavior.  (There are likely places
> where this flag can be removed.)

Hmm, is it just me who finds --no-allow-unnormalized backwards?

More importantly, shouldn't every caller be required to normalize refnames
by default, unless it can justify why it does not have to with a compelling
reason?

In other words, I would be Ok if "--no-require-normalized" was the default
for "git check-ref-format" for scripts' use, and I also would be perfectly
fine if callers feed un-normalized strings that came from the command line
argument and other end user input to normalize_refname(), but once such a
string is normalized, shouldn't the rest of the callchain be passing the
normalized refname all the way?

To put it another way, my knee jerk reaction is that we shouldn't need
such a "flag". Shouldn't it be sufficient for normalize_refname() and
nothing else to allow unnormalized input, and everybody else should barf
when they see an un-normalized input?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/6] Improved infrastructure for refname normalization
  2011-09-09 17:57     ` Junio C Hamano
@ 2011-09-10  3:31       ` Michael Haggerty
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-10  3:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: gitzilla, git, cmn

On 09/09/2011 07:57 PM, Junio C Hamano wrote:
> By the way, does this series introduce new infrastructure features that
> can be reused in different areas, such as Hui's "alt_odb path
> normalization" patch?

That code is for normalizing filesystem paths, right?

The rules for normalizing filesystem paths are similar to those for
refnames (except maybe for stripping the leading "/").  But the validity
checks are different, and should be kept separate in case some of the
rules need to be tweaked.  Since I put the code for validity checks and
normalization of refnames in a single function, I don't think it makes
sense to share code.

It would be possible to separate the validity checks from the
normalization, but that would require two scans of the refname.  And I
think it should be considered rather an accident that filesystem names
and refnames have similar conventions (even though there is a strong
historical reason for the similarity); they could some day diverge if,
say, we started adding support for Windows-native paths.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 6/6] Add a REFNAME_ALLOW_UNNORMALIZED flag to check_ref_format()
  2011-09-09 23:30   ` Junio C Hamano
@ 2011-09-10  4:04     ` Michael Haggerty
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2011-09-10  4:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, cmn

On 09/10/2011 01:30 AM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
>> Let the callers of check_ref_format() (and normalize_refname()) decide
>> whether to accept unnormalized refnames via a new
>> REFNAME_ALLOW_UNNORMALIZED flag.  Change callers to set this flag,
>> which preserves their current behavior.  (There are likely places
>> where this flag can be removed.)
> 
> [...]
> To put it another way, my knee jerk reaction is that we shouldn't need
> such a "flag". Shouldn't it be sufficient for normalize_refname() and
> nothing else to allow unnormalized input, and everybody else should barf
> when they see an un-normalized input?

That is a good idea.

I will make the current normalize_refname() function static and hide the
REFNAME_ALLOW_UNNORMALIZED option from the outside world.  Then I will
write a new public normalize_refname() function that calls the static
version with REFNAME_ALLOW_UNNORMALIZED set, and change
check_ref_format() to call normalize_refname() with
REFNAME_ALLOW_UNNORMALIZED unset.

What should I do with all of the current callers of check_ref_format(),
given that I don't want to be personally responsible for analyzing and
rewriting them all?  The hard-nosed approach would be to say that they
are calling check_ref_format() without normalizing the refnames, so they
are already broken (albeit perhaps sometimes accidentally functional),
and it is OK that the new behavior of check_ref_format() causes them to
fail explicitly.

A more forgiving approach would be to implement another transition
function like check_ref_format_deprecated_unsafe() that accepts
unnormalized refnames, change the callers to use this function during
the transition, and remove it only after all callers have been fixed.

Suggestions?

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-09-10  4:05 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-09 11:46 [PATCH 0/6] Improved infrastructure for refname normalization Michael Haggerty
2011-09-09 11:46 ` [PATCH 1/6] Change bad_ref_char() to return a boolean value Michael Haggerty
2011-09-09 11:46 ` [PATCH 2/6] git check-ref-format: add options --onelevel-ok and --refname-pattern Michael Haggerty
2011-09-09 11:46 ` [PATCH 3/6] Change check_ref_format() to take a flags argument Michael Haggerty
2011-09-09 11:46 ` [PATCH 4/6] Add a library function normalize_refname() Michael Haggerty
2011-09-09 11:46 ` [PATCH 5/6] Do not allow ".lock" at the end of any refname component Michael Haggerty
2011-09-09 11:46 ` [PATCH 6/6] Add a REFNAME_ALLOW_UNNORMALIZED flag to check_ref_format() Michael Haggerty
2011-09-09 23:30   ` Junio C Hamano
2011-09-10  4:04     ` Michael Haggerty
2011-09-09 14:06 ` [PATCH 0/6] Improved infrastructure for refname normalization A Large Angry SCM
2011-09-09 15:33   ` Michael Haggerty
2011-09-09 17:57     ` Junio C Hamano
2011-09-10  3:31       ` Michael Haggerty

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.