All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] create_ref_entry(): move check_refname_format() call to callers
@ 2012-04-29  6:18 mhagger
  2012-04-29 11:58 ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: mhagger @ 2012-04-29  6:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Michael Haggerty

From: Michael Haggerty <mhagger@alum.mit.edu>

This allows the caller to decide what flags to use for
check_refname_format().

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---

On 04/27/2012 05:06 PM, Junio C Hamano wrote:
> Jeff King<peff@peff.net>  writes:
> 
>> I upgraded git on a machine recently, and it created problems for a repo
>> with a bogus character in a ref name.  Older versions of git never
>> complained about it. Newer ones, containing your dce4bab ("add_ref():
>> verify that the refname is formatted correctly") do. That's fine; it's
>> bogus and git _should_ complain about it.
>>
>> However, recovering from the situation is unnecessarily hard, ...
>> ...
>> I seem to recall discussing this format-tightening and trying to be sure
>> that users were left with a way forward for fixing their repos. But I
>> can't find the discussion, and I don't recall any conclusion we came to.
> 
> I haven't dug the archive but I do recall pointing many issues out
> around the theme "be liberal in what you accept and strict in what you
> produce" on this topic, and loosening one or two showstoppers during the
> review cycle, but obviously we did not catch all of them.
> 
> Michael?

I will work on providing more infrastructure for checking refnames at
varying levels of strictness, but I don't know enough about the code
paths to be able to find the places where the strictness levels need
tweaking.

For this to work, the various callers of check_refname_format() will
have to be able to influence the level of strictness that they want to
enforce.  This patch is one trivial step in that direction.

 refs.c |   23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/refs.c b/refs.c
index 09322fe..bc735e7 100644
--- a/refs.c
+++ b/refs.c
@@ -172,15 +172,11 @@ struct ref_entry {
 };
 
 static struct ref_entry *create_ref_entry(const char *refname,
-					  const unsigned char *sha1, int flag,
-					  int check_name)
+					  const unsigned char *sha1, int flag)
 {
 	int len;
 	struct ref_entry *ref;
 
-	if (check_name &&
-	    check_refname_format(refname, REFNAME_ALLOW_ONELEVEL|REFNAME_DOT_COMPONENT))
-		die("Reference has invalid format: '%s'", refname);
 	len = strlen(refname) + 1;
 	ref = xmalloc(sizeof(struct ref_entry) + len);
 	hashcpy(ref->u.value.sha1, sha1);
@@ -710,7 +706,11 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
 
 		refname = parse_ref_line(refline, sha1);
 		if (refname) {
-			last = create_ref_entry(refname, sha1, flag, 1);
+			if (check_refname_format(refname,
+						 REFNAME_ALLOW_ONELEVEL|
+						 REFNAME_DOT_COMPONENT))
+				die("Packed reference has invalid format: '%s'", refname);
+			last = create_ref_entry(refname, sha1, flag);
 			add_ref(dir, last);
 			continue;
 		}
@@ -745,8 +745,11 @@ static struct ref_dir *get_packed_refs(struct ref_cache *refs)
 
 void add_packed_ref(const char *refname, const unsigned char *sha1)
 {
+	if (check_refname_format(refname,
+				 REFNAME_ALLOW_ONELEVEL|REFNAME_DOT_COMPONENT))
+		die("Reference has invalid format: '%s'", refname);
 	add_ref(get_packed_refs(get_ref_cache(NULL)),
-			create_ref_entry(refname, sha1, REF_ISPACKED, 1));
+			create_ref_entry(refname, sha1, REF_ISPACKED));
 }
 
 static void get_ref_dir(struct ref_cache *refs, const char *base,
@@ -805,7 +808,11 @@ static void get_ref_dir(struct ref_cache *refs, const char *base,
 				hashclr(sha1);
 				flag |= REF_ISBROKEN;
 			}
-			add_ref(dir, create_ref_entry(refname, sha1, flag, 1));
+			if (check_refname_format(refname,
+						 REFNAME_ALLOW_ONELEVEL|
+						 REFNAME_DOT_COMPONENT))
+				die("Loose reference has invalid format: '%s'", refname);
+			add_ref(dir, create_ref_entry(refname, sha1, flag));
 		}
 		free(refname);
 		closedir(d);
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] create_ref_entry(): move check_refname_format() call to callers
  2012-04-29  6:18 [PATCH] create_ref_entry(): move check_refname_format() call to callers mhagger
@ 2012-04-29 11:58 ` Jeff King
  2012-04-30  6:15   ` Junio C Hamano
  2012-04-30 16:18   ` Michael Haggerty
  0 siblings, 2 replies; 8+ messages in thread
From: Jeff King @ 2012-04-29 11:58 UTC (permalink / raw)
  To: mhagger; +Cc: Junio C Hamano, git

On Sun, Apr 29, 2012 at 08:18:08AM +0200, mhagger@alum.mit.edu wrote:

> I will work on providing more infrastructure for checking refnames at
> varying levels of strictness, but I don't know enough about the code
> paths to be able to find the places where the strictness levels need
> tweaking.
> 
> For this to work, the various callers of check_refname_format() will
> have to be able to influence the level of strictness that they want to
> enforce.  This patch is one trivial step in that direction.

It seems like the create_ref_entry code paths should _always_ just be
issuing warnings, as they are about reading existing refs, no? The die()
side should only come when we are writing refs.

So something like this patch:

diff --git a/refs.c b/refs.c
index a5802e1..3dba205 100644
--- a/refs.c
+++ b/refs.c
@@ -180,7 +180,7 @@ static struct ref_entry *create_ref_entry(const char *refname,
 
 	if (check_name &&
 	    check_refname_format(refname, REFNAME_ALLOW_ONELEVEL|REFNAME_DOT_COMPONENT))
-		die("Reference has invalid format: '%s'", refname);
+		warning("Reference has invalid format: '%s'", refname);
 	len = strlen(refname) + 1;
 	ref = xmalloc(sizeof(struct ref_entry) + len);
 	hashcpy(ref->u.value.sha1, sha1);
@@ -926,7 +926,7 @@ const char *resolve_ref_unsafe(const char *refname, unsigned char *sha1, int rea
 	if (flag)
 		*flag = 0;
 
-	if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL))
+	if (!reading && check_refname_format(refname, REFNAME_ALLOW_ONELEVEL))
 		return NULL;
 
 	for (;;) {

which gives me the following behavior on the snippet I posted earlier:

  $ git fsck
  warning: Reference has invalid format: 'refs/tags/foo^?bar'

  $ git rev-parse --verify "$evil"
  711078d8c4c0d26b02afee9f385a64455fe4cccd

  $ git for-each-ref
  warning: Reference has invalid format: 'refs/tags/foo^?bar'
  711078d8c4c0d26b02afee9f385a64455fe4cccd commit refs/heads/master
  711078d8c4c0d26b02afee9f385a64455fe4cccd commit refs/tags/foobar

  $ git tag -l
  warning: Reference has invalid format: 'refs/tags/foo^?bar'
  foo^?bar

  $ git tag fixed "$evil"
  warning: Reference has invalid format: 'refs/tags/foo^?bar'
  $ git rev-parse fixed
  711078d8c4c0d26b02afee9f385a64455fe4cccd
  $ git tag -d "$evil"
  Deleted tag 'foo^?bar' (was 711078d)

-Peff

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] create_ref_entry(): move check_refname_format() call to callers
  2012-04-29 11:58 ` Jeff King
@ 2012-04-30  6:15   ` Junio C Hamano
  2012-04-30 16:18   ` Michael Haggerty
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2012-04-30  6:15 UTC (permalink / raw)
  To: Jeff King; +Cc: mhagger, git

Jeff King <peff@peff.net> writes:

> On Sun, Apr 29, 2012 at 08:18:08AM +0200, mhagger@alum.mit.edu wrote:
> ...
> It seems like the create_ref_entry code paths should _always_ just be
> issuing warnings, as they are about reading existing refs, no? The die()
> side should only come when we are writing refs.

That matches my observation.  Michael, are we missing something?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] create_ref_entry(): move check_refname_format() call to callers
  2012-04-29 11:58 ` Jeff King
  2012-04-30  6:15   ` Junio C Hamano
@ 2012-04-30 16:18   ` Michael Haggerty
  2012-04-30 17:14     ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Michael Haggerty @ 2012-04-30 16:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On 04/29/2012 01:58 PM, Jeff King wrote:
> On Sun, Apr 29, 2012 at 08:18:08AM +0200, mhagger@alum.mit.edu wrote:
>
>> I will work on providing more infrastructure for checking refnames at
>> varying levels of strictness, but I don't know enough about the code
>> paths to be able to find the places where the strictness levels need
>> tweaking.
>>
>> For this to work, the various callers of check_refname_format() will
>> have to be able to influence the level of strictness that they want to
>> enforce.  This patch is one trivial step in that direction.
>
> It seems like the create_ref_entry code paths should _always_ just be
> issuing warnings, as they are about reading existing refs, no? The die()
> side should only come when we are writing refs.

There are not two but three classes of refnames:

1. Fully legal refnames (according to the rules of check-ref-format).

2. Refnames that might cause parsing trouble in some circumstances but 
could otherwise be tolerated (with a warning) in the hope that the user 
can delete them before they cause further confusion.  These include all 
illegal refnames that aren't covered by (3).

3. Refnames that are so pathological that we don't want to let them into 
our code at all, under any circumstances.  This category, I think, 
includes refnames that include "/../" (because they could cause our code 
to walk up the filesystem) or LF (because if written to a packed-refs 
file they would make it unreadable) and perhaps "//" (because they would 
confuse the loose reference code and the hierarchical reference cache 
code).  And depending on how much you trust shell scripts to do quoting 
correctly, other patterns might also be risky.

I would like to be able to distinguish between (2) and (3) before 
deciding what to do in any specific cases.  References in category (2) 
can probably be accepted (with a warning) in old data but we should not 
allow the user to create new ones.  References in category (3) are more 
critical; I see three options for dealing with them: die(), emit a 
warning then drop them on the floor, or emit a warning but handle them 
somehow (for example, by URL-encoding them).

Treating category (3) the same as category (2) was more or less the 
status quo before the changes, but I think it is dangerous, especially 
when dealing with references that might come from remote sources (do you 
disagree?).


Regarding create_ref_entry(), there are three callers:

* read_packed_refs(): Only used to read references from local
   packed-refs files.  Seems uncritical in terms of security, so for now
   I suppose we could change this caller to emit a warning but use the
   reference anyway.  (It would not be a good idea to emit a warning but
   *not* use the reference, because the next time the packed-refs file
   is rewritten the reference would be lost.)

* get_ref_dir(): Only used to read loose references from the local
   filesystem.  Seems uncritical in terms of security, so for now
   I suppose we could change this caller to emit a warning but use the
   reference anyway.  Alternately, we could emit a warning but not use
   the reference; this would not result in any data loss because nobody
   would have a reason to remove the loose reference file.

* add_packed_ref(): Used by write_remote_refs() to insert references
   from a cloned repository into the local packed-refs file.  Here I
   think we have to be paranoid about accepting refnames in category
   (3).  For example, it might be reasonable to emit a warning
   mentioning the illegal reference name *and the SHA1 that it referred
   to* then to drop it on the floor.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] create_ref_entry(): move check_refname_format() call to callers
  2012-04-30 16:18   ` Michael Haggerty
@ 2012-04-30 17:14     ` Junio C Hamano
  2012-04-30 20:29       ` Michael Haggerty
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2012-04-30 17:14 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Jeff King, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> There are not two but three classes of refnames:
>
> 1. Fully legal refnames (according to the rules of check-ref-format).
>
> 2. Refnames that might cause parsing trouble in some circumstances but
> could otherwise be tolerated (with a warning) in the hope that the
> user can delete them before they cause further confusion.  These
> include all illegal refnames that aren't covered by (3).
>
> 3. Refnames that are so pathological that we don't want to let them
> into our code at all, under any circumstances.  This category, I
> think, includes refnames that include "/../" (because they could cause
> our code to walk up the filesystem) or LF (because if written to a
> packed-refs file they would make it unreadable) and perhaps "//"
> (because they would confuse the loose reference code and the
> hierarchical reference cache code).  And depending on how much you
> trust shell scripts to do quoting correctly, other patterns might also
> be risky.

I can buy that we would want to reject /../ or // from the command line
even when we are not creating such a ref (e.g. "git symbolic-ref HEAD
refs/heads/../heads/foo"). Note that even though we won't read any loose
ref of that form from the repository ever (unless your recent ref API
updates broke it ;-), we _could_ see them in .git/packed-refs if the user
manually edited the file. In such a case, I would say that recovering from
such a corruption is totally up to the user, just like it is up to you
after you edit the bits on the disk platter to confuse the filesystem
code.

But if we were to do so, class (3) needs to be tightly controlled. For
example, a ref "refs/heads/m${LF}aster", e.g.

    $ git update-ref 'refs/heads/m
    aster' HEAD

may have been usable with an ancient git, and as long as the you never
packed your refs, you should be able to see it with 'git branch', and you
should be able to fix it with 'git branch master "m${LF}aster"'.

> I would like to be able to distinguish between (2) and (3) before
> deciding what to do in any specific cases.  References in category (2)
> can probably be accepted (with a warning) in old data but we should
> not allow the user to create new ones.

There is no probably about it.  We should warn and accept---anything else
is a regression---and refuse to create.

> References in category (3) are
> more critical; I see three options for dealing with them: die(), emit
> a warning then drop them on the floor, or emit a warning but handle
> them somehow (for example, by URL-encoding them).

As long as class (3) is tightly controlled, die() until the corruption is
sorted out (if it is coming to us by reading from existing data) would be
the best option to avoid spreading the breakage further.

> Treating category (3) the same as category (2) was more or less the
> status quo before the changes, but I think it is dangerous, especially
> when dealing with references that might come from remote sources (do
> you disagree?).

I actually do. You say "parsing trouble" in (2), but what are the examples
of "parsing trouble" you have in mind?  "refs/heads/a..b" is an illegal
name per check-ref-format and is (and should be) in class (2), but as long
as we do not create such a ref, if we happen to be able to read it (maybe
"git rev-parse HEAD >.git/refs/heads/a..b" dropped it there), the user
should be able to see it with "git branch", and should be able to recover
with "git branch -m a..b a-dot-dot-b". Until the user does so, we cannot
sensibly respond to "git log a..b" (which is the "parsing trouble" you are
referring to, I think), of course, but that is to be expected.

What kind of "danger" do you have in mind?

> Regarding create_ref_entry(), there are three callers:
>
> * read_packed_refs(): Only used to read references from local
>   packed-refs files.  Seems uncritical in terms of security, so for now
>   I suppose we could change this caller to emit a warning but use the
>   reference anyway.

There is no _could_.  This must show it with a warning and let the caller
use its value.

> * get_ref_dir(): Only used to read loose references from the local
>   filesystem.

Likewise.

> * add_packed_ref(): Used by write_remote_refs() to insert references
>   from a cloned repository into the local packed-refs file.

The caller should be checking the names and rejecting the attempt to
create a ref with a bogus name.

I think create_ref_entry(), if the reader does not pay much attention to
the "_entry" part of its name, gives a false impression that this is about
"creating a ref", but it isn't.  It is actually about keeping an in-core
image of the outside world, and the one that affects the outside world
(either loose or packed refs) should be making sure the names are safe
ones to use.

Thanks for a good summary.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] create_ref_entry(): move check_refname_format() call to callers
  2012-04-30 17:14     ` Junio C Hamano
@ 2012-04-30 20:29       ` Michael Haggerty
  2012-04-30 21:11         ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Haggerty @ 2012-04-30 20:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

On 04/30/2012 07:14 PM, Junio C Hamano wrote:
> Michael Haggerty<mhagger@alum.mit.edu>  writes:
>
>> There are not two but three classes of refnames:
>>
>> 1. Fully legal refnames (according to the rules of check-ref-format).
>>
>> 2. Refnames that might cause parsing trouble in some circumstances but
>> could otherwise be tolerated (with a warning) in the hope that the
>> user can delete them before they cause further confusion.  These
>> include all illegal refnames that aren't covered by (3).
>>
>> 3. Refnames that are so pathological that we don't want to let them
>> into our code at all, under any circumstances.  This category, I
>> think, includes refnames that include "/../" (because they could cause
>> our code to walk up the filesystem) or LF (because if written to a
>> packed-refs file they would make it unreadable) and perhaps "//"
>> (because they would confuse the loose reference code and the
>> hierarchical reference cache code).  And depending on how much you
>> trust shell scripts to do quoting correctly, other patterns might also
>> be risky.
>>[...]
>> Treating category (3) the same as category (2) was more or less the
>> status quo before the changes, but I think it is dangerous, especially
>> when dealing with references that might come from remote sources (do
>> you disagree?).
>
> I actually do. [...example involving class (2) refname omitted...]
>
> What kind of "danger" do you have in mind?

I agree with you about how class (2) refnames can be handled, namely on 
a best-effort basis, without paranoia.  The "danger" that I worry about 
is if we treat class (3) refnames the same non-paranoid way that we 
treat class (2) refnames.  To be sure, my worries are at the level of 
"it seems like a lot of data paths have to be trusted and I have not 
personally verified that there is no danger and I have a vivid 
imagination" rather than "I have figured out how to implement an exploit".

For example, have all of the following code paths been audited to make 
sure that they cannot introduce class (3) refnames into a repository 
(including via symbolic refs with class (3) targets) even in the face of 
a malicious remote?  Can we (and do we want to) rely on this level of 
vigilance being sustained in the future?

* git clone (for all transport types)?

* git push (for all transport types)?

* git fast-import?

* git am?

* All kinds of remote helpers (git svn, etc.) and import scripts (e.g., 
the malicious refs might come directly from a Subversion repository)?

...no doubt there are others.

What if a malicious repository is copied using rsync (i.e., not "git 
clone rsync:..." but straight rsync)?  Granted, the victim would have 
worse problems if he didn't delete any version of $GIT/config that was 
contained in the copy before using it.

Banning class (3) references at lib level would provide a second level 
of defense against errors in the outer level.  Allowing "class 3" 
references at the lib level, by comparison, means putting a lot of 
confidence in the auditing of all of the ref-creating code paths.

It's true that a loose reference cannot (by construction) contain 
constructs like "/../" or "//".  And packed refs are not stored via 
filesystem paths, so such constructs are not a direct hazard.  But there 
are lots of ways that a packed ref can be turned into a loose ref, and 
the process of creating a loose ref from a class (3) packed ref could 
cause trouble.

I propose that I implement a REFNAME_NONSTRICT flag which relaxes the 
normal rules as follows:

* ".." is only forbidden if it is a complete refname component

* All characters (including control characters, '?', '*', and '[') are 
allowed, except for NUL, SP, LF, ':', and '\'.

* The character sequence "@{" is *not* prohibited.

The reason I suggest excluding ':' and '\' is because of their 
significance in DOS pathnames; let me know if you think this is unnecessary.

Do you want these changes in master or in maint (or even older)?

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] create_ref_entry(): move check_refname_format() call to callers
  2012-04-30 20:29       ` Michael Haggerty
@ 2012-04-30 21:11         ` Junio C Hamano
  2012-05-02 15:38           ` Michael Haggerty
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2012-04-30 21:11 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Jeff King, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> For example, have all of the following code paths been audited to make
> sure that they cannot introduce class (3) refnames into a repository
> (including via symbolic refs with class (3) targets) even in the face
> of a malicious remote?  Can we (and do we want to) rely on this level
> of vigilance being sustained in the future?

Auditing is one thing, but perhaps the right solution to that issue is to
refactor the existing code so that we have only a handful (preferrably
one) API entry point that is used to create a new ref (not to be confused
with create_ref_entry(), which is not necessarily about creating a ref)?

The UI layer may place additional restrictions to the source data used to
eventually lead to a ref creation (e.g. your updated "git branch" may
forbid you to create a branch with the name of an existing tag, perhaps),
but after passing its check, the API to create a new ref will do mandatory
"check-ref-format" check.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] create_ref_entry(): move check_refname_format() call to callers
  2012-04-30 21:11         ` Junio C Hamano
@ 2012-05-02 15:38           ` Michael Haggerty
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Haggerty @ 2012-05-02 15:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

On 04/30/2012 11:11 PM, Junio C Hamano wrote:
> Michael Haggerty<mhagger@alum.mit.edu>  writes:
>
>> For example, have all of the following code paths been audited to make
>> sure that they cannot introduce class (3) refnames into a repository
>> (including via symbolic refs with class (3) targets) even in the face
>> of a malicious remote?  Can we (and do we want to) rely on this level
>> of vigilance being sustained in the future?
>
> Auditing is one thing, but perhaps the right solution to that issue is to
> refactor the existing code so that we have only a handful (preferrably
> one) API entry point that is used to create a new ref (not to be confused
> with create_ref_entry(), which is not necessarily about creating a ref)?

Yes, definitely.  And more broadly, I want refs.{c,h} to become the 
*only* mechanism for working with refs.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-05-02 15:45 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-29  6:18 [PATCH] create_ref_entry(): move check_refname_format() call to callers mhagger
2012-04-29 11:58 ` Jeff King
2012-04-30  6:15   ` Junio C Hamano
2012-04-30 16:18   ` Michael Haggerty
2012-04-30 17:14     ` Junio C Hamano
2012-04-30 20:29       ` Michael Haggerty
2012-04-30 21:11         ` Junio C Hamano
2012-05-02 15:38           ` Michael Haggerty

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.