All of lore.kernel.org
 help / color / mirror / Atom feed
* Alternates corruption issue
@ 2012-01-31 14:05 Richard Purdie
  2012-01-31 19:39 ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Purdie @ 2012-01-31 14:05 UTC (permalink / raw)
  To: GIT Mailing-list; +Cc: Hart, Darren, Ashfield, Bruce

I have a problem with git clone commands using alternates failing by
mixing up different repositories. I have a situation where I could end
up with both:

/srv/mirrors/repo
/srv/mirrors/repo.git

as bare clones.

I then try cloning "repo" with alternates with the command:

$ git clone -s -n /srv/mirrors/repo /tmp/foo
Cloning into /tmp/foo...
done.

$ cat /tmp/foo/.git/objects/info/alternates
/srv/mirrors/repo.git/objects

Note how I'm now referencing repo.git, not repo. This doesn't work as
expected giving some very bizarre results when actually using the
repository.

I appreciate this is a rather bizarre corner case but its one that is
breaking the build system I work with. Ideally people would use a
consistent URL for the same repository but we have an example where they
haven't and this really shouldn't break like this.

Looking at the code, the cause seems to be

clone.c:get_repo_path():

	static char *suffix[] = { "/.git", ".git", "" };

since its looking in order for:
 repo/.git (fails)
 repo.git (suceeds, incorrect)
 repo (never looked at)

I'm not sure what would break if that order were to change, swapping the
last two options.

I can "force" the issue by running:

git clone -s -n /srv/mirrors/repo/ /tmp/foo

but this results in the slightly odd looking:

$ cat /tmp/foo/.git/objects/info/alternates
/srv/mirrors/repo//objects

which does at least work.

Any idea if its possible to fix the root cause of this problem?

Cheers,

Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-01-31 14:05 Alternates corruption issue Richard Purdie
@ 2012-01-31 19:39 ` Jeff King
  2012-01-31 20:25   ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2012-01-31 19:39 UTC (permalink / raw)
  To: Richard Purdie; +Cc: GIT Mailing-list, Hart, Darren, Ashfield, Bruce

On Tue, Jan 31, 2012 at 02:05:29PM +0000, Richard Purdie wrote:

> I have a problem with git clone commands using alternates failing by
> mixing up different repositories. I have a situation where I could end
> up with both:
> 
> /srv/mirrors/repo
> /srv/mirrors/repo.git

I think the alternates thing is just a side effect. The real problem is
that clone is picking the wrong repo to clone from! So a simple complete
test would be something like:

    # make confusing bare repos
    git init --bare bare &&
    git init --bare bare.git &&

    # now make some commits; after this
    # segment, "bare" should have its tip
    # at commit "one", and "bare.git" should
    # be at "two".
    git init source &&
    (cd source &&
     echo content >file &&
     git add file &&
     git commit -m one &&
     git push ../bare HEAD &&
     echo content >>file &&
     git commit -a -m two &&
     git push ../bare.git HEAD
    ) &&

    # now try to clone bare and see which one we get;
    # we would expect the tip to be at "one", but it's not
    git clone bare clone &&
    cd clone &&
    echo one >expect &&
    git log -1 --format=%s >actual &&
    diff -u expect actual

    # we cloned the wrong repository!

> Looking at the code, the cause seems to be
> 
> clone.c:get_repo_path():
> 
> 	static char *suffix[] = { "/.git", ".git", "" };
> 
> since its looking in order for:
>  repo/.git (fails)
>  repo.git (suceeds, incorrect)
>  repo (never looked at)

Yeah, this seems brain-dead. It's good that git looks through a magic
list to guess what I meant, but shouldn't the first thing it tries be
the literal string that I gave, and only _then_ kick in the magic?

But even weirder, this doesn't match the lookup in other parts of git. I
see four distinct places in git where we do repository lookup:

  1. get_repo_path used by clone (i.e., this)

  2. enter_repo used by upload-pack, receive-pack, and git-daemon; this
     looks first for ".git/.git", to find a non-bare repo which is named
     like a bare one.

  3. setup_git_directory_*, used by most git programs. This starts
     looking in the current directory and going up, looking first for
     ".git", and then checking for a bare repo. So if you were to do
     "cd bare && git foo", then we would correctly find the current
     directory, not "bare.git".

  4. setup_explicit_git_dir, which gets called if you use --git-dir or
     set GIT_DIR. This expects to find the repo directly in the
     directory you specified, with no magic.

So I think (3) and (4) are sensible, but I tend to think that (1) and
(2) are wrong, and should prefer the directory you gave before the
magic.

> I'm not sure what would break if that order were to change, swapping the
> last two options.

I believe that would work in your case, but it seems like the most
correct thing would actually be:

  { "", "/.git", ".git" }

That is:

  1. Try the literal path the user gave as a repo

  2. Otherwise, try it as the root of a working tree (containing .git)

  3. Otherwise, assume they were too lazy to type ".git" and include it

If you simply swap the last two, then there is one obscure case you
would miss: a bare repo which contains a .git directory. Of course, that
seems like an insane thing to have, but if you did have it, I would
expect git to follow the rules I outlined above. That being said,
if you follow the rules in setup_git_directory_gently_1, it actually
prefers "/.git" to a bare repo. So probably your swap is better, as it
matches the lookup here with the regular setup_git_directory one.

One slight complication is that get_repo_path only checks whether the
paths exist, not whether they are actually git repositories. So by
simply swapping the order, you are breaking some other obscure cases,
like where a "foo" directory exists but is _not_ a git repo, and a
"foo.git" exists alongside it (which sounds a bit crazy, but I could see
somebody using that scheme if they wanted didn't want the git repository
to live inside the worktree for some reason).

One way of dealing with that would be to make get_repo_path a little
more robust by only selecting paths which actually look like git
directories.

The patch below does this, and makes my test above pass. But I think
before talking too much about code, it would be best to decide on what
the lookup order _should_ be, and then we can write tests and implement
to them.

-Peff

---
diff --git a/builtin/clone.c b/builtin/clone.c
index 9084feb..0fbbae9 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -108,7 +108,7 @@ static const char *argv_submodule[] = {
 
 static char *get_repo_path(const char *repo, int *is_bundle)
 {
-	static char *suffix[] = { "/.git", ".git", "" };
+	static char *suffix[] = { "/.git", "", ".git" };
 	static char *bundle_suffix[] = { ".bundle", "" };
 	struct stat st;
 	int i;
@@ -118,7 +118,7 @@ static char *get_repo_path(const char *repo, int *is_bundle)
 		path = mkpath("%s%s", repo, suffix[i]);
 		if (stat(path, &st))
 			continue;
-		if (S_ISDIR(st.st_mode)) {
+		if (S_ISDIR(st.st_mode) && is_git_directory(path)) {
 			*is_bundle = 0;
 			return xstrdup(absolute_path(path));
 		} else if (S_ISREG(st.st_mode) && st.st_size > 8) {
diff --git a/cache.h b/cache.h
index 9bd8c2d..7b2857f 100644
--- a/cache.h
+++ b/cache.h
@@ -436,6 +436,7 @@ extern char *get_object_directory(void);
 extern char *get_index_file(void);
 extern char *get_graft_file(void);
 extern int set_git_dir(const char *path);
+extern int is_git_directory(const char *path);
 extern const char *get_git_namespace(void);
 extern const char *strip_namespace(const char *namespaced_ref);
 extern const char *get_git_work_tree(void);
diff --git a/path.c b/path.c
index b6f71d1..1ca6567 100644
--- a/path.c
+++ b/path.c
@@ -293,7 +293,7 @@ const char *enter_repo(const char *path, int strict)
 
 	if (!strict) {
 		static const char *suffix[] = {
-			".git/.git", "/.git", ".git", "", NULL,
+			"/.git", "", ".git/.git", ".git", NULL,
 		};
 		const char *gitfile;
 		int len = strlen(path);
@@ -324,8 +324,11 @@ const char *enter_repo(const char *path, int strict)
 			return NULL;
 		len = strlen(used_path);
 		for (i = 0; suffix[i]; i++) {
+			struct stat st;
 			strcpy(used_path + len, suffix[i]);
-			if (!access(used_path, F_OK)) {
+			if (!stat(used_path, &st) &&
+			    (S_ISREG(st.st_mode) ||
+			     (S_ISDIR(st.st_mode) && is_git_directory(used_path)))) {
 				strcat(validated_path, suffix[i]);
 				break;
 			}
diff --git a/setup.c b/setup.c
index 61c22e6..7a3618f 100644
--- a/setup.c
+++ b/setup.c
@@ -247,7 +247,7 @@ const char **get_pathspec(const char *prefix, const char **pathspec)
  *    a proper "ref:", or a regular file HEAD that has a properly
  *    formatted sha1 object name.
  */
-static int is_git_directory(const char *suspect)
+int is_git_directory(const char *suspect)
 {
 	char path[PATH_MAX];
 	size_t len = strlen(suspect);

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-01-31 19:39 ` Jeff King
@ 2012-01-31 20:25   ` Junio C Hamano
  2012-01-31 20:44     ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2012-01-31 20:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Richard Purdie, GIT Mailing-list, Hart, Darren, Ashfield, Bruce

Jeff King <peff@peff.net> writes:

> I believe that would work in your case, but it seems like the most
> correct thing would actually be:
>
>   { "", "/.git", ".git" }
>
> That is:
>
>   1. Try the literal path the user gave as a repo
>
>   2. Otherwise, try it as the root of a working tree (containing .git)
>
>   3. Otherwise, assume they were too lazy to type ".git" and include it

That sounds sensible, together with this, to which I agree with:

> One way of dealing with that would be to make get_repo_path a little
> more robust by only selecting paths which actually look like git
> directories.

... but ...

> diff --git a/builtin/clone.c b/builtin/clone.c
> index 9084feb..0fbbae9 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -108,7 +108,7 @@ static const char *argv_submodule[] = {
>  
>  static char *get_repo_path(const char *repo, int *is_bundle)
>  {
> -	static char *suffix[] = { "/.git", ".git", "" };
> +	static char *suffix[] = { "/.git", "", ".git" };

... this does not match that simple and clear guideline.

Shouldn't this simply be { "", "/.git", ".git" }?

> diff --git a/path.c b/path.c
> index b6f71d1..1ca6567 100644
> --- a/path.c
> +++ b/path.c
> @@ -293,7 +293,7 @@ const char *enter_repo(const char *path, int strict)
>  
>  	if (!strict) {
>  		static const char *suffix[] = {
> -			".git/.git", "/.git", ".git", "", NULL,
> +			"/.git", "", ".git/.git", ".git", NULL,
>  		};

Neither does this.

Shouldn't this be { "", "/.git", ".git", ".git/.git", NULL }?

I must be missing something from your description...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-01-31 20:25   ` Junio C Hamano
@ 2012-01-31 20:44     ` Jeff King
  2012-01-31 21:40       ` Jonathan Nieder
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2012-01-31 20:44 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Richard Purdie, GIT Mailing-list, Hart, Darren, Ashfield, Bruce

On Tue, Jan 31, 2012 at 12:25:45PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I believe that would work in your case, but it seems like the most
> > correct thing would actually be:
> >
> >   { "", "/.git", ".git" }
> >
> > That is:
> >
> >   1. Try the literal path the user gave as a repo
> >
> >   2. Otherwise, try it as the root of a working tree (containing .git)
> >
> >   3. Otherwise, assume they were too lazy to type ".git" and include it
> 
> That sounds sensible, together with this, to which I agree with:
> [...]
> ... but ...
> [...]
> > -	static char *suffix[] = { "/.git", ".git", "" };
> > +	static char *suffix[] = { "/.git", "", ".git" };
> 
> ... this does not match that simple and clear guideline.
> 
> Shouldn't this simply be { "", "/.git", ".git" }?

No, it does not match. While the sequence I outlined above makes the
most sense to me, it does not match what setup_git_directory does, which
prefers "foo/.git" to using "foo" as a bare repo. I think being
consistent between all of the lookup points makes sense. The patch took
the least-invasive approach and aligned clone and enter_repo with
setup_git_directory.

However, we could also tweak setup_git_directory to prefer bare repos
over ".git" to keep things consistent. While it makes me feel good from
a theoretical standpoint (because the rules above seem simple and
intuitive to me), I'm not sure it's a good idea in practice.

The case we would "catch" with such a change is when you have a ".git"
directory inside a bare repo. Right now, we prefer the ".git" inside it,
and ignore the containing bare repo. With such a change, we would prefer
the outer bare repo. Which makes sense to me. It does break a use
case like this:

  # make a new repo
  git init
  cd .git
  # now track parts of the repo
  git init
  git add config
  git commit -m 'tracking our repo config'

but I'm not sure if that is sane or not.

But also consider false positives. What if you have a repository that
looks like a git repo (i.e., has "objects", "refs", and "HEAD" in it),
but also has ".git". Right now we say "Oh, it has .git, that must be the
repo". But with the proposed change, we could accidentally find the
enclosing repo.

Now, the chances of is_git_directory being wrong seem quite slim to me.
But then, the chances of somebody actually have a repo with a ".git"
_inside_ it seem pretty slim to me. So I think we are really dealing
with a tiny corner case, and it is perhaps anybody's guess whether
anyone is depending on the current behavior in the wild. I don't overly
care either way, but when in doubt, I tend to stick with the existing
behavior.

> >  	if (!strict) {
> >  		static const char *suffix[] = {
> > -			".git/.git", "/.git", ".git", "", NULL,
> > +			"/.git", "", ".git/.git", ".git", NULL,
> >  		};
> 
> Neither does this.
> 
> Shouldn't this be { "", "/.git", ".git", ".git/.git", NULL }?

Right, same case.

> I must be missing something from your description...

I mentioned the issue in my original message, but perhaps didn't
emphasize it very well.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-01-31 20:44     ` Jeff King
@ 2012-01-31 21:40       ` Jonathan Nieder
  2012-01-31 21:47         ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Nieder @ 2012-01-31 21:40 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Richard Purdie, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

Jeff King wrote:

> No, it does not match. While the sequence I outlined above makes the
> most sense to me, it does not match what setup_git_directory does, which
> prefers "foo/.git" to using "foo" as a bare repo. I think being
> consistent between all of the lookup points makes sense. The patch took
> the least-invasive approach and aligned clone and enter_repo with
> setup_git_directory.
>
> However, we could also tweak setup_git_directory to prefer bare repos
> over ".git" to keep things consistent. While it makes me feel good from
> a theoretical standpoint (because the rules above seem simple and
> intuitive to me), I'm not sure it's a good idea in practice.

Wait, don't these two functions serve two completely different purposes?

One is the implementation of (A):

	cd foo
	git rev-parse --git-dir

The other implements (B):

	git ls-remote foo

If "foo" is actually a bare repository that moonlights as a worktree for
a non-bare repository, then:

 1) Whoever set up these directories is completely insane[*].  Maybe we
    should emit a warning.

 2) As a naive user, I would expect (A) to give a different result
    from (B).

Hope that helps,
Jonathan

[*] ok, ok, they can be confused instead of insane:
http://bugs.debian.org/399041

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-01-31 21:40       ` Jonathan Nieder
@ 2012-01-31 21:47         ` Jeff King
  2012-01-31 21:55           ` Jonathan Nieder
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2012-01-31 21:47 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Junio C Hamano, Richard Purdie, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

On Tue, Jan 31, 2012 at 03:40:47PM -0600, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > No, it does not match. While the sequence I outlined above makes the
> > most sense to me, it does not match what setup_git_directory does, which
> > prefers "foo/.git" to using "foo" as a bare repo. I think being
> > consistent between all of the lookup points makes sense. The patch took
> > the least-invasive approach and aligned clone and enter_repo with
> > setup_git_directory.
> >
> > However, we could also tweak setup_git_directory to prefer bare repos
> > over ".git" to keep things consistent. While it makes me feel good from
> > a theoretical standpoint (because the rules above seem simple and
> > intuitive to me), I'm not sure it's a good idea in practice.
> 
> Wait, don't these two functions serve two completely different purposes?

Yes, but I would expect the lookup to be the same.

> One is the implementation of (A):
> 
> 	cd foo
> 	git rev-parse --git-dir
> 
> The other implements (B):
> 
> 	git ls-remote foo

Right. But would you expect:

  git ls-remote foo

to behave the same as:

  cd foo
  git ls-remote .

and would you furthermore expect that to behave the same as:

  cd foo
  git rev-parse --git-dir

?

Maybe I am crazy, but I see all of them as ways of specifying "I am
interested in repository foo", and any lookup magic should be the same
in all cases.

> If "foo" is actually a bare repository that moonlights as a worktree for
> a non-bare repository, then:
> 
>  1) Whoever set up these directories is completely insane[*].  Maybe we
>     should emit a warning.

Yes, I think we are dealing with an insane corner case.

>  2) As a naive user, I would expect (A) to give a different result
>     from (B).

Why?

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-01-31 21:47         ` Jeff King
@ 2012-01-31 21:55           ` Jonathan Nieder
  2012-01-31 22:05             ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Nieder @ 2012-01-31 21:55 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Richard Purdie, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

Jeff King wrote:

> Right. But would you expect:
>
>   git ls-remote foo
>
> to behave the same as:
>
>   cd foo
>   git ls-remote .
>
> and would you furthermore expect that to behave the same as:
>
>   cd foo
>   git rev-parse --git-dir
>
> ?

Nah.  I never run "git ls-remote .". ;-)

[...]
>>  2) As a naive user, I would expect (A) to give a different result
>>     from (B).
>
> Why?

Normally git commands expect me to be in (possibly a deeply nested
subdirectory) of the worktree.  The typical case is a non-bare
repository.  I have been taught that it walks to the toplevel and
finds a ".git" directory.

By contrast, the path passed to git transport commands like "git fetch
otherhost:/foo/bar/baz.git" is a path to the git repository metadata.
I am not allowed to pass a path to a subdirectory, for example.

As a user, when would the distinction ever come up?  One is a fuzzy
"which repository am I working in", and the other is a precise "point
me to the metadata directory, but I allow you to abbreviate a little
by leaving out the trailing .git".

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-01-31 21:55           ` Jonathan Nieder
@ 2012-01-31 22:05             ` Jeff King
  2012-01-31 22:22               ` Jonathan Nieder
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2012-01-31 22:05 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Junio C Hamano, Richard Purdie, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

On Tue, Jan 31, 2012 at 03:55:01PM -0600, Jonathan Nieder wrote:

> >>  2) As a naive user, I would expect (A) to give a different result
> >>     from (B).
> >
> > Why?
> 
> Normally git commands expect me to be in (possibly a deeply nested
> subdirectory) of the worktree.  The typical case is a non-bare
> repository.  I have been taught that it walks to the toplevel and
> finds a ".git" directory.
> 
> By contrast, the path passed to git transport commands like "git fetch
> otherhost:/foo/bar/baz.git" is a path to the git repository metadata.
> I am not allowed to pass a path to a subdirectory, for example.

True. But I consider that to make the walk-backwards-from-pwd case
simply a superset. That is, in (A) we are walking backwards and trying
to apply the lookup rule from (B) individually to each directory we
consider (though even that is not entirely true, as we don't look for
parallel "$PWD.git" directories in the walk).

I'll admit I don't care that much, though. This is extremely unlikely to
come up. The real issue is fixing the fact that we prefer "foo.git" to
"foo", even when the user told us "foo". I am content to leave the rest
of it as-is, which is what my original patch did.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-01-31 22:05             ` Jeff King
@ 2012-01-31 22:22               ` Jonathan Nieder
  2012-01-31 22:42                 ` Jeff King
  2012-02-02 21:59                 ` Jeff King
  0 siblings, 2 replies; 19+ messages in thread
From: Jonathan Nieder @ 2012-01-31 22:22 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Richard Purdie, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

Jeff King wrote:

> True. But I consider that to make the walk-backwards-from-pwd case
> simply a superset. That is, in (A) we are walking backwards and trying
> to apply the lookup rule from (B) individually to each directory we
> consider (though even that is not entirely true, as we don't look for
> parallel "$PWD.git" directories in the walk).

That parenthesis is important.  Although in a nicer world maybe we
would want some symmetry like this, (A) and (B) really have nothing to
do with each other.

Forgetting the algorithms:

 (A) means "find what repository we are in the worktree for, or,
 barring that, what bare repository we are in".

 (B) means "find which repository the user pointed to.  To be extra
 nice, we allow a '.git' extension to be left out, so the URL used
 doesn't have to include the redundant information that this is a
 git repository, and we even allow pointing to the toplevel of a
 worktree instead of a repository, too."

Notice that the above description does not exactly match the actual
behavior of git.  For example, if someone has a directory layout like
this:

  repo-manipulator/
     .git/
     src/
     testcases/
       repo1.git/
       repo2.git/

and git is run from repo1.git, according to the description above, the
naive user _wanted_ git commands to apply to the toplevel repository.
And in practice, I think that's often true, though changing the
behavior of git to match that would not be worth the downsides.

> I'll admit I don't care that much, though. This is extremely unlikely to
> come up.

I admit part of the reason I care is that just putting "" first would
probably taken care of the more important part of
<http://bugs.debian.org/399041>.

Anyway, thanks for explaining.  Hopefully I can get to this soon and
factor out a common function for get_repo_path and enter_repo to call
so playing with the ordering becomes a little less scary. ;-)

Jonathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-01-31 22:22               ` Jonathan Nieder
@ 2012-01-31 22:42                 ` Jeff King
  2012-01-31 22:59                   ` Jonathan Nieder
  2012-02-02 21:59                 ` Jeff King
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2012-01-31 22:42 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Junio C Hamano, Richard Purdie, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

On Tue, Jan 31, 2012 at 04:22:58PM -0600, Jonathan Nieder wrote:

> Forgetting the algorithms:
> 
>  (A) means "find what repository we are in the worktree for, or,
>  barring that, what bare repository we are in".
> 
>  (B) means "find which repository the user pointed to.  To be extra
>  nice, we allow a '.git' extension to be left out, so the URL used
>  doesn't have to include the redundant information that this is a
>  git repository, and we even allow pointing to the toplevel of a
>  worktree instead of a repository, too."

I can see that as a plausible interpretation. But I think your "means"
is kind of vague.  In other words, we are assigning a meaning that is
not actually documented anywhere, and which we are hoping matches user
expectations. So the correct behavior kind of depends on your mental
model. IOW, I find your mental model plausible, but it is not the one
that I had.

But like I said, I'm OK either way. I'll prepare this fix with the least
behavior change possible. Then doing your change on top should be a
trivial rearrangement of the lookup arrays (the "hard" part for your
proposed change is adding in the is_git_directory() calls[1], but I'll
do that as part of mine).

> I admit part of the reason I care is that just putting "" first would
> probably taken care of the more important part of
> <http://bugs.debian.org/399041>.

Would that fix it? If I understand it, the repo in question is bare with
a ".git" directory inside it. Your proposed change would make "git
ls-remote foo" work, but wouldn't "cd foo && git log" still be broken?
At the very least, it would show a completely different repo than
ls-remote.

-Peff

[1] Without the is_git_directory() check, the ordering you propose
    breaks things very badly. Because if I point to a non-bare directory
    "foo", we say "OK, foo exists" and never look at "foo/.git". So
    instead, we have to say "OK, foo exists, but hey, it looks nothing
    like a .git directory" and keep trying.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-01-31 22:42                 ` Jeff King
@ 2012-01-31 22:59                   ` Jonathan Nieder
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Nieder @ 2012-01-31 22:59 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Richard Purdie, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

Jeff King wrote:
> On Tue, Jan 31, 2012 at 04:22:58PM -0600, Jonathan Nieder wrote:

>> I admit part of the reason I care is that just putting "" first would
>> probably taken care of the more important part of
>> <http://bugs.debian.org/399041>.
>
> Would that fix it? If I understand it, the repo in question is bare with
> a ".git" directory inside it.

The layout was foo/.git/.git.  The real repository was made with plain
"git init" or "git clone", and then "git svn init" or similar was run
from within .git, creating a .git subdirectory.  (Something more must
have happened, actually, since the tree was weirdly sparse:

	$ find .git/.git
	.git/.git
	.git/.git/svn
	.git/.git/svn/git-svn
	.git/.git/svn/git-svn/.rev_db

)

In the current scheme, to access such a broken repository remotely,
you have to use a URL ".../foo".  URLs pointing to foo/.git (like gitk
once used) end up rewritten as .git/.git.  Of course, your patch with
the is_git_directory() will fix this specific case. :)

Not a huge deal, especially since "git svn init" prints out where it
is writing these days.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-01-31 22:22               ` Jonathan Nieder
  2012-01-31 22:42                 ` Jeff King
@ 2012-02-02 21:59                 ` Jeff King
  2012-02-03  0:47                   ` Junio C Hamano
  2012-02-03 14:40                   ` Richard Purdie
  1 sibling, 2 replies; 19+ messages in thread
From: Jeff King @ 2012-02-02 21:59 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Junio C Hamano, Richard Purdie, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

On Tue, Jan 31, 2012 at 04:22:58PM -0600, Jonathan Nieder wrote:

> Anyway, thanks for explaining.  Hopefully I can get to this soon and
> factor out a common function for get_repo_path and enter_repo to call
> so playing with the ordering becomes a little less scary. ;-)

So here's what I think we should apply to fix the particular issue that
Richard mentioned at the start of this thread.

Besides tweaking the ordering, the main contribution is a set of tests
that actually check some of these ambiguous cases (especially checking
the fact that both code paths behave identically!). I didn't factor the
logic into a common function, but doing so should be a little safer on
top of these tests, if you're still interested.

-- >8 --
Subject: standardize and improve lookup rules for external local repos

When you specify a local repository on the command line of
clone, ls-remote, upload-pack, receive-pack, or upload-archive,
or in a request to git-daemon, we perform a little bit of
lookup magic, doing things like looking in working trees for
.git directories and appending ".git" for bare repos.

For clone, this magic happens in get_repo_path. For
everything else, it happens in enter_repo. In both cases,
there are some ambiguous or confusing cases that aren't
handled well, and there is one case that is not handled the
same by both methods.

This patch tries to provide (and test!) standard, sensible
lookup rules for both code paths. The intended changes are:

  1. When looking up "foo", we have always preferred
     a working tree "foo" (containing "foo/.git" over the
     bare "foo.git". But we did not prefer a bare "foo" over
     "foo.git". With this patch, we do so.

  2. We would select directories that existed but didn't
     actually look like git repositories. With this patch,
     we make sure a selected directory looks like a git
     repo. Not only is this more sensible in general, but it
     will help anybody who is negatively affected by change
     (1) negatively (e.g., if they had "foo.git" next to its
     separate work tree "foo", and expect to keep finding
     "foo.git" when they reference "foo").

  3. The enter_repo code path would, given "foo", look for
     "foo.git/.git" (i.e., do the ".git" append magic even
     for a repo with working tree). The clone code path did
     not; with this patch, they now behave the same.

In the unlikely case of a working tree overlaying a bare
repo (i.e., a ".git" directory _inside_ a bare repo), we
continue to treat it as a working tree (prefering the
"inner" .git over the bare repo). This is mainly because the
combination seems nonsensical, and I'd rather stick with
existing behavior on the off chance that somebody is relying
on it.

Signed-off-by: Jeff King <peff@peff.net>
---
I could go either way with the "prefer foo to foo/.git" thing we've been
discussing. I left it as-is here out of a sense of being conservative.
But I have no objection at all to you doing a patch on top that
justifies the change. You'll just need to tweak the "check" line of the
final test.

 builtin/clone.c           |    4 +-
 cache.h                   |    1 +
 path.c                    |    7 ++-
 setup.c                   |    2 +-
 t/t5900-repo-selection.sh |  100 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 109 insertions(+), 5 deletions(-)
 create mode 100755 t/t5900-repo-selection.sh

diff --git a/builtin/clone.c b/builtin/clone.c
index c62d4b5..637611a 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -107,7 +107,7 @@ static const char *argv_submodule[] = {
 
 static char *get_repo_path(const char *repo, int *is_bundle)
 {
-	static char *suffix[] = { "/.git", ".git", "" };
+	static char *suffix[] = { "/.git", "", ".git/.git", ".git" };
 	static char *bundle_suffix[] = { ".bundle", "" };
 	struct stat st;
 	int i;
@@ -117,7 +117,7 @@ static char *get_repo_path(const char *repo, int *is_bundle)
 		path = mkpath("%s%s", repo, suffix[i]);
 		if (stat(path, &st))
 			continue;
-		if (S_ISDIR(st.st_mode)) {
+		if (S_ISDIR(st.st_mode) && is_git_directory(path)) {
 			*is_bundle = 0;
 			return xstrdup(absolute_path(path));
 		} else if (S_ISREG(st.st_mode) && st.st_size > 8) {
diff --git a/cache.h b/cache.h
index 9bd8c2d..ae0396b 100644
--- a/cache.h
+++ b/cache.h
@@ -432,6 +432,7 @@ extern char *git_work_tree_cfg;
 extern int is_inside_work_tree(void);
 extern int have_git_dir(void);
 extern const char *get_git_dir(void);
+extern int is_git_directory(const char *path);
 extern char *get_object_directory(void);
 extern char *get_index_file(void);
 extern char *get_graft_file(void);
diff --git a/path.c b/path.c
index b6f71d1..6f2aa69 100644
--- a/path.c
+++ b/path.c
@@ -293,7 +293,7 @@ const char *enter_repo(const char *path, int strict)
 
 	if (!strict) {
 		static const char *suffix[] = {
-			".git/.git", "/.git", ".git", "", NULL,
+			"/.git", "", ".git/.git", ".git", NULL,
 		};
 		const char *gitfile;
 		int len = strlen(path);
@@ -324,8 +324,11 @@ const char *enter_repo(const char *path, int strict)
 			return NULL;
 		len = strlen(used_path);
 		for (i = 0; suffix[i]; i++) {
+			struct stat st;
 			strcpy(used_path + len, suffix[i]);
-			if (!access(used_path, F_OK)) {
+			if (!stat(used_path, &st) &&
+			    (S_ISREG(st.st_mode) ||
+			    (S_ISDIR(st.st_mode) && is_git_directory(used_path)))) {
 				strcat(validated_path, suffix[i]);
 				break;
 			}
diff --git a/setup.c b/setup.c
index 61c22e6..7a3618f 100644
--- a/setup.c
+++ b/setup.c
@@ -247,7 +247,7 @@ const char **get_pathspec(const char *prefix, const char **pathspec)
  *    a proper "ref:", or a regular file HEAD that has a properly
  *    formatted sha1 object name.
  */
-static int is_git_directory(const char *suspect)
+int is_git_directory(const char *suspect)
 {
 	char path[PATH_MAX];
 	size_t len = strlen(suspect);
diff --git a/t/t5900-repo-selection.sh b/t/t5900-repo-selection.sh
new file mode 100755
index 0000000..3d5b418
--- /dev/null
+++ b/t/t5900-repo-selection.sh
@@ -0,0 +1,100 @@
+#!/bin/sh
+
+test_description='selecting remote repo in ambiguous cases'
+. ./test-lib.sh
+
+reset() {
+	rm -rf foo foo.git fetch clone
+}
+
+make_tree() {
+	git init "$1" &&
+	(cd "$1" && test_commit "$1")
+}
+
+make_bare() {
+	git init --bare "$1" &&
+	(cd "$1" &&
+	 tree=`git hash-object -w -t tree /dev/null` &&
+	 commit=$(echo "$1" | git commit-tree $tree) &&
+	 git update-ref HEAD $commit
+	)
+}
+
+get() {
+	git init --bare fetch &&
+	(cd fetch && git fetch "../$1") &&
+	git clone "$1" clone
+}
+
+check() {
+	echo "$1" >expect &&
+	(cd fetch && git log -1 --format=%s FETCH_HEAD) >actual.fetch &&
+	(cd clone && git log -1 --format=%s HEAD) >actual.clone &&
+	test_cmp expect actual.fetch &&
+	test_cmp expect actual.clone
+}
+
+test_expect_success 'find .git dir in worktree' '
+	reset &&
+	make_tree foo &&
+	get foo &&
+	check foo
+'
+
+test_expect_success 'automagically add .git suffix' '
+	reset &&
+	make_bare foo.git &&
+	get foo &&
+	check foo.git
+'
+
+test_expect_success 'automagically add .git suffix to worktree' '
+	reset &&
+	make_tree foo.git &&
+	get foo &&
+	check foo.git
+'
+
+test_expect_success 'prefer worktree foo over bare foo.git' '
+	reset &&
+	make_tree foo &&
+	make_bare foo.git &&
+	get foo &&
+	check foo
+'
+
+test_expect_success 'prefer bare foo over bare foo.git' '
+	reset &&
+	make_bare foo &&
+	make_bare foo.git &&
+	get foo &&
+	check foo
+'
+
+test_expect_success 'disambiguate with full foo.git' '
+	reset &&
+	make_bare foo &&
+	make_bare foo.git &&
+	get foo.git &&
+	check foo.git
+'
+
+test_expect_success 'we are not fooled by non-git foo directory' '
+	reset &&
+	make_bare foo.git &&
+	mkdir foo &&
+	get foo &&
+	check foo.git
+'
+
+test_expect_success 'prefer inner .git over outer bare' '
+	reset &&
+	make_tree foo &&
+	make_bare foo.git &&
+	mv foo/.git foo.git &&
+	get foo.git &&
+	check foo
+'
+
+test_done
-- 
1.7.9.rc1.28.gf4be5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-02-02 21:59                 ` Jeff King
@ 2012-02-03  0:47                   ` Junio C Hamano
  2012-02-03 12:02                     ` Jeff King
  2012-02-03 14:40                   ` Richard Purdie
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2012-02-03  0:47 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Richard Purdie, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

Jeff King <peff@peff.net> writes:

> @@ -324,8 +324,11 @@ const char *enter_repo(const char *path, int strict)
>  			return NULL;
>  		len = strlen(used_path);
>  		for (i = 0; suffix[i]; i++) {
> +			struct stat st;
>  			strcpy(used_path + len, suffix[i]);
> -			if (!access(used_path, F_OK)) {
> +			if (!stat(used_path, &st) &&
> +			    (S_ISREG(st.st_mode) ||
> +			    (S_ISDIR(st.st_mode) && is_git_directory(used_path)))) {

Hmm, how would this change interact with

>  				strcat(validated_path, suffix[i]);
>  				break;
>  			}

	gitfile = read_gitfile(used_path);

that appear after the context in the patch?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-02-03  0:47                   ` Junio C Hamano
@ 2012-02-03 12:02                     ` Jeff King
  2012-02-03 17:38                       ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2012-02-03 12:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, Richard Purdie, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

On Thu, Feb 02, 2012 at 04:47:31PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > @@ -324,8 +324,11 @@ const char *enter_repo(const char *path, int strict)
> >  			return NULL;
> >  		len = strlen(used_path);
> >  		for (i = 0; suffix[i]; i++) {
> > +			struct stat st;
> >  			strcpy(used_path + len, suffix[i]);
> > -			if (!access(used_path, F_OK)) {
> > +			if (!stat(used_path, &st) &&
> > +			    (S_ISREG(st.st_mode) ||
> > +			    (S_ISDIR(st.st_mode) && is_git_directory(used_path)))) {
> 
> Hmm, how would this change interact with
> 
> >  				strcat(validated_path, suffix[i]);
> >  				break;
> >  			}
> 
> 	gitfile = read_gitfile(used_path);
> 
> that appear after the context in the patch?

It assumes that any file named ".git" is worth reading and selecting.
And then later we actually read_gitfile to find out if it's worth-while.
There is no change of behavior from before the patch, as we would
similarly notice the file (without checking if it's a real gitfile) and
then later read it and possibly fail.

However, with the ordering change, there is a technically a regression
in one case: a random file "foo" next to a repo "foo.git". Saying "git
ls-remote foo" used to prefer "foo.git", and will now select the file
"foo" only to fail.

The code-path in clone's get_repo_path handles this properly (it checks
that the path is really a valid gitfile before finishing the loop). The
gitfile-reading from later in enter_repo could be hoisted into the loop.
If was trying to make a less-invasive change; if we're going to do that
much rewriting, it probably makes sense to factor out the logic from
get_repo_path and have them share the code.

Thanks for noticing. I saw this issue when I was writing the original
version of the patch, and meant to revisit it and at least document it
in the commit message, but I ended up forgetting.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-02-02 21:59                 ` Jeff King
  2012-02-03  0:47                   ` Junio C Hamano
@ 2012-02-03 14:40                   ` Richard Purdie
  1 sibling, 0 replies; 19+ messages in thread
From: Richard Purdie @ 2012-02-03 14:40 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Junio C Hamano, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

On Thu, 2012-02-02 at 16:59 -0500, Jeff King wrote:
> On Tue, Jan 31, 2012 at 04:22:58PM -0600, Jonathan Nieder wrote:
> 
> > Anyway, thanks for explaining.  Hopefully I can get to this soon and
> > factor out a common function for get_repo_path and enter_repo to call
> > so playing with the ordering becomes a little less scary. ;-)
> 
> So here's what I think we should apply to fix the particular issue that
> Richard mentioned at the start of this thread.
> 
> Besides tweaking the ordering, the main contribution is a set of tests
> that actually check some of these ambiguous cases (especially checking
> the fact that both code paths behave identically!). I didn't factor the
> logic into a common function, but doing so should be a little safer on
> top of these tests, if you're still interested.

I didn't have much to add to the discussion yesterday but this solution
looks good to me and should resolve the problems I was seeing.

Thanks!

Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-02-03 12:02                     ` Jeff King
@ 2012-02-03 17:38                       ` Junio C Hamano
  2012-02-03 21:29                         ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2012-02-03 17:38 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Richard Purdie, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

Jeff King <peff@peff.net> writes:

> However, with the ordering change, there is a technically a regression
> in one case: a random file "foo" next to a repo "foo.git". Saying "git
> ls-remote foo" used to prefer "foo.git", and will now select the file
> "foo" only to fail.

Yeah, very true X-<.

> Thanks for noticing. I saw this issue when I was writing the original
> version of the patch, and meant to revisit it and at least document it
> in the commit message, but I ended up forgetting.

No, thanks for working on this.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-02-03 17:38                       ` Junio C Hamano
@ 2012-02-03 21:29                         ` Jeff King
  2012-02-03 21:51                           ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2012-02-03 21:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, Richard Purdie, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

On Fri, Feb 03, 2012 at 09:38:13AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > However, with the ordering change, there is a technically a regression
> > in one case: a random file "foo" next to a repo "foo.git". Saying "git
> > ls-remote foo" used to prefer "foo.git", and will now select the file
> > "foo" only to fail.
> 
> Yeah, very true X-<.
> 
> > Thanks for noticing. I saw this issue when I was writing the original
> > version of the patch, and meant to revisit it and at least document it
> > in the commit message, but I ended up forgetting.
> 
> No, thanks for working on this.

What do you want to do with it, then? Take my patch and ignore the
gitfile issue, or have me refactor it more heavily? I could go either
way.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-02-03 21:29                         ` Jeff King
@ 2012-02-03 21:51                           ` Junio C Hamano
  2012-02-03 21:53                             ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2012-02-03 21:51 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Richard Purdie, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

Jeff King <peff@peff.net> writes:

>> No, thanks for working on this.
>
> What do you want to do with it, then? Take my patch and ignore the
> gitfile issue, or have me refactor it more heavily? I could go either
> way.

How about queuing it as is in next and see if anybody screams loudly
enough so that we can fix-up if necessary in-tree?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Alternates corruption issue
  2012-02-03 21:51                           ` Junio C Hamano
@ 2012-02-03 21:53                             ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2012-02-03 21:53 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, Richard Purdie, GIT Mailing-list, Hart, Darren,
	Ashfield, Bruce

On Fri, Feb 03, 2012 at 01:51:33PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> No, thanks for working on this.
> >
> > What do you want to do with it, then? Take my patch and ignore the
> > gitfile issue, or have me refactor it more heavily? I could go either
> > way.
> 
> How about queuing it as is in next and see if anybody screams loudly
> enough so that we can fix-up if necessary in-tree?

Sounds good to me. Jonathan might be interested in building on top of
it, too.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-02-03 21:53 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-31 14:05 Alternates corruption issue Richard Purdie
2012-01-31 19:39 ` Jeff King
2012-01-31 20:25   ` Junio C Hamano
2012-01-31 20:44     ` Jeff King
2012-01-31 21:40       ` Jonathan Nieder
2012-01-31 21:47         ` Jeff King
2012-01-31 21:55           ` Jonathan Nieder
2012-01-31 22:05             ` Jeff King
2012-01-31 22:22               ` Jonathan Nieder
2012-01-31 22:42                 ` Jeff King
2012-01-31 22:59                   ` Jonathan Nieder
2012-02-02 21:59                 ` Jeff King
2012-02-03  0:47                   ` Junio C Hamano
2012-02-03 12:02                     ` Jeff King
2012-02-03 17:38                       ` Junio C Hamano
2012-02-03 21:29                         ` Jeff King
2012-02-03 21:51                           ` Junio C Hamano
2012-02-03 21:53                             ` Jeff King
2012-02-03 14:40                   ` Richard Purdie

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.