* RFC: git cat-file --follow-symlinks?
@ 2015-04-29 20:57 David Turner
  2015-04-29 21:16 ` Jonathan Nieder
                   ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: David Turner @ 2015-04-29 20:57 UTC (permalink / raw)
  To: git mailing list

I recently had a situation where I was using git cat-file (--batch) to
read files and directories out of the repository -- basically, to do the
equivalent of open, opendir, etc, on an arbitrary revision.
Unfortunately, I had to do a lot of gymnastics to handle symlinks in the
repository.  Instead of just doing echo $SHA:foo/bar/baz | git cat-file
--batch, I would have to first check if foo was a symlink, and if so,
follow it, and then check bar, and so on.

Instead, it would be cool if cat-file had a mode in which it would
follow symlinks.

The major wrinkle is that symlinks can point outside the repository --
either because they are absolute paths, or because they are relative
paths with enough ../ in them.  For this case, I propose that
--follow-symlinks should output [sha] "symlink" [target] instead of the
usual [sha] "blob" [bytes].  Since --follow-symlinks is new, this format
change will not break any existing code.

(I also propose that we use Linux's limit of 40 symlinks by default, but
--follow-symlinks could also have =max_links_to_follow to adjust this if
anyone cares)

Do people think this is reasonable?  If so, I'll see if I can get some
time to work on it this month.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-29 20:57 RFC: git cat-file --follow-symlinks? David Turner
@ 2015-04-29 21:16 ` Jonathan Nieder
  2015-04-29 21:24   ` David Turner
  2015-04-29 21:17 ` Junio C Hamano
  2015-04-30  8:10 ` Michael Haggerty
  2 siblings, 1 reply; 43+ messages in thread
From: Jonathan Nieder @ 2015-04-29 21:16 UTC (permalink / raw)
  To: David Turner; +Cc: git mailing list

Hi,

David Turner wrote:

> Instead, it would be cool if cat-file had a mode in which it would
> follow symlinks.

Makes sense.

> The major wrinkle is that symlinks can point outside the repository --
> either because they are absolute paths, or because they are relative
> paths with enough ../ in them.  For this case, I propose that
> --follow-symlinks should output [sha] "symlink" [target] instead of the
> usual [sha] "blob" [bytes].

What happens when the symlink payload contains a newline?

Thanks,
Jonathan


* Re: RFC: git cat-file --follow-symlinks?
  2015-04-29 20:57 RFC: git cat-file --follow-symlinks? David Turner
  2015-04-29 21:16 ` Jonathan Nieder
@ 2015-04-29 21:17 ` Junio C Hamano
  2015-04-29 21:30   ` David Turner
  2015-04-30  8:10 ` Michael Haggerty
  2 siblings, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2015-04-29 21:17 UTC (permalink / raw)
  To: David Turner; +Cc: git mailing list

David Turner <dturner@twopensource.com> writes:

> Do people think this is reasonable?

I personally don't, exactly because we track the contents of the
symlink itself, not the referent.  Your "major wrinkle" that they
can point outside the repository is a mere manifestation of that.

The format specifiers the --batch option takes do not exactly give
you what the in-tree type of the thing is, to allow the receiving
end that parses the tagline (which it needs to do anyway in order to
find out where the current record ends) act on it.  %(objecttype)
would just say "blob" and you cannot tell if it is a plain file,
executable or a symbolic link.

Perhaps an ideal interface might be something like this:

    $ echo HEAD:RelNotes |
      git cat-file --batch='%(objecttype) %(intreemode) %(objectsize)'
    blob 120000 32
    Documentation/RelNotes/2.4.0.txt

I suspect it would be just the matter of teaching "cat-file --batch"
to read from get_sha1_with_context() in batch_one_object(), instead
of reading from get_sha1() which it currently does.

And that interface I think I can live with.


* Re: RFC: git cat-file --follow-symlinks?
  2015-04-29 21:16 ` Jonathan Nieder
@ 2015-04-29 21:24   ` David Turner
  0 siblings, 0 replies; 43+ messages in thread
From: David Turner @ 2015-04-29 21:24 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git mailing list

On Wed, 2015-04-29 at 14:16 -0700, Jonathan Nieder wrote:
> Hi,
> 
> David Turner wrote:
> 
> > Instead, it would be cool if cat-file had a mode in which it would
> > follow symlinks.
> 
> Makes sense.
> 
> > The major wrinkle is that symlinks can point outside the repository --
> > either because they are absolute paths, or because they are relative
> > paths with enough ../ in them.  For this case, I propose that
> > --follow-symlinks should output [sha] "symlink" [target] instead of the
> > usual [sha] "blob" [bytes].
> 
> What happens when the symlink payload contains a newline?

Oh, right.
So, how about [sha] "symlink" [bytes] "\n" [target] instead?
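
A size-prefixed record stays parseable even when the target contains a
newline, because the reader counts bytes instead of scanning for a line
break. A minimal Python sketch, assuming the proposed record mirrors the
existing --batch layout (header line, payload, trailing newline);
`read_record` is a hypothetical helper, not part of git:

```python
import io


def read_record(stream):
    """Read one '<...> <type> <size>\\n<payload>\\n' record from a binary stream.

    Returns (type, payload), or None at end of stream. Any fields before
    the type (such as an object name) are ignored.
    """
    header = stream.readline()
    if not header:
        return None
    fields = header.split()
    kind, size = fields[-2], int(fields[-1])
    payload = stream.read(size)   # read by length, so embedded newlines are safe
    stream.read(1)                # consume the newline that ends the record
    return kind.decode(), payload


# A symlink target containing a newline, followed by an ordinary blob record.
raw = b"symlink 8\n../a\nb/c\nblob 3\nhey\n"
stream = io.BytesIO(raw)
first = read_record(stream)
second = read_record(stream)
print(first, second)
```

The second record is only found correctly because the first payload was
consumed by length rather than by line.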


* Re: RFC: git cat-file --follow-symlinks?
  2015-04-29 21:17 ` Junio C Hamano
@ 2015-04-29 21:30   ` David Turner
  2015-04-29 21:48     ` Jeff King
  2015-04-29 21:49     ` Junio C Hamano
  0 siblings, 2 replies; 43+ messages in thread
From: David Turner @ 2015-04-29 21:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git mailing list

On Wed, 2015-04-29 at 14:17 -0700, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
> 
> > Do people think this is reasonable?
> 
> I personally don't, exactly because we track the contents of the
> symlink itself, not the referent.  Your "major wrinkle" that they
> can point outside the repository is a mere manifestation of that.

I'm not sure I understand why tracking the contents of the symlink is a
problem for this approach.  It seems reasonable to ask what would have
happened had I checked out the repo at a certain SHA and said "cat
foo/bar/baz".

> The format specifiers the --batch option takes do not exactly give
> you what the in-tree type of the thing is, to allow the receiving
> end that parses the tagline (which it needs to do anyway in order to
> find out where the current record ends) act on it.  %(objecttype)
> would just say "blob" and you cannot tell if it is a plain file,
> executable or a symbolic link.
> 
> Perhaps an ideal interface might be something like this:
> 
>     $ echo HEAD:RelNotes |
>       git cat-file --batch='%(objecttype) %(intreemode) %(objectsize)'
>     blob 120000 32
>     Documentation/RelNotes/2.4.0.txt
> 
> I suspect it would be just the matter of teaching "cat-file --batch"
> to read from get_sha1_with_context() in batch_one_object(), instead
> of reading from get_sha1() which it currently does.
> 
> And that interface I think I can live with.

Even if I had %(intreemode), I would still have to do a recursive search
to figure out whether Documentation or RelNotes was a symlink.  This is
why I want a follow-symlinks mode.  And since I am already reading
RelNotes, I can (and presently do) parse the mode out of that data.
%(intreemode) would save me some parsing, but it would not save me any
reading, nor would it make my code any less complex.  But
--follow-symlinks would simplify my code.


* Re: RFC: git cat-file --follow-symlinks?
  2015-04-29 21:30   ` David Turner
@ 2015-04-29 21:48     ` Jeff King
  2015-04-29 22:19       ` Jonathan Nieder
  2015-04-29 22:29       ` David Turner
  2015-04-29 21:49     ` Junio C Hamano
  1 sibling, 2 replies; 43+ messages in thread
From: Jeff King @ 2015-04-29 21:48 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, git mailing list

On Wed, Apr 29, 2015 at 02:30:59PM -0700, David Turner wrote:

> > I personally don't, exactly because we track the contents of the
> > symlink itself, not the referent.  Your "major wrinkle" that they
> > can point outside the repository is a mere manifestation of that.
> 
> I'm not sure I understand why tracking the contents of the symlink is a
> problem for this approach.  It seems reasonable to ask what would have
> happened had I checked out the repo at a certain SHA and said "cat
> foo/bar/baz".

But git can't answer that question, can it? That is something that you
are asking the filesystem and the OS, and it may involve leaving the git
repository altogether (and depend on things like your cwd). Certainly
git can ask the filesystem for you, but it's more flexible if you do it
yourself (at the expense of more code on your end; see below).

> > Perhaps an ideal interface might be something like this:
> > 
> >     $ echo HEAD:RelNotes |
> >       git cat-file --batch='%(objecttype) %(intreemode) %(objectsize)'
> >     blob 120000 32
> >     Documentation/RelNotes/2.4.0.txt
> > 
> > I suspect it would be just the matter of teaching "cat-file --batch"
> > to read from get_sha1_with_context() in batch_one_object(), instead
> > of reading from get_sha1() which it currently does.
> > 
> > And that interface I think I can live with.
> 
> Even if I had %(intreemode), I would still have to do a recursive search
> to figure out whether Documentation or RelNotes was a symlink.  This is
> why I want a follow-symlinks mode.  And since I am already reading
> RelNotes, I can (and presently do) parse the mode out of that data.
> %(intreemode) would save me some parsing, but it would not save me any
> reading, nor would it make my code any less complex.  But
> --follow-symlinks would simplify my code.

Wouldn't git have to do the same recursive search? That is, with the
interface above, you would see "ah, %(intreemode) says we are a symlink;
let me ask again using the filename from the symlink contents". And
repeat until you get a non-symlink. But with a --follow-symlinks option,
git is just doing the same thing internally. It cannot ask the
filesystem because these are not real files.
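
The loop described above can be sketched as follows. This is only an
illustration in Python over an in-memory stand-in for a tree (a dict from
path to mode and content), not git's implementation; the name `follow`,
the result labels, and the `MAX_LINKS` default of 40 (the limit suggested
at the start of the thread) are assumptions. It uses 120000, the mode git
stores for symbolic links in tree objects.

```python
SYMLINK = "120000"   # mode git records for symbolic links in a tree
MAX_LINKS = 40       # assumed default, per the limit suggested in this thread


def follow(tree, path, max_links=MAX_LINKS):
    """Resolve `path` inside `tree`, a dict of path -> (mode, content).

    Returns one of (labels are hypothetical):
      ("blob", resolved_path)   a non-symlink entry was reached
      ("symlink", target)       the link chain left the tree
      ("missing", path)         the chain ended at a nonexistent path
      ("loop", path)            more than `max_links` links were followed
    """
    todo = path.split("/")   # components still to walk
    done = []                # components resolved so far
    links = 0
    while todo:
        part = todo.pop(0)
        if part in ("", "."):
            continue
        if part == "..":
            if not done:
                # Climbed above the root: report the rest, relative to the root.
                return "symlink", "/".join([".."] + todo)
            done.pop()
            continue
        entry = tree.get("/".join(done + [part]))
        if entry is not None and entry[0] == SYMLINK:
            links += 1
            if links > max_links:
                return "loop", path
            target = entry[1]
            if target.startswith("/"):
                return "symlink", target   # absolute: always outside the tree
            # Relative targets are resolved from the link's own directory,
            # which is exactly what `done` holds at this point.
            todo = target.split("/") + todo
            continue
        done.append(part)
    final = "/".join(done)
    return ("blob", final) if final in tree else ("missing", final)


tree = {
    "dest": ("100644", "content\n"),
    "link": (SYMLINK, "dest"),
    "broken": (SYMLINK, "foo"),
    "out": (SYMLINK, "../foo"),
    "docs/latest": (SYMLINK, "../dest"),
    "a": (SYMLINK, "b"),
    "b": (SYMLINK, "a"),
}
for name in ("link", "broken", "out", "docs/latest", "a"):
    print(name, "->", follow(tree, name))
```

A caller doing this over a pipe would issue one request per link; doing it
in one place avoids those round trips but forces a choice for each of the
non-"blob" outcomes.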

So the advantages of --follow-symlinks are:

  1. It's more efficient. Instead of round-tripping across the pipe, git
     follows the link internally.

  2. It's easier for callers. Git only has to implement it once, and
     callers get it for free. Also, callers do not have to have a
     bidirectional conversation with cat-file (which is doubly awkward
     if they are trying to send the output of cat-file elsewhere, since
     they end up having to forward along the non-symlinked output).

The disadvantages are:

  1. Git has to make a decision about what to do in corner cases. What
     is our cwd for relative links? The project root? Can we be in a
     subdir of the repo? What do we do about symlinks that point to
     non-existent files?  Ones that point outside the repository? If we
     cat the actual filesystem, what is the cwd of the pretend
     working-tree that we start from? Do we need to be able to show the
     contents _and_ the fact that we followed one or more links (and
     their intermediate names?).

Can you think of arguments (either on pro or con side) that I am
missing? Or did I misunderstand what you meant by "recursive search"?

-Peff


* Re: RFC: git cat-file --follow-symlinks?
  2015-04-29 21:30   ` David Turner
  2015-04-29 21:48     ` Jeff King
@ 2015-04-29 21:49     ` Junio C Hamano
  2015-04-29 22:47       ` David Turner
  1 sibling, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2015-04-29 21:49 UTC (permalink / raw)
  To: David Turner; +Cc: git mailing list

David Turner <dturner@twopensource.com> writes:

>> Perhaps an ideal interface might be something like this:
>> 
>>     $ echo HEAD:RelNotes |
>>       git cat-file --batch='%(objecttype) %(intreemode) %(objectsize)'
>>     blob 120000 32
>>     Documentation/RelNotes/2.4.0.txt
>> 
>> I suspect it would be just the matter of teaching "cat-file
>> --batch" to read from get_sha1_with_context() in
>> batch_one_object(), instead of reading from get_sha1() which it
>> currently does.
>> 
>> And that interface I think I can live with.
>
> Even if I had %(intreemode), I would still have to do a recursive
> search to figure out whether Documentation or RelNotes was a
> symlink.

Yes, and why is that a problem?  Think of "cat-file --batch" as an
"object server" you query interactively.  You start the process, ask
it about HEAD:RelNotes, and learn that the blob is a link that
points at Documentation/RelNotes/2.4.0.txt.  Then you ask it about
"HEAD:Documentation/RelNotes/2.4.0.txt", which _may_ answer "no such
file", at which point you can start worrying about referring to
places outside the tree (i.e. untracked).

"cat-file" does not know about your project, and especially its
external dependencies, if a symbolic link ever steps outside the
tree objects, better than you do.  Because it is a low-level
plumbing command, allowing it to make policy decisions (e.g. "if
outside repository, always look at the filesystem that the program
happens to be running" [*1*]) is something I would reject as much as
possible.  It will paint us into a corner we cannot later escape out
of.

> This is
> why I want a follow-symlinks mode.  And since I am already reading
> RelNotes, I can (and presently do) parse the mode out of that
> data.

mode?  How?  If all you have is a blob object name and no context
around it (i.e. the top-level tree object has that blob with 120000
mode bits), you cannot tell a symlink from a regular file.
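
For reference, the in-tree mode is what distinguishes these cases. A small
Python lookup of the modes git stores in tree objects; the table and the
helper name `entry_kind` are illustrative, and how a hypothetical
%(intreemode) atom would print them is not settled in this thread:

```python
# Entry modes as they appear (in octal) in git tree objects.
TREE_ENTRY_KINDS = {
    "40000": "tree",
    "100644": "regular file",
    "100755": "executable file",
    "120000": "symbolic link",
    "160000": "gitlink (submodule)",
}


def entry_kind(mode):
    """Classify an octal mode string; leading zeros are ignored."""
    return TREE_ENTRY_KINDS.get(mode.lstrip("0"), "unknown")


for mode in ("100644", "100755", "120000", "040000", "160000"):
    print(mode, entry_kind(mode))
```

The object type alone reports "blob" for the first three of these, which is
why the mode has to come from the enclosing tree entry.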


[Footnote]

*1* For example, you may have two projects' working trees A and B
    sitting next to each other, and A/sibling may be a symbolic link
    that points at ../B/some/thing.  A Porcelain that uses cat-file
    --batch as "the object server" may notice v1.0:sibling in A's
    history points at ../B/some/thing and would want to grab
    some/thing from the contemporary version of B's commit, instead
    of just blindly going to the filesystem.


* Re: RFC: git cat-file --follow-symlinks?
  2015-04-29 21:48     ` Jeff King
@ 2015-04-29 22:19       ` Jonathan Nieder
  2015-04-29 23:05         ` Jeff King
  2015-04-29 22:29       ` David Turner
  1 sibling, 1 reply; 43+ messages in thread
From: Jonathan Nieder @ 2015-04-29 22:19 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, Junio C Hamano, git mailing list

Jeff King wrote:

>   1. Git has to make a decision about what to do in corner cases. What
>      is our cwd for relative links? The project root?

I don't follow.  Isn't symlink resolution always relative to the
symlink, regardless of cwd?

Thanks,
Jonathan


* Re: RFC: git cat-file --follow-symlinks?
  2015-04-29 21:48     ` Jeff King
  2015-04-29 22:19       ` Jonathan Nieder
@ 2015-04-29 22:29       ` David Turner
  2015-04-29 23:11         ` Jeff King
  1 sibling, 1 reply; 43+ messages in thread
From: David Turner @ 2015-04-29 22:29 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git mailing list

On Wed, 2015-04-29 at 17:48 -0400, Jeff King wrote:
> On Wed, Apr 29, 2015 at 02:30:59PM -0700, David Turner wrote:
> 
> > > I personally don't, exactly because we track the contents of the
> > > symlink itself, not the referent.  Your "major wrinkle" that they
> > > can point outside the repository is a mere manifestation of that.
> > 
> > I'm not sure I understand why tracking the contents of the symlink is a
> > problem for this approach.  It seems reasonable to ask what would have
> > happened had I checked out the repo at a certain SHA and said "cat
> > foo/bar/baz".
> 
> But git can't answer that question, can it? That is something that you
> are asking the filesystem and the OS, and it may involve leaving the git
> repository altogether (and depend on things like your cwd). Certainly
> git can ask the filesystem for you, but it's more flexible if you do it
> yourself (at the expense of more code on your end; see below).

As far as I know, symlink resolution doesn't depend on cwd.  And paths
passed to cat-file are interpreted relative to the repo root regardless
of the cwd anyway.

It's true that full symlink resolution might depend on leaving the git
repo.  I didn't want git to have to deal with that, so my proposal told
the caller about out-of-repo symlinks, allowing the caller to deal. The
caller can then use the standard library, which already knows how to
resolve symlinks.  I guess a sequence of links could leave and then
reenter the repository; that is indeed a corner case that I am happy to
leave to the callers; any caller who cares will be no worse off than
they are now.

> > > Perhaps an ideal interface might be something like this:
> > > 
> > >     $ echo HEAD:RelNotes |
> > >       git cat-file --batch='%(objecttype) %(intreemode) %(objectsize)'
> > >     blob 120000 32
> > >     Documentation/RelNotes/2.4.0.txt
> > > 
> > > I suspect it would be just the matter of teaching "cat-file --batch"
> > > to read from get_sha1_with_context() in batch_one_object(), instead
> > > of reading from get_sha1() which it currently does.
> > > 
> > > And that interface I think I can live with.
> > 
> > Even if I had %(intreemode), I would still have to do a recursive search
> > to figure out whether Documentation or RelNotes was a symlink.  This is
> > why I want a follow-symlinks mode.  And since I am already reading
> > RelNotes, I can (and presently do) parse the mode out of that data.
> > > %(intreemode) would save me some parsing, but it would not save me any
> > reading, nor would it make my code any less complex.  But
> > --follow-symlinks would simplify my code.
> 
> Wouldn't git have to do the same recursive search? That is, with the
> interface above, you would see "ah, %(intreemode) says we are a symlink;
> let me ask again using the filename from the symlink contents". And
> repeat until you get a non-symlink. But with a --follow-symlinks option,
> git is just doing the same thing internally. It cannot ask the
> filesystem because these are not real files.
> 
> So the advantages of --follow-symlinks are:
> 
>   1. It's more efficient. Instead of round-tripping across the pipe, git
>      follows the link internally.
> 
>   2. It's easier for callers. Git only has to implement it once, and
>      callers get it for free. Also, callers do not have to have a
>      bidirectional conversation with cat-file (which is doubly awkward
>      if they are trying to send the output of cat-file elsewhere, since
>      they end up having to forward along the non-symlinked output).
> 
> The disadvantages are:
> 
>   1. Git has to make a decision about what to do in corner cases. What
>      is our cwd for relative links? The project root? Can we be in a
>      subdir of the repo? What do we do about symlinks that point to
>      non-existent files?  Ones that point outside the repository? If we
>      cat the actual filesystem, what is the cwd of the pretend
>      working-tree that we start from? Do we need to be able to show the
>      contents _and_ the fact that we followed one or more links (and
>      their intermediate names?).
> 
> Can you think of arguments (either on pro or con side) that I am
> missing? Or did I misunderstand what you meant by "recursive search"?

Overall, I agree.  I think the disadvantages are somewhat overstated.

As I said above, I don't think the cwd is a problem.  The output for
symlinks which point outside the repo should be absolute (in the case of
absolute symlinks), or relative to the repo root (for relative
symlinks).  In other words, if my repo contains:
foo/bar -> ../../baz
then the output[1] would be 
symlink 6
../baz

I can't think of any other output that would be reasonable here, but
maybe there's something I don't understand.

Note that I'm not proposing that cat-file should actually read any files
or directories from outside of the repo.  It should just make it
possible for the caller to do so.


[1] (I realize, now, that having a sha would be useless, so I've omitted
it).


* Re: RFC: git cat-file --follow-symlinks?
  2015-04-29 21:49     ` Junio C Hamano
@ 2015-04-29 22:47       ` David Turner
  0 siblings, 0 replies; 43+ messages in thread
From: David Turner @ 2015-04-29 22:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git mailing list

On Wed, 2015-04-29 at 14:49 -0700, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
> 
> >> Perhaps an ideal interface might be something like this:
> >> 
> >>     $ echo HEAD:RelNotes |
> >>       git cat-file --batch='%(objecttype) %(intreemode) %(objectsize)'
> >>     blob 120000 32
> >>     Documentation/RelNotes/2.4.0.txt
> >> 
> >> I suspect it would be just the matter of teaching "cat-file
> >> --batch" to read from get_sha1_with_context() in
> >> batch_one_object(), instead of reading from get_sha1() which it
> >> currently does.
> >> 
> >> And that interface I think I can live with.
> >
> > Even if I had %(intreemode), I would still have to do a recursive
> > search to figure out whether Documentation or RelNotes was a
> > symlink.

I apologize.  I have misread your example.  All of my text was assuming
that Documentation/RelNotes/2.4.0.txt was a symlink, instead
of /RelNotes being a symlink to Documentation/RelNotes/2.4.0.txt.  So my
previous message was very difficult to interpret.  I think Jeff King's
reply is a better starting point for discussion, since it lays out the
advantages and disadvantages of the proposal.

> allowing it to make policy decisions (e.g. "if
> outside repository, always look at the filesystem that the program
> happens to be running" [*1*]) 

Despite my confusion, I don't think I ever proposed doing this.  I
proposed that in the case that a symlink points outside the repo,
cat-file would tell the caller that it has done so, so that the caller
can decide what to do.


* Re: RFC: git cat-file --follow-symlinks?
  2015-04-29 22:19       ` Jonathan Nieder
@ 2015-04-29 23:05         ` Jeff King
  0 siblings, 0 replies; 43+ messages in thread
From: Jeff King @ 2015-04-29 23:05 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: David Turner, Junio C Hamano, git mailing list

On Wed, Apr 29, 2015 at 03:19:07PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> >   1. Git has to make a decision about what to do in corner cases. What
> >      is our cwd for relative links? The project root?
> 
> I don't follow.  Isn't symlink resolution always relative to the
> symlink, regardless of cwd?

Yeah, I was being silly to think this was a concern for intra-repository
links. It's well-defined there.  E.g., if "foo/bar/baz" points to
"../moof", that is just "foo/moof".

But if we leave the git tree the location of our root matters. E.g., if
"foo/bar/baz" points to "../../../../moof", where is that anchored in
the filesystem? So perhaps "cwd" is not the right term. It is really
"where are we pretending our working tree is".
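
The distinction can be sketched lexically. A minimal Python illustration,
assuming no intermediate path component is itself a symlink (so plain path
normalization is valid); `resolve_link` is a hypothetical helper, not a git
function:

```python
import posixpath


def resolve_link(link_path, target):
    """Resolve `target` relative to the directory holding `link_path`.

    Returns (inside, path). `inside` is False when the result is absolute
    or climbs above the tree root; in the latter case `path` is expressed
    relative to the root, wherever that root is taken to be.
    """
    if target.startswith("/"):
        return False, target
    joined = posixpath.join(posixpath.dirname(link_path), target)
    normalized = posixpath.normpath(joined)   # purely lexical, touches no files
    inside = normalized != ".." and not normalized.startswith("../")
    return inside, normalized


print(resolve_link("foo/bar/baz", "../moof"))
print(resolve_link("foo/bar/baz", "../../../../moof"))
```

The first call stays inside the tree and needs no anchor; the second only
has a meaning once the caller decides where the root sits in a filesystem.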

-Peff


* Re: RFC: git cat-file --follow-symlinks?
  2015-04-29 22:29       ` David Turner
@ 2015-04-29 23:11         ` Jeff King
  2015-04-30  0:37           ` Jeff King
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2015-04-29 23:11 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, git mailing list

On Wed, Apr 29, 2015 at 03:29:36PM -0700, David Turner wrote:

> Overall, I agree.  I think the disadvantages are somewhat overstated.
> 
> As I said above, I don't think the cwd is a problem.  The output for
> symlinks which point outside the repo should be absolute (in the case of
> absolute symlinks), or relative to the repo root (for relative
> symlinks).  In other words, if my repo contains:
> foo/bar -> ../../baz
> then the output[1] would be 
> symlink 6
> ../baz
> 
> I can't think of any other output that would be reasonable here, but
> maybe there's something I don't understand.

Yeah, I agree if you let git punt on leaving the filesystem, most of the
complicated problems go away. It still feels a bit more magical than I
expect out of cat-file, and there are still corner cases (e.g., do we do
cycle detection? Or just have a limit to the recursion depth?)

And if you are punting on some cases, I think you'd still want to be
able to report on the symlinks you couldn't resolve (e.g., because they
went out of tree, pointed to non-existent files, or caused cycles). So
it seems like %(intreemode) is a good first step, because it lets you
express that (and more).

And then you could implement --follow-symlinks on top of that; it can't
catch all cases, but you've left callers with an escape hatch to do
their own resolution if they want, without having to implement a new
syntax for it.

-Peff


* Re: RFC: git cat-file --follow-symlinks?
  2015-04-29 23:11         ` Jeff King
@ 2015-04-30  0:37           ` Jeff King
  2015-04-30  1:06             ` David Turner
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2015-04-30  0:37 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, git mailing list

On Wed, Apr 29, 2015 at 07:11:50PM -0400, Jeff King wrote:

> Yeah, I agree if you let git punt on leaving the filesystem, most of the
> complicated problems go away. It still feels a bit more magical than I
> expect out of cat-file, and there are still corner cases (e.g., do we do
> cycle detection? Or just have a limit to the recursion depth?)

I was pondering the "magical" above. I think what bugs me is that it
seems like a feature that is implemented as part of one random bit of
plumbing, but not available elsewhere.

Conceptually, this is like peeling object names. You may give a tag
name, but if you ask for a tree we will peel the tag to a commit,
and the commit to a tree. This is sort of the same thing; you give a
path within a tree, and we will peel until we hit a "real" non-symlink
object.

I don't know what the syntax would look like. To match "foo^{tree}" it
would be something like:

  HEAD:foo/bar^{resolve}

or something like that. Except that it is a bad idea to allow "^{}"
syntax on the right-hand side of a colon, as it is ambiguous with
filenames that contain "^{resolve}". So it would have to look something
like:

  HEAD^{resolve}:foo/bar

which is a _little_ weird, but actually kind of makes sense. The
"resolve" operation inherently is not just about the filename, but also
uses HEAD^{tree} as the root context.

So I dunno. This pushes the resolving logic even _lower_ in the stack
than it would be in cat-file. So why do I like it more? Cognitive
dissonance? I guess the appeal to me is that it:

  1. Makes the concept available more generally (you can "rev-parse" it,
     you can "git show" it, etc). It also lets you _name_ the object in
     question, so you can ask for other things besides its contents (like
     its name, its type, etc).

  2. Positions it alongside other peeling name-resolution functions.

Thoughts?

-Peff


* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30  0:37           ` Jeff King
@ 2015-04-30  1:06             ` David Turner
  2015-04-30  1:16               ` Jeff King
  0 siblings, 1 reply; 43+ messages in thread
From: David Turner @ 2015-04-30  1:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git mailing list

On Wed, 2015-04-29 at 20:37 -0400, Jeff King wrote:
> On Wed, Apr 29, 2015 at 07:11:50PM -0400, Jeff King wrote:
> 
> > Yeah, I agree if you let git punt on leaving the filesystem, most of the
> > complicated problems go away. It still feels a bit more magical than I
> > expect out of cat-file, and there are still corner cases (e.g., do we do
> > cycle detection? Or just have a limit to the recursion depth?)
> 
> I was pondering the "magical" above. I think what bugs me is that it
> seems like a feature that is implemented as part of one random bit of
> plumbing, but not available elsewhere.
> 
> Conceptually, this is like peeling object names. You may give a tag
> name, but if you ask for a tree we will peel the tag to a commit,
> and the commit to a tree. This is sort of the same thing; you give a
> path within a tree, and we will peel until we hit a "real" non-symlink
> object.
> 
> I don't know what the syntax would look like. To match "foo^{tree}" it
> would be something like:
> 
>   HEAD:foo/bar^{resolve}
> 
> or something like that. Except that it is a bad idea to allow "^{}"
> syntax on the right-hand side of a colon, as it is ambiguous with
> filenames that contain "^{resolve}". So it would have to look something
> like:
> 
>   HEAD^{resolve}:foo/bar
> 
> which is a _little_ weird, but actually kind of makes sense. The
> "resolve" operation inherently is not just about the filename, but also
> uses HEAD^{tree} as the root context.
> 
> So I dunno. This pushes the resolving logic even _lower_ in the stack
> than it would be in cat-file. So why do I like it more? Cognitive
> dissonance? I guess the appeal to me is that it:
> 
>   1. Makes the concept available more generally (you can "rev-parse" it,
>      you can "git show" it, etc). It also lets you _name_ the object in
>      question, so you can ask for other things besides its contents (like
>      its name, its type, etc).
> 
>   2. Positions it alongside other peeling name-resolution functions.

Just to clarify: if you do git rev-parse, and the result is an
out-of-tree symlink, you see /foo or ../foo instead of a sha?  And if
you "git show" it it says "symlink HEAD:../foo"?

This seems totally reasonable to me, and solves my problem.


* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30  1:06             ` David Turner
@ 2015-04-30  1:16               ` Jeff King
  2015-04-30  1:31                 ` Junio C Hamano
  2015-04-30  1:45                 ` David Turner
  0 siblings, 2 replies; 43+ messages in thread
From: Jeff King @ 2015-04-30  1:16 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, git mailing list

On Wed, Apr 29, 2015 at 06:06:23PM -0700, David Turner wrote:

> >   HEAD^{resolve}:foo/bar
> [...]
>
> Just to clarify: if you do git rev-parse, and the result is an
> out-of-tree symlink, you see /foo or ../foo instead of a sha?  And if
> you "git show" it it says "symlink HEAD:../foo"?

I had imagined we would stop resolution and you would just get the last
peeled object. Combined with teaching cat-file to show more
object context, doing:

  echo content >dest ;# actual blob
  ln -s dest link    ;# link to blob
  ln -s foo broken   ;# broken link
  ln -s ../foo out   ;# out-of-tree link
  git add . && git commit -m foo
  for i in link broken out; do
	echo HEAD^{resolve}:$i
  done |
  git cat-file --batch="%(intreemode) %(objectsize)"

would yield:

 (1)   100644 8
       content
 (2)   040000 3
       foo
 (3)   040000 6
       ../foo

where the left-margin numbers are for reference:

  1. We dereference a real symlink, and pretend like we actually asked
     for its referent.

  2. For a broken link, we can't dereference, so we return the link
     itself. You can tell by the mode, and the content tells you what
     would have been dereferenced.

  3. Ditto for out-of-tree. Note that this would be the _raw_ symlink
     contents, not any kind of simplification (so if you asked for
     "foo/bar/baz" and it was "../../../../out", you would get the full path
     with all those dots, not a simplified "../out", which I think is
     what you were trying to show in earlier examples).

-Peff


* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30  1:16               ` Jeff King
@ 2015-04-30  1:31                 ` Junio C Hamano
  2015-04-30  3:18                   ` Jeff King
  2015-04-30  1:45                 ` David Turner
  1 sibling, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2015-04-30  1:31 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git mailing list

Jeff King <peff@peff.net> writes:

> I had imagined we would stop resolution and you would just get the last
> peeled object. Combined with teaching cat-file to show more
> object context, doing:
>
>   echo content >dest ;# actual blob
>   ln -s dest link    ;# link to blob
>   ln -s foo broken   ;# broken link
>   ln -s ../foo out   ;# out-of-tree link
>   git add . && git commit -m foo
>   for i in link broken out; do
>     echo HEAD^{resolve}:$i
>   done |
>   git cat-file --batch="%(intreemode) %(size)"
>
> would yield:
>
>  (1)   100644 8
>        content
>  (2)   040000 3
>        foo
>  (3)   040000 6
>        ../foo
>
> where the left-margin numbers are for reference:
>
>   1. We dereference a real symlink, and pretend like we actually asked
>      for its referent.
>
>   2. For a broken link, we can't dereference, so we return the link
>      itself. You can tell by the mode, and the content tells you what
>      would have been dereferenced.
>
>   3. Ditto for out-of-tree. Note that this would be the _raw_ symlink
>      contents, not any kind of simplification (so if you asked for
>      "foo/bar/baz" and it was "../../../../out", you would get the full path
>      with all those dots, not a simplified "../out", which I think is
>      what you were trying to show in earlier examples).

s/040000/120000/ I would think (if you really meant to expose a
tree, write it as 40000 instead, so that people will not get a wrong
impression and reimplement a broken tree object encoding some popular
Git hosting site broke their customer projects with ;-).

I am not sure $treeish^{resolve} is a great syntax, but I like the
concept and agree that it is a lot more sensible to handle this at
the level of sha1_name.c layer than an ad-hoc solution in the
cat-file layer.

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30  1:16               ` Jeff King
  2015-04-30  1:31                 ` Junio C Hamano
@ 2015-04-30  1:45                 ` David Turner
  2015-04-30  3:37                   ` Jeff King
  1 sibling, 1 reply; 43+ messages in thread
From: David Turner @ 2015-04-30  1:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git mailing list

On Wed, 2015-04-29 at 21:16 -0400, Jeff King wrote:
> On Wed, Apr 29, 2015 at 06:06:23PM -0700, David Turner wrote:
>   3. Ditto for out-of-tree. Note that this would be the _raw_ symlink
>      contents, not any kind of simplification (so if you asked for
>      "foo/bar/baz" and it was "../../../../out", you would get the full path
>      with all those dots, not a simplified "../out", which I think is
>      what you were trying to show in earlier examples).

Unfortunately, we need the simplified version, because we otherwise
don't know what the ..s are relative to in the case of a link to a link:

  echo content >dest ;# actual blob
  mkdir -p foo/bar
  ln -s foo/bar/baz fleem             # in-tree link-to-link 
  ln -s ../../../external foo/bar/baz # out-of-tree link

If echo HEAD^{resolve}:fleem were to return ../../../external (after
following the first symlink to the second), we would have lost
information.
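
The simplified, root-relative form can be sketched in plain shell.
resolve_rel below is a hypothetical helper, not a git command; it assumes
relative targets only and does not follow symlinks in intermediate
directories.

```shell
# resolve_rel LINK TARGET  (hypothetical helper, not a git command)
# Interpret TARGET relative to the directory that contains LINK and print
# the result relative to the tree root. ".." components that climb above
# the root are kept at the front. Absolute targets and symlinks in
# intermediate directories are not handled.
resolve_rel() {
    case $1 in
        */*) path=${1%/*}/$2 ;;   # start from LINK's directory
        *)   path=$2 ;;           # LINK sits at the tree root
    esac
    out= up=
    old_ifs=$IFS
    IFS=/
    set -f                        # no globbing while splitting on "/"
    for part in $path; do
        case $part in
            ''|.) ;;                          # skip empty and "."
            ..) case $out in
                    '')  up=../$up ;;         # climbed above the root
                    */*) out=${out%/*} ;;     # drop the last component
                    *)   out= ;;
                esac ;;
            *)  out=${out:+$out/}$part ;;
        esac
    done
    set +f
    IFS=$old_ifs
    result=$up$out
    result=${result%/}
    printf '%s\n' "${result:-.}"
}

resolve_rel fleem foo/bar/baz               # first hop stays in the tree
resolve_rel foo/bar/baz ../../../external   # second hop leaves it
```

The second call prints ../external, which is only meaningful because the
location of the link that contained the dots was known at that point.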

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30  1:31                 ` Junio C Hamano
@ 2015-04-30  3:18                   ` Jeff King
  0 siblings, 0 replies; 43+ messages in thread
From: Jeff King @ 2015-04-30  3:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Turner, git mailing list

On Wed, Apr 29, 2015 at 06:31:22PM -0700, Junio C Hamano wrote:

> > would yield:
> >
> >  (1)   100644 8
> >        content
> >  (2)   040000 3
> >        foo
> >  (3)   040000 6
> >        ../foo
> [...]
> 
> s/040000/120000/ I would think (if you really meant to expose a
> tree, write it as 40000 instead, so that people will not get a wrong
> impression and reimplement a broken tree object encoding some popular
> Git hosting site broke their customer projects with ;-).

Whooops. Yes, definitely 120000. That's what I get for not
double-checking. :)

-Peff

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30  1:45                 ` David Turner
@ 2015-04-30  3:37                   ` Jeff King
  2015-04-30  5:34                     ` Junio C Hamano
                                       ` (3 more replies)
  0 siblings, 4 replies; 43+ messages in thread
From: Jeff King @ 2015-04-30  3:37 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, git mailing list

On Wed, Apr 29, 2015 at 06:45:45PM -0700, David Turner wrote:

> On Wed, 2015-04-29 at 21:16 -0400, Jeff King wrote:
> > On Wed, Apr 29, 2015 at 06:06:23PM -0700, David Turner wrote:
> >   3. Ditto for out-of-tree. Note that this would be the _raw_ symlink
> >      contents, not any kind of simplification (so if you asked for
> >      "foo/bar/baz" and it was "../../../../out", you would get the full path
> >      with all those dots, not a simplified "../out", which I think is
> >      what you were trying to show in earlier examples).
> 
> Unfortunately, we need the simplified version, because we otherwise
> don't know what the ..s are relative to in the case of a link to a link:
> 
>   echo content >dest ;# actual blob
>   mkdir -p foo/bar
>   ln -s foo/bar/baz fleem             # in-tree link-to-link 
>   ln -s ../../../external foo/bar/baz # out-of-tree link
> 
> If echo HEAD^{resolve}:fleem were to return ../../../external (after
> following the first symlink to the second), we would have lost
> information.

Urgh, yeah, thanks for the counter-example.

Here are some possible alternatives:

  1. If we can't resolve fully, don't resolve anything. I.e., return the
     "fleem" object here, and the caller can recurse if they want. This is
     simple and correct, but not as helpful to somebody who wants to follow
     the out-of-tree link (they have to re-traverse the fleem->foo/bar/baz
     link themselves).

  2. Consider it an error if resolution fails. If you ask for
     "HEAD^{tree}^{commit}", that does not resolve to anything (because
     we can't peel the tree to a commit). Like (1), this is simple and
     correct, but probably not all that helpful. The caller has to
     start from scratch and resolve themselves, rather than getting an
     intermediate result.

  3. Return an object with the symlink relative to the original
     filename (so "../external" in this case). This is kind of weird,
     though, because we're not just returning a string from the name
     resolution. It's an actual object.  So we'd be generating a fake
     object that doesn't actually exist in the object db and
     returning that. Feeding that sha1 to another program would fail.

  4. Return the last object we could resolve, as I described. So
     foo/bar/baz (with "../../../external" as its content) in this case.
     When you resolve a name, you can ask for the context we discovered
     along the way by traversing the tree. The mode is one example we've
     already discussed, but the path name is another. So something like:

       echo "HEAD^{resolve}:fleem" |
       git cat-file --batch="%(objectname) %(size) %(intreepath)"

     would show:

       1234abcd 17 foo/bar/baz
       ../../../external

     And then the caller knows that the path is not relative to the
     original "fleem", but rather to "foo/bar/baz".

     The problem is that although this context lookup is already part of
     get_sha1_with_context, that is not exposed through every interface.
     E.g., "git rev-parse HEAD^{resolve}:fleem" will give you an object,
     but you have no way of knowing the context.

I can't say that I'm excited about any of them. Perhaps you or somebody
else can think of a more clever solution.

Note that the complication with (3) does come from my trying to push
this down into the name-resolution code.

-Peff

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30  3:37                   ` Jeff King
@ 2015-04-30  5:34                     ` Junio C Hamano
  2015-04-30  8:12                       ` Michael Haggerty
  2015-04-30 10:04                     ` Andreas Schwab
                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2015-04-30  5:34 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git mailing list

Jeff King <peff@peff.net> writes:

> Here are some possible alternatives:
>
>   1. If we can't resolve fully, don't resolve anything. I.e., return the
>      "fleem" object here, and the caller can recurse if they want. This is
>      simple and correct, but not as helpful to somebody who wants to follow
>      the out-of-tree link (they have to re-traverse the fleem->foo/bar/baz
>      link themselves).
>
>   2. Consider it an error if resolution fails. If you ask for
>      "HEAD^{tree}^{commit}", that does not resolve to anything (because
>      we can't peel the tree to a commit). Like (1), this is simple and
>      correct, but probably not all that helpful. The caller has to
>      start from scratch and resolve themselves, rather than getting an
>      intermediate result.
>
>   3. Return an object with the symlink relative to the original
>      filename (so "../external" in this case). This is kind of weird,
>      though, because we're not just returning a string from the name
>      resolution. It's an actual object.  So we'd be generating a fake
>      object that doesn't actually exist in the object db and
>      returning that. Feeding that sha1 to another program would fail.
>
>   4. Return the last object we could resolve, as I described. So
>      foo/bar/baz (with "../../../external" as its content) in this case.
>      When you resolve a name, you can ask for the context we discovered
>      along the way by traversing the tree. The mode is one example we've
>      already discussed, but the path name is another. So something like:
>
>        echo "HEAD^{resolve}:fleem" |
>        git cat-file --batch="%(objectname) %(size) %(intreepath)"
>
>      would show:
>
>        1234abcd 17 foo/bar/baz
>        ../../../external
>
>      And then the caller knows that the path is not relative to the
>      original "fleem", but rather to "foo/bar/baz".
>
>      The problem is that although this context lookup is already part of
>      get_sha1_with_context, that is not exposed through every interface.
>      E.g., "git rev-parse HEAD^{resolve}:fleem" will give you an object,
>      but you have no way of knowing the context.
>
> I can't say that I'm excited about any of them. Perhaps you or somebody
> else can think of a more clever solution.

Me neither, but if I really had to pick one, it would be the last
one, except that %(intreepath) would have to be somehow quoted,
perhaps like how the output from "git diff" quotes funny pathnames.

When we want to use this as an extended SHA-1 syntax (i.e. outside
of "cat-file --batch"), it most likely should just error out if the
link does not resolve to a path that would eventually result in an
in-tree object, with the same error message you would get when you
ask for the object name for "HEAD:no-such-path".

But stepping back a bit.

We have been talking about HEAD^{resolve}:fleem but how did we learn
that there is a path "fleem" in the tree of HEAD in the first place?
I would presume that the answer eventually boils down to "somebody
fed HEAD to 'ls-tree -r'", and then that somebody is an idiot if it
did not grab the mode bits to learn what kind of blob fleem is, or
if it did not tell the guy that wants to drive "cat-file --batch".

So this whole thing somehow starts to smell like a solution that is
looking for a problem that has arisen only because the use case
story behind it is screwy.  Again, why is it such a huge problem to
make it relative and ask a newly formulated question for the guy who
is driving "cat-file --batch" in the first place?

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-29 20:57 RFC: git cat-file --follow-symlinks? David Turner
  2015-04-29 21:16 ` Jonathan Nieder
  2015-04-29 21:17 ` Junio C Hamano
@ 2015-04-30  8:10 ` Michael Haggerty
  2 siblings, 0 replies; 43+ messages in thread
From: Michael Haggerty @ 2015-04-30  8:10 UTC (permalink / raw)
  To: David Turner, git mailing list

On 04/29/2015 10:57 PM, David Turner wrote:
> I recently had a situation where I was using git cat-file (--batch) to
> read files and directories out of the repository -- basically, to do the
> equivalent of open, opendir, etc, on an arbitrary revision.
> Unfortunately, I had to do a lot of gymnastics to handle symlinks in the
> repository.  Instead of just doing echo $SHA:foo/bar/baz | git cat-file
> --batch, I would have to first check if foo was a symlink, and if so,
> follow it, and then check bar, and so on.
> 
> Instead, it would be cool if cat-file had a mode in which it would
> follow symlinks.

I guess it's obvious, but I haven't seen it discussed in this thread, so
I wanted to point out that this feature has some limitations related to
how its arguments are constructed.

In the examples discussed,

    git cat-file --follow-symlinks $committish:foo/bar/baz

, we know the root of a tree and we know the relative path where the
symlink was located, so all is well (modulo a policy for handling
symlinks that point outside of the repo). But the following, which would
naively seem to be identical, cannot work:

    oid=$(git rev-parse $committish:foo/bar/baz)
    git cat-file --follow-symlinks $oid

The problem is that `$oid` is the name of a blob, and `cat-file` can't
know whether the blob represents the contents of a symlink or the
contents of a file. (And even if it knew, it would have no idea what
tree the symlink paths should be interpreted relative to.) What if we
pass `cat-file` a tree and a relative path instead?:

    tree=$(git rev-parse $committish:foo/bar)
    git cat-file --follow-symlinks $tree:baz

Now it can work, but only if the symlink chain never rises above the
level of `$tree`. So for example, if `foo/bar/baz` points at `../xyzzy`,
then the very first example would succeed, whereas the last one would
have to fail. Please note that there is no possible way to avoid failure
by reading files from the filesystem outside of the repository, because
in this case `cat-file` can have no idea where to look.
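
The first limitation can be seen with stock git. In this sketch (the
throwaway repository and committer identity are assumptions), a regular
file and a symlink end up naming the very same blob, so the object id
alone cannot say which one was meant:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
printf dest >file      # regular file whose content is the bytes "dest"
ln -s dest link        # symlink whose target is "dest"
git add file link
git -c user.name=Example -c user.email=example@example.com commit -q -m demo

file_oid=$(git rev-parse HEAD:file)
link_oid=$(git rev-parse HEAD:link)
kind=$(git cat-file -t "$link_oid")

# Both paths name the same blob; only the tree entry's mode differs.
echo "file=$file_oid link=$link_oid type=$kind"
git ls-tree HEAD
```

Only the "git ls-tree" output, which still has the tree in hand, tells
the two entries apart.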

> The major wrinkle is that symlinks can point outside the repository --
> either because they are absolute paths, or because they are relative
> paths with enough ../ in them.  For this case, I propose that
> --follow-symlinks should output [sha] "symlink" [target] instead of the
> usual [sha] "blob" [bytes].  Since --follow-symlinks is new, this format
> change will not break any existing code.
> [...]

I don't think this is doable in the general case, because it is not only
the last component of the path that can point outside of the repository.
Suppose we have

    foo -> ../plugh

and I run

    git cat-file --follow-symlinks HEAD:foo/bar/baz

The lookup of `foo` already falls outside of the repository, and
`bar/baz` is relative to *it*, so in this case it would have to return

    ???? "symlink" ../plugh/bar/baz

The question is, what SHA-1 should be output in place of the question
marks? The only SHA-1 we have handy is the SHA-1 of `foo`, but that
doesn't seem especially useful.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30  5:34                     ` Junio C Hamano
@ 2015-04-30  8:12                       ` Michael Haggerty
  2015-04-30 18:03                         ` David Turner
  0 siblings, 1 reply; 43+ messages in thread
From: Michael Haggerty @ 2015-04-30  8:12 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King; +Cc: David Turner, git mailing list

On 04/30/2015 07:34 AM, Junio C Hamano wrote:
> [...]
> But stepping back a bit.
> 
> We have been talking about HEAD^{resolve}:fleem but how did we learn
> that there is a path "fleem" in the tree of HEAD in the first place?
> I would presume that the answer eventually boils down to "somebody
> fed HEAD to 'ls-tree -r'", and then that somebody is an idiot if it
> did not grab the mode bits to learn what kind of blob fleem is, or
> if it did not tell the guy that wants to drive "cat-file --batch".

I think a plausible use case for this feature is to read
`$tag^{resolve}:RelNotes`, in which case the reason we know it's there
is "because the maintainer told us it is there".

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30  3:37                   ` Jeff King
  2015-04-30  5:34                     ` Junio C Hamano
@ 2015-04-30 10:04                     ` Andreas Schwab
  2015-04-30 18:27                       ` Jeff King
  2015-04-30 19:25                     ` David Turner
  2015-05-01  3:29                     ` David Turner
  3 siblings, 1 reply; 43+ messages in thread
From: Andreas Schwab @ 2015-04-30 10:04 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, Junio C Hamano, git mailing list

Jeff King <peff@peff.net> writes:

>   4. Return the last object we could resolve, as I described. So
>      foo/bar/baz (with "../../../external" as its content) in this case.
>      When you resolve a name, you can ask for the context we discovered
>      along the way by traversing the tree. The mode is one example we've
>      already discussed, but the path name is another. So something like:
>
>        echo "HEAD^{resolve}:fleem" |
>        git cat-file --batch="%(objectname) %(size) %(intreepath)"
>
>      would show:
>
>        1234abcd 17 foo/bar/baz
>        ../../../external
>
>      And then the caller knows that the path is not relative to the
>      original "fleem", but rather to "foo/bar/baz".

Note that ".." will always follow the *physical* structure, so if
foo/bar/baz is walking over symbolic links, "../../.." may lead you to
somewhere else entirely.
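
A small filesystem sketch of that behaviour, using an ordinary temporary
directory rather than a git tree (all names here are made up for the
example):

```shell
top=$(mktemp -d) && cd "$top"
mkdir -p real/sub
ln -s real/sub link    # "link" sits at the top, but points two levels down
: >real/in_real        # marker inside real/
: >at_top              # marker next to "link"

# ".." is applied after the link is followed, so this lists real/,
# not the directory that holds "link".
listing=$(ls link/..)
echo "$listing"
```

A textual simplification of "link/.." to the top directory would
therefore name a different place than the one the kernel reaches.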

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30  8:12                       ` Michael Haggerty
@ 2015-04-30 18:03                         ` David Turner
  2015-04-30 18:19                           ` Junio C Hamano
  0 siblings, 1 reply; 43+ messages in thread
From: David Turner @ 2015-04-30 18:03 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, Jeff King, git mailing list

On Thu, 2015-04-30 at 10:12 +0200, Michael Haggerty wrote:
> On 04/30/2015 07:34 AM, Junio C Hamano wrote:
> > [...]
> > But stepping back a bit.
> > 
> > We have been talking about HEAD^{resolve}:fleem but how did we learn
> > that there is a path "fleem" in the tree of HEAD in the first place?
> > I would presume that the answer eventually boils down to "somebody
> > fed HEAD to 'ls-tree -r'", and then that somebody is an idiot if it
> > did not grab the mode bits to learn what kind of blob fleem is, or
> > if it did not tell the guy that wants to drive "cat-file --batch".
> 
> I think a plausible use case for this feature is to read
> `$tag^{resolve}:RelNotes`, in which case the reason we know it's there
> is "because the maintainer told us it is there".

Yes, that is approximately my use case.  Read on for details:
With a colleague, I'm building a mode for the free and open source Pants
build system that will support build-aware sparse checkouts.  Pants is a
build tool for monorepos (inspired by Google's Blaze and similar to
Facebook's Buck).  Most individual users will only be using a tiny
subset of the full repository, so it would be convenient if they only
had to check out what they plan to use.  Assume that they want to check
out only a certain target (a path, approximately) plus its transitive
dependencies, on a certain revision.  So pants first checks that
directory (at that rev) for a BUILD file.  That BUILD file might point
to other BUILD files as dependencies, so again, we must examine those,
recursively.  

In no case did we do a ls-files command, since we want to examine as
little of the repo as possible.  And even if we had done an ls-files, we
would still need to resolve all of the symlinks ourselves.

So that's the motivation here.

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30 18:03                         ` David Turner
@ 2015-04-30 18:19                           ` Junio C Hamano
  2015-04-30 18:28                             ` David Turner
  0 siblings, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2015-04-30 18:19 UTC (permalink / raw)
  To: David Turner; +Cc: Michael Haggerty, Jeff King, git mailing list

David Turner <dturner@twopensource.com> writes:

> In no case did we do a ls-files command,...

"ls-tree -r" is what I would have imagined you would be using, as
somebody needs to have the full repository in order to resolve the
symbolic links _anyway_, and that somebody does not need to have a
checkout in order to do so.

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30 10:04                     ` Andreas Schwab
@ 2015-04-30 18:27                       ` Jeff King
  2015-04-30 19:18                         ` Junio C Hamano
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2015-04-30 18:27 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: David Turner, Junio C Hamano, git mailing list

On Thu, Apr 30, 2015 at 12:04:10PM +0200, Andreas Schwab wrote:

> Jeff King <peff@peff.net> writes:
> 
> >   4. Return the last object we could resolve, as I described. So
> >      foo/bar/baz (with "../../../external" as its content) in this case.
> >      When you resolve a name, you can ask for the context we discovered
> >      along the way by traversing the tree. The mode is one example we've
> >      already discussed, but the path name is another. So something like:
> >
> >        echo "HEAD^{resolve}:fleem" |
> >        git cat-file --batch="%(objectname) %(size) %(intreepath)"
> >
> >      would show:
> >
> >        1234abcd 17 foo/bar/baz
> >        ../../../external
> >
> >      And then the caller knows that the path is not relative to the
> >      original "fleem", but rather to "foo/bar/baz".
> 
> Note that ".." will always follow the *physical* structure, so if
> foo/bar/baz is walking over symbolic links, "../../.." may lead you to
> somewhere else entirely.

True. I had not considered that, as git does not walk over such symlinks
at all currently. But presumably one would want it to, in order to
implement this kind of "follow symlink" logic. IOW, we cannot just look up
"foo/bar/baz" in the first place, as that may not even exist in the
tree; we may need to realize that "foo" is a symlink and resolve that
first, then find "bar/baz" in the destination.

Which means that I think this has to be implemented as part of the name
resolution (i.e., the "^{resolve}") proposal. cat-file could not say:

  get_sha1_with_context("HEAD:foo/bar/baz", sha1, &ctx);
  if (S_ISLNK(ctx.mode))
     ... resolve ...

The initial get_sha1 would fail if "foo" is a symlink. Likewise, one
cannot implement this by querying cat-file repeatedly without asking for
each leading prefix (so ask for "HEAD:foo", see if it's a link, then
"HEAD:foo/bar", etc).

Of course it does not _have_ to be part of the normal get_sha1 name
resolution. But if not, it would have to reimplement the tree-walking
part of that name resolution.
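
The prefix-by-prefix query described above can be sketched with stock
git. first_symlink_prefix is a hypothetical helper, and the repository
layout is an assumption for the example; it relies only on "git ls-tree"
printing the mode of a single named path.

```shell
# first_symlink_prefix REV PATH  (hypothetical helper)
# Print the shortest leading prefix of PATH that REV records as a symlink
# (mode 120000); print nothing and return 1 if no prefix is a symlink.
first_symlink_prefix() {
    rev=$1 rest=$2 prefix=
    while [ -n "$rest" ]; do
        part=${rest%%/*}
        case $rest in
            */*) rest=${rest#*/} ;;
            *)   rest= ;;
        esac
        prefix=${prefix:+$prefix/}$part
        mode=$(git ls-tree "$rev" "$prefix" | cut -d' ' -f1)
        if [ "$mode" = 120000 ]; then
            printf '%s\n' "$prefix"
            return 0
        fi
    done
    return 1
}

repo=$(mktemp -d) && cd "$repo" && git init -q
mkdir -p real/bar
echo content >real/bar/baz
ln -s real foo
git add real foo
git -c user.name=Example -c user.email=example@example.com commit -q -m demo

# The direct lookup fails because "foo" is a blob (the link), not a tree.
git rev-parse -q --verify HEAD:foo/bar/baz >/dev/null ||
    echo "HEAD:foo/bar/baz does not resolve"

found=$(first_symlink_prefix HEAD foo/bar/baz)
echo "first symlink prefix: $found"
```

Each prefix costs one extra git invocation here, which is the overhead
that doing the walk inside name resolution would avoid.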

Thanks for giving another interesting case to consider.

-Peff

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30 18:19                           ` Junio C Hamano
@ 2015-04-30 18:28                             ` David Turner
  2015-04-30 18:32                               ` Jeff King
  0 siblings, 1 reply; 43+ messages in thread
From: David Turner @ 2015-04-30 18:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael Haggerty, Jeff King, git mailing list

On Thu, 2015-04-30 at 11:19 -0700, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
> 
> > In no case did we do a ls-files command,...
> 
> "ls-tree -r" is what I would have imagined you would be using, as
> somebody needs to have the full repository in order to resolve the
> symbolic links _anyway_, and that somebody does not need to have a
> checkout in order to do so.

Yes, they have the full repo, but we are only exploring a small fraction
of it. ls-tree -r would require parsing the entire thing.

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30 18:28                             ` David Turner
@ 2015-04-30 18:32                               ` Jeff King
  2015-04-30 18:44                                 ` David Turner
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2015-04-30 18:32 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, Michael Haggerty, git mailing list

On Thu, Apr 30, 2015 at 11:28:42AM -0700, David Turner wrote:

> On Thu, 2015-04-30 at 11:19 -0700, Junio C Hamano wrote:
> > David Turner <dturner@twopensource.com> writes:
> > 
> > > In no case did we do a ls-files command,...
> > 
> > "ls-tree -r" is what I would have imagined you would be using, as
> > somebody needs to have the full repository in order to resolve the
> > symbolic links _anyway_, and that somebody does not need to have a
> > checkout in order to do so.
> 
> Yes, they have the full repo, but we are only exploring a small fraction
> of it. ls-tree -r would require parsing the entire thing.

git ls-tree HEAD -- BUILD ?

-Peff

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30 18:32                               ` Jeff King
@ 2015-04-30 18:44                                 ` David Turner
  2015-04-30 18:49                                   ` Jeff King
  0 siblings, 1 reply; 43+ messages in thread
From: David Turner @ 2015-04-30 18:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Michael Haggerty, git mailing list

On Thu, 2015-04-30 at 14:32 -0400, Jeff King wrote:
> On Thu, Apr 30, 2015 at 11:28:42AM -0700, David Turner wrote:
> 
> > On Thu, 2015-04-30 at 11:19 -0700, Junio C Hamano wrote:
> > > David Turner <dturner@twopensource.com> writes:
> > > 
> > > > In no case did we do a ls-files command,...
> > > 
> > > "ls-tree -r" is what I would have imagined you would be using, as
> > > somebody needs to have the full repository in order to resolve the
> > > symbolic links _anyway_, and that somebody does not need to have a
> > > checkout in order to do so.
> > 
> > Yes, they have the full repo, but we are only exploring a small fraction
> > of it. ls-tree -r would require parsing the entire thing.
> 
> git ls-tree HEAD -- BUILD ?


This does not actually seem to work (even with -r); it only recurses
into directories that are named BUILD, rather than being equivalent to
git ls-tree -r HEAD |grep /BUILD$.

Also, BUILD files are scattered throughout the tree, so the entire tree
would still need to be traversed.  At present, our monorepo is not quite
large enough for this to matter (a full ls-tree only takes me 0.6s), but
it is growing.
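
A sketch of the full-traversal variant, for reference. The file layout is
made up; --name-only keeps the mode and object id out of the way, and the
pattern is anchored so that a BUILD file at the top level (which is not
preceded by a slash) is matched as well:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
mkdir -p a/b c
: >BUILD
: >a/b/BUILD
: >c/BUILD.bak
: >c/notBUILD
git add .
git -c user.name=Example -c user.email=example@example.com commit -q -m demo

# Walks every tree in the revision, then filters on the last component.
builds=$(git ls-tree -r --name-only HEAD | grep -E '(^|/)BUILD$')
echo "$builds"
```

This lists BUILD and a/b/BUILD, but only after git has opened every tree,
which is the cost that grows with the repository.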

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30 18:44                                 ` David Turner
@ 2015-04-30 18:49                                   ` Jeff King
  2015-04-30 19:00                                     ` David Turner
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2015-04-30 18:49 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, Michael Haggerty, git mailing list

On Thu, Apr 30, 2015 at 11:44:50AM -0700, David Turner wrote:

> > git ls-tree HEAD -- BUILD ?
> 
> This does not actually seem to work (even with -r); it only recurses
> into directories that are named BUILD, rather than being equivalent to
> git ls-tree -r HEAD |grep /BUILD$.

Ah, I thought that was what you wanted (to find specific files, not a
pattern). I think `ls-tree` doesn't understand our normal pathspecs, for
historical reasons.

> Also, BUILD files are scattered throughout the tree, so the entire tree
> would still need to be traversed.  At present, our monorepo is not quite
> large enough for this to matter (a full ls-tree only takes me 0.6s), but
> it is growing.

But aren't you asking git to do that internally? I.e., it can limit the
traversal for a prefix-match, but it cannot do so for an arbitrary
filename. It has to open every tree. So the extra expense is really just
the I/O over the pipe. That's not optimal, but it is a constant factor
slowdown from what git would do internally.

-Peff

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30 18:49                                   ` Jeff King
@ 2015-04-30 19:00                                     ` David Turner
  2015-04-30 19:10                                       ` Jeff King
  0 siblings, 1 reply; 43+ messages in thread
From: David Turner @ 2015-04-30 19:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Michael Haggerty, git mailing list

On Thu, 2015-04-30 at 14:49 -0400, Jeff King wrote:
> On Thu, Apr 30, 2015 at 11:44:50AM -0700, David Turner wrote:
> 
> > > git ls-tree HEAD -- BUILD ?
> > 
> > This does not actually seem to work (even with -r); it only recurses
> > into directories that are named BUILD, rather than being equivalent to
> > git ls-tree -r HEAD |grep /BUILD$.
> 
> Ah, I thought that was what you wanted (to find specific files, not a
> pattern). I think `ls-tree` doesn't understand our normal pathspecs, for
> historical reasons.
> 
> > Also, BUILD files are scattered throughout the tree, so the entire tree
> > would still need to be traversed.  At present, our monorepo is not quite
> > large enough for this to matter (a full ls-tree only takes me 0.6s), but
> > it is growing.
> 
> But aren't you asking git to do that internally? I.e., it can limit the
> traversal for a prefix-match, but it cannot do so for an arbitrary
> filename. It has to open every tree. So the extra expense is really just
> the I/O over the pipe. That's not optimal, but it is a constant factor
> slowdown from what git would do internally.

No, I'm not trying to find all BUILD files -- only ones that are in the
transitive dependency tree of the target I'm trying to sparsely check
out. So if the target foo/bar/baz depends on morx/fleem, and morx/fleem
depends on plugh/xyzzy, then I have to examine those three places only.
I don't have to examine anything in the gibberish/ subtree, for
instance.  

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30 19:00                                     ` David Turner
@ 2015-04-30 19:10                                       ` Jeff King
  2015-04-30 19:17                                         ` David Turner
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2015-04-30 19:10 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, Michael Haggerty, git mailing list

On Thu, Apr 30, 2015 at 12:00:22PM -0700, David Turner wrote:

> > > Also, BUILD files are scattered throughout the tree, so the entire tree
> > > would still need to be traversed.  At present, our monorepo is not quite
> > > large enough for this to matter (a full ls-tree only takes me 0.6s), but
> > > it is growing.
> > 
> > But aren't you asking git to do that internally? I.e., it can limit the
> > traversal for a prefix-match, but it cannot do so for an arbitrary
> > filename. It has to open every tree. So the extra expense is really just
> > the I/O over the pipe. That's not optimal, but it is a constant factor
> > slowdown from what git would do internally.
> 
> No, I'm not trying to find all BUILD files -- only ones that are in the
> transitive dependency tree of the target I'm trying to sparsely check
> out. So if the target foo/bar/baz depends on morx/fleem, and morx/fleem
> depends on plugh/xyzzy, then I have to examine those three places only.
> I don't have to examine anything in the gibberish/ subtree, for
> instance.

OK, let me see if I understand your use case by parroting it back.

You _don't_ want to feed git a "find all BUILD" pattern, which is good
(because it doesn't work ;) ). You do want to feed it a set of raw
paths to find, because you're going to discover the paths yourself at
each step as you recurse through the dependency-chain of build files. 
You don't actually care about feeding those paths to "ls-tree" at all.
You care only about the _content_ at each path (and will parse that
content to see if you need to take a further recursive step).

So I think git out-of-the-box supports that pretty well (via cat-file).
And your sticking point is that some of the paths may involve symlinks
in the tree, so you want cat-file to answer "if I had checked this out
and typed cat /some/path/to/BUILD, what content would I get". Which
brings us back to the original symlink question.

Is that all accurate?

I'm not sure that helps with the "how to handle symlinks" discussion,
but at least your goals make sense to me at this point.

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30 19:10                                       ` Jeff King
@ 2015-04-30 19:17                                         ` David Turner
  0 siblings, 0 replies; 43+ messages in thread
From: David Turner @ 2015-04-30 19:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Michael Haggerty, git mailing list

On Thu, 2015-04-30 at 15:10 -0400, Jeff King wrote:
> On Thu, Apr 30, 2015 at 12:00:22PM -0700, David Turner wrote:
> 
> > > > Also, BUILD files are scattered throughout the tree, so the entire tree
> > > > would still need to be traversed.  At present, our monorepo is not quite
> > > > large enough for this to matter (a full ls-tree only takes me 0.6s), but
> > > > it is growing.
> > > 
> > > But aren't you asking git to do that internally? I.e., it can limit the
> > > traversal for a prefix-match, but it cannot do so for an arbitrary
> > > filename. It has to open every tree. So the extra expense is really just
> > > the I/O over the pipe. That's not optimal, but it is a constant factor
> > > slowdown from what git would do internally.
> > 
> > No, I'm not trying to find all BUILD files -- only ones that are in the
> > transitive dependency tree of the target I'm trying to sparsely check
> > out. So if the target foo/bar/baz depends on morx/fleem, and morx/fleem
> > depends on plugh/xyzzy, then I have to examine those three places only.
> > I don't have to examine anything in the gibberish/ subtree, for
> > instance.
> 
> OK, let me see if I understand your use case by parroting it back.
> 
> You _don't_ want to feed git a "find all BUILD" pattern, which is good
> (because it doesn't work ;) ). You do want to feed it a set of raw
> paths to find, because you're going to discover the paths yourself at
> each step as you recurse through the dependency-chain of build files. 
> You don't actually care about feeding those paths to "ls-tree" at all.
> You care only about the _content_ at each path (and will parse that
> content to see if you need to take a further recursive step).
> 
> So I think git out-of-the-box supports that pretty well (via cat-file).
> And your sticking point is that some of the paths may involve symlinks
> in the tree, so you want cat-file to answer "if I had checked this out
> and typed cat /some/path/to/BUILD, what content would I get". Which
> brings us back to the original symlink question.
> 
> Is that all accurate?

Yes.  That is a very good summary.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30 18:27                       ` Jeff King
@ 2015-04-30 19:18                         ` Junio C Hamano
  0 siblings, 0 replies; 43+ messages in thread
From: Junio C Hamano @ 2015-04-30 19:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Andreas Schwab, David Turner, git mailing list

Jeff King <peff@peff.net> writes:

> Which means that I think this has to be implemented as part of the name
> resolution (i.e., the "^{resolve}") proposal. cat-file could not say:
>
>   get_sha1_with_context("HEAD:foo/bar/baz", sha1, &ctx);
>   if (S_ISLNK(ctx.mode))
>      ... resolve ...
>
> The initial get_sha1 would fail if "foo" is a symlink. Likewise, one
> cannot implement this by querying cat-file repeatedly without asking for
> each leading prefix (so ask for "HEAD:foo", see if it's a link, then
> "HEAD:foo/bar", etc).
>
> Of course it does not _have_ to be part of the normal get_sha1 name
> resolution. But if not, it would have to reimplement the tree-walking
> part of that name resolution.
>
> Thanks for giving another interesting case to consider.

Yup, everything above makes sense, and I think it is an argument for
making this new feature part of the sha1-name infrastructure, if
only because it has to do some sort of tree-walking already anyway.
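
As an illustration of why a caller outside the name-resolution code is
stuck asking for each leading prefix, here is a toy model. It is only a
sketch: the tree is a nested dict, and Symlink, lookup and
first_symlink_prefix are hypothetical names, not anything in git.

```python
class Symlink:
    """Toy stand-in for a symlink tree entry; `target` is the link text."""

    def __init__(self, target):
        self.target = target


def lookup(tree, path):
    """Plain lookup that does not follow symlinks; None if it cannot descend."""
    node = tree
    for name in path.split("/"):
        if not isinstance(node, dict) or name not in node:
            return None
        node = node[name]
    return node


def first_symlink_prefix(tree, path):
    """Ask for each leading prefix in turn and report the first symlink."""
    parts = path.split("/")
    for i in range(1, len(parts) + 1):
        prefix = "/".join(parts[:i])
        if isinstance(lookup(tree, prefix), Symlink):
            return prefix
    return None


tree = {
    "real": {"bar": {"baz": "content\n"}},
    "foo": Symlink("real"),
}

print(lookup(tree, "foo/bar/baz"))                # the one-shot lookup fails
print(first_symlink_prefix(tree, "foo/bar/baz"))  # the prefix walk finds "foo"
```

The one-shot lookup has nothing to inspect afterwards, so the mode check
never gets a chance to run; the prefix walk is the tree-walking that
would otherwise have to be reimplemented.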

Thanks.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30  3:37                   ` Jeff King
  2015-04-30  5:34                     ` Junio C Hamano
  2015-04-30 10:04                     ` Andreas Schwab
@ 2015-04-30 19:25                     ` David Turner
  2015-04-30 19:46                       ` Junio C Hamano
  2015-05-01  3:29                     ` David Turner
  3 siblings, 1 reply; 43+ messages in thread
From: David Turner @ 2015-04-30 19:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git mailing list

On Wed, 2015-04-29 at 23:37 -0400, Jeff King wrote:
>   3. Return an object with the symlink relative to the original
>      filename (so "../external" in this case). This is kind of weird,
>      though, because we're not just returning a string from the name
>      resolution. It's an actual object.  So we'd be generating a fake
>      object that doesn't actually exist in the object db and
>      returning that. Feeding that sha1 to another program would fail.

> I can't say that I'm excited about any of them. Perhaps you or somebody
> else can think of a more clever solution.
> 
> Note that the complication with (3) does come from my trying to push
> this down into the name-resolution code.

All else being equal, I would prefer the more general solution.  But
here, the generality comes with a price that seems somewhat high. 

When I think about the commands that might use this, cat-file and
ls-tree are at the top of the list (although as noted, I am only likely
to use cat-file, and it's not clear what ls-tree should do in the event
of an out-of-repo link). 

I could imagine someone caring about grep and diff.  Someone who cares
about grep would likely want it to be willing to go out-of-worktree (as
opposed to silently missing things).  I think we all agree that having
git go out-of-worktree is a mistake, so I'm not sure this use-case is
one that is supportable.

The weirdest case is log.  If I say git log HEAD^{resolve} --
foo/bar/baz, does it mean "commits that have touched what is now pointed
to by foo/bar/baz"?  Or does it mean "commits that have touched a thing
that was at that time pointed to by foo/bar/baz"? [1]  The second one is
more useful, since it could not otherwise be achieved.  But I think this
would require additional code in log on top of whatever additional code
is in sha1_name.  In other words, we would not get it for free just by
adjusting sha1_name.

Are there other relevant commands that I'm missing?  

If not, I think we should reconsider the original thought of just
supporting cat-file.  The nice thing about just supporting cat-file is
that for out-of-repo links we can add a special form to the output, that
does not contain a sha (since there is no corresponding sha in the
repo).  In other words, something like your solution 3, quoted above.


[1] See page 11 of http://inform7.com/learn/documents/WhitePaper.pdf
("Has the president ever been ill?") for a similar case.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30 19:25                     ` David Turner
@ 2015-04-30 19:46                       ` Junio C Hamano
  2015-04-30 19:51                         ` Jeff King
  2015-04-30 20:05                         ` Junio C Hamano
  0 siblings, 2 replies; 43+ messages in thread
From: Junio C Hamano @ 2015-04-30 19:46 UTC (permalink / raw)
  To: David Turner; +Cc: Jeff King, git mailing list

David Turner <dturner@twopensource.com> writes:

> The weirdest case is log.  If I say git log HEAD^{resolve} --
> foo/bar/baz,...

That invocation does not make any sense to me, at least within the
context of what has been discussed for ^{resolve}, which is an
instruction to the "name to object name" mapping layer to notice
symbolic links while it traverses the tree containment relationships
starting from the root of the tree to arrive at a single object
name.

    git rev-parse HEAD^{resolve}:path/that/might/involve/symlink/some/where
    git cat-file HEAD^{resolve}:path/that/might/involve/symlink/some/where
    git grep -e pattern HEAD^{resolve}:path/that/might/involve/symlink/some/where

would, though.  In other words, ^{resolve} that is not followed by a
colon and path is something entirely different from what we have
been discussing.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30 19:46                       ` Junio C Hamano
@ 2015-04-30 19:51                         ` Jeff King
  2015-04-30 20:05                         ` Junio C Hamano
  1 sibling, 0 replies; 43+ messages in thread
From: Jeff King @ 2015-04-30 19:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Turner, git mailing list

On Thu, Apr 30, 2015 at 12:46:02PM -0700, Junio C Hamano wrote:

> David Turner <dturner@twopensource.com> writes:
> 
> > The weirdest case is log.  If I say git log HEAD^{resolve} --
> > foo/bar/baz,...
> 
> That invocation does not make any sense to me, at least within the
> context of what has been discussed for ^{resolve}, which is an
> instruction to the "name to object name" mapping layer to notice
> symbolic links while it traverses the tree containment relationships
> starting from the root of the tree to arrive at a single object
> name.
> 
>     git rev-parse HEAD^{resolve}:path/that/might/involve/symlink/some/where
>     git cat-file HEAD^{resolve}:path/that/might/involve/symlink/some/where
>     git grep -e pattern HEAD^{resolve}:path/that/might/involve/symlink/some/where
> 
> would, though.  In other words, ^{resolve} that is not followed by a
> colon and path is something entirely different from what we have
> been discussing.

Yeah, I agree that HEAD^{resolve} without a colon does not make any
sense. In fact, I wanted to originally suggest a syntax that replaced
the colon with something else, to make it clear that the modifier is
really about the colon. But I could not think of a character that was
readable and would not have backward-compatibility issues.

I guess you could spell it:

  HEAD^{resolve:foo/bar/baz}

but that opens up parsing questions for the filename. Would we allow "}"
in the filename? Or require that the "}" balance, which means
speculative parsing if there is more content after the trailing "}"
(e.g., you could in theory resolve to a tree and then stick a ":" with
more path after that. Yech).
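
To show the ambiguity, here is a small sketch of two naive ways to split
that spelling. The syntax itself is only the hypothetical one floated
above, and both helper names are invented for the example.

```python
MARKER = "^{resolve:"


def split_at_first_brace(spec):
    """Treat the first "}" after the marker as the end of the path."""
    prefix, rest = spec.split(MARKER, 1)
    path, tail = rest.split("}", 1)
    return prefix, path, tail


def split_at_last_brace(spec):
    """Treat the last "}" in the string as the end of the path."""
    prefix, rest = spec.split(MARKER, 1)
    path, tail = rest.rsplit("}", 1)
    return prefix, path, tail


spec = "HEAD^{resolve:odd}name}:sub/file"
print(split_at_first_brace(spec))
print(split_at_last_brace(spec))
```

The first strategy cannot express a filename containing "}", and the
second gives a different path for the same input and still breaks if
the trailing part itself contains a "}".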

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30 19:46                       ` Junio C Hamano
  2015-04-30 19:51                         ` Jeff King
@ 2015-04-30 20:05                         ` Junio C Hamano
  1 sibling, 0 replies; 43+ messages in thread
From: Junio C Hamano @ 2015-04-30 20:05 UTC (permalink / raw)
  To: git mailing list; +Cc: Jeff King, David Turner

Junio C Hamano <gitster@pobox.com> writes:

> ... In other words, ^{resolve} that is not followed by a
> colon and path is something entirely different from what we have
> been discussing.

Having said that, I am not saying that such an alternative "follow
symbolic links in many other places" is a worthless suggestion.

It just does not work as an extended SHA-1 syntax.

    git rev-parse HEAD^{resolve}:RelNotes HEAD:Documentation

would make sense; RelNotes, if it were a symbolic link, is resolved,
while Documentation will never be.  On the other hand

    git log next^{resolve} master -- Documentation

will not make any sense, as it is a totally conflicting request.
Does it resolve symlinks only when encountering a commit that the
traversal that started from 'next' happened to have reached before
the traversal from 'master' got there?  What should happen for
commits that are reachable from both?

So even if (and this is a big if) such an "aggressive symlink
following" mode were useful for some commands, I think the switch
belongs to the command, not the per-object syntax.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC: git cat-file --follow-symlinks?
  2015-04-30  3:37                   ` Jeff King
                                       ` (2 preceding siblings ...)
  2015-04-30 19:25                     ` David Turner
@ 2015-05-01  3:29                     ` David Turner
  2015-05-01  5:36                       ` Jeff King
  3 siblings, 1 reply; 43+ messages in thread
From: David Turner @ 2015-05-01  3:29 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git mailing list

On Wed, 2015-04-29 at 23:37 -0400, Jeff King wrote:
> On Wed, Apr 29, 2015 at 06:45:45PM -0700, David Turner wrote:
> 
> > On Wed, 2015-04-29 at 21:16 -0400, Jeff King wrote:
> > > On Wed, Apr 29, 2015 at 06:06:23PM -0700, David Turner wrote:
> > >   3. Ditto for out-of-tree. Note that this would be the _raw_ symlink
> > >      contents, not any kind of simplification (so if you asked for
> > >      "foo/bar/baz" and it was "../../../../out", you would get the full path
> > >      with all those dots, not a simplified "../out", which I think is
> > >      what you were trying to show in earlier examples).
> > 
> > Unfortunately, we need the simplified version, because we otherwise
> > don't know what the ..s are relative to in the case of a link to a link:
> > 
> >   echo content >dest ;# actual blob
> >   mkdir -p foo/bar
> >   ln -s foo/bar/baz fleem             # in-tree link-to-link 
> >   ln -s ../../../external foo/bar/baz # out-of-tree link
> > 
> > If echo HEAD^{resolve}:fleem were to return ../../../external (after
> > following the first symlink to the second), we would have lost
> > information.
> 
> Urgh, yeah, thanks for the counter-example.
> 
> Here are some possible alternatives:
> 
>   1. If we can't resolve fully, don't resolve anything. I.e., return the
>      "fleem" object here, and the caller can recurse if they want. This is
>      simple and correct, but not as helpful to somebody who wants to follow
>      the out-of-tree link (they have to re-traverse the fleem->foo/bar/baz
>      link themselves).
> 
>   2. Consider it an error if resolution fails. If you ask for
>      "HEAD^{tree}^{commit}", that does not resolve to anything (because
>      we can't peel the tree to a commit). Like (1), this is simple and
>      correct, but probably not all that helpful. The caller has to
>      start from scratch and resolve themselves, rather than getting an
>      intermediate result.
> 
>   3. Return an object with the symlink relative to the original
>      filename (so "../external" in this case). This is kind of weird,
>      though, because we're not just returning a string from the name
>      resolution. It's an actual object.  So we'd be generating a fake
>      object that doesn't actually exist in the object db and
>      returning that. Feeding that sha1 to another program would fail.
> 
>   4. Return the last object we could resolve, as I described. So
>      foo/bar/baz (with "../../../external" as its content) in this case.
>      When you resolve a name, you can ask for the context we discovered
>      along the way by traversing the tree. The mode is one example we've
>      already discussed, but the path name is another. So something like:
> 
>        echo "HEAD^{resolve}:fleem" |
>        git cat-file --batch="%(objectname) %(size) %(intreepath)"
> 
>      would show:
> 
>        1234abcd 17 foo/bar/baz
>        ../../../external
> 
>      And then the caller knows that the path is not relative to the
>      original "fleem", but rather to "foo/bar/baz".
> 
>      The problem is that although this context lookup is already part of
>      get_sha1_with_context, that is not exposed through every interface.
>      E.g., "git rev-parse HEAD^{resolve}:fleem" will give you an object,
>      but you have no way of knowing the context.
> 
> I can't say that I'm excited about any of them. Perhaps you or somebody
> else can think of a more clever solution.
> 
> Note that the complication with (3) does come from my trying to push
> this down into the name-resolution code.

Actually, I think 4 has an insurmountable problem.  Here's the case I'm
thinking of:

ln -s ..  morx

Imagine that we go to look up 'morx/fleem'.  Now morx is the "last
object we could resolve", but we don't know how much of our input has
been consumed at this point.  So consumers don't know that after they
exit the repo, they still need to find fleem next to it.
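
Here is a sketch of the problem, and of what a walker would have to hand
back for the consumer to continue: the unconsumed remainder of the path,
not just the last object. The tree is a toy nested dict, and Symlink,
resolve and the 40-link default are assumptions for illustration, not
git code.

```python
class Symlink:
    """Toy stand-in for a symlink tree entry; `target` is the link text."""

    def __init__(self, target):
        self.target = target


def resolve(root, path, max_links=40):
    """Walk `path` from `root` (nested dicts), following symlinks.

    Returns ("found", entry) when resolution stays inside the tree, or
    ("outside", remainder), where remainder is the still-unresolved path
    relative to the top of the tree (or absolute for absolute links).
    """
    pending = [p for p in path.split("/") if p]
    dirs = [root]                  # dirs[-1] is the directory being searched
    links = 0
    while pending:
        name = pending.pop(0)
        if name == ".":
            continue
        if name == "..":
            if len(dirs) == 1:     # already at the top: leaving the tree
                return "outside", "/".join([".."] + pending)
            dirs.pop()
            continue
        entry = dirs[-1][name]     # KeyError if the path does not exist
        if isinstance(entry, Symlink):
            links += 1
            if links > max_links:
                raise RuntimeError("too many levels of symbolic links")
            if entry.target.startswith("/"):
                return "outside", "/".join([entry.target] + pending)
            # Link text is relative to the directory holding the link,
            # which is still dirs[-1]; splice it in front of what is left.
            pending = [p for p in entry.target.split("/") if p] + pending
        elif isinstance(entry, dict):
            dirs.append(entry)
        elif pending:
            raise NotADirectoryError(name)
        else:
            return "found", entry
    return "found", dirs[-1]


tree = {
    "dest": "content\n",
    "morx": Symlink(".."),
    "fleem": Symlink("foo/bar/baz"),
    "foo": {"bar": {"baz": Symlink("../../../external"),
                    "ok": Symlink("../../dest")}},
}

print(resolve(tree, "morx/fleem"))
print(resolve(tree, "fleem"))
print(resolve(tree, "foo/bar/ok"))
```

For morx/fleem the answer has to carry "../fleem", which is not the
content of any object in the tree; returning morx alone would drop the
"fleem" part.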

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC: git cat-file --follow-symlinks?
  2015-05-01  3:29                     ` David Turner
@ 2015-05-01  5:36                       ` Jeff King
  2015-05-01 17:29                         ` David Turner
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2015-05-01  5:36 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, git mailing list

On Thu, Apr 30, 2015 at 08:29:14PM -0700, David Turner wrote:

> >   4. Return the last object we could resolve, as I described. So
> [...]
> 
> Actually, I think 4 has an insurmountable problem.  Here's the case I'm
> thinking of:
> 
> ln -s ..  morx
> 
> Imagine that we go to look up 'morx/fleem'.  Now morx is the "last
> object we could resolve", but we don't know how much of our input has
> been consumed at this point.  So consumers don't know that after they
> exit the repo, they still need to find fleem next to it.

Yes, agreed (my list was written before Andreas brought up the idea of
symlinks in the intermediate paths). I think to let the caller pick up
where you left off, you would have to create a new string that has the
"remainder" concatenated to it.

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC: git cat-file --follow-symlinks?
  2015-05-01  5:36                       ` Jeff King
@ 2015-05-01 17:29                         ` David Turner
  2015-05-01 20:11                           ` Jeff King
  0 siblings, 1 reply; 43+ messages in thread
From: David Turner @ 2015-05-01 17:29 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git mailing list

On Fri, 2015-05-01 at 01:36 -0400, Jeff King wrote:
> On Thu, Apr 30, 2015 at 08:29:14PM -0700, David Turner wrote:
> 
> > >   4. Return the last object we could resolve, as I described. So
> > [...]
> > 
> > Actually, I think 4 has an insurmountable problem.  Here's the case I'm
> > thinking of:
> > 
> > ln -s ..  morx
> > 
> > Imagine that we go to look up 'morx/fleem'.  Now morx is the "last
> > object we could resolve", but we don't know how much of our input has
> > been consumed at this point.  So consumers don't know that after they
> > exit the repo, they still need to find fleem next to it.
> 
> Yes, agreed (my list was written before Andreas brought up the idea of
> symlinks in the intermediate paths). I think to let the caller pick up
> where you left off, you would have to create a new string that has the
> "remainder" concatenated to it.

Since that new string does not exist in the object db, isn't that pretty
much proposal 3?  We could, in this case, provide a fake sha as well
("0"*40), to make it clear that the object does not exist.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC: git cat-file --follow-symlinks?
  2015-05-01 17:29                         ` David Turner
@ 2015-05-01 20:11                           ` Jeff King
  2015-05-01 21:09                             ` David Turner
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2015-05-01 20:11 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, git mailing list

On Fri, May 01, 2015 at 10:29:15AM -0700, David Turner wrote:

> > > Actually, I think 4 has an insurmountable problem.  Here's the case I'm
> > > thinking of:
> > > 
> > > ln -s ..  morx
> > > 
> > > Imagine that we go to look up 'morx/fleem'.  Now morx is the "last
> > > object we could resolve", but we don't know how much of our input has
> > > been consumed at this point.  So consumers don't know that after they
> > > exit the repo, they still need to find fleem next to it.
> > 
> > Yes, agreed (my list was written before Andreas brought up the idea of
> > symlinks in the intermediate paths). I think to let the caller pick up
> > where you left off, you would have to create a new string that has the
> > "remainder" concatenated to it.
> 
> Since that new string does not exist in the object db, isn't that pretty
> much proposal 3?  We could, in this case, provide a fake sha as well
> ("0"*40), to make it clear that the object does not exist.

Yes, I think it is the same as proposal 3. Complete with all of the
fake-object awkwardness. I'm not sure I like the fake-sha1 idea. The
general pattern for accessing an object is:

  1. Turn some user-provided name into an object (get_sha1).

  2. Retrieve that object content (read_sha1_file).

By pushing the symlink resolution into step 1, it "just works"
everywhere. But if we hand back a fake sha1, now every call-site has to
be aware of it.

I think the solutions range from:

  a. Put resolution in get_sha1. Return an error when we can't
     resolve. Callers are on their own to do anything else.

  b. Put resolution in get_sha1. If we can't resolve, return an error.
     If the _with_context variant is called, leave our partial result
     string there. Some callers may choose to expose that information
     (e.g., cat-file might), at which point the user can "pick up" where
     git leaves off for out-of-tree links.

  c. Forget about get_sha1. This gets implemented elsewhere (e.g., as a
     cat-file feature as you originally proposed).

Certainly (a) is tempting and simple, but my understanding of your use
case is that you would like to follow out-of-tree links.

It seems like (b) is the most flexible, in the sense that it would
solve your case, and allows "git rev-parse HEAD^{resolve}:foo" when the
result is well-formed inside the repository. But I wonder if it's
actually worth the complexity. Without exposing the information for the
user to continue the traversal, it seems like only half a solution for
those parts of the code. And we still have to design some kind of custom
output for cat-file to expose the context.

So maybe (c) really is the simplest way forward. I dunno. I know that's
coming full circle to your original proposal. Hopefully that isn't too
infuriating for you. ;)

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC: git cat-file --follow-symlinks?
  2015-05-01 20:11                           ` Jeff King
@ 2015-05-01 21:09                             ` David Turner
  0 siblings, 0 replies; 43+ messages in thread
From: David Turner @ 2015-05-01 21:09 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git mailing list

On Fri, 2015-05-01 at 16:11 -0400, Jeff King wrote:

> So maybe (c) really is the simplest way forward. I dunno. I know that's
> coming full circle to your original proposal. Hopefully that isn't too
> infuriating for you. ;)

It's exactly the opposite of infuriating.  Now I understand all the
issues much better, and it will probably be easier for me to implement.

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2015-05-01 21:09 UTC | newest]

Thread overview: 43+ messages
2015-04-29 20:57 RFC: git cat-file --follow-symlinks? David Turner
2015-04-29 21:16 ` Jonathan Nieder
2015-04-29 21:24   ` David Turner
2015-04-29 21:17 ` Junio C Hamano
2015-04-29 21:30   ` David Turner
2015-04-29 21:48     ` Jeff King
2015-04-29 22:19       ` Jonathan Nieder
2015-04-29 23:05         ` Jeff King
2015-04-29 22:29       ` David Turner
2015-04-29 23:11         ` Jeff King
2015-04-30  0:37           ` Jeff King
2015-04-30  1:06             ` David Turner
2015-04-30  1:16               ` Jeff King
2015-04-30  1:31                 ` Junio C Hamano
2015-04-30  3:18                   ` Jeff King
2015-04-30  1:45                 ` David Turner
2015-04-30  3:37                   ` Jeff King
2015-04-30  5:34                     ` Junio C Hamano
2015-04-30  8:12                       ` Michael Haggerty
2015-04-30 18:03                         ` David Turner
2015-04-30 18:19                           ` Junio C Hamano
2015-04-30 18:28                             ` David Turner
2015-04-30 18:32                               ` Jeff King
2015-04-30 18:44                                 ` David Turner
2015-04-30 18:49                                   ` Jeff King
2015-04-30 19:00                                     ` David Turner
2015-04-30 19:10                                       ` Jeff King
2015-04-30 19:17                                         ` David Turner
2015-04-30 10:04                     ` Andreas Schwab
2015-04-30 18:27                       ` Jeff King
2015-04-30 19:18                         ` Junio C Hamano
2015-04-30 19:25                     ` David Turner
2015-04-30 19:46                       ` Junio C Hamano
2015-04-30 19:51                         ` Jeff King
2015-04-30 20:05                         ` Junio C Hamano
2015-05-01  3:29                     ` David Turner
2015-05-01  5:36                       ` Jeff King
2015-05-01 17:29                         ` David Turner
2015-05-01 20:11                           ` Jeff King
2015-05-01 21:09                             ` David Turner
2015-04-29 21:49     ` Junio C Hamano
2015-04-29 22:47       ` David Turner
2015-04-30  8:10 ` Michael Haggerty
