* git-archive and tar options
@ 2011-07-13 23:34 Neal Kreitzinger
  2011-07-14  1:56 ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: Neal Kreitzinger @ 2011-07-13 23:34 UTC (permalink / raw)
  To: git

the git-archive manpage states:

"git archive [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>] [-o 
| --output=<file>] [--worktree-attributes] [--remote=<repo> 
[--exec=<git-upload-archive>]] <tree-ish>  [path…]

<extra>
    This can be any options that the archiver backend understands. See next 
section."

I have tar 1.23 and want to use the --transform option.  How can I feed 
git-archive additional tar options?

Working syntax starting points for git-archive and tar:

git archive --format=tar -o my.tar HEAD Web/Templates/
tar -cvf my.tar --transform 's,^Web/Templates/,myPath/myWeb/Templates/,' 
WebPortal/Templates/

Failed syntax attempts for feeding tar option to git-archive:

git archive --format=tar -o my.tar HEAD --transform 
's,^Web/Templates/,myPath/myWeb/Templates/,' WebPortal/Templates/
error: unknown option `transform'

git archive --format=tar -o my.tar --transform 
's,^Web/Templates/,myPath/myWeb/Templates/,' HEAD WebPortal/Templates/
error: unknown option `transform'


v/r,
neal 

* Re: git-archive and tar options
  2011-07-13 23:34 git-archive and tar options Neal Kreitzinger
@ 2011-07-14  1:56 ` Jeff King
  2011-07-14 17:16   ` René Scharfe
  2011-07-14 17:48   ` Andreas Schwab
  0 siblings, 2 replies; 22+ messages in thread
From: Jeff King @ 2011-07-14  1:56 UTC (permalink / raw)
  To: Neal Kreitzinger; +Cc: git

On Wed, Jul 13, 2011 at 06:34:32PM -0500, Neal Kreitzinger wrote:

> the git-archive manpage states:
> 
> "git archive [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>] [-o 
> | --output=<file>] [--worktree-attributes] [--remote=<repo> 
> [--exec=<git-upload-archive>]] <tree-ish>  [path…]
> 
> <extra>
>     This can be any options that the archiver backend understands. See next 
> section."
>
> I have tar 1.23 and want to use the --transform option.  How can I feed 
> git-archive additional tar options?

Right. And the next section is "Backend Extra Options", which has:

   zip
       -0
           Store the files instead of deflating them.

       -9
           Highest and slowest compression level. You can specify any number from 1 to 9
           to adjust compression speed and ratio.

And nothing else. We don't actually call your system "tar" to generate
the tarball, which is what I assume you thought when you saw "backend".
A patch to make it more clear would be welcome.

> Working syntax starting points for git-archive and tar:
> 
> git archive --format=tar -o my.tar HEAD Web/Templates/
> tar -cvf my.tar --transform 's,^Web/Templates/,myPath/myWeb/Templates/,' 
> WebPortal/Templates/
> 
> Failed syntax attempts for feeding tar option to git-archive:
> 
> git archive --format=tar -o my.tar HEAD --transform 
> 's,^Web/Templates/,myPath/myWeb/Templates/,' WebPortal/Templates/
> error: unknown option `transform'
> 
> git archive --format=tar -o my.tar --transform 
> 's,^Web/Templates/,myPath/myWeb/Templates/,' HEAD WebPortal/Templates/
> error: unknown option `transform'

Yeah, that won't work, because there is no such option. We do have
"--prefix", but I suspect that's not flexible enough for what you want.

So you're probably stuck with extracting the results of "git archive" to
a temporary directory and then using GNU tar to re-archive them (or if
you have a checkout, you can just tar that up directly, feeding the list
from "git ls-files" into tar). It would be nice if GNU tar could act as
a post-processor, and do something like:

  git archive HEAD | tar --pipe-mode --transform=whatever >my.tar

But AFAIK, nothing like "--pipe-mode" exists.
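
A minimal sketch of the extract-and-re-archive workaround described above,
assuming GNU tar (for --transform) and a POSIX shell. The helper name
rearchive_with_transform and its argument order are made up for
illustration; they are not part of git or tar.

```shell
# Sketch: unpack the output of "git archive" into a scratch directory and
# pack it again with GNU tar so that --transform can rewrite member names.
# Hypothetical helper; usage:
#   rearchive_with_transform <tree-ish> <sed-expr> <out.tar> <path>...
rearchive_with_transform() {
	treeish=$1 expr=$2 out=$3
	shift 3
	tmp=$(mktemp -d) || return 1
	# Step 1: let git produce the tar stream and extract it into $tmp.
	git archive --format=tar "$treeish" "$@" | tar -xf - -C "$tmp" &&
	# Step 2: re-create the archive from $tmp, renaming paths on the way.
	tar -cf "$out" -C "$tmp" --transform "$expr" "$@"
	status=$?
	rm -rf "$tmp"
	return $status
}
```

For the example in this thread that would be something like
rearchive_with_transform HEAD 's,^Web/Templates/,myPath/myWeb/Templates/,'
my.tar Web/Templates/. The resulting tarball is written by tar, not git,
so it no longer carries whatever pax header git archive wrote.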

It would probably not be a very hard feature to add to "git archive" if
you're interested in doing so.

-Peff

* Re: git-archive and tar options
  2011-07-14  1:56 ` Jeff King
@ 2011-07-14 17:16   ` René Scharfe
  2011-07-14 17:27     ` Jeff King
  2011-07-14 17:48   ` Andreas Schwab
  1 sibling, 1 reply; 22+ messages in thread
From: René Scharfe @ 2011-07-14 17:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Neal Kreitzinger, git

Am 14.07.2011 03:56, schrieb Jeff King:
> On Wed, Jul 13, 2011 at 06:34:32PM -0500, Neal Kreitzinger wrote:
>> Working syntax starting points for git-archive and tar:
>>
>> git archive --format=tar -o my.tar HEAD Web/Templates/
>> tar -cvf my.tar --transform 's,^Web/Templates/,myPath/myWeb/Templates/,' 
>> WebPortal/Templates/
>>
>> Failed syntax attempts for feeding tar option to git-archive:
>>
>> git archive --format=tar -o my.tar HEAD --transform 
>> 's,^Web/Templates/,myPath/myWeb/Templates/,' WebPortal/Templates/
>> error: unknown option `transform'
>>
>> git archive --format=tar -o my.tar --transform 
>> 's,^Web/Templates/,myPath/myWeb/Templates/,' HEAD WebPortal/Templates/
>> error: unknown option `transform'
> 
> Yeah, that won't work, because there is no such option. We do have
> "--prefix", but I suspect that's not flexible enough for what you want.

If you only need a single subdirectory with a custom prefix you could do
something like this (variables only used to keep the lines short):

	$ subdir=WebPortal/Templates
	$ prefix=myPath/myWeb/Templates/
	$ (cd "$subdir" && git archive --prefix="$prefix" HEAD) >my.tar

The output file can be specified with -o as well, of course, but you'd
either need to use an absolute path or add "../" for each directory
level you descend into (-o ../../my.tar in this case).

René

* Re: git-archive and tar options
  2011-07-14 17:16   ` René Scharfe
@ 2011-07-14 17:27     ` Jeff King
  2011-07-14 17:45       ` René Scharfe
                         ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Jeff King @ 2011-07-14 17:27 UTC (permalink / raw)
  To: René Scharfe; +Cc: Neal Kreitzinger, git

On Thu, Jul 14, 2011 at 07:16:24PM +0200, René Scharfe wrote:

> >> git archive --format=tar -o my.tar --transform 
> >> 's,^Web/Templates/,myPath/myWeb/Templates/,' HEAD WebPortal/Templates/
> >> error: unknown option `transform'
> > 
> > Yeah, that won't work, because there is no such option. We do have
> > "--prefix", but I suspect that's not flexible enough for what you want.
> 
> If you only need a single subdirectory with a custom prefix you could do
> something like this (variables only used to keep the lines short):
> 
> 	$ subdir=WebPortal/Templates
> 	$ prefix=myPath/myWeb/Templates/
> 	$ (cd "$subdir" && git archive --prefix="$prefix" HEAD) >my.tar
> 
> The output file can be specified with -o as well, of course, but you'd
> either need to use an absolute path or add "../" for each directory
> level you descend into (-o ../../my.tar in this case).

Couldn't you also do:

  git archive --prefix=$prefix HEAD:$subdir >my.tar

? I guess that loses the pax header with the commit sha1 in it, though,
because you are feeding a straight tree instead of a commit.
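
A small sketch of the <rev>:<subdir> form together with a check for the
embedded commit ID, assuming a POSIX shell. The helper names
archive_subtree and tar_commit_id are hypothetical; git get-tar-commit-id
is the existing command that reads the ID back from the pax header.

```shell
# Sketch: archive one subdirectory of a revision under a new prefix.
# Hypothetical helper; the prefix should end with "/".
archive_subtree() {
	rev=$1 subdir=$2 prefix=$3 out=$4
	# "<rev>:<subdir>" names a tree object, not a commit.
	git archive --format=tar --prefix="$prefix" "${rev}:${subdir}" >"$out"
}

# Print the commit ID recorded in a tarball's pax header, if there is one.
tar_commit_id() {
	git get-tar-commit-id <"$1"
}
```

Running tar_commit_id on the output of a plain "git archive HEAD" and on
the output of archive_subtree is a quick way to see the difference
discussed here.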

We didn't when git-archive was written, but these days we have
get_sha1_with_context to remember incidental things about an object we
look up. It should perhaps remember the commit (if any) we used to reach
a treeish, and then the above command line could still insert the pax
header.

-Peff

* Re: git-archive and tar options
  2011-07-14 17:27     ` Jeff King
@ 2011-07-14 17:45       ` René Scharfe
  2011-07-14 18:18         ` Jeff King
  2011-07-14 21:23       ` Junio C Hamano
  2011-07-18 18:13       ` Neal Kreitzinger
  2 siblings, 1 reply; 22+ messages in thread
From: René Scharfe @ 2011-07-14 17:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Neal Kreitzinger, git

Am 14.07.2011 19:27, schrieb Jeff King:
> On Thu, Jul 14, 2011 at 07:16:24PM +0200, René Scharfe wrote:
> 
>>>> git archive --format=tar -o my.tar --transform 
>>>> 's,^Web/Templates/,myPath/myWeb/Templates/,' HEAD WebPortal/Templates/
>>>> error: unknown option `transform'
>>>
>>> Yeah, that won't work, because there is no such option. We do have
>>> "--prefix", but I suspect that's not flexible enough for what you want.
>>
>> If you only need a single subdirectory with a custom prefix you could do
>> something like this (variables only used to keep the lines short):
>>
>> 	$ subdir=WebPortal/Templates
>> 	$ prefix=myPath/myWeb/Templates/
>> 	$ (cd "$subdir" && git archive --prefix="$prefix" HEAD) >my.tar
>>
>> The output file can be specified with -o as well, of course, but you'd
>> either need to use an absolute path or add "../" for each directory
>> level you descend into (-o ../../my.tar in this case).
> 
> Couldn't you also do:
> 
>   git archive --prefix=$prefix HEAD:$subdir >my.tar
> 
> ? I guess that loses the pax header with the commit sha1 in it, though,
> because you are feeding a straight tree instead of a commit.

Yes, and yes.

> We didn't when git-archive was written, but these days we have
> get_sha1_with_context to remember incidental things about an object we
> look up. It should perhaps remember the commit (if any) we used to reach
> a treeish, and then the above command line could still insert the pax
> header.

That's a good idea to increase consistency, as there shouldn't really be
a difference in output between the two subdirectory syntaxes.

I always wondered, however, if the embedded commit ID has really been
used to identify the corresponding version of an archive that somehow
lost its filename (due to being piped?).

René

* Re: git-archive and tar options
  2011-07-14  1:56 ` Jeff King
  2011-07-14 17:16   ` René Scharfe
@ 2011-07-14 17:48   ` Andreas Schwab
  2011-07-19 20:10     ` Sylvain Rabot
  1 sibling, 1 reply; 22+ messages in thread
From: Andreas Schwab @ 2011-07-14 17:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Neal Kreitzinger, git

Jeff King <peff@peff.net> writes:

> So you're probably stuck with extracting the results of "git archive" to
> a temporary directory and then using GNU tar to re-archive them (or if
> you have a checkout, you can just tar that up directly, feeding the list
> from "git ls-files" into tar).

That would lose the embedded commit-id, though.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: git-archive and tar options
  2011-07-14 17:45       ` René Scharfe
@ 2011-07-14 18:18         ` Jeff King
  2011-07-14 19:12           ` Jakub Narebski
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2011-07-14 18:18 UTC (permalink / raw)
  To: René Scharfe; +Cc: Neal Kreitzinger, git

On Thu, Jul 14, 2011 at 07:45:07PM +0200, René Scharfe wrote:

> > We didn't when git-archive was written, but these days we have
> > get_sha1_with_context to remember incidental things about an object we
> > look up. It should perhaps remember the commit (if any) we used to reach
> > a treeish, and then the above command line could still insert the pax
> > header.
> 
> That's a good idea to increase consistency, as there shouldn't really be
> a difference in output between the two subdirectory syntaxes.

The patch to do this is pretty tiny. See below.

There are a few issues, though:

  1. I think this is probably the right thing to do, and most people
     will be happy about it. But I guess I can see an argument that the
     commit-id should not be there, as the subtree does not represent
     that commit.

     IOW, if you assume the commit-id in the output means
     "by the way, this came from commit X", this change is a good thing.
     If you assume it means "this is the tree from commit X", then it's
     not.  I have no idea how people use it. I never have, but I always
     assumed the use case was "I have this random tarball. Where did it
     come from?".

  2. The object_context already has the sha1 we want, but it is under
     the name "tree", which is not an accurate name. It's actually
     "whatever is on the left side of the :". Which should be a
     tree-ish, but could be a commit or a tree.

  3. It looks like we fill in object_context whenever we see something
     like "tree-ish:path". But we should perhaps also do so when peeling
     something like "tree-ish^{tree}".

> I always wondered, however, if the embedded commit ID has really been
> used to identify the corresponding version of an archive that somehow
> lost its filename (due to being piped?).

I dunno. I've never used it.

-- >8 --
Subject: [PATCH] archive: look harder for commit id

When "git archive" is given a commit, the output will
contain the commit sha1 (either as a pax header for tar
format, or in a file comment for zip).

When it's given a name that resolves to a tree, like:

  git archive git-1.7.0:Documentation

then the archive code never sees the commit, and no
commit-id is output. We can use get_sha1_with_context to
remember the commit that led us to that tree (if any).

Signed-off-by: Jeff King <peff@peff.net>
---
 archive.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/archive.c b/archive.c
index 42f2d2f..d0ba7fb 100644
--- a/archive.c
+++ b/archive.c
@@ -256,11 +256,14 @@ static void parse_treeish_arg(const char **argv,
 	struct tree *tree;
 	const struct commit *commit;
 	unsigned char sha1[20];
+	struct object_context oc;
 
-	if (get_sha1(name, sha1))
+	if (get_sha1_with_context(name, sha1, &oc))
 		die("Not a valid object name");
 
 	commit = lookup_commit_reference_gently(sha1, 1);
+	if (!commit)
+		commit = lookup_commit_reference_gently(oc.tree, 1);
 	if (commit) {
 		commit_sha1 = commit->object.sha1;
 		archive_time = commit->date;
-- 
1.7.6.38.ge5b33

* Re: git-archive and tar options
  2011-07-14 18:18         ` Jeff King
@ 2011-07-14 19:12           ` Jakub Narebski
  0 siblings, 0 replies; 22+ messages in thread
From: Jakub Narebski @ 2011-07-14 19:12 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Neal Kreitzinger, git

Jeff King <peff@peff.net> writes:

> On Thu, Jul 14, 2011 at 07:45:07PM +0200, René Scharfe wrote:
> 
> > > We didn't when git-archive was written, but these days we have
> > > get_sha1_with_context to remember incidental things about an object we
> > > look up. It should perhaps remember the commit (if any) we used to reach
> > > a treeish, and then the above command line could still insert the pax
> > > header.
> > 
> > That's a good idea to increase consistency, as there shouldn't really be
> > a difference in output between the two subdirectory syntaxes.
> 
> The patch to do this is pretty tiny. See below.
> 
> There are a few issues, though:
> 
>   1. I think this is probably the right thing to do, and most people
>      will be happy about it. But I guess I can see an argument that the
>      commit-id should not be there, as the subtree does not represent
>      that commit.
> 
>      IOW, if you assume the commit-id in the output means
>      "by the way, this came from commit X", this change is a good thing.
>      If you assume it means "this is the tree from commit X", then it's
>      not.  I have no idea how people use it. I never have, but I always
>      assumed the use case was "I have this random tarball. Where did it
>      come from?".

Perhaps we should embed '<commit-id>:<subtree>' instead in pax header,
in that case?  Or <commit-id>.<subtree> if ':' is forbidden.

-- 
Jakub Narębski
Poland

* Re: git-archive and tar options
  2011-07-14 17:27     ` Jeff King
  2011-07-14 17:45       ` René Scharfe
@ 2011-07-14 21:23       ` Junio C Hamano
  2011-07-14 21:25         ` Jeff King
  2011-07-14 21:38         ` Jakub Narebski
  2011-07-18 18:13       ` Neal Kreitzinger
  2 siblings, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2011-07-14 21:23 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Neal Kreitzinger, git

Jeff King <peff@peff.net> writes:

> Couldn't you also do:
>
>   git archive --prefix=$prefix HEAD:$subdir >my.tar
>
> ? I guess that loses the pax header with the commit sha1 in it, though,
> because you are feeding a straight tree instead of a commit.
>
> We didn't when git-archive was written, but these days we have
> get_sha1_with_context to remember incidental things about an object we
> look up. It should perhaps remember the commit (if any) we used to reach
> a treeish, and then the above command line could still insert the pax
> header.

Why?

The tree you are writing out that way looks very different from what is
recorded in the commit object. What's the point of introducing confusion
by allowing many tarballs with different contents written from the same
commits with such tweaks all labelled with the same pax header?

* Re: git-archive and tar options
  2011-07-14 21:23       ` Junio C Hamano
@ 2011-07-14 21:25         ` Jeff King
  2011-07-14 23:30           ` Junio C Hamano
  2011-07-14 21:38         ` Jakub Narebski
  1 sibling, 1 reply; 22+ messages in thread
From: Jeff King @ 2011-07-14 21:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Neal Kreitzinger, git

On Thu, Jul 14, 2011 at 02:23:10PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Couldn't you also do:
> >
> >   git archive --prefix=$prefix HEAD:$subdir >my.tar
> >
> > ? I guess that loses the pax header with the commit sha1 in it, though,
> > because you are feeding a straight tree instead of a commit.
> >
> > We didn't when git-archive was written, but these days we have
> > get_sha1_with_context to remember incidental things about an object we
> > look up. It should perhaps remember the commit (if any) we used to reach
> > a treeish, and then the above command line could still insert the pax
> > header.
> 
> Why?
> 
> The tree you are writing out that way looks very different from what is
> recorded in the commit object. What's the point of introducing confusion
> by allowing many tarballs with different contents written from the same
> commits with such tweaks all labelled with the same pax header?

See my later message. I think it depends on how the embedded id is used.
Is it to say "this represents the tree of this git commit"? Or is it to
help people who later have a tarball and have no clue which commit it
might have come from?

I don't have a strong opinion either way. I've never used this feature
at all.

-Peff

* Re: git-archive and tar options
  2011-07-14 21:23       ` Junio C Hamano
  2011-07-14 21:25         ` Jeff King
@ 2011-07-14 21:38         ` Jakub Narebski
  1 sibling, 0 replies; 22+ messages in thread
From: Jakub Narebski @ 2011-07-14 21:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, René Scharfe, Neal Kreitzinger, git

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
> 
> > Couldn't you also do:
> >
> >   git archive --prefix=$prefix HEAD:$subdir >my.tar
> >
> > ? I guess that loses the pax header with the commit sha1 in it, though,
> > because you are feeding a straight tree instead of a commit.
> >
> > We didn't when git-archive was written, but these days we have
> > get_sha1_with_context to remember incidental things about an object we
> > look up. It should perhaps remember the commit (if any) we used to reach
> > a treeish, and then the above command line could still insert the pax
> > header.
> 
> Why?
> 
> The tree you are writing out that way looks very different from what is
> recorded in the commit object. What's the point of introducing confusion
> by allowing many tarballs with different contents written from the same
> commits with such tweaks all labelled with the same pax header?
 
Perhaps pax header should contain <commit-id>:<subdir> then?
Just a thought.

-- 
Jakub Narebski

* Re: git-archive and tar options
  2011-07-14 21:25         ` Jeff King
@ 2011-07-14 23:30           ` Junio C Hamano
  2011-07-15 20:59             ` René Scharfe
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2011-07-14 23:30 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Neal Kreitzinger, git

Jeff King <peff@peff.net> writes:

>> Why?
>> 
>> The tree you are writing out that way looks very different from what is
>> recorded in the commit object. What's the point of introducing confusion
>> by allowing many tarballs with different contents written from the same
>> commits with such tweaks all labelled with the same pax header?
>
> See my later message. I think it depends on how the embedded id is used.
> Is it to say "this represents the tree of this git commit"? Or is it to
> help people who later have a tarball and have no clue which commit it
> might have come from?

People, who have no clue which part of the subtree was extracted and what
leading path was added, would still have to wonder where the tree came
from even with the embedded id. Without your patch, if the tarball has an
embedded id, wouldn't they at least be able to assume it is the whole
thing of that commit? If you label a randomly mutated tree with the same
label, you cannot tell the genuine one from manipulated ones.

Not that I have strong opinions on this, either, but that is what I meant
by "_introducing_" confusion.

* Re: git-archive and tar options
  2011-07-14 23:30           ` Junio C Hamano
@ 2011-07-15 20:59             ` René Scharfe
  2011-07-18 19:31               ` Neal Kreitzinger
  0 siblings, 1 reply; 22+ messages in thread
From: René Scharfe @ 2011-07-15 20:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Neal Kreitzinger, git

Am 15.07.2011 01:30, schrieb Junio C Hamano:
> Jeff King <peff@peff.net> writes:
> 
>>> Why?
>>>
>>> The tree you are writing out that way looks very different from what is
>>> recorded in the commit object. What's the point of introducing confusion
>>> by allowing many tarballs with different contents written from the same
>>> commits with such tweaks all labelled with the same pax header?
>>
>> See my later message. I think it depends on how the embedded id is used.
>> Is it to say "this represents the tree of this git commit"? Or is it to
>> help people who later have a tarball and have no clue which commit it
>> might have come from?
> 
> People, who have no clue which part of the subtree was extracted and what
> leading path was added, would still have to wonder where the tree came
> from even with the embedded id. Without your patch, if the tarball has an
> embedded id, wouldn't they at least be able to assume it is the whole
> thing of that commit? If you label a randomly mutated tree with the same
> label, you cannot tell the genuine one from manipulated ones.
> 
> Not that I have strong opinions on this, either, but that is what I meant
> by "_introducing_" confusion.

When we started to write the ID into generated archives, there was only
git-tar-tree and no <rev>:<path> syntax.  It would write the ID only if
it was given a commit and not if it got a tree or if the user started it
from a subdirectory.  The result was that only the full tree of a commit
was branded with the commit ID.

Now we have git archive, a more flexible command line syntax all around,
path limiting as well as attributes that can affect the contents of the
files in the archive.  Back then the commit ID was sufficient as a
concise and canonical label of the archive contents, but now things are
a bit more complicated.

Which use cases are we aiming for?  Do we want to include all of the
command line arguments (with revs resolved to SHA1-IDs)?  Only those
that modify archive contents?  And any applied attributes?  Or do we
want to get stricter and only write the commit ID if a full unchanged
tree of a commit is being archived?

René

* Re: git-archive and tar options
  2011-07-14 17:27     ` Jeff King
  2011-07-14 17:45       ` René Scharfe
  2011-07-14 21:23       ` Junio C Hamano
@ 2011-07-18 18:13       ` Neal Kreitzinger
  2011-07-18 20:50         ` René Scharfe
  2 siblings, 1 reply; 22+ messages in thread
From: Neal Kreitzinger @ 2011-07-18 18:13 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Neal Kreitzinger, git

On 7/14/2011 12:27 PM, Jeff King wrote:
> On Thu, Jul 14, 2011 at 07:16:24PM +0200, René Scharfe wrote:
>
>>>> git archive --format=tar -o my.tar --transform
>>>> 's,^Web/Templates/,myPath/myWeb/Templates/,' HEAD WebPortal/Templates/
>>>> error: unknown option `transform'
>>>
>>> Yeah, that won't work, because there is no such option. We do
>>> have "--prefix", but I suspect that's not flexible enough for
>>> what you want.
>>
>> If you only need a single subdirectory with a custom prefix you
>> could do something like this (variables only used to keep the lines
>> short):
>>
>> $ subdir=WebPortal/Templates
>> $ prefix=myPath/myWeb/Templates/
>> $ (cd "$subdir" && git archive --prefix="$prefix" HEAD) >my.tar
>>
>> The output file can be specified with -o as well, of course, but
>> you'd either need to use an absolute path or add "../" for each
>> directory level you descend into (-o ../../my.tar in this case).
>
> Couldn't you also do:
>
> git archive --prefix=$prefix HEAD:$subdir >my.tar
>
> ? I guess that loses the pax header with the commit sha1 in it,
> though, because you are feeding a straight tree instead of a commit.
>
> We didn't when git-archive was written, but these days we have
> get_sha1_with_context to remember incidental things about an object
> we look up. It should perhaps remember the commit (if any) we used to
> reach a treeish, and then the above command line could still insert
> the pax header.
>
HEAD:$subdir worked on my bare repo.  I ran it for each transformant 
pathspec and then combined the archives with tar --catenate:

# git archive --format=tar --prefix=myWeb/myRoot/myAPP/Templates/
HEAD:WebPortal/Templates/ >myAPP.myTag.tar
# git archive --format=tar --prefix=opt/mySTUFF/v01/SCRIPTS/
HEAD:SCRIPTS/ >SCRIPTS.tar
# tar --file=myAPP.myTag.tar -A SCRIPTS.tar
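
The same per-subtree approach can be sketched as a loop, assuming GNU tar
(for -A / --catenate) and a POSIX shell. The helper name combine_subtrees
is hypothetical, and the "subdir prefix" pairs are passed as plain
arguments.

```shell
# Sketch: archive several "<subdir> <prefix>" pairs from one revision and
# concatenate the pieces into a single tarball with GNU tar -A.
# Hypothetical helper; usage:
#   combine_subtrees <rev> <out.tar> <subdir> <prefix>/ [<subdir> <prefix>/]...
combine_subtrees() {
	rev=$1 out=$2
	shift 2
	part=$(mktemp) || return 1
	first=yes
	while [ $# -ge 2 ]; do
		if [ "$first" = yes ]; then
			# The first subtree becomes the base archive.
			git archive --format=tar --prefix="$2" "${rev}:$1" >"$out" || break
			first=no
		else
			# Later subtrees go to a scratch file and are appended.
			git archive --format=tar --prefix="$2" "${rev}:$1" >"$part" &&
			tar --file="$out" -A "$part" || break
		fi
		shift 2
	done
	rm -f "$part"
	# Success only if every pair was consumed without an error.
	[ $# -eq 0 ]
}
```

With the paths from this message: combine_subtrees HEAD myAPP.myTag.tar
WebPortal/Templates myWeb/myRoot/myAPP/Templates/ SCRIPTS
opt/mySTUFF/v01/SCRIPTS/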

However, the permissions also need to change to 777, and tar --mode would
not effect this in combination with --catenate or -x.  Is there a way
I can change the permissions without having to untar->chmod->retar, and
without having to use a non-bare repo as an intermediary?

v/r,
neal

* Re: git-archive and tar options
  2011-07-15 20:59             ` René Scharfe
@ 2011-07-18 19:31               ` Neal Kreitzinger
  2011-07-18 20:50                 ` René Scharfe
  0 siblings, 1 reply; 22+ messages in thread
From: Neal Kreitzinger @ 2011-07-18 19:31 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, Jeff King, Neal Kreitzinger, git

On 7/15/2011 3:59 PM, René Scharfe wrote:
> Am 15.07.2011 01:30, schrieb Junio C Hamano:
>> Jeff King <peff@peff.net> writes:
>>
>>>> Why?
>>>>
>>>> The tree you are writing out that way looks very different from
>>>>  what is recorded in the commit object. What's the point of
>>>> introducing confusion by allowing many tarballs with different
>>>>  contents written from the same commits with such tweaks all
>>>> labelled with the same pax header?
>>>
>>> See my later message. I think it depends on how the embedded id
>>> is used. Is it to say "this represents the tree of this git
>>> commit"? Or is it to help people who later have a tarball and
>>> have no clue which commit it might have come from?
>>
>> People, who have no clue which part of the subtree was extracted and
>>  what leading path was added, would still have to wonder where the
>>  tree came from even with the embedded id. Without your patch, if
>> the tarball has an embedded id, wouldn't they at least be able to
>> assume it is the whole thing of that commit? If you label a
>> randomly mutated tree with the same label, you cannot tell the
>> genuine one from manipulated ones.
>>
>> Not that I have strong opinions on this, either, but that is what I
>> meant by "_introducing_" confusion.
>
> When we started to write the ID into generated archives, there was
> only git-tar-tree and no <rev>:<path> syntax.  It would write the ID
>  only if it was given a commit and not if it got a tree or if the
> user started it from a subdirectory.  The result was that only the
> full tree of a commit was branded with the commit ID.
>
> Now we have git archive, a more flexible command line syntax all
> around, path limiting as well as attributes that can affect the
> contents of the files in the archive.  Back then the commit ID was
> sufficient as a concise and canonical label of the archive contents,
>  but now things are a bit more complicated.
>
> Which use cases are we aiming for?  Do we want to include all of the
> command line arguments (with revs resolved to SHA1-IDs)?  Only those
> that modify archive contents?  And any applied attributes?  Or do we
> want to get stricter and only write the commit ID if a full unchanged
> tree of a commit is being archived?
>
In regards to the use cases you enumerated, I think logging the command
line syntax along with the appropriate ref context (HEAD value, etc)
would document exactly what's in the archive.

In regards to use cases in general, my impression is that git-archive is 
for producing archives useful for deployment.  The target deployed 
structure may vary so expecting the source git repo to reflect this is 
unfeasable.  It seems like utilizing the local tar installation would 
effect the necessary transformations. I'm not sure what the source and 
target tar version disparity problems might me.

A practical problem with the pax header is that its only useful if you
still have the archive.  Archives usually get deleted after being
extracted.  Therefore, an option to also generate (and add to the 
archive) an automatic "VERSION.TXT" file of some sort which specifies 
the context of the archive would be much more useful.  It would need its 
own --prefix option because oftentimes it would be dynamically generated 
based on the git-archive request.

Another use case is that it seems like there should also be the option 
to only tar the objects changed between a specified range of commits. 
However, I'm not sure if tar can handle deletions (moves, deletions, 
renames) upon extraction in this context.

I can see that my use cases are something that I can script myself, but 
to do so it seems like I would be better off using a non-bare repo 
checkout as an intermediary.  If that is what I am expected to do then I 
am not sure what the usefulness of git-archive is intended to be.  Maybe 
I don't understand what others use it for.

v/r,
neal

* Re: git-archive and tar options
  2011-07-18 19:31               ` Neal Kreitzinger
@ 2011-07-18 20:50                 ` René Scharfe
  0 siblings, 0 replies; 22+ messages in thread
From: René Scharfe @ 2011-07-18 20:50 UTC (permalink / raw)
  To: Neal Kreitzinger; +Cc: Junio C Hamano, Jeff King, Neal Kreitzinger, git

Am 18.07.2011 21:31, schrieb Neal Kreitzinger:
> In regards to use cases in general, my impression is that git-archive is
> for producing archives useful for deployment.  The target deployed
> structure may vary so expecting the source git repo to reflect this is
> unfeasable.  It seems like utilizing the local tar installation would
> effect the necessary transformations. I'm not sure what the source and
> target tar version disparity problems might be.

Direct deployment is not an _intended_ use case, but I see that it might
be useful for that, especially with scripting languages.

I'm not sure I like tar's --transform option, though.  This seems to be
too heavy a solution.  For your example it would be enough to support
multiple tree arguments (with their own respective prefixes) in one go.

> A practical problem with the pax header is that it's only useful if you
> still have the archive.  Archives usually get deleted after being
> extracted.  Therefore, an option to also generate (and add to the
> archive) an automatic "VERSION.TXT" file of some sort which specifies
> the context of the archive would be much more useful.  It would need its
> own --prefix option because oftentimes it would be dynamically generated
> based on the git-archive request.

The attribute export-subst with its $Format:$ expansion is intended to
be used for such version files.  It still lacks the ability to produce
git-describe-style version strings, but commit hashes can be used instead.
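
As a rough illustration of the export-subst approach, here is a minimal
sketch.  It assumes a throwaway repository; the file name VERSION.TXT,
the directory "demo" and the committer identity are only placeholders.

```shell
#!/bin/sh
# Sketch only: embed the commit hash in an archived version file.
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example"
git config user.email "example@example.com"

# VERSION.TXT is a hypothetical name.  %%H in printf yields a literal %H,
# so the file contains the placeholder $Format:%H$.
printf 'commit: $Format:%%H$\n' > VERSION.TXT
# Ask git archive to expand $Format:...$ placeholders in this file.
printf 'VERSION.TXT export-subst\n' > .gitattributes
git add VERSION.TXT .gitattributes
git commit -q -m "add version file"

# Print the archived copy of the file to stdout instead of extracting it.
archived=$(git archive --format=tar HEAD VERSION.TXT | tar -xO -f - VERSION.TXT)
expected="commit: $(git rev-parse HEAD)"
echo "$archived"
```

The copy in the working tree keeps the unexpanded placeholder; only the
copy written into the archive carries the hash.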

> Another use case is that it seems like there should also be the option
> to only tar the objects changed between a specified range of commits.
> However, I'm not sure if tar can handle deletions (moves, deletions,
> renames) upon extraction in this context.

Well, you could build a list of paths using git log --name-status or
similar and feed that to git archive.  If you want to keep a directory
in sync with a repo, why not use git checkout, though? :)
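
A minimal sketch of that idea follows.  It uses git diff --name-only
rather than git log --name-status (one of the "similar" options), and
the file names and the two-commit history are made up for the example.
Deleted paths are filtered out because they no longer exist in the newer
tree, and a tar archive has no way to record a deletion anyway.

```shell
#!/bin/sh
# Sketch only: archive just the paths that changed between two commits.
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example"
git config user.email "example@example.com"

# First commit: two files (hypothetical names).
echo one > kept.txt
echo one > gone.txt
git add kept.txt gone.txt
git commit -q -m "first"
old=$(git rev-parse HEAD)

# Second commit: add one file, delete another, leave kept.txt alone.
echo two > added.txt
git add added.txt
git rm -q gone.txt
git commit -q -m "second"
new=$(git rev-parse HEAD)

# List added/copied/modified/renamed/type-changed paths, NUL-separated,
# and pass them to git archive as path arguments.
git diff -z --name-only --diff-filter=ACMRT "$old" "$new" |
    xargs -0 git archive --format=tar -o "$work/changed.tar" "$new"

listing=$(tar -tf "$work/changed.tar")
echo "$listing"
```

Note that if the path list is empty, git archive receives no path
arguments and archives the whole tree, so a real script should check for
that case first.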

> I can see that my use cases are something that I can script myself, but
> to do so it seems like I would be better off using a non-bare repo
> checkout as an intermediary.  If that is what I am expected to do then I
> am not sure what the usefulness of git-archive is intended to be.  Maybe
> I don't understand what others use it for.

The primary use case is to create source code archives that people can
download, build and deploy without downloading the whole history or
using git at all.

René

* Re: git-archive and tar options
  2011-07-18 18:13       ` Neal Kreitzinger
@ 2011-07-18 20:50         ` René Scharfe
  2011-07-19  0:12           ` Neal Kreitzinger
  0 siblings, 1 reply; 22+ messages in thread
From: René Scharfe @ 2011-07-18 20:50 UTC (permalink / raw)
  To: Neal Kreitzinger; +Cc: Jeff King, Neal Kreitzinger, git

Am 18.07.2011 20:13, schrieb Neal Kreitzinger:
> HEAD:$subdir worked on my bare repo.  I ran it for each transformant
> pathspec and then combined the archives with tar --catenate:
> 
> # git archive --format=tar --prefix=myWeb/myRoot/myAPP/Templates/
> HEAD:WebPortal/Templates/ >myAPP.myTag.tar
> # git archive --format=tar --prefix=opt/mySTUFF/v01/SCRIPTS/
> HEAD:SCRIPTS/ >SCRIPTS.tar
> # tar --file=myAPP.myTag.tar -A SCRIPTS.tar
> 
> However, the permissions also need to change to 777 and tar --mode would
> not effect this in combination with --catenation or -x.  Is there a way
> I can change the permissions without having to untar->chmod->retar, and
> without having to use a non-bare repo as an intermediary?

You can use the configuration setting tar.umask to affect the
permissions of the archive entries.  Set it to 0 to pass the permission
bits from the repo unchanged.

René

* Re: git-archive and tar options
  2011-07-18 20:50         ` René Scharfe
@ 2011-07-19  0:12           ` Neal Kreitzinger
  2011-07-19 17:56             ` René Scharfe
  0 siblings, 1 reply; 22+ messages in thread
From: Neal Kreitzinger @ 2011-07-19  0:12 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jeff King, Neal Kreitzinger, git

On 7/18/2011 3:50 PM, René Scharfe wrote:
> Am 18.07.2011 20:13, schrieb Neal Kreitzinger:
>> HEAD:$subdir worked on my bare repo.  I ran it for each transformant
>> pathspec and then combined the archives with tar --catenate:
>>
>> # git archive --format=tar --prefix=myWeb/myRoot/myAPP/Templates/
>> HEAD:WebPortal/Templates/>myAPP.myTag.tar
>> # git archive --format=tar --prefix=opt/mySTUFF/v01/SCRIPTS/
>> HEAD:SCRIPTS/>SCRIPTS.tar
>> # tar --file=myAPP.myTag.tar -A SCRIPTS.tar
>>
>> However, the permissions also need to change to 777 and tar --mode would
>> not effect this in combination with --catenation or -x.  Is there a way
>> I can change the permissions without having to untar->chmod->retar, and
>> without having to use a non-bare repo as an intermediary?
>
> You can use the configuration setting tar.umask to affect the
> permissions of the archive entries.  Set it to 0 to pass the permission
> bits from the repo unchanged.
>
The permissions in my repo are 775 and 664 and I want to change them to 777.

-neal

* Re: git-archive and tar options
  2011-07-19  0:12           ` Neal Kreitzinger
@ 2011-07-19 17:56             ` René Scharfe
  2011-07-21  2:13               ` Neal Kreitzinger
  0 siblings, 1 reply; 22+ messages in thread
From: René Scharfe @ 2011-07-19 17:56 UTC (permalink / raw)
  To: Neal Kreitzinger; +Cc: Jeff King, Neal Kreitzinger, git

Am 19.07.2011 02:12, schrieb Neal Kreitzinger:
> On 7/18/2011 3:50 PM, René Scharfe wrote:
>> Am 18.07.2011 20:13, schrieb Neal Kreitzinger:
>>> However, the permissions also need to change to 777 and tar --mode would
>>> not effect this in combination with --catenation or -x.  Is there a way
>>> I can change the permissions without having to untar->chmod->retar, and
>>> without having to use a non-bare repo as an intermediary?
>>
>> You can use the configuration setting tar.umask to affect the
>> permissions of the archive entries.  Set it to 0 to pass the permission
>> bits from the repo unchanged.
>>
> The permissions in my repo are 775 and 664 and I want to change them to
> 777.

Git doesn't store all permission bits.  If a file is marked as
executable then you get 777, otherwise 666 -- minus the umask, which is
0002 by default.  So in order to achieve rwx permissions for all in the
archive, you need to A) mark the files as executable in the repository
and B) set tar.umask to 0 to allow the world to write.
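
To make the two steps concrete, here is a minimal sketch on a scratch
repository.  The file names are placeholders, and passing the setting
with "git -c" for a single invocation is just one way to do it; putting
tar.umask in the repository config works as well.

```shell
#!/bin/sh
# Sketch only: compare archive entry modes with the default tar.umask
# (0002) and with tar.umask set to 0.
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example"
git config user.email "example@example.com"

echo data > plain.txt
printf '#!/bin/sh\n' > run.sh
git add plain.txt run.sh
# Step A: mark the file as executable in the repository (index).
git update-index --chmod=+x run.sh
git commit -q -m "files"

# Print only the permission string and the name of each archive entry.
show_modes() {
    tar -tvf - | awk '{print $1, $NF}'
}

default_modes=$(git archive --format=tar HEAD | show_modes)
# Step B: tar.umask=0 passes 777/666 through unchanged.
open_modes=$(git -c tar.umask=0 archive --format=tar HEAD | show_modes)

echo "default tar.umask:"; echo "$default_modes"
echo "tar.umask=0:";       echo "$open_modes"
```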

However, what's the reason for requiring this lack of access control?
Why o+w?

René

* Re: git-archive and tar options
  2011-07-14 17:48   ` Andreas Schwab
@ 2011-07-19 20:10     ` Sylvain Rabot
  0 siblings, 0 replies; 22+ messages in thread
From: Sylvain Rabot @ 2011-07-19 20:10 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Jeff King, Neal Kreitzinger, git

On Thu, 2011-07-14 at 19:48 +0200, Andreas Schwab wrote:
> Jeff King <peff@peff.net> writes:
> 
> > So you're probably stuck with extracting the results of "git archive" to
> > a temporary directory and then using GNU tar to re-archive them (or if
> > you have a checkout, you can just tar that up directly, feeding the list
> > from "git ls-files" into tar).
> 
> That would lose the embedded commit-id, though.
> 
> Andreas.
> 

You can pass the commit id this way.

$ tar --pax-option "comment=$COMMIT" ...

-- 
Sylvain Rabot <sylvain@abstraction.fr>

* Re: git-archive and tar options
  2011-07-19 17:56             ` René Scharfe
@ 2011-07-21  2:13               ` Neal Kreitzinger
  2011-07-21 16:59                 ` Neal Kreitzinger
  0 siblings, 1 reply; 22+ messages in thread
From: Neal Kreitzinger @ 2011-07-21  2:13 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jeff King, Neal Kreitzinger, git

On 7/19/2011 12:56 PM, René Scharfe wrote:
> Am 19.07.2011 02:12, schrieb Neal Kreitzinger:
>> On 7/18/2011 3:50 PM, René Scharfe wrote:
>>> Am 18.07.2011 20:13, schrieb Neal Kreitzinger:
>>>> However, the permissions also need to change to 777 and tar --mode would
>>>> not effect this in combination with --catenation or -x.  Is there a way
>>>> I can change the permissions without having to untar->chmod->retar, and
>>>> without having to use a non-bare repo as an intermediary?
>>> You can use the configuration setting tar.umask to affect the
>>> permissions of the archive entries.  Set it to 0 to pass the permission
>>> bits from the repo unchanged.
>>>
>> The permissions in my repo are 775 and 664 and I want to change them to
>> 777.
> Git doesn't store all permission bits.  If a file is marked as
> executable then you get 777, otherwise 666 -- minus the umask, which is
> 0002 by default.  So in order to achieve rwx permissions for all in the
> archive, you need to A) mark the files as executable in the repository
> and B) set tar.umask to 0 to allow the world to write.
>
> However, what's the reason for requiring this lack of access control?
> Why o+w?
tar.umask worked.  Thank you for explaining how the permissions work in 
this context.  I now see that 775 and 664 would work for the apache 
component and for executing our binaries.  Thanks for pointing this 
out.  However, another element of our application is a proprietary 
runtime that runs on top of linux and runs our core binaries.  This 
allows us to store our binaries in git and deploy them directly on the 
customer server from git (via git-archive).  That runtime needs o+w in 
order to update the 'last run date' in the binary which is critical to 
our troubleshooting in the field.  o+w is needed because the user's 
runtime instance runs with user permissions when executed from a linux 
command line terminal and our users are not set up in the same group as 
the binaries.  Therefore, with tar.umask = 0000 I can deploy 777 and 666 
permissions and everything will work.

I suppose I could write a script to change the tar.umask entry to 0000 
only when running git-archive for the binary portion, and use tar.umask 
0002 when extracting the other portions.  I could also change our setup 
to put the users and the runmodules in the same group and use tar.umask 
0002 across the board.  These would be more correct than the chmod 777 
shotgun that we currently use to blast away our permissions problems.
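
A rough sketch of such a script, under several assumptions: the
directory names (bin, web), the deploy/ prefixes and the file names are
invented, the setting is passed per invocation with "git -c" instead of
editing the config file, and the final step relies on GNU tar's -A
(--catenate) option.

```shell
#!/bin/sh
# Sketch only: one archive per subtree, each with its own tar.umask,
# then combined into a single tar file.
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical layout: binaries under bin/, web files under web/.
mkdir bin web
printf '#!/bin/sh\n' > bin/run
echo '<html></html>' > web/index.html
git add bin/run web/index.html
git update-index --chmod=+x bin/run
git commit -q -m "layout"

# Binary portion: world-writable entries (tar.umask 0000).
git -c tar.umask=0000 archive --format=tar --prefix=deploy/bin/ \
    HEAD:bin > "$work/bin.tar"
# Everything else: the default umask of 0002.
git -c tar.umask=0002 archive --format=tar --prefix=deploy/web/ \
    HEAD:web > "$work/web.tar"

# Append the second archive to the first (GNU tar only).
tar --file="$work/bin.tar" -A "$work/web.tar"

# Show permission string and name for regular files in the result.
modes=$(tar -tvf "$work/bin.tar" | awk '$1 ~ /^-/ {print $1, $NF}')
echo "$modes"
```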

git-archive is a "quick" solution to our immediate deployment needs.  
Eventually, I plan on using git on the source and target machines as the 
core mechanism to "promote to production" (i.e. deploy to customer 
servers).  It looks like others are using git for deployment also.  In 
my previous shops which used other VCS's on minicomputers and 
mainframes, "promote to production" meant the universal run path for all 
users (and especially for productional data transactions) on that 
central machine.  In my current shop (my first linux shop) we have 
multiple concurrent versions of production on a multitude of 
productional machines and even concurrently on an individual 
productional machine in some cases.  The main reason we chose git is 
because it is the only VCS that can handle this.

v/r,
neal

* Re: git-archive and tar options
  2011-07-21  2:13               ` Neal Kreitzinger
@ 2011-07-21 16:59                 ` Neal Kreitzinger
  0 siblings, 0 replies; 22+ messages in thread
From: Neal Kreitzinger @ 2011-07-21 16:59 UTC (permalink / raw)
  Cc: René Scharfe, Jeff King, Neal Kreitzinger, git

On 7/20/2011 9:13 PM, Neal Kreitzinger wrote:
> On 7/19/2011 12:56 PM, René Scharfe wrote:
>> Am 19.07.2011 02:12, schrieb Neal Kreitzinger:
>>> On 7/18/2011 3:50 PM, René Scharfe wrote:
>>>> Am 18.07.2011 20:13, schrieb Neal Kreitzinger:
>>>>> However, the permissions also need to change to 777 and tar
>>>>> --mode would not effect this in combination with --catenation
>>>>> or -x. Is there a way I can change the permissions without
>>>>> having to untar->chmod->retar, and without having to use a
>>>>> non-bare repo as an intermediary?
>>>> You can use the configuration setting tar.umask to affect the
>>>> permissions of the archive entries. Set it to 0 to pass the
>>>> permission bits from the repo unchanged.
>>>>
>>> The permissions in my repo are 775 and 664 and I want to change
>>> them to 777.
>> Git doesn't store all permission bits. If a file is marked as
>> executable then you get 777, otherwise 666 -- minus the umask,
>> which is 0002 by default. So in order to achieve rwx permissions for
>> all in the archive, you need to A) mark the files as executable in
>> the repository and B) set tar.umask to 0 to allow the world to
>> write.
>>
>> However, what's the reason for requiring this lack of access
>> control? Why o+w?
> tar.umask worked. Thank you for explaining how the permissions work
> in this context. I now see that 775 and 664 would work for the apache
>  component and for executing our binaries. Thanks for pointing this
> out. However, another element of our application is a proprietary
> runtime that runs on top of linux and runs our core binaries. This
> allows us to store our binaries in git and deploy them directly on
> the customer server from git (via git-archive). That runtime needs
> o+w in order to update the 'last run date' in the binary which is
> critical to our troubleshooting in the field. o+w is needed because
> the user's runtime instance runs with user permissions when executed
> from a linux command line terminal and our users are not set up in the
> same group as the binaries. Therefore, with tar.umask = 0000 I can
> deploy 777 and 666 permissions and everything will work.
>
> I suppose I could write a script to change the tar.umask entry to
> 0000 only when running git-archive for the binary portion, and use
> tar.umask 0002 when extracting the other portions. I could also
> change our setup to put the users and the runmodules in the same
> group and use tar.umask 0002 across the board. These would be more
> correct than the chmod 777 shotgun that we currently use to blast
> away our permissions problems.
>
> git-archive is a "quick" solution to our immediate deployment needs.
>  Eventually, I plan on using git on the source and target machines as
> the core mechanism to "promote to production" (i.e. deploy to customer
>  servers). It looks like others are using git for deployment also. In
> my previous shops which used other VCS's on minicomputers and
> mainframes, "promote to production" meant the universal run path for
> all users (and especially for productional data transactions) on that
> central machine. In my current shop (my first linux shop) we have
> multiple concurrent versions of production on a multitude of
> productional machines and even concurrently on an individual
> productional machine in some cases. The main reason we chose git is
> because it is the only VCS that can handle this.
>
Actually, the apache user (web interface) also needs to be able to
update the binaries with 'last run date'. In this context the o+w allows
this, also.  I don't know enough about apache and permissions to try and 
add apache to the same group as the binaries at this point.

-neal

Thread overview: 22+ messages
2011-07-13 23:34 git-archive and tar options Neal Kreitzinger
2011-07-14  1:56 ` Jeff King
2011-07-14 17:16   ` René Scharfe
2011-07-14 17:27     ` Jeff King
2011-07-14 17:45       ` René Scharfe
2011-07-14 18:18         ` Jeff King
2011-07-14 19:12           ` Jakub Narebski
2011-07-14 21:23       ` Junio C Hamano
2011-07-14 21:25         ` Jeff King
2011-07-14 23:30           ` Junio C Hamano
2011-07-15 20:59             ` René Scharfe
2011-07-18 19:31               ` Neal Kreitzinger
2011-07-18 20:50                 ` René Scharfe
2011-07-14 21:38         ` Jakub Narebski
2011-07-18 18:13       ` Neal Kreitzinger
2011-07-18 20:50         ` René Scharfe
2011-07-19  0:12           ` Neal Kreitzinger
2011-07-19 17:56             ` René Scharfe
2011-07-21  2:13               ` Neal Kreitzinger
2011-07-21 16:59                 ` Neal Kreitzinger
2011-07-14 17:48   ` Andreas Schwab
2011-07-19 20:10     ` Sylvain Rabot