git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] import-tars: ignore the global PAX header
@ 2020-03-23 13:08 Johannes Schindelin via GitGitGadget
  2020-03-23 17:09 ` René Scharfe
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-03-23 13:08 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Git's own `git archive` inserts that header, but it often gets into the
way of `import-tars.perl` e.g. when a prefix was specified (for example
via `--prefix=my-project-1.0.0/`, or when downloading a `.tar.gz` from
GitHub releases): this prefix _should_ be stripped.

Let's just skip it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    Ignore the global PAX header in import-tars.perl
    
    This problem came up in Pacman-related work, where PKGBUILD definitions
    would reference the tarballs downloaded from GitHub, and patches would
    be applied on top. To work on those patches efficiently (e.g. when an
    upgrade to a new version of the project no longer lets those patches
    apply), I need to be able to import those tarballs into playground
    worktrees and work on them. I like to use 
    contrib/fast-import/import-tars.perl for that purpose, but it really
    needs to strip the prefix, otherwise it is too tedious to work with it.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-577%2Fdscho%2Fimport-tars-skip-pax-header-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-577/dscho/import-tars-skip-pax-header-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/577

 contrib/fast-import/import-tars.perl | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/contrib/fast-import/import-tars.perl b/contrib/fast-import/import-tars.perl
index e800d9f5c9c..d50ce26d5d9 100755
--- a/contrib/fast-import/import-tars.perl
+++ b/contrib/fast-import/import-tars.perl
@@ -139,6 +139,8 @@
 			print FI "\n";
 		}
 
+		next if ($typeflag eq 'g'); # ignore global header
+
 		my $path;
 		if ($prefix) {
 			$path = "$prefix/$name";

base-commit: b4374e96c84ed9394fed363973eb540da308ed4f
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] import-tars: ignore the global PAX header
  2020-03-23 13:08 [PATCH] import-tars: ignore the global PAX header Johannes Schindelin via GitGitGadget
@ 2020-03-23 17:09 ` René Scharfe
  2020-03-23 17:41   ` Junio C Hamano
  2020-03-23 21:05   ` Johannes Schindelin
  2020-03-23 23:25 ` brian m. carlson
  2020-03-24 19:35 ` [PATCH v2] " Johannes Schindelin via GitGitGadget
  2 siblings, 2 replies; 13+ messages in thread
From: René Scharfe @ 2020-03-23 17:09 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git; +Cc: Johannes Schindelin

Am 23.03.20 um 14:08 schrieb Johannes Schindelin via GitGitGadget:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Git's own `git archive` inserts that header, but it often gets into the
> way of `import-tars.perl` e.g. when a prefix was specified (for example
> via `--prefix=my-project-1.0.0/`, or when downloading a `.tar.gz` from
> GitHub releases): this prefix _should_ be stripped.
>
> Let's just skip it.

git archive uses a global pax header to pass the ID of the archived
commit as a comment, and for mtime values after 2242-03-16.  Ignoring it
in a simple importer seems reasonable for now, but I don't understand
how this relates to prefixes.  Is it because the header is treated as a
regular file with the full path "pax_global_header" (independently from
any prefix for actual files) and can thus be placed outside the expected
destination directory?

Thanks,
René

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] import-tars: ignore the global PAX header
  2020-03-23 17:09 ` René Scharfe
@ 2020-03-23 17:41   ` Junio C Hamano
  2020-03-23 21:08     ` Johannes Schindelin
  2020-03-23 21:05   ` Johannes Schindelin
  1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2020-03-23 17:41 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

René Scharfe <l.s.r@web.de> writes:

> Am 23.03.20 um 14:08 schrieb Johannes Schindelin via GitGitGadget:
>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>>
>> Git's own `git archive` inserts that header, but it often gets into the
>> way of `import-tars.perl` e.g. when a prefix was specified (for example
>> via `--prefix=my-project-1.0.0/`, or when downloading a `.tar.gz` from
>> GitHub releases): this prefix _should_ be stripped.
>>
>> Let's just skip it.
>
> git archive uses a global pax header to pass the ID of the archived
> commit as a comment, and for mtime values after 2242-03-16.  Ignoring it
> in a simple importer seems reasonable for now, but I don't understand
> how this relates to prefixes.  Is it because the header is treated as a
> regular file with the full path "pax_global_header" (independently from
> any prefix for actual files) and can thus be placed outside the expected
> destination directory?

Thanks for asking the question, as I was also curious if we are
throwing away too much (perhaps "prefix is given as a global pax
header, and ignoring all global pax headers is the most expedite
way" was the reason the patch was written that way?).  I agree with
you that for the purpose of simple-minded importer, it probably is
acceptable to take such a short-cut, but it would help future
developers if we clearly documented that it is a short-cut that
throws too much.  That would welcome their effort to enhance the
importer, if they find it more useful to keep some other information
found in global headers, without breaking the intent of this change.

Having said all that, even before "git archive" existed, release
tarballs by many projects had leading prefix so that a tarball
extract would be made inside a versioned directory.  To truly help
users of the importer, doesn't the logic to allow the user to say
"please strip one leading level of directory from all the tarballs I
feed you, as I know they are versioned directories" belong to the
command line option of the importer?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] import-tars: ignore the global PAX header
  2020-03-23 17:09 ` René Scharfe
  2020-03-23 17:41   ` Junio C Hamano
@ 2020-03-23 21:05   ` Johannes Schindelin
  2020-03-23 21:53     ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Johannes Schindelin @ 2020-03-23 21:05 UTC (permalink / raw)
  To: René Scharfe; +Cc: Johannes Schindelin via GitGitGadget, git

[-- Attachment #1: Type: text/plain, Size: 1796 bytes --]

Hi René,

On Mon, 23 Mar 2020, René Scharfe wrote:

> Am 23.03.20 um 14:08 schrieb Johannes Schindelin via GitGitGadget:
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > Git's own `git archive` inserts that header, but it often gets into the
> > way of `import-tars.perl` e.g. when a prefix was specified (for example
> > via `--prefix=my-project-1.0.0/`, or when downloading a `.tar.gz` from
> > GitHub releases): this prefix _should_ be stripped.
> >
> > Let's just skip it.
>
> git archive uses a global pax header to pass the ID of the archived
> commit as a comment, and for mtime values after 2242-03-16.  Ignoring it
> in a simple importer seems reasonable for now, but I don't understand
> how this relates to prefixes.

The tar importer in `contrib/fast-import/import-tars.perl` has a very
convenient feature that I took for public knowledge: if _all_ paths stored
in the imported `.tar` start with a common prefix, e.g.
`my-project-1.0.0/` as indicated in the commit message (or `git-2.26.0/`
in the tar at https://github.com/git/git/archive/v2.26.0.tar.gz), then
this prefix is stripped.

This feature makes a ton of sense because it is relatively common to
import two or more revisions of the same project into Git, and obviously
we don't want all files to live in a tree whose name changes from revision
to revision.

Now, the problem with that feature is that it breaks down if there is a
`pax_global_header` "file" located outside of said prefix.

> Is it because the header is treated as a regular file with the full path
> "pax_global_header" (independently from any prefix for actual files) and
> can thus be placed outside the expected destination directory?

Exactly.

Thanks,
Dscho

>
> Thanks,
> René
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] import-tars: ignore the global PAX header
  2020-03-23 17:41   ` Junio C Hamano
@ 2020-03-23 21:08     ` Johannes Schindelin
  2020-03-23 21:50       ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Schindelin @ 2020-03-23 21:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: René Scharfe, Johannes Schindelin via GitGitGadget, git

[-- Attachment #1: Type: text/plain, Size: 2529 bytes --]

Hi Junio,

On Mon, 23 Mar 2020, Junio C Hamano wrote:

> René Scharfe <l.s.r@web.de> writes:
>
> > Am 23.03.20 um 14:08 schrieb Johannes Schindelin via GitGitGadget:
> >> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >>
> >> Git's own `git archive` inserts that header, but it often gets into the
> >> way of `import-tars.perl` e.g. when a prefix was specified (for example
> >> via `--prefix=my-project-1.0.0/`, or when downloading a `.tar.gz` from
> >> GitHub releases): this prefix _should_ be stripped.
> >>
> >> Let's just skip it.
> >
> > git archive uses a global pax header to pass the ID of the archived
> > commit as a comment, and for mtime values after 2242-03-16.  Ignoring it
> > in a simple importer seems reasonable for now, but I don't understand
> > how this relates to prefixes.  Is it because the header is treated as a
> > regular file with the full path "pax_global_header" (independently from
> > any prefix for actual files) and can thus be placed outside the expected
> > destination directory?
>
> Thanks for asking the question, as I was also curious if we are
> throwing away too much (perhaps "prefix is given as a global pax
> header, and ignoring all global pax headers is the most expedite
> way" was the reason the patch was written that way?).  I agree with
> you that for the purpose of simple-minded importer, it probably is
> acceptable to take such a short-cut, but it would help future
> developers if we clearly documented that it is a short-cut that
> throws too much.  That would welcome their effort to enhance the
> importer, if they find it more useful to keep some other information
> found in global headers, without breaking the intent of this change.

I don't think that we're throwing away anything because the PAX header is
intended to be a _header_, not a _file_, yet
`contrib/fast-import/import-tars.perl` currently treats PAX headers that
way.

> Having said all that, even before "git archive" existed, release
> tarballs by many projects had leading prefix so that a tarball
> extract would be made inside a versioned directory.  To truly help
> users of the importer, doesn't the logic to allow the user to say
> "please strip one leading level of directory from all the tarballs I
> feed you, as I know they are versioned directories" belong to the
> command line option of the importer?

I guess nobody needed an explicit way to strip path prefixes yet, since
the implicit way works so well.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] import-tars: ignore the global PAX header
  2020-03-23 21:08     ` Johannes Schindelin
@ 2020-03-23 21:50       ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2020-03-23 21:50 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: René Scharfe, Johannes Schindelin via GitGitGadget, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> throws too much.  That would welcome their effort to enhance the
>> importer, if they find it more useful to keep some other information
>> found in global headers, without breaking the intent of this change.
>
> I don't think that we're throwing away anything because the PAX header is
> intended to be a _header_, not a _file_, yet
> `contrib/fast-import/import-tars.perl` currently treats PAX headers that
> way.

What I meant (and wrote) was information contained within the
header.  You could store such metadata (e.g. this tree came from
this commit from the upstream project) in the commit object while
importing, for example.

As I wrote, I do not think we need to implement such feature right
now.  I am just saying that we should make sure we are not
unintendely discouraging future developers from doing so by giving
an impression that we are claiming "a pax header is intended to be a
header and has no interesting information---never look at it".  If
we said "We discard the headers because it is the most expedient way
and we currently have no use for them", that would make it clear
that it is OK for them (the future ourselves developers) to extend
the code not to lose the information, as long as they keep ignoring
the prefix thing alone, if they want to follow the course set by
this change.

>> Having said all that, even before "git archive" existed, release
>> tarballs by many projects had leading prefix so that a tarball
>> extract would be made inside a versioned directory.  To truly help
>> users of the importer, doesn't the logic to allow the user to say
>> "please strip one leading level of directory from all the tarballs I
>> feed you, as I know they are versioned directories" belong to the
>> command line option of the importer?
>
> I guess nobody needed an explicit way to strip path prefixes yet, since
> the implicit way works so well.

Now you confused me.  If implicit way already works well, why are we
adding this patch to make it implicitly ignore?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] import-tars: ignore the global PAX header
  2020-03-23 21:05   ` Johannes Schindelin
@ 2020-03-23 21:53     ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2020-03-23 21:53 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: René Scharfe, Johannes Schindelin via GitGitGadget, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> The tar importer in `contrib/fast-import/import-tars.perl` has a very
> convenient feature that I took for public knowledge: if _all_ paths stored
> in the imported `.tar` start with a common prefix, e.g.
> `my-project-1.0.0/` as indicated in the commit message (or `git-2.26.0/`
> in the tar at https://github.com/git/git/archive/v2.26.0.tar.gz), then
> this prefix is stripped.

Ahh, scratch what I said in my recent response.  No, the above is
not public knowledge, and that was missing from the log message that
confused me.

With that spelled out, yes, it makes sense not to allow the phony
"file" at the top level created out of the pax headers.

So, I think with a bit more explanation in the log, the patch would
be OK.   We do not want _any_ header.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] import-tars: ignore the global PAX header
  2020-03-23 13:08 [PATCH] import-tars: ignore the global PAX header Johannes Schindelin via GitGitGadget
  2020-03-23 17:09 ` René Scharfe
@ 2020-03-23 23:25 ` brian m. carlson
  2020-03-24 13:35   ` Johannes Schindelin
  2020-03-24 19:35 ` [PATCH v2] " Johannes Schindelin via GitGitGadget
  2 siblings, 1 reply; 13+ messages in thread
From: brian m. carlson @ 2020-03-23 23:25 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

[-- Attachment #1: Type: text/plain, Size: 2374 bytes --]

On 2020-03-23 at 13:08:44, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> Git's own `git archive` inserts that header, but it often gets into the
> way of `import-tars.perl` e.g. when a prefix was specified (for example
> via `--prefix=my-project-1.0.0/`, or when downloading a `.tar.gz` from
> GitHub releases): this prefix _should_ be stripped.
> 
> Let's just skip it.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>     Ignore the global PAX header in import-tars.perl
>     
>     This problem came up in Pacman-related work, where PKGBUILD definitions
>     would reference the tarballs downloaded from GitHub, and patches would
>     be applied on top. To work on those patches efficiently (e.g. when an
>     upgrade to a new version of the project no longer lets those patches
>     apply), I need to be able to import those tarballs into playground
>     worktrees and work on them. I like to use 
>     contrib/fast-import/import-tars.perl for that purpose, but it really
>     needs to strip the prefix, otherwise it is too tedious to work with it.
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-577%2Fdscho%2Fimport-tars-skip-pax-header-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-577/dscho/import-tars-skip-pax-header-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/577
> 
>  contrib/fast-import/import-tars.perl | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/contrib/fast-import/import-tars.perl b/contrib/fast-import/import-tars.perl
> index e800d9f5c9c..d50ce26d5d9 100755
> --- a/contrib/fast-import/import-tars.perl
> +++ b/contrib/fast-import/import-tars.perl
> @@ -139,6 +139,8 @@
>  			print FI "\n";
>  		}
>  
> +		next if ($typeflag eq 'g'); # ignore global header
> +

In general, it isn't safe to do this.  A pax global header contains
attributes that may live in a normal extended header at a lower
priority.  So it is valid, for example, to write an mtime field in the
global header that applies to the entire archive and overrides the
ustar header block (and is overridden by a normal extended header).

I think we need a different solution for this case.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] import-tars: ignore the global PAX header
  2020-03-23 23:25 ` brian m. carlson
@ 2020-03-24 13:35   ` Johannes Schindelin
  0 siblings, 0 replies; 13+ messages in thread
From: Johannes Schindelin @ 2020-03-24 13:35 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Johannes Schindelin via GitGitGadget, git

Hi brian,

On Mon, 23 Mar 2020, brian m. carlson wrote:

> On 2020-03-23 at 13:08:44, Johannes Schindelin via GitGitGadget wrote:
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > Git's own `git archive` inserts that header, but it often gets into the
> > way of `import-tars.perl` e.g. when a prefix was specified (for example
> > via `--prefix=my-project-1.0.0/`, or when downloading a `.tar.gz` from
> > GitHub releases): this prefix _should_ be stripped.
> >
> > Let's just skip it.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >     Ignore the global PAX header in import-tars.perl
> >
> >     This problem came up in Pacman-related work, where PKGBUILD definitions
> >     would reference the tarballs downloaded from GitHub, and patches would
> >     be applied on top. To work on those patches efficiently (e.g. when an
> >     upgrade to a new version of the project no longer lets those patches
> >     apply), I need to be able to import those tarballs into playground
> >     worktrees and work on them. I like to use
> >     contrib/fast-import/import-tars.perl for that purpose, but it really
> >     needs to strip the prefix, otherwise it is too tedious to work with it.
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-577%2Fdscho%2Fimport-tars-skip-pax-header-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-577/dscho/import-tars-skip-pax-header-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/577
> >
> >  contrib/fast-import/import-tars.perl | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/contrib/fast-import/import-tars.perl b/contrib/fast-import/import-tars.perl
> > index e800d9f5c9c..d50ce26d5d9 100755
> > --- a/contrib/fast-import/import-tars.perl
> > +++ b/contrib/fast-import/import-tars.perl
> > @@ -139,6 +139,8 @@
> >  			print FI "\n";
> >  		}
> >
> > +		next if ($typeflag eq 'g'); # ignore global header
> > +
>
> In general, it isn't safe to do this.  A pax global header contains
> attributes that may live in a normal extended header at a lower
> priority.  So it is valid, for example, to write an mtime field in the
> global header that applies to the entire archive and overrides the
> ustar header block (and is overridden by a normal extended header).
>
> I think we need a different solution for this case.

I agree that we need a different solution for this case.

At the same time, I would like to point out that I am trying to address a
_different_ problem than "we're not using the information contained within
the PAX global header": I want to prevent that header from polluting the
top-level tree.

For what it's worth, my patch does not prevent future patches from using
the information contained within the global header :-)

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2] import-tars: ignore the global PAX header
  2020-03-23 13:08 [PATCH] import-tars: ignore the global PAX header Johannes Schindelin via GitGitGadget
  2020-03-23 17:09 ` René Scharfe
  2020-03-23 23:25 ` brian m. carlson
@ 2020-03-24 19:35 ` Johannes Schindelin via GitGitGadget
  2020-03-24 21:04   ` Junio C Hamano
  2 siblings, 1 reply; 13+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-03-24 19:35 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, brian m. carlson, Junio C Hamano,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The tar importer in `contrib/fast-import/import-tars.perl` has a very
convenient feature: if _all_ paths stored in the imported `.tar` start
with a common prefix, e.g. `git-2.26.0/` in the tar at
https://github.com/git/git/archive/v2.26.0.tar.gz, then this prefix is
stripped.

This feature makes a ton of sense because it is relatively common to
import two or more revisions of the same project into Git, and obviously
we don't want all files to live in a tree whose name changes from
revision to revision.

Now, the problem with that feature is that it breaks down if there is a
`pax_global_header` "file" located outside of said prefix, at the top of
the tree. This is the case for `.tar` files generated by Git's very own
`git archive` command: it inserts that header, and `git archive` allows
specifying a common prefix (that the header does _not_ share with the
other files contained in the archive) via `--prefix=my-project-1.0.0/`.

Let's just skip any global header when importing `.tar` files into Git.

Note: this global header might contain useful information. For example,
in the output of `git archive`, it lists the original commit, which _is_
useful information. A future improvement to the `import-tars.perl`
script might be to include that information in the commit message, or do
other things with the information (e.g. use `mtime` information
contained in the global header as date of the commit). This patch does
not prevent any future patch from making that happen, it only prevents
the header from being treated as if it was a regular file.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    Ignore the global PAX header in import-tars.perl
    
    This problem came up in Pacman-related work, where PKGBUILD definitions
    would reference the tarballs downloaded from GitHub, and patches would
    be applied on top. To work on those patches efficiently (e.g. when an
    upgrade to a new version of the project no longer lets those patches
    apply), I need to be able to import those tarballs into playground
    worktrees and work on them. I like to use 
    contrib/fast-import/import-tars.perl for that purpose, but it really
    needs to strip the prefix, otherwise it is too tedious to work with it.
    
    Changes since v1:
    
     * Mentioned the implicit prefix-stripping feature of import-tars.perl 
       in the commit message; Without that context, it is really hard to
       understand the motivation for this patch.
     * Clarified in the commit message that this patch does not prevent any
       future patches that would use the information contained in the global
       header.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-577%2Fdscho%2Fimport-tars-skip-pax-header-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-577/dscho/import-tars-skip-pax-header-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/577

Range-diff vs v1:

 1:  718bde8f4a7 ! 1:  842dabe6128 import-tars: ignore the global PAX header
     @@ -2,12 +2,34 @@
      
          import-tars: ignore the global PAX header
      
     -    Git's own `git archive` inserts that header, but it often gets into the
     -    way of `import-tars.perl` e.g. when a prefix was specified (for example
     -    via `--prefix=my-project-1.0.0/`, or when downloading a `.tar.gz` from
     -    GitHub releases): this prefix _should_ be stripped.
     +    The tar importer in `contrib/fast-import/import-tars.perl` has a very
     +    convenient feature: if _all_ paths stored in the imported `.tar` start
     +    with a common prefix, e.g. `git-2.26.0/` in the tar at
     +    https://github.com/git/git/archive/v2.26.0.tar.gz, then this prefix is
     +    stripped.
      
     -    Let's just skip it.
     +    This feature makes a ton of sense because it is relatively common to
     +    import two or more revisions of the same project into Git, and obviously
     +    we don't want all files to live in a tree whose name changes from
     +    revision to revision.
     +
     +    Now, the problem with that feature is that it breaks down if there is a
     +    `pax_global_header` "file" located outside of said prefix, at the top of
     +    the tree. This is the case for `.tar` files generated by Git's very own
     +    `git archive` command: it inserts that header, and `git archive` allows
     +    specifying a common prefix (that the header does _not_ share with the
     +    other files contained in the archive) via `--prefix=my-project-1.0.0/`.
     +
     +    Let's just skip any global header when importing `.tar` files into Git.
     +
     +    Note: this global header might contain useful information. For example,
     +    in the output of `git archive`, it lists the original commit, which _is_
     +    useful information. A future improvement to the `import-tars.perl`
     +    script might be to include that information in the commit message, or do
     +    other things with the information (e.g. use `mtime` information
     +    contained in the global header as date of the commit). This patch does
     +    not prevent any future patch from making that happen, it only prevents
     +    the header from being treated as if it was a regular file.
      
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      


 contrib/fast-import/import-tars.perl | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/contrib/fast-import/import-tars.perl b/contrib/fast-import/import-tars.perl
index e800d9f5c9c..d50ce26d5d9 100755
--- a/contrib/fast-import/import-tars.perl
+++ b/contrib/fast-import/import-tars.perl
@@ -139,6 +139,8 @@
 			print FI "\n";
 		}
 
+		next if ($typeflag eq 'g'); # ignore global header
+
 		my $path;
 		if ($prefix) {
 			$path = "$prefix/$name";

base-commit: b4374e96c84ed9394fed363973eb540da308ed4f
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] import-tars: ignore the global PAX header
  2020-03-24 19:35 ` [PATCH v2] " Johannes Schindelin via GitGitGadget
@ 2020-03-24 21:04   ` Junio C Hamano
  2020-03-25 17:59     ` Johannes Schindelin
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2020-03-24 21:04 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, brian m. carlson, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

>      +    Note: this global header might contain useful information. For example,
>      +    in the output of `git archive`, it lists the original commit, which _is_
>      +    useful information. A future improvement to the `import-tars.perl`
>      +    script might be to include that information in the commit message, or do
>      +    other things with the information (e.g. use `mtime` information
>      +    contained in the global header as date of the commit). This patch does
>      +    not prevent any future patch from making that happen, it only prevents
>      +    the header from being treated as if it was a regular file.
>       
>           Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>       
>
>
>  contrib/fast-import/import-tars.perl | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/contrib/fast-import/import-tars.perl b/contrib/fast-import/import-tars.perl
> index e800d9f5c9c..d50ce26d5d9 100755
> --- a/contrib/fast-import/import-tars.perl
> +++ b/contrib/fast-import/import-tars.perl
> @@ -139,6 +139,8 @@
>  			print FI "\n";
>  		}
>  
> +		next if ($typeflag eq 'g'); # ignore global header
> +

Yeah, it is more like "don't create a phony file out of global
header" which is exactly the point of this fix, rather than "ignore
global header", and if the contents of the header is used for any
other purpose (e.g. metadata that will be added to the log message
or mode bits that is forced on all files), that can be handled
before this line.  And the current code structure is already
prepared for it: all that remains in the block after this point is
to create a file at $path and store its contents.

So, this makes sense.  I suspect that with an update to the comment
in the direction, there probably is no need for the huge "Note" in
the log message.

Thanks.

>  		my $path;
>  		if ($prefix) {
>  			$path = "$prefix/$name";
>
> base-commit: b4374e96c84ed9394fed363973eb540da308ed4f

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] import-tars: ignore the global PAX header
  2020-03-24 21:04   ` Junio C Hamano
@ 2020-03-25 17:59     ` Johannes Schindelin
  2020-03-25 18:43       ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Schindelin @ 2020-03-25 17:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, René Scharfe,
	brian m. carlson

Hi Junio,

On Tue, 24 Mar 2020, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> >      +    Note: this global header might contain useful information. For example,
> >      +    in the output of `git archive`, it lists the original commit, which _is_
> >      +    useful information. A future improvement to the `import-tars.perl`
> >      +    script might be to include that information in the commit message, or do
> >      +    other things with the information (e.g. use `mtime` information
> >      +    contained in the global header as date of the commit). This patch does
> >      +    not prevent any future patch from making that happen, it only prevents
> >      +    the header from being treated as if it was a regular file.
> >
> >           Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> >
> >
> >  contrib/fast-import/import-tars.perl | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/contrib/fast-import/import-tars.perl b/contrib/fast-import/import-tars.perl
> > index e800d9f5c9c..d50ce26d5d9 100755
> > --- a/contrib/fast-import/import-tars.perl
> > +++ b/contrib/fast-import/import-tars.perl
> > @@ -139,6 +139,8 @@
> >  			print FI "\n";
> >  		}
> >
> > +		next if ($typeflag eq 'g'); # ignore global header
> > +
>
> Yeah, it is more like "don't create a phony file out of global
> header" which is exactly the point of this fix, rather than "ignore
> global header", and if the contents of the header is used for any
> other purpose (e.g. metadata that will be added to the log message
> or mode bits that is forced on all files), that can be handled
> before this line.  And the current code structure is already
> prepared for it: all that remains in the block after this point is
> to create a file at $path and store its contents.
>
> So, this makes sense.  I suspect that with an update to the comment
> in the direction, there probably is no need for the huge "Note" in
> the log message.

Too many of such in-code comments added by me became stale. I'd rather
keep this in the commit message where it explains the reasoning for the
current change.

Thanks,
Dscho

>
> Thanks.
>
> >  		my $path;
> >  		if ($prefix) {
> >  			$path = "$prefix/$name";
> >
> > base-commit: b4374e96c84ed9394fed363973eb540da308ed4f
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] import-tars: ignore the global PAX header
  2020-03-25 17:59     ` Johannes Schindelin
@ 2020-03-25 18:43       ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2020-03-25 18:43 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, René Scharfe,
	brian m. carlson

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Too many of such in-code comments added by me became stale. I'd rather
> keep this in the commit message where it explains the reasoning for the
> current change.

I was updating this week's "What's cooking" report, and described
this one like so:

    * js/import-tars-do-not-make-phony-files-from-pax-headers (2020-03-24) 1 commit
     - import-tars: ignore the global PAX header

     The import-tars importer (in contrib/fast-import/) used to create
     phony files at the top-level of the repository when the archive
     contains global PAX headers, which made its own logic to detect and
     omit the common leading directory ineffective, which has been
     corrected.

I guess we are ready to mark it as "Will merge".

Thanks.  

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-03-25 18:43 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-23 13:08 [PATCH] import-tars: ignore the global PAX header Johannes Schindelin via GitGitGadget
2020-03-23 17:09 ` René Scharfe
2020-03-23 17:41   ` Junio C Hamano
2020-03-23 21:08     ` Johannes Schindelin
2020-03-23 21:50       ` Junio C Hamano
2020-03-23 21:05   ` Johannes Schindelin
2020-03-23 21:53     ` Junio C Hamano
2020-03-23 23:25 ` brian m. carlson
2020-03-24 13:35   ` Johannes Schindelin
2020-03-24 19:35 ` [PATCH v2] " Johannes Schindelin via GitGitGadget
2020-03-24 21:04   ` Junio C Hamano
2020-03-25 17:59     ` Johannes Schindelin
2020-03-25 18:43       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).