git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Joey Hess <id@joeyh.name>, Git List <git@vger.kernel.org>
Subject: Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
Date: Fri, 03 Feb 2023 14:32:08 +0100	[thread overview]
Message-ID: <230203.86o7qac1hr.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <Y9yHWFh0ijwrqhOX@mit.edu>


On Thu, Feb 02 2023, Theodore Ts'o wrote:

> On Thu, Feb 02, 2023 at 05:19:30PM -0400, Joey Hess wrote:
>> In my opinion as the original developer of pristine-tar, it's too
>> complicated to be usefully used by git. The problem it solves is of a
>> larger scope than the problem git has here. (I hope.)
>
> Well, the problem which I believe folks on this thread are trying to
> deal with is a way to reconstruct a bit-for-bit compressed tarball of
> a particular release in a way that minimizes the cost of storage in
> the git tree.  One way of doing that would be to guarantee that git
> archive would return something which is always bit-for-bit identical.
> Another way is to use something like pristine tar.

I think that's what this side-thread has devolved into, but I honestly
don't see how that's useful or more than tangentally related to the
problem noted at the start of the thread.

If you are writing a new system that consumes "git archive" output
something like what I'm proposing to add in [1] should nicely sidestep
this issue, just checksum the uncompressed archive (assuming you're OK
with our soft "tar" guarantees), or "git tag -v" (if you can) etc.

That part of the docs is just a summary of what Konstantin Ryabitsev
pointed out in a side-thread.

One might also imagine any other number of trivial solutions to the
problem, e.g. people interested in this can unpack the archive, and then
(needs to guarantee sorted order, which I think find(1) doesn't, but
just as a POC):

	(cd unpacked && find . -type f -printf "%f\n" -exec cat {} \; | sha256sum)

Or whatever.

But any such solution to the abstract problem isn't going to help the
existing users whose systems broke because they were assuming certain
things about the "git archive" output.

For those users I think (as my proposed series does) we should just do
whatever we can do limit the disruption, as my proposed [2] does by
switching back to "gzip".

For those users who are creating new systems that might use "git
archive" today we then just need to update the documentation going
forward. Maybe those could use "pristine-tar", or perhaps they can use
some entirely different distribution mechanism.

1. https://lore.kernel.org/git/patch-9.9-b40833b2168-20230202T093212Z-avarab@gmail.com/
2. https://lore.kernel.org/git/cover-0.9-00000000000-20230202T093212Z-avarab@gmail.com/

  reply	other threads:[~2023-02-03 13:39 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-31  0:06 Stability of git-archive, breaking (?) the Github universe, and a possible solution Eli Schwartz
2023-01-31  7:49 ` Ævar Arnfjörð Bjarmason
2023-01-31  9:11   ` Eli Schwartz
2023-02-02  9:32   ` [PATCH 0/9] git archive: use gzip again by default, document output stabilty Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 1/9] archive & tar config docs: de-duplicate configuration section Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 2/9] git config docs: document "tar.<format>.{command,remote}" Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 3/9] archiver API: make the "flags" in "struct archiver" an enum Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 4/9] archive: omit the shell for built-in "command" filters Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 5/9] archive-tar.c: move internal gzip implementation to a function Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 6/9] archive: use "gzip -cn" for stability, not "git archive gzip" Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 7/9] test-lib.sh: add a lazy GZIP prerequisite Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 8/9] archive tests: test for "gzip -cn" and "git archive gzip" stability Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 9/9] git archive docs: document output non-stability Ævar Arnfjörð Bjarmason
2023-02-02 10:25       ` brian m. carlson
2023-02-02 10:30         ` Ævar Arnfjörð Bjarmason
2023-02-02 16:34         ` Junio C Hamano
2023-02-04 17:46           ` brian m. carlson
2023-02-02 16:17     ` [PATCH 0/9] git archive: use gzip again by default, document output stabilty Phillip Wood
2023-02-02 16:40       ` Junio C Hamano
2023-02-03 13:49       ` Ævar Arnfjörð Bjarmason
2023-02-06 14:46         ` Phillip Wood
2023-02-03 15:47       ` Theodore Ts'o
2023-02-02 16:25     ` Junio C Hamano
2023-02-04 18:08       ` René Scharfe
2023-02-05 21:30         ` Ævar Arnfjörð Bjarmason
2023-02-12 17:41           ` René Scharfe
2023-02-02 19:23     ` Raymond E. Pasco
2023-02-03  8:06       ` [PATCH] archive: document output stability concerns Raymond E. Pasco
2023-01-31  9:54 ` Stability of git-archive, breaking (?) the Github universe, and a possible solution brian m. carlson
2023-01-31 11:31   ` Ævar Arnfjörð Bjarmason
2023-01-31 15:05   ` Konstantin Ryabitsev
2023-01-31 22:32     ` brian m. carlson
2023-02-01  9:40       ` Ævar Arnfjörð Bjarmason
2023-02-01 11:34         ` demerphq
2023-02-01 12:21           ` Michal Suchánek
2023-02-01 12:48             ` demerphq
2023-02-01 13:43               ` Ævar Arnfjörð Bjarmason
2023-02-01 15:21                 ` demerphq
2023-02-01 18:56                   ` Theodore Ts'o
2023-02-02 21:19                     ` Joey Hess
2023-02-03  4:02                       ` Theodore Ts'o
2023-02-03 13:32                         ` Ævar Arnfjörð Bjarmason [this message]
2023-02-01 23:16         ` brian m. carlson
2023-02-01 23:37           ` Junio C Hamano
2023-02-02 23:01             ` brian m. carlson
2023-02-02 23:47               ` rsbecker
2023-02-03 13:18                 ` Ævar Arnfjörð Bjarmason
2023-02-02  0:42           ` Ævar Arnfjörð Bjarmason
2023-02-01 12:17       ` Raymond E. Pasco
2023-01-31 15:56   ` Eli Schwartz
2023-01-31 16:20     ` Konstantin Ryabitsev
2023-01-31 16:34       ` Eli Schwartz
2023-01-31 20:34         ` Konstantin Ryabitsev
2023-01-31 20:45         ` Michal Suchánek
2023-02-01  1:33     ` brian m. carlson
2023-02-01 12:42   ` Ævar Arnfjörð Bjarmason
2023-02-01 23:18     ` brian m. carlson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=230203.86o7qac1hr.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=id@joeyh.name \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).